
Python aggregation on multiple columns using a composite key

大话西游666 2022-04-27 13:16:21
I would like to know how to aggregate multiple columns using one key. I have working code that aggregates a single column, but I want to extend it to multiple columns. Some sample data is below. The actual sums do not mean much; the data is only there to illustrate the problem. The code below builds a key on Tm, Lg, Pos and sums PTS. I would like to sum both PTS and G for the same key. I can do this easily in pandas, but I want to use plain Python rather than pandas.

$ cat test-file.csv
Season,Age,Tm,Lg,Pos,G,FGA,PTS
2003-04,22,MIA,NBA,PG,61,13.1,16.2
2004-05,23,MIA,NBA,SG,77,17.1,24.1
2005-06,24,MIA,NBA,SG,75,18.8,27.2
2006-07,25,MIA,NBA,SG,51,18.9,27.4
2007-08,26,MIA,NBA,SG,51,18.4,24.6
2008-09,27,MIA,NBA,SG,79,22.0,30.2
2009-10,28,MIA,NBA,SG,77,19.6,26.6
2010-11,29,MIA,NBA,SG,76,18.2,25.5
2011-12,30,MIA,NBA,SG,49,17.1,22.1
2012-13,31,MIA,NBA,SG,69,15.8,21.2
2013-14,32,MIA,NBA,SG,54,14.1,19.0
2014-15,33,MIA,NBA,SG,62,17.5,21.5
2015-16,34,MIA,NBA,SG,74,16.0,19.0
2016-17,35,CHI,NBA,SG,60,15.9,18.3
2017-18,36,CLE,NBA,SG,46,9.5,11.2
2017-18,36,MIA,NBA,SG,21,11.8,12.0
2018-19,37,MIA,NBA,SG,72,13.3,15.0

import csv
import re
from collections import namedtuple

totals = {}
with open('/home/test-file.csv', 'r') as input_file:
    reader = csv.reader(input_file, delimiter=',')
    header = next(reader)
    record = namedtuple('record', header)
    for rec in (record._make(row) for row in reader):
        totals[rec.Tm, rec.Lg, rec.Pos] = \
            (totals.get((rec.Tm, rec.Lg, rec.Pos), 0.0) + \
            float(rec.PTS))
    for key, value in sorted(totals.items()):
        row = list(key) + [value]
        print(row)

['CHI', 'NBA', 'SG', 18.3]
['CLE', 'NBA', 'SG', 11.2]
['MIA', 'NBA', 'PG', 16.2]
['MIA', 'NBA', 'SG', 315.4]

I am looking for output like the following, with two aggregated columns:

['CHI', 'NBA', 'SG', 60, 18.3]
['CLE', 'NBA', 'SG', 46, 11.2]
['MIA', 'NBA', 'PG', 61, 16.2]
['MIA', 'NBA', 'SG', 887, 315.4]

Edit: fixed a typo.

1 Answer

撒科打诨


As @BlueSheepToken suggested, groupby from itertools is your friend. Other pure-Python, performant solutions are available in the funcy and toolz packages. Here is a solution using toolz:


import csv
from operator import itemgetter

import toolz
import toolz.curried


def stream_file(fp):
    # Yield one dict per CSV row, with the two numeric columns converted to float
    with open(fp) as file:
        for line in csv.DictReader(file):
            res = dict(line)
            res['G'] = float(res['G'])
            res['PTS'] = float(res['PTS'])
            yield res


# groups from stream: a list key makes toolz group on the tuple (Tm, Lg, Pos)
groups = toolz.groupby(['Tm', 'Lg', 'Pos'], stream_file('test-file.csv'))

# aggregation functions: get some value from list, then sum it up
pts_counter = toolz.compose_left(toolz.curried.map(itemgetter('PTS')), sum)
g_counter = toolz.compose_left(toolz.curried.map(itemgetter('G')), sum)

# apply both functions to the input; the result is a (PTS, G) tuple
aggregations = toolz.juxt(pts_counter, g_counter)

# for each group's value compute aggregations
toolz.valmap(aggregations, groups)

Output:

{('CHI', 'NBA', 'SG'): (18.3, 60.0),
 ('CLE', 'NBA', 'SG'): (11.2, 46.0),
 ('MIA', 'NBA', 'PG'): (16.2, 61.0),
 ('MIA', 'NBA', 'SG'): (315.4, 887.0)}
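
For comparison, here is a rough sketch of the itertools.groupby route mentioned at the top of this answer, using only the standard library. It is not part of the toolz solution. The function name aggregate_with_groupby and the inlined SAMPLE string (a subset of the question's rows, so the totals differ from the full file) are assumptions made for illustration. Note that groupby only merges adjacent rows, so the rows are sorted by the composite key first.

```python
import csv
import io
from itertools import groupby
from operator import itemgetter

# Subset of the question's data, inlined so the sketch runs without a file.
SAMPLE = """\
Season,Age,Tm,Lg,Pos,G,FGA,PTS
2003-04,22,MIA,NBA,PG,61,13.1,16.2
2004-05,23,MIA,NBA,SG,77,17.1,24.1
2005-06,24,MIA,NBA,SG,75,18.8,27.2
2016-17,35,CHI,NBA,SG,60,15.9,18.3
2017-18,36,CLE,NBA,SG,46,9.5,11.2
2017-18,36,MIA,NBA,SG,21,11.8,12.0
"""

# Composite key: itemgetter with several names returns a tuple.
composite_key = itemgetter('Tm', 'Lg', 'Pos')


def aggregate_with_groupby(lines):
    """Hypothetical helper: sum G and PTS per (Tm, Lg, Pos)."""
    # groupby only groups consecutive items, so sort by the same key first.
    rows = sorted(csv.DictReader(lines), key=composite_key)
    result = []
    for key, group in groupby(rows, key=composite_key):
        group = list(group)  # materialize: we iterate over it twice
        games = sum(int(r['G']) for r in group)
        # Round to one decimal to hide float accumulation noise.
        points = round(sum(float(r['PTS']) for r in group), 1)
        result.append(list(key) + [games, points])
    return result


rows_out = aggregate_with_groupby(io.StringIO(SAMPLE))
for row in rows_out:
    print(row)
```

To run it against the real file, an open file object can be passed in place of the io.StringIO wrapper. Sorting loads every row into memory, which is fine for a file of this size.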


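If the goal is to stay close to the question's original loop, a dictionary keyed on the composite tuple can hold a list of running totals instead of a single number. The following is only a sketch under that assumption. The names aggregate, key_cols and sum_cols are hypothetical, and the data is inlined from the question so the example is self-contained.

```python
import csv
import io
from collections import defaultdict

# Full sample data from the question, inlined so no file is needed.
SAMPLE = """\
Season,Age,Tm,Lg,Pos,G,FGA,PTS
2003-04,22,MIA,NBA,PG,61,13.1,16.2
2004-05,23,MIA,NBA,SG,77,17.1,24.1
2005-06,24,MIA,NBA,SG,75,18.8,27.2
2006-07,25,MIA,NBA,SG,51,18.9,27.4
2007-08,26,MIA,NBA,SG,51,18.4,24.6
2008-09,27,MIA,NBA,SG,79,22.0,30.2
2009-10,28,MIA,NBA,SG,77,19.6,26.6
2010-11,29,MIA,NBA,SG,76,18.2,25.5
2011-12,30,MIA,NBA,SG,49,17.1,22.1
2012-13,31,MIA,NBA,SG,69,15.8,21.2
2013-14,32,MIA,NBA,SG,54,14.1,19.0
2014-15,33,MIA,NBA,SG,62,17.5,21.5
2015-16,34,MIA,NBA,SG,74,16.0,19.0
2016-17,35,CHI,NBA,SG,60,15.9,18.3
2017-18,36,CLE,NBA,SG,46,9.5,11.2
2017-18,36,MIA,NBA,SG,21,11.8,12.0
2018-19,37,MIA,NBA,SG,72,13.3,15.0
"""


def aggregate(lines, key_cols=('Tm', 'Lg', 'Pos'), sum_cols=('G', 'PTS')):
    """Hypothetical helper: one running total per column in sum_cols."""
    totals = defaultdict(lambda: [0.0] * len(sum_cols))
    for rec in csv.DictReader(lines):
        key = tuple(rec[col] for col in key_cols)  # the composite key
        for i, col in enumerate(sum_cols):
            totals[key][i] += float(rec[col])
    return totals


totals = aggregate(io.StringIO(SAMPLE))

# Shape each row as asked in the question: key fields, then G, then PTS.
output_rows = [
    list(key) + [int(games), round(points, 1)]
    for key, (games, points) in sorted(totals.items())
]
for row in output_rows:
    print(row)
```

With the full sample data this prints the four rows requested in the question, ending with ['MIA', 'NBA', 'SG', 887, 315.4]. Adding a third aggregated column only requires extending sum_cols.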
Answered 2022-04-27