How do I group and sum multiple columns in a CSV file?

Python

RISEBY 2022-08-02 18:16:07

I am still quite new to Python and pandas, and I am currently trying to get the sum of several columns in a CSV file. The file contains three columns I want to sum, unitCount, orderCount and invoiceCount:

date        id  name    unitCount  orderCount  invoiceCount
2020-02-12  1   Guitar  200        100         200
2020-02-12  2   Drums   300        200         100
2020-02-12  3   Piano   400        700         300
2020-02-11  1   Guitar  100        500         300
2020-02-11  2   Drums   200        400         400
2020-02-11  3   Piano   300        300         100

The output I want is a CSV file containing the sums of the last three columns (grouped by id), with each row linked only to the latest date:

date        id  name    total_unitCount  total_orderCount  total_invoiceCount
2020-02-12  1   Guitar  300              600               500
2020-02-12  2   Drums   500              600               500
2020-02-12  3   Piano   700              1000              400

Can anyone help? So far I am trying the following, but it is not working for me. Is it possible to add groupby to the first line of the code below, or have I got this completely wrong from the start? Thanks!

df = pd.read_csv(r'path/to/myfile.csv', sep=';').sum()
df.to_csv(r'path/to/myfile_sum.csv')


3 Answers

慕雪6442864


You can do the aggregation manually with agg, choosing one function per column:

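# df is the DataFrame loaded with pd.read_csv(..., sep=';')
(df.groupby('id', as_index=False)
   .agg({'date': 'max',          # latest date per id
         'name': 'first',        # name is constant within an id
         'unitCount': 'sum',
         'orderCount': 'sum',
         'invoiceCount': 'sum'})
   .to_csv('file.csv', index=False)  # index=False keeps the row index out of the file
)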
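To see what that aggregation produces on the data from the question, here is a minimal, self-contained sketch. It assumes the file is semicolon-separated (as in the asker's read_csv call) and uses io.StringIO as a stand-in for the real file; the names CSV_TEXT and result are made up for illustration.

```python
import io

import pandas as pd

# Sample rows copied from the question; ';' as separator is an assumption
# taken from the asker's read_csv call.
CSV_TEXT = """date;id;name;unitCount;orderCount;invoiceCount
2020-02-12;1;Guitar;200;100;200
2020-02-12;2;Drums;300;200;100
2020-02-12;3;Piano;400;700;300
2020-02-11;1;Guitar;100;500;300
2020-02-11;2;Drums;200;400;400
2020-02-11;3;Piano;300;300;100
"""

# io.StringIO stands in for a path such as 'path/to/myfile.csv'.
df = pd.read_csv(io.StringIO(CSV_TEXT), sep=';')

# One row per id: latest date, first name seen, and the three sums.
result = df.groupby('id', as_index=False).agg({
    'date': 'max',      # ISO-formatted date strings sort chronologically
    'name': 'first',
    'unitCount': 'sum',
    'orderCount': 'sum',
    'invoiceCount': 'sum',
})

print(result)
```

Because the dates are in YYYY-MM-DD form, taking the maximum of the strings gives the latest date without parsing them; with any other date format it would be safer to pass parse_dates=['date'] to read_csv first.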

Answered 2022-08-02

Helenr


You can do the following:

# group rows by the 'id' column and keep the aggregated result
df = df.groupby('id', as_index=False).agg({'date': 'max',
                                           'name': 'first',
                                           'unitCount': 'sum',
                                           'orderCount': 'sum',
                                           'invoiceCount': 'sum'})

# change the order of the columns
df = df[['date', 'id', 'name', 'unitCount', 'orderCount', 'invoiceCount']]

# set the new column names
df.columns = ['date', 'id', 'name', 'total_unitCount', 'total_orderCount', 'total_invoiceCount']

# save the dataframe as a .csv file, without the row index
df.to_csv('path/to/myfile_sum.csv', index=False)
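As a possible alternative to renaming the columns in a separate step, pandas (0.25 and later) supports named aggregation, where each keyword argument to agg is the output column name and its value is a (source column, function) pair. The sketch below is an illustration under the same assumptions as the question (semicolon-separated data); CSV_TEXT, summary and csv_out are hypothetical names, and io.StringIO stands in for the real file.

```python
import io

import pandas as pd

# Sample rows from the question; the ';' separator is assumed.
CSV_TEXT = """date;id;name;unitCount;orderCount;invoiceCount
2020-02-12;1;Guitar;200;100;200
2020-02-12;2;Drums;300;200;100
2020-02-12;3;Piano;400;700;300
2020-02-11;1;Guitar;100;500;300
2020-02-11;2;Drums;200;400;400
2020-02-11;3;Piano;300;300;100
"""

df = pd.read_csv(io.StringIO(CSV_TEXT), sep=';')

# Named aggregation: output_name=(input_column, function).
summary = df.groupby('id', as_index=False).agg(
    date=('date', 'max'),
    name=('name', 'first'),
    total_unitCount=('unitCount', 'sum'),
    total_orderCount=('orderCount', 'sum'),
    total_invoiceCount=('invoiceCount', 'sum'),
)

# Put 'date' first to match the layout requested in the question.
summary = summary[['date', 'id', 'name',
                   'total_unitCount', 'total_orderCount', 'total_invoiceCount']]

# With no path, to_csv returns the CSV text; pass a path to write a file instead.
csv_out = summary.to_csv(index=False, sep=';')
print(csv_out)
```

This keeps the aggregation and the final column names in one place, so the list of new names cannot drift out of step with the column order.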

Answered 2022-08-02

浮云间


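You just need to call sum() on the groupby object, rename the columns accordingly, and finally write the resulting DataFrame to a csv file.

The following should do the trick: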

import pandas as pd

df = pd.read_csv(r'path/to/myfile.csv', sep=';')

# Select the columns with a list (double brackets); selecting them with a
# bare tuple is not accepted by pandas 2.0 and later.
df.groupby(['id', 'name'])[['unitCount', 'orderCount', 'invoiceCount']] \
    .sum() \
    .rename(columns={'unitCount': 'total_unitCount', 'orderCount': 'total_orderCount', 'invoiceCount': 'total_invoiceCount'}) \
    .to_csv('path/to/myoutputfile_sum.csv', sep=';')
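Grouping by id and name and calling sum() leaves the date column out of the result, whereas the question asks for the latest date on each row. One way to add it back, sketched here as an assumption rather than as part of the answer above, is to compute the per-group maximum date separately and join it to the sums. The DataFrame is built inline from the question's sample rows, and the names sums, latest and out are made up for illustration.

```python
import pandas as pd

# Sample rows from the question, built inline instead of read from a file.
df = pd.DataFrame({
    'date': ['2020-02-12', '2020-02-12', '2020-02-12',
             '2020-02-11', '2020-02-11', '2020-02-11'],
    'id': [1, 2, 3, 1, 2, 3],
    'name': ['Guitar', 'Drums', 'Piano', 'Guitar', 'Drums', 'Piano'],
    'unitCount': [200, 300, 400, 100, 200, 300],
    'orderCount': [100, 200, 700, 500, 400, 300],
    'invoiceCount': [200, 100, 300, 300, 400, 100],
})

grouped = df.groupby(['id', 'name'])

# Sum the three count columns; add_prefix renames them all in one call.
sums = grouped[['unitCount', 'orderCount', 'invoiceCount']].sum().add_prefix('total_')

# Latest date per (id, name) group; ISO date strings compare chronologically.
latest = grouped['date'].max()

# Both results share the (id, name) index, so they align when concatenated.
out = pd.concat([latest, sums], axis=1).reset_index()
out = out[['date', 'id', 'name',
           'total_unitCount', 'total_orderCount', 'total_invoiceCount']]

print(out)
```

From here, out.to_csv('path/to/myoutputfile_sum.csv', sep=';', index=False) would write the file in the layout the question asks for.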

Answered 2022-08-02
