Groupby and selecting only certain columns

Here I read a file, "userdata.xlsx":

ID   Debt   Email             Age   User
1    7.5    john@email.com    16    John
2    15     john@email.com    15    John
3    22     john@email.com    15    John
4    30     david@email.com   22    David
5    33     david@email.com   22    David
6    51     fred@email.com    61    Fred
7    11     fred@email.com    25    Fred
8    24     eric@email.com    19    Eric
9    68     terry@email.com   55    Terry
10   335    terry@email.com   55    Terry

I then group by user, build a spreadsheet for each user, and write each one to its own .xlsx file, like this:

ID   Debt   Email             Age   User
1    7.5    john@email.com    16    John
2    15     john@email.com    15    John

This is the whole code:

#!/usr/bin/env python3
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import xlrd

df = pd.read_excel('userdata.xlsx')
grp = df.groupby('User')
for group in grp.groups:
    grouptofile = (grp.get_group(group))
    print(grouptofile)
    print(group)
    grouptofile.to_excel('%s.xlsx' % group , sheet_name='sheet1', index=False)

Now I want to save only selected columns for each user. Say I only want the "ID" and "Email" columns. I learned how to select only certain columns, like this:

selected = df[['ID','Email']]

So I figured it would make sense to add ID and Email here:

grp = df.groupby('User')

With "ID" and "Email" added:

grp = df[['ID', 'Email']].groupby('User')

Is it even possible to combine groupby with column selection?

#!/usr/bin/env python3
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import xlrd

df = pd.read_excel('userdata.xlsx')
grp = df[['ID', 'Email']].groupby('User')
for group in grp.groups:
    grouptofile = (grp.get_group(group))
    print(grouptofile)
    print(group)
    grouptofile.to_excel('%s.xlsx' % group , sheet_name='sheet1', index=False)


2 Answers

不负相思意


I think you need to select the columns from each group's subset instead:

cols = ['ID', 'Email']

for i, group in df.groupby('User'):
    group[cols].to_excel('{}.xlsx'.format(i), sheet_name='sheet1', index=False)
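To try this without the Excel file, here is a minimal sketch that rebuilds a few rows of the question's sample data in memory. The `per_user` dictionary is a name introduced here for illustration only; it collects each user's subset instead of writing files, and the `to_excel` call is left as a comment because it needs an Excel writer such as openpyxl to be installed.

```python
import pandas as pd

# A few rows from the question's userdata.xlsx, rebuilt in memory
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "Debt": [7.5, 15, 22, 30, 33],
    "Email": ["john@email.com"] * 3 + ["david@email.com"] * 2,
    "Age": [16, 15, 15, 22, 22],
    "User": ["John", "John", "John", "David", "David"],
})

cols = ["ID", "Email"]

# Group on the full frame, then keep only the wanted columns per group.
# per_user is a hypothetical helper dict: user name -> that user's rows
per_user = {name: group[cols] for name, group in df.groupby("User")}

for name, subset in per_user.items():
    print(name)
    print(subset)
    # To write one file per user as in the answer:
    # subset.to_excel(f"{name}.xlsx", sheet_name="sheet1", index=False)
```

Because the grouping runs on the full frame, the User column is still available as the key even though it is not written out.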

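If you get KeyError: 'User', it means you are trying to use a column that does not exist.

If you select only the ID and Email columns, the chained groupby cannot find the User column and raises the error: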

print(df[['ID', 'Email']])

   ID            Email
0   1   john@email.com
1   2   john@email.com
2   3   john@email.com
3   4  david@email.com
4   5  david@email.com
5   6   fred@email.com
6   7   fred@email.com
7   8   eric@email.com
8   9  terry@email.com
9  10  terry@email.com

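So the selection also has to include the column used in groupby (note that the User column then ends up in each output file as well):

for i, group in df[['ID', 'Email', 'User']].groupby('User'):
    group.to_excel('{}.xlsx'.format(i), sheet_name='sheet1', index=False)

Or select the columns just before writing the file, as in the first solution:

for i, group in df[['ID', 'Email', 'User']].groupby('User'):
    group[cols].to_excel('{}.xlsx'.format(i), sheet_name='sheet1', index=False)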
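The failure and the fix can be reproduced on a small in-memory frame. This is only a sketch: the three rows are a cut-down version of the question's data, and the `raised` flag and `sizes` variable are names made up for the demonstration.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 2, 4],
    "Email": ["john@email.com", "john@email.com", "david@email.com"],
    "Age": [16, 15, 22],
    "User": ["John", "John", "David"],
})

# Selecting only ID and Email drops User, so groupby cannot find the key
try:
    df[["ID", "Email"]].groupby("User")
    raised = False
except KeyError as err:
    raised = True
    print("KeyError:", err)

# Keeping User in the selection makes the grouping work
sizes = df[["ID", "Email", "User"]].groupby("User").size()
print(sizes)
```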

2021-09-11

MMMHUHU


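It is possible... but not the way you are doing it.

You are effectively dropping every column except two, and then trying to group by a third column that no longer exists. Instead, you need to group before you select the columns. (Grouping in pandas does not modify the original DataFrame, so there is no need to make a copy first.)

A (possibly suboptimal) example, using lists rather than tuples for the column names, since pandas treats a tuple as a single key:

grp = df[['ID', 'Email', 'User']].groupby('User')[['ID', 'Email']]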
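As a rough check of the group-then-select idea, the sketch below groups a small in-memory frame (rows adapted from the question) and then picks the columns on the grouped object. It uses lists of column names, which is the form pandas expects when selecting several columns. The `before` copy is only there to confirm that the original frame is left unchanged.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 2, 4],
    "Debt": [7.5, 15, 30],
    "Email": ["john@email.com", "john@email.com", "david@email.com"],
    "User": ["John", "John", "David"],
})
before = df.copy()  # snapshot to compare against after grouping

# Group first (User is still present), then select the columns to keep
grp = df.groupby("User")[["ID", "Email"]]

# get_group returns only the selected columns for that user
john = grp.get_group("John")
print(john)

# The original frame is untouched by the grouping and selection
print(df.equals(before))
```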

2021-09-11
