Any number of different groupby levels in one go

Is there a way to compute an arbitrary number of different groupby levels in one go using some prebuilt Pandas function? Below is a simple example with two columns.

    import pandas as pd

    df1 = pd.DataFrame({
        "name": ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"],
        "city": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"],
        "dollars": [1, 1, 1, 1, 1, 1],
    })

    # subtotal per city, per name, and per (name, city) pair
    group1 = df1.groupby("city").dollars.sum().reset_index()
    group1['name'] = 'All'
    group2 = df1.groupby("name").dollars.sum().reset_index()
    group2['city'] = 'All'
    group3 = df1.groupby(["name", "city"]).dollars.sum().reset_index()

    # grand total
    total = df1.dollars.sum()
    total_df = pd.DataFrame({"name": ["All"], "city": ["All"], "dollars": [total]})

    # DataFrame.append was removed in pandas 2.0, so pd.concat is used here
    all_groups = pd.concat([group3, group1, group2, total_df], sort=False)

          name      city  dollars
    0    Alice   Seattle        1
    1      Bob   Seattle        2
    2  Mallory  Portland        2
    3  Mallory   Seattle        1
    0      All  Portland        2
    1      All   Seattle        4
    0    Alice       All        1
    1      Bob       All        2
    2  Mallory       All        3
    0      All       All        6

So I took Ben.T's example and rebuilt it from sum() to agg(). The next step for me is to build an option to pass a list of specific groupby combinations, in case not all combinations are needed.

    from itertools import combinations
    import pandas as pd

    df1 = pd.DataFrame({
        "name": ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"],
        "city": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"],
        "dollars": [1, 2, 6, 5, 3, 4],
        "qty": [2, 3, 4, 1, 5, 6],
        "id": [1, 1, 2, 2, 3, 3],
    })

    col_gr = ['name', 'city']
    agg_func = {'dollars': ['sum', 'max', 'count'], 'qty': ['sum'], "id": ['nunique']}

    def multi_groupby(in_df, col_gr, agg_func, all_value="ALL"):
        # one-row frame with every grouping column set to the placeholder
        tmp1 = pd.DataFrame({**{col: all_value for col in col_gr}}, index=[0])
        # overall aggregates, flattened into a single row
        tmp2 = in_df.agg(agg_func)\
                    .unstack()\
                    .to_frame()\
                    .transpose()\
                    .dropna(axis=1)
        tmp2.columns = ['_'.join(col).strip() for col in tmp2.columns.values]
        total = tmp1.join(tmp2)


2 Answers

慕沐林林

Contributed 2016 experience points, received more than 9 likes

Assuming you are looking for a general way to build every combination of grouping columns for the groupby, you can use itertools.combinations:

    import pandas as pd
    from itertools import combinations

    col_gr = ['name', 'city']
    col_sum = ['dollars']

    all_groups = pd.concat(
        # one frame per combination of grouping columns, most detailed first;
        # grouping columns not used in a combination are filled with 'all'
        [df1.groupby(by=list(cols))[col_sum].sum().reset_index()
            .assign(**{col: 'all' for col in col_gr if col not in cols})
         for r in range(len(col_gr), 0, -1)
         for cols in combinations(col_gr, r)]
        # grand total as a single row
        + [pd.DataFrame({**{col: 'all' for col in col_gr},
                         **{col: df1[col].sum() for col in col_sum}},
                        index=[0])],
        axis=0, ignore_index=True)

    print(all_groups)

          name      city  dollars
    0    Alice   Seattle        1
    1      Bob   Seattle        2
    2  Mallory  Portland        2
    3  Mallory   Seattle        1
    4    Alice       all        1
    5      Bob       all        2
    6  Mallory       all        3
    7      all  Portland        2
    8      all   Seattle        4
    9      all       all        6
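If only some of the combinations are needed, the same idea can be wrapped in a function that accepts an explicit list of combinations. The following is a minimal sketch, not a built-in Pandas feature: the function name `multi_level_groupby` and its parameters `value_cols`, `combos`, and `all_value` are hypothetical names chosen for illustration, and it only sums the value columns. An empty tuple in `combos` stands for the grand total.

```python
from itertools import combinations

import pandas as pd


def multi_level_groupby(df, group_cols, value_cols, combos=None, all_value="all"):
    """Sum value_cols for each requested combination of group_cols.

    combos is an iterable of tuples of column names; an empty tuple means
    the grand total. If combos is None, every combination is used.
    (Hypothetical helper, written for illustration only.)
    """
    group_cols = list(group_cols)
    value_cols = list(value_cols)
    if combos is None:
        # All combinations, most detailed first, ending with the grand total.
        combos = [
            cols
            for r in range(len(group_cols), -1, -1)
            for cols in combinations(group_cols, r)
        ]
    frames = []
    for cols in combos:
        cols = list(cols)
        if cols:
            part = df.groupby(cols)[value_cols].sum().reset_index()
        else:
            # Grand total: a one-row frame holding the column sums.
            part = df[value_cols].sum().to_frame().T
        # Fill the grouping columns that were not used with the placeholder.
        for col in group_cols:
            if col not in cols:
                part[col] = all_value
        frames.append(part[group_cols + value_cols])
    return pd.concat(frames, ignore_index=True)


df1 = pd.DataFrame({
    "name": ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"],
    "city": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"],
    "dollars": [1, 1, 1, 1, 1, 1],
})

# Only the (name, city) detail, the per-city subtotals, and the grand total.
result = multi_level_groupby(
    df1, ["name", "city"], ["dollars"],
    combos=[("name", "city"), ("city",), ()],
)
print(result)
```

Leaving `combos` out falls back to every combination, which should give the same ten rows as the one-liner above.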

2022-07-26

潇潇雨雨

Contributed 1833 experience points, received more than 4 likes

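This is something I have been looking for as well. Below are links to two approaches written by other people that helped me solve this problem. I would certainly be interested in other takes, too.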
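As one more take for the two-column case: `pd.pivot_table` with `margins=True` is a built-in that produces the per-name subtotals, the per-city subtotals, and the grand total in a single call. This is only a sketch on the question's sample data, and it is not necessarily one of the approaches referred to above. Note that the result is in wide form (names as rows, cities as columns) rather than the long form shown in the question, and that it does not generalize to arbitrary subsets of combinations.

```python
import pandas as pd

df1 = pd.DataFrame({
    "name": ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"],
    "city": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"],
    "dollars": [1, 1, 1, 1, 1, 1],
})

# margins=True adds an extra row and column (named by margins_name)
# holding the subtotals and the grand total.
pivot = pd.pivot_table(
    df1,
    index="name",
    columns="city",
    values="dollars",
    aggfunc="sum",
    fill_value=0,
    margins=True,
    margins_name="All",
)
print(pivot)
```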

2022-07-26
