You grouped the original DataFrame instead of the dummy table:
table_D_dummies = pd.get_dummies(data = table_D, columns = ["A_Code"])
table_D_dummies_grouped = table_D.groupby(by = ["Geo_ID"]).sum()
You want to group table_D_dummies here instead:
>>> table_D_dummies
   Geo_ID  A_Cost  A_Code_12  A_Code_65  A_Code_98
0       1       2          1          0          0
1       1       9          1          0          0
2       1       1          1          0          0
3       1      10          0          1          0
4       2       6          0          1          0
5       3       7          0          1          0
6       4       7          0          1          0
7       4       6          0          0          1
8       5       2          0          0          1
>>> table_D_dummies.groupby(by = ["Geo_ID"]).sum()
        A_Cost  A_Code_12  A_Code_65  A_Code_98
Geo_ID
1           22          3          1          0
2            6          0          1          0
3            7          0          1          0
4           13          0          1          1
5            2          0          0          1
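For anyone who wants to reproduce this, here is a minimal, self-contained sketch of the fix. The contents of table_D (including the A_Code column and the column order) are an assumption reconstructed from the dummy table printed above, since the original frame is not shown. The dtype=int argument is also an addition, used so the dummies stay 0/1 integers on pandas versions that default to booleans.

```python
import pandas as pd

# Assumed reconstruction of table_D, inferred from the dummy table shown above.
table_D = pd.DataFrame({
    "Geo_ID": [1, 1, 1, 1, 2, 3, 4, 4, 5],
    "A_Code": [12, 12, 12, 65, 65, 65, 65, 98, 98],
    "A_Cost": [2, 9, 1, 10, 6, 7, 7, 6, 2],
})

# One 0/1 indicator column per distinct A_Code value.
table_D_dummies = pd.get_dummies(data=table_D, columns=["A_Code"], dtype=int)

# Group the dummy table, not the original frame.
table_D_dummies_grouped = table_D_dummies.groupby(by=["Geo_ID"]).sum()
print(table_D_dummies_grouped)
```

Each A_Code_* column in the result counts the rows with that code per Geo_ID, while A_Cost is the total cost per Geo_ID.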
If you need the cost summed per dummy, add the dummy columns to the grouping keys:
>>> table_D_dummies.groupby(by = [
... "Geo_ID",
... *(c for c in table_D_dummies.columns if c.startswith('A_Code_'))
... ]).sum()
                                      A_Cost
Geo_ID A_Code_12 A_Code_65 A_Code_98
1      0         1         0              10
       1         0         0              12
2      0         1         0               6
3      0         1         0               7
4      0         0         1               6
       0         1         0               7
5      0         0         1               2
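If a wide layout with one cost column per code is easier to read, one possible alternative (not part of the approach above) is to skip the dummies, group on the original code column, and unstack it. This is only a sketch: table_D is again an assumed reconstruction from the dummy table shown earlier, and cost_by_code is a hypothetical name.

```python
import pandas as pd

# Assumed reconstruction of table_D, inferred from the dummy table shown earlier.
table_D = pd.DataFrame({
    "Geo_ID": [1, 1, 1, 1, 2, 3, 4, 4, 5],
    "A_Code": [12, 12, 12, 65, 65, 65, 65, 98, 98],
    "A_Cost": [2, 9, 1, 10, 6, 7, 7, 6, 2],
})

# Sum the cost per (Geo_ID, A_Code) pair, then pivot the codes into columns.
# fill_value=0 marks combinations that never occur in the data.
cost_by_code = (
    table_D.groupby(["Geo_ID", "A_Code"])["A_Cost"]
    .sum()
    .unstack(fill_value=0)
)
print(cost_by_code)
```

The row for each Geo_ID then holds the summed cost under columns 12, 65 and 98, which carries the same information as the multi-index result above in a flatter shape.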