I want a groupby and sum based on an id, but with all columns shown in the result.

Sample code

import pandas as pd
import numpy as np

mre = [
    ["2018-1", "Sold", 109000.0, "Appartement", 73.0, 4.0],
    ["2018-1", "Sold", 109000.0, "Appartement", "NaN", 0.0],
    ["2018-2", "Sold", 239300.0, "House", 163.0, 4.0],
    ["2018-2", "Sold", 239300.0, "House", 51.0, 2.0],
    ["2018-2", "Sold", 239300.0, "House", 51.0, 2.0],
]
df = pd.DataFrame(mre)

# Rename columns
df.columns = ["_idMutation", "typeOfSearch", "price", "typeOfBuilding", "surface", "nbRoom"]
# Convert surface to float (the "NaN" string becomes a real NaN)
df["surface"] = df["surface"].astype(float)
print(df)

Base dataframe

  _idMutation typeOfSearch     price typeOfBuilding  surface  nbRoom
0      2018-1         Sold  109000.0    Appartement     73.0     4.0
1      2018-1         Sold  109000.0    Appartement      NaN     0.0
2      2018-2         Sold  239300.0          House    163.0     4.0
3      2018-2         Sold  239300.0          House     51.0     2.0
4      2018-2         Sold  239300.0          House     51.0     2.0

Expected result

The groupby is on _idMutation. It sums surface and nbRoom but leaves the other columns untouched. I want to show all columns, drop the duplicate _idMutation rows, and show the result of the groupby.

  _idMutation typeOfSearch     price typeOfBuilding  surface  nbRoom
0      2018-1         Sold  109000.0    Appartement     73.0     4.0
1      2018-2         Sold  239300.0          House    265.0     8.0

Current code

The following solution produces the expected result. I have 14.6 million rows, and the solution I came up with does not look optimized.

# Groupby on _idMutation & sum ["surface", "nbRoom"]
gb_df = df[["surface", "nbRoom"]].groupby(df["_idMutation"]).sum()

# Delete duplicates _idMutation
df.drop_duplicates(subset=["_idMutation"], inplace=True)

# Set _idMutation as df index
df.set_index("_idMutation", inplace=True)

# Concat df with gb_df
df = pd.concat(
    [df[["typeOfSearch", "price", "typeOfBuilding"]], gb_df],
    axis=1
)
1 Answer
BIG阳
We can pass a dictionary to GroupBy.agg to set the aggregation method we want for each column. In this case we only need first and sum:
dfg = df.groupby("_idMutation", as_index=False).agg({
    "typeOfSearch": "first",
    "price": "first",
    "typeOfBuilding": "first",
    "surface": "sum",
    "nbRoom": "sum"
})
  _idMutation typeOfSearch     price typeOfBuilding  surface  nbRoom
0      2018-1         Sold  109000.0    Appartement     73.0     4.0
1      2018-2         Sold  239300.0          House    265.0     8.0
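If the real frame has many more columns, writing the dictionary by hand gets tedious. Below is a minimal sketch, not part of the original answer, that builds the same mapping from the column list. The names sum_cols and agg_map are made up for illustration. Passing sort=False is an assumption that the group order does not matter; it skips sorting the group keys, which may save a little time on a large frame.

```python
import pandas as pd

# Same sample data as in the question, with a real NaN instead of the "NaN" string
mre = [
    ["2018-1", "Sold", 109000.0, "Appartement", 73.0, 4.0],
    ["2018-1", "Sold", 109000.0, "Appartement", float("nan"), 0.0],
    ["2018-2", "Sold", 239300.0, "House", 163.0, 4.0],
    ["2018-2", "Sold", 239300.0, "House", 51.0, 2.0],
    ["2018-2", "Sold", 239300.0, "House", 51.0, 2.0],
]
df = pd.DataFrame(
    mre,
    columns=["_idMutation", "typeOfSearch", "price", "typeOfBuilding", "surface", "nbRoom"],
)

# Hypothetical helper names: columns to sum; every other non-key column takes "first"
sum_cols = ["surface", "nbRoom"]
agg_map = {
    col: ("sum" if col in sum_cols else "first")
    for col in df.columns
    if col != "_idMutation"
}

# sort=False keeps groups in order of first appearance instead of sorting the keys
dfg = df.groupby("_idMutation", as_index=False, sort=False).agg(agg_map)
print(dfg)
```

Note that sum skips NaN by default, which is why surface for 2018-1 comes out as 73.0, and that first returns the first non-null value in each group rather than strictly the value from the first row.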