I think you need:
import numpy as np

cols = ['D', 'E', 'F', 'G']
# for each group, transpose the selected columns and flag every column
# that has at least one identical twin within the group
df1 = df.groupby('A')[cols].apply(lambda x: x.T.duplicated(keep=False))
# where all columns are flagged, use the group's sum of D, otherwise 0
arr = np.where(df1.all(axis=1), df.groupby('A')[cols[0]].sum(), 0)
# drop the checked columns, keep the first row per value of A, add the new D
df = df.drop(cols, axis=1).drop_duplicates('A').assign(D=arr)
print(df)
        A        B        C  D
0   13348    xyzqr   324580  5
2   45832  gberthh   258729  0
4   58712    bgrtw   984562  2
5   76493     hzrt   638495  0
6  643509        .  T648501  2
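Since the question's DataFrame is not reproduced here, the following is a minimal, self-contained sketch of the same steps on made-up data. The values and the two-group layout are assumptions for illustration only. Note that `assign(D=arr)` attaches the results by position, so this relies on the rows already being sorted by `A`, as they appear to be in the output above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (not from the question):
# in group 1 the columns D..G are identical, in group 2 column E differs.
df = pd.DataFrame({
    "A": [1, 1, 2, 2],
    "B": ["x", "x", "y", "y"],
    "D": [2, 3, 1, 1],
    "E": [2, 3, 1, 4],
    "F": [2, 3, 1, 1],
    "G": [2, 3, 1, 1],
})
cols = ["D", "E", "F", "G"]

# Per group: transpose so each original column becomes a row,
# then flag every row that has at least one exact duplicate.
df1 = df.groupby("A")[cols].apply(lambda x: x.T.duplicated(keep=False))

# Groups where all four columns are flagged get the sum of D, others get 0.
arr = np.where(df1.all(axis=1), df.groupby("A")[cols[0]].sum(), 0)

# Keep one row per A, drop the checked columns, attach the new D by position.
out = df.drop(columns=cols).drop_duplicates("A").assign(D=arr)
print(out)
```

Group 1 sums to 2 + 3 = 5, while group 2 fails the check because of `E` and gets 0.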
An alternative solution that reduces the check to a single boolean per group (True if every column is a duplicate):
cols = ['D', 'E', 'F', 'G']
# one boolean per group: True if every column in cols has a duplicate
m = df.groupby('A')[cols].apply(lambda x: x.T.duplicated(keep=False).all())
print(m)
A
13348 True
45832 False
dtype: bool
# sum of D for groups that pass the check, otherwise 0
arr = np.where(m, df.groupby('A')[cols[0]].sum(), 0)
df = df.drop(cols, axis=1).drop_duplicates('A').assign(D=arr)
print(df)
        A        B        C  D
0   13348    xyzqr   324580  5
2   45832  gberthh   258729  0
4   58712    bgrtw   984562  2
5   76493     hzrt   638495  0
6  643509        .  T648501  2
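Two caveats about the approach above may be worth keeping in mind. First, `duplicated(keep=False).all()` is also True when the columns merely pair up (for example D equals E and F equals G, but D differs from F). Second, the positional `assign(D=arr)` assumes the frame is sorted by `A`. The sketch below is one possible variant, not part of the original answer: it assumes the intent is that all four columns be identical, and it looks the totals up by key instead of by position. The sample data is hypothetical.

```python
import pandas as pd

# Hypothetical sample data, deliberately NOT sorted by A.
df = pd.DataFrame({
    "A": [2, 1, 2, 1],
    "B": ["y", "x", "y", "x"],
    "D": [1, 2, 1, 3],
    "E": [1, 2, 4, 3],
    "F": [1, 2, 1, 3],
    "G": [1, 2, 1, 3],
})
cols = ["D", "E", "F", "G"]
g = df.groupby("A")

# Stricter check (an assumption about the intent): every row of the group
# holds exactly one distinct value across cols, i.e. all columns are equal.
m = g[cols].apply(lambda x: x.nunique(axis=1).eq(1).all())

# Group sums of D, replaced by 0 where the check fails; still indexed by A.
totals = g[cols[0]].sum().where(m, 0)

# Map each remaining row to its total through its A value, not its position.
out = df.drop(columns=cols).drop_duplicates("A")
out = out.assign(D=out["A"].map(totals))
print(out)
```

Because the lookup goes through `map`, the first row (A = 2) receives 0 and the second row (A = 1) receives 5 even though the groups are computed in sorted order.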