You can do this as a two-stage process. First compute a mapping Series indexed by cluster, then map it back onto the cluster column:
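# Stage 1: for each cluster, the name on the tag == 1 row with the largest amount
s = df.query('tag == 1')\
      .sort_values('amount', ascending=False)\
      .drop_duplicates('cluster')\
      .set_index('cluster')['name']
# Stage 2: map that name onto every row of the same cluster
df['highest_name'] = df['cluster'].map(s)
print(df)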
   cluster  tag  amount     name  highest_name
0        1    0     200  Michael           NaN
1        2    1    1200     John          John
2        2    1     900   Daniel          John
3        2    0    3000    David          John
4        2    0     600    Jonny          John
5        3    0     900  Denisse          Kely
6        3    1     900     Mike          Kely
7        3    1    3000     Kely          Kely
8        3    0    2000    Devon          Kely
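For anyone who wants to run the two-stage approach end to end, here is a minimal self-contained sketch. The sample frame is reconstructed from the printed output above, so treat it as an assumption about what the original data looked like.

```python
import pandas as pd

# Sample data reconstructed from the printed output (an assumption, not the asker's actual frame)
df = pd.DataFrame({
    "cluster": [1, 2, 2, 2, 2, 3, 3, 3, 3],
    "tag": [0, 1, 1, 0, 0, 0, 1, 1, 0],
    "amount": [200, 1200, 900, 3000, 600, 900, 900, 3000, 2000],
    "name": ["Michael", "John", "Daniel", "David", "Jonny",
             "Denisse", "Mike", "Kely", "Devon"],
})

# Stage 1: keep tag == 1 rows, sort by amount descending, keep the first row per cluster
s = (
    df.query("tag == 1")
    .sort_values("amount", ascending=False)
    .drop_duplicates("cluster")
    .set_index("cluster")["name"]
)

# Stage 2: look up each row's cluster in the mapping Series
df["highest_name"] = df["cluster"].map(s)
print(df)
```

Cluster 1 has no row with tag == 1, so it is missing from `s` and `map` leaves NaN for it.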
If you would rather use groupby, here is one way:
import numpy as np

def func(x):
    # Names in this cluster with tag == 1, largest amount first
    names = x.query('tag == 1').sort_values('amount', ascending=False)['name']
    return names.iloc[0] if not names.empty else np.nan
df['highest_name'] = df['cluster'].map(df.groupby('cluster').apply(func))
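A runnable sketch of the groupby variant follows. The sample frame is again reconstructed from the printed output, and the helper name `top_tagged_name` is hypothetical. Selecting the columns explicitly after `groupby` is my own addition: it keeps the grouping column out of what `apply` receives, which newer pandas versions may otherwise warn about.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the printed output (an assumption)
df = pd.DataFrame({
    "cluster": [1, 2, 2, 2, 2, 3, 3, 3, 3],
    "tag": [0, 1, 1, 0, 0, 0, 1, 1, 0],
    "amount": [200, 1200, 900, 3000, 600, 900, 900, 3000, 2000],
    "name": ["Michael", "John", "Daniel", "David", "Jonny",
             "Denisse", "Mike", "Kely", "Devon"],
})

def top_tagged_name(group):
    """Return the name with the largest amount among tag == 1 rows, or NaN if there are none."""
    names = group.query("tag == 1").sort_values("amount", ascending=False)["name"]
    return names.iloc[0] if not names.empty else np.nan

# One value per cluster, indexed by cluster
per_cluster = df.groupby("cluster")[["tag", "amount", "name"]].apply(top_tagged_name)

# Broadcast the per-cluster value back to every row
df["highest_name"] = df["cluster"].map(per_cluster)
print(df)
```

On this data it gives the same column as the two-stage version. The two-stage version avoids calling a Python function once per group, so it is probably the better choice on large frames.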