4 answers
Contributed 1784 experience points · 7+ likes
I would use NumPy broadcasting:
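import numpy as np
# i[k], j[k]: positions of two rows whose (col_1, col_2) pairs are reverses of each other
i,j=np.where((df.col_1+df.col_2).values==(df.col_2+df.col_1).values[:,None])
# average the output_1 values of each row and its reversed partner
average=0.5*(df.iloc[i].output_1.reset_index(drop=True)+
             df.iloc[j].output_1.reset_index(drop=True))
# restore the original row labels so the assignment aligns; unpaired rows get NaN
average.index=df.iloc[i].index
df['average']=average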
The result I get is:
col_1 col_2 output_1 average
0 a b 3 13.5
1 a c 5 3.5
2 a d 3 NaN
3 b a 24 13.5
4 b c 12 7.5
5 b d 5 NaN
6 c a 2 3.5
7 c b 3 7.5
8 c d 5 NaN
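A self-contained sketch of the same broadcasting idea, with the sample frame rebuilt from the printed output above (an assumption about the original data). Instead of comparing concatenated strings, it compares the two columns separately, because concatenation can give false matches once labels are longer than one character (for example "a" + "bc" equals "ab" + "c"). It also assumes each (col_1, col_2) pair appears at most once.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the printed output (assumption).
df = pd.DataFrame({
    "col_1": list("aaabbbccc"),
    "col_2": list("bcdacdabd"),
    "output_1": [3, 5, 3, 24, 12, 5, 2, 3, 5],
})

c1 = df["col_1"].to_numpy()
c2 = df["col_2"].to_numpy()

# match[r, c] is True when row c is the reverse of row r:
# col_1[r] == col_2[c] and col_2[r] == col_1[c].
match = (c1[:, None] == c2[None, :]) & (c2[:, None] == c1[None, :])
i, j = np.where(match)

out = df["output_1"].to_numpy()
df["average"] = np.nan  # rows without a reversed partner stay NaN
df.loc[df.index[i], "average"] = 0.5 * (out[i] + out[j])
print(df)
```

This reproduces the table above: the a-d, b-d and c-d rows have no partner and keep NaN.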
Contributed 1816 experience points · 4+ likes
Try this. You can drop the col_12 column afterwards, or keep using it as a unique key for each pair (independent of element order).
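print(df)
# order-independent pair key: ('a', 'b') and ('b', 'a') both become "['a', 'b']"
df["col_12"]=df[["col_1", "col_2"]].apply(lambda x: str(sorted(x)), axis=1)
# per pair: mean of output_1 and the number of rows sharing the key
df2=df.groupby(df["col_12"]).agg({"output_1": "mean", "col_1": "count"}).rename(columns={"output_1": "output_1_mean", "col_1": "rows_count"})
# a pair with only one row (no reversed partner) gets half of its own value
df2.loc[df2["rows_count"]==1, "output_1_mean"]/=2
df2.drop("rows_count", axis=1, inplace=True)
# attach the per-pair mean back to every row
df=df.join(df2, on="col_12")
print(df)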
Output (the frame before and after the join):
col_1 col_2 output_1
0 a b 3
1 a c 5
2 a d 3
3 b a 24
4 b c 12
5 b d 5
6 c a 2
7 c b 3
8 c d 5
col_1 col_2 output_1 col_12 output_1_mean
0 a b 3 ['a', 'b'] 13.5
1 a c 5 ['a', 'c'] 3.5
2 a d 3 ['a', 'd'] 1.5
3 b a 24 ['a', 'b'] 13.5
4 b c 12 ['b', 'c'] 7.5
5 b d 5 ['b', 'd'] 2.5
6 c a 2 ['a', 'c'] 3.5
7 c b 3 ['b', 'c'] 7.5
8 c d 5 ['c', 'd'] 2.5
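A vectorized sketch of the same idea, assuming the sample data printed above. It swaps the row-wise `apply(lambda x: str(sorted(x)), axis=1)` for `np.sort` along axis 1 to build the order-independent key, and uses `groupby(...).transform` so that no separate join is needed. It keeps this answer's rule of halving the value of a row that has no reversed partner.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the printed output (assumption).
df = pd.DataFrame({
    "col_1": list("aaabbbccc"),
    "col_2": list("bcdacdabd"),
    "output_1": [3, 5, 3, 24, 12, 5, 2, 3, 5],
})

# Sort each row's two labels so (a, b) and (b, a) produce the same key.
pair = np.sort(df[["col_1", "col_2"]].to_numpy(), axis=1)
grouped = df.groupby([pair[:, 0], pair[:, 1]])["output_1"]

mean = grouped.transform("mean")    # per-pair mean, broadcast back to every row
count = grouped.transform("count")  # number of rows sharing the pair key

# Same rule as the answer: a row with no partner gets half its own value.
df["output_1_mean"] = mean.where(count > 1, mean / 2)
print(df)
```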
Contributed 1810 experience points · 4+ likes
(Edited) You could try:
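import pandas as pd
for ii in a['col_1'].unique():
    # pair rows (ii, x) with their reverses (x, ii); reset_index() keeps the original row labels in 'index'
    p = pd.merge(a[a['col_1'] == ii].reset_index(), a[a['col_2'] == ii],
                 left_on='col_2', right_on='col_1').set_index('index')
    a.loc[p.index, 'mean'] = p[['output_1_x', 'output_1_y']].mean(axis=1)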
Thanks to @baccandr for the correction.
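The loop can also be replaced by a single self-merge in which `left_on` and `right_on` list the same two columns in swapped order. This is a sketch, not the answerer's code: the sample frame is rebuilt from the printed output in the first answer, and it assumes each (col_1, col_2) pair occurs at most once, so that the left merge returns exactly one row per original row in the original order.

```python
import pandas as pd

# Sample data reconstructed from the printed output (assumption).
df = pd.DataFrame({
    "col_1": list("aaabbbccc"),
    "col_2": list("bcdacdabd"),
    "output_1": [3, 5, 3, 24, 12, 5, 2, 3, 5],
})

# Row (x, y) is matched with row (y, x). how="left" keeps every original row;
# rows without a partner get NaN in output_1_partner.
merged = df.merge(
    df,
    how="left",
    left_on=["col_1", "col_2"],
    right_on=["col_2", "col_1"],
    suffixes=("", "_partner"),
)

# Positional assignment relies on the one-row-per-original-row assumption.
df["mean"] = ((merged["output_1"] + merged["output_1_partner"]) / 2).to_numpy()
print(df)
```

Unpaired rows end up with NaN, which matches the first answer's result.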