2 Answers

Contributor with 1836 experience points, received over 5 likes
Use pd.merge with an extra cumcount column:
u = df2.assign(cnt=df2.groupby('index').cumcount())
v = df.assign(cnt=df.groupby('index').cumcount())
u.merge(v, on=['index', 'cnt'], how='left').drop(columns='cnt')
   index    d    a    b    c
0      1  zxc  asd  dsa  sad
1      1  cxz  NaN  NaN  NaN
2      2  xzc  fgh  hgf  gfh
3      3  zxc  qwe  ewq  wqe
4      3  xcz  NaN  NaN  NaN
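Details
We add a cumulative count that numbers the repeated values in "index", so each occurrence of a key gets its own position (0, 1, ...).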
u = df2.assign(cnt=df2.groupby('index').cumcount())
u
   index    d  cnt
0      1  zxc    0
1      1  cxz    1
2      2  xzc    0
3      3  zxc    0
4      3  xcz    1
v = df.assign(cnt=df.groupby('index').cumcount())
v
   index    a    b    c  cnt
0      1  asd  dsa  sad    0
1      2  fgh  hgf  gfh    0
2      3  qwe  ewq  wqe    0
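Next, we perform a LEFT JOIN on "index" and "cnt" with u as the left frame. Rows of u that have no matching ("index", "cnt") pair in v get NaN in the result: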
u.merge(v, on=['index', 'cnt'], how='left')
   index    d  cnt    a    b    c
0      1  zxc    0  asd  dsa  sad
1      1  cxz    1  NaN  NaN  NaN
2      2  xzc    0  fgh  hgf  gfh
3      3  zxc    0  qwe  ewq  wqe
4      3  xcz    1  NaN  NaN  NaN
The last step is to drop the temporary "cnt" column.
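The steps in this answer can be put together as one runnable sketch. The sample frames below are reconstructed from the printed output in this answer, and the helper name merge_nth_occurrence is hypothetical, introduced only for illustration. It also assumes neither frame already has a column called "cnt".

```python
import pandas as pd

# Sample data reconstructed from the printed tables (an assumption).
df = pd.DataFrame({
    "index": [1, 2, 3],
    "a": ["asd", "fgh", "qwe"],
    "b": ["dsa", "hgf", "ewq"],
    "c": ["sad", "gfh", "wqe"],
})
df2 = pd.DataFrame({
    "index": [1, 1, 2, 3, 3],
    "d": ["zxc", "cxz", "xzc", "zxc", "xcz"],
})


def merge_nth_occurrence(left, right, key):
    """Hypothetical helper: left-join `right` onto `left` so that the n-th
    occurrence of a key in `left` only matches the n-th occurrence in `right`."""
    # Number the repeats of each key: 0 for the first occurrence, 1 for the next, ...
    u = left.assign(cnt=left.groupby(key).cumcount())
    v = right.assign(cnt=right.groupby(key).cumcount())
    # Join on (key, cnt); unmatched rows of `left` get NaN, then drop the helper column.
    return u.merge(v, on=[key, "cnt"], how="left").drop(columns="cnt")


result = merge_nth_occurrence(df2, df, "index")
print(result)
```

Only the first row for each "index" value in df2 finds a partner in df, so the later repeats keep their "d" value and get NaN in "a", "b" and "c".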

Contributor with 2051 experience points, received over 10 likes
Use merge together with mask and duplicated:
df = df2.merge(df1)
cols = ['index','a','b','c']
df[['a','b','c']] = df[cols].mask(df[cols].duplicated())[['a','b','c']]
print(df)
   index    d    a    b    c
0      1  zxc  asd  dsa  sad
1      1  cxz  NaN  NaN  NaN
2      2  xzc  fgh  hgf  gfh
3      3  zxc  qwe  ewq  wqe
4      3  xcz  NaN  NaN  NaN
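For anyone who wants to run this answer end to end, here is a self-contained sketch. The sample frames are reconstructed from the printed output (an assumption), and the variable names merged and is_repeat are illustrative, not part of the original answer.

```python
import pandas as pd

# Sample data reconstructed from the printed tables (an assumption).
df1 = pd.DataFrame({
    "index": [1, 2, 3],
    "a": ["asd", "fgh", "qwe"],
    "b": ["dsa", "hgf", "ewq"],
    "c": ["sad", "gfh", "wqe"],
})
df2 = pd.DataFrame({
    "index": [1, 1, 2, 3, 3],
    "d": ["zxc", "cxz", "xzc", "zxc", "xcz"],
})

# Plain merge on the shared "index" column: every df2 row receives the
# matching a/b/c values, so repeated keys carry repeated values.
merged = df2.merge(df1)

# Flag rows whose (index, a, b, c) combination has already appeared.
cols = ["index", "a", "b", "c"]
is_repeat = merged[cols].duplicated()

# Blank out a/b/c on the flagged rows, keeping values on first occurrences.
merged[["a", "b", "c"]] = merged[cols].mask(is_repeat)[["a", "b", "c"]]
print(merged)
```

Note that this variant decides what to blank out by comparing values, so it would also blank a row that legitimately repeats the same "index", "a", "b" and "c" combination; the cumcount approach in the other answer matches by position instead.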