Compare DataFrame columns against a condition

I have two DataFrames, shown below.

df1:

ID  col1  col2
1   A1    B1
2   A2    B2
3   A3    B3
4   A4    B4
5   A5    B5
6   A6    B6

df2:

col1  col2
A1    B1
A2    O5
H3    B3
A4    B4
A5    66
A6    C6

Expected result: I want to build a result DataFrame based on one condition: every value in col1 and col2 of df1 should be present in the col1 and col2 values of df2.

Expected result df:

ID  col1  col2  Error
1   A1    B1    No mismatch with df2
2   A2    B2    col2 mismatch with df2
3   A3    B3    col1 mismatch with df2
4   A4    B4    No mismatch with df2
5   A5    B5    col2 mismatch with df2
6   A6    B6    col2 mismatch with df2
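For anyone who wants to reproduce the question, the sample data can be built roughly like this. It is a minimal sketch: the values are transcribed from the tables above, and treating every entry (including "66") as a string is an assumption, since the question does not state the dtypes.

```python
import pandas as pd

# Sample data transcribed from the question. Keeping all values as strings
# (including "66") is an assumption about the original dtypes.
df1 = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6],
    "col1": ["A1", "A2", "A3", "A4", "A5", "A6"],
    "col2": ["B1", "B2", "B3", "B4", "B5", "B6"],
})
df2 = pd.DataFrame({
    "col1": ["A1", "A2", "H3", "A4", "A5", "A6"],
    "col2": ["B1", "O5", "B3", "B4", "66", "C6"],
})
```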


2 Answers

白板的微信

Contributed 1883 experience points, received more than 3 likes

Use a dictionary comprehension to build a helper DataFrame of boolean masks, testing membership with isin:

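import numpy as np
import pandas as pd

# True where a df1 value does not appear anywhere in the same column of df2
m = pd.DataFrame({c: ~df1[c].isin(df2[c]) for c in ['col1','col2']})
print (m)
    col1   col2
0  False  False
1  False   True
2   True  False
3  False  False
4  False   True
5  False   True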

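Then use numpy.where with a mask built by any, which tests whether each row contains at least one True, and use dot (matrix multiplication) to collect the names of the mismatched columns: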

df1['Error'] = np.where(m.any(axis=1),
                        m.dot(m.columns + ', ').str.rstrip(', ') + ' mismatch with df2',
                        'No mismatch with df2')
print (df1)
   ID col1 col2                   Error
0   1   A1   B1    No mismatch with df2
1   2   A2   B2  col2 mismatch with df2
2   3   A3   B3  col1 mismatch with df2
3   4   A4   B4    No mismatch with df2
4   5   A5   B5  col2 mismatch with df2
5   6   A6   B6  col2 mismatch with df2
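The dot step can look opaque, so here is a small sketch of what it does on its own. The two-row mask is made up for illustration and is not the question's data. Multiplying a boolean by a string keeps the string for True and yields an empty string for False, so the dot product concatenates the labels of the flagged columns in each row.

```python
import pandas as pd

# Hypothetical two-row mask: True marks a column flagged as a mismatch.
mask = pd.DataFrame({"col1": [True, False], "col2": [True, True]})

# Each label gets a trailing separator. In the dot product, True * "col1, "
# keeps the label, False * "col1, " gives "", and the pieces are concatenated.
labels = mask.dot(mask.columns + ", ").str.rstrip(", ")
print(labels.tolist())
```

The final rstrip only trims the trailing separator left behind by the last label in each row.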

2021-08-24

慕娘9325324

Contributed 1783 experience points, received more than 4 likes

Something like this should do the trick, though there may be a simpler way.

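# Compare only the columns both frames share; df2 has no ID column,
# so iterating over every column of df1 would raise a KeyError.
cols = ['col1', 'col2']

# Row-by-row equality: True where df1 and df2 hold the same value at the same position
diff = pd.concat([df1[col] == df2[col] for col in cols], axis=1)

def m(row):
    # Collect the names of the columns whose values differ in this row
    mismatches = []
    for col in diff.columns:
        if not row[col]:
            mismatches.append(col)
    if mismatches == []:
        return 'No mismatch'
    return 'Mismatches: ' + ', '.join(mismatches)

df1['Error'] = diff.apply(m, axis=1)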
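One thing worth noting is that the two answers test different things: isin checks whether a value appears anywhere in the same column of df2, while == compares the values sitting in the same row. On the question's sample data both give the same flags, but they can diverge. The sketch below uses made-up data, with df2 holding the same values in a different row order, to show the difference.

```python
import pandas as pd

# Hypothetical data: df2 holds the same values as df1, but in a different row order.
df1 = pd.DataFrame({"col1": ["A1", "A2"], "col2": ["B1", "B2"]})
df2 = pd.DataFrame({"col1": ["A2", "A1"], "col2": ["B2", "B1"]})

cols = ["col1", "col2"]

# Membership test: is each df1 value missing from the same column of df2?
missing = pd.DataFrame({c: ~df1[c].isin(df2[c]) for c in cols})

# Positional test: does each df1 value differ from the df2 value in the same row?
differs = pd.DataFrame({c: df1[c] != df2[c] for c in cols})

print(missing.any(axis=1).tolist())  # [False, False]
print(differs.any(axis=1).tolist())  # [True, True]
```

Which one is right depends on whether "present in df2" in the question means anywhere in the column or in the matching row.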

2021-08-24
