Trying to compare 2 Excel files using Python and pandas

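Background: I have 2 files with only a few matching columns, and one column common to both that I use for the comparison. Example:

Table 1, about 10K rows

| col1 | col2 | col3 |
|------|------|------|
| adam | key1 | def  |
| mike | key2 | efg  |

Table 2, about 5K rows

| col1 | col2 | col3 | col4 |
|------|------|------|------|
| adam | key1 | def  | abc  |
| mike | key2 |      | cdf  |

I am trying to take the data from the first file and compare it with the second file; if a row differs, I want to write the complete row from the first file to a new file. In the example above, with the first column as the comparison column, col1, col2 and col3 are the same for adam, so no update record is needed. For mike, the col2 value is the same but col3 has a different value, so that row becomes part of the update file.

I am using Python, and what I have so far is too basic; I can't seem to find a more efficient way:

    import pandas as pd

    df_1 = pd.read_excel("file1.xlsx")
    df_2 = pd.read_excel("file2.xlsx")
    df_1 = df_1.fillna('')
    df_2 = df_2.fillna('')
    df_3 = pd.DataFrame()

    # Compare every row of file 1 against every row of file 2
    for i in range(0, len(df_1.index)):
        for j in range(0, len(df_2.index)):
            if df_1['col1'].iloc[i] == df_2['col1'].iloc[j]:
                flag = 0
                if df_1['col2'].iloc[i] != df_2['col2'].iloc[j]:
                    flag = 1
                if df_1['col3'].iloc[i] != df_2['col3'].iloc[j]:
                    flag = 1
                if flag:
                    # Keep the full row from file 1
                    df_3 = pd.concat([df_3, df_1.iloc[[i]]])
                break

    df_3.to_excel("update.xlsx", index=False, header=True)

I also tried some variations like the one below, which matches rows on the col1 value and then checks whether the other two column values are the same or different, but it does not return all the required data:

    df_3 = df_1[
        df_1['col1'].isin(df_2['col1'])
        & (~df_1['col2'].isin(df_2['col2']) | ~df_1['col3'].isin(df_2['col3']))
    ]

Output:

       col1  col2 col3
    0  mike  key2  efg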


1 Answer

12345678_0001


IIUC, use pandas.DataFrame.update:

Given:

    # df1
       col1  col2 col3
    0  adam  key1  def
    1  mike  key2  efg

    # df2
       col1  col2 col3 col4
    0  adam  key1  def  abc
    1  mike  key2       cdf

    new_df = df1.set_index('col1')
    new_df.update(df2.set_index('col1'))
    new_df.reset_index(inplace=True)
    print(new_df)

Output:

       col1  col2 col3
    0  adam  key1  def
    1  mike  key2
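For anyone who wants to run the steps above without the Excel files, here is a minimal self-contained sketch. The two frames are built in memory from the sample tables, which is an assumption made purely for illustration. It also shows one detail worth knowing: `DataFrame.update` only copies non-NA values from the other frame, so mike's blank `col3` has to be an empty string (as the `fillna('')` in the question produces) for the overwrite to happen.

```python
import pandas as pd

# Sample data mirroring the tables above (built in memory for illustration).
# Blanks are empty strings, as they would be after fillna('').
df1 = pd.DataFrame({
    "col1": ["adam", "mike"],
    "col2": ["key1", "key2"],
    "col3": ["def", "efg"],
})
df2 = pd.DataFrame({
    "col1": ["adam", "mike"],
    "col2": ["key1", "key2"],
    "col3": ["def", ""],
    "col4": ["abc", "cdf"],
})

# Align on the key column, then overwrite df1's values with df2's.
new_df = df1.set_index("col1")
new_df.update(df2.set_index("col1"))  # in place; col4 is not added to new_df
new_df = new_df.reset_index()
print(new_df)

# If the blank were a missing value instead of '', update would skip it.
df2_na = df2.assign(col3=["def", None])
kept = df1.set_index("col1")
kept.update(df2_na.set_index("col1"))
print(kept.loc["mike", "col3"])  # the original value from df1 is kept
```

In other words, whether a blank cell counts as a difference depends on whether it is filled before the update.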

To get the updated rows (in their original state):

    df1[(new_df != df1).any(axis=1)]

Output:

       col1  col2 col3
    1  mike  key2  efg
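As an alternative to `update`, the same rows can probably also be found with a merge on the key column. This is a swapped-in technique rather than the method from the answer above, and the helper name `changed_rows` and its parameters are hypothetical. The sketch assumes the key values are unique in both files and that blanks have already been filled (for example with `fillna('')`), since a missing value never compares equal to another missing value.

```python
import pandas as pd

def changed_rows(df1, df2, key="col1", cols=("col2", "col3")):
    """Return rows of df1 whose key also appears in df2 but whose
    compared columns differ. Hypothetical helper; assumes unique keys."""
    cols = list(cols)
    # Inner join keeps only keys present in both frames; df2's copies of
    # the compared columns get the suffix "_other".
    merged = df1.merge(df2[[key] + cols], on=key, how="inner",
                       suffixes=("", "_other"))
    differs = pd.Series(False, index=merged.index)
    for c in cols:
        differs |= merged[c] != merged[c + "_other"]
    # Keep only df1's own columns for the rows that differ.
    return merged.loc[differs, df1.columns]

# Sample data from the tables above, built in memory for illustration.
df1 = pd.DataFrame({"col1": ["adam", "mike"],
                    "col2": ["key1", "key2"],
                    "col3": ["def", "efg"]})
df2 = pd.DataFrame({"col1": ["adam", "mike"],
                    "col2": ["key1", "key2"],
                    "col3": ["def", ""],
                    "col4": ["abc", "cdf"]})

updates = changed_rows(df1, df2)
print(updates)
# updates.to_excel("update.xlsx", index=False)  # requires an Excel engine such as openpyxl
```

Because the comparison is done column-wise on the merged frame, this avoids the nested loop over 10K x 5K rows from the question.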

2022-06-02
