2 回答
TA贡献1871条经验 获得超8个赞
考虑以下数据(在第 3 行添加一行以便更好地理解)
data = pd.compat.StringIO("""5 Jan, 5.28, 'McDonalds', 'BankOfAmerica'
6 Jan, 8.44, 'Starbucks', 'BankOfAmerica'
5 Jan, 5.28, 'McDonalds Rest', 'BoA'
5 Jan, 5.28, 'McDonalds Rest', 'BankOfAmerica'
10 Feb, 22.72, 'Chipolte', 'Chase'""")
df = pd.read_csv(data,header=None)
df.columns=['Date','Amount','Dscription','Source']
print(df)
Date Amount Dscription Source
0 5 Jan 5.28 'McDonalds' 'BankOfAmerica'
1 6 Jan 8.44 'Starbucks' 'BankOfAmerica'
2 5 Jan 5.28 'McDonalds Rest' 'BoA'
3 5 Jan 5.28 'McDonalds Rest' 'BankOfAmerica'
4 10 Feb 22.72 'Chipolte' 'Chase'
对于重复和不同的来源:
df_dups =df[df.duplicated(['Date','Amount'],keep=False)]
df_dups =df_dups.drop_duplicates(['Date','Amount','Source'],keep=False)
print(df_dups)
Date Amount Dscription Source
0 5 Jan 5.28 'McDonalds' 'BankOfAmerica'
2 5 Jan 5.28 'McDonalds Rest' 'BoA'
对于没有重复(基本上拉所有其他行df- df_dup):
no_dups=df.loc[~df.index.isin(df_dups.index)]
print(no_dups)
Date Amount Dscription Source
1 6 Jan 8.44 'Starbucks' 'BankOfAmerica'
3 5 Jan 5.28 'McDonalds Rest' 'BankOfAmerica'
4 10 Feb 22.72 'Chipolte' 'Chase'
TA贡献1876条经验 获得超6个赞
使用存在
select t1.* from table_name t1
where exists( select 1 from table_name t2
where t2.date=t1.date and t2.amount=t1.amount and t1.source<>t2.source)
添加回答
举报