2 Answers
Contributed 1797 experience points, received 4+ upvotes
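Here is another version that uses Python sets. It builds the sets for a whole column at once instead of looping over the index, which should be faster. Note that `apply`/`map` still call Python code for every row, so this is not true vectorization.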
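import pandas as pd
# Assumes df already holds comma-separated tag strings in 'Tag1' and 'Tag2'.
# Split each string and turn it into a set of tags.
df['set_Tag1'] = df['Tag1'].str.split(',').map(set)
df['set_Tag2'] = df['Tag2'].str.split(',').map(set)
# Tags in Tag2 but not in Tag1 (element-wise set difference), joined back into strings
df['diff'] = (df['set_Tag2'] - df['set_Tag1']).map(','.join)
# Tags present in both columns, computed row by row
df['common'] = [','.join(t2 & t1) for t1, t2 in zip(df['set_Tag1'], df['set_Tag2'])]
df.drop(columns=['set_Tag1', 'set_Tag2'], inplace=True)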
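The first answer relies on pandas applying `-` element-wise when both columns hold Python sets. A minimal, runnable check of that behaviour on a small hypothetical DataFrame (the tags `a`, `b`, `c`, `x`, `y` are made up for illustration). The intersection is computed per row with `zip` rather than with `&` on the Series, since pandas treats `&` between Series as a logical operator rather than set intersection:

```python
import pandas as pd

# Hypothetical two-row frame with comma-separated tags.
df = pd.DataFrame({'Tag1': ['a,b', 'x'], 'Tag2': ['b,c', 'x,y']})

# One set of tags per row.
s1 = df['Tag1'].str.split(',').map(set)
s2 = df['Tag2'].str.split(',').map(set)

# On object columns, '-' is applied pair by pair, so each row gets s2 - s1.
diff = (s2 - s1).map(','.join)

# Intersection computed per row instead of using '&' on the Series.
common = pd.Series([','.join(b & a) for a, b in zip(s1, s2)], index=df.index)

print(diff.tolist())    # ['c', 'y']
print(common.tolist())  # ['b', 'x']
```

Each result here has at most one tag, so the printed order is stable; with several tags per row the joined order can vary.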
Contributed 1815 experience points, received 10+ upvotes
In Python, the `set` data structure supports intersection, union, difference, and similar operations.
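A minimal sketch of those set operations applied to tag strings in the question's comma-separated format (the two example strings are hypothetical):

```python
# Hypothetical tag strings in the comma-separated format used in the question.
tag1 = 'Hindi,English'
tag2 = 'English,Hindi,Kannada'

a = set(tag1.split(','))
b = set(tag2.split(','))

difference = b - a      # same as b.difference(a)
intersection = b & a    # same as b.intersection(a)
union = a | b           # same as a.union(b)

# Sets are unordered, so sort before joining to get a stable string.
print(','.join(sorted(difference)))    # Kannada
print(','.join(sorted(intersection)))  # English,Hindi
print(','.join(sorted(union)))         # English,Hindi,Kannada
```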
For the DataFrame in the question, you can do the following:
import pandas as pd
# Return the tags in b that are not in a (b - a); a and b are comma-separated strings
def diff(a, b):
    a = a.split(',')  # split the string on ','
    b = b.split(',')
    return ','.join(set(b) - set(a))  # set difference, joined back with ','
# Return the tags common to a and b; a and b are comma-separated strings
def common(a, b):
    a = a.split(',')
    b = b.split(',')
    return ','.join(set(b).intersection(a))
# Initialize the DataFrame as provided in the question
df = pd.DataFrame({'ID': [1, 2, 3, 4, 5],
                   'Tag1': ['English,French', 'Hindi,English', 'Kannada', 'French', 'German'],
                   'Tag2': ['Kannada', 'English,Hindi', 'Kannada,Hindi', 'French,English', 'Kannada,German']})
# Add new columns for the common tags and the difference
df['common'] = [common(a, b) for a, b in zip(df['Tag1'], df['Tag2'])]
df['diff'] = [diff(a, b) for a, b in zip(df['Tag1'], df['Tag2'])]
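After this, `df` has two new columns: `common` holds the tags shared by `Tag1` and `Tag2`, and `diff` holds the tags that appear in `Tag2` but not in `Tag1`.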
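Because `','.join` iterates over a set, the order of tags inside each joined string is not guaranteed (row 2's common tags could come out as `Hindi,English` in one run and `English,Hindi` in another). A sketch that sorts before joining so the result on the question's sample data is reproducible; `diff_sorted` and `common_sorted` are hypothetical names, not from the original answer:

```python
import pandas as pd

def diff_sorted(a, b):
    """Tags in b but not in a, as a sorted comma-separated string."""
    return ','.join(sorted(set(b.split(',')) - set(a.split(','))))

def common_sorted(a, b):
    """Tags present in both a and b, as a sorted comma-separated string."""
    return ','.join(sorted(set(b.split(',')) & set(a.split(','))))

# Sample data from the question.
df = pd.DataFrame({'ID': [1, 2, 3, 4, 5],
                   'Tag1': ['English,French', 'Hindi,English', 'Kannada', 'French', 'German'],
                   'Tag2': ['Kannada', 'English,Hindi', 'Kannada,Hindi', 'French,English', 'Kannada,German']})

df['common'] = [common_sorted(a, b) for a, b in zip(df['Tag1'], df['Tag2'])]
df['diff'] = [diff_sorted(a, b) for a, b in zip(df['Tag1'], df['Tag2'])]

print(df['common'].tolist())  # ['', 'English,Hindi', 'Kannada', 'French', 'German']
print(df['diff'].tolist())    # ['Kannada', '', 'Hindi', 'English', 'Kannada']
```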