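I have a DataFrame with one column that contains a string of comma-separated items.

col1
apple, banana, kiwi
apple, banana
banana

I want to create a second column, "col2", that shows the difference between each row and the previous one. So I tried turning each row into a set and subtracting it from the previous row, as described in "Python comparing two strings to differences":

    df['col2'] = set(df["col1"].shift(1)) - set(df["col1"])

But I get this error message: "ValueError: Length of values does not match length of index". What am I doing wrong, and is there a better way to do what I am doing?

Edit: expected output

col1                 col2
apple, banana, kiwi
apple, banana        kiwi
banana               apple

1 Answer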

喵喔喔
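First, strip the whitespace and split each string into a list of items in a temporary column:

    df["temp"] = df["col1"].str.replace(r"\s+", "", regex=True).str.split(",")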

Then assign values to the difference column:

    df['difference'] = [
        ""
        if isinstance(last, float) or (not set(last).difference(first))
        else tuple(set(last).difference(first))
        if len(set(last).difference(first)) > 1
        else min(set(last).difference(first))
        for first, last in zip(df.temp, df["temp"].shift())
    ]
    df.drop('temp', axis=1)  # returns a new frame without the helper column; df itself is unchanged

                  col1 difference
0  apple, banana, kiwi
1        apple, banana       kiwi
2               banana      apple
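
As for what goes wrong in the question: `set(df["col1"].shift(1))` and `set(df["col1"])` are each built from the whole column, so the subtraction gives one set for the entire frame instead of one value per row, and its length does not match the index. The sketch below shows that, and then one possible row-by-row variant. The helper names `parse_items` and `removed_items` are made up for illustration and are not part of pandas; the result is joined into a sorted string, which is an assumption about the desired output format.

```python
import pandas as pd

df = pd.DataFrame({"col1": ["apple, banana, kiwi", "apple, banana", "banana"]})

# The expression from the question: both sets are built from the whole column,
# so the result is a single set, not one value per row.
whole_column_diff = set(df["col1"].shift(1)) - set(df["col1"])
# len(whole_column_diff) differs from len(df), hence the length mismatch error.


def parse_items(cell):
    """Split a comma-separated cell into a set of stripped item names."""
    if not isinstance(cell, str):  # missing value produced by shift() for the first row
        return set()
    return {item.strip() for item in cell.split(",") if item.strip()}


def removed_items(previous, current):
    """Items present in the previous row but missing from the current one."""
    return ", ".join(sorted(parse_items(previous) - parse_items(current)))


# Pair every row with the row before it and compare them one pair at a time.
df["col2"] = [
    removed_items(prev, cur)
    for prev, cur in zip(df["col1"].shift(1), df["col1"])
]
```

Sorting before joining keeps the output stable when more than one item disappears between rows, since set iteration order is not guaranteed.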