I have a DataFrame:

df =
original_title           title
Mexico Oil Gas Summit    Mexico Oil Gas Summit

I need to fuzzy-match the entities in these two columns (original_title and title) and get a score. Here is my code:

    compare = pd.MultiIndex.from_product([df['original_title'], df['title']]).to_series()

    def metrics(tup):
        return pd.Series([fuzz.partial_ratio(*tup), fuzz.token_sort_ratio(*tup)],
                         ['partial', 'token'])

    compare.apply(metrics)

The code above compares each original title with the entire title column. Instead, I want it to compare each original title only with the title in the same row. My expected result is:

df =
original_title               title               partial_ratio
Mexico Oil                   Africa Oil          81
French Property Exhibition   French              100
French Exhibition            French Exhibition   100

Any help is appreciated. Thanks.
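To make the difference between the two comparison shapes concrete, here is a minimal sketch using only the standard library. pd.MultiIndex.from_product builds the Cartesian product of its inputs, which itertools.product mirrors here; zip stands in for the row-by-row pairing the question asks for. The sample strings are taken from the expected-result table, and the way they are split into two columns is an assumption.

```python
from itertools import product

# Sample values borrowed from the question's expected result (column split assumed).
original_titles = ["Mexico Oil", "French Property Exhibition", "French Exhibition"]
titles = ["Africa Oil", "French", "French Exhibition"]

# Cartesian product: every original_title paired with every title (n * n pairs).
all_pairs = list(product(original_titles, titles))

# Row-wise pairing: the i-th original_title with the i-th title only (n pairs).
row_pairs = list(zip(original_titles, titles))

print(len(all_pairs))  # 9
print(len(row_pairs))  # 3
```

With three rows, the product yields nine comparisons while the row-wise pairing yields three, one per row.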
1 Answer
芜湖不芜
Contributed 1796 experience points, received more than 7 likes
You can use the DataFrame's apply() function as follows:
df['partial_ratio'] = df.apply(lambda x: fuzz.partial_ratio(x['original_title'], x['title']), axis=1)
This gives what I think you want (although the numbers differ slightly):
... partial_ratio
... 78
... 83
... 100
... 100
... 100