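Is it possible to do a fuzzy-match merge with Python pandas?

I have two DataFrames that I want to merge based on a column. However, because of alternate spellings, differing numbers of spaces, and the presence or absence of diacritical marks, I would like to be able to merge them as long as the values are similar to one another. Any similarity algorithm will do (Soundex, Levenshtein, difflib).

Say one DataFrame has the following data:

df1 = DataFrame([[1],[2],[3],[4],[5]], index=['one','two','three','four','five'], columns=['number'])

       number
one         1
two         2
three       3
four        4
five        5

df2 = DataFrame([['a'],['b'],['c'],['d'],['e']], index=['one','too','three','fours','five'], columns=['letter'])

      letter
one        a
too        b
three      c
fours      d
five       e

Then I want to get the resulting DataFrame:

       number letter
one         1      a
two         2      b
three       3      c
four        4      d
five        5      e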
3 Answers

鸿蒙传说
Contributed 1865 experience points, received 7+ upvotes
You can apply difflib's get_close_matches to df2's index and then do a join:
In [23]: import difflib

In [24]: difflib.get_close_matches
Out[24]: <function difflib.get_close_matches>

In [25]: df2.index = df2.index.map(lambda x: difflib.get_close_matches(x, df1.index)[0])

In [26]: df2
Out[26]:
      letter
one        a
two        b
three      c
four       d
five       e

In [31]: df1.join(df2)
Out[31]:
       number letter
one         1      a
two         2      b
three       3      c
four        4      d
five        5      e
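If the keys were columns rather than the index, you could apply the same idea to the column and then merge: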
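import difflib
from pandas import DataFrame

df1 = DataFrame([[1,'one'],[2,'two'],[3,'three'],[4,'four'],[5,'five']], columns=['number', 'name'])
df2 = DataFrame([['a','one'],['b','too'],['c','three'],['d','fours'],['e','five']], columns=['letter', 'name'])
# Replace each name in df2 with its closest match among df1's names.
df2['name'] = df2['name'].apply(lambda x: difflib.get_close_matches(x, df1['name'])[0])
# Both frames now share exact 'name' values, so a plain merge works.
df1.merge(df2)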
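
One caveat with the [0] indexing above: get_close_matches returns an empty list when no candidate clears its similarity cutoff, so [0] would raise an IndexError for such a value. A minimal sketch of a safer wrapper follows. The helper name closest_or_none is hypothetical (not part of difflib or pandas), and the cutoff of 0.6 is simply difflib's documented default.

```python
import difflib

def closest_or_none(name, candidates, cutoff=0.6):
    """Return the closest candidate to name, or None if none clears cutoff.

    closest_or_none is a hypothetical helper for illustration only.
    """
    # n=1 asks for the single best match; the result is a (possibly empty) list.
    matches = difflib.get_close_matches(name, list(candidates), n=1, cutoff=cutoff)
    return matches[0] if matches else None

names = ['one', 'two', 'three', 'four', 'five']
print(closest_or_none('too', names))
print(closest_or_none('fours', names))
print(closest_or_none('zzz', names))
```

In the merge example this could be used as df2['name'].apply(lambda x: closest_or_none(x, df1['name'])). Values mapped to None would then find no partner in df1 instead of aborting the whole operation. Raising the cutoff makes the matching stricter.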

翻过高山走不出你
Contributed 1875 experience points, received 3+ upvotes
import jellyfish
import pandas

def get_closest_match(x, list_strings):
    # Return the candidate with the highest Jaro-Winkler similarity to x.
    best_match = None
    highest_jw = 0
    for current_string in list_strings:
        # Note: newer jellyfish releases name this function jaro_winkler_similarity.
        current_score = jellyfish.jaro_winkler(x, current_string)
        if current_score > highest_jw:
            highest_jw = current_score
            best_match = current_string
    return best_match

df1 = pandas.DataFrame([[1],[2],[3],[4],[5]], index=['one','two','three','four','five'], columns=['number'])
df2 = pandas.DataFrame([['a'],['b'],['c'],['d'],['e']], index=['one','too','three','fours','five'], columns=['letter'])

# Map each index label in df2 to its closest label in df1, then join on the index.
df2.index = df2.index.map(lambda x: get_closest_match(x, df1.index))
df1.join(df2)

       number letter
one         1      a
two         2      b
three       3      c
four        4      d
five        5      e
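
The question also mentions differing whitespace and diacritical marks. Neither is handled specially by the similarity functions used in this thread, so it may help to normalize the keys before scoring them. The following is a rough sketch using only the standard library. The function name normalize_key is hypothetical, and whether lowercasing is appropriate depends on the data.

```python
import unicodedata

def normalize_key(text):
    """Lowercase, strip diacritics, and collapse whitespace (hypothetical helper)."""
    # NFKD splits an accented character into its base character plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks, keeping only the base characters.
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # split() with no arguments discards leading, trailing, and repeated whitespace.
    return " ".join(without_marks.lower().split())

print(normalize_key("  Thrée  "))
print(normalize_key("New   York"))
```

One way to use it would be to apply it to both indexes first, for example df2.index = df2.index.map(normalize_key), and only then run the closest-match step, so that the similarity score is spent on real spelling differences.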