2 Answers
Contributor (1866 experience points, 5+ upvotes):
difflib.get_close_matches() will at least tidy up your code, and it may also run faster.
import difflib
p_names = ['BLUEAPPLE', 'GREENBUTTON20', '400100DUCK20']
i_names = ['BLUEAPPLE', 'GREENBUTTON', '100DUCK']
for p in p_names:
    print(difflib.get_close_matches(p, i_names))
>>>
['BLUEAPPLE']
['GREENBUTTON']
['100DUCK']
>>>
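get_close_matches() ranks candidates by difflib.SequenceMatcher's similarity ratio, keeps those at or above cutoff (default 0.6) and returns at most n of them (default 3), best first. A minimal sketch that prints those ratios so you can pick sensible n and cutoff values for your data; the helper name scores and the stricter cutoff=0.7 are illustrative choices, not something from the answer above.

```python
import difflib

p_names = ['BLUEAPPLE', 'GREENBUTTON20', '400100DUCK20']
i_names = ['BLUEAPPLE', 'GREENBUTTON', '100DUCK']

def scores(word, candidates):
    """Return (candidate, ratio) pairs, most similar first (hypothetical helper)."""
    matcher = difflib.SequenceMatcher()
    matcher.set_seq2(word)          # the word being looked up
    result = []
    for c in candidates:
        matcher.set_seq1(c)         # each candidate in turn
        result.append((c, round(matcher.ratio(), 3)))
    return sorted(result, key=lambda pair: pair[1], reverse=True)

for p in p_names:
    print(p, scores(p, i_names))

# Keep only the single best candidate, and only if its ratio is at least 0.7.
best = {p: difflib.get_close_matches(p, i_names, n=1, cutoff=0.7) for p in p_names}
print(best)
```

Raising cutoff reduces false positives, but a product whose name contains a lot of extra characters around the item code can then fall below the threshold.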
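It still performs a lot of comparisons: every string in p_names has to be compared with every string in i_names.
Here is an approach similar to yours, using regular expressions to find the matches: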
import re
for p in p_names:
    for i in i_names:
        # re.escape makes the name match literally, even if it contains regex metacharacters
        if re.search(re.escape(i), p):
            print(i)
            # stop looking
            break
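If the inner loop is the bottleneck, one option (not part of the answer above) is to compile all item names into a single alternation pattern, so each product is scanned once by the regex engine instead of being tested against every name in a Python loop. A minimal sketch, assuming the names are plain literal strings; find_item is a hypothetical helper. Note one behavioural difference: this returns the match that starts leftmost in the product, not the first name in i_names order.

```python
import re

p_names = ['BLUEAPPLE', 'GREENBUTTON20', '400100DUCK20']
i_names = ['BLUEAPPLE', 'GREENBUTTON', '100DUCK']

# Build one pattern such as 'GREENBUTTON|BLUEAPPLE|100DUCK'.
# re.escape treats every name as literal text; sorting longest-first makes
# the longer name win when two names could match at the same position.
pattern = re.compile('|'.join(
    re.escape(name) for name in sorted(i_names, key=len, reverse=True)))

def find_item(product):
    """Return the item name found inside product, or None if there is none."""
    match = pattern.search(product)
    return match.group(0) if match else None

for p in p_names:
    print(p, '->', find_item(p))
```

Whether this is actually faster depends on the sizes of your lists, so it is worth timing against the nested loop on real data.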
Contributor (1796 experience points, 10+ upvotes):
Try this:
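import re

# item_nums_list is the list of item numbers from the question
def remove_nums(product):
    if re.search(r'\d', product):
        for item in item_nums_list:
            if item in product:
                return item
        # no known item number inside the product: strip the digits instead
        return re.sub(r'\d+', '', product)
    else:
        return product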
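The function above reads item_nums_list as a global. A minimal variant, shown here only as a sketch, passes the list in as a parameter so the function can be tested on its own; the sample list and the extra product 'RED7HAT' are hypothetical data for illustration.

```python
import re

def remove_nums(product, item_nums):
    """Variant of remove_nums that takes the item list as an argument.

    If product contains a digit, return the first item from item_nums that
    appears inside it; if none does, return product with its digits removed.
    Products without any digits are returned unchanged.
    """
    if re.search(r'\d', product):
        for item in item_nums:
            if item in product:
                return item
        return re.sub(r'\d+', '', product)
    return product

items = ['BLUEAPPLE', 'GREENBUTTON', '100DUCK']
for product in ['BLUEAPPLE', 'GREENBUTTON20', '400100DUCK20', 'RED7HAT']:
    print(product, '->', remove_nums(product, items))
```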
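Also, make sure you are running your code with the standard Python interpreter. IPython and other interpreters with debugging features can be considerably slower than the regular one.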
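To check whether the interpreter or the algorithm is the real bottleneck, you can time the same search both ways. A minimal sketch using the standard-library timeit module: substring_match is a hypothetical stand-in for your own matching code, and the data set (the sample names repeated 1000 times) is made up. Run the script once with plain python and once inside IPython and compare the printed times.

```python
import timeit

p_names = ['BLUEAPPLE', 'GREENBUTTON20', '400100DUCK20'] * 1000
i_names = ['BLUEAPPLE', 'GREENBUTTON', '100DUCK']

def substring_match(products, items):
    """Nested-loop substring search: keep the first item found in each product."""
    found = []
    for p in products:
        for i in items:
            if i in p:
                found.append(i)
                break
    return found

# Run the whole search 20 times and report the total time in seconds.
elapsed = timeit.timeit(lambda: substring_match(p_names, i_names), number=20)
print(f'{elapsed:.4f} s for 20 runs')
```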
You might also consider doing some set operations first. Here is a small example:
# product_list and item_number_list are the two lists from the question
product_set = set(product_list)
item_number_set = set(item_number_list)
# these are the ones that match straight away
product_matches = product_set & item_number_set
# now we can search through the substrings of ones that don't match
non_matches = product_set - item_number_set
for product in non_matches:
    for item_number in item_number_set:
        if item_number in product:
            product_matches.add(product)
            break
# product_matches is now a set of all unique codes contained in both lists by "fuzzy match"
print(product_matches)
You may lose the order in which the items appear, but perhaps you can adapt this to your needs.
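If the original order matters, one way to adapt it (a sketch, not the answerer's code) is to walk product_list in order, skip duplicates with a "seen" set, and still use a set of item numbers for the exact-match fast path. The sample lists and the helper name ordered_matches are hypothetical.

```python
product_list = ['GREENBUTTON20', 'BLUEAPPLE', '400100DUCK20', 'BLUEAPPLE', 'REDHAT']
item_number_list = ['BLUEAPPLE', 'GREENBUTTON', '100DUCK']

item_number_set = set(item_number_list)

def ordered_matches(products, item_numbers):
    """Return matching products in first-seen order, without duplicates."""
    matches = []
    seen = set()
    for product in products:
        if product in seen:
            continue
        seen.add(product)
        # Exact match is a constant-time set lookup; otherwise fall back to a substring search.
        if product in item_numbers or any(item in product for item in item_numbers):
            matches.append(product)
    return matches

print(ordered_matches(product_list, item_number_set))
```

Because matches is a list filled in input order, the result keeps the order of first appearance, while the seen set keeps each duplicate check cheap.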