You can make a small change to your current code:
from pandas import DataFrame
import re

data = {'id': [11, 12, 13, 14, 15, 16],
        'term': ['Ford', 'EXpensive', 'TOYOTA', 'Mercedes Benz', 'electric', 'cars'],
        'sentence': ['F-FORD FORD/FORD is less expensive than Mercedes Benz.',
                     'toyota, hyundai mileage is good compared to ford',
                     'tesla is an electric-car',
                     'toyota too has electric cars',
                     'CARS',
                     'CArs are expensive.']
        }

# DataFrame creation
df = DataFrame(data, columns=['id', 'term', 'sentence'])

# Dictionary creation: upper-cased term -> id
dct = {}
l_term = list(df['term'])
l_id = list(df['id'])
for i, j in zip(l_term, l_id):
    dct[str(i).upper()] = j

# Build the pattern to replace: longest terms first, case-insensitive,
# not preceded by '-' or a word character, not followed by a word character
pattern = r'(?i)(?<!-)(?<!\w)(?:{})(?!\w)'.format('|'.join(map(re.escape, sorted(df["term"], key=len, reverse=True))))

# Replace each match with "match|id"; regex=True is required for a callable replacement in pandas 2.0+
df["sentence"] = df["sentence"].str.replace(pattern, lambda x: "{}|{}".format(x.group(), dct[x.group().upper()]), regex=True)
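Notes:
- dict is the name of a Python built-in type, so do not use it as a variable name; use dct instead.
- dct[str(i).upper()] = j stores each term as an upper-case key, which makes the dictionary lookup case-insensitive.
- The final df["sentence"] = df["sentence"].str.replace(...) line is the main one. It uses Series.str.replace, which accepts a callable as the replacement argument. Each time the pattern matches, the match is passed to the lambda as a Match object x; x.group() returns the whole match, and dct[x.group().upper()] retrieves the corresponding id.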
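To see the callable replacement and the lookarounds in isolation, here is a minimal sketch that uses only the standard-library re module on plain strings instead of a DataFrame. The terms mapping and the tag_terms helper are hypothetical names introduced for illustration; the pattern and the upper-case lookup are the same as in the answer.

```python
import re

# Hypothetical term -> id mapping, mirroring the data in the answer.
terms = {"Ford": 11, "EXpensive": 12, "TOYOTA": 13,
         "Mercedes Benz": 14, "electric": 15, "cars": 16}

# Upper-case keys so that any casing of a matched term can be looked up.
dct = {term.upper(): term_id for term, term_id in terms.items()}

# Longest terms first, so a longer term wins over a shorter one at the same position.
alternation = "|".join(map(re.escape, sorted(terms, key=len, reverse=True)))

# (?i)     case-insensitive matching
# (?<!-)   the match must not be preceded by a hyphen
# (?<!\w)  ... nor by a word character
# (?!\w)   ... and must not be followed by a word character
pattern = re.compile(r"(?i)(?<!-)(?<!\w)(?:{})(?!\w)".format(alternation))


def tag_terms(sentence):
    """Append '|<id>' to every matched term (hypothetical helper)."""
    # The callable receives a Match object for each match.
    return pattern.sub(
        lambda m: "{}|{}".format(m.group(), dct[m.group().upper()]),
        sentence,
    )


print(tag_terms("toyota too has electric cars"))
print(tag_terms("F-FORD FORD/FORD"))
```

The second call shows the effect of (?<!-): the FORD that follows the hyphen is left alone, while the other two occurrences are tagged.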