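How can I look up phrases from a list in the values of a dictionary and build a DataFrame with the phrases found and their counts? Repeated matches should be counted.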

phrases = ['i am good', 'going to the market', 'eating cookies']
dictionary = {
    'http://www.firsturl.com': 'i am going to the market and tomorrow will be eating cookies',
    'http://www.secondurl.com': 'tomorrow is my birthday and i shall be',
    'http://www.thirdurl.com': 'i am good and will go to sleep',
}

If there is at least one match, the expected output is:

url phrasecount phrase
http://www.firsturl.com 2 going to the market, eating cookies
http://www.thirdurl.com 1 i am good

If none of the 3 URLs has a match, return only the first URL, with a count of zero and a blank phrase. Expected output:

url phrasecount phrase
http://www.firsturl.com 0


1 Answer

拉风的咖菲猫


Set up the initial DataFrame df from the given dictionary:

import pandas as pd

df = pd.DataFrame({'urls': list(dictionary.keys()), 'strings': list(dictionary.values())})

pattern = '|'.join(phrases)
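The alternation pattern above treats every phrase as a regular expression, so a phrase containing characters such as `.` or `?` would not be matched literally. A minimal sketch, assuming literal matching is what is wanted, escapes each phrase first; `build_pattern` is a hypothetical helper name and not part of the original answer.

```python
import re

def build_pattern(phrases):
    """Join phrases into one alternation, escaping regex metacharacters."""
    # Put longer phrases first: regex alternation takes the first branch
    # that matches, so a short phrase could otherwise shadow a longer one
    # that begins with the same text.
    ordered = sorted(phrases, key=len, reverse=True)
    return '|'.join(re.escape(p) for p in ordered)

phrases = ['i am good', 'going to the market', 'eating cookies']
pattern = build_pattern(phrases)

# findall returns every non-overlapping match, so repeats are counted.
text = 'eating cookies and eating cookies again'
matches = re.findall(pattern, text)
```

For the phrases in the question the escaped pattern matches the same text as the plain `'|'.join(phrases)`; the difference only shows up once a phrase contains a regex metacharacter.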

Process the DataFrame:

s = df.pop('strings').str.findall(pattern)

df = df.assign(phrasecount=s.str.len(), phrase=s.map(', '.join))

df = df.drop_duplicates(subset='phrasecount') if df['phrasecount'].eq(0).all() else df[df['phrasecount'].ne(0)]

Result:

# print(df)
                      urls  phrasecount                               phrase
0  http://www.firsturl.com            2  going to the market, eating cookies
2  http://www.thirdurl.com            1                            i am good
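The steps in this answer can also be collected into a single function, which makes the "no match anywhere" fallback easier to check. This is only a sketch under a few assumptions: `match_phrases` is a hypothetical name, the column is called `url` to match the expected output in the question, phrases are matched literally via `re.escape`, and the fallback uses `head(1)` in place of `drop_duplicates` to keep the first row.

```python
import re
import pandas as pd

def match_phrases(phrases, dictionary):
    """Return one row per URL whose text contains at least one phrase.

    If no URL matches, return only the first URL with a count of 0
    and an empty phrase string.
    """
    pattern = '|'.join(re.escape(p) for p in phrases)
    df = pd.DataFrame({'url': list(dictionary.keys()),
                       'text': list(dictionary.values())})
    # Each element of `found` is the list of matches for that row.
    found = df.pop('text').str.findall(pattern)
    df = df.assign(phrasecount=found.str.len(),
                   phrase=found.map(', '.join))
    if df['phrasecount'].eq(0).all():
        return df.head(1)          # no match anywhere: keep first URL only
    return df[df['phrasecount'].ne(0)]

phrases = ['i am good', 'going to the market', 'eating cookies']
dictionary = {
    'http://www.firsturl.com': 'i am going to the market and tomorrow will be eating cookies',
    'http://www.secondurl.com': 'tomorrow is my birthday and i shall be',
    'http://www.thirdurl.com': 'i am good and will go to sleep',
}

result = match_phrases(phrases, dictionary)
no_match = match_phrases(['not present'], dictionary)
```

Here `result` holds the first and third URLs with counts 2 and 1, and `no_match` holds a single row for the first URL with a count of 0.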

2022-12-20
