3 Answers
Contributor with 1796 experience points, 4+ upvotes
You can try this example to speed things up:
import pandas as pd

df1 = pd.DataFrame({'Word':['Introduction', 'database', 'country', 'search']})
df2 = pd.DataFrame({'Text':['Introduction to python', 'sql is a database', 'Introduction to python in our country', 'search for a python teacher in our country']})
# One row per word, indexed by the word itself, with a helper column c=1
tmp = pd.DataFrame(df2['Text'].str.split().explode()).set_index('Text').assign(c=1)
# Sum the helper column per word to get the number of occurrences
tmp = tmp.groupby(tmp.index)['c'].sum()
print(df1.merge(tmp, left_on='Word', right_on=tmp.index))
Prints:
           Word  c
0  Introduction  2
1      database  1
2       country  2
3        search  1
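The explode-then-count step above can also be sketched with `Series.value_counts`, which replaces the helper column and the `groupby`. This is an equivalent sketch rather than the answerer's code; the names `counts` and `result` are introduced here for illustration only.

```python
import pandas as pd

df1 = pd.DataFrame({'Word': ['Introduction', 'database', 'country', 'search']})
df2 = pd.DataFrame({'Text': ['Introduction to python',
                             'sql is a database',
                             'Introduction to python in our country',
                             'search for a python teacher in our country']})

# One row per whitespace-separated token, then count each distinct token.
counts = df2['Text'].str.split().explode().value_counts()

# Look up the count for each word; a word that never occurs would map to NaN.
result = df1.assign(c=df1['Word'].map(counts))
print(result)
```

Like the original, this counts exact, case-sensitive tokens, so 'introduction' and 'Introduction' would be counted separately.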
Contributor with 1890 experience points, 9+ upvotes
Use Series.str.split together with Series.explode to get a Series of words:
s = df2['Text'].str.split().explode()
# for older pandas versions (without Series.explode):
# s = df2['Text'].str.split(expand=True).stack()
Then keep only the matching values with Series.isin and boolean indexing, count them with Series.value_counts, and finally attach the counts with DataFrame.join:
df1 = df1.join(s[s.isin(df1['Word'])].value_counts().rename('Count'), on='Word')
print(df1)
           Word  Count
0  Introduction      2
1      database      1
2       country      2
3        search      1
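Because `DataFrame.join` is a left join by default, a word that never appears in the text keeps its row but gets `NaN` as its count. The sketch below assumes that a count of 0 is wanted in that case; the word 'java' and the name `out` are hypothetical additions used only to show the effect.

```python
import pandas as pd

# 'java' is a made-up word that does not occur in any text.
df1 = pd.DataFrame({'Word': ['Introduction', 'database', 'teacher', 'java']})
df2 = pd.DataFrame({'Text': ['Introduction to python',
                             'sql is a database',
                             'Introduction to python in our country',
                             'search for a python teacher in our country']})

# Same steps as above: split into words, keep the ones we care about, count them.
s = df2['Text'].str.split().explode()
counts = s[s.isin(df1['Word'])].value_counts().rename('Count')

# The left join leaves NaN for unmatched words; replace it with 0 and restore ints.
out = df1.join(counts, on='Word')
out['Count'] = out['Count'].fillna(0).astype(int)
print(out)
```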
Contributor with 1848 experience points, 6+ upvotes
Here is a simple solution:
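# Word and Text are the two DataFrames; words must be defined before it is used below
words = Word['Word'].tolist()
world_count = pd.DataFrame(
    {'words': words,
     'count': [Text['Text'].str.contains(w).sum() for w in words],
    }).rename_axis('ID')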
Output:
world_count.head()
'''
           words  count
ID
0   Introduction      2
1       database      1
2        country      2
3         search      1
'''
Step-by-step solution:
# Convert column to list
words = Word['Word'].tolist()
# Get the count
count = [Text['Text'].str.contains(w).sum() for w in words]
world_count = pd.DataFrame(
    {'words': words,
     'count': count,
    }).rename_axis('ID')
Tip:
I suggest normalizing the case so that you do not miss any matches because of upper/lower-case differences:
import re
import pandas as pd
# Lower-case and strip the words once, then reuse the same list for both columns
words = Word['Word'].str.lower().str.strip().tolist()
world_count = pd.DataFrame(
    {'words': words,
     'count': [Text['Text'].str.contains(w, flags=re.IGNORECASE, regex=True).sum() for w in words],
    }).rename_axis('ID')
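Note that `str.contains` does substring matching and counts rows, not occurrences: 'data' also matches inside 'database', and a text containing a word twice still counts once. With the sample data this gives the same numbers as the other answers, but that is not guaranteed in general. A minimal sketch of the difference follows, assuming whole-word matches are what is wanted; the sample texts and the names `substring_rows`, `whole_word_rows`, and `occurrences` are made up for illustration.

```python
import re
import pandas as pd

# Hypothetical sample data chosen to expose the difference.
Text = pd.DataFrame({'Text': ['sql is a database',
                              'data about data',
                              'Introduction to python']})
words = ['data', 'introduction']

# Substring match: 'data' is also found inside 'database'.
substring_rows = [
    int(Text['Text'].str.contains(w, flags=re.IGNORECASE, regex=True).sum())
    for w in words
]

# Whole-word match: escape the word and wrap it in \b word boundaries.
whole_word_rows = [
    int(Text['Text'].str.contains(rf'\b{re.escape(w)}\b', flags=re.IGNORECASE, regex=True).sum())
    for w in words
]

# Total occurrences instead of matching rows: Series.str.count counts every match.
occurrences = [
    int(Text['Text'].str.count(rf'\b{re.escape(w)}\b', flags=re.IGNORECASE).sum())
    for w in words
]

print(substring_rows, whole_word_rows, occurrences)
```

`re.escape` keeps words that contain regex metacharacters (for example 'c++') from being interpreted as patterns.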