3 Answers
User contributed 1887 experience points, received over 5 upvotes
Using Grzegorz Skibinski's setup:
import pandas as pd

df = pd.DataFrame({
    "review_trimmed": [
        "dog and cat",
        "Cat chases mouse",
        "horrible thing",
        "noodle soup",
        "chilli",
        "pizza is Good"
    ]
})
searchfor = "yes cat Dog soup good bad horrible".split()
df
     review_trimmed
0       dog and cat
1  Cat chases mouse
2    horrible thing
3       noodle soup
4            chilli
5     pizza is Good
_______________________________________________________
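Solution (pandas.Series.str.findall)
Use '|'.join to combine all the search terms into a single regular-expression string that matches any one of them.
Pass 2 as the flags argument, which is the value of re.IGNORECASE.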
df.review_trimmed.str.findall('|'.join(searchfor), 2)
0 [dog, cat]
1 [Cat]
2 [horrible]
3 [soup]
4 []
5 [Good]
Name: review_trimmed, dtype: object
We can then join the matches with ';':
df.review_trimmed.str.findall('|'.join(searchfor), 2).str.join(';')
0 dog;cat
1 Cat
2 horrible
3 soup
4
5 Good
Name: review_trimmed, dtype: object
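One caveat with the plain alternation used above: it matches anywhere in the text, so a term like cat is also found inside Category. Here is a minimal sketch of one way around that, using only the standard re module. The word-boundary pattern and the sample sentence are my own assumptions, not part of the original answer:

```python
import re

searchfor = "yes cat Dog soup good bad horrible".split()

# Plain alternation, as in the answer: matches anywhere, even inside longer words.
plain = re.compile("|".join(searchfor), flags=re.IGNORECASE)

# Assumed refinement: escape each term and require a word boundary on both sides.
bounded = re.compile(
    r"\b(?:" + "|".join(re.escape(w) for w in searchfor) + r")\b",
    flags=re.IGNORECASE,
)

sample = "Category of hotdogs and a Cat"  # made-up sentence for illustration
plain_hits = plain.findall(sample)
bounded_hits = bounded.findall(sample)

print(plain_hits)    # ['Cat', 'dog', 'Cat']
print(bounded_hits)  # ['Cat']
```

The same pattern string (bounded.pattern) can presumably be passed to Series.str.findall together with flags=re.IGNORECASE, which also reads more clearly than the bare 2.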
User contributed 1871 experience points, received over 8 upvotes
Using numpy:
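# Lower-case the search terms and turn them into a set
searchfor=[wrd.lower() for wrd in searchfor]
searchfor=set(searchfor)
# Split each review into a set of lower-case words, then intersect it with the search terms
df["found"]=np.bitwise_and(df["review_trimmed"].str.lower().str.split(r"[^\w+]").map(set), searchfor)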
To show the output, I used dummy data:
import pandas as pd
import numpy as np
df=pd.DataFrame({"review_trimmed": ["dog and cat", "Cat chases mouse", "horrible thing", "noodle soup", "chilli", "pizza is Good"]})
searchfor="yes cat Dog soup good bad horrible".split(" ")
searchfor=[wrd.lower() for wrd in searchfor]
searchfor=set(searchfor)
df["found"]=np.bitwise_and(df["review_trimmed"].str.lower().str.split(r"[^\w+]").map(set), searchfor)
print(searchfor)
print(df)
Output:
#searchfor:
{'cat', 'good', 'yes', 'dog', 'bad', 'horrible', 'soup'}
#df:
     review_trimmed       found
0       dog and cat  {cat, dog}
1  Cat chases mouse       {cat}
2    horrible thing  {horrible}
3       noodle soup      {soup}
4            chilli          {}
5     pizza is Good      {good}
Edit
IIUC, just add .str.join(";"):
searchfor=[wrd.lower() for wrd in searchfor]
searchfor=set(searchfor)
df["found"]=np.bitwise_and(df["review_trimmed"].str.lower().str.split(r"[^\w+]").map(set), searchfor).str.join(";")
print(searchfor)
print(df)
Output:
{'dog', 'soup', 'cat', 'bad', 'good', 'yes', 'horrible'}
     review_trimmed     found
0       dog and cat   dog;cat
1  Cat chases mouse       cat
2    horrible thing  horrible
3       noodle soup      soup
4            chilli
5     pizza is Good      good
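As far as I can tell, the np.bitwise_and call above comes down to an element-wise & between each row's word set and the search set. A hedged sketch of the same idea in plain Python, without numpy; the helper name found_terms and the sorting step are my own additions (sets are unordered, so sorting keeps the joined string stable between runs):

```python
import re

searchfor = {w.lower() for w in "yes cat Dog soup good bad horrible".split()}

def found_terms(text, terms=searchfor, sep=";"):
    """Return the search terms present in text, sorted and joined by sep."""
    # Split on runs of non-word characters and compare in lower case.
    words = set(re.split(r"\W+", text.lower()))
    return sep.join(sorted(words & terms))

reviews = [
    "dog and cat",
    "Cat chases mouse",
    "horrible thing",
    "noodle soup",
    "chilli",
    "pizza is Good",
]
found = [found_terms(r) for r in reviews]
print(found)  # ['cat;dog', 'cat', 'horrible', 'soup', '', 'good']
```

With the DataFrame from the answer, something like df["review_trimmed"].map(found_terms) should give the same column.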
User contributed 1802 experience points, received over 5 upvotes
I tried this with a for loop:
import pandas as pd
words_to_look=['Yes','No']
sentences=['He knows Yes No Yes','No He dont know','He Know' ]
df=pd.DataFrame(sentences,columns=['Comments_to_look'])
string=""
final_list=[]
for item in df['Comments_to_look']:
    items=set(item.split())
    for item2 in items:
        for item3 in words_to_look:
            if item2==item3:
                string=item3+" "+string
                break
    final_list.append(string)
    string=""
df['words occured']=final_list
print(df)
Output:
      Comments_to_look  words occured
0  He knows Yes No Yes         Yes No
1      No He dont know             No
2              He Know
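Note that the loop above iterates over a set, so the order of the words in each result is not guaranteed, and every entry ends with a trailing space. A small sketch that keeps the words in order of first appearance instead; the helper name words_found is hypothetical and the matching stays case-sensitive, as in the original loop:

```python
words_to_look = ["Yes", "No"]
sentences = ["He knows Yes No Yes", "No He dont know", "He Know"]

def words_found(sentence, wanted=words_to_look):
    """Return the wanted words that occur in sentence, in first-seen order."""
    seen = []
    for word in sentence.split():
        # Record each wanted word once, the first time it appears.
        if word in wanted and word not in seen:
            seen.append(word)
    return " ".join(seen)

result = [words_found(s) for s in sentences]
print(result)  # ['Yes No', 'No', '']
```

The list can then be assigned to the DataFrame column in the same way as final_list.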