You can use nltk (here df is the input DataFrame you shared):
from nltk.stem import PorterStemmer

ps = PorterStemmer()
# Reduce each word to its Porter stem, so inflected forms share one key
df["Stem"] = df["Word"].apply(ps.stem)
# Sum the frequencies of all words that map to the same stem
res = df.groupby("Stem")["Frequency"].sum()
Output (for the sample you shared):
Stem
10 6309
bad 5331
cat 4244
charact 16926
dog 17054
end 8406
feel 4833
game 52055
gameplay 6195
good 6496
graphic 4372
great 3466
kill 12279
laura 24953
like 12792
love 3059
luke 21133
never 2965
new 2963
peopl 7933
play 8420
reveng 5922
stori 20739
time 4272
Name: Frequency, dtype: int64
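If you want to try this without the original data, here is a minimal self-contained sketch. The small DataFrame below is made up for illustration (it is not the data from the question); only the column names "Word" and "Frequency" are taken from the snippet above. The final sort is an optional extra, not part of the original answer.

```python
import pandas as pd
from nltk.stem import PorterStemmer

# Hypothetical sample data, standing in for the DataFrame from the question
df = pd.DataFrame({
    "Word": ["games", "game", "playing", "played", "stories", "story"],
    "Frequency": [10, 5, 3, 4, 2, 1],
})

ps = PorterStemmer()

# Map every word to its Porter stem
df["Stem"] = df["Word"].apply(ps.stem)

# Add up the frequencies of words that share a stem
res = df.groupby("Stem")["Frequency"].sum()

# Optional: show the most frequent stems first
res_sorted = res.sort_values(ascending=False)
print(res_sorted)
```

Note that the stems are not always dictionary words ("stori", "charact", "peopl" in the output above). If you need readable labels, one option is to keep one representative original word per stem alongside the summed frequency.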