How to get the per-group average of sentences containing the same word X or more times?

This time I want something more specific. For each group (group = name), I want the average of sentences in which the same word appears consecutively 4 or more times.

Example:

id | name | sentences
---------------------
1  | aa   | david hi david david david
2  | aa   | david david is at home
3  | bb   | I'm king
4  | cc   | where r u going
5  | dd   | lol lol lol lol lol lol
6  | ee   | abc abc cc abc abc abc abc cc
7  | ee   | dd dd dd ee dd dd dd

I would like to get the following result:

name | avg
----------
aa   | 0.0 (0 sentences contain the word 'david' 4 times in a row); the 'aa' group has 2 sentences in total
bb   | 0.0 (0 sentences contain the same word 4 times in a row)
cc   | 0.0 (0 sentences contain the same word 4 times in a row)
dd   | 1.0 (1 sentence contains the word 'lol' 4 times in a row); the 'dd' group has 1 sentence in total
ee   | 0.5 (1 sentence contains the word 'abc' 4 times in a row); the 'ee' group has 2 sentences in total

I'm using Python 3.6.8.
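The sketch below is a minimal plain-Python version of the metric described above. It uses only the standard library and is not from the original post. It assumes words are whitespace-separated tokens. itertools.groupby collapses adjacent equal words into runs, and a sentence counts as a hit if any run is 4 or longer. The result is hits divided by sentences per name. The helper names has_run and fraction_with_run are hypothetical. Punctuation stays attached to a token here ("lol!" is not "lol"), which differs from a \w+ regex.

```python
from collections import defaultdict
from itertools import groupby


def has_run(sentence, min_len=4):
    """Return True if some word occurs at least min_len times in a row."""
    # groupby yields one (word, iterator) pair per run of equal adjacent words
    return any(sum(1 for _ in run) >= min_len
               for _, run in groupby(sentence.split()))


def fraction_with_run(rows, min_len=4):
    """Map each name to the share of its sentences that contain such a run."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for name, sentence in rows:
        totals[name] += 1
        hits[name] += int(has_run(sentence, min_len))
    return {name: hits[name] / totals[name] for name in sorted(totals)}


# (name, sentence) pairs copied from the example table
rows = [
    ('aa', 'david hi david david david'),
    ('aa', 'david david is at home'),
    ('bb', "I'm king"),
    ('cc', 'where r u going'),
    ('dd', 'lol lol lol lol lol lol'),
    ('ee', 'abc abc cc abc abc abc abc cc'),
    ('ee', 'dd dd dd ee dd dd dd'),
]

print(fraction_with_run(rows))
# {'aa': 0.0, 'bb': 0.0, 'cc': 0.0, 'dd': 1.0, 'ee': 0.5}
```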
1 Answer
汪汪一只猫
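You can use Series.str.count to count, in each sentence, the runs where the same word occurs consecutively 4 or more times. Then use Series.groupby to group the resulting Series cnt by name, and aggregate with mean to get the per-group average. Because str.count returns the number of runs rather than a yes/no flag, the group mean equals the fraction of matching sentences only as long as no sentence contains more than one run.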
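# count runs of one word repeated 4+ times in a row; \b stops matches inside longer words
cnt = df['sentences'].str.count(r'\b(\w+)(\s+\1\b){3,}')
# average the per-sentence counts within each name group
avg = cnt.groupby(df['name']).mean().reset_index(name='avg')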
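Word boundaries matter for this kind of pattern. The check below uses Python's standard re module, and the sample strings are made up for illustration. Without \b, a pattern such as r'(\w+)(\s\1){3,}' can start matching inside a longer word ("xlol"). It can also count the prefix of a longer word ("lollipop") as a repeat, so it over-counts. The \b-anchored version is a suggested tweak and not part of the original answer.

```python
import re

# Repeated-word pattern without word boundaries
loose = re.compile(r'(\w+)(\s\1){3,}')
# Same idea, anchored on word boundaries at both ends of each word
strict = re.compile(r'\b(\w+)(\s+\1\b){3,}')

samples = [
    'lol lol lol lol',        # four whole-word repeats
    'xlol lol lol lol',       # only three whole 'lol' words
    'lol lol lol lollipop',   # the fourth 'lol' is the start of 'lollipop'
]

for s in samples:
    # findall returns one entry per non-overlapping match
    print(repr(s), len(loose.findall(s)), len(strict.findall(s)))
```

Only the first sample has four real repeats. The loose pattern matches all three samples, and the strict one matches only the first.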
Details:
print(cnt)
0 0
1 0
2 0
3 0
4 1
5 1
6 0
Name: sentences, dtype: int64
print(avg)
name avg
0 aa 0.0
1 bb 0.0
2 cc 0.0
3 dd 1.0
4 ee 0.5
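A self-contained sketch of the approach follows. It assumes pandas is installed and rebuilds the DataFrame from the question's example data. It uses a \b-anchored pattern and adds a flag variant, cnt.gt(0), which is an editorial addition. With this data, both versions give the avg column shown above. They differ when one sentence contains two separate runs. For example, 'a a a a b b b b' counts 2 with str.count but 1 with the flag, so only the flag variant always yields a true share of sentences.

```python
import pandas as pd

# Example data from the question
df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 6, 7],
    'name': ['aa', 'aa', 'bb', 'cc', 'dd', 'ee', 'ee'],
    'sentences': [
        'david hi david david david',
        'david david is at home',
        "I'm king",
        'where r u going',
        'lol lol lol lol lol lol',
        'abc abc cc abc abc abc abc cc',
        'dd dd dd ee dd dd dd',
    ],
})

# One word followed by 3+ repeats of itself, i.e. 4+ in a row.
# \b keeps a run from starting or ending inside a longer word.
pattern = r'\b(\w+)(\s+\1\b){3,}'

# Number of such runs in each sentence
cnt = df['sentences'].str.count(pattern)

# The answer's approach: mean of the per-sentence run counts per name
avg = cnt.groupby(df['name']).mean().reset_index(name='avg')

# Variant: 1 if a sentence has at least one run, else 0, so the mean is
# the share of matching sentences even if a sentence has several runs
flag = cnt.gt(0).astype(int)
frac = flag.groupby(df['name']).mean().reset_index(name='avg')

print(avg)
print(frac)
```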