print(df)
            reason  count
0         location     35
1   recommendation     23
2    recommedation      8
3          confort      7
4     availability      4
5  reconmmendation      3
6       facilities      3
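Use .groupby() on a partial string (here the first four characters of reason), together with .transform() to compute the sum for each group: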
df['groupcount']=df.groupby(df.reason.str[0:4])['count'].transform('sum')
            reason  count  groupcount
0         location     35          35
1   recommendation     23          34
2    recommedation      8          34
3          confort      7           7
4     availability      4           4
5  reconmmendation      3          34
6       facilities      3           3
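For anyone who wants to reproduce this without the original file, here is a minimal self-contained sketch. It assumes the frame printed above is rebuilt by hand; the variable name prefix is only for illustration.

```python
import pandas as pd

# Rebuild the example frame shown above (typed in by hand, not read from the csv)
df = pd.DataFrame({
    "reason": ["location", "recommendation", "recommedation", "confort",
               "availability", "reconmmendation", "facilities"],
    "count": [35, 23, 8, 7, 4, 3, 3],
})

# Group key: the first four characters of each reason,
# so the misspelled variants of "recommendation" all map to "reco"
prefix = df["reason"].str[0:4]

# transform('sum') returns one value per original row,
# so the result lines up with df and can be stored as a new column
df["groupcount"] = df.groupby(prefix)["count"].transform("sum")
print(df)
```

Because transform keeps the original row count, every spelling of the same reason carries the total for its group instead of being collapsed into one row.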
If you need to see the full string and the partial string side by side, try:
df=df.assign(groupname=df.reason.str[0:4])
df['groupcount']=df.groupby(df.reason.str[0:4])['count'].transform('sum')
print(df)
            reason  count groupname  groupcount
0         location     35      loca          35
1   recommendation     23      reco          34
2    recommedation      8      reco          34
3          confort      7      conf           7
4     availability      4      avai           4
5  reconmmendation      3      reco          34
6       facilities      3      faci           3
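If one row per group is enough, the same prefix key can probably be aggregated directly instead of using transform. This is only a sketch on the hand-typed example frame; the name summary is made up here, and sort=False is used so the groups stay in order of first appearance.

```python
import pandas as pd

# Same example frame as above, typed in by hand
df = pd.DataFrame({
    "reason": ["location", "recommendation", "recommedation", "confort",
               "availability", "reconmmendation", "facilities"],
    "count": [35, 23, 8, 7, 4, 3, 3],
})

# Add the four-character prefix as a column, then sum the counts per prefix.
# sort=False keeps the groups in the order they first appear in df.
summary = (
    df.assign(groupname=df["reason"].str[0:4])
      .groupby("groupname", sort=False)["count"]
      .sum()
)
print(summary)
```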
If a single row holds several items, as in your csv, then:
import pandas as pd
#Read csv
df=pd.read_csv(r'path')
#Turn 'Why you choose us' into a list of values in each row (lowercased, missing answers filled in)
df['Why you choose us']=(df['Why you choose us'].str.lower().fillna('no comment given')).str.split(',')
#Explode so that each reason is in its own row, with all the other attributes intact
df=df.explode('Why you choose us')
#Strip whitespace around the values in the column, then count them with value_counts
print(df['Why you choose us'].str.strip().value_counts())
location            48
no comment given    34
recommendation      25
confort              8
facilities           8
recommedation        8
price                7
availability         6
reputation           5
reconmmendation      3
internet             3
ac                   3
breakfast            3
tranquility          2
cleanliness          2
aveilable            1
costumer service     1
pool                 1
comfort              1
search engine        1
Name: group, dtype: int64
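To try the split, explode and count steps without the csv, here is a small sketch on made-up data. The guest column and the four sample answers are assumptions for illustration only; just the column name 'Why you choose us' comes from the question.

```python
import pandas as pd

col = "Why you choose us"

# Hypothetical sample rows standing in for the csv (one missing answer)
df = pd.DataFrame({
    "guest": ["a", "b", "c", "d"],
    col: ["Location, Price", None, "location", "Price,Pool , location"],
})

# Lowercase, fill missing answers, then split each cell into a list of reasons
df[col] = df[col].str.lower().fillna("no comment given").str.split(",")

# One row per reason; the other columns (here guest) are repeated
df = df.explode(col)

# Strip the spaces left around each reason by the split, then count
counts = df[col].str.strip().value_counts()
print(counts)
```

The strip has to happen after the explode, because splitting 'Location, Price' on the comma leaves ' price' with a leading space.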