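3 Answers
Contributed 1775 experience points, received over 11 likes
This may be a bit of a loophole answer, but you can filter the values out of the result you have already described. So if you start with this: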
>>> df2 = df.favorite_fruit.str.split(expand=True).stack()
>>> df2
0  0          apple
   1         banana
   2       cherries
1  0         banana
   1       cherries
   2    dragonfruit
2  0       cherries
   1    dragonfruit
3  0    dragonfruit
4  0          apple
   1     elderberry
dtype: object
You can use isin to restrict the data to the values in your target list:
>>> target = ['apple', 'banana']
>>> df2[df2.isin(target)].value_counts()
banana 2
apple 2
dtype: int64
Or even directly on top of your original answer:
>>> df.favorite_fruit.str.split(expand=True).stack().value_counts().loc[target]
apple 2
banana 2
dtype: int64
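If the problem is that the expand and stack operations are expensive on this much data, this may not be satisfactory. But I think it is probably still better than a loop-based answer?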
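One caveat with the `.loc[target]` lookup above: in current pandas versions it raises a KeyError when a target value never appears in the data. The following is a minimal sketch of one way around that, using `reindex` with `fill_value=0`. The DataFrame is rebuilt here from the sample data in this thread, and the extra target `"fig"` is a hypothetical value added only to show the missing-label case.

```python
import pandas as pd

# Sample data taken from this thread.
df = pd.DataFrame({
    "name": ["Alpha", "Bravo", "Charlie", "Delta", "Echo"],
    "favorite_fruit": [
        "apple banana cherries",
        "banana cherries dragonfruit",
        "cherries dragonfruit",
        "dragonfruit",
        "apple elderberry",
    ],
})

# "fig" is a hypothetical target that no row mentions.
target = ["apple", "banana", "fig"]

counts = df.favorite_fruit.str.split(expand=True).stack().value_counts()

# reindex keeps the order of `target` and fills absent labels with 0,
# whereas counts.loc[target] would raise a KeyError for "fig".
target_counts = counts.reindex(target, fill_value=0)
print(target_counts)
```

Whether a zero count or an error is the better behavior depends on the use case; `reindex` is simply the more forgiving option.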
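Contributed 1789 experience points, received over 8 likes
This may be a somewhat roundabout way, but if your favorite_fruit column is always space-separated, something like this should work: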
import pandas as pd

fruit_list = ['apple', 'banana', 'cherries', 'dragonfruit', 'elderberry']  # all known fruits (not used below)
data = {'name': ['Alpha', 'Bravo', 'Charlie', 'Delta', 'Echo'],
        'favorite_fruit': ['apple banana cherries', 'banana cherries dragonfruit',
                           'cherries dragonfruit', 'dragonfruit', 'apple elderberry']}
df = pd.DataFrame(data, columns=['name', 'favorite_fruit'])

# Record a 1 for every occurrence of each fruit, row by row.
data = {}
for i, row in df.iterrows():
    s = row['favorite_fruit']
    items = s.split(' ')
    for item in items:
        if item in data:
            data[item].append(1)
        else:
            data[item] = [1]

# Collapse each list of 1s into a total count.
for key, value in data.items():
    data[key] = sum(value)

# Build the result frame from the fruit names and their counts.
fruit = []
frequency = []
for key, value in data.items():
    fruit.append(key)
    frequency.append(value)
new_df = pd.DataFrame({'fruit': fruit, 'frequency': frequency})
print(new_df)
This prints the following:
fruit frequency
0 apple 2
1 banana 2
2 cherries 3
3 dragonfruit 3
4 elderberry 1
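The same loop-based tally can be written more compactly. Below is a minimal sketch that swaps the hand-built dictionary for `collections.Counter` from the standard library; it is an alternative to the code above, not the answerer's own method, and it reuses the sample data from this thread. Sorting the items alphabetically is an assumption made here so the row order is predictable.

```python
from collections import Counter

import pandas as pd

# Sample data taken from this thread.
df = pd.DataFrame({
    "name": ["Alpha", "Bravo", "Charlie", "Delta", "Echo"],
    "favorite_fruit": [
        "apple banana cherries",
        "banana cherries dragonfruit",
        "cherries dragonfruit",
        "dragonfruit",
        "apple elderberry",
    ],
})

# Tally every whitespace-separated fruit across all rows.
counts = Counter()
for entry in df["favorite_fruit"]:
    counts.update(entry.split())

# Sort by fruit name so the resulting rows come out in a stable order.
new_df = pd.DataFrame(sorted(counts.items()), columns=["fruit", "frequency"])
print(new_df)
```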
Contributed 1779 experience points, received over 6 likes
Try using explode after splitting.
df.favorite_fruit.str.split().explode().value_counts()
cherries 3
dragonfruit 3
banana 2
apple 2
elderberry 1
Name: favorite_fruit, dtype: int64
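If only a few target fruits matter, the explode approach can be combined with the isin filter from the first answer. This is a minimal sketch under that assumption, using the sample data and the `target` list from this thread; note that `Series.explode` requires pandas 0.25 or later.

```python
import pandas as pd

# Sample data taken from this thread.
df = pd.DataFrame({
    "name": ["Alpha", "Bravo", "Charlie", "Delta", "Echo"],
    "favorite_fruit": [
        "apple banana cherries",
        "banana cherries dragonfruit",
        "cherries dragonfruit",
        "dragonfruit",
        "apple elderberry",
    ],
})

target = ["apple", "banana"]

# One row per fruit: split each string into a list, then explode the lists.
fruits = df["favorite_fruit"].str.split().explode()

# Keep only the target fruits before counting.
target_counts = fruits[fruits.isin(target)].value_counts()
print(target_counts)
```

Filtering before `value_counts` means the count is computed only over the rows of interest, which may matter on large data.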