4 Answers
This user has earned 1862 experience points and received more than 6 upvotes
get_dummies() works well, but sklearn's MultiLabelBinarizer performs better:
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# One-hot encode the lists stored in the Hobbies column
mlb = MultiLabelBinarizer()
a = mlb.fit_transform(df["Hobbies"])
df_expanded = pd.DataFrame(a, columns=mlb.classes_, index=df.index)
# Merge the indicator columns back onto the original frame by index
df_merged = df.merge(df_expanded, left_index=True, right_index=True)
print(df_merged)
   Name                        Hobbies  Play_PS4  Play_hockey  Read  Sleep  Watch_NBA
0  Paul          [Watch_NBA, Play_PS4]         1            0     0      0          1
1  Jeff  [Play_hockey, Read, Play_PS4]         1            1     1      0          0
2  Kyle             [Sleep, Watch_NBA]         0            0     0      1          1
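The snippet above assumes df already exists. Here is a minimal self-contained sketch of the same idea; the sample frame is an assumption rebuilt from the printed output, and df.join is used as a shorthand for the index-on-index merge.

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Sample data reconstructed from the printed output (an assumption,
# the original question's frame is not shown in this thread).
df = pd.DataFrame({
    "Name": ["Paul", "Jeff", "Kyle"],
    "Hobbies": [
        ["Watch_NBA", "Play_PS4"],
        ["Play_hockey", "Read", "Play_PS4"],
        ["Sleep", "Watch_NBA"],
    ],
})

mlb = MultiLabelBinarizer()
# fit_transform returns a 0/1 array with one column per distinct label
encoded = pd.DataFrame(
    mlb.fit_transform(df["Hobbies"]),
    columns=mlb.classes_,
    index=df.index,
)
# join aligns on the index, like merge(left_index=True, right_index=True)
df_merged = df.join(encoded)
print(df_merged)
```

The indicator columns come out in sorted label order, which is why Play_PS4 appears before Play_hockey.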
This user has earned 1852 experience points and received more than 7 upvotes
In [86]: df
Out[86]:
   Name              Hobbies
0  Paul           [NBA, PS4]
1  Jeff  [Hockey, Read, PS4]
2  Kyle         [Sleep, NBA]
In [87]: df['dummy'] = 1
In [88]: df.explode("Hobbies").pivot(index='Name', columns='Hobbies', values='dummy').fillna(value=0)
Out[88]:
Hobbies  Hockey  NBA  PS4  Read  Sleep
Name
Jeff        1.0  0.0  1.0   1.0    0.0
Kyle        0.0  1.0  0.0   0.0    1.0
Paul        0.0  1.0  1.0   0.0    0.0
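If the helper 'dummy' column and the float output are a nuisance, one possible variation is to explode and then count with pd.crosstab. This is a different technique from the pivot shown above, not part of the original answer, and the sample frame below is an assumption rebuilt from In [86].

```python
import pandas as pd

# Sample data rebuilt from the session above (an assumption)
df = pd.DataFrame({
    "Name": ["Paul", "Jeff", "Kyle"],
    "Hobbies": [["NBA", "PS4"], ["Hockey", "Read", "PS4"], ["Sleep", "NBA"]],
})

# One row per (Name, hobby) pair
exploded = df.explode("Hobbies")

# crosstab counts each pair; clip caps the count at 1 in case a hobby
# is listed twice for the same person
indicators = pd.crosstab(exploded["Name"], exploded["Hobbies"]).clip(upper=1)
print(indicators)
```

Unlike pivot, crosstab does not raise when the same (Name, hobby) pair appears more than once, and the result holds integers rather than floats.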
This user has earned 1828 experience points and received more than 3 upvotes
You want the get_dummies() method.
For your example:
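names = df.Name
# Spread each hobby list into rows, one-hot encode, then sum back per original row.
# sum(level=0) was removed in pandas 2.0; groupby(level=0).sum() is the equivalent.
df = pd.get_dummies(df.Hobbies.apply(pd.Series).stack()).groupby(level=0).sum()
df.insert(0, 'Name', names)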
# Output:
   Name  Play_PS4  Play_hockey  Read  Sleep  Watch_NBA
0  Paul         1            0     0      0          1
1  Jeff         1            1     1      0          0
2  Kyle         0            0     0      1          1
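A shorter route to the same table, offered only as a sketch: join each list into a single delimited string and let Series.str.get_dummies split it again. This assumes no hobby name contains the "|" character, and the sample frame is rebuilt from the output above.

```python
import pandas as pd

# Sample data rebuilt from the printed output (an assumption)
df = pd.DataFrame({
    "Name": ["Paul", "Jeff", "Kyle"],
    "Hobbies": [
        ["Watch_NBA", "Play_PS4"],
        ["Play_hockey", "Read", "Play_PS4"],
        ["Sleep", "Watch_NBA"],
    ],
})

# "Watch_NBA|Play_PS4" style strings, then one 0/1 column per token
dummies = df["Hobbies"].str.join("|").str.get_dummies(sep="|")

# Put the Name column back in front of the indicator columns
result = pd.concat([df[["Name"]], dummies], axis=1)
print(result)
```

This avoids apply(pd.Series), which builds one Series per row and tends to be slow on large frames.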
This user has earned 1808 experience points and received more than 4 upvotes
You can try this:
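n = df['Name']
# Build one indicator Series per row; hobbies absent from a row become NaN, then 0.
# The downcast argument of fillna is deprecated in recent pandas, so cast explicitly.
df = df['Hobbies'].apply(lambda x: pd.Series([1] * len(x), index=x)).fillna(0).astype(int)
df.insert(0, 'Name', n)
print(df)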
Output:
   Name  Watch_NBA  Play_PS4  Play_hockey  Read  Sleep
0  Paul          1         1            0     0      0
1  Jeff          0         1            1     1      0
2  Kyle          1         0            0     0      1