Data from the MovieLens 25M Dataset at MovieLens
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style("whitegrid")
# load the movie metadata
df = pd.read_csv('ml-25m/movies.csv')
print(df.head())
   movieId                               title                                       genres
0        1                    Toy Story (1995)  Adventure|Animation|Children|Comedy|Fantasy
1        2                      Jumanji (1995)                   Adventure|Children|Fantasy
2        3             Grumpier Old Men (1995)                               Comedy|Romance
3        4            Waiting to Exhale (1995)                         Comedy|Drama|Romance
4        5  Father of the Bride Part II (1995)                                       Comedy
# clean genres: split the pipe-delimited string into a list, then give each genre its own row
df['genres'] = df['genres'].str.split('|')
df = df.explode('genres').reset_index(drop=True)
print(df.head())
   movieId             title     genres
0        1  Toy Story (1995)  Adventure
1        1  Toy Story (1995)  Animation
2        1  Toy Story (1995)   Children
3        1  Toy Story (1995)     Comedy
4        1  Toy Story (1995)    Fantasy
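For readers without the dataset at hand, the split-and-explode step above can be tried on a tiny frame. This is only a minimal sketch: the two rows are shortened from the output shown earlier, and the names `toy`, `exploded` and `counts` are made up for illustration.

```python
import pandas as pd

# Two rows in the same shape as movies.csv (genres shortened for the example)
toy = pd.DataFrame({
    "movieId": [1, 3],
    "title": ["Toy Story (1995)", "Grumpier Old Men (1995)"],
    "genres": ["Adventure|Animation", "Comedy|Romance"],
})

# Turn "A|B" into ["A", "B"], then emit one row per list element
toy["genres"] = toy["genres"].str.split("|")
exploded = toy.explode("genres").reset_index(drop=True)

# Each genre now counts once per movie it belongs to
counts = exploded["genres"].value_counts()
print(exploded)
```

The other columns (`movieId`, `title`) are repeated for every genre of a movie, which is what makes the later per-genre counts possible.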
Genre counts
gc = df.genres.value_counts().to_frame()
print(gc)
genres
Drama 25606
Comedy 16870
Thriller 8654
Romance 7719
Action 7348
Horror 5989
Documentary 5605
Crime 5319
(no genres listed) 5062
Adventure 4145
Sci-Fi 3595
Children 2935
Animation 2929
Mystery 2925
Fantasy 2731
War 1874
Western 1399
Musical 1054
Film-Noir 353
IMAX 195
Plot: sns.barplot
With ax
fig, ax = plt.subplots(figsize=(12, 6))
# gc.iloc[:, 0] is the count column (named 'genres' before pandas 2.0, 'count' from 2.0 on)
sns.barplot(x=gc.index, y=gc.iloc[:, 0], palette=sns.color_palette("BuGn_r", n_colors=len(gc) + 4), ax=ax)
ax.set_xticklabels(ax.get_xticklabels(), rotation=45, horizontalalignment='right')
plt.show()
Without ax
plt.figure(figsize=(12, 6))
chart = sns.barplot(x=gc.index, y=gc.iloc[:, 0], palette=sns.color_palette("BuGn_r", n_colors=len(gc)))
chart.set_xticklabels(chart.get_xticklabels(), rotation=45, horizontalalignment='right')
plt.show()
Plot: sns.countplot
Use sns.countplot to skip .value_counts() if the plot order doesn't matter.
To order the countplot, order=df.genres.value_counts().index must be used, so countplot doesn't really save you from needing .value_counts() if a descending order is desired.
fig, ax = plt.subplots(figsize=(12, 6))
sns.countplot(data=df, x='genres', ax=ax)
ax.set_xticklabels(ax.get_xticklabels(), rotation=45, horizontalalignment='right')
plt.show()
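To illustrate the ordering point above, here is a minimal sketch of the ordered variant. It assumes a small made-up `sample` frame in place of the exploded MovieLens data, and the `Agg` backend line is only there so the snippet runs without a display.

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import seaborn as sns

# Made-up stand-in for the exploded frame: one row per (movie, genre) pair
sample = pd.DataFrame(
    {"genres": ["Drama", "Comedy", "Drama", "Thriller", "Drama", "Comedy"]}
)

# value_counts() sorts by frequency, so its index is the descending bar order
order = sample["genres"].value_counts().index  # Drama, Comedy, Thriller here

fig, ax = plt.subplots(figsize=(6, 3))
sns.countplot(data=sample, x="genres", order=order, ax=ax)
ax.tick_params(axis="x", rotation=45)

# Bar heights in drawing order, useful for a quick sanity check
heights = [p.get_height() for p in ax.patches]
```

With the real data, passing order=df.genres.value_counts().index to the countplot call should give the same descending layout as the barplot version.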