4 Answers
User with 1834 contribution points · 8+ likes
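The first step is to use SeriesGroupBy.value_counts to count the City values per ID; the advantage is that the counts come back already sorted. Then get a counter with GroupBy.cumcount, keep the first 3 values with loc, pivot with DataFrame.pivot, rename the columns, and finally convert ID back into a column with DataFrame.reset_index: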
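df = (df.groupby('ID')['City'].value_counts()
        .groupby(level=0).cumcount()            # position within each ID, 0 = most frequent
        .loc[lambda x: x < 3]                   # keep the top three positions
        .reset_index(name='c')
        # pivot arguments are keyword-only since pandas 2.0
        .pivot(index='ID', columns='c', values='City')
        .rename(columns={0: 'first_', 1: 'second_', 2: 'third_'})
        .add_suffix('frequent_city')
        .rename_axis(None, axis=1)
        .reset_index())
print(df)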
   ID first_frequent_city second_frequent_city third_frequent_city
0   1              London             New York              Berlin
1   2            Shanghai                  NaN                 NaN
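To make the intermediate steps of this answer easier to follow, here is a minimal sketch that runs them one at a time. The sample DataFrame is hypothetical (the question's data is not shown here), and the names `counts`, `rank`, and `top3` are introduced only for illustration.

```python
import pandas as pd

# Hypothetical sample data, chosen so the counts per ID have no ties.
df = pd.DataFrame({
    "ID":   [1, 1, 1, 1, 1, 1, 2, 2],
    "City": ["London", "New York", "London", "Berlin", "London", "New York",
             "Shanghai", "Shanghai"],
})

# Step 1: counts per (ID, City), sorted in descending order within each ID.
counts = df.groupby("ID")["City"].value_counts()

# Step 2: position of each city inside its ID (0 = most frequent).
rank = counts.groupby(level=0).cumcount()

# Step 3: keep positions 0..2 and reshape to one row per ID.
top3 = (rank[rank < 3]
        .reset_index(name="c")
        .pivot(index="ID", columns="c", values="City"))

print(top3)
```

Because `value_counts` has already ordered the cities by frequency, `cumcount` only needs to number them, which is why no explicit sort appears in the chain.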
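User with 1735 contribution points · 5+ likes
Another approach uses the count as the sort key and then rebuilds the DataFrame by iterating over the groupby object: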
import numpy as np
import pandas as pd

# Count each (ID, City) pair, keep one row per pair, most frequent first.
df = (df.assign(count=df.groupby(["ID", "City"])["City"].transform("count"))
        .drop_duplicates(["ID", "City"])
        .sort_values(["ID", "count"], ascending=False))
print(pd.DataFrame([i["City"].unique()[:3] for _, i in df.groupby("ID")]).fillna(np.nan))
          0         1       2
0    London  New York  Berlin
1  Shanghai       NaN     NaN
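The result above loses the ID labels, since the rows are rebuilt from plain arrays. As a possible variation (not the answerer's code), the sketch below collects the per-ID lists in a dict and builds the frame with `DataFrame.from_dict`, so the index carries the IDs. The sample data and the names `ranked`, `top3`, and `result` are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample data, chosen so the counts per ID have no ties.
df = pd.DataFrame({
    "ID":   [1, 1, 1, 1, 1, 1, 2, 2],
    "City": ["London", "New York", "London", "Berlin", "London", "New York",
             "Shanghai", "Shanghai"],
})

# Attach the per-(ID, City) count, keep one row per pair,
# and order each ID's cities from most to least frequent.
ranked = (df.assign(count=df.groupby(["ID", "City"])["City"].transform("count"))
            .drop_duplicates(["ID", "City"])
            .sort_values(["ID", "count"], ascending=[True, False]))

# Up to three cities per ID, keyed by ID.
top3 = {key: grp["City"].tolist()[:3] for key, grp in ranked.groupby("ID")}

# orient="index" turns each dict key into a row label; short rows are padded.
result = pd.DataFrame.from_dict(top3, orient="index")
print(result)
```

Grouping preserves the row order inside each group, so sorting once before the `groupby` is enough to keep the cities in frequency order.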
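User with 2003 contribution points · 2+ likes
Get the count by ID and City, then use np.where() together with .groupby() and max, median, and min. Then set the index and unstack the rows into columns on the max column. Because the labels come from the minimum, median, and maximum count, this assumes each ID has at most three cities with distinct counts.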
import numpy as np

# Attach the per-(ID, City) count and keep one row per pair.
df = df.assign(count=df.groupby(['ID', 'City'])['City'].transform('count')).drop_duplicates()
# Label rows by comparing each count with the group's min, median, and max;
# later assignments overwrite earlier ones, so max takes precedence.
df['max'] = np.where((df['count'] == df.groupby('ID')['count'].transform('min')), 'third_frequent_city', np.nan)
df['max'] = np.where((df['count'] == df.groupby('ID')['count'].transform('median')), 'second_frequent_city', df['max'])
df['max'] = np.where((df['count'] == df.groupby('ID')['count'].transform('max')), 'first_frequent_city', df['max'])
df = df.drop('count', axis=1).set_index(['ID', 'max']).unstack(1)
Output:
                   City
max first_frequent_city second_frequent_city third_frequent_city
ID
1                London             New York              Berlin
2              Shanghai                  NaN                 NaN
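User with 1818 contribution points · 7+ likes
It's a bit long. Basically you group by twice: the first part relies on the idea that grouping sorts the data in ascending order, and the second part lets us split the data into separate columns: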
(df
 .groupby("ID")
 .tail(3)                                # last three rows of each ID
 .drop_duplicates()
 .groupby("ID")
 .agg(",".join)                          # one comma-separated string per ID
 .City.str.split(",", expand=True)       # split back into separate columns
 .set_axis(["first_frequent_city",
            "second_frequent_city",
            "third_frequent_city"],
           axis="columns")
)
   first_frequent_city second_frequent_city third_frequent_city
ID
1               London             New York              Berlin
2             Shanghai                 None                None