How do I group by and get the three most frequent values?

Python

汪汪一只猫 2023-05-16 09:46:27

I want to group by ID and get the three most frequent cities for each ID. For example, I have this original DataFrame:

ID  City
1   London
1   London
1   New York
1   London
1   New York
1   Berlin
2   Shanghai
2   Shanghai

The result I want looks like this:

ID  first_frequent_city  second_frequent_city  third_frequent_city
1   London               New York              Berlin
2   Shanghai             NaN                   NaN
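For anyone who wants to reproduce the problem, the sample data can be rebuilt roughly as follows. This is a minimal sketch that assumes `ID` is an integer column and `City` a string column; the variable name `counts` is only illustrative.

```python
import pandas as pd

# Sample data reconstructed from the question
df = pd.DataFrame({
    "ID": [1, 1, 1, 1, 1, 1, 2, 2],
    "City": ["London", "London", "New York", "London",
             "New York", "Berlin", "Shanghai", "Shanghai"],
})

# Frequency of each city per ID, most frequent first within each ID
counts = df.groupby("ID")["City"].value_counts()
print(counts)
```

The per-ID counts are the raw material the answers reshape into one row per ID.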

4 Answers

MMMHUHU

Contributed 1834 experience points, received 8+ likes

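The first step is to use SeriesGroupBy.value_counts to count the City values per ID; the advantage is that the values come back already sorted. Then get a counter with GroupBy.cumcount, keep the first 3 values with loc, pivot with DataFrame.pivot, rename the columns, and finally convert ID back into a column with DataFrame.reset_index: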

df = (df.groupby('ID')['City'].value_counts()   # counts per ID, sorted descending
        .groupby(level=0).cumcount()            # 0-based position within each ID
        .loc[lambda x: x < 3]                   # keep the first three positions
        .reset_index(name='c')
        # keyword arguments are required by pivot in pandas 2.0+
        .pivot(index='ID', columns='c', values='City')
        .rename(columns={0: 'first_', 1: 'second_', 2: 'third_'})
        .add_suffix('frequent_city')
        .rename_axis(None, axis=1)
        .reset_index())
print(df)

   ID first_frequent_city second_frequent_city third_frequent_city
0   1              London             New York              Berlin
1   2            Shanghai                  NaN                 NaN
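The same chain can be wrapped in a small helper so the number of values is a parameter. This is only a sketch: the function name `top_n_per_group`, its parameters, and the `top_1`, `top_2`, ... column names are hypothetical and not part of the answer above.

```python
import pandas as pd

def top_n_per_group(df, key, value, n=3):
    """Return one row per `key` with its n most frequent `value` entries."""
    # Counts per (key, value), sorted descending within each key
    counts = df.groupby(key)[value].value_counts()
    # 0-based position of each value within its key
    rank = counts.groupby(level=0).cumcount()
    # Keep the first n positions and move the index levels into columns
    top = rank.loc[lambda s: s < n].reset_index(name="rank")
    wide = top.pivot(index=key, columns="rank", values=value)
    # Hypothetical column naming: top_1, top_2, ...
    wide.columns = [f"top_{i + 1}" for i in wide.columns]
    return wide.reset_index()

df = pd.DataFrame({
    "ID": [1, 1, 1, 1, 1, 1, 2, 2],
    "City": ["London", "London", "New York", "London",
             "New York", "Berlin", "Shanghai", "Shanghai"],
})
result = top_n_per_group(df, "ID", "City", n=3)
print(result)
```

Groups with fewer than n distinct values are padded with missing values by the pivot step.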

2023-05-16

喵喔喔

Contributed 1735 experience points, received 5+ likes

Another approach uses the count as the sort key, then rebuilds the DataFrame by iterating over the groupby object:

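import numpy as np
import pandas as pd

# Count each (ID, City) pair, keep one row per pair, sort by count descending
df = (df.assign(count=df.groupby(["ID", "City"])["City"].transform("count"))
        .drop_duplicates(["ID", "City"])
        .sort_values(["ID", "count"], ascending=False))

# First three unique cities per ID; np.NaN was removed in NumPy 2.0, so use np.nan
print(pd.DataFrame([i["City"].unique()[:3] for _, i in df.groupby("ID")]).fillna(np.nan))

          0         1       2
0    London  New York  Berlin
1  Shanghai       NaN     NaN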
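The output above has the default column labels 0, 1, 2 and loses the ID. A variant of the same iterate-over-groups idea that keeps the ID and uses the column names from the question might look like this. It is a sketch under assumptions: it swaps in `Series.value_counts` per group instead of the transform-and-sort step, and the names `names`, `rows`, and `result` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 1, 1, 1, 1, 2, 2],
    "City": ["London", "London", "New York", "London",
             "New York", "Berlin", "Shanghai", "Shanghai"],
})

names = ["first_frequent_city", "second_frequent_city", "third_frequent_city"]

# For each ID, value_counts sorts cities by frequency; keep the first three
rows = {id_: g["City"].value_counts().index[:3].tolist()
        for id_, g in df.groupby("ID")}

result = (pd.DataFrame.from_dict(rows, orient="index")
            .reindex(columns=range(3))   # guarantee three columns even if every group is short
            .set_axis(names, axis=1)
            .rename_axis("ID")
            .reset_index())
print(result)
```

Short groups are padded with missing values when the ragged lists are turned into a DataFrame.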

2023-05-16

湖上湖

Contributed 2003 experience points, received 2+ likes

Get the count by ID and City, then use np.where() together with .groupby() and max, median and min. Then set the index and unstack the rows into columns on the max column.

df = df.assign(count=df.groupby(['ID', 'City'])['City'].transform('count')).drop_duplicates()
df['max'] = np.where((df['count'] == df.groupby('ID')['count'].transform('min')), 'third_frequent_city', np.nan)
df['max'] = np.where((df['count'] == df.groupby('ID')['count'].transform('median')), 'second_frequent_city', df['max'])
df['max'] = np.where((df['count'] == df.groupby('ID')['count'].transform('max')), 'first_frequent_city', df['max'])
df = df.drop('count',axis=1).set_index(['ID', 'max']).unstack(1)

Output:

                   City
max first_frequent_city second_frequent_city third_frequent_city
1                London             New York              Berlin
2              Shanghai                  NaN                 NaN
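The min/median/max labelling works on the sample data, but it depends on each ID having three distinct counts (or a single city): with tied counts two cities receive the same label, and with four or more cities the median may match none of them. A rank-based variant avoids that. This is a swapped-in technique rather than the answer's method, and the names `counts`, `labels`, `top`, and `wide` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 1, 1, 1, 1, 2, 2],
    "City": ["London", "London", "New York", "London",
             "New York", "Berlin", "Shanghai", "Shanghai"],
})

# One row per (ID, City) with its frequency
counts = df.groupby(["ID", "City"]).size().reset_index(name="count")

# Rank within each ID, 1 = most frequent; method="first" breaks ties by row order
counts["rank"] = (counts.groupby("ID")["count"]
                        .rank(method="first", ascending=False)
                        .astype(int))

labels = {1: "first_frequent_city",
          2: "second_frequent_city",
          3: "third_frequent_city"}

# Keep the top three ranks and turn the rank into a column label
top = counts[counts["rank"] <= 3].assign(label=lambda d: d["rank"].map(labels))
wide = (top.pivot(index="ID", columns="label", values="City")
           .reindex(columns=list(labels.values())))
print(wide)
```

Because every row gets a unique rank within its ID, the pivot never sees duplicate (ID, label) pairs.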

2023-05-16

qq_笑_17

Contributed 1818 experience points, received 7+ likes

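A bit long; essentially you groupby twice. The first part works on the idea that grouping sorts the data in ascending order, and the second part lets us split the data into separate columns: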

(df
 .groupby("ID")
 .tail(3)                 # last three rows of each ID, in their original order
 .drop_duplicates()
 .groupby("ID")
 .agg(",".join)           # join each ID's cities into one comma-separated string
 .City.str.split(",", expand=True)
 .set_axis(["first_frequent_city",
            "second_frequent_city",
            "third_frequent_city"],
           axis="columns",)
)

   first_frequent_city second_frequent_city third_frequent_city
1               London             New York              Berlin
2             Shanghai                 None                None
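The chain above takes the last three rows of each ID rather than counting anything, so it matches the frequency order only when the rows already happen to be arranged that way, as in the sample. A hedged variant that sorts by frequency first and then reuses the same join-and-split step could look like this; the helper column `n` and the names `ordered` and `wide` are assumptions for illustration, and the split step assumes no city name contains a comma.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 1, 1, 1, 1, 2, 2],
    "City": ["London", "London", "New York", "London",
             "New York", "Berlin", "Shanghai", "Shanghai"],
})

# Attach each row's (ID, City) frequency, sort most frequent first within each ID,
# then keep one row per (ID, City)
ordered = (df.assign(n=df.groupby(["ID", "City"])["City"].transform("count"))
             .sort_values(["ID", "n"], ascending=[True, False])
             .drop_duplicates(["ID", "City"]))

# Top three cities per ID, joined and split into separate columns
wide = (ordered.groupby("ID").head(3)
               .groupby("ID")["City"].agg(",".join)
               .str.split(",", expand=True)
               .set_axis(["first_frequent_city",
                          "second_frequent_city",
                          "third_frequent_city"], axis="columns"))
print(wide)
```

As in the original chain, IDs with fewer than three cities end up with None in the remaining columns.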

2023-05-16
