5 回答
TA贡献1856条经验 获得超17个赞
采用排序后的顺序,然后对其应用二次函数,其中根是数组长度的 1/2(加上一些小的偏移量)。通过这种方式,最高排名被赋予极值(eps偏移量的符号决定了您是否想要排名在最低值之上的最高值)。我在末尾添加了一个小组来展示它如何正确处理重复值或奇数组大小。
def extremal_rank(s):
eps = 10**-4
y = (pd.Series(np.arange(1, len(s)+1), index=s.sort_values().index)
- (len(s)+1)/2 + eps)**2
return y.reindex_like(s)
df['rnk'] = df.groupby('Group')['Performance'].apply(extremal_rank)
df = df.sort_values(['Group', 'rnk'], ascending=[True, False])
Group Name Performance rnk
2 A Chad Webster 142 6.2505
0 A Sheldon Webb 33 6.2495
4 A Elijah Mendoza 122 2.2503
1 A Traci Dean 64 2.2497
3 A Ora Harmon 116 0.2501
5 A June Strickland 68 0.2499
8 B Joel Gill 132 2.2503
9 B Vernon Stone 80 2.2497
7 B Betty Sutton 127 0.2501
6 B Beth Vasquez 95 0.2499
11 C b 110 9.0006
12 C c 68 8.9994
10 C a 110 4.0004
13 C d 68 3.9996
15 C f 70 1.0002
16 C g 70 0.9998
14 C e 70 0.0000
TA贡献1828条经验 获得超3个赞
您可以避免在 Performace 上groupby使用sort_values一次升序一次降序,concat两个排序的数据帧,然后使用sort_index并drop_duplicates获得预期的输出:
df_ = (pd.concat([df.sort_values(['Group', 'Performance'], ascending=[True, False])
.reset_index(), #need the original index for later drop_duplicates
df.sort_values(['Group', 'Performance'], ascending=[True, True])
.reset_index()
.set_index(np.arange(len(df))+0.5)], # for later sort_index
axis=0)
.sort_index()
.drop_duplicates('index', keep='first')
.reset_index(drop=True)
[['Group', 'Name', 'Performance']]
)
print(df_)
Group Name Performance
0 A Chad Webster 142
1 A Sheldon Webb 33
2 A Elijah Mendoza 122
3 A Traci Dean 64
4 A Ora Harmon 116
5 A June Strickland 68
6 B Joel Gill 132
7 B Vernon Stone 80
8 B Betty Sutton 127
9 B Beth Vasquez 95
TA贡献1770条经验 获得超3个赞
对每个组应用nlargest和的排序串联:nsmallest
>>> (df.groupby('Group')[df.columns[1:]]
.apply(lambda x:
pd.concat([x.nlargest(x.shape[0]//2,'Performance').reset_index(),
x.nsmallest(x.shape[0]-x.shape[0]//2,'Performance').reset_index()]
)
.sort_index()
.drop('index',1))
.reset_index().drop('level_1',1))
Group Name Performance
0 A Chad Webster 142
1 A Sheldon Webb 33
2 A Elijah Mendoza 122
3 A Traci Dean 64
4 A Ora Harmon 116
5 A June Strickland 68
6 B Joel Gill 132
7 B Vernon Stone 80
8 B Betty Sutton 127
9 B Beth Vasquez 95
TA贡献1818条经验 获得超7个赞
只是另一种使用自定义函数的方法np.empty:
def mysort(s):
arr = s.to_numpy()
c = np.empty(arr.shape, dtype=arr.dtype)
idx = arr.shape[0]//2 if not arr.shape[0]%2 else arr.shape[0]//2+1
c[0::2], c[1::2] = arr[:idx], arr[idx:][::-1]
return pd.DataFrame(c, columns=s.columns)
print (df.sort_values("Performance", ascending=False).groupby("Group").apply(mysort))
Group Name Performance
Group
A 0 A Chad Webster 142
1 A Sheldon Webb 33
2 A Elijah Mendoza 122
3 A Traci Dean 64
4 A Ora Harmon 116
5 A June Strickland 68
B 0 B Joel Gill 132
1 B Vernon Stone 80
2 B Betty Sutton 127
3 B Beth Vasquez 95
基准:
TA贡献1877条经验 获得超1个赞
让我们尝试用 检测min, max行groupby().transform(),然后排序:
groups = df.groupby('Group')['Performance']
mins, maxs = groups.transform('min'), groups.transform('max')
(df.assign(temp=df['Performance'].eq(mins) | df['Performance'].eq(maxs))
.sort_values(['Group','temp','Performance'],
ascending=[True, False, False])
.drop('temp', axis=1)
)
输出:
Group Name Performance
2 A Chad Webster 142
0 A Sheldon Webb 33
4 A Elijah Mendoza 122
3 A Ora Harmon 116
5 A June Strickland 68
1 A Traci Dean 64
8 B Joel Gill 132
9 B Vernon Stone 80
7 B Betty Sutton 127
6 B Beth Vasquez 95
添加回答
举报