Group by with median, percentiles, and percentage of total

I have a DataFrame that looks like this:

ID  Acuity  TOTAL_ED_LOS
1   2       423
2   5       52
3   5       535
4   1       87
...

I want to produce a table like this:

Acuity  Count  Median  Percentile_25  Percentile_75  % of total
1       234    ...                                   31%
2       65     ...                                   8%
3       56     ...                                   7%
4       345    ...                                   47%
5       35     ...                                   5%

I already have code that gives me everything I need except the % of total column:

def percentile(n):
    def percentile_(x):
        return np.percentile(x, n)
    percentile_.__name__ = 'percentile_%s' % n
    return percentile_

df_grp = df_merged_v1.groupby(['Acuity'])
df_grp['TOTAL_ED_LOS'].agg(['count', 'median', percentile(25), percentile(75)]).reset_index()

Is there an efficient way to add the % of total column? The link below has code for getting the percentage of total, but I'm not sure how to apply it to my code. I know I could create two tables and then merge them, but I'm curious whether there is a cleaner way.

How to calculate count and percentage in groupby in Python


1 Answer

海绵宝宝撒


Here is one way to do it with some built-in pandas tools:

import numpy as np
import pandas as pd

# Set the random seed and create a dummy DataFrame with two columns
np.random.seed(123)
df = pd.DataFrame({'activity': np.random.choice([*'ABCDE'], 40),
                   'TOTAL_ED_LOS': np.random.randint(50, 500, 40)})

# Reshape the DataFrame to get one column per activity,
# then take the output of describe() and transpose it
df_out = df.set_index([df.groupby('activity').cumcount(), 'activity'])['TOTAL_ED_LOS']\
    .unstack().describe().T

# Calculate each group's count as a percentage of the total count
df_out['% of Total'] = df_out['count'] / df_out['count'].sum() * 100.
df_out

Output:

         count        mean         std    min     25%    50%     75%    max  % of Total
activity
A          8.0  213.125000  106.810162   93.0  159.50  200.0  231.75  421.0        20.0
B         10.0  308.200000  116.105125   68.0  240.75  324.5  376.25  461.0        25.0
C          6.0  277.666667  117.188168  114.0  193.25  311.5  352.50  409.0        15.0
D          7.0  370.285714  124.724649  120.0  337.50  407.0  456.00  478.0        17.5
E          9.0  297.000000  160.812002   51.0  233.00  294.0  415.00  488.0        22.5
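The cumcount/unstack reshape is there only so that describe() sees one column per group. As an alternative (a suggestion, not part of the answer above), SeriesGroupBy.describe() should give the same per-group statistics without the reshape. This sketch rebuilds the answer's seeded dummy data (with the column named TOTAL_ED_LOS as in the question) and computes both routes so they can be compared:

```python
import numpy as np
import pandas as pd

# Rebuild the answer's seeded dummy data (column named as in the question)
np.random.seed(123)
df = pd.DataFrame({'activity': np.random.choice([*'ABCDE'], 40),
                   'TOTAL_ED_LOS': np.random.randint(50, 500, 40)})

# Route 1: the answer's approach (one column per group, then describe and transpose)
via_unstack = (
    df.set_index([df.groupby('activity').cumcount(), 'activity'])['TOTAL_ED_LOS']
    .unstack()
    .describe()
    .T
)

# Route 2: describe() applied to each group directly, no reshape needed
via_groupby = df.groupby('activity')['TOTAL_ED_LOS'].describe()

# The percentage step is the same one-liner as in the answer
via_groupby['% of Total'] = via_groupby['count'] / via_groupby['count'].sum() * 100.
print(via_groupby)
```

Both routes produce count, mean, std, min, 25%, 50%, 75% and max per group, so the "% of Total" line works unchanged on either result.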

Answered 2023-02-22
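Applied to the question's own code, the same percentage one-liner can be chained onto the agg() result with DataFrame.assign, so no second table or merge is needed. This is a minimal sketch; the small df_merged_v1 below is hypothetical stand-in data that only borrows the question's column names:

```python
import numpy as np
import pandas as pd

def percentile(n):
    """Return an aggregation function for the n-th percentile."""
    def percentile_(x):
        return np.percentile(x, n)
    percentile_.__name__ = 'percentile_%s' % n  # used by agg() as the column label
    return percentile_

# Hypothetical stand-in data with the question's column names
df_merged_v1 = pd.DataFrame({
    'ID': range(1, 9),
    'Acuity': [2, 5, 5, 1, 2, 2, 1, 3],
    'TOTAL_ED_LOS': [423, 52, 535, 87, 120, 300, 95, 210],
})

out = (
    df_merged_v1.groupby('Acuity')['TOTAL_ED_LOS']
    .agg(['count', 'median', percentile(25), percentile(75)])
    # Each group's share of all rows, in percent, computed from the same result
    .assign(**{'% of total': lambda t: t['count'] / t['count'].sum() * 100})
    .reset_index()
)
print(out)
```

The lambda passed to assign receives the aggregated frame, so the percentage is derived from its own count column in a single chain.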
