2 Answers
This user has contributed 1,802 experience points and received more than 6 likes
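It's not that easy, so I added two more solutions. The difference is in the mean: a mean of means is not the overall mean, and count excludes missing values, so I prefer the size approach: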
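import numpy as np
import pandas as pd

np.random.seed(2020)
df = pd.DataFrame(np.random.randint(10, size=(3, 3)))
dfs = [df, df * 2, df * 3, df * 5]
list_of_summaries = [x.agg(['min', 'max', 'size', 'mean', 'count', 'sum']) for x in dfs]

# After concat the column labels 0, 1, 2 repeat once per summary
df = pd.concat(list_of_summaries, axis=1)
# groupby(level=0) replaces the level=0 argument, which pandas 2.0 removed
df = pd.DataFrame([df.loc['min'].groupby(level=0).min(),
                   df.loc['max'].groupby(level=0).max(),
                   df.loc['size'].groupby(level=0).sum(),
                   df.loc['sum'].groupby(level=0).sum()])
# Overall mean = total sum / total size
df.loc['mean'] = df.loc['sum'].div(df.loc['size'])
df = df.drop('sum')
print(df)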
              0          1     2
min    0.000000   3.000000   0.0
max   35.000000  40.000000  15.0
size  12.000000  12.000000  12.0
mean  11.916667  17.416667   5.5
df1 = (pd.concat(list_of_summaries, axis=1)
         .T
         .groupby(level=0)
         .agg({'min': 'min', 'max': 'max', 'size': 'sum', 'sum': 'sum'})
         .T)
# Divide by df1's own 'size' row (the original referenced df here)
df1.loc['mean'] = df1.loc['sum'].div(df1.loc['size'])
df1 = df1.drop('sum')
print(df1)
              0          1     2
min    0.000000   3.000000   0.0
max   35.000000  40.000000  15.0
size  12.000000  12.000000  12.0
mean  11.916667  17.416667   5.5
import functools
import pandas as pd

def reduce_(a, b):
    return pd.DataFrame([
        pd.concat([a.loc['min'], b.loc['min']], axis=1).min(axis=1),
        pd.concat([a.loc['max'], b.loc['max']], axis=1).max(axis=1),
        pd.concat([a.loc['count'], b.loc['count']], axis=1).sum(axis=1),
        # Mean of means: not the overall mean, see the 'mean' row in the output
        pd.concat([a.loc['mean'], b.loc['mean']], axis=1).mean(axis=1),
    ], index=['min', 'max', 'count', 'mean'])

assert len(list_of_summaries) > 0
summary_of_summaries = functools.reduce(reduce_, list_of_summaries)
print(summary_of_summaries)
               0          1      2
min     0.000000   3.000000   0.00
max    35.000000  40.000000  15.00
count  12.000000  12.000000  12.00
mean   15.708333  22.958333   7.25
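The mean row from functools.reduce (15.708333 in column 0) differs from the 11.916667 obtained via sum / size, because averaging two summaries at a time gives the later frames more weight. One way around this, shown here only as a minimal sketch, is to carry sum and size through the reduce and derive the mean once at the end. The helpers `summarize` and `combine` and the toy frames are hypothetical names and data for illustration, not part of the answer above.

```python
import functools

import numpy as np
import pandas as pd


def summarize(frame):
    # Hypothetical helper: per-column summary that keeps 'sum' and 'size'
    # so the overall mean can be rebuilt after merging.
    return pd.DataFrame({
        'min': frame.min(),
        'max': frame.max(),
        'size': pd.Series(len(frame), index=frame.columns),
        'sum': frame.sum(),
    }).T.astype(float)


def combine(a, b):
    # Hypothetical helper: merge two summaries, one operator per row.
    return pd.DataFrame({
        'min': np.minimum(a.loc['min'], b.loc['min']),
        'max': np.maximum(a.loc['max'], b.loc['max']),
        'size': a.loc['size'] + b.loc['size'],
        'sum': a.loc['sum'] + b.loc['sum'],
    }).T


# Toy frames of different lengths, where a mean of means would be wrong
frames = [pd.DataFrame({'x': [1, 2, 3]}), pd.DataFrame({'x': [10]})]
total = functools.reduce(combine, [summarize(f) for f in frames])

# Overall mean = total sum / total size, computed only once
total.loc['mean'] = total.loc['sum'] / total.loc['size']
total = total.drop('sum')
print(total)
```

With these toy frames the mean of the two per-frame means would be (2 + 10) / 2 = 6, while the true mean of the four values is 16 / 4 = 4.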
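This user has contributed 1,841 experience points and received more than 3 likes
This is the best approach I have so far that does not require merging all the data into one giant DataFrame. It is not very readable or efficient, but I might as well post it to make clear what I am looking for: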
import functools
import pandas as pd

def reduce_(a, b):
    return pd.DataFrame([
        pd.concat([a.loc['min'], b.loc['min']], axis=1).min(axis=1),
        pd.concat([a.loc['max'], b.loc['max']], axis=1).max(axis=1),
        pd.concat([a.loc['count'], b.loc['count']], axis=1).sum(axis=1),
        # mean is weighted so trickier
    ], index=['min', 'max', 'count'])

assert len(list_of_summaries) > 0
summary_of_summaries = functools.reduce(reduce_, list_of_summaries)
The difficulty here is essentially that I need a different operator for each row, you know?
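One possible way to express "a different operator for each row" is a lookup table from row label to binary function. The following is only a sketch under that assumption: `ROW_OPS`, `reduce_rows`, and the two toy summaries are hypothetical and not taken from the question.

```python
import functools
import operator

import numpy as np
import pandas as pd

# Hypothetical mapping: row label -> binary operator used to merge that row
ROW_OPS = {
    'min': np.minimum,
    'max': np.maximum,
    'count': operator.add,
}


def reduce_rows(a, b):
    # Apply each row's own operator to the matching rows of a and b
    rows = {label: op(a.loc[label], b.loc[label]) for label, op in ROW_OPS.items()}
    return pd.DataFrame(rows).T


# Toy summaries with one column 'x'
s1 = pd.DataFrame({'x': [1, 5, 3]}, index=['min', 'max', 'count'])
s2 = pd.DataFrame({'x': [0, 4, 2]}, index=['min', 'max', 'count'])

merged = functools.reduce(reduce_rows, [s1, s2])
print(merged)
```

Adding another statistic then means adding one entry to the table, as long as that statistic can be merged pairwise (a plain mean cannot, a sum can).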