2 Answers
Contributor with 1816 experience points, more than 4 upvotes
If I understand correctly:
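import pandas as pd
# Precompute bin edges for pd.cut: multiples of 5, ending at the first
# multiple of 5 strictly above the maximum, so the maximum falls inside a bin
top = int(df['Height (m)'].max() // 5 + 1) * 5
bins = list(range(0, top + 5, 5))
# Cut Height into intervals which exclude the right endpoint,
# with bin edges at multiples of 5
df['HeightBin'] = pd.cut(df['Height (m)'], bins=bins, right=False)
# Within each bin, get mean, stdev (normalized by N-1 by default),
# and also show sample size to explain why some std values are NaN.
# observed=False keeps empty bins in the result.
df.groupby('HeightBin', observed=False)['My data'].agg(['mean', 'std', 'count'])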
           mean       std  count
HeightBin
[0, 5)      NaN       NaN      0
[5, 10)    2.00  0.000000      2
[10, 15)   1.25  0.353553      2
[15, 20)   5.00       NaN      1
[20, 25)   5.00       NaN      1
[25, 30)   6.35  0.494975      2
[30, 35)   8.00       NaN      1
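To try this approach without the original data, here is a minimal self-contained sketch. The sample values are invented for illustration (they are not the asker's data); only the column names 'Height (m)' and 'My data' come from the answer. The upper bin edge is computed as the first multiple of 5 strictly above the maximum, which is one possible choice and not the only one.

```python
import pandas as pd

# Hypothetical sample data; only the column names are taken from the answer.
df = pd.DataFrame({
    'Height (m)': [5, 7, 10, 13, 18, 30],
    'My data': [2.0, 2.0, 1.0, 1.5, 5.0, 8.0],
})

# First multiple of 5 strictly above the maximum, so the maximum lands in a bin
# even when it is itself a multiple of 5 (right=False excludes the right edge).
top = int(df['Height (m)'].max() // 5 + 1) * 5
bins = list(range(0, top + 5, 5))

# Left-closed, right-open intervals: [0, 5), [5, 10), ...
df['HeightBin'] = pd.cut(df['Height (m)'], bins=bins, right=False)

# observed=False keeps bins that contain no rows (count 0, mean/std NaN).
stats = df.groupby('HeightBin', observed=False)['My data'].agg(['mean', 'std', 'count'])
print(stats)
```

Bins with a single row have a std of NaN because the default normalization divides by N-1, which is zero for one sample.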
Contributor with 1719 experience points, more than 6 upvotes
If I understand correctly, this is what you want to do:
import pandas as pd
import numpy as np
bins = np.arange(0, 30, 5)  # edges 0, 5, ..., 25; raise the stop value to cover taller heights
height_bins = pd.cut(df['Height (m)'], bins, right=False)  # compute the bins once
grouped = df.groupby(height_bins, observed=False)['My data']  # select the column before aggregating
df_stats = pd.DataFrame({'mean': grouped.mean(), 'st_dev': grouped.std()})  # DataFrame for the results
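One thing to watch when adjusting the bins: `np.arange(0, 30, 5)` stops at 25, so with `right=False` any height of 25 or more falls outside every interval. `pd.cut` marks such values as NaN, and `groupby` then leaves them out. The sketch below shows this on a made-up height series (the values are hypothetical) along with one possible way to widen the edges.

```python
import numpy as np
import pandas as pd

# Hypothetical heights, chosen to include values at and above the last edge.
heights = pd.Series([3, 12, 24, 25, 32])

bins = np.arange(0, 30, 5)  # edges 0, 5, 10, 15, 20, 25
binned = pd.cut(heights, bins, right=False)
print(binned.isna().tolist())  # True marks heights that fell outside all bins

# One option: derive the stop value from the data so every height is covered.
wider = np.arange(0, heights.max() + 6, 5)
rebinned = pd.cut(heights, wider, right=False)
print(rebinned.isna().any())
```

Checking `isna()` on the result of `pd.cut` before grouping is a cheap way to confirm that no rows are being dropped.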