I have the following DataFrame:

    import pandas as pd

    date = ['2015-02-03 23:00:00', '2015-02-03 23:30:00', '2015-02-04 00:00:00',
            '2015-02-04 00:30:00', '2015-02-04 01:00:00', '2015-02-04 01:30:00',
            '2015-02-04 02:00:00', '2015-02-04 02:30:00', '2015-02-04 03:00:00',
            '2015-02-04 03:30:00', '2015-02-04 04:00:00', '2015-02-04 04:30:00',
            '2015-02-04 05:00:00', '2015-02-04 05:30:00', '2015-02-04 06:00:00',
            '2015-02-04 06:30:00', '2015-02-04 07:00:00', '2015-02-04 07:30:00',
            '2015-02-04 08:00:00', '2015-02-04 08:30:00', '2015-02-04 09:00:00',
            '2015-02-04 09:30:00', '2015-02-04 10:00:00', '2015-02-04 10:30:00',
            '2015-02-04 11:00:00', '2015-02-04 11:30:00', '2015-02-04 12:00:00',
            '2015-02-04 12:30:00', '2015-02-04 13:00:00', '2015-02-04 13:30:00',
            '2015-02-04 14:00:00', '2015-02-04 14:30:00', '2015-02-04 15:00:00',
            '2015-02-04 15:30:00', '2015-02-04 16:00:00', '2015-02-04 16:30:00',
            '2015-02-04 17:00:00', '2015-02-04 17:30:00', '2015-02-04 18:00:00',
            '2015-02-04 18:30:00', '2015-02-04 19:00:00', '2015-02-04 19:30:00',
            '2015-02-04 20:00:00', '2015-02-04 20:30:00', '2015-02-04 21:00:00',
            '2015-02-04 21:30:00', '2015-02-04 22:00:00', '2015-02-04 22:30:00',
            '2015-02-04 23:00:00', '2015-02-04 23:30:00']

    df = pd.DataFrame({'value': value, 'index': date})
    # The timestamps include seconds, so the format string needs %S as well
    df.index = pd.to_datetime(df['index'], format='%Y-%m-%d %H:%M:%S')
    df.drop(['index'], axis=1, inplace=True)
    print(df)

                         value
    index
    2015-02-03 23:00:00  33.24
    2015-02-03 23:30:00  31.71
    2015-02-04 00:00:00  34.39
    2015-02-04 00:30:00  34.49
    2015-02-04 01:00:00  34.67
    2015-02-04 01:30:00  34.46

I would like to do the following efficiently: for each year, compute the percentage of values that are strictly below 0, greater than or equal to 0 and strictly below 20, and greater than or equal to 20.

I know about the functions cut and groupby, but I can't figure out how to combine the two elegantly.

The expected result would look something like this:

          inf0  supequal0_inf20  supequal20
    2015   0.2              0.6         0.2
    2016   0.7              0.1         0.2
    2017   0.1              0.8         0.1

Thank you very much for your help.

1 Answer

皈依舞
Given your df, I can't promise it's elegant, but this should work:
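import numpy as np
import pandas as pd

# altered bins for demonstration purposes; right=False makes each bin closed on the left
binned = pd.cut(x=df.value, bins=[-np.inf, 40, 50, np.inf], right=False, labels=['low', 'mid', 'high'])
# per-year count in each bin divided by the per-year total count
grouped = binned.groupby([pd.Grouper(freq='Y'), binned]).count() / binned.groupby(pd.Grouper(freq='Y')).count()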
Result of print(grouped):
index       value
2015-12-31  low      0.520000
            mid      0.380000
            high     0.100000
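To get closer to the wide layout asked for in the question, the same cut-then-group idea can be used with the bin edges 0 and 20, followed by an unstack. This is only a sketch under assumptions: the sample values below are made up for illustration (the question's `value` list is not shown), and grouping by `index.year` stands in for the `pd.Grouper(freq='Y')` call above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: the original `value` list is not shown in the question.
idx = pd.to_datetime([
    "2015-01-01 00:00", "2015-01-01 00:30", "2015-01-01 01:00", "2015-01-01 01:30",
    "2016-01-01 00:00", "2016-01-01 00:30",
])
df = pd.DataFrame({"value": [-5.0, 0.0, 19.9, 20.0, 3.0, 25.0]}, index=idx)

# right=False gives left-closed bins: [-inf, 0), [0, 20), [20, inf)
labels = ["inf0", "supequal0_inf20", "supequal20"]
binned = pd.cut(df["value"], bins=[-np.inf, 0, 20, np.inf], right=False, labels=labels)

# Count rows per (year, bin); observed=False keeps bins that are empty in a given year
counts = (
    binned.groupby([binned.index.year, binned], observed=False)
    .size()
    .unstack(fill_value=0)
)

# Divide each row by its total to turn counts into per-year fractions
result = counts.div(counts.sum(axis=1), axis=0)
print(result)
```

Each row of `result` is one year and each column is one bin, so the rows sum to 1. Multiply by 100 if percentages are wanted rather than fractions.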