I have this DataFrame:

import pandas as pd
from datetime import datetime

df = pd.DataFrame([
    {"_id": "1", "date": datetime.strptime("2020-09-29 07:00:00", '%Y-%m-%d %H:%M:%S'), "status": "started"},
    {"_id": "2", "date": datetime.strptime("2020-09-29 14:00:00", '%Y-%m-%d %H:%M:%S'), "status": "end"},
    {"_id": "3", "date": datetime.strptime("2020-09-25 17:00:00", '%Y-%m-%d %H:%M:%S'), "status": "started"},
    {"_id": "4", "date": datetime.strptime("2020-09-17 09:00:00", '%Y-%m-%d %H:%M:%S'), "status": "end"},
    {"_id": "5", "date": datetime.strptime("2020-09-19 07:00:00", '%Y-%m-%d %H:%M:%S'), "status": "end"},
    {"_id": "6", "date": datetime.strptime("2020-09-19 08:00:00", '%Y-%m-%d %H:%M:%S'), "status": "end"},
]).set_index('date')

It looks like this:

                    _id   status
date
2020-09-29 07:00:00   1  started
2020-09-29 14:00:00   2      end
2020-09-25 17:00:00   3  started
2020-09-17 09:00:00   4      end
2020-09-19 07:00:00   5      end

I am trying to group by day and count the occurrences of each status, but I want the status name to be part of each column name. This is the desired output:

                     status_started  status_end
date
2020-09-29 07:00:00               1           1
2020-09-25 17:00:00               1           0
2020-09-17 09:00:00               0           1
2020-09-19 07:00:00               0           2

I tried this:

df = df.groupby([pd.Grouper(freq='d'), 'status']).agg({'status': "count"})
df = df.reset_index(level="status")

out:

                   status
date       status
2020-09-17 end          1
2020-09-19 end          2
2020-09-25 started      1
2020-09-29 end          1
           started      1

But I could not reshape df into the desired form.
2 Answers
qq_笑_17
Contributed 1818 experience points, received 7+ upvotes
You can try crosstab:

d = pd.crosstab(df.index.date, df['status'])\
      .rename_axis('date').add_prefix('status_')

Output:

status      status_end  status_started
date
2020-09-17           1               0
2020-09-19           2               0
2020-09-25           0               1
2020-09-29           1               1
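
For reference, the crosstab approach above can be run end to end as the sketch below. It rebuilds the question's sample data with `pd.to_datetime` (a shortcut that is not in the original post), and the final column reordering is an optional extra step, assumed here only to match the column order shown in the desired output.

```python
import pandas as pd

# Same sample data as in the question; pd.to_datetime is used for brevity.
df = pd.DataFrame(
    {
        "_id": ["1", "2", "3", "4", "5", "6"],
        "date": pd.to_datetime(
            [
                "2020-09-29 07:00:00",
                "2020-09-29 14:00:00",
                "2020-09-25 17:00:00",
                "2020-09-17 09:00:00",
                "2020-09-19 07:00:00",
                "2020-09-19 08:00:00",
            ]
        ),
        "status": ["started", "end", "started", "end", "end", "end"],
    }
).set_index("date")

# Rows: calendar day of each timestamp; columns: status values; cells: counts.
counts = (
    pd.crosstab(df.index.date, df["status"])
    .rename_axis("date")       # name the row index "date"
    .add_prefix("status_")     # "end" -> "status_end", "started" -> "status_started"
)

# Optional: reorder the columns to match the order requested in the question.
counts = counts[["status_started", "status_end"]]
print(counts)
```

Note that the row index holds plain dates, one per day, rather than the first timestamp of each day as drawn in the question's desired output.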
一只名叫tom的猫
Contributed 1906 experience points, received 3+ upvotes
You just need unstack:

df.groupby([pd.Grouper(freq='d'), 'status']).size().unstack('status', fill_value=0)

Output:

status      end  started
date
2020-09-17    1        0
2020-09-19    2        0
2020-09-25    0        1
2020-09-29    1        1
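
The unstack output above has columns named `end` and `started`, without the prefix the question asked for. A minimal sketch of one way to get the prefixed names is to chain `add_prefix` onto the same pipeline. The chaining is an addition, not part of the original answer, and the sample data is rebuilt with `pd.to_datetime` for brevity. The sketch also uses the uppercase alias `"D"`, the documented spelling of the calendar-day frequency.

```python
import pandas as pd

# Same sample data as in the question; pd.to_datetime is used for brevity.
df = pd.DataFrame(
    {
        "_id": ["1", "2", "3", "4", "5", "6"],
        "date": pd.to_datetime(
            [
                "2020-09-29 07:00:00",
                "2020-09-29 14:00:00",
                "2020-09-25 17:00:00",
                "2020-09-17 09:00:00",
                "2020-09-19 07:00:00",
                "2020-09-19 08:00:00",
            ]
        ),
        "status": ["started", "end", "started", "end", "end", "end"],
    }
).set_index("date")

counts = (
    df.groupby([pd.Grouper(freq="D"), "status"])  # bucket the index by day, then by status
    .size()                                       # rows per (day, status) pair
    .unstack("status", fill_value=0)              # statuses become columns; missing pairs -> 0
    .add_prefix("status_")                        # assumption: added to match the requested names
)
print(counts)
```

Unlike the crosstab version, the row index here is a `DatetimeIndex` whose entries are midnight timestamps, which is convenient if later steps need datetime operations such as resampling.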