I have a DataFrame with the following data:

        InsuranceId  InsuranceStatus  Date
    0   Ins1234      DuePayment       2020-06-07 23:59:43.123456+00:00
    1   Ins1234      Successful       2019-06-07 23:59:43.123456+00:00
    2   Ins1234      Successful       2018-06-07 23:59:43.123456+00:00
    3   Ins5678      DuePayment       2020-07-07 22:59:32.123421+00:00
    4   Ins5678      Successful       2019-07-07 22:59:32.123421+00:00
    5   Ins5678      Successful       2018-07-07 22:59:32.123421+00:00

I am trying to create a row number/rank within each InsuranceId group, ordered by max(Date). I tried

    df['RowNum'] = df.groupby('InsuranceId')['InsuranceStatus']['Date'].rank(method="first", ascending=True)

and

    df['RowNum'] = df.groupby(by=['InsuranceId'])['InsuranceStatus']['Date'].transform(lambda x: x.rank())

following "SQL-like window functions in PANDAS: Row Numbering in Python Pandas Dataframe".

Error: Index Error: Columns status already selected

I am trying to get the following output:

        InsuranceId  InsuranceStatus  Date                              RowNum
    0   Ins1234      DuePayment       2020-06-07 23:59:43.123456+00:00  1
    1   Ins1234      Successful       2019-06-07 23:59:43.123456+00:00  2
    2   Ins1234      Successful       2018-06-07 23:59:43.123456+00:00  3
    3   Ins5678      DuePayment       2020-07-07 22:59:32.123421+00:00  1
    4   Ins5678      Successful       2019-07-07 22:59:32.123421+00:00  2
    5   Ins5678      Successful       2018-07-07 22:59:32.123421+00:00  3

Is there something I am missing? Any suggestions are welcome.

Final output:

    InsuranceId  InsuranceStatus  Date
    Ins1234      DuePayment       2020-06-07 23:59:43.123456+00:00
    Ins5678      DuePayment       2020-07-07 22:59:32.123421+00:00
1 Answer
largeQ
Contributed 2039 experience points, received more than 7 likes
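Use rank. Group by the column that defines the groups (InsuranceId), select only the single column that needs ranking (Date), and rank that column.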
    df['Rank'] = df.groupby(by=['InsuranceId'])['Date'].rank(method='max', ascending=False)
    df[df['Rank'] == 1]
Output:

    InsuranceId  InsuranceStatus  Date
    Ins1234      DuePayment       2020-06-07 23:59:43.123456+00:00
    Ins5678      DuePayment       2020-07-07 22:59:32.123421+00:00
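For anyone who wants to reproduce this end to end, here is a minimal sketch using the sample rows from the question. It assumes the Date column may arrive as text, so it is parsed with pd.to_datetime first (that step is my assumption, not something stated in the question). It uses method="first" with ascending=False to get the 1, 2, 3 row numbers the question asks for, and the name `latest` is just an illustrative variable. The error in the question most likely comes from selecting two columns in a row (['InsuranceStatus']['Date']) after groupby; selecting only 'Date' avoids it.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "InsuranceId": ["Ins1234"] * 3 + ["Ins5678"] * 3,
    "InsuranceStatus": ["DuePayment", "Successful", "Successful"] * 2,
    "Date": [
        "2020-06-07 23:59:43.123456+00:00",
        "2019-06-07 23:59:43.123456+00:00",
        "2018-06-07 23:59:43.123456+00:00",
        "2020-07-07 22:59:32.123421+00:00",
        "2019-07-07 22:59:32.123421+00:00",
        "2018-07-07 22:59:32.123421+00:00",
    ],
})

# Assumption: Date is stored as text, so parse it before ranking.
df["Date"] = pd.to_datetime(df["Date"], utc=True)

# Select a single column after groupby, then rank it.
# ascending=False gives the newest Date in each group row number 1.
df["RowNum"] = (
    df.groupby("InsuranceId")["Date"]
      .rank(method="first", ascending=False)
      .astype(int)
)

# Keep only the most recent row per InsuranceId.
latest = df[df["RowNum"] == 1]
print(latest[["InsuranceId", "InsuranceStatus", "Date"]])
```

With unique dates per group, method="first" and method="max" give the same ranks; they only differ when two rows in a group share the same Date.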