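2 Answers
Contributed 1801 experience points, received over 8 likes
It works fine for me; it looks like the result just was not assigned back to a variable: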
mydata['state'] = pd.Categorical(mydata['state'],
                                 categories=["Delivered", "In-Transit", "Shipped", "Cancelled"],
                                 ordered=True)
# keep='first' is the default, so it can be omitted
mydata = mydata.sort_values('state').drop_duplicates(['ID', 'version'])
print(mydata)
    ID  version       Name       state
2  101        1        Nut   Delivered
3  101        2    Nut 2.0  In-Transit
5  102        1      Screw  In-Transit
6  102        2  Screw 2.0     Shipped
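For anyone who wants to try this end to end, here is a minimal sketch. The sample rows are made up (the question's original frame is not shown here), and the variable names `alphabetical`, `priority` and `by_priority` are only for illustration. It also shows why the ordered categorical matters: a plain string column sorts alphabetically, so "Cancelled" would come first.

```python
import pandas as pd

# Hypothetical sample data; the original question's frame is not shown.
mydata = pd.DataFrame({
    "ID":      [101, 101, 101, 101, 102, 102, 102],
    "version": [1, 1, 1, 2, 1, 1, 2],
    "Name":    ["Nut", "Nut", "Nut", "Nut 2.0", "Screw", "Screw", "Screw 2.0"],
    "state":   ["Cancelled", "Shipped", "Delivered", "In-Transit",
                "Shipped", "In-Transit", "Shipped"],
})

# Plain strings sort alphabetically, so "Cancelled" wins for (101, 1).
alphabetical = mydata.sort_values("state").drop_duplicates(["ID", "version"])

# An ordered categorical sorts by the listed priority instead.
priority = ["Delivered", "In-Transit", "Shipped", "Cancelled"]
mydata["state"] = pd.Categorical(mydata["state"], categories=priority, ordered=True)
by_priority = mydata.sort_values("state").drop_duplicates(["ID", "version"])

# sort_index() is only here to make the printed row order predictable.
print(by_priority.sort_index())
```

With this sample, the categorical version keeps the "Delivered" row for ID 101, version 1, while the alphabetical version keeps the "Cancelled" one.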
Also, if you want the output sorted by ID and version, sort by multiple columns:
mydata['state'] = pd.Categorical(mydata['state'],
                                 categories=["Delivered", "In-Transit", "Shipped", "Cancelled"],
                                 ordered=True)
mydata = mydata.sort_values(['ID', 'version', 'state']).drop_duplicates(['ID', 'version'])
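A small sketch of the multi-column variant, again on made-up rows that are deliberately out of order. The `reset_index(drop=True)` call is my own optional addition to renumber the rows, not part of the answer above.

```python
import pandas as pd

priority = ["Delivered", "In-Transit", "Shipped", "Cancelled"]

# Hypothetical sample rows, deliberately shuffled.
mydata = pd.DataFrame({
    "ID":      [102, 101, 102, 101, 101],
    "version": [2, 2, 1, 1, 1],
    "Name":    ["Screw 2.0", "Nut 2.0", "Screw", "Nut", "Nut"],
    "state":   ["Shipped", "In-Transit", "In-Transit", "Cancelled", "Delivered"],
})
mydata["state"] = pd.Categorical(mydata["state"], categories=priority, ordered=True)

result = (
    mydata.sort_values(["ID", "version", "state"])  # best state first within each key
          .drop_duplicates(["ID", "version"])        # keep that first row per key
          .reset_index(drop=True)                    # optional: renumber rows from 0
)
print(result)
```

Because ID and version lead the sort, the deduplicated rows come out grouped by key rather than by state.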
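Contributed 1802 experience points, received over 5 likes
Use pd.Categorical with ordered=True to create a categorical variable, then call sort_values on that categorical variable, groupby on ID and version, and aggregate using first: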
mydata['State'] = pd.Categorical(mydata['State'], ["Delivered", "In-Transit", "Shipped", "Cancelled"], ordered=True)
df = mydata.sort_values('State').groupby(['ID', 'version'], as_index=False).first()
Result:
    ID  version       Name       State
0  101        1        Nut   Delivered
1  101        2    Nut 2.0  In-Transit
2  102        1      Screw  In-Transit
3  102        2  Screw 2.0     Shipped
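One thing worth knowing when choosing between the two answers: as far as I know, groupby(...).first() takes the first non-null value of each column separately, while drop_duplicates keeps whole rows. On data without missing values they should agree. The sketch below uses made-up rows with a missing Name to show where they can differ; `ranked`, `by_first` and `by_dedup` are illustrative names.

```python
import pandas as pd

order = ["Delivered", "In-Transit", "Shipped", "Cancelled"]

# Hypothetical data: the best-ranked row for (101, 1) has no Name.
mydata = pd.DataFrame({
    "ID":      [101, 101, 102],
    "version": [1, 1, 1],
    "Name":    [None, "Nut", "Screw"],
    "State":   ["Delivered", "Shipped", "In-Transit"],
})
mydata["State"] = pd.Categorical(mydata["State"], categories=order, ordered=True)
ranked = mydata.sort_values("State")

# first() fills Name from the next row in the group that has one.
by_first = ranked.groupby(["ID", "version"], as_index=False).first()

# drop_duplicates keeps the top-ranked row as it is, missing Name included.
by_dedup = ranked.drop_duplicates(["ID", "version"])

print(by_first)
print(by_dedup)
```

If each output row has to come from a single input row, the drop_duplicates version is probably the safer choice.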