Efficient/Pythonic way to filter a pandas DataFrame based on priority

I have the following DataFrame.

+-----------+----------+-----+
| InvoiceNo | ItemCode | Qty |
+-----------+----------+-----+
| Inv-001   | A        | 2   |
| Inv-001   | B        | 3   |
| Inv-001   | C        | 1   |
| Inv-002   | B        | 3   |
| Inv-002   | D        | 4   |
| Inv-003   | C        | 3   |
| Inv-003   | D        | 9   |
| Inv-004   | D        | 5   |
| Inv-004   | E        | 8   |
| Inv-005   | X        | 2   |
+-----------+----------+-----+

My task is to create an additional column, Type, based on the priority of the items that occur in each invoice. For example, ItemCode A has 1st priority, B has 2nd priority, and C has 3rd priority. All remaining items have the lowest priority and are classified as Other.

So, if an invoice contains item A, its type should be Type - A regardless of which other items are present. Among the remaining invoices, if item B is present, the type should be Type - B. The same applies to C. If none of A, B, or C is present in an invoice, the type should be Type - Other.

Below is my desired output.

+-----------+----------+-----+--------------+
| InvoiceNo | ItemCode | Qty | Type         |
+-----------+----------+-----+--------------+
| Inv-001   | A        | 2   | Type - A     |
| Inv-001   | B        | 3   | Type - A     |
| Inv-001   | C        | 1   | Type - A     |
| Inv-002   | B        | 3   | Type - B     |
| Inv-002   | D        | 4   | Type - B     |
| Inv-003   | C        | 3   | Type - C     |
| Inv-003   | D        | 9   | Type - C     |
| Inv-004   | D        | 5   | Type - Other |
| Inv-004   | E        | 8   | Type - Other |
| Inv-005   | X        | 2   | Type - Other |
+-----------+----------+-----+--------------+

Now, what is the most efficient, Pythonic way to do this?
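For anyone who wants to reproduce this, the sample data from the first table can be built roughly like this. It is only a convenience sketch; the variable name `df` is my own choice and the values are copied straight from the table.

```python
import pandas as pd

# Sample data copied from the question's first table.
# `df` is just an illustrative name.
df = pd.DataFrame({
    "InvoiceNo": ["Inv-001", "Inv-001", "Inv-001", "Inv-002", "Inv-002",
                  "Inv-003", "Inv-003", "Inv-004", "Inv-004", "Inv-005"],
    "ItemCode": ["A", "B", "C", "B", "D", "C", "D", "D", "E", "X"],
    "Qty": [2, 3, 1, 3, 4, 3, 9, 5, 8, 2],
})

print(df)
```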

1 Answer

江户川乱折腾

Contributed 1851 experience points · received more than 5 upvotes

I think we can do this with Categorical and transform.

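import pandas as pd
df['Type'] = pd.Categorical(df.ItemCode, ['A', 'B', 'C'], ordered=True)  # codes outside the list become NaN
# Best item per invoice; cast to object so that 'other', which is not a category, can be filled in
df['Type'] = 'Type_' + df.groupby('InvoiceNo')['Type'].transform('min').astype(object).fillna('other')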
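To see why this works, here is a small sketch of what an ordered Categorical does with codes outside its category list. The name `cat` and the three sample values are mine, purely for illustration.

```python
import pandas as pd

# 'D' is not in the category list, so it is stored as a missing value.
cat = pd.Categorical(["A", "B", "D"], categories=["A", "B", "C"], ordered=True)

# Integer codes follow the category order; -1 marks a missing value.
print(list(cat.codes))  # → [0, 1, -1]

# Because the categorical is ordered, min() respects the priority order
# and skips the missing entry.
print(cat.min())  # → A
```

The order of the category list is the priority order, which is why taking the minimum per invoice picks the highest-priority item.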

Update

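df['Type'] = pd.Categorical(df.ItemCode, ['A', 'B', 'C'], ordered=True)
# Sort by priority (NaN goes last), so the best item comes first within each invoice
df = df.sort_values('Type')
# Cast to object so that 'other', which is not a category, can be filled in
df['Type'] = 'Type_' + df.groupby('InvoiceNo')['Type'].transform('first').astype(object).fillna('other')
# Restore the original row order
df = df.sort_index()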

Out[32]:
  InvoiceNo ItemCode  Qty        Type
0   Inv-001        A    2      Type_A
1   Inv-001        B    3      Type_A
2   Inv-001        C    1      Type_A
3   Inv-002        B    3      Type_B
4   Inv-002        D    4      Type_B
5   Inv-003        C    3      Type_C
6   Inv-003        D    9      Type_C
7   Inv-004        D    5  Type_other
8   Inv-004        E    8  Type_other
9   Inv-005        X    2  Type_other
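If aggregating a categorical column gives trouble in your pandas version, the same idea can be sketched with plain integer ranks instead. This is a minimal sketch under my own assumptions, not the method above: `order`, `rank_of`, `rank`, `best`, and `labels` are names I made up, and it emits the `Type - A` / `Type - Other` labels from the question rather than `Type_A` / `Type_other`.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "InvoiceNo": ["Inv-001", "Inv-001", "Inv-001", "Inv-002", "Inv-002",
                  "Inv-003", "Inv-003", "Inv-004", "Inv-004", "Inv-005"],
    "ItemCode": ["A", "B", "C", "B", "D", "C", "D", "D", "E", "X"],
    "Qty": [2, 3, 1, 3, 4, 3, 9, 5, 8, 2],
})

order = ["A", "B", "C"]  # highest priority first (illustrative name)
rank_of = {code: i for i, code in enumerate(order)}

# Rank each row's item; anything outside `order` gets the lowest rank.
rank = df["ItemCode"].map(rank_of).fillna(len(order)).astype(int)

# Smallest (best) rank within each invoice, broadcast back to every row.
best = rank.groupby(df["InvoiceNo"]).transform("min")

# Turn the rank back into a label; the extra last slot is "Other".
labels = np.array(order + ["Other"])
df["Type"] = ["Type - " + name for name in labels[best.to_numpy()]]

print(df)
```

Adding a new priority level only means extending `order`; the rest of the sketch stays the same.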

2023-05-09
