1 回答

TA贡献1812条经验 获得超5个赞
按照你的解释:
"""
goal waiting
Column_A Column_B Column_A_B Status Qty
0 test1 WO1 test1W01 Cancelled 12
1 test4 WO6 test4WO6 Active 3000
2 test6 WO6 test6WO6 Open 14
3 test6 WO6 test6WO6 Active 88
4 test7 WO7 test7WO7 Active 1500
"""
import pandas as pd
import numpy as np
from numpy import NaN
df = pd.DataFrame({'Column_A':['test1', 'test7', 'test7', 'test4', 'test6', 'test6', 'test7'],'Column_B':['WO1','WO7', 'WO7', 'WO6', 'WO6', 'WO6', 'WO7'],
'Status': ['Cancelled','Cancelled', 'Active', 'Active', 'Open', 'Active', 'Active'],
'Qty': ['12', '34' , '13', '3000', '14', '88', '1500']})
df_deleted = df.copy(deep=True)
df_deleted.drop(df.index,inplace=True)
#Step1
def process(r):
return r['Column_A'] + r['Column_B']
df["Column_A_B"] = df.apply(lambda row: process(row), axis = 1)
print("step 1");print(df)
#Step2
df['countAB'] = df.groupby('Column_A_B')['Qty'].transform('count')
print("step 2");print(df)
#Step3
df['True_False']=df['countAB'] == 1
print("step 3");print(df)
#Step4
todelete = df[(df['Status'] == 'Cancelled') & (df['True_False'] == False)]
df = df[(df['Status'] != 'Cancelled') | (df['True_False'] == True)]
df.drop(['countAB','True_False'], axis=1, inplace=True)
todelete.drop(['True_False', 'countAB'], axis=1, inplace=True)
df_deleted = df_deleted.append(todelete)
print("step 4");print(df);print("step 4 - deleted");print(df_deleted)
#5tep5
df['Qty'] = df['Qty'].astype(int)
df['maxAB'] = df.groupby('Column_A_B')['Qty'].transform('max')
todelete = df[(df['maxAB'] > 99) & (df['Qty'] <= 16)]
df= df[(df['maxAB'] <= 99) | (df['Qty'] > 16)]
df = df.reset_index(drop=True)
todelete.drop(['maxAB'], axis=1, inplace=True)
df_deleted = df_deleted.append(todelete)
df.drop(['maxAB'], axis=1, inplace=True)
print("step 5");print(df);print("step 5 - deleted");print(df_deleted)
输出:
Column_A Column_B Status Qty Column_A_B
0 test1 WO1 Cancelled 12 test1WO1
1 test4 WO6 Active 3000 test4WO6
2 test6 WO6 Open 14 test6WO6
3 test6 WO6 Active 88 test6WO6
4 test7 WO7 Active 1500 test7WO7
step 5 - deleted
Column_A Column_A_B Column_B Qty Status
1 test7 test7WO7 WO7 34 Cancelled
2 test7 test7WO7 WO7 13 Active
一些解释:
对于步骤 1:
它只是将 2 列与 lambda 连接起来,当您使用 apply 时,您会在每一行(轴 = 1)上执行某些操作,结果在新列“Column_A_B”中
#Step1
# definition of lambda function (others ways to do exist)
def process(r):
return r['Column_A'] + r['Column_B'] # i concatenate the 2 values
df["Column_A_B"] = df.apply(lambda row: process(row), axis = 1)
print("step 1");print(df)
结果:
step 1
Column_A Column_B Status Qty Column_A_B
0 test1 WO1 Cancelled 12 test1WO1
1 test7 WO7 Cancelled 34 test7WO7
2 test7 WO7 Active 13 test7WO7
3 test4 WO6 Active 3000 test4WO6
4 test6 WO6 Open 14 test6WO6
5 test6 WO6 Active 88 test6WO6
6 test7 WO7 Active 1500 test7WO7
对于第 5 步:
这个想法是在每个组中创建一个具有最大值 Qty 的新列(这里的组是 Column_A_B),所以在这个命令之后:
df['maxAB'] = df.groupby('Column_A_B')['Qty'].transform('max')
print("maxAB");print(df)
结果:
maxAB
Column_A Column_B Status Qty Column_A_B maxAB
0 test1 WO1 Cancelled 12 test1WO1 12 *max value of group test1WO1
2 test7 WO7 Active 13 test7WO7 1500 *max value of group test7WO7
3 test4 WO6 Active 3000 test4WO6 3000 *max value of group test4WO6
4 test6 WO6 Open 14 test6WO6 88 *max value of group test6WO6
5 test6 WO6 Active 88 test6WO6 88 *max value of group test6WO6
6 test7 WO7 Active 1500 test7WO7 1500 *max value of group test7WO7
正如你所看到的,你在自己面前拥有每个组的最大值(对不起我的英语)
现在,对于 Qty > 99 和 Qty <=16 的每个组,我只删除 Qty <= 16 的行。
所以下一个命令说:我保留所有回答这个过滤器的行并放入 todelete 数据帧
todelete = df[(df['maxAB'] > 99) & (df['Qty'] <= 16)]
所以在 todelete 我想保留,但在 df 我想删除(并保留所有其他行)。我们必须使用相反的过滤器。
在逻辑 => A and b 中,相反的not ( A and B) = (not A) or (not B)
所以“不是”的逻辑
df[(df['maxAB'] > 99) & (df['Qty'] <= 16)]
是:
df[(df['maxAB'] <= 99) | (df['Qty'] > 16)]
所以在这个命令之后:
# i want to keep rows which have a Qty <= 99
# or
# rows which have a Qty > 16
df= df[(df['maxAB'] <= 99) | (df['Qty'] > 16)]
您可以通过使用变量来简化:
filter = (df['maxAB'] > 99) & (df['Qty'] <= 16)
todelete = df[filter]
df= df[~filter]
~filter等价于not filter
我重建索引(0到4)
df = df.reset_index(drop=True)
最后,等待最终结果(删除临时列后)
希望这有助于理解...
添加回答
举报