3 Answers
Contributed 1712 experience points, received more than 3 upvotes
A solution that reuses your current code:
def cust_func(row):
    r = row['Col2']
    val = None  # fallback when r falls outside every range below
    if r >= 0 and r <= 10:
        val = 1
    elif r >= 10.01 and r <= 20:
        val = 2
    elif r >= 20.01 and r <= 30:
        val = 3
    return val

df['Col3'] = df.apply(cust_func, axis=1)
Best solution:
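import pandas as pd

cut_labels = [1, 2, 3]
cut_bins = [0, 10, 20, 30]
# Bins are right-closed: (0, 10], (10, 20], (20, 30]; include_lowest=True also covers 0
df['Col3'] = pd.cut(df['Col2'], bins=cut_bins, labels=cut_labels, include_lowest=True)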
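As a self-contained check of the pd.cut approach, here is a minimal sketch that uses the sample Col2 values appearing later in this thread. The DataFrame construction is an assumption made for illustration. Note that pd.cut uses right-closed intervals by default, so 10 lands in the first bin, and a value of exactly 0 would become NaN unless include_lowest=True is passed.

```python
import pandas as pd

# Sample data assumed for illustration (same values as the example in this thread)
df = pd.DataFrame({
    "Col1": list("abcdefghij"),
    "Col2": [5, 10, 15, 20, 25, 30, 1, 11, 21, 7],
})

cut_bins = [0, 10, 20, 30]
cut_labels = [1, 2, 3]

# Default right=True gives the intervals (0, 10], (10, 20], (20, 30];
# include_lowest=True additionally closes the first interval at 0.
df["Col3"] = pd.cut(df["Col2"], bins=cut_bins, labels=cut_labels,
                    include_lowest=True)

# pd.cut returns a categorical column; cast it if plain integers are needed.
# (The cast would fail if any value fell outside the bins and became NaN.)
df["Col3"] = df["Col3"].astype(int)
print(df)
```

Values outside 0..30 are not covered by these bins, so in real data it may be worth checking for NaN before casting.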
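Contributed 1784 experience points, received more than 9 upvotes
There are a couple of ways: numpy select and numpy searchsorted. I prefer the latter because I don't have to list the conditions. As long as the array you search in is sorted, it works by bisection, and I think it is the fastest of the bunch.
It would be cool if you ran some timings and shared the results. Sample data: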
  Col1  Col2
0    a     5
1    b    10
2    c    15
3    d    20
4    e    25
5    f    30
6    g     1
7    h    11
8    i    21
9    j     7
import numpy as np

# Step 1: create your 'conditions'
# Sort the DataFrame on Col2 (optional: np.searchsorted only needs `benchmarks` sorted)
df = df.sort_values('Col2')
# benchmarks are the upper bounds of the ranges that map to each score/grade
benchmarks = np.array([10, 20, 30])
# The grades to be assigned for Col2
score = np.array([1, 2, 3])
# Use searchsorted: for each value it returns the index at which the value
# would be inserted into benchmarks to keep it sorted.
# E.g. for [1, 4, 5] the position of 3 is 1, since it lies between 1 and 4
# (Python uses zero-based indexing).
indices = np.searchsorted(benchmarks, df.Col2)
# Create the new column by indexing the score array with the indices
df['Col3'] = score[indices]
# Restore the original row order
df = df.sort_index()
df
  Col1  Col2  Col3
0    a     5     1
1    b    10     1
2    c    15     2
3    d    20     2
4    e    25     3
5    f    30     3
6    g     1     1
7    h    11     2
8    i    21     3
9    j     7     1
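For anyone who wants to run the timings suggested above, here is a rough sketch using the standard-library timeit module. The array size, the random data, and the helper function names are assumptions made for illustration. No results are claimed here; the numbers depend on your machine and library versions.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test data: 100_000 random integers in 1..30
rng = np.random.default_rng(0)
df = pd.DataFrame({"Col2": rng.integers(1, 31, size=100_000)})

benchmarks = np.array([10, 20, 30])
score = np.array([1, 2, 3])


def with_searchsorted(frame):
    # Only `benchmarks` has to be sorted; the column can be in any order
    return score[np.searchsorted(benchmarks, frame["Col2"].to_numpy())]


def with_select(frame):
    col = frame["Col2"]
    conditions = [(col > 0) & (col <= 10),
                  (col > 10) & (col <= 20),
                  (col > 20) & (col <= 30)]
    return np.select(conditions, [1, 2, 3], default=999)


def with_cut(frame):
    binned = pd.cut(frame["Col2"], bins=[0, 10, 20, 30], labels=[1, 2, 3])
    return binned.astype(int).to_numpy()


for func in (with_searchsorted, with_select, with_cut):
    seconds = timeit.timeit(lambda: func(df), number=20)
    print(f"{func.__name__}: {seconds:.4f} s for 20 runs")
```

All three helpers should return the same grades for values in 1..30, so it is worth confirming that before comparing speed.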
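Contributed 1744 experience points, received more than 4 upvotes
You can do this nicely and cleanly with np.select(). I added a few <= because I am guessing you want to update all values, but it is easy to edit if needed.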
conditions = [(df['Col2'] > 0) & (df['Col2'] <= 10),
              (df['Col2'] > 10) & (df['Col2'] <= 20),
              (df['Col2'] > 20) & (df['Col2'] <= 30)]
updates = [1, 2, 3]
df["Col3"] = np.select(conditions, updates, default=999)
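Using the original ranges gives the result below, where the values equal to 10, 20 and 30 match no condition and get the default value 999 from np.select().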
conditions = [(df['Col2'] > 0) & (df['Col2'] < 10),
              (df['Col2'] > 10.01) & (df['Col2'] < 20),
              (df['Col2'] > 20.01) & (df['Col2'] < 30)]
updates = [1, 2, 3]
df["Col3"] = np.select(conditions, updates, default=999)
print(df)
  Col1  Col2  Col3
0    a     5     1
1    b    10   999
2    c    15     2
3    d    20   999
4    e    25     3
5    f    30   999
6    g     1     1
7    h    11     2
8    i    21     3
9    j     7     1
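To make the boundary behaviour easier to inspect, here is a small sketch that applies both sets of bounds to the same column and lists the values that fall through to the default. The column names `strict` and `inclusive` and the variable `unmatched` are hypothetical names introduced for this illustration, and the strict bounds are simplified to whole numbers since the sample data contains only integers.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Col2": [5, 10, 15, 20, 25, 30, 1, 11, 21, 7]})
col = df["Col2"]

# Strict upper bounds leave 10, 20 and 30 uncovered
strict = [(col > 0) & (col < 10),
          (col > 10) & (col < 20),
          (col > 20) & (col < 30)]
# Inclusive upper bounds cover every value in (0, 30]
inclusive = [(col > 0) & (col <= 10),
             (col > 10) & (col <= 20),
             (col > 20) & (col <= 30)]
updates = [1, 2, 3]

df["strict"] = np.select(strict, updates, default=999)
df["inclusive"] = np.select(inclusive, updates, default=999)

# Values that matched no condition under the strict bounds
unmatched = df.loc[df["strict"] == 999, "Col2"].tolist()
print(unmatched)
print(df)
```

Choosing a default that cannot be mistaken for a real grade, as 999 is here, makes uncovered values easy to spot.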