
Pandas operation on a column based on values between two numbers

Python

紫衣仙女 2022-09-27 16:20:52

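I'm currently using pandas and NumPy, and I have a DataFrame called "df". Given the data below, how can I assign a value to a third column based on a "between" condition? If possible, I'd like a vectorised approach, to keep the speed I already have. I've tried lambda functions, but frankly I don't understand what I'm doing, and I run into errors such as: object has no attribute 'between'.

General approach, written in a non-vectorised way:

NOTE: I am looking for a way to make this vectorised.

If df['Col2'] is between 0 and 10
    df['Col3'] = 1
Else if df['Col2'] is between 10.01 and 20
    df['Col3'] = 2
Else if df['Col2'] is between 20.1 and 30
    df['Col3'] = 3

Sample set:

+------+------+------+
| Col1 | Col2 | Col3 |
+------+------+------+
| a    | 5    | 1    |
| b    | 10   | 1    |
| c    | 15   | 2    |
| d    | 20   | 2    |
| e    | 25   | 3    |
| f    | 30   | 3    |
| g    | 1    | 1    |
| h    | 11   | 2    |
| i    | 21   | 3    |
| j    | 7    | 1    |
+------+------+------+

Many thanks.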


3 Answers

交互式爱情

Contributed 1712 experience points · received more than 3 likes

A solution that reuses your current code:

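def cust_func(row):
    # Row-wise (non-vectorised) mapping of Col2 to a score.
    r = row['Col2']
    if r >= 0 and r <= 10:
        val = 1
    elif r >= 10.01 and r <= 20:
        val = 2
    elif r >= 20.01 and r <= 30:
        val = 3
    else:
        # Value falls outside every range (or in a gap such as 10 < r < 10.01).
        val = None
    return val

df['Col3'] = df.apply(cust_func, axis=1)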

Best solution:

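cut_labels = [1, 2, 3]
cut_bins = [0, 10, 20, 30]
# Bins are right-closed: (0, 10], (10, 20], (20, 30].
# include_lowest=True also puts the value 0 into the first bin.
df['Col3'] = pd.cut(df['Col2'], bins=cut_bins, labels=cut_labels, include_lowest=True)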
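To make the bin edges concrete, here is a minimal sketch that runs pd.cut on the question's sample values. The extra 0 at the end is an assumption added only to exercise the lower edge; it is not part of the original data.

```python
import pandas as pd

# Sample Col2 values from the question, plus a 0 (added here as an assumption)
# to show what happens at the lower edge.
df = pd.DataFrame({"Col2": [5, 10, 15, 20, 25, 30, 1, 11, 21, 7, 0]})

bins = [0, 10, 20, 30]
labels = [1, 2, 3]

# Default behaviour: intervals are (0, 10], (10, 20], (20, 30],
# so the value 0 falls outside every bin and becomes NaN.
default_cut = pd.cut(df["Col2"], bins=bins, labels=labels)

# include_lowest=True turns the first interval into [0, 10].
inclusive_cut = pd.cut(df["Col2"], bins=bins, labels=labels, include_lowest=True)

# pd.cut returns a categorical Series; cast to int when no value is missing.
df["Col3"] = inclusive_cut.astype(int)
print(df)
```

Values outside the outermost edges stay NaN, so the integer cast only works when every value falls inside a bin.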

2022-09-27

千万里不及你

Contributed 1784 experience points · received more than 9 likes

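There are a couple of ways: numpy select and numpy searchsorted. I prefer the latter because I don't have to list the conditions. It relies on a bisection (binary search) algorithm, so it works as long as the array being searched is sorted; and yes, I think it is the fastest of the bunch.

It would be cool if you ran some timings and shared the results: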

  Col1  Col2
0    a     5
1    b    10
2    c    15
3    d    20
4    e    25
5    f    30
6    g     1
7    h    11
8    i    21
9    j     7

import numpy as np

# Step 1: create your "conditions".
# Sort the DataFrame on Col2 (optional: np.searchsorted only requires
# `benchmarks`, the array being searched, to be sorted).
df = df.sort_values('Col2')

# benchmarks are the upper bounds of the ranges within which you set your scores/grades
benchmarks = np.array([10, 20, 30])

# the grades to be assigned for Col2
score = np.array([1, 2, 3])

# Use searchsorted: it returns the index at which each value would be inserted.
# E.g. if you have [1, 4, 5], the position of 3 is 1, since it lies between
# 1 and 4 and Python uses zero-based indexing.
indices = np.searchsorted(benchmarks, df.Col2)

# create the new column by indexing the score array with the indices
df['Col3'] = score[indices]

# restore the original row order
df = df.sort_index()

  Col1  Col2  Col3
0    a     5     1
1    b    10     1
2    c    15     2
3    d    20     2
4    e    25     3
5    f    30     3
6    g     1     1
7    h    11     2
8    i    21     3
9    j     7     1
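A self-contained sketch of the searchsorted approach, for reference. It skips sorting the DataFrame, since only the array of edges has to be sorted, and it adds a guard for values above the last edge. The -1 sentinel for out-of-range values is an assumption made for illustration, not part of the answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Col1": list("abcdefghij"),
    "Col2": [5, 10, 15, 20, 25, 30, 1, 11, 21, 7],
})

benchmarks = np.array([10, 20, 30])  # upper edge of each range; must be sorted
score = np.array([1, 2, 3])          # score assigned to each range

# side="left" (the default) places a value equal to an edge in the lower
# range, so 10 -> index 0, 20 -> index 1, 30 -> index 2.
indices = np.searchsorted(benchmarks, df["Col2"].to_numpy(), side="left")

# A value above the last edge yields index len(benchmarks), which would raise
# an IndexError in score[indices]. Clip the index and mark those rows with -1
# (hypothetical sentinel).
in_range = indices < len(benchmarks)
safe_indices = np.minimum(indices, len(score) - 1)
df["Col3"] = np.where(in_range, score[safe_indices], -1)
print(df)
```

Note that this variant has no lower bound: anything at or below 10, including negative numbers, receives score 1.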

2022-09-27

慕无忌1623718

Contributed 1744 experience points · received more than 4 likes

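You can do this nice and cleanly with np.select(). I added some <= comparisons, because I'm guessing you want every value to be updated. It's easy to edit if needed, though.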

conditions = [(df['Col2'] > 0) & (df['Col2'] <= 10),
              (df['Col2'] > 10) & (df['Col2'] <= 20),
              (df['Col2'] > 20) & (df['Col2'] <= 30)]
updates = [1, 2, 3]
df["Col3"] = np.select(conditions, updates, default=999)
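The "no attribute 'between'" error mentioned in the question presumably comes from calling .between on a scalar inside a lambda; that is a guess, since the failing code is not shown. between is a method of a pandas Series, so a sketch of the same np.select idea using Series.between could look like this. It relies on np.select picking the first matching condition, which is why the overlapping edges 10 and 20 end up in the lower range.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Col1": list("abcdefghij"),
    "Col2": [5, 10, 15, 20, 25, 30, 1, 11, 21, 7],
})

col = df["Col2"]

# Series.between is inclusive on both ends by default, so the ranges overlap
# at 10 and 20. np.select uses the first condition that is True, which sends
# 10 to score 1 and 20 to score 2.
conditions = [
    col.between(0, 10),
    col.between(10, 20),
    col.between(20, 30),
]
updates = [1, 2, 3]

# Rows that match no condition receive the default value.
df["Col3"] = np.select(conditions, updates, default=999)
print(df)
```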

Using the original ranges would lead to the following, where the values equal to 10, 20 and 30 get the default value 999 from np.select().

conditions = [(df['Col2'] > 0) & (df['Col2'] < 10),
              (df['Col2'] > 10.01) & (df['Col2'] < 20),
              (df['Col2'] > 20.1) & (df['Col2'] < 30)]
updates = [1, 2, 3]
df["Col3"] = np.select(conditions, updates, default=999)
print(df)

  Col1  Col2  Col3
0    a     5     1
1    b    10   999
2    c    15     2
3    d    20   999
4    e    25     3
5    f    30   999
6    g     1     1
7    h    11     2
8    i    21     3
9    j     7     1
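For anyone who wants to compare the vectorised approaches from the answers, a rough timing sketch might look like the following. The data size, the random seed and the helper names (with_cut, with_searchsorted, with_select) are assumptions chosen for illustration; actual timings depend on the machine and library versions, so none are quoted here.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data (assumption): 100,000 integers between 1 and 30 inclusive.
rng = np.random.default_rng(0)
df = pd.DataFrame({"Col2": rng.integers(1, 31, size=100_000)})

bins = [0, 10, 20, 30]
labels = np.array([1, 2, 3])


def with_cut():
    # pd.cut returns a categorical; cast to int for comparison.
    return pd.cut(df["Col2"], bins=bins, labels=[1, 2, 3]).astype(int).to_numpy()


def with_searchsorted():
    # Search the upper edges only: [10, 20, 30].
    return labels[np.searchsorted(bins[1:], df["Col2"].to_numpy())]


def with_select():
    c = df["Col2"]
    conds = [(c > 0) & (c <= 10), (c > 10) & (c <= 20), (c > 20) & (c <= 30)]
    return np.select(conds, [1, 2, 3], default=999)


funcs = (with_cut, with_searchsorted, with_select)

# Check that all approaches agree before timing them.
results = {f.__name__: f() for f in funcs}

for f in funcs:
    seconds = timeit.timeit(f, number=10)
    print(f"{f.__name__}: {seconds / 10 * 1000:.2f} ms per call")
```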

2022-09-27
