
Applying a function to specific columns of a DataFrame

Python

BIG阳 2022-06-02 15:19:06

```python
def include_mean():
    if pd.isnull('Age'):
        if 'Pclass'==1:
            return 38
        elif 'Pclass'==2:
            return 30
        elif 'Pclass'==3:
            return 25
        else:
            return 'Age'

train['Age']=train[['Age','Pclass']].apply(include_mean(),axis=1)
```

Why does the code above give me a type error?

TypeError: ("'NoneType' object is not callable", 'occurred at index 0')

I now know the correct code is:

```python
def impute_age(cols):
    Age = cols[0]
    Pclass = cols[1]
    if pd.isnull(Age):
        if Pclass == 1:
            return 37
        elif Pclass == 2:
            return 29
        else:
            return 24
    else:
        return Age

train['Age'] = train[['Age','Pclass']].apply(impute_age,axis=1)
```

Now I want to know why the change is needed, i.e. the exact reason behind it. What is 'cols' doing here?


3 answers

慕雪6442864


https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html

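When you use the `apply` method on a pandas DataFrame, the function you pass to `apply` is called once for each column (or once for each row, depending on the `axis` parameter; the default `axis=0` means one call per column). Your function therefore needs a parameter to receive the row (or column) that `apply` passes to it.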
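A minimal sketch of what `apply` hands to the function on each call. The DataFrame is made-up sample data that only mirrors the question's column names, and `describe` is a hypothetical helper that records the label and values of each Series it receives.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data mirroring the question's columns
df = pd.DataFrame({"Age": [22.0, np.nan], "Pclass": [3, 1]})

def describe(s):
    # apply always passes a pandas Series; s.name is the column label
    # (axis=0) or the row index label (axis=1)
    return f"{s.name}: {s.tolist()}"

# axis=0 (the default): one call per column
per_column = df.apply(describe)
# axis=1: one call per row
per_row = df.apply(describe, axis=1)

print(per_column)
print(per_row)
```

With `axis=1` each row mixes a float and an int column, so the row Series comes out as floats (`Pclass` shows up as `3.0`).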

```python
def include_mean():
    if pd.isnull('Age'):
        if 'Pclass'==1:
            return 38
        elif 'Pclass'==2:
            return 30
        elif 'Pclass'==3:
            return 25
        else:
            return 'Age'
```

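There are several problems with this:

1. `'Pclass'==1` is guaranteed to be `False`, because you are comparing a string (`'Pclass'`) with an integer (`1`), and they can never be equal. What you want is to compare the value of the row's `Pclass` entry, which you can retrieve by indexing the row: `col["Pclass"]`, or `col[1]` if `Pclass` is the second column.
2. If `pd.isnull('Age')` is `False`, the function returns `None`. Since the string `'Age'` is not null, this is always the case. When you write `d.apply(include_mean())`, you are calling `include_mean` yourself; it returns `None`, and that value is what gets passed to `apply`. But `apply` expects a callable (e.g. a function), hence `'NoneType' object is not callable`.
3. In the `else` clause you return the string `'Age'`. That means some cells of your DataFrame would end up containing the literal value `'Age'`.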

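Your second example fixes these problems: the `impute_age` function now takes a parameter for the row (`cols`), looks up and compares the values of the `Age` and `Pclass` columns, and you pass the function itself to `apply` without calling it.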
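A hedged variant of the fixed function: it indexes the row by label (`row['Age']`, `row['Pclass']`) instead of by position. That does not depend on column order, and it avoids positional `cols[0]` lookups on a label-indexed Series, which newer pandas releases deprecate. The replacement ages are the ones from the question's second version; the sample rows are made up.

```python
import numpy as np
import pandas as pd

def impute_age(row):
    # row is a Series indexed by column name: 'Age', 'Pclass'
    age = row['Age']
    pclass = row['Pclass']
    if pd.isnull(age):
        if pclass == 1:
            return 37
        elif pclass == 2:
            return 29
        else:
            return 24
    return age

# Made-up sample rows
train = pd.DataFrame({'Age': [50.0, np.nan, np.nan, np.nan],
                      'Pclass': [1, 1, 2, 3]})

# Pass the function object itself (no parentheses); apply calls it once per row
train['Age'] = train[['Age', 'Pclass']].apply(impute_age, axis=1)
print(train['Age'].tolist())  # [50.0, 37.0, 29.0, 24.0]
```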

Answered 2022-06-02

翻翻过去那场雪


Welcome to Python. To answer questions like this, especially when you are starting out, sometimes you just need to open a new IPython notebook and try it:

```
In [1]: import pandas as pd
   ...: def function(x):
   ...:     return x+1
   ...:
   ...: df = pd.DataFrame({'values':range(10)})
   ...: print(df)
   ...:
   values
0       0
1       1
2       2
3       3
4       4
5       5
6       6
7       7
8       8
9       9

In [2]: print(df.apply(function))
   values
0       1
1       2
2       3
3       4
4       5
5       6
6       7
7       8
8       9
9      10
```

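In your question, `cols` is whatever `apply` passes in on each call: with `axis=1`, that is one row of `train[['Age','Pclass']]` as a pandas Series, so `cols[0]` picks that row's `Age` and `cols[1]` its `Pclass` by position.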
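To see what `cols` actually is, a sketch that records the type, index and name of each argument `apply` passes in with `axis=1`. The DataFrame and the helper name `inspect_row` are hypothetical.

```python
import pandas as pd

# Made-up sample data
train = pd.DataFrame({'Age': [22.0, 38.0], 'Pclass': [3, 1]})
seen = []

def inspect_row(cols):
    # cols is one row of train[['Age', 'Pclass']] as a pandas Series;
    # its index holds the column names and its name is the row label
    seen.append((type(cols).__name__, cols.index.tolist(), cols.name))
    return cols['Age']

train[['Age', 'Pclass']].apply(inspect_row, axis=1)
for entry in seen:
    print(entry)
# ('Series', ['Age', 'Pclass'], 0)
# ('Series', ['Age', 'Pclass'], 1)
```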

Answered 2022-06-02

墨色风雨


Don't use `apply(axis=1)`; instead, use `.loc`. For the top case, this is a simple mapping:

```python
m = train.Age.isnull()
d = {1: 38, 2: 30, 3: 25}
train.loc[m, 'Age'] = train.loc[m, 'Pclass'].map(d)
```
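A runnable sketch of the mapping approach on made-up data, checked against a row-wise `apply` version of the same logic (ages 38/30/25 from the question's first version). The row-wise helper `fill_row` is hypothetical and only exists for the comparison.

```python
import numpy as np
import pandas as pd

# Made-up sample data
train = pd.DataFrame({'Age': [22.0, np.nan, np.nan, np.nan],
                      'Pclass': [3, 1, 2, 3]})

# Vectorized: only touch the rows where Age is missing
m = train.Age.isnull()
d = {1: 38, 2: 30, 3: 25}
vectorized = train.copy()
vectorized.loc[m, 'Age'] = vectorized.loc[m, 'Pclass'].map(d)

# Row-wise equivalent, for comparison only
def fill_row(row):
    if pd.isnull(row['Age']):
        return d[row['Pclass']]
    return row['Age']

rowwise = train.apply(fill_row, axis=1)

print(vectorized['Age'].tolist())                      # [22.0, 38.0, 30.0, 25.0]
print(vectorized['Age'].tolist() == rowwise.tolist())  # True
```

The `.map` version works on whole columns at once instead of calling a Python function per row, which is why it is usually preferred for simple lookups like this.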

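For the bottom case, because of the `else`, we can use `np.select`. It works like this: we build a list of conditions in the same order as the if/elif/else logic, then provide a list of choices; for each element, `np.select` takes the choice that belongs to the first condition that is `True`. Because your logic is nested, we first need to un-nest it so that it reads as: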

```
if age is null and pclass == 1
elif age is null and pclass == 2
elif age is null
else
```

Sample data:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'Age': [50, 60, 70, np.nan, np.nan, np.nan, np.nan],
                   'Pclass': [1, 1, 1, 1, 2, np.nan, 1]})
#     Age  Pclass
# 0  50.0     1.0
# 1  60.0     1.0
# 2  70.0     1.0
# 3   NaN     1.0
# 4   NaN     2.0
# 5   NaN     NaN
# 6   NaN     1.0

m = df.Age.isnull()
# Un-nested if / elif / elif, in the same order
conds = [m & df.Pclass.eq(1),
         m & df.Pclass.eq(2),
         m]
choices = [37, 29, 24]

# default=df.Age takes care of the else, i.e. Age not null
df['Age'] = np.select(conds, choices, default=df.Age)

print(df)
#     Age  Pclass
# 0  50.0     1.0
# 1  60.0     1.0
# 2  70.0     1.0
# 3  37.0     1.0
# 4  29.0     2.0
# 5  24.0     NaN
# 6  37.0     1.0
```

Answered 2022-06-02
