Looping over a subset of rows in a DataFrame

I am trying to loop through the rows of a DataFrame with a function that computes the most frequent element of a series. The function works perfectly when I feed it a series by hand:

import pandas as pd
from collections import Counter

# Create DataFrame
df = pd.DataFrame({'a': [1, 2, 1, 2, 1, 2, 1, 1],
                   'b': [1, 1, 2, 1, 1, 1, 2, 2],
                   'c': [1, 2, 2, 1, 2, 2, 2, 1]})

# Create function calculating most frequent element
def freq_value(series):
    return Counter(series).most_common()[0][0]

# Test function on one row
freq_value(df.iloc[1])

# Another test
freq_value((df.iloc[1, 0], df.iloc[1, 1], df.iloc[1, 2]))

Both tests give the result I want. However, when I try to apply this function across the DataFrame rows and save the result in a new column, I get the error "'Series' object is not callable", 'occurred at index 0'. The line that produces the error is:

# Loop through rows of a dataframe and write the result into new column
df['result'] = df.apply(lambda row: freq_value((row('a'), row('b'), row('c'))), axis = 1)

How exactly does row() work inside apply()? Shouldn't it pass the values from columns 'a', 'b' and 'c' to my freq_value() function?


3 Answers

一只名叫tom的猫

Contributed 1906 experience points, received more than 3 likes

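Inside your lambda, row is not a function, so calling it with parentheses is not appropriate. Instead, use the __getitem__ method or the loc accessor to access its values. The syntactic sugar for the former is []: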

df['result'] = df.apply(lambda row: freq_value((row['a'], row['b'], row['c'])), axis=1)

An alternative using loc:

def freq_value_calc(row):
    return freq_value((row.loc['a'], row.loc['b'], row.loc['c']))

To understand exactly why this happens, it helps to rewrite your lambda as a named function:

def freq_value_calc(row):
    print(type(row))  # useful for debugging
    return freq_value((row['a'], row['b'], row['c']))

df['result'] = df.apply(freq_value_calc, axis=1)

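Run this and you will see that the type of row is <class 'pandas.core.series.Series'>: with axis=1, apply passes each row to your function as a Series indexed by the column labels. To access a value in a Series by its label, you can use the __getitem__/[] syntax or loc.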
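The point above can be checked on a single row without calling apply at all. This is only a minimal sketch: it uses a shortened three-row version of the question's frame, and the names by_brackets, by_loc and error_message are made up for illustration.

```python
import pandas as pd

# Shortened version of the frame from the question (assumption: 3 rows are enough here)
df = pd.DataFrame({'a': [1, 2, 1], 'b': [1, 1, 2], 'c': [1, 2, 2]})

# df.iloc[1] is a Series indexed by column labels, like the rows apply(axis=1) passes in
row = df.iloc[1]

by_brackets = row['a']   # __getitem__ via the [] syntax
by_loc = row.loc['b']    # label-based access via loc

# Calling the Series like a function is what raised the error in the question
try:
    row('a')
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(type(row).__name__, by_brackets, by_loc)
print(error_message)
```

Both access styles return the stored value, while row('a') fails because a Series does not implement the call operator.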

2021-06-15

眼眸繁星

Contributed 1873 experience points, received more than 9 likes

You can also use df.mode with axis=1 to get the desired result. This avoids apply and still gives you a column holding the most common value of each row.

df['result'] = df.mode(1)

>>> df
   a  b  c  result
0  1  1  1       1
1  2  1  2       2
2  1  2  2       2
3  2  1  1       1
4  1  1  2       1
5  2  1  2       2
6  1  2  2       2
7  1  2  1       1
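One caveat may be worth a quick illustration: df.mode(axis=1) returns a DataFrame, and a row whose values tie for most frequent produces more than one mode, so the result can have several columns. The sketch below uses a hypothetical frame named tied (not from the question) and picks column 0 explicitly so that exactly one value per row is assigned.

```python
import pandas as pd

# Hypothetical data: row 0 has a clear mode (1), row 1 is a three-way tie
tied = pd.DataFrame({'a': [1, 1], 'b': [1, 2], 'c': [2, 3]})

# One column per mode; rows with fewer modes are padded with NaN
modes = tied.mode(axis=1)
print(modes.shape)

# Column 0 holds the first mode of every row, so it is always filled
tied['result'] = modes[0]
print(tied)
```

In the question's data every row has three values drawn from only two distinct numbers, so ties cannot occur and the mode frame has a single column.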

2021-06-15

繁花如伊

Contributed 2012 experience points, received more than 12 likes

df['CommonValue'] = df.apply(lambda x: x.mode()[0], axis = 1)
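As a rough check, the one-liner above can be compared with the Counter-based function from the question once row(...) is replaced by row[...]. This is a sketch under the assumption that no row contains a tie; the names via_counter and via_mode are made up for illustration.

```python
from collections import Counter

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 1, 2, 1, 2, 1, 1],
                   'b': [1, 1, 2, 1, 1, 1, 2, 2],
                   'c': [1, 2, 2, 1, 2, 2, 2, 1]})

def freq_value(series):
    # Most frequent element, as defined in the question
    return Counter(series).most_common()[0][0]

# Corrected version of the question's line: square brackets instead of parentheses
via_counter = df.apply(lambda row: freq_value((row['a'], row['b'], row['c'])), axis=1)

# Series.mode() returns the most frequent value(s) of each row; [0] takes the first
via_mode = df.apply(lambda x: x.mode()[0], axis=1)

print(via_counter.tolist())
print(via_mode.tolist())
```

When a row does contain a tie the two approaches may disagree: Series.mode() returns the tied values in sorted order, whereas Counter.most_common() lists equally common elements in the order they were first encountered.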

2021-06-15
