Using apply() twice to create a new column causes the new column to be overwritten

I have written some Pandas code that is equivalent to this toy example:

import pandas as pd

df_test = pd.DataFrame({'product': [0, 0, 1, 1], 'sold_for': [5000, 4500, 10000, 8000]})

def product0_makes_profit(row, product0_cost):
    return row['sold_for'] > product0_cost

def product1_makes_profit(row, product1_cost):
    return row['sold_for'] > product1_cost

df_test['made_profit'] = df_test[df_test['product']==0].apply(product0_makes_profit, args=[4000], axis=1, result_type="expand")
df_test['made_profit'] = df_test[df_test['product']==1].apply(product1_makes_profit, args=[9000], axis=1, result_type="expand")
df_test

I get the following result:

   product  sold_for made_profit
0        0      5000         NaN
1        0      4500         NaN
2        1     10000        True
3        1      8000       False

I expected the made_profit column to be True for rows 0 and 1 rather than NaN, but apparently the second apply() overwrites the made_profit column produced by the first apply().

How can I get the column I expect? I do not want to create a 'product0_made_profit' column with the first apply() and a 'product1_made_profit' column with the second apply(), and then merge the two into the single 'made_profit' column I am after, because in my real code the product column has many different values (meaning many different functions to apply).

1 Answer

米琪卡哇伊

Contributed 1998 experience points, received more than 6 likes

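You need to assign to the filtered rows with loc, using the same condition on both sides, so that only the rows where the condition is True are processed: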

m1 = df_test['product']==0
m2 = df_test['product']==1

df_test.loc[m1, 'made_profit'] = df_test[m1].apply(product0_makes_profit, args=[4000], axis=1, result_type="expand")
df_test.loc[m2, 'made_profit'] = df_test[m2].apply(product1_makes_profit, args=[9000], axis=1, result_type="expand")

print(df_test)

   product  sold_for made_profit
0        0      5000        True
1        0      4500        True
2        1     10000        True
3        1      8000       False
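To see why the original code produced NaN, it may help to look at what a plain column assignment does with a result that covers only some rows. The following is a minimal sketch, not part of the original answer; the name `partial` is made up for illustration, and the comparison is done directly on the column rather than through apply() to keep it short.

```python
import pandas as pd

df_test = pd.DataFrame({'product': [0, 0, 1, 1],
                        'sold_for': [5000, 4500, 10000, 8000]})

# Result computed only for the product 1 rows, so its index is [2, 3].
# `partial` is a hypothetical name used only in this sketch.
partial = df_test.loc[df_test['product'] == 1, 'sold_for'] > 9000

# A plain column assignment replaces the whole column. pandas aligns
# `partial` on the index, and rows it does not cover (0 and 1) get NaN.
df_test['made_profit'] = partial

print(df_test['made_profit'].isna().tolist())  # [True, True, False, False]
```

Assigning through loc with the same mask writes only to the selected rows and leaves the others untouched, which is why the second assignment no longer wipes out the first.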

Edit:

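If the function returns multiple values, it needs to return a Series indexed by the new column names, and the new columns also have to be created and filled with some default value (e.g. NaN) before using loc: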

import numpy as np
import pandas as pd

cols = ['made_profit', 'profit_amount']

# Each function returns a Series whose index is the list of new column names.
def product0_makes_profit(row, product0_cost):
    return pd.Series([row['sold_for'] > product0_cost, row['sold_for'] - product0_cost], index=cols)

def product1_makes_profit(row, product1_cost):
    return pd.Series([row['sold_for'] > product1_cost, row['sold_for'] - product1_cost], index=cols)

# Create the new columns with a default value before assigning through loc.
for c in cols:
    df_test[c] = np.nan

is_prod0 = (df_test['product']==0)
df_test.loc[is_prod0, cols] = df_test[is_prod0].apply(product0_makes_profit, args=[4000], axis=1, result_type="expand")

is_prod1 = (df_test['product']==1)
df_test.loc[is_prod1, cols] = df_test[is_prod1].apply(product1_makes_profit, args=[9000], axis=1, result_type="expand")

print(df_test)

   product  sold_for made_profit  profit_amount
0        0      5000        True         1000.0
1        0      4500        True          500.0
2        1     10000        True         1000.0
3        1      8000       False        -1000.0
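The question mentions that the real product column has many different values. One way the mask-and-loc pattern might scale is to drive it from a lookup table, sketched below for the single-column case. The `handlers` dictionary and the loop are assumptions made for illustration and do not come from the original answer; the two functions are the ones from the question.

```python
import pandas as pd

df_test = pd.DataFrame({'product': [0, 0, 1, 1],
                        'sold_for': [5000, 4500, 10000, 8000]})

def product0_makes_profit(row, product0_cost):
    return row['sold_for'] > product0_cost

def product1_makes_profit(row, product1_cost):
    return row['sold_for'] > product1_cost

# Hypothetical lookup table: product value -> (function, cost argument).
handlers = {
    0: (product0_makes_profit, 4000),
    1: (product1_makes_profit, 9000),
}

# Create the column up front so products without a handler stay empty.
df_test['made_profit'] = None

for product, (func, cost) in handlers.items():
    mask = df_test['product'] == product
    if mask.any():  # skip products that do not occur in the data
        # Write only to the rows selected by the mask.
        df_test.loc[mask, 'made_profit'] = df_test[mask].apply(func, args=(cost,), axis=1)

print(df_test)
```

Adding a new product then only means adding one entry to the table, without creating an intermediate column per product.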

2021-11-09
