Trouble using rolling on a DataFrame without apply, which is slow

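I have a DataFrame that looks like this:

ID  Date      Prize  IfWon
1   01-01-20  5      1
2   01-01-20  8      1
1   01-03-20  3      0
1   01-04-20  10     1
1   01-07-20  5      0
2   01-10-20  5      1
3   01-10-20  10     1

I want to add a new column that, for a given ID, holds the sum of all prizes they won in the 7 days before that date, not including that date. The goal is a DataFrame like this:

ID  Date      Prize  IfWon  PrevWon
1   01-01-20  5      1      0
2   01-01-20  8      1      0
1   01-03-20  3      0      5
1   01-04-20  10     1      5
1   01-07-20  5      0      15
2   01-10-20  5      1      0
3   01-10-20  10     1      0

The code I have for this is below. It works, but I have two problems with it:

def get_rolling_prize_sum(grp, freq):
    return grp.rolling(freq, on = 'Date', closed = 'right')['CurrentWon'].sum()

processed_data_df['CurrentWon'] = processed_data_df['Prize'] * processed_data_df['IfWon'] # gets deleted later
processed_data_df['PrevWon'] = processed_data_df.groupby('ID', group_keys=False).apply(get_rolling_prize_sum, '7D').astype(float) - processed_data_df['CurrentWon']

First, because I don't want to include the current day's prize, I tried closing the rolling window on the right, but that has no effect (for example, removing closed = 'right' above does exactly the same thing). So I ended up doing the subtraction on the last line.

Second, the actual database I'm working with is large, and I need to compute many rolling sums at different points, but this is very slow. I was told I could do this with .rolling directly, without .apply, but I can't get it to work. My attempt and the resulting error are below. Note that the error took several minutes to appear, and this is the only significant computation, so it looks as if it does part of the work and then fails later:

# Not using closed right here, just subtracting
processed_data_df['PrevWon'] = processed_data_df.groupby('ID', group_keys=False).rolling('7D', on = 'Date')['CurrentWon'].sum() - processed_data_df['CurrentWon']

ValueError: cannot join with no overlapping index names

Any ideas?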


1 Answer

慕码人8056858


I improved my previous answer and managed to solve the groupby ordering problem.
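As background, the ValueError in the question appears to come from index alignment rather than from the rolling computation itself. The grouped rolling sum comes back with a MultiIndex whose first level is ID, while processed_data_df['CurrentWon'] still has the frame's plain, unnamed index, so pandas has no common index name to join on when subtracting. The sketch below only inspects the two indexes; the sample frame is rebuilt from the question's table, and the exact name of the second index level may depend on the pandas version.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "ID": [1, 2, 1, 1, 1, 2, 3],
    "Date": ["01-01-20", "01-01-20", "01-03-20", "01-04-20",
             "01-07-20", "01-10-20", "01-10-20"],
    "Prize": [5, 8, 3, 10, 5, 5, 10],
    "IfWon": [1, 1, 0, 1, 0, 1, 1],
})
df["Date"] = pd.to_datetime(df["Date"], format="%m-%d-%y")
df["CurrentWon"] = df["Prize"] * df["IfWon"]

# Grouped rolling sum, as attempted in the question (without the subtraction).
rolled = df.groupby("ID").rolling("7D", on="Date")["CurrentWon"].sum()

# The rolled result is indexed by two levels, the first of which is the group key.
print(rolled.index.names)
# The original column keeps the frame's single, unnamed index.
print(df["CurrentWon"].index.names)
```

Because the two objects do not share an index, the result has to be brought back to the frame by key, which is what the reset_index and merge steps in this answer do.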

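import pandas as pd

df = pd.read_csv("data.csv")
# Parse the dates so that a time-based window ("7D") can be used.
df["Date"] = pd.to_datetime(df['Date'], format='%m-%d-%y')
# Prize actually won on each row (0 when IfWon is 0).
df["CurrentWon"] = df["Prize"] * df["IfWon"]

# Rolling 7-day sum per ID; reset_index turns the ID and Date index levels back into columns.
result = df.groupby("ID").rolling("7D", on = 'Date', closed = 'right').CurrentWon.sum().reset_index()
result.rename(columns={"CurrentWon": "PreviousWon"}, inplace=True)

# Attach the rolling sums to the original rows by key.
df = df.merge(result, on=["ID", "Date"])
# The window includes the current row, so subtract it out.
df["PreviousWon"] -= df["CurrentWon"]
print(df)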

Output:

   ID       Date  Prize  IfWon  CurrentWon  PreviousWon
0   1 2020-01-01      5      1           5          0.0
1   2 2020-01-01      8      1           8          0.0
2   1 2020-01-03      3      0           0          5.0
3   1 2020-01-04     10      1          10          5.0
4   1 2020-01-07      5      0           0         15.0
5   2 2020-01-10      5      1           5          0.0
6   3 2020-01-10     10      1          10          0.0
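On the question's first problem: closed='right' is already the default for time-based windows, which would explain why removing it changed nothing. A possible variation on the code above is to use closed='left', so that each window covers [Date - 7 days, Date) and the current row is never included, making the subtraction unnecessary. This is a minimal sketch, not a drop-in replacement: it rebuilds the sample frame from the question instead of reading data.csv, it uses the question's column name PrevWon, and it assumes a recent pandas version in which the grouped rolling result is indexed by ID and Date.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "ID": [1, 2, 1, 1, 1, 2, 3],
    "Date": ["01-01-20", "01-01-20", "01-03-20", "01-04-20",
             "01-07-20", "01-10-20", "01-10-20"],
    "Prize": [5, 8, 3, 10, 5, 5, 10],
    "IfWon": [1, 1, 0, 1, 0, 1, 1],
})
df["Date"] = pd.to_datetime(df["Date"], format="%m-%d-%y")
df["CurrentWon"] = df["Prize"] * df["IfWon"]

# closed="left" makes each window [Date - 7 days, Date), excluding the current row.
prev = (
    df.groupby("ID")
      .rolling("7D", on="Date", closed="left")["CurrentWon"]
      .sum()
      .fillna(0.0)          # an empty window yields NaN; treat it as no winnings
      .rename("PrevWon")
      .reset_index()        # turn the ID and Date index levels back into columns
)

# Attach the rolling sums to the original rows by key.
df = df.merge(prev, on=["ID", "Date"])
print(df)
```

Two differences from the subtraction approach are worth checking against real data. A left-closed window includes a row dated exactly 7 days earlier, whereas the right-closed window does not. And as with the merge in the answer, repeated (ID, Date) pairs would produce duplicated rows after merging.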

Replied 2023-10-25
