
Pandas rolling function adds values

Python

FFIVE 2022-05-24 12:55:22

I have a fairly standard function that seems to produce a very strange result; I thought I had worked out what was happening, but now I'm not so sure. Essentially, I want to use the rolling function to create a simple rolling average of the previous two values. When I do this directly, it seems to pull the value for the first number from somewhere else in the frame, and when I do it in a loop, I have no idea where the value comes from.

Sample data:

player      game_id     game_order  TOI_comp   G_comp
A.J..GREER  2016020227  37          16.566667  0
            2016020251  36          11.733333  0
            2016020268  35          12.700000  0
            2016020278  34          15.433333  0
            2016020296  33          11.850000  0

player_avgs_base.sort_values(by=['player','game_order'], ascending=False, inplace=True)
avgtoi = player_avgs_base["TOI_comp"].rolling(2).mean().shift()
avgtoi

player         game_id     game_order
ZENON.KONOPKA  2013021047  2                  NaN
A.J..GREER     2016020268  35                 NaN
               2016020278  34            9.308333
               2016020296  33           14.066667
               2017020134  32           13.641667
               2017020149  31           10.108333
               2017020165  30            7.175000
               2017020194  29            6.100000

I had expected something more like:

player         game_id     game_order
A.J..GREER     2016020251  36                 NaN
               2016020268  35                 NaN
               2016020278  34           12.22
               2016020296  33           14.066667
               2017020134  32           13.641667
               2017020149  31           10.108333


1 Answer

largeQ


I think this is a sorting problem. Please try the following and see whether it solves your problem:

player_avgs_base.sort_values(["player","game_order"], ascending=False, inplace=True)

If you like, you can set the index after performing the sort.

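Another point is that, with your code, the rolling window does not respect the grouping. I assume you want to compute the rolling mean per player, rather than mixing in values from other players. If so, you can use the following code: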

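df2 = df.sort_values(["player", "game_id", "game_order"])
# transform (rather than apply) keeps the original row index, so the result aligns with df2 on assignment
df2['TOI_comp_avg_lt'] = df2.groupby('player')['TOI_comp'].transform(lambda ser: ser.rolling(2).mean().shift())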

This outputs:

   player         game_id     game_order  TOI_comp   G_comp  TOI_comp_avg_lt
0  A.J..GREER     2016020227  37          16.566667  0       NaN
2  A.J..GREER     2016020251  36          11.733333  0       NaN
4  A.J..GREER     2016020268  35          12.700000  0       14.150000
6  A.J..GREER     2016020278  34          15.433333  0       12.216666
7  A.J..GREER     2016020296  33          11.850000  0       14.066666
1  ZENON.KONOPKA  2013021047  34          12.666666  0       NaN
5  ZENON.KONOPKA  2013021047  35          14.722222  0       NaN
3  ZENON.KONOPKA  2013021047  37          13.111111  0       13.694444

For the following test data:

import io

import pandas as pd

raw = """A.J..GREER 2016020227 37 16.566667 0
ZENON.KONOPKA 2013021047 34 12.666666 0
A.J..GREER 2016020251 36 11.733333 0
ZENON.KONOPKA 2013021047 37 13.111111 0
A.J..GREER 2016020268 35 12.700000 0
ZENON.KONOPKA 2013021047 35 14.722222 0
A.J..GREER 2016020278 34 15.433333 0
A.J..GREER 2016020296 33 11.850000 0"""

# raw string for the regex separator avoids an invalid-escape warning
df = pd.read_csv(io.StringIO(raw), sep=r'\s+', names=['player', 'game_id', 'game_order', 'TOI_comp', 'G_comp'])
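To make the difference concrete, here is a minimal sketch that runs the ungrouped and the per-player calculation side by side on the same test data. The helper name prev_two_mean and the column names ungrouped and grouped are made up for this illustration, and it uses transform rather than apply so that the result stays aligned to the frame's index.

```python
import io

import pandas as pd

raw = """A.J..GREER 2016020227 37 16.566667 0
ZENON.KONOPKA 2013021047 34 12.666666 0
A.J..GREER 2016020251 36 11.733333 0
ZENON.KONOPKA 2013021047 37 13.111111 0
A.J..GREER 2016020268 35 12.700000 0
ZENON.KONOPKA 2013021047 35 14.722222 0
A.J..GREER 2016020278 34 15.433333 0
A.J..GREER 2016020296 33 11.850000 0"""

names = ["player", "game_id", "game_order", "TOI_comp", "G_comp"]
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", names=names)
df2 = df.sort_values(["player", "game_id", "game_order"])


def prev_two_mean(ser):
    """Mean of the two values that precede the current row (hypothetical helper)."""
    return ser.rolling(2).mean().shift()


# Ungrouped: the window slides straight across the player boundary.
df2["ungrouped"] = prev_two_mean(df2["TOI_comp"])

# Grouped: the window restarts for each player, so every player's
# first two rows are NaN.
df2["grouped"] = df2.groupby("player")["TOI_comp"].transform(prev_two_mean)

print(df2[["player", "game_order", "TOI_comp", "ungrouped", "grouped"]])
```

In the ungrouped column, the first ZENON.KONOPKA row receives the mean of the last two A.J..GREER values, which is the kind of "value from somewhere else in the frame" described in the question. In the grouped column that row is NaN.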

By the way, your set_index is no substitute for sorting. The index has no effect on the output. For example, if you take df as defined above and run:

df_indexed = df.set_index(["player", 'game_id', "game_order"])
df_indexed_result = df_indexed.copy()
df_indexed_result['TOI_comp_shifted'] = df_indexed["TOI_comp"].shift()
df_indexed_result['TOI_comp_rolling_mean'] = df_indexed["TOI_comp"].rolling(2).mean().shift()

You get:

                                       TOI_comp   G_comp  TOI_comp_shifted  TOI_comp_rolling_mean
player         game_id     game_order
A.J..GREER     2016020227  37          16.566667  0       NaN               NaN
ZENON.KONOPKA  2013021047  34          12.666666  0       16.566667         NaN
A.J..GREER     2016020251  36          11.733333  0       12.666666         14.616667
ZENON.KONOPKA  2013021047  37          13.111111  0       11.733333         12.200000
A.J..GREER     2016020268  35          12.700000  0       13.111111         12.422222
ZENON.KONOPKA  2013021047  35          14.722222  0       12.700000         12.905555
A.J..GREER     2016020278  34          15.433333  0       14.722222         13.711111
               2016020296  33          11.850000  0       15.433333         15.077777

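If you look at the TOI_comp_shifted column, you'll see that it is simply filled with the value from the previous row, regardless of which player that row belongs to (the same goes for the rolling mean). So the index has no effect on this operation.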
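If you would rather keep the MultiIndex, one possible approach (a sketch, not something from the original code) is to group on the player index level, so that the shift restarts for each player. The names plain and per_player are only illustrative.

```python
import io

import pandas as pd

raw = """A.J..GREER 2016020227 37 16.566667 0
ZENON.KONOPKA 2013021047 34 12.666666 0
A.J..GREER 2016020251 36 11.733333 0
ZENON.KONOPKA 2013021047 37 13.111111 0
A.J..GREER 2016020268 35 12.700000 0
ZENON.KONOPKA 2013021047 35 14.722222 0
A.J..GREER 2016020278 34 15.433333 0
A.J..GREER 2016020296 33 11.850000 0"""

names = ["player", "game_id", "game_order", "TOI_comp", "G_comp"]
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", names=names)
df_indexed = df.set_index(["player", "game_id", "game_order"])

# Plain shift ignores the index labels and just moves values down one row.
plain = df_indexed["TOI_comp"].shift()

# Grouping on the "player" index level makes the shift restart per player.
per_player = df_indexed.groupby(level="player")["TOI_comp"].shift()

print(pd.DataFrame({"plain": plain, "per_player": per_player}))
```

Note that grouping only separates the players. Within each player the rows are still processed in the order they appear in the frame, so the data still has to be sorted first.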

As for your second question: I think the loop should work like this, provided the column names of your DataFrame are right:

group_obj = df2.groupby('player')
for col in ['TOI_comp', 'G_comp']:
    # transform returns a result aligned to df2's index
    df2[f'{col}_lt'] = group_obj[col].transform(lambda ser: ser.rolling(2).mean().shift())

This assumes you want to apply the rolling mean in the same way to a list of columns.
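As a possible alternative to the loop, the same calculation can be applied to several columns in one call by selecting a list of columns on the groupby object. This is only a sketch: value_cols, lagged and result are names chosen for the illustration, and the _lt suffix simply mirrors the loop.

```python
import io

import pandas as pd

raw = """A.J..GREER 2016020227 37 16.566667 0
ZENON.KONOPKA 2013021047 34 12.666666 0
A.J..GREER 2016020251 36 11.733333 0
ZENON.KONOPKA 2013021047 37 13.111111 0
A.J..GREER 2016020268 35 12.700000 0
ZENON.KONOPKA 2013021047 35 14.722222 0
A.J..GREER 2016020278 34 15.433333 0
A.J..GREER 2016020296 33 11.850000 0"""

names = ["player", "game_id", "game_order", "TOI_comp", "G_comp"]
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", names=names)
df2 = df.sort_values(["player", "game_id", "game_order"])

value_cols = ["TOI_comp", "G_comp"]

# Per-player mean of the previous two rows, for every column in value_cols.
lagged = (
    df2.groupby("player")[value_cols]
    .transform(lambda ser: ser.rolling(2).mean().shift())
    .add_suffix("_lt")
)

# lagged shares df2's index, so an index join lines the rows up again.
result = df2.join(lagged)
print(result)
```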

Answered 2022-05-24
