If I understand correctly, and you can load team_stats_1970_2017 as a pandas DataFrame, you can apply two merges: one on home_team and season_yr, and one on visitor_team and season_yr:
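merged_df = (game_df
             # First merge: attach the home team's season stats.
             .merge(team_stats_1970_2017,
                    left_on=['home_team', 'season_yr'],
                    right_on=['team', 'season_yr'])
             # Second merge: attach the visitor's season stats; overlapping
             # column names get the '_home' / '_visitor' suffixes.
             .merge(team_stats_1970_2017,
                    left_on=['visitor_team', 'season_yr'],
                    right_on=['team', 'season_yr'],
                    suffixes=('_home', '_visitor'))
             # The two copies of the 'team' key column are redundant.
             .drop(['team_home', 'team_visitor'], axis=1))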
>>> merged_df
   season_yr  home_team  visitor_team  home_team_runs  visitor_team_runs  \
0       2017        ARI           SFG               6                  5
1       2017        ARI           SFG               4                  8
2       2017        ARI           SFG               8                  6
3       2017        ARI           SFG               9                  3
4       2017        ARI           CLE               7                  3
5       2017        ARI           CLE              11                  2
6       2017        ATL           LAD               2                  3

   r_per_g_home  pa_home  ab_home  b_r_home  b_h_home  ...  b3_home  \
0          5.01   6224.0     5525       812      1405  ...       39
1          5.01   6224.0     5525       812      1405  ...       39
2          5.01   6224.0     5525       812      1405  ...       39
3          5.01   6224.0     5525       812      1405  ...       39
4          5.01   6224.0     5525       812      1405  ...       39
5          5.01   6224.0     5525       812      1405  ...       39
6          4.52   6216.0     5584       732      1467  ...       26

   b_hr_home  r_per_g_visitor  pa_visitor  ab_visitor  b_r_visitor  \
0        220             3.94      6137.0        5551          639
1        220             3.94      6137.0        5551          639
2        220             3.94      6137.0        5551          639
3        220             3.94      6137.0        5551          639
4        220             5.05      6234.0        5511          818
5        220             5.05      6234.0        5511          818
6        165             4.75      6191.0        5408          770

   b_h_visitor  b2_visitor  b3_visitor  b_hr_visitor
0         1382         290          28           128
1         1382         290          28           128
2         1382         290          28           128
3         1382         290          28           128
4         1449         333          29           212
5         1449         333          29           212
6         1347         312          20           221

[7 rows x 21 columns]
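To try the two-merge approach without the original data, here is a minimal self-contained sketch on a small hypothetical slice of the stats shown above (only `r_per_g` and `pa`; the frame name `team_stats` and the toy rows are assumptions, not the asker's real tables). It also passes `validate="many_to_one"`, which makes pandas raise a `MergeError` if a `(team, season_yr)` pair appears more than once in the stats table, because such a duplicate would silently multiply the game rows.

```python
import pandas as pd

# Hypothetical toy inputs: a small slice of the values shown in the output above.
game_df = pd.DataFrame({
    "season_yr": [2017, 2017, 2017],
    "home_team": ["ARI", "ARI", "ATL"],
    "visitor_team": ["SFG", "CLE", "LAD"],
})
team_stats = pd.DataFrame({
    "team": ["ARI", "SFG", "CLE", "ATL", "LAD"],
    "season_yr": [2017] * 5,
    "r_per_g": [5.01, 3.94, 5.05, 4.52, 4.75],
    "pa": [6224.0, 6137.0, 6234.0, 6216.0, 6191.0],
})

merged = (
    game_df
    # First merge: attach the home team's stats; each stats key must be unique.
    .merge(team_stats, left_on=["home_team", "season_yr"],
           right_on=["team", "season_yr"], validate="many_to_one")
    # Second merge: attach the visitor's stats; overlapping names get suffixes.
    .merge(team_stats, left_on=["visitor_team", "season_yr"],
           right_on=["team", "season_yr"], suffixes=("_home", "_visitor"),
           validate="many_to_one")
    # Drop the redundant copies of the 'team' key column.
    .drop(columns=["team_home", "team_visitor"])
)
print(merged)
```

Note that both merges are inner joins (the default), so a game whose team has no stats row for that season is dropped; pass `how="left"` instead if you would rather keep it with NaN stats.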
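You can then use merged_df to compute your features. For example (since it seems you want your features as np.arrays), to compute the difference between pa_home and pa_visitor (just a dummy example):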
>>> (merged_df['pa_home'] - merged_df['pa_visitor']).values
array([ 87.,  87.,  87.,  87., -10., -10.,  25.])
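If you want several home-minus-visitor differences at once, one way (a sketch, not the only option) is to select the matching `*_home` and `*_visitor` columns and subtract the two arrays, giving one row per game and one column per stat. The helper `diff_features` and the toy frame below are hypothetical; `.to_numpy()` is used because pandas recommends it over `.values`.

```python
import pandas as pd

# Hypothetical merged frame with the *_home / *_visitor columns the merge produces.
merged_df = pd.DataFrame({
    "pa_home": [6224.0, 6224.0, 6216.0],
    "pa_visitor": [6137.0, 6234.0, 6191.0],
    "r_per_g_home": [5.01, 5.01, 4.52],
    "r_per_g_visitor": [3.94, 5.05, 4.75],
})


def diff_features(df, stats):
    """Return an (n_games, len(stats)) array of home-minus-visitor differences."""
    # Selecting columns in the same stat order keeps home and visitor aligned.
    home = df[[f"{s}_home" for s in stats]].to_numpy()
    visitor = df[[f"{s}_visitor" for s in stats]].to_numpy()
    return home - visitor


X = diff_features(merged_df, ["pa", "r_per_g"])
print(X.shape)  # (3, 2)
```

The first column of `X` reproduces the `pa_home - pa_visitor` differences computed above for these three games.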