I have a problem with a large DataFrame. Here is a small excerpt. I want to fill the last column, E, with the maximum value if there are any values, or leave it empty otherwise. Here is the data:

d = {'A': [4000074, 4000074, 4000074, 4000074, 4000074, 4000074,
           4000074, 4000074, 4000074, 4000074, 4000074, 4000074,
           4000074, 4000074, 4000074, 4000074, 4000074, 4000074],
     'B': ['SP000796746', 'SP000796746', 'SP000796746', 'SP000796746',
           'SP000796746', 'SP000796746', 'SP000796746', 'SP000796746',
           'SP000796746', 'SP000796746', 'SP000796746', 'SP000796746',
           'SP000796746', 'SP000796746', 'SP000796746', 'SP000796746',
           'SP000796746', 'SP000796746'],
     'C': [201926, 201926, 201926, 201926, 201926, 201926,
           201909, 201909, 201909, 201909, 201909, 201909,
           201933, 201933, 201933, 201933, 201933, 201933],
     'D': [-1, 0, 1, 2, 3, 4, -1, 0, 1, 2, 3, 4, -1, 0, 1, 2, 3, 4],
     'E': [np.nan, 1000, 1000, np.nan, np.nan, np.nan,
           np.nan, np.nan, np.nan, np.nan, np.nan, np.nan,
           np.nan, np.nan, np.nan, 3000, 3000, np.nan]}

It looks like this:

          A            B       C   D       E
0   4000074  SP000796746  201926  -1     NaN
1   4000074  SP000796746  201926   0  1000.0
2   4000074  SP000796746  201926   1  1000.0
3   4000074  SP000796746  201926   2     NaN
4   4000074  SP000796746  201926   3     NaN
5   4000074  SP000796746  201926   4     NaN
6   4000074  SP000796746  201909  -1     NaN
7   4000074  SP000796746  201909   0     NaN
8   4000074  SP000796746  201909   1     NaN
9   4000074  SP000796746  201909   2     NaN
10  4000074  SP000796746  201909   3     NaN
11  4000074  SP000796746  201909   4     NaN
12  4000074  SP000796746  201933  -1     NaN
13  4000074  SP000796746  201933   0     NaN
14  4000074  SP000796746  201933   1     NaN
15  4000074  SP000796746  201933   2  3000.0
16  4000074  SP000796746  201933   3  3000.0
17  4000074  SP000796746  201933   4     NaN
1 Answer
料青山看我应如是
You can do this with groupby.transform and max, using groups built by taking a cumsum over each new -1 in column D. Then fillna the original column with the result.
df['E'] = df['E'].fillna(df['E'].groupby(df['D'].eq(-1).cumsum()).transform('max'))
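To make the one-liner easier to follow, here is a minimal sketch that splits it into named steps. It assumes the sample from the question, with E taken from the printed table (18 rows) and the constant columns A and B left out for brevity; the names group_id and group_max are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample rebuilt from the question's printed table (columns A and B omitted).
df = pd.DataFrame({
    'C': [201926] * 6 + [201909] * 6 + [201933] * 6,
    'D': [-1, 0, 1, 2, 3, 4] * 3,
    'E': [np.nan, 1000, 1000] + [np.nan] * 12 + [3000, 3000, np.nan],
})

# Each D == -1 marks the start of a block; cumsum turns the marks into ids 1, 2, 3.
group_id = df['D'].eq(-1).cumsum()

# Per-block maximum, broadcast back to every row (NaN when the block is all NaN).
group_max = df['E'].groupby(group_id).transform('max')

# Only missing entries are replaced; existing values are kept as they are.
df['E'] = df['E'].fillna(group_max)
print(df)
```

With this sample, the first block becomes 1000.0 throughout, the second stays NaN, and the third becomes 3000.0 throughout.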
Edit: to fill with zeros instead, only in the groups that contain at least one value, you can do this:
mask = df['E'].groupby(df['D'].eq(-1).cumsum()).transform('any')
df.loc[mask, 'E'] = df.loc[mask, 'E'].fillna(0)
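As a rough check of the zero-fill variant, the sketch below runs it on the same sample, again assuming E as shown in the printed table and keeping only the columns the snippet needs. The variable group_id is an illustrative name, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample rebuilt from the question's printed table (only D and E are needed here).
df = pd.DataFrame({
    'D': [-1, 0, 1, 2, 3, 4] * 3,
    'E': [np.nan, 1000, 1000] + [np.nan] * 12 + [3000, 3000, np.nan],
})

# Block ids: a new block starts at every D == -1.
group_id = df['D'].eq(-1).cumsum()

# True for every row whose block holds at least one truthy (non-NaN, non-zero) value.
mask = df['E'].groupby(group_id).transform('any')

# Fill NaN with 0 only inside those blocks; all-NaN blocks are left untouched.
df.loc[mask, 'E'] = df.loc[mask, 'E'].fillna(0)
print(df)
```

Note that any tests truthiness, so a block holding only zeros and NaN would count as empty. If E can contain genuine zeros, building the mask from df['E'].notna() instead may be closer to the intent.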