1 回答

TA贡献1804条经验 获得超3个赞
首先你需要转换Date成datetime类型:
df['Date'] = pd.to_datetime(df['Date'])
之后,可以提取Anchor Datewith的索引idxmax,然后用于loc提取实际日期:
idx = df['Anchor Date'].eq('Y').groupby(df['ID']).transform('idxmax')
df['Diff'] = (df['Date'] - df.loc[idx, 'Date'].values) / np.timedelta64(1, 'D')
另一种方法是使用布尔索引提取那些日期,然后映射:
anchor_dates = df.loc[df['Anchor Date']=='Y', ['ID','Date']].set_index('ID')['Date']
df['Diff'] = (df['Date'] - anchor_dates)/np.timedelta64(1, 'D')
输出:
ID Date Anchor Date Diff
0 123 2018-01-05 N -123.0
1 123 2018-04-10 N -28.0
2 123 2018-05-08 Y 0.0
3 123 2018-10-12 N 157.0
4 234 2018-01-04 N 0.0
5 234 2018-01-04 N 0.0
6 234 2018-01-04 Y 0.0
7 456 2018-05-06 N -26.0
8 456 2018-05-06 N -26.0
9 456 2018-05-10 N -22.0
10 456 2018-06-01 Y 0.0
11 567 2018-03-02 N 0.0
12 567 2018-03-02 N 0.0
13 567 2018-03-02 Y 0.0
添加回答
举报