2 Answers
Contributor with 1824 experience points, 6+ upvotes
I would do it like this:
import pandas as pd

df = {'Request': [0, 0, 1, 0, 1, 0, 0],
      'Time': ['16:00', '17:00', '18:00', '19:00', '20:00', '20:30', '24:00'],
      'grant': [3, 0, 0, 5, 0, 0, 5]}
df = pd.DataFrame(df)  # create the DataFrame
# drop any rows that have neither a grant nor a request
# (.copy() avoids a SettingWithCopyWarning when 'Time' is reassigned below)
df = df[(df[['grant', 'Request']] != 0).any(axis=1)].copy()
# convert the time from HH:MM to a number of minutes
df['Time'] = df['Time'].str.split(":").apply(lambda x: int(x[0])*60 + int(x[1]))
# get the difference between consecutive times
df['timeElapsed'] = df['Time'].diff()
# keep only the grant rows and their elapsed times;
# dropna() also removes the first row, whose diff is NaN
df = df[df['grant'] != 0].dropna()
# drop all columns except timeElapsed and grant
df = df[['timeElapsed', 'grant']]
The output then looks like this, with timeElapsed in minutes:
   timeElapsed  grant
3         60.0      5
6        240.0      5
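As a side note, the HH:MM-to-minutes step above could also be sketched with pd.to_timedelta instead of splitting the string by hand. This is only an illustrative alternative, not part of the answer; the helper name to_minutes is hypothetical.

```python
import pandas as pd

def to_minutes(times: pd.Series) -> pd.Series:
    """Convert 'HH:MM' strings to minutes since midnight (hypothetical helper)."""
    # Appending ':00' makes each value read as HH:MM:SS for pd.to_timedelta;
    # '24:00:00' is then parsed as a duration of one day instead of failing.
    return pd.to_timedelta(times + ':00').dt.total_seconds() / 60

times = pd.Series(['16:00', '20:30', '24:00'])
minutes = to_minutes(times)
print(minutes.tolist())
```

Because the values are treated as durations rather than clock times, 24:00 needs no special handling.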
Contributor with 1784 experience points, 2+ upvotes
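You first need to convert your Time index into something that can be subtracted, so that you can compute the time delta. Converting it to timestamps (for example with pd.to_datetime) does not work, because 24:00 is not a valid clock time. The solution below uses decimal time instead (1:30 PM = 13.5):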
# Assumes df is indexed by Time as 'HH:MM' strings
# Convert the index into decimal time (hours)
df.index = pd.to_timedelta(df.index + ':00') / pd.Timedelta(hours=1)
# Get the time when each request was made
r = df[df['Request'] != 0].index.to_series()
# Get the time when each grant was made
g = df[df['grant'] != 0].index.to_series()
# `asof` means: for each time in `g`, get the last available value in `r` at or before it
tmp = r.asof(g)
# Hours elapsed between each grant and the most recent request before it
df['Delta'] = tmp.index - tmp
Result:
      Request  grant  Delta
Time
16.0        0      3    NaN
17.0        0      0    NaN
18.0        1      0    NaN
19.0        0      5    1.0
20.0        1      0    NaN
20.5        0      0    NaN
24.0        0      5    4.0
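The snippet above relies on a df that is already indexed by Time. A self-contained sketch of the same idea follows. It assumes the sample data from the first answer, and the names request_times, grant_times and last_request are chosen here for illustration only.

```python
import pandas as pd

# Sample data from the first answer, indexed by Time as 'HH:MM' strings
# (an assumption about how the original df is laid out).
times = ['16:00', '17:00', '18:00', '19:00', '20:00', '20:30', '24:00']
df = pd.DataFrame(
    {'Request': [0, 0, 1, 0, 1, 0, 0],
     'grant': [3, 0, 0, 5, 0, 0, 5]},
    index=pd.Index(times, name='Time'),
)

# Convert the index to decimal hours; '24:00:00' parses as a one-day duration.
df.index = pd.to_timedelta(df.index + ':00') / pd.Timedelta(hours=1)

# Times at which a request was made, as a Series whose index equals its values.
request_times = df.index[df['Request'] != 0].to_series()
# Times at which a grant was made.
grant_times = df.index[df['grant'] != 0]

# For each grant time, look up the latest request time at or before it
# (NaN when no request has happened yet).
last_request = request_times.asof(grant_times)

# Hours between each grant and the request preceding it; assignment aligns
# on the index, so rows without a grant stay NaN.
df['Delta'] = last_request.index.to_series() - last_request
print(df)
```

Grants that have no earlier request, such as the one at 16:00, keep NaN in Delta, which matches the result shown above.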