1 回答
TA贡献1803条经验 获得超6个赞
这是我的解决方案:
df = pd.DataFrame({
'Shift_id' :[ 123,123,123,123,123,123,123,123,123,123,123,123,123,123,123,
345,345,345,345,345,345,345,345,345,345,345,345,345,345,345,345],
'activity_id' : [1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,
6,7,8,9,6,7,8,9,6,7,8,9,6,7,8,9],
'activity_begin_time' : ['09:00','09:05','12:00','12:30','17:25','09:00','09:05','12:00','12:30','17:25','09:00','09:05','12:00','12:30','17:25',
'09:00','09:05','12:00','12:30','09:00','09:05','12:00','12:30','09:00','09:05','12:00','12:30','09:00','09:05','12:00','12:30'],
'activity_end_time' : ['09:05','12:00','12:30', '17:25','17:30','09:05','12:00','12:30', '17:25','17:30','09:05','12:00','12:30', '17:25','17:30',
'09:05','12:00','12:30', '17:25','09:05','12:00','12:30', '17:25','09:05','12:00','12:30', '17:25','09:05','12:00','12:30', '17:25'],
'activity_station' : ['None', 'Za','None','Ba','None','None', 'Za','None','Ba','None','None', 'Za','None','Ba','None',
'None','Za','Ba','Ra','None','Za','Ba','Ra','None','Za','Ba','Ra','None','Za','Ba','Ra']
})
首先删除重复项:
df = df.drop_duplicates()
使用pandas.to_timedelta:
df['activity_begin_time'] = pd.to_timedelta(df['activity_begin_time']+':00')
df['activity_end_time'] = pd.to_timedelta(df['activity_end_time']+':00')
df['activity_duration'] = df['activity_end_time'] - df['activity_begin_time']
然后您可以通过groupby使用每列的特定聚合:
df.groupby('activity_station').agg({'activity_duration': np.sum})
产生:
activity_duration
activity_station
Ba 05:25:00
None 00:45:00
Ra 04:55:00
Za 05:50:00
添加回答
举报