How to group and sum in Python without counting duplicate values

慕盖茨4494581 2022-01-11 17:03:49
I want to convert times written like 12:45 into datetime format while keeping that display format, and compute the time difference for each activity (stored as activity_duration). Then I want to sum activity_duration grouped by activity_station.

I converted the times to datetime format, but I get seemingly random year, month, and day values added. I know how to group, but not how to eliminate duplicates while grouping.

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Shift_id': [123,123,123,123,123,123,123,123,123,123,123,123,123,123,123,
                 345,345,345,345,345,345,345,345,345,345,345,345,345,345,345,345],
    'activity_id': [1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,
                    6,7,8,9,6,7,8,9,6,7,8,9,6,7,8,9],
    'activity_begin_time': ['09:00','09:05','12:00','12:30','17:25','09:00','09:05','12:00','12:30','17:25','09:00','09:05','12:00','12:30','17:25',
                            '09:00','09:05','12:00','12:30','09:00','09:05','12:00','12:30','09:00','09:05','12:00','12:30','09:00','09:05','12:00','12:30'],
    'activity_end_time': ['09:05','12:00','12:30','17:25','17:30','09:05','12:00','12:30','17:25','17:30','09:05','12:00','12:30','17:25','17:30',
                          '09:05','12:00','12:30','17:25','09:05','12:00','12:30','17:25','09:05','12:00','12:30','17:25','09:05','12:00','12:30','17:25'],
    'activity_station': ['None','Za','None','Ba','None','None','Za','None','Ba','None','None','Za','None','Ba','None',
                         'None','Za','Ba','Ra','None','Za','Ba','Ra','None','Za','Ba','Ra','None','Za','Ba','Ra']
})
df['activity_begin_time'] = pd.to_datetime(df['activity_begin_time'])
df['activity_end_time'] = pd.to_datetime(df['activity_end_time'])
df['activity_duration'] = df['activity_end_time'] - df['activity_begin_time']
df['activity_duration'] = df['activity_duration'] / np.timedelta64(1, 'h')

I want to sum activity_duration grouped by activity_station while eliminating the duplicate values.

1 Answer

慕码人8056858

Contributed 1803 experience points, received more than 6 likes

Here is my solution:


import numpy as np
import pandas as pd

# Sample data from the question: every (Shift_id, activity_id) row is repeated several times.
df = pd.DataFrame({
    'Shift_id': [123,123,123,123,123,123,123,123,123,123,123,123,123,123,123,
                 345,345,345,345,345,345,345,345,345,345,345,345,345,345,345,345],
    'activity_id': [1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,
                    6,7,8,9,6,7,8,9,6,7,8,9,6,7,8,9],
    'activity_begin_time': ['09:00','09:05','12:00','12:30','17:25','09:00','09:05','12:00','12:30','17:25','09:00','09:05','12:00','12:30','17:25',
                            '09:00','09:05','12:00','12:30','09:00','09:05','12:00','12:30','09:00','09:05','12:00','12:30','09:00','09:05','12:00','12:30'],
    'activity_end_time': ['09:05','12:00','12:30','17:25','17:30','09:05','12:00','12:30','17:25','17:30','09:05','12:00','12:30','17:25','17:30',
                          '09:05','12:00','12:30','17:25','09:05','12:00','12:30','17:25','09:05','12:00','12:30','17:25','09:05','12:00','12:30','17:25'],
    'activity_station': ['None','Za','None','Ba','None','None','Za','None','Ba','None','None','Za','None','Ba','None',
                         'None','Za','Ba','Ra','None','Za','Ba','Ra','None','Za','Ba','Ra','None','Za','Ba','Ra']
})

First, drop the duplicate rows:


df = df.drop_duplicates()
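To make the deduplication step concrete, here is a minimal sketch on a made-up three-row frame (the `demo` data and variable names are assumptions for illustration, not part of the original answer). It shows that `drop_duplicates()` without arguments compares all columns, while `subset=` limits the comparison to chosen columns.

```python
import pandas as pd

# Hypothetical miniature of the question's data: the first two rows are identical.
demo = pd.DataFrame({
    'Shift_id': [123, 123, 345],
    'activity_id': [1, 1, 6],
    'activity_begin_time': ['09:00', '09:00', '09:00'],
    'activity_end_time': ['09:05', '09:05', '09:05'],
    'activity_station': ['None', 'None', 'None'],
})

# With no arguments, a row is dropped only if it matches an earlier row in every column.
deduped = demo.drop_duplicates()

# subset= restricts the comparison to the listed columns (here the shift/activity pair).
deduped_by_key = demo.drop_duplicates(subset=['Shift_id', 'activity_id'])

print(len(demo), len(deduped), len(deduped_by_key))
```

In the question's data the repeated rows are identical in every column, so the argument-free call is enough; `subset=` would matter only if duplicates differed in some column that should be ignored.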

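Then use pandas.to_timedelta, appending ':00' so that each time string has the hh:mm:ss form. This gives plain durations with no year, month, or day attached: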


df['activity_begin_time'] = pd.to_timedelta(df['activity_begin_time'] + ':00')
df['activity_end_time'] = pd.to_timedelta(df['activity_end_time'] + ':00')
df['activity_duration'] = df['activity_end_time'] - df['activity_begin_time']
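The question's own code expressed the duration in hours by dividing by `np.timedelta64(1, 'h')`. As a hedged sketch of how the same result could be obtained from timedelta columns, the snippet below uses `.dt.total_seconds()` on a small sample taken from the question's times (the `times` frame and variable names are assumptions for illustration).

```python
import pandas as pd

# Three begin/end pairs taken from the question's sample times.
times = pd.DataFrame({
    'activity_begin_time': ['09:00', '09:05', '12:30'],
    'activity_end_time': ['09:05', '12:00', '17:25'],
})

# Append ':00' for the seconds, then parse as durations since midnight.
begin = pd.to_timedelta(times['activity_begin_time'] + ':00')
end = pd.to_timedelta(times['activity_end_time'] + ':00')

# Subtracting two timedelta Series gives a timedelta Series.
duration = end - begin

# Convert to fractional hours, e.g. 2 h 55 min becomes roughly 2.9167.
duration_hours = duration.dt.total_seconds() / 3600

print(duration_hours)
```

Keeping the column as timedelta until the final step avoids rounding during the sum; converting to hours is then only a display choice.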

Then you can use groupby with a column-specific aggregation:


df.groupby('activity_station').agg({'activity_duration': 'sum'})

This yields:


                  activity_duration
activity_station
Ba                         05:25:00
None                       00:45:00
Ra                         04:55:00
Za                         05:50:00
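For reference, the whole approach can be collected into one function and checked against the totals above. This is only a sketch: the function name `total_duration_by_station` is hypothetical, and the data is the question's sample rebuilt with list repetition to keep the block short.

```python
import pandas as pd

def total_duration_by_station(frame):
    """Drop exact duplicate rows, then sum activity durations per station."""
    unique = frame.drop_duplicates()
    duration = (pd.to_timedelta(unique['activity_end_time'] + ':00')
                - pd.to_timedelta(unique['activity_begin_time'] + ':00'))
    # Group the duration Series by the station labels of the same rows.
    return duration.groupby(unique['activity_station']).sum()

# Same rows as the question: shift 123 repeated 3 times, shift 345 repeated 4 times.
df = pd.DataFrame({
    'Shift_id': [123] * 15 + [345] * 16,
    'activity_id': [1, 2, 3, 4, 5] * 3 + [6, 7, 8, 9] * 4,
    'activity_begin_time': ['09:00', '09:05', '12:00', '12:30', '17:25'] * 3
                           + ['09:00', '09:05', '12:00', '12:30'] * 4,
    'activity_end_time': ['09:05', '12:00', '12:30', '17:25', '17:30'] * 3
                         + ['09:05', '12:00', '12:30', '17:25'] * 4,
    'activity_station': ['None', 'Za', 'None', 'Ba', 'None'] * 3
                        + ['None', 'Za', 'Ba', 'Ra'] * 4,
})

totals = total_duration_by_station(df)
print(totals)
```

Note that 'None' here is an ordinary string label, not a missing value, so it forms its own group in the result.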


Answered 2022-01-11