
Calculating the average of a column's values within a time interval


炎炎设计 2022-10-06 17:05:45
I have a DataFrame:

                            id                timestamp  data  gradient  Start
timestamp
2020-01-15 06:12:49.213  40250  2020-01-15 06:12:49.213  20.0   0.00373    NaN
2020-01-15 06:12:49.313  40251  2020-01-15 06:12:49.313  19.5   0.00354    0.0
2020-01-15 08:05:10.083  40256  2020-01-15 08:05:10.083  20.0   0.00020    1.0
2020-01-15 08:05:10.183  40257  2020-01-15 08:05:10.183  20.5  -0.00440    0.0
...
2020-01-31 09:01:50.993  40310  2020-01-31 09:01:50.993  21.0   0.55473    1.0
2020-01-31 09:01:51.093  40311  2020-01-31 09:01:51.093  21.5   0.00589    0.0
...

I want to find the average of data over the 30 seconds that follow each row where Start == 1.

Reproducible example:

import numpy as np
import pandas as pd

d = {'timestamp': ["2020-01-15 06:12:49.213", "2020-01-15 06:12:49.313", "2020-01-15 08:05:10.083",
                   "2020-01-15 08:05:10.183", "2020-01-15 09:01:50.993", "2020-01-15 09:01:51.093",
                   "2020-01-15 09:51:01.890", "2020-01-15 09:51:01.990", "2020-01-15 10:40:59.657",
                   "2020-01-15 10:40:59.757", "2020-01-15 10:42:55.693", "2020-01-15 10:42:55.793",
                   "2020-01-15 10:45:35.767", "2020-01-15 10:45:35.867", "2020-01-15 10:45:46.770",
                   "2020-01-15 10:45:46.870", "2020-01-15 10:47:19.783", "2020-01-15 10:47:19.883",
                   "2020-01-15 10:47:22.787"],
     'data': [20.0, 19.5, 20.0, 20.5, 21.0, 21.5, 22.0, 22.5, 23.0, 23.5,
              23.0, 22.5, 23.0, 23.5, 24.0, 24.5, 25.0, 25.5, 26],
     'gradient': [np.nan, np.nan, 0.000000, 0.000148, 0.000294, 0.000294, 0.000339, 0.000339,
                  0.000334, 0.000334, 0.000000, -0.008618, 0.000000, 0.006247, 0.090884,
                  0.090884, 0.010751, 0.010751, 0.332889],
     'Start': [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
               0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]}
df = pd.DataFrame(d)

Expected output:

start_time               end_time                 Average
2020-01-15 08:05:10.083  2020-01-15 09:01:51.093  20.25  = average of (20.0, 20.5)
2020-01-15 10:45:35.767  2020-01-15 10:45:35.767  23.75  = average of (23.0, 23.5, 24.0, 24.5)

2 Answers

凤凰求蛊

Contributed 1825 experience points, received more than 4 upvotes

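First get the first timestamp of each group with GroupBy.transform and GroupBy.first, then compare every row against it with Series.between to keep only the rows inside the 30-second window: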


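import pandas as pd

df['timestamp'] = pd.to_datetime(df['timestamp'])
# Each Start == 1 row opens a new group; cumsum gives every row its group number
df['g'] = df['Start'].cumsum()

# Drop the rows before the first Start == 1 (group 0)
df1 = df[df['g'].ne(0)].copy()

# First timestamp of each group, broadcast to every row of that group
s = df1.groupby('g')['timestamp'].transform('first')
# Keep rows within 30 seconds of the group's start (both bounds inclusive)
df1 = df1[df1['timestamp'].between(s, s + pd.Timedelta(30, 's'))]

# One output row per group: first and last timestamp in the window, mean of data
df2 = df1.groupby('g').agg(start_time=('timestamp', 'first'),
                           end_time=('timestamp', 'last'),
                           Average=('data', 'mean')).reset_index(drop=True)
print(df2)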

               start_time                end_time  Average
0 2020-01-15 08:05:10.083 2020-01-15 08:05:10.183    20.25
1 2020-01-15 10:45:35.767 2020-01-15 10:45:46.870    23.75
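To make the intermediate grouping step easier to follow, here is a minimal sketch that wraps the same steps in a function and runs them on a small made-up frame. The helper name `window_means`, its `window_seconds` parameter, and the `toy` data are assumptions for illustration; they do not come from the original answer.

```python
import pandas as pd

# Small made-up frame (not the asker's data) to show the intermediate steps.
toy = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2020-01-15 08:00:00", "2020-01-15 08:00:10",
        "2020-01-15 08:00:20", "2020-01-15 08:01:00",
        "2020-01-15 09:00:00", "2020-01-15 09:00:05",
    ]),
    "data": [1.0, 2.0, 3.0, 4.0, 10.0, 20.0],
    "Start": [0.0, 1.0, 0.0, 0.0, 1.0, 0.0],
})


def window_means(df, window_seconds=30):
    """Hypothetical helper: mean of `data` in the window after each Start == 1 row."""
    out = df.copy()
    # Running count of Start flags: rows before the first flag get 0.
    out["g"] = out["Start"].cumsum()
    out = out[out["g"].ne(0)]
    # Start timestamp of each group, repeated on every row of the group.
    first = out.groupby("g")["timestamp"].transform("first")
    window = pd.Timedelta(seconds=window_seconds)
    out = out[out["timestamp"].between(first, first + window)]
    return out.groupby("g").agg(
        start_time=("timestamp", "first"),
        end_time=("timestamp", "last"),
        Average=("data", "mean"),
    ).reset_index(drop=True)


group_ids = toy["Start"].cumsum().tolist()
result = window_means(toy)
print(group_ids)  # group label per row
print(result)
```

Because the groups come from `cumsum`, a window in this approach never extends past the next Start == 1 row, even if that row arrives within the 30 seconds.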


Answered 2022-10-06
喵喵时光机

Contributed 1846 experience points, received more than 7 upvotes

Try this code.


import numpy as np
import pandas as pd

df['timestamp'] = pd.to_datetime(df['timestamp'])

start_time_list = []
end_time_list = []
average_list = []

# Assumes a default RangeIndex and timestamps sorted in ascending order
for start_ind in df[df['Start'] == 1].index:
    window_end = df.iloc[start_ind]['timestamp'] + pd.to_timedelta(30, unit='s')
    # Position of the last row whose timestamp falls inside the window
    last_ind = np.where(df['timestamp'] <= window_end)[0][-1]
    average = df['data'].iloc[start_ind:last_ind + 1].mean()

    start_time_list.append(df.iloc[start_ind]['timestamp'])
    # Timestamp of the last row inside the window
    end_time_list.append(df.iloc[last_ind]['timestamp'])
    average_list.append(average)

output = pd.DataFrame({"start_time": start_time_list,
                       "end_time": end_time_list,
                       "average": average_list})
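If the timestamps are already sorted, the same window lookup can probably be done without scanning the whole column on every iteration. The sketch below swaps in `numpy.searchsorted`, which is not part of the answer above; the helper name `window_means_sorted`, its `window_seconds` parameter, and the `toy` frame are hypothetical and only serve as an illustration.

```python
import numpy as np
import pandas as pd

# Made-up data for illustration; timestamps must be sorted ascending.
toy = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2020-01-15 08:00:00", "2020-01-15 08:00:10",
        "2020-01-15 08:00:20", "2020-01-15 08:01:00",
        "2020-01-15 09:00:00", "2020-01-15 09:00:05",
    ]),
    "data": [1.0, 2.0, 3.0, 4.0, 10.0, 20.0],
    "Start": [0.0, 1.0, 0.0, 0.0, 1.0, 0.0],
})


def window_means_sorted(df, window_seconds=30):
    """Hypothetical helper; assumes `timestamp` is datetime64 and sorted ascending."""
    ts = df["timestamp"].to_numpy()
    data = df["data"].to_numpy()
    # Positions of the rows flagged as window starts.
    starts = np.flatnonzero(df["Start"].to_numpy() == 1)
    # Exclusive end position: one past the last timestamp <= start + window.
    limits = ts[starts] + np.timedelta64(window_seconds, "s")
    ends = np.searchsorted(ts, limits, side="right")
    return pd.DataFrame({
        "start_time": ts[starts],
        "end_time": ts[ends - 1],
        "average": [data[a:b].mean() for a, b in zip(starts, ends)],
    })


result2 = window_means_sorted(toy)
print(result2)
```

Like the loop above, this version does not cut a window short when another Start == 1 row falls inside it, so overlapping windows would share rows.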


Answered 2022-10-06