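I think the best approach is to first pivot the DataFrame so that each sensor gets its own time-series column (this assumes timestamp is already the index, as the output below shows):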
df.pivot(columns="location", values="temperature")
location Garage bedroom outside1 outside2
timestamp
2019-08-22 21:28:56 23.54 NaN NaN NaN
2019-08-22 21:29:44 NaN 23.33 NaN NaN
2019-08-22 21:29:53 23.40 NaN NaN NaN
2019-08-23 22:21:06 NaN NaN 25.0 NaN
2019-08-23 22:21:33 NaN NaN NaN 24.12
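A minimal, self-contained sketch of the pivot step. It assumes a long-format DataFrame (one row per reading) with a timestamp index. The five readings are read off the table above, and the question's real dataset is not shown, so df here is hypothetical. pivot raises a ValueError if the same sensor reports twice at the same timestamp. The sketch therefore also shows pivot_table, which averages such duplicates instead.

```python
import pandas as pd

# Hypothetical long-format data: one row per reading, timestamp as the index.
# The values are read off the pivoted table above.
df = pd.DataFrame(
    {
        "location": ["Garage", "bedroom", "Garage", "outside1", "outside2"],
        "temperature": [23.54, 23.33, 23.40, 25.0, 24.12],
    },
    index=pd.DatetimeIndex(
        [
            "2019-08-22 21:28:56",
            "2019-08-22 21:29:44",
            "2019-08-22 21:29:53",
            "2019-08-23 22:21:06",
            "2019-08-23 22:21:33",
        ],
        name="timestamp",
    ),
)

# One column per sensor; timestamps where a sensor did not report become NaN.
wide = df.pivot(columns="location", values="temperature")
print(wide)

# pivot() raises ValueError on duplicate (timestamp, location) pairs;
# pivot_table() aggregates them instead (here: the mean of the duplicates).
wide_safe = df.reset_index().pivot_table(
    index="timestamp", columns="location", values="temperature", aggfunc="mean"
)
```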
Then you can fill in the missing data by interpolation:
df.pivot(columns="location", values="temperature").interpolate(method="time", limit_direction="both")
location Garage bedroom outside1 outside2
timestamp
2019-08-22 21:28:56 23.540000 23.33 25.0 24.12
2019-08-22 21:29:44 23.422105 23.33 25.0 24.12
2019-08-22 21:29:53 23.400000 23.33 25.0 24.12
2019-08-23 22:21:06 23.400000 23.33 25.0 24.12
2019-08-23 22:21:33 23.400000 23.33 25.0 24.12
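This sketch shows what method="time" does, using a hypothetical pivoted frame copied from the first table. Time interpolation weights each value by the elapsed time between readings. The Garage reading at 21:29:44 lies 48 s into the 57 s gap between 23.54 and 23.40, which gives the 23.422105 shown above. The default method="linear" ignores the timestamps and treats rows as equally spaced. With limit_direction="both", leading and trailing gaps are filled with the nearest valid value, which is why bedroom is 23.33 in every row.

```python
import numpy as np
import pandas as pd

nan = np.nan
# Hypothetical pivoted frame copied from the first table (NaN = no reading).
wide = pd.DataFrame(
    {
        "Garage": [23.54, nan, 23.40, nan, nan],
        "bedroom": [nan, 23.33, nan, nan, nan],
        "outside1": [nan, nan, nan, 25.0, nan],
        "outside2": [nan, nan, nan, nan, 24.12],
    },
    index=pd.DatetimeIndex(
        ["2019-08-22 21:28:56", "2019-08-22 21:29:44", "2019-08-22 21:29:53",
         "2019-08-23 22:21:06", "2019-08-23 22:21:33"],
        name="timestamp",
    ),
)

# method="time" weights by the actual time between readings;
# limit_direction="both" also fills leading and trailing NaNs.
filled = wide.interpolate(method="time", limit_direction="both")

# Garage at 21:29:44 lies 48 s into the 57 s gap between 23.54 and 23.40.
expected = 23.54 + (23.40 - 23.54) * 48 / 57
print(filled.loc[pd.Timestamp("2019-08-22 21:29:44"), "Garage"], expected)

# The default method="linear" ignores the timestamps and treats rows as
# equally spaced, so it lands halfway between the two readings instead.
linear = wide["Garage"].interpolate()
print(linear.iloc[1])
```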
Now all data points should be aligned in time, and you can resample to a constant sampling rate, say "1 minute":
df.pivot(columns="location", values="temperature").interpolate(method="time", limit_direction="both").resample("1 min").mean()
location Garage bedroom outside1 outside2
timestamp
2019-08-22 21:28:00 23.540000 23.33 25.0 24.12
2019-08-22 21:29:00 23.411053 23.33 25.0 24.12
2019-08-22 21:30:00 NaN NaN NaN NaN
2019-08-22 21:31:00 NaN NaN NaN NaN
2019-08-22 21:32:00 NaN NaN NaN NaN
... ... ... ... ...
2019-08-23 22:17:00 NaN NaN NaN NaN
2019-08-23 22:18:00 NaN NaN NaN NaN
2019-08-23 22:19:00 NaN NaN NaN NaN
2019-08-23 22:20:00 NaN NaN NaN NaN
2019-08-23 22:21:00 23.400000 23.33 25.0 24.12
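To make the sparsity concrete, here is a sketch of the resampling step on the same hypothetical sample data. The 1-minute grid runs from 21:28 on 22 Aug to 22:21 on 23 Aug, which is 1494 bins. Only 3 of those bins contain a reading, so every other row of .mean() is NaN. The 21:29 Garage value is the mean of the two interpolated values that fall in that minute. The sketch uses "1min", which is equivalent to the "1 min" string above.

```python
import numpy as np
import pandas as pd

nan = np.nan
# Hypothetical pivoted frame copied from the first table (NaN = no reading).
wide = pd.DataFrame(
    {
        "Garage": [23.54, nan, 23.40, nan, nan],
        "bedroom": [nan, 23.33, nan, nan, nan],
        "outside1": [nan, nan, nan, 25.0, nan],
        "outside2": [nan, nan, nan, nan, 24.12],
    },
    index=pd.DatetimeIndex(
        ["2019-08-22 21:28:56", "2019-08-22 21:29:44", "2019-08-22 21:29:53",
         "2019-08-23 22:21:06", "2019-08-23 22:21:33"],
        name="timestamp",
    ),
)

filled = wide.interpolate(method="time", limit_direction="both")

# One row per minute between the first and last reading; minutes with no
# reading have nothing to average, so .mean() returns NaN for them.
resampled = filled.resample("1min").mean()
print(len(resampled), "bins,", len(resampled.dropna(how="all")), "with data")
```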
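You obviously have a lot of missing data. With such a short sampling interval and such sparse data points, most of the resampled rows are empty. I assume your real dataset contains many more readings (ideally you want at least one data point in every resampling interval).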
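One way to check the "at least one data point per interval" condition before choosing a sampling rate is to count the raw readings per bin. You can do this with .resample(...).count() on the pivoted frame before interpolating. This is a hypothetical diagnostic, run again on the sample data reconstructed from the first table. In the real dataset, a coverage close to 1.0 for every sensor would suggest the interval is coarse enough.

```python
import numpy as np
import pandas as pd

nan = np.nan
# Hypothetical pivoted frame copied from the first table (NaN = no reading).
wide = pd.DataFrame(
    {
        "Garage": [23.54, nan, 23.40, nan, nan],
        "bedroom": [nan, 23.33, nan, nan, nan],
        "outside1": [nan, nan, nan, 25.0, nan],
        "outside2": [nan, nan, nan, nan, 24.12],
    },
    index=pd.DatetimeIndex(
        ["2019-08-22 21:28:56", "2019-08-22 21:29:44", "2019-08-22 21:29:53",
         "2019-08-23 22:21:06", "2019-08-23 22:21:33"],
        name="timestamp",
    ),
)

# Number of raw (non-NaN) readings per sensor in each 1-minute bin.
counts = wide.resample("1min").count()

# Bins per sensor that contain at least one reading, and the fraction covered.
covered = counts.gt(0).sum()
print(covered)
print(covered / len(counts))
```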
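How to proceed from here depends on you and your actual data. Instead of .mean(), you can use .nearest() to fill in the missing data. If only a few items are missing, you can fill them with a rolling mean.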
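Both alternatives are sketched below, again on the hypothetical sample data. .nearest() assigns each bin label the values of the observation whose timestamp is closest to that label, so no NaN remains. Note that it picks one nearby reading rather than averaging the bin. For the rolling mean, the window of 3 and the toy series are illustrative assumptions. A centered window with min_periods=1 fills an isolated gap from its neighbours. It cannot bridge long gaps, so on the sparse sample almost all empty minutes stay empty.

```python
import numpy as np
import pandas as pd

nan = np.nan
# Hypothetical pivoted frame copied from the first table (NaN = no reading).
wide = pd.DataFrame(
    {
        "Garage": [23.54, nan, 23.40, nan, nan],
        "bedroom": [nan, 23.33, nan, nan, nan],
        "outside1": [nan, nan, nan, 25.0, nan],
        "outside2": [nan, nan, nan, nan, 24.12],
    },
    index=pd.DatetimeIndex(
        ["2019-08-22 21:28:56", "2019-08-22 21:29:44", "2019-08-22 21:29:53",
         "2019-08-23 22:21:06", "2019-08-23 22:21:33"],
        name="timestamp",
    ),
)
filled = wide.interpolate(method="time", limit_direction="both")

# Option 1: take the reading nearest in time to each 1-minute label.
nearest = filled.resample("1min").nearest()

# Option 2: keep .mean(), then fill remaining NaNs with a centered rolling
# mean (window of 3 bins is an arbitrary, hypothetical choice).
resampled = filled.resample("1min").mean()
rolled = resampled.fillna(resampled.rolling(3, center=True, min_periods=1).mean())
still_empty = int(rolled.isna().all(axis=1).sum())
print(still_empty, "of", len(rolled), "minutes still empty")

# On a hypothetical series with a single missing minute, the rolling mean
# fills the gap with the average of its two neighbours (21.0 and 23.0).
toy = pd.Series(
    [20.0, 21.0, nan, 23.0, 24.0],
    index=pd.date_range("2019-08-22 21:00", periods=5, freq="1min"),
)
toy_filled = toy.fillna(toy.rolling(3, center=True, min_periods=1).mean())
print(toy_filled)
```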