I have a dataset that I need to resample. To do that, I need to group it by day and compute the median value for each sensor. I am using the `window` function, but it returns only a single row.

This is the dataset:

    +--------+-------------+-------------------+------+------------------+
    |Variable|  Sensor Name|          Timestamp| Units|             Value|
    +--------+-------------+-------------------+------+------------------+
    |     NO2|aq_monitor914|2018-10-07 23:15:00|ugm -3|0.9945200000000001|
    |     NO2|aq_monitor914|2018-10-07 23:30:00|ugm -3|1.1449200000000002|
    |     NO2|aq_monitor914|2018-10-07 23:45:00|ugm -3|           1.13176|
    |     NO2|aq_monitor914|2018-10-08 00:00:00|ugm -3|            0.9212|
    |     NO2|aq_monitor914|2018-10-08 00:15:00|ugm -3|           1.39872|
    |     NO2|aq_monitor914|2018-10-08 00:30:00|ugm -3|           1.51528|
    |     NO2|aq_monitor914|2018-10-08 00:45:00|ugm -3|           1.61116|
    |     NO2|aq_monitor914|2018-10-08 01:00:00|ugm -3|           1.59612|
    |     NO2|aq_monitor914|2018-10-08 01:15:00|ugm -3|           1.12612|
    |     NO2|aq_monitor914|2018-10-08 01:30:00|ugm -3|           1.04528|
    +--------+-------------+-------------------+------+------------------+

I need to resample by day, computing the median of the `Value` column for each day. I am using the following code to do that:

    import pyspark.sql.functions as psf
    from pyspark.sql.functions import window

    # Approximate median of the 'Value' column
    magic_percentile = psf.expr('percentile_approx(Value, 0.5)')
    data = data.groupby('Variable', 'Sensor Name', window('Timestamp', '1 day')).agg(magic_percentile.alias('Value'))

Here is the problem, though. This returns only the following DataFrame:

    +--------+-------------+--------------------+-------+
    |Variable|  Sensor Name|              window|  Value|
    +--------+-------------+--------------------+-------+
    |     NO2|aq_monitor914|[2018-10-07 21:00...|1.13176|
    +--------+-------------+--------------------+-------+

The `window` column in detail:

    window=Row(start=datetime.datetime(2018, 10, 7, 21, 0), end=datetime.datetime(2018, 10, 8, 21, 0))

As I understand `window`, it should create a one-day window for each timestamp, so that, for example, 2018-10-07 23:15:00 becomes 2018-10-07. The rows would then be grouped by variable, sensor name and that day, and the median computed per group. I am really confused about how to do this.
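To make the expected result concrete, here is a minimal plain-Python sketch (no Spark involved) of the grouping described above: each timestamp is truncated to its calendar date, and the median is taken per variable, sensor and date. The `rows` list and the `daily_median` helper are illustrative names, not part of the original code, and `statistics.median` is exact whereas `percentile_approx` is approximate. Note that the reported window runs from 21:00 to 21:00 rather than from midnight to midnight, and all ten sample timestamps fall inside it, which is consistent with a single output row. In PySpark, grouping on `psf.to_date('Timestamp')` instead of `window(...)` would presumably give the calendar-day behaviour sketched here.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median

# Sample rows copied from the dataset above: (Variable, Sensor Name, Timestamp, Value)
rows = [
    ("NO2", "aq_monitor914", "2018-10-07 23:15:00", 0.9945200000000001),
    ("NO2", "aq_monitor914", "2018-10-07 23:30:00", 1.1449200000000002),
    ("NO2", "aq_monitor914", "2018-10-07 23:45:00", 1.13176),
    ("NO2", "aq_monitor914", "2018-10-08 00:00:00", 0.9212),
    ("NO2", "aq_monitor914", "2018-10-08 00:15:00", 1.39872),
    ("NO2", "aq_monitor914", "2018-10-08 00:30:00", 1.51528),
    ("NO2", "aq_monitor914", "2018-10-08 00:45:00", 1.61116),
    ("NO2", "aq_monitor914", "2018-10-08 01:00:00", 1.59612),
    ("NO2", "aq_monitor914", "2018-10-08 01:15:00", 1.12612),
    ("NO2", "aq_monitor914", "2018-10-08 01:30:00", 1.04528),
]


def daily_median(rows):
    """Group rows by (variable, sensor, calendar date) and return each group's median."""
    groups = defaultdict(list)
    for variable, sensor, timestamp, value in rows:
        # Truncate the timestamp to its calendar date, e.g. 23:15 on Oct 7 -> 2018-10-07
        day = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S").date()
        groups[(variable, sensor, day)].append(value)
    return {key: median(values) for key, values in sorted(groups.items())}


result = daily_median(rows)
for (variable, sensor, day), value in result.items():
    print(variable, sensor, day, value)
# NO2 aq_monitor914 2018-10-07 1.13176
# NO2 aq_monitor914 2018-10-08 1.39872
```

With the sample data this yields two rows, one per calendar day, rather than the single row returned by the `window` grouping.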