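I think it would be easier to use merge or join here: an inner merge on both "time" and "iso_zone" keeps only the rows of df whose (time, iso_zone) pair also appears in df_filter.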
Data
import pandas as pd
import dask.dataframe as dd
diz_df = {'building_id': {0: 2, 1: 2, 2: 2, 3: 2, 4: 2},
'time': {0: '2016-01-01 00:15:00',
1: '2016-05-17 22:15:00',
2: '2016-10-21 13:45:00',
3: '2016-12-26 02:45:00',
4: '2016-10-21 14:00:00'},
'electricity_cooling_kwh': {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0},
'electricity_heating_kwh': {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0},
'total_site_electricity_kwh': {0: 4.082225,
1: 5.627103,
2: 21.547435,
3: 4.082225,
4: 21.547435},
'iso_zone': {0: 'MISO-E', 1: 'MISO-E', 2: 'MISO-E', 3: 'MISO-E', 4: 'MISO-E'}}
diz_filter = {'iso_zone': {0: 'MISO-E',
1: 'MISO-E',
2: 'MISO-E',
3: 'CAISO',
4: 'CAISO',
5: 'CAISO'},
'time': {0: '2016-05-17 22:15:00',
1: '2016-10-21 13:45:00',
2: '2016-12-26 02:45:00',
3: '2016-08-24 10:15:00',
4: '2016-07-03 14:30:00',
5: '2016-04-22 12:45:00'}}
df = pd.DataFrame(diz_df)
df_filter = pd.DataFrame(diz_filter)
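# convert the time strings to datetime64 so both frames share the same key dtype
# (astype("M8") without a unit is rejected by recent pandas versions)
df["time"] = pd.to_datetime(df["time"])
df_filter["time"] = pd.to_datetime(df_filter["time"])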
Using pandas
df_out = pd.merge(df, df_filter, on=["time", "iso_zone"])
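One thing to watch with this approach: if the filter frame contains the same (time, iso_zone) pair more than once, an inner merge repeats the matching row once per duplicate. The sketch below illustrates that and one way to guard against it. The frames `readings` and `wanted` and their values are hypothetical stand-ins, not the data from the question.

```python
import pandas as pd

# Hypothetical stand-in frames using the same key columns as the answer
readings = pd.DataFrame({
    "building_id": [2, 2, 2],
    "time": pd.to_datetime(["2016-01-01 00:15:00",
                            "2016-05-17 22:15:00",
                            "2016-10-21 13:45:00"]),
    "iso_zone": ["MISO-E", "MISO-E", "MISO-E"],
})
wanted = pd.DataFrame({
    "iso_zone": ["MISO-E", "MISO-E", "CAISO"],
    "time": pd.to_datetime(["2016-05-17 22:15:00",
                            "2016-05-17 22:15:00",  # deliberate duplicate key
                            "2016-08-24 10:15:00"]),
})

# Inner merge keeps only matching rows, but repeats them per duplicate key
naive = pd.merge(readings, wanted, on=["time", "iso_zone"])

# Deduplicating the filter keys first keeps each matching row exactly once
keys = wanted[["time", "iso_zone"]].drop_duplicates()
filtered = pd.merge(readings, keys, on=["time", "iso_zone"], how="inner")
```

If the filter is already known to have unique key pairs, the `drop_duplicates()` step is unnecessary and the plain merge is enough.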
Using dask
df = dd.from_pandas(df, npartitions=2)
# It doesn't matter if the second dataframe is pandas or dask
# df_filter = dd.from_pandas(df_filter, npartitions=2)
df_out = dd.merge(df, df_filter, on=["time", "iso_zone"])
# df_out is lazy; call df_out.compute() to get the result as a pandas DataFrame