2 Answers

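Contributed 1806 experience points · received more than 5 likes
The idea of the solution is to build a table of all store/station combinations,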
df = store_df.merge(weather_station_df, on='date', suffixes=('_store', '_station'))
compute the distance (the squared distance is enough to find the nearest station),
df['dist'] = (df.x_store - df.x_station)**2 + (df.y_store - df.y_station)**2
and select the minimum of each group:
df.groupby(['store_id', 'date']).apply(lambda x: x.loc[x.dist.idxmin(), ['station_id', 'weather']]).reset_index()
If you have many dates, you can do the join group by group.
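A minimal sketch of that per-group join, assuming `store_df` has the columns `store_id`, `date`, `x`, `y` and `weather_station_df` has `station_id`, `date`, `x`, `y`, `weather` (column names inferred from the code above). The sample values and the helper name `nearest_station_by_date` are made up for illustration. Handling one date at a time keeps the intermediate table of combinations small:

```python
import pandas as pd

# Hypothetical sample data; column names inferred from the snippets above.
store_df = pd.DataFrame({
    "store_id": [1, 2, 1, 2],
    "date": ["2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02"],
    "x": [0.0, 10.0, 0.0, 10.0],
    "y": [0.0, 10.0, 0.0, 10.0],
})
weather_station_df = pd.DataFrame({
    "station_id": ["A", "B", "A", "B"],
    "date": ["2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02"],
    "x": [1.0, 9.0, 1.0, 9.0],
    "y": [1.0, 9.0, 1.0, 9.0],
    "weather": ["sun", "rain", "snow", "fog"],
})


def nearest_station_by_date(stores, stations):
    """Join stores to stations one date at a time and keep the nearest station."""
    pieces = []
    for date, day_stores in stores.groupby("date"):
        day_stations = stations[stations["date"] == date]
        if day_stations.empty:
            continue  # no station data for this date
        # All store/station combinations for this single date only.
        pairs = day_stores.merge(day_stations, on="date",
                                 suffixes=("_store", "_station"))
        pairs["dist"] = ((pairs["x_store"] - pairs["x_station"]) ** 2
                         + (pairs["y_store"] - pairs["y_station"]) ** 2)
        # Row label of the smallest squared distance for each store.
        nearest = pairs.loc[pairs.groupby("store_id")["dist"].idxmin()]
        pieces.append(nearest[["store_id", "date", "station_id", "weather"]])
    return pd.concat(pieces, ignore_index=True)


result = nearest_station_by_date(store_df, weather_station_df)
print(result)
```

Selecting rows with `idxmin` instead of `groupby(...).apply(...)` is a substitution made in this sketch, not part of the answer above. Note that `pd.concat` raises an error if no date produced any rows.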

Contributed 1995 experience points · received more than 2 likes
import numpy as np

def distance(x1, x2, y1, y2):
    # Euclidean distance; works element-wise on pandas Series
    return np.sqrt((x2 - x1)**2 + (y2 - y1)**2)
#Join On Date to get all combinations of store and stations per day
df_all = store_df.merge(weather_station_df, on=['date'])
#Apply distance formula to each combination
df_all['distances'] = distance(df_all['x_y'], df_all['x_x'], df_all['y_y'], df_all['y_x'])
#Get Minimum distance for each day Per store_id
df_mins = df_all.groupby(['date', 'store_id'])['distances'].min().reset_index()
#Use resulting minimums to get the station_id matching the min distances
closest_stations_df = df_mins.merge(df_all, on=['date', 'store_id', 'distances'], how='left')
#filter out the unnecessary columns
result_df = closest_stations_df[['store_id', 'date', 'station_id', 'weather', 'distances']].sort_values(['store_id', 'date'])
Edit: now uses a vectorized distance formula
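One caveat with the steps above: merging back on the floating-point `distances` column returns every tied row when two stations are equally close to a store. A possible alternative, sketched below with made-up sample data, selects the row of each minimum directly with `idxmin`, which keeps only the first of any tied rows. The column names are assumed to match the code above, and the default merge suffixes `_x` (store) and `_y` (station) are relied on:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; stations B and C are equally far from store 1.
store_df = pd.DataFrame({
    "store_id": [1], "date": ["2020-01-01"], "x": [0.0], "y": [0.0],
})
weather_station_df = pd.DataFrame({
    "station_id": ["A", "B", "C"],
    "date": ["2020-01-01"] * 3,
    "x": [5.0, 3.0, -3.0],
    "y": [5.0, 4.0, -4.0],
    "weather": ["sun", "rain", "fog"],
})

# Default merge suffixes: _x = store columns, _y = station columns.
df_all = store_df.merge(weather_station_df, on=["date"])
df_all["distances"] = np.sqrt((df_all["x_y"] - df_all["x_x"]) ** 2
                              + (df_all["y_y"] - df_all["y_x"]) ** 2)

# Pick the row label of each group's minimum instead of merging on a float.
idx = df_all.groupby(["date", "store_id"])["distances"].idxmin()
result_df = (
    df_all.loc[idx, ["store_id", "date", "station_id", "weather", "distances"]]
    .sort_values(["store_id", "date"])
    .reset_index(drop=True)
)
print(result_df)
```

With this data the merge-based version would return two rows for store 1 (stations B and C), while the `idxmin` version returns one.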