2 Answers
Answer by a user with 1839 experience points, 15+ likes
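Another option is to use distance_matrix from scipy.spatial:
import numpy as np
from scipy.spatial import distance_matrix

dist_mat = distance_matrix(dfPoints[['Xh', 'Yh']], dfTrafaPostaje[['Xt', 'Yt']])
# keep the points whose nearest reference point is closer than 5
dfPoints[np.min(dist_mat, axis=1) < 5]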
With 1,500 rows in dfPoints and 30,000 rows in dfTrafaPostaje, this took about 2 seconds.
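The dense matrix above stores one distance per (point, reference point) pair, so memory grows with the product of the two table sizes. If that becomes a problem, a KD-tree is one possible alternative. The following is only a sketch of that different technique, not the answer's method; it uses the thread's sample data and cKDTree from scipy.spatial, and it treats the radius as inclusive (use < for a strict bound).

```python
import pandas as pd
from scipy.spatial import cKDTree

# Sample data from the thread
dfPoints = pd.DataFrame({'H-points': ['a', 'b', 'c', 'd', 'e'],
                         'Xh': [10, 35, 52, 78, 9],
                         'Yh': [15, 5, 11, 20, 10]})
dfTrafaPostaje = pd.DataFrame({'TP-points': ['a', 'b', 'c'],
                               'Xt': [15, 25, 35],
                               'Yt': [15, 25, 35]})
radius = 5

# Build the tree once over the reference points
tree = cKDTree(dfTrafaPostaje[['Xt', 'Yt']].to_numpy())

# For every point, look up the distance to (and row position of)
# its single nearest reference point
nearest_dist, nearest_idx = tree.query(dfPoints[['Xh', 'Yh']].to_numpy(), k=1)

# A point qualifies if its nearest reference point lies within the radius
within = nearest_dist <= radius
nearby_points = dfPoints[within]
print(nearby_points)
```

Only the nearest neighbour of each point is computed, so no full pairwise matrix is ever materialised.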
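Update: to get the index of the in-range reference point with the highest score:
import numpy as np
import pandas as pd
from scipy.spatial import distance_matrix

dist_mat = distance_matrix(dfPoints[['Xh', 'Yh']], dfTrafaPostaje[['Xt', 'Yt']])
# get the M scores of those within range; NaN for those outside range
M_mat = pd.DataFrame(np.where(dist_mat <= 5, dfTrafaPostaje['M'].values[None, :], np.nan),
                     index=dfPoints['H-points'],
                     columns=dfTrafaPostaje['TP-points'])
# rows that have at least one reference point within range
in_range = M_mat.notna().any(axis=1).to_numpy()
# get the reference point with the largest M value;
# -inf stands in for NaN so idxmax never sees an all-NaN row,
# and rows with nothing in range are masked back to NaN
dfPoints['TP'] = np.where(in_range, M_mat.fillna(-np.inf).idxmax(axis=1), np.nan)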
For the included sample data, this gives:
  H-points  Xh  Yh   TP
0        a  10  15    a
1        b  35   5  NaN
2        c  52  11  NaN
3        d  78  20  NaN
4        e   9  10  NaN
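To try the update end to end, here is a self-contained sketch on the thread's sample points. The M values are hypothetical, since the thread does not show them, and this variant picks the best reference point with NumPy's argmax rather than the DataFrame-based idxmax, so treat it as one possible formulation.

```python
import numpy as np
import pandas as pd
from scipy.spatial import distance_matrix

dfPoints = pd.DataFrame({'H-points': ['a', 'b', 'c', 'd', 'e'],
                         'Xh': [10, 35, 52, 78, 9],
                         'Yh': [15, 5, 11, 20, 10]})
dfTrafaPostaje = pd.DataFrame({'TP-points': ['a', 'b', 'c'],
                               'Xt': [15, 25, 35],
                               'Yt': [15, 25, 35],
                               'M': [1, 3, 2]})  # hypothetical scores
radius = 5

dist_mat = distance_matrix(dfPoints[['Xh', 'Yh']].to_numpy(),
                           dfTrafaPostaje[['Xt', 'Yt']].to_numpy())
in_range = dist_mat <= radius

# Score of each in-range reference point; -inf marks those out of range
scores = np.where(in_range, dfTrafaPostaje['M'].to_numpy()[None, :], -np.inf)

# Column position of the highest-scoring reference point per row
best = scores.argmax(axis=1)
labels = dfTrafaPostaje['TP-points'].to_numpy()[best]

# Keep the label only where at least one reference point is in range
dfPoints['TP'] = pd.Series(labels, index=dfPoints.index).where(in_range.any(axis=1))
print(dfPoints)
```

Using -inf instead of NaN as the out-of-range marker keeps argmax well defined for rows with no reference point in range; those rows are blanked afterwards by the final mask.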
Answer by a user with 1802 experience points, 4+ likes
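You can use cdist from scipy to compute the pairwise distances, then build a boolean mask that is True wherever a distance is within the radius, and finally filter with it: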
import pandas as pd
from scipy.spatial.distance import cdist
dfPoints = pd.DataFrame({'H-points': ['a', 'b', 'c', 'd', 'e'],
'Xh': [10, 35, 52, 78, 9],
'Yh': [15, 5, 11, 20, 10]})
dfTrafaPostaje = pd.DataFrame({'TP-points': ['a', 'b', 'c'],
'Xt': [15, 25, 35],
'Yt': [15, 25, 35]})
radius = 5
distances = cdist(dfPoints[['Xh', 'Yh']].values, dfTrafaPostaje[['Xt', 'Yt']].values, 'sqeuclidean')
mask = (distances <= radius*radius).sum(axis=1) > 0 # create mask
print(dfPoints[mask])
Output
  H-points  Xh  Yh
0        a  10  15
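The same filter can be wrapped in a small reusable function. The sketch below uses a hypothetical helper name, within_radius_mask, and plain coordinate lists in place of the DataFrames. It compares squared distances against radius squared, as the answer does, which avoids taking a square root for every pair, and it uses any(axis=1) as an equivalent of sum(axis=1) > 0.

```python
import numpy as np
from scipy.spatial.distance import cdist


def within_radius_mask(points_xy, refs_xy, radius):
    """Return a boolean array: True where a point has a reference point within radius.

    Hypothetical helper, not part of SciPy or pandas.
    """
    sq_dist = cdist(np.asarray(points_xy, dtype=float),
                    np.asarray(refs_xy, dtype=float),
                    'sqeuclidean')
    # d <= r is equivalent to d**2 <= r**2 for non-negative d and r
    return (sq_dist <= radius ** 2).any(axis=1)


points = [(10, 15), (35, 5), (52, 11), (78, 20), (9, 10)]
refs = [(15, 15), (25, 25), (35, 35)]
mask = within_radius_mask(points, refs, 5)
print(mask)
```

With the DataFrames from the answer, the call would be dfPoints[within_radius_mask(dfPoints[['Xh', 'Yh']], dfTrafaPostaje[['Xt', 'Yt']], radius)].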