2 Answers
Contributed 1876 experience points · received more than 6 upvotes
SciPy can help here. Consider the following hypothetical example:
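import numpy as np
import pandas as pd
from scipy.spatial import cKDTree
# two hypothetical datasets of random 3-D points
# (pd.np was removed in pandas 2.0, so NumPy is imported directly)
dataset1 = pd.DataFrame(np.random.rand(100, 3))
dataset2 = pd.DataFrame(np.random.rand(10, 3))
# build a k-d tree on dataset1, then list its points within r of each dataset2 point
ck = cKDTree(dataset1.values)
ck.query_ball_point(dataset2.values, r=0.1)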
array([list([]), list([]), list([]), list([]), list([28, 83]), list([79]), list([]), list([86]), list([40]), list([29, 60, 95])], dtype=object)
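Each entry of that result lists the row indices of dataset1 that lie within r of the corresponding row of dataset2. If a flat table of matching pairs is easier to work with, something like the following might do. It is only a sketch: the seed, the `pairs` name and its column names are my own choices for illustration, not part of the answer above.

```python
import numpy as np
import pandas as pd
from scipy.spatial import cKDTree

# Seeded generator so the run is repeatable (the seed value is arbitrary).
rng = np.random.default_rng(0)
dataset1 = pd.DataFrame(rng.random((100, 3)))
dataset2 = pd.DataFrame(rng.random((10, 3)))

pts1 = dataset1.to_numpy()
pts2 = dataset2.to_numpy()

tree = cKDTree(pts1)
# One list of dataset1 row indices per row of dataset2.
neighbors = tree.query_ball_point(pts2, r=0.1)

# Flatten the lists into (dataset2 row, dataset1 row, distance) records.
rows = []
for i2, idxs in enumerate(neighbors):
    for i1 in idxs:
        dist = float(np.linalg.norm(pts1[i1] - pts2[i2]))
        rows.append((i2, i1, dist))

# Hypothetical column names, chosen for this illustration only.
pairs = pd.DataFrame(rows, columns=["dataset2_idx", "dataset1_idx", "distance"])
print(pairs)
```

The tree lookup avoids computing all 100 x 10 distances, which should matter more as the datasets grow.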
Contributed 1807 experience points · received more than 9 upvotes
Using a NumPy approach:
If your two DataFrames look like this:
df1
coords
0 (4,3,5)
1 (5,4,3)
df2
coords
0 (6,7,8)
1 (8,7,6)
Then:
import numpy as np
import pandas as pd
from itertools import product
#convert dataframes into numpy arrays
df1_arr = np.array([np.array(x) for x in df1.coords.values])
df2_arr = np.array([np.array(x) for x in df2.coords.values])
#create array of cartesian product of elements of the two arrays
cart_arr = np.array([x for x in product(df1_arr,df2_arr)])
#compute Euclidean distance (or norm) between pairs of elements in two arrays
#outputs new array with one value per pair of coordinates
norms_arr = np.linalg.norm(np.diff(cart_arr,axis=1)[:,0,:],axis=1)
#create distance threshold for "close enough"
radius = 5.5
#find values in norms array that are less than or equal to distance threshold
good_idxs = np.argwhere(norms_arr <= radius)[:,0]
good_coord_pairs = cart_arr[good_idxs]
#store corresponding pairs of coordinates and distances in new dataframe
final_df = pd.DataFrame({'df1_coords':list(map(tuple,good_coord_pairs[:,0,:])),
    'df2_coords':list(map(tuple,good_coord_pairs[:,1,:])), 'distance':norms_arr[good_idxs]},
    index=list(range(len(good_coord_pairs))))
This yields:
final_df
df1_coords df2_coords distance
0 (4,3,5) (6,7,8) 5.385165
1 (5,4,3) (8,7,6) 5.196152
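As a possible alternative to building the Cartesian product with itertools, NumPy broadcasting can compute the full distance matrix in one step. This is a sketch of a different technique, not the code from the answer above; it assumes the coords column holds equal-length tuples, and the `dists` and `result` names are mine.

```python
import numpy as np
import pandas as pd

# The example frames from the answer, rebuilt so the snippet runs on its own.
df1 = pd.DataFrame({"coords": [(4, 3, 5), (5, 4, 3)]})
df2 = pd.DataFrame({"coords": [(6, 7, 8), (8, 7, 6)]})

a = np.array(df1["coords"].tolist(), dtype=float)  # shape (len(df1), 3)
b = np.array(df2["coords"].tolist(), dtype=float)  # shape (len(df2), 3)

# Broadcasting: dists[i, j] is the Euclidean distance between a[i] and b[j].
dists = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)

radius = 5.5
# Row and column indices of every pair within the threshold.
i1, i2 = np.nonzero(dists <= radius)

result = pd.DataFrame({
    "df1_coords": df1["coords"].iloc[i1].tolist(),
    "df2_coords": df2["coords"].iloc[i2].tolist(),
    "distance": dists[i1, i2],
})
print(result)
```

Like the product-based version, this holds len(df1) x len(df2) distances in memory at once, so for large inputs a tree-based lookup is likely the better fit.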