Each row in a pandas DataFrame contains the longitude/latitude coordinates of 2 points. Using the Python code below, calculating the distance between these 2 points for many (millions of) rows takes a very long time! Given that the 2 points are less than 50 miles apart and accuracy is not very important, is it possible to make the calculation faster?

    from math import radians, cos, sin, asin, sqrt

    def haversine(lon1, lat1, lon2, lat2):
        """
        Calculate the great circle distance between two points
        on the earth (specified in decimal degrees)
        """
        # convert decimal degrees to radians
        lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
        # haversine formula
        dlon = lon2 - lon1
        dlat = lat2 - lat1
        a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
        c = 2 * asin(sqrt(a))
        km = 6367 * c
        return km

    for index, row in df.iterrows():
        df.loc[index, 'distance'] = haversine(row['a_longitude'], row['a_latitude'], row['b_longitude'], row['b_latitude'])

3 Answers

婷婷同学_
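If using scikit-learn is allowed, I would give the following a try:

    import numpy as np
    # Note: in recent scikit-learn releases DistanceMetric is imported
    # from sklearn.metrics rather than sklearn.neighbors.
    from sklearn.neighbors import DistanceMetric

    dist = DistanceMetric.get_metric('haversine')

    # example data
    lat1, lon1 = 36.4256345, -5.1510261
    lat2, lon2 = 40.4165, -3.7026

    # the haversine metric expects [latitude, longitude] in radians
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])
    X = [[lat1, lon1], [lat2, lon2]]

    kms = 6367
    # pairwise() returns the full distance matrix between all rows of X
    print(kms * dist.pairwise(X))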
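Another option that is often suggested for this kind of problem is to drop the `iterrows()` loop from the question and apply the same formula to whole columns with NumPy. The sketch below is not taken from the answers in this thread. The function names `haversine_np` and `equirectangular_np` and the sample data are made up for illustration, while the column names and the 6367 km radius are the ones used in the question. The second function is an equirectangular (flat-earth) approximation, which should be adequate when the points are less than 50 miles apart and accuracy is not critical.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6367  # same radius as in the question


def haversine_np(lon1, lat1, lon2, lat2):
    """Vectorized haversine: accepts scalars, arrays, or pandas Series
    of decimal degrees and returns distances in kilometres."""
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = np.sin(dlat / 2.0) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2.0) ** 2
    return EARTH_RADIUS_KM * 2 * np.arcsin(np.sqrt(a))


def equirectangular_np(lon1, lat1, lon2, lat2):
    """Cheaper approximation for short distances: treats the area
    around the two points as flat (one cos and one sqrt per row)."""
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])
    x = (lon2 - lon1) * np.cos((lat1 + lat2) / 2.0)
    y = lat2 - lat1
    return EARTH_RADIUS_KM * np.sqrt(x ** 2 + y ** 2)


# Hypothetical sample data using the question's column names.
df = pd.DataFrame({
    'a_longitude': [-5.1510261, -3.7026],
    'a_latitude': [36.4256345, 40.4165],
    'b_longitude': [-5.0, -3.6],
    'b_latitude': [36.5, 40.5],
})

# One call per column set instead of one Python call per row.
df['distance'] = haversine_np(
    df['a_longitude'], df['a_latitude'], df['b_longitude'], df['b_latitude'])
df['distance_approx'] = equirectangular_np(
    df['a_longitude'], df['a_latitude'], df['b_longitude'], df['b_latitude'])
```

Both functions operate on entire columns at once, so the per-row Python overhead of `iterrows()` and `df.loc[...]` assignment is avoided. How much faster this runs depends on the data and the machine, so it is worth timing on a sample of the real DataFrame.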