How to find the first and last elements in a DataFrame column and trim the values between them

Python

蓝山帝景 2021-11-30 18:27:02

I have been working with coordinate data (latitude and longitude).

Background

Act Df =

Index  Latitude            Longitude
0      66.36031097267725   23.714807357485936
1      66.36030099322495   23.71479548193769
2      ..

Flt Df =

Index  Latitude            Longitude
0      66.34622070356742   23.687960586306179
1      66.34620931053996   23.687951092116624
2      ..

len(Actual) = 12053
len(Fleet) = 8000

The data above show that the Fleet coordinate points cover a shorter stretch of the path drawn by the Actual latitude/longitude data.

Note: the Fleet lat/long values do not have to equal the Actual lat/long values, but they occupy a shorter region within the Actual lat/long points.

Requirement

I want to trim part of the Actual lat/long data based on the values in the Fleet lat/long data. When I plot the Actual and the Fleet lat/long data on OpenStreetMap or in matplotlib, both must follow the same path (the positions need not be identical).

What I tried

I used arithmetic comparisons:

    actual_data[(actual_data['Latitude'] <= fleet_data_Lat_start_point) & (actual_data['Longitude'] <= fleet_data_Long_start_point) & (actual_data['Latitude'] <= fleet_data_Lat_end_point) & (actual_data['Longitude'] <= fleet_data_Long_end_point)]

1 Answer

哔哔one

Contributed 1854 experience points, received more than 8 likes

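Here is my solution: I use the geopy library to compute distances.

You can compute the distance with either geodesic() or great_circle(); the distance function is an alias for geodesic.

If you prefer another unit, change .km to .miles, .m, or .ft.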

    from geopy.distance import lonlat, distance, great_circle, geodesic

    dmin = []
    for index, r in df_actual.iterrows():
        # distance in km from this actual point to every fleet point; keep the smallest
        valmin = df_fleet.apply(
            lambda x: distance(lonlat(x['Longitude'], x['Latitude']),
                               lonlat(r['Longitude'], r['Latitude'])).km,
            axis=1).min()
        dmin.append(valmin)

    df_actual['nearest to fleet(km)'] = dmin
    print(df_actual)
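To make the numbers in that column easier to interpret, here is a minimal plain-Python sketch of a great-circle distance on a sphere, the kind of quantity great_circle() models. It uses the haversine formula and an assumed mean Earth radius of 6371 km; the function name `haversine_km` is hypothetical and this is not geopy's own implementation.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0  # assumption: spherical Earth with mean radius 6371 km


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    # clamp against rounding so asin never receives a value above 1
    return 2 * EARTH_RADIUS_KM * asin(min(1.0, sqrt(a)))


# Row 0 of the Actual table versus row 0 of the Fleet table from the question
d = haversine_km(66.36031097267725, 23.714807357485936,
                 66.34622070356742, 23.687960586306179)
print(f"{d:.3f} km")
```

Multiplying the result by 1000 gives meters, which is the unit used by the 100 m filter.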

If, for each actual point, you want all fleet points closer than 100 m, do this:

    for ai, a in df_actual.iterrows():
        actual = lonlat(a['Longitude'], a['Latitude'])
        # True for every fleet row that lies within 100 m of this actual point
        mask = df_fleet.apply(
            lambda x: distance(lonlat(x['Longitude'], x['Latitude']), actual).meters < 100,
            axis=1)
        print(f"for {(a['Longitude'], a['Latitude'])}")
        print(df_fleet[mask])
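The nested loop above evaluates one distance per (actual, fleet) pair in Python, which gets slow at 12053 x 8000 points. As a rough alternative, the same 100 m test can be sketched with NumPy broadcasting. This swaps geopy's geodesic distance for a haversine distance on an assumed sphere of radius 6371 km, and `within_radius` is a hypothetical helper name, so treat the result as an approximation.

```python
import numpy as np

EARTH_RADIUS_M = 6_371_000.0  # assumption: spherical Earth, radius in meters


def within_radius(actual, fleet, radius_m=100.0):
    """Boolean matrix; entry [i, j] is True when fleet[j] is within radius_m of actual[i].

    actual and fleet are sequences of (lat, lon) pairs in degrees.
    """
    a = np.radians(np.asarray(actual, dtype=float))[:, None, :]  # shape (n, 1, 2)
    f = np.radians(np.asarray(fleet, dtype=float))[None, :, :]   # shape (1, m, 2)
    dlat = f[..., 0] - a[..., 0]
    dlon = f[..., 1] - a[..., 1]
    h = np.sin(dlat / 2) ** 2 + np.cos(a[..., 0]) * np.cos(f[..., 0]) * np.sin(dlon / 2) ** 2
    dist_m = 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(h))
    return dist_m < radius_m


# Made-up sample points near the coordinates in the question
actual = [(66.3603, 23.7148), (66.3462, 23.6880)]
fleet = [(66.3462, 23.6879), (66.3603, 23.7149), (66.5000, 23.0000)]
mask = within_radius(actual, fleet)
print(mask)
```

The full n x m matrix can be large, so for inputs of the size in the question it may be necessary to process the actual points in chunks.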

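The last solution is based on a tree search, which I think is very fast. I am using scipy.spatial, which finds the nearest points in space and returns Euclidean distances. I simply converted lat, lon into x, y, z points in space so that the result stays close to the geodesic or haversine distance. Here I generate two (lat, lon) dataframes of 15000 and 10000 rows, and for each point of df1 I search for the five nearest points in df2.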

    from random import uniform
    from math import radians, sin, cos
    from scipy.spatial import cKDTree
    import pandas as pd
    import numpy as np

    def to_cartesian(lat, lon):
        # convert degrees to a point on a sphere of radius R (km)
        lat = radians(lat); lon = radians(lon)
        R = 6371
        x = R * cos(lat) * cos(lon)
        y = R * cos(lat) * sin(lon)
        z = R * sin(lat)
        return x, y, z

    def newpoint():
        # random (lon, lat) pair near the coordinates in the question
        return uniform(23, 24), uniform(66, 67)

    def ckdnearest(gdA, gdB, bcol):
        # bcol is not used
        nA = np.array(list(zip(gdA.x, gdA.y, gdA.z)))
        nB = np.array(list(zip(gdB.x, gdB.y, gdB.z)))
        btree = cKDTree(nB)
        # for each point of df1, search the 5 (k=5) nearest points of df2
        dist, idx = btree.query(nA, k=5)
        dist = [d for d in dist]
        idx = [s for s in idx]
        df = pd.DataFrame.from_dict({'distance': dist,
                                     'index of df2': idx})
        return df

    # create the first df (actual)
    n = 15000
    lon, lat = [], []
    for x, y in (newpoint() for _ in range(n)):
        lon += [x]; lat += [y]
    df1 = pd.DataFrame({'lat': lat, 'lon': lon})
    df1['x'], df1['y'], df1['z'] = zip(*map(to_cartesian, df1.lat, df1.lon))
    #-----------------------
    # create the second df (fleet)
    n = 10000
    lon, lat = [], []
    for x, y in (newpoint() for _ in range(n)):
        lon += [x]; lat += [y]
    df2 = pd.DataFrame({'lat': lat, 'lon': lon})
    df2['x'], df2['y'], df2['z'] = zip(*map(to_cartesian, df2.lat, df2.lon))
    #-----------------------
    df = ckdnearest(df1, df2, 'unused')
    print(df)
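One detail worth spelling out: the distances returned by the tree query are straight-line (chord) lengths between the x, y, z points, in km because R is in km. For nearby points the chord and the great-circle arc are almost identical, but if the arc length is wanted it can be recovered as sketched below. The helper name `chord_to_arc` is hypothetical, and the sphere of radius 6371 km is the same assumption made by to_cartesian.

```python
from math import asin, cos, pi, radians, sin, sqrt

R = 6371  # km, the same assumed sphere radius as in to_cartesian


def to_cartesian(lat, lon):
    """Same conversion as in the answer: degrees to x, y, z on a sphere of radius R."""
    lat, lon = radians(lat), radians(lon)
    return R * cos(lat) * cos(lon), R * cos(lat) * sin(lon), R * sin(lat)


def chord_to_arc(chord_km, radius_km=R):
    """Convert a straight-line chord between two surface points into the arc length."""
    return 2 * radius_km * asin(min(1.0, chord_km / (2 * radius_km)))


# Two points a quarter of the way around the equator
p = to_cartesian(0, 0)
q = to_cartesian(0, 90)
chord = sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
arc = chord_to_arc(chord)
print(chord, arc)
```

At the scale of the question (points a few kilometers apart) the correction is far below a millimeter, so using the chord directly as the distance is reasonable there.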

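If you only want the single nearest point, without the Cartesian conversion (the distance is then measured in degrees of lat/lon rather than in kilometers):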

    def ckdnearest(gdA, gdB, bcol):
        # bcol is not used
        nA = np.array(list(zip(gdA.lat, gdA.lon)))
        nB = np.array(list(zip(gdB.lat, gdB.lon)))
        btree = cKDTree(nB)
        # for each point of df1, search the single nearest point of df2
        dist, idx = btree.query(nA, k=1)
        df = pd.DataFrame.from_dict({'distance': dist, 'index of df2': idx})
        return df
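To connect the nearest-point idea back to the trimming asked for in the question, here is one possible sketch, not part of the original answer. It assumes the rows of both frames are in travel order and use the Latitude and Longitude column names from the question. The function name `trim_to_fleet` is hypothetical, and the nearest row is found with an equirectangular approximation rather than a geodesic distance.

```python
import numpy as np
import pandas as pd


def trim_to_fleet(actual, fleet):
    """Keep the rows of `actual` between the rows nearest to the first and last fleet points."""
    lat = np.radians(actual["Latitude"].to_numpy())
    lon = np.radians(actual["Longitude"].to_numpy())

    def nearest_position(point):
        plat = np.radians(point["Latitude"])
        plon = np.radians(point["Longitude"])
        # equirectangular approximation: shrink longitude differences by cos(latitude)
        dx = (lon - plon) * np.cos((lat + plat) / 2)
        dy = lat - plat
        return int(np.argmin(dx ** 2 + dy ** 2))

    start = nearest_position(fleet.iloc[0])
    end = nearest_position(fleet.iloc[-1])
    lo, hi = sorted((start, end))
    return actual.iloc[lo:hi + 1]


# Made-up data: an actual track of 11 points and a shorter fleet track inside it
actual = pd.DataFrame({"Latitude": np.linspace(66.30, 66.40, 11),
                       "Longitude": np.linspace(23.60, 23.70, 11)})
fleet = pd.DataFrame({"Latitude": [66.331, 66.350, 66.369],
                      "Longitude": [23.631, 23.650, 23.669]})
trimmed = trim_to_fleet(actual, fleet)
print(trimmed)
```

If the actual track passes the same place more than once, the nearest row may be ambiguous, so this sketch would need an extra rule (for example, preferring the earliest match) in that case.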

2021-11-30
