How can I select rows from a DataFrame by specific value pairs without iterating?

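I want to select (or locate) a sub-DataFrame from a DataFrame based on specific pairs of values. This is easy to do with iteration, but it is slow.

import pandas as pd
df = pd.DataFrame([[1,2,3], [1,5,6], [7,8,9], [2,3,8]], columns=['x','y','z'])
df
Out[4]:
   x  y  z
0  1  2  3
1  1  5  6
2  7  8  9
3  2  3  8

I want the sub-DataFrame containing the rows where (x, y) is (1, 2), (1, 5), or (2, 3), as shown below:

Out[5]:
   x  y  z
0  1  2  3
1  1  5  6
3  2  3  8

My approach uses iteration to collect the matching index labels:

xy_list = [(1,2), (1,5), (2,3)]
index_list = []
for x, y in xy_list:
    index_list += df.query('x==@x & y==@y').index.tolist()
df_sub = df.loc[index_list]
df_sub
Out[6]:
   x  y  z
0  1  2  3
1  1  5  6
3  2  3  8

Is there a way to do this without iterating?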


2 Answers

慕村9548890

Contributed 1884 experience points · received more than 4 upvotes

You're close, but you don't need to call query repeatedly. Build the query string with str.join and then call query once.

data = [(1, 2), (1, 5), (2, 3)]
pattern = '(' + ') | ('.join(f"x == {a} & y == {b}" for a, b in data) + ')'
pattern
# '(x == 1 & y == 2) | (x == 1 & y == 5) | (x == 2 & y == 3)'

df.query(pattern)

   x  y  z
0  1  2  3
1  1  5  6
3  2  3  8
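The f-string above interpolates each value directly, which works for integers but would leave string values unquoted in the expression. As a minimal sketch, the helper below (`pairs_to_query` is a hypothetical name, not part of pandas) formats each value with `repr` so that strings are quoted. It assumes the values are plain Python literals such as `int` or `str`.

```python
import pandas as pd


def pairs_to_query(pairs, cols=("x", "y")):
    """Hypothetical helper: build one query string that matches any of the pairs.

    Assumes the values are plain Python literals (int, str, ...), because
    repr() is used to render them inside the expression.
    """
    a_col, b_col = cols
    clauses = (f"({a_col} == {a!r} & {b_col} == {b!r})" for a, b in pairs)
    return " | ".join(clauses)


df = pd.DataFrame(
    [[1, 2, 3], [1, 5, 6], [7, 8, 9], [2, 3, 8]], columns=["x", "y", "z"]
)
data = [(1, 2), (1, 5), (2, 3)]

pattern = pairs_to_query(data)   # one expression covering all pairs
df_sub = df.query(pattern)       # a single query call, no Python-level loop over rows
```

The string is still assembled with a generator over the pairs, but the DataFrame itself is filtered only once.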

Another option is to use Index.isin with a boolean filter:

df[df.set_index(['x', 'y']).index.isin(data)]

   x  y  z
0  1  2  3
1  1  5  6
3  2  3  8

Or build the MultiIndex directly with MultiIndex.from_arrays:

df[pd.MultiIndex.from_arrays([df['x'], df['y']]).isin(data)]

   x  y  z
0  1  2  3
1  1  5  6
3  2  3  8

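The result is the same, and this should be more efficient, since only the index is built rather than a re-indexed copy of the DataFrame.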
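Both isin variants produce a boolean mask with one entry per row, in the DataFrame's original row order, which is why the original index labels survive the filtering. A self-contained sketch that builds both masks side by side (the variable names `mask_set_index` and `mask_from_arrays` are illustrative):

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3], [1, 5, 6], [7, 8, 9], [2, 3, 8]], columns=["x", "y", "z"]
)
data = [(1, 2), (1, 5), (2, 3)]

# Variant 1: move x and y into the index of a temporary frame, then test membership.
mask_set_index = df.set_index(["x", "y"]).index.isin(data)

# Variant 2: build the MultiIndex straight from the two columns.
mask_from_arrays = pd.MultiIndex.from_arrays([df["x"], df["y"]]).isin(data)

# Either mask can be used to filter the original frame.
df_sub = df[mask_from_arrays]
```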

2021-12-17

婷婷同学_

Contributed 1844 experience points · received more than 8 upvotes

Alternatively, you can combine df.set_index() with df.loc[]:

xy_list = [(1,2), (1,5), (2,3)]
df_new = df.set_index(['x','y']).loc[xy_list].reset_index()
df_new

   x  y  z
0  1  2  3
1  1  5  6
2  2  3  8
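Note that the final reset_index() renumbers the rows from 0, so the output above shows labels 0, 1, 2 rather than the 0, 1, 3 asked for in the question. If the original labels matter, one possible variation (a sketch, not the answerer's code) is to carry them through as a regular column and restore them at the end. It relies on the default column name `index` that reset_index() gives an unnamed index.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3], [1, 5, 6], [7, 8, 9], [2, 3, 8]], columns=["x", "y", "z"]
)
xy_list = [(1, 2), (1, 5), (2, 3)]

df_sub = (
    df.reset_index()           # keep the original row labels in a column named "index"
    .set_index(["x", "y"])     # make (x, y) the lookup key
    .loc[xy_list]              # select the listed pairs, in the order given
    .reset_index()             # turn x and y back into ordinary columns
    .set_index("index")        # restore the original row labels
    .rename_axis(None)         # drop the leftover index name
)
```

The rows come back in the order of xy_list, which here happens to match the original row order.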

2021-12-17
