Efficient way to apply multiple filters to a pandas DataFrame or Series

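I have a scenario where a user wants to apply several filters to a pandas DataFrame or Series object. Essentially, I want to efficiently chain together a set of filters (comparison operations) that the user specifies at run time.

The filters should be additive (that is, each one applied should narrow the results).

I'm currently using reindex(), but this creates a new object each time and copies the underlying data (if I understand the documentation correctly). That could be very inefficient when filtering a large Series or DataFrame.

I suspect that apply(), map(), or something similar might be better. I'm still fairly new to pandas, though, and am trying to get my head around everything.

TL;DR

I want to take a dictionary of the following form, apply each operation to a given Series object, and return a "filtered" Series object.

relops = {'>=': [1], '<=': [1]}

Long example

I'll start with what I have now, which filters a single Series object. This is the function I'm currently using:

def apply_relops(series, relops):
    """
    Pass dictionary of relational operators to perform on given series object
    """
    for op, vals in relops.iteritems():
        op_func = ops[op]
        for val in vals:
            filtered = op_func(series, val)
            series = series.reindex(series[filtered])
    return series

The user provides a dictionary with the operations they want to perform:

>>> df = pandas.DataFrame({'col1': [0, 1, 2], 'col2': [10, 11, 12]})
>>> print df
   col1  col2
0     0    10
1     1    11
2     2    12

>>> from operator import le, ge
>>> ops = {'>=': ge, '<=': le}
>>> apply_relops(df['col1'], {'>=': [1]})
col1
1       1
2       2
Name: col1
>>> apply_relops(df['col1'], relops = {'>=': [1], '<=': [1]})
col1
1       1
Name: col1

Again, the "problem" with my approach is that I think there is a lot of possibly unnecessary copying of the data between steps.

I would also like to extend this so that the dictionary passed in can name the columns to operate on, and filter an entire DataFrame based on it. However, I'm assuming that whatever works for a Series can be easily extended to a DataFrame.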


3 Answers

潇湘沐


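As of the pandas 0.22 update, comparison methods such as the following are available:

gt (greater than)
lt (less than)
eq (equal to)
ne (not equal to)
ge (greater than or equal to)

and many more. These methods return a boolean array. Let's see how to use them: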

import pandas as pd

# sample data
df = pd.DataFrame({'col1': [0, 1, 2, 3, 4, 5], 'col2': [10, 11, 12, 13, 14, 15]})

# values from col1 that are greater than or equal to 1
df.loc[df['col1'].ge(1), 'col1']

1    1
2    2
3    3
4    4
5    5

# rows where col1 is between 0 and 2 (both ends inclusive)
df.loc[df['col1'].between(0, 2)]

   col1  col2
0     0    10
1     1    11
2     2    12

# rows where col1 > 1
df.loc[df['col1'].gt(1)]

   col1  col2
2     2    12
3     3    13
4     4    14
5     5    15
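To connect these methods back to the dictionary of operators in the question, here is a minimal sketch that turns a run-time filter specification into one combined boolean mask and indexes the DataFrame only once. The helper name build_mask, the OPS lookup table, and the nested {column: {operator: [values]}} layout are assumptions made for illustration; they do not come from the question or from pandas itself.

```python
import pandas as pd

# Hypothetical lookup: operator strings (as in the question) -> Series method names.
OPS = {">": "gt", "<": "lt", "==": "eq", "!=": "ne", ">=": "ge", "<=": "le"}


def build_mask(df, filters):
    """Combine {column: {operator: [values]}} into a single boolean mask.

    Every comparison is AND-ed into the mask, so each filter can only
    narrow the result (the "additive" behaviour asked for).
    """
    mask = pd.Series(True, index=df.index)
    for column, relops in filters.items():
        for op, values in relops.items():
            compare = getattr(df[column], OPS[op])  # e.g. df['col1'].ge
            for value in values:
                mask &= compare(value)
    return mask


df = pd.DataFrame({"col1": [0, 1, 2, 3, 4, 5], "col2": [10, 11, 12, 13, 14, 15]})

# Assumed input format: one inner dictionary of operators per column.
filters = {"col1": {">=": [1], "<=": [3]}, "col2": {"!=": [12]}}

# The DataFrame itself is indexed once, after all masks are combined.
filtered = df.loc[build_mask(df, filters)]
print(filtered)
```

Only the boolean masks are created for each filter; the data is selected in a single final step instead of being reindexed after every comparison. For a single Series the same idea applies by comparing the Series directly instead of df[column].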

2019-11-06
