
How to implement 'in' and 'not in' for a Pandas DataFrame

Python MySQL

杨__羊羊 2019-05-24 15:41:39

How can I implement the equivalents of SQL's IN and NOT IN? I have a list containing the required values. Here is the scenario:

df = pd.DataFrame({'countries': ['US', 'UK', 'Germany', 'China']})
countries = ['UK', 'China']

# pseudo-code:
df[df['countries'] not in countries]

My current approach is as follows:

df = pd.DataFrame({'countries': ['US', 'UK', 'Germany', 'China']})
countries = pd.DataFrame({'countries': ['UK', 'China'], 'matched': True})

# IN
df.merge(countries, how='inner', on='countries')

# NOT IN
not_in = df.merge(countries, how='left', on='countries')
not_in = not_in[pd.isnull(not_in['matched'])]

But this seems like a horrible kludge. Can anyone improve on it?


4 answers

函数式编程

Contributed 1807 experience points, received 9+ likes

You can use pd.Series.isin.

For "IN": something.isin(somewhere) (read: is each element of something in somewhere?)

Or for "NOT IN": ~something.isin(somewhere)

As an example:

>>> df
  countries
0        US
1        UK
2   Germany
3     China
>>> countries
['UK', 'China']
>>> df.countries.isin(countries)
0    False
1     True
2    False
3     True
Name: countries, dtype: bool
>>> df[df.countries.isin(countries)]
  countries
1        UK
3     China
>>> df[~df.countries.isin(countries)]
  countries
0        US
2   Germany
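
For anyone who wants to paste and run this, here is a minimal self-contained sketch of the same example. The names mask, in_rows and not_in_rows are my own and do not come from the answer.

```python
import pandas as pd

df = pd.DataFrame({'countries': ['US', 'UK', 'Germany', 'China']})
countries = ['UK', 'China']

# Boolean mask: True where the row's value appears in the list.
mask = df['countries'].isin(countries)

in_rows = df[mask]        # SQL-style IN
not_in_rows = df[~mask]   # SQL-style NOT IN

print(in_rows['countries'].tolist())      # ['UK', 'China']
print(not_in_rows['countries'].tolist())  # ['US', 'Germany']
```

Computing the mask once and negating it with ~ avoids evaluating isin twice.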

2019-05-24

摇曳的蔷薇

Contributed 1793 experience points, received 6+ likes

An alternative solution using the .query() method:

In [5]: df.query("countries in @countries")
Out[5]:
  countries
1        UK
3     China

In [6]: df.query("countries not in @countries")
Out[6]:
  countries
0        US
2   Germany
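
A runnable sketch of the same idea, in case the session above is hard to reproduce. It assumes the question's df and countries, and the cross-check against isin is my own addition. Inside the query string, the bare name countries refers to the column, while @countries refers to the local Python variable.

```python
import pandas as pd

df = pd.DataFrame({'countries': ['US', 'UK', 'Germany', 'China']})
countries = ['UK', 'China']

# "countries" is the column; "@countries" is the local list above.
in_rows = df.query("countries in @countries")
not_in_rows = df.query("countries not in @countries")

# Cross-check: both results should match the isin-based filters.
assert in_rows.equals(df[df['countries'].isin(countries)])
assert not_in_rows.equals(df[~df['countries'].isin(countries)])
```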

2019-05-24

慕婉清6462132

Contributed 1804 experience points, received 2+ likes

I have been using a generic row-wise filter like this:

criterion = lambda row: row['countries'] not in countries

not_in = df[df.apply(criterion, axis=1)]
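
To check that this row-wise filter selects the same rows as the vectorized form, here is a small sketch that assumes the question's df and countries. The names not_in_apply and not_in_isin are hypothetical. Since apply with axis=1 calls a Python function once per row, I would expect it to be noticeably slower than isin on large frames, though I have not measured it here.

```python
import pandas as pd

df = pd.DataFrame({'countries': ['US', 'UK', 'Germany', 'China']})
countries = ['UK', 'China']

# Row-wise filter: the lambda runs once for every row.
criterion = lambda row: row['countries'] not in countries
not_in_apply = df[df.apply(criterion, axis=1)]

# Vectorized filter on the single column.
not_in_isin = df[~df['countries'].isin(countries)]

# Both approaches keep the same rows.
assert not_in_apply.equals(not_in_isin)
print(not_in_apply['countries'].tolist())  # ['US', 'Germany']
```

The row-wise version is still handy when the condition depends on several columns at once and cannot easily be written as a mask.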

2019-05-24

慕哥6287543

Contributed 1831 experience points, received 10+ likes

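How do you implement in and not in for a pandas DataFrame?

pandas provides two methods, Series.isin and DataFrame.isin, for Series and DataFrames respectively. Here is a mapping from the Python operators in the title to their equivalent pandas operations.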

╒════════╤══════════════════════╤══════════════════════╕
│        │ Python               │ Pandas               │
╞════════╪══════════════════════╪══════════════════════╡
│ in     │ item in sequence     │ sequence.isin(item)  │
├────────┼──────────────────────┼──────────────────────┤
│ not in │ item not in sequence │ ~sequence.isin(item) │
╘════════╧══════════════════════╧══════════════════════╛

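To implement "not in", you must invert the result of isin.

Also note that in the pandas case, "sequence" can refer to either a Series or a DataFrame, while "item" can itself be an iterable (more on this shortly).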
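
One practical consequence, sketched here with a hypothetical Series s: as far as I know, isin only accepts list-like arguments, so even a single value has to be wrapped in a list. A bare string is rejected with a TypeError rather than being treated as one item.

```python
import pandas as pd

s = pd.Series(['US', 'UK', 'Germany'])

# A single value must be wrapped in a list-like.
single = s.isin(['UK'])
print(single.tolist())  # [False, True, False]

# A bare string is rejected instead of being treated as one item.
raised = False
try:
    s.isin('UK')
except TypeError as exc:
    raised = True
    print('TypeError:', exc)
```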

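Filtering a DataFrame based on ONE column (also applies to Series)

The most common scenario is applying an isin condition on a specific column to filter the rows of a DataFrame.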

import numpy as np
import pandas as pd

df = pd.DataFrame({'countries': ['US', 'UK', 'Germany', np.nan, 'China']})
df

  countries
0        US
1        UK
2   Germany
3       NaN
4     China

c1 = ['UK', 'China']             # list
c2 = {'Germany'}                 # set
c3 = pd.Series(['China', 'US'])  # Series
c4 = np.array(['US', 'UK'])      # array

Series.isin accepts various types as input. The following are all valid ways of getting what you want:

df['countries'].isin(c1)

0    False
1     True
2    False
3    False
4     True
Name: countries, dtype: bool

# `in` operation
df[df['countries'].isin(c1)]

  countries
1        UK
4     China

# `not in` operation
df[~df['countries'].isin(c1)]

  countries
0        US
2   Germany
3       NaN
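
Note that the NaN row survives the `not in` filter, because NaN is not a member of c1. In SQL, by contrast, NULL NOT IN (...) does not evaluate to true, so such a row would be filtered out. If that behaviour is wanted, one option is to add a notna() condition. This is a sketch under that assumption, and the names col, not_in and not_in_no_nan are mine.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'countries': ['US', 'UK', 'Germany', np.nan, 'China']})
c1 = ['UK', 'China']

col = df['countries']

# Plain "not in": the NaN row is kept, since NaN is not in c1.
not_in = df[~col.isin(c1)]

# Assumed variant: additionally require a non-missing value.
not_in_no_nan = df[~col.isin(c1) & col.notna()]

print(not_in.index.tolist())         # [0, 2, 3]
print(not_in_no_nan.index.tolist())  # [0, 2]
```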

# Filter with `set` (tuples work too)
df[df['countries'].isin(c2)]

  countries
2   Germany

# Filter with another Series
df[df['countries'].isin(c3)]

  countries
0        US
4     China

# Filter with array
df[df['countries'].isin(c4)]

  countries
0        US
1        UK

Filtering on multiple columns

Sometimes you will want to apply an "in" membership check with some search terms over multiple columns:

df2 = pd.DataFrame({
    'A': ['x', 'y', 'z', 'q'], 'B': ['w', 'a', np.nan, 'x'], 'C': np.arange(4)})
df2

   A    B  C
0  x    w  0
1  y    a  1
2  z  NaN  2
3  q    x  3

c1 = ['x', 'w', 'p']

To apply the isin condition to both columns "A" and "B", use DataFrame.isin:

df2[['A', 'B']].isin(c1)

       A      B
0   True   True
1  False  False
2  False  False
3  False   True

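From this, to retain the rows where at least one column is True, we can use any along axis 1 (that is, across the columns of each row):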

df2[['A', 'B']].isin(c1).any(axis=1)

0     True
1    False
2    False
3     True
dtype: bool

df2[df2[['A', 'B']].isin(c1).any(axis=1)]

   A  B  C
0  x  w  0
3  q  x  3

Note that if you want to search every column, you can simply omit the column-selection step:

df2.isin(c1).any(axis=1)

Similarly, to retain the rows where ALL columns are True, use all in the same way as before.

df2[df2[['A', 'B']].isin(c1).all(axis=1)]

   A  B  C
0  x  w  0
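
The examples above check every selected column against the same list. DataFrame.isin also accepts a dict keyed by column name, which lets each column have its own search terms. Here is a minimal sketch of that. The terms dict and its values are made up for illustration and are not part of the answer.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({
    'A': ['x', 'y', 'z', 'q'],
    'B': ['w', 'a', np.nan, 'x'],
    'C': np.arange(4),
})

# Hypothetical per-column search terms.
terms = {'A': ['x', 'q'], 'B': ['x']}

# Each column is checked only against its own list.
mask = df2[['A', 'B']].isin(terms)

any_rows = df2[mask.any(axis=1)]  # at least one column matches
all_rows = df2[mask.all(axis=1)]  # every column matches

print(any_rows.index.tolist())  # [0, 3]
print(all_rows.index.tolist())  # [3]
```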

2019-05-24
