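Solution if the column Chef_is_masterchef contains only the values 0 and 1:
First count the values per group with crosstab. Then use DataFrame.ge to check whether each count is greater than or equal to N, and DataFrame.all to require that for both columns. Finally, pass the index values that match both conditions to Series.isin: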
import pandas as pd
N = 3
# count 0s and 1s per hotel; True only if both counts are >= N
s = pd.crosstab(df['Hotel_id'], df['Chef_is_masterchef']).ge(N).all(1)
df1 = df[df['Hotel_id'].isin(s.index[s])]
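A minimal runnable sketch of the crosstab approach on a small hypothetical DataFrame. Only the column names come from the question; the hotel ids 1 and 2 and their values are made up for illustration. Hotel 1 has three 0s and three 1s, while hotel 2 has only one 1, so with N = 3 only hotel 1 should survive:

```python
import pandas as pd

# Hypothetical toy data: hotel 1 -> three 0s, three 1s; hotel 2 -> four 0s, one 1
df = pd.DataFrame({
    'Hotel_id': [1] * 6 + [2] * 5,
    'Chef_is_masterchef': [0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1],
})

N = 3
# Rows are Hotel_id, columns are the values 0 and 1, cells are row counts
counts = pd.crosstab(df['Hotel_id'], df['Chef_is_masterchef'])
print(counts)

# True only for hotels where both the count of 0s and the count of 1s are >= N
s = counts.ge(N).all(axis=1)
df1 = df[df['Hotel_id'].isin(s.index[s])]
print(df1['Hotel_id'].unique())
```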
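Alternatively, create a new column named zeros that tests for 0 values, then use GroupBy.transform with sum to count the 0 and 1 values per group. The result is a mask that is used for boolean indexing: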
N = 3
mask = (df.assign(zeros=df['Chef_is_masterchef'].eq(0))
          .groupby('Hotel_id')[['Chef_is_masterchef', 'zeros']]
          .transform('sum')
          .ge(N)
          .all(axis=1))
df1 = df[mask]
print(df1)
Hotel_id Month_Year Chef_Id Chef_is_masterchef
0 2400614 May-2015 2297544 0
1 2400614 June-2015 2297544 0
2 2400614 July-2015 2297544 0
3 2400614 August-2015 2297544 0
4 2400614 September-2015 2297544 0
5 2400614 October-2015 2297544 0
6 2400614 November-2015 2297544 0
7 2400614 December-2015 2297544 0
8 2400614 January-2016 2297544 1
9 2400614 February-2016 2297544 1
10 2400614 March-2016 2297544 1
16 2400133 February-2016 4597531 0
17 2400133 March-2016 4597531 0
18 2400133 April-2016 4597531 0
19 2400133 May-2016 4597531 0
20 2400133 June-2016 4597531 0
21 2400133 July-2016 4597531 0
22 2400133 August-2016 4597531 1
23 2400133 September-2016 4597531 1
24 2400133 October-2016 4597531 1
25 2400133 November-2016 4597531 1
26 2400133 December-2016 4597531 1
27 2400133 January-2017 4597531 1
28 2400133 February-2017 4597531 1
29 2400133 March-2017 4597531 1
30 2400133 April-2017 4597531 1
31 2400133 May-2017 4597531 1
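The sketch below, again on hypothetical toy data, prints the intermediate per-row counts produced by transform('sum'). Because transform broadcasts each group's totals back onto every row, the result keeps the index of df and can be used directly as a mask. It also checks that this version selects the same rows as the crosstab version:

```python
import pandas as pd

# Hypothetical toy data (column names from the question, values made up)
df = pd.DataFrame({
    'Hotel_id': [1] * 6 + [2] * 5,
    'Chef_is_masterchef': [0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1],
})

N = 3
# Summing the 0/1 column counts the 1s; summing the boolean 'zeros' column counts the 0s.
# transform('sum') repeats each group's totals on every row of that group.
counts = (df.assign(zeros=df['Chef_is_masterchef'].eq(0))
            .groupby('Hotel_id')[['Chef_is_masterchef', 'zeros']]
            .transform('sum'))
print(counts)

mask = counts.ge(N).all(axis=1)
df1 = df[mask]

# Cross-check against the crosstab version: both should keep the same rows
s = pd.crosstab(df['Hotel_id'], df['Chef_is_masterchef']).ge(N).all(axis=1)
df1_crosstab = df[df['Hotel_id'].isin(s.index[s])]
print(df1.equals(df1_crosstab))
```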
With a larger threshold, N = 6, only hotel 2400133 is left:
N = 6
mask = (df.assign(zeros=df['Chef_is_masterchef'].eq(0))
.groupby('Hotel_id')[['Chef_is_masterchef', 'zeros']]
.transform('sum')
.ge(N)
.all(axis=1))
df2 = df[mask]
print(df2)
Hotel_id Month_Year Chef_Id Chef_is_masterchef
16 2400133 February-2016 4597531 0
17 2400133 March-2016 4597531 0
18 2400133 April-2016 4597531 0
19 2400133 May-2016 4597531 0
20 2400133 June-2016 4597531 0
21 2400133 July-2016 4597531 0
22 2400133 August-2016 4597531 1
23 2400133 September-2016 4597531 1
24 2400133 October-2016 4597531 1
25 2400133 November-2016 4597531 1
26 2400133 December-2016 4597531 1
27 2400133 January-2017 4597531 1
28 2400133 February-2017 4597531 1
29 2400133 March-2017 4597531 1
30 2400133 April-2017 4597531 1
31 2400133 May-2017 4597531 1
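Because the mask keeps or drops whole groups, the rows printed for N = 3 are all the rows of those two hotels, so their counts can be checked directly. The sketch below rebuilds only the Hotel_id and Chef_is_masterchef columns of those printed rows (other hotels in the full df do not appear in the output and are left out) and shows why raising N to 6 drops hotel 2400614: it has 8 zeros but only 3 ones.

```python
import pandas as pd

# Rebuilt from the rows printed above for N = 3 (whole groups, so counts are exact);
# hotels not shown in that output are omitted.
df = pd.DataFrame({
    'Hotel_id': [2400614] * 11 + [2400133] * 16,
    'Chef_is_masterchef': [0] * 8 + [1] * 3 + [0] * 6 + [1] * 10,
})

# Per-hotel counts: rows are Hotel_id, columns are the values 0 and 1
counts = pd.crosstab(df['Hotel_id'], df['Chef_is_masterchef'])
print(counts)

# Hotels that pass the filter for each threshold
kept = {N: counts.index[counts.ge(N).all(axis=1)].tolist() for N in (3, 6)}
print(kept)
```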
If the column can also contain values other than 0 and 1, count the 0 and 1 values explicitly:
N = 3
mask = (df.assign(zeros=df['Chef_is_masterchef'].eq(0),
                  ones=df['Chef_is_masterchef'].eq(1))
.groupby('Hotel_id')[['ones', 'zeros']]
.transform('sum')
.ge(N)
.all(axis=1))
df1 = df[mask]
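To see why the explicit ones/zeros version is needed, here is a sketch on hypothetical data where one row contains the value 2. In the crosstab version every distinct value becomes its own column that must also reach N, so both hotels are dropped (hotel 2 has no rows with the value 2, which crosstab fills in as a count of 0). The ones/zeros version simply ignores the extra value:

```python
import pandas as pd

# Hypothetical data: both hotels have three 0s and three 1s; hotel 1 also has one 2
df = pd.DataFrame({
    'Hotel_id': [1] * 7 + [2] * 6,
    'Chef_is_masterchef': [0, 0, 0, 1, 1, 1, 2, 0, 0, 0, 1, 1, 1],
})
N = 3

# crosstab version: the extra column for value 2 must also be >= N
s = pd.crosstab(df['Hotel_id'], df['Chef_is_masterchef']).ge(N).all(axis=1)
via_crosstab = s.index[s].tolist()

# explicit version: only the counts of 0s and 1s are checked
mask = (df.assign(zeros=df['Chef_is_masterchef'].eq(0),
                  ones=df['Chef_is_masterchef'].eq(1))
          .groupby('Hotel_id')[['ones', 'zeros']]
          .transform('sum')
          .ge(N)
          .all(axis=1))
via_mask = df.loc[mask, 'Hotel_id'].unique().tolist()
print(via_crosstab, via_mask)
```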