2 Answers

Contributor with 1829 experience points, more than 9 upvotes
Using groupby() with np.where()
Consider this sample:
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({'id': [1, 2, 3, 4, 5], 'tag': ['a', 'a', 'a', 'd', 'e']})
>>> df
id tag
0 1 a
1 2 a
2 3 a
3 4 d
4 5 e
>>> df['counter'] = df.groupby(['tag'])['tag'].transform('count')
>>> df
id tag counter
0 1 a 3
1 2 a 3
2 3 a 3
3 4 d 1
4 5 e 1
>>> df['counter'] = np.where(df['counter'] > 2, 'Retain', 'Remove')
>>> df
id tag counter
0 1 a Retain
1 2 a Retain
2 3 a Retain
3 4 d Remove
4 5 e Remove
>>> df = df[df['counter'].isin(['Retain'])]
>>> df
id tag counter
0 1 a Retain
1 2 a Retain
2 3 a Retain
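If the intermediate 'Retain'/'Remove' labels are not needed, the same steps can probably be collapsed into a single boolean mask. The following is a minimal sketch under that assumption, reusing the sample data above; the names `group_size`, `kept`, and `kept_via_filter` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 4, 5], 'tag': ['a', 'a', 'a', 'd', 'e']})

# Size of each row's tag group, aligned to the original index
group_size = df.groupby('tag')['tag'].transform('count')

# Keep only the rows whose tag occurs more than twice
kept = df[group_size > 2]

# groupby().filter() drops whole groups that fail the predicate
kept_via_filter = df.groupby('tag').filter(lambda g: len(g) > 2)

print(kept)
```

The mask version avoids adding a helper column, while `filter()` states the intent more directly at the cost of calling a Python function once per group.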

Contributor with 1895 experience points, more than 3 upvotes
Add a column that flags the values to keep, then filter on it:
# Make a boolean series as a mapping of values with more than 2 counts
more_than_2_values = df1.b.value_counts() > 2
# Add a new column that indicates which values should be kept
df1["more_than_2"] = df1["b"].map(more_than_2_values).fillna(False)
# Filter the data, drop the label column if desired
desired_result = df1[df1["more_than_2"]].drop(columns="more_than_2")
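
The snippet above assumes an existing `df1` with a column `b`. As a self-contained sketch of the same idea, the example below builds a small hypothetical `df1` (the data and the extra column `a` are made up for illustration) and filters with `isin()` instead of a helper column.

```python
import pandas as pd

# Hypothetical sample data; only the column name "b" comes from the answer above
df1 = pd.DataFrame({"a": [10, 20, 30, 40, 50],
                    "b": ["x", "x", "x", "y", "z"]})

# Count how often each value of "b" occurs
counts = df1["b"].value_counts()

# Values of "b" that occur more than twice
frequent_values = counts[counts > 2].index

# Keep only the rows whose "b" value is one of the frequent values
desired_result = df1[df1["b"].isin(frequent_values)]

print(desired_result)
```

Because no label column is added, there is nothing to drop afterwards.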