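I have the following dataframe in pandas:

id  source
1   AS
2   AS
3   AS
4   AT
5   BR
6   BT
7   BR
8   BT
9   AS
10  BE

What I want to do is recode any source that appears fewer than 3 times as OTHERS. I have 1 million rows containing more than 10K unique sources. How can I do this in pandas?

The desired dataframe is:

id  source
1   AS
2   AS
3   AS
4   OTHERS
5   OTHERS
6   OTHERS
7   OTHERS
8   OTHERS
9   AS
10  OTHERS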
1 Answer

眼眸繁星
Contributed 1873 experience points, received 9+ likes
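Try this:
# Count the rows in each source group, flag rows whose group has fewer than 3, and overwrite their source
df.loc[df.groupby('source')['id'].transform('count').lt(3), 'source'] = 'OTHERS'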
   id  source
0   1      AS
1   2      AS
2   3      AS
3   4  OTHERS
4   5  OTHERS
5   6  OTHERS
6   7  OTHERS
7   8  OTHERS
8   9      AS
9  10  OTHERS
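For a frame with a million rows and many unique sources, an alternative that avoids groupby is to count each source once with value_counts and look the count up per row. The following is a minimal sketch of that idea, not the answerer's method; the MIN_COUNT name and the rebuilt sample frame are assumptions made for illustration.

```python
import pandas as pd

# Sample data rebuilt from the question (illustrative only).
df = pd.DataFrame({
    "id": range(1, 11),
    "source": ["AS", "AS", "AS", "AT", "BR", "BT", "BR", "BT", "AS", "BE"],
})

MIN_COUNT = 3  # hypothetical name for the threshold stated in the question

# Count occurrences once per unique source.
counts = df["source"].value_counts()

# Map each row's source to its count and flag the rare ones.
rare = df["source"].map(counts) < MIN_COUNT

# Replace flagged values with OTHERS and keep the rest unchanged.
df["source"] = df["source"].mask(rare, "OTHERS")

print(df)
```

Both approaches should give the same result on this sample; which one is faster on real data is worth measuring rather than assuming.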