
Remapping and regrouping values in Python pandas

Python

德玛西亚99 2021-04-29 02:06:10

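I have a data frame in which values have been assigned to groups:

import pandas as pd

df = pd.DataFrame({
    'num': [0.43, 5.2, 1.3, 0.33, .74, .5, .2, .12],
    'group': [1, 2, 2, 2, 3, 4, 5, 5]
})

df

   group   num
0      1  0.43
1      2  5.20
2      2  1.30
3      2  0.33
4      3  0.74
5      4  0.50
6      5  0.20
7      5  0.12

I want to make sure that no value is alone in a group. If a value is an "orphan", it should be reassigned to the next highest group that has more than one member. The resulting data frame should therefore look like this:

   group   num
0      2  0.43
1      2  5.20
2      2  1.30
3      2  0.33
4      5  0.74
5      5  0.50
6      5  0.20
7      5  0.12

What is the most efficient way to achieve this result?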


2 Answers

当年话下

Contributed 1890 experience points, received more than 9 likes

You can do this using only vectorized operations. Use pd.Series.bfill to create a mapping from the original group labels to the new ones:

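# Count the rows per group; naming the columns explicitly keeps this working across pandas versions
# ('index' holds the group label, 'group' holds the group size)
counts = df['group'].value_counts().sort_index().rename_axis('index').reset_index(name='group')
# Keep the original labels as the lookup key
counts['original'] = counts['index']
# Blank out single-member groups, then back-fill from the next higher label
counts['index'] = counts['index'].where(counts['group'] > 1).bfill().astype(int)
print(counts)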

   index  group  original
0      2      1         1
1      2      3         2
2      5      1         3
3      5      1         4
4      5      2         5

Then use pd.Series.map to perform the mapping:

df['group'] = df['group'].map(counts.set_index('original')['index'])

print(df)

   group   num
0      2  0.43
1      2  5.20
2      2  1.30
3      2  0.33
4      5  0.74
5      5  0.50
6      5  0.20
7      5  0.12
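As a rough sketch, the two steps above (build a back-filled lookup, then map it) can be wrapped in a single function. The name `remap_orphans` is hypothetical, and leaving an orphan unchanged when no multi-member group sits above it is an assumption, since the question does not define that case.

```python
import pandas as pd

def remap_orphans(groups: pd.Series) -> pd.Series:
    """Move single-member groups into the next higher multi-member group."""
    # Rows per group label, ordered by label
    sizes = groups.value_counts().sort_index()
    labels = pd.Series(sizes.index, index=sizes.index)
    # Blank out orphan labels, then back-fill from the next higher kept label
    mapping = labels.where(sizes > 1).bfill()
    # Assumption: an orphan with no multi-member group above it keeps its label
    mapping = mapping.fillna(labels).astype(groups.dtype)
    return groups.map(mapping)

df = pd.DataFrame({
    'num': [0.43, 5.2, 1.3, 0.33, .74, .5, .2, .12],
    'group': [1, 2, 2, 2, 3, 4, 5, 5],
})
df['group'] = remap_orphans(df['group'])
print(df['group'].tolist())  # [2, 2, 2, 2, 5, 5, 5, 5]
```

Building the lookup as a Series indexed by the original label means `map` needs no intermediate `reset_index`, which also avoids depending on the column names that `value_counts` produces.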

Answered 2021-05-11

慕田峪7331174

Contributed 1828 experience points, received more than 13 likes

Here is one solution I found; there may be better ways to do this...

import bisect

# Find the orphans (groups with exactly one member)
count = df.group.value_counts().sort_index()
orphans = count[count == 1].index.values.tolist()

# Find the sets (groups with more than one member), sorted by label
sets = count[count > 1].index.values.tolist()

# Find where orphans should be remapped
# (raises IndexError if an orphan is higher than every set)
where = [bisect.bisect(sets, x) for x in orphans]
remap = [sets[x] for x in where]

# Create a dictionary for remapping, and replace original values
change = dict(zip(orphans, remap))
df = df.replace({'group': change})

   group   num
0      2  0.43
1      2  5.20
2      2  1.30
3      2  0.33
4      5  0.74
5      5  0.50
6      5  0.20
7      5  0.12
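A possible variant of the same idea, sketched with `np.searchsorted` in place of the `bisect` loop. The function name `remap_orphans_sorted` is hypothetical. The guard for orphans that sit above every multi-member group is an assumption: the question does not say what should happen there, so this sketch leaves them unchanged.

```python
import numpy as np
import pandas as pd

def remap_orphans_sorted(groups: pd.Series) -> pd.Series:
    """Remap single-member groups using a sorted lookup of multi-member labels."""
    sizes = groups.value_counts().sort_index()
    orphans = sizes[sizes == 1].index.to_numpy()
    sets = sizes[sizes > 1].index.to_numpy()
    # Position of the first multi-member label greater than each orphan
    pos = np.searchsorted(sets, orphans, side='right')
    # Assumption: orphans above every multi-member group are left as they are
    ok = pos < len(sets)
    change = dict(zip(orphans[ok].tolist(), sets[pos[ok]].tolist()))
    # Labels missing from the dictionary map to themselves
    return groups.map(lambda g: change.get(g, g))

groups = pd.Series([1, 2, 2, 2, 3, 4, 5, 5])
result = remap_orphans_sorted(groups)
print(result.tolist())  # [2, 2, 2, 2, 5, 5, 5, 5]
```

With `side='right'`, `np.searchsorted` returns the same insertion points as `bisect.bisect`, so the remapping dictionary matches the loop-based version whenever every orphan has a larger multi-member group above it.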

Answered 2021-05-11
