Keep only direct parent-child ID pairs in a DataFrame

I have the following DataFrame:

   id_parent  id_child
0       1100      1090
1       1100      1080
2       1100      1070
3       1100      1060
4       1090      1080
5       1090      1070
6       1080      1070

I only want to keep the direct parent-child connections. Example: 1100 has connections to 1090, 1080, and 1070, but of these only 1090 should be kept, because 1080 and 1070 are already children of 1090. This example df contains only 1 sample; the real df consists of multiple parent/child clusters.

The output should therefore look like this:

   id_parent  id_child
0       1100      1090
1       1090      1080
2       1080      1070
3       1100      1060

Sample code:

import pandas as pd

# create sample input
df_input = pd.DataFrame.from_dict({'id_parent': {0: 1100, 1: 1100, 2: 1100, 3: 1100, 4: 1090, 5: 1090, 6: 1080}, 'id_child': {0: 1090, 1: 1080, 2: 1070, 3: 1060, 4: 1080, 5: 1070, 6: 1070}})

# create sample output
df_output = pd.DataFrame.from_dict({'id_parent': {0: 1100, 1: 1090, 2: 1080, 3: 1100}, 'id_child': {0: 1090, 1: 1080, 2: 1070, 3: 1060}})

My current approach is based on this question: Creating dictionary of parent child pairs in pandas dataframe. But maybe there is a simple, clean way to solve this without relying on additional non-standard libraries?


2 Answers

凤凰求蛊

Contributed 1825 experience points, received over 4 upvotes

This works for me:

# First: group df by child id
grouped = df_input.groupby(['id_child'], as_index=True).apply(lambda a: a[:])

# Second: create a new output dataframe
OUTPUT = pd.DataFrame(columns=['id_parent', 'id_child'])

# Third: fill it with the unique child ids and, where a child has more than one parent, the minimum parent id
for i, id_ch in enumerate(df_input.id_child.unique()):
    OUTPUT.loc[i] = [min(grouped.loc[id_ch].id_parent), id_ch]
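
The loop above can also be written as a single grouped aggregation. The following is only a minimal sketch. It assumes, as the answer implicitly does, that a child's direct parent is the smallest ID among all of its listed parents, which holds in the sample because IDs decrease down the hierarchy. The helper name `keep_min_parent` is hypothetical.

```python
import pandas as pd

# Sample input from the question
df_input = pd.DataFrame({
    'id_parent': [1100, 1100, 1100, 1100, 1090, 1090, 1080],
    'id_child':  [1090, 1080, 1070, 1060, 1080, 1070, 1070],
})


def keep_min_parent(df):
    """Hypothetical helper: for each child, keep the smallest parent ID.

    Assumption: the direct parent is the smallest ID among a child's parents.
    """
    out = (
        df.groupby('id_child', sort=False)['id_parent']  # keep first-seen child order
          .min()
          .reset_index()
    )
    # Restore the original column order
    return out[['id_parent', 'id_child']]


result = keep_min_parent(df_input)
print(result)
```

If the IDs in the real data do not follow this ordering, the minimum is not guaranteed to be the direct parent, and this shortcut should not be used.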

2023-03-16

慕妹3146593

Contributed 1820 experience points, received over 9 upvotes

I can get the result using drop_duplicates:

In [6]: df
Out[6]:
   id_parent  id_child
0       1100      1090
1       1100      1080
2       1100      1070
3       1090      1080
4       1090      1070
5       1080      1070

In [9]: df.drop_duplicates(subset=['id_parent']).reset_index(drop=True)
Out[9]:
   id_parent  id_child
0       1100      1090
1       1090      1080
2       1080      1070
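
Note that `drop_duplicates(subset=['id_parent'])` keeps only the first row for each parent, so the result depends on row order, and the 1100 → 1060 pair from the question's input would be dropped. A sketch that depends on neither row order nor ID ordering is shown below. It assumes the pairs list every ancestor-descendant connection of an acyclic hierarchy, as in the sample, and removes a pair (p, c) whenever some other child of p also has c as a child. The function name `keep_direct_pairs` and the intermediate column names `mid` and `grandchild` are hypothetical.

```python
import pandas as pd

# Sample input from the question
df_input = pd.DataFrame({
    'id_parent': [1100, 1100, 1100, 1100, 1090, 1090, 1080],
    'id_child':  [1090, 1080, 1070, 1060, 1080, 1070, 1070],
})


def keep_direct_pairs(df):
    """Hypothetical helper: drop pairs that can be reached via an intermediate node.

    Assumption: df lists all ancestor-descendant pairs, so any indirect
    pair shows up as a two-step path parent -> mid -> grandchild.
    """
    # Self-join to find all two-step paths parent -> mid -> grandchild
    hops = df.rename(columns={'id_parent': 'mid', 'id_child': 'grandchild'})
    paths = df.merge(hops, left_on='id_child', right_on='mid')

    # Every (parent, grandchild) pair found this way is an indirect connection
    shortcuts = (
        paths[['id_parent', 'grandchild']]
        .rename(columns={'grandchild': 'id_child'})
        .drop_duplicates()
    )

    # Anti-join: keep only the pairs that are not indirect connections
    flagged = df.merge(shortcuts, on=['id_parent', 'id_child'],
                       how='left', indicator=True)
    direct = flagged.loc[flagged['_merge'] == 'left_only',
                         ['id_parent', 'id_child']]
    return direct.reset_index(drop=True)


result = keep_direct_pairs(df_input)
print(result)
```

This uses only pandas. The rows come back in input order rather than in the order of the question's sample output, so sort them afterwards if a particular order matters.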

2023-03-16
