2 Answers
Answerer: 1825 experience points contributed, more than 4 likes received
This worked for me:
import pandas as pd

# First: group df by child id (get_group below works the same across pandas versions)
grouped = df_input.groupby('id_child')
# Second: create a new output dataframe
OUTPUT = pd.DataFrame(columns=['id_parent', 'id_child'])
# Third: fill it with the unique child ids and, if a child has more than one parent, the minimum parent id
for i, id_ch in enumerate(df_input.id_child.unique()):
    OUTPUT.loc[i] = [grouped.get_group(id_ch).id_parent.min(), id_ch]
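If the goal is one row per child with its smallest parent id, the row-by-row loop can probably be replaced by a single grouped aggregation. This is a minimal sketch, assuming the `id_parent`/`id_child` column names from the question and reusing the sample rows printed in the other answer; `sort=False` keeps the children in order of first appearance, matching what `unique()` returns.

```python
import pandas as pd

# Sample data: the same rows as the frame printed in the drop_duplicates answer
df_input = pd.DataFrame({
    'id_parent': [1100, 1100, 1100, 1090, 1090, 1080],
    'id_child':  [1090, 1080, 1070, 1080, 1070, 1070],
})

# For each child keep the smallest parent id; as_index=False returns a plain
# DataFrame with a fresh 0..n-1 index, then the columns are reordered
OUTPUT = (
    df_input.groupby('id_child', sort=False, as_index=False)['id_parent']
    .min()[['id_parent', 'id_child']]
)
print(OUTPUT)
```

On this data it produces the rows (1100, 1090), (1090, 1080) and (1080, 1070), and unlike filling an empty frame with `.loc`, the columns keep an integer dtype.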
Answerer: 1820 experience points contributed, more than 9 likes received
I can get the result using drop_duplicates:
In [6]: df
Out[6]:
id_parent id_child
0 1100 1090
1 1100 1080
2 1100 1070
3 1090 1080
4 1090 1070
5 1080 1070
In [9]: df.drop_duplicates(subset=['id_parent']).reset_index(drop=True)
Out[9]:
id_parent id_child
0 1100 1090
1 1090 1080
2 1080 1070
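A self-contained version of the session above, for rerunning it outside IPython. The `pd.DataFrame` constructor call is an assumption, since the answer only shows the printed frame. Note that `drop_duplicates` keeps the first row it sees for each `id_parent` (its default is `keep='first'`), so the result depends on the row order of the input.

```python
import pandas as pd

# Rebuild the frame printed as Out[6] above (constructor call is assumed)
df = pd.DataFrame({
    'id_parent': [1100, 1100, 1100, 1090, 1090, 1080],
    'id_child':  [1090, 1080, 1070, 1080, 1070, 1070],
})

# Keep the first row for each parent id, then renumber the rows 0..n-1
result = df.drop_duplicates(subset=['id_parent'], keep='first').reset_index(drop=True)
print(result)
```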