2 Answers

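Contributor with 1827 experience points · 9+ upvotes
Consider a chained merge with reduce, using the suffixes argument of merge to handle the duplicate column names and then dropping the intermediate child columns: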
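import pandas as pd
from functools import reduce

def proc_build(x, y):
    # Look up each row's parent in the child column of the same frame.
    # suffixes=('_', '') tags the left-hand copies of the clashing names.
    temp = (pd.merge(x, y, left_on='parents', right_on='child',
                     how='left', suffixes=('_', ''))
              .fillna('-'))
    return temp

# Note: the chained merges repeat the 'child_'/'parents_' labels, which
# newer pandas releases may reject with a MergeError.
final_df = (reduce(proc_build, [df, df, df, df])
            # inplace= is omitted: set_axis returns a new frame by default
            .set_axis(['child', 'parents',
                       'child1', 'A',
                       'child2', 'B',
                       'child3', 'C'], axis='columns')
            .reindex(['child', 'parents'] + list('ABC'), axis='columns')
           )
print(final_df)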
#        child   parents         A       B       C
# 0        Joe  Steffani      Dani   Selma   Kevin
# 1        Joe  Steffani      Dani    John       -
# 2       Anna       Bob     Selma   Kevin  Robert
# 3       Anna  Steffani      Dani   Selma   Kevin
# 4       Anna  Steffani      Dani    John       -
# 5   Steffani      Dani     Selma   Kevin  Robert
# 6   Steffani      Dani      John       -       -
# 7        Bob     Selma     Kevin  Robert       -
# 8        Rea      Anna       Bob   Selma   Kevin
# 9        Rea      Anna  Steffani    Dani   Selma
# 10       Rea      Anna  Steffani    Dani    John
# 11      Dani     Selma     Kevin  Robert       -
# 12      Dani      John         -       -       -
# 13     Selma     Kevin    Robert       -       -
# 14      John         -         -       -       -
# 15     Kevin    Robert         -       -       -
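To see why the result is relabelled by position with set_axis, it may help to look at what a single merge step does to the column names. The snippet below is only a minimal sketch: the two-row frame `toy` is made-up illustration data, not the frame from the question.

```python
import pandas as pd

# Hypothetical toy data, used only to show the column labels.
toy = pd.DataFrame({"child": ["a", "b"], "parents": ["b", "-"]})

# Same call shape as proc_build: both frames share the names
# 'child' and 'parents', so the left-hand copies get the '_' suffix
# and the right-hand copies keep their names (empty suffix).
merged = pd.merge(toy, toy, left_on="parents", right_on="child",
                  how="left", suffixes=("_", ""))

print(list(merged.columns))
# ['child_', 'parents_', 'child', 'parents']
```

Each further merge adds two more columns in the same fashion, which is why three merges yield eight columns and why the labels passed to set_axis come in child/letter pairs.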
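To extend to another column, for example D, add another df to the iterable argument of reduce, together with the additional list items in set_axis and reindex, specifically ['child4', 'D'] and list('ABCD'). There are ways to make these items dynamic, but reduce can get expensive, so it is better handled with some declarative emphasis.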
final_df = (reduce(proc_build, [df] * 5)
            # inplace= is omitted: set_axis returns a new frame by default
            .set_axis(['child', 'parents',
                       'child1', 'A',
                       'child2', 'B',
                       'child3', 'C',
                       'child4', 'D'], axis='columns')
            .reindex(['child', 'parents'] + list('ABCD'), axis='columns')
           )
print(final_df)
#        child   parents         A       B       C       D
# 0        Joe  Steffani      Dani   Selma   Kevin  Robert
# 1        Joe  Steffani      Dani    John       -       -
# 2       Anna       Bob     Selma   Kevin  Robert       -
# 3       Anna  Steffani      Dani   Selma   Kevin  Robert
# 4       Anna  Steffani      Dani    John       -       -
# 5   Steffani      Dani     Selma   Kevin  Robert       -
# 6   Steffani      Dani      John       -       -       -
# 7        Bob     Selma     Kevin  Robert       -       -
# 8        Rea      Anna       Bob   Selma   Kevin  Robert
# 9        Rea      Anna  Steffani    Dani   Selma   Kevin
# 10       Rea      Anna  Steffani    Dani    John       -
# 11      Dani     Selma     Kevin  Robert       -       -
# 12      Dani      John         -       -       -       -
# 13     Selma     Kevin    Robert       -       -       -
# 14      John         -         -       -       -       -
# 15     Kevin    Robert         -       -       -       -
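One possible way to make the column labels dynamic is sketched below. It is not the reduce chain from this answer: it swaps in a plain loop that renames the lookup columns before each merge, so no duplicate labels arise and no positional set_axis is needed. The helper name `expand_ancestors`, the throwaway names `_key` and `_next`, and the three-row frame `toy` are all assumptions made for illustration.

```python
import pandas as pd

def expand_ancestors(df, labels):
    """Add one ancestor column per label via repeated left self-merges.

    Assumes df has the columns 'child' and 'parents'.
    """
    # Lookup copy with throwaway names, so merged columns never clash.
    lookup = df.rename(columns={"child": "_key", "parents": "_next"})
    out, last = df.copy(), "parents"
    for label in labels:
        out = (out.merge(lookup, how="left", left_on=last, right_on="_key")
                  .drop(columns="_key")
                  .rename(columns={"_next": label}))
        last = label  # the next step follows the column just added
    return out.fillna("-")

# Hypothetical toy data, not the frame from the question.
toy = pd.DataFrame({"child": ["a", "b", "c"], "parents": ["b", "c", "-"]})
result = expand_ancestors(toy, list("AB"))
print(result)
```

Adding a level then only means passing a longer label list, for example list('ABCD'). Each level is still a full merge, so the cost concern raised above applies here as well.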

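Contributor with 1829 experience points · 7+ upvotes
Here is a rough solution of mine. You should optimize it.
Load all the dataframes.
Save the names of all the dataframes in a list.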
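import pandas as pd

# data1, data2: the raw inputs from the question
list_data = [data1, data2]
list_df = []
for i, data in enumerate(list_data):
    # Bind each frame to a module-level name (df0, df1, ...) and record that name.
    vars()[f'df{i}'] = pd.DataFrame(data)
    list_df.append(f'df{i}')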
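Then create 2 helper variables:
df_family: this will be the output.
last_df: used to break out of the loop when every row in the parents column is '-' but dataframes still remain in the list.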
last_df = False
df_family = pd.DataFrame()
This part merges the dataframes together as needed. I also change the column names to 0, 1, ..., n-1 so that you can rename them easily.
for df in list_df:
    if last_df:
        break
    # Stop after this frame if it has no parents left to follow.
    if (eval(df)['parents'] == '-').all():
        last_df = True
    if df_family.empty:
        df_family = eval(df).copy()
    else:
        df_family = pd.merge(df_family, eval(df), how='left',
                             left_on=df_family.columns[-1],
                             right_on=eval(df).columns[0])
        df_family = df_family.drop(columns=[eval(df).columns[0]])
    # Relabel by position after every step so the next merge cannot
    # clash on column names (a clash would add _x/_y suffixes and break the drop).
    df_family.columns = list(range(df_family.shape[1]))
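The same loop can also be written without vars() and eval by keeping the DataFrame objects themselves in the list. The version below is only a sketch under a few assumptions: every frame has exactly two columns (child first, parent second), the helper name `chain_frames` and the throwaway labels `_key` and `_next` are invented here, and the small frames at the bottom are made-up illustration data.

```python
import pandas as pd

def chain_frames(frames):
    """Left-merge each frame onto the last column of the running result."""
    family = None
    for frame in frames:
        if family is None:
            family = frame.copy()
        else:
            # Throwaway labels on the right side avoid any name clash.
            right = frame.copy()
            right.columns = ["_key", "_next"]
            family = (family.merge(right, how="left",
                                   left_on=family.columns[-1],
                                   right_on="_key")
                            .drop(columns="_key"))
        # Positional labels 0..n-1, as in the answer above.
        family.columns = range(family.shape[1])
        # Stop once this frame has no further parents to follow.
        if (frame.iloc[:, 1] == "-").all():
            break
    return family

# Hypothetical toy frames, not the data from the question.
f0 = pd.DataFrame({"child": ["a", "b"], "parents": ["b", "c"]})
f1 = pd.DataFrame({"child": ["b", "c"], "parents": ["-", "-"]})
f2 = pd.DataFrame({"child": ["-"], "parents": ["zzz"]})  # never reached

family = chain_frames([f0, f1, f2])
print(family)
```

Because f1 has only '-' in its parent column, the loop stops after merging it and f2 is ignored, which mirrors the role of last_df.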