2 Answers
Contributor with 1836 experience points · 5+ upvotes
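This is a tree structure, a special kind of graph. A data frame is not a particularly convenient way to represent a tree; I suggest switching to networkx or another graph-based package. Then look up how to do a simple path traversal; the graph package's documentation will have direct support for it.
If you insist on doing it yourself (which is a reasonable programming exercise), you only need something like this pseudocode: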
for each parent not in "child" column:
    here = parent
    while here in "parent" column:
        here = child of here    # with several children, repeat for each branch
    record (parent, here) pair
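The pseudocode above follows only one child per step, so here is a minimal runnable sketch of the same idea that branches on every child. It assumes the edges arrive as plain (parent, child) tuples and that the graph has no cycles; the function name `pair_roots_to_leaves` is hypothetical and is not part of any library.

```python
from collections import defaultdict


def pair_roots_to_leaves(edges):
    """Return (root, leaf) pairs for (parent, child) edges. Assumes no cycles."""
    children_of = defaultdict(list)
    all_children = set()
    for parent, child in edges:
        children_of[parent].append(child)
        all_children.add(child)

    # A root is a parent that never appears in the "child" column.
    roots = [p for p in children_of if p not in all_children]

    pairs = []
    for root in roots:
        stack = [root]
        while stack:
            here = stack.pop()
            if here in children_of:
                # Push children in reverse so they are visited left to right.
                stack.extend(reversed(children_of[here]))
            else:
                # No outgoing edge: "here" is a leaf.
                pairs.append((root, here))
    return pairs
```

Using an explicit stack instead of recursion avoids Python's recursion limit on very deep trees. If a leaf can be reached from a root along more than one path, it will be recorded once per path.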
Contributor with 1797 experience points · 6+ upvotes
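Although your expected output seems somewhat inconsistent with your description (AC2 apparently should not be treated as a parent, because it is not a source node), I am fairly confident that you want to run a traversal from each source node to locate all of its leaves. Doing this inside the data frame is inconvenient, so we can use df.values to build an adjacency-list dictionary that represents the graph. I assume the graph contains no cycles.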
import pandas as pd
from collections import defaultdict


def find_leaves(graph, src):
    # A node with outgoing edges is internal: recurse into each neighbor.
    if src in graph:
        for neighbor in graph[src]:
            yield from find_leaves(graph, neighbor)
    else:
        # No outgoing edges, so src is a leaf.
        yield src


def pair_sources_to_leaves(df):
    graph = defaultdict(list)
    children = set()

    # Build the adjacency list and remember every node that is a child.
    for parent, child in df.values:
        graph[parent].append(child)
        children.add(child)

    # Sources are parents that never appear as a child.
    leaves = [[x, list(find_leaves(graph, x))]
              for x in graph if x not in children]

    # One row per source, then explode the leaf lists into one row per pair.
    return (pd.DataFrame(leaves, columns=df.columns)
              .explode(df.columns[-1])
              .reset_index(drop=True))


if __name__ == "__main__":
    df = pd.DataFrame({
        "parent": ["AC1", "AC2", "AC3", "AC1", "AC11",
                   "AC5", "AC5", "AC6", "AC8", "AC9"],
        "child": ["AC2", "AC3", "AC4", "AC11", "AC12",
                  "AC2", "AC6", "AC7", "AC9", "AC10"]
    })
    print(pair_sources_to_leaves(df))
Output:
  parent child
0    AC1   AC4
1    AC1  AC12
2    AC5   AC4
3    AC5   AC7
4    AC8  AC10
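The answer above assumes the graph has no cycles; with a cycle, the recursive traversal would never terminate normally. As a hedged sketch of one way to guard against that, the variant below tracks the current path and raises an error when a node repeats. The name `find_leaves_checked` and the `path` parameter are hypothetical additions, not part of the original answer.

```python
def find_leaves_checked(graph, src, path=None):
    """Yield the leaves reachable from src; raise ValueError on a cycle."""
    path = [] if path is None else path
    if src in path:
        # src is already on the current path, so the edges form a cycle.
        raise ValueError("cycle detected: " + " -> ".join(path + [src]))
    if src in graph:
        for neighbor in graph[src]:
            # Pass a new list so sibling branches do not share path state.
            yield from find_leaves_checked(graph, neighbor, path + [src])
    else:
        yield src
```

Because this is a generator, the error only surfaces once the result is iterated, for example by wrapping the call in `list(...)`. Copying the path at each step keeps the sketch simple at the cost of extra work on deep trees.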