
Comparing two network edge lists

Python

慕尼黑的夜晚无繁华 2021-05-10 13:06:44

I have two lists: master.txt and its subset child.txt. I want to print the edges in master.txt that are not present in child.txt (the edges are undirected, so B A in child.txt matches A B in master.txt).

master.txt
A B
B C
D F

child.txt
B A
C B
E F

Output:
D F

I have written this sample code:

file1 = open("master.txt", "r")
file2 = open("child.txt", "r")
probe_id = file1.readlines()
loc_names = file2.readlines()
#flag=0
for i in probe_id:
    i=i.rstrip()
    probe_info=i.split("\t")
    probe_info[0]=probe_info[0].strip()
    probe_info[1]=probe_info[1].strip()
    flag=0
    for j in loc_names:
        j=j.strip()
        loc_names=j.split("\t")
        loc_names[0]=loc_names[0].strip()
        loc_names[1]=loc_names[1].strip() #throwing index out of range error
        if (probe_info[0]==loc_names[0] and probe_info[1]==loc_names[1]) or (probe_info[0]==loc_names[1] and probe_info[1]==loc_names[0]):
            flag=1
    if flag==0:
        print i

So far I get an "index out of range" error when splitting the lines of the smaller file. Please help. Also, if there is a faster technique for doing the same thing, please let me know. Thanks.


3 Answers

波斯汪


If I understand your requirement correctly, this is what you need:

$ awk '
    { edge=($1>$2 ? $1 FS $2 : $2 FS $1) }
    NR==FNR{ file1[edge]; next }
    !(edge in file1)
' child.txt master.txt
D F

If you want to find the edges in child that are not in master, just swap the order of the input files:

$ awk '
    { edge=($1>$2 ? $1 FS $2 : $2 FS $1) }
    NR==FNR{ file1[edge]; next }
    !(edge in file1)
' master.txt child.txt
E F

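The code above is very fast because each line costs only one hash lookup in an associative array, instead of a comparison against every line of the other file.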
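The awk script loads the first file into an associative array keyed by a direction-independent edge, then prints the lines of the second file whose key is missing. Below is a hedged Python sketch of the same two-pass idea; it is a translation for illustration, not the answerer's code. It assumes whitespace-separated two-column lines, uses a sorted tuple as the key instead of awk's "larger node first" string, and the names `edge_key` and `edges_missing_from` are hypothetical.

```python
def edge_key(line):
    """Return an order-independent key for a 'u v' line, e.g. ('A', 'B')."""
    u, v = line.split()
    return tuple(sorted((u, v)))


def edges_missing_from(reference_lines, candidate_lines):
    """Yield candidate lines whose edge does not appear in reference_lines.

    Mirrors the awk pattern NR==FNR{ file1[edge]; next } !(edge in file1):
    the first input is loaded into a set, the second one is streamed.
    """
    seen = {edge_key(line) for line in reference_lines if line.strip()}
    for line in candidate_lines:
        if line.strip() and edge_key(line) not in seen:
            yield line.rstrip("\n")


# In-memory stand-ins for the two files from the question.
master = ["A B\n", "B C\n", "D F\n"]
child = ["B A\n", "C B\n", "E F\n"]

print(list(edges_missing_from(child, master)))  # edges only in master: ['D F']
print(list(edges_missing_from(master, child)))  # edges only in child: ['E F']
```

Swapping the arguments gives the reverse direction, just as swapping the file order does for awk. To process the real files, pass open file objects (for example inside a `with` block) instead of the lists.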

2021-05-18

慕神8447489


You probably want to use a Python dict for fast lookups:

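from collections import defaultdict

# A plain dict (child[p1] = p2) keeps only the last neighbour of each node,
# so collect every neighbour of a node in a set instead.
child = defaultdict(set)
with open('child.txt', 'r') as c:
    for line in c:
        p1, p2 = line.split()
        child[p1].add(p2)  # store both directions, because edges are undirected
        child[p2].add(p1)

with open('master.txt', 'r') as m:
    for line in m:
        p1, p2 = line.split()
        if p2 in child.get(p1, ()):
            continue  # this edge is in child.txt (in either order)
        print(line.rstrip())  # rstrip() avoids printing a blank line after each edge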

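As for your code: you reassign loc_names to the pair ['E', 'F'], so on the next iteration of the outer loop the inner loop runs over that pair and j is set to 'E'. 'E'.split("\t") returns ['E'], so loc_names[1] raises the IndexError: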

file1 = open("master.txt", "r")
file2 = open("child.txt", "r")
probe_id = file1.readlines()
loc_names = file2.readlines()
#flag=0
for i in probe_id:
    i=i.rstrip()
    probe_info=i.split("\t")
    probe_info[0]=probe_info[0].strip()
    probe_info[1]=probe_info[1].strip()
    flag=0
    for j in loc_names: # j will be 'E' after second iteration of outer loop
        j=j.strip()
        loc_names=j.split("\t")
        loc_names[0]=loc_names[0].strip()
        loc_names[1]=loc_names[1].strip() # loc_names is ['E', 'F']
        if (probe_info[0]==loc_names[0] and probe_info[1]==loc_names[1]) or (probe_info[0]==loc_names[1] and probe_info[1]==loc_names[0]):
            flag=1
    if flag==0:
        print i
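A minimal fix for the original nested loop is to give each parsed child line its own name so that loc_names is never overwritten. This is a sketch, not the asker's or answerer's code: the function name `missing_edges` and the variable `child_edge` are hypothetical, `split()` (any whitespace) stands in for `split("\t")`, and Python 3 `print()` is used. It still compares every master line with every child line, which is why the other answers recommend hash lookups for large files.

```python
def missing_edges(probe_id, loc_names):
    """Return the lines of probe_id (master) whose edge is not in loc_names (child)."""
    result = []
    for i in probe_id:
        probe_info = i.split()              # split() drops tabs, spaces and '\n'
        if not probe_info:
            continue                        # ignore blank lines
        flag = 0
        for j in loc_names:
            child_edge = j.split()          # new name, so loc_names is never overwritten
            if not child_edge:
                continue
            same = probe_info[0] == child_edge[0] and probe_info[1] == child_edge[1]
            swapped = probe_info[0] == child_edge[1] and probe_info[1] == child_edge[0]
            if same or swapped:
                flag = 1
                break                       # one match is enough
        if flag == 0:
            result.append(i.rstrip("\n"))
    return result


# In-memory stand-ins for the tab-separated files from the question.
master = ["A\tB\n", "B\tC\n", "D\tF\n"]
child = ["B\tA\n", "C\tB\n", "E\tF\n"]
for edge in missing_edges(master, child):
    print(edge)  # prints the single line "D<TAB>F"
```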

2021-05-18

守着一只汪


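You can split the items on each line into a frozenset, put those frozensets into one set per file, and then use set.difference (the - operator) to efficiently get the edges that are missing from child.txt: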

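missing = {frozenset(l.split()) for l in open("master.txt")} - {frozenset(l.split()) for l in open("child.txt")}
print('\n'.join(' '.join(sorted(edge)) for edge in missing))  # join each edge's node names; joining the frozensets directly raises TypeError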
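The frozenset approach works because a frozenset ignores element order and is hashable, so "A B" and "B A" become equal set members. The in-memory sketch below walks through the steps on the question's data; the two lists are illustrative stand-ins for the files, not read from disk.

```python
# In-memory stand-ins for master.txt and child.txt.
master = ["A B\n", "B C\n", "D F\n"]
child = ["B A\n", "C B\n", "E F\n"]

# A frozenset is unordered and hashable, so both orientations are the same edge.
assert frozenset("A B".split()) == frozenset("B A".split())

master_edges = {frozenset(line.split()) for line in master}
child_edges = {frozenset(line.split()) for line in child}

# Set difference: edges present in master but absent from child.
missing = master_edges - child_edges

# Sets have no order, so sort nodes within each edge (and the edges) for stable output.
for nodes in sorted(sorted(edge) for edge in missing):
    print(" ".join(nodes))  # prints: D F
```

Because sets are unordered, the result no longer follows the line order of master.txt, and an edge that appears several times in master.txt is reported only once.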

2021-05-18
