1 Answer

Your code is wrong. Consider what happens when the inner loop checks the line ID2_1234_CAT_ANIMAL_GOOD_3 against itself:
subject_id = cat.split('_')[0]  # ID2
num_id = cat.split('_')[1]  # 1234
subject_num = subject_id + '_' + num_id  # ID2_1234
for j, dog in enumerate(files_list):
    # when dog is the line ID2_1234_CAT_ANIMAL_GOOD_3
    if subject_num in dog and 'GOOD' in dog:  # this is true
        if 'GOOD' in dog and 'DOG' in dog:  # this is false
            continue
        else:
            file_filter.append(cat)  # then it outputs it
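The problem is that every line containing both GOOD and CAT will "match itself" in the inner loop, so it is appended whether or not the same subject also has a GOOD DOG line.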
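A minimal sketch of that self-match, using the sample line from above. The variable names `line`, `matches_itself`, `is_good_dog` and `would_append` are illustrative and not taken from the question:

```python
# The sample line plays the role of both `cat` and `dog`.
line = "ID2_1234_CAT_ANIMAL_GOOD_3"

# Same prefix the question's code builds from the first two fields.
subject_num = "_".join(line.split("_")[:2])

# Outer condition: true, because the line contains its own prefix and GOOD.
matches_itself = subject_num in line and "GOOD" in line

# Inner condition: false, because this is a CAT line, not a DOG line.
is_good_dog = "GOOD" in line and "DOG" in line

# So the else branch runs and the line is appended without any other match.
would_append = matches_itself and not is_good_dog
print(subject_num, matches_itself, is_good_dog, would_append)
```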
IMHO, I would use itertools.groupby. Something like:
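from itertools import groupby
def subject_key(line):
    # Group by the first two fields, e.g. ('ID2', '1234')
    return tuple(line.split('_')[:2])
# groupby only merges adjacent items, so sort by the same key first
for _, lines in groupby(sorted(files_list, key=subject_key), key=subject_key):
    good_lines = [line for line in lines if 'GOOD' in line]
    if len(good_lines) == 1 and 'CAT' in good_lines[0]:
        file_filter.append(good_lines[0])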
This should also be more efficient, O(n log n) compared to O(n^2), although it requires the whole content of the file to be in RAM.
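For reference, here is a self-contained sketch of the groupby approach that can be run as-is. The function name `lone_good_cats` and the sample lines are made up for illustration; only the ID2_1234 CAT line comes from the example above:

```python
from itertools import groupby


def subject_key(line):
    # First two underscore-separated fields, e.g. ("ID2", "1234").
    return tuple(line.split("_")[:2])


def lone_good_cats(files_list):
    """Return GOOD CAT lines whose subject has no other GOOD line."""
    file_filter = []
    # groupby only merges adjacent items, so sort by the same key first.
    ordered = sorted(files_list, key=subject_key)
    for _, lines in groupby(ordered, key=subject_key):
        good_lines = [line for line in lines if "GOOD" in line]
        if len(good_lines) == 1 and "CAT" in good_lines[0]:
            file_filter.append(good_lines[0])
    return file_filter


# Hypothetical sample data (assumed file-name layout).
sample = [
    "ID1_1111_CAT_ANIMAL_GOOD_1",
    "ID1_1111_DOG_ANIMAL_GOOD_2",
    "ID2_1234_CAT_ANIMAL_GOOD_3",
    "ID2_1234_DOG_ANIMAL_BAD_4",
]
print(lone_good_cats(sample))
```

With this sample, subject ID1_1111 is skipped because it has two GOOD lines, and only the ID2_1234 CAT line is kept.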
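If you have other "classes" besides CAT and DOG, and you want to output all GOOD CAT lines except for subjects that also have a GOOD DOG line, you can replace the final if block in the loop above with: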
is_good_cat = any('CAT' in line for line in good_lines)
is_good_dog = any('DOG' in line for line in good_lines)
if is_good_cat and not is_good_dog:
    # filter on each line, not on the whole list
    file_filter.extend(line for line in good_lines if 'CAT' in line)
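(You need .extend with a filtering loop here because we no longer know which single line to write: a group may now contain several GOOD lines, so the CAT lines have to be picked out.)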
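A runnable sketch of this multi-class variant, under the same assumptions about the line format. The function name `good_cats_without_good_dog`, the BIRD class and the sample lines are hypothetical:

```python
from itertools import groupby


def subject_key(line):
    # First two underscore-separated fields, e.g. ("ID2", "1234").
    return tuple(line.split("_")[:2])


def good_cats_without_good_dog(files_list):
    """Return every GOOD CAT line whose subject has no GOOD DOG line."""
    file_filter = []
    ordered = sorted(files_list, key=subject_key)
    for _, lines in groupby(ordered, key=subject_key):
        good_lines = [line for line in lines if "GOOD" in line]
        is_good_cat = any("CAT" in line for line in good_lines)
        is_good_dog = any("DOG" in line for line in good_lines)
        if is_good_cat and not is_good_dog:
            # Several GOOD lines may remain, so keep only the CAT ones.
            file_filter.extend(line for line in good_lines if "CAT" in line)
    return file_filter


# Hypothetical sample data; BIRD stands in for "some other class".
sample = [
    "ID1_1111_CAT_ANIMAL_GOOD_1",
    "ID1_1111_DOG_ANIMAL_GOOD_2",
    "ID2_1234_CAT_ANIMAL_GOOD_3",
    "ID2_1234_BIRD_ANIMAL_GOOD_4",
    "ID2_1234_CAT_ANIMAL_GOOD_5",
]
print(good_cats_without_good_dog(sample))
```

Because sorted is stable, lines within one subject keep their original relative order in the output.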