2 answers
Answer from a contributor with 1795 experience points, 7+ upvotes
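I think you can do what you need with a simple check. Let me explain, assuming I have understood the question correctly. You can keep a flag (a true/false value) that tracks whether you are inside an interesting block. Every time you find "###PERFORMANCE", you flip that flag. You can then save the two kinds of lines in two lists, or in whatever structure you prefer.
Here is the code snippet: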
logFile = "logfile.txt"
with open(logFile) as f:
    content = f.readlines()
# you may also want to remove whitespace characters like `\n` at the end of each line
content = [x.strip() for x in content]
# flag: True while we are inside an interesting block
are_we_in_the_interesting_block = False
# two lists to save the lines
interesting_block = []
non_interesting_block = []
for line in content:
    # check if there is the text ###PERFORMANCE
    is_there_performance = line.find('###PERFORMANCE')
    # find() returns -1 if the text is not there, and 0 if the line starts with it
    if is_there_performance != -1:
        are_we_in_the_interesting_block = not are_we_in_the_interesting_block
    else:
        if are_we_in_the_interesting_block:
            # here I append to a list, but you can do your processing
            interesting_block.append(line)
        else:
            # here processing of the non interesting parts
            non_interesting_block.append(line)
print('Interesting blocks')
print(interesting_block)
print('\n')
print('Non interesting blocks')
print(non_interesting_block)
The resulting output will be:
Interesting blocks
['20190122 09:10,500 number1 string1 string2 string3', '20190122 09:24,670 number2 string1 string2 string3', '20190122 10:05,000 number3 string1 string2 string3', '20190122 10:33,960 number4 string1 string2 string3', '20190122 11:00,321 number5 string1 string2 string3', '20190123 08:10,500 number1 string1 string2 string3', '20190123 08:24,670 number2 string1 string2 string3', '20190123 09:05,000 number3 string1 string2 string3', '20190123 10:33,960 number4 string1 string2 string3', '20190123 10:00,321 number5 string1 string2 string3', '20190124 10:10,500 number1 string1 string2 string3', '20190124 10:24,670 number2 string1 string2 string3', '20190124 11:05,000 number3 string1 string2 string3', '20190124 12:33,960 number4 string1 string2 string3', '20190124 13:00,321 number5 string1 string2 string3']
Non interesting blocks
['20190123 10:24,670 number1 string1 string2 string3 string4 date1 number2', '20190123 10:32,130 number1 string1 string2 string3 string4 date1 number2']
Then, if needed, you can access interesting_block[n] to get the line at index n (counting from zero).
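The flat list above merges every interesting block into one sequence. If you would rather keep each block separate, the same flag idea can be adapted to collect a list of lists. This is only a sketch under assumptions: the function name group_blocks and the sample lines are made up for illustration, and it assumes markers always come in opening/closing pairs (a block that is never closed is dropped).

```python
def group_blocks(lines, marker="###PERFORMANCE"):
    """Return a list of blocks; each block holds the lines between a pair of marker lines."""
    blocks = []
    current = None  # None means we are currently outside a block
    for line in lines:
        line = line.strip()
        if marker in line:
            if current is None:
                current = []            # opening marker: start a new block
            else:
                blocks.append(current)  # closing marker: store the finished block
                current = None
        elif current is not None:
            current.append(line)        # ordinary line inside a block
    return blocks


# Made-up sample lines for illustration, not the real log
sample = [
    "###PERFORMANCE\n",
    "a1\n",
    "a2\n",
    "###PERFORMANCE\n",
    "skip me\n",
    "###PERFORMANCE\n",
    "b1\n",
    "###PERFORMANCE\n",
]
blocks = group_blocks(sample)
print(blocks)  # [['a1', 'a2'], ['b1']]
```

With this layout, blocks[n] gives you the n-th block and blocks[n][m] a line within it.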
Answer from a contributor with 1802 experience points, 5+ upvotes
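Since you are only matching on the PERFORMANCE delimiter, using NLTK seems like overkill. A simple approach is to use a plain match (is the expected string in the line?) and toggle your capture mode based on it. For example: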
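logfile = "logfile.txt"  # path to your log file (assumed name)
in_block = False
IDENTIFIER = 'PERFORMANCE'
with open(logfile) as f:
    # iterate over the file directly instead of loading it all with readlines()
    for line in f:
        if IDENTIFIER in line:
            # Toggle the boolean and skip the delimiter line itself
            in_block = not in_block
            continue
        if in_block:
            # strip the trailing newline so print() does not double it
            print(line.rstrip('\n'))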
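If you want to process the captured lines rather than print them, one option is to wrap the same toggle in a generator. The sketch below is an illustration only: the name iter_block_lines is hypothetical, the demo text is invented, and it chooses to skip the delimiter lines themselves. It reads from any iterable of lines, so an in-memory io.StringIO stands in for the real file here.

```python
import io


def iter_block_lines(stream, identifier="PERFORMANCE"):
    """Yield the lines found between pairs of delimiter lines, one at a time."""
    in_block = False
    for line in stream:
        if identifier in line:
            in_block = not in_block  # toggle capture mode on every delimiter
            continue                 # do not yield the delimiter line itself
        if in_block:
            yield line.rstrip("\n")


# Invented demo text; pass an open file object instead for a real log
demo = io.StringIO(
    "header\n###PERFORMANCE\nx 1\nx 2\n###PERFORMANCE\nfooter\n"
)
result = list(iter_block_lines(demo))
print(result)  # ['x 1', 'x 2']
```

Because the generator yields lines lazily, it never holds the whole log in memory, which may matter for large files.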