3 Answers

Contributor: 1909 experience points, over 7 upvotes
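First, your code has some typos: in some places you wrote l_pos and in others lv_pos.
The other problem is that I don't think you realize file_l1 is a list of lists, so the test "if word in file_l1:" is not doing what you think it is. You need to check each word against each of those sublists.
Here is some working code based on your own: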
fname_in = "simple_test.txt"
l_pos = []
search_list = ['word1', 'word2']
with open(fname_in) as f:
    lines = f.read().splitlines()
for i, line in enumerate(lines):
    for word in search_list:
        # i > 0 keeps lines[i - 1] from wrapping around to the last line.
        if i > 0 and word in line:
            l_pos.append(lines[i - 1])
print(l_pos)  # -> ['I want this line1.', 'I want this line2.']
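The list-of-lists point is easy to check in isolation. This is a minimal sketch, assuming file_l1 was built by splitting each line into words; the sample data is made up for illustration.

```python
# Hypothetical data: each inner list holds the words of one line.
file_l1 = [["I", "want", "this", "line1."], ["contains", "word1", "here"]]

# Membership on the outer list compares the string against whole sublists,
# so it is False even though 'word1' appears inside one of them.
outer_hit = "word1" in file_l1

# Checking each sublist individually finds it.
inner_hit = any("word1" in words for words in file_l1)

print(outer_hit, inner_hit)  # False True
```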
Update
Here is another approach that does not read the whole file into memory at once, so it needs less memory:
from collections import deque
fname_in = "simple_test.txt"
l_pos = []
search_list = ['word1', 'word2']
with open(fname_in) as file:
    lines = (line.rstrip('\n') for line in file)  # Generator expression.
    try:  # Create and initialize a sliding window.
        # Wrap the first line in a list; deque(next(lines)) would store its characters.
        sw = deque([next(lines)], maxlen=2)
    except StopIteration:  # Empty file: the loop below will not run.
        pass
    for line in lines:
        sw.append(line)  # sw[0] is now the previous line, sw[1] the current one.
        for word in search_list:
            if word in sw[1]:
                l_pos.append(sw[0])
print(l_pos)  # -> ['I want this line1.', 'I want this line2.']
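The same streaming idea can also be sketched as a generator function that remembers only the previous line. The function name lines_before_matches and the in-memory sample are hypothetical, chosen so the example runs without a file on disk. Unlike the loop above, this version uses any(), so a line containing both words yields its predecessor only once.

```python
import io

def lines_before_matches(file, search_list):
    """Yield the line that precedes each line containing any search word."""
    prev = None  # No previous line yet, so a match on the first line yields nothing.
    for raw in file:
        line = raw.rstrip('\n')
        if prev is not None and any(word in line for word in search_list):
            yield prev
        prev = line

# Hypothetical file contents standing in for simple_test.txt.
sample = io.StringIO(
    "I want this line1.\n"
    "line with word1\n"
    "filler\n"
    "I want this line2.\n"
    "line with word2\n"
)
result = list(lines_before_matches(sample, ['word1', 'word2']))
print(result)  # -> ['I want this line1.', 'I want this line2.']
```

Because it accepts any iterable of lines, the function should work equally with an open file object.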

Contributor: 2011 experience points, over 2 upvotes
On the second line of your example you wrote lv_pos instead of l_pos. Inside the with statement, I think you can fix it like this:
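fname_in = "test.txt"
l_pos = []
search_list = ['word1', 'word2']
with open(fname_in) as f:
    file_l1 = f.readlines()
# Start at 1 so that line - 1 never wraps around to the last line.
for line in range(1, len(file_l1)):
    for word in search_list:
        # split() with no argument also drops the trailing newline.
        if word in file_l1[line].split():
            l_pos.append(file_l1[line - 1].rstrip('\n'))
print(l_pos)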
I'm not thrilled with this solution, but I think it fixes your code with minimal changes.
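A general caveat for index-based loops that look at line - 1: at index 0 the expression evaluates to -1, and Python's negative indexing silently returns the last element instead of raising an error. A small sketch with made-up data shows this; starting the range at 1 is one way to avoid it.

```python
file_l1 = ["word1 on the first line", "middle", "last line"]  # hypothetical data

# Index 0 minus 1 is -1, which wraps around to the final element.
wrapped = file_l1[0 - 1]
print(wrapped)  # last line

# Starting at 1 guarantees that a previous line exists.
l_pos = [
    file_l1[i - 1]
    for i in range(1, len(file_l1))
    if "word1" in file_l1[i].split()
]
print(l_pos)  # []
```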

Contributor: 1951 experience points, over 3 upvotes
Treat the file as a collection of pairs, each made of a line and the line before it:
[prev for prev, this in zip(lines, lines[1:])
 if 'word1' in this or 'word2' in this]
# ['I want this line1.', 'I want this line2.']
This approach can be extended to cover any number of words:
words = {'word1', 'word2'}
[prev for prev, this in zip(lines, lines[1:])
 if any(word in this for word in words)]
# ['I want this line1.', 'I want this line2.']
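The snippets above assume that lines already holds the file's lines. A self-contained sketch, with a hypothetical in-memory sample in place of the real file, shows how zip(lines, lines[1:]) pairs each line with its successor.

```python
# Hypothetical contents; with a real file this could be
# lines = open(fname_in).read().splitlines()
lines = [
    "I want this line1.",
    "here is word1",
    "filler",
    "I want this line2.",
    "here is word2",
]
words = {'word1', 'word2'}

# zip stops at the shorter input, so the pairs are
# (line 0, line 1), (line 1, line 2), and so on.
pairs = list(zip(lines, lines[1:]))
print(pairs[0])  # ('I want this line1.', 'here is word1')

matches = [prev for prev, this in pairs if any(word in this for word in words)]
print(matches)  # ['I want this line1.', 'I want this line2.']
```

Note that lines[1:] copies the list, so this form suits files that fit comfortably in memory.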
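Finally, if you care about whole words rather than mere occurrences (such as "thisisnotword1"), you should tokenize the lines properly, for example with nltk.word_tokenize():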
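# word_tokenize needs NLTK's tokenizer models, downloaded beforehand via nltk.download().
from nltk import word_tokenize
[prev for prev, this in zip(lines, lines[1:])
 if words & set(word_tokenize(this))]  # Non-empty intersection means a whole-word match.
# ['I want this line1.', 'I want this line2.']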
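If installing nltk is not an option, a regular expression with word boundaries is one possible substitute for tokenizing. This is a swapped-in technique, not what the answer above uses, and the sample data is hypothetical.

```python
import re

lines = [
    "I want this line1.",
    "here is word1.",
    "not wanted",
    "thisisnotword1",
]
words = {'word1', 'word2'}

# \b matches at word boundaries, so 'word1' inside 'thisisnotword1' is rejected.
# sorted() only makes the pattern deterministic; re.escape guards special characters.
pattern = re.compile(r'\b(?:' + '|'.join(map(re.escape, sorted(words))) + r')\b')

matches = [prev for prev, this in zip(lines, lines[1:]) if pattern.search(this)]
print(matches)  # ['I want this line1.']
```

Regex word boundaries treat letters, digits, and underscores as word characters, which may differ from how a real tokenizer splits text.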