
Is it possible to search a txt file for words from a list and return the line above each match?

Python

慕盖茨4494581 2021-08-14 16:39:37

I have a txt file containing sentences, and I am able to find words from a list in it. I want to collect the line above each "found line" into a separate list. I tried the code below, but it only returns []. Here is my code:

fname_in = "test.txt"
lv_pos = []
search_list = ['word1', 'word2']
with open (fname_in, 'r') as f:
    file_l1 = [line.split('\n') for line in f.readlines()]
    counter = 0
    for word in search_list:
        if word in file_l1:
            l_pos.append(file_l1[counter - 1])
        counter += 1
print(l_pos)

The text file looks like this:

Bla bla bla
I want this line1.
I found this line with word1.
Bla bla bla
I want this line2.
I found this line with word2.

The result I want is:

l_pos = ['I want this line1.','I want this line2.']


3 Answers

jeck猫

Contributed 1909 experience points, received over 7 upvotes

First, there are some typos in your code: in some places you wrote l_pos and in others lv_pos.

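Another problem is that I don't think you realized file_l1 is a list of lists, so `if word in file_l1:` does not do what you think it does. You need to check each word against every one of those sublists.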
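To make the failure concrete, here is a minimal sketch that inlines a few of the sample lines as a list, so no file is needed. The names raw_lines, nested and matches are illustrative only and do not appear in the question.

```python
# Sample lines as f.readlines() would return them (with trailing newlines).
raw_lines = [
    "Bla bla bla\n",
    "I want this line1.\n",
    "I found this line with word1.\n",
]

# This mirrors the question's list comprehension: each element is itself a list.
nested = [line.split("\n") for line in raw_lines]
print(nested[2])          # ['I found this line with word1.', '']

# `in` on a list compares whole elements, so a bare word never equals a sublist.
print("word1" in nested)  # False

# Substring search has to be done on each line's string instead.
matches = [i for i, line in enumerate(raw_lines) if "word1" in line]
print(matches)            # [2]
```

The index in matches is what lets you step back one position to reach the line above.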

Here is some working code based on your own:

fname_in = "simple_test.txt"
l_pos = []
search_list = ['word1', 'word2']

with open(fname_in) as f:
    lines = f.read().splitlines()  # Read all lines, dropping the newlines.

for i, line in enumerate(lines):
    if i == 0:
        continue  # The first line has no line above it.
    for word in search_list:
        if word in line:
            l_pos.append(lines[i - 1])

print(l_pos) # -> ['I want this line1.', 'I want this line2.']

Update

Here is another approach that does not read the whole file into memory at once, so it needs less memory:

from collections import deque

fname_in = "simple_test.txt"
l_pos = []
search_list = ['word1', 'word2']

with open(fname_in) as file:
    lines = (line.rstrip('\n') for line in file)  # Generator expression.

    try:  # Create and initialize a sliding window.
        # Wrap the first line in a list; a bare string would be split into characters.
        sw = deque([next(lines)], maxlen=2)
    except StopIteration:  # Empty file: the loop below will not run.
        pass

    for line in lines:
        sw.append(line)  # sw[0] is now the previous line, sw[1] the current one.
        for word in search_list:
            if word in sw[1]:
                l_pos.append(sw[0])

print(l_pos) # -> ['I want this line1.', 'I want this line2.']
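The same streaming idea can also be sketched without a deque by remembering only the previous line. This is a variation, not the code from the answer: the function name lines_before_matches is hypothetical, and io.StringIO stands in for an opened file so the example runs on its own.

```python
import io

def lines_before_matches(lines, words):
    """Yield the line preceding each line that contains any of `words`."""
    prev = None  # No previous line exists before the first one.
    for raw in lines:
        line = raw.rstrip("\n")
        if prev is not None and any(word in line for word in words):
            yield prev
        prev = line

# An in-memory stand-in for the sample text file.
sample = io.StringIO(
    "Bla bla bla\n"
    "I want this line1.\n"
    "I found this line with word1.\n"
    "Bla bla bla\n"
    "I want this line2.\n"
    "I found this line with word2.\n"
)
result = list(lines_before_matches(sample, ["word1", "word2"]))
print(result)  # ['I want this line1.', 'I want this line2.']
```

Because it uses any(), this version yields the previous line once even when the current line contains several of the search words, whereas a nested loop over the words appends once per matching word.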

Answered 2021-08-14

森林海

Contributed 2011 experience points, received over 2 upvotes

On the second line of your example you wrote lv_pos instead of l_pos. Inside the with statement, I think you can fix it like this:

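fname_in = "test.txt"
l_pos = []
search_list = ['word1', 'word2']

with open(fname_in) as f:
    file_l1 = f.read().splitlines()  # Lines without trailing newlines.

# Start at 1: the first line has no line above it.
for line in range(1, len(file_l1)):
    for word in search_list:
        # Substring test; splitting on " " would leave tokens like "word1." unmatched.
        if word in file_l1[line]:
            l_pos.append(file_l1[line - 1])

print(l_pos)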

I'm not thrilled with this solution, but I think it fixes your code with minimal modifications.

Answered 2021-08-14

饮歌长啸

Contributed 1951 experience points, received over 3 upvotes

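Think of the file as a collection of pairs, each made of a line and the line before it (here lines is assumed to be the list of the file's lines without trailing newlines):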

[prev for prev, this in zip(lines, lines[1:])
 if 'word1' in this or 'word2' in this]
# ['I want this line1.', 'I want this line2.']
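As a quick illustration of what the pairing produces, here is a tiny sketch with placeholder strings in place of real file lines (the list contents are made up for the example):

```python
lines = ["a", "b", "c", "d"]

# Zipping the list with itself shifted by one gives (previous, current) pairs.
pairs = list(zip(lines, lines[1:]))
print(pairs)  # [('a', 'b'), ('b', 'c'), ('c', 'd')]
```

The first line never lands in the `this` position, so there is no risk of reaching before the start of the list. Note that lines[1:] copies the list, which is fine for small files.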

This approach can be extended to cover any number of words:

words = {'word1', 'word2'}

[prev for prev, this in zip(lines, lines[1:])
 if any(word in this for word in words)]
# ['I want this line1.', 'I want this line2.']

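Finally, if you care about whole words rather than mere occurrences (such as "thisisnotword1"), you should tokenize the lines properly, for example with nltk.word_tokenize():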

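# Requires NLTK and its tokenizer data to be installed.
from nltk import word_tokenize

# Set intersection is non-empty when any search word appears as a token.
[prev for prev, this in zip(lines, lines[1:])
 if words & set(word_tokenize(this))]
# ['I want this line1.', 'I want this line2.']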
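If installing NLTK is not an option, whole-word matching can be approximated with the standard library's re module and word boundaries. This is a swapped-in technique, not what the answer uses, and the extra "thisisnotword1" line is added here purely to show the difference from substring matching.

```python
import re

words = ["word1", "word2"]
# \b marks a word boundary; re.escape guards against regex metacharacters.
pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, words)) + r")\b")

lines = [
    "Bla bla bla",
    "I want this line1.",
    "I found this line with word1.",
    "thisisnotword1 should not match",
    "I want this line2.",
    "I found this line with word2.",
]

result = [prev for prev, this in zip(lines, lines[1:]) if pattern.search(this)]
print(result)  # ['I want this line1.', 'I want this line2.']
```

A regex boundary treats letters, digits and underscores as word characters, so its notion of a "word" is cruder than a real tokenizer's.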

Answered 2021-08-14
