
How do I iterate over a list of strings in Python and join the strings that belong to a tag?

Python

SMILET 2022-11-01 17:13:51

When iterating over a list of elements in Python 3, how can I "isolate" what lies between the elements I am interested in? I have a list:

list = ["<h1> question 1", "question 1 content", "question 1 more content", "<h1> answer 1", "answer 1 content", "answer 1 more content", "<h1> question 2", "question 2 content", "<h> answer 2", "answer 2 content"]

In this list, some elements carry an <h> tag and others do not. The idea is that an element with the tag is a "header", and the elements that follow it, up to the next tag, are its content. How can I join the list elements that belong to each header so that I end up with two lists of equal size?

headers = ["<h1> question 1", "<h1> answer 1", "<h1> question 2", "<h> answer 2"]
content = ["question 1 content question 1 more content", "answer 1 content answer 1 more content", "question 2 content", "answer 2 content"]

Both lists have the same length, in this case 4 elements each. I managed to separate the parts, but I could use some help finishing:

list = ["<h1> question 1", "question 1 content", "question 1 more content", "<h1> answer 1", "answer 1 content", "answer 1 more content", "<h1> question 2", "question 2 content", "<h> answer 2", "answer 2 content"]
headers = []
content = []
for i in list:
    if "<h1>" in i:
        headers.append(i)
    if "<h1>" not in i:
        tempContent = []
        tempContent.append(i)
        content.append(tempContent)

Any ideas on how to combine these texts so that they correspond one to one? Thanks!


2 answers

catspeake


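Assuming that every element after a header is that header's content, and that the first element is always a header, you can use itertools.groupby.

The key can be whether the element contains a header tag, so that each header's content is grouped right after it: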

from itertools import groupby

lst = ["<h1> question 1", "question 1 content", "question 1 more content", "<h1> answer 1", "answer 1 content", "answer 1 more content", "<h1> question 2", "question 2 content", "<h> answer 2", "answer 2 content"]

headers = []
content = []

# groupby yields runs of consecutive items that share the same key:
# True for a header, False for the content lines that follow it.
for key, values in groupby(lst, key=lambda x: "<h" in x):
    if key:
        # Each header run is expected to hold exactly one header.
        headers.append(*values)
    else:
        content.append(" ".join(values))

print(headers)
print(content)

This gives:

['<h1> question 1', '<h1> answer 1', '<h1> question 2', '<h> answer 2']

['question 1 content question 1 more content', 'answer 1 content answer 1 more content', 'question 2 content', 'answer 2 content']
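To see why the grouping works, it can help to look at the runs that groupby produces before anything is joined. The following is only a minimal sketch on a shortened copy of the list; the helper name is_header and the runs variable are introduced here for illustration and are not part of the original answer.

```python
from itertools import groupby

# Shortened copy of the list from the question.
lst = ["<h1> question 1", "question 1 content", "question 1 more content",
       "<h1> answer 1", "answer 1 content"]


def is_header(item):
    # Same test as the lambda in the answer: a header contains "<h".
    # (is_header is a hypothetical name used only in this sketch.)
    return "<h" in item


# groupby yields one (key, group) pair per run of consecutive items
# that share the same key; each group is turned into a list here.
runs = [(key, list(group)) for key, group in groupby(lst, key=is_header)]

for key, group in runs:
    print(key, group)
```

Each True run holds one header and each False run holds the content that follows it. Note that two adjacent headers would land in the same True run, in which case headers.append(*values) raises a TypeError, so the approach relies on the stated assumption that every header is followed by content.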

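The problem with your current approach is that you always add just a single item to content. What you want is to accumulate a temp_content list until the next header is encountered, and only then add it and reset: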

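headers = []
content = []
temp_content = None

# lst is the list defined above.
for i in lst:
    if "<h" in i:
        # A new header closes the previous section, if there is one.
        if temp_content is not None:
            content.append(" ".join(temp_content))
        temp_content = []
        headers.append(i)
    else:
        temp_content.append(i)

# Flush the content collected for the last header.
if temp_content is not None:
    content.append(" ".join(temp_content))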
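The accumulate-and-reset idea can also be wrapped in a small function. This is a sketch under stated assumptions rather than a definitive implementation: the function name split_sections and its marker parameter are hypothetical, items that appear before the first header are skipped, and a header with no content gets an empty string so that both lists stay the same length.

```python
def split_sections(items, marker="<h"):
    """Return (headers, content) as two lists of equal length.

    split_sections and marker are hypothetical names for this sketch.
    """
    headers = []
    chunks = []  # one list of content strings per header
    for item in items:
        if marker in item:
            # Start a new section for every header.
            headers.append(item)
            chunks.append([])
        elif chunks:
            # Attach the line to the most recent header.
            chunks[-1].append(item)
        # Assumption: lines before the first header are ignored.
    content = [" ".join(chunk) for chunk in chunks]
    return headers, content


lst = ["<h1> question 1", "question 1 content", "question 1 more content",
       "<h1> answer 1", "answer 1 content", "answer 1 more content",
       "<h1> question 2", "question 2 content",
       "<h> answer 2", "answer 2 content"]

headers, content = split_sections(lst)
print(headers)
print(content)
```

Keeping one chunk per header means the final section needs no separate flush after the loop, and the two lists cannot drift apart in length.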

2022-11-01

慕勒3428872


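You can collect the headers and their content into a collections.defaultdict while iterating over the list, then split the keys and values into the final headers and content lists. A header can be detected simply by checking whether a string str.startswith "<h".

I also use the continue statement to move straight to the next iteration once a header is found. An else statement would work here just as well.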

from collections import defaultdict

lst = [
    "<h1> question 1",
    "question 1 content",
    "question 1 more content",
    "<h1> answer 1",
    "answer 1 content",
    "answer 1 more content",
    "<h1> question 2",
    "question 2 content",
    "<h> answer 2",
    "answer 2 content",
]

# Maps each header to the list of content lines that follow it.
header_map = defaultdict(list)
header = None

for item in lst:
    if item.startswith("<h"):
        # Remember the current header and move on to the next item.
        header = item
        continue
    header_map[header].append(item)

# Dicts preserve insertion order, so keys and values stay aligned.
headers = list(header_map)
print(headers)

content = [" ".join(v) for v in header_map.values()]
print(content)

Output:

['<h1> question 1', '<h1> answer 1', '<h1> question 2', '<h> answer 2']

['question 1 content question 1 more content', 'answer 1 content answer 1 more content', 'question 2 content', 'answer 2 content']
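One property of the dictionary approach is worth keeping in mind: keys are unique, so two headers with identical text share one entry, and a header with no content never gets a key unless it is registered explicitly. The sketch below illustrates this on a made-up sample list; the helper name collect and the setdefault call are additions for illustration, not part of the answer above.

```python
from collections import defaultdict


def collect(items):
    # Hypothetical helper wrapping the defaultdict approach.
    header_map = defaultdict(list)
    header = None
    for item in items:
        if item.startswith("<h"):
            header = item
            # Register the header right away so that a header
            # without content still shows up as a key.
            header_map.setdefault(header, [])
            continue
        header_map[header].append(item)
    return dict(header_map)


# Made-up sample: one empty section and one repeated header.
sample = ["<h1> intro", "<h1> notes", "first note",
          "<h1> notes", "second note"]

result = collect(sample)
print(result)
# {'<h1> intro': [], '<h1> notes': ['first note', 'second note']}
```

If repeated header text has to stay separate, a list-based accumulation is likely the safer choice, since it does not depend on the headers being unique.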

2022-11-01
