
How do I iterate over a list of strings in Python and concatenate the strings that belong to a tag?


SMILET 2022-11-01 17:13:51
When iterating over a list of elements in Python 3, how can I "isolate" the content between the elements of interest? I have a list:

list = ["<h1> question 1", "question 1 content", "question 1 more content", "<h1> answer 1", "answer 1 content", "answer 1 more content", "<h1> question 2", "question 2 content", "<h> answer 2", "answer 2 content"]

In this list, some elements carry a header tag (such as <h1> or <h>) and others don't. The idea is that an element with this tag is a "header", and the elements that follow it, up to the next tag, are its content. How can I concatenate the list elements that belong to each header so that I end up with two lists of equal size?

headers = ["<h1> question 1", "<h1> answer 1", "<h1> question 2", "<h> answer 2"]
content = ["question 1 content question 1 more content", "answer 1 content answer 1 more content", "question 2 content", "answer 2 content"]

The two lists have the same length, 4 elements each in this case. I managed to separate the parts, but could use some help finishing:

list = ["<h1> question 1", "question 1 content", "question 1 more content", "<h1> answer 1", "answer 1 content", "answer 1 more content", "<h1> question 2", "question 2 content", "<h> answer 2", "answer 2 content"]
headers = []
content = []
for i in list:
    if "<h1>" in i:
        headers.append(i)
    if "<h1>" not in i:
        tempContent = []
        tempContent.append(i)
        content.append(tempContent)

Any ideas on how to combine these texts so that they correspond one to one? Thanks!
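For reference, here is a minimal sketch that runs the loop from the question on the same data (renamed to `lst` so it does not shadow the built-in `list`). It shows the two reasons the lists come out mismatched: every content element becomes its own one-item list, and "<h> answer 2" fails the "<h1>" test, so it is treated as content.

```python
# The question's data, renamed to lst to avoid shadowing the built-in list
lst = ["<h1> question 1", "question 1 content", "question 1 more content",
       "<h1> answer 1", "answer 1 content", "answer 1 more content",
       "<h1> question 2", "question 2 content", "<h> answer 2", "answer 2 content"]

headers = []
content = []
for i in lst:
    if "<h1>" in i:
        headers.append(i)
    if "<h1>" not in i:
        # A fresh one-item list per element: consecutive lines are never merged
        tempContent = []
        tempContent.append(i)
        content.append(tempContent)

print(len(headers), len(content))  # 3 7
print(content[:2])  # [['question 1 content'], ['question 1 more content']]
print("<h> answer 2" in headers)  # False
```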

2 Answers

catspeake


Assuming that every element after a header belongs to that header's content, and that the first element is always a header, you can use itertools.groupby.


The key can be whether an element has a header tag, so that each header's content is grouped right after it:


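from itertools import groupby

lst = ["<h1> question 1", "question 1 content", "question 1 more content", "<h1> answer 1", "answer 1 content", "answer 1 more content", "<h1> question 2", "question 2 content", "<h> answer 2", "answer 2 content"]

headers = []
content = []

# groupby yields runs of consecutive elements with the same key:
# True for a header, False for the content lines that follow it
for key, values in groupby(lst, key=lambda x: "<h" in x):
    if key:
        # Under the assumption above, each header run holds exactly one header
        headers.append(*values)
    else:
        # Join all content lines of the run into one string
        content.append(" ".join(values))

print(headers)
print(content)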

Output:


['<h1> question 1', '<h1> answer 1', '<h1> question 2', '<h> answer 2']

['question 1 content question 1 more content', 'answer 1 content answer 1 more content', 'question 2 content', 'answer 2 content']
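To see what groupby hands to the loop, the sketch below prints each (key, group) pair for the same data. It also shows the caveat behind the assumption stated above: two headers in a row form a single True group, so headers.append(*values) would be called with two arguments and raise TypeError. The helper `is_header` and the input `edge` are made up for illustration.

```python
from itertools import groupby

lst = ["<h1> question 1", "question 1 content", "question 1 more content",
       "<h1> answer 1", "answer 1 content", "answer 1 more content",
       "<h1> question 2", "question 2 content", "<h> answer 2", "answer 2 content"]

def is_header(s):
    return "<h" in s

# Materialize each group before groupby advances: groups share one iterator
groups = [(key, list(values)) for key, values in groupby(lst, key=is_header)]
for key, values in groups:
    print(key, values)
# True ['<h1> question 1']
# False ['question 1 content', 'question 1 more content']
# ...and so on, alternating for the remaining sections

# Hypothetical input with two consecutive headers
edge = ["<h1> a", "<h1> b", "b content"]
edge_groups = [(key, list(values)) for key, values in groupby(edge, key=is_header)]
print(edge_groups)  # [(True, ['<h1> a', '<h1> b']), (False, ['b content'])]

try:
    [].append(*edge_groups[0][1])  # what the answer's loop would do here
except TypeError as exc:
    print("TypeError:", exc)
```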

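The problem with your current approach is that you always add just one item at a time to content. What you want is to accumulate a temp_content list until you reach the next header, and only then append it and reset it (checking for "<h" rather than "<h1>" also catches "<h> answer 2"):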


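headers = []
content = []
temp_content = None  # None until the first header is seen

for i in list:  # `list` is the list from the question
    if "<h" in i:
        # A new header closes the previous section, if there is one
        if temp_content is not None:
            content.append(" ".join(temp_content))
        temp_content = []  # start collecting content for this header
        headers.append(i)
    else:
        temp_content.append(i)

# The last section has no following header, so flush it after the loop
if temp_content is not None:
    content.append(" ".join(temp_content))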
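A variant of the same accumulate idea, sketched here as a hypothetical helper `split_sections`, opens a new empty section as soon as a header is seen and appends content to the most recent one. Because every section already exists when its header is read, there is nothing to flush after the loop, and a header with no content simply gets an empty string, so the two lists stay the same length. Like the answer, it assumes the first element is a header.

```python
def split_sections(items):
    """Return (headers, content) lists of equal length (hypothetical helper)."""
    headers = []
    sections = []  # one list of content lines per header
    for item in items:
        if "<h" in item:
            headers.append(item)
            sections.append([])  # open this header's section immediately
        else:
            sections[-1].append(item)  # IndexError if the first item is not a header
    content = [" ".join(lines) for lines in sections]
    return headers, content


lst = ["<h1> question 1", "question 1 content", "question 1 more content",
       "<h1> answer 1", "answer 1 content", "answer 1 more content",
       "<h1> question 2", "question 2 content", "<h> answer 2", "answer 2 content"]
headers, content = split_sections(lst)
print(headers)
print(content)

# A header without content still gets an entry, keeping both lists aligned
print(split_sections(["<h1> empty", "<h1> full", "text"]))
# (['<h1> empty', '<h1> full'], ['', 'text'])
```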


Answered 2022-11-01
慕勒3428872


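You can collect the headers and their content into a collections.defaultdict while iterating over the list, then split its keys and values into the final headers and content lists. A header can be detected simply by checking the string with str.startswith("<h").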

I also use the continue statement to move on to the next iteration as soon as a header is found. An else clause would work here just as well.

from collections import defaultdict

lst = [
    "<h1> question 1",
    "question 1 content",
    "question 1 more content",
    "<h1> answer 1",
    "answer 1 content",
    "answer 1 more content",
    "<h1> question 2",
    "question 2 content",
    "<h> answer 2",
    "answer 2 content",
]

header_map = defaultdict(list)  # header -> list of its content lines

header = None
for item in lst:
    if item.startswith("<h"):
        header = item
        continue
    header_map[header].append(item)

# Dicts keep insertion order (Python 3.7+), so keys come out in list order
headers = list(header_map)
print(headers)

content = [" ".join(v) for v in header_map.values()]
print(content)

Output:


['<h1> question 1', '<h1> answer 1', '<h1> question 2', '<h> answer 2']

['question 1 content question 1 more content', 'answer 1 content answer 1 more content', 'question 2 content', 'answer 2 content']
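The answer notes that an else clause can replace continue. The sketch below, a hypothetical helper `split_with_map`, uses that variant and adds one line that touches header_map[header] when a header is found, so a header with no content still gets a key (in the code above, such a header would be missing from headers). It also illustrates a limitation of keying a dict by header text: if the same header appears twice, both sections are merged into one entry. The input `dup` is made up for illustration.

```python
from collections import defaultdict


def split_with_map(items):
    """else-based variant of the defaultdict answer (hypothetical helper)."""
    header_map = defaultdict(list)
    header = None
    for item in items:
        if item.startswith("<h"):
            header = item
            header_map[header]  # register the header even if no content follows
        else:
            header_map[header].append(item)
    headers = list(header_map)
    content = [" ".join(v) for v in header_map.values()]
    return headers, content


# Hypothetical input in which the same header text appears twice
dup = ["<h1> note", "first", "<h1> answer", "reply", "<h1> note", "second"]
headers, content = split_with_map(dup)
print(headers)  # ['<h1> note', '<h1> answer']
print(content)  # ['first second', 'reply']

# A header with no content is kept, with an empty string as its content
print(split_with_map(["<h1> empty", "<h1> full", "text"]))
```

If repeated header texts must stay separate, a list-based approach (such as the accumulating loop in the other answer) avoids the merge.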


Answered 2022-11-01