2 Answers
Contributed 1111 experience points, received 0+ likes
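Assuming that everything after a header is that header's content, and that the first element is always a header, you can use itertools.groupby.
The key can be whether the element contains a header tag, so each header's content is grouped right after it: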
from itertools import groupby

lst = ["<h1> question 1", "question 1 content", "question 1 more content", "<h1> answer 1", "answer 1 content", "answer 1 more content", "<h1> question 2", "question 2 content", "<h> answer 2", "answer 2 content"]
headers = []
content = []
# groupby yields consecutive runs of items that share the same key value
for key, values in groupby(lst, key=lambda x: "<h" in x):
    if key:
        # a run of header lines (normally exactly one)
        headers.extend(values)
    else:
        # the run of content lines that follows a header
        content.append(" ".join(values))
print(headers)
print(content)
This gives:
['<h1> question 1', '<h1> answer 1', '<h1> question 2', '<h> answer 2']
['question 1 content question 1 more content', 'answer 1 content answer 1 more content', 'question 2 content', 'answer 2 content']
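If the two parallel lists need to be consumed together, they can be paired up with zip. The following is only a minimal sketch: the helper name split_sections is hypothetical (it is not part of the answer above), the sample list is shortened, and it assumes every header is followed by at least one content line so that the two lists line up.

```python
from itertools import groupby


def split_sections(items):
    """Return (headers, content), joining each run of non-header items."""
    headers, content = [], []
    for is_header, group in groupby(items, key=lambda x: "<h" in x):
        if is_header:
            headers.extend(group)
        else:
            content.append(" ".join(group))
    return headers, content


lst = ["<h1> question 1", "question 1 content", "question 1 more content",
       "<h1> answer 1", "answer 1 content"]
headers, content = split_sections(lst)

# Pair each header with the content that followed it
pairs = list(zip(headers, content))
print(pairs)
```

Note that zip stops at the shorter list, so if a header has no content the pairing would silently shift; the assumption stated above matters here.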
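The problem with your current approach is that you only ever add a single item to content. What you want is to accumulate a temp_content list until you reach the next header, and only then append it and reset: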
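headers = []
content = []
temp_content = None
for i in lst:
    if "<h" in i:
        # a new header starts: flush what was collected for the previous one
        if temp_content is not None:
            content.append(" ".join(temp_content))
        temp_content = []
        headers.append(i)
    else:
        temp_content.append(i)
# flush the content of the last header, which no later header triggers
if temp_content is not None:
    content.append(" ".join(temp_content))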
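The same accumulate-and-flush idea can also be written as a generator that yields one (header, content) pair per section. This is a sketch rather than the answer's own code: iter_sections is a hypothetical name, the sample list is shortened, and items that appear before the first header are simply dropped.

```python
def iter_sections(items):
    """Yield (header, joined_content) for each header found in items."""
    header = None
    buffer = []
    for item in items:
        if "<h" in item:
            # A new header closes the previous section, if there was one
            if header is not None:
                yield header, " ".join(buffer)
            header, buffer = item, []
        else:
            buffer.append(item)
    # The last section has no following header, so flush it explicitly
    if header is not None:
        yield header, " ".join(buffer)


lst = ["<h1> question 1", "question 1 content", "question 1 more content",
       "<h> answer 2", "answer 2 content"]
sections = list(iter_sections(lst))
headers = [h for h, _ in sections]
content = [c for _, c in sections]
print(headers)
print(content)
```

Because each header is yielded together with its own content, a header with no content produces an empty string instead of shifting the two lists out of step.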
Contributed 1848 experience points, received 6+ likes
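You can collect the headers and their content into a collections.defaultdict while iterating over the list, then split the keys and values into the headers and content lists at the end. A header can be detected simply by checking whether the string starts with "<h", using str.startswith.
I also used a continue statement to move on to the next iteration as soon as a header is found. Using a plain else branch here would work as well.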
from collections import defaultdict

lst = [
    "<h1> question 1",
    "question 1 content",
    "question 1 more content",
    "<h1> answer 1",
    "answer 1 content",
    "answer 1 more content",
    "<h1> question 2",
    "question 2 content",
    "<h> answer 2",
    "answer 2 content",
]

# maps each header to the list of content lines that follow it
header_map = defaultdict(list)
header = None
for item in lst:
    if item.startswith("<h"):
        # remember the current header and skip to the next item
        header = item
        continue
    header_map[header].append(item)

# dicts preserve insertion order, so headers and content stay aligned
headers = list(header_map)
print(headers)
content = [" ".join(v) for v in header_map.values()]
print(content)
Output:
['<h1> question 1', '<h1> answer 1', '<h1> question 2', '<h> answer 2']
['question 1 content question 1 more content', 'answer 1 content answer 1 more content', 'question 2 content', 'answer 2 content']
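One thing to keep in mind with the dictionary approach is that keys are unique: two sections with identical header text end up merged under a single key, and a header that has no content never becomes a key at all. The sketch below uses a made-up list (not the data from the question) purely to illustrate the merging behaviour.

```python
from collections import defaultdict

# Hypothetical input in which the same header text appears twice
lst = ["<h1> note", "first", "<h1> note", "second"]

header_map = defaultdict(list)
header = None
for item in lst:
    if item.startswith("<h"):
        header = item
        continue
    header_map[header].append(item)

headers = list(header_map)
content = [" ".join(v) for v in header_map.values()]
print(headers)  # only one entry, because both sections share a key
print(content)  # the content of both sections is joined together
```

If repeated headers can occur in your data and must stay separate, one of the list-based approaches is probably the safer choice.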