2 Answers
Contributor with 2039 experience points, 7+ upvotes
A simple approach is to use zip. Try:
from bs4 import BeautifulSoup as BS
source = '''
<h2><a href="..">Title 1</a></h2>
<ol>
<li>Line 1..</li>
<li>Line 2...</li>
</ol>
<h2><a href="..">Title 2</a></h2>
<ol>
<li>Line 2-1..</li>
<li>Line 2-2...</li>
</ol>
'''
html = BS(source, 'html.parser')
# Pair each <h2> with the <ol> at the same position in the document.
for title, element in zip(html.find_all('h2'), html.find_all('ol')):
    print(title.text, element.text)
Result:
Title 1
Line 1..
Line 2...
Title 2
Line 2-1..
Line 2-2...
Note: if the number of h2 and ol elements differs, you can use itertools.zip_longest instead of zip.
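To illustrate the difference, here is a minimal sketch using plain strings in place of the parsed elements. The `titles` and `lists` values are hypothetical stand-ins for the `.text` of each h2 and ol, with one more title than there are lists.

```python
from itertools import zip_longest

# Hypothetical stand-ins for the .text of each <h2> and each <ol>;
# here there is one more title than there are lists.
titles = ["Title 1", "Title 2", "Title 3"]
lists = ["Line 1..", "Line 2-1.."]

# zip stops at the shorter input, so "Title 3" is silently dropped.
short = list(zip(titles, lists))

# zip_longest continues to the longer input and pads with fillvalue.
padded = list(zip_longest(titles, lists, fillvalue=""))

for title, body in padded:
    print(title, body)
```

Keep in mind that zip_longest still pairs by position only. If a list is missing in the middle of the page rather than at the end, the later titles would be matched with the wrong lists.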
Contributor with 1876 experience points, 6+ upvotes
Another solution: you can use .find_previous:
from bs4 import BeautifulSoup
txt = '''
<h2><a href="..">Title 1</a></h2>
<ol>
<li>Line 1</li>
<li>Line 2</li>
...
</ol>
<h2><a href="..">Title 2</a></h2>
<ol>
<li>Line 2-1</li>
<li>Line 2-2</li>
...
</ol>
'''
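soup = BeautifulSoup(txt, 'html.parser')
out = {}
# For every list item, look backwards for the nearest preceding <h2>
# and collect the item's text under that heading.
for li in soup.select('ol li'):
    out.setdefault(li.find_previous('h2').text, []).append(li.text)
print(out)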
Prints:
{'Title 1': ['Line 1', 'Line 2'],
'Title 2': ['Line 2-1', 'Line 2-2']}
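The grouping step in this answer relies on dict.setdefault. As a rough sketch of just that idiom, the snippet below uses hypothetical (heading, item) pairs in place of the values that li.find_previous('h2').text and li.text would supply.

```python
# Hypothetical (heading, item) pairs, standing in for
# (li.find_previous('h2').text, li.text) from the answer above.
pairs = [
    ("Title 1", "Line 1"),
    ("Title 1", "Line 2"),
    ("Title 2", "Line 2-1"),
    ("Title 2", "Line 2-2"),
]

grouped = {}
for heading, item in pairs:
    # setdefault returns the existing list for this heading, or stores
    # and returns a new empty list the first time the heading is seen.
    grouped.setdefault(heading, []).append(item)

print(grouped)
```

Because each item is filed under its own nearest heading, this style of grouping does not depend on the headings and lists appearing in equal numbers.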