2 Answers
Contributor with 1858 experience points, over 8 likes
Use a regular expression.
import re
from bs4 import BeautifulSoup
html='''<div class="sk-expander-content" style="display: block;">
<ul>
<li>
<span>Third Party Liability</span>
<span>€756.62</span>
</li>
<li>
<span>Fire & Theft</span>
<span>€15.59</span>
</li>
</ul>
</div>
<div class="sk-expander-content" style="display: block;">
<ul>
<li>
<span>Fire & Theft</span>
<span>€756.62</span>
</li>
<li>
<span>Third Party Liability</span>
<span>€15.59</span>
</li>
</ul>
</div>'''
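soup = BeautifulSoup(html, "html.parser")
# Walk every expander block and pick the spans whose text looks like a euro price.
for item in soup.find_all(class_="sk-expander-content"):
    for span in item.find_all("span", string=re.compile(r"€\d+\.\d+")):
        # The label is the span immediately before the price span.
        print(span.find_previous_sibling("span").text)
        print(span.text)
Output: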
Third Party Liability
€756.62
Fire & Theft
€15.59
Fire & Theft
€756.62
Third Party Liability
€15.59
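The pattern is easier to trust when the dot is escaped and the string is raw. The sketch below uses only the standard library and compares an unescaped pattern with an escaped one. The names LOOSE, STRICT and parse_price are made up for illustration and are not part of the answer above.

```python
import re

# Hypothetical names, for illustration only.
LOOSE = re.compile(r"€(\d+).(\d+)")    # unescaped dot: matches any character
STRICT = re.compile(r"€(\d+)\.(\d+)")  # escaped dot: matches a literal "."

samples = ["€756.62", "€15.59", "€756x62", "Fire & Theft"]

# Show which samples each pattern accepts.
for text in samples:
    print(text, bool(LOOSE.search(text)), bool(STRICT.search(text)))


def parse_price(text):
    """Return the euro amount in text as a float, or None if there is none."""
    match = STRICT.search(text)
    if match is None:
        return None
    return float(f"{match.group(1)}.{match.group(2)}")
```

With these samples, only the loose pattern accepts "€756x62". Both patterns reject "Fire & Theft", so either one works for the HTML in this answer; the escaped form is just the safer choice for messier input.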
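Update: if you only want the values from the first matching node, use find() instead of find_all() for the outer lookup.
import re
from bs4 import BeautifulSoup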
html='''<div class="sk-expander-content" style="display: block;">
<ul>
<li>
<span>Third Party Liability</span>
<span>€756.62</span>
</li>
<li>
<span>Fire & Theft</span>
<span>€15.59</span>
</li>
</ul>
</div>
<div class="sk-expander-content" style="display: block;">
<ul>
<li>
<span>Fire & Theft</span>
<span>€756.62</span>
</li>
<li>
<span>Third Party Liability</span>
<span>€15.59</span>
</li>
</ul>
</div>'''
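soup = BeautifulSoup(html, "html.parser")
# find() returns only the first div with this class; find_all() then scans inside it.
first_block = soup.find(class_="sk-expander-content")
for span in first_block.find_all("span", string=re.compile(r"€\d+\.\d+")):
    print(span.find_previous_sibling("span").text)
    print(span.text)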
Contributor with 1850 experience points, over 11 likes
from bs4 import BeautifulSoup
import re
html = """
<div class="sk-expander-content" style="display: block;">
<ul>
<li>
<span>Third Party Liability</span>
<span>€756.62</span>
</li>
<li>
<span>Fire & Theft</span>
<span>€15.59</span>
</li>
</ul>
</div>
"""
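soup = BeautifulSoup(html, 'html.parser')
# select() takes a CSS selector and returns every matching div.
target = soup.select("div.sk-expander-content")
for tar in target:
    # Keep only the spans whose text contains a euro sign.
    data = [item.text for item in tar.find_all("span", string=re.compile("€"))]
    print(data)
Output: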
['€756.62', '€15.59']
Note: I used select(), which returns a ResultSet, to find all the matching div elements.
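If you need each label paired with its price instead of a flat list, the two answers can be combined. This is a minimal sketch that assumes every li holds exactly a label span followed by a price span, as in the sample HTML. The helper name extract_prices and the PRICE pattern are hypothetical.

```python
import re
from bs4 import BeautifulSoup

html = """
<div class="sk-expander-content" style="display: block;">
<ul>
<li><span>Third Party Liability</span><span>€756.62</span></li>
<li><span>Fire &amp; Theft</span><span>€15.59</span></li>
</ul>
</div>
"""

# Hypothetical pattern: a euro sign followed by an amount, decimals optional.
PRICE = re.compile(r"€(\d+(?:\.\d+)?)")


def extract_prices(markup):
    """Return one {label: price} dict per div.sk-expander-content."""
    soup = BeautifulSoup(markup, "html.parser")
    results = []
    for div in soup.select("div.sk-expander-content"):
        prices = {}
        for li in div.select("li"):
            spans = li.find_all("span")
            if len(spans) < 2:
                continue  # assumption: skip items without a label/price pair
            match = PRICE.search(spans[1].get_text(strip=True))
            if match:
                prices[spans[0].get_text(strip=True)] = float(match.group(1))
        results.append(prices)
    return results


print(extract_prices(html))
```

Keying on the li element rather than on sibling spans keeps each label tied to the price in the same list item, even if other spans are added around the list later.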