
Python 3 Beautifulsoup: get the value of a span tag with specific text that is also randomly placed in

Html5

呼如林 2024-01-22 15:50:52

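I searched here for this but honestly could not find an answer. It should be easy to do with Selenium, but since performance is an important factor I am thinking of using Beautifulsoup instead.

Scenario: I need to scrape the prices of different items that are generated in a random order depending on user input. See the markup below:

    <div class="sk-expander-content" style="display: block;">
    <ul>
      <li>
        <span>Third Party Liability</span>
        <span>€756.62</span>
      </li>
      <li>
        <span>Fire & Theft</span>
        <span>€15.59</span>
      </li>
    </ul>
    </div>

If these options were static and always appeared in the same place in the html, scraping the prices would be easy. Because they can be placed anywhere inside the div sk-expander-content, I am not sure how to find them dynamically. The best approach would be a method that takes the span text we are looking for and returns the euro value. The structure of the span tags is always the same: the first span is always the item name and the second span is always the price.

The first thing that came to mind was the code below, but I am not sure whether it is robust enough or even makes sense:

    html = driver.page_source
    soup = BeautifulSoup(html, "html.parser")
    div_i_need = soup.find_all("div", class_="sk-expander-content")[1]

    def price_scraper(text_to_find):
        for el in div_i_need.find_all(['ul', 'li', 'span']):
            if el.name == 'span':
                if el[0].text == text_to_find:
                    return(el[1].text)

Any help would be greatly appreciated.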


2 answers

猛跑小猪


Use a regular expression.

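from bs4 import BeautifulSoup
import re

html = '''<div class="sk-expander-content" style="display: block;">
<ul>
  <li>
    <span>Third Party Liability</span>
    <span>€756.62</span>
  </li>
  <li>
    <span>Fire & Theft</span>
    <span>€15.59</span>
  </li>
</ul>
</div>
<div class="sk-expander-content" style="display: block;">
<ul>
  <li>
    <span>Fire & Theft</span>
    <span>€756.62</span>
  </li>
  <li>
    <span>Third Party Liability</span>
    <span>€15.59</span>
  </li>
</ul>
</div>'''

soup = BeautifulSoup(html, "html.parser")

# Visit every expander block, wherever it appears in the page.
for item in soup.find_all(class_="sk-expander-content"):
    # Keep only spans whose text looks like a euro amount, e.g. €756.62.
    # string= is the current name of the older text= argument.
    for span in item.find_all('span', string=re.compile(r"€\d+\.\d+")):
        # The item name is the span just before the price span.
        print(span.find_previous_sibling('span').text)
        print(span.text)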

Output:

Third Party Liability
€756.62
Fire & Theft
€15.59
Fire & Theft
€756.62
Third Party Liability
€15.59
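The question asks for a method that takes the item name and returns its euro value. The attempt in the question indexes a Tag with el[0], which Beautiful Soup treats as an attribute lookup rather than a position, so it cannot work as written. Below is a minimal sketch of one way to do the lookup instead. The function name find_price and the sample markup are illustrative assumptions, and the exact-match lookup assumes the name span holds nothing but the label text, with no surrounding whitespace.

```python
from bs4 import BeautifulSoup

# Sample markup modelled on the question (an assumption, not the real page).
HTML = """
<div class="sk-expander-content" style="display: block;">
<ul>
  <li><span>Third Party Liability</span> <span>€756.62</span></li>
  <li><span>Fire &amp; Theft</span> <span>€15.59</span></li>
</ul>
</div>
"""

def find_price(container, label):
    """Return the text of the span that follows the span named `label`, or None."""
    # string=label matches a span whose only content is exactly `label`.
    name_span = container.find("span", string=label)
    if name_span is None:
        return None
    # The price is the next <span> sibling inside the same <li>.
    price_span = name_span.find_next_sibling("span")
    return price_span.get_text(strip=True) if price_span else None

soup = BeautifulSoup(HTML, "html.parser")
container = soup.find("div", class_="sk-expander-content")

print(find_price(container, "Fire & Theft"))
print(find_price(container, "Third Party Liability"))
print(find_price(container, "Windscreen"))  # not present in the sample
```

Because the lookup is keyed on the label text, the position of the item inside the div does not matter. A missing label yields None instead of raising.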

Update: if you only want the values from the first node, use find() instead of find_all().

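from bs4 import BeautifulSoup
import re

html = '''<div class="sk-expander-content" style="display: block;">
<ul>
  <li>
    <span>Third Party Liability</span>
    <span>€756.62</span>
  </li>
  <li>
    <span>Fire & Theft</span>
    <span>€15.59</span>
  </li>
</ul>
</div>
<div class="sk-expander-content" style="display: block;">
<ul>
  <li>
    <span>Fire & Theft</span>
    <span>€756.62</span>
  </li>
  <li>
    <span>Third Party Liability</span>
    <span>€15.59</span>
  </li>
</ul>
</div>'''

soup = BeautifulSoup(html, "html.parser")

# find() returns only the first matching block; find_all() then scans its spans.
first_block = soup.find(class_="sk-expander-content")
for span in first_block.find_all('span', string=re.compile(r"€\d+\.\d+")):
    # The item name is the span just before the price span.
    print(span.find_previous_sibling('span').text)
    print(span.text)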
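If several prices are needed from the same block, it may be simpler to read every name/price pair once and look them up by name afterwards. The sketch below assumes, as the question states, that each li holds the item name in its first span and the price in its second. The function name prices_by_name is made up for illustration.

```python
from bs4 import BeautifulSoup

# Sample markup with the items in a different order (an assumption for the demo).
HTML = """
<div class="sk-expander-content" style="display: block;">
<ul>
  <li><span>Fire &amp; Theft</span> <span>€15.59</span></li>
  <li><span>Third Party Liability</span> <span>€756.62</span></li>
</ul>
</div>
"""

def prices_by_name(container):
    """Map each item name to its price text for one expander block."""
    prices = {}
    for li in container.find_all("li"):
        spans = li.find_all("span")
        # Skip list items that do not follow the name/price layout.
        if len(spans) >= 2:
            name = spans[0].get_text(strip=True)
            prices[name] = spans[1].get_text(strip=True)
    return prices

soup = BeautifulSoup(HTML, "html.parser")
block = soup.find("div", class_="sk-expander-content")
prices = prices_by_name(block)

print(prices["Third Party Liability"])
print(prices.get("Windscreen"))  # absent items give None with .get()
```

The dictionary is built in a single pass over the block, so the order in which the page renders the items has no effect on the lookup.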

2024-01-22

慕盖茨4494581


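from bs4 import BeautifulSoup
import re

html = """
<div class="sk-expander-content" style="display: block;">
<ul>
  <li>
    <span>Third Party Liability</span>
    <span>€756.62</span>
  </li>
  <li>
    <span>Fire & Theft</span>
    <span>€15.59</span>
  </li>
</ul>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')

# select() takes a CSS selector and returns every matching div.
target = soup.select("div.sk-expander-content")

for tar in target:
    # Collect the text of each span that contains a euro sign.
    data = [item.text for item in tar.find_all("span", string=re.compile("€"))]
    print(data)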

Output:

['€756.62', '€15.59']

Note: I used select(), which returns a ResultSet, to find all the div elements.
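To make the note concrete, here is a small sketch contrasting select(), which returns all matches, with select_one(), which returns the first match or None. The two-block markup and the selector li > span:nth-of-type(2) are assumptions chosen for the demonstration. The selector relies on the price always being the second span in each li.

```python
from bs4 import BeautifulSoup

# Two expander blocks with the items in different orders (sample data).
html = """
<div class="sk-expander-content"><ul>
  <li><span>Third Party Liability</span> <span>€756.62</span></li>
  <li><span>Fire &amp; Theft</span> <span>€15.59</span></li>
</ul></div>
<div class="sk-expander-content"><ul>
  <li><span>Fire &amp; Theft</span> <span>€756.62</span></li>
  <li><span>Third Party Liability</span> <span>€15.59</span></li>
</ul></div>
"""

soup = BeautifulSoup(html, "html.parser")

# select() returns a list-like ResultSet holding every matching div.
all_divs = soup.select("div.sk-expander-content")
# select_one() returns only the first match, or None when nothing matches.
first_div = soup.select_one("div.sk-expander-content")
missing = soup.select_one("div.no-such-class")

# The second <span> of each <li> is the price, by the layout the question describes.
first_prices = [s.get_text() for s in first_div.select("li > span:nth-of-type(2)")]

print(len(all_divs))
print(first_prices)
print(missing)
```

Since select() always returns a collection, the result has to be looped over or indexed before calling Tag methods on it, which is why the answer iterates over target.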

2024-01-22
