Extracting text from HTML with BeautifulSoup, excluding the contents of script tags

I have HTML like this:

<span class="age"> Ages 15 <span class="loc" id="loc_loads1"> </span> <script> getCurrentLocationVal("loc_loads1",29.45218856,59.38139268,1); </script></span>

I am trying to get Ages 15 using BeautifulSoup, so I wrote the following Python code:

from bs4 import BeautifulSoup as bs
import urllib3

URL = 'html file'
http = urllib3.PoolManager()
page = http.request('GET', URL)
soup = bs(page.data, 'html.parser')
age = soup.find("span", {"class": "age"})
print(age.text)

Output:

Ages 15 getCurrentLocationVal("loc_loads1",29.45218856,59.38139268,1);

I only want Ages 15, not the function call inside the script tag. Is there a way to get only the text Ages 15, or to exclude the contents of the script tag?

PS: There are many script tags and the URLs differ, so I would rather not remove the unwanted text from the output by string replacement.


2 Answers

幕布斯7119047

1794 experience points contributed, more than 8 likes received

Use .find(text=True), which returns the first text node inside the tag.

Example:

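from bs4 import BeautifulSoup

html = """<span class="age">
 Ages 15
 <span class="loc" id="loc_loads1"> </span>
 <script>
 getCurrentLocationVal("loc_loads1",29.45218856,59.38139268,1);
 </script>
</span>"""

soup = BeautifulSoup(html, "html.parser")
# text=True matches the first text node; newer bs4 releases name this argument string=True
print(soup.find("span", {"class": "age"}).find(text=True).strip())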

Output:

Ages 15
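One caveat: .find(text=True) stops at the first text node, so it would miss text that is split across several child tags. Below is a minimal sketch of a variant that collects every text node under the tag and skips those whose parent is a script or style element. The helper name visible_text and its skip parameter are my own and not part of BeautifulSoup; the HTML is the snippet from the question.

```python
from bs4 import BeautifulSoup

# HTML copied from the question
HTML = (
    '<span class="age"> Ages 15 <span class="loc" id="loc_loads1"> </span> '
    '<script> getCurrentLocationVal("loc_loads1",29.45218856,59.38139268,1); '
    '</script></span>'
)


def visible_text(tag, skip=("script", "style")):
    """Join the text nodes under `tag`, ignoring nodes whose parent tag is in `skip`.

    Hypothetical helper for illustration; not a BeautifulSoup API.
    """
    parts = []
    for node in tag.find_all(string=True):
        if node.parent.name in skip:
            continue  # text that lives directly inside <script> or <style>
        stripped = node.strip()
        if stripped:  # drop whitespace-only nodes
            parts.append(stripped)
    return " ".join(parts)


soup = BeautifulSoup(HTML, "html.parser")
age = soup.find("span", {"class": "age"})
result = visible_text(age)
print(result)
```

This leaves the parsed tree untouched, which may matter if the script tags are still needed later.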

2021-09-11

临摹微笑

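1982 experience points contributed, more than 2 likes received

A late answer, but for future reference: you can also use decompose() to remove all script elements from the HTML, i.e.: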

from bs4 import BeautifulSoup

# html is the markup from the question
soup = BeautifulSoup(html, "html.parser")

# remove script and style elements from the parsed tree
for script in soup(["script", "style"]):
    script.decompose()

print(soup.find("span", {"class": "age"}).text.strip())
# Ages 15
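Note that decompose() changes the parsed tree in place. If the soup is reused afterwards, one option is to run the same loop on a copy of the tag. This is only a sketch: the function name text_without_scripts is made up for illustration, and it relies on copy.copy() producing a detached copy of a Tag, which the bs4 documentation describes.

```python
import copy

from bs4 import BeautifulSoup

# HTML copied from the question
HTML = (
    '<span class="age"> Ages 15 <span class="loc" id="loc_loads1"> </span> '
    '<script> getCurrentLocationVal("loc_loads1",29.45218856,59.38139268,1); '
    '</script></span>'
)


def text_without_scripts(tag):
    """Return the text of `tag` with script/style removed, leaving `tag` itself intact.

    Hypothetical helper for illustration; not a BeautifulSoup API.
    """
    clone = copy.copy(tag)  # detached copy, so the original tree is not modified
    for unwanted in clone(["script", "style"]):
        unwanted.decompose()
    return clone.get_text().strip()


soup = BeautifulSoup(HTML, "html.parser")
age = soup.find("span", {"class": "age"})
result = text_without_scripts(age)
print(result)
```

Since the function takes any tag, the same call could be applied to each page when the URLs differ.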

2021-09-11
