Python Beautiful Soup - removed tag still affects output

Hello,

from bs4 import BeautifulSoup

html = '<p>This <i>is</i> a test.</p>'
soup = BeautifulSoup(html, 'lxml')
print(soup)
for tag in soup.find_all('i'):
    tag.replace_with('is')
print(soup)
print("\n")
print(soup.prettify())
print("\n")
for string in soup.stripped_strings:
    print(string)

The program outputs the following:

<html><body><p>This <i>is</i> a test.</p></body></html>
<html><body><p>This is a test.</p></body></html>

<html>
 <body>
  <p>
   This
   is
   a test.
  </p>
 </body>
</html>

This
is
a test.

Why is that? Why is the string still split into three parts, as if the removed tag were still there? If I use <p>This is a test.</p> (my output after replacing the tag) as the starting html, everything works fine. What am I doing wrong? Thanks in advance.


1 Answer

守着星空守着你


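It looks like replace_with() swaps <i>is</i> for the text is, but it does not merge that text with the neighboring nodes in the tree. The new is stays a separate item in the tree, sitting between "This " and " a test.", which is why stripped_strings and prettify() still report three pieces.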
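To make the three separate nodes visible, here is a minimal sketch that inspects the children of the <p> tag after the replacement. It assumes the built-in 'html.parser' instead of 'lxml' so it runs without extra dependencies; the name children is only for illustration.

```python
from bs4 import BeautifulSoup

html = "<p>This <i>is</i> a test.</p>"
# Assumption: html.parser is used here only to avoid the lxml dependency.
soup = BeautifulSoup(html, "html.parser")

for tag in soup.find_all("i"):
    tag.replace_with("is")

# The <p> tag now holds three adjacent text nodes, not one merged string.
children = list(soup.p.contents)
print(children)

# Serializing hides this, because adjacent text nodes print back to back.
print(str(soup))
```

The serialized markup looks like a single sentence, but iterating over the tree still visits each text node on its own.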

One way to get a single node in the tree is to convert the tree to a string and parse it again.

html = str(soup)
#print(html)
soup = BeautifulSoup(html, 'lxml')
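As a hedged comparison, the sketch below runs the re-parse approach next to Tag.smooth(), which Beautiful Soup 4.8.0 and later provide for merging adjacent text nodes in place. It assumes 'html.parser' so it does not need lxml, and the variable names are illustrative only.

```python
from bs4 import BeautifulSoup

# Assumption: html.parser is used here only to avoid the lxml dependency.
soup = BeautifulSoup("<p>This <i>is</i> a test.</p>", "html.parser")
for tag in soup.find_all("i"):
    tag.replace_with("is")

# Option 1: round-trip through a string and parse again.
reparsed = BeautifulSoup(str(soup), "html.parser")
reparsed_strings = list(reparsed.stripped_strings)

# Option 2: merge adjacent text nodes in place (needs bs4 >= 4.8.0).
soup.smooth()
smoothed_strings = list(soup.stripped_strings)

print(reparsed_strings)
print(smoothed_strings)
```

Re-parsing builds a whole new tree, so any references to old nodes go stale; smooth() keeps the existing tree and is probably the lighter choice when the installed version supports it.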

If you only want the text as a single string, you can try get_text(strip=True, separator=" ")

from bs4 import BeautifulSoup

html = '<p>This <i>is</i> a test.</p>'
soup = BeautifulSoup(html, 'lxml')
print(soup.get_text(strip=True, separator=" "))  # This is a test.
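One thing worth checking before relying on separator=" ": the separator goes between every pair of text nodes, including ones that were never separated by whitespace in the markup. The sketch below shows this on the modified tree and on a hypothetical second snippet, assuming 'html.parser' and illustrative variable names.

```python
from bs4 import BeautifulSoup

# Assumption: html.parser is used here only to avoid the lxml dependency.
soup = BeautifulSoup("<p>This <i>is</i> a test.</p>", "html.parser")
for tag in soup.find_all("i"):
    tag.replace_with("is")

# get_text() works on the modified tree directly; no re-parse is needed.
joined = soup.get_text(strip=True, separator=" ")
plain = soup.get_text()  # plain concatenation of the text nodes

# Hypothetical snippet where the tag sits inside a word.
glued = BeautifulSoup("<p>test<i>s</i></p>", "html.parser")
with_sep = glued.get_text(strip=True, separator=" ")
without_sep = glued.get_text()

print(joined, plain, with_sep, without_sep, sep="\n")
```

When the markup already carries its own spacing, plain get_text() keeps it intact, while the separator form is more useful when text nodes would otherwise run together.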

2023-07-11
