Unexpected result when trying to get metadata with BeautifulSoup

OK, here is what I'm trying to do. I'm new to Python and am only just getting the hang of it. With this small tool I'm trying to extract data from a page. In this case, I want the user to enter a URL and have it return

<meta content=" % Likes, % Comments - @% on Instagram: “post description []”" name="description" />

but with each % replaced by the post's number of likes, comments, and so on. Here is my full code:

from urllib.request import urlopen
from bs4 import BeautifulSoup
import requests
import re

url = "https://www.instagram.com/p/BsOGulcndj-/"
page2 = requests.get(url)
soup2 = BeautifulSoup(page2.content, 'html.parser')
result = soup2.findAll('content', attrs={'content': 'description'})
print (result)

But whenever I run it, I get []. What am I doing wrong?


2 Answers

ITMISS


The correct way to match these tags is:

result = soup2.findAll('meta', content=True, attrs={"name": "description"})

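However, html.parser does not parse <meta> tags correctly. It doesn't realize they are self-closing, so it includes most of the rest of <head> in the result. I changed the parser to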

soup2 = BeautifulSoup(page2.content, 'html5lib')

and then the result of the search above is:

[<meta content="46.3m Likes, 2.6m Comments - EGG GANG 🌍 (@world_record_egg) on Instagram: “Let’s set a world record together and get the most liked post on Instagram. Beating the current…”" name="description"/>]
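To get from that tag to the actual numbers the question asks for, something along these lines might work. It is only a sketch: SAMPLE_HTML is a hand-trimmed stand-in for the downloaded page, the regular expression is a guess based on the single description string shown above, and the names SAMPLE_HTML, DESCRIPTION_RE, and stats are made up for illustration. It uses 'html.parser' only so the static sample runs without extra dependencies; for the live page, keep 'html5lib' as described above.

```python
import re
from bs4 import BeautifulSoup

# Static stand-in for page2.content (assumption: trimmed from the output above).
SAMPLE_HTML = """
<html><head>
<title>Instagram</title>
<meta content="46.3m Likes, 2.6m Comments - EGG GANG (@world_record_egg) on Instagram: post text" name="description"/>
</head><body></body></html>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# The selector from the question searches for a <content> tag whose
# content attribute equals "description". No such tag exists, hence [].
wrong = soup.find_all("content", attrs={"content": "description"})

# Corrected lookup: a <meta> tag with name="description" that has a content attribute.
tag = soup.find("meta", attrs={"name": "description"}, content=True)
description = tag["content"] if tag else ""

# Hypothetical pattern for the "<n> Likes, <n> Comments - ... (@user)" layout.
DESCRIPTION_RE = re.compile(
    r"(?P<likes>[\d.,]+[kmb]?) Likes, "
    r"(?P<comments>[\d.,]+[kmb]?) Comments - "
    r".*?\(@(?P<user>[\w.]+)\)",
    re.IGNORECASE,
)

match = DESCRIPTION_RE.search(description)
# Empty dict if the description does not follow the assumed layout.
stats = match.groupdict() if match else {}

print(wrong)
print(stats)
```

If the site changes the wording of the description, the pattern will simply fail to match and stats will be empty, so check for that before using the values.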

Replied 2021-10-12
