
Reading an XML file from a URL in Python

Python

米琪卡哇伊 2023-02-22 15:50:17

I want to read the integers inside the count tags. This is the code I wrote:

import xml.etree.ElementTree as ET
import urllib.request, urllib.parse, urllib.error
from bs4 import BeautifulSoup
import ssl

ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

url = 'http://py4e-data.dr-chuck.net/comments_42.xml'
content1 = urllib.request.urlopen(url, context = ctx).read()
soup = BeautifulSoup(content1, 'html.parser')
tree = ET.fromstring(soup)
tags = tree.findall('count')
print(tags)

It throws an error:

Traceback (most recent call last):
  File "C:\Users\Name\Desktop\Py4e\Me\Assi_15_01.py", line 15, in <module>
    tree = ET.fromstring(soup)
  File "C:\Users\Name\AppData\Local\Programs\Python\Python38-32\lib\xml\etree\ElementTree.py", line 1320, in XML
    parser.feed(text)
TypeError: a bytes-like object is required, not 'BeautifulSoup'

What can I do? More information: http://py4e-data.dr-chuck.net/comments_42.xml


2 Answers

SMILET

Contributed 1796 experience points · received more than 4 likes

There is no need to use xml.etree; just use BeautifulSoup to select all the <count> tags:

import requests
from bs4 import BeautifulSoup

url = 'http://py4e-data.dr-chuck.net/comments_42.xml'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

for c in soup.select('count'):
    print(int(c.text))

This prints the value of each <count> tag as an integer, one per line.
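If you would rather keep xml.etree as in the question, the TypeError goes away once ET.fromstring receives the raw bytes (or a str) instead of a BeautifulSoup object. The sketch below is only an illustration: SAMPLE_XML and extract_counts are made-up names, and the sample assumes the <count> elements are nested below the root, which may not match the real file. It also uses './/count', because a plain 'count' path in findall matches only direct children of the root.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like a nested comments feed; the real document
# behind the URL may be structured differently.
SAMPLE_XML = b"""<commentinfo>
  <comments>
    <comment><name>A</name><count>97</count></comment>
    <comment><name>B</name><count>3</count></comment>
  </comments>
</commentinfo>"""


def extract_counts(xml_data):
    """Return the integer text of every <count> element at any depth."""
    # fromstring accepts bytes or str, not a BeautifulSoup object.
    root = ET.fromstring(xml_data)
    # './/count' searches all descendants; 'count' alone would only
    # match direct children of the root element.
    return [int(el.text) for el in root.findall('.//count')]


print(extract_counts(SAMPLE_XML))
```

With the real URL, the bytes returned by urllib.request.urlopen(url).read() could be passed to extract_counts directly, with no BeautifulSoup step in between.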

2023-02-22

白衣非少年

Contributed 1155 experience points · received more than 0 likes

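I don't think you need ElementTree. Just switch BeautifulSoup to the lxml parser (change 'html.parser' to 'lxml') and call the find_all method on the soup instead of on the tree (i.e. soup.find_all('count')). Note that the method is spelled find_all; BeautifulSoup has no findall method.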
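A minimal sketch of that suggestion, using an inline sample so it runs without network access. SAMPLE_XML and counts_with_soup are hypothetical names, and the sample structure is an assumption rather than the real file. The parser defaults to 'html.parser' so that the snippet only needs bs4; passing parser='lxml' follows the answer more literally but requires the lxml package to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample; the real document behind the URL may differ.
SAMPLE_XML = """<commentinfo>
  <comments>
    <comment><name>A</name><count>97</count></comment>
    <comment><name>B</name><count>3</count></comment>
  </comments>
</commentinfo>"""


def counts_with_soup(markup, parser="html.parser"):
    """Collect every <count> tag's text as an int using BeautifulSoup."""
    soup = BeautifulSoup(markup, parser)
    # find_all searches the whole tree, so nesting depth does not matter.
    return [int(tag.text) for tag in soup.find_all("count")]


print(counts_with_soup(SAMPLE_XML))
```

For the real page, the markup argument could be requests.get(url).content or the bytes read via urllib, as in the code above.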

2023-02-22
