Why does Python insist on using ascii?

Python

POPMUISE 2021-03-05 18:14:23

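When parsing an HTML file with Requests and Beautiful Soup, the following line throws an exception on some web pages:

    if 'var' in str(tag.string):

Here is the context:

    response = requests.get(url)
    soup = bs4.BeautifulSoup(response.text.encode('utf-8'))

    for tag in soup.findAll('script'):
        if 'var' in str(tag.string):    # This is the line throwing the exception
            print(tag.string)

Here is the exception:

    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 15: ordinal not in range(128)

I have tried both with and without the encode('utf-8') call in the BeautifulSoup line, and it makes no difference. I did notice that on the pages that throw the exception there is a character Ã in a comment in the JavaScript, even though the encoding reported by response.encoding is ISO-8859-1.

I realise that I can remove the offending characters with unicodedata.normalize, but I would prefer to convert the tag variable to utf-8 and keep the characters. None of the following changes the variable to utf-8:

    tag.encode('utf-8')
    tag.decode('ISO-8859-1').encode('utf-8')
    tag.decode(response.encoding).encode('utf-8')

What must I do to this string to turn it into usable utf-8?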
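The symptoms described above (byte 0xc3, a stray Ã, a reported encoding of ISO-8859-1) would be consistent with UTF-8 bytes being treated as something else, although the question does not confirm this. The following is a minimal standard-library sketch of that situation, not the asker's actual page: the sample bytes and the helper script_mentions_var are hypothetical and exist only for illustration. It shows the ascii codec failing on 0xc3, the same bytes decoding cleanly as UTF-8, and the substring test working on text without any str() conversion.

```python
# -*- coding: utf-8 -*-
# Illustrative sketch only: the sample bytes and the helper name are assumptions,
# not taken from the page in the question.


def script_mentions_var(script_text, fallback_encoding="utf-8"):
    """Return True if 'var' occurs in the script text (hypothetical helper).

    Accepts text, bytes, or None (a <script> tag without inline content).
    """
    if script_text is None:
        return False
    if isinstance(script_text, bytes):
        # Decode explicitly instead of relying on an implicit ascii conversion.
        script_text = script_text.decode(fallback_encoding, errors="replace")
    return "var" in script_text


# A script body with a non-ASCII character in a comment, stored as UTF-8 bytes.
raw = u"// coment\u00e1rio\nvar x = 1;".encode("utf-8")

# Decoding as ascii fails on the first byte >= 0x80 (here 0xc3).
try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    ascii_error = str(exc)
else:
    ascii_error = None

# Decoding with the codec the bytes were written in keeps the character.
text = raw.decode("utf-8")
found = script_mentions_var(text)

# Decoding the same bytes as ISO-8859-1 "succeeds" but yields a stray \u00c3.
mislabelled = raw.decode("iso-8859-1")
```

If this is what is happening, one option worth trying is to test 'var' in tag.string directly, guarding against None, rather than wrapping it in str(). Whether that resolves the error depends on the Python and Beautiful Soup versions in use.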


2 answers
