Avoiding 'character argument not in range' when decoding in Python 3

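I am trying to decode the content returned by calls to requests.get() for particular URLs. The URLs that cause the problem are not always the same across runs of the code, but the part of the response content that triggers it has a triple backslash, which raises an error when decoded with unicode-escape. A simplified version of the code, run in Python 3.6.1:

r=b'\xf0\\\xebI'
r.decode('unicode-escape').strip().replace('{','\n')

produces the following error:

OverflowError: character argument not in range(0x110000)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: decoding with 'unicode-escape' codec failed (OverflowError: character argument not in range(0x110000))

I would like to skip the parts that generate the error. I am a novice Python programmer, so any help is much appreciated.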

2 Answers

慕姐8265434

Contributed 1813 experience points, received over 2 likes

These steps should work for your case:

In [1]: r=b'\xf0\\\xebI'

# Decode as UTF-8, using backslashreplace for the bytes that are not valid UTF-8
In [2]: x=r.decode('utf-8', errors='backslashreplace')

In [3]: x
Out[3]: '\\xf0\\\\xebI'

# Replace the extra backslash
In [4]: y = x.replace('\\\\','\\')

In [5]: y
Out[5]: '\\xf0\\xebI'

# Encode to ASCII, then decode with unicode-escape
In [6]: z = y.encode('ascii').decode('unicode-escape')

In [7]: z
Out[7]: 'ðëI'

Note that this also works for a double backslash, which is your normal case:

r=b'\xf0\\xebI'
x=r.decode('utf-8', errors='backslashreplace')
y = x.replace('\\\\','\\')
z = y.encode('ascii').decode('unicode-escape')
print(z)
# ðëI
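The three steps above can be wrapped in a small helper. This is only a minimal sketch: the name `decode_escaped` is hypothetical, and it assumes, as the steps above do, that every non-ASCII byte in the input is a stray single byte rather than part of valid UTF-8 text (valid non-ASCII UTF-8 would make the `encode('ascii')` step raise `UnicodeEncodeError`).

```python
def decode_escaped(raw: bytes) -> str:
    """Decode bytes that mix raw high bytes with backslash escapes.

    Hypothetical helper: it only packages the three steps shown above.
    """
    # Step 1: bytes that are not valid UTF-8 become literal escapes such as \xf0.
    text = raw.decode("utf-8", errors="backslashreplace")
    # Step 2: collapse each doubled backslash into a single one.
    text = text.replace("\\\\", "\\")
    # Step 3: interpret the escapes; this requires the text to be pure ASCII.
    return text.encode("ascii").decode("unicode-escape")


# The two inputs discussed above.
triple = decode_escaped(b"\xf0\\\xebI")
double = decode_escaped(b"\xf0\\xebI")
```

Both `triple` and `double` come out as 'ðëI'. Keep in mind that step 2 collapses every doubled backslash, so input that legitimately contains an escaped backslash would be altered as well.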

Answered 2022-01-18

炎炎设计

Contributed 1808 experience points, received over 4 likes

The data appears to be encoded as latin-1*, so the simplest solution is to decode it and then remove the backslashes.

>>> r=b'\xf0\\\xebI'
>>> r.decode('latin-1').replace('\\', '')
'ðëI'

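*I'm guessing at latin-1 (also known as ISO-8859-1). The response's Content-Type header should specify the encoding used, and it may be one of the other ISO-8859-* encodings.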
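Rather than guessing, the charset can be read from the Content-Type header value and used for decoding. The following is a rough sketch using only the standard library; the helper names `charset_from_content_type` and `decode_and_strip` and the example header value are assumptions made for illustration, and latin-1 is used as the fallback only because it is the guess made above.

```python
from email.message import Message


def charset_from_content_type(header_value: str, default: str = "latin-1") -> str:
    """Return the charset parameter of a Content-Type header value, or `default`."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() returns the charset in lowercase, or None if absent.
    return msg.get_content_charset() or default


def decode_and_strip(raw: bytes, content_type: str) -> str:
    """Decode with the declared charset, then remove the backslashes."""
    charset = charset_from_content_type(content_type)
    return raw.decode(charset).replace("\\", "")


# Hypothetical header value; a real one would come from the HTTP response headers.
example = decode_and_strip(b"\xf0\\\xebI", "text/html; charset=ISO-8859-1")
```

Here `example` is 'ðëI', matching the session above. With requests, the encoding derived from the headers should also be available as the response's `encoding` attribute, which may make the header parsing unnecessary.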

Answered 2022-01-18
