
u'\ufeff' in a Python string

Python

撒科打诨 2019-08-27 13:24:32

I'm getting an error of the following form: UnicodeEncodeError: 'ascii' codec can't encode character u'\ufeff' in position 155: ordinal not in range(128). I don't know what u'\ufeff' is; it shows up while I'm web scraping. How can I fix this? The string's .replace() method doesn't do the job.


3 Answers

富国沪深


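The Unicode character U+FEFF is the byte order mark, or BOM, used to tell big-endian UTF-16 from little-endian UTF-16. If you decode the web page with the right codec, Python removes it for you. Example: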

#!python2
#coding: utf8
u = u'ABC'
e8 = u.encode('utf-8') # encode without BOM
e8s = u.encode('utf-8-sig') # encode with BOM
e16 = u.encode('utf-16') # encode with BOM
e16le = u.encode('utf-16le') # encode without BOM
e16be = u.encode('utf-16be') # encode without BOM
print 'utf-8 %r' % e8
print 'utf-8-sig %r' % e8s
print 'utf-16 %r' % e16
print 'utf-16le %r' % e16le
print 'utf-16be %r' % e16be
print 'utf-8 w/ BOM decoded with utf-8 %r' % e8s.decode('utf-8')
print 'utf-8 w/ BOM decoded with utf-8-sig %r' % e8s.decode('utf-8-sig')
print 'utf-16 w/ BOM decoded with utf-16 %r' % e16.decode('utf-16')
print 'utf-16 w/ BOM decoded with utf-16le %r' % e16.decode('utf-16le')

Note that EF BB BF is the UTF-8-encoded BOM. UTF-8 does not require it; it serves only as a signature (commonly on Windows).

Output:

utf-8 'ABC'
utf-8-sig '\xef\xbb\xbfABC'
utf-16 '\xff\xfeA\x00B\x00C\x00' # Adds BOM and encodes using native processor endian-ness.
utf-16le 'A\x00B\x00C\x00'
utf-16be '\x00A\x00B\x00C'
utf-8 w/ BOM decoded with utf-8 u'\ufeffABC' # doesn't remove BOM if present.
utf-8 w/ BOM decoded with utf-8-sig u'ABC' # removes BOM if present.
utf-16 w/ BOM decoded with utf-16 u'ABC' # uses the BOM to pick the byte order, then removes it.
utf-16 w/ BOM decoded with utf-16le u'\ufeffABC' # doesn't remove BOM if present.

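Note that the utf-16 codec relies on the BOM to tell whether the data is big-endian or little-endian. Without one, Python assumes the platform's native byte order, which may not match the data.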
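Applied to the scraping case, this could look roughly like the sketch below. It assumes the page is UTF-8 and arrives as raw bytes; `decode_page` and the sample payload are hypothetical names made up for illustration, not part of any library.

```python
# -*- coding: utf-8 -*-
import codecs


def decode_page(raw_bytes):
    """Decode UTF-8 bytes, dropping a leading BOM if one is present.

    Hypothetical helper; assumes the payload really is UTF-8.
    """
    return raw_bytes.decode('utf-8-sig')


# Hypothetical scraped payload: the UTF-8 BOM (EF BB BF) followed by "ABC".
raw = codecs.BOM_UTF8 + b'ABC'

with_bom = raw.decode('utf-8')   # plain utf-8 keeps the BOM as U+FEFF
without_bom = decode_page(raw)   # utf-8-sig strips it

# utf-8-sig also accepts input that has no BOM at all.
no_bom_input = decode_page(b'ABC')
```

Since `utf-8-sig` handles input both with and without a BOM, it is a reasonable default when it is unknown whether the server sends one.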

2019-08-27

缥缈止盈


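That character is the BOM, or "byte order mark". It usually arrives as the first few bytes of a file and tells you how to interpret the encoding of the rest of the data. You can simply remove the character and carry on. That said, since the error says you were trying to convert to 'ascii', you should probably choose a different encoding for whatever you are trying to do.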
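A minimal sketch of both suggestions, assuming the text has already been decoded and still carries the BOM as its first character. `strip_bom` and the sample string are hypothetical, chosen only for illustration.

```python
# -*- coding: utf-8 -*-


def strip_bom(text):
    """Return text without a leading U+FEFF, if there is one (hypothetical helper)."""
    if text.startswith(u'\ufeff'):
        return text[1:]
    return text


# Hypothetical decoded text that kept its BOM.
scraped = u'\ufeffABC'

cleaned = strip_bom(scraped)

# UTF-8 can represent every Unicode character, unlike ASCII.
encoded = cleaned.encode('utf-8')

# Encoding the uncleaned text as ASCII fails as described in the question.
try:
    scraped.encode('ascii')
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True
```

String methods return a new string and leave the original unchanged, so the result of `strip_bom` (or of `.replace()`) has to be assigned to a name. Whether that explains the behaviour described in the question is only a guess.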

2019-08-27
