How do I iterate over UTF-8 in Python?

How do I iterate over UTF-8?

import string

for character in string.printable[1:]:
    print(character)

Presumably there is a similar approach for UTF-8?


1 answer

红糖糍粑


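Presumably there is a similar approach for UTF-8?

Do you want to know which code points outside the ASCII range are printable? Or do you want the UTF-8 encoding of the printable characters?

To get every printable code point in all of Unicode: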

unicode_max = 0x10ffff  # highest valid Unicode code point

# Keep every code point whose character str.isprintable() accepts
printable_glyphs = [chr(x) for x in range(unicode_max + 1) if chr(x).isprintable()]
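If what you actually want is the second reading of the question (the UTF-8 bytes of each printable character), something along these lines might work. This is only a sketch: the helper name `printable_utf8` and its `start`/`stop` parameters are made up for illustration, and it relies on `str.isprintable()` rejecting surrogate code points, which cannot be encoded as UTF-8.

```python
import sys


def printable_utf8(start=0, stop=sys.maxunicode + 1):
    """Yield (character, utf8_bytes) for each printable code point in [start, stop).

    Hypothetical helper, not part of the standard library.
    """
    for code_point in range(start, stop):
        character = chr(code_point)
        # Surrogates (U+D800..U+DFFF) are not printable, so encode() never sees them.
        if character.isprintable():
            yield character, character.encode("utf-8")


# Show a small slice rather than all of Unicode.
for character, encoded in printable_utf8(0x20, 0x30):
    print(character, encoded)
```

Using a generator avoids building a list of every printable character in memory when you only need to loop over them once.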

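As noted above, UTF-8 is an encoding: a mapping from text to specific bytes so that the data can be shared with other programs.

In Python, text in memory (a str) is not UTF-8. It is a sequence of code points, one per character/glyph.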
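To make the difference concrete, here is a small sketch comparing iteration over a str with iteration over its UTF-8 bytes. The sample string is just an example picked for illustration.

```python
text = "h\u00e9llo"  # "héllo": the second character is outside ASCII

# Iterating over a str yields characters, one per code point.
code_points = [hex(ord(character)) for character in text]

# Iterating over the encoded bytes yields integers, one per byte.
utf8_bytes = list(text.encode("utf-8"))

print(len(text), code_points)        # 5 characters
print(len(utf8_bytes), utf8_bytes)   # 6 bytes, because U+00E9 takes two bytes in UTF-8
```

So "iterating UTF-8" can mean two different loops, depending on whether you want characters or bytes.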

Converting to UTF-8

import unicodedata

# Look up the character by its Unicode name (U+1F412 MONKEY)
monkey = unicodedata.lookup('monkey')

# Show the glyph, its code point, and its bytes in three encodings
print(f"""
glyph: {monkey}
codepoint: Dec: {ord(monkey)}
codepoint: Hex: {hex(ord(monkey))}
utf8: {monkey.encode('utf8', errors='strict')}
utf16: {monkey.encode('utf16', errors='strict')}
utf32: {monkey.encode('utf32', errors='strict')}
""")

Output:

glyph: 🐒

codepoint: Dec: 128018

codepoint: Hex: 0x1f412

utf8: b'\xf0\x9f\x90\x92'

utf16: b'\xff\xfe=\xd8\x12\xdc'

utf32: b'\xff\xfe\x00\x00\x12\xf4\x01\x00'
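Note that the leading \xff\xfe in the utf16 and utf32 lines is a byte order mark (BOM), not part of the character, and the byte order shown is what a little-endian machine produces. If you want the bytes without a BOM, a sketch like the following may help. It assumes you want little-endian output and uses the explicit `utf-16-le` and `utf-32-le` codecs.

```python
monkey = "\N{MONKEY}"  # same character as above, U+1F412

# Codecs with an explicit byte order do not write a BOM.
utf16_le = monkey.encode("utf-16-le")
utf32_le = monkey.encode("utf-32-le")
print(utf16_le)
print(utf32_le)

# Decoding with the matching codec recovers the original character.
round_trip = monkey.encode("utf-8").decode("utf-8")
print(round_trip == monkey)
```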

2023-01-04
