首页猿问 Go：二进制编码机制

Go：二进制编码机制

神不在的星期二 2021-08-10 16:12:00

我正在尝试制作一个新的二进制编码包，因为标准的 Go 编码/二进制包并不完全符合我的要求。我不明白的是为什么编码/二进制使用x >>= 7inbinary.PutUvarint而不是x >>= 8. 如果我理解正确，这是故意将位移动 7 而不是 8，这导致存储 uint64 的总大小为 80 位，而不是 64 位，这显然是所需的位数。为什么？这是什么原因？这一定与它生成可变长度字节片这一事实有关，但为什么 >>7 会对此有所帮助？以下是二进制编码函数供您参考：func PutUvarint(buf []byte, x uint64) int { i := 0 for x >= 0x80 { buf[i] = byte(x) | 0x80 x >>= 7 i++ } buf[i] = byte(x) return i + 1}

查看完整描述

2 回答

繁星点点滴滴

TA贡献1803条经验获得超3个赞

编码/二进制/ varint.go

package binary

// This file implements "varint" encoding of 64-bit integers.

// The encoding is:

// - unsigned integers are serialized 7 bits at a time, starting with the

// least significant bits

// - the most significant bit (msb) in each output byte indicates if there

// is a continuation byte (msb = 1)

// - signed integers are mapped to unsigned integers using "zig-zag"

// encoding: Positive values x are written as 2*x + 0, negative values

// are written as 2*(^x) + 1; that is, negative numbers are complemented

// and whether to complement is encoded in bit 0.

// Design note:

// At most 10 bytes are needed for 64-bit values. The encoding could

// be more dense: a full 64-bit value needs an extra byte just to hold bit 63.

// Instead, the msb of the previous byte could be used to hold bit 63 since we

// know there can't be more than 64 bits. This is a trivial improvement and

// would reduce the maximum encoding length to 9 bytes. However, it breaks the

// invariant that the msb is always the "continuation bit" and thus makes the

// format incompatible with a varint encoding for larger numbers (say 128-bit).

一种用于无损数据压缩的经典技术是霍夫曼编码，其中较常见的符号通常比不太常见的符号使用更少的位来表示。在实践中，较小的数字最常出现。因此，如果我们可以有效地表示小数，即使表示大数的效率较低，我们也将获得无损数据压缩。

Uvarints 是一种使用一个或多个字节序列化无符号整数的方法。较小的数字占用较少的字节数。uvarint 中的每个字节，除了最后一个字节，都设置了最高有效位 (msb)——这表明还有更多的字节要来。每个字节的低 7 位用于以 7 位为一组存储数字，最低有效组在前。我们可以对任意位数的无符号整数执行此操作。

例如，

uint bits : uvarint bytes

7 7f : 1 7f

14 3fff : 2 ff7f

21 1fffff : 3 ffff7f

28 fffffff : 4 ffffff7f

35 7ffffffff : 5 ffffffff7f

42 3ffffffffff : 6 ffffffffff7f

49 1ffffffffffff : 7 ffffffffffff7f

56 ffffffffffffff : 8 ffffffffffffff7f

63 7fffffffffffffff : 9 ffffffffffffffff7f

64 ffffffffffffffff : 10 ffffffffffffffffff01

依此类推，对于我们想要的尽可能多的 uint 位。

如果我们有很多数字表示为 1 到 49 位的无符号 64 位整数序列化为 1 到 7 个字节的 uvarints，我们不会关心是否有一些数字表示为 57 到 64 位序列化为 9 到 10 个字节的 uvarint 的无符号 64 位整数。

反对回复 2021-08-10

2 回答
0 关注
274 浏览

关注

添加回答

0/150

提交

取消

热搜

最近搜索清空

Go：二进制编码机制

Go：二进制编码机制

2 回答

添加回答