The previous article introduced consistency models. Now let's examine ZooKeeper's consistency model.
Common misconceptions
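When I first saw people online say that ZooKeeper satisfies the CP side of CAP, I assumed ZooKeeper was at least sequentially consistent. So what does ZooKeeper itself say? Its documentation does claim "Sequential Consistency", but its version seems watered down compared with the one defined by our old friend Leslie Lamport. Watered down how? The documentation goes on to explain, rather sheepishly, "Updates from a client will be applied in the order that they were sent". Notice that updates are applied in order. About reads, it says nothing.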
Right after that comes a disclaimer:
Sometimes developers mistakenly assume one other guarantee that ZooKeeper does not in fact make. This is: Simultaneously Consistent Cross-Client Views : ZooKeeper does not guarantee that at every instance in time, two different clients will have identical views of ZooKeeper data. Due to factors like network delays, one client may perform an update before another client gets notified of the change. Consider the scenario of two clients, A and B. If client A sets the value of a znode /a from 0 to 1, then tells client B to read /a, client B may read the old value of 0, depending on which server it is connected to. If it is important that Client A and Client B read the same value, Client B should call the sync() method from the ZooKeeper API method before it performs its read. So, ZooKeeper by itself doesn’t guarantee that changes occur synchronously across all servers, but ZooKeeper primitives can be used to construct higher level functions that provide useful client synchronization.
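In other words, people sometimes expect too much of ZooKeeper: it cannot guarantee that all clients see the same data at the same moment. Suppose there are two clients, A and B. A updates a value and then tells B, "Hey, I've updated it, go read it." At that point B is not guaranteed to read the updated value.
To understand why this happens, we need to look at how ZooKeeper is implemented.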
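The paper "ZooKeeper: Wait-free coordination for Internet-scale systems" shows that every ZooKeeper node can serve clients. My guess is that a single leader simply could not cope on its own with session expiry, watches, and the like. Once a client connects to a follower, it sends all of its read and write requests to that follower.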
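For a write request, the follower takes a "not my problem" attitude and forwards it to the leader.
For a read request, the follower returns the data from its own local state.
So after A's update, if client B is unlucky enough to be connected to a lagging follower, it will not read the latest data.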
If a client wants to be sure it reads the latest data every time, it should call sync before each read. Please keep this in mind.
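To make the stale read concrete, here is a minimal sketch of the scenario from the disclaimer. It is a toy model, not ZooKeeper's implementation: `Server`, `ToyCluster`, `catch_up`, and `sync_then_read` are hypothetical names, and quorum commits, networking, and sessions are all left out. The only thing it shows is that a follower serving reads from local state can return the old value until it has caught up with the leader's log.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Toy replica: a key-value store plus the last applied transaction id."""
    data: dict = field(default_factory=dict)
    last_zxid: int = 0

    def apply(self, zxid, path, value):
        self.data[path] = value
        self.last_zxid = zxid


class ToyCluster:
    """Hypothetical model: one leader with an ordered log, followers that may lag."""

    def __init__(self, n_followers):
        self.leader = Server()
        self.followers = [Server() for _ in range(n_followers)]
        self.log = []  # (zxid, path, value) entries in the order the leader applied them

    def write(self, path, value):
        # Every write goes through the leader, which assigns the next zxid.
        zxid = self.leader.last_zxid + 1
        self.leader.apply(zxid, path, value)
        self.log.append((zxid, path, value))
        return zxid

    def catch_up(self, follower):
        # Replay, in order, every entry the follower has not applied yet.
        for zxid, path, value in self.log:
            if zxid > follower.last_zxid:
                follower.apply(zxid, path, value)

    def read(self, follower, path):
        # Reads are answered from the follower's local state only.
        return follower.data.get(path)

    def sync_then_read(self, follower, path):
        # Stand-in for "call sync, then read": catch up first, then answer.
        self.catch_up(follower)
        return self.read(follower, path)


cluster = ToyCluster(n_followers=2)
cluster.write("/a", 0)
for f in cluster.followers:
    cluster.catch_up(f)

cluster.write("/a", 1)                 # client A's update; followers have not replayed it yet
stale_follower = cluster.followers[1]  # the server client B happens to be connected to

plain = cluster.read(stale_follower, "/a")             # old value
synced = cluster.sync_then_read(stale_follower, "/a")  # new value
print(plain, synced)
```

In this toy, B's plain read returns the old value 0, while the read preceded by the sync step returns 1.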
By the way, a ZooKeeper client that needs to learn about other clients' changes should do so through watches.
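As a rough illustration of the watch pattern, the sketch below uses an in-memory stand-in rather than a real ZooKeeper client. `ToyZnodeStore` and `on_change` are hypothetical names. The toy treats a watch as a one-shot trigger, which is how ZooKeeper's standard watches behave, so the callback re-registers itself each time it re-reads the node.

```python
class ToyZnodeStore:
    """Hypothetical in-memory stand-in for a znode tree with one-shot watches."""

    def __init__(self):
        self.data = {}
        self.watches = {}  # path -> list of callbacks waiting for the next change

    def get(self, path, watch=None):
        # Reading with a watch registers the callback for the next change only.
        if watch is not None:
            self.watches.setdefault(path, []).append(watch)
        return self.data.get(path)

    def set(self, path, value):
        self.data[path] = value
        # Remove the registered callbacks before firing them: each fires once.
        for callback in self.watches.pop(path, []):
            callback(path)


store = ToyZnodeStore()
seen = []


def on_change(path):
    # Re-read the node and re-register the watch in the same call.
    seen.append(store.get(path, watch=on_change))


store.set("/a", 0)                 # no watcher yet, nothing fires
store.get("/a", watch=on_change)   # client B reads and leaves a watch
store.set("/a", 1)                 # client A's update triggers the watch
store.set("/a", 2)                 # fires again only because on_change re-registered
print(seen)
```

Here B learns about both of A's updates without A having to tell it anything out of band.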
Read-your-own-writes consistency
So if A updates some data, can A itself read the latest value? It would be embarrassing if it could not. Fortunately, it can.
The paper says:
All requests that update ZooKeeper state are forwarded to the leader……The server that receives the client request responds to the client when it delivers the corresponding state change.
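So by the time A's write request completes, the node it is connected to has already delivered the state change, and A's subsequent reads naturally return the updated data.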
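The quoted behaviour can be sketched as follows. This is a simplified model under stated assumptions: `Leader`, `Follower`, and `client_write` are hypothetical names, and the broadcast protocol is reduced to a single function call. The point is only the ordering: the server the client is connected to acknowledges the write after it has delivered the change locally, so the writer's next read on that server sees it.

```python
class Follower:
    """Toy server that answers reads from its local state."""

    def __init__(self):
        self.data = {}
        self.last_zxid = 0

    def deliver(self, zxid, path, value):
        self.data[path] = value
        self.last_zxid = zxid


class Leader:
    """Toy leader that only assigns transaction ids (an assumption of this sketch)."""

    def __init__(self):
        self.next_zxid = 1

    def propose(self, path, value):
        zxid = self.next_zxid
        self.next_zxid += 1
        return zxid, path, value


def client_write(leader, connected, path, value):
    """Forward the write to the leader; acknowledge only after local delivery."""
    zxid, path, value = leader.propose(path, value)
    connected.deliver(zxid, path, value)  # delivered on the client's own server first
    return zxid                           # only now does the client see the write complete


leader = Leader()
f1, f2 = Follower(), Follower()           # client A is connected to f1; f2 is lagging

zxid = client_write(leader, f1, "/a", 1)
own_read = f1.data.get("/a")              # A reads from the server it wrote through
other_read = f2.data.get("/a")            # a client on f2 may still see nothing
print(zxid, own_read, other_read)
```

A reads its own write back from `f1`, while a client attached to the lagging `f2` gets no such promise.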
True Sequential Consistency
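The official documentation only says "Updates from a client will be applied in the order that they were sent". Could reads appear out of order? Could a later read return older data?
Suppose a client is connected to a fully up-to-date follower, and one of its reads returns the latest data. Then, for some reason, the client has to reconnect to a ZooKeeper node, and this time it happens to land on a stale follower. Wouldn't the client's next read return old data, making history go backwards?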
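Fortunately, ZooKeeper handles this case. The paper explains that the client records the largest zxid it has seen. When the client reconnects, a server that finds the client's zxid larger than its own will not reestablish the session with the client until it has caught up.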
If the client connects to a new server, that new server ensures that its view of the ZooKeeper data is at least as recent as the view of the client by checking the last zxid of the client against its last zxid. If the client has a more recent view than the server, the server does not reestablish the session with the client until the server has caught up.
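The reconnect check described in the quote can be sketched like this. It is a toy under stated assumptions, not ZooKeeper's session code: `Server.accept`, `Server.catch_up_to`, and `Client.last_seen_zxid` are hypothetical names, and the client here simply takes the server's last zxid as the zxid of each reply.

```python
class Server:
    """Toy server with a snapshot of data and the last zxid it has applied."""

    def __init__(self, last_zxid, data):
        self.last_zxid = last_zxid
        self.data = dict(data)

    def accept(self, client_zxid):
        # Refuse the session while the client has seen a newer state than ours.
        return self.last_zxid >= client_zxid

    def catch_up_to(self, other):
        # Stand-in for replication: copy the newer server's state.
        self.data = dict(other.data)
        self.last_zxid = other.last_zxid


class Client:
    def __init__(self):
        self.last_seen_zxid = 0
        self.server = None

    def connect(self, server):
        if not server.accept(self.last_seen_zxid):
            return False
        self.server = server
        return True

    def read(self, path):
        value = self.server.data.get(path)
        # Remember the newest state this client has ever observed.
        self.last_seen_zxid = max(self.last_seen_zxid, self.server.last_zxid)
        return value


fresh = Server(last_zxid=5, data={"/a": 1})
stale = Server(last_zxid=3, data={"/a": 0})

client = Client()
client.connect(fresh)
first_read = client.read("/a")                    # latest value, zxid 5 recorded

accepted_while_stale = client.connect(stale)      # refused: server is behind the client
stale.catch_up_to(fresh)
accepted_after_catch_up = client.connect(stale)   # accepted once it has caught up
second_read = client.read("/a")
print(first_read, accepted_while_stale, accepted_after_catch_up, second_read)
```

Because the stale server is refused until it catches up, the second read can never be older than the first.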
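If the client handles the zxid correctly, I think ZooKeeper does provide sequential consistency in Lamport's sense. But storing the zxid is probably a fairly hard problem to solve, so ZooKeeper plays it safe and waters down its own "Sequential Consistency" claim.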
How to understand Single System Image
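The ZooKeeper website also says it guarantees "Single System Image", which it explains as "A client will see the same view of the service regardless of the server that it connects to." In practice this explanation is somewhat misleading. As the zxid mechanism above shows, what it really means is: "once a client has connected to ZooKeeper, it will never see history go backwards".
I have submitted a PR to ZooKeeper about this.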
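Original author: 大神带我来搬砖
A programmer who loves history and wuxia and focuses on Java and big data.
Sharing study materials and discussing technical problems; I hope we can all learn and improve together.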