
LinkedBlockingQueue: cascading notifies and self-link

Tags: Java

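I had assumed that LinkedBlockingQueue would be simple to implement: just as ArrayBlockingQueue uses one lock plus an array, one lock plus a singly linked list ought to be enough. A look at the JDK 8 source showed that it is not that simple, and the comment at the top of the class left me rather confused. It is reproduced here: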

    /*
     * A variant of the "two lock queue" algorithm.  The putLock gates
     * entry to put (and offer), and has an associated condition for
     * waiting puts.  Similarly for the takeLock.  The "count" field
     * that they both rely on is maintained as an atomic to avoid
     * needing to get both locks in most cases. Also, to minimize need
     * for puts to get takeLock and vice-versa, cascading notifies are
     * used. When a put notices that it has enabled at least one take,
     * it signals taker. That taker in turn signals others if more
     * items have been entered since the signal. And symmetrically for
     * takes signalling puts. Operations such as remove(Object) and
     * iterators acquire both locks.
     *
     * Visibility between writers and readers is provided as follows:
     *
     * Whenever an element is enqueued, the putLock is acquired and
     * count updated.  A subsequent reader guarantees visibility to the
     * enqueued Node by either acquiring the putLock (via fullyLock)
     * or by acquiring the takeLock, and then reading n = count.get();
     * this gives visibility to the first n items.
     *
     * To implement weakly consistent iterators, it appears we need to
     * keep all Nodes GC-reachable from a predecessor dequeued Node.
     * That would cause two problems:
     * - allow a rogue Iterator to cause unbounded memory retention
     * - cause cross-generational linking of old Nodes to new Nodes if
     *   a Node was tenured while live, which generational GCs have a
     *   hard time dealing with, causing repeated major collections.
     * However, only non-deleted Nodes need to be reachable from
     * dequeued Nodes, and reachability does not necessarily have to
     * be of the kind understood by the GC.  We use the trick of
     * linking a Node that has just been dequeued to itself.  Such a
     * self-link implicitly means to advance to head.next.
     */

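Two issues are involved here:

1. Enqueuing and dequeuing use two different locks, putLock and takeLock. To avoid having to acquire both locks whenever the element count is updated, the count is kept in an AtomicInteger. Enqueue and dequeue can therefore run concurrently, which improves performance. That much is easy to follow. But the comment goes on to say that, to minimize how often a put needs takeLock or a take needs putLock, cascading notifies are used, a term I had never come across before.
2. The iterator it implements is weakly consistent. I had heard of this before but never looked into it closely. The API description of weakly consistent iterators turns out to be simple enough: the iterator may proceed concurrently with other operations and never throws ConcurrentModificationException, that is, it is not fail-fast; more importantly, modifications made concurrently with the traversal are not guaranteed to be reflected. For example, if the first element has already been traversed and is then removed by another thread, the iterator never notices the removal. That is also easy to follow. But the comment above adds that self-links are used to improve GC behavior, which was another new term to me.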
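The difference between a fail-fast and a weakly consistent iterator described above can be observed in a few lines. The following is a minimal sketch, not part of the JDK source; the class name WeaklyConsistentDemo and the helper iterateWhileAdding are made up for illustration. It modifies a collection in the middle of a for-each loop and reports whether the loop survived.

```java
import java.util.ConcurrentModificationException;
import java.util.LinkedList;
import java.util.Queue;
import java.util.concurrent.LinkedBlockingQueue;

class WeaklyConsistentDemo {
    /**
     * Fills q with 1, 2, 3 and appends one more element while iterating.
     * Returns true if the loop finished, false if the iterator threw
     * ConcurrentModificationException.
     */
    static boolean iterateWhileAdding(Queue<Integer> q) {
        q.add(1);
        q.add(2);
        q.add(3);
        try {
            for (Integer x : q) {
                if (x == 1) {
                    q.offer(99); // structural modification during iteration
                }
            }
            return true;
        } catch (ConcurrentModificationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // LinkedList has a fail-fast iterator, LinkedBlockingQueue a weakly consistent one
        System.out.println("LinkedList finished: " + iterateWhileAdding(new LinkedList<>()));
        System.out.println("LinkedBlockingQueue finished: " + iterateWhileAdding(new LinkedBlockingQueue<>()));
    }
}
```

The sketch deliberately makes no claim about whether the weakly consistent iterator sees the appended element; the API only promises that it will not throw.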

Below I go through the code and give my own understanding. Corrections are welcome!

cascading notifies

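Googling turned up nothing. Baidu did lead to an answer on Zhihu about C++ whose explanation was brief: notifies can be missed, causing starvation; it then linked to the LinkedBlockingQueue code above as a reference implementation. This seems to be a fairly obscure topic. Fortunately the comment above explains it in more detail: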

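When a put adds an element and thereby enables at least one take, it signals one taker. While performing its take, that taker checks whether more elements arrived between the moment the put signaled it and the moment it actually takes; if so, it signals another taker, which in turn signals yet another, and so on, forming cascading notifies (the name describes the behavior quite well). Takes notify putters in the symmetric way.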

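In short, putters mostly wake other putters and takers mostly wake other takers; a put signals a taker only when the queue goes from empty to non-empty. This differs slightly from the traditional producer-consumer pattern. Taking put as the example, here is the code: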

    1   public void put(E e) throws InterruptedException {
    2       if (e == null) throw new NullPointerException();
    3       int c = -1;
    4       Node<E> node = new Node<E>(e);
    5       final ReentrantLock putLock = this.putLock;
    6       final AtomicInteger count = this.count;
    7       putLock.lockInterruptibly();
    8       try {
    9           while (count.get() == capacity) {
    10              notFull.await();
    11          }
    12          enqueue(node);
    13          c = count.getAndIncrement();
    14          if (c + 1 < capacity)
    15              notFull.signal();
    16      } finally {
    17          putLock.unlock();
    18      }
    19      if (c == 0)
    20          signalNotEmpty();
        }

    21  public E take() throws InterruptedException {
    22      E x;
    23      int c = -1;
    24      final AtomicInteger count = this.count;
    25      final ReentrantLock takeLock = this.takeLock;
    26      takeLock.lockInterruptibly();
    27      try {
    28          while (count.get() == 0) {
    29              notEmpty.await();
    30          }
    31          x = dequeue();
    32          c = count.getAndDecrement();
    33          if (c > 1)
    34              notEmpty.signal();
    35      } finally {
    36          takeLock.unlock();
    37      }
    38      if (c == capacity)
    39          signalNotFull();
    40      return x;
    41  }

    private void signalNotEmpty() {
        final ReentrantLock takeLock = this.takeLock;
        takeLock.lock();
        try {
            notEmpty.signal();
        } finally {
            takeLock.unlock();
        }
    }

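1. When the queue is empty, take blocks in notEmpty.await() (line 29) waiting for a put. Several threads may be blocked here.
2. Now put is called to insert an element. Line 12 enqueues it and line 13 increments the count, returning the value before the increment. If that value is 0, the queue was empty, so signalNotEmpty is called (line 20) to notify the threads blocked in the previous step. This requires acquiring takeLock, and it wakes only one thread; any other blocked threads stay blocked.
3. The taker wakes up at line 29 and continues: line 31 dequeues, line 32 decrements the element count and returns the value before the decrement. If that value is greater than 1, elements remain, so line 34 wakes another of the threads blocked in step 1, and so on down the chain. This is the key to cascading notifies: a taker notifies other takers from inside take, and since it already holds takeLock, no extra lock acquisition is needed. When can the condition on line 33 hold? Consider this scenario:

1. A put has completed and one taker has been woken. The taker has finished the dequeue on line 31 but has not yet run the decrement on line 32, so count is still 1.
2. Another put now arrives and completes the increment on line 13, making count 2. The value before the increment was 1, so the condition on line 19 does not hold and no new taker is woken.
3. The taker runs the decrement on line 32. Since count was 2, the condition on line 33 holds.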

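Because put holds putLock and take holds takeLock, the two can run concurrently, so the scenario above can really occur.
The principle of cascading notifies should be clear by now. What would happen without it, that is, without the check and signal on lines 33 and 34? Continuing the scenario: suppose two taker threads are blocked in step 1 and step 2 wakes one of them, leaving the other blocked. With line 34 gone, an element is still in the queue but the blocked thread never learns of it, and it starves. Is there another way to avoid this starvation without cascading notifies? Yes, and it is simple: drop the check on line 19 and signal after every put regardless of the previous count. But then every put has to acquire takeLock, which costs performance. The authors of J.U.C clearly went to great lengths for performance!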
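To see the whole mechanism in one place, here is a stripped-down sketch of a two-lock queue with cascading notifies. It follows the structure of the JDK code quoted above, but TwoLockQueue, CascadeDemo and the helper signal are hypothetical names written for this post, and the sketch leaves out everything else (offer, poll, remove, iterators, null checks).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

class CascadeDemo {
    /** Starts n takers, then puts n items; returns the sorted items the takers received. */
    static List<Integer> run(int n) {
        TwoLockQueue<Integer> q = new TwoLockQueue<>(16);
        List<Integer> received = Collections.synchronizedList(new ArrayList<>());
        List<Thread> takers = new ArrayList<>();
        try {
            for (int i = 0; i < n; i++) {
                Thread t = new Thread(() -> {
                    try {
                        received.add(q.take());
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
                t.start();
                takers.add(t);
            }
            for (int i = 0; i < n; i++) q.put(i);
            for (Thread t : takers) t.join(); // returns only if every taker was woken
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        List<Integer> sorted = new ArrayList<>(received);
        Collections.sort(sorted);
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(run(4));
    }
}

/** Hypothetical minimal bounded queue: two locks, an atomic count, cascading notifies. */
class TwoLockQueue<E> {
    private static final class Node<E> { E item; Node<E> next; Node(E item) { this.item = item; } }

    private final int capacity;
    private final AtomicInteger count = new AtomicInteger();
    private Node<E> head = new Node<>(null); // dummy node, head.item is always null
    private Node<E> tail = head;
    private final ReentrantLock putLock = new ReentrantLock();
    private final Condition notFull = putLock.newCondition();
    private final ReentrantLock takeLock = new ReentrantLock();
    private final Condition notEmpty = takeLock.newCondition();

    TwoLockQueue(int capacity) { this.capacity = capacity; }

    void put(E e) throws InterruptedException {
        int c;
        putLock.lockInterruptibly();
        try {
            while (count.get() == capacity) notFull.await();
            tail = tail.next = new Node<>(e);
            c = count.getAndIncrement();
            if (c + 1 < capacity) notFull.signal(); // cascade to the next waiting putter
        } finally {
            putLock.unlock();
        }
        if (c == 0) signal(takeLock, notEmpty);     // empty -> non-empty: only now is takeLock needed
    }

    E take() throws InterruptedException {
        E x;
        int c;
        takeLock.lockInterruptibly();
        try {
            while (count.get() == 0) notEmpty.await();
            Node<E> h = head;
            Node<E> first = h.next;
            h.next = h;                             // self-link the old dummy node
            head = first;
            x = first.item;
            first.item = null;
            c = count.getAndDecrement();
            if (c > 1) notEmpty.signal();           // cascade to the next waiting taker
        } finally {
            takeLock.unlock();
        }
        if (c == capacity) signal(putLock, notFull); // full -> not full: only now is putLock needed
        return x;
    }

    private static void signal(ReentrantLock lock, Condition cond) {
        lock.lock();
        try { cond.signal(); } finally { lock.unlock(); }
    }
}
```

With the `if (c > 1) notEmpty.signal();` line removed, run(4) can hang in join() whenever several takers are already waiting and only the first put sees c == 0, which is the starvation described above.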

self-link & weakly consistent

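This time Baidu found nothing, but Google turned up a link, "Self-linking and Latency + Life of a Twitter jvm enginee", which can only be reached through a VPN from here. Roughly, it implements a singly linked list: without self-linking, the benchmark shows GC stop-the-world (STW) pauses of 70 s, while with self-linking the STW pauses are negligible. It does not explain the cause, though; it points to a YouTube video of a talk by a Twitter JVM engineer that covers the problem. Here are a few screenshots (forgive my poor connection through the VPN):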

image1

image2

image3

image4

image5

image6

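What is involved here is cross-generational references in GC. The queue has already been promoted to the Old Gen (which is normal, since a queue is usually a long-lived object), while new elements are allocated in the Young Gen: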

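image1: two new elements A and B are enqueued; they live in the Young Gen
image2: A and B are dequeued and new elements C, D and E are enqueued; A and B are still in the Young Gen and are reclaimed directly by a minor GC
image3: element C is promoted to the Old Gen
image4: C is dequeued, but it lives in the Old Gen and can only be reclaimed by a major GC; major GCs are infrequent, so C stays in the Old Gen for a long time
image5: D through J have all been dequeued, but because C in the Old Gen still references them, a minor GC does not reclaim them
image6: D through I are all promoted to the Old Gen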

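The consequence of cross-generational references is that many objects that should have been reclaimed by a minor GC end up in the Old Gen: minor GCs have to copy a large number of objects, major GCs have more objects to reclaim and are hard to parallelize, so GC pressure is high. These are the two problems named in the comment quoted at the beginning: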

allow a rogue Iterator to cause unbounded memory retention
cause cross-generational linking of old Nodes to new Nodes if a Node was tenured while live, which generational GCs have a hard time dealing with, causing repeated major collections.

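Can the problem be solved inside the GC? Hardly. As the steps above show, every step is perfectly reasonable from the GC's point of view, and the GC has no way of telling that the objects referenced from the Old Gen are actually dead. So it has to be solved in the program. The fix is simple and widely used: set the next pointer of a dequeued node to null. In image4, for example, C would point to null rather than to D after being dequeued, which removes the cross-generational reference.
That seems to solve the problem with no need for self-links, but pointing to null cannot be used if the iterator is to be weakly consistent. Look at the iterator implementation in LinkedBlockingQueue: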

    static class Node<E> {
        E item;

        /**
         * One of:
         * - the real successor Node
         * - this Node, meaning the successor is head.next
         * - null, meaning there is no successor (this is the last node)
         */
        Node<E> next;

        Node(E x) { item = x; }
    }

    public Iterator<E> iterator() {
        return new Itr();
    }

    private class Itr implements Iterator<E> {
        // Fields as declared in the JDK 8 source (left out of the excerpt originally)
        private Node<E> current;
        private Node<E> lastRet;
        private E currentElement;

        Itr() {
            // acquire both takeLock and putLock
            fullyLock();
            try {
                // fetch the first node; if it is not null, cache its item
                current = head.next;
                if (current != null)
                    currentElement = current.item;
            } finally {
                fullyUnlock();
            }
        }

        public boolean hasNext() {
            return current != null;
        }

        public E next() {
            fullyLock();
            try {
                if (current == null)
                    throw new NoSuchElementException();
                E x = currentElement;
                lastRet = current;
                current = nextNode(current);
                currentElement = (current == null) ? null : current.item;
                return x;
            } finally {
                fullyUnlock();
            }
        }

        private Node<E> nextNode(Node<E> p) {
            for (;;) {
                Node<E> s = p.next;
                if (s == p)
                    return head.next;
                if (s == null || s.item != null)
                    return s;
                p = s;
            }
        }

        // remove() is not shown here
    }

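The iterator is implemented by the inner class Itr. Both the constructor and next acquire takeLock and putLock; next sets up current in advance, so hasNext only has to check whether current is null. None of this is special. The key is the s == p case in nextNode: when a node's next points to itself, the method returns head.next, the first element of the queue. When does that happen? Look at the dequeue method: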

    private E dequeue() {
        // assert takeLock.isHeldByCurrentThread();
        // assert head.item == null;
        Node<E> h = head;
        Node<E> first = h.next;
        h.next = h; // help GC
        head = first;
        E x = first.item;
        first.item = null;
        return x;
    }

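Here h.next = h makes the old head node point to itself, which is the self-link; once no other reference points to it, the GC can reclaim it. Note that the node of the element just dequeued becomes the new dummy head, so it is self-linked in turn by the following dequeue. A node that has left the list this way satisfies the s == p condition in nextNode, which tells the iterator that elements were dequeued after it was created. Consider this situation: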

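1. The queue initially holds four elements: A, B, C and D.
2. An iterator is created now: current points to A and currentElement is A.
3. Before iteration starts, A, B and C are dequeued. The nodes of A and B are now self-linked; C's node, with its item cleared, is the new dummy head and still points to D. Only D remains in the queue.
4. A is still referenced by current, while B has no other references, so a GC at this point can reclaim B (C's node goes the same way after the next dequeue).
5. Iteration starts. current points to A and is not null, and currentElement is A, so A is definitely returned, followed by D. This is weak consistency in action: A has already left the queue, yet the iteration still yields it.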
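The five steps above can be replayed against the real class. This is a small single-threaded sketch; SelfLinkIteratorDemo and iterateAfterDequeue are names invented for the example. Tracing it through the iterator code quoted above gives A followed by D, although the API itself only promises weak consistency, so treat the exact output as a property of the implementation rather than of the specification.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;

class SelfLinkIteratorDemo {
    /** Creates the iterator first, dequeues A, B and C, and only then iterates. */
    static List<String> iterateAfterDequeue() {
        LinkedBlockingQueue<String> q =
                new LinkedBlockingQueue<>(Arrays.asList("A", "B", "C", "D"));
        Iterator<String> it = q.iterator(); // step 2: the iterator caches A
        q.poll();                           // step 3: A, B and C leave the queue
        q.poll();
        q.poll();
        List<String> seen = new ArrayList<>();
        while (it.hasNext()) {              // step 5: iterate only now
            seen.add(it.next());
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(iterateAfterDequeue());
    }
}
```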

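So a simple self-link solves the cross-generational GC problem of the singly linked list described above. Could h.next = h be changed to h.next = null? Take the same situation again. In step 2 current points to A, and after the dequeues A points to null; steps 3 and 4 are fine and GC works normally. Step 5 goes wrong, though: current points to A and is not null, and currentElement is A, so A is still returned; but in nextNode(A), Node<E> s = p.next; yields null, s == null holds, null is returned immediately, and the iteration ends without ever returning D.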
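The comparison between the two unlinking policies can be made concrete with a toy queue. The sketch below is single-threaded and entirely hypothetical (UnlinkPolicyDemo, ToyQueue and the selfLink flag do not exist in the JDK); it reuses the nextNode rule from the iterator above and changes only the line that unlinks the old head.

```java
import java.util.ArrayList;
import java.util.List;

class UnlinkPolicyDemo {
    static final class Node {
        String item;
        Node next;
        Node(String item) { this.item = item; }
    }

    /** Toy queue; selfLink selects h.next = h versus h.next = null in dequeue. */
    static final class ToyQueue {
        final boolean selfLink;
        Node head = new Node(null); // dummy head
        Node tail = head;

        ToyQueue(boolean selfLink) { this.selfLink = selfLink; }

        void enqueue(String s) { tail = tail.next = new Node(s); }

        String dequeue() {
            Node h = head;
            Node first = h.next;
            h.next = selfLink ? h : null; // the only difference between the two policies
            head = first;
            String x = first.item;
            first.item = null;
            return x;
        }

        /** Same traversal rule as nextNode in the iterator. */
        Node nextNode(Node p) {
            for (;;) {
                Node s = p.next;
                if (s == p) return head.next;
                if (s == null || s.item != null) return s;
                p = s;
            }
        }
    }

    /** Replays the A B C D scenario and returns what the iteration yields. */
    static List<String> scenario(boolean selfLink) {
        ToyQueue q = new ToyQueue(selfLink);
        for (String s : new String[] {"A", "B", "C", "D"}) q.enqueue(s);
        Node current = q.head.next;            // iterator created: current -> A
        String currentElement = current.item;
        q.dequeue();                           // A, B and C leave the queue
        q.dequeue();
        q.dequeue();
        List<String> seen = new ArrayList<>();
        while (current != null) {
            seen.add(currentElement);
            current = q.nextNode(current);
            currentElement = (current == null) ? null : current.item;
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("self-link: " + scenario(true));
        System.out.println("null link: " + scenario(false));
    }
}
```

With the self-link the iteration yields A and then D; with the null link it stops after A, because a null next is indistinguishable from the end of the queue.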
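In summary, the self-link solves two problems: 1. cross-generational references in GC; 2. it marks a node as already dequeued. See the comment in the Node class and the last sentence of the comment quoted at the beginning: a self-link implicitly means to advance to head.next.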