Rust Source Code Analysis: crossbeam's ms_queue (1)

name = "crossbeam"
version = "0.4.1"

crossbeam provides a collection of concurrent data structures. This article dissects its concurrent queue, the classic Michael-Scott queue (msqueue). Because Rust has no garbage collector, crossbeam also has to reclaim heap memory in a concurrent setting; that work is mostly done by crossbeam-epoch. For reasons of length, this article covers the data structures and algorithms that make up msqueue; crossbeam-epoch will be covered in later articles.
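
Before diving into the source, here is a minimal usage sketch. The import path crossbeam::sync::MsQueue is an assumption for the 0.4.x line (the type has moved between modules across crossbeam versions); new, push, try_pop and pop are the calls analyzed below.

extern crate crossbeam;

use crossbeam::sync::MsQueue; // assumed path, may differ by version
use std::sync::Arc;
use std::thread;

fn main() {
    let q = Arc::new(MsQueue::new());

    // A producer pushes a few items.
    let producer = {
        let q = Arc::clone(&q);
        thread::spawn(move || {
            for i in 0..3 {
                q.push(i);
            }
        })
    };

    // `pop` blocks until data arrives; `try_pop` returns immediately.
    let consumer = {
        let q = Arc::clone(&q);
        thread::spawn(move || {
            for _ in 0..3 {
                let v: i32 = q.pop(); // parks the thread if the queue is empty
                println!("got {}", v);
            }
        })
    };

    producer.join().unwrap();
    consumer.join().unwrap();
    assert!(q.try_pop().is_none()); // queue is empty again
}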

MsQueue<T>

Let's start with the constructor:

#[derive(Debug)]
pub struct MsQueue<T> {
    head: CachePadded<Atomic<Node<T>>>,
    tail: CachePadded<Atomic<Node<T>>>,
}
impl<T> MsQueue<T> {
    /// Create a new, empty queue.
    pub fn new() -> MsQueue<T> {
        let q = MsQueue {
            head: CachePadded::new(Atomic::null()),
            tail: CachePadded::new(Atomic::null()),
        };
        let sentinel = Owned::new(Node {
            payload: Payload::Data(unsafe { mem::uninitialized() }),
            next: Atomic::null(),
        });
        let guard = epoch::pin();
        let sentinel = sentinel.into_shared(&guard);
        q.head.store(sentinel, Relaxed);
        q.tail.store(sentinel, Relaxed);
        q
    }

As we can see, an MsQueue consists of {head, tail}. Both head and tail are wrapped in CachePadded, and each CachePadded wraps an Atomic.

Then a sentinel node is constructed as an Owned. The Owned wraps a Node; a Node holds a payload (the data we want to store) and next, an Atomic pointing to the following node.

The line "let guard = epoch::pin()" is there to cooperate with memory reclamation; we will ignore it for now.

The next line is:

let sentinel = sentinel.into_shared(&guard);

sentinel means exactly what it says: a node that exists only to initialize the queue. Let's see what this line actually does.

Owned<T> has an into_shared method:

impl<T> Owned<T> {
.....
    pub fn into_shared<'g>(self, _: &'g Guard) -> Shared<'g, T> {
        unsafe { Shared::from_usize(self.into_usize()) }
    }
}

Put simply, it turns an Owned<T> into a Shared<'g, T>; as the names suggest, exclusively owned data becomes shared data. Let's look at Owned::into_usize and the structure of Owned itself:

/// A trait for either `Owned` or `Shared` pointers.
pub trait Pointer<T> {
    /// Returns the machine representation of the pointer.
    fn into_usize(self) -> usize;

    /// Returns a new pointer pointing to the tagged pointer `data`.
    unsafe fn from_usize(data: usize) -> Self;
}

pub struct Owned<T> {
    data: usize,
    _marker: PhantomData<Box<T>>,
}

impl<T> Pointer<T> for Owned<T> {
    #[inline]
    fn into_usize(self) -> usize {
        let data = self.data;
        mem::forget(self);
        data
    }
    .......
}

In other words, Owned<T> contains nothing but a data: usize (usable as an address) and a _marker (PhantomData), and into_usize simply returns data.

The mem::forget(self) here prevents self's drop from ever running.

The reason is that reclaiming the Node's heap memory is the job of Owned's drop implementation.

That part will be covered in detail when we analyze crossbeam-epoch.
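
As a quick standalone illustration of what mem::forget does (not crossbeam code): the forgotten value's destructor never runs, so whoever kept the raw address becomes responsible for freeing the allocation later, which is exactly the pattern into_usize relies on.

use std::mem;

fn forget_demo() {
    let b = Box::new(7u32);
    let raw = &*b as *const u32 as *mut u32; // remember the heap address
    mem::forget(b);                          // Box's drop (the free) does NOT run
    // The allocation is still alive; ownership is rebuilt and freed manually:
    unsafe { drop(Box::from_raw(raw)); }
}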

Now let's look at Shared::from_usize and the structure of Shared itself:

pub struct Shared<'g, T: 'g> {
    data: usize,
    _marker: PhantomData<(&'g (), *const T)>,
}

impl<'g, T> Pointer<T> for Shared<'g, T> {
    #[inline]
    fn into_usize(self) -> usize {
        self.data
    }

    #[inline]
    unsafe fn from_usize(data: usize) -> Self {
        Shared {
            data: data,
            _marker: PhantomData,
        }
    }
}

So Shared is almost identical to Owned, except that Shared carries an extra lifetime parameter 'g.

And from_usize here simply builds a Shared from a usize.

Putting it together, into_shared just hands the data of an Owned over and constructs a Shared from it; apart from the lifetime parameter the two structs are laid out identically.

Continuing with the code:

        let sentinel = sentinel.into_shared(&guard);
        q.head.store(sentinel, Relaxed);
        q.tail.store(sentinel, Relaxed);
        q

The intent is obvious: set head and tail to the sentinel. Let's see what store actually does:

    pub fn store<'g, P: Pointer<T>>(&self, new: P, ord: Ordering) {
        self.data.store(new.into_usize(), ord);
    }

The structure of Atomic is:

pub struct Atomic<T> {
    data: AtomicUsize,
    _marker: PhantomData<*mut T>,
}

So in essence this just puts the usize carried by an Owned or a Shared (note that both implement Pointer) into its own AtomicUsize, which is what gives head and tail their atomic-operation capability. Underneath, it is still only that address being stored (the usize is essentially the address of a Node).
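
To tie these pieces together, here is a small round-trip sketch using only the calls quoted in this article (Atomic::null, Owned::new, into_shared, store, load, as_ref). It is illustrative only: nothing reclaims the allocation, which real code must hand over to the collector.

extern crate crossbeam_epoch as epoch;

use epoch::{Atomic, Owned};
use std::sync::atomic::Ordering::{Acquire, Relaxed};

fn round_trip() {
    let slot: Atomic<u64> = Atomic::null();  // starts out as a null pointer
    let guard = epoch::pin();                // pin the current epoch

    let owned = Owned::new(42u64);           // heap-allocate the value
    let shared = owned.into_shared(&guard);  // give up ownership, keep the address
    slot.store(shared, Relaxed);             // publish the address atomically

    let loaded = slot.load(Acquire, &guard); // read it back as a Shared
    let value = unsafe { loaded.as_ref() }.unwrap();
    assert_eq!(*value, 42);
    // Real code would unlink the node and call guard.defer(...) to free it.
}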

Good. Now let's look at the operations MsQueue provides: pub fn push(&self, t: T) and pub fn pop(&self) -> T.


pub fn push(&self, t: T)

    /// Add `t` to the back of the queue, possibly waking up threads
    /// blocked on `pop`.
    pub fn push(&self, t: T) {
        /// We may or may not need to allocate a node; once we do,
        /// we cache that allocation.
        enum Cache<T> {
            Data(T),
            Node(Owned<Node<T>>),
        }

        impl<T> Cache<T> {
            /// Extract the node if cached, or allocate if not.
            fn into_node(self) -> Owned<Node<T>> {
                match self {
                    Cache::Data(t) => Owned::new(Node {
                        payload: Payload::Data(ManuallyDrop::new(t)),
                        next: Atomic::null(),
                    }),
                    Cache::Node(n) => n,
                }
            }

            /// Extract the data from the cache, deallocating any cached node.
            fn into_data(self) -> T {
                match self {
                    Cache::Data(t) => t,
                    Cache::Node(node) => match (*node.into_box()).payload {
                        Payload::Data(t) => ManuallyDrop::into_inner(t),
                        _ => unreachable!(),
                    },
                }
            }
        }

        let mut cache = Cache::Data(t); // don't allocate up front
        let guard = epoch::pin();

        loop {
            // We push onto the tail, so we'll start optimistically by looking
            // there first.
            let tail_shared = self.tail.load(Acquire, &guard);
            let tail_ref = unsafe { tail_shared.as_ref() }.unwrap();

            // Is the queue in Data mode (empty queues can be viewed as either mode)?
            if tail_ref.is_data() || self.head.load(Relaxed, &guard) == tail_shared {
                // Attempt to push onto the `tail` snapshot; fails if
                // `tail.next` has changed, which will always be the case if the
                // queue has transitioned to blocking mode.
                match self.push_internal(&guard, tail_shared, cache.into_node()) {
                    Ok(_) => return,
                    Err(n) => {
                        // replace the cache, retry whole thing
                        cache = Cache::Node(n)
                    }
                }
            } else {
                // Queue is in blocking mode. Attempt to unblock a thread.
                let head_shared = self.head.load(Acquire, &guard);
                let head = unsafe { head_shared.as_ref() }.unwrap();
                // Get a handle on the first blocked node. Racy, so queue might
                // be empty or in data mode by the time we see it.
                let next_shared = head.next.load(Acquire, &guard);
                let request = unsafe { next_shared.as_ref() }.and_then(|next| match next.payload {
                    Payload::Blocked(signal) => Some((next_shared, signal)),
                    Payload::Data(_) => None,
                });
                if let Some((blocked_node, signal)) = request {
                    // race to dequeue the node
                    if self.head
                        .compare_and_set(head_shared, blocked_node, Release, &guard)
                        .is_ok()
                    {
                        unsafe {
                            // signal the thread
                            (*signal).data = Some(cache.into_data());
                            let thread = (*signal).thread.clone();

                            (*signal).ready.store(true, Release);
                            thread.unpark();
                            guard.defer(move || head_shared.into_owned());
                            return;
                        }
                    }
                }
            }
        }
    }

First there is a helper data structure, Cache:

        enum Cache<T> {
            Data(T),
            Node(Owned<Node<T>>),
        }

In this usage its main purpose is that once a Cache::Node has been created, the Owned<Node<T>> is kept and reused across retries, saving repeated allocations.

Cache then provides two methods: into_node and into_data.

        impl<T> Cache<T> {
            /// Extract the node if cached, or allocate if not.
            fn into_node(self) -> Owned<Node<T>> {
                match self {
                    Cache::Data(t) => Owned::new(Node {
                        payload: Payload::Data(ManuallyDrop::new(t)),
                        next: Atomic::null(),
                    }),
                    Cache::Node(n) => n,
                }
            }

            /// Extract the data from the cache, deallocating any cached node.
            fn into_data(self) -> T {
                match self {
                    Cache::Data(t) => t,
                    Cache::Node(node) => match (*node.into_box()).payload {
                        Payload::Data(t) => ManuallyDrop::into_inner(t),
                        _ => unreachable!(),
                    },
                }
            }
        }

into_node shows that if we already hold a Cache::Node, no new Owned needs to be constructed.

into_data is for the case where a Cache has already been built and the data T needs to be extracted back out of it.

Now to the actual logic:

        let mut cache = Cache::Data(t); // don't allocate up front
        let guard = epoch::pin();

First, the value t is wrapped in a Cache::Data.

"let guard = epoch::pin();" is related to memory reclamation; we skip it for now.

Next comes the main loop:

        loop {
            // We push onto the tail, so we'll start optimistically by looking
            // there first.
            let tail_shared = self.tail.load(Acquire, &guard);
            let tail_ref = unsafe { tail_shared.as_ref() }.unwrap();

            // Is the queue in Data mode (empty queues can be viewed as either mode)?
            if tail_ref.is_data() || self.head.load(Relaxed, &guard) == tail_shared {
                // Attempt to push onto the `tail` snapshot; fails if
                // `tail.next` has changed, which will always be the case if the
                // queue has transitioned to blocking mode.
                match self.push_internal(&guard, tail_shared, cache.into_node()) {
                    Ok(_) => return,
                    Err(n) => {
                        // replace the cache, retry whole thing
                        cache = Cache::Node(n)
                    }
                }
            } else {
                // Queue is in blocking mode. Attempt to unblock a thread.
                let head_shared = self.head.load(Acquire, &guard);
                let head = unsafe { head_shared.as_ref() }.unwrap();
                // Get a handle on the first blocked node. Racy, so queue might
                // be empty or in data mode by the time we see it.
                let next_shared = head.next.load(Acquire, &guard);
                let request = unsafe { next_shared.as_ref() }.and_then(|next| match next.payload {
                    Payload::Blocked(signal) => Some((next_shared, signal)),
                    Payload::Data(_) => None,
                });
                if let Some((blocked_node, signal)) = request {
                    // race to dequeue the node
                    if self.head
                        .compare_and_set(head_shared, blocked_node, Release, &guard)
                        .is_ok()
                    {
                        unsafe {
                            // signal the thread
                            (*signal).data = Some(cache.into_data());
                            let thread = (*signal).thread.clone();

                            (*signal).ready.store(true, Release);
                            thread.unpark();
                            guard.defer(move || head_shared.into_owned());
                            return;
                        }
                    }
                }
            }
        }

Note the overall shape of the algorithm: in this lock-free MPMC queue, because of the nondeterminism of concurrent execution, every step that commits does so through a single atomic operation (the compare-and-swap family here). If that operation fails, the thread re-reads the latest state and tries again, hence the outer loop.
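
The same retry-until-CAS-succeeds pattern can be shown in isolation with a plain AtomicUsize (a standalone sketch, not crossbeam code):

use std::sync::atomic::{AtomicUsize, Ordering};

/// Triple a shared counter lock-free: read, compute, try to commit with
/// compare_exchange, and retry from the freshest value on failure.
fn triple(counter: &AtomicUsize) -> usize {
    let mut cur = counter.load(Ordering::Relaxed);
    loop {
        let new = cur * 3;
        match counter.compare_exchange(cur, new, Ordering::SeqCst, Ordering::Relaxed) {
            Ok(_) => return new,          // we won the race; our value is committed
            Err(actual) => cur = actual,  // someone else changed it; retry from there
        }
    }
}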

A word about the design: the queue implemented here is not a plain queue where every call returns immediately; it deliberately builds in a user-space scheduling handshake (park/unpark). For push this makes no difference, since push always just inserts a value, but pop comes in two flavours, shown below:

    /// Attempt to dequeue from the front.
    ///
    /// Returns `None` if the queue is observed to be empty.
    pub fn try_pop(&self) -> Option<T> {

    /// Dequeue an element from the front of the queue, blocking if the queue is
    /// empty.
    pub fn pop(&self) -> T {

From the doc comments we can see:

  • try_pop(&self) -> Option<T>: returns data if there is any, otherwise returns None.
  • pop(&self) -> T: returns data if there is any, otherwise blocks until something is pushed and then returns it (hence the return type is T).

So how is this implemented?

In short, a pop call that is about to block first enqueues a Node of its own. Unlike the data-carrying Node described earlier, this one contains the popping thread's handle (thread), a flag saying whether the call may return (ready), and a slot for the returned data itself (data). Node therefore provides an is_data method to tell the two kinds apart. Here is the relevant code:

#[derive(Debug)]
struct Node<T> {
    payload: Payload<T>,
    next: Atomic<Node<T>>,
}

#[derive(Debug)]
enum Payload<T> {
    /// A node with actual data that can be popped.
    Data(ManuallyDrop<T>),
    /// A node representing a blocked request for data.
    Blocked(*mut Signal<T>),
}

/// A blocked request for data, which includes a slot to write the data.
#[derive(Debug)]
struct Signal<T> {
    /// Thread to unpark when data is ready.
    thread: Thread,
    /// The actual data, when available.
    data: Option<T>,
    /// Is the data ready? Needed to cope with spurious wakeups.
    ready: AtomicBool,
}

impl<T> Node<T> {
    fn is_data(&self) -> bool {
        if let Payload::Data(_) = self.payload {
            true
        } else {
            false
        }
    }
}

Payload::Data here is used by push, and Payload::Blocked is used when pop has to wait.

Signal is the struct used to pass the information between the two sides.
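
To make the Signal handshake concrete, here is a standalone sketch (names such as MiniSignal are invented for illustration; the real code wires this into the queue's nodes and writes through a raw pointer): the waiter publishes its thread handle, the producer fills the data slot, flips ready, and unparks it. The ready flag is what copes with spurious wakeups from park.

use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;

// A simplified stand-in for crossbeam's Signal<T>.
struct MiniSignal {
    thread: thread::Thread,
    data: Mutex<Option<u32>>,
    ready: AtomicBool,
}

fn handshake() {
    let signal = Arc::new(MiniSignal {
        thread: thread::current(),   // the waiting thread's handle
        data: Mutex::new(None),
        ready: AtomicBool::new(false),
    });

    let producer = {
        let signal = Arc::clone(&signal);
        thread::spawn(move || {
            *signal.data.lock().unwrap() = Some(99);      // fill the slot
            signal.ready.store(true, Ordering::Release);  // mark it ready
            signal.thread.unpark();                       // wake the waiter
        })
    };

    // The waiter loops on `ready` because park() may return spuriously.
    while !signal.ready.load(Ordering::Acquire) {
        thread::park();
    }
    assert_eq!(signal.data.lock().unwrap().take(), Some(99));
    producer.join().unwrap();
}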

OK, after all that groundwork: for push, the key is to distinguish two situations, a queue full of data versus a queue full of waiting pop "calls", and the distinction is made by looking at the tail node.

Back to the code:

            // We push onto the tail, so we'll start optimistically by looking
            // there first.
            let tail_shared = self.tail.load(Acquire, &guard);
            let tail_ref = unsafe { tail_shared.as_ref() }.unwrap();

First, tail (an Atomic) is loaded to obtain a Shared (on the very first call it is still the sentinel). Then the Shared goes through as_ref() followed by unwrap():

    pub fn as_raw(&self) -> *const T {
        let (raw, _) = decompose_data::<T>(self.data);
        raw
    }

    pub unsafe fn as_ref(&self) -> Option<&'g T> {
        self.as_raw().as_ref()
    }

which yields tail_ref, a reference to the T behind the pointer, here a Node.
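
as_raw uses decompose_data to split the stored usize into an aligned pointer and the tag bits packed into its low bits (crossbeam can stash a small tag in the bits left free by alignment). A standalone sketch of that tagged-pointer trick (not crossbeam code) looks like this:

// An 8-byte-aligned allocation leaves its 3 low bits free for a tag.
fn tagged_pointer_demo() {
    let raw = Box::into_raw(Box::new(123u64)) as usize;
    assert_eq!(raw & 0b111, 0);          // low bits are zero thanks to alignment
    let tagged = raw | 0b001;            // stash a one-bit flag in them
    let (ptr, tag) = (tagged & !0b111, tagged & 0b111);
    assert_eq!(tag, 1);
    assert_eq!(unsafe { *(ptr as *const u64) }, 123);
    unsafe { drop(Box::from_raw(ptr as *mut u64)); } // free the allocation
}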

Next comes the branch that distinguishes whether the queue holds data (the empty queue counts as either mode) or already holds waiting pop calls:

            // Is the queue in Data mode (empty queues can be viewed as either mode)?
            if tail_ref.is_data() || self.head.load(Relaxed, &guard) == tail_shared {
                       ..............
            } else {
                       ..............
            }


Data Mode

Let's look at the data-mode branch first:

                // Attempt to push onto the `tail` snapshot; fails if
                // `tail.next` has changed, which will always be the case if the
                // queue has transitioned to blocking mode.
                match self.push_internal(&guard, tail_shared, cache.into_node()) {
                    Ok(_) => return,
                    Err(n) => {
                        // replace the cache, retry whole thing
                        cache = Cache::Node(n)
                    }
                }

Note that this is where into_node finally gets called, and it is the only call site. If it returns Ok(_) we ignore the value and are done. If it returns Err(n) we rebuild the cache and retry the whole thing; the n here is exactly the Owned<Node<T>> constructed earlier, so the allocation is not lost.

Now push_internal:

    /// Attempt to atomically place `n` into the `next` pointer of `onto`.
    ///
    /// If unsuccessful, returns ownership of `n`, possibly updating
    /// the queue's `tail` pointer.
    fn push_internal(
        &self,
        guard: &epoch::Guard,
        onto: Shared<Node<T>>,
        n: Owned<Node<T>>,
    ) -> Result<(), Owned<Node<T>>> {
        // is `onto` the actual tail?
        let next_atomic = &unsafe { onto.as_ref() }.unwrap().next;
        let next_shared = next_atomic.load(Acquire, guard);
        if unsafe { next_shared.as_ref() }.is_some() {
            // if not, try to "help" by moving the tail pointer forward
            let _ = self.tail.compare_and_set(onto, next_shared, Release, guard);
            Err(n)
        } else {
            // looks like the actual tail; attempt to link in `n`
            next_atomic
                .compare_and_set(Shared::null(), n, Release, guard)
                .map(|shared| {
                    // try to move the tail pointer forward
                    let _ = self.tail.compare_and_set(onto, shared, Release, guard);
                })
                .map_err(|e| e.new)
        }
    }

Here onto is our tail node and n is the Owned<Node<T>> we constructed.

First:

let next_atomic = &unsafe { onto.as_ref() }.unwrap().next;

This gives us next_atomic, the next field (an Atomic) of the tail node. Right after that, next_atomic.load gives us a Shared. Note that if this next_atomic was built with Atomic::null(), there is no Node behind it yet, but we can still obtain a next_shared from it:

let next_shared = next_atomic.load(Acquire, guard);

The next line is the crucial one:

        if unsafe { next_shared.as_ref() }.is_some() {
            ...............
        } else {
            ...............
        }

Calling as_ref on next_shared returns an Option wrapping a reference to the Node.

So is_some() distinguishes the two cases: if it is Some, another thread has already appended the next node, so we only help advance the tail pointer and retry. Only if it is None can we go on and link in our own node. Here is the code:

        if unsafe { next_shared.as_ref() }.is_some() {
            // if not, try to "help" by moving the tail pointer forward
            let _ = self.tail.compare_and_set(onto, next_shared, Release, guard);
            Err(n)
        } else {
            // looks like the actual tail; attempt to link in `n`
            next_atomic
                .compare_and_set(Shared::null(), n, Release, guard)
                .map(|shared| {
                    // try to move the tail pointer forward
                    let _ = self.tail.compare_and_set(onto, shared, Release, guard);
                })
                .map_err(|e| e.new)
        }

The key call here is Atomic::compare_and_set, and the result is handled in two ways:

  • map(...): swing the tail from the old node onto to the new tail shared; since shared has already been linked in as onto's next, advancing the tail is bound to happen eventually (by us or by a helping thread), so the result of this second CAS can be ignored.
  • map_err(...): compare_and_set failed; via "|e| e.new" we recover the Owned<Node<T>> constructed earlier and return it as the error.

Now Atomic::compare_and_set itself:

    pub fn compare_and_set<'g, O, P>(
        &self,
        current: Shared<T>,
        new: P,
        ord: O,
        _: &'g Guard,
    ) -> Result<Shared<'g, T>, CompareAndSetError<'g, T, P>>
    where
        O: CompareAndSetOrdering,
        P: Pointer<T>,
    {
        let new = new.into_usize();
        self.data
            .compare_exchange(current.into_usize(), new, ord.success(), ord.failure())
            .map(|_| unsafe { Shared::from_usize(new) })
            .map_err(|current| unsafe {
                CompareAndSetError {
                    current: Shared::from_usize(current),
                    new: P::from_usize(new),
                }
            })
    }

Note that current here is Shared::null(), our expected value, and new is our Owned<Node<T>>; the parameter is taken as P: Pointer<T>. Having both Owned and Shared implement the Pointer trait is what makes this generic:

/// A trait for either `Owned` or `Shared` pointers.
pub trait Pointer<T> {
    /// Returns the machine representation of the pointer.
    fn into_usize(self) -> usize;

    /// Returns a new pointer pointing to the tagged pointer `data`.
    unsafe fn from_usize(data: usize) -> Self;
}

The first thing we see is:

let new = new.into_usize();

Here new becomes a usize, i.e., the address of the Node.

Then:

        self.data
            .compare_exchange(current.into_usize(), new, ord.success(), ord.failure())
            .map(|_| unsafe { Shared::from_usize(new) })
            .map_err(|current| unsafe {
                CompareAndSetError {
                    current: Shared::from_usize(current),
                    new: P::from_usize(new),
                }
            })

self.data here is the standard library's AtomicUsize, and compare_exchange performs the actual atomic compare-and-swap. Both current.into_usize() and new are plain usize values being compared, and the atomicity comes from a hardware instruction underneath.

One more word about the return type of AtomicUsize::compare_exchange:

pub fn compare_exchange(
    &self, 
    current: usize, 
    new: usize, 
    success: Ordering, 
    failure: Ordering
) -> Result<usize, usize>


The meaning: if the returned Result is Ok(value), the operation succeeded and value == current, the expected value we passed in; if it is Err(value), the operation failed and value is the latest value actually observed.
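
A tiny standalone example of those semantics:

use std::sync::atomic::{AtomicUsize, Ordering};

fn cas_semantics() {
    let a = AtomicUsize::new(5);

    // Expectation matches: Ok carries the previous value (== expected), new value is written.
    assert_eq!(a.compare_exchange(5, 10, Ordering::SeqCst, Ordering::SeqCst), Ok(5));

    // Expectation is stale: Err carries the value actually seen (10); nothing is written.
    assert_eq!(a.compare_exchange(5, 99, Ordering::SeqCst, Ordering::SeqCst), Err(10));
    assert_eq!(a.load(Ordering::SeqCst), 10);
}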
