C++ memory order and happens-before

三四


C++ Concurrency in Action - chapter 5

---The C++ memory model and operations on atomic types

A quick gripe first: only after reading the English edition of C++ Concurrency in Action did I realize how poor the Chinese translation is.

1 Preface

This post only summarizes some questions I had about C++11 memory order; most of the content comes from the book C++ Concurrency in Action.

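It mainly lists a few points that I had never understood, so it may not progress step by step. If you spot any mistakes, please point them out; I would be grateful.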

2 happens-before

The book defines it as follows:

it specifies which operations see the effects of which other operations.
if operation A on one thread inter-thread happens-before operation B on another thread, then A happens-before B.

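Within a single thread the execution order is intuitive: one statement is sequenced before the next. The happens-before relationship discussed here is mainly about ordering between threads.

That is, operations in thread A and thread B are ordered with respect to each other. happens-before combines sequenced-before within a thread with inter-thread happens-before, which in turn is built on synchronizes-with.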

3 Inter-thread happens before

inter-thread happens-before is relatively simple and relies on the synchronizes-with relationship.
if operation A in one thread synchronizes-with operation B in another thread, then A inter-thread happens-before B.

4 synchronizes-with

The synchronizes-with relationship is something that you can get only between operations on atomic types. Operations on a data structure (such as locking a mutex) might provide this relationship if the data structure contains atomic types and the operations on that data structure perform the appropriate atomic operations internally, but fundamentally it comes only from operations on atomic types.
if thread A stores a value and thread B reads that value, there’s a synchronizes-with relationship between the store in thread A and the load in thread B.

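From the definition of synchronizes-with we can see that it describes a relationship between atomic operations in different threads.

Definitions are always a bit opaque, so here is a more intuitive example:

Assume that write_x_then_y() and read_y_then_x() run in different threads.

The y.store(true,std::memory_order_release); in write_x_then_y() synchronizes-with the while(!y.load(std::memory_order_acquire)); in read_y_then_x() once that load reads true. As a result, y.store(true,std::memory_order_release); happens-before x.load(std::memory_order_relaxed), and so does the x.store(true,std::memory_order_relaxed); that is sequenced before it.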

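// x and y are std::atomic<bool> (initially false), z is std::atomic<int> (initially 0)
void write_x_then_y()
{
    x.store(true,std::memory_order_relaxed);   // sequenced before the store to y
    y.store(true,std::memory_order_release);   // release store: publishes the write to x
}
void read_y_then_x()
{
    while(!y.load(std::memory_order_acquire)); // acquire load: spins until it reads true
    if(x.load(std::memory_order_relaxed)){     // guaranteed to read true here
        ++z;
    }
}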
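To try the example above, the two functions can be wrapped in a small harness. This is only a sketch: the declarations of x, y and z and the helper run_once() are assumptions added for illustration, not part of the book's listing.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<bool> x{false}, y{false};
std::atomic<int> z{0};

void write_x_then_y()
{
    x.store(true, std::memory_order_relaxed);   // sequenced before the store to y
    y.store(true, std::memory_order_release);   // release store on y
}

void read_y_then_x()
{
    while (!y.load(std::memory_order_acquire)) { }  // spin until the release store is seen
    if (x.load(std::memory_order_relaxed)) {        // ordered after the acquire load
        ++z;
    }
}

// Hypothetical helper: runs the two functions on separate threads and returns z.
int run_once()
{
    x = false;
    y = false;
    z = 0;
    std::thread a(write_x_then_y);
    std::thread b(read_y_then_x);
    a.join();
    b.join();
    assert(z.load() == 1);  // the release/acquire pair on y makes x == true visible
    return z.load();
}
```

Calling run_once() from main should always return 1, because the store to x happens-before the load of x.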

5 memory order

Sequentially consistent ordering: memory_order_seq_cst (sequentially consistent, the strongest of the available options)
Acquire-release ordering: memory_order_release, memory_order_acquire, memory_order_consume, memory_order_acq_rel
Relaxed ordering: memory_order_relaxed

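For multithreaded code to run correctly where race conditions are possible, either a mutex or ordering constraints on atomics are required. The constraints on atomics are what establish synchronizes-with and happens-before, and these guarantee the order of operations between threads.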

(1) The effect of memory order on CPU cost

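Why is memory_order_seq_cst the strongest constraint? On a multi-core machine each core has its own cache, and the same variable may be held in several caches, so when one core modifies the variable the change has to be propagated to the others, which involves communication between cores. Sequential consistency additionally requires all threads to agree on a single order of operations, which on many architectures needs extra synchronization between cores and therefore costs more instructions.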

If these systems have many processors, these additional synchronization instructions may take a significant amount of time, thus reducing the overall performance of the system.

(2) How different memory orders affect happens-before and synchronizes-with

sequentially consistent ordering:

If all operations on instances of atomic types are sequentially consistent, 
the behavior of a multithreaded program is as if all these operations were
 performed in some particular sequence by a single thread.

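If every operation on atomic variables uses sequentially consistent ordering, the operations from all threads in a multithreaded program behave as if they were performed in one particular order by a single thread, and every thread observes that same order.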

memory order relaxed
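Relaxed ordering gives no guarantee about the order in which operations on different variables become visible to other threads; it only keeps each individual operation atomic.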
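A typical case where relaxed ordering is sufficient is a plain counter, where only atomicity matters and nothing else is published through the variable. The sketch below is an illustration only; the names counter and count_relaxed are assumptions and do not come from the book.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

std::atomic<int> counter{0};

// Hypothetical helper: n_threads threads each increment the counter n_iters times.
int count_relaxed(int n_threads, int n_iters)
{
    counter.store(0, std::memory_order_relaxed);
    std::vector<std::thread> workers;
    for (int t = 0; t < n_threads; ++t) {
        workers.emplace_back([n_iters] {
            for (int i = 0; i < n_iters; ++i) {
                // Relaxed is enough: each increment is atomic, and no other
                // data is ordered relative to it.
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& w : workers) {
        w.join();  // join() orders the workers' increments before the load below
    }
    const int total = counter.load(std::memory_order_relaxed);
    assert(total == n_threads * n_iters);  // no increment is lost
    return total;
}
```

No increment is lost even with relaxed ordering; what relaxed does not provide is any ordering between this counter and other variables.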

acquire and release

Acquire-release ordering is a step up from relaxed ordering; there’s still no total order of operations, 
but it does introduce some synchronization.

Acquire and release are weaker constraints than sequential consistency, but they are one level stronger than relaxed ordering. Acquire and release also introduce a synchronization relationship.

Synchronization is pairwise, between the thread that does the release and the thread that does the acquire.
A release operation synchronizes-with an acquire operation that reads the value written.
This means that different threads can still see different orderings, but these orderings are restricted.

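Synchronization comes in pairs. If an atomic variable is only ever loaded or only ever stored, no synchronizes-with relationship exists. The relationship holds between a release and the acquire that reads the value it wrote.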

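Acquire-release does not impose a single total order, but a release store that is read by an acquire load does establish happens-before, and this has a useful side effect on the other atomic operations in the same threads. See the example below: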

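// write_x_then_y and read_y_then_x each run in their own thread.
// The atomic variable x uses relaxed ordering; y uses acquire-release ordering.
// The y.store in write_x_then_y synchronizes-with the y.load in read_y_then_x that reads true,
// so the y.store happens-before everything that follows that load in read_y_then_x.
// Once the y.load has returned true, the x.load in read_y_then_x is guaranteed to read true.
// Relaxed ordering by itself establishes no happens-before between threads, but a release store
// makes every store sequenced before it in the same thread visible to the thread whose acquire
// load reads the released value, and loads after that acquire in the same thread see those stores.
// So when y.load returns true, x is guaranteed to read the latest value. Even though x is accessed
// with relaxed operations, it gets the benefit of the acquire-release pair on y.
// Why this works is explained separately later.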
void write_x_then_y()
{
    x.store(true,std::memory_order_relaxed);
    y.store(true,std::memory_order_release);
}
void read_y_then_x()
{
    while(!y.load(std::memory_order_acquire));
    if(x.load(std::memory_order_relaxed)){
        ++z;
     }
}

Thanks to this property of acquire-release, the following code is correct:

std::atomic<int> data[5];
std::atomic<bool> sync1(false),sync2(false);
void thread_1()
{
    data[0].store(42,std::memory_order_relaxed);
    data[1].store(97,std::memory_order_relaxed);
    data[2].store(17,std::memory_order_relaxed);
    data[3].store(-141,std::memory_order_relaxed);
    data[4].store(2003,std::memory_order_relaxed);
    sync1.store(true,std::memory_order_release);
}
void thread_2()
{
    while(!sync1.load(std::memory_order_acquire));
    sync2.store(true,std::memory_order_release);
}
void thread_3()
{
    // Because of the ordering provided by release and acquire, these asserts never fire.
    // If acquire-release were replaced with relaxed, the asserts could fire
    // in a multithreaded environment.
    while(!sync2.load(std::memory_order_acquire));
    assert(data[0].load(std::memory_order_relaxed)==42);
    assert(data[1].load(std::memory_order_relaxed)==97);
    assert(data[2].load(std::memory_order_relaxed)==17);
    assert(data[3].load(std::memory_order_relaxed)==-141);
    assert(data[4].load(std::memory_order_relaxed)==2003);
}
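The memory_order_acq_rel value listed earlier is meant for read-modify-write operations that need to acquire and release at once. As a sketch under assumptions, the two flags above can be merged into a single integer stage that the middle thread advances with compare_exchange_strong; the names payload, stage, producer, relay, consumer and run_chain are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<int> payload[2];
std::atomic<int> stage{0};  // 0: nothing written, 1: payload written, 2: relayed

void producer()
{
    payload[0].store(42, std::memory_order_relaxed);
    payload[1].store(97, std::memory_order_relaxed);
    stage.store(1, std::memory_order_release);
}

void relay()
{
    int expected = 1;
    // acq_rel: a successful exchange acquires the producer's stores and
    // releases them to whichever thread later acquires the value 2.
    while (!stage.compare_exchange_strong(expected, 2, std::memory_order_acq_rel)) {
        expected = 1;  // a failed exchange overwrote expected with the current value
    }
}

void consumer()
{
    while (stage.load(std::memory_order_acquire) < 2) { }
    assert(payload[0].load(std::memory_order_relaxed) == 42);
    assert(payload[1].load(std::memory_order_relaxed) == 97);
}

// Hypothetical helper: runs the three stages on separate threads.
int run_chain()
{
    stage = 0;
    std::thread t1(producer);
    std::thread t2(relay);
    std::thread t3(consumer);
    t1.join();
    t2.join();
    t3.join();
    return stage.load();
}
```

The happens-before chain is the same as with two flags: the stores in producer happen-before the exchange in relay, which happens-before the loads in consumer.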

memory_order_consume

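Although consume belongs to the acquire-release family, it differs from a true acquire: pairing consume with release does not give the side effect described above.

Consume ordering introduces a data-dependency relationship, which differs from inter-thread happens-before.

Data dependency introduces two new relations: dependency-ordered-before and carries-a-dependency-to.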

dependency-ordered-before
the dependency-ordered-before relationship can apply between threads.
The dependency-ordered-before relationship is introduced by a load tagged memory_order_consume, but the corresponding store must use release, acq_rel, or seq_cst.
carries-a-dependency-to
if the result of an operation A is used as an operand for an operation B, then A carries-a-dependency-to B.
Within a single thread, if the result of operation A is used as an operand of operation B, then A carries-a-dependency-to B, and this relationship is transitive.

#include <atomic>
#include <cassert>
#include <chrono>
#include <string>
#include <thread>

struct X {
    int i;
    std::string s;
};
std::atomic<X*> p;
std::atomic<int> a;
void create_x()
{
    X* x=new X;
    x->i=42;
    x->s="hello";
    a.store(99,std::memory_order_relaxed);
    p.store(x,std::memory_order_release);
}
void use_x()
{
    X* x;
    while(!(x=p.load(std::memory_order_consume)))
        std::this_thread::sleep_for(std::chrono::microseconds(1));
    assert(x->i==42);
    assert(x->s=="hello");
    assert(a.load(std::memory_order_relaxed)==99);
}
int main() {
    std::thread t1(create_x);
    std::thread t2(use_x);
    t1.join();
    t2.join();
}
// In this example the p.load in use_x carries-a-dependency-to x. The consume in use_x only
// guarantees that the data reached through x is visible,
// so assert(x->i==42); and assert(x->s=="hello"); will not fire, but
// assert(a.load(std::memory_order_relaxed)==99); may fire.
// Put simply, consume only provides ordering for the operations it carries-a-dependency-to.
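For comparison, here is a sketch of the same example with the load changed from consume to acquire. With acquire, the store to a is ordered as well, so all three asserts hold. Since C++17 the standard itself recommends preferring memory_order_acquire over memory_order_consume. The helper run_acquire_version() and the explicit initial values are assumptions added so the sketch can be run and cleaned up.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <string>
#include <thread>

struct X {
    int i;
    std::string s;
};

std::atomic<X*> p{nullptr};
std::atomic<int> a{0};

void create_x()
{
    X* x = new X;
    x->i = 42;
    x->s = "hello";
    a.store(99, std::memory_order_relaxed);
    p.store(x, std::memory_order_release);
}

void use_x()
{
    X* x;
    // acquire instead of consume: every write sequenced before the release
    // store becomes visible, not only data reached through the pointer.
    while (!(x = p.load(std::memory_order_acquire)))
        std::this_thread::sleep_for(std::chrono::microseconds(1));
    assert(x->i == 42);
    assert(x->s == "hello");
    assert(a.load(std::memory_order_relaxed) == 99);  // now guaranteed as well
}

// Hypothetical helper: runs both threads, frees the object, returns the stored i.
int run_acquire_version()
{
    std::thread t1(create_x);
    std::thread t2(use_x);
    t1.join();
    t2.join();
    X* x = p.exchange(nullptr);
    const int result = x->i;
    delete x;
    return result;
}
```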

6 How memory order relates to happens-before and synchronizes-with

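happens-before and synchronizes-with describe the ordering that the developer's logic relies on. Memory order is the tool the developer uses to establish them: it controls which writes to shared variables are guaranteed to be visible, and in what order, to threads running on other cores.

seq_cst: all seq_cst operations form a single total order that every thread agrees on, so under both relationships the shared atomic variables are seen with their latest values.
acquire-release: guarantees cross-core visibility for shared atomic variables linked by a synchronizes-with relationship, but does not make different threads agree on the order in which independent stores to different variables become visible.
relaxed: gives no guarantee about the order in which operations on different shared variables become visible across cores.

In the example below:

The x.store in write_x() synchronizes-with the x.load in read_x_then_y(), so the x.store happens-before the y.load in read_x_then_y(). Nothing orders write_y() relative to that thread, however, so when x.load(std::memory_order_acquire) returns true, the y.load may still read false. The same reasoning applies to write_y() and read_y_then_x(). Each synchronizes-with pair gets its visibility guarantee, but the two reader threads are not required to agree on which store came first, so each may see the other flag as false. The assert(z.load()!=0); in main can therefore fire.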

#include <atomic>
#include <thread>
#include <assert.h>
std::atomic<bool> x,y;
std::atomic<int> z;
void write_x()
{
    x.store(true,std::memory_order_release);
}
void write_y()
{
    y.store(true,std::memory_order_release);
}
void read_x_then_y()
{
    while(!x.load(std::memory_order_acquire));
    if(y.load(std::memory_order_acquire))
        ++z;
}
void read_y_then_x()
{
    while(!y.load(std::memory_order_acquire));
    if(x.load(std::memory_order_acquire))
        ++z;
}
int main() {
    x=false;
    y=false;
    z=0;
    std::thread a(write_x);
    std::thread b(write_y);
    std::thread c(read_x_then_y);
    std::thread d(read_y_then_x);
    a.join();
    b.join();
    c.join();
    d.join();
    assert(z.load()!=0);
}
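For contrast, here is a sketch of the same program with every operation changed to memory_order_seq_cst. All threads then agree on a single order of the two stores, so at least one reader must see both flags set and z cannot stay 0. The helper run_seq_cst() is an assumption added so the four threads can be run from one call.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<bool> x{false}, y{false};
std::atomic<int> z{0};

void write_x() { x.store(true, std::memory_order_seq_cst); }
void write_y() { y.store(true, std::memory_order_seq_cst); }

void read_x_then_y()
{
    while (!x.load(std::memory_order_seq_cst)) { }
    if (y.load(std::memory_order_seq_cst))
        ++z;
}

void read_y_then_x()
{
    while (!y.load(std::memory_order_seq_cst)) { }
    if (x.load(std::memory_order_seq_cst))
        ++z;
}

// Hypothetical helper: runs the four threads and returns the final value of z.
int run_seq_cst()
{
    x = false;
    y = false;
    z = 0;
    std::thread a(write_x);
    std::thread b(write_y);
    std::thread c(read_x_then_y);
    std::thread d(read_y_then_x);
    a.join();
    b.join();
    c.join();
    d.join();
    // In the single total order one of the stores comes first, so the reader
    // that waits on the later store must also see the earlier one.
    assert(z.load() != 0);
    return z.load();
}
```

The result is 1 or 2 depending on timing, but never 0.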

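Memory order is applied to operations on atomic variables; it does not act as a lock. An atomic operation can take different memory order arguments, but no choice of order affects the atomicity of the operation itself. Memory order only affects what threads running in parallel on multiple processors are guaranteed to observe when they read and write atomic variables.

synchronizes-with concerns the ordering of operations on one variable: for example, an acquire load of atomic variable a synchronizes-with the release store to a whose value it reads.

happens-before goes further and can order operations on different variables, as in the following example: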

// x and y are both atomic variables
void write_x()
{
    x.store(true,std::memory_order_release);
}

void read_x_then_y()
{
    while(!x.load(std::memory_order_acquire)); // y is loaded only after this load of x has read true
    if(y.load(std::memory_order_acquire))
        ++z;
}
// If write_x and read_x_then_y run in different threads,
// the x.store in write_x synchronizes-with the x.load in read_x_then_y,
// and the x.store in write_x happens-before the y.load in read_x_then_y.

Edited on 2018-09-25 12:35
