High-Concurrency Programming: Consistency Issues in Multiprocessor Programming (Part 2)

(Continued from the previous part)

4 C++ Memory model

4.0 Preliminaries

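C++ memory order is a constraint on atomic operations. The atomic in question is usually a variable shared between threads, so we need to agree on the order in which the current CPU's operations on it take effect. Everything said about memory order here concerns shared variables; the variable may be atomic or non-atomic, but it must be shared. We do not discuss ordering for variables that are private to a thread: either their accesses are not reordered, or any reordering still preserves the meaning the code has within that thread. That is the minimum both the CPU and the compiler must guarantee. CPU and compiler optimizations are designed around single-threaded semantics, so if a variable is only ever used by one thread, those optimizations have no observable effect on it.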

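Also, accesses to the same shared memory location can be reordered too. Under TSO, for example, the store->load relaxation applies even when the store and the load target the same memory location, but bypassing (the load takes its value from the store buffer) ensures that this reordering does not change what that memory location means within a single thread.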

4.1 Why the memory model matters

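C++11 introduced a memory model into the standard, and it is arguably one of the most important features of C++11. Its significance is that we can now control, from a high-level language, how threads on a multiprocessor interact through shared memory. At the language level we no longer have to care about how differences between compilers and CPU architectures affect multithreaded code, so multithreaded programs become portable across platforms.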

The memory model means that C++ code now has a standardized library to call regardless of who made the compiler and on what platform it's running. There's a standard way to control how different threads talk to the processor's memory.[7]

4.2 Memory model and memory order

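C++ atomic operations take a parameter that specifies the memory_order; this memory order can be understood as the memory order discussed in the previous sections. C++11 provides six memory_order options, and different options give different memory consistency guarantees.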

  namespace std {
    typedef enum memory_order {
      memory_order_relaxed, memory_order_consume, memory_order_acquire,
      memory_order_release, memory_order_acq_rel, memory_order_seq_cst
    } memory_order;
  }

The enumeration memory_order specifies the detailed regular (non-atomic) memory synchronization order as defined in 1.10 and may provide for operation ordering. [10]

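The memory order specifies the ordering relation among operations on shared memory. It is also a reflection of a consistency model.

  • memory_order_seq_cst: the sequential consistency model. This is the default and the strongest consistency model provided.
  • memory_order_release/acquire/consume: consistency guarantees with release/acquire or release/consume semantics.
  • memory_order_relaxed: the relaxed consistency model, which gives no guarantee about operation order.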
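As a small sketch of how the option is passed: every atomic operation accepts a memory_order argument, and leaving it out means memory_order_seq_cst. The names `counter`, `ready` and `memory_order_demo` are hypothetical and exist only for this illustration. The function runs on a single thread, so it only shows the syntax and the resulting values, not any cross-thread ordering effect.

```cpp
#include <atomic>
#include <cassert>

std::atomic<int> counter{0};     // hypothetical shared counter
std::atomic<bool> ready{false};  // hypothetical shared flag

// Exercises the same atomics with different memory_order arguments.
// Single-threaded, so only the values (not the ordering) are observable.
int memory_order_demo() {
    counter.store(1);                                   // default: memory_order_seq_cst
    counter.store(2, std::memory_order_seq_cst);        // the same order, spelled out
    counter.fetch_add(1, std::memory_order_relaxed);    // atomic, no ordering guarantee
    ready.store(true, std::memory_order_release);       // release store
    bool seen = ready.load(std::memory_order_acquire);  // acquire load
    assert(seen);
    return counter.load(std::memory_order_relaxed);
}
```

Calling `memory_order_demo()` returns 3: the two stores leave the counter at 2 and the relaxed fetch_add adds one more.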

4.3 memory_order_seq_cst

enforcing sequential consistency is expensive on some platforms, and there are some frequently used idioms for which sequential consistency is not required.[9]

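Sequential consistency is also the default option. It does not allow reordering: all memory_order_seq_cst operations appear in a single total order that every thread agrees on. This costs some performance. Section 3 already covered sequential consistency at length, so it is not repeated here.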
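One way to see what the single total order buys is the well-known four-thread litmus test, sketched here under the assumption that all operations use memory_order_seq_cst. The names `x`, `y`, `z` and `run_seq_cst_once` are hypothetical. Two threads write independent flags and two threads read them in opposite orders; because all threads agree on one order of the two stores, at least one reader must see both flags set.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<bool> x{false}, y{false};
std::atomic<int> z{0};  // counts readers that saw both flags set

void write_x() { x.store(true, std::memory_order_seq_cst); }
void write_y() { y.store(true, std::memory_order_seq_cst); }

void read_x_then_y() {
    while (!x.load(std::memory_order_seq_cst)) {}  // wait for x
    if (y.load(std::memory_order_seq_cst)) ++z;
}

void read_y_then_x() {
    while (!y.load(std::memory_order_seq_cst)) {}  // wait for y
    if (x.load(std::memory_order_seq_cst)) ++z;
}

// Runs the four threads once and returns how many readers saw both flags.
int run_seq_cst_once() {
    x = false;
    y = false;
    z = 0;
    std::thread a(write_x), b(write_y), c(read_x_then_y), d(read_y_then_x);
    a.join();
    b.join();
    c.join();
    d.join();
    return z.load();
}
```

With seq_cst the result is 1 or 2, never 0. If the loads and stores were weakened to acquire/release, the two readers would be allowed to disagree about which store came first, and 0 would become a permitted outcome.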

4.4 memory_order_release/acquire/acq_rel

a store-release operation synchronizes with all load-acquire operations reading the stored value.

All Operations in the releasing thread preceding the store-release happen-before all operations following the load-acquire in the acquiring thread.[12]

Acquire and release semantics:
Acquire semantics: is a property that can only apply to operations that read from shared memory, whether they are read-modify-write operations or plain loads. The operation is then considered a read-acquire. Acquire semantics prevent memory reordering of the read-acquire with any read or write operation that follows it in program order.[13]
Release semantics: is a property that can only apply to operations that write to shared memory, whether they are read-modify-write operations or plain stores. The operation is then considered a write-release. Release semantics prevent memory reordering of the write-release with any read or write operation that precedes it in program order.[13]
Acquire and release can be implemented by adding memory barriers (fences).
  • synchronizes-with and happens-before  Synchronized-with relation exists only between the releasing thread and the acquiring thread.[14] For more on happens-before and synchronizes-with, see my other article, "C++ memory order and happen-before".
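 #include <atomic>
 #include <cassert>
 #include <thread>
 using namespace std;

 atomic<bool> f{false};
 atomic<bool> g{false};
 int n;

 void thread1() {
   n = 42;                                 // op6
   f.store(true, memory_order_release);    // op1
 }

 void thread2() {
   while (!f.load(memory_order_acquire));  // op2
   g.store(true, memory_order_release);    // op3
 }

 void thread3() {
   while (!g.load(memory_order_acquire));  // op4
   assert(42 == n);                        // op5
 }

 int main() {
   thread t1(thread1), t2(thread2), t3(thread3);
   t1.join(); t2.join(); t3.join();
 }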

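In the example above, op1 and op2 are in a synchronizes-with relation, so op6 happens-before op3: once op2 has read true, the result of op6 is visible to everything that follows op2. op3 and op4 are likewise in a synchronizes-with relation, so op2 happens-before op5. Chaining the two, op6 happens-before op5, which means the write in op6 is visible by the time op5 runs. The assert therefore always succeeds.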
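The remark that acquire and release can be built from fences can be sketched with std::atomic_thread_fence. This is a minimal illustration, not how any particular compiler implements release/acquire; the names `payload`, `flag` and `run_fence_demo` are hypothetical. The atomic accesses themselves are relaxed, and the ordering comes entirely from the two fences.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int payload = 0;                // plain, non-atomic shared data
std::atomic<bool> flag{false};  // relaxed flag, ordered by the fences below

void producer() {
    payload = 42;
    // Release fence: the write to payload cannot be reordered past the store below.
    std::atomic_thread_fence(std::memory_order_release);
    flag.store(true, std::memory_order_relaxed);
}

int consumer() {
    while (!flag.load(std::memory_order_relaxed)) {}
    // Acquire fence: the read of payload cannot be reordered before the load above.
    std::atomic_thread_fence(std::memory_order_acquire);
    return payload;
}

// Runs one producer and one consumer and returns what the consumer read.
int run_fence_demo() {
    payload = 0;
    flag.store(false);
    int result = 0;
    std::thread t1(producer);
    std::thread t2([&result] { result = consumer(); });
    t1.join();
    t2.join();
    return result;
}
```

`run_fence_demo()` returns 42: the release fence before the relaxed store synchronizes with the acquire fence after the relaxed load that reads true, so the write to `payload` happens-before the read.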

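  • one-way  Release and acquire are one-way barriers. For release semantics, operations above the release cannot be reordered below it; for acquire semantics, operations below the acquire cannot be reordered above it. Put differently, a release operation can only move down, never up, and an acquire operation can only move up, never down.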
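 #include <atomic>
 #include <thread>
 using namespace std;

 atomic<bool> f1{false};
 atomic<bool> f2{false};

 void thread1() {
   f1.store(true, memory_order_release);
   if (!f2.load(memory_order_acquire)) {
     // critical section
   }
   // f1.store(true, memory_order_release);  // the store may effectively sink to here
 }

 void thread2() {
   f2.store(true, memory_order_release);
   if (!f1.load(memory_order_acquire)) {
     // critical section
   }
   // f2.store(true, memory_order_release);  // the store may effectively sink to here
 }

 int main() {
   thread t1(thread1), t2(thread2);
   t1.join();
   t2.join();
 }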

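Can the two threads above enter the critical section at the same time? Yes. f1 and f2 are two different memory locations, so operations on them may be reordered, subject to the constraints that release and acquire impose. Here the two stores (to f1 and to f2) may each move down past the load that follows, which is consistent with release and acquire semantics, and then both threads can read false and enter the critical section together. To prevent this, the example has to use memory_order_seq_cst. Release and acquire are not SC, so crossings in the memory order are allowed.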
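The seq_cst variant of the same two-flag pattern can be sketched as follows. The counter `entered` and the function names are hypothetical additions used only to observe how many threads got in. With memory_order_seq_cst the store can no longer sink below the following load, so at least one thread must see the other's flag set.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<bool> f1{false}, f2{false};
std::atomic<int> entered{0};  // how many threads reached the critical section

void enter_first() {
    f1.store(true, std::memory_order_seq_cst);
    if (!f2.load(std::memory_order_seq_cst)) {
        entered.fetch_add(1, std::memory_order_relaxed);  // critical section
    }
}

void enter_second() {
    f2.store(true, std::memory_order_seq_cst);
    if (!f1.load(std::memory_order_seq_cst)) {
        entered.fetch_add(1, std::memory_order_relaxed);  // critical section
    }
}

// Runs both threads once and returns how many entered the critical section.
int run_two_flags_once() {
    f1 = false;
    f2 = false;
    entered = 0;
    std::thread a(enter_first), b(enter_second);
    a.join();
    b.join();
    return entered.load();
}
```

`run_two_flags_once()` returns 0 or 1, never 2. Note that this sketch only rules out both threads entering; it is not a complete lock, since both threads may also be turned away.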

4.5 memory_order_relaxed

std::memory_order specifies how regular, non-atomic memory accesses are to be ordered around an atomic operation. [5]

The memory_order_relaxed arguments above mean “ensure these operations are atomic, but don’t impose any ordering constraints/memory barriers that aren’t already there.”[13]

Relaxed order allows operations on different memory locations to be reordered within a single thread, but operations on the same memory location cannot be reordered.
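A case where atomicity alone is enough is a shared event counter, sketched below. The names `hits` and `count_with_relaxed` are hypothetical. Every increment is a relaxed read-modify-write on the same memory location, so no increment is lost even though nothing is ordered relative to other variables.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

std::atomic<long> hits{0};  // hypothetical shared counter

// Starts num_threads workers that each increment the counter per_thread times,
// then returns the final value after all workers have been joined.
long count_with_relaxed(int num_threads, int per_thread) {
    hits.store(0, std::memory_order_relaxed);
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                hits.fetch_add(1, std::memory_order_relaxed);  // atomic, unordered
            }
        });
    }
    for (auto& w : workers) w.join();  // join makes all increments visible here
    return hits.load(std::memory_order_relaxed);
}
```

`count_with_relaxed(4, 10000)` returns 40000. The final read is safe because join() provides the synchronization; the relaxed operations themselves provide none.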

  • Example 1
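 #include <atomic>
 #include <cassert>
 #include <thread>
 using namespace std;
 atomic<bool> f{false};
 atomic<bool> g{false};

 void thread1() {
   f.store(true, memory_order_relaxed);
   g.store(true, memory_order_relaxed);
 }
 void thread2() {
   while (!g.load(memory_order_relaxed));
   assert(f.load(memory_order_relaxed));  // may fail
 }
 int main() {
   thread t1(thread1), t2(thread2);
   t1.join(); t2.join();
 }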

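Relaxed order allows the operations on f and g to be reordered freely. If the stores to f and g in thread1 are reordered (which does not change their meaning within that single thread), the assert in thread2 may fail: it fails whenever f.load() runs before the effect of f.store() is visible to thread2.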
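One possible way to make the assert in Example 1 hold is to put release on the second store and acquire on the load that waits for it. This is a sketch with hypothetical names (`first`, `second`, `run_fixed_example`), not the only fix; making every operation seq_cst would also work.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<bool> first{false};   // plays the role of f
std::atomic<bool> second{false};  // plays the role of g

void writer() {
    first.store(true, std::memory_order_relaxed);
    second.store(true, std::memory_order_release);  // keeps the store to first above it
}

bool reader() {
    while (!second.load(std::memory_order_acquire)) {}  // pairs with the release store
    return first.load(std::memory_order_relaxed);       // must now observe true
}

// Runs writer and reader once and returns what the reader saw in first.
bool run_fixed_example() {
    first = false;
    second = false;
    bool seen = false;
    std::thread t1(writer);
    std::thread t2([&seen] { seen = reader(); });
    t1.join();
    t2.join();
    return seen;
}
```

`run_fixed_example()` returns true on every run, because the store to `first` happens-before the load of `first` once the acquire load has read the released value.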

  • Example 2
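 #include <atomic>
 #include <thread>

 std::atomic<bool> stop{false};  // shared stop flag

 // thread1
 void process() {
   while (!stop.load(std::memory_order_relaxed)) {
     // do work
   }
 }

 // main
 int main() {
   std::thread t(process);
   stop.store(true, std::memory_order_release);
   t.join();
   return 0;
 }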

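Example 2 is a pattern we use all the time in multithreaded code, yet the while loop in thread1 uses memory_order_relaxed. Is relaxed correct here? Yes. The loop only needs to observe the flag itself at some point and does not rely on any other data being published along with it, so atomicity is enough and no ordering is required.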

4.6 Is memory order related to the cache?

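The C++ memory model is a set of language-level rules defined over an abstraction of the underlying machine. Memory order is an option applied to atomic variables; it fixes the rules for operations on that atomic variable, or in other words the reordering rules. So the way C++ defines memory order has no direct relationship with the cache's store buffer or invalidate queue.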

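The memory orders in the C++ standard library are independent of any particular machine. Implementations enforce the ordering with memory barriers (fences), and memory barriers do not necessarily behave the same way on different CPU types.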

4.7 C++ memory model and machine memory model

We have discussed the hardware memory model and the C++ memory model. How do the two relate? Do they conflict, or are they compatible?

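The software memory model has to run on top of the hardware memory model. On a strongly consistent system, if the hardware is SC, then whatever order the upper layer chooses does not affect the SC behavior underneath. The upper-layer order has a second job, though: preventing compiler optimizations. The compiler has no notion of memory order as such; at the compiler level it acts as a fence.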

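If everything at the upper layer is specified as relaxed and the hardware is SC, the compiler may still optimize the code into a reordered form. The CPU then executes that reordered code, so from the programmer's point of view the result is reordered.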

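On x86, as discussed in section 3, TSO only allows store->load reordering. A store-store sequence that uses relaxed is therefore not reordered by the CPU, but it may already have been reordered by the compiler.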

5 Synchronization

This part is covered in a separate blog post.

  • lock insight
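  • Why a lock prevents threads from entering the critical section concurrently
  • Why a lock guarantees that, despite CPU/compiler reordering and caches, a thread entering the critical section later sees the latest values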


6 Recommended resources

c++ memory model

A talk by Herb Sutter on the C++ memory model: part1, part2.


7 References

  1. M. Mizuno, M. Raynal, J.Z. Zhou. Sequential Consistency in Distributed Systems.
  2. Scott Meyers and Andrei Alexandrescu. C++ and the Perils of Double-Checked Locking.
  3. Daniel J. Sorin, Mark D. Hill, and David A. Wood. A Primer on Memory Consistency and Cache Coherence.
  4. Paul E. McKenney. Memory Barriers: a Hardware View for Software Hackers.
  5. C++ memory order
  6. out of order execution
  7. The New C++: Lay down your guns, knives, and clubs
  8. c++ memory model
  9. H. Boehm, S. V. Adve. Foundations of the C++ Concurrency Memory Model.
  10. C++ Standard - 2012-01-16 - Working Draft (N3337).pdf
  11. CPU Cache and Memory Ordering
  12. think cell talk memory model
  13. Acquire and Release Semantics
  14. C++ Memory model

Edited on 2018-11-13
