Why does a std::atomic store with sequential consistency use XCHG?

mov-store + mfence and xchg are both valid ways to implement a sequential-consistency store on x86. The implicit lock prefix on an xchg with memory makes it a full memory barrier, like all atomic RMW operations on x86. (x86’s memory-ordering rules essentially make that full-barrier effect the only option for any atomic RMW: it’s both … Read more
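
As a rough illustration of where that choice shows up, here is a minimal sketch with two stores to the same variable. The names `shared_value`, `store_seq_cst`, `store_release` and `load_value` are made up for this example, and the instruction sequences in the comments are what mainstream x86-64 compilers typically emit, not something the C++ standard requires.

```cpp
#include <atomic>

// Hypothetical shared variable, for illustration only.
std::atomic<int> shared_value{0};

// Sequentially consistent store (also the default when no order is given).
// On x86-64 this typically compiles to either `xchg` or `mov` + `mfence`.
void store_seq_cst(int v) {
    shared_value.store(v, std::memory_order_seq_cst);
}

// Release store: on x86-64 a plain `mov` is typically sufficient,
// because no full barrier is needed.
void store_release(int v) {
    shared_value.store(v, std::memory_order_release);
}

int load_value() {
    return shared_value.load(std::memory_order_seq_cst);
}
```

Compiling both functions with optimization enabled and comparing the generated assembly is an easy way to see which variant a given compiler version picks.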

Do I have to use atomic for “exit” bool variable?

Yes. Either use atomic<bool>, or use manual synchronization through (for instance) a std::mutex. Your program currently contains a data race, with one thread potentially reading a variable while another thread is writing it. This is Undefined Behavior. Per Paragraph 1.10/21 of the C++11 Standard: The … Read more
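
Switching the flag to atomic<bool> is usually a one-line change. The mutex alternative mentioned in the excerpt could look roughly like the sketch below; `ExitFlag` and `spin_until_exit` are hypothetical names invented for this example, not code from the original question.

```cpp
#include <mutex>
#include <thread>

// A bool guarded by a mutex: every read and write takes the lock,
// so there is no data race on exit_.
class ExitFlag {
public:
    void request_exit() {
        std::lock_guard<std::mutex> lock(m_);
        exit_ = true;
    }
    bool exit_requested() const {
        std::lock_guard<std::mutex> lock(m_);
        return exit_;
    }
private:
    mutable std::mutex m_;
    bool exit_ = false;
};

// Starts a worker that loops until the flag is set, then sets it and joins.
// Returns how many times the worker polled before seeing the request.
long spin_until_exit(ExitFlag& flag) {
    long iterations = 0;
    std::thread worker([&] {
        while (!flag.exit_requested()) {
            ++iterations;
            std::this_thread::yield();
        }
    });
    flag.request_exit();
    worker.join();  // join() makes the worker's writes visible here
    return iterations;
}
```

For a single flag, atomic<bool> is the simpler option; the mutex version mainly makes sense when the flag has to change together with other shared state.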

Why set the stop flag using `memory_order_seq_cst`, if you check it with `memory_order_relaxed`?

mo_relaxed is fine for both load and store of a stop flag. There’s also no meaningful latency benefit to stronger memory orders, even if latency of seeing a change to a keep_running or exit_now flag was important. IDK why Herb thinks stop.store shouldn’t be relaxed; in his talk, his slides have a comment that says … Read more
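
A minimal sketch of a stop flag that uses relaxed ordering on both sides is shown here. The flag name `stop` follows the excerpt; `run_until_stopped` and `work_done` are assumptions made up for the example.

```cpp
#include <atomic>
#include <thread>

std::atomic<bool> stop{false};

// The worker polls the flag with a relaxed load; the controlling thread
// sets it with a relaxed store. Atomicity alone is enough here because no
// other data is published through the flag.
unsigned long run_until_stopped() {
    unsigned long work_done = 0;
    std::thread worker([&work_done] {
        while (!stop.load(std::memory_order_relaxed)) {
            ++work_done;  // stand-in for real work
        }
    });
    stop.store(true, std::memory_order_relaxed);
    worker.join();  // join() synchronizes, so reading work_done is safe
    return work_done;
}
```

If the worker needed to read data written just before the flag was set, release/acquire ordering on the flag would be the usual choice instead.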

Acquire/Release versus Sequentially Consistent memory order

The C++11 memory ordering parameters for atomic operations specify constraints on the ordering. If you do a store with std::memory_order_release, and a load from another thread reads the value with std::memory_order_acquire, then subsequent read operations from the second thread will see any values stored to any memory location by the first thread that were prior … Read more
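
The release/acquire pairing described above can be sketched as a simple message-passing example. The names `pass_message`, `payload` and `ready` are hypothetical and chosen only for illustration.

```cpp
#include <atomic>
#include <thread>

// The producer writes a plain (non-atomic) payload, then publishes it with a
// release store. The consumer spins on an acquire load; once it reads `true`,
// the earlier write to `payload` is guaranteed to be visible.
int pass_message(int value) {
    int payload = 0;
    std::atomic<bool> ready{false};
    int seen = -1;

    std::thread producer([&] {
        payload = value;                               // ordinary write
        ready.store(true, std::memory_order_release);  // publish
    });
    std::thread consumer([&] {
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = payload;  // safe: happens-after the producer's write
    });

    producer.join();
    consumer.join();
    return seen;
}
```

With relaxed ordering on `ready`, the read of `payload` would be a data race, since nothing would order it after the producer's write.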

Acquire/release semantics with 4 threads

You are thinking in terms of sequential consistency, the strongest (and default) memory order. If this memory order is used, all accesses to atomic variables constitute a total order, and the assertion indeed cannot be triggered. However, in this program, a weaker memory order is used (release stores and acquire loads). This means, by definition … Read more
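
The excerpt does not show the program in question, so the following is a sketch under the assumption that it resembles the well-known four-thread example: two independent writers and two readers that observe the writes in opposite orders. `run_four_threads` and its parameters are made up for this illustration.

```cpp
#include <atomic>
#include <thread>

// Two writers set x and y; two readers each wait for one variable and then
// check the other. The return value is how many readers saw both stores.
// Callers must pass a store-compatible order (relaxed/release/seq_cst) and a
// load-compatible order (relaxed/acquire/seq_cst).
int run_four_threads(std::memory_order store_order,
                     std::memory_order load_order) {
    std::atomic<bool> x{false}, y{false};
    std::atomic<int> z{0};

    std::thread write_x([&] { x.store(true, store_order); });
    std::thread write_y([&] { y.store(true, store_order); });
    std::thread read_x_then_y([&] {
        while (!x.load(load_order)) std::this_thread::yield();
        if (y.load(load_order)) ++z;
    });
    std::thread read_y_then_x([&] {
        while (!y.load(load_order)) std::this_thread::yield();
        if (x.load(load_order)) ++z;
    });

    write_x.join();
    write_y.join();
    read_x_then_y.join();
    read_y_then_x.join();
    return z.load();
}
```

With `std::memory_order_seq_cst` for both parameters the result cannot be 0, because all four threads agree on a single total order of the stores. With release stores and acquire loads the language permits a result of 0, although many machines (x86 in particular) will never actually produce it.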

C++11: the difference between memory_order_relaxed and memory_order_consume

Question 1: No. memory_order_relaxed imposes no memory order at all: “Relaxed operation: there are no synchronization or ordering constraints, only atomicity is required of this operation.” While memory_order_consume imposes memory ordering on data-dependent reads (on the current thread): “A load operation with this memory order performs a consume operation on the affected memory location: … Read more
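
To make the distinction concrete, here is a small sketch of pointer publication where the reader's access depends on the loaded pointer value. `Config` and `publish_and_read` are hypothetical names. Note that current compilers generally implement memory_order_consume by treating it as acquire, and newer revisions of the standard discourage its use.

```cpp
#include <atomic>
#include <thread>

struct Config {
    int value;
};

// The writer fills in the object and publishes its address with a release
// store. The reader loads the pointer with consume ordering; the read of
// p->value carries a data dependency on the loaded pointer, so it is ordered
// after the writer's initialization. With memory_order_relaxed on the load,
// the same read would be a data race.
int publish_and_read(int v) {
    Config cfg{0};
    std::atomic<Config*> published{nullptr};
    int seen = -1;

    std::thread writer([&] {
        cfg.value = v;
        published.store(&cfg, std::memory_order_release);
    });
    std::thread reader([&] {
        Config* p = nullptr;
        while ((p = published.load(std::memory_order_consume)) == nullptr) {
            std::this_thread::yield();
        }
        seen = p->value;  // dependent on p, so ordered by consume
    });

    writer.join();
    reader.join();
    return seen;
}
```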

Memory order consume usage in C11

consume is cheaper than acquire. All CPUs (except DEC Alpha AXP’s famously weak memory model) do it for free, unlike acquire. (Except on x86 and SPARC-TSO, where the hardware has acq/rel memory ordering without extra barriers or special instructions.) On ARM/AArch64/PowerPC/MIPS/etc. weakly-ordered ISAs, consume and relaxed are the only orderings that don’t require any extra … Read more

Implementing 64 bit atomic counter with 32 bit atomics

This is a known pattern, called a SeqLock (https://en.wikipedia.org/wiki/Seqlock). (With the simplification that there’s only one writer so no extra support for excluding simultaneous writers is needed.) You don’t need or want the increment of the counter variable itself to use atomic RMW operations. (Unless you’re on a system that can do that cheaply with … Read more
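
A single-writer SeqLock over two 32-bit halves could be sketched as follows. This is one common formulation using fences, not necessarily the code from the full answer; the class name `Counter64` and its members are invented for the example.

```cpp
#include <atomic>
#include <cstdint>

// 64-bit counter built from 32-bit atomics. Exactly one thread may call
// add(); any number of threads may call read().
class Counter64 {
public:
    // Writer: make the sequence number odd, update both halves with plain
    // relaxed stores (no atomic RMW), then make the sequence even again.
    void add(std::uint64_t delta) {
        std::uint32_t s = seq_.load(std::memory_order_relaxed);
        seq_.store(s + 1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_release);

        std::uint64_t v =
            (std::uint64_t(hi_.load(std::memory_order_relaxed)) << 32) |
            lo_.load(std::memory_order_relaxed);
        v += delta;
        lo_.store(static_cast<std::uint32_t>(v), std::memory_order_relaxed);
        hi_.store(static_cast<std::uint32_t>(v >> 32),
                  std::memory_order_relaxed);

        seq_.store(s + 2, std::memory_order_release);
    }

    // Reader: retry until the sequence number is even and unchanged across
    // the reads of both halves, which means no write overlapped them.
    std::uint64_t read() const {
        for (;;) {
            std::uint32_t s1 = seq_.load(std::memory_order_acquire);
            std::uint32_t lo = lo_.load(std::memory_order_relaxed);
            std::uint32_t hi = hi_.load(std::memory_order_relaxed);
            std::atomic_thread_fence(std::memory_order_acquire);
            std::uint32_t s2 = seq_.load(std::memory_order_relaxed);
            if (s1 == s2 && (s1 & 1u) == 0) {
                return (std::uint64_t(hi) << 32) | lo;
            }
        }
    }

private:
    std::atomic<std::uint32_t> seq_{0};
    std::atomic<std::uint32_t> lo_{0};
    std::atomic<std::uint32_t> hi_{0};
};
```

Readers never block the writer; they only retry when a write overlaps their read, which suits a counter that is updated by one thread and sampled by others.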