Does a hardware memory barrier make atomic operations visible faster, in addition to providing the necessary guarantees?

Basically no: it has no significant effect on inter-core latency. It is definitely never worth using “blindly”; consider it only after careful profiling, and only if you suspect contention from later loads that miss in cache. It’s a common misconception that asm barriers are needed to make the store buffer commit to cache. In fact, barriers just make this core wait …
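A minimal C++ sketch of the point above, assuming a simple message-passing pattern (the names `message`, `flag` and `send_and_receive` are hypothetical): the release store becomes visible to the spinning reader with no extra barrier. The release ordering only constrains the order of the two writes; it does not speed up or slow down when they reach other cores.

```cpp
#include <atomic>
#include <cassert>  // for the assert() checks in the usage note
#include <thread>

// Illustrative message-passing pair; names are hypothetical.
static int message = 0;                // plain, non-atomic payload
static std::atomic<bool> flag{false};  // "payload is ready" signal

// Publishes `value` from this thread and returns what a reader thread saw.
int send_and_receive(int value) {
    message = 0;
    flag.store(false, std::memory_order_relaxed);
    int received = -1;
    std::thread reader([&received] {
        // Spin until the flag is seen; this acquire load pairs with the release store.
        while (!flag.load(std::memory_order_acquire)) {
        }
        received = message;  // guaranteed to see the value written before the release
    });
    message = value;
    // Release store: orders the write to `message` before the flag write.
    // No extra fence is needed for the reader to see it; the store buffer
    // commits to cache on its own. Adding
    // std::atomic_thread_fence(std::memory_order_seq_cst) after this line would
    // only make this thread wait for the store buffer to drain.
    flag.store(true, std::memory_order_release);
    reader.join();
    return received;
}
```

For example, `assert(send_and_receive(7) == 7)` holds on every run.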

Is there any compiler barrier equivalent to asm("" ::: "memory") in C++11?

Re: your edit, “But I do not want to use atomic variables.” Why not? If it’s for performance reasons, use them with memory_order_relaxed and atomic_signal_fence(mo_whatever) to block compiler reordering with no runtime overhead beyond the compiler barrier itself, which may block some compile-time optimizations depending on the surrounding code. If it’s for some other reason, then …
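A minimal sketch of the suggestion above: relaxed atomics plus `std::atomic_signal_fence` as a compile-time-only barrier. The names `payload`, `ready`, `publish` and `try_consume` are hypothetical. The fence emits no CPU instruction; the C++ standard only guarantees its ordering with respect to a signal handler running on the same thread, so it is not a portable replacement for `std::atomic_thread_fence` between threads.

```cpp
#include <atomic>
#include <cassert>  // for the assert() checks in the usage note

// Hypothetical shared state; names are illustrative only.
std::atomic<int> payload{0};
std::atomic<bool> ready{false};

void publish(int value) {
    payload.store(value, std::memory_order_relaxed);
    // Compiler-only barrier: the compiler may not sink the payload store
    // below this point, but no fence instruction is emitted.
    std::atomic_signal_fence(std::memory_order_release);
    ready.store(true, std::memory_order_relaxed);
}

// Returns false if nothing has been published yet.
bool try_consume(int& out) {
    if (!ready.load(std::memory_order_relaxed)) return false;
    // Compiler-only barrier: the payload load may not be hoisted above the flag load.
    std::atomic_signal_fence(std::memory_order_acquire);
    out = payload.load(std::memory_order_relaxed);
    return true;
}
```

On the same thread (or from a signal handler interrupting it), `publish(42)` followed by `try_consume(v)` yields `v == 42`. Relaxed loads and stores of `int` and `bool` typically compile to plain `mov` instructions on x86, which is why this has no runtime cost beyond the restricted compiler optimizations.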

Is LFENCE serializing on AMD processors?

AMD’s manual has always described their implementation of LFENCE as a load-serializing instruction: “Acts as a barrier to force strong memory ordering (serialization) between load instructions preceding the LFENCE and load instructions that follow the LFENCE.” The original use case for LFENCE was ordering loads from WC (write-combining) memory. However, after the speculative …

Does it make any sense to use the LFENCE instruction on x86/x86_64 processors?

Bottom line (TL;DR): LFENCE alone does indeed seem useless for memory ordering; however, that does not make SFENCE a substitute for MFENCE. The “arithmetic” logic in the question does not apply. Here is an excerpt from Intel’s Software Developer’s Manual, volume 3, section 8.2.2 (edition 325384-052US, September 2014), the same one I used in …
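Why SFENCE + LFENCE does not add up to MFENCE is easiest to see with the classic store-buffer (Dekker-style) litmus test, sketched below in C++ with hypothetical names. Each thread stores to one variable and then loads the other; only a full StoreLoad barrier between the two rules out both loads reading 0. The sketch assumes that GCC and Clang typically compile `std::atomic_thread_fence(std::memory_order_seq_cst)` to MFENCE on x86-64 (some compilers use a dummy locked instruction instead). Replacing that fence with `_mm_sfence()` followed by `_mm_lfence()` would not forbid the both-zero outcome, because neither instruction stops a later load from executing before an earlier store leaves the store buffer.

```cpp
#include <atomic>
#include <cassert>  // for the assert() check in the usage note
#include <thread>

// Store-buffer (Dekker-style) litmus test; the names are illustrative.
static std::atomic<int> x{0}, y{0};

// Runs the test `trials` times and returns how often both loads read 0.
int count_both_zero(int trials) {
    int both_zero = 0;
    for (int i = 0; i < trials; ++i) {
        x.store(0, std::memory_order_relaxed);
        y.store(0, std::memory_order_relaxed);
        int r1 = -1, r2 = -1;
        std::thread t1([&r1] {
            x.store(1, std::memory_order_relaxed);
            // Full (StoreLoad) fence between the store and the later load.
            std::atomic_thread_fence(std::memory_order_seq_cst);
            r1 = y.load(std::memory_order_relaxed);
        });
        std::thread t2([&r2] {
            y.store(1, std::memory_order_relaxed);
            std::atomic_thread_fence(std::memory_order_seq_cst);
            r2 = x.load(std::memory_order_relaxed);
        });
        t1.join();
        t2.join();
        if (r1 == 0 && r2 == 0) ++both_zero;  // forbidden by the seq_cst fences
    }
    return both_zero;
}
```

With the fences in place, `assert(count_both_zero(200) == 0)` must hold. A zero count from a run without the fences proves nothing, since the reordering is timing-dependent and thread start-up makes overlap rare here.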

How do mutex lock and unlock functions prevent CPU reordering?

The short answer is that the bodies of pthread_mutex_lock and pthread_mutex_unlock include the necessary platform-specific memory barriers, which prevent the CPU from moving memory accesses within the critical section outside of it. The instruction flow moves from the calling code into the lock and unlock functions via a call instruction, …
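A minimal sketch of that contract, with hypothetical names (`counter_lock`, `counter`, `worker`, `run_workers`): a plain, non-atomic counter is incremented only between `pthread_mutex_lock` and `pthread_mutex_unlock`. The barriers inside those calls, together with the fact that the compiler cannot see through them, keep the increment inside the critical section, so no updates are lost.

```cpp
#include <cassert>
#include <pthread.h>

// Illustrative shared state: a plain (non-atomic) counter guarded by a mutex.
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static long counter = 0;

static void* worker(void* arg) {
    const long iters = *static_cast<const long*>(arg);
    for (long i = 0; i < iters; ++i) {
        pthread_mutex_lock(&counter_lock);    // acquire: accesses below cannot move above
        ++counter;                            // critical section
        pthread_mutex_unlock(&counter_lock);  // release: accesses above cannot move below
    }
    return nullptr;
}

// Starts `nthreads` workers (at most 8), waits for them, and returns the final count.
long run_workers(int nthreads, long iters) {
    assert(nthreads > 0 && nthreads <= 8);
    pthread_t threads[8];
    counter = 0;
    for (int t = 0; t < nthreads; ++t) {
        int rc = pthread_create(&threads[t], nullptr, worker, &iters);
        assert(rc == 0);
        (void)rc;
    }
    // pthread_join also synchronizes, so reading `counter` afterwards is race-free.
    for (int t = 0; t < nthreads; ++t) pthread_join(threads[t], nullptr);
    return counter;
}
```

For example, `run_workers(4, 100000)` returns exactly 400000. Without the lock and unlock calls, the increments from different threads could interleave and lose updates.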