The short answer is that the bodies of the pthread_mutex_lock and pthread_mutex_unlock functions include the necessary platform-specific memory barriers, which prevent the CPU from moving memory accesses inside the critical section outside of it. The instruction flow moves from the calling code into the lock and unlock functions via a call instruction, and it is this dynamic instruction trace you have to consider for the purposes of reordering, not the static sequence you see in an assembly listing.
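As a minimal sketch of the situation described above, the plain, non-atomic increment below sits between calls to pthread_mutex_lock and pthread_mutex_unlock. The names counter_mutex, shared_counter, worker and run_pthread_demo are made up for illustration. The increment stays inside the critical section only because of the barriers executed inside the two library calls:

```c
#include <assert.h>
#include <pthread.h>

#define MAX_THREADS 16

/* Hypothetical names, chosen only for this illustration. */
static pthread_mutex_t counter_mutex = PTHREAD_MUTEX_INITIALIZER;
static long shared_counter; /* plain variable, not atomic */

static void *worker(void *arg)
{
    long iterations = *(long *)arg;
    for (long i = 0; i < iterations; i++) {
        pthread_mutex_lock(&counter_mutex);   /* call into libc: barrier executed there */
        shared_counter++;                     /* ordinary load, add, store */
        pthread_mutex_unlock(&counter_mutex); /* call into libc: barrier executed there */
    }
    return NULL;
}

/* Starts nthreads workers, waits for them, and returns the final count. */
long run_pthread_demo(int nthreads, long iterations)
{
    pthread_t threads[MAX_THREADS];
    assert(nthreads > 0 && nthreads <= MAX_THREADS);

    shared_counter = 0;
    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&threads[i], NULL, worker, &iterations);
        assert(rc == 0);
        (void)rc; /* silence unused-variable warnings when NDEBUG is set */
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(threads[i], NULL);
    return shared_counter;
}
```

Calling run_pthread_demo(4, 100000) should return 400000 every time. If the lock and unlock calls did not order the increment, updates could be lost and the total would come up short.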
On x86 specifically, you probably won’t find explicit, standalone memory barriers inside those functions, since they already use lock-prefixed instructions to perform the actual locking and unlocking atomically, and these instructions imply a full memory barrier, which prevents the CPU reordering you are concerned about.
For example, on my Ubuntu 16.04 system with glibc 2.23, pthread_mutex_lock is implemented using a lock cmpxchg (compare-and-exchange) and pthread_mutex_unlock is implemented using lock dec (decrement), both of which have full barrier semantics.
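To make the shape of that fast path concrete, here is a toy lock written with C11 atomics. This is not glibc's code: toy_mutex, toy_lock, toy_unlock and run_toy_demo are invented names, and where a real mutex would go to sleep under contention, this sketch simply spins and yields. On x86, compilers typically turn the compare-and-swap into lock cmpxchg and the subtraction into a lock-prefixed subtract or decrement, with no separate fence instruction:

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

#define TOY_MAX_THREADS 16

/* Toy lock for illustration only: 0 = free, 1 = held. */
typedef struct { atomic_int state; } toy_mutex;

static void toy_lock(toy_mutex *m)
{
    int expected = 0;
    /* Sequentially consistent compare-and-swap of 0 -> 1. */
    while (!atomic_compare_exchange_strong(&m->state, &expected, 1)) {
        expected = 0;  /* a failed CAS stores the observed value in expected */
        sched_yield(); /* a real mutex would block here instead of spinning */
    }
}

static void toy_unlock(toy_mutex *m)
{
    /* Sequentially consistent decrement of 1 -> 0. */
    atomic_fetch_sub(&m->state, 1);
}

static toy_mutex demo_lock; /* static storage, so state starts at 0 */
static long toy_counter;    /* plain variable protected by demo_lock */

static void *toy_worker(void *arg)
{
    long iterations = *(long *)arg;
    for (long i = 0; i < iterations; i++) {
        toy_lock(&demo_lock);
        toy_counter++;
        toy_unlock(&demo_lock);
    }
    return NULL;
}

/* Starts nthreads workers, waits for them, and returns the final count. */
long run_toy_demo(int nthreads, long iterations)
{
    pthread_t threads[TOY_MAX_THREADS];
    assert(nthreads > 0 && nthreads <= TOY_MAX_THREADS);

    toy_counter = 0;
    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&threads[i], NULL, toy_worker, &iterations);
        assert(rc == 0);
        (void)rc; /* silence unused-variable warnings when NDEBUG is set */
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(threads[i], NULL);
    return toy_counter;
}
```

The atomic read-modify-write operations are the only synchronization here, yet the plain toy_counter increments are never lost. Compiling with gcc -O2 -S and looking for the lock prefix in toy_lock and toy_unlock is a quick way to check what your own compiler emits.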