Why does a std::atomic store with sequential consistency use XCHG?

mov-store + mfence and xchg are both valid ways to implement a sequential-consistency store on x86. The implicit lock prefix on an xchg with memory makes it a full memory barrier, like all atomic RMW operations on x86. (x86’s memory-ordering rules essentially make that full-barrier effect the only option for any atomic RMW: it’s both … Read more

CPUID implementations in C++

Accessing raw CPUID information is actually very easy, here is a C++ class for that which works in Windows, Linux and OSX: #ifndef CPUID_H #define CPUID_H #ifdef _WIN32 #include <limits.h> #include <intrin.h> typedef unsigned __int32 uint32_t; #else #include <stdint.h> #endif class CPUID { uint32_t regs[4]; public: explicit CPUID(unsigned i) { #ifdef _WIN32 __cpuid((int *)regs, (int)i); … Read more

LEA or ADD instruction?

One significant difference between LEA and ADD on x86 CPUs is the execution unit which actually performs the instruction. Modern x86 CPUs are superscalar and have multiple execution units that operate in parallel, with the pipeline feeding them somewhat like round-robin (bar stalls). Thing is, LEA is processed by (one of) the unit(s) dealing with … Read more

Difference between JE/JNE and JZ/JNZ

JE and JZ are just different names for exactly the same thing: a conditional jump when ZF (the “zero” flag) is equal to 1. (Similarly, JNE and JNZ are just different names for a conditional jump when ZF is equal to 0.) You could use them interchangeably, but you should use them depending on what … Read more