How to count clock cycles with RDTSC in GCC x86? [duplicate]

The other answers work, but you can avoid inline assembly by using GCC’s __rdtsc intrinsic, available by including x86intrin.h. It is defined in gcc/config/i386/ia32intrin.h:

    /* rdtsc */
    extern __inline unsigned long long
    __attribute__((__gnu_inline__, __always_inline__, __artificial__))
    __rdtsc (void)
    {
      return __builtin_ia32_rdtsc ();
    }
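As a rough illustration of the intrinsic in use, the sketch below counts TSC ticks around a small loop. The helper name tsc_ticks_for_loop and the loop body are made up for this example; it assumes GCC or Clang targeting x86, and it reads the counter without any fencing, so the result is only approximate.

```c
#include <stdint.h>
#include <x86intrin.h>  /* declares __rdtsc() on GCC and Clang for x86 */

/* Hypothetical helper: TSC ticks elapsed while running `iters` loop iterations. */
static uint64_t tsc_ticks_for_loop(unsigned long iters)
{
    volatile unsigned long sink = 0;  /* volatile so the loop is not optimized away */

    uint64_t start = __rdtsc();       /* read the time stamp counter before the work */
    for (unsigned long i = 0; i < iters; i++)
        sink += i;
    uint64_t end = __rdtsc();         /* read it again after the work */

    return end - start;               /* TSC ticks, not necessarily core clock cycles */
}
```

Calling tsc_ticks_for_loop(100000) a few times and taking the minimum is a common way to reduce noise from interrupts and cache warm-up.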

rdtsc accuracy across CPU cores

Check the X86_FEATURE_CONSTANT_TSC and X86_FEATURE_NONSTOP_TSC bits. The invariant TSC flag is reported by CPUID leaf 0x80000007 in EDX, bit 8; see the unsynchronized_tsc function in the Linux kernel for further checks. Intel’s Software Developer’s Manual, vol. 3B, section 16.11.1 (Invariant TSC), says the following: “The time stamp counter in newer processors may support an enhancement, referred to as invariant TSC. Processor’s support for invariant TSC is indicated …
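The CPUID check mentioned above could be sketched in user space roughly as follows. The function name has_invariant_tsc is hypothetical, and the code assumes the cpuid.h header shipped with GCC and Clang. It tests only the invariant TSC bit and does not reproduce the additional checks the kernel performs.

```c
#include <cpuid.h>    /* GCC/Clang helper: __get_cpuid() */
#include <stdbool.h>

/* Hypothetical helper: true if CPUID leaf 0x80000007 reports EDX bit 8 (invariant TSC). */
static bool has_invariant_tsc(void)
{
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid() returns 0 when the requested leaf is not supported. */
    if (!__get_cpuid(0x80000007u, &eax, &ebx, &ecx, &edx))
        return false;

    return ((edx >> 8) & 1u) != 0;
}
```

A result of true only says the counter ticks at a constant rate across power states; it does not by itself guarantee that the counters of different cores or sockets are synchronized.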

Difference between rdtscp, rdtsc : memory and cpuid / rdtsc?

As mentioned in a comment, there’s a difference between a compiler barrier and a processor barrier. The volatile qualifier and the memory clobber in the asm statement act as a compiler barrier, but the processor is still free to reorder instructions. Processor barriers are special instructions that must be given explicitly, e.g. rdtscp, cpuid, memory fence instructions (mfence, lfence, …
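To make the distinction concrete, here is a minimal sketch contrasting the two kinds of barrier. The names compiler_barrier and rdtsc_fenced are invented for this example; it assumes GCC or Clang on x86-64, where SSE2 (and therefore _mm_lfence) is available by default.

```c
#include <stdint.h>
#include <x86intrin.h>  /* __rdtsc() and _mm_lfence() */

/* Compiler barrier only: emits no instruction, it just stops the compiler
 * from moving memory accesses across this point. */
static inline void compiler_barrier(void)
{
    __asm__ __volatile__("" ::: "memory");
}

/* Processor barrier: lfence is a real instruction, so the CPU waits for
 * earlier instructions to complete before rdtsc executes, and later
 * instructions do not start before rdtsc has executed. */
static inline uint64_t rdtsc_fenced(void)
{
    _mm_lfence();
    uint64_t t = __rdtsc();
    _mm_lfence();
    return t;
}
```

Placing compiler_barrier() around a plain __rdtsc() call would constrain the compiler's code motion but leave out-of-order execution in the CPU untouched, which is the gap the fenced version is meant to close.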

RDTSCP in NASM always returns the same value (timing a single instruction)

Your first code sample (the one behind the title question) is buggy because it overwrites the rdtsc and rdtscp results with the cpuid results in EAX, EBX, ECX, and EDX. Use lfence instead of cpuid; on Intel since forever and on AMD with Spectre mitigation enabled, lfence will serialize the instruction stream and thus do what you want with rdtsc. …
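The same idea can be sketched in C with GCC-style inline assembly rather than NASM. The helper name read_tsc_lfence_asm is hypothetical. The point it illustrates is that rdtsc leaves its result in EDX:EAX, so both halves are copied out and combined before any later instruction can reuse those registers, and lfence is used around it in place of cpuid.

```c
#include <stdint.h>

/* Hypothetical helper: lfence; rdtsc; lfence, with the EDX:EAX result
 * captured immediately through the "=a" and "=d" output constraints. */
static inline uint64_t read_tsc_lfence_asm(void)
{
    uint32_t lo, hi;

    __asm__ __volatile__(
        "lfence\n\t"
        "rdtsc\n\t"
        "lfence"
        : "=a"(lo), "=d"(hi)   /* EAX -> lo, EDX -> hi */
        :
        : "memory");

    return ((uint64_t)hi << 32) | lo;   /* combine the halves into one 64-bit count */
}
```

To time a short sequence, store the first return value in a variable, run the code under test, then subtract it from a second reading. Because lfence does not write EAX, EBX, ECX, or EDX, there is nothing to clobber the saved timestamp the way cpuid does.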