How can I mitigate the impact of the Intel jcc erratum on gcc?

By compiler:

- GCC: -Wa,-mbranches-within-32B-boundaries
- clang (10+): -mbranches-within-32B-boundaries as a compiler option directly, not via -Wa.
- MSVC: /QIntel-jcc-erratum. See "Intel JCC Erratum – what is the effect of prefixes used for mitigation?"
- ICC: TODO, look for docs.

The GNU toolchain does mitigation in the assembler, with as -mbranches-within-32B-boundaries, which enables (GAS manual: x86 options): -malign-branch-boundary=32 (care about 32-byte boundaries). …
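To make the "32-byte boundaries" condition concrete, here is a minimal sketch in C. It assumes the usual description of the erratum: an affected jump is one whose bytes cross a 32-byte boundary or whose last byte sits immediately before one. The helper name `touches_32b_boundary` is hypothetical and is not part of any toolchain; the assembler does its own, more involved analysis.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if an instruction occupying the bytes
   [addr, addr + len) crosses a 32-byte boundary or ends exactly on one. */
static bool touches_32b_boundary(uint64_t addr, unsigned len)
{
    assert(len > 0);
    uint64_t last = addr + len - 1;              /* address of the final byte */
    bool crosses = (addr / 32) != (last / 32);   /* first and last byte in different blocks */
    bool ends_on = ((addr + len) % 32) == 0;     /* the next byte starts a new block */
    return crosses || ends_on;
}
```

A 6-byte jump at offset 26 ends on the boundary and a 4-byte jump at offset 30 crosses it, so both would be flagged; the same jumps placed at offset 0 would not.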

Alignment requirements for atomic x86 instructions vs. MS’s InterlockedCompareExchange documentation?

x86 does not require alignment for a lock cmpxchg instruction to be atomic. However, alignment is necessary for good performance. This should be no surprise: backward compatibility means that software written with a manual from 14 years ago will still run on today’s processors. Modern CPUs even have a performance counter specifically for split-lock detection …
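As an illustration, the following C11 sketch performs a compare-and-swap on a naturally aligned object, which is the case that stays fast. The helpers `is_aligned` and `cas_long` are hypothetical names introduced here. On x86, compilers typically implement `atomic_compare_exchange_strong` with lock cmpxchg, but that is an expectation about code generation, not a guarantee.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if addr is a multiple of alignment. */
static bool is_aligned(uintptr_t addr, size_t alignment)
{
    return (addr % alignment) == 0;
}

/* Hypothetical wrapper: replace *target with desired only if it still
   holds expected. Returns true when the swap happened. */
static bool cas_long(_Atomic long *target, long expected, long desired)
{
    /* An object declared as _Atomic long is naturally aligned, so it
       cannot straddle a cache line; this assert documents that. */
    assert(is_aligned((uintptr_t)target, alignof(_Atomic long)));
    return atomic_compare_exchange_strong(target, &expected, desired);
}
```

The misaligned case only arises when code deliberately places the operand at an odd address, for example inside a packed struct or a byte buffer.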

Do current x86 architectures support non-temporal loads (from “normal” memory)?

To answer specifically the headline question: Yes, recent mainstream Intel CPUs support non-temporal loads on normal memory – but only “indirectly” via non-temporal prefetch instructions, rather than directly using non-temporal load instructions like movntdqa. This is in contrast to non-temporal stores where you can just use the corresponding non-temporal store instructions directly. The basic …
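A minimal sketch of the "indirect" approach, assuming an x86 target with SSE: ordinary loads are paired with a non-temporal prefetch hint issued some distance ahead. The function name and the `PREFETCH_AHEAD` distance are hypothetical and untuned; whether the hint helps at all depends on the microarchitecture.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>  /* _mm_prefetch, _MM_HINT_NTA */

enum { PREFETCH_AHEAD = 64 };  /* elements ahead; an arbitrary guess, not a tuned value */

/* Sums the array with ordinary loads, issuing a prefetchnta hint
   roughly once per 64-byte line (16 uint32_t elements). */
static uint64_t sum_with_nta_prefetch(const uint32_t *data, size_t n)
{
    assert(data != NULL || n == 0);
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        /* Stay inside the array so no out-of-range address is formed. */
        if ((i % 16) == 0 && i + PREFETCH_AHEAD < n)
            _mm_prefetch((const char *)&data[i + PREFETCH_AHEAD], _MM_HINT_NTA);
        total += data[i];
    }
    return total;
}
```

The prefetch is only a hint, so the result is the same with or without it; only the cache behaviour may differ.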

Does it make any sense to use the LFENCE instruction on x86/x86_64 processors?

Bottom line (TL;DR): LFENCE alone indeed seems useless for memory ordering; however, that does not make SFENCE a substitute for MFENCE. The “arithmetic” logic in the question is not applicable. Here is an excerpt from Intel’s Software Developer’s Manual, volume 3, section 8.2.2 (the edition 325384-052US of September 2014), the same that I used in …
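For context, the case where MFENCE matters is a store followed by a load from a different location, since x86 may let the load complete before the earlier store becomes globally visible. The sketch below shows that pattern with the `_mm_mfence` intrinsic. The function name is hypothetical, and `volatile` is used only to keep the example short; real concurrent code should use C11 atomics.

```c
#include <assert.h>
#include <emmintrin.h>  /* _mm_mfence (SSE2) */

/* Hypothetical helper: publish a flag in *x, then read *y.
   The MFENCE between them keeps the load from being performed
   before the store is globally visible. */
static int store_then_load(volatile int *x, volatile int *y)
{
    assert(x && y);
    *x = 1;        /* store */
    _mm_mfence();  /* full barrier: orders the store before the later load */
    return *y;     /* load from a different location */
}
```

Replacing the `_mm_mfence()` call with `_mm_sfence()` followed by `_mm_lfence()` would not give the same store-to-load ordering, which is the point the answer makes.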