What are the costs of failed store-to-load forwarding on x86?

This is not a full answer, but it is evidence that the penalty is visible. An MSVC 2022 benchmark, compiled with /std:c++latest: #include <chrono> #include <iostream> struct alignas(16) S { char* a; int* b; }; extern "C" void init_fused_copy_unfused(int n, S & s2, S & s1); extern "C" void init_fused_copy_fused(int n, S & s2, S & … Read more
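For readers who want to try the effect without the full benchmark, here is a minimal sketch in portable C++. It is not the benchmark from the answer above: every name in it (g_buf, g_sink, compiler_barrier, split_store_then_wide_load, wide_store_then_wide_load, ns_per_iter) is hypothetical. It assumes the compiler emits the stores and the load as written, which is worth checking in the disassembly. The first function writes two 4-byte halves and then reads all 8 bytes back, a pattern that many x86 cores cannot forward from the store buffer. The second writes and reads the same 8 bytes with matching widths.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>
#include <cstring>

// Hypothetical names throughout; this is an illustration, not the quoted benchmark.
alignas(8) unsigned char g_buf[8];   // global, so the stores must really reach memory
volatile std::uint64_t g_sink;       // keeps the timed loop's result alive

// Compiler-only barrier (no fence instruction on x86): stops the optimizer from
// merging the stores or forwarding the value itself in registers.
inline void compiler_barrier() {
    std::atomic_signal_fence(std::memory_order_seq_cst);
}

// Two 4-byte stores, then one 8-byte load that spans both of them.
std::uint64_t split_store_then_wide_load(std::uint32_t lo, std::uint32_t hi) {
    std::memcpy(g_buf, &lo, 4);
    compiler_barrier();
    std::memcpy(g_buf + 4, &hi, 4);
    compiler_barrier();
    std::uint64_t v;
    std::memcpy(&v, g_buf, 8);
    return v;
}

// One 8-byte store, then an 8-byte load of exactly the same bytes.
std::uint64_t wide_store_then_wide_load(std::uint32_t lo, std::uint32_t hi) {
    std::uint32_t parts[2] = {lo, hi};
    std::uint64_t whole;
    std::memcpy(&whole, parts, 8);   // combine the halves before touching g_buf
    std::memcpy(g_buf, &whole, 8);
    compiler_barrier();
    std::uint64_t v;
    std::memcpy(&v, g_buf, 8);
    return v;
}

// Times f in a dependent chain (each result feeds the next call), so the
// store-to-load latency sits on the critical path. Returns nanoseconds per call.
template <class F>
double ns_per_iter(F f, int iters) {
    std::uint64_t x = 1;
    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < iters; ++i) {
        x = f(static_cast<std::uint32_t>(x),
              static_cast<std::uint32_t>(x >> 32) + 1);
    }
    auto t1 = std::chrono::steady_clock::now();
    g_sink = x;
    return std::chrono::duration<double, std::nano>(t1 - t0).count() / iters;
}
```

Calling ns_per_iter(split_store_then_wide_load, n) and ns_per_iter(wide_store_then_wide_load, n) with a large n and comparing the two numbers gives a rough view of the penalty. The size of the gap, and whether there is one at all, depends on the microarchitecture and on the code the compiler actually generates.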

Is LFENCE serializing on AMD processors?

AMD's manuals have always described their implementation of LFENCE as a load-serializing instruction: "Acts as a barrier to force strong memory ordering (serialization) between load instructions preceding the LFENCE and load instructions that follow the LFENCE." The original use case for LFENCE was ordering loads from WC (write-combining) memory. However, after the speculative … Read more

How can I mitigate the impact of the Intel jcc erratum on gcc?

By compiler:
GCC: -Wa,-mbranches-within-32B-boundaries
clang (10+): -mbranches-within-32B-boundaries as a compiler option directly, not via -Wa.
MSVC: /QIntel-jcc-erratum (see "Intel JCC Erratum – what is the effect of prefixes used for mitigation?")
ICC: TODO, look for docs.

The GNU toolchain does the mitigation in the assembler, with as -mbranches-within-32B-boundaries, which enables (GAS manual: x86 options): -malign-branch-boundary=32 (care about 32-byte boundaries). … Read more