fma – Make Me Engineer

Optimize for fast multiplication but slow addition: FMA and doubledouble

May 17, 2023 by Tarik

To answer my third question, I found a faster solution for double-double addition: an alternative definition in the paper "Implementation of float-float operators on graphics hardware". Theorem 5 (Add22 theorem): Let ah+al and bh+bl be the float-float arguments of the following algorithm: Add22(ah, al, bh, bl): 1. r = ah ⊕ bh …
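As a rough illustration of what double-double addition involves, here is a minimal sketch in Python. It is not the Add22 algorithm from the paper, which is cut off in the excerpt above. It is a commonly used simple variant built on Knuth's TwoSum, and the names two_sum, fast_two_sum and dd_add are hypothetical, chosen only for this example.

```python
def two_sum(a, b):
    """Error-free transformation: return (s, err) with s = fl(a + b)
    and a + b == s + err exactly (barring overflow)."""
    s = a + b
    bb = s - a
    err = (a - (s - bb)) + (b - bb)
    return s, err


def fast_two_sum(a, b):
    """Same idea as two_sum, but valid only when |a| >= |b|."""
    s = a + b
    err = b - (s - a)
    return s, err


def dd_add(x, y):
    """Add two double-double numbers given as (hi, lo) pairs.

    Assumption: this is a simple, slightly less accurate variant,
    not the Add22 algorithm quoted in the excerpt.
    """
    xh, xl = x
    yh, yl = y
    s, e = two_sum(xh, yh)      # exact sum of the high parts
    e += xl + yl                # fold in the low parts
    return fast_two_sum(s, e)   # renormalize into (hi, lo)


# The low parts survive even though they are far below the
# precision of a single double at this magnitude.
print(dd_add((1.0, 2.0**-60), (1.0, 2.0**-60)))
```

The point of the sketch is that every double-double addition costs several dependent floating-point additions, which is why the number of additions matters so much when addition is the slow operation.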

How to use Fused Multiply-Add (FMA) instructions with SSE/AVX

October 9, 2022 by Tarik

The compiler is allowed to fuse a separate multiply and add into a single FMA, even though this changes the final result (by making it more accurate). An FMA rounds only once (it effectively keeps the intermediate product at infinite precision), while a separate MUL followed by an ADD rounds twice. The IEEE and C standards allow this when …
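The one-rounding versus two-roundings difference can be seen without FMA hardware. The sketch below emulates a fused multiply-add in Python by computing a*b + c exactly with fractions.Fraction and rounding once at the end. The helper name emulated_fma and the input values are assumptions picked for this illustration, not part of the original post.

```python
from fractions import Fraction


def emulated_fma(a, b, c):
    """Emulate a fused multiply-add: compute a*b + c exactly,
    then round to a double a single time."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


# a * b is exactly 1 - 2**-54, which does not fit in a double.
a = 1.0 + 2.0**-27
b = 1.0 - 2.0**-27
c = -1.0

# Two roundings: the product rounds to 1.0, so the sum is 0.0.
separate = a * b + c

# One rounding: the exact result -2**-54 is representable and survives.
fused = emulated_fma(a, b, c)

print(separate)
print(fused == -2.0**-54)
```

With these inputs the separate multiply and add lose the answer entirely, while the single-rounding version returns the exact value. That is the kind of change in results a compiler introduces when it contracts a*b + c into an FMA.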

Obtaining peak bandwidth on Haswell in the L1 cache: only getting 62%

July 14, 2022 by Tarik

IACA Analysis: Using IACA (the Intel Architecture Code Analyzer) reveals that macro-op fusion is indeed occurring, and that it is not the problem. Mysticial is correct: the problem is that the store isn't using Port 7 at all. IACA reports the following: Intel(R) Architecture Code Analyzer Version – 2.1 Analyzed File – …