sse – Make Me Engineer

How to sum __m256 horizontally?

November 1, 2022 by Tarik

This version should be optimal for Intel Sandy/Ivy Bridge, AMD Bulldozer, and later CPUs.

    // x = ( x7, x6, x5, x4, x3, x2, x1, x0 )
    float sum8(__m256 x) {
        // hiQuad = ( x7, x6, x5, x4 )
        const __m128 hiQuad = _mm256_extractf128_ps(x, 1);
        // loQuad = ( x3, x2, x1, …
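As a rough sketch of the kind of reduction this excerpt starts to describe, the C code below sums the eight lanes of a __m256 by folding 256 bits to 128, then to 64, then to 32. The excerpt is cut off after its first step, so everything past the _mm256_extractf128_ps call is one common way to finish such a reduction and not necessarily the post's own code. The sum8_array helper and the target("avx") attribute (GCC/Clang-specific, used here so the file builds without -mavx) are assumptions added for illustration.

```c
#include <immintrin.h>

/* Horizontal sum of the eight floats in x = ( x7, ..., x0 ).
   Assumption: GCC or Clang; the target attribute enables AVX for this
   function only. With -mavx on the command line it is not needed. */
__attribute__((target("avx")))
static inline float sum8(__m256 x)
{
    /* Fold 256 -> 128 bits: ( x7+x3, x6+x2, x5+x1, x4+x0 ) */
    const __m128 hiQuad  = _mm256_extractf128_ps(x, 1);
    const __m128 loQuad  = _mm256_castps256_ps128(x);
    const __m128 sumQuad = _mm_add_ps(loQuad, hiQuad);

    /* Fold 128 -> 64 bits: move the upper two lanes down and add. */
    const __m128 hiDual  = _mm_movehl_ps(sumQuad, sumQuad);
    const __m128 sumDual = _mm_add_ps(sumQuad, hiDual);

    /* Fold 64 -> 32 bits: bring lane 1 into lane 0 and add the low lanes. */
    const __m128 hi  = _mm_shuffle_ps(sumDual, sumDual, 0x1);
    const __m128 sum = _mm_add_ss(sumDual, hi);

    return _mm_cvtss_f32(sum);
}

/* Hypothetical convenience wrapper: loads eight floats from memory
   (no alignment required) and returns their sum. */
__attribute__((target("avx")))
float sum8_array(const float *p)
{
    return sum8(_mm256_loadu_ps(p));
}
```

Calling sum8_array on an array of eight floats returns their total. Note that the additions happen in a tree order, so for general inputs the result can differ in the last bits from a left-to-right scalar loop.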

practical BigNum AVX/SSE possible?

July 12, 2022 by Tarik

I think it may be possible to implement BigNum efficiently with SIMD, but not in the way you suggest. Instead of implementing a single BigNum in one SIMD register (or an array of SIMD registers), you should process multiple BigNums at once. Let’s consider 128-bit addition. Let 128-bit integers be defined by a pair …
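To make the "multiple BigNums at once" idea concrete, here is a minimal sketch of four independent 128-bit additions per AVX2 iteration. The excerpt is truncated before it defines its representation, so the layout used here (each 128-bit integer split into a low and a high 64-bit half, stored in separate arrays) is an assumption, as are the function name add128_soa and the GCC/Clang target("avx2") attribute. Because AVX2 has no unsigned 64-bit compare, the carry is detected by flipping the sign bits and using the signed compare.

```c
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/* r[i] = a[i] + b[i] (mod 2^128) for n independent 128-bit integers.
   Assumed layout: x_lo[i] and x_hi[i] hold the low and high 64 bits
   of the i-th integer (structure of arrays). */
__attribute__((target("avx2")))
void add128_soa(const uint64_t *a_lo, const uint64_t *a_hi,
                const uint64_t *b_lo, const uint64_t *b_hi,
                uint64_t *r_lo, uint64_t *r_hi, size_t n)
{
    const __m256i sign = _mm256_set1_epi64x(INT64_MIN);
    size_t i = 0;

    /* Four additions per iteration, one per 64-bit lane. */
    for (; i + 4 <= n; i += 4) {
        const __m256i alo = _mm256_loadu_si256((const __m256i *)(a_lo + i));
        const __m256i ahi = _mm256_loadu_si256((const __m256i *)(a_hi + i));
        const __m256i blo = _mm256_loadu_si256((const __m256i *)(b_lo + i));
        const __m256i bhi = _mm256_loadu_si256((const __m256i *)(b_hi + i));

        const __m256i lo = _mm256_add_epi64(alo, blo);

        /* A lane carried iff lo < alo as unsigned values. XOR with the
           sign bit turns that into a signed comparison. */
        const __m256i carry = _mm256_cmpgt_epi64(_mm256_xor_si256(alo, sign),
                                                 _mm256_xor_si256(lo, sign));

        /* Carry lanes are all ones (-1), so subtracting them adds 1. */
        const __m256i hi = _mm256_sub_epi64(_mm256_add_epi64(ahi, bhi), carry);

        _mm256_storeu_si256((__m256i *)(r_lo + i), lo);
        _mm256_storeu_si256((__m256i *)(r_hi + i), hi);
    }

    /* Scalar tail for the remaining 0..3 integers. */
    for (; i < n; ++i) {
        const uint64_t lo = a_lo[i] + b_lo[i];
        r_hi[i] = a_hi[i] + b_hi[i] + (lo < a_lo[i]);
        r_lo[i] = lo;
    }
}
```

The point of this layout is that the carry never has to cross lanes: each lane handles one complete integer, so the carry from the low half feeds only the high half of the same lane.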