Sum reduction of unsigned bytes without overflow, using SSE2 on Intel
You can abuse PSADBW to calculate horizontal sums of bytes without overflow. For example:

    pxor   xmm0, xmm0
    psadbw xmm0, [a + 0]    ; sum in 2x 64-bit chunks
    pxor   xmm1, xmm1
    psadbw xmm1, [a + 16]
    paddw  xmm0, xmm1       ; accumulate vertically
    pshufd xmm1, xmm0, 2    ; bring down the high half
    paddw  xmm0, xmm1
    …
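For readers who prefer intrinsics, the same sequence can be sketched in C. This is a rough translation rather than the original listing: the function name `sum32_bytes` is hypothetical, and it uses unaligned loads (`_mm_loadu_si128`), whereas the assembly's memory operands assume that `a` is 16-byte aligned.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: sums the 32 unsigned bytes starting at p. */
static uint32_t sum32_bytes(const uint8_t *p)
{
    const __m128i zero = _mm_setzero_si128();

    /* Unaligned loads, so p needs no particular alignment (an assumption
       that differs from the assembly's aligned memory operands). */
    __m128i v0 = _mm_loadu_si128((const __m128i *)(p + 0));
    __m128i v1 = _mm_loadu_si128((const __m128i *)(p + 16));

    /* psadbw against zero: each 64-bit half receives the sum of its
       8 bytes in its low 16 bits, with the upper bits cleared. */
    __m128i s0 = _mm_sad_epu8(v0, zero);
    __m128i s1 = _mm_sad_epu8(v1, zero);

    /* paddw: accumulate the two partial results vertically. */
    __m128i acc = _mm_add_epi16(s0, s1);

    /* pshufd with immediate 2: bring the high half's sum down to lane 0. */
    __m128i hi = _mm_shuffle_epi32(acc, 2);
    acc = _mm_add_epi16(acc, hi);

    /* The total sits in the low 16 bits; 32 * 255 = 8160 fits easily. */
    return (uint32_t)_mm_cvtsi128_si32(acc) & 0xFFFFu;
}
```

Calling `sum32_bytes` on a buffer of 32 bytes that are all 255 gives 8160, the largest possible total for this input size, which is well within the 16-bit range that `paddw` works on.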