Why is processing an unsorted array the same speed as processing a sorted array with modern x86-64 clang?
Several of the answers in the question you link talk about rewriting the code to be branchless and thus avoiding any branch prediction issues. That’s what your updated compiler is doing. Specifically, clang++ 10 with -O3 vectorizes the inner loop. See the code on godbolt, lines 36-67 of the assembly. The code is a little … Read more