Performance differences between debug and release builds

The C# compiler itself doesn’t alter the emitted IL a great deal in the Release build. The notable change is that it no longer emits the NOP opcodes that allow you to set a breakpoint on a curly brace. The big difference comes from the optimizer that’s built into the JIT compiler. I know it makes the following optimizations:

  • Method inlining. A method call is replaced by injecting the code of the method at the call site. This is a big one, it makes property accessors essentially free.

  • CPU register allocation. Local variables and method arguments can stay stored in a CPU register without ever (or less frequently) being stored back to the stack frame. This is a big one, notable for making debugging optimized code so difficult and for giving the volatile keyword a meaning.

  • Array index checking elimination. An important optimization when working with arrays (most .NET collection classes, List<T> included, use an array internally). When the JIT compiler can verify that a loop never indexes an array out of bounds then it will eliminate the index check. Big one.

  • Loop unrolling. Loops with small bodies are improved by repeating the code up to 4 times in the body and looping less. Reduces the branch cost and improves the processor’s super-scalar execution options.

  • Dead code elimination. A statement like if (false) { // } gets completely eliminated. This can occur due to constant folding and inlining. Other cases are where the JIT compiler can determine that the code has no possible side-effect. This optimization is what makes profiling code so tricky.

  • Code hoisting. Code inside a loop that is not affected by the loop can be moved out of the loop. The optimizer of a C compiler will spend a lot more time on finding opportunities to hoist. It is however an expensive optimization due to the required data flow analysis, and the jitter can’t afford the time so it only hoists obvious cases. That forces .NET programmers to write better source code and do the hoisting themselves.

  • Common sub-expression elimination. x = y + 4; z = y + 4; becomes z = x; Pretty common in statements like dest[ix+1] = src[ix+1]; written for readability without introducing a helper variable. No need to compromise readability.

  • Constant folding. x = 1 + 2; becomes x = 3; This simple example is caught early by the compiler, but folding also happens at JIT time when other optimizations make it possible.

  • Copy propagation. x = a; y = x; becomes y = a; This helps the register allocator make better decisions. It is a big deal in the x86 jitter because it has few registers to work with. Having it select the right ones is critical to perf.
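Two of the items above, array index checking elimination and code hoisting, are the ones that depend most on how you write the source. Here is a minimal sketch of what that looks like. The class and method names are hypothetical, and whether the index check actually disappears depends on the runtime and jitter version, so read the comments as expectations rather than guarantees.

```csharp
using System;

public static class LoopSketch
{
    // Looping from 0 up to values.Length is the pattern the jitter can most
    // easily prove safe, so the per-element index check is a candidate
    // for elimination.
    public static int SumAll(int[] values)
    {
        int sum = 0;
        for (int i = 0; i < values.Length; i++)
        {
            sum += values[i];
        }
        return sum;
    }

    // The scale factor does not change inside the loop. Computing it once
    // up front is hoisting by hand instead of hoping the jitter finds it.
    public static void Scale(double[] dest, double[] src, double a, double b)
    {
        double factor = Math.Sqrt(a * a + b * b);
        int count = Math.Min(dest.Length, src.Length);
        for (int i = 0; i < count; i++)
        {
            dest[i] = src[i] * factor;
        }
    }
}
```

Writing Math.Sqrt(a * a + b * b) inside the loop body would give the same results, it just leaves the decision to hoist up to the jitter.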

These are very important optimizations that can make a great deal of difference when, for example, you profile the Debug build of your app and compare it to the Release build. That only really matters though when the code is on your critical path, the 5 to 10% of the code you write that actually affects the perf of your program. The JIT optimizer isn’t smart enough to know up front what is critical, it can only apply the “turn it to eleven” dial for all the code.
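If you want to compare the two builds yourself, it helps to first confirm which one you are actually running. The sketch below is one way to do that, assuming the usual project defaults where the Debug configuration disables JIT optimization. BuildCheck and its methods are hypothetical names; DebuggableAttribute.IsJITOptimizerDisabled and Stopwatch are the real framework types.

```csharp
using System;
using System.Diagnostics;
using System.Reflection;

public static class BuildCheck
{
    // True when the assembly was compiled with JIT optimizations disabled,
    // which is what a typical Debug configuration does.
    public static bool IsOptimizerDisabled(Assembly assembly)
    {
        var attr = assembly.GetCustomAttribute<DebuggableAttribute>();
        return attr != null && attr.IsJITOptimizerDisabled;
    }

    // Times a delegate over a number of iterations. One warm-up call runs
    // first so the initial JIT compilation is not counted.
    public static TimeSpan Measure(Action work, int iterations)
    {
        work();
        var watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            work();
        }
        watch.Stop();
        return watch.Elapsed;
    }
}
```

Call BuildCheck.IsOptimizerDisabled(typeof(BuildCheck).Assembly) at startup and log the result next to the timings from Measure. Run the test without a debugger attached, since the debugger can suppress the optimizer as well.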

The effect of these optimizations on your program’s execution time is often dwarfed by work that happens elsewhere: reading a file, executing a database query, and so on. That makes the work the JIT optimizer does completely invisible. It doesn’t mind though 🙂

The JIT optimizer is pretty reliable code, mostly because it has been put to the test millions of times. It is extremely rare to have problems in the Release build version of your program. It does happen however. Both the x64 and the x86 jitters have had problems with structs. The x86 jitter has trouble with floating point consistency, producing subtly different results when the intermediates of a floating point calculation are kept in a FPU register at 80-bit precision instead of getting truncated when flushed to memory.
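For the floating point case, the usual defense is to ask for the rounding explicitly. As far as I know, an explicit cast to float or double makes the compiler emit a conversion that rounds the value to that type’s precision, even when the expression already has that type. The sketch below relies on that assumption; FloatSketch and ProductEquals are hypothetical names.

```csharp
public static class FloatSketch
{
    // The explicit cast asks for the product to be rounded to 32-bit float
    // precision before the comparison, instead of comparing an intermediate
    // that may still be held at higher precision in an FPU register.
    public static bool ProductEquals(float a, float b, float expected)
    {
        return (float)(a * b) == expected;
    }
}
```

This only narrows the window for surprises. Comparing floating point values with a tolerance is still the safer habit.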
