How does deferred LINQ query execution actually work?

Your query can be written like this in method syntax:

var query = numbers.Where(value => value >= threshold);

Or:

Func<int, bool> predicate = delegate(value) {
    return value >= threshold;
}
IEnumerable<int> query = numbers.Where(predicate);

These pieces of code (including your own query in query syntax) are all equivalent.

When you unroll the query like that, you see that predicate is an anonymous method and threshold is a closure in that method. That means it will assume the value at the time of execution. The compiler will generate an actual (non-anonymous) method that will take care of that. The method will not be executed when it’s declared, but for each item when query is enumerated (the execution is deferred). Since the enumeration happens after the value of threshold is changed (and threshold is a closure), the new value is used.

When you set numbers to null, you set the reference to nowhere, but the object still exists. The IEnumerable returned by Where (and referenced in query) still references it and it does not matter that the initial reference is null now.

That explains the behavior: numbers and threshold play different roles in the deferred execution. numbers is a reference to the array that is enumerated, while threshold is a local variable, whose scope is ”forwarded“ to the anonymous method.

Extension, part 1: Modification of the closure during the enumeration

You can take your example one step further when you replace the line…

var result = query.ToList();

…with:

List<int> result = new List<int>();
foreach(int value in query) {
    threshold = 8;
    result.Add(value);
}

What you are doing is to change the value of threshold during the iteration of your array. When you hit the body of the loop the first time (when value is 3), you change the threshold to 8, which means the values 5 and 7 will be skipped and the next value to be added to the list is 9. The reason is that the value of threshold will be evaluated again on each iteration and the then valid value will be used. And since the threshold has changed to 8, the numbers 5 and 7 do not evaluate as greater or equal anymore.

Extension, part 2: Entity Framework is different

To make things more complicated, when you use LINQ providers that create a different query from your original and then execute it, things are slightly different. The most common examples are Entity Framework (EF) and LINQ2SQL (now largely superseded by EF). These providers create an SQL query from the original query before the enumeration. Since this time the value of the closure is evaluated only once (it actually is not a closure, because the compiler generates an expression tree and not an anonymous method), changes in threshold during the enumeration have no effect on the result. These changes happen after the query is submitted to the database.

The lesson from this is that you have to be always aware which flavor of LINQ you are using and that some understanding of its inner workings is an advantage.

Leave a Comment

tech