On a system without significant load, an async call has a slightly bigger overhead. While the I/O operation itself is asynchronous regardless, blocking can be faster than thread-pool task switching.
How much overhead? Let’s look at your timing numbers. 30ms for a blocking call, 450ms for an asynchronous call. 32 kiB packet size means you need you need about fifty individual I/O operations. That means we have roughly 8ms of overhead on each packet, which corresponds pretty well with your measurements over different packet sizes. That doesn’t sound like overhead just from being asynchronous, even though the asynchronous versions need to do a lot more work than the synchronous. It sounds like the synchronous version is (simplified) 1 request -> 50 responses, while the asynchronous version ends up being 1 request -> 1 response -> 1 request -> 1 response -> …, paying the cost over and over again.
ExecuteReader works just as well as
ExecuteReaderAsync. The next operation is
Read followed by a
GetFieldValue – and an interesting thing happens there. If either of the two is async, the whole operation is slow. So there’s certainly something very different happening once you start making things truly asynchronous – a
Read will be fast, and then the async
GetFieldValueAsync will be slow, or you can start with the slow
ReadAsync, and then both
GetFieldValueAsync are fast. The first asynchronous read from the stream is slow, and the slowness depends entirely on the size of the whole row. If I add more rows of the same size, reading each row takes the same amount of time as if I only have one row, so it’s obvious that the data is still being streamed row by row – it just seems to prefer to read the whole row at once once you start any asynchronous read. If I read the first row asynchronously, and the second synchronously – the second row being read will be fast again.
So we can see that the problem is a big size of an individual row and/or column. It doesn’t matter how much data you have in total – reading a million small rows asynchronously is just as fast as synchronously. But add just a single field that’s too big to fit in a single packet, and you mysteriously incur a cost at asynchronously reading that data – as if each packet needed a separate request packet, and the server couldn’t just send all the data at once. Using
CommandBehavior.SequentialAccess does improve the performance as expected, but the massive gap between sync and async still exists.
The best performance I got was when doing the whole thing properly. That means using
CommandBehavior.SequentialAccess, as well as streaming the data explicitly:
using (var reader = await cmd.ExecuteReaderAsync(CommandBehavior.SequentialAccess))
while (await reader.ReadAsync())
var data = await reader.GetTextReader(0).ReadToEndAsync();
With this, the difference between sync and async becomes hard to measure, and changing the packet size no longer incurs the ridiculous overhead as before.
If you want good performance in edge cases, make sure to use the best tools available – in this case, stream large column data rather than relying on helpers like