Update:
This article in my blog summarizes both my answer and my comments to another answers, and shows actual execution plans:
- IN vs. JOIN vs. EXISTS
SELECT *
FROM a
WHERE a.c IN (SELECT d FROM b)
SELECT a.*
FROM a
JOIN b
ON a.c = b.d
These queries are not equivalent. They can yield different results if your table b
is not key preserved (i. e. the values of b.d
are not unique).
The equivalent of the first query is the following:
SELECT a.*
FROM a
JOIN (
SELECT DISTINCT d
FROM b
) bo
ON a.c = bo.d
If b.d
is UNIQUE
and marked as such (with a UNIQUE INDEX
or UNIQUE CONSTRAINT
), then these queries are identical and most probably will use identical plans, since SQL Server
is smart enough to take this into account.
SQL Server
can employ one of the following methods to run this query:
-
If there is an index on
a.c
,d
isUNIQUE
andb
is relatively small compared toa
, then the condition is propagated into the subquery and the plainINNER JOIN
is used (withb
leading) -
If there is an index on
b.d
andd
is notUNIQUE
, then the condition is also propagated andLEFT SEMI JOIN
is used. It can also be used for the condition above. -
If there is an index on both
b.d
anda.c
and they are large, thenMERGE SEMI JOIN
is used -
If there is no index on any table, then a hash table is built on
b
andHASH SEMI JOIN
is used.
Neither of these methods reevaluates the whole subquery each time.
See this entry in my blog for more detail on how this works:
- Counting missing rows: SQL Server
There are links for all RDBMS
‘s of the big four.