There are basically 4 techniques for this task, all of them standard SQL.
Often fastest in Postgres.
SELECT ip FROM login_log l WHERE NOT EXISTS ( SELECT -- SELECT list mostly irrelevant; can just be empty in Postgres FROM ip_location WHERE ip = l.ip );
- What is easier to read in EXISTS subqueries?
LEFT JOIN / IS NULL
Sometimes this is fastest. Often shortest. Often results in the same query plan as
SELECT l.ip FROM login_log l LEFT JOIN ip_location i USING (ip) -- short for: ON i.ip = l.ip WHERE i.ip IS NULL;
Short. Not as easily integrated in more complex queries.
SELECT ip FROM login_log EXCEPT ALL -- "ALL" keeps duplicates and makes it faster SELECT ip FROM ip_location;
Note that (per documentation):
duplicates are eliminated unless
EXCEPT ALLis used.
Typically, you’ll want the
ALL keyword. If you don’t care, still use it because it makes the query faster.
Only good without
NULL values or if you know to handle
NULL properly. I would not use it for this purpose. Also, performance can deteriorate with bigger tables.
SELECT ip FROM login_log WHERE ip NOT IN ( SELECT DISTINCT ip -- DISTINCT is optional FROM ip_location );
NOT IN carries a “trap” for
NULL values on either side:
- Find records where join doesn’t exist
Similar question on dba.SE targeted at MySQL:
- Select rows where value of second column is not present in first column