9.5 and newer:
PostgreSQL 9.5 and newer support INSERT ... ON CONFLICT (key) DO UPDATE (and ON CONFLICT (key) DO NOTHING), i.e. upsert. Compare with MySQL’s INSERT ... ON DUPLICATE KEY UPDATE.
For usage see the manual – specifically the conflict_action clause in the syntax diagram, and the explanatory text.
Unlike the solutions for 9.4 and older that are given below, this feature works with multiple conflicting rows and it doesn’t require exclusive locking or a retry loop.
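As a rough illustration of the syntax, here is a minimal sketch. It assumes a table shaped like the testtable(id, somedata) used in the examples further down, with id as the primary key; the CREATE TABLE statement and the sample values are illustrative assumptions, not part of the original question.

```sql
-- Assumed table definition, for illustration only.
CREATE TABLE IF NOT EXISTS testtable (
    id       integer PRIMARY KEY,
    somedata text NOT NULL
);

-- Upsert: insert new rows, or overwrite somedata where the id already exists.
-- EXCLUDED names the row that was proposed for insertion.
INSERT INTO testtable (id, somedata)
VALUES (2, 'Joe'), (3, 'Alan')
ON CONFLICT (id) DO UPDATE
    SET somedata = EXCLUDED.somedata;

-- Insert-if-absent: silently skip rows whose id already exists.
INSERT INTO testtable (id, somedata)
VALUES (2, 'ignored')
ON CONFLICT (id) DO NOTHING;
```

Note that a single DO UPDATE statement cannot touch the same target row twice, so the VALUES list should not repeat an id.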
The commit adding the feature is here and the discussion around its development is here.
If you’re on 9.5 and don’t need to be backward-compatible you can stop reading now.
9.4 and older:
PostgreSQL 9.4 and older doesn’t have any built-in UPSERT (or MERGE) facility, and doing it efficiently in the face of concurrent use is very difficult.
This article discusses the problem in useful detail.
In general you must choose between two options:
- Individual insert/update operations in a retry loop; or
- Locking the table and doing a batch merge.
Individual row retry loop
Using individual row upserts in a retry loop is the reasonable option if you want many connections concurrently trying to perform inserts.
The PostgreSQL documentation contains a useful procedure that’ll let you do this in a loop inside the database. It guards against lost updates and insert races, unlike most naive solutions. It will only work in READ COMMITTED mode and is only safe if it’s the only thing you do in the transaction, though. The function won’t work correctly if triggers or secondary unique keys cause unique violations.
This strategy is very inefficient. Whenever practical you should queue up work and do a bulk upsert as described below instead.
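The loop from the documentation looks roughly like the following sketch, adapted here to the testtable(id, somedata) table used in this answer. The function name upsert_testtable and its parameter names are hypothetical; treat it as an illustration of the pattern under the restrictions described above, not a drop-in solution.

```sql
-- Sketch of the update-then-insert retry loop, adapted from the merge_db
-- example in the PL/pgSQL documentation. Assumes
-- testtable(id integer PRIMARY KEY, somedata text).
-- The function and parameter names are hypothetical.
CREATE FUNCTION upsert_testtable(p_id integer, p_somedata text)
RETURNS void AS
$$
BEGIN
    LOOP
        -- First try to update an existing row.
        UPDATE testtable SET somedata = p_somedata WHERE id = p_id;
        IF FOUND THEN
            RETURN;
        END IF;

        -- No row matched, so try to insert. A concurrent session may insert
        -- the same key first, which raises unique_violation here.
        BEGIN
            INSERT INTO testtable (id, somedata) VALUES (p_id, p_somedata);
            RETURN;
        EXCEPTION WHEN unique_violation THEN
            -- Do nothing; loop around and retry the UPDATE.
            NULL;
        END;
    END LOOP;
END;
$$
LANGUAGE plpgsql;

-- Usage: run in READ COMMITTED, as the only statement in its transaction.
SELECT upsert_testtable(2, 'blah');
```

The inner BEGIN ... EXCEPTION block sets up a subtransaction on every pass, which is part of why this approach is slow for large volumes.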
Many attempted solutions to this problem fail to consider rollbacks, so they result in incomplete updates. Two transactions race with each other; one of them successfully INSERTs; the other gets a duplicate key error and does an UPDATE instead. The UPDATE blocks waiting for the INSERT to roll back or commit. When the INSERT rolls back, the UPDATE condition re-check matches zero rows, so even though the UPDATE commits it hasn’t actually done the upsert you expected. You have to check the result row counts and re-try where necessary.
Some attempted solutions also fail to consider SELECT races. If you try the obvious and simple:
    -- THIS IS WRONG. DO NOT COPY IT. It's an EXAMPLE.

    BEGIN;

    UPDATE testtable
    SET somedata = 'blah'
    WHERE id = 2;

    -- Remember, this is WRONG. Do NOT COPY IT.

    INSERT INTO testtable (id, somedata)
    SELECT 2, 'blah'
    WHERE NOT EXISTS (SELECT 1 FROM testtable WHERE testtable.id = 2);

    COMMIT;
then when two run at once there are several failure modes. One is the already discussed issue with an update re-check. Another is where both UPDATE at the same time, matching zero rows and continuing. Then they both do the EXISTS test, which happens before the INSERT. Both get zero rows, so both do the INSERT. One fails with a duplicate key error.
This is why you need a re-try loop. You might think that you can prevent duplicate key errors or lost updates with clever SQL, but you can’t. You need to check row counts or handle duplicate key errors (depending on the chosen approach) and re-try.
Please don’t roll your own solution for this. Like with message queuing, it’s probably wrong.
Bulk upsert with lock
Sometimes you want to do a bulk upsert, where you have a new data set that you want to merge into an older existing data set. This is vastly more efficient than individual row upserts and should be preferred whenever practical.
In this case, you typically follow this process:
- CREATE a TEMPORARY table;
- COPY or bulk-insert the new data into the temp table;
- LOCK the target table IN EXCLUSIVE MODE. This permits other transactions to SELECT, but not make any changes to the table;
- Do an UPDATE ... FROM of existing records using the values in the temp table;
- Do an INSERT of rows that don’t already exist in the target table;
- COMMIT, releasing the lock.
For example, for the case given in the question, using a multi-valued INSERT to populate the temp table:
    BEGIN;

    CREATE TEMPORARY TABLE newvals(id integer, somedata text);

    INSERT INTO newvals(id, somedata) VALUES (2, 'Joe'), (3, 'Alan');

    LOCK TABLE testtable IN EXCLUSIVE MODE;

    UPDATE testtable
    SET somedata = newvals.somedata
    FROM newvals
    WHERE newvals.id = testtable.id;

    INSERT INTO testtable
    SELECT newvals.id, newvals.somedata
    FROM newvals
    LEFT OUTER JOIN testtable ON (testtable.id = newvals.id)
    WHERE testtable.id IS NULL;

    COMMIT;
MERGE actually has poorly defined concurrency semantics and is not suitable for upserting without locking a table first. It’s a really useful OLAP statement for data merging, but it’s not actually a useful solution for concurrency-safe upsert. There’s lots of advice to people using other DBMSes to use MERGE for upserts, but it’s actually wrong.
Comparable features in other databases:
- INSERT ... ON DUPLICATE KEY UPDATE in MySQL
- MERGE from MS SQL Server (but see above about MERGE problems)
- MERGE from Oracle (but see above about MERGE problems)