Use of the PostgreSQL Upsert (INSERT ON CONFLICT DO) Function

Background

Upsert (INSERT ON CONFLICT DO) is a new function of PostgreSQL 9.5. When a constraint error occurs during data insertion, data insertion is rolled back or changed to update. The syntax for the same is as follows:

Command:     INSERT  
Description: create new rows in a table  
Syntax:  
[ WITH [ RECURSIVE ] with_query [, ...] ]  
INSERT INTO table_name [ AS alias ] [ ( column_name [, ...] ) ]  
    { DEFAULT VALUES | VALUES ( { expression | DEFAULT } [, ...] ) [, ...] | query }  
    [ ON CONFLICT [ conflict_target ] conflict_action ]  
    [ RETURNING * | output_expression [ [ AS ] output_name ] [, ...] ]  
  
where conflict_target can be one of:  
  
    ( { index_column_name | ( index_expression ) } [ COLLATE collation ] [ opclass ] [, ...] ) [ WHERE index_predicate ]  
    ON CONSTRAINT constraint_name  
  
and conflict_action is one of:  
  
    DO NOTHING  
    DO UPDATE SET { column_name = { expression | DEFAULT } |  
                    ( column_name [, ...] ) = ( { expression | DEFAULT } [, ...] ) |  
                    ( column_name [, ...] ) = ( sub-SELECT )  
                  } [, ...]  
              [ WHERE condition ]

For versions earlier than PostgreSQL 9.5, use functions or the with syntax to implement functions similar to upsert.

Upsert Usage Example for Versions Later Than 9.5

Execute the following command to create a test table with one field as the unique key or primary key.

create table test(id int primary key, info text, crt_time timestamp);

Step 1) If a table does not exist, insert it. Otherwise, update it.

test03=# insert into test values (1,'test',now()) on conflict (id) do update set info=excluded.info,crt_time=excluded.crt_time;  
INSERT 0 1  
  
test03=# select * from test;  
 id | info |          crt_time            
----+------+----------------------------  
  1 | test | 2017-04-24 15:27:25.393948  
(1 row)  
  
test03=# insert into test values (1,'hello digoal',now()) on conflict (id) do update set info=excluded.info,crt_time=excluded.crt_time;  
INSERT 0 1  
  
test03=# select * from test;  
 id |     info     |          crt_time            
----+--------------+----------------------------  
  1 | hello digoal | 2017-04-24 15:27:39.140877  
(1 row)

Step 2) If the data does not exist, insert it. Otherwise, do not perform any operation.

test03=# insert into test values (1,'hello digoal',now()) on conflict (id) do nothing;  
INSERT 0 0  
test03=# insert into test values (1,'pu',now()) on conflict (id) do nothing;  
INSERT 0 0  
test03=# insert into test values (2,'pu',now()) on conflict (id) do nothing;  
INSERT 0 1  
test03=# select * from test;  
 id |     info     |          crt_time            
----+--------------+----------------------------  
  1 | hello digoal | 2017-04-24 15:27:39.140877  
  2 | pu           | 2017-04-24 15:28:20.37392  
(2 rows)

Upsert Usage Example for Versions Earlier Than 9.5

Use different methods to meet the actual requirements.

1) Functions

test03=# create or replace function f_upsert(int,text,timestamp) returns void as $$  
declare  
  res int;  
begin  
  update test set info=$2,crt_time=$3 where id=$1;  
  if not found then  
    insert into test (id,info,crt_time) values ($1,$2,$3);  
  end if;  
  exception when SQLSTATE '23505' then  
    -- return;  -- on conflict (id) do nothing
    update test set info=$2,crt_time=$3 where id=$1;  -- insert into test values ($1,$2,$3) on conflict (id) do update set info=excluded.info,crt_time=excluded.crt_time
end;  
$$ language plpgsql strict;  
CREATE FUNCTION  
  
test03=# select f_upsert(1,'digoal',now()::timestamp);  
 f_upsert   
----------  
   
(1 row)  
  
test03=# select * from test;  
 id |  info  |          crt_time            
----+--------+----------------------------  
  2 | pu     | 2017-04-24 15:28:20.37392  
  1 | digoal | 2017-04-24 15:31:29.254325  
(2 rows)  
  
test03=# select f_upsert(1,'digoal001',now()::timestamp);  
 f_upsert   
----------  
   
(1 row)  
  
test03=# select * from test;  
 id |   info    |         crt_time            
----+-----------+---------------------------  
  2 | pu        | 2017-04-24 15:28:20.37392  
  1 | digoal001 | 2017-04-24 15:31:38.0529  
(2 rows)  
  
test03=# select f_upsert(3,'hello',now()::timestamp);  
 f_upsert   
----------  
   
(1 row)  
  
test03=# select * from test;  
 id |   info    |         crt_time            
----+-----------+---------------------------  
  2 | pu        | 2017-04-24 15:28:20.37392  
  1 | digoal001 | 2017-04-24 15:31:38.0529  
  3 | hello     | 2017-04-24 15:31:49.14291  
(3 rows)

Use functions to upsert data. To write data in batches, create variables as shown below.

create table batch (id int primary key, info text, crt_time timestamp);

create or replace function merge_batch(VARIADIC i batch[])
returns void as $$
declare var batch;
begin
  foreach var in array i loop
    update batch set info=var.info,crt_time=var.crt_time where id=var.id;
    if not found then
      insert into batch values (var.*);
    end if;
  end loop;
exception when others then
  return; 
end;
$$ language plpgsql strict;

select merge_batch((1,'abc','2000-01-01'),(1,'abcd','2017-02-02'));

2) With Syntax - Method 1

create table test(id int primary key, info text, crt_time timestamp);

If the exists, update it. Otherwise, insert it.

with upsert as (update test set info=$info,crt_time=$crt_time where id=$id returning *) insert into test select $id,$info,$crt_time where not exists (select 1 from upsert where id=$id);

Replace the variables and perform a test.

with upsert as (update test set info='test',crt_time=now() where id=1 returning *) insert into test select 1,'test',now() where not exists (select 1 from upsert where id=1);

When a nonexistent value is inserted, only one session has the successfully inserted value, whereas the other session returns a primary key constraint error.

3) With Syntax - Method 2

Ensure concurrency even if the table does not have a primary key or a unique constraint.

create table test(id int, info text, crt_time timestamp);

3.1) Data Inserted to Only One Session.

When the same data item is updated, the first established session locks the record, whereas the session that is established later enters the waiting state.

with     
  w1 as(select ('x'||substr(md5('$id'),1,16))::bit(64)::bigint as tra_id),    
  upsert as (update test set info=$info,crt_time=$crt_time where id=$id returning *)    
  insert into test select $id, $info, $crt_time from w1     
    where pg_try_advisory_xact_lock(tra_id) and not exists (select 1 from upsert where id=$id);

Next, replace the variables and perform a test.

with     
  w1 as(select ('x'||substr(md5('1'),1,16))::bit(64)::bigint as tra_id),    
  upsert as (update test set info='digoal0123',crt_time=now() where id=1 returning *)    
  insert into test select 1, 'digoal0123', now() from w1     
    where pg_try_advisory_xact_lock(tra_id) and not exists (select 1 from upsert where id=1);    
  
INSERT 0 0  
  
test03=# select * from test;  
 id |    info    |         crt_time            
----+------------+---------------------------  
  2 | pu         | 2017-04-24 15:28:20.37392  
  3 | hello      | 2017-04-24 15:31:49.14291  
  1 | digoal0123 | 2017-04-24 15:31:38.0529  
(3 rows)  
  
with     
  w1 as(select ('x'||substr(md5('4'),1,16))::bit(64)::bigint as tra_id),    
  upsert as (update test set info='digoal0123',crt_time=now() where id=4 returning *)    
  insert into test select 4, 'digoal0123', now() from w1     
    where pg_try_advisory_xact_lock(tra_id) and not exists (select 1 from upsert where id=4);    
  
INSERT 0 1  
  
test03=# select * from test;  
 id |    info    |          crt_time            
----+------------+----------------------------  
  2 | pu         | 2017-04-24 15:28:20.37392  
  3 | hello      | 2017-04-24 15:31:49.14291  
  1 | digoal0123 | 2017-04-24 15:31:38.0529  
  4 | digoal0123 | 2017-04-24 15:38:39.801908  
(4 rows)

3.2) Data Inserted to Only One Session.

When the same data item is updated, the session that is established first updates the data, whereas the session that is established later directly returns an error.

with w1 as(select ('x'||substr(md5('$id'),1,16))::bit(64)::bigint as tra_id),    
  upsert as (update test set info=$info,crt_time=$crt_time from w1 where pg_try_advisory_xact_lock(tra_id) and id=$id returning *)    
  insert into test select $id,$info,$crt_time from w1   
    where pg_try_advisory_xact_lock(tra_id) and not exists (select 1 from upsert where id=$id);

Next, replace the variables and perform a test.

with w1 as(select ('x'||substr(md5('1'),1,16))::bit(64)::bigint as tra_id),    
  upsert as (update test set info='test',crt_time=now() from w1 where pg_try_advisory_xact_lock(tra_id) and id=1 returning *)    
  insert into test select 1,'test',now() from w1   
    where pg_try_advisory_xact_lock(tra_id) and not exists (select 1 from upsert where id=1);    
  
INSERT 0 0  
  
test03=# select * from test;  
 id |    info    |          crt_time            
----+------------+----------------------------  
  2 | pu         | 2017-04-24 15:28:20.37392  
  3 | hello      | 2017-04-24 15:31:49.14291  
  4 | digoal0123 | 2017-04-24 15:42:50.912887  
  1 | test       | 2017-04-24 15:44:44.245167  
(4 rows)

4) Rule Method

PostgreSQL has supported Rule syntax for a long time. Create a rule with Rule syntax. If the rule exists, update it. Otherwise, insert it. However, when using the volatile function, do not directly use exists. Otherwise, it will be processed as an immutable function.

If it exists, do not insert it (ignore it). Otherwise, update it.

Implement idempotent writing so no problems occur while writing during resumable uploads.

postgres=# create table e(id int primary key, info text);
CREATE TABLE

Execute the following code to create a volatile function.

CREATE OR REPLACE FUNCTION public.ff(integer)
 RETURNS boolean
 LANGUAGE sql
 STRICT
 volatile
AS $function$
  select true from e where id=$1 limit 1;
$function$;

Create a rule as shown below.

postgres=# create rule r1 as on insert to e where ff(NEW.id) do instead update e set info=NEW.info where id=NEW.id;
CREATE RULE

postgres=# insert into e values (1,'test'),(1,'test');
INSERT 0 0
postgres=# insert into e values (2,'test'),(2,'test');
INSERT 0 1
postgres=# insert into e values (3,'test'),(3,'test');
INSERT 0 1
postgres=# truncate e;
TRUNCATE TABLE
postgres=# select * from b;
 id | info 
----+------
  1 | a
  1 | b
(2 rows)

postgres=# insert into e select * from b;
INSERT 0 1
postgres=# \d+ b
                                     Table "public.b"
 Column |  Type   | Collation | Nullable | Default | Storage  | Stats target | Description 
--------+---------+-----------+----------+---------+----------+--------------+-------------
 id     | integer |           |          |         | plain    |              | 
 info   | text    |           |          |         | extended |              |

Implement performance stress testing as shown below.

vi test.sql

\set id random(1,1000000)
insert into e values (:id, md5(random()::text));

pgbench -M prepared -n -r -P 1 -f ./test.sql -c 64 -j 64 -T 1000

progress: 90.0 s, 132056.5 tps, lat 0.220 ms stddev 0.055
progress: 91.0 s, 131656.9 tps, lat 0.220 ms stddev 0.040
progress: 92.0 s, 134941.0 tps, lat 0.215 ms stddev 0.090
progress: 93.0 s, 134324.5 tps, lat 0.216 ms stddev 0.076
progress: 94.0 s, 136699.4 tps, lat 0.212 ms stddev 0.070
progress: 95.0 s, 139291.4 tps, lat 0.208 ms stddev 0.067
progress: 96.0 s, 136073.2 tps, lat 0.213 ms stddev 0.076
progress: 97.0 s, 135804.6 tps, lat 0.214 ms stddev 0.076
progress: 98.0 s, 146037.6 tps, lat 0.199 ms stddev 0.069
progress: 99.0 s, 129619.5 tps, lat 0.224 ms stddev 0.049
progress: 100.0 s, 129230.0 tps, lat 0.224 ms stddev 0.047
progress: 101.0 s, 131048.4 tps, lat 0.221 ms stddev 0.055
progress: 102.0 s, 128808.0 tps, lat 0.225 ms stddev 0.048
progress: 103.0 s, 128954.6 tps, lat 0.225 ms stddev 0.048
progress: 104.0 s, 131227.9 tps, lat 0.221 ms stddev 0.042
progress: 105.0 s, 129604.0 tps, lat 0.224 ms stddev 0.057

Always take the following precautions for using the Rule syntax.

1) The use of exists in the Rule syntax is only suitable for a single insert statement (use the volatile function to solve this problem).

postgres=# create table e(id int, info text);
CREATE TABLE
postgres=# create rule r1 as on insert to e where exists (select 1 from e t1 where t1.id=NEW.id limit 1) do instead nothing;
CREATE RULE
  
在一个语句中插入多条，如果多条中有重复，则在RULE中判断条件时仅判断一次（类似immutable函数）。  
  
postgres=# insert into e values (1,'test'),(1,'test');
INSERT 0 2
postgres=# select * from e;
 id | info 
----+------
  1 | test
  1 | test
(2 rows)

2) Rule syntax does not support copy statements. Repeated copy statements also cause problems.

In any case, it is recommended that tables requiring upsert have a primary key.

Community

Use of the PostgreSQL Upsert (INSERT ON CONFLICT DO) Function

Background

Upsert Usage Example for Versions Later Than 9.5

Upsert Usage Example for Versions Earlier Than 9.5

1) Functions

2) With Syntax - Method 1

3) With Syntax - Method 2

3.1) Data Inserted to Only One Session.

3.2) Data Inserted to Only One Session.

4) Rule Method

Read previous post:

Read next post:

digoal

You may also like

Comments

digoal

Related Products

PolarDB for PostgreSQL

AnalyticDB for PostgreSQL

ApsaraDB RDS for PostgreSQL

FinTech on Cloud Solution