Implement UPSERT with INSERT ON CONFLICT - Hologres

INSERT ON CONFLICT gives you atomic upsert semantics: insert a row if no conflict exists, or update (or skip) the existing row if a duplicate primary key is detected.

Use this statement when you import data directly with SQL. For Data Integration and Flink-based writes, see Use cases.

Key concepts

The statement supports three conflict-resolution modes:

Mode	SQL clause	Behavior on duplicate primary key
InsertOrIgnore	`DO NOTHING`	Discards the incoming row; the existing row is unchanged
InsertOrUpdate	`DO UPDATE SET <columns>`	Updates only the columns you list; unlisted columns keep their current values
InsertOrReplace	`DO UPDATE SET <all columns>` with `null` for missing columns	Overwrites the full row; missing columns become `null`

Choose a mode:

Use InsertOrIgnore when the first write wins and you never want to overwrite existing data.
Use InsertOrUpdate when you need selective column updates and missing columns must be preserved.
Use InsertOrReplace when you want a complete row replacement, with missing columns explicitly set to null.

Use cases

INSERT ON CONFLICT applies when you import data using SQL statements.

Data Integration (DataWorks)

Data Integration has a built-in INSERT ON CONFLICT feature. Configure the Write Conflict Policy in the Hologres Writer:

Ignore — equivalent to InsertOrIgnore
Replace — equivalent to InsertOrReplace

The policy applies to both offline and real-time synchronization. To update data during synchronization, the Hologres table must have a primary key.

Flink

The default write conflict policy for Flink writes is InsertOrIgnore. This requires a primary key on the sink table and keeps the first entry when a duplicate is encountered. When you use the ctas syntax, the default changes to InsertOrUpdate.

Syntax

INSERT INTO <table_name> [ AS <alias> ] [ ( <column_name> [, ...] ) ]
    { VALUES ( { <expression> } [, ...] ) [, ...] | <query> }
    [ ON CONFLICT [ conflict_target ] conflict_action ]

-- conflict_target
ON CONSTRAINT constraint_name

-- conflict_action
DO NOTHING
DO UPDATE SET { <column_name> = { <expression> } |
                ( <column_name> [, ...] ) = ( { <expression> } [, ...] ) |
              } [, ...]
          [ WHERE condition ]

About `excluded`:

excluded is a virtual table alias that refers to the row proposed for insertion — the row that triggered the conflict. It is not an alias for the source table. Use excluded.<column_name> to reference a specific column from the incoming row, or ROW(excluded.*) to reference all columns in DDL order.

Parameters:

Parameter	Description
`table_name`	The destination table
`alias`	An alternative name for the destination table
`column_name`	A target column in the destination table
`DO NOTHING`	InsertOrIgnore — skips the insertion when a duplicate primary key exists
`DO UPDATE`	InsertOrUpdate or InsertOrReplace — updates the existing row when a duplicate primary key exists
`expression`	The value to write. Use `excluded.<column_name>` to reference the incoming row's value for that column. Use `ROW(excluded.*)` to reference all columns of the incoming row in DDL order.

How it works

INSERT ON CONFLICT executes the same internal path as UPDATE. Update performance depends on the table's storage format.

Column-oriented tables

Tables without a primary key deliver the highest write throughput. For tables with a primary key:

InsertOrIgnore > InsertOrReplace >= InsertOrUpdate (full row) > InsertOrUpdate (partial column)

Row-oriented tables

InsertOrReplace = InsertOrUpdate (full row) >= InsertOrUpdate (partial column) >= InsertOrIgnore

For the best performance on row-oriented tables, keep the column order in the DO UPDATE SET clause consistent with the INSERT clause and perform a full-row update:

INSERT INTO test1 (a, b, c)
SELECT d, e, f FROM test2
ON CONFLICT (a)
DO UPDATE SET (a, b, c) = ROW(excluded.*);

Columns with default values are not updated by DO UPDATE, which reduces performance. To implement InsertOrReplace via SQL, pass null explicitly in the VALUES list. Tools like Flink and Data Integration add null automatically for missing columns.

Limitations

The ON CONFLICT clause must reference all primary key columns.
When the Hologres High-QPS Engine (HQE) executes INSERT ON CONFLICT, operation order is not guaranteed — keep-first and keep-last semantics are not supported. The behavior is keep-any. To enforce keep-last on duplicate rows in the source, set:
```
set hg_experimental_affect_row_multiple_times_keep_last = on;
```
Source data must not contain duplicate rows. PostgreSQL semantics require each row in the VALUES list or source query to be unique. If the source contains duplicates (for example, from a SELECT subquery that returns two rows with the same primary key), the statement fails. To avoid this, deduplicate at the source before the ON CONFLICT clause:
```
-- Deduplicate the source with a subquery before upserting
INSERT INTO target_table (a, b, c)
SELECT a, MAX(b), MAX(c) FROM source_table GROUP BY a
ON CONFLICT (a)
DO UPDATE SET (a, b, c) = ROW(excluded.*);
```

Examples

Set up the test table

BEGIN;
CREATE TABLE test1 (
    a int NOT NULL PRIMARY KEY,
    b int,
    c int
);
COMMIT;

INSERT INTO test1 VALUES (1, 2, 3);

The examples below are independent of each other. Each starts from the initial state above.

InsertOrIgnore — skip on conflict

If a duplicate primary key exists, the incoming row is discarded.

INSERT INTO test1 (a, b, c) VALUES (1, 1, 1)
ON CONFLICT (a)
DO NOTHING;

Result:

a | b | c
1 | 2 | 3    -- unchanged

InsertOrUpdate — update specific columns

Only the columns listed in SET are updated. Unlisted columns keep their current values.

-- Partial-column update: only b is updated, c keeps its value
INSERT INTO test1 (a, b, c) VALUES (1, 1, 1)
ON CONFLICT (a)
DO UPDATE SET b = excluded.b;

Result:

a | b | c
1 | 1 | 3    -- c unchanged

InsertOrUpdate — full-row update

Two equivalent forms:

-- Method 1: list all columns explicitly
INSERT INTO test1 (a, b, c) VALUES (1, 1, 1)
ON CONFLICT (a)
DO UPDATE SET b = excluded.b, c = excluded.c;

-- Method 2: use ROW(excluded.*) shorthand
INSERT INTO test1 (a, b, c) VALUES (1, 1, 1)
ON CONFLICT (a)
DO UPDATE SET (a, b, c) = ROW(excluded.*);

Result:

a | b | c
1 | 1 | 1

InsertOrReplace — overwrite with null for missing columns

To replace an entire row and fill missing columns with null, pass null explicitly in the VALUES list.

INSERT INTO test1 (a, b, c) VALUES (1, 1, null)
ON CONFLICT (a)
DO UPDATE SET b = excluded.b, c = excluded.c;

Result:

a | b | c
1 | 1 | \N

Upsert from another table

-- Prepare source table
CREATE TABLE test2 (
    d int NOT NULL PRIMARY KEY,
    e int,
    f int
);
INSERT INTO test2 VALUES (1, 5, 6);

-- Replace rows in test1 with matching rows from test2
INSERT INTO test1 (a, b, c)
SELECT d, e, f FROM test2
ON CONFLICT (a)
DO UPDATE SET (a, b, c) = ROW(excluded.*);

Result:

a | b | c
1 | 5 | 6

To remap columns (for example, test2.e updates test1.c, and test2.f updates test1.b):

INSERT INTO test1 (a, b, c)
SELECT d, e, f FROM test2
ON CONFLICT (a)
DO UPDATE SET (a, c, b) = ROW(excluded.*);

Result:

a | b | c
1 | 6 | 5

Troubleshooting

Error: duplicate key value violates unique constraint / Update row with Key multiple times

Symptoms:

duplicate key value violates unique constraint
-- or --
Update row with Key (xxx)=(yyy) multiple times

Cause: The source data contains duplicate rows. The following example reproduces the error:

-- This fails: the source contains (1, 2, 3) twice
INSERT INTO test1 VALUES (1, 2, 3), (1, 2, 3)
ON CONFLICT (a)
DO UPDATE SET (a, b, c) = ROW(excluded.*);
-- ERROR: internal error or constraint violation

Table state after the error: test1 is unchanged.

Fix: Set the keep-last flag to resolve duplicate source rows automatically:

set hg_experimental_affect_row_multiple_times_keep_last = on;

Alternatively, deduplicate at the source before upserting (see Limitations).

Error: duplicate key caused by expired TTL

Cause: A source table has a time to live (TTL) configured. After the TTL expires, stale rows may not be cleaned up immediately, leaving duplicate primary keys in the source.

Fix: Starting from Hologres V1.3.23, you can use the following command to remove duplicate primary key rows caused by an expired TTL. The default policy is keep-last. This command is available only in Hologres V1.3.23 and later — if your instance is an earlier version, upgrade first.

In principle, primary keys should not be duplicated. This command cleans up only duplicate PKs caused by an expired TTL, not general PK duplication.

call public.hg_remove_duplicated_pk('<schema>.<table_name>');

Example:

BEGIN;
CREATE TABLE tbl_1 (a int NOT NULL PRIMARY KEY, b int, c int);
CREATE TABLE tbl_2 (d int NOT NULL PRIMARY KEY, e int, f int);
CALL set_table_property('tbl_2', 'time_to_live_in_seconds', '300');
COMMIT;

INSERT INTO tbl_1 VALUES (1, 1, 1), (2, 3, 4);
INSERT INTO tbl_2 VALUES (1, 5, 6);

-- After 300 seconds, insert a row with the same primary key into tbl_2.
INSERT INTO tbl_2 VALUES (1, 3, 6);

-- This fails: tbl_2 now has duplicate primary keys caused by TTL expiry.
INSERT INTO tbl_1 (a, b, c)
SELECT d, e, f FROM tbl_2
ON CONFLICT (a)
DO UPDATE SET (a, b, c) = ROW(excluded.*);
-- ERROR: internal error: Duplicate keys detected when building hash table.

-- Clean up duplicate primary keys in tbl_2, then retry.
call public.hg_remove_duplicated_pk('tbl_2');

-- The import now succeeds.
INSERT INTO tbl_1 (a, b, c)
SELECT d, e, f FROM tbl_2
ON CONFLICT (a)
DO UPDATE SET (a, b, c) = ROW(excluded.*);

Error: out-of-memory (OOM)

Symptom:

Total memory used by all existing queries exceeded memory limitation

Cause: The instance has insufficient memory for a large-volume write task.

Fix:

Use Serverless Computing (Hologres V2.1.17+) to offload the task to serverless resources, reducing OOM risk without reserving extra instance capacity.
Follow the steps in Troubleshoot common OOM errors.

What's next

UPDATE — understand the underlying update mechanism
Serverless Computing overview — offload large write jobs to serverless resources (Hologres V2.1.17+)
Hologres Writer — configure write conflict policy in Data Integration