Use UPDATE to modify data in Transactional and Delta tables - MaxCompute

The UPDATE statement modifies column values in one or more rows of a Transactional table or Delta Table.

Prerequisites

Before you run a DELETE or UPDATE statement, ensure that you have:

Select and Update permissions on the target Transactional table or Delta Table

For more information, see MaxCompute permissions.

Limitations

UPDATE and DELETE work only on Transactional tables and Delta Tables. For configuration details, see Parameters for Transaction Table and Delta Table.
UPDATE on a Delta Table cannot modify primary key (PK) columns.

Syntax

-- Method 1: Update individual columns
UPDATE <table_name> [[AS] alias] SET <col1_name> = <value1> [, <col2_name> = <value2> ...] [WHERE <where_condition>];

-- Method 2: Update multiple columns as a tuple
UPDATE <table_name> [[AS] alias] SET (<col1_name> [, <col2_name> ...]) = (<value1> [, <value2> ...]) [WHERE <where_condition>];

-- Method 3: Update with a FROM clause (join update)
UPDATE <table_name> [[AS] alias]
    SET <col1_name> = <value1> [, <col2_name> = <value2>, ...]
    [FROM <additional_tables>]
    [WHERE <where_condition>];

Parameters

Required parameters

Parameter	Description
`table_name`	The name of the Transactional table or Delta Table to update.
`col1_name`, `col2_name`	The columns to update. At least one column is required.
`value1`, `value2`	The new values for the specified columns. At least one value is required.

Optional parameters

Parameter	Description	Default
`alias`	An alias for the target table.	No alias
`where_condition`	A filter expression that selects which rows to update. See SELECT syntax for supported expressions.	All rows in the table are updated if omitted.
`additional_tables`	One or more source tables for a join update, specified in a FROM clause.	No join

FROM clause

The FROM clause lets you update rows in the target table based on matching rows in one or more source tables. This simplifies join updates compared to subquery-based syntax.

The following table shows the same join update written with and without a FROM clause.

Approach	Example
Without FROM clause	`UPDATE target SET v = (SELECT MIN(v) FROM src GROUP BY k WHERE target.k = src.key) WHERE target.k IN (SELECT k FROM src);`
With FROM clause	`UPDATE target SET v = b.v FROM (SELECT k, MIN(v) AS v FROM src GROUP BY k) b WHERE target.k = b.k;`

The FROM clause version is shorter and easier to read. The subquery version requires both an extra WHERE condition (for intersection-only updates) and an aggregate in the subquery (to ensure the source produces a single value per target row).

Multi-joined rows

A multi-joined row occurs when a target row matches more than one row in the source table during a join update. The result is nondeterministic: MaxCompute uses one of the matched source rows to perform the update, but does not guarantee which one.

To avoid nondeterministic updates, use an aggregate in a subquery to ensure each target row maps to exactly one source value:

-- Produces a unique value per key before joining
UPDATE target SET v = b.v
FROM (SELECT k, MIN(v) AS v FROM src GROUP BY k) b
WHERE target.k = b.k;

Usage notes

Choose the right operation for your workload

Scenario	Recommended operation
Small number of rows to update; updates and subsequent reads are both infrequent	`UPDATE` or `DELETE`
More than 5% of rows to update infrequently; subsequent reads are frequent	`INSERT OVERWRITE` or `INSERT INTO`

For example, if a pipeline deletes or updates 10% of a table's rows ten times a day, compare the total cost and read performance degradation of repeated DELETE/UPDATE operations against using INSERT OVERWRITE for each run, then choose the more efficient approach for your workload. See Insert or overwrite data (INSERT INTO \| INSERT OVERWRITE).

Update in batches, not row by row

MaxCompute runs each UPDATE statement as a batch process. Each statement consumes resources and incurs fees based on the amount of data scanned, regardless of how many rows are changed. Submitting many row-level statements — for example, from a Python script that generates one UPDATE per row — accumulates high costs and reduces efficiency.

Consolidate updates into a single statement where possible:

-- Recommended: one statement updates all matching rows
UPDATE table1 SET col1 = (SELECT value1 FROM table2 WHERE table1.id = table2.id AND table1.region = table2.region);

-- Not recommended: one statement per row
UPDATE table1 SET col1 = 1 WHERE id = '2021063001' AND region = 'beijing';
UPDATE table1 SET col1 = 2 WHERE id = '2021063002' AND region = 'beijing';

Storage impact of DELETE and UPDATE

Deleting or updating rows generates Delta files rather than immediately reducing storage. To reclaim storage after multiple operations, merge the table's base files and Delta files. See Merge transactional table files.

Examples

Example 1: Update rows in a non-partitioned table

Create a Transactional table, insert data, and update all rows where id = 2.

-- Create a Transactional table
CREATE TABLE IF NOT EXISTS acid_update(id BIGINT) tblproperties ("transactional"="true");

-- Insert data
INSERT OVERWRITE TABLE acid_update VALUES(1),(2),(3),(2);

-- Verify the inserted data
SELECT * FROM acid_update;
-- Result:
-- +------------+
-- | id         |
-- +------------+
-- | 1          |
-- | 2          |
-- | 3          |
-- | 2          |
-- +------------+

-- Update: set id to 4 for all rows where id is 2
UPDATE acid_update SET id = 4 WHERE id = 2;

-- Verify the result
SELECT * FROM acid_update;
-- Result: both rows with id=2 are updated to 4
-- +------------+
-- | id         |
-- +------------+
-- | 1          |
-- | 3          |
-- | 4          |
-- | 4          |
-- +------------+

Example 2: Update rows in a partitioned table

Create a partitioned Transactional table, insert data into partition ds='2019', and update rows in that partition.

-- Create a partitioned Transactional table
CREATE TABLE IF NOT EXISTS acid_update_pt(id BIGINT) PARTITIONED BY(ds STRING) tblproperties ("transactional"="true");

-- Add a partition
ALTER TABLE acid_update_pt ADD IF NOT EXISTS PARTITION (ds='2019');

-- Insert data
INSERT OVERWRITE TABLE acid_update_pt PARTITION (ds='2019') VALUES(1),(2),(3);

-- Verify the inserted data
SELECT * FROM acid_update_pt WHERE ds = '2019';
-- Result:
-- +------------+------------+
-- | id         | ds         |
-- +------------+------------+
-- | 1          | 2019       |
-- | 2          | 2019       |
-- | 3          | 2019       |
-- +------------+------------+

-- Update: set id to 4 for the row where ds='2019' and id=2
UPDATE acid_update_pt SET id = 4 WHERE ds = '2019' AND id = 2;

-- Verify the result
SELECT * FROM acid_update_pt WHERE ds = '2019';
-- Result:
-- +------------+------------+
-- | id         | ds         |
-- +------------+------------+
-- | 4          | 2019       |
-- | 1          | 2019       |
-- | 3          | 2019       |
-- +------------+------------+

Example 3: Update multiple columns using four methods

This example demonstrates four ways to update multiple columns in a Transactional table by joining with a source table.

-- Create the target Transactional table and a source table
CREATE TABLE IF NOT EXISTS acid_update_t(id INT, value1 INT, value2 INT) tblproperties ("transactional"="true");
CREATE TABLE IF NOT EXISTS acid_update_s(id INT, value1 INT, value2 INT);

-- Insert data
INSERT OVERWRITE TABLE acid_update_t VALUES(2,20,21),(3,30,31),(4,40,41);
INSERT OVERWRITE TABLE acid_update_s VALUES(1,100,101),(2,200,201),(3,300,301);

-- Method 1: Update with constant values (updates all rows)
UPDATE acid_update_t SET (value1, value2) = (60, 61);
SELECT * FROM acid_update_t;
-- Result:
-- +------------+------------+------------+
-- | id         | value1     | value2     |
-- +------------+------------+------------+
-- | 2          | 60         | 61         |
-- | 3          | 60         | 61         |
-- | 4          | 60         | 61         |
-- +------------+------------+------------+

-- Method 2: Join update (left join from acid_update_t to acid_update_s)
-- Rows in acid_update_t with no match in acid_update_s are set to NULL
UPDATE acid_update_t SET (value1, value2) = (SELECT value1, value2 FROM acid_update_s WHERE acid_update_t.id = acid_update_s.id);
SELECT * FROM acid_update_t;
-- Result:
-- +------------+------------+------------+
-- | id         | value1     | value2     |
-- +------------+------------+------------+
-- | 2          | 200        | 201        |
-- | 3          | 300        | 301        |
-- | 4          | NULL       | NULL       |
-- +------------+------------+------------+

-- Method 3: Join update restricted to the intersection (rows with a matching id in both tables)
UPDATE acid_update_t SET (value1, value2) = (SELECT value1, value2 FROM acid_update_s WHERE acid_update_t.id = acid_update_s.id) WHERE acid_update_t.id IN (SELECT id FROM acid_update_s);
SELECT * FROM acid_update_t;
-- Result: same as Method 2 because id=4 has no match and is unchanged from the Method 2 result
-- +------------+------------+------------+
-- | id         | value1     | value2     |
-- +------------+------------+------------+
-- | 2          | 200        | 201        |
-- | 3          | 300        | 301        |
-- | 4          | NULL       | NULL       |
-- +------------+------------+------------+

-- Method 4: Join update with aggregation (uses MAX to handle multiple matching source rows)
UPDATE acid_update_t SET (id, value1, value2) = (SELECT id, MAX(value1), MAX(value2) FROM acid_update_s WHERE acid_update_t.id = acid_update_s.id GROUP BY acid_update_s.id) WHERE acid_update_t.id IN (SELECT id FROM acid_update_s);
SELECT * FROM acid_update_t;
-- Result:
-- +------------+------------+------------+
-- | id         | value1     | value2     |
-- +------------+------------+------------+
-- | 2          | 200        | 201        |
-- | 3          | 300        | 301        |
-- | 4          | NULL       | NULL       |
-- +------------+------------+------------+

Example 4: Join update using the FROM clause (two tables)

The FROM clause simplifies join updates. Both statements below produce the same result.

-- Create the target and source tables
CREATE TABLE IF NOT EXISTS acid_update_t(id BIGINT, value1 BIGINT, value2 BIGINT) tblproperties ("transactional"="true");
CREATE TABLE IF NOT EXISTS acid_update_s(id BIGINT, value1 BIGINT, value2 BIGINT);

-- Insert data
INSERT OVERWRITE TABLE acid_update_t VALUES(2,20,21),(3,30,31),(4,40,41);
INSERT OVERWRITE TABLE acid_update_s VALUES(1,100,101),(2,200,201),(3,300,301);

-- Join update using the FROM clause
UPDATE acid_update_t SET value1 = b.value1, value2 = b.value2
FROM acid_update_s b WHERE acid_update_t.id = b.id;

-- Equivalent statement using a table alias for the target table
UPDATE acid_update_t a SET a.value1 = b.value1, a.value2 = b.value2
FROM acid_update_s b WHERE a.id = b.id;

-- Verify the result: rows with id=2 and id=3 are updated; id=4 has no match and is unchanged
SELECT * FROM acid_update_t;
-- Result:
-- +------------+------------+------------+
-- | id         | value1     | value2     |
-- +------------+------------+------------+
-- | 4          | 40         | 41         |
-- | 2          | 200        | 201        |
-- | 3          | 300        | 301        |
-- +------------+------------+------------+

Example 5: Join update with multiple source tables and filter conditions

This example joins three tables and applies filters to both the source and target tables. Only rows where all conditions are satisfied are updated.

-- Create three tables
CREATE TABLE IF NOT EXISTS acid_update_t(id BIGINT, value1 BIGINT, value2 BIGINT) tblproperties ("transactional"="true");
CREATE TABLE IF NOT EXISTS acid_update_s(id BIGINT, value1 BIGINT, value2 BIGINT);
CREATE TABLE IF NOT EXISTS acid_update_m(id BIGINT, value1 BIGINT, value2 BIGINT);

-- Insert data
INSERT OVERWRITE TABLE acid_update_t VALUES(2,20,21),(3,30,31),(4,40,41),(5,50,51);
INSERT OVERWRITE TABLE acid_update_s VALUES(1,100,101),(2,200,201),(3,300,301),(4,400,401),(5,500,501);
INSERT OVERWRITE TABLE acid_update_m VALUES(3,30,101),(4,400,201),(5,300,301);

-- Join update: filter both the source and target tables in the WHERE clause
-- Only rows with id > 2 where neither the target's value1 nor the source's value1
-- appears in acid_update_m for the same id are updated
UPDATE acid_update_t
SET value1 = acid_update_s.value1, value2 = acid_update_s.value2
FROM acid_update_s
WHERE acid_update_t.id = acid_update_s.id
  AND acid_update_s.id > 2
  AND acid_update_t.value1 NOT IN (SELECT value1 FROM acid_update_m WHERE id = acid_update_t.id)
  AND acid_update_s.value1 NOT IN (SELECT value1 FROM acid_update_m WHERE id = acid_update_s.id);

-- Verify the result: only id=5 satisfies all conditions
SELECT * FROM acid_update_t;
-- Result:
-- +------------+------------+------------+
-- | id         | value1     | value2     |
-- +------------+------------+------------+
-- | 5          | 500        | 501        |
-- | 2          | 20         | 21         |
-- | 3          | 30         | 31         |
-- | 4          | 40         | 41         |
-- +------------+------------+------------+

Example 6: Update rows in a Delta Table

Create a Delta Table, insert data, and update the val column for a specific row using two equivalent methods.

-- Create a Delta Table with a primary key
CREATE TABLE IF NOT EXISTS mf_dt (
    pk  BIGINT NOT NULL PRIMARY KEY,
    val BIGINT NOT NULL
)
PARTITIONED BY(dd STRING, hh STRING)
tblproperties ("transactional"="true");

-- Insert data
INSERT OVERWRITE TABLE mf_dt PARTITION (dd='01', hh='02')
VALUES (1, 1), (2, 2), (3, 3);

-- Verify the inserted data
SELECT * FROM mf_dt WHERE dd='01' AND hh='02';
-- Result:
-- +------------+------------+----+----+
-- | pk         | val        | dd | hh |
-- +------------+------------+----+----+
-- | 1          | 1          | 01 | 02 |
-- | 3          | 3          | 01 | 02 |
-- | 2          | 2          | 01 | 02 |
-- +------------+------------+----+----+

-- Method 1: Update with a direct value
UPDATE mf_dt SET val = 30 WHERE pk = 3 AND dd='01' AND hh='02';

-- Method 2: Update using a FROM clause with inline values
UPDATE mf_dt SET val = delta.val
FROM (SELECT pk, val FROM VALUES (3, 30) t (pk, val)) delta
WHERE delta.pk = mf_dt.pk AND mf_dt.dd='01' AND mf_dt.hh='02';

-- Verify the result
SELECT * FROM mf_dt WHERE dd='01' AND hh='02';
-- Result: val for pk=3 is updated from 3 to 30
-- +------------+------------+----+----+
-- | pk         | val        | dd | hh |
-- +------------+------------+----+----+
-- | 1          | 1          | 01 | 02 |
-- | 3          | 30         | 01 | 02 |
-- | 2          | 2          | 01 | 02 |
-- +------------+------------+----+----+

What's next

DELETE: Delete rows from a Transactional table or Delta Table.
ALTER TABLE: Merge base files and Delta files to reduce storage after multiple update or delete operations.