Use INSERT INTO and INSERT OVERWRITE to insert or overwrite data in tables and static partitions - MaxCompute

This topic describes how to use the INSERT INTO and INSERT OVERWRITE statements to insert data into or overwrite data in a table or a static partition.

Supported platforms:

Prerequisites

Before you execute the INSERT INTO or INSERT OVERWRITE statement, ensure that you have the Update permission on the destination table and the Select permission on the metadata of the source table. For more information, see MaxCompute permissions.

Features

You can use the INSERT INTO or INSERT OVERWRITE statement to save the execution results of SELECT statements to a destination table. Differences between the two statements:

INSERT INTO: Appends data to a table or a static partition. You can specify values for partition key columns to insert data into a specific partition. For inserting small amounts of test data, use the VALUES clause. See VALUES.
INSERT OVERWRITE: Clears the existing data in a specified table or static partition and inserts new data.
Note
- MaxCompute INSERT syntax differs from MySQL or Oracle. You must include the TABLE keyword after INSERT OVERWRITE. For INSERT INTO, the TABLE keyword is optional.
- Repeatedly executing INSERT OVERWRITE on the same partition may result in different file splitting logic, causing variations in file size when viewed with the DESC command. However, the total data size remains consistent and storage costs are unaffected.

For information about inserting data into dynamic partitions, see Insert or overwrite data into dynamic partitions (DYNAMIC PARTITION).

Limitations

General limits:
- INSERT INTO: Cannot be used to insert data into clustered tables.
- INSERT OVERWRITE: Does not support specifying target columns. To insert data into specific columns, use INSERT INTO. Example: INSERT INTO t(a) VALUES ('1'); inserts '1' into column 'a' and NULL (or default value) into other columns.
- MaxCompute does not provide table locking during INSERT operations. Avoid executing multiple INSERT INTO or INSERT OVERWRITE statements on the same table simultaneously.

Limits for Delta tables:
- INSERT OVERWRITE: Automatically deduplicates data based on primary keys, keeping only the first record. It overwrites the entire table or partition. Default deduplication ensures primary key uniqueness.
- INSERT INTO: By default, inserts all data without deduplication, even if primary keys conflict. To enable deduplication, set odps.sql.insert.acidtable.deduplicate.enable to true.

Syntax

INSERT {INTO|OVERWRITE} TABLE <table_name> [PARTITION (<pt_spec>)] [(<col_name> [,<col_name> ...)]]
<select_statement>
FROM <from_statement>
[ZORDER BY <zcol_name> [, <zcol_name> ...]];

Parameters:

Parameter	Required	Description
table_name	Yes	The name of the destination table.
pt_spec	No	The destination partition. Must use constants (expressions are not supported). Format: `(partition_col1 = value1, partition_col2 = value2, ...)`.
col_name	No	The target column name. Not supported with `INSERT OVERWRITE`.
select_statement	Yes	The `SELECT` clause querying the source data. See SELECT syntax. Note Mappings are based on column order, not column names. For static partitions, do not include partition key columns in the `select_statement`.
from_statement	Yes	The `FROM` clause specifying the data source (e.g., source table name).
ZORDER BY <zcol_name> [, <zcol_name> ...]	No	Co-locates rows with similar data records to improve query filtering and compression. `ZORDER BY x, y` groups rows with similar x and y values.

ZORDER BY vs. SORT BY:

ZORDER BY: Supports local (default) and global modes. Local Z-Ordering sorts data within a single file. Global Z-Ordering (requires set odps.sql.default.zorder.type=global;) sorts data globally for better data skipping.
ZORDER BY limits:
- Sorts only one partition at a time.
- Supports 2 to 4 fields.
- Not supported for clustered tables.
- Can be used with DISTRIBUTE BY but not with ORDER BY, CLUSTER BY, or SORT BY.
Note
Z-Ordering consumes more resources and time during insertion.
SORT BY: Sorts data within a single file. If SORT BY is not specified, local Z-Ordering applies by default.

Examples: Standard tables

Example 1: Append data to a non-partitioned table

-- Create tables
CREATE TABLE IF NOT EXISTS websites (id INT, name STRING, url STRING);
CREATE TABLE IF NOT EXISTS apps (id INT, app_name STRING, url STRING);

-- Append data to the apps table
INSERT INTO apps (id, app_name, url) VALUES (1, 'Aliyun', 'https://www.aliyun.com');

-- Copy data from apps to websites
INSERT INTO websites (id, name, url) SELECT id, app_name, url FROM apps;

-- Verify data
SELECT * FROM websites;

Result:

+------------+------------+------------+
| id         | name       | url        |
+------------+------------+------------+
| 1          | Aliyun     | https://www.aliyun.com |
+------------+------------+------------+

Example 2: Append data to a partitioned table

-- Create table
CREATE TABLE IF NOT EXISTS sale_detail (
  shop_name     STRING,
  customer_id   STRING,
  total_price   DOUBLE
) PARTITIONED BY (sale_date STRING, region STRING);

-- Add partition (optional, can be created automatically)
ALTER TABLE sale_detail ADD PARTITION (sale_date='2013', region='china');

-- Append data
INSERT INTO sale_detail PARTITION (sale_date='2013', region='china') 
VALUES ('s1','c1',100.1),('s2','c2',100.2),('s3','c3',100.3);

-- Verify data
SET odps.sql.allow.fullscan=true; 
SELECT * FROM sale_detail;

Result:

+------------+-------------+-------------+------------+------------+
| shop_name  | customer_id | total_price | sale_date  | region     |
+------------+-------------+-------------+------------+------------+
| s1         | c1          | 100.1       | 2013       | china      |
| s2         | c2          | 100.2       | 2013       | china      |
| s3         | c3          | 100.3       | 2013       | china      |
+------------+-------------+-------------+------------+------------+

Example 3: Overwrite data

-- Create table with same schema
CREATE TABLE sale_detail_insert LIKE sale_detail;

-- Add partition (optional)
ALTER TABLE sale_detail_insert ADD PARTITION (sale_date='2013', region='china');

-- Overwrite data. Partition columns (sale_date, region) are specified in PARTITION(), so do not include them in SELECT.
SET odps.sql.allow.fullscan=true;
INSERT OVERWRITE TABLE sale_detail_insert PARTITION (sale_date='2013', region='china')
  SELECT shop_name, customer_id, total_price 
  FROM sale_detail
  ZORDER BY customer_id, total_price;

-- Verify data
SELECT * FROM sale_detail_insert;

Result:

+------------+-------------+-------------+------------+------------+
| shop_name  | customer_id | total_price | sale_date  | region     |
+------------+-------------+-------------+------------+------------+
| s1         | c1          | 100.1       | 2013       | china      |
| s2         | c2          | 100.2       | 2013       | china      |
| s3         | c3          | 100.3       | 2013       | china      |
+------------+-------------+-------------+------------+------------+

Example 4: Overwrite data with adjusted column order

Data mapping is based on column order, not names.

SET odps.sql.allow.fullscan=true;
INSERT OVERWRITE TABLE sale_detail_insert PARTITION (sale_date='2013', region='china')
    SELECT customer_id, shop_name, total_price FROM sale_detail;    

SELECT * FROM sale_detail_insert;

Result (Notice the swapped columns):

+------------+-------------+-------------+------------+------------+
| shop_name  | customer_id | total_price | sale_date  | region     |
+------------+-------------+-------------+------------+------------+
| c1         | s1          | 100.1       | 2013       | china      |
| c2         | s2          | 100.2       | 2013       | china      |
| c3         | s3          | 100.3       | 2013       | china      |
+------------+-------------+-------------+------------+------------+

Example 5: Incorrect usage (Partition columns in SELECT)

Do not include partition key columns in the SELECT clause when inserting into a static partition.

-- This will fail because sale_date and region are partition keys
INSERT OVERWRITE TABLE sale_detail_insert PARTITION (sale_date='2013', region='china')
   SELECT shop_name, customer_id, total_price, sale_date, region FROM sale_detail;

Example 6: Incorrect usage (Expressions in PARTITION)

PARTITION() values must be constants.

-- This will fail because datepart() is an expression
INSERT OVERWRITE TABLE sale_detail_insert PARTITION (sale_date=datepart('2016-09-18 01:10:00', 'yyyy') , region='china')
   SELECT shop_name, customer_id, total_price FROM sale_detail;

Example 7: Global Z-Ordering

-- Create source table and insert data
CREATE TABLE mf_src (key STRING, value STRING);
INSERT OVERWRITE TABLE mf_src
SELECT a, b FROM VALUES ('1', '1'),('3', '3'),('2', '2') AS t(a, b);

-- Create target table
CREATE TABLE mf_zorder_src LIKE mf_src;

-- Insert with global Z-Ordering
SET odps.sql.default.zorder.type=global;
INSERT OVERWRITE TABLE mf_zorder_src
SELECT key, value FROM mf_src 
ZORDER BY key, value;

SELECT * FROM mf_zorder_src;

Result:

+-----+-------+
| key | value |
+-----+-------+
| 1   | 1     |
| 2   | 2     |
| 3   | 3     |
+-----+-------+

Example 8: Overwrite with Z-Ordering

SET odps.sql.default.zorder.type=global;
INSERT OVERWRITE TABLE target
SELECT key, value FROM target 
ZORDER BY key, value;

Examples: Delta tables

Operations on Delta tables, including automatic deduplication.

-- Create a Delta table
CREATE TABLE IF NOT EXISTS mf_dt (
  pk BIGINT NOT NULL PRIMARY KEY, 
  val BIGINT NOT NULL
) PARTITIONED BY (dd STRING, hh STRING) 
TBLPROPERTIES ("transactional"="true");
                  
-- INSERT OVERWRITE (Auto-deduplication: keeps one row for pk=1, 2, 3)
INSERT OVERWRITE TABLE mf_dt PARTITION (dd='01', hh='01') 
                 VALUES (1, 1), (2, 2), (3, 3);
            
SELECT * FROM mf_dt WHERE dd='01' AND hh='01';
-- Result:
+------------+------------+----+----+
| pk         | val        | dd | hh |
+------------+------------+----+----+
| 1          | 1          | 01 | 01 |
| 3          | 3          | 01 | 01 |
| 2          | 2          | 01 | 01 |
+------------+------------+----+----+

-- INSERT INTO (Appends data, no deduplication by default)
INSERT INTO TABLE mf_dt PARTITION(dd='01', hh='01') 
            VALUES (3, 30), (4, 4), (5, 5);
            
SELECT * FROM mf_dt WHERE dd='01' AND hh='01';
-- Result (Note: pk=3 is duplicated):
+------------+------------+----+----+
| pk         | val        | dd | hh |
+------------+------------+----+----+
| 1          | 1          | 01 | 01 |
| 3          | 30         | 01 | 01 |
| 4          | 4          | 01 | 01 |
| 5          | 5          | 01 | 01 |
| 2          | 2          | 01 | 01 |
| 3          | 3          | 01 | 01 |
+------------+------------+----+----+

-- INSERT OVERWRITE (Overwrites partition, removes duplicates)
INSERT OVERWRITE TABLE mf_dt PARTITION (dd='01', hh='01') 
                 VALUES (1, 1), (2, 2), (3, 3);

SELECT * FROM mf_dt WHERE dd='01' AND hh='01';
-- Result: 
+------------+------------+----+----+
| pk         | val        | dd | hh |
+------------+------------+----+----+
| 1          | 1          | 01 | 01 |
| 3          | 3          | 01 | 01 |
| 2          | 2          | 01 | 01 |
+------------+------------+----+----+

Best practices for Z-Ordering

Evaluate the trade-off between sorting costs (compute) and query benefits (storage/downstream compute). Z-Ordering is not a one-size-fits-all solution.

When to prefer Clustered Indexing (ORDER BY)

Prefix filtering: If filters always use a prefix (e.g., `a`, `a AND b`, `a AND b AND c`), `ORDER BY a, b, c` is more effective than `ZORDER BY` because `ORDER BY` prioritizes the first column.
Join optimization: For frequent join keys, hash clustering or range clustering works better. Clustered indexing helps the SQL engine optimize joins (e.g., sort-merge join), whereas Z-Ordering is often invisible to the engine for such optimizations.
Grouping/Sorting: If you frequently `GROUP BY` or `ORDER BY` specific fields, clustered indexing provides better performance.

Suggestions for using Z-Ordering

Select fields frequently used in filters, especially those used in joins for filtering.
Limit the number of ZORDER columns to 4. Too many columns degrade performance. For single-column sorting, use clustered indexing.
Avoid fields with very low cardinality (e.g., gender) or extremely high cardinality. Low cardinality offers little sorting benefit. High cardinality increases sorting memory costs.
Ensure the data volume is moderate. Small tables don't benefit much; very large tables incur high sorting latency.