MaxCompute allows you to execute the INSERT INTO or INSERT OVERWRITE statement to insert or update data into a destination table or static partitions.

Before you execute these statements, make sure that you are granted the ALTER permission on the destination table and the DESCRIBE permission on metadata in the source table. For more information, see Authorize users.

You can execute the INSERT INTO or INSERT OVERWRITE statement on the following platforms:

Description

When you use MaxCompute SQL to process data, you can execute the INSERT INTO or INSERT OVERWRITE statement to save the execution results of the SELECT statements to a destination table. Differences between the two statements:
  • INSERT INTO: inserts data into a table or a static partition of a table. You can specify the values of partition key columns in this statement to insert data into a specified partition. If you want to insert a small amount of test data, you can use this statement with VALUES.
  • INSERT OVERWRITE: clears the original data from a specified table, and then inserts data into the table or its static partitions.
    Note
    • The INSERT syntax in MaxCompute is different from that in MySQL or Oracle. You must add both the TABLE keyword and table_name in INSERT INTO or INSERT OVERWRITE.
    • If you execute the INSERT OVERWRITE statement on the same partition several times, the size of the partition into which data is inserted may be different every time you run the DESC command. This is because the logic to split files changes after you sequentially execute the SELECT and INSERT OVERWRITE statements for the same partition in a table. The total length of data remains the same after you execute INSERT OVERWRITE. You do not need to worry about the storage fees.

For more information about how to insert data into a dynamic partition, see Insert data into dynamic partitions (DYNAMIC PARTITION).

Limits

When you execute INSERT INTO or INSERT OVERWRITE to insert or update data into a table or a static partition of a table, take note of the following limits:
  • INSERT INTO: This statement cannot be used to insert data into a clustered table.
  • INSERT OVERWRITE: This statement does not allow you to specify the columns into which you want to insert data. You can execute the INSERT INTO statement to specify the columns.

Syntax

insert {into|overwrite} table <table_name> [partition (<pt_spec>)] [(<col_name> [,<col_name> ...)]
<select_statement>
from <from_statement>
[zorder by <zcol_name> [, <zcol_name> ...]] ;
  • table_name: required. The name of the destination table into which you want to insert data.
  • pt_spec: optional. The partition into which you want to insert data. Only constants are allowed. Expressions, such as functions, are not allowed. The value is in the format of (partition_col1 = partition_col_value1, partition_col2 = partition_col_value2, ...).
  • col_name: optional. The name of the column in the destination table into which you want to insert data. INSERT OVERWRITE does not allow you to specify [(<col_name> [,<col_name> ...)].
  • select_statement: required. The SELECT clause that is used to query data that you want to insert into the destination table from the source table.
    Note
    • The mappings between the source and destination tables are based on the column sequence in select_statement, instead of the mappings between column names in the tables.
    • If the destination table has static partitions and you want to insert data into a static partition, partition key columns cannot be included in select_statement.
  • from_statement: required. The FROM clause that indicates the data source, such as a source table name.
  • zorder by <zcol_name> [, <zcol_name> ...]: optional. If you write data to a table or partition, you can use this clause to co-locate rows with similar data records based on the columns specified in select_statement. This improves filtering performance for queries and reduces storage costs. The ORDER BY x, y clause sorts data records based on the ordering of x coming before y. The ZORDER BY x, y clause co-locates rows with similar x values and rows with similar y values. In terms of column-based data filtering and sorting in an SQL SELECT statement, the ORDER BY clause filters and sorts data based on x, whereas the ZORDER BY clause filters and sorts data based on x or on both x and y. This increases the column compression ratio.
    Note
    • If you use the ZORDER BY clause to write data, more resources and time are consumed.
    • If the destination table is a clustered table, the ZORDER BY clause is not supported.

Examples

  • Example 1: Execute the INSERT INTO statement to append data to a partitioned table named sale_detail. Sample commands:
    -- Create a partitioned table named sale_detail.
    create table if not exists sale_detail
    (
    shop_name     string,
    customer_id   string,
    total_price   double
    )
    partitioned by (sale_date string, region string);
    
    -- Add a partition to the sale_detail table.
    alter table sale_detail add partition (sale_date='2013', region='china');
    
    -- Append data to the sale_detail table. The abbreviated form of INSERT INTO TABLE table_name is INSERT INTO table_name. However, the TABLE keyword in INSERT OVERWRITE TABLE table_name cannot be omitted.
    insert into sale_detail partition (sale_date='2013', region='china') values ('s1','c1',100.1),('s2','c2',100.2),('s3','c3',100.3);
    
    -- Enable a full table scan only for the current session. Execute the SELECT statement to view data in the sale_detail table.
    set odps.sql.allow.fullscan=true; 
    select * from sale_detail;
    
    -- The following result is returned:
    +------------+-------------+-------------+------------+------------+
    | shop_name  | customer_id | total_price | sale_date  | region     |
    +------------+-------------+-------------+------------+------------+
    | s1         | c1          | 100.1       | 2013       | china      |
    | s2         | c2          | 100.2       | 2013       | china      |
    | s3         | c3          | 100.3       | 2013       | china      |
    +------------+-------------+-------------+------------+------------+
  • Example 2: Execute the INSERT OVERWRITE statement to update the data in the sale_detail_insert table. Sample commands:
    -- Create the sale_detail_insert table that has the same schema as the sale_detail table.
    create table sale_detail_insert like sale_detail;
    
    -- Add a partition to the sale_detail_insert table.
    alter table sale_detail_insert add partition (sale_date='2013', region='china');
    
    -- Extract data from the sale_detail table and insert the data into the sale_detail_insert table. Names of partition key columns in the sale_detail_insert table do not need to be declared and cannot be rearranged.
    -- If the sale_detail_insert table contains static partitions, the values of partition key columns are declared in PARTITION(). These values do not need to be included in select_statement. You need only to search for column names based on the sequence of common columns in the sale_detail_insert table and sequentially map the declared column values to the columns in the sale_detail_insert table. If the sale_detail_insert table contains dynamic partitions, the names of partition key columns must be included in select_statement. For more information, see Insert data into dynamic partitions (DYNAMIC PARTITION).
    insert overwrite table sale_detail_insert partition (sale_date='2013', region='china')
      select 
      shop_name, 
      customer_id,
      total_price 
      from sale_detail
      zorder by customer_id, total_price;
    
    -- Enable a full table scan only for the current session. Execute the SELECT statement to view the data in the sale_detail_insert table.
    set odps.sql.allow.fullscan=true;
    select * from sale_detail_insert;
    
    -- The following result is returned:
    +------------+-------------+-------------+------------+------------+
    | shop_name  | customer_id | total_price | sale_date  | region     |
    +------------+-------------+-------------+------------+------------+
    | s1         | c1          | 100.1       | 2013       | china      |
    | s2         | c2          | 100.2       | 2013       | china      |
    | s3         | c3          | 100.3       | 2013       | china      |
    +------------+-------------+-------------+------------+------------+
  • Example 3: Execute the INSERT OVERWRITE statement to update the data in the sale_detail_insert table and adjust the sequence of columns in select_statement. The mappings between the source and destination tables are based on the sequence of columns in select_statement, instead of the mappings between column names in the two tables. Sample commands:
    insert overwrite table sale_detail_insert partition (sale_date='2013', region='china')
        select customer_id, shop_name, total_price from sale_detail;   
    select * from sale_detail_insert;                   
    The following result is returned:
    +------------+-------------+-------------+------------+------------+
    | shop_name  | customer_id | total_price | sale_date  | region     |
    +------------+-------------+-------------+------------+------------+
    | c1         | s1          | 100.1       | 2013       | china      |
    | c2         | s2          | 100.2       | 2013       | china      |
    | c3         | s3          | 100.3       | 2013       | china      |
    +------------+-------------+-------------+------------+------------+

    When you create the sale_detail_insert table, the column sequence is defined as shop_name string, customer_id string, and then total_price bigint. However, you insert the data from the sale_detail table to the sale_detail_insert table based on the sequence of customer_id, shop_name, and then total_price. As a result, the data in the sale_detail.customer_id column is inserted into the sale_detail_insert.shop_name column, and the data in the sale_detail.shop_name column is inserted into the sale_detail_insert.customer_id column.

  • Example 4: If you insert data into a partition, its partition key columns cannot be included in select_statement. After the following statement is executed, an error is returned. This is because sale_date and region are partition key columns. These columns cannot be included in select_statement if INSERT OVERWRITE or INSERT INTO statement is used to insert or update data into a static partition. Example of incorrect usage:
    insert overwrite table sale_detail_insert partition (sale_date='2013', region='china')
       select shop_name, customer_id, total_price, sale_date, region from sale_detail;
  • Example 5: pt_spec in PARTITION() must be constants instead of expressions. Example of incorrect usage:
    insert overwrite table sale_detail_insert partition (sale_date=datepart('2016-09-18 01:10:00', 'yyyy') , region='china')
       select shop_name, customer_id, total_price from sale_detail;