This topic describes how to use the INSERT OVERWRITE and INSERT INTO statements to update table data.

Descriptions

Syntax
INSERT OVERWRITE|INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)] [(col1,col2 ...)]
select_statement
FROM from_statement;

Function

If MaxCompute SQL is used to process data, INSERT OVERWRITE and INSERT INTO are used to save calculation results to a destination table.
  • INSERT INTO: inserts data into a table or into a partition of a table. However, you cannot use INSERT INTO to insert data into a hash clustering table.
  • INSERT OVERWRITE: clears the original data from a table, and then inserts data into the table or its partition. If you use INSERT OVERWRITE, you cannot specify the columns into which data is inserted. If you need to specify the columns, use INSERT INTO instead.
    Note
    • The syntax of INSERT statements in MaxCompute differs from that of the commonly used INSERT statements in MySQL or Oracle. To execute INSERT OVERWRITE or INSERT INTO in MaxCompute, you must add keyword TABLE before tablename in the statement.
    • If you repeat the INSERT OVERWRITE operation on a partition, the volume of the data queried by using DESCRIBE may vary. The reason is that the logic used to split a file changes after you use SELECT to extract the data from a partition of a table and then use INSERT OVERWRITE to insert the data into the same partition. The total data length remains unchanged after the INSERT OVERWRITE operation. As a result, the storage fees remain unchanged.
Parameters
  • tablename: the name of the destination table into which you want to insert data.
  • PARTITION (partcol1=val1, partcol2=val2 ...): the name of the partition into which you want to insert data. The value must be a constant. It cannot be an expression, such as a function.
  • select_statement: the SELECT clause used to query the data you want to insert from the source table.
    Note
    • The mappings between the source and destination tables depend on the column sequence in the SELECT clause, rather than the mappings of column names between tables.
    • When you insert data into a partition, partitioning columns cannot exist in the SELECT clause.
  • from_statement: the FROM clause used to indicate the data source. For example, the value can be a source table name.
Examples
  • Calculate the amount of sales of different regions listed in the sale_detail table and then insert the obtained data into the sale_detail_insert table.
    -- Create the sale_detail_insert destination table.
    create table sale_detail_insert like sale_detail;
    
    -- Add a partition to the destination table.
    alter table sale_detail_insert add partition(sale_date='2013', region='china');
    
    -- Extract the data from sale_detail and then insert it into sale_detail_insert.
    insert overwrite table sale_detail_insert partition (sale_date='2013', region='china')
      select shop_name, customer_id,total_price from sale_detail;
  • The mappings between the source and destination tables depend on the column sequence in the SELECT clause, rather than the mappings of column names between tables. The following provides an example.
    insert overwrite table sale_detail_insert partition (sale_date='2013', region='china')
        select customer_id, shop_name, total_price from sale_detail;                      

    When you create the sale_detail_insert table, the column sequence is defined as shop_name string, customer_id string, and then total_price bigint. However, you insert the data from sale_detail to sale_detail_insert based on the sequence of customer_id, shop_name, and then total_price. As a result, the data in the sale_detail.customer_id column is inserted into the sale_detail_insert.shop_name column, and the data in the sale_detail.shop_name column is inserted into the sale_detail_insert.customer_id column.

  • When you insert data into a partition, partitioning columns cannot exist in the SELECT clause. In the following example, an error is returned because sale_date,region is a partitioning column that cannot exist in an INSERT statement for a static partition.
    insert overwrite table sale_detail_insert partition (sale_date='2013', region='china')
       select shop_name, customer_id, total_price, sale_date, region  from sale_detail;
  • partition must be constants, not expressions. The following example shows an incorrect usage.
    insert overwrite table sale_detail_insert partition (sale_date=datepart('2016-09-18 01:10:00', 'yyyy') , region='china')
       select shop_name, customer_id, total_price from sale_detail;

Precautions for dynamic partitions

If you need to update table data to a dynamic partition, note the following points:
  • If you perform the insert into partition operation but the specified partition does not exist, a partition is automatically created.
  • If you perform multiple insert into partition operations in parallel but the specified partitions do not exist, only one partition is automatically created.
  • If concurrent insert into partition operations are required, you must create a partition in advance to avoid issues caused by concurrent operations.
For more information about dynamic partitions, see Insert data in dynamic partition mode (DYNAMIC PARTITION).