This topic describes how to use the INSERT OVERWRITE and INSERT INTO statements to update table data.
INSERT OVERWRITE|INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)] [(col1,col2 ...)] select_statement FROM from_statement;
INSERT INTOare used to save calculation results to a destination table.
- INSERT INTO: inserts data into a table or into a partition of a table. However, you cannot use
INSERT INTOto insert data into a hash clustering table.
- INSERT OVERWRITE: clears the original data from a table, and then inserts data into the table or its
partition. If you use
INSERT OVERWRITE, you cannot specify the columns into which data is inserted. If you need to specify the columns, use
- The syntax of
INSERTstatements in MaxCompute differs from that of the commonly used
INSERTstatements in MySQL or Oracle. To execute
INSERT INTOin MaxCompute, you must add keyword
tablenamein the statement.
- If you repeat the
INSERT OVERWRITEoperation on a partition, the volume of the data queried by using
DESCRIBEmay vary. The reason is that the logic used to split a file changes after you use
SELECTto extract the data from a partition of a table and then use
INSERT OVERWRITEto insert the data into the same partition. The total data length remains unchanged after the
INSERT OVERWRITEoperation. As a result, the storage fees remain unchanged.
- The syntax of
- tablename: the name of the destination table into which you want to insert data.
- PARTITION (partcol1=val1, partcol2=val2 ...): the name of the partition into which you want to insert data. The value must be a constant. It cannot be an expression, such as a function.
- select_statement: the SELECT clause used to query the data you want to insert from the source table.
- The mappings between the source and destination tables depend on the column sequence in the SELECT clause, rather than the mappings of column names between tables.
- When you insert data into a partition, partitioning columns cannot exist in the SELECT clause.
- from_statement: the FROM clause used to indicate the data source. For example, the value can be a source table name.
- Calculate the amount of sales of different regions listed in the
sale_detailtable and then insert the obtained data into the
-- Create the sale_detail_insert destination table. create table sale_detail_insert like sale_detail; -- Add a partition to the destination table. alter table sale_detail_insert add partition(sale_date='2013', region='china'); -- Extract the data from sale_detail and then insert it into sale_detail_insert. insert overwrite table sale_detail_insert partition (sale_date='2013', region='china') select shop_name, customer_id,total_price from sale_detail;
- The mappings between the source and destination tables depend on the column sequence
SELECTclause, rather than the mappings of column names between tables. The following provides an example.
insert overwrite table sale_detail_insert partition (sale_date='2013', region='china') select customer_id, shop_name, total_price from sale_detail;
When you create the
sale_detail_inserttable, the column sequence is defined as
customer_id string, and then
total_price bigint. However, you insert the data from
sale_detail_insertbased on the sequence of
shop_name, and then
total_price. As a result, the data in the
sale_detail.customer_idcolumn is inserted into the
sale_detail_insert.shop_namecolumn, and the data in the
sale_detail.shop_namecolumn is inserted into the
- When you insert data into a partition, partitioning columns cannot exist in the
SELECTclause. In the following example, an error is returned because
sale_date,regionis a partitioning column that cannot exist in an INSERT statement for a static partition.
insert overwrite table sale_detail_insert partition (sale_date='2013', region='china') select shop_name, customer_id, total_price, sale_date, region from sale_detail;
partitionmust be constants, not expressions. The following example shows an incorrect usage.
insert overwrite table sale_detail_insert partition (sale_date=datepart('2016-09-18 01:10:00', 'yyyy') , region='china') select shop_name, customer_id, total_price from sale_detail;
Precautions for dynamic partitions
- If you perform the
insert into partitionoperation but the specified partition does not exist, a partition is automatically created.
- If you perform multiple
insert into partitionoperations in parallel but the specified partitions do not exist, only one partition is automatically created.
- If concurrent
insert into partitionoperations are required, you must create a partition in advance to avoid issues caused by concurrent operations.