All Products
Search
Document Center

MaxCompute:Insert or overwrite data into dynamic partitions (DYNAMIC PARTITION)

Last Updated:Sep 12, 2023

MaxCompute allows you to insert data into a dynamic partition by using INSERT INTO or INSERT OVERWRITE.

You can execute the statements that are described in this topic on the following platforms:

Prerequisites

Before you execute these statements, make sure that you are granted the Alter permission on the destination table and the Describe permission on the metadata of the source table. For more information, see MaxCompute permissions.

Description

When you use MaxCompute SQL to process data, you need to only specify the names of partition key columns in INSERT INTO or INSERT OVERWRITE, instead of the values of the partition key columns. After you specify the values of the partition key columns in select_statement, MaxCompute automatically inserts data into the destination partitions based on the column values.

For more information about how to insert data into a static partition, see Insert or update data into a table or a static partition (INSERT INTO and INSERT OVERWRITE).

Limits

If you insert data into a dynamic partition by using INSERT INTO or INSERT OVERWRITE, take note of the following limits:

  • A maximum of 10,000 dynamic partitions can be generated after INSERT INTO is executed. A maximum of 60,000 dynamic partitions can be generated after INSERT OVERWRITE is executed.

  • In a distributed environment, an SQL statement used to insert or update data into dynamic partitions can generate a maximum of 512 dynamic partitions. If the number of dynamic partitions exceeds this limit, an exception occurs.

  • The values of the dynamic partitions cannot be NULL, and cannot contain special characters. Otherwise, the following error is reported: FAILED: ODPS-0123031:Partition exception - invalid dynamic partition value: province=xxx.

    Note

    The value of a partition key column cannot contain double-byte characters, such as Chinese characters. The value of a partition key column must start with a letter and can contain letters, digits, and supported special characters. It must be 1 to 255 bytes in length. The following special characters are supported: spaces, colons (:), underscores (_), dollar signs ($), number signs (#), periods (.), exclamation points (!), and at signs (@). The behavior of other characters is not defined, such as escape characters \t, \n, and /.

  • Clustered tables do not support dynamic partitions.

Precautions

If you want to update table data into a dynamic partition, take note of the following points:

  • If you want to use INSERT INTO or INSERT OVERWRITE to insert data into a partition that does not exist, MaxCompute automatically creates a partition.

  • If you want to run multiple jobs at the same time to insert data into partitions that do not exist, MaxCompute automatically creates partitions for the first job that is successfully executed. However, only one partition is created for this job.

  • If you cannot control job concurrency, we recommend that you run the ALTER TABLE command to create partitions in advance. For more information, see Partition and column operations.

  • If a destination table has multiple levels of partitions, you can specify some partitions as static partitions in an INSERT statement. However, the static partitions must be high-level partitions.

  • To insert data into a dynamic partition, you must specify partition key columns in select_statement. Otherwise, the data fails to be inserted.

Syntax

insert {into|overwrite} table <table_name> partition (<ptcol_name>[, <ptcol_name> ...]) 
<select_statement> from <from_statement>;
  • table_name: required. The name of the destination table into which you want to insert data.

  • ptcol_name: required. The name of the partition key column in the destination table.

  • select_statement: required. The SELECT clause that is used to query the data that you want to insert into the destination table from the source table.

    If the destination table has only level-1 dynamic partitions, the value of the last field in select_statement indicates the value of the dynamic partition in the destination table. The mappings between the column values in the source table in select_statement and the column values in the destination table are determined by the column sequence, instead of column names. If the sequence of columns in the source table is different from that in the destination table, we recommend that you specify columns in the select_statement based on the column sequence in the destination table.

  • from_statement: required. The FROM clause. This clause specifies the data source, such as a source table name.

Examples

  • Example 1: Insert data from a source table into a destination table. You can obtain the partitions generated based on the region column only after the statement is executed. The following statements show an example:

    -- Create a destination table named total_revenues. 
    create table total_revenues (revenue double) partitioned by (region string);
    
    -- Insert the data from the sale_detail table into the total_revenues table. For more information about the sale_detail table, see Insert or update data into a table or a static partition (INSERT INTO and INSERT OVERWRITE). 
    set odps.sql.allow.fullscan=true; 
    insert overwrite table total_revenues partition(region)
    select total_price as revenue, region from sale_detail;
    
    -- Execute the SHOW PARTITIONS statement to view the partitions in the total_revenues table. 
    show partitions total_revenues;
       
    -- The following result is returned: 
    region=china  
    
    -- Enable a full table scan only for the current session. Execute the SELECT statement to query data from the total_revenues table.   
    set odps.sql.allow.fullscan=true; 
    select * from total_revenues;    
    
    -- The following result is returned: 
    +------------+------------+
    | revenue    | region     |
    +------------+------------+
    | 100.1      | china      |
    | 100.2      | china      |
    | 100.3      | china      |
    +------------+------------+        
  • Example 2: Insert data from a source table into a destination table. If the destination table has multiple levels of partitions, the level-1 partition sale_date must be specified. The following statements show an example:

    -- Create a destination table named sale_detail_dypart.  
    create table sale_detail_dypart like sale_detail; 
    
    -- Specify a level-1 partition and insert data into the destination table. 
    insert overwrite table sale_detail_dypart partition (sale_date='2013', region)
    select shop_name,customer_id,total_price,region from sale_detail;
    
    -- Enable a full table scan only for the current session. Execute the SELECT statement to query data from the sale_detail_dypart table. 
    set odps.sql.allow.fullscan=true; 
    select * from sale_detail_dypart;
    
    -- The following result is returned: 
    +------------+-------------+-------------+------------+------------+
    | shop_name  | customer_id | total_price | sale_date  | region     |
    +------------+-------------+-------------+------------+------------+
    | s1         | c1          | 100.1       | 2013       | china      |
    | s2         | c2          | 100.2       | 2013       | china      |
    | s3         | c3          | 100.3       | 2013       | china      |
    +------------+-------------+-------------+------------+------------+
  • Example 3: The mappings between the columns in select_statement and the columns in dynamic partitions in a destination table are determined by the column sequence, instead of the column names. The following statements show an example:

    -- Insert data from the sale_detail table into the sale_detail_dypart table. 
    insert overwrite table sale_detail_dypart partition (sale_date, region)
    select shop_name,customer_id,total_price,sale_date,region from sale_detail;
    
    -- Enable a full table scan only for the current session. Execute the SELECT statement to query data from the sale_detail_dypart table. 
    set odps.sql.allow.fullscan=true; 
    select * from sale_detail_dypart;
    
    -- The following result is returned: The sale_date column in the dynamic partition of the sale_detail_dypart table is determined by the region column in the sale_detail table. The region column in the dynamic partition of the sale_detail_dypart table is determined by the sale_date column in the sale_detail table. 
    +------------+-------------+-------------+------------+------------+
    | shop_name  | customer_id | total_price | sale_date  | region     |
    +------------+-------------+-------------+------------+------------+
    | s1         | c1          | 100.1       | 2013       | china      |
    | s2         | c2          | 100.2       | 2013       | china      |
    | s3         | c3          | 100.3       | 2013       | china      |
    +------------+-------------+-------------+------------+------------+
    
    -- Insert data from the sale_detail table into the sale_detail_dypart table and change the sequence of columns in select_statement. 
    insert overwrite table sale_detail_dypart partition (sale_date, region)
    select shop_name,customer_id,total_price,region,sale_date from sale_detail;
    
    -- Enable a full table scan only for the current session. Execute the SELECT statement to query data from the sale_detail_dypart table. 
    set odps.sql.allow.fullscan=true; 
    select * from sale_detail_dypart;
    
    -- The following result is returned: The sale_date column in the dynamic partition of the sale_detail_dypart table is determined by the region column in the sale_detail table. The region column in the dynamic partition of the sale_detail_dypart table is determined by the sale_date column in the sale_detail table. 
    +------------+-------------+-------------+------------+------------+
    | shop_name  | customer_id | total_price | sale_date  | region     |
    +------------+-------------+-------------+------------+------------+
    | s1         | c1          | 100.1       | china      | 2013       |
    | s2         | c2          | 100.2       | china      | 2013       |
    | s3         | c3          | 100.3       | china      | 2013       |
    +------------+-------------+-------------+------------+------------+
  • Example 4: If you insert data into a dynamic partition, you must specify the columns in the dynamic partition in select_statement. Otherwise, the data fails to be inserted. Sample statement of incorrect usage:

    insert overwrite table sale_detail_dypart partition (sale_date='2013', region)
    select shop_name,customer_id,total_price from sale_detail;
  • Example 5: If you specify only low-level sub-partitions when you insert data into dynamic partitions, you may fail to insert data into high-level partitions. Sample statement of incorrect usage:

    insert overwrite table sale_detail_dypart partition (sale_date, region='china')
    select shop_name,customer_id,total_price,sale_date from sale_detail_dypart;
  • Example 6: If the data type of a partition key column does not exactly match the data type of the column in select_statement, an implicit conversion is performed when MaxCompute inserts data into a dynamic partition. The following statements show an example:

    -- Create a source table named src. 
    create table src (c int, d string) partitioned by (e int);
    
    -- Add a partition to the src table. 
    alter table src add if not exists partition (e=201312);
    
    -- Append data to the src table. 
    insert into src partition (e=201312) values (1,100.1),(2,100.2),(3,100.3);
    
    -- Create a destination table named parttable. 
    create table parttable(a int, b double) partitioned by (p string);
    
    -- Insert data from the src table into the parttable table. 
    insert into parttable partition (p) select c, d, current_timestamp() from src;
    
    -- Query data in the parttable table. 
    set odps.sql.allow.fullscan=true;
    select * from parttable;
    
    -- The following result is returned: 
    +------------+------------+------------+
    | a          | b          | p          |
    +------------+------------+------------+
    | 1          | 100.1      | 2020-11-25 15:13:28.686 |
    | 2          | 100.2      | 2020-11-25 15:13:28.686 |
    | 3          | 100.3      | 2020-11-25 15:13:28.686 |
    +------------+------------+------------+
    Note

    If your data is ordered, it is randomly scattered when the data is inserted into a dynamic partition. This reduces the data compression ratio. In this case, we recommend that you use Tunnel commands to upload the data to dynamic partitions to increase the data compression ratio. For more information about how to use Tunnel commands, see Migrate data from ApsaraDB RDS to MaxCompute based on dynamic partitioning.