MaxCompute allows you to insert data into a dynamic partition by using INSERT INTO
or INSERT OVERWRITE
.
Prerequisites
Before you execute these statements, make sure that you are granted the ALTER permission on the destination table and the DESCRIBE permission on metadata of the source table. For more information, see Authorize users.
Description
When you use MaxCompute SQL to process data, you need only to specify the names of
partition key columns in INSERT INTO
or INSERT OVERWRITE
, instead of the values of the partition key columns. After you specify the values
of the partition key columns in select_statement
, MaxCompute automatically inserts data into the destination partitions based on the
column values.
For more information about how to insert data into a static partition, see Insert or update data into a table or a static partition (INSERT INTO and INSERT OVERWRITE).
Limits
INSERT INTO
or INSERT OVERWRITE
, take note of the following limits:
- A maximum of 10,000 dynamic partitions can be generated after
INSERT INTO
is executed. A maximum of 60,000 dynamic partitions can be generated afterINSERT OVERWRITE
is executed. - In a distributed environment, an SQL statement used to insert or update data into dynamic partitions can generate a maximum of 512 dynamic partitions. If the number of dynamic partitions exceeds this limit, an exception occurs.
- The values of the dynamic partitions cannot be NULL, and cannot contain special characters.
Otherwise, the following error is reported:
FAILED: ODPS-0123031:Partition exception - invalid dynamic partition value: province=xxx
. - Clustered tables do not support dynamic partitions.
Usage notes
- If you want to use INSERT INTO or INSERT OVERWRITE to
insert data into a partition
that does not exist, MaxCompute automatically creates a partition. - If you want to run multiple jobs at the same time to
insert data into partitions
that do not exist, MaxCompute automatically creates partitions for the first job that is successfully executed. However, only one partition is created for this job. - If you cannot control
job concurrency
, we recommend that you run theALTER TABLE
command to create partitions in advance. For more information, see Partition and column operations. - If a destination table has multiple levels of partitions, you can specify some partitions
as static partitions in an
INSERT
statement. However, the static partitions must be high-level partitions. - To insert data into a dynamic partition, you must specify partition key columns in
select_statement
. Otherwise, the data fails to be inserted.
Syntax
insert {into|overwrite} table <table_name> partition (<ptcol_name>[, <ptcol_name> ...])
<select_statement> from <from_statement>;
- table_name: required. The name of the destination table into which you want to insert data.
- ptcol_name: required. The name of the partition key column in the destination table.
- select_statement: required. The
SELECT
clause that is used to query the data that you want insert into the destination table from the source table.If the destination table has only level-1 dynamic partitions, the value of the last field in
select_statement
indicates the value of the dynamic partition in the destination table. The mappings between the column values in the source table inselect_statement
and the column values in the destination table are determined by the column sequence, instead of column names. If the sequence of columns in the source table is different from that in the destination table, we recommend that you specify columns in the select_statement based on the column sequence in the destination table. - from_statement: required. The
FROM
clause that indicates the data source, such as a source table name.
Examples
- Example 1: Insert data from a source table into a destination table. You can obtain
the partitions generated based on the region column only after the statement is executed. The following statements show an example:
-- Create a destination table named total_revenues. create table total_revenues (revenue double) partitioned by (region string); -- Insert the data from the sale_detail table into the total_revenues table. For more information about the sale_detail table, see Insert or update data into a table or a static partition (INSERT INTO and INSERT OVERWRITE). insert overwrite table total_revenues partition(region) select total_price as revenue, region from sale_detail; -- Execute the
SHOW PARTITIONS
statement to view the partitions in the total_revenues table. show partitions total_revenues; -- The following result is returned: region=china -- Enable a full table scan only for the current session. Execute theSELECT
statement to query data from the total_revenues table. set odps.sql.allow.fullscan=true; select * from total_revenues; -- The following result is returned: +------------+------------+ | revenue | region | +------------+------------+ | 100.1 | china | | 100.2 | china | | 100.3 | china | +------------+------------+ - Example 2: Insert data from a source table into a destination table. If the destination
table has multiple levels of partitions, the level-1 partition sale_date must be specified. The following statements show an example:
-- Create a destination table named sale_detail_dypart. create table sale_detail_dypart like sale_detail; -- Specify a level-1 partition and insert data into the destination table. insert overwrite table sale_detail_dypart partition (sale_date='2013', region) select shop_name,customer_id,total_price,region from sale_detail; -- Enable a full table scan only for the current session. Execute the
SELECT
statement to query data from the sale_detail_dypart table. set odps.sql.allow.fullscan=true; select * from sale_detail_dypart; -- The following result is returned: +------------+-------------+-------------+------------+------------+ | shop_name | customer_id | total_price | sale_date | region | +------------+-------------+-------------+------------+------------+ | s1 | c1 | 100.1 | 2013 | china | | s2 | c2 | 100.2 | 2013 | china | | s3 | c3 | 100.3 | 2013 | china | +------------+-------------+-------------+------------+------------+ - Example 3: The mappings between the columns in select_statement and the columns in dynamic partitions in a destination table are determined by the
column sequence, instead of the column names. The following statements show an example:
-- Insert data from the sale_detail table into the sale_detail_dypart table. insert overwrite table sale_detail_dypart partition (sale_date, region) select shop_name,customer_id,total_price,sale_date,region from sale_detail; -- Enable a full table scan only for the current session. Execute the
SELECT
statement to query data from the sale_detail_dypart table. set odps.sql.allow.fullscan=true; select * from sale_detail_dypart; -- The following result is returned: The sale_date column in the dynamic partition of the sale_detail_dypart table is determined by the sale_date column in the sale_detail table. The region column in the dynamic partition of the sale_detail_dypart table is determined by the region column in the sale_detail table. +------------+-------------+-------------+------------+------------+ | shop_name | customer_id | total_price | sale_date | region | +------------+-------------+-------------+------------+------------+ | s1 | c1 | 100.1 | 2013 | china | | s2 | c2 | 100.2 | 2013 | china | | s3 | c3 | 100.3 | 2013 | china | +------------+-------------+-------------+------------+------------+ -- Insert data from the sale_detail table into the sale_detail_dypart table and change the sequence of columns inselect_statement
. insert overwrite table sale_detail_dypart partition (sale_date, region) select shop_name,customer_id,total_price,region,sale_date from sale_detail; -- Enable a full table scan only for the current session. Execute theSELECT
statement to query data from the sale_detail_dypart table. set odps.sql.allow.fullscan=true; select * from sale_detail_dypart; -- The following result is returned: The sale_date column in the dynamic partition of the sale_detail_dypart table is determined by the region column in the sale_detail table. The region column in the dynamic partition of the sale_detail_dypart table is determined by the sale_date column in the sale_detail table. +------------+-------------+-------------+------------+------------+ | shop_name | customer_id | total_price | sale_date | region | +------------+-------------+-------------+------------+------------+ | s1 | c1 | 100.1 | china | 2013 | | s2 | c2 | 100.2 | china | 2013 | | s3 | c3 | 100.3 | china | 2013 | +------------+-------------+-------------+------------+------------+ - Example 4: If you insert data into a dynamic partition, you must specify the columns
in the dynamic partition in
select_statement
. Otherwise, the data fails to be inserted. Example of incorrect usage:insert overwrite table sale_detail_dypart partition (sale_date='2013', region) select shop_name,customer_id,total_price from sale_detail;
- Example 5: If you specify only low-level sub-partitions when you insert data into
dynamic partitions, you may fail to insert data into high-level partitions. Example
of incorrect usage:
insert overwrite table sale_detail_dypart partition (region='china', sale_date) select shop_name,customer_id,total_price,sale_date from sale_detail;
- Example 6: If the data type of a partition key column does not exactly match the data
type of the column in
select_statement
, an implicit conversion is performed when MaxCompute inserts data into a dynamic partition. The following statements show an example:-- Create a source table named src. create table src (c int, d string) partitioned by (e int); -- Add a partition to the src table. alter table src add if not exists partition (e=201312); -- Append data to the src table. insert into src partition (e=201312) values (1,100.1),(2,100.2),(3,100.3); -- Create a destination table named parttable. create table parttable(a int, b double) partitioned by (p string); -- Insert data from the src table into the parttable table. insert into parttable partition (p) select c, d, current_timestamp() from src; -- Query data in the parttable table. select * from parttable; -- The following result is returned: +------------+------------+------------+ | a | b | p | +------------+------------+------------+ | 1 | 100.1 | 2020-11-25 15:13:28.686 | | 2 | 100.2 | 2020-11-25 15:13:28.686 | | 3 | 100.3 | 2020-11-25 15:13:28.686 | +------------+------------+------------+
Note If your data is ordered, it is randomly scattered when the data is inserted into a dynamic partition. This reduces the data compression ratio. In this case, we recommend that you use Tunnel commands to upload the data to dynamic partitions to increase the data compression ratio. For more information about how to use Tunnel commands, see Migrate data from ApsaraDB RDS to MaxCompute based on dynamic partitioning.