A partitioned table is one where data is divided and stored in different physical locations based on one or more columns. Effective partitioning accelerates queries, simplifies data management , and enables more flexible data operations.
Partitioning options
MaxCompute offers two approaches to managing partitions:
Stardard partition tables: You define one or more columns as partition keys to segment the table's data.
Auto partition tables: Partition columns are automatically generated based on the following two methods.
Auto-partition method
Description
MaxCompute automatically partitions the table by applying the TRUNC_TIME function to a time-based column (e.g., DATE, DATETIME, TIMESTAMP) and using the calculated result as the partition value.
MaxCompute automatically partitions the table by applying the TRUNC_TIME function to the data's ingestion time (the time it's written to the system). This method is known as Ingestion Time Partitioning.
When to use partitioned tables
Consider partitioning a table in the following scenarios:
Accelerate queries: With partitioned tables, queries scan only necessary partitions instead of the entire table, reducing the volume of data processed.
Reduce costs: Specifying partition filters in your queries avoids the resource waste associated with full table scans, effectively lowering your computation costs.
Manage data at the partition level: You can write data to or delete data from specific partitions without a full table scan. This allows for efficient data management and enables automatic data cleanup at the partition level by using lifecycle policies.
For optimal performance, ensure each partition contains at least 10,000 rows. An excessive number of small partitions may degrade query performance.
Limits
A table can have a maximum of 6 partition levels.
A table can have a maximum of 60,000 partitions.
A single query can query a maximum of 10,000 partitions.
The values in a partition key column of the STRING type cannot contain Chinese characters.
Data types of partition columns
In MaxCompute V2.0, partition columns support STRING, TINYINT, SMALLINT, INT, BIGINT, and VARCHAR types.
MaxCompute stores partition values as STRING, automatically converting other types. To avoid potential errors, define partition columns as STRING.
In MaxCompute 1.0 data type version, partition columns only support the STRING type.
While you can define a partition column's type as BIGINT, this only affects how the type is displayed in the table schema. In all computational operations, such as calculations and comparisons, MaxCompute treats the column as a STRING. In the following example, the returned result is empty.
---Create a table named parttest. CREATE TABLE parttest (a BIGINT) PARTITIONED BY (pt BIGINT); ---Insert data into the table. INSERT INTO parttest PARTITION(pt)(a,pt) VALUES (1, 1); INSERT INTO parttest PARTITION(pt)(a,pt) VALUES (1, 10); ---Query the rows where the value of pt is greater than or equal to 2. SELECT * FROM parttest WHERE pt >= '2';MaxCompute does not support using time/date type data as partition columns, but you can use the auto partition feature to generate partitions based on function calculations on time/date type data in the table. For more information, see Examples of auto partition tables.
Examples of regular partition tables
Create a partitioned table.
-- Create a regular two-level partitioned table, with date as the first-level partition and region as the second-level partition CREATE TABLE src (shop_name STRING, customer_id BIGINT) PARTITIONED BY (pt STRING,region STRING);Add partitions and execute the INSERT INTO command to append data to the partitioned table
src.ALTER TABLE src ADD PARTITION (pt='20170506',region='hangzhou') PARTITION (pt='20180101',region='shanghai'); INSERT INTO src PARTITION (pt='20170506',region='hangzhou') VALUES ('s1',1),('s2',2),('s3',3); INSERT INTO src PARTITION (pt='20180101',region='shanghai') VALUES ('c1',4),('c2',5),('c3',6);Use partition as filter conditions to query a table.
Correct example: When a query specifies
pt='20170506'andregion='hangzhou', MaxCompute will precisely locate the corresponding first-level and second-level partitions, processing only the matching data and avoiding full table scans.SELECT * FROM src WHERE pt='20170506' AND region='hangzhou';Output:
+------------+-------------+------------+------------+ | shop_name | customer_id | pt | region | +------------+-------------+------------+------------+ | s1 | 1 | 20170506 | hangzhou | | s2 | 2 | 20170506 | hangzhou | | s3 | 3 | 20170506 | hangzhou | +------------+-------------+------------+------------+Incorrect example: With this approach, partition pruning is not guaranteed to work correctly. The
ptfield is of STRING type, but when compared to a BIGINT value (e.g., 20170506), MaxCompute implicitly casts both operands to DOUBLE, which can lead to precision loss and compromise partition pruning optimization.SELECT * FROM src WHERE pt = 20170506 AND region = hangzhou;