A partitioned table is a table that is divided into multiple segments (partitions) based on one or more partitioning columns in the table. A partition corresponds to an independent folder in a distributed file system. Partitioning groups data of the same category into the same folder.
Partitioning improves query performance. You can specify the name of the partition that you want to access by using the WHERE clause. This way, MaxCompute only scans the specified partition. This improves processing efficiency and reduces cost.
Each value in a partitioning column is called a partition or directory in MaxCompute. You can group multiple fields of a table to a single partition to create multi-level partitions. Multi-level partitions are similar to multi-level directories.
MaxCompute V2.0 supports the following data types for partitioning columns: TINYINT, SMALLINT, INT, BIGINT, VARCHAR, and STRING.
--- Create a table named parttest. CREATE TABLE parttest (a bigint) PARTITIONED BY (pt bigint); --- Insert data into parttest. INSERT INTO parttest partition(pt) SELECT 1, 2 from dual; INSERT INTO parttest partition(pt) SELECT 1, 10 from dual; --- Query the rows whose pt field is greater than or equal to 2. SELECT * FROM parttest WHERE pt >= '2';
- A table can contain a maximum of six levels of partitions.
- A table can contain a maximum of 60,000 partitions.
- A maximum of 10,000 partitions can be queried at a time.
- The partitioning column value of a partition whose data type is STRING cannot be Chinese characters.
- Create a partition.
-- Create a partitioned table that contains two levels of partitions in which date is the level-1 partition and region is the level-2 partition. CREATE TABLE src (key string, value bigint) PARTITIONED BY (pt string,region string);
- During the query, use a partitioning column as a filter condition in the WHERE clause.
-- Correct usage. When MaxCompute generates a query plan, only the data of the '20170601' partition whose region is 'hangzhou' is used as input data. select * from src where pt='20170601'and region='hangzhou'; -- Incorrect usage. In this example, the effectiveness of the partition filtering cannot be guaranteed. The pt value is of the STRING type. When the value of the STRING type is compared with that of the BIGINT type (20170601), MaxCompute converts both data types to DOUBLE, which causes a loss in precision. select * from src where pt = 20170601;
Some SQL operations on partitions are less efficient and may result in higher costs, for example, DYNAMIC PARTITION. For more information, see Insert data in dynamic partition mode (DYNAMIC PARTITION).
The syntax of some operations that are used to process partitioned and non-partitioned tables is different in MaxCompute. For more information, see Table operations and INSERT OVERWRITE and INSERT INTO.