To improve MaxCompute’s processing efficiency, you can specify a partition when creating a table. Specifically, several fields in the table can be specified as partition columns. A partition is comparable in terms of functionality to a directory under a file system.
In MaxCompute each value of a partition column is used as a partition. You can specify multiple fields of the table as a partition whereby they then function similarly to multi-level directories. If the partitions to be accessed are specified when you use data, then only corresponding partitions are read and a full table scan is avoided, improving processing efficiency while reducing costs.
Currently, MaxCompute supports the following partition types: TINYINT, SMALLINT, INT, BIGINT, VARCHAR, and STRING.
In MaxCompute versions earlier than 2.0, only STRING partition type is supported. Although the partition type can be specified as BIGINT, it is still handled as STRING, and only the schema of the table is indicated as a BIGINT type.
An example is as follows:
create table parttest (a bigint) partitioned by (pt bigint);
insert into parttest partition(pt) select 1, 2 from dual;
insert into parttest partition(pt) select 1, 10 from dual;
select * from parttest where pt >= 2;
After the execution, the returned result is only one line, because 10 was treated as a STRING and compared with 2, meaning no result can be returned.
When using a partition, the following restrictions apply:
The maximum number of partition levels for a single table is 6 levels.
The maximum number of single table partitions is 60,000.
The maximum number of query partitions for a query is 10,000.
For example, to create a two-level partition table with the date as the level one partition and the region as the level two partition:
create table src (key string, value bigint) partitioned by (pt string,region string);
When querying, use the partition column as a filter condition in the WHERE condition filter:
select * from src where pt='20170601' and region='hangzhou'; -- This example is the correct method of using WHERE conditional filter. When MaxCompute generates a query plan, only data of the region 'hangzhou' under the '20170601' partition is accessed.
select * from src where pt = 20170601; -- This example is an incorrect method of using the WHERE conditional filter. In this example, the effectiveness of the partition filter cannot be guaranteed. Pt is a STRING type. When the STRING type is compared with BIGINT type (20170601), MaxCompute converts both to DOUBLE type, and loss of precision occurs.
Some SQL operations on the partitions are less efficient and may cause higher billing, for example, using dynamic partition.