All Products
Search
Document Center

AnalyticDB:CREATE TABLE: Create partitioned and replicated tables

Last Updated:May 21, 2026

This topic explains how to use the CREATE TABLE syntax in AnalyticDB for MySQL to create partitioned and replicated tables and define their distribution keys, partition keys, indexes, lifecycle, and tiered storage.

Data distribution strategy

Before creating a table, use this diagram to understand key concepts such as shards, partitions, and clustered indexes.

image

Syntax

CREATE TABLE [IF NOT EXISTS] table_name
  ({column_name column_type [column_attributes] [ column_constraints ] [COMMENT 'column_comment']
  | table_constraints}
  [, ... ])
  [table_attribute]
  [partition_options]
  [index_all]
  [storage_policy]
  [block_size]
  [engine]
  [table_properties]
  [AS query_expr]
  [COMMENT 'table_comment']

column_attributes:
  [DEFAULT {constant | CURRENT_TIMESTAMP}]
  [AUTO_INCREMENT]

column_constraints:
  [{NOT NULL|NULL} ]
  [PRIMARY KEY]

table_constraints:
  [{INDEX|KEY} [index_name] (column_name|column_name->'$.json_path'|column_name->'$[*]')][,...]
  [FULLTEXT [INDEX|KEY] [index_name] (column_name) [index_option]] [,...]
  [PRIMARY KEY [index_name] (column_name,...)]
  [CLUSTERED KEY [index_name] (column_name[ASC|DESC],...) ]
  [[CONSTRAINT [symbol]] FOREIGN KEY (fk_column_name) REFERENCES pk_table_name (pk_column_name)][,...]
  [ANN INDEX [index_name] (column_name,...) [index_option]] [,...]

table_attribute:
  DISTRIBUTED BY HASH(column_name,...) | DISTRIBUTED BY BROADCAST

partition_options:
  PARTITION BY 
        {VALUE(column_name) | VALUE(DATE_FORMAT(column_name, 'format')) | VALUE(FROM_UNIXTIME(column_name, 'format'))}
  LIFECYCLE N
  
 index_all:
 INDEX_ALL= 'Y|N'

storage_policy:
  STORAGE_POLICY= {'HOT'|'COLD'|'MIXED' {hot_partition_count=N}}

block_size:
  BLOCK_SIZE= VALUE

engine:
  ENGINE= 'XUANWU|XUANWU_V2'
Note

Internal tables in AnalyticDB for MySQL use the zstd compression algorithm by default.

Parameters

Parameters

Parameter

Description

table_name

The table name must be 1 to 127 characters long, start with a letter or an underscore (_), and contain only letters, digits, and underscores (_).

Use the db_name.table_name format to create a table in a specific database.

column_name

The column name must be 1 to 127 characters long, start with a letter or an underscore (_), and contain only letters, digits, and underscores (_).

column_type

The data type of the column. For the data types supported by AnalyticDB for MySQL, see Basic data types and Complex data types.

COMMENT

A comment for the column or table.

Column attributes

DEFAULT {constant | CURRENT_TIMESTAMP}

Sets a default value for a column. You can specify a constant or use the CURRENT_TIMESTAMP function. Other functions and variable expressions are not supported.

If you do not specify a default value, the column's default value is NULL.

AUTO_INCREMENT

Defines an auto-increment column. The data type of an auto-increment column must be BIGINT.

AnalyticDB for MySQL assigns a unique value to an auto-increment column. However, the values are not guaranteed to be sequential and may not start from 1.

Important
  • When you insert data into a table that contains an auto-increment column, explicitly specify the column names in the statement. For example: INSERT INTO table (column1,column2) VALUES (value1,value2). This practice prevents errors such as Insert query has mismatched column sizes caused by a mismatched column count or order.

  • In distributed systems, when writing data with an INSERT INTO SELECT statement, the auto-increment feature guarantees unique values only within a single ETL process.

Column constraints

NOT NULL

For a column defined as NOT NULL, the value cannot be NULL. If a column is not defined as NOT NULL or is defined as NULL, the value can be NULL.

PRIMARY KEY

When defined as a column constraint, a primary key applies to a single column, for example, id BIGINT NOT NULL PRIMARY KEY. To create a composite primary key from multiple columns, you must define it as a table constraint.

Table constraints (indexes)

AnalyticDB for MySQL supports various types of indexes, including regular indexes (INDEX), primary key indexes, clustered indexes, foreign key indexes, full-text indexes, and vector indexes. A table can have one or more indexes.

INDEX | KEY

Defines a regular index. INDEX and KEY are synonyms.

  • For a XUANWU_V2 table, indexes are not created on all columns by default. If the table has a primary key, a regular index is created by default only on that key.

  • For a XUANWU table, the system creates indexes on all columns by default. However, if you manually create an index on specific columns (for example, INDEX (id)) when you create a XUANWU table, AnalyticDB for MySQL will not automatically create indexes on the other columns in the table.

Note: Composite indexes are not supported. For example, the syntax INDEX (column1,column2) is not supported.

PRIMARY KEY

Defines a primary key index.

Basic usage

  • Each table can have only one primary key.

  • A primary key can consist of a single column, such as PRIMARY KEY (id), or multiple columns, such as PRIMARY KEY (id,name).

  • The primary key must include the distribution key and partition key. Place the distribution key and partition key at the beginning of the primary key definition.

Notes

  • You cannot perform DELETE and UPDATE operations on a table that does not have a primary key.

  • If you do not define a primary key, the following behaviors apply:

    • If you do not define a primary key and a distribution key, AnalyticDB for MySQL automatically adds a column named __adb_auto_id__ and uses it as both the primary key and the distribution key.

    • If you define a distribution key but not a primary key, AnalyticDB for MySQL does not automatically add a primary key.

  • After a table is created, you cannot add, remove, or modify the primary key columns.

Tuning recommendations: For better performance, use numeric columns for the primary key and minimize the number of columns in the key.

Note

Including too many columns in a primary key can cause the following issues:

  • Increased CPU and I/O resource consumption because AnalyticDB for MySQL checks for duplicate primary key values during data writes.

  • Increased disk space usage by the primary key index. You can use the Storage Analysis feature to view the disk space used by the primary key index.

  • Slower BUILD jobs.

CLUSTERED KEY

Defines a clustered index. A clustered index is configured at the partition level and determines the physical storage order of data. Within each partition, data is sorted by the clustered index key values and stored sequentially, in ascending order by default. The system stores data with the same or similar key values in the same or adjacent data blocks. For range queries or equivalence filtering, if the query condition matches the clustered index column, the storage engine can quickly read contiguous data blocks. This reduces disk I/O and improves data read speed.

Clustered index diagram

image

Scenarios

A clustered index is suitable for both range queries and equivalence filtering. Consider creating a clustered index on columns that are frequently used in range query or equivalence filtering conditions.

When query conditions fully or partially match the clustered index columns, data read efficiency improves. For example, in SaaS applications where users typically access only their own data, you can use the user ID as a clustered index key. This ensures that data for the same user is stored contiguously in the same or adjacent data blocks, which results in faster data reads.

Basic usage

  • Each table can have only one clustered index.

  • You can create a clustered index on a single column, such as CLUSTERED KEY index(id), or on multiple columns, such as CLUSTERED KEY index(id,name). When a clustered index consists of multiple columns, the system first sorts the data by the values in the first column. For rows with the same value in the first column, the data is then sorted by the values in the second column. Therefore, CLUSTERED KEY index(id,name) is different from CLUSTERED KEY index(name,id).

  • By default, a clustered index sorts data in ascending order and is suitable for ascending queries. If your queries require descending order, set the clustered index to descending order when you create the table, for example, CLUSTERED KEY index(id) DESC. If the table already exists, you can delete the existing clustered index and then create a new one in descending order.

  • Do not use columns with long values, such as strings of tens of kilobytes, as clustered index keys as this can degrade sorting performance.

FULLTEXT INDEX | FULLTEXT KEY

Defines a full-text index.

Syntax and parameters

Syntax: [FULLTEXT [INDEX|KEY] [index_name] (column_name) [index_option]] [,...]

Parameters

  • index_name: The name of the full-text index.

  • column_name: The column to index. The column must be of the VARCHAR type.

  • index_option: Optional. Specifies the analyzer and custom dictionary for the full-text index.

FOREIGN KEY

Use foreign key indexes to eliminate unnecessary joins.

Syntax and parameters

Version requirements

Your AnalyticDB for MySQL cluster must run kernel version 3.1.10 or later.

Note

To view and update the minor version, go to the Configuration Information section on the Cluster Information page in the AnalyticDB for MySQL console.

Syntax: [[CONSTRAINT [symbol]] FOREIGN KEY (fk_column_name) REFERENCES pk_table_name (pk_column_name)][,...]

Parameters

  • symbol: Optional. The name of the foreign key constraint, which must be unique within the table. If you do not specify a name, the parser automatically appends the _fk suffix to the foreign key column name and uses the result as the constraint name.

  • fk_column_name: The name of the foreign key column. This column must be defined in the table creation statement.

  • pk_table_name: The name of the referenced primary table. This table must already exist.

  • pk_column_name: The name of a primary key column in the referenced primary table.

Basic usage

  • Each table can have multiple foreign key indexes.

  • Composite foreign key indexes are not supported. For example, the syntax FOREIGN KEY (sr_item_sk, sr_ticket_number) REFERENCES store_sales(ss_item_sk,d_date_sk) is not supported.

  • AnalyticDB for MySQL does not perform data constraint checks. You must ensure the data constraint between the primary key of the parent table and the foreign key of the child table.

  • You cannot create foreign key constraints on external tables.

ANN INDEX

Defines a vector index.

Note: You cannot create vector indexes on XUANWU_V2 tables.

Syntax and parameters

Syntax: [ANN INDEX [index_name] (column_name,...) [index_option]] [,...]

Parameters

  • index_name: The name of the vector index.

  • column_name: The name of the vector column. Its data type must be array<float>, array<smallint>, or array<byte>, and you must specify its dimension. Example: feature array<float>(4).

  • index_option: The properties of the vector index.

    • algorithm: The algorithm for vector distance calculation. Only HNSW_PQ is supported. This algorithm is suitable for medium-sized datasets that contain millions to tens of millions of records per table and are sensitive to vector dimensions.

    • dis_function: The formula for vector distance calculation. Only SquaredL2 is supported. Formula: (x1-y1)^2+(x2-y2)^2+….

JSON INDEX

Defines a JSON index or a JSON array index.

Syntax and parameters

JSON index

Version requirements

  • For clusters running a kernel version of 3.1.5.10 or later, the system does not automatically create JSON indexes after you create a table. You must create them manually.

  • For clusters running a kernel version earlier than 3.1.5.10, the system automatically creates a JSON index on JSON columns after you create a table.

Note

To view and update the minor version, go to the Configuration Information section on the Cluster Information page in the AnalyticDB for MySQL console.

Syntax: [INDEX [index_name] (column_name|column_name->'$.json_path')]

Parameters

  • index_name: The name of the index.

  • column_name|column_name->'$.json_path':

    • column_name: The JSON column to index.

    • column_name->'$.json_path': The JSON column and a specific property key to index. A JSON index can target only one property key within a JSON column.

      Important
      • Only clusters with a kernel version of 3.1.6.8 or later support column_name->'$.json_path'.

        To view and update the minor version, go to the Configuration Information section on the Cluster Information page in the AnalyticDB for MySQL console.

      • When creating an index on a specific property key within a JSON column, if a regular index (INDEX) already exists on that column, you must delete the existing index first. Otherwise, the system reports an error.

JSON array index

Version requirements

Only clusters running a kernel version 3.1.10.6 or later support JSON array indexes.

Note

To view and update the minor version, go to the Configuration Information section on the Cluster Information page in the AnalyticDB for MySQL console.

Syntax: [INDEX [index_name] (column_name->'$[*]')]

Parameters

  • index_name: The name of the index.

  • column_name->'$[*]': The column on which to create the JSON array index. For example, vj->'$[*]' creates a JSON array index on the vj column.

Table attribute (distribution key)

The table_attribute determines whether a table is a standard table or a replicated table.

  • DISTRIBUTED BY HASH: Specifies that the table is a standard table. Standard tables leverage a distributed system to improve query efficiency. They can store large amounts of data, typically from tens of millions to hundreds of billions of rows.

  • DISTRIBUTED BY BROADCAST: Specifies that the table is a replicated table. A replicated table stores a full copy of its data on each shard in the cluster. We recommend that each replicated table contain no more than 20,000 rows.

DISTRIBUTED BY HASH (column_name,...)

Specifies the distribution key of a table. A standard table with a distribution key is known as a partitioned table. AnalyticDB for MySQL applies a hash function to the distribution key values and distributes rows across different shards based on the resulting hash values. This approach improves scalability and query performance.

Sharding example

image

Basic usage

  • Each table can have only one distribution key.

  • A distribution key can consist of one or more columns.

  • The columns in the distribution key must be included in the primary key. For example, if the distribution key is customerid, the primary key must also include customerid.

Usage notes

  • If you do not specify a distribution key when you create a table, the system handles it as follows based on whether the table has a primary key:

    • If the table has a primary key, AnalyticDB for MySQL uses the primary key as the distribution key by default.

    • If the table does not have a primary key, AnalyticDB for MySQL automatically adds a column named __adb_auto_id__ and uses it as both the primary key and the distribution key.

  • After a table is created, you cannot add, remove, or change the distribution key columns. To modify the distribution key, create a new table and migrate the data.

Recommendations

  • Use as few columns as possible for the distribution key to make it more versatile for various complex queries.

  • Choose columns that are frequently used in query conditions and have high cardinality (evenly distributed values) as the distribution key. Good examples include transaction IDs, device IDs, user IDs, or auto-increment columns. However, avoid columns that are always filtered with a constant value. For example, if a column a is frequently queried but the condition is always a=3, using it as the distribution key will cause data skew, making it an unsuitable choice.

  • Use columns that are used for joins as the distribution key. When two tables are joined, if they share the same distribution key (the join column), data with the same key values is co-located on the same shard. This allows join operations to be performed locally without moving data between shards, reducing data redistribution and improving query performance. For example, if you need to view order history by customer, you can choose customer_id as the distribution key.

  • Avoid using date, time, or timestamp columns as the distribution key. These columns can cause data skew during writes and reduce write performance. Moreover, queries are often filtered by a specific date or time range. If a time-based column is the distribution key, all data for a query might reside on a single node, preventing the query from being processed in parallel across the cluster. We recommend that you use date and time columns as the partition key instead.

  • You can use the storage diagnostics feature to check if the distribution key is effective and to identify data skew.

DISTRIBUTED BY BROADCAST

Specifies a replicated table. A replicated table stores a full copy of its data on each shard of the cluster. We recommend that you do not store a large amount of data in a replicated table.

Advantages: During join operations, the data from a replicated table does not need to be moved across shards. In high-concurrency scenarios, this can significantly reduce network overhead and improve cluster stability.

Disadvantages: When data in a replicated table is changed (due to INSERT, UPDATE, or DELETE operations), the changes are broadcast to all shards to ensure data consistency across all copies. This impacts overall write performance, so we recommend avoiding frequent modifications to replicated tables.

Partition key and lifecycle

If a shard contains a large amount of data after setting a distribution key, you can define a partition key to divide the data on the shard into partitions. This accelerates data filtering and improves query performance.

Why define partitions?

  • Partitions accelerate data filtering and improve query performance.

    • Partition pruning. Queries scan only relevant partitions and skip irrelevant ones. This reduces the amount of data scanned and improves query speed.

    • Improved index scan performance. The efficiency of an index scan decreases if the index covers too many rows, for example, more than 50 million. Indexes are partition-level, meaning each partition has its own independent index. If a table is not partitioned, all its data resides in a single partition. When the data volume exceeds tens of millions of rows, index scan efficiency drops. Partitioning a table distributes data across partitions in different shards. This allows you to control the number of rows in each partition's index, keeping it within tens of millions to ensure optimal scan performance.

    • Improved BUILD efficiency. The BUILD operation converts real-time data into historical data. This process involves creating partitions, building indexes, and cleaning up redundant data. New indexes take effect only after the BUILD operation is complete. If a table is not partitioned, each BUILD operation processes the entire table. The larger the table, the longer the BUILD operation takes, delaying new index activation and degrading query performance. When a table is partitioned, the BUILD operation runs only on partitions with data changes, which shortens the BUILD time.

  • You can use partitions with the LIFECYCLE clause to manage the data lifecycle.

  • You can use partitions with a storage policy (storage_policy) to tier hot and cold data.

Data partitioning and lifecycle management

image

PARTITION BY

Specifies the partition key.

Syntax: PARTITION BY VALUE {(column_name)|(DATE_FORMAT(column_name, 'format'))|(FROM_UNIXTIME(column_name, 'format'))} LIFECYCLE n

Parameters:

  • column_name: The partition key. PARTITION BY VALUE(column_name) indicates that the values in the column_name column are used to partition the data. The data type of the partition key can be a numeric type, datetime type, or a string that represents a number.

  • DATE_FORMAT(column_name, 'format')|FROM_UNIXTIME(column_name, 'format'): Uses the DATE_FORMAT or FROM_UNIXTIME function to convert a datetime column to a specified date format for partitioning. The format parameter supports only year, month, and day formats, such as %Y, %y, %Y%m, %y%m, %Y%m%d, and %y%m%d. After table creation, you can modify the format using an ALTER TABLE statement.

    • Use the DATE_FORMAT function when the column is of the BIGINT, TIMESTAMP, DATETIME, or VARCHAR data type. For a BIGINT column, the value must be a millisecond-level UNIX timestamp, such as 1734278400000. For TIMESTAMP, DATETIME, and VARCHAR columns, the value must be in a format like "2024-11-26 00:01:02".

    • Use the FROM_UNIXTIME function when the column is of the INT data type. The value must be a second-level UNIX timestamp, such as 1696266000.

Usage notes

  • For clusters with a kernel version earlier than 3.2.1.0, when you use PARTITION BY to define partitions, you must also define a lifecycle (LIFECYCLE n). Otherwise, an error occurs.

  • For clusters with kernel version 3.2.1.0 or later, when you use PARTITION BY to define partitions, the lifecycle (LIFECYCLE n) is optional. If you omit this parameter, the system does not clean up partition data.

  • After a table is created, you cannot add a partition key, or add, remove, or modify columns in the partition key. To add or modify a partition key, you must create a new table and migrate the data. For more information, see ALTER TABLE.

Recommendations

  • We recommend that you use a datetime type field as the partition key.

  • Partitions that are too large or too small can degrade query and write performance, and even impact cluster stability. For recommendations on the number of rows per partition and how to validate your partitioning strategy, see Diagnose a partition table.

  • Avoid frequently updating data in historical partitions. For example, if you frequently update multiple historical partitions each day, you should re-evaluate whether your partition key is appropriate.

LIFECYCLE n

Use the LIFECYCLE clause with PARTITION BY to manage the partition lifecycle. AnalyticDB for MySQL sorts partitions in descending order based on the partition key value, retains the top n partitions, and deletes any partitions that fall outside this range. You can use LIFECYCLE to define the data retention period.

  • For kernel versions earlier than 3.2.1.1, LIFECYCLE n specifies that each shard can retain a maximum of n partitions. When managing partition lifecycles at the shard-level, the total number of retained partitions might exceed n if data is unevenly distributed or the data volume is extremely small.

  • For kernel version 3.2.1.1 and later, for tables that use the XUANWU engine: New tables created after an upgrade use table-level lifecycle management, where LIFECYCLE n specifies that each table retains a maximum of n partitions. Tables created before the upgrade continue to use shard-level lifecycle management, where LIFECYCLE n specifies that each shard retains a maximum of n partitions.

  • Tables that use the XUANWU_V2 engine continue to use shard-level lifecycle management, where LIFECYCLE n specifies that each shard retains a maximum of n partitions. The XUANWU_V2 engine does not yet support table-level lifecycle management.

Example

The clause PARTITION BY VALUE (DATE_FORMAT(date, '%Y%m%d')) LIFECYCLE 30 partitions data by converting the date column to the yyyyMMdd format and retains a maximum of 30 partitions. Assume that data from day 1 is written to partition 20231201, data from day 2 to partition 20231202, and so on, until data from day 30 is written to partition 20231230. When data for day 31 is written to partition 20231231, the oldest partition (20231201) is automatically deleted to maintain a maximum of 30 partitions.

index_all

Specifies whether to create an index on all columns.

Valid values:

  • Y: Creates an index on all columns. The default value for a XUANWU table is Y.

  • N: Creates an index on the primary key only. The default value for a XUANWU_V2 table is N.

storage_policy

For Enterprise Edition, Basic Edition, and Data Lakehouse Edition clusters and Data Warehouse Edition in elastic mode clusters (new), you can specify a storage policy. Different storage policies offer varying levels of read/write performance and storage costs.

Valid values:

  • hot (default): hot storage. This policy stores all table partitions on SSDs. This policy provides the best performance at the highest storage cost.

  • cold: cold storage. This policy stores all table partitions in Object Storage Service (OSS). Compared to hot storage, this policy offers lower performance at the lowest storage cost.

  • mixed: tiered storage. This policy balances storage costs with query performance by storing frequently accessed data (hot data) on SSDs and infrequently accessed data (cold data) in OSS. To use the mixed policy, you must define partitions with the PARTITION BY clause and specify the number of hot partitions with the hot_partition_count parameter. If no partitions are defined, the mixed policy has no effect, and the system stores all data on SSDs.

    Tiered Storage Diagram

    image

hot_partition_count

When STORAGE_POLICY='mixed' is set, use hot_partition_count=n (where n is a positive integer) to define the number of hot partitions. AnalyticDB for MySQL sorts all partitions in descending order based on their partition key values. The n largest partitions are hot partitions, and the remaining ones are cold partitions.

Note

You cannot specify hot_partition_count=n if STORAGE_POLICY is not 'mixed'. If you attempt to do so, an error is returned.

BLOCK_SIZE

A data block is the minimum I/O unit for data reads and writes. The BLOCK_SIZE parameter specifies the number of rows that each data block contains in column-oriented storage. Adjusting this value changes the number of rows read per I/O operation, and the impact on performance depends on your query patterns. For example, for a point query, a large BLOCK_SIZE can reduce read efficiency because the system must read an entire block even to retrieve only a few rows. In this case, you can improve performance by setting a smaller BLOCK_SIZE.

Default values:

  • 4096: The default value for a replicated table.

  • 8192: The default value for new cluster editions in elastic mode and for Standalone Edition with fewer than 32 compute cores.

  • 32760: The default value in all other cases. When this default is used, a SHOW CREATE TABLE statement does not display the BLOCK_SIZE setting.

Important

If you are not familiar with the principles of column-oriented storage, avoid changing the BLOCK_SIZE value.

Engine

Specifies the storage engine for internal tables in AnalyticDB for MySQL used for historical data analysis.

  • For kernel versions earlier than 3.2.2.0, the value is XUANWU. This is the default value if you do not specify ENGINE when creating a table.

    Important

    For kernel versions earlier than 3.1.9.5, if you explicitly specify ENGINE='XUANWU' when you create an internal table, you must also explicitly specify table_properties='{"format":"columnstore"}'. Otherwise, the table creation fails.

  • For kernel versions 3.2.2.0 and later, the valid values are as follows:

    • If RC_DDL_ENGINE_REWRITE_XUANWUV2 is set to true, the value must be XUANWU_V2.

    • If RC_DDL_ENGINE_REWRITE_XUANWUV2 is set to false, you can use XUANWU_V2 or XUANWU.

    You can run the SHOW ADB_CONFIG KEY=RC_DDL_ENGINE_REWRITE_XUANWUV2; command to check the value of this parameter. You can also modify the value of RC_DDL_ENGINE_REWRITE_XUANWUV2 at the cluster level or table level.

CTAS

The CREATE TABLE AS query_expr statement creates and populates a table with the result set of a SELECT query. For detailed usage, see CREATE TABLE AS SELECT (CTAS).

Examples

Automatic partitioning by date

Create a partitioned table named sales that is automatically partitioned by the date in the sale_time column.

CREATE TABLE sales (
  sale_id BIGINT NOT NULL COMMENT 'Order ID',
  customer_id VARCHAR NOT NULL COMMENT 'Customer ID',
  phone_num BIGINT NOT NULL COMMENT 'Customer phone number',
  revenue DECIMAL(15, 2) COMMENT 'Order amount',
  sale_time TIMESTAMP NOT NULL COMMENT 'Order time',
  PRIMARY KEY (sale_time,sale_id)
 )
DISTRIBUTED BY HASH(sale_id)
PARTITION BY VALUE(DATE_FORMAT(sale_time, '%Y%m%d'));                   

Partitioned table with a lifecycle

Create a partitioned table named customer. The columns login_time, customer_id, and phone_num form a composite primary key. The distribution key is customer_id, the partition key is login_time, and the lifecycle is set to 30.

Partitions are sorted in descending order by the partition key login_time (e.g., 20231202, 20231201). The system retains only the 30 partitions with the largest key values. When data is written to a 31st partition, the system automatically deletes the partition with the smallest key value.

For example, assume data is written with a login_time of 20231201 on day 1, 20231202 on day 2, and so on. When data with a login_time of 20231231 is written on day 31, the system automatically deletes the partition with the oldest data ('20231201'). This retains only data from the most recent 30 days.

CREATE TABLE customer (
  customer_id BIGINT NOT NULL COMMENT 'Customer ID',
  customer_name VARCHAR NOT NULL COMMENT 'Customer name',
  phone_num BIGINT NOT NULL COMMENT 'Phone number',
  city_name VARCHAR NOT NULL COMMENT 'City',
  sex INT NOT NULL COMMENT 'Gender',
  id_number VARCHAR NOT NULL COMMENT 'ID number',
  home_address VARCHAR NOT NULL COMMENT 'Home address',
  office_address VARCHAR NOT NULL COMMENT 'Office address',
  age INT NOT NULL COMMENT 'Age',
  login_time TIMESTAMP NOT NULL COMMENT 'Login time',
  PRIMARY KEY (login_time,customer_id,phone_num)
 )
DISTRIBUTED BY HASH(customer_id)
PARTITION BY VALUE(DATE_FORMAT(login_time, '%Y%m%d')) LIFECYCLE 30
COMMENT 'Customer information table';                   

Non-partitioned table

Default behavior: Primary key as distribution key

If you define a primary key for a table but do not specify a distribution key, AnalyticDB for MySQL uses the primary key as the distribution key by default.

CREATE TABLE orders (
  order_id BIGINT NOT NULL COMMENT 'Order ID',
  customer_id INT NOT NULL COMMENT 'Customer ID',
  order_status VARCHAR(1) NOT NULL COMMENT 'Order status',
  total_price DECIMAL(15, 2) NOT NULL COMMENT 'Order amount',
  order_date DATE NOT NULL COMMENT 'Order date',
  PRIMARY KEY(order_id,order_date)
);

Run the SHOW CREATE TABLE statement to verify the result. The output shows that the primary key columns order_id and order_date also serve as the distribution key.

SHOW CREATE TABLE orders;
+---------+-----------------------------------------------------------------------------------------------------------------------------------------------+
| Table   | Create Table                                                                                                                                  | 
+---------+-----------------------------------------------------------------------------------------------------------------------------------------------+
| orders  | CREATE TABLE `orders` (                                                                                                                       |
|         | `order_id` bigint NOT NULL COMMENT 'Order ID',                                                                                                   |
|         | `customer_id` int NOT NULL COMMENT 'Customer ID',                                                                                                   |
|         | `order_status` varchar(1) NOT NULL COMMENT 'Order status',                                                                                         | 
|         | `total_price` decimal(15, 2) NOT NULL COMMENT 'Order amount',                                                                                      |
|         | `order_date` date NOT NULL COMMENT 'Order date',                                                                                                 |
|         | PRIMARY KEY (`order_id`,`order_date`)                                                                                                         |
|         | ) DISTRIBUTED BY HASH(`order_id`,`order_date`) INDEX_ALL='Y' STORAGE_POLICY='HOT' ENGINE='XUANWU' TABLE_PROPERTIES='{"format":"columnstore"}'  |
+---------+-----------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.04 sec)

Default behavior: Auto-added primary and distribution keys

If you create a table without specifying a primary key or a distribution key, AnalyticDB for MySQL automatically adds a column named __adb_auto_id__ and uses it as both the primary key and the distribution key.

CREATE TABLE orders_new (
  order_id BIGINT NOT NULL COMMENT 'Order ID',
  customer_id INT NOT NULL COMMENT 'Customer ID',
  order_status VARCHAR(1) NOT NULL COMMENT 'Order status',
  total_price DECIMAL(15, 2) NOT NULL COMMENT 'Order amount',
  order_date DATE NOT NULL COMMENT 'Order date'
);

Run the SHOW CREATE TABLE statement to verify the result. The output shows that AnalyticDB for MySQL added an auto-increment column named __adb_auto_id__ and set it as both the primary key and distribution key.

SHOW CREATE TABLE orders_new;
+-------------+-----------------------------------------------------------------------------------------------------------------------------------------------+
| Table       | Create Table                                                                                                                                  | 
+-------------+-----------------------------------------------------------------------------------------------------------------------------------------------+
| orders_new  | CREATE TABLE `orders_new` (                                                                                                                   |
|             | `__adb_auto_id__` bigint AUTO_INCREMENT,                                                                                                      |
|             | `order_id` bigint NOT NULL COMMENT 'Order ID',                                                                                                   |
|             | `customer_id` int NOT NULL COMMENT 'Customer ID',                                                                                                   |
|             | `order_status` varchar(1) NOT NULL COMMENT 'Order status',                                                                                         | 
|             | `total_price` decimal(15, 2) NOT NULL COMMENT 'Order amount',                                                                                      |
|             | `order_date` date NOT NULL COMMENT 'Order date',                                                                                                 |
|             | PRIMARY KEY (`__adb_auto_id__`)                                                                                                               |
|             | ) DISTRIBUTED BY HASH(`__adb_auto_id__`) INDEX_ALL='Y' STORAGE_POLICY='HOT' ENGINE='XUANWU' TABLE_PROPERTIES='{"format":"columnstore"}'        |
+-------------+-----------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.04 sec)

Specify primary and distribution keys

Create a table named supplier. The column supplier_id is an auto-increment column. The distribution key is supplier_id, and data is sharded by hashing the supplier_id values.

CREATE TABLE supplier (
  supplier_id BIGINT AUTO_INCREMENT PRIMARY KEY,
  supplier_name VARCHAR,
  address INT,
  phone VARCHAR
) 
DISTRIBUTED BY HASH(supplier_id);

Tiered storage policies

Cold storage policy

CREATE TABLE item (
  order_id BIGINT NOT NULL,
  item_id INT NOT NULL,
  quantity DECIMAL(15, 2) NOT NULL,
  discount DECIMAL(15, 2) NOT NULL,
  shipdate DATE NOT NULL,
  PRIMARY KEY (order_id,item_id,shipdate)
) 
DISTRIBUTED BY HASH(item_id) 
PARTITION BY VALUE(date_format(shipdate, '%Y%m')) LIFECYCLE 200 
STORAGE_POLICY='COLD';

Hot storage policy

CREATE TABLE item (
  order_id BIGINT NOT NULL,
  item_id INT NOT NULL,
  quantity DECIMAL(15, 2) NOT NULL,
  discount DECIMAL(15, 2) NOT NULL,
  shipdate DATE NOT NULL,
  PRIMARY KEY (order_id,item_id,shipdate)
) 
DISTRIBUTED BY HASH(item_id) 
PARTITION BY VALUE(date_format(shipdate, '%Y%m')) LIFECYCLE 200 
STORAGE_POLICY='HOT';

Mixed policy with 16 hot partitions

CREATE TABLE item (
  order_id BIGINT NOT NULL,
  item_id INT NOT NULL,
  quantity DECIMAL(15, 2) NOT NULL,
  discount DECIMAL(15, 2) NOT NULL,
  shipdate DATE NOT NULL,
  PRIMARY KEY (order_id,item_id,shipdate)
) 
DISTRIBUTED BY HASH(item_id) 
PARTITION BY VALUE(date_format(shipdate, '%Y%m')) LIFECYCLE 200  
STORAGE_POLICY='MIXED' HOT_PARTITION_COUNT=16;

Create regular indexes on specific columns

This example creates regular indexes on the id and date columns.

CREATE TABLE index_tb (
  id INT,
  sales DECIMAL(15, 2),
  date DATE,
  INDEX (id),
  INDEX (date),
  PRIMARY KEY (id)
) 
DISTRIBUTED BY HASH(id);

Define a clustered index

Create a clustered index named clustered_index on the quantity column.

CREATE TABLE clustered (
  product_id INT,
  product_name VARCHAR,
  quantity INT,        
  price DECIMAL(10, 2),
  CLUSTERED KEY INDEX clustered_index(quantity)
)
DISTRIBUTED BY HASH(product_id);

Full-text index

Create a full-text index named fidx_c on the content column.

CREATE TABLE fulltext_tb (
  id INT,
  content VARCHAR,
  keyword VARCHAR,
  FULLTEXT INDEX fidx_c(content),
  PRIMARY KEY (id)
) 
DISTRIBUTED BY HASH(id);

For more information about creating and modifying a full-text index, see Create a full-text index.

For more information, see Full-text search.

Define a vector index

Define short_feature (type array<smallint>) and float_feature (type array<float>) as vector columns with a dimension of 4.

Create the vector index short_feature_index on the short_feature column and float_feature_index on the float_feature column.

CREATE TABLE fact_tb (  
  xid BIGINT NOT NULL,  
  cid BIGINT NOT NULL,  
  uid VARCHAR NOT NULL,  
  vid VARCHAR NOT NULL,  
  wid VARCHAR NOT NULL,  
  short_feature array<smallint>(4),  
  float_feature array<float>(4),  
  ann index short_feature_index(short_feature), 
  ann index float_feature_index(float_feature),  
  PRIMARY KEY (xid, cid, vid)
) 
DISTRIBUTED BY HASH(xid) PARTITION BY VALUE(cid) LIFECYCLE 4;

For details on vector indexing and vector search, see Vector search.

Define a foreign key

Create a table named store_returns. Use the FOREIGN KEY clause to associate the sr_item_sk column with the primary key customer_id of the customer table.

CREATE TABLE store_returns (
  sr_sale_id BIGINT NOT NULL PRIMARY KEY,
  sr_store_sk BIGINT,
  sr_item_sk BIGINT NOT NULL,
  FOREIGN KEY (sr_item_sk) REFERENCES customer (customer_id)
);

JSON array index

Create a JSON array index named idx_vj on the vj column.

CREATE TABLE json(
  id INT,
  vj JSON,
  INDEX idx_vj(vj->'$[*]')
)
DISTRIBUTED BY HASH(id);

For details on creating and modifying JSON array indexes, see Create a JSON array index and JSON array indexes.

FAQ

Column attributes and constraints

Do values in an auto-increment column start from 1? Are they unique?

Values in an auto-increment column are not sequential and do not start from 1. However, they are guaranteed to be unique.

Distribution key, partition key, and lifecycle

What is the difference between a distribution key and a partition key?

Data is distributed across shards based on the hash of distribution key values. Within each shard, data is further divided into partitions based on partition key values. The following diagram illustrates this process.

image

Must I specify a distribution key when I create a table?

  • Specifying a distribution key is optional when you create a partitioned table. If you do not specify a distribution key, AnalyticDB for MySQL uses the primary key as the distribution key. If no primary key is specified, the system automatically creates a column named __adb_auto_id__ and uses it as both the distribution key and the primary key.

  • You do not need to specify a distribution key when creating a replicated table. However, you must specify DISTRIBUTED BY BROADCAST to indicate that the system stores a full copy of the data on each storage node.

Does changing the cluster specifications alter the number of shards?

No. Changing the cluster specifications does not alter the number of shards.

How do I query the partition information of a table?

To query the partition information of a table, run the following SQL statement:

SELECT partition_id, -- The name of the partition.
 row_count, -- The total number of rows in the partition.
 local_data_size, -- The size of the local storage occupied by the partition.
 index_size, -- The size of the index in the partition.
 pk_size, -- The size of the primary key index in the partition.
 remote_data_size -- The size of the remote storage occupied by the partition.
FROM information_schema.kepler_partitions
WHERE schema_name = '$DB'
 AND table_name ='$TABLE' 
 AND partition_id > 0;

Why can't I find partition information after I create a partitioned table?

This issue occurs for the following reasons:

  • When you create a table, you only define a partitioning rule by specifying a partition key; the system does not create partitions immediately. The system creates partitions based on the values of the partition key. If you have not written data to the table, the system creates no partitions because the partition key has no values.

  • Partition building is not a real-time process. You can view partition information only after the BUILD job for the written data is complete.

Solution:

Write data to the table and wait for the BUILD job to complete. You can then view the partition information.

How do I query data in a specific partition?

You can query data from a specific partition by using the filter condition WHERE <partition key name> = '<partition key value>'. A similar usage, such as SELECT * FROM table PARTITION(202304), is not supported.

The following example shows how to query data in a specific partition.

A partitioned table named orders_demo is created and partitioned by the order_date column. The following is a sample CREATE TABLE statement:

CREATE TABLE orders_demo (
  order_id BIGINT NOT NULL COMMENT 'Order ID',
  customer_id INT NOT NULL COMMENT 'Customer ID',
  order_status VARCHAR(1) NOT NULL COMMENT 'Order status',
  total_price DECIMAL(15, 2) NOT NULL COMMENT 'Order amount',
  order_date DATE NOT NULL COMMENT 'Order date',
  PRIMARY KEY(order_id,order_date)
)
DISTRIBUTED BY HASH(order_id) 
PARTITION BY VALUE(date_format(order_date, '%Y%m')) LIFECYCLE 30 ;

The following is sample code to insert 10 rows of data into the table:

INSERT INTO orders_demo (order_id, customer_id, order_status, total_price, order_date)
VALUES
  (1001, 1, 'C', 150.75, '2023-10-01'),
  (1002, 2, 'P', 200.50, '2023-10-01'),
  (1003, 3, 'S', 99.99, '2023-10-01'),
  (1004, 4, 'C', 300.00, '2023-10-01'),
  (1005, 5, 'P', 450.25, '2023-10-02'),
  (1006, 6, 'S', 120.00, '2023-10-02'),
  (1007, 7, 'C', 80.50, '2023-10-03'),
  (1008, 8, 'P', 600.00, '2023-10-03'),
  (1009, 9, 'S', 250.75, '2023-10-03'),
  (1010, 10, 'C', 199.99, '2023-10-14');

The following is a sample statement to manually build the partitioned table to create the partitions:

BUILD TABLE orders_demo;
A BUILD job is automatically triggered when a table meets certain conditions. In this example, we manually run the BUILD job to facilitate the subsequent steps. For more information, see automatic BUILD.

Query the status of the BUILD job. If the status field returns FINISH, the BUILD job for the orders_demo table is complete.

SELECT table_name, schema_name, status FROM INFORMATION_SCHEMA.KEPLER_META_BUILD_TASK WHERE table_name='ORDERS_DEMO';

In this example, the order_date partition key is of the DATE type. To query data in the partition for 2023-10-01, run the following SQL statement:

SELECT * FROM orders_demo WHERE order_date='2023-10-01';

If the partition key is of the DATETIME type, you must specify a time range such as WHERE order_date >= "2023-10-01 00:00:00" and order_date < "2023-10-02 00:00:00" to query data in the partition for 2023-10-01. Sample statement:

SELECT * FROM orders_demo WHERE order_date >= "2023-10-01 00:00:00" and order_date < "2023-10-02 00:00:00";

When I query a partitioned table, must I use the partition key as a filter condition?

No. However, using the partition key as a filter condition significantly improves query performance by allowing the system to scan only relevant partitions instead of the entire table.

What are the data type requirements for a partition key?

A partition key must be a numeric, date/time, or string type that represents a number. Using any other data type may cause data write errors.

The error message partition format function error indicates that the values written to the partition key do not meet the data type requirements.

Can I use functions other than DATE_FORMAT and FROM_UNIXTIME to define a partition key?

No. You can define a partition key only by using one of the following methods: PARTITION BY VALUE(column), PARTITION BY VALUE(DATE_FORMAT(column,'format')), or PARTITION BY VALUE(FROM_UNIXTIME(column,'format')). An error occurs if you use any other function.

Note

For more information about how to define a partition key, see partition_options (Partition key and lifecycle).

How do I view the lifecycle of a partitioned table?

Run the SHOW CREATE TABLE <table_name> statement to view the CREATE TABLE statement, which includes the lifecycle of the partitioned table.

I set the lifecycle to 30 days. Why can I still query data older than 30 days?

This issue occurs for the following reasons:

  • The partition has just expired, but the system has not deleted it yet. The system does not delete expired partition data immediately; it deletes the data only after the table's BUILD job completes.

  • For tables created in clusters that run a kernel version earlier than 3.2.1.1, LIFECYCLE defines the number of partitions to retain per shard. This issue can occur if the actual number of partitions on a shard is less than the specified LIFECYCLE value. The issue does not occur for tables created in clusters that run kernel version 3.2.1.1 or later.

    Examples:

    • Uneven data distribution: Assume that data is partitioned by date. Shard 1 contains partitions from 20231201 to 20231230, and Shard 2 contains partitions from 20231202 to 20231231. In this case, both shards have 30 partitions, which does not exceed the LIFECYCLE value of 30. Therefore, no partitions are deleted from either shard, and you can query data from 20231201 to 20231231.

    • No new data is written for an extended period: Assume that data is partitioned by date, and Shard 1 contains partitions 20231201, 20231202, 20231203, and 20231204. If no new partition data is written to the table after 20231204, Shard 1 has only four partitions. This number does not exceed the LIFECYCLE value of 30, so no partitions are deleted. You can still query data from 20231201 even after 20231231.

Is data in an expired partition deleted immediately?

No. The system does not create or delete partitions in real time. After a partition expires, the system deletes it only after the table's BUILD job completes.

Indexes

How do I query the clustered index of a table?

Run the SHOW CREATE TABLE statement to view the clustered index defined in the CREATE TABLE statement.

Does AnalyticDB for MySQL support unique indexes?

AnalyticDB for MySQL does not support the UNIQUE INDEX constraint. However, the primary key index in AnalyticDB for MySQL is a unique index and ensures that the primary key values are unique within the table.

Are composite indexes, such as INDEX(column1, column2), supported?

No. AnalyticDB for MySQL does not support composite indexes. An index can contain only a single column, such as INDEX(column1).

Column-oriented storage

What does TABLE_PROPERTIES='{"format":"columnstore"}' mean in a CREATE TABLE statement?

The clause TABLE_PROPERTIES='{"format":"columnstore"}' is a fixed value that specifies a column-oriented storage format for the data in the ENGINE. You do not need to manually specify this property when you create a table.

Does a table support using row-oriented storage for some partitions and column-oriented storage for others?

No.

Other

After a table is created, which parameters can be modified by using the ALTER TABLE statement?

You can use the ALTER TABLE statement to modify the following parameters:

  • table_name, column_name, column_type, and COMMENT

  • Add and delete columns, except for primary key columns

  • The default value of a column

  • Change NOT NULL to NULL

  • Add and delete INDEX indexes

  • The date format of a partition function

  • Lifecycle

  • Storage policy

For more information about the syntax, see ALTER TABLE.

You cannot modify other parameters after the table is created.

What is the maximum number of tables that can be created in a cluster?

The cluster edition determines the maximum number of tables that you can create in an AnalyticDB for MySQL cluster:

  • Enterprise Edition: 80000/(Number of shards/Number of reserved resource nodes/3). The value of Number of shards/Number of reserved resource nodes/3 is rounded up. You can add reserved resource nodes to increase the limit on the number of internal tables.

  • Basic Edition: 80000/(Number of shards/Number of reserved resource nodes). The value of (Number of shards/Number of reserved resource nodes) is rounded up. You can add reserved resource nodes to increase the limit on the number of internal tables.

  • Enterprise Edition, Basic Edition, Data Lakehouse Edition, and Data Warehouse Edition (Elastic Mode) clusters support a maximum of 500,000 external tables.

  • Data Lakehouse Edition: The limit on the number of internal tables is [80000/(Number of shards/Number of reserved storage resource groups)]*2. A group of reserved storage resources is 24 ACUs. For example, if a cluster has 48 ACUs of reserved storage resources, the number of groups is 2. You can scale up reserved storage resources to increase the limit on the number of internal tables.

  • Data Warehouse Edition (Elastic Mode): The limit on the number of internal tables is [80000/(Number of shards/Number of EIUs)]*2. The value of Number of shards/Number of EIUs is rounded up. EIU stands for elastic I/O unit. You can add EIUs to increase the limit on the number of internal tables.

  • Data Warehouse Edition (Reserved Mode) with 1 to 20 node groups: The limit on the number of internal tables is 80000/(Number of shards/Number of node groups). The value of Number of shards/Number of node groups is rounded up. You can add node groups to increase the limit on the number of internal tables.

Note

To query the number of shards, run the SELECT COUNT(1) FROM information_schema.kepler_meta_shards; statement. The number of shards cannot be changed.

What is the default character set of AnalyticDB for MySQL?

The default character set for AnalyticDB for MySQL is utf-8, which is equivalent to MySQL's utf8mb4 character set. Other character sets are not supported.

How do I determine whether a table is an internal or external table?

You can run the SHOW CREATE TABLE db_name.table_name; statement to view the DDL statement of the table. If the DDL statement does not contain the ENGINE parameter, or if the value of the ENGINE parameter is XUANWU or XUANWU_V2, the table is an internal table; otherwise, it is an external table.

Common errors

partition number must larger than 0

Cause: The CREATE TABLE statement defines a partition but does not specify its lifecycle.

The following CREATE TABLE statement causes this error:

CREATE TABLE test (
  id INT COMMENT '',
  name VARCHAR(10) COMMENT '',
  PRIMARY KEY (id, name)
) 
DISTRIBUTED BY HASH(id) PARTITION BY VALUE(name);

Solution: Specify the partition lifecycle in the CREATE TABLE statement. For example:

CREATE TABLE test (
  id INT COMMENT '',
  name VARCHAR(10) COMMENT '',
  PRIMARY KEY (id, name)
) 
DISTRIBUTED BY HASH(id) PARTITION BY VALUE(name) LIFECYCLE 30;
Note

This error occurs only on clusters that run a kernel version earlier than 3.2.1.0.

Only 204800 partition allowed, the number of existing partition=>196462

Cause: This error occurs when the number of partitions in an AnalyticDB for MySQL cluster exceeds the limit of 204,800.

To query the number of partitions in the cluster, run the following statement:

SELECT count(partition_id)
FROM information_schema.kepler_partitions
WHERE partition_id > 0;

Solution: Use the ALTER TABLE statement to adjust the partition granularity. For example, change partitioning from daily to monthly.

partition column 'XXX' is not found in primary index=> [YYY]

Cause: The primary key must include both the distribution key and the partition key.

Incorrect SQL statement example 1:

CREATE TABLE test (
  id INT COMMENT '',
  name VARCHAR(10) COMMENT '',
  PRIMARY KEY (id)
) 
DISTRIBUTED BY HASH(id) PARTITION BY VALUE(name) LIFECYCLE 30;

This error also occurs if you omit the primary key and distribution key. In this scenario, AnalyticDB for MySQL automatically creates a column named __adb_auto_id__ and sets it as both the primary key and distribution key. The operation fails because the resulting primary key, which is only __adb_auto_id__, does not include the partition key.

Incorrect SQL statement example 2:

CREATE TABLE test (
  id INT COMMENT '',
  name VARCHAR(10) COMMENT ''
) 
PARTITION BY VALUE(name) LIFECYCLE 30;

Solution: Add the partition key to the PRIMARY KEY definition.

SemanticException:only 5000 table allowed

Cause: This error occurs when the total number of tables in an AnalyticDB for MySQL cluster, including those in use and in the table recycle bin, exceeds the limit. The table limit varies based on the product series and specifications. For more information, see Table count limit.

Solution:

unsigned expr not supported

Cause: AnalyticDB for MySQL does not support unsigned numbers, so the UNSIGNED attribute is not allowed.

Solution: Remove the UNSIGNED attribute from your CREATE TABLE statement. You must implement non-negative value constraints in your application code.

Related topics