How to create an AnalyticDB for MySQL internal table by using CREATE TABLE - AnalyticDB for MySQL

AnalyticDB for MySQL allows you to execute the CREATE TABLE statement to create a table or execute the CREATE TABLE AS SELECT (CTAS) statement to write queried data to a new table.

Syntax

CREATE TABLE [IF NOT EXISTS] table_name
  ({column_name column_type [column_attributes] [ column_constraints ] [COMMENT 'string']
  | table_constraints}
   [, ... ]  )
   table_attribute
   [partition_options]
   [storage_policy]
   [block_size]
   [engine]
   [rt_engine]
   [table_properties]
[AS] query_expression
   COMMENT 'string'

column_attributes:
   [DEFAULT default_expr]
   [AUTO_INCREMENT]

column_constraints:
   [{NOT NULL|NULL} ]
   [PRIMARY KEY]

table_constraints:
   [{INDEX|KEY} [index_name] (column_name,...)]
   [PRIMARY KEY [index_name] (column_name,...)]
   [CLUSTERED KEY [index_name] (column_name,...)]
   

table_attribute:
   DISTRIBUTED BY HASH(column_name,...) | DISTRIBUTED BY BROADCAST

partition_options:
  PARTITION BY 
        {VALUE(column_name) | VALUE(date_format(column_name, ?))}
  LIFECYCLE N

storage_policy:
  STORAGE_POLICY= 'HOT'|'COLD'|'MIXED' [hot_partition_count=N]

block_size:
  BLOCK_SIZE= VALUE

engine:
  ENGINE= 'XUANWU'

rt_engine:
  RT_ENGINE= 'COLUMNSTORE'

table_properties:
  TABLE_PROPERTIES = '{"format":"columnstore"}'

Parameters

Parameter	description
`table_name`	The name of the table. The table name must be 1 to 127 characters in length and can contain letters, digits, and underscores (_). The table name must start with a letter or underscore (_). Specify the table name in the `db_name.table_name` format to distinguish tables that share the same name across different databases.
`column_name`	The name of the column to be added to the table. The column name must be 1 to 127 characters in length and can contain letters, digits, and underscores (_). The column name must start with a letter or underscore (_).
`column_type`	The data type of the column to be added. For more information about the data types supported by AnalyticDB for MySQL, see Basic data types.
`column_attributes`	`DEFAULT default_expr`: specifies the default value of the column. `DEFAULT` is an expression without variables. Example: `current_timestamp`. If no value is specified, the default value of the column is `NULL`. `AUTO_INCREMENT`: specifies whether the column is an auto-increment column. This parameter is optional. The data type of an auto-increment column must be `BIGINT` because AnalyticDB for MySQL provides unique values for an auto-increment column. However, these values are not incremented in sequence. The value of `AUTO_INCREMENT` cannot start from 1.
`column_constraints`	`NOT NULL`\|`NULL`: specifies whether the column accepts the NULL value. `NOT NULL` indicates that the column does not accept the `NULL` value. `NULL` indicates that the column accepts the `NULL` value. Default value: NULL. `PRIMARY KEY`: specifies the primary key of the column. You can specify one or more primary key columns in the `PRIMARY KEY(column_name [, ... ])` format.
`table_constraints`	Note Unique indexes cannot be created in AnalyticDB for MySQL. `INDEX\|KEY`: specifies the index. AnalyticDB for MySQL automatically creates an index for all columns of the table. If you specify an index, AnalyticDB for MySQL does not automatically create an index for all columns. `PRIMARY KEY`: specifies the primary key index. Only tables with primary keys support the DELETE and UPDATE operations. A primary key must include a distribution key and a partition key. We recommend that you put distribution and partition keys in the front section of a composite primary key. `CLUSTERED KEY`: specifies the clustered index that is used as the aggregate columns to sort data in the table. The logical order of key values in the clustered index determines the physical order of the corresponding rows in the table. For example, `CLUSTERED KEY col5_col6_cls_index(col5,col6)` specifies the `col5 col6` clustered index. `col5 col6` and `col6 col5` are different clustered indexes. A clustered index sorts one or more aggregate columns in the table to ensure that the same or proximate data is stored in the same or proximate position of a disk. When you use aggregate columns as query conditions, query results are stored in the same position of a disk. This can reduce the number of I/O operations and improve query performance. You can use the field that must be included in a query as a clustered index. For example, in software-as-a-service (SaaS) applications, you can use your user ID as the clustered index to access your own data. This ensures that data access is localized and improves data query performance. Aggregate columns have the following limits: You can create only a single clustered index for each table. A clustered index sorts the entire table, which causes low data write performance and high CPU utilization. In this case, we recommend that you do not use clustered indexes. If a table contains a clustered index, `CLUSTERED KEY` is not displayed when you execute the `SHOW CREATE TABLE` statement.
`table_attribute`	`DISTRIBUTED BY HASH(column_name,...)`: specifies the distribution key in a standard table. The data of the table is distributed based on the hash value of the column specified by `column_name`. AnalyticDB for MySQL allows you to select multiple fields as the distribution key. AnalyticDB for MySQL does not allow you to change distribution keys. If you want to use a different distribution key for your table, perform the operations described in the "Change a partition or distribution key" section of the ALTER TABLE topic. `DISTRIBUTED BY BROADCAST`: specifies a dimension table. The dimension table is stored on each node of a cluster. For performance reasons, we recommend that you do not store large volumes of data in the dimension table.
`partition_options`	`PARTITION BY`: specifies the partition key in a standard table. `LIFECYCLE N` is used to manage the lifecycle of the table. Partitions are sorted and only `N` partitions are retained. The other partitions are filtered out. You can execute the following statement to query the partition information about a table: `SELECT shard_name, table_name, partition_id, row_count FROM INFORMATION_SCHEMA.KEPLER_CSTORE_PARTITION_XIHE_MERGED WHERE table_name like upper("%<table_name>%") ORDER BY row_count DESC;` For example, `PARTITION BY VALUE(column_name)` indicates that the table is partitioned based on the column specified by `column_name`. `PARTITION BY VALUE(DATE_FORMAT(column_name, '%Y%m%d'))` indicates that the table is partitioned based on the column specified by `column_name` after the column is formatted to a date format such as `20190101`. `LIFECYCLE 365` indicates that a maximum of 365 partitions can be retained on each node. Only data from the last 365 days is stored. When you write data on the 366th day, data from the first day is deleted. Note The number of nodes in an AnalyticDB for MySQL cluster varies with the following editions and modes: AnalyticDB for MySQL Data Warehouse Edition (V3.0) in reserved mode for Cluster Edition: The number of nodes is equivalent to the number of node groups that is displayed on the Clusters page. AnalyticDB for MySQL Data Warehouse Edition (V3.0) in elastic mode for Cluster Edition: The number of nodes is equivalent to the number of elastic I/O units (EIUs). AnalyticDB for MySQL Data Lakehouse Edition (V3.0): The number of nodes is equivalent to the amount of reserved storage resources divided by 24 AnalyticDB compute units (ACUs). AnalyticDB for MySQL does not allow you to change partition keys. If you want to use a different partition key for your table, perform the operations that are described in the "Change a partition or distribution key" section of the ALTER TABLE topic. Note Partitions are based on a shard, instead of on the entire table. If data is unevenly distributed, more than N partitions may be retained for the entire table. Partitions are not cleared in real time, but are cleared by asynchronous tasks in the background. When you use `PARTITION BY` to specify partitions, you must specify `LIFECYCLE N`. Otherwise, an error is returned. If you do not specify partitions, data is not cleared. If you do not specify partitions, data is not cleared.
storage_policy	Note Tiered storage of hot and cold data is supported only for clusters in elastic mode for Cluster Edition. `STORAGE_POLICY`: specifies the storage policy of the table. Valid values: HOT, COLD, and MIXED. Default value: HOT. Different storage policies bring about different data read and write performance and storage costs. To reduce data storage costs and ensure query performance, you can store the data that is frequently queried (called hot data) on SSDs and the data that is less frequently queried (called cold data) on Object Storage Service (OSS). You can also separately select cold and hot storage media based on table granularity and partition granularity. For example, you can specify that all data of the table is stored on SSDs, all data is stored on OSS, or data of part of partitions in the table is stored on SSDs and data of the rest partitions is stored on OSS. HOT, COLD, and MIXED are case-insensitive. HOT: Data of all partitions is stored on SSDs. COLD: Data of all partitions is stored on OSS. MIXED: Data of part of partitions is stored on SSDs and data of the rest partitions is stored on OSS. You must use the `hot_partition_count` parameter to specify the number of partitions whose data is stored on SSDs.
hot_partition_count=N	The number of partitions that are stored on SSDs when the MIXED storage policy is used. These partitions are called hot partitions. Partitions are sorted by partition key value in descending order. The first N partitions are hot partitions and the others are cold partitions. N is a non-zero positive integer. When you select the MIXED storage policy, you must also specify the number of hot partitions. The `hot_partition_count=N` parameter is not allowed for other storage policy types.
block_size	Specifies the number of values stored in each block in column-oriented storage. It is the minimum unit for I/O operations. Default value for a cluster in elastic mode for Cluster Edition that has fewer than 32 cores: 8192. Default value for a dimension table: 4096. Default value for other cases: 32760. If the default value of block_size is 32760, block_size is not displayed when you execute the `SHOW CREATE TABLE <table_name>;` statement. This parameter determines the size of values read or written in each I/O operation and affects query performance based on query characteristics. For example, if block_size is set to a large value for point queries, blocks are read inefficiently by the storage system. In this case, you can appropriately decrease the value of block_size. Important If you are not familiar with column-oriented storage, we recommend that you do not change the value of block_size.
engine	The storage engine type of AnalyticDB for MySQL internal tables. Set the value to `XUANWU`. For more information, see XUANWU analytical storage engine.
rt_engine	The storage model of the real-time storage engine. Set the value to COLUMNSTORE. In this case, the RT_ENGINE information is not returned for `SHOW CREATE TABLE` statements.
table_properties	The storage model of AnalyticDB for MySQL internal tables for full data. Set the value to `{"format":"columnstore"}`. Note For AnalyticDB for MySQL clusters earlier than V3.1.9.5, if you set the engine parameter to `XUANWU`, you must set the table_properties parameter to `"format":"columnstore`.

Usage notes

AnalyticDB for MySQL clusters use the UTF-8 encoding format during table creation. This encoding format is equivalent to the utf8mb4 format in MySQL. AnalyticDB for MySQL does not support other encoding formats.
AnalyticDB for MySQL does not support unsigned constraints.
The maximum number of tables that can be created in a cluster varies with the following AnalyticDB for MySQL modes:
- Reserved mode: 80000/(Number of shards/Number of node groups). Each reserved mode cluster may have 1 to 20 node groups.
- Internal tables in elastic mode: [80000/(Number of shards/Number of EIUs)] × 2.
- External tables in elastic mode: 500,000.
Note
- The following SQL statement can be used to query the number of shards: SELECT count(1) FROM information_schema.kepler_meta_shards;.
- In the preceding formulas, the results of Number of shards/Number of node groups and Number of shards/Number of EIUs are rounded up to the nearest integer.
- This value can be increased by adding more node groups. For more information, see Scale a Data Warehouse Edition cluster and change the cluster mode.
- This value can be increased by adding more EIUs. For more information, see Use EIUs to scale up storage resources.

Examples

Create a standard table named test. Set id as an auto-increment column. Set the distribution key to id. Configure the table data to be distributed based on the hash value of the id column.
```
CREATE TABLE test (
       id bigint auto_increment,
       name varchar,
       value int,
       ts timestamp
)
DISTRIBUTED BY HASH(id);
```

Create a standard table named customer. Set the distribution key to customer_id and the partition key to login_time. Configure the table data to be sorted by login_time and retained for 30 days. Set the composite primary key to login_time, customer_id, and phone_num.

CREATE TABLE customer (
customer_id bigint NOT NULL COMMENT 'Customer ID',
customer_name varchar NOT NULL COMMENT 'Customer name',
phone_num bigint NOT NULL COMMENT 'Phone number',
city_name varchar NOT NULL COMMENT 'City',
sex int NOT NULL COMMENT 'Gender',
id_number varchar NOT NULL COMMENT 'ID card number',
home_address varchar NOT NULL COMMENT 'Home address',
office_address varchar NOT NULL COMMENT 'Office address',
age int NOT NULL COMMENT 'Age',
login_time timestamp NOT NULL COMMENT 'Logon time',
PRIMARY KEY (login_time,customer_id,phone_num)
 )
DISTRIBUTED BY HASH(customer_id)
PARTITION BY VALUE(DATE_FORMAT(login_time, '%Y%m%d')) LIFECYCLE 30
COMMENT 'Customer information table';

MySQL syntax compatibility

The standard CREATE TABLE syntax of AnalyticDB for MySQL must contain the DISTRIBUTED BY ... clause. The standard CREATE TABLE syntax of MySQL does not contain the DISTRIBUTED BY ... clause. By default, AnalyticDB for MySQL is compatible with the CREATE TABLE syntax of MySQL. You can use one of the following methods to resolve the compatibility issues caused by the DISTRIBUTED BY ... clause:

By default, if a MySQL table contains a primary key, AnalyticDB for MySQL uses the primary key as the COLUMN value of the DISTRIBUTED BY COLUMN clause.

CREATE TABLE t (c1 bigint, c2 int, c3 varchar, PRIMARY KEY(c1,c2));

SHOW CREATE TABLE t;

+-------+---------------------------------------------------------------------------+
| Table | Create Table                                                              |
+-------+---------------------------------------------------------------------------+
| t     | Create Table `t` (
 `c1` bigint,
 `c2` int,
 `c3` varchar,
 primary key (c1,c2)
) DISTRIBUTED BY HASH(`c1`,`c2`) INDEX_ALL='Y' |
+-------+----------------------------------------------------------------------------+
1 row in set (0.04 sec)

If a MySQL table does not contain a primary key, AnalyticDB for MySQL adds the __adb_auto_id__ column as the primary key and uses it as the COLUMN value of the DISTRIBUTED BY COLUMN clause.

CREATE TABLE t (c1 bigint, c2 int, c3 varchar);

SHOW CREATE TABLE t;

+-------+----------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                 |
+-------+----------------------------------------------------------------------------------------------+
| t     | Create Table `t` (
 `c1` bigint,
 `c2` int,
 `c3` varchar,
 `__adb_auto_id__` bigint AUTO_INCREMENT,
 primary key (__adb_auto_id__)
) DISTRIBUTED BY HASH(`__adb_auto_id__`) INDEX_ALL='Y' |
+-------+----------------------------------------------------------------------------------------------+
1 row in set (0.04 sec)

Specify a storage policy of hot and cold data when you create a table

Note

Tiered storage of hot and cold data is supported only for clusters in elastic mode for Cluster Edition.

Syntax

CREATE TABLE [IF NOT EXISTS] table_name
  ({column_name column_type [column_attributes] [ column_constraints ] [COMMENT 'string']
  | table_constraints}
   [, ... ]  )
   table_attribute
   [partition_options]
   [storage_policy]
   [AS] query_expression
   COMMENT 'string'
   
storage_policy:
   STORAGE_POLICY= 'HOT'|'COLD'|'MIXED' [hot_partition_count=N]

Examples

Specify the COLD storage policy when you create a table

CREATE TABLE test_table (
 L_ORDERKEY bigint NOT NULL,
 L_LINENUMBER int NOT NULL,
 L_SHIPDATE date NOT NULL,
 dummy varchar,
 primary key (l_orderkey,l_linenumber,l_shipdate)
) DISTRIBUTE BY HASH(l_orderkey) 
PARTITION BY VALUE(date_format(l_shipdate, '%Y%m')) LIFECYCLE 200 INDEX_ALL='Y' 
STORAGE_POLICY='COLD';

Specify the HOT storage policy when you create a table

CREATE TABLE test_table (
 L_ORDERKEY bigint NOT NULL,
 L_LINENUMBER int NOT NULL,
 L_SHIPDATE date NOT NULL,
 dummy varchar,
 primary key (l_orderkey,l_linenumber,l_shipdate)
) DISTRIBUTE BY HASH(l_orderkey) 
PARTITION BY VALUE(date_format(l_shipdate, '%Y%m')) LIFECYCLE 200 INDEX_ALL='Y' 
STORAGE_POLICY='HOT';

Specify the MIXED storage policy and set the number of hot partitions to 16 when you create a table

CREATE TABLE test_table (
 L_ORDERKEY bigint NOT NULL,
 L_LINENUMBER int NOT NULL,
 L_SHIPDATE date NOT NULL,
 dummy varchar,
 primary key (l_orderkey,l_linenumber,l_shipdate)
) DISTRIBUTE BY HASH(l_orderkey) 
PARTITION BY VALUE(date_format(l_shipdate, '%Y%m')) LIFECYCLE 200 INDEX_ALL='Y' 
STORAGE_POLICY='MIXED' HOT_PARTITION_COUNT=16;

Create a table that has a vector index

For more information about the syntax, parameters, and examples, see the "Create a vector index" section of the Vector search topic.