cloud-native Data Warehouse AnalyticDB for MySQL Edition allows you to execute the CREATE TABLE statement to create a table or execute the CREATE TABLE AS SELECT (CTAS) statement to write the queried data to a new table.

Syntax

CREATE TABLE [IF NOT EXISTS] table_name
  ({column_name column_type [column_attributes] [ column_constraints ] [COMMENT 'string']
  | table_constraints}
   [, ... ]  )
   table_attribute
   [partition_options]
   [storage_policy]
   [AS] query_expression
   COMMENT 'string'

column_attributes:
   [DEFAULT default_expr]
   [AUTO_INCREMENT]

column_constraints:
   [{NOT NULL|NULL} ]
   [PRIMARY KEY]

table_constraints:
   [{INDEX|KEY} [index_name] (column_name,...)]
   [PRIMARY KEY [index_name] (column_name,...)]
   [CLUSTERED KEY [index_name] (column_name,...)]

table_attribute:
   DISTRIBUTED BY HASH(column_name,...) | DISTRIBUTED BY BROADCAST

partition_options:
  PARTITION BY 
        {VALUE(column_name) | VALUE(date_format(column_name, ?))}
  LIFECYCLE N

storage_policy:
  STORAGE_POLICY= 'HOT'|'COLD'|'MIXED' [hot_partition_count=N]

Parameters

Parameter Description
table_name The name of a table to be created.

The table name must be 1 to 127 characters in length, and can contain letters, digits, and underscores (_). The table name must start with a letter or underscore (_).

Specify the table name in the db_name.table_name format to distinguish tables that have the same name across different databases.

column_name The name of the column.

The column name must be 1 to 127 characters in length and can contain letters, digits, and underscores (_). The table name must start with a letter or underscore (_).

column_type The data type of the column to be added.

For more information about the data types supported by AnalyticDB for MySQL, see Basic data types.

column_attributes
  • DEFAULT default_expr: the default value of the column. The value of DEFAULT is an expression without variables, such as current_timestamp.

    If this parameter is not specified, the default value is NULL.

  • AUTO_INCREMENT: Optional. Specifies whether the column is an auto-increment column.

    The data type of an auto-increment column must be BIGINT because AnalyticDB for MySQL provides unique values for an auto-increment column. However, these values are not incremented in sequence.

column_constraints
  • NOT NULL|NULL: specifies whether the column accepts the NULL value. A value of NOT NULL indicates that the column does not accept the NULL value. A value of NULL indicates that the column accepts the NULL value. Default value: NULL.
  • PRIMARY KEY: the primary key of the column.

    You can define one or more primary keys in the PRIMARY KEY(column_name [, ... ]) format.

table_constraints INDEX|KEY: the inverted index.

AnalyticDB for MySQL automatically creates indexes for whole tables. You do not need to manually create an index.

PRIMARY KEY The index for the primary keys.
  • Only tables with primary keys support the DELETE and UPDATE operations.
  • The primary keys must include the partition key. We recommend that you put the partition key before the primary key combination.
CLUSTERED KEY The clustered index that defines the aggregate columns used for sorting data in the table. The logical order of the key values in the clustered index determines the physical order of the corresponding rows in the table.

For example, clustered key col5_col6_cls_index(col5,col6) specifies the col5 col6 clustered index. col5 col6 and col6 col5 are different clustered indexes.

A clustered index sorts one or more aggregate columns in the table to ensure that the same or proximate data is saved in the same or proximate position of a disk. When you use aggregate columns as query conditions, query results are saved in the same position of a disk. This can reduce the I/O operations and improve query performance.

You can use the field that must be included in a query as a clustered index. For example, in software as a service (SaaS) applications, users can only access their own data by using their user ID as the clustered index. This ensures that data access is localized and improves data query performance.

Aggregate columns have the following limits:

  • You can create only one clustered index for each table.
  • Clustered indexes sort the entire table and cause lower data write performance and high CPU utilization. Therefore, we recommend that you do not use clustered indexes.
DISTRIBUTED BY HASH(column_name,...) The distribution key of the fact table. The data of the table is distributed based on the hash value of the columns specified by column_name.

AnalyticDB for MySQL allows you to select multiple fields as the partition key.

DISTRIBUTED BY BROADCAST The dimension table. The dimension table is stored on each node of a cluster. For performance reasons, we recommend that you do not store large amounts of data in the dimension table.
partition_options The options for fact table partitions.

LIFECYCLE N is used for lifecycle management of the table. Partitions are sorted and only N partitions will be retained. The rest will be filtered out.

For example, PARTITION BY VALUE(column_name) indicates that the table is partitioned based on the column specified by column_name. PARTITION BY VALUE(DATE_FORMAT(column_name, '%Y%m%d')) indicates that the table is partitioned based on the column specified by column_name after the column is formatted to a date format such as 20190101. LIFECYCLE 365 specifies that a maximum of 365 partitions can be retained on each node. That is, only data from the last 365 days is stored. When you write data on the 366th day, data from the first day is deleted.

Note
  • Level-2 partitions are based on a shard, instead of on the entire table. If data is unevenly distributed, more than N level-2 partitions may be reserved for the entire table.
  • Level-2 partitions are not cleared in real time, but by asynchronous tasks in the background.
storage_policy
Note Only AnalyticDB for MySQL elastic clusters support the tiered storage of hot and cold data feature.

The storage policy of the table. Valid values: hot, cold, and mixed. Default value: hot.

Different storage policies bring about different data read and write performance and storage costs. To reduce data storage costs and ensure query performance, you can store data with high query frequency (called hot data) on solid-state drive (SSD) media and data with low query frequency (called cold data) on hard disk drive (HDD) media.

You can also separately select cold and hot storage media based on table granularity and level-2 partition granularity. For example, you can specify that all data of the table is stored on SSD media, or all data is stored on HDD media, or data of part of level-2 partitions in the table is stored on SSD media and data of the rest level-2 partitions is stored on HDD media.

  • Hot, cold, and mixed are case-insensitive.
  • HOT: Data of all partitions is stored on SSD media.
  • COLD: Data of all partitions is stored on HDD media.
  • MIXED: Data of part of partitions is stored on SSD media and data of the rest partitions is stored on HDD media. You must use the hot_partition_count parameter to specify the number of partitions whose data is stored on SSD media.
hot_partition_count=N The number of partitions whose data is stored on SSD media when the mixed storage policy is used. Partitions are sorted by size in descending order. A maximum of N partitions can use the hot storage policy and the rest partitions will use the cold storage policy.
  • N is a non-zero positive integer.
  • When select the mixed stroage policy, you must also specify the hot_partition_count=N value. The hot_partition_count=N parameter is not allowed for other storage polic types.

Precautions

  • AnalyticDB for MySQL clusters use the UTF-8 encoding format during table creation. This encoding format is equivalent to the utf8mb4 format in MySQL. AnalyticDB for MySQL does not support other encoding formats.
  • The maximum number of tables that can be created in a cluster varies with the following AnalyticDB for MySQL editions:
    • Cluster Edition: min (256 × Number of node groups, 10000)
    • Basic Edition:
      • T8: 500
      • T16 and T32: 1,500
      • T52: 2,500

Examples

  • Create a test table.
    CREATE TABLE test (
           id bigint auto_increment,
           name varchar,
           value int,
           ts timestamp
    )
    DISTRIBUTED BY HASH(id);

    The test table is a fact table. The id column is an auto-increment column. The distribution key is id. The data of the table is distributed based on the hash value of the id column.

  • Create a customer table.
    CREATE TABLE customer (
    customer_id bigint NOT NULL COMMENT 'Customer ID',
    customer_name varchar NOT NULL COMMENT 'Customer name',
    phone_num bigint NOT NULL COMMENT 'Phone number',
    city_name varchar NOT NULL COMMENT 'City',
    sex int NOT NULL COMMENT 'Gender',
    id_number varchar NOT NULL COMMENT 'ID card number',
    home_address varchar NOT NULL COMMENT 'Home address',
    office_address varchar NOT NULL COMMENT 'Office address',
    age int NOT NULL COMMENT 'Age',
    login_time timestamp NOT NULL COMMENT 'Logon time',
    PRIMARY KEY (login_time,customer_id,phone_num)
     )
    DISTRIBUTED BY HASH(customer_id)
    PARTITION BY VALUE(DATE_FORMAT(login_time, '%Y%m%d')) LIFECYCLE 30
    COMMENT 'Customer information table';                   

    The customer table is a fact table. In the table, the distribution key is customer_id. The partition key is login_time. login_time, customer_id, and phone_num form the composite primary key.

MySQL syntax compatibility

The standard CREATE TABLE syntax of AnalyticDB for MySQL must contain the DISTRIBUTED BY ... clause. The CREATE TABLE syntax of MySQL does not contain the DISTRIBUTED BY ... clause. By default, AnalyticDB for MySQL is compatible with the CREATE TABLE syntax of MySQL. You can use one of the following methods to resolve the compatibility issues caused by the DISTRIBUTED BY ... clause:

  • By default, if a MySQL table contains a primary key, AnalyticDB for MySQL uses the primary key as the COLUMN value of the DISTRIBUTED BY COLUMN clause.
    CREATE TABLE t (c1 bigint, c2 int, c3 varchar, primary key(c1,c2));
    Query OK, 0 rows affected (2.37 sec)
    
    SHOW CREATE TABLE t;
    +-------+-------------------------------------------------------------------------------------------------------------------------------+
    | Table | Create Table                                                                                                                  |
    +-------+-------------------------------------------------------------------------------------------------------------------------------+
    | t     | Create Table `t` (
     `c1` bigint,
     `c2` int,
     `c3` varchar,
     primary key (c1,c2)
    ) DISTRIBUTED BY HASH(`c1`,`c2`) INDEX_ALL='Y' |
    +-------+-------------------------------------------------------------------------------------------------------------------------------+
    1 row in set (0.04 sec)
  • If a MySQL table does not contain a primary key, AnalyticDB for MySQL automatically adds the __adb_auto_id__ column to the table and uses this column as the primary key and the COLUMN value of the DISTRIBUTED BY COLUMN clause.
    CREATE TABLE t (c1 bigint, c2 int, c3 varchar);
    Query OK, 0 rows affected (0.50 sec)
    
    SHOW CREATE TABLE t;
    +-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | Table | Create Table                                                                                                                                                                              |
    +-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | t     | Create Table `t` (
     `c1` bigint,
     `c2` int,
     `c3` varchar,
     `__adb_auto_id__` bigint AUTO_INCREMENT,
     primary key (__adb_auto_id__)
    ) DISTRIBUTED BY HASH(`__adb_auto_id__`) INDEX_ALL='Y' |
    +-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    1 row in set (0.04 sec)

Specify storage policies of cold and hot data when you create a table

Note Only AnalyticDB for MySQL elastic clusters support the tiered storage of hot and cold data feature.

You can specify the storage policy when you create a table.

CREATE TABLE [IF NOT EXISTS] table_name
  ({column_name column_type [column_attributes] [ column_constraints ] [COMMENT 'string']
  | table_constraints}
   [, ... ]  )
   table_attribute
   [partition_options]
   [storage_policy]
   [AS] query_expression
   COMMENT 'string'
   
storage_policy:
   STORAGE_POLICY= 'HOT'|'COLD'|'MIXED' [hot_partition_count=N]
Example 1: Specify the cold storage policy when you create a table
CREATE TABLE test_table (
 L_ORDERKEY bigint NOT NULL,
 L_LINENUMBER int NOT NULL,
 L_SHIPDATE date NOT NULL,
 dummy varchar,
 primary key (l_orderkey,l_linenumber,l_shipdate)
) DISTRIBUTE BY HASH(l_orderkey) 
PARTITION BY VALUE(date_format(l_shipdate, '%Y%m')) LIFECYCLE 200 INDEX_ALL='Y' 
STORAGE_POLICY='COLD';
Example 2: Specify the hot storage policy when you create a table
CREATE TABLE test_table (
 L_ORDERKEY bigint NOT NULL,
 L_LINENUMBER int NOT NULL,
 L_SHIPDATE date NOT NULL,
 dummy varchar,
 primary key (l_orderkey,l_linenumber,l_shipdate)
) DISTRIBUTE BY HASH(l_orderkey) 
PARTITION BY VALUE(date_format(l_shipdate, '%Y%m')) LIFECYCLE 200 INDEX_ALL='Y' 
STORAGE_POLICY='HOT';
Example 3: Specify the mixed storage policy and set the number of partitions that use the hot storage policy to 16 when you create a table
CREATE TABLE test_table (
 L_ORDERKEY bigint NOT NULL,
 L_LINENUMBER int NOT NULL,
 L_SHIPDATE date NOT NULL,
 dummy varchar,
 primary key (l_orderkey,l_linenumber,l_shipdate)
) DISTRIBUTE BY HASH(l_orderkey) 
PARTITION BY VALUE(date_format(l_shipdate, '%Y%m')) LIFECYCLE 200 INDEX_ALL='Y' 
STORAGE_POLICY='MIXED' HOT_PARTITION_COUNT=16;

Create a table that has a vector index

You can create an ann index when you create a table. The syntax to create an ann index:
ann index [index_name] (col_name,...)] [algorithm=HNSW_PQ ] [dis_function=SquaredL2]
Example:
CREATE TABLE fact_tb (  
xid bigint not null,  
cid bigint not null,  
uid varchar not null,  
vid varchar not null,  
wid varchar not null,  
short_feature array < smallint >(32),  
float_feature array < float >(32),  
ann index short_feature_index(short_feature), 
ann index float_feature_index(float_feature),  
PRIMARY KEY (xid, cid, vid)
) DISTRIBUTE BY HASH(xid) PARTITION BY VALUE(cid) LIFECYCLE 4
Parameters:
  • short_feature/float_feature: the name of the vector column. You can customize this parameter value.
  • array<float>(32): the data type of the vector column and the dimension of the vector. You can customize this parameter value. You must specify the dimension of the vector. Valid values: float, byte, and short. Define the type of the feature_data column as 512. Example:
    feature_data` array<float>(512)
  • ann: the system keyword
  • index: the system keyword
  • short_feature_index/float_feature_index: the name of the index. You can customize this parameter value. Create a vector index on the feature_data column:
    ann index ecnn_index(`FEATURE_DATA`) algorithm=HNSW_PQ dis_function=SquaredL2

    You can ignore the algorithm and dis_function parameters. The default value of the algorithm parameter is HNSW_PQ. The default value of the dis_function parameter is SquaredL2.

  • algorithm: the algorithm used by the formula to calculate the vector distance. The following table lists the algorithm of the calculation formula of vector distance supported by AnalyticDB for MySQL.
    Algorithm Scenario Supported data type
    HNSW_PQ This algorithmis suitable for scenarios of medium-scale data volume, in which the data volume of a single table ranges from millions to tens of millions and is sensitive to the vector dimension. SHORT[], BYTE[], or FLOAT[]
  • dis_function: the calculation formula of vector distance. Default value: SquaredL2. The following table lists the calculation formula of vector distance supported by AnalyticDB for MySQL.
    Calculation formula of distance Formula Supported data type
    SquaredEuclidean (SquaredL2) (x1-y1)^2+(x2-y2)^2+… BYTE[], SHORT[], or FLOAT[]
Use the vector index to retrieve queries
  1. Query the last five records of the vector '[1,1,1,1]':
    SELECT id, l2_distance(short_feature, '[1,1,1,1]') FROM fact_tb ORDER BY 2 LIMIT 5;
  2. Query the last five records of the vector '[1,1,1,1]' by using the vector index, and sort the records by distance:
    SELECT id, l2_distance(short_feature, '[1,1,1,1]') FROM fact_tb WHERE xid = 1 AND cid = 0 ORDER BY 2 LIMIT 5;
  3. Query the last five records of the vector '[1,1,1,1]' by using the vector index, sort the records by distance, and set the maximum range of distance:
    SELECT id, l2_distance(float_feature, '[1.2,1.5,2,3.0]') FROM fact_tb WHERE l2_distance(float_feature, '[1.2,1.5,2,3.0]') < 50.0 AND xid = 2 ORDER BY 2 LIMIT 5;