AnalyticDB for MySQL allows you to execute the CREATE TABLE
statement to create a table or execute the CREATE TABLE AS SELECT (CTAS)
statement to write queried data to a new table.
Syntax
CREATE TABLE [IF NOT EXISTS] table_name
({column_name column_type [column_attributes] [ column_constraints ] [COMMENT 'string']
| table_constraints}
[, ... ] )
table_attribute
[partition_options]
[storage_policy]
[block_size]
[AS] query_expression
COMMENT 'string'
column_attributes:
[DEFAULT default_expr]
[AUTO_INCREMENT]
column_constraints:
[{NOT NULL|NULL} ]
[PRIMARY KEY]
table_constraints:
[{INDEX|KEY} [index_name] (column_name,...)]
[PRIMARY KEY [index_name] (column_name,...)]
[CLUSTERED KEY [index_name] (column_name,...)]
table_attribute:
DISTRIBUTED BY HASH(column_name,...) | DISTRIBUTED BY BROADCAST
partition_options:
PARTITION BY
{VALUE(column_name) | VALUE(date_format(column_name, ?))}
LIFECYCLE N
storage_policy:
STORAGE_POLICY= 'HOT'|'COLD'|'MIXED' [hot_partition_count=N]
block_size:
BLOCK_SIZE= VALUE
Parameters
Parameter | Description |
---|---|
table_name |
The name of the table.
The table name must be 1 to 127 characters in length and can contain letters, digits, and underscores (_). The table name must start with a letter or underscore (_). Specify the table name in the |
column_name |
The name of the column.
The column name must be 1 to 127 characters in length and can contain letters, digits, and underscores (_). The column name must start with a letter or underscore (_). |
column_type |
The data type of the column to be added.
For more information about the data types supported by AnalyticDB for MySQL, see Basic data types. |
column_attributes |
|
column_constraints |
|
table_constraints |
Note Unique indexes cannot be created in AnalyticDB for MySQL.
|
table_attribute |
|
partition_options |
PARTITION BY : specifies the partition key in a fact table.
For example, AnalyticDB for MySQL does not allow you to change partition keys. If you want to use a different partition key for your table, perform the operations described in the "Modify a partition or distribution key" section of the ALTER TABLE topic. Note
|
storage_policy |
Note Only clusters in elastic mode for Cluster Edition support tiered storage of hot and
cold data.
Different storage policies bring about different data read and write performance and storage costs. To reduce data storage costs and ensure query performance, you can store data with a high query frequency (called hot data) on SSDs and data with a low query frequency (called cold data) on HDDs. You can also separately select cold and hot storage media based on table granularity and list partition granularity. For example, you can specify that all data of the table is stored on SSDs, all data is stored on HDDs, or data of part of list partitions in the table is stored on SSDs and data of the rest list partitions is stored on HDDs.
|
hot_partition_count=N | Specifies the number of partitions whose data is stored on SSDs when the MIXED storage
policy is used. These partitions are called hot partitions. Partitions are sorted
by partition key value in descending order. The first N partitions are hot partitions
and the others are cold partitions.
|
block_size | Specifies the number of values stored in each block in column-oriented storage. It
is the minimum unit for I/O operations.
This parameter determines the size of values read or written in each I/O operation and affects query performance based on query characteristics. For example, if block_size is set to a large value for point queries, blocks are read inefficiently by the storage system. In this case, you can appropriately decrease the value of block_size. Important If you are not familiar with column-oriented storage, we recommend that you do not
change the value of block_size.
|
Precautions
- AnalyticDB for MySQL clusters use the UTF-8 encoding format during table creation. This encoding format is equivalent to the utf8mb4 format in MySQL. AnalyticDB for MySQL does not support other encoding formats.
- The maximum number of tables that can be created in a cluster varies with the following
AnalyticDB for MySQL editions:
- Cluster Edition: MIN(256 × Number of node groups, 10000)
- Basic Edition:
- T8: 500
- T16 and T32: 1,500
- T52: 2,500
Examples
- Create a fact table named test. Set
id
as an auto-increment column. Set the distribution key toid
. Configure the table data to be distributed based on the hash value of theid
column.CREATE TABLE test ( id bigint auto_increment, name varchar, value int, ts timestamp ) DISTRIBUTED BY HASH(id);
- Create a fact table named customer. Set the distribution key to
customer_id
and the partition key tologin_time
. Configure the table data to be sorted bylogin_time
and retained for 30 days. Set the composite primary key tologin_time
,customer_id
, andphone_num
.CREATE TABLE customer ( customer_id bigint NOT NULL COMMENT 'Customer ID', customer_name varchar NOT NULL COMMENT 'Customer name', phone_num bigint NOT NULL COMMENT 'Phone number', city_name varchar NOT NULL COMMENT 'City', sex int NOT NULL COMMENT 'Gender', id_number varchar NOT NULL COMMENT 'ID card number', home_address varchar NOT NULL COMMENT 'Home address', office_address varchar NOT NULL COMMENT 'Office address', age int NOT NULL COMMENT 'Age', login_time timestamp NOT NULL COMMENT 'Logon time', PRIMARY KEY (login_time,customer_id,phone_num) ) DISTRIBUTED BY HASH(customer_id) PARTITION BY VALUE(DATE_FORMAT(login_time, '%Y%m%d')) LIFECYCLE 30 COMMENT 'Customer information table';
MySQL syntax compatibility
The standard CREATE TABLE syntax of AnalyticDB for MySQL must contain the DISTRIBUTED BY ...
clause. The standard CREATE TABLE syntax of MySQL does not contain the DISTRIBUTED BY ...
clause. By default, AnalyticDB for MySQL is compatible with the CREATE TABLE syntax of MySQL. You can use one of the following
methods to resolve the compatibility issues caused by the DISTRIBUTED BY ...
clause:
- By default, if a MySQL table contains a primary key, AnalyticDB for MySQL uses the primary key as the COLUMN value of the
DISTRIBUTED BY COLUMN
clause.CREATE TABLE t (c1 bigint, c2 int, c3 varchar, PRIMARY KEY(c1,c2));
SHOW CREATE TABLE t;
+-------+---------------------------------------------------------------------------+ | Table | Create Table | +-------+---------------------------------------------------------------------------+ | t | Create Table `t` ( `c1` bigint, `c2` int, `c3` varchar, primary key (c1,c2) ) DISTRIBUTED BY HASH(`c1`,`c2`) INDEX_ALL='Y' | +-------+----------------------------------------------------------------------------+ 1 row in set (0.04 sec)
- If a MySQL table does not contain a primary key, AnalyticDB for MySQL adds the
__adb_auto_id__
column as the primary key and uses it as the COLUMN value of theDISTRIBUTED BY COLUMN
clause.CREATE TABLE t (c1 bigint, c2 int, c3 varchar);
SHOW CREATE TABLE t;
+-------+----------------------------------------------------------------------------------------------+ | Table | Create Table | +-------+----------------------------------------------------------------------------------------------+ | t | Create Table `t` ( `c1` bigint, `c2` int, `c3` varchar, `__adb_auto_id__` bigint AUTO_INCREMENT, primary key (__adb_auto_id__) ) DISTRIBUTED BY HASH(`__adb_auto_id__`) INDEX_ALL='Y' | +-------+----------------------------------------------------------------------------------------------+ 1 row in set (0.04 sec)
Specify a storage policy of hot and cold data when you create a table
Syntax
CREATE TABLE [IF NOT EXISTS] table_name
({column_name column_type [column_attributes] [ column_constraints ] [COMMENT 'string']
| table_constraints}
[, ... ] )
table_attribute
[partition_options]
[storage_policy]
[AS] query_expression
COMMENT 'string'
storage_policy:
STORAGE_POLICY= 'HOT'|'COLD'|'MIXED' [hot_partition_count=N]
Examples
- Specify the COLD storage policy when you create a table
CREATE TABLE test_table ( L_ORDERKEY bigint NOT NULL, L_LINENUMBER int NOT NULL, L_SHIPDATE date NOT NULL, dummy varchar, primary key (l_orderkey,l_linenumber,l_shipdate) ) DISTRIBUTE BY HASH(l_orderkey) PARTITION BY VALUE(date_format(l_shipdate, '%Y%m')) LIFECYCLE 200 INDEX_ALL='Y' STORAGE_POLICY='COLD';
- Specify the HOT storage policy when you create a table
CREATE TABLE test_table ( L_ORDERKEY bigint NOT NULL, L_LINENUMBER int NOT NULL, L_SHIPDATE date NOT NULL, dummy varchar, primary key (l_orderkey,l_linenumber,l_shipdate) ) DISTRIBUTE BY HASH(l_orderkey) PARTITION BY VALUE(date_format(l_shipdate, '%Y%m')) LIFECYCLE 200 INDEX_ALL='Y' STORAGE_POLICY='HOT';
- Specify the MIXED storage policy and set the number of hot partitions to 16 when you
create a table
CREATE TABLE test_table ( L_ORDERKEY bigint NOT NULL, L_LINENUMBER int NOT NULL, L_SHIPDATE date NOT NULL, dummy varchar, primary key (l_orderkey,l_linenumber,l_shipdate) ) DISTRIBUTE BY HASH(l_orderkey) PARTITION BY VALUE(date_format(l_shipdate, '%Y%m')) LIFECYCLE 200 INDEX_ALL='Y' STORAGE_POLICY='MIXED' HOT_PARTITION_COUNT=16;
Create a table that has a vector index
Syntax
ann index [index_name] (col_name,...)] [algorithm=HNSW_PQ ] [dis_function=SquaredL2]
Examples
CREATE TABLE fact_tb (
xid bigint not null,
cid bigint not null,
uid varchar not null,
vid varchar not null,
wid varchar not null,
short_feature array < smallint >(4),
float_feature array < float >(4),
ann index short_feature_index(short_feature),
ann index float_feature_index(float_feature),
PRIMARY KEY (xid, cid, vid)
) DISTRIBUTE BY HASH(xid) PARTITION BY VALUE(cid) LIFECYCLE 4;
Parameters
short_feature/float_feature
: the name of the vector column. This parameter is customized.array<float>(4)
: the data type of the vector column and the vector dimension. This parameter is customized. You must specify the dimension of the vector. Valid values of the data type are float, byte, and short. Set the data type of the feature_data column to float and the vector dimension to 512. Example:feature_data` array<float>(512)
ann
: the system keyword.index
: the system keyword.short_feature_index/float_feature_index
: the name of the index. This parameter is customized. Create a vector index for the feature_data column:ann index ecnn_index(`FEATURE_DATA`) algorithm=HNSW_PQ dis_function=SquaredL2
You can ignore the algorithm and dis_function parameters. The default value of the algorithm parameter is HNSW_PQ. The default value of the dis_function parameter is SquaredL2.
algorithm
: the algorithm used by the formula to calculate the vector distance. The following table describes the algorithm supported by AnalyticDB for MySQL to calculate the vector distance.Algorithm Scenario Supported data type HNSW_PQ This algorithm is suitable for medium-scale scenarios in which the data volume of a single table ranges from millions to tens of millions and is sensitive to the vector dimension. SHORT[], BYTE[], or FLOAT[] dis_function
: the function used to calculate the vector distance. Default value: SquaredL2. The following table describes the function supported by AnalyticDB for MySQL to calculate the vector distance.Function Formula Supported data type SquaredEuclidean (SquaredL2) (x1-y1)^2+(x2-y2)^2+… BYTE[], SHORT[], or FLOAT[]
Insert data to the table
INSERT into fact_tb (xid,cid,uid,vid,wid,short_feature,float_feature) VALUES (1,2,'A','B','C','[1,1,1,1]','[1.2,1.5,2,3.0]');
INSERT into fact_tb (xid,cid,uid,vid,wid,short_feature,float_feature) VALUES (2,1,'e','v','f','[2,2,2,2]','[1.5,1.15,2.2,2.7]');
INSERT into fact_tb (xid,cid,uid,vid,wid,short_feature,float_feature) VALUES (0,6,'d','f','g','[3,3,3,3]','[0.2,1.6,5,3.7]');
INSERT into fact_tb (xid,cid,uid,vid,wid,short_feature,float_feature) VALUES (5,4,'j','b','h','[4,4,4,4]','[1.0,4.15,6,2.9]');
INSERT into fact_tb (xid,cid,uid,vid,wid,short_feature,float_feature) VALUES (8,5,'Sj','Hb','Dh','[5,5,5,5]','[1.3,4.5,6.9,5.2]');
insert into fact_tb (xid,cid,uid,vid,wid,short_feature,float_feature) VALUES (5,4,'x','g','h','[3,4,4,4]','[1.0,4.15,6,2.9]');
insert into fact_tb (xid,cid,uid,vid,wid,short_feature,float_feature) VALUES (5,4,'j','r','k','[6,6,4,4]','[1.0,4.15,6,2.9]');
insert into fact_tb (xid,cid,uid,vid,wid,short_feature,float_feature) VALUES (5,4,'s','i','q','[2,2,4,4]','[1.0,4.15,6,2.9]');
Use the vector index to query data
- Query the top three entries in the short_feature column that have the shortest distance
to the vector
'[1,1,1,1]'
.
The following information is returned:SELECT xid, l2_distance(short_feature, '[1,1,1,1]') FROM fact_tb ORDER BY 2 LIMIT 3;
+-------+-------------------------------------------------------------+ | xid | l2_distance(short_feature, '[1,1,1,1]') | +-------+-------------------------------------------------------------+ | 1 | 0.0 | +-------+-------------------------------------------------------------+ | 2 | 4.0 | +-------+-------------------------------------------------------------+ | 0 | 16.0 | +-------+-------------------------------------------------------------+ 3 row in set (0.02 sec)
- Query the top four entries in the short_feature column that have the shortest distance
to the vector
'[1,1,1,1]'
when xid is 5 and cid is 4.
The following information is returned:SELECT uid, l2_distance(short_feature, '[1,1,1,1]') FROM fact_tb WHERE xid = 5 AND cid = 4 ORDER BY 2 LIMIT 4;
+-------+-------------------------------------------------------------+ | uid | l2_distance(short_feature, '[1,1,1,1]') | +-------+-------------------------------------------------------------+ | s | 20.0 | +-------+-------------------------------------------------------------+ | x | 31.0 | +-------+-------------------------------------------------------------+ | j | 36.0 | +-------+-------------------------------------------------------------+ | j | 68.0 | +-------+-------------------------------------------------------------+ 4 row in set (0.04 sec)
- Query the top three entries in the short_feature column that have the shortest distance
to the vector
'[1,1,1,1]'
when the distance is less than or equal to 50.
The following information is returned:SELECT uid, l2_distance(short_feature, '[1,1,1,1]') FROM fact_tb WHERE l2_distance(short_feature, '[1,1,1,1]') < 50.0 AND xid = 5 ORDER BY 2 LIMIT 3;
+-------+-------------------------------------------------------------+ | uid | l2_distance(short_feature, '[1,1,1,1]') | +-------+-------------------------------------------------------------+ | s | 20.0 | +-------+-------------------------------------------------------------+ | x | 31.0 | +-------+-------------------------------------------------------------+ | j | 36.0 | +-------+-------------------------------------------------------------+ 3 row in set (0.05 sec)