This topic describes how to change table schemas in ApsaraDB for SelectDB and provides the usage notes. You can change table schemas in ApsaraDB for SelectDB to meet new business requirements.
Overview
You can change the schema of an existing table by performing one of the following operations:
Create or delete a column.
Change the type of a column.
Change the column order.
Create or modify a Bloom filter index.
Create or delete a bitmap index.
Terms
base table: Each new table corresponds to a base table upon creation.
rollup: A rollup table is created from a base table or another rollup table.
index: a materialized index. Both rollup and base tables are materialized indexes.
transaction: Each import job is a transaction. Each transaction is identified by a unique, incrementally assigned transaction ID.
How it works
The basic process of performing a schema change is to generate index data of a new schema from the original index data. This process includes the conversion of two types of data: the historical data and the data that is imported during the schema change.
+----------+
| Load Job |
+----+-----+
|
| Load job generates both origin and new index data
|
| +------------------+ +---------------+
| | Origin Index | | Origin Index |
+------> New Incoming Data| | History Data |
| +------------------+ +------+--------+
| |
| | Convert history data
| |
| +------------------+ +------v--------+
| | New Index | | New Index |
+------> New Incoming Data| | History Data |
+------------------+ +---------------+Before the historical data is converted, ApsaraDB for SelectDB obtains the latest transaction ID and waits for all import transactions whose ID is smaller than the latest transaction ID to complete. The latest transaction ID is designated as the watershed. ApsaraDB for SelectDB ensures that all import jobs processed after the watershed transaction ID generate data for the original and new indexes at the same time. This way, ApsaraDB for SelectDB ensures that the new index contains complete data after the historical data is converted.
Create a schema change job
To change the schema of an existing table, execute the following statement. Schema change is an asynchronous operation. After a schema change job is submitted, a confirmation of submission is returned. You can then execute the SHOW ALTER statement to query the progress of the job.
Syntax
ALTER TABLE [database.]table <alter_clause>;Only one schema change job can run on a table at a time.
You can import data into and query data from a table during the schema change of the table.
When you change the schema of a table, you cannot modify the partition key columns or bucket columns of the table.
If the schema of a table contains value columns that are aggregated by using the REPLACE method, you cannot delete the key columns.
If you delete the key columns, ApsaraDB for SelectDB cannot determine the values of the value columns that are aggregated by using the REPLACE method.
All non-key columns of a table that uses the UNIQUE KEY model are aggregated by using the REPLACE method.
If you add a value column that is aggregated by using the SUM or REPLACE method to a table, the default value of the value column is not meaningful for historical data.
Because the historical data has lost detailed information, the default value does not represent the actual aggregated value.
If you change the type of a column, you must preserve all original attributes of the column except the type.
For example, to change the type of the
k1 INT SUM NULL DEFAULT "1"column to BIGINT, execute the following statement:ALTER TABLE tbl1 MODIFY COLUMN k1 BIGINT SUM NULL DEFAULT "1";Except the new column type, all the other attributes such as the aggregation method, nullability, and default value must reflect the original information of the column.
You cannot modify the column name, aggregation method, nullability, default value, or column comment.
You can use the <alter_clause> clause to perform a schema change operation by using one of the following methods:
Insert a column into the specified position of the specified index.
ADD COLUMN column_name column_type [KEY | agg_type] [DEFAULT "default_Value"] [AFTER column_name|FIRST] [TO rollup_index_name] [PROPERTIES ("key"="Value", ...)]NoteIf you want to insert a value column into an index that uses the AGGREGATE KEY model, you must specify the
agg_typeparameter.If you want to insert a key column into an index that uses a non-aggregate model, such as the DUPLICATE KEY model, you must specify the
KEYkeyword.You cannot insert a column that already exists in a base index into a rollup index. If you need to insert a column that already exists in a base index into a rollup index, you can create a new rollup index.
Insert multiple columns into the specified index.
ADD COLUMN (column_name1 column_type [KEY | agg_type] DEFAULT "default_Value", ...) [TO rollup_index_name] [PROPERTIES ("key"="Value", ...)]NoteIf you want to insert multiple value columns into an index that uses the AGGREGATE KEY model, you must specify the
agg_typeparameter.If you want to insert multiple key columns into an index that uses the AGGREGATE KEY model, you must specify the
KEYkeyword.You cannot insert columns that already exist in a base index into a rollup index. If you need to insert columns that already exist in a base index into a rollup index, you can create a new rollup index.
Delete a column from the specified index.
DROP COLUMN column_name[FROM rollup_index_name]NoteYou cannot delete a partition key column.
If you delete a column from a base index, the column is also deleted from the rollup index that contains the column.
Change the type of a column and specify the column position in the specified index.
MODIFY COLUMN column_name column_type [KEY | agg_type] [NULL | NOT NULL] [DEFAULT "default_Value"] [AFTER column_name|FIRST] [FROM rollup_index_name] [PROPERTIES ("key"="Value", ...)]NoteIf you want to modify a value column in an index that uses the AGGREGATE KEY model, you must specify the
agg_typeparameter.If you want to modify a key column in an index that uses a non-aggregate model, you must specify the
KEYkeyword.You can change only the type of a column. All the other attributes of the column must remain consistent with the original column attributes in the MODIFY COLUMN statement. For more information, see the Examples section of this topic.
You cannot modify the partition key columns or bucket columns.
The following type conversions are supported. Make sure that the conversions do not result in unacceptable precision loss.
Convert columns from the TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, or DOUBLE type to a numeric type that has a larger range.
Convert columns from the TINTINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, or DECIMAL type to the VARCHAR type.
Change the maximum length of VARCHAR type columns.
Convert columns from the VARCHAR or CHAR type to the TINTINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, or DOUBLE type.
Convert columns from the VARCHAR or CHAR type to the DATE type. The following date formats are supported: %Y-%m-%d, %y-%m-%d, %Y%m%d, %y%m%d, %Y/%m/%d, and %y/%m/%d.
Convert columns from the DATETIME type to the DATE type. Only the year, month, and day information is retained. For example,
2019-12-09 21:47:05is converted to2019-12-09.Convert columns from the DATE type to the DATETIME type. The hours, minutes, and seconds are automatically filled with 0. For example,
2019-12-09is converted to2019-12-09 00:00:00.Convert columns from the FLOAT type to the DOUBLE type.
Convert columns from the INT type to the DATE type. The conversion fails if the INT data is not in a valid format. In this case, the original column data remains unchanged.
Columns of all types except the DATE and DATETIME types can be converted to the STRING type. However, columns of the STRING type cannot be converted to another type.
Reorder columns for the specified index.
ORDER BY (column_name1, column_name2, ...) [FROM rollup_index_name] [PROPERTIES ("key"="Value", ...)]NoteAll columns in the index must be specified.
Value columns follow key columns.
Examples
Change the schema of the
example_db.my_tabletable that uses a non-aggregate model: Insert a key column namednew_colafter thecol1column in theexample_rollup_indextable.ALTER TABLE example_db.my_table ADD COLUMN new_col INT KEY DEFAULT "0" AFTER col1 TO example_rollup_index;Change the schema of the
example_db.my_tabletable that uses a non-aggregate model: Insert a value column namednew_colafter thecol1column in theexample_rollup_indextable.ALTER TABLE example_db.my_table ADD COLUMN new_col INT DEFAULT "0" AFTER col1 TO example_rollup_index;
Change the schema of the
example_db.my_tabletable that uses the AGGREGATE KEY model: Insert a key column namednew_colafter thecol1column in theexample_rollup_indextable.ALTER TABLE example_db.my_table ADD COLUMN new_col INT DEFAULT "0" AFTER col1 TO example_rollup_index;Change the schema of the
example_db.my_tabletable that uses the AGGREGATE KEY model: Insert a value column that is namednew_coland uses the SUM aggregation method after thecol1column in theexample_rollup_indextable.ALTER TABLE example_db.my_table ADD COLUMN new_col INT SUM DEFAULT "0" AFTER col1 TO example_rollup_index;Change the schema of the
example_db.my_tabletable that uses the AGGREGATE KEY model: Insert two columns into theexample_rollup_indextable. One is a key column namedcol1and the other is a value column that is namedcol2and uses the SUM aggregation method.ALTER TABLE example_db.my_table ADD COLUMN (col1 INT DEFAULT "1", col2 FLOAT SUM DEFAULT "2.3") TO example_rollup_index;Change the schema of the
example_db.my_tabletable: Delete a column from theexample_rollup_indextable.ALTER TABLE example_db.my_table DROP COLUMN col2 FROM example_rollup_index;
Change the schema of the
example_db.my_tabletable: Change the type of the key column namedcol1in the base index toBIGINTand move the column to follow thecol2column.ALTER TABLE example_db.my_table MODIFY COLUMN col1 BIGINT KEY DEFAULT "1" AFTER col2;NoteWhen you modify a key column or a value column, you must declare the complete information about the column.
Change the schema of the
example_db.my_tabletable: Modify the maximum length of theval1column in the base index fromVARCHAR(32)toVARCHAR(64).ALTER TABLE example_db.my_table MODIFY COLUMN val1 VARCHAR(64) REPLACE DEFAULT "abc";Change the schema of the
example_db.my_tabletable: Reorder the columns in theexample_rollup_indextable. In this example, the original column order is k1, k2, k3, v1, v2.ALTER TABLE example_db.my_table ORDER BY (k3,k1,k2,v2,v1) FROM example_rollup_index;
Change the maximum length of a field in the key column in a table that uses the DUPLICATE KEY model.
The following sample code provides an example of the original schema:
+-----------+-------+-------------+------+------+---------+-------+ | IndexName | Field | Type | Null | Key | Default | Extra | +-----------+-------+-------------+------+------+---------+-------+ | tbl1 | k1 | INT | No | true | N/A | | | | k2 | INT | No | true | N/A | | | | k3 | varchar(20) | No | true | N/A | | | | k4 | INT | No | false| N/A | | +-----------+-------+-------------+------+------+---------+-------+Execute the following statement to change the maximum length of the
k3column to 50:ALTER TABLE example_tbl MODIFY COLUMN k3 varchar(50) key null COMMENT 'to 50'The following sample code shows the new schema after the change:
+-----------+-------+-------------+------+------+---------+-------+ | IndexName | Field | Type | Null | Key | Default | Extra | +-----------+-------+-------------+------+------+---------+-------+ | tbl1 | k1 | INT | No | true | N/A | | | | k2 | INT | No | true | N/A | | | | k3 | varchar(50) | No | true | N/A | | | | k4 | INT | No | false| N/A | | +-----------+-------+-------------+------+------+---------+-------+Schema change is an asynchronous operation. Only one schema change job can run on a table at a time.
Modify multiple rollup tables in a schema change job.
The following sample code provides an example of the original schemas:
+-----------+-------+------+------+------+---------+-------+ | IndexName | Field | Type | Null | Key | Default | Extra | +-----------+-------+------+------+------+---------+-------+ | tbl1 | k1 | INT | No | true | N/A | | | | k2 | INT | No | true | N/A | | | | k3 | INT | No | true | N/A | | | | | | | | | | | rollup2 | k2 | INT | No | true | N/A | | | | | | | | | | | rollup1 | k1 | INT | No | true | N/A | | | | k2 | INT | No | true | N/A | | +-----------+-------+------+------+------+---------+-------+Execute the following statements to insert the
k4column into therollup1table and the k4 andk5columns into therollup2table.ALTER TABLE tbl1 ADD COLUMN k4 INT default "1" to rollup1, ADD COLUMN k4 INT default "1" to rollup2, ADD COLUMN k5 INT default "1" to rollup2;The following sample code shows the new schemas after the change:
+-----------+-------+------+------+------+---------+-------+ | IndexName | Field | Type | Null | Key | Default | Extra | +-----------+-------+------+------+------+---------+-------+ | tbl1 | k1 | INT | No | true | N/A | | | | k2 | INT | No | true | N/A | | | | k3 | INT | No | true | N/A | | | | k4 | INT | No | true | 1 | | | | k5 | INT | No | true | 1 | | | | | | | | | | | rollup2 | k2 | INT | No | true | N/A | | | | k4 | INT | No | true | 1 | | | | k5 | INT | No | true | 1 | | | | | | | | | | | rollup1 | k1 | INT | No | true | N/A | | | | k2 | INT | No | true | N/A | | | | k4 | INT | No | true | 1 | | +-----------+-------+------+------+------+---------+-------+The
k4 and k5columns are also inserted into the base table namedtbl1. This means that the columns inserted into rollup tables are automatically inserted into the base table.Columns that already exist in the base table cannot be inserted into the rollup tables. To insert columns that already exist in the base table into a rollup table, you can create a new rollup table, and then delete the original rollup table.
Query a schema change job
You can execute the SHOW ALTER TABLE COLUMN statement to query a schema change job that is running or complete. If a schema change job involves multiple indexes, multiple rows are returned after you execute the statement. Each row corresponds to an index.
Syntax
SHOW ALTER TABLE COLUMN;Sample code
SHOW ALTER TABLE COLUMN\G;
*************************** 1. row ***************************
JobId: 20021
TableName: tbl1
CreateTime: 2019-08-05 23:03:13
FinishTime: 2019-08-05 23:03:42
IndexName: tbl1
IndexId: 20022
OriginIndexId: 20017
SchemaVersion: 2:792557838
TransactionId: 10023
State: FINISHED
Msg:
Progress: NULL
Timeout: 86400
1 row in set (0.00 sec)The following table describes the parameters.
Parameter | Description |
JobId | The unique ID of the schema change job. |
TableName | The name of the base table on which the schema change is performed. |
CreateTime | The time when the schema change job was created. |
FinishedTime | The time when the schema change job was complete. If the schema change job is not complete, N/A is returned. |
IndexName | The name of the index that is involved in the schema change. |
IndexId | The unique ID of the new index. |
OriginIndexId | The unique ID of the original index. |
SchemaVersion | The version of the schema change, which is displayed in the M:N format. M indicates the version of the schema change and N indicates the hash value of the version. The version increments with each schema change. |
State | The state of the schema change job. Valid values:
|
Msg | The error message that is returned if the job fails. |
Progress | The progress of the schema change job. The progress is displayed only if the job is in the RUNNING state. The progress is displayed in the M/N format. N indicates the total number of replicas involved in the schema change job. M indicates the number of replicas whose historical data is converted. |
Timeout | The timeout period of the schema change job. Unit: seconds. |
Cancel a schema change job
You can execute the CANCEL ALTER TABLE COLUMN statement to cancel a schema change job if the job is not in the FINISHED or CANCELLED state.
Syntax
CANCEL ALTER TABLE COLUMN FROM <tbl_name>;Parameters
Parameter | Description |
tbl_name | The name of the table. |
Example
Cancel the ALTER COLUMN operation to be performed on the example_db.my_table table.
CANCEL ALTER TABLE COLUMNFROM example_db.my_table;Related configurations
Frontend configurations
alter_table_timeout_second: the default timeout period of the schema change job. Default value: 86400. Unit: seconds.
Backend configurations
alter_tablet_worker_count: the number of threads that are used to convert historical data on the backend. Default value: 3. If you want to accelerate the schema change job, you can specify a greater value for this parameter and restart the backend. However, if a larger number of threads are used to convert historical data on the backend, the I/O pressure may be increased and other operations may be affected. The threads are shared with rollup jobs.alter_index_worker_count: the number of threads that are used to build indexes for historical data on the backend. Default value: 3. If you want to accelerate the schema change job, you can specify a greater value for this parameter and restart the backend. However, if a larger number of threads are used to build indexes for historical data on the backend, the I/O pressure may be increased and other operations may be affected.NoteThis parameter supports only inverted indexes.
FAQ
Q: What is the execution speed of schema change jobs?
A: The execution speed of schema change jobs is estimated to be about 10 MB/s based on the worst efficiency. You can specify the timeout period for a schema change job based on the execution speed.
Q: What do I do if the
Table xxx is not stable. ...error is reported when I submit a schema change job?A: You can perform a schema change operation on a table only if the data in the table is complete and in a balanced state. If some data shard replicas of the table are incomplete or if balancing operations are being performed on some replicas, the system rejects your request to submit a schema change job.
You can check whether the data shard replicas are complete by executing the following statement:
ADMIN SHOW REPLICA STATUS FROM tbl WHERE STATUS != "OK";If results are returned, issues exist in the replicas. In most cases, the system automatically fixes these issues. You can also execute the following statement to fix this table:
ADMIN REPAIR TABLE tbl1;Alternatively, you can execute the following statement to check whether one or more balancing operations are running:
SHOW PROC "/cluster_balance/pending_tablets";You can wait until the balancing operations are complete, or execute the following statement to temporarily disable the balancing operations:
ADMIN SET FRONTEND CONFIG ("disable_balance" = "true");