Schema change in ApsaraDB for SelectDB - ApsaraDB for SelectDB

This topic describes how to change table schemas in ApsaraDB for SelectDB and provides the usage notes. You can change table schemas in ApsaraDB for SelectDB to meet new business requirements.

Overview

You can change the schema of an existing table by performing one of the following operations:

Create or delete a column.
Change the type of a column.
Change the column order.
Create or modify a Bloom filter index.
Create or delete a bitmap index.

Terms

base table: Each new table corresponds to a base table upon creation.
rollup: A rollup table is created from a base table or another rollup table.
index: a materialized index. Both rollup and base tables are materialized indexes.
transaction: Each import job is a transaction. Each transaction is identified by a unique, incrementally assigned transaction ID.

How it works

The basic process of performing a schema change is to generate index data of a new schema from the original index data. This process includes the conversion of two types of data: the historical data and the data that is imported during the schema change.

+----------+
| Load Job |
+----+-----+
     |
     | Load job generates both origin and new index data
     |
     |      +------------------+ +---------------+
     |      | Origin Index     | | Origin Index  |
     +------> New Incoming Data| | History Data  |
     |      +------------------+ +------+--------+
     |                                  |
     |                                  | Convert history data
     |                                  |
     |      +------------------+ +------v--------+
     |      | New Index        | | New Index     |
     +------> New Incoming Data| | History Data  |
            +------------------+ +---------------+

Before the historical data is converted, ApsaraDB for SelectDB obtains the latest transaction ID and waits for all import transactions whose ID is smaller than the latest transaction ID to complete. The latest transaction ID is designated as the watershed. ApsaraDB for SelectDB ensures that all import jobs processed after the watershed transaction ID generate data for the original and new indexes at the same time. This way, ApsaraDB for SelectDB ensures that the new index contains complete data after the historical data is converted.

Create a schema change job

To change the schema of an existing table, execute the following statement. Schema change is an asynchronous operation. After a schema change job is submitted, a confirmation of submission is returned. You can then execute the SHOW ALTER statement to query the progress of the job.

Syntax

ALTER TABLE [database.]table <alter_clause>;

Note

Only one schema change job can run on a table at a time.
You can import data into and query data from a table during the schema change of the table.
When you change the schema of a table, you cannot modify the partition key columns or bucket columns of the table.
If the schema of a table contains value columns that are aggregated by using the REPLACE method, you cannot delete the key columns.
If you delete the key columns, ApsaraDB for SelectDB cannot determine the values of the value columns that are aggregated by using the REPLACE method.
All non-key columns of a table that uses the UNIQUE KEY model are aggregated by using the REPLACE method.
If you add a value column that is aggregated by using the SUM or REPLACE method to a table, the default value of the value column is not meaningful for historical data.
Because the historical data has lost detailed information, the default value does not represent the actual aggregated value.
If you change the type of a column, you must preserve all original attributes of the column except the type.
For example, to change the type of the k1 INT SUM NULL DEFAULT "1" column to BIGINT, execute the following statement:
```
ALTER TABLE tbl1 MODIFY COLUMN k1 BIGINT SUM NULL DEFAULT "1";
```
Except the new column type, all the other attributes such as the aggregation method, nullability, and default value must reflect the original information of the column.
You cannot modify the column name, aggregation method, nullability, default value, or column comment.

You can use the <alter_clause> clause to perform a schema change operation by using one of the following methods:

Insert a column into the specified position of the specified index.
```
ADD COLUMN column_name column_type [KEY | agg_type] [DEFAULT "default_Value"]
[AFTER column_name|FIRST]
[TO rollup_index_name]
[PROPERTIES ("key"="Value", ...)]
```
Note
- If you want to insert a value column into an index that uses the AGGREGATE KEY model, you must specify the agg_type parameter.
- If you want to insert a key column into an index that uses a non-aggregate model, such as the DUPLICATE KEY model, you must specify the KEY keyword.
- You cannot insert a column that already exists in a base index into a rollup index. If you need to insert a column that already exists in a base index into a rollup index, you can create a new rollup index.
Insert multiple columns into the specified index.
```
ADD COLUMN (column_name1 column_type [KEY | agg_type] DEFAULT "default_Value", ...)
[TO rollup_index_name]
[PROPERTIES ("key"="Value", ...)]
```
Note
- If you want to insert multiple value columns into an index that uses the AGGREGATE KEY model, you must specify the agg_type parameter.
- If you want to insert multiple key columns into an index that uses the AGGREGATE KEY model, you must specify the KEY keyword.
- You cannot insert columns that already exist in a base index into a rollup index. If you need to insert columns that already exist in a base index into a rollup index, you can create a new rollup index.
Delete a column from the specified index.
```
DROP COLUMN column_name[FROM rollup_index_name]
```
Note
- You cannot delete a partition key column.
- If you delete a column from a base index, the column is also deleted from the rollup index that contains the column.
Change the type of a column and specify the column position in the specified index.
```
MODIFY COLUMN column_name column_type [KEY | agg_type] [NULL | NOT NULL] [DEFAULT "default_Value"]
[AFTER column_name|FIRST]
[FROM rollup_index_name]
[PROPERTIES ("key"="Value", ...)]
```
Note
- If you want to modify a value column in an index that uses the AGGREGATE KEY model, you must specify the agg_type parameter.
- If you want to modify a key column in an index that uses a non-aggregate model, you must specify the KEY keyword.
- You can change only the type of a column. All the other attributes of the column must remain consistent with the original column attributes in the MODIFY COLUMN statement. For more information, see the Examples section of this topic.
- You cannot modify the partition key columns or bucket columns.
- The following type conversions are supported. Make sure that the conversions do not result in unacceptable precision loss.
  - Convert columns from the TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, or DOUBLE type to a numeric type that has a larger range.
  - Convert columns from the TINTINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, or DECIMAL type to the VARCHAR type.
  - Change the maximum length of VARCHAR type columns.
  - Convert columns from the VARCHAR or CHAR type to the TINTINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, or DOUBLE type.
  - Convert columns from the VARCHAR or CHAR type to the DATE type. The following date formats are supported: %Y-%m-%d, %y-%m-%d, %Y%m%d, %y%m%d, %Y/%m/%d, and %y/%m/%d.
  - Convert columns from the DATETIME type to the DATE type. Only the year, month, and day information is retained. For example, 2019-12-09 21:47:05 is converted to 2019-12-09.
  - Convert columns from the DATE type to the DATETIME type. The hours, minutes, and seconds are automatically filled with 0. For example, 2019-12-09 is converted to 2019-12-09 00:00:00.
  - Convert columns from the FLOAT type to the DOUBLE type.
  - Convert columns from the INT type to the DATE type. The conversion fails if the INT data is not in a valid format. In this case, the original column data remains unchanged.
  - Columns of all types except the DATE and DATETIME types can be converted to the STRING type. However, columns of the STRING type cannot be converted to another type.
Reorder columns for the specified index.
```
ORDER BY (column_name1, column_name2, ...)
[FROM rollup_index_name]
[PROPERTIES ("key"="Value", ...)]
```
Note
- All columns in the index must be specified.
- Value columns follow key columns.

Examples

Change the schema of the example_db.my_table table that uses a non-aggregate model: Insert a key column named new_col after the col1 column in the example_rollup_index table.
```
ALTER TABLE example_db.my_table
ADD COLUMN new_col INT KEY DEFAULT "0" AFTER col1
TO example_rollup_index;
```
Change the schema of the example_db.my_table table that uses a non-aggregate model: Insert a value column named new_col after the col1 column in the example_rollup_index table.
```
ALTER TABLE example_db.my_table   
ADD COLUMN new_col INT DEFAULT "0" AFTER col1    
TO example_rollup_index;
```

Change the schema of the example_db.my_table table that uses the AGGREGATE KEY model: Insert a key column named new_col after the col1 column in the example_rollup_index table.
```
ALTER TABLE example_db.my_table   
ADD COLUMN new_col INT DEFAULT "0" AFTER col1    
TO example_rollup_index;
```
Change the schema of the example_db.my_table table that uses the AGGREGATE KEY model: Insert a value column that is named new_col and uses the SUM aggregation method after the col1 column in the example_rollup_index table.
```
ALTER TABLE example_db.my_table   
ADD COLUMN new_col INT SUM DEFAULT "0" AFTER col1    
TO example_rollup_index;
```
Change the schema of the example_db.my_table table that uses the AGGREGATE KEY model: Insert two columns into the example_rollup_index table. One is a key column named col1 and the other is a value column that is named col2 and uses the SUM aggregation method.
```
ALTER TABLE example_db.my_table
ADD COLUMN (col1 INT DEFAULT "1", col2 FLOAT SUM DEFAULT "2.3")
TO example_rollup_index;
```
Change the schema of the example_db.my_table table: Delete a column from the example_rollup_index table.
```
ALTER TABLE example_db.my_table
DROP COLUMN col2
FROM example_rollup_index;
```

Change the schema of the example_db.my_table table: Change the type of the key column named col1 in the base index to BIGINT and move the column to follow the col2 column.
```
ALTER TABLE example_db.my_table 
MODIFY COLUMN col1 BIGINT KEY DEFAULT "1" AFTER col2;
```
Note
When you modify a key column or a value column, you must declare the complete information about the column.
Change the schema of the example_db.my_table table: Modify the maximum length of the val1 column in the base index from VARCHAR(32) to VARCHAR(64).
```
ALTER TABLE example_db.my_table 
MODIFY COLUMN val1 VARCHAR(64) REPLACE DEFAULT "abc";
```
Change the schema of the example_db.my_table table: Reorder the columns in the example_rollup_index table. In this example, the original column order is k1, k2, k3, v1, v2.
```
ALTER TABLE example_db.my_table
ORDER BY (k3,k1,k2,v2,v1)
FROM example_rollup_index;
```

Change the maximum length of a field in the key column in a table that uses the DUPLICATE KEY model.

The following sample code provides an example of the original schema:

+-----------+-------+-------------+------+------+---------+-------+
| IndexName | Field | Type        | Null | Key  | Default | Extra |
+-----------+-------+-------------+------+------+---------+-------+
| tbl1      | k1    | INT         | No   | true | N/A     |       |
|           | k2    | INT         | No   | true | N/A     |       |
|           | k3    | varchar(20) | No   | true | N/A     |       |
|           | k4    | INT         | No   | false| N/A     |       |
+-----------+-------+-------------+------+------+---------+-------+

Execute the following statement to change the maximum length of the k3 column to 50:

ALTER TABLE example_tbl MODIFY COLUMN k3 varchar(50) key null COMMENT 'to 50'

The following sample code shows the new schema after the change:

+-----------+-------+-------------+------+------+---------+-------+
| IndexName | Field | Type        | Null | Key  | Default | Extra |
+-----------+-------+-------------+------+------+---------+-------+
| tbl1      | k1    | INT         | No   | true | N/A     |       |
|           | k2    | INT         | No   | true | N/A     |       |
|           | k3    | varchar(50) | No   | true | N/A     |       |
|           | k4    | INT         | No   | false| N/A     |       |
+-----------+-------+-------------+------+------+---------+-------+

Schema change is an asynchronous operation. Only one schema change job can run on a table at a time.

Modify multiple rollup tables in a schema change job.

The following sample code provides an example of the original schemas:

+-----------+-------+------+------+------+---------+-------+
| IndexName | Field | Type | Null | Key  | Default | Extra |
+-----------+-------+------+------+------+---------+-------+
| tbl1      | k1    | INT  | No   | true | N/A     |       |
|           | k2    | INT  | No   | true | N/A     |       |
|           | k3    | INT  | No   | true | N/A     |       |
|           |       |      |      |      |         |       |
| rollup2   | k2    | INT  | No   | true | N/A     |       |
|           |       |      |      |      |         |       |
| rollup1   | k1    | INT  | No   | true | N/A     |       |
|           | k2    | INT  | No   | true | N/A     |       |
+-----------+-------+------+------+------+---------+-------+

Execute the following statements to insert the k4 column into the rollup1 table and the k4 and k5 columns into the rollup2 table.

ALTER TABLE tbl1
ADD COLUMN k4 INT default "1" to rollup1,
ADD COLUMN k4 INT default "1" to rollup2,
ADD COLUMN k5 INT default "1" to rollup2;

The following sample code shows the new schemas after the change:

+-----------+-------+------+------+------+---------+-------+
| IndexName | Field | Type | Null | Key  | Default | Extra |
+-----------+-------+------+------+------+---------+-------+
| tbl1      | k1    | INT  | No   | true | N/A     |       |
|           | k2    | INT  | No   | true | N/A     |       |
|           | k3    | INT  | No   | true | N/A     |       |
|           | k4    | INT  | No   | true | 1       |       |
|           | k5    | INT  | No   | true | 1       |       |
|           |       |      |      |      |         |       |
| rollup2   | k2    | INT  | No   | true | N/A     |       |
|           | k4    | INT  | No   | true | 1       |       |
|           | k5    | INT  | No   | true | 1       |       |
|           |       |      |      |      |         |       |
| rollup1   | k1    | INT  | No   | true | N/A     |       |
|           | k2    | INT  | No   | true | N/A     |       |
|           | k4    | INT  | No   | true | 1       |       |
+-----------+-------+------+------+------+---------+-------+

The k4 and k5 columns are also inserted into the base table named tbl1. This means that the columns inserted into rollup tables are automatically inserted into the base table.

Columns that already exist in the base table cannot be inserted into the rollup tables. To insert columns that already exist in the base table into a rollup table, you can create a new rollup table, and then delete the original rollup table.

Query a schema change job

You can execute the SHOW ALTER TABLE COLUMN statement to query a schema change job that is running or complete. If a schema change job involves multiple indexes, multiple rows are returned after you execute the statement. Each row corresponds to an index.

Syntax

SHOW ALTER TABLE COLUMN;

Sample code

SHOW ALTER TABLE COLUMN\G;
*************************** 1. row ***************************
        JobId: 20021
    TableName: tbl1
   CreateTime: 2019-08-05 23:03:13
   FinishTime: 2019-08-05 23:03:42
    IndexName: tbl1
      IndexId: 20022
OriginIndexId: 20017
SchemaVersion: 2:792557838
TransactionId: 10023
        State: FINISHED
          Msg: 
     Progress: NULL
      Timeout: 86400
1 row in set (0.00 sec)

The following table describes the parameters.

Parameter	Description
JobId	The unique ID of the schema change job.
TableName	The name of the base table on which the schema change is performed.
CreateTime	The time when the schema change job was created.
FinishedTime	The time when the schema change job was complete. If the schema change job is not complete, N/A is returned.
IndexName	The name of the index that is involved in the schema change.
IndexId	The unique ID of the new index.
OriginIndexId	The unique ID of the original index.
SchemaVersion	The version of the schema change, which is displayed in the M:N format. M indicates the version of the schema change and N indicates the hash value of the version. The version increments with each schema change.
State	The state of the schema change job. Valid values: PENDING: The job is waiting to be scheduled in the queue. WAITING_TXN: The job is waiting for all import jobs whose ID is smaller than the watershed transaction ID to complete. RUNNING: The historical data is being converted. FINISHED: The job is successful. CANCELLED: The job fails.
Msg	The error message that is returned if the job fails.
Progress	The progress of the schema change job. The progress is displayed only if the job is in the RUNNING state. The progress is displayed in the M/N format. N indicates the total number of replicas involved in the schema change job. M indicates the number of replicas whose historical data is converted.
Timeout	The timeout period of the schema change job. Unit: seconds.

Cancel a schema change job

You can execute the CANCEL ALTER TABLE COLUMN statement to cancel a schema change job if the job is not in the FINISHED or CANCELLED state.

Syntax

CANCEL ALTER TABLE COLUMN FROM <tbl_name>;

Parameters

Parameter	Description
tbl_name	The name of the table.

Example

Cancel the ALTER COLUMN operation to be performed on the example_db.my_table table.

CANCEL ALTER TABLE COLUMNFROM example_db.my_table;

Related configurations

Frontend configurations

alter_table_timeout_second: the default timeout period of the schema change job. Default value: 86400. Unit: seconds.

Backend configurations

alter_tablet_worker_count: the number of threads that are used to convert historical data on the backend. Default value: 3. If you want to accelerate the schema change job, you can specify a greater value for this parameter and restart the backend. However, if a larger number of threads are used to convert historical data on the backend, the I/O pressure may be increased and other operations may be affected. The threads are shared with rollup jobs.
alter_index_worker_count: the number of threads that are used to build indexes for historical data on the backend. Default value: 3. If you want to accelerate the schema change job, you can specify a greater value for this parameter and restart the backend. However, if a larger number of threads are used to build indexes for historical data on the backend, the I/O pressure may be increased and other operations may be affected.
Note
This parameter supports only inverted indexes.

FAQ

Q: What is the execution speed of schema change jobs?
A: The execution speed of schema change jobs is estimated to be about 10 MB/s based on the worst efficiency. You can specify the timeout period for a schema change job based on the execution speed.
Q: What do I do if the Table xxx is not stable. ... error is reported when I submit a schema change job?
A: You can perform a schema change operation on a table only if the data in the table is complete and in a balanced state. If some data shard replicas of the table are incomplete or if balancing operations are being performed on some replicas, the system rejects your request to submit a schema change job.
You can check whether the data shard replicas are complete by executing the following statement:
ADMIN SHOW REPLICA STATUS FROM tbl WHERE STATUS != "OK";
If results are returned, issues exist in the replicas. In most cases, the system automatically fixes these issues. You can also execute the following statement to fix this table:
ADMIN REPAIR TABLE tbl1;
Alternatively, you can execute the following statement to check whether one or more balancing operations are running:
SHOW PROC "/cluster_balance/pending_tablets";
You can wait until the balancing operations are complete, or execute the following statement to temporarily disable the balancing operations:
ADMIN SET FRONTEND CONFIG ("disable_balance" = "true");