MaterializeMySQL is used to synchronize data from a self-managed MySQL database, an ApsaraDB RDS for MySQL database, or a PolarDB for MySQL database to an ApsaraDB for ClickHouse database. You can follow the precautions and procedure in this topic when you use MaterializeMySQL to synchronize data from one of the preceding MySQL databases. This topic describes how to synchronize data from an ApsaraDB RDS for MySQL database to an ApsaraDB for ClickHouse database.
Background information
To improve real-time data warehousing capabilities, ApsaraDB for ClickHouse provides the MaterializeMySQL database engine. MaterializeMySQL can read binary log files from a MySQL database and then replay the DDL and DML operations on an ApsaraDB for ClickHouse database that serves as a MySQL replica. This way, data can be synchronized between databases in real time.
Features
- You can synchronize data in specific tables by configuring a list of tables that you want to synchronize or a list of tables that you do not want to synchronize.
- You can modify the configuration items for data synchronization.
- You can specify whether to skip errors during data synchronization.
- You can specify whether to skip tables whose schemas are not supported by ApsaraDB for ClickHouse.
- You can specify whether to use the FINAL modifier when you query a table in a database that uses the MaterializeMySQL database engine.
- You can specify the
ORDER BY
conditions that are used to sort data in ApsaraDB for ClickHouse. - You can specify whether to use a distributed table when you synchronize data from an ApsaraDB RDS for MySQL database to an ApsaraDB for ClickHouse database that uses MaterializeMySQL.
- You can specify the maximum number of rows that can be pulled each second during full data synchronization.
Precautions
- Only ApsaraDB for ClickHouse clusters of V20.8 and later support MaterializeMySQL.
- When you use MaterializeMySQL to query a table in a database that uses MaterializeMySQL, you must use the FINAL modifier to ensure that the query result is consistent with data in the corresponding table in ApsaraDB RDS for MySQL.
- You cannot synchronize the following tables to a database that uses MaterializeMySQL: tables that do not have primary keys and tables that are created by executing the
CREATE TABLE AS SELECT
statement. - The source ApsaraDB RDS for MySQL cluster and the destination ApsaraDB for ClickHouse cluster must be deployed in the same virtual private cloud (VPC).
- The SELECT queries take effect only when the
log_bin
andgtid_mode
parameters are set to ON for the source ApsaraDB RDS for MySQL cluster.Note By default, thelog_bin
andgtid_mode
parameters are set to ON for the ApsaraDB RDS for MySQL cluster. For more information about the parameters, see View the parameters of an ApsaraDB RDS for MySQL instance. - The kernel version of the source ApsaraDB RDS for MySQL cluster must be 5.6 or later and global transaction identifiers (GTIDs) must be enabled.
- If the DDL statements that are executed on the source ApsaraDB RDS for MySQL cluster do not comply with the standard MySQL syntax, data synchronization may fail.
- To synchronize data from the source ApsaraDB RDS for MySQL cluster, you must add the CIDR block of the vSwitch that the destination ApsaraDB for ClickHouse cluster uses to the whitelist of the source ApsaraDB RDS for MySQL cluster. For more information, see Configure an IP address whitelist for an ApsaraDB RDS for MySQL instance.
- Your account that is used to synchronize data from a MySQL database must be granted the permissions to execute the RELOAD, REPLICATION SLAVE, REPLICATION CLIENT, and SELECT PRIVILEGE statements on the MySQL database. Note You can execute the
GRANT RELOAD, REPLICATION SLAVE, REPLICATION CLIENT, SELECT ON *.* TO 'your-user-name';
statement to obtain the required permissions. - Do not modify table schemas or data in ApsaraDB for ClickHouse.
- The value range for the data of the
DateTime
data type in ApsaraDB for ClickHouse is[1970-01-01 00:00:00,2106-02-07 06:28:15]
. If time data in ApsaraDB RDS for MySQL is not within this range, the values of synchronized data in ApsaraDB for ClickHouse are incorrect.
Use MaterializeMySQL
Before you synchronize tables from a MySQL database to an ApsaraDB for ClickHouse database, create a database that uses MaterializeMySQL.
Syntax
When you create a database, set the database engine type to MaterializeMySQL and configure the parameters for MaterializeMySQL. You can use the following syntax to create a database that uses MaterializeMySQL:
CREATE DATABASE [IF NOT EXISTS] db_name [ON CLUSTER cluster]
ENGINE = MaterializeMySQL('host:port', 'database', 'user', 'password')
[SETTINGS...]
where SETTINGS are:
[ { include_tables | exclude_tables } ]
[ skip_error_count ]
[ skip_unsupported_tables ]
[ order_by_only_primary_key ]
[ shard_model ]
[ rate_limiter_row_count_per_second ]
[ allow_query_with_final_auto ]
Parameter | Description |
host:port | The URL and port number of the ApsaraDB RDS for MySQL database. Note If the data source is a PolarDB for MySQL database, you can use only the endpoint of the primary database, that is, the primary endpoint, to connect to the ApsaraDB for ClickHouse cluster. |
database | The name of the ApsaraDB RDS for MySQL database. |
user | The account that is used to connect to the ApsaraDB RDS for MySQL database. |
password | The password of the account that is used to connect to the ApsaraDB RDS for MySQL database. |
Parameter | Data type | Description |
include_tables | String | A list of tables that you want to synchronize. After you configure a list of tables that you want to synchronize, only the tables in the list can be synchronized. Separate table names with commas (,). Rules for fuzzy matches:
|
exclude_tables | String | A list of tables that you do not want to synchronize. After you configure a list of tables that you do not want to synchronize, the tables in the list cannot be synchronized. Separate table names with commas (,). Rules for fuzzy matches:
Note The exclude_tables and include_tables parameters cannot be used together. |
skip_error_count | Int | Specifies whether to skip errors during data synchronization. Valid values:
|
skip_unsupported_tables | Boolean | Specifies whether to skip tables that are not supported by MaterializeMySQL. Valid values:
Note The skip_unsupported_tables parameter is assigned a higher priority than the skip_error_count parameter. If the skip_unsupported_tables parameter is specified, the skip_error_count parameter does not take effect. |
order_by_only_primary_key | Boolean | Specifies the ORDER BY conditions that are used to sort data in ApsaraDB for ClickHouse. Valid values:
|
shard_model | Boolean | Specifies whether to use a distributed table when you synchronize data from an ApsaraDB RDS for MySQL database to an ApsaraDB for ClickHouse database that uses MaterializeMySQL. Valid values:
|
rate_limiter_row_count_per_second | Int | Specifies the maximum number of rows that can be retrieved each second during full data synchronization. Valid values:
|
allow_query_with_final_auto | Boolean | Specifies whether to use the FINAL modifier when you query a table in a database that uses MaterializeMySQL. Valid values:
Note This parameter is unavailable for ApsaraDB for ClickHouse V22.8 clusters. |
ALTER DATABASE <database> MODIFY SETTING
Example
CREATE DATABASE IF NOT EXISTS db_name ON CLUSTER cluster
ENGINE = MaterializeMySQL('rm-bp1h40w639t19****.mysql.rds.aliyuncs.com:3306', 'database', 'user', 'password')
SETTINGS
include_tables ='a,b,c';
Instructions
Virtual columns
Column | Data type | Description |
_version | UInt64 | The transaction counter that is used to record information about data versions. |
_sign | Int8 | The delete marker that indicates whether a row is deleted. Valid values:
|
Conversions of DDL statements
- INSERT queries on a MySQL database are converted into
INSERT with _sign=1
. - DELETE queries on a MySQL database are converted into
INSERT with _sign=-1
. - UPDATE queries on a MySQL database are converted into
INSERT with _sign=1
. When you modify the primary key, UPDATE queries on a MySQL database are converted intoINSERT with _sign=-1
.
- DDL statements that cannot be parsed by ApsaraDB for ClickHouse are ignored.
- UPDATE CASCADE and DELETE CASCADE queries are not supported by MaterializeMySQL.
SELECT queries
- You must specify the _sign=1 parameter and the FINAL modifier in a SELECT query. For example:
select * from t1 final where _sign=1;
. - By default, the following rules apply when you query data in an ApsaraDB for ClickHouse cluster of V21.8 or earlier:
- If you do not specify the _version parameter in a SELECT query, the FINAL modifier is used, and rows that contain the maximum value of _version are returned. The system returns data of the latest version.
- If you do not specify the _sign parameter in a SELECT query,
WHERE _sign=1
is used by default. This way, rows that contain_sign=1
are returned. _sign=1 indicates that a row is not deleted.
Index conversion
- ApsaraDB for ClickHouse automatically uses the primary keys and indexes of tables in an ApsaraDB RDS for MySQL database as the
ORDER BY
conditions to sort data. - Data in ApsaraDB for ClickHouse is sorted based on only one physical order specified by the
ORDER BY
clause. If you need to create a physical order, use a materialized view.
Mappings of supported data types of columns between MySQL databases and ApsaraDB for ClickHouse databases
The following table describes the mappings of supported data types of columns between MySQL databases and ApsaraDB for ClickHouse databases.
MySQL | ClickHouse |
TINY | Int8 |
SHORT | Int16 |
INT24 | Int32 |
LONG | UInt32 |
LONG | UInt64 |
FLOAT | Float32 |
DOUBLE | Float64 |
DECIMAL and NEWDECIMAL | Decimal |
DATE and NEWDATE | Date |
DATETIME and TIMESTAMP | DateTime |
DATETIME2 and TIMESTAMP2 | DateTime64 |
STRING | String |
VARCHAR and VAR_STRING | String |
BLOB | String |
BIT | UInt64 |
SET | UInt64 |
ENUM | Enum16 |
YEAR | String |
TIME | String |
GEOMETRY | String |
JSON | Not supported |
Synchronize data from an ApsaraDB RDS for MySQL database
- Create a database and a table.
- Create a source ApsaraDB RDS for MySQL database. Then, create a table in the database.
CREATE DATABASE cktest; CREATE TABLE cktest.test01 (a INT PRIMARY KEY, b INT);
- Create a destination ApsaraDB for ClickHouse database for data synchronization.
CREATE DATABASE mysql ENGINE = MaterializeMySQL('rm-bp1h40w639t19****.mysql.rds.aliyuncs.com:3306', 'cktest', 'user', 'password');
- Query tables in the destination ApsaraDB for ClickHouse database.
SHOW TABLES FROM mysql;
The following output is returned:+--------+ | name | +--------+ | test01 | +--------+
- Create a source ApsaraDB RDS for MySQL database. Then, create a table in the database.
- Insert data and query the data.
- Insert data into the table in the source ApsaraDB RDS for MySQL database.
INSERT INTO cktest.test01 VALUES (1, 10), (2, 20);
- Query the inserted data in ApsaraDB for ClickHouse.
SELECT * FROM mysql.test01;
The following output is returned:+----+----+ | a | b | +----+----+ | 1 | 10 | +----+----+ | 2 | 20 | +----+----+
- Insert data into the table in the source ApsaraDB RDS for MySQL database.
- Delete data, add a column, update data, and then query the updated data.
- Delete data, add a column, update data, and then query the updated data in the source ApsaraDB RDS for MySQL database.
DELETE FROM cktest.test01 WHERE a=1; ALTER TABLE cktest.test01 ADD COLUMN c VARCHAR(16); UPDATE cktest.test01 SET c='ok!', b=202; SELECT * FROM test01;
The following output is returned:+---+-----+-----+ | a | b | c | +---+-----+-----+ | 2 | 202 | ok! | +---+-----+-----+
- Query the inserted data in ApsaraDB for ClickHouse.
SELECT * FROM mysql.test01;
The following output is returned:+----+-----+-----+ | a | b | c | +----+-----+-----+ | 2 | 202 | ok! | +----+-----+-----+
- Delete data, add a column, update data, and then query the updated data in the source ApsaraDB RDS for MySQL database.
Query details about data synchronization
- If the ApsaraDB for ClickHouse cluster contains only one node, execute the following statement:
select * from system.materialize_mysql;
- If the ApsaraDB for ClickHouse cluster contains multiple nodes, execute the following statement:
select * from cluster(default, system, materialize_mysql);
+----------+----------------+-------------+------------------+--------------------+
| database | status | error_count | lasted_error_msg | sync_failed_tables |
+----------+----------------+-------------+------------------+--------------------+
| mysql | increment_sync | 0 | | |
+----------+----------------+-------------+------------------+--------------------+
Field | Description |
database | The destination database in the ApsaraDB for ClickHouse cluster that is used for the data synchronization. |
status | The state of the data synchronization. Valid values:
|
error_count | The number of errors during the data synchronization. |
lasted_error_msg | The details of the most recent error during the data synchronization. |
sync_failed_tables | The table or tables that failed to be synchronized. If multiple tables fail to be synchronized, the table names are separated with commas (,). |
FAQ
- Question 1: When I use a database that uses the MaterializeMySQL database engine to synchronize data from a self-managed MySQL database, an ApsaraDB RDS for MySQL database, or a PolarDB for MySQL database, why does the
The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires
error message appear?Common cause: MySQL binary logs are expired and deleted because the MaterializeMySQL engine has stopped synchronizing your data for a long period of time.
Solution: Delete the database and re-create a destination database in your ApsaraDB for ClickHouse cluster for data synchronization.
Question 2: Why does the MaterializeMySQL database engine stop synchronizing a table when I use the engine to synchronize data from a self-managed MySQL database, an ApsaraDB RDS for MySQL database, or a PolarDB for MySQL database? Why does the
sync_failed_tables
parameter not return an empty string after I query the system.materialize_mysql system table?Common cause: DDL queries that ApsaraDB for ClickHouse does not support are performed on a MySQL database during data synchronization.
Solution: Perform another data synchronization from the MySQL database. Perform the following steps:- Delete the table that the MaterializeMySQL database engine stops synchronizing.
drop table <table_name> on cluster default;
Note Thetable_name
parameter specifies the name of the table that the MaterializeMySQL database engine stops synchronizing. If the MaterializeMySQL database engine stops synchronizing a distributed table, you must delete the distributed table and the local table that the MaterializeMySQL database engine stops synchronizing. - Restart data synchronization.
alter database <database_name> on cluster default MODIFY SETTING skip_unsupported_tables = 1;
Note The<database_name>
parameter specifies the destination database in your ApsaraDB for ClickHouse cluster for your data synchronization.
- Delete the table that the MaterializeMySQL database engine stops synchronizing.
- Question 3: MySQL binary logs are deleted because of one of the following reasons: the binary logs expire, or the disk utilization of a self-managed MySQL instance, an ApsaraDB RDS for MySQL instance, or a PolarDB for MySQL instance reaches the threshold. As a result, data fails to be synchronized from the MySQL database to an ApsaraDB for ClickHouse database that uses MaterializeMySQL. How do I query and set a binary log retention period for a self-managed MySQL database, an ApsaraDB RDS for MySQL database, or a PolarDB for MySQL database? Solution:
- For MySQL 8.0 or earlier, execute the following statements to query and set a binary log retention period:
showvariableslike'expire_logs_days'; setglobalexpire_logs_days=7;
- For MySQL 8.0 or later, execute the following statements to query and set a binary log retention period:
showvariableslike'%binlog_expire_logs_seconds%'; setglobalbinlog_expire_logs_seconds=604800;
- For MySQL 8.0 or earlier, execute the following statements to query and set a binary log retention period: