All Products
Search
Document Center

:MaterializeMySQL

Last Updated:Aug 08, 2023

MaterializeMySQL is used to synchronize data from a self-managed MySQL database, an ApsaraDB RDS for MySQL database, or a PolarDB for MySQL database to an ApsaraDB for ClickHouse database. You can follow the precautions and procedure in this topic when you use MaterializeMySQL to synchronize data from one of the preceding MySQL databases. This topic describes how to synchronize data from an ApsaraDB RDS for MySQL database to an ApsaraDB for ClickHouse database.

Background information

To improve real-time data warehousing capabilities, ApsaraDB for ClickHouse provides the MaterializeMySQL database engine. MaterializeMySQL can read binary log files from a MySQL database and then replay the DDL and DML operations on an ApsaraDB for ClickHouse database that serves as a MySQL replica. This way, data can be synchronized between databases in real time.

Features

ApsaraDB for ClickHouse provides the following features developed by Alibaba Cloud:
  • You can synchronize data in specific tables by configuring a list of tables that you want to synchronize or a list of tables that you do not want to synchronize.
  • You can modify the configuration items for data synchronization.
  • You can specify whether to skip errors during data synchronization.
  • You can specify whether to skip tables whose schemas are not supported by ApsaraDB for ClickHouse.
  • You can specify whether to use the FINAL modifier when you query a table in a database that uses the MaterializeMySQL database engine.
  • You can specify the ORDER BY conditions that are used to sort data in ApsaraDB for ClickHouse.
  • You can specify whether to use a distributed table when you synchronize data from an ApsaraDB RDS for MySQL database to an ApsaraDB for ClickHouse database that uses MaterializeMySQL.
  • You can specify the maximum number of rows that can be pulled each second during full data synchronization.

Precautions

  • Only ApsaraDB for ClickHouse clusters of V20.8 and later support MaterializeMySQL.
  • When you use MaterializeMySQL to query a table in a database that uses MaterializeMySQL, you must use the FINAL modifier to ensure that the query result is consistent with data in the corresponding table in ApsaraDB RDS for MySQL.
  • You cannot synchronize the following tables to a database that uses MaterializeMySQL: tables that do not have primary keys and tables that are created by executing the CREATE TABLE AS SELECT statement.
  • The source ApsaraDB RDS for MySQL cluster and the destination ApsaraDB for ClickHouse cluster must be deployed in the same virtual private cloud (VPC).
  • The SELECT queries take effect only when the log_bin and gtid_mode parameters are set to ON for the source ApsaraDB RDS for MySQL cluster.
    Note By default, the log_bin and gtid_mode parameters are set to ON for the ApsaraDB RDS for MySQL cluster. For more information about the parameters, see View the parameters of an ApsaraDB RDS for MySQL instance.
  • The kernel version of the source ApsaraDB RDS for MySQL cluster must be 5.6 or later and global transaction identifiers (GTIDs) must be enabled.
  • If the DDL statements that are executed on the source ApsaraDB RDS for MySQL cluster do not comply with the standard MySQL syntax, data synchronization may fail.
  • To synchronize data from the source ApsaraDB RDS for MySQL cluster, you must add the CIDR block of the vSwitch that the destination ApsaraDB for ClickHouse cluster uses to the whitelist of the source ApsaraDB RDS for MySQL cluster. For more information, see Configure an IP address whitelist for an ApsaraDB RDS for MySQL instance.
  • Your account that is used to synchronize data from a MySQL database must be granted the permissions to execute the RELOAD, REPLICATION SLAVE, REPLICATION CLIENT, and SELECT PRIVILEGE statements on the MySQL database.
    Note You can execute the GRANT RELOAD, REPLICATION SLAVE, REPLICATION CLIENT, SELECT ON *.* TO 'your-user-name'; statement to obtain the required permissions.
  • Do not modify table schemas or data in ApsaraDB for ClickHouse.
  • The value range for the data of the DateTime data type in ApsaraDB for ClickHouse is [1970-01-01 00:00:00,2106-02-07 06:28:15]. If time data in ApsaraDB RDS for MySQL is not within this range, the values of synchronized data in ApsaraDB for ClickHouse are incorrect.

Use MaterializeMySQL

Before you synchronize tables from a MySQL database to an ApsaraDB for ClickHouse database, create a database that uses MaterializeMySQL.

Syntax

When you create a database, set the database engine type to MaterializeMySQL and configure the parameters for MaterializeMySQL. You can use the following syntax to create a database that uses MaterializeMySQL:

CREATE DATABASE [IF NOT EXISTS] db_name [ON CLUSTER cluster]
ENGINE = MaterializeMySQL('host:port', 'database', 'user', 'password') 
[SETTINGS...]

where SETTINGS are:
[ { include_tables | exclude_tables } ]
[ skip_error_count ]
[ skip_unsupported_tables ]
[ order_by_only_primary_key ]
[ shard_model ]
[ rate_limiter_row_count_per_second ]
[ allow_query_with_final_auto ]
Database parameters
ParameterDescription
host:portThe URL and port number of the ApsaraDB RDS for MySQL database.
Note If the data source is a PolarDB for MySQL database, you can use only the endpoint of the primary database, that is, the primary endpoint, to connect to the ApsaraDB for ClickHouse cluster.
databaseThe name of the ApsaraDB RDS for MySQL database.
userThe account that is used to connect to the ApsaraDB RDS for MySQL database.
passwordThe password of the account that is used to connect to the ApsaraDB RDS for MySQL database.
Parameters for configuring MaterializeMySQL
ParameterData typeDescription
include_tablesStringA list of tables that you want to synchronize. After you configure a list of tables that you want to synchronize, only the tables in the list can be synchronized. Separate table names with commas (,).
Rules for fuzzy matches:
  • *: represents one or more characters except forward slashes (/) and empty strings.
  • ?: represents a single character.
  • {N..M}: represents all numbers within the range of [N, M].
exclude_tablesStringA list of tables that you do not want to synchronize. After you configure a list of tables that you do not want to synchronize, the tables in the list cannot be synchronized. Separate table names with commas (,).
Rules for fuzzy matches:
  • *: represents one or more characters except forward slashes (/) and empty strings.
  • ?: represents a single character.
  • {N..M}: represents a numeric digit within the range of [N, M].
Note The exclude_tables and include_tables parameters cannot be used together.
skip_error_countIntSpecifies whether to skip errors during data synchronization. Valid values:
  • -9223372036854775808 to -1: skips all errors.
  • 0: does not skip errors. This value is the default value.
  • 1 to 9223372036854775807: If you specify a value within this range, the specified number of errors are skipped.
skip_unsupported_tablesBooleanSpecifies whether to skip tables that are not supported by MaterializeMySQL. Valid values:
  • 0: does not skip tables whose schemas are not supported. In this case, data synchronization fails if the schema of a table that you want to synchronize is not supported.
  • 1: skips tables whose schemas are not supported. In this case, data synchronization is not interrupted if the schema of a table that you want to synchronize is not supported. This value is the default value.
Note The skip_unsupported_tables parameter is assigned a higher priority than the skip_error_count parameter. If the skip_unsupported_tables parameter is specified, the skip_error_count parameter does not take effect.
order_by_only_primary_keyBooleanSpecifies the ORDER BY conditions that are used to sort data in ApsaraDB for ClickHouse. Valid values:
  • 0: specifies the primary keys and indexes of tables in an ApsaraDB RDS for MySQL database as the ORDER BY conditions.
  • 1: specifies only the primary keys of tables in a self-managed MySQL database or an ApsaraDB RDS for MySQL database as the ORDER BY conditions. This value is the default value.
shard_modelBooleanSpecifies whether to use a distributed table when you synchronize data from an ApsaraDB RDS for MySQL database to an ApsaraDB for ClickHouse database that uses MaterializeMySQL. Valid values:
  • 0: the default mode. One table in an ApsaraDB RDS for MySQL database corresponds to one local table in an ApsaraDB for ClickHouse database. We recommend that you set the parameter value to 0 if your ApsaraDB for ClickHouse cluster contains only one node.

    In this mode, if your ApsaraDB for ClickHouse cluster contains multiple nodes, the MySQL server synchronizes data to each node of your cluster. This increases workloads on your MySQL server.

  • 1: the distribution mode. One table in an ApsaraDB RDS for MySQL database corresponds to one local table and one distributed table in an ApsaraDB for ClickHouse cluster. The distributed table has the same name as the table in the ApsaraDB RDS for MySQL database. You must query synchronized data in the distributed table. We recommend that you set the parameter value to 1 if your ApsaraDB for ClickHouse cluster contains multiple nodes.
rate_limiter_row_count_per_secondIntSpecifies the maximum number of rows that can be retrieved each second during full data synchronization. Valid values:
  • 0: The maximum number of rows that can be retrieved each second is not limited. This value is the default value.
  • 1 to 9223372036854775807: If you specify a value within this range, the value indicates the maximum number of rows that can be retrieved each second. For example, if you set the parameter value to 1, a maximum of one row can be retrieved each second during full data synchronization.
allow_query_with_final_autoBooleanSpecifies whether to use the FINAL modifier when you query a table in a database that uses MaterializeMySQL. Valid values:
  • 0: The FINAL modifier is not used.
  • 1: The FINAL modifier is used. This value is the default value.

    You can use the set allow_query_with_final_auto = 0 statement to set this parameter at the session level.

Note This parameter is unavailable for ApsaraDB for ClickHouse V22.8 clusters.
Note You can modify all parameters for configuring MaterializeMySQL. The new configuration immediately takes effect after the modification. You can use the following syntax to modify one of the parameters:
ALTER DATABASE <database> MODIFY SETTING

Example

CREATE DATABASE IF NOT EXISTS db_name ON CLUSTER cluster
ENGINE = MaterializeMySQL('rm-bp1h40w639t19****.mysql.rds.aliyuncs.com:3306', 'database', 'user', 'password') 
SETTINGS
include_tables ='a,b,c';

Instructions

Virtual columns

When you create a ReplacingMergeTree table in an ApsaraDB for ClickHouse database that uses MaterializeMySQL, two virtual columns are automatically added to the table.
ColumnData typeDescription
_versionUInt64The transaction counter that is used to record information about data versions.
_signInt8The delete marker that indicates whether a row is deleted. Valid values:
  • 1: The row is not deleted.
  • -1: The row is deleted.
    Note Rows with _sign=-1 are not physically deleted from the tables.

Conversions of DDL statements

MaterializeMySQL does not support direct INSERT, DELETE, or UPDATE queries. DDL queries on a MySQL database are converted into the corresponding DDL queries on an ApsaraDB for ClickHouse database.
  • INSERT queries on a MySQL database are converted into INSERT with _sign=1.
  • DELETE queries on a MySQL database are converted into INSERT with _sign=-1.
  • UPDATE queries on a MySQL database are converted into INSERT with _sign=1. When you modify the primary key, UPDATE queries on a MySQL database are converted into INSERT with _sign=-1.
Note
  • DDL statements that cannot be parsed by ApsaraDB for ClickHouse are ignored.
  • UPDATE CASCADE and DELETE CASCADE queries are not supported by MaterializeMySQL.

SELECT queries

  • You must specify the _sign=1 parameter and the FINAL modifier in a SELECT query. For example: select * from t1 final where _sign=1;.
  • By default, the following rules apply when you query data in an ApsaraDB for ClickHouse cluster of V21.8 or earlier:
    • If you do not specify the _version parameter in a SELECT query, the FINAL modifier is used, and rows that contain the maximum value of _version are returned. The system returns data of the latest version.
    • If you do not specify the _sign parameter in a SELECT query, WHERE _sign=1 is used by default. This way, rows that contain _sign=1 are returned. _sign=1 indicates that a row is not deleted.

Index conversion

  • ApsaraDB for ClickHouse automatically uses the primary keys and indexes of tables in an ApsaraDB RDS for MySQL database as the ORDER BY conditions to sort data.
  • Data in ApsaraDB for ClickHouse is sorted based on only one physical order specified by the ORDER BY clause. If you need to create a physical order, use a materialized view.

Mappings of supported data types of columns between MySQL databases and ApsaraDB for ClickHouse databases

The following table describes the mappings of supported data types of columns between MySQL databases and ApsaraDB for ClickHouse databases.

MySQLClickHouse
TINYInt8
SHORTInt16
INT24Int32
LONGUInt32
LONGUInt64
FLOATFloat32
DOUBLEFloat64
DECIMAL and NEWDECIMALDecimal
DATE and NEWDATEDate
DATETIME and TIMESTAMPDateTime
DATETIME2 and TIMESTAMP2DateTime64
STRINGString
VARCHAR and VAR_STRINGString
BLOBString
BITUInt64
SETUInt64
ENUMEnum16
YEARString
TIMEString
GEOMETRYString
JSONNot supported

Synchronize data from an ApsaraDB RDS for MySQL database

  1. Create a database and a table.
    1. Create a source ApsaraDB RDS for MySQL database. Then, create a table in the database.
      CREATE DATABASE cktest;
      CREATE TABLE cktest.test01 (a INT PRIMARY KEY, b INT);
    2. Create a destination ApsaraDB for ClickHouse database for data synchronization.
      CREATE DATABASE mysql ENGINE = MaterializeMySQL('rm-bp1h40w639t19****.mysql.rds.aliyuncs.com:3306', 'cktest', 'user', 'password');
    3. Query tables in the destination ApsaraDB for ClickHouse database.
      SHOW TABLES FROM mysql;
      The following output is returned:
      +--------+
      | name   | 
      +--------+
      | test01 | 
      +--------+
  2. Insert data and query the data.
    1. Insert data into the table in the source ApsaraDB RDS for MySQL database.
      INSERT INTO cktest.test01 VALUES (1, 10), (2, 20);
    2. Query the inserted data in ApsaraDB for ClickHouse.
      SELECT * FROM mysql.test01;
      The following output is returned:
      +----+----+
      | a  | b  |   
      +----+----+
      | 1  | 10 |
      +----+----+
      | 2  | 20 |
      +----+----+
  3. Delete data, add a column, update data, and then query the updated data.
    1. Delete data, add a column, update data, and then query the updated data in the source ApsaraDB RDS for MySQL database.
      DELETE FROM cktest.test01 WHERE a=1;
      ALTER TABLE cktest.test01 ADD COLUMN c VARCHAR(16);
      UPDATE cktest.test01 SET c='ok!', b=202;
      SELECT * FROM test01;
      The following output is returned:
      +---+-----+-----+ 
      | a | b   | c   |
      +---+-----+-----+ 
      | 2 | 202 | ok! |
      +---+-----+-----+
    2. Query the inserted data in ApsaraDB for ClickHouse.
      SELECT * FROM mysql.test01;
      The following output is returned:
      +----+-----+-----+ 
      | a  | b   | c   |
      +----+-----+-----+ 
      | 2  | 202 | ok! |
      +----+-----+-----+

Query details about data synchronization

You can query the system.materialize_mysql system table to obtain details about data synchronization from an ApsaraDB RDS for MySQL database to your ApsaraDB for ClickHouse database that uses MaterializeMySQL. You can execute one of the following statements based on different scenarios:
  • If the ApsaraDB for ClickHouse cluster contains only one node, execute the following statement:
    select * from system.materialize_mysql;
  • If the ApsaraDB for ClickHouse cluster contains multiple nodes, execute the following statement:
    select * from cluster(default, system, materialize_mysql);
The following output is returned:
+----------+----------------+-------------+------------------+--------------------+
| database |     status     | error_count | lasted_error_msg | sync_failed_tables |
+----------+----------------+-------------+------------------+--------------------+
|  mysql   | increment_sync |     0       |                  |                    |
+----------+----------------+-------------+------------------+--------------------+
The following table describes the fields that are returned in the query result.
FieldDescription
databaseThe destination database in the ApsaraDB for ClickHouse cluster that is used for the data synchronization.
statusThe state of the data synchronization. Valid values:
  • preparing_sync: indicates that the data synchronization is being prepared.
  • full_sync: indicates that full data is being synchronized.
  • increment_sync: indicates that incremental data is being synchronized.
  • stop_sync: indicates that the data synchronization is stopped.
error_countThe number of errors during the data synchronization.
lasted_error_msgThe details of the most recent error during the data synchronization.
sync_failed_tablesThe table or tables that failed to be synchronized. If multiple tables fail to be synchronized, the table names are separated with commas (,).

FAQ

  • Question 1: When I use a database that uses the MaterializeMySQL database engine to synchronize data from a self-managed MySQL database, an ApsaraDB RDS for MySQL database, or a PolarDB for MySQL database, why does the The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires error message appear?

    Common cause: MySQL binary logs are expired and deleted because the MaterializeMySQL engine has stopped synchronizing your data for a long period of time.

    Solution: Delete the database and re-create a destination database in your ApsaraDB for ClickHouse cluster for data synchronization.

  • Question 2: Why does the MaterializeMySQL database engine stop synchronizing a table when I use the engine to synchronize data from a self-managed MySQL database, an ApsaraDB RDS for MySQL database, or a PolarDB for MySQL database? Why does the sync_failed_tables parameter not return an empty string after I query the system.materialize_mysql system table?

    Common cause: DDL queries that ApsaraDB for ClickHouse does not support are performed on a MySQL database during data synchronization.

    Solution: Perform another data synchronization from the MySQL database. Perform the following steps:
    1. Delete the table that the MaterializeMySQL database engine stops synchronizing.
      drop table <table_name> on cluster default;
      Note The table_name parameter specifies the name of the table that the MaterializeMySQL database engine stops synchronizing. If the MaterializeMySQL database engine stops synchronizing a distributed table, you must delete the distributed table and the local table that the MaterializeMySQL database engine stops synchronizing.
    2. Restart data synchronization.
      alter database <database_name> on cluster default MODIFY SETTING skip_unsupported_tables = 1;
      Note The <database_name> parameter specifies the destination database in your ApsaraDB for ClickHouse cluster for your data synchronization.
  • Question 3: MySQL binary logs are deleted because of one of the following reasons: the binary logs expire, or the disk utilization of a self-managed MySQL instance, an ApsaraDB RDS for MySQL instance, or a PolarDB for MySQL instance reaches the threshold. As a result, data fails to be synchronized from the MySQL database to an ApsaraDB for ClickHouse database that uses MaterializeMySQL. How do I query and set a binary log retention period for a self-managed MySQL database, an ApsaraDB RDS for MySQL database, or a PolarDB for MySQL database?
    Solution:
    • For MySQL 8.0 or earlier, execute the following statements to query and set a binary log retention period:
      showvariableslike'expire_logs_days';
      setglobalexpire_logs_days=7;
    • For MySQL 8.0 or later, execute the following statements to query and set a binary log retention period:
      showvariableslike'%binlog_expire_logs_seconds%';
      setglobalbinlog_expire_logs_seconds=604800;