This topic describes how to use MaterializeMySQL to synchronize data from MySQL to ApsaraDB for ClickHouse.

Overview

To enhance real-time data warehousing capabilities, ApsaraDB for ClickHouse provides the MaterializeMySQL database engine to map tables in the MySQL server to ApsaraDB for ClickHouse. The ApsaraDB for ClickHouse server works as a MySQL replica to read binary logs and perform DDL and DML queries. Business data is synchronized in real time by using MySQL binlogs.

Prerequisites

  • The source MySQL database and the destination ApsaraDB for ClickHouse cluster belong to the same virtual private cloud (VPC).
  • The IP address of the ApsaraDB for ClickHouse cluster is added to a whitelist of the MySQL database.
  • You have permissions to execute the RELOAD, REPLICATION SLAVE, REPLICATION CLIENT, and SELECT PRIVILEGE statements on MySQL databases.

Create a database

Syntax

CREATE DATABASE [IF NOT EXISTS] db_name [ON CLUSTER cluster]
ENGINE = MaterializeMySQL('host:port', ['database' | database], 'user', 'password') [SETTINGS ...]

Engine parameters

Parameter Description
host:port The URL and port number of the MySQL database.
database The name of the MySQL database.
user The account that is used to connect to the MySQL database.
password The password of the account that is used to connect to the MySQL database.

New columns added by default

When you use the MaterializeMySQL database engine, ApsaraDB for ClickHouse creates a table whose engine is ReplacingMergeTree in the destination ApsaraDB for ClickHouse cluster and adds the following columns to the table by default.
Field Description
_version The transaction counter used to record the data version information. The transaction counter is of the UInt64 type.
_sign The delete marker that specifies whether the row is deleted. The delete marker is of the Int8 type.
Valid values:
  • 1: The row is not deleted.
  • -1: The row is deleted.

Mappings of supported data types

MySQL ClickHouse
TINY Int8
SHORT Int16
INT24 Int32
LONG UInt32
LONG UInt64
FLOAT Float32
DOUBLE Float64
DECIMAL, NEWDECIMAL Decimal
DATE, NEWDATE Date
DATETIME, TIMESTAMP DateTime
DATETIME2, TIMESTAMP2 DateTime64
STRING String
VARCHAR, VAR_STRING String
BLOB String
BIT UInt64
SET UInt64
ENUM Enum16
JSON String
YEAR String
TIME String
GEOMETRY String

All other MySQL data types are converted into String. Columns of all the preceding data types can be left empty.

Specifics and recommendations

DDL queries

MySQL DDL queries are converted into the corresponding ApsaraDB for ClickHouse DDL queries, such as ALTER, CREATE, DROP, and RENAME. DDL queries that cannot be parsed by ApsaraDB for ClickHouse are ignored.

Data replication

MaterializeMySQL does not support INSERT, DELETE, or UPDATE queries but converts DDL queries.
  • A MySQL INSERT query is converted into INSERT with _sign=1.
  • A MySQL DELETE query is converted into INSERT with _sign=-1.
  • A MySQL UPDATE query is converted into INSERT with _sign=1 and INSERT with _sign=-1.

SELECT queries

  • If _version is not specified in a SELECT query, the FINAL modifier is used. This way, rows with MAX(_version) are returned, which indicates that data of the latest version is returned.
  • If _sign is not specified in a SELECT query, WHERE _sign=1 is used. This way, rows with _sign=1 are returned.

Index conversion

  • MySQL PRIMARY KEY and INDEX clauses are converted into ORDER BY tuples in ApsaraDB for ClickHouse tables.
  • ApsaraDB for ClickHouse has only one physical order, which is determined by the ORDER BY clause. To create a physical order, use materialized views.
Note
  • Rows with _sign=-1 are not physically deleted from the tables.
  • Cascade UPDATE/DELETE queries are not supported by the MaterializeMySQL engine.
  • Replication can be easily broken.
  • Manual operations on databases and tables are forbidden.
  • MaterializeMySQL is affected by the optimize_on_insert settings. When a table in the MySQL server changes, the data is merged into the corresponding table in the MaterializeMySQL database.

Proprietary features of ApsaraDB for ClickHouse

Note The exclude_tables and include_tables configuration items cannot be both used.

Configure a list of tables to be synchronized

If you configure a list of tables to be synchronized, only tables in the list are synchronized. The include_tables parameter takes String type arguments to include strings that match specific regular expressions. The following statement provides an example on how to configure include_tables:
Creating a Database 
CREATE DATABASE [IF NOT EXISTS] db_name [ON CLUSTER cluster]
ENGINE = MaterializeMySQL('host:port', ['database' | database], 'user', 'password') 
SETTINGS  
include_tables ='a,b,c...';
Description of wildcards in parameter value strings:
  • *: replaces all characters except forward slashes (/) and empty strings.
  • ?: replaces every single character in a string.
  • {N..M}: replaces all numeric digits within the range of [N, M].

Configure a list of tables not to be synchronized

If a list of tables not to be synchronized is configured, tables in the list are not synchronized. The exclude_tables parameter takes String type arguments to exclude strings that match specific regular expressions. The following statement provides an example on how to configure exclude_tables:
Creating a Database 
CREATE DATABASE [IF NOT EXISTS] db_name [ON CLUSTER cluster]
ENGINE = MaterializeMySQL('host:port', ['database' | database], 'user', 'password') 
SETTINGS  
exclude_tables ='a,b,c...';
Description of wildcards in parameter value strings:
  • *: replaces all characters except forward slashes (/) and empty strings.
  • ?: replaces every single character in a string.
  • {N..M}: replaces all numeric digits within the range of [N, M].

Modify synchronization configurations

All configuration items of the MaterializeMySQL engine can be modified and dynamically take effect. The following syntax is used:
ALTER database db_name MODIFY SETTING  exclude_tables|include_tables = '*';

Skip synchronization errors

The skip_error_count parameter is added to the MaterializeMySQL engine. You can specify this parameter to specify the number of synchronization errors to be skipped. Valid values of the parameter:
  • -9223372036854775808 to -1: All errors are skipped.
  • 0: No error is skipped.
  • 1 to 9223372036854775807: The specified number of errors are skipped.
The following syntax is used:
Creating a Database
CREATE DATABASE [IF NOT EXISTS] db_name [ON CLUSTER cluster]
ENGINE = MaterializeMySQL('host:port', ['database' | database], 'user', 'password')
SETTINGS skip_error_count = -1;

Example 1

Create a database for table synchronization
CREATE DATABASE [IF NOT EXISTS] db_name [ON CLUSTER cluster]
ENGINE = MaterializeMySQL('host:port', ['database' | database], 'user', 'password') 
SETTINGS 
include_tables ='a,b,c...',
exclude_tables ='e,f,g...',

Example 2

Query data in MySQL
mysql> CREATE DATABASE db;
mysql> CREATE TABLE db.test (a INT PRIMARY KEY, b INT);
mysql> INSERT INTO db.test VALUES (1, 11), (2, 22);
mysql> DELETE FROM db.test WHERE a=1;
mysql> ALTER TABLE db.test ADD COLUMN c VARCHAR(16);
mysql> UPDATE db.test SET c='Wow!', b=222;
mysql> SELECT * FROM test;
+---+------+------+ 
| a |    b |    c |
+---+------+------+ 
| 2 |  222 | Wow! |
+---+------+------+

Exchange data between ApsaraDB for ClickHouse and MySQL

Create a database and show the name of a table in the database.
CREATE DATABASE mysql ENGINE = MaterializeMySQL('localhost:3306', 'db', 'user', '***');
SHOW TABLES FROM mysql;
┌─name─┐
│ test │
└──────┘
Insert data into the table and retrieve data from the table.
SELECT * FROM mysql.test;
┌─a─┬──b─┐ 
│ 1 │ 11 │ 
│ 2 │ 22 │ 
└───┴────┘
Delete data from the table, add a column, update data in the column, and then retrieve data from the table.
SELECT * FROM mysql.test;
┌─a─┬───b─┬─c────┐ 
│ 2 │ 222 │ Wow! │ 
└───┴─────┴──────┘