You can use MySQL Binlog Reader to read data from tables in your MySQL database in real time by subscribing to real-time binlogs. This topic describes how to configure MySQL Binlog Reader. It also describes the network environment and permissions that you must prepare before you configure MySQL Binlog Reader.

Prerequisites

Before you configure MySQL Binlog Reader, make sure that the following operations are performed:
  • Prepare data sources: A PolarDB for MySQL cluster and a Hologres instance are created. In this topic, a PolarDB for MySQL cluster is used as the source data source.
  • Plan and prepare resources: An exclusive resource group for data integration is purchased and configured. For more information, see Plan and configure resources.
  • Evaluate and plan the network environment: Before you perform data integration, connect data sources to exclusive resource groups for data integration based on your business requirements. After data sources and exclusive resource groups for data integration are connected, you can refer to the operations in this topic to configure access settings such as vSwitches and whitelists.
    • If data sources and exclusive resource groups for data integration reside in the same region and virtual private cloud (VPC), they are automatically connected.
    • If data sources and exclusive resource groups for data integration reside in different network environments, you must connect data sources and resource groups by using methods such as a VPN gateway.

Limits

  • The Data Integration service of DataWorks does not support data synchronization from a read-only database by using MySQL Binlog Reader.
  • MySQL Binlog Reader supports the following versions of ApsaraDB RDS for MySQL: V5.1, V5.5, V5.6, V5.7, and V8.0. It does not support Distributed Relational Database Service (DRDS) databases that run MySQL.

Configure MySQL Binlog Reader

  1. Go to the DataStudio page.
    1. Log on to the DataWorks console.
    2. In the left-side navigation pane, click Workspaces.
    3. Select the region where the required workspace resides, find the workspace, and then click Data Analytics.
  2. Move the pointer over the Create icon and choose Data Integration > Real-time synchronization.
    Alternatively, you can click the required workflow, right-click Data Integration, and then choose Create > Real-time synchronization.
  3. In the Create Node dialog box, set the Node Name and Location parameters.
    Notice The node name must be 1 to 128 characters in length. It can contain letters, digits, underscores (_), and periods (.).
  4. Click Commit.
  5. On the configuration tab of the real-time sync node, choose Input > MySQL Binlog and drag MySQL Binlog to the canvas on the right.
  6. Click the MySQL Binlog component node. In the panel that appears, set the parameters in the Node configuration section.
    Reader
    Parameter Description
    Data source The connection to the MySQL Binlog data store that you have configured. In this example, you can only select a MySQL Binlog connection.

    If no connection can be used, click New data source on the right to create one on the Data Source > page.

    Table The name of the table from which you want to synchronize data in the specified data store. You can click Data preview on the right to preview the selected table.
    In the case of sharding, data can be synchronized from multiple databases and tables in real time.
    Notice To prevent errors, the tables in database shards and table shards must use the same schema.
    Output field The fields that you want to synchronize, including management fields and data fields.
    • Management fields: additional fields that are automatically added to the destination tables when data is synchronized. These fields facilitate data management, sorting, and deduplication.
    • Data fields: the fields in the source tables that you want to synchronize.

    For more information, see Fields used for real-time synchronization.

    MySQL Binlog Reader supports the sharding feature. To use this feature, click Add a database and table data source and select the data sources and tables from the drop-down lists to synchronize data from multiple tables and databases.
    Notice To prevent errors, the selected tables must use the same schema.
  7. Click the Save icon in the top toolbar.