You can use MySQL Reader to read data from tables in your MySQL database in real time by subscribing to binary logs (binlogs). This topic describes how to configure MySQL Reader. This topic also describes the network environment and permissions that you must prepare before you configure MySQL Reader.

Prerequisites

Before you configure MySQL Reader, make sure that the following operations are performed:
  • Prepare a data source: A MySQL data source is created.
  • Plan and prepare resources: An exclusive resource group for Data Integration is purchased and configured. For more information, see Plan and configure resources.
  • Evaluate and plan the network environment: Before you add the data source to DataWorks, connect the data source to an exclusive resource group for Data Integration based on your business requirements. After you connect the data source to the exclusive resource group for Data Integration, configure network access settings such as a vSwitch and a whitelist.
    • If data sources and exclusive resource groups for Data Integration reside in the same region and virtual private cloud (VPC), they are automatically connected.
    • If data sources and exclusive resource groups for Data Integration reside in different network environments, you must connect data sources and resource groups by using methods such as a VPN gateway.
    For more information about how to configure a whitelist, see Configure whitelists for data sources.
  • Execute the following statement to check the version of your MySQL database. You can use a MySQL 5.X or MySQL 8.X database.
    select version();
    Note Real-time data synchronization from or to MySQL is performed based on real-time subscription to MySQL binary logs. For a real-time data synchronization node that uses a MySQL data source, you can use an ApsaraDB RDS for MySQL data source whose MySQL version is V5.X or V8.X. PolarDB for MySQL is not supported. If the MySQL version of your ApsaraDB RDS for MySQL database is not V5.X or V8.X, use an ApsaraDB RDS for MySQL database whose MySQL version is V5.X or V8.X. Otherwise, the data synchronization node fails to run.
  • Prepare an account and grant permissions to the account.

    You must create an account that can be used to access the data sources, read data from the source, and write data to the destination during the data synchronization process.

    For more information, see Create and authorize an account.
  • Enable the MySQL binary logging feature. This feature is required only for real-time data synchronization. For more information about real-time data synchronization, see Overview.

    If the source data source is a MySQL database, you must enable the binary logging feature. Binary logs record changes to all table schemas and modifications on table data. You can execute statements such as CREATE and ALTER to perform operations on table schemas. You can execute statements such as INSERT, UPDATE, and DELETE to perform operations on table data. You can use binary logs to view the change history of the database, back up incremental data and restore data in the database, and replicate data from the primary database to secondary databases.

    Formats of binary logs:
    • Statement: SQL statement-based replication. Binary logs in this format record the SQL statements that are executed to modify data entries.
    • Row: row-based replication. Binary logs in this format record only modification details about data entries in rows.
    • Mixed: replication in mixed mode. This mode combines the statement and row formats. In most cases, binary logs in the statement format are used to record the SQL statements that are executed to modify data entries, such as functions. If data replication from the primary database to secondary databases cannot be implemented by using binary logs in this format, switch to the row format. MySQL determines which format to use based on each SQL statement that is executed.
    For more information about how to enable the binary logging feature, see Enable the binary logging feature.

Limits

  • Data Integration does not support data synchronization from a read-only database by using MySQL Reader.
  • MySQL Reader reads data in your MySQL database in real time by subscribing to binlogs. MySQL Reader supports data synchronization only from ApsaraDB RDS for MySQL 5.x and 8.x. MySQL Reader does not support data synchronization from Distributed Relational Database Service (DRDS).

Procedure

  1. Go to the DataStudio page.
    1. Log on to the DataWorks console.
    2. In the left-side navigation pane, click Workspaces.
    3. In the top navigation bar, select the region in which the workspace that you want to manage resides. Find the workspace and click Data Development in the Actions column.
  2. In the Scheduled Workflow pane, move the pointer over the Create icon icon and choose Data Integration > Real-time synchronization.
    Alternatively, click the required workflow, right-click Data Integration, and then choose Create > Real-time synchronization.
  3. In the Create Node dialog box, set the Sync Method parameter to End-to-end ETL and set the Node Name and Location parameters.
    Notice The node name can be up to 128 characters in length and can contain letters, digits, underscores (_), and periods (.).
  4. Click Commit.
  5. On the configuration tab of the real-time sync node, choose Input > MySQL and drag MySQL to the canvas on the right.
  6. Click the MySQL node. In the Node Configuration panel, set the parameters.
    Input
    Parameter Description
    Data source The data source. You can only select a MySQL data source.

    If no data source is available, click New data source to add one on the Data Source page.

    Table The name of the table from which you want to read data. You can click Preview Data on the right to preview the selected table.
    In the database sharding scenario, MySQL Reader can read data from multiple tables and databases in real time.
    Notice To prevent errors, make sure that the tables use the same schema.
    Output Fields The fields that you want to synchronize. You can select management fields and data fields.
    • Manage fields: the additional fields that are automatically added when the fields in the source tables are synchronized to the destination. These fields facilitate data management, sorting, and deduplication.
    • Data fields: the fields in the source tables that you want to synchronize.

    For more information, see Fields used for real-time synchronization.

    MySQL Reader supports the sharding feature. To use this feature, click Add a database and table data source and select the data sources and tables from the drop-down lists to synchronize data from multiple tables and databases.
    Notice To prevent errors, make sure that the selected tables use the same schema.
  7. Click the Save icon icon in the top toolbar.