All Products
Search
Document Center

DataWorks:Configure MySQL Reader

Last Updated:Aug 01, 2023

You can use MySQL Reader to read data from tables in your MySQL database in real time by subscribing to binary logs. This topic describes how to configure MySQL Reader. This topic also describes the network environment and permissions that you must prepare before you configure MySQL Reader.

Prerequisites

Before you configure MySQL Reader, make sure that the following operations are performed:
  • An account that is used to access a MySQL database is created, and required permissions are granted to the account. For more information, see Prepare an account that has the required permissions.

    A real-time synchronization node for a MySQL data source accesses the specified MySQL database by using the account that is configured when you add the MySQL data source to DataWorks. You must make sure that the account is granted the following permissions on the ApsaraDB RDS for MySQL database: SELECT, REPLICATION SLAVE, and REPLICATION CLIENT.

  • The binary logging feature is enabled for the ApsaraDB RDS for MySQL instance that you want to add to DataWorks as a MySQL data source. For more information, see Enable the binary logging feature for an ApsaraDB RDS for MySQL instance.

    Real-time synchronization of incremental data from MySQL is performed based on real-time subscription to MySQL binary logs. Before you configure a real-time synchronization node to synchronize incremental data from an ApsaraDB RDS for MySQL instance, you must enable the binary logging feature for the instance.

  • An exclusive resource group for Data Integration that meets your business requirements is purchased. For more information, see Create and use an exclusive resource group for Data Integration.
  • A network connection between the exclusive resource group for Data Integration and the MySQL data source is established. For more information, see Establish a network connection between a resource group and a data source.

Limits

  • Data Integration does not support real-time synchronization of data from a read-only ApsaraDB RDS for MySQL instance by using MySQL Reader.
  • Real-time synchronization of incremental data from MySQL is performed based on real-time subscription to MySQL binary logs. Real-time data synchronization from MySQL supports only ApsaraDB RDS for MySQL data sources that run MySQL 5.X or 8.X. If you want to run a real-time synchronization node to synchronize data from a PolarDB-X 1.0 data source that runs MySQL, you can refer to operations described in Add a DRDS data source to add a PolarDB-X 1.0 data source to DataWorks and use the PolarDB-X 1.0 data source for the node.
  • Real-time data synchronization does not support functional indexes.

Configure MySQL Reader

  1. Go to the DataStudio page.

    1. Log on to the DataWorks console.

    2. In the left-side navigation pane, click Workspaces.

    3. In the top navigation bar, select the region in which the workspace that you want to manage resides. On the Workspaces page, find the workspace and click Shortcuts > Data Development in the Actions column.

  2. In the Scheduled Workflow pane, move the pointer over the Create a table icon and choose Create Node > Data Integration > Real-time synchronization.

    Alternatively, right-click the required workflow, and then choose Create Node > Data Integration > Real-time synchronizationReal-time synchronization.

  3. In the Create Node dialog box, set the Sync Method parameter to End-to-end ETL and configure the Name and Path parameters.

    Important

    The node name cannot exceed 128 characters in length and can contain letters, digits, underscores (_), and periods (.).

  4. Click Confirm.

  5. On the configuration tab of the real-time synchronization node, choose Input > MySQL and drag MySQL to the canvas on the right.
  6. Click the MySQL node. In the panel that appears, configure the parameters.
    Input
    ParameterDescription
    Data sourceThe MySQL data source that you added to DataWorks. You can select only a MySQL data source.

    If no data source is available, click New data source to add one on the Data Source page.

    TableThe name of the table from which you want to read data. You can click Preview Data on the right to preview the selected table.
    You can run a real-time synchronization node to synchronize data from tables in sharded MySQL databases. In this case, MySQL Reader can read data from the databases and tables at the same time in real time.
    Important To prevent errors, make sure that the tables use the same schema.
    Output FieldsThe fields that you want to synchronize. The fields include management fields and data fields.
    • Management fields: the additional fields that are automatically added when the fields in the source tables are synchronized to the destination. These fields facilitate data management, sorting, and deduplication.
    • Data fields: the fields in the source tables that you want to synchronize.

    For information about the format of a data record that is synchronized from MySQL in real time , see Fields used for real-time synchronization.

    MySQL Reader supports the sharding feature. To use this feature, click Add a database and table data source and select multiple data sources and tables from the drop-down lists to synchronize data from the databases and tables at the same time.
    Important To prevent errors, make sure that the selected tables use the same schema.
  7. Click the Save icon in the top toolbar to save the configurations.