DataWorks provides MySQL Reader and MySQL Writer for you to read data from and write data to MySQL data sources. You can use the codeless user interface (UI) or code editor to configure synchronization nodes for MySQL data sources. This topic describes how to add a MySQL data source. This topic also describes the network environment and permissions that you must prepare before you add a data source.

Prerequisites

Before you add a data source, make sure that the following operations are performed:
  • Prepare a data source: A MySQL data source is created.
  • Plan and prepare resources: An exclusive resource group for Data Integration is purchased and configured. For more information, see Plan and configure resources.
  • Evaluate and plan the network environment: Before you add the data source to DataWorks, connect the data source to an exclusive resource group for Data Integration based on your business requirements. After the network connection is established between the data source and exclusive resource group for Data Integration, configure network access settings such as a vSwitch and a whitelist.
    • If the data sources and the exclusive resource group for Data Integration reside in the same region and virtual private cloud (VPC), they are automatically connected.
    • If the data sources and the exclusive resource group for Data Integration reside in different network environments, you must connect the data sources and the resource group by using methods such as a VPN gateway.
    For more information about how to configure a whitelist, see Configure whitelists for data sources.
  • Execute the following statement to check the version of your MySQL database. You can use a MySQL 5.X or MySQL 8.X database.
    select version();
    Note If the MySQL version of your ApsaraDB RDS for MySQL database is not V5.X or V8.X, use an ApsaraDB RDS for MySQL database whose MySQL version is V5.X or V8.X. Otherwise, the data synchronization node fails to run.
  • Prepare an account and grant permissions to the account.

    You must create an account that can be used to access the data sources, read data from the source, and write data to the destination during the data synchronization process.

    For more information, see Create and authorize an account.
  • Enable the MySQL binary logging feature. This feature is required only for real-time data synchronization. For more information about real-time data synchronization, see Overview.

    If the source data source is a MySQL database, you must enable the binary logging feature. Binary logs record changes to all table schemas and modifications on table data. You can execute statements such as CREATE and ALTER to perform operations on table schemas. You can execute statements such as INSERT, UPDATE, and DELETE to perform operations on table data. You can use binary logs to view the change history of the database, back up incremental data and restore data in the database, and replicate data from the primary database to secondary databases.

    Formats of binary logs:
    • Statement: SQL statement-based replication. Binary logs in this format record the SQL statements that are executed to modify data entries.
    • Row: row-based replication. Binary logs in this format record only modification details about data entries in rows.
    • Mixed: replication in mixed mode. This mode combines the statement and row formats. In most cases, binary logs in the statement format are used to record the SQL statements that are executed to modify data entries, such as functions. If data replication from the primary database to secondary databases cannot be implemented by using binary logs in this format, switch to the row format. MySQL determines which format to use based on each SQL statement that is executed.
    For more information about how to enable the binary logging feature, see Enable the binary logging feature.

Background information

Workspaces in standard mode allow you to isolate data sources. You can separately add data sources for the development environment and production environment to isolate the data sources. This keeps your data secure. For more information, see Isolate connections between the development and production environments.

Limits

  • Real-time data synchronization from a MySQL data source is performed based on real-time subscription to MySQL binary logs. You can use only an ApsaraDB RDS for MySQL data source whose MySQL version is V5.X or V8.X for a real-time synchronization node that is used to synchronize data from MySQL. PolarDB for MySQL is not supported.
  • You cannot use the real-time synchronization feature to synchronize data on which XA ROLLBACK statements are executed. For transaction data on which XA PREPARE statements are executed, you can use the real-time synchronization feature to synchronize the data to a destination. If XA ROLLBACK statements are executed later on the data, the rollback changes to the data cannot be synchronized to the destination. If the tables that you want to synchronize contain tables on which XA ROLLBACK statements are executed, you must remove the tables on which XA ROLLBACK statements are executed and add the removed tables again to initialize full data in the source and synchronize incremental data.

Add a MySQL data source

  1. Go to the Data Source page.
    1. Log on to the DataWorks console.
    2. In the left-side navigation pane, click Workspaces.
    3. After you select the region where the required workspace resides, find the workspace and click Data Integration in the Actions column.
    4. In the left-side navigation pane of the Data Integration page, choose Data Source > Data Sources to go to the Data Source page.
  2. On the Data Source page, click Add data source in the upper-right corner.
  3. In the Add data source dialog box, click MySQL in the Relational Database section.
  4. In the Add MySQL data source dialog box, configure the parameters.
    1. Configure basic information for the MySQL data source.
      You can use one of the following modes to add a MySQL data source: Alibaba Cloud Instance Mode and Connection String Mode.
      • The following table describes the parameters you can configure when you add a MySQL data source in Alibaba Cloud instance mode. Alibaba Cloud Instance Mode
        Parameter Description
        Data Source Type The mode in which the data source is added. Set this parameter to Alibaba Cloud Instance Mode.
        Data Source Name The name of the data source. The name can contain letters, digits, and underscores (_) and must start with a letter.
        Data Source Description The description of the data source. The description cannot exceed 80 characters in length.
        Environment The environment in which the data source is used. Valid values: Development and Production.
        Note This parameter is displayed only when the workspace is in standard mode.
        Region The region where the data source resides.
        RDS Instance ID The ID of the ApsaraDB RDS for MySQL instance. You can log on to the ApsaraDB RDS console to obtain the ID.
        RDS Instance Account ID The ID of the Alibaba Cloud account that is used to purchase the ApsaraDB RDS for MySQL instance. You can use the Alibaba Cloud account to log on to the DataWorks console, move the pointer over the profile picture in the upper-right corner, and then view the ID of the account.
        Default Database Name
        The name of the default ApsaraDB RDS for MySQL database. The following descriptions provide instructions for you to configure a data synchronization node or solution that uses a MySQL data source:
        • When you configure a database-level real-time or batch data synchronization node or solution that uses a MySQL data source, you can select one or more databases on which you have access permissions in the ApsaraDB for MySQL instance.
        • If you select multiple databases when you configure a batch synchronization node, you must add a data source for each database.
        Username The username used to log on to the ApsaraDB RDS for MySQL database.
        Password The password that is used to log on to the ApsaraDB RDS for MySQL database. Do not use at signs (@) in the password.
        Note You must add the IP addresses or CIDR blocks that you used to connect to the ApsaraDB RDS for MySQL instance to the whitelist of the instance. For more information, see Configure an IP address whitelist.
      • The following table describes the parameters you can configure when you add a MySQL data source in connection string mode. JDBC
        Parameter Description
        Data Source Type The mode in which the data source is added. Set this parameter to Connection String Mode.
        Data Source Name The name of the data source. The name can contain letters, digits, and underscores (_) and must start with a letter.
        Data Source Description The description of the data source. The description cannot exceed 80 characters in length.
        Environment The environment in which the data source is used. Valid values: Development and Production.
        Note This parameter is displayed only when the workspace is in standard mode.
        JDBC URL The Java Database Connectivity (JDBC) URL of the database. Specify a value for this parameter in the format of jdbc:mysql://Server IP address:Port number/Database name. The Database parameter in the URL specifies the name of the default database in this data source. A data synchronization solution or node can be used to synchronize the data in all the databases on which you have access permissions in this data source.
        Note If you use this data source in a batch data synchronization node, the related reader or writer accesses the specified database in the data source by default. If you use this data source in a data synchronization solution, Data Integration can read data from all databases in the data source.
        Username The username that is used to log on to the ApsaraDB RDS for MySQL database.
        Password The password that is used to log on to the ApsaraDB RDS for MySQL database.
  5. Set Resource Group connectivity to Data Integration.
  6. Find the desired resource group in the resource group list in the lower part of the dialog box and click Test connectivity in the Actions column.
    A synchronization node can use only one type of resource group. To ensure that your synchronization nodes can be normally run, you must test the connectivity of all the resource groups for Data Integration on which your synchronization nodes will be run. If you want to test the connectivity of multiple resource groups for Data Integration at a time, select the resource groups and click Batch test connectivity. For more information, see Establish a network connection between a resource group and a data source.
    Note
    • By default, the resource group list displays only exclusive resource groups for Data Integration. To ensure the stability and performance of data synchronization, we recommend that you use exclusive resource groups for Data Integration.
    • If you want to test the network connectivity between the shared resource group or a custom resource group and the data source, click Advanced below the resource group list. In the Warning message, click Confirm. Then, all available shared and custom resource groups appear in the resource group list.
  7. After the data source passes the connectivity test, click Complete.