DataWorks provides MySQL Reader and MySQL Writer for you to read data from and write data to MySQL data sources. You can use the codeless user interface (UI) or code editor to configure synchronization nodes for MySQL data sources. This topic describes how to add a MySQL data source. This topic also describes the network environment and permissions that you must prepare before you add a data source.

Prerequisites

Before you add a data source, make sure that the following operations are performed:
  • Prepare a data source: An ApsaraDB RDS for MySQL instance is created.
  • Create an account used to access an ApsaraDB RDS for MySQL database and grant the required permissions to the account. For more information, see Prepare an account that has the required permissions.

    A real-time synchronization node for a MySQL data source accesses the specified ApsaraDB RDS for MySQL database by using the account that is configured when you add the MySQL data source to DataWorks. You must make sure that the account is granted the following permissions on the database: SELECT, REPLICATION SLAVE, and REPLICATION CLIENT.

  • Enable the binary logging feature for the ApsaraDB RDS for MySQL instance. For more information, see Enable the binary logging feature for an ApsaraDB RDS for MySQL instance.

    Real-time synchronization of incremental data from MySQL is performed based on real-time subscription to MySQL binary logs. Before you configure a real-time synchronization node to synchronize incremental data from an ApsaraDB RDS for MySQL instance, you must enable the binary logging feature for the instance.

  • Purchase an exclusive resource group for Data Integration that meets your business requirements. For more information, see Create and use an exclusive resource group for Data Integration.
  • Establish a network connection between the exclusive resource group for Data Integration and the ApsaraDB RDS for MySQL instance. For more information, see Establish a network connection between a resource group and a data source.

Background information

Workspaces in standard mode allow you to isolate data sources. You can separately add data sources for the development and production environments to isolate the data sources. This keeps your data secure. For more information, see Isolate a data source in the development and production environments.

Limits

  • Real-time synchronization of data from MySQL is performed based on real-time subscription to MySQL binary logs. Real-time data synchronization from MySQL supports only ApsaraDB RDS for MySQL data sources that run MySQL 5.X or 8.X. Real-time data synchronization from MySQL does not support PolarDB-X 1.0 data sources that run MySQL. If you want to synchronize data from a PolarDB-X 1.0 data source that runs MySQL in real time, you can refer to Add a DRDS data source to add a PolarDB-X 1.0 data source and configure a real-time synchronization node for the data source.
  • You cannot use the real-time synchronization feature to synchronize data on which XA ROLLBACK statements are executed. For transaction data on which XA PREPARE statements are executed, you can use the real-time synchronization feature to synchronize the data to a destination. If XA ROLLBACK statements are executed later on the data, the rollback changes to the data cannot be synchronized to the destination. If the tables that you want to synchronize contain tables on which XA ROLLBACK statements are executed, you must remove the tables on which XA ROLLBACK statements are executed and add the removed tables again to initialize full data in the source and synchronize incremental data.
  • If you add an ApsaraDB RDS for MySQL instance that belongs to a different Alibaba Cloud account from the current workspace to DataWorks as a MySQL data source and you configure a data synchronization node for the MySQL data source, you can use only an exclusive resource group for Data Integration to run the node. If you use the shared resource group for Data Integration to run the node, the resource group cannot access data in the MySQL data source.

Add a MySQL data source

  1. Go to the Data Source page.
    1. Log on to the DataWorks console.
    2. In the left-side navigation pane, click Workspaces.
    3. After you select the region where the required workspace resides, find the workspace and click Data Integration in the Actions column.
    4. In the left-side navigation pane of the Data Integration page, choose Data Source > Data Sources to go to the Data Source page.
  2. On the Data Source page, click Add data source in the upper-right corner.
  3. In the Add data source dialog box, click MySQL in the Relational Database section.
  4. In the Add MySQL data source dialog box, configure the parameters.
    1. Configure basic information for the MySQL data source.
      You can use one of the following modes to add a MySQL data source: Alibaba Cloud instance mode and connection string mode.
      • The following table describes the parameters you can configure when you add a MySQL data source in Alibaba Cloud instance mode. Alibaba Cloud Instance Mode
        ParameterDescription
        Data Source TypeThe mode in which the data source is added. Set this parameter to Alibaba Cloud Instance Mode.
        Data Source NameThe name of the data source. The name can contain only letters, digits, and underscores (_), and must start with a letter.
        Data Source DescriptionThe description of the data source. The description cannot exceed 80 characters in length.
        EnvironmentThe environment in which the data source is used. Valid values: Development and Production.
        Note This parameter is displayed only if the workspace is in standard mode.
        RegionThe region where the data source resides.
        RDS Instance IDThe ID of the ApsaraDB RDS for MySQL instance. You can log on to the ApsaraDB RDS console to obtain the ID.
        RDS Instance Account IDThe ID of the Alibaba Cloud account that is used to purchase the ApsaraDB RDS for MySQL instance. You can log on to the DataWorks console with the Alibaba Cloud account, move the pointer over the profile picture in the upper-right corner, and then view the ID of the account.
        Default Database Name
        The name of the default ApsaraDB RDS for MySQL database. The following descriptions provide instructions for you to configure a data synchronization node or solution that uses a MySQL data source:
        • When you configure a database-level real-time or batch synchronization node or solution that uses a MySQL data source, you can select one or more databases on which you have access permissions in the ApsaraDB for MySQL instance.
        • If you select multiple databases when you configure a batch synchronization node, you must add a data source for each database.
        UsernameThe username that is used to log on to the database.
        PasswordThe password that is used to log on to the database. Do not use at signs (@) in the password.
        Configure Secondary DatabaseIf the ApsaraDB RDS for MySQL instance is a primary instance and has a secondary ApsaraDB RDS for MySQL instance, you can turn on the Configure Secondary Database switch when you add the primary ApsaraDB RDS for MySQL instance to DataWorks as a data source, and select the ID of the secondary ApsaraDB RDS for MySQL instance. This prevents the primary ApsaraDB RDS for MySQL instance from being affected and ensures the performance of the primary ApsaraDB RDS for MySQL instance. If the secondary ApsaraDB RDS for MySQL instance has multiple read-only ApsaraDB RDS for MySQL instances, only data of one of the read-only instances is read.
        Note If you turn on the Configure Secondary Database switch when you add a MySQL data source, you can use only an exclusive resource group to run the node that uses the data source.
        Note You must add the IP addresses or CIDR blocks that you use to connect to the ApsaraDB RDS for MySQL instance to the IP address whitelist of the instance. For more information, see Configure an IP address whitelist.
      • The following table describes the parameters you can configure when you add a MySQL data source in connection string mode. Connection String Mode
        ParameterDescription
        Data Source TypeThe mode in which the data source is added. Set this parameter to Connection String Mode.
        Data Source NameThe name of the data source. The name can contain only letters, digits, and underscores (_), and must start with a letter.
        Data Source DescriptionThe description of the data source. The description cannot exceed 80 characters in length.
        EnvironmentThe environment in which the data source is used. Valid values: Development and Production.
        Note This parameter is displayed only if the workspace is in standard mode.
        JDBC URLThe Java Database Connectivity (JDBC) URL of the database. Specify a value for this parameter in the format of jdbc:mysql://Server IP address:Port number/Database name. To obtain the JDBC URL, you can log on to the ApsaraDB RDS console, find the ApsaraDB RDS for MySQL instance, click the instance ID to go to the Basic Information page, and then click See Detail next to Network Type on this page. The Database parameter in the URL specifies the name of the default database in this data source. A data synchronization solution or node can be used to synchronize the data in all the databases on which you have access permissions in this data source.
        Note If you use the data source in a batch synchronization node, the related Reader or Writer plug-in accesses the specified database in the data source by default. If you use this data source in a data synchronization solution, Data Integration accesses all the databases in the data source.
        UsernameThe username that is used to log on to the database.
        PasswordThe password that is used to log on to the database.
  5. Set Resource Group connectivity to Data Integration.
  6. Find the desired resource group in the resource group list in the lower part of the dialog box and click Test connectivity in the Actions column.
    A synchronization node can use only one type of resource group. To ensure that your synchronization nodes can be normally run, you must test the connectivity of all the resource groups for Data Integration on which your synchronization nodes will be run. If you want to test the connectivity of multiple resource groups for Data Integration at a time, select the resource groups and click Batch test connectivity. For more information, see Establish a network connection between a resource group and a data source.
    Note
    • By default, the resource group list displays only exclusive resource groups for Data Integration. To ensure the stability and performance of data synchronization, we recommend that you use exclusive resource groups for Data Integration.
    • If you want to test the network connectivity between the shared resource group or a custom resource group and the data source, click Advanced below the resource group list. In the Warning message, click Confirm. Then, all available shared and custom resource groups appear in the resource group list.
  7. After the data source passes the connectivity test, click Complete.

What to do next

You have learned how to add a MySQL data source. You can proceed to subsequent tutorials. In subsequent tutorials, you will learn how to configure MySQL Reader and MySQL Writer. For more information, see MySQL Reader or MySQL Writer.