DataWorks provides MySQL Reader and MySQL Writer for you to read data from and write data to MySQL data sources. You can use the codeless user interface (UI) or the code editor to configure synchronization nodes for MySQL data sources. This topic describes how to add a MySQL data source, and the network environment and permissions that you must prepare before you add it.
Prerequisites
- Prepare a data source: An ApsaraDB RDS for MySQL instance is created.
- Create an account used to access an ApsaraDB RDS for MySQL database and grant the required permissions to the account. For more information, see Prepare an account that has the required permissions.
A real-time synchronization node for a MySQL data source accesses the specified ApsaraDB RDS for MySQL database by using the account that is configured when you add the MySQL data source to DataWorks. You must make sure that the account is granted the following permissions on the database: SELECT, REPLICATION SLAVE, and REPLICATION CLIENT.
- Enable the binary logging feature for the ApsaraDB RDS for MySQL instance. For more information, see Enable the binary logging feature for an ApsaraDB RDS for MySQL instance.
Real-time synchronization of incremental data from MySQL is performed based on real-time subscription to MySQL binary logs. Before you configure a real-time synchronization node to synchronize incremental data from an ApsaraDB RDS for MySQL instance, you must enable the binary logging feature for the instance.
- Purchase an exclusive resource group for Data Integration that meets your business requirements. For more information, see Create and use an exclusive resource group for Data Integration.
- Establish a network connection between the exclusive resource group for Data Integration and the ApsaraDB RDS for MySQL instance. For more information, see Establish a network connection between a resource group and a data source.
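The account and binary logging prerequisites above come down to a few SQL statements. The following Python sketch only builds those statements and interprets one result row; it does not connect to a database. The account name `dw_sync`, the host pattern `%`, and the helper names are hypothetical. The grant uses `ON *.*` because MySQL treats REPLICATION SLAVE and REPLICATION CLIENT as global privileges. Whether you manage accounts with SQL or in the ApsaraDB RDS console depends on your instance setup.

```python
from typing import Tuple

# Privileges that the prerequisites require for real-time synchronization.
REQUIRED_PRIVILEGES = ("SELECT", "REPLICATION SLAVE", "REPLICATION CLIENT")

# Statement to run on the instance to see whether binary logging is enabled.
BINLOG_CHECK_SQL = "SHOW VARIABLES LIKE 'log_bin'"


def build_grant_statement(user: str, host: str = "%") -> str:
    """Return a GRANT statement for the required privileges.

    The user and host values are hypothetical placeholders; quotes and
    backslashes are rejected so the generated statement stays well-formed.
    """
    for value in (user, host):
        if "'" in value or "\\" in value:
            raise ValueError(f"unsupported character in {value!r}")
    privileges = ", ".join(REQUIRED_PRIVILEGES)
    return f"GRANT {privileges} ON *.* TO '{user}'@'{host}';"


def binlog_enabled(row: Tuple[str, str]) -> bool:
    """Interpret one (Variable_name, Value) row returned by BINLOG_CHECK_SQL."""
    name, value = row
    return name.lower() == "log_bin" and value.upper() in ("ON", "1")


if __name__ == "__main__":
    print(build_grant_statement("dw_sync"))
    print(BINLOG_CHECK_SQL)
    print(binlog_enabled(("log_bin", "ON")))
```

Run the generated statements with an account that is allowed to grant privileges, and pass the row returned by the `log_bin` query to `binlog_enabled` to confirm the setting before you configure a real-time synchronization node.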
Background information
Workspaces in standard mode allow you to isolate data sources. You can separately add data sources for the development and production environments to isolate the data sources. This keeps your data secure. For more information, see Isolate a data source in the development and production environments.
Limits
- Real-time synchronization of data from MySQL is performed based on real-time subscription to MySQL binary logs. Real-time data synchronization from MySQL supports only ApsaraDB RDS for MySQL data sources that run MySQL 5.X or 8.X. Real-time data synchronization from MySQL does not support PolarDB-X 1.0 data sources that run MySQL. If you want to synchronize data from a PolarDB-X 1.0 data source that runs MySQL in real time, you can refer to Add a DRDS data source to add a PolarDB-X 1.0 data source and configure a real-time synchronization node for the data source.
- You cannot use the real-time synchronization feature to synchronize data on which XA ROLLBACK statements are executed. For transaction data on which XA PREPARE statements are executed, you can use the real-time synchronization feature to synchronize the data to a destination. If XA ROLLBACK statements are executed later on the data, the rollback changes to the data cannot be synchronized to the destination. If the tables that you want to synchronize contain tables on which XA ROLLBACK statements are executed, you must remove the tables on which XA ROLLBACK statements are executed and add the removed tables again to initialize full data in the source and synchronize incremental data.
- If you add an ApsaraDB RDS for MySQL instance that belongs to a different Alibaba Cloud account from the current workspace to DataWorks as a MySQL data source and you configure a data synchronization node for the MySQL data source, you can use only an exclusive resource group for Data Integration to run the node. If you use the shared resource group for Data Integration to run the node, the resource group cannot access data in the MySQL data source.
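The version limit in the first item can be expressed as a simple check on the version string that a MySQL server reports, for example through `SELECT VERSION()`. This is a minimal sketch: the function name and the sample version strings are illustrative, and it covers only the major-version rule (5.X or 8.X) stated above, not the other limits.

```python
# Major versions that the limits above list as supported for real-time sync.
SUPPORTED_MAJOR_VERSIONS = {5, 8}


def supports_realtime_sync(version: str) -> bool:
    """Return True if a MySQL version string has major version 5 or 8."""
    # Take everything before the first dot, e.g. "5" from "5.7.32-log".
    major = version.strip().split(".", 1)[0]
    return major.isdigit() and int(major) in SUPPORTED_MAJOR_VERSIONS


if __name__ == "__main__":
    for sample in ("5.7.32-log", "8.0.28", "4.1.22"):
        print(sample, supports_realtime_sync(sample))
```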
Add a MySQL data source
- Go to the Data Source page.
- Log on to the DataWorks console.
- In the left-side navigation pane, click Workspaces.
- After you select the region where the required workspace resides, find the workspace and click Data Integration in the Actions column.
- In the left-side navigation pane of the Data Integration page, go to the Data Source page.
- On the Data Source page, click Add data source in the upper-right corner.
- In the Add data source dialog box, click MySQL in the Relational Database section.
- In the Add MySQL data source dialog box, configure the parameters.
- Set Resource Group connectivity to Data Integration.
- Find the desired resource group in the resource group list in the lower part of the dialog box and click Test connectivity in the Actions column. A synchronization node can use only one type of resource group. To ensure that your synchronization nodes can be normally run, you must test the connectivity of all the resource groups for Data Integration on which your synchronization nodes will be run. If you want to test the connectivity of multiple resource groups for Data Integration at a time, select the resource groups and click Batch test connectivity. For more information, see Establish a network connection between a resource group and a data source.
Note
- By default, the resource group list displays only exclusive resource groups for Data Integration. To ensure the stability and performance of data synchronization, we recommend that you use exclusive resource groups for Data Integration.
- If you want to test the network connectivity between the shared resource group or a custom resource group and the data source, click Advanced below the resource group list. In the Warning message, click Confirm. Then, all available shared and custom resource groups appear in the resource group list.
- After the data source passes the connectivity test, click Complete.