DataWorks provides resource groups for Data Integration for you to synchronize data. Before you synchronize data, you must make sure that the resource group for Data Integration that you use is connected to the related data sources. You can select a network connectivity solution based on the network environments of the data sources and the type of resource group that you use. This topic describes the network connectivity solutions that are available when data sources are deployed in different types of network environments.

Resource group type | Description | Scenario |
---|---|---|
Exclusive resource group for Data Integration | Exclusive resource groups for Data Integration are managed by DataWorks. After you purchase an exclusive resource group for Data Integration, you can use the resources in the resource group in an exclusive manner. For more information, see Create and use an exclusive resource group for Data Integration. | |
Custom resource group for Data Integration | A type of resource group for Data Integration that consists of idle servers. For more information about how to create a custom resource group for Data Integration, see Create a custom resource group for Data Integration. | If you have idle servers, you can create a custom resource group based on the idle servers to run your nodes. You must make sure that your data sources can be connected to the custom resource group. |
Overview of network connectivity solutions

- Use an exclusive resource group for Data Integration
An exclusive resource group for Data Integration can connect to a data source that is accessible over the Internet or deployed in a virtual private cloud (VPC). However, an exclusive resource group for Data Integration cannot connect to a data source that is deployed in the classic network. For more information, see Use an exclusive resource group for Data Integration.

Data source type | Relationship between the data source and workspace | Access to the data source over the Internet | Access to the data source over a VPC | Access to the data source over the classic network |
---|---|---|---|---|
Alibaba Cloud data sources (self-managed data source hosted on an Elastic Compute Service (ECS) instance, or Alibaba Cloud database service) | Same Alibaba Cloud account and same region | The data source can directly connect to the resource group. For more information about the solution and how to configure network connectivity between the data source and resource group, see Table 1 Synchronize data in a database over the Internet. | For more information about the solution and how to configure network connectivity between the data source and resource group, see Table 2 Synchronize data in a database that belongs to the same Alibaba Cloud account and resides in the same region as a workspace over a VPC. | Not supported |
Alibaba Cloud data sources (same as above) | Same Alibaba Cloud account and different regions | The data source can directly connect to the resource group. See Table 1. | Use a network connection tool to connect the data source to the workspace. Then, connect the data source to the resource group. For more information about the solution and how to configure network connectivity between the data source and resource group, see Table 3 Synchronize data in a database that belongs to the same Alibaba Cloud account as and resides in a different region from a workspace over a VPC. See also Additional information. | Not supported |
Alibaba Cloud data sources (same as above) | Different Alibaba Cloud accounts | The data source can directly connect to the resource group. See Table 1. | Use a network connection tool to connect the data source and the workspace. Then, connect the data source to the resource group. For more information about the solution and how to configure network connectivity between the data source and resource group, see Table 4 Synchronize data in a database that belongs to a different Alibaba Cloud account from a workspace over a VPC. See also Additional information. | Not supported |
Data sources that do not belong to Alibaba Cloud (data source in a data center, or cloud database that does not belong to Alibaba Cloud) | N/A | The data source can directly connect to the resource group. See Table 1. | Use a network connection tool to connect the data source to Alibaba Cloud. Then, connect the data source to the resource group. For more information about the solution and how to configure network connectivity between the data source and resource group, see Table 5 Synchronize data in a database that resides in a data center or belongs to other cloud service providers. See also Additional information. | Not supported |

- Use a custom resource group for Data Integration
A custom resource group for Data Integration can directly connect to a data source that is accessible over the Internet or deployed in the same network environment as the custom resource group. If a data source and a custom resource group for Data Integration are deployed in different network environments, the custom resource group cannot connect to the data source. For more information, see Use a custom resource group for Data Integration.
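
The overview for exclusive resource groups can be read as a small lookup: the access path and the relationship between the data source and the workspace together point to one of the solution tables. The following Python sketch is only an illustration of that reading. The enum and function names are hypothetical and are not part of any DataWorks API, and the sketch assumes that the Internet and classic network columns apply to every row of the overview.

```python
from enum import Enum


class Relationship(Enum):
    """Relationship between the data source and the workspace (hypothetical names)."""
    SAME_ACCOUNT_SAME_REGION = "same account, same region"
    SAME_ACCOUNT_DIFFERENT_REGION = "same account, different regions"
    DIFFERENT_ACCOUNT = "different accounts"
    NOT_ALIBABA_CLOUD = "data center or other cloud provider"


class Access(Enum):
    """How the data source is reached (hypothetical names)."""
    INTERNET = "Internet"
    VPC = "VPC"
    CLASSIC_NETWORK = "classic network"


# VPC access: the relationship decides which table describes the solution.
_VPC_TABLES = {
    Relationship.SAME_ACCOUNT_SAME_REGION: "Table 2",
    Relationship.SAME_ACCOUNT_DIFFERENT_REGION: "Table 3",
    Relationship.DIFFERENT_ACCOUNT: "Table 4",
    Relationship.NOT_ALIBABA_CLOUD: "Table 5",
}


def exclusive_group_solution(relationship: Relationship, access: Access) -> str:
    """Return the table to consult for an exclusive resource group."""
    if access is Access.CLASSIC_NETWORK:
        # Exclusive resource groups cannot reach the classic network.
        return "Not supported"
    if access is Access.INTERNET:
        # Assumption: direct Internet access applies regardless of relationship.
        return "Table 1"
    return _VPC_TABLES[relationship]
```

For example, `exclusive_group_solution(Relationship.DIFFERENT_ACCOUNT, Access.VPC)` points to Table 4, while any classic network combination yields "Not supported".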
Use an exclusive resource group for Data Integration
Exclusive resource groups are deployed in the VPC in which DataWorks is hosted. Exclusive resource groups are disconnected from other network environments. To use an exclusive resource group, you must configure network settings for the exclusive resource group to associate it with a VPC that can connect to data sources. This way, the exclusive resource group can access the data sources over the VPC.
- For more information about exclusive resource groups for Data Integration, see Exclusive resource groups for Data Integration.
- After the connectivity is configured, check whether the data source is configured with a whitelist. If the data source is configured with a whitelist, you must add the Classless Inter-Domain Routing (CIDR) block of the resource group to the whitelist of the data source. This way, the resource group can read data from and write data to the data source. For more information, see Configure an IP address whitelist.
- If you use a self-managed data source that is hosted on an ECS instance, configure a security group for the instance. For more information, see Configure a security group for an ECS instance where a self-managed data store resides.
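
To make the whitelist requirement concrete, the following Python sketch checks whether an address falls inside any CIDR block of a whitelist. It uses only the standard `ipaddress` module. The function name and the sample addresses are hypothetical, and the sketch does not reflect how any particular data source evaluates its whitelist.

```python
import ipaddress
from typing import Iterable


def is_whitelisted(client_ip: str, whitelist: Iterable[str]) -> bool:
    """Return True if client_ip falls inside any CIDR block in whitelist."""
    ip = ipaddress.ip_address(client_ip)
    # strict=False accepts entries such as "192.168.0.5/24" that have host bits set.
    return any(
        ip in ipaddress.ip_network(entry, strict=False) for entry in whitelist
    )


# Hypothetical values: a resource group address and a data source whitelist.
print(is_whitelisted("192.168.0.17", ["10.0.0.0/8", "192.168.0.0/24"]))
```

If the check fails for an address that the resource group uses, the CIDR block of the resource group is probably missing from the whitelist.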
- Access a database over the Internet
Table 1. Synchronize data in a database over the Internet

Network connectivity solution: The exclusive resource group for Data Integration can directly connect to the data source.

Instruction on connectivity configuration: [Figure: network connectivity configuration between an ApsaraDB RDS database and an exclusive resource group for Data Integration]

For more information about exclusive resource groups for Data Integration, see Create and use an exclusive resource group for Data Integration. For more information about how to obtain the VPC information of an ApsaraDB RDS instance, see Switch an ApsaraDB RDS for MySQL instance to a new VPC and a new vSwitch.

Notice: Take note of the Internet traffic cost. For more information, see Internet traffic generated by Data Integration.

- Access a database over a VPC
Table 2. Synchronize data in a database that belongs to the same Alibaba Cloud account and resides in the same region as a workspace over a VPC

Network connectivity solution: [Figure: network connectivity solution]

Instruction on connectivity configuration: [Figure: architecture of the network connectivity solution]

Associate the exclusive resource group for Data Integration with the VPC where the data source resides.

Note:
- You can associate the resource group with a vSwitch in the VPC. After the resource group is associated with a vSwitch, the system automatically adds a route to the VPC. This way, the resource group can connect to the data source.
- If you associate the exclusive resource group for Data Integration with the VPC where the data source resides when you add the data source, the exclusive resource group can access only the vSwitch to which the data source belongs and cannot connect to the VPC. In this case, you must manually add a route. For more information, see Add a route.

For more information about how to obtain the VPC information of an ApsaraDB RDS instance, see Switch an ApsaraDB RDS for MySQL instance to a new VPC and a new vSwitch.

Table 3. Synchronize data in a database that belongs to the same Alibaba Cloud account as and resides in a different region from a workspace over a VPC

Network connectivity solution: [Figure: network connectivity solution]

Instruction on connectivity configuration: [Figure: architecture of the network connectivity solution]

- Associate the exclusive resource group for Data Integration with a VPC.
  - Create a VPC in the region where the DataWorks workspace resides.
  - Associate the exclusive resource group for Data Integration with the VPC.
- Connect the resource group to the data source.
  - Connect the VPC that is created in the previous step to the VPC where the data source resides by using Express Connect circuits or VPN gateways.
  - Add a route in the DataWorks console to connect the VPC where the DataWorks workspace resides to the VPC where the data source resides. For more information, see Add a route.

For more information about how to obtain the VPC information of an ApsaraDB RDS instance, see Switch an ApsaraDB RDS for MySQL instance to a new VPC and a new vSwitch.

Table 4. Synchronize data in a database that belongs to a different Alibaba Cloud account from a workspace over a VPC

Network connectivity solution: [Figure: network connectivity solution]

Instruction on connectivity configuration: [Figure: architecture of the network connectivity solution]

- Associate the exclusive resource group for Data Integration with a VPC.
  - Create a VPC in the region where the DataWorks workspace resides.
  - Associate the exclusive resource group for Data Integration with the VPC.
- Connect the resource group to the data source.
  - Connect the VPC that is created in the previous step to the VPC where the data source resides by using Express Connect circuits or VPN gateways.
  - Add a route in the DataWorks console to connect the VPC where the DataWorks workspace resides to the VPC where the data source resides. For more information, see Add a route.

For more information about how to obtain the VPC information of an ApsaraDB RDS instance, see Switch an ApsaraDB RDS for MySQL instance to a new VPC and a new vSwitch.

- Synchronize data in a database that does not belong to Alibaba Cloud
Table 5. Synchronize data in a database that resides in a data center or belongs to other cloud service providers

Network connectivity solution: [Figure: network connectivity solution]

Instruction on connectivity configuration: [Figure: architecture of the network connectivity solution]

- Associate the exclusive resource group for Data Integration with a VPC.
  - Create a VPC in the region where the DataWorks workspace resides.
  - Associate the exclusive resource group for Data Integration with the VPC.
- Connect the resource group to the data source.
  - Connect the VPC that is created in the previous step to the data center where the data source resides by using Express Connect circuits or VPN gateways.
  - Add a route in the DataWorks console to connect the VPC where the DataWorks workspace resides to the data center where the data source resides. For more information, see Add a route.

- Synchronize data in a database over the classic network
The exclusive resource group for Data Integration cannot connect to the data source.
Note: We recommend that you migrate the data source to a VPC instead of using the classic network of Alibaba Cloud.
Use a custom resource group for Data Integration
- You must activate DataWorks Professional Edition before you can use custom resource groups. For more information about custom resource groups, see Custom resource groups.
- After the connectivity is configured, check whether the data source is configured with a whitelist. If the data source is configured with a whitelist, you must add the Classless Inter-Domain Routing (CIDR) block of the resource group to the whitelist of the data source. This way, the resource group can read data from and write data to the data source. For more information, see Configure an IP address whitelist.
- If you use a self-managed data source that is hosted on an ECS instance, configure a security group for the instance. For more information, see Configure a security group for an ECS instance where a self-managed data store resides.

Network environment of the data source | Network connectivity solution | Instruction on connectivity configuration |
---|---|---|
The data source is accessible over the Internet. | The custom resource group for Data Integration can directly connect to the data source. | For more information about custom resource groups for Data Integration, see Create a custom resource group for Data Integration. Note: Take note of the Internet traffic cost. For more information, see Internet traffic generated by Data Integration. |
The data source and custom resource group for Data Integration use the same IP address on the classic network or are in the same VPC or data center. | The custom resource group for Data Integration can directly connect to the data source. | |
The data source and the custom resource group for Data Integration use different IP addresses on the classic network or are in different VPCs or data centers. | [Figure: network connectivity solution] | Connect the custom resource group for Data Integration to the data source by using Express Connect circuits or VPN gateways. |

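
Because a custom resource group runs on servers that you manage, one rough way to confirm that a data source is reachable is to open a TCP connection from such a server to the endpoint of the data source. The following Python sketch assumes plain TCP reachability is a useful first signal. The function name is hypothetical, and a successful connection does not prove that whitelist or authentication settings are correct.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        # create_connection resolves the host name and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Hypothetical usage from a server in the custom resource group:
#   can_reach("db.example.internal", 3306)
```

A result of False suggests that the server and the data source are in different network environments, or that a security group or firewall blocks the port.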
Additional information
- The following services may be involved in network connectivity solutions:
- View the resource group on which a synchronization node is run.
  - If the logs contain information that is similar to the following example, the synchronization node is run on the shared resource group:
    running in Pipeline[basecommon_ group_xxxxxxxxx]
    - If ApsaraDB RDS databases are involved, an OXS cluster is used to run the synchronization node. The logs are in the format of running in Pipeline[basecommon_ group_xxx_oxs].
    - If ApsaraDB RDS databases are not involved, an ECS cluster is used to run the synchronization node. The logs are in the format of running in Pipeline[basecommon_ group_xxx_ecs].
  - If the logs contain information that is similar to the following example, the synchronization node is run on an exclusive resource group for Data Integration:
    running in Pipeline[basecommon_S_res_group_xxx]
  - If the logs contain information that is similar to the following example, the synchronization node is run on a custom resource group for Data Integration:
    running in Pipeline[basecommon_xxxxxxxxx]
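
The log formats described above can be told apart mechanically. The following Python sketch classifies a log line by the pipeline name that it contains. The function name and return labels are hypothetical, and the patterns are inferred only from the example formats in this topic, so real log lines may need adjusted patterns. The optional whitespace after `basecommon_` is tolerated because the shared resource group example is printed with a space.

```python
import re
from typing import Optional

# Captures the pipeline name inside "running in Pipeline[...]".
_PIPELINE = re.compile(r"running in Pipeline\[(?P<name>[^\]]+)\]")


def classify_resource_group(log_line: str) -> Optional[str]:
    """Guess the resource group type from a synchronization node log line."""
    match = _PIPELINE.search(log_line)
    if match is None:
        return None
    name = match.group("name")
    # Check the most specific prefix first; every format starts with "basecommon_".
    if name.startswith("basecommon_S_res_group_"):
        return "exclusive"
    if re.match(r"basecommon_\s*group_", name):
        if name.endswith("_oxs"):
            return "shared (OXS cluster)"
        if name.endswith("_ecs"):
            return "shared (ECS cluster)"
        return "shared"
    if name.startswith("basecommon_"):
        return "custom"
    return None
```

The order of the checks matters: the custom format is the least specific, so it is tested last.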
What to do next
- Configure network connectivity between a resource group and data sources.
- After you select an appropriate network connectivity solution, connect a resource group to a data source by following the related instructions on connectivity configuration.
- After the connectivity is configured, check whether the data source is configured with a whitelist. If the data source is configured with a whitelist, you must add the Classless Inter-Domain Routing (CIDR) block of the resource group to the whitelist of the data source. This way, the resource group can read data from and write data to the data source. For more information, see Configure an IP address whitelist.
- If you use a self-managed data source that is hosted on an ECS instance, configure a security group for the instance. For more information, see Configure a security group for an ECS instance where a self-managed data store resides.
- Configure a data synchronization solution or node. For more information, see the following topics: