Before you configure a data synchronization solution or node, you must make sure that your exclusive resource group for Data Integration and data sources are connected to each other. You can select appropriate network connectivity solutions to establish network connections between the resource group and data sources based on the network environments in which the data sources are deployed. This topic describes the network connectivity solutions that are available when data sources are deployed in different types of network environments.

Precautions

  • You can run a data synchronization node only if network connections are established between the data sources and the resource group for the node. Therefore, before you commit your data synchronization node to the production environment, make sure that the data sources used for the node pass the network connectivity test. Note that passing the network connectivity test does not guarantee that the data synchronization node runs successfully.
  • You can refer to the instructions provided in this topic to establish network connections between an exclusive resource group for scheduling and data sources.
  • An exclusive resource group for Data Integration cannot connect to a data source that is deployed on the classic network. Before you synchronize data from or to such a data source, we recommend that you migrate the data source to a virtual private cloud (VPC).
  • If you synchronize data from a data source over the Internet, the data transmission speed is not guaranteed. We recommend that you synchronize data over a VPC.
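
As a rough complement to the network connectivity test mentioned above, the following minimal sketch probes whether a TCP port on a data source accepts connections. It assumes that you run it from a host whose network path resembles that of the resource group (for example, a host in the same VPC and vSwitch), which is an assumption and not something this topic describes. The function name `can_connect` and the sample address are hypothetical. A successful probe only shows that the port is reachable; it does not replace the connectivity test.

```python
import socket


def can_connect(host: str, port: int, timeout_s: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout_s seconds."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


# Hypothetical usage; replace the address and port with those of your data source:
# can_connect("192.0.2.10", 3306)
```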

Background information

You can use an exclusive resource group for Data Integration to synchronize data between heterogeneous data sources in a complex network environment. Before you run a data synchronization node to synchronize data, you must establish network connections between the exclusive resource group for Data Integration and the data sources.

Figure: Data synchronization

Before you perform data synchronization, you must establish network connections between the resource group that you want to use and your data sources, as shown in the preceding figure. This topic describes how to establish network connections between an exclusive resource group for Data Integration and data sources.

Purchase an exclusive resource group for Data Integration

For more information about how to purchase an exclusive resource group for Data Integration, see Create and use an exclusive resource group for Data Integration.
Note
  • The maximum number of data synchronization nodes that can be run in parallel on a resource group and the maximum number of parallel threads supported by a resource group vary based on the specifications of the resource groups. You must purchase a resource group with appropriate specifications based on your business requirements.
  • We recommend that you use different resource groups to run a batch synchronization node and a real-time synchronization node. If you use the same resource group for both, the two nodes compete for resources such as CPU, memory, and network resources, and affect each other. As a result, the batch synchronization node may slow down, or the real-time synchronization node may be delayed. In severe cases, out-of-memory (OOM) errors may occur due to the lack of resources.

Configure network connectivity

Step 1: Associate a resource group with a VPC

The network connectivity solution that you can use varies based on the network relationship between your exclusive resource group for Data Integration and a data source. The following figure and table provide the related information.

Network connectivity solution

Network type for data synchronization: VPC

  Data source type: Alibaba Cloud data sources
    • Self-managed data source hosted on an Elastic Compute Service (ECS) instance
    • Alibaba Cloud database service

    Relationship between the data source and the exclusive resource group for Data Integration: Same Alibaba Cloud account and same region
    Common logic for network connections: VPC
      Associate the exclusive resource group for Data Integration with the VPC in which the data source resides.
    Sample configuration: Scenario 1: Establish a network connection between an exclusive resource group for Data Integration and a data source that belong to the same Alibaba Cloud account and reside in the same region

    Relationship between the data source and the exclusive resource group for Data Integration: Different Alibaba Cloud accounts or regions
    Common logic for network connections: Access a data source that resides in a different region
      1. Use a network connection tool, such as a Cloud Enterprise Network (CEN) instance, Express Connect circuit, or VPN gateway, to perform one of the following operations:
        1. Establish a network connection between the VPC in which the data source resides and a VPC (referred to as VPC 1) in the region in which the exclusive resource group for Data Integration resides.
        2. Establish a network connection between the VPC in which the data source resides and a VPC (referred to as VPC 2) within the Alibaba Cloud account to which the exclusive resource group for Data Integration belongs.
      2. Associate the exclusive resource group for Data Integration with VPC 1 or VPC 2.
        Note If you select an advanced security group when you associate an exclusive resource group for Data Integration with a VPC, you must add security group rules to the advanced security group on the Security Groups page in the ECS console after the association. You must add the following security group rules to the advanced security group:
        • Outbound rule: Add the IP address of the data source that the exclusive resource group for Data Integration needs to access as the authorization object.
        • Inbound rule: Add the CIDR block of the vSwitch with which the exclusive resource group for Data Integration is associated as the authorization object.
        For more information, see Add the EIP or CIDR block of an exclusive resource group for Data Integration to the whitelist of a data source.
      3. Add a route that points to the CIDR block of the data source for the exclusive resource group for Data Integration in the DataWorks console. For more information, see General reference: Add a route.

  Data source type: Data sources that do not belong to Alibaba Cloud
    • Data source in a data center
    • Cloud database that does not belong to Alibaba Cloud

    Common logic for network connections: Access a data source that does not belong to Alibaba Cloud
    Sample configuration: Scenario 4: Establish a network connection between an exclusive resource group for Data Integration and a data source that resides in a data center

Network type for data synchronization: Internet

  Data source type: -
  Common logic for network connections: Synchronize data over the Internet
    The exclusive resource group for Data Integration can directly connect to the data source that is accessible over the Internet.
  Sample configuration: -

Note If the data source is configured with an IP address whitelist, you must add the CIDR block of the vSwitch with which the exclusive resource group for Data Integration is associated or the elastic IP address (EIP) of the exclusive resource group for Data Integration to the IP address whitelist regardless of the scenario in which you want to synchronize data. For information about how to obtain the CIDR block or IP address that must be added to the IP address whitelist, see Configure a whitelist.
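
The selection logic in the preceding table can be sketched as a small function. This is only an illustration of how the rows relate to each other, not a DataWorks API: the `DataSource` class, the `connectivity_steps` function, and the wording of the returned steps are all hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DataSource:
    on_alibaba_cloud: bool       # False for a data center or a non-Alibaba cloud database
    same_account: bool = True    # same Alibaba Cloud account as the resource group
    same_region: bool = True     # same region as the resource group
    has_whitelist: bool = False  # the data source enforces an IP address whitelist


def connectivity_steps(network: str, source: DataSource) -> List[str]:
    """Return the ordered steps for one table row; network is "vpc" or "internet"."""
    if network == "internet":
        steps = ["Connect directly to the data source over the Internet"]
        entry = "the EIP of the resource group"
    elif network == "vpc":
        entry = "the CIDR block of the vSwitch associated with the resource group"
        if source.on_alibaba_cloud and source.same_account and source.same_region:
            steps = ["Associate the resource group with the VPC of the data source"]
        else:
            # Different account, different region, or a source outside Alibaba Cloud.
            steps = [
                "Connect the network of the data source to a VPC that the resource "
                "group can use (CEN, Express Connect, or VPN gateway)",
                "Associate the resource group with that VPC",
                "Add a route that points to the CIDR block of the data source",
            ]
    else:
        raise ValueError(f"unknown network type: {network!r}")
    if source.has_whitelist:
        steps.append(f"Add {entry} to the IP address whitelist of the data source")
    return steps


# Example: a whitelisted RDS instance in another region of the same account.
for step in connectivity_steps("vpc", DataSource(True, same_region=False, has_whitelist=True)):
    print(step)
```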

Step 2: Configure the IP address whitelist of the data source

If the data source is configured with an IP address whitelist, you must add the CIDR block of the vSwitch with which the exclusive resource group for Data Integration is associated or the EIP of the exclusive resource group for Data Integration to the IP address whitelist regardless of the scenario in which you want to synchronize data.

  • If you want to synchronize data over a VPC, you must add the CIDR block of the vSwitch with which the exclusive resource group for Data Integration is associated to the IP address whitelist of the data source.
  • If you want to synchronize data over the Internet, you must add the EIP of the exclusive resource group for Data Integration to the IP address whitelist of the data source.
You can use one of the following methods to obtain the CIDR block or IP address that must be added to the IP address whitelist:
  • To synchronize data from a data source over a VPC, add the CIDR block of the vSwitch to which the exclusive resource group for Data Integration is bound to the IP address whitelist of the data source. To obtain and add the CIDR block, perform the following operations:
    On the Exclusive Resource Groups tab of the DataWorks console, find the desired exclusive resource group for Data Integration and click Network Settings in the Actions column to view the CIDR block of the vSwitch to which the resource group is bound. Then, add the CIDR block to the IP address whitelist of the data source.
  • To synchronize data from a data source over the Internet, add the EIP of the exclusive resource group for Data Integration to the IP address whitelist of the data source. To obtain and add the EIP, perform the following operations:
    On the Exclusive Resource Groups tab of the DataWorks console, find the exclusive resource group for Data Integration whose EIP you want to view and click View Information in the Actions column. In the Exclusive Resource Groups dialog box, copy the EIP. Then, add the copied EIP to the IP address whitelist of the data source.
    Note If you upgrade the configuration of the exclusive resource group for Data Integration, you must check whether the EIP of the resource group changes. If the EIP of the resource group changes, add the new EIP to the IP address whitelist of the data source after the configuration upgrade. This ensures the normal running of your synchronization node.
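
Before you rely on a whitelist, it can help to check that the entries actually cover the vSwitch CIDR block or the EIP described above. The following minimal sketch uses the Python standard `ipaddress` module. It assumes that whitelist entries are plain IP addresses or CIDR blocks; the function name `whitelist_covers` and all sample addresses are hypothetical, and the whitelist syntax of a specific data source may differ.

```python
import ipaddress
from typing import Iterable


def whitelist_covers(whitelist: Iterable[str], required: str) -> bool:
    """Return True if one whitelist entry contains the required IP address or CIDR block."""
    # A bare IP address is treated as a single-address network (/32 or /128).
    needed = ipaddress.ip_network(required, strict=False)
    for entry in whitelist:
        allowed = ipaddress.ip_network(entry, strict=False)
        # subnet_of raises TypeError across IP versions, so compare versions first.
        if allowed.version == needed.version and needed.subnet_of(allowed):
            return True
    return False


# Hypothetical whitelist of a data source.
whitelist = ["10.0.0.0/16", "203.0.113.7"]
print(whitelist_covers(whitelist, "10.0.1.0/24"))   # vSwitch CIDR block for VPC access
print(whitelist_covers(whitelist, "203.0.113.8"))   # a changed EIP that is not yet listed
```

The second call models the situation in the preceding note: if the EIP changes after a configuration upgrade, the old whitelist entry no longer covers it.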

Sample configurations for different scenarios

The first three scenarios that are described in this section demonstrate how to establish a network connection between an exclusive resource group for Data Integration and an ApsaraDB RDS instance over a VPC. For information about how to obtain the VPC information of an ApsaraDB RDS instance, see Change the VPC and vSwitch for an ApsaraDB RDS for MySQL instance.
Note In the following scenarios, the exclusive resource group for Data Integration is associated with a basic security group. For information about basic security groups, see Overview.

Scenario 1: Establish a network connection between an exclusive resource group for Data Integration and an ApsaraDB RDS instance that belong to the same Alibaba Cloud account and reside in the same region

Instructions on establishing a network connection:
  1. Associate the exclusive resource group for Data Integration with the VPC in which the ApsaraDB RDS instance resides.
  2. Add the CIDR block of the vSwitch with which the exclusive resource group for Data Integration is associated to the IP address whitelist of the ApsaraDB RDS instance.
Illustration: Same Alibaba Cloud account and same region

Scenario 2: Establish a network connection between an exclusive resource group for Data Integration and an ApsaraDB RDS instance that belong to the same Alibaba Cloud account but reside in different regions

Instructions on establishing a network connection:
  1. Establish a network connection between the region in which the exclusive resource group for Data Integration resides and the region in which the ApsaraDB RDS instance resides.

    Use a network connection tool, such as a CEN instance or VPN gateway, to establish a network connection between the VPC in which the ApsaraDB RDS instance resides and a VPC (referred to as VPC 1) in the region in which the exclusive resource group for Data Integration resides.

  2. Establish a network connection between the exclusive resource group for Data Integration and the ApsaraDB RDS instance.
    1. Associate the exclusive resource group for Data Integration with VPC 1.
    2. Add a route that points to the CIDR block of the ApsaraDB RDS instance for the exclusive resource group for Data Integration in the DataWorks console. For more information, see General reference: Add a route.
  3. Add the CIDR block of the vSwitch with which the exclusive resource group for Data Integration is associated to the IP address whitelist of the ApsaraDB RDS instance.
Illustration: Same Alibaba Cloud account but different regions
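
The route added in step 2 must have a destination CIDR block that contains the address of the ApsaraDB RDS instance. The following minimal sketch checks this with the Python standard `ipaddress` module and picks the most specific matching destination, which is the usual longest-prefix rule in IP routing. How DataWorks evaluates routes is not described in this topic, so treat the selection rule as an assumption; the function name `matching_route` and the sample addresses are hypothetical.

```python
import ipaddress
from typing import Iterable, Optional


def matching_route(routes: Iterable[str], target_ip: str) -> Optional[str]:
    """Return the most specific route destination CIDR that contains target_ip, or None."""
    target = ipaddress.ip_address(target_ip)
    best = None
    for cidr in routes:
        net = ipaddress.ip_network(cidr, strict=False)
        if net.version == target.version and target in net:
            # Prefer the longer prefix, that is, the narrower destination block.
            if best is None or net.prefixlen > best.prefixlen:
                best = net
    return str(best) if best is not None else None


# Hypothetical route destinations configured for the resource group.
routes = ["172.16.0.0/12", "172.16.8.0/24"]
print(matching_route(routes, "172.16.8.15"))  # address of the data source
print(matching_route(routes, "10.0.0.1"))     # no configured route covers this address
```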

Scenario 3: Establish a network connection between an exclusive resource group for Data Integration and an ApsaraDB RDS instance that belong to different Alibaba Cloud accounts

Instructions on establishing a network connection:
  1. Establish a network connection between the Alibaba Cloud accounts.

    Use a network connection tool, such as a CEN instance or VPN gateway, to establish a network connection between the VPC in which the ApsaraDB RDS instance resides and a VPC (referred to as VPC 1) within the Alibaba Cloud account to which the exclusive resource group for Data Integration belongs.

  2. Establish a network connection between the exclusive resource group for Data Integration and the ApsaraDB RDS instance.
    1. Associate the exclusive resource group for Data Integration with VPC 1.
    2. Add a route that points to the CIDR block of the ApsaraDB RDS instance for the exclusive resource group for Data Integration in the DataWorks console. For more information, see General reference: Add a route.
  3. Add the CIDR block of the vSwitch with which the exclusive resource group for Data Integration is associated to the IP address whitelist of the ApsaraDB RDS instance.
Illustration: Different Alibaba Cloud accounts

Scenario 4: Establish a network connection between an exclusive resource group for Data Integration and a data source that resides in a data center

If the data source that you want to use does not belong to Alibaba Cloud, you can refer to this scenario to establish a network connection between the data source and the resource group that you want to use.

  1. Establish a network connection between the network environment in which the data source resides and Alibaba Cloud.

    Use an Express Connect circuit to establish a network connection between the network environment in which the data source resides and a VPC within the Alibaba Cloud account to which the exclusive resource group for Data Integration belongs.

  2. Establish a network connection between the exclusive resource group for Data Integration and the data source.
    1. Associate the exclusive resource group for Data Integration with the VPC that is connected to the data source.
    2. Add a route that points to the CIDR block of the data source for the exclusive resource group for Data Integration in the DataWorks console. For more information, see General reference: Add a route.
  3. Add the CIDR block of the vSwitch with which the exclusive resource group for Data Integration is associated to the IP address whitelist of the data source.

What to do next

Configure a data synchronization solution or node. For more information, see the following topics: