Before you configure a data synchronization solution or node, you must make sure that your exclusive resource group for Data Integration and data sources are connected to each other. You can select appropriate network connectivity solutions to establish network connections between the resource group and data sources based on the network environments in which the data sources are deployed. This topic describes the network connectivity solutions that are available when data sources are deployed in different types of network environments.

Background information

Figure: Data synchronization
Before you synchronize data, you must establish network connections between the resource group for Data Integration and data sources, as shown in the preceding figure. This topic focuses on network connections between an exclusive resource group for Data Integration and data sources.
Note
If your data source is configured with an IP address whitelist, you must add the elastic IP address (EIP) of the exclusive resource group for Data Integration that you use to the whitelist. For more information about how to obtain the EIP of an exclusive resource group for Data Integration, see Configure an IP address whitelist.

Purchase an exclusive resource group for Data Integration

For more information about how to purchase an exclusive resource group for Data Integration, see Create and use an exclusive resource group for Data Integration.
Note
  • The maximum number of data synchronization nodes that can run in parallel on a resource group and the maximum number of parallel threads that the resource group supports vary based on the specifications of the resource group. Purchase a resource group whose specifications meet your business requirements.
  • We recommend that you use different resource groups to run batch synchronization nodes and real-time synchronization nodes. If a batch synchronization node and a real-time synchronization node run on the same resource group, they compete for CPU, memory, and network resources. As a result, the batch synchronization node may slow down, the real-time synchronization node may be delayed, or, in the worst case, out-of-memory (OOM) errors may occur.

Configure network connectivity

The network connectivity solution that you can use depends on the network relationship between your exclusive resource group for Data Integration and your data source. The following figure and table provide the details.

Network connectivity solution
Network type for data synchronization | Data source type | Relationship between the data source and DataWorks workspace | Common logic for network connections | Configuration example
Synchronize data over a virtual private cloud (VPC) | Alibaba Cloud data sources
  • Self-managed data source hosted on an Elastic Compute Service (ECS) instance
  • Alibaba Cloud database service
Same Alibaba Cloud account and same region | Associate the exclusive resource group for Data Integration with the VPC in which the data source is deployed. | Scenario 1: Establish a network connection between a data source and a DataWorks workspace that belong to the same Alibaba Cloud account and reside in the same region
Different Alibaba Cloud accounts or regions | Access a data source that resides in a different region:
  1. Use a network connection tool, such as a CEN instance, Express Connect circuit, or VPN gateway, to establish a network connection between the VPC in which the data source is deployed and a VPC in the region where the DataWorks workspace resides.
  2. Associate the exclusive resource group for Data Integration with the VPC that is connected to the VPC in which the data source is deployed.
  3. Add a route for the exclusive resource group for Data Integration in the DataWorks console to connect the resource group to the data source. For more information, see Add a route.
Data sources that do not belong to Alibaba Cloud
  • Data source in a data center
  • Cloud database that does not belong to Alibaba Cloud
- | Scenario 4: Establish a network connection between a data source that resides in a data center and a DataWorks workspace
Synchronize data over the Internet | - | - | The exclusive resource group for Data Integration can directly connect to the data source that is accessible over the Internet. | -
Note
  • If the data source is configured with an IP address whitelist, you must add the CIDR block of the vSwitch with which the exclusive resource group for Data Integration is associated or the EIP of the exclusive resource group for Data Integration to the whitelist regardless of the scenario in which you want to synchronize data. For more information about how to obtain the information that needs to be added to the whitelist, see Configure an IP address whitelist.
  • An exclusive resource group for Data Integration cannot connect to a data source that is deployed in the classic network. Before you synchronize data from such a data source, we recommend that you migrate the data source to a VPC.
  • If you synchronize data from a data source over the Internet, the data transmission speed cannot be ensured. We recommend that you synchronize data over a VPC.
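
The selection logic in the table and notes above can be summarized as a small decision function. The following Python sketch is illustrative only: the `DataSource` class, its fields, and the `connectivity_steps` function are hypothetical names introduced for this example and are not part of any DataWorks API. It also assumes that data sources outside Alibaba Cloud follow the same three-step pattern as cross-account and cross-region data sources, which is what Scenario 4 in this topic describes.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DataSource:
    """Hypothetical description of where a data source is deployed."""
    on_alibaba_cloud: bool      # False for a data center or another cloud
    same_account: bool          # same Alibaba Cloud account as the workspace
    same_region: bool           # same region as the workspace
    internet_accessible: bool   # reachable through a public endpoint
    in_classic_network: bool = False


def connectivity_steps(ds: DataSource, prefer_vpc: bool = True) -> List[str]:
    """Return the connection steps that the table and notes suggest."""
    if ds.in_classic_network:
        # The classic network is not supported by exclusive resource groups.
        return ["Migrate the data source to a VPC before you synchronize data."]
    if ds.internet_accessible and not prefer_vpc:
        return ["Connect to the data source directly over the Internet."]
    if ds.on_alibaba_cloud and ds.same_account and ds.same_region:
        return ["Associate the resource group with the VPC of the data source."]
    # Different account, different region, or a data source outside Alibaba Cloud.
    return [
        "Connect the data source network to a VPC in the workspace region "
        "(CEN instance, Express Connect circuit, or VPN gateway).",
        "Associate the resource group with that connected VPC.",
        "Add a route for the resource group in the DataWorks console.",
    ]


if __name__ == "__main__":
    cross_region = DataSource(on_alibaba_cloud=True, same_account=True,
                              same_region=False, internet_accessible=False)
    for step in connectivity_steps(cross_region):
        print(step)
```

The function defaults to the VPC path because this topic recommends synchronizing data over a VPC rather than over the Internet.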

Sample configurations for different scenarios

In the first three scenarios that are described in this section, an ApsaraDB RDS instance is used to demonstrate how to establish a network connection between a database in the instance and an exclusive resource group for Data Integration. For more information about how to obtain the VPC information of an ApsaraDB RDS instance, see Change the VPC and vSwitch for an ApsaraDB RDS for MySQL instance.

  • Access the ApsaraDB RDS database over a VPC

    Scenario 1: Establish a network connection between a data source and a DataWorks workspace that belong to the same Alibaba Cloud account and reside in the same region

    Instructions for establishing a network connection:
    1. Associate the exclusive resource group for Data Integration with the VPC in which the data source is deployed.
    2. Add the CIDR block of the vSwitch with which the exclusive resource group for Data Integration is associated to the IP address whitelist of the ApsaraDB RDS instance.
    Illustration: Same Alibaba Cloud account and same region
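
Before you save the whitelist, it can help to confirm that its entries really cover the vSwitch CIDR block or the EIP. The following sketch uses only the Python standard library `ipaddress` module. The function name `is_whitelisted` and all addresses are hypothetical example values, not values taken from a real resource group.

```python
import ipaddress
from typing import List


def is_whitelisted(address_or_cidr: str, whitelist: List[str]) -> bool:
    """Return True if a single whitelist entry covers the whole address or CIDR block."""
    candidate = ipaddress.ip_network(address_or_cidr, strict=False)
    for entry in whitelist:
        allowed = ipaddress.ip_network(entry, strict=False)
        # subnet_of() raises TypeError for mixed IPv4/IPv6, so compare versions first.
        if candidate.version == allowed.version and candidate.subnet_of(allowed):
            return True
    return False


if __name__ == "__main__":
    # Hypothetical whitelist: one private range and one public address.
    whitelist = ["192.168.0.0/16", "203.0.113.7"]
    print(is_whitelisted("192.168.10.0/24", whitelist))  # vSwitch CIDR block
    print(is_whitelisted("203.0.113.7", whitelist))      # EIP
    print(is_whitelisted("10.0.0.0/24", whitelist))      # not covered
```

The check is deliberately strict: a CIDR block that is covered only by the union of several smaller entries is reported as not whitelisted.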

    Scenario 2: Establish a network connection between a data source and a DataWorks workspace that belong to the same Alibaba Cloud account and reside in different regions

    Instructions for establishing a network connection:
    1. Establish a network connection between the region in which the data source resides and the region in which the DataWorks workspace resides.

      Use a network connection tool, such as a CEN instance or VPN gateway, to establish a network connection between the VPC in which the data source is deployed and a VPC in the region where the DataWorks workspace resides.

    2. Establish a network connection between the data source and the exclusive resource group for Data Integration.
      1. Associate the exclusive resource group for Data Integration with the VPC that is connected to the VPC in which the data source is deployed.
      2. Add a route for the exclusive resource group for Data Integration in the DataWorks console to connect the resource group to the VPC in which the data source is deployed. For more information, see Add a route.
    Illustration: Same Alibaba Cloud account and different regions
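
A route only helps if its destination actually covers the data source. The following sketch checks a planned route with the standard library `ipaddress` module. It rests on an assumption that this topic does not state: as is typical for routed networks, the two connected VPCs should use CIDR blocks that do not overlap. The function name `check_route_plan` and all CIDR blocks are hypothetical.

```python
import ipaddress
from typing import List


def check_route_plan(resource_group_vpc: str, data_source_vpc: str,
                     route_destination: str, data_source_ip: str) -> List[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    rg_net = ipaddress.ip_network(resource_group_vpc)
    ds_net = ipaddress.ip_network(data_source_vpc)
    dest = ipaddress.ip_network(route_destination)
    ip = ipaddress.ip_address(data_source_ip)

    # Assumption: overlapping address ranges make routing between VPCs ambiguous.
    if rg_net.overlaps(ds_net):
        problems.append("VPC CIDR blocks overlap")
    if ip not in ds_net:
        problems.append("data source IP is outside the data source VPC")
    if ip not in dest:
        problems.append("route destination does not cover the data source IP")
    return problems


if __name__ == "__main__":
    # Hypothetical values: resource group VPC, data source VPC, route, database IP.
    print(check_route_plan("10.0.0.0/16", "172.16.0.0/16",
                           "172.16.0.0/16", "172.16.3.10"))
```

An empty result does not prove that traffic flows. It only rules out the addressing mistakes that the function checks for.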

    Scenario 3: Establish a network connection between a data source and a DataWorks workspace that belong to different Alibaba Cloud accounts

    Instructions for establishing a network connection:
    1. Establish a network connection between the Alibaba Cloud account to which the data source belongs and the Alibaba Cloud account to which the DataWorks workspace belongs.

      Use a network connection tool, such as a CEN instance or VPN gateway, to establish a network connection between the VPC in which the data source is deployed and a VPC within the Alibaba Cloud account to which the DataWorks workspace belongs.

    2. Establish a network connection between the data source and the exclusive resource group for Data Integration.
      1. Associate the exclusive resource group for Data Integration with the VPC that is connected to the data source.
      2. Add a route for the exclusive resource group for Data Integration in the DataWorks console to connect the resource group to the VPC in which the data source is deployed. For more information, see Add a route.
    Illustration: Different Alibaba Cloud accounts
  • Access a data source that does not belong to Alibaba Cloud

    Scenario 4: Establish a network connection between a data source that resides in a data center and an exclusive resource group for Data Integration

    1. Establish a network connection between the network environment in which the data source is deployed and Alibaba Cloud.

      Use an Express Connect circuit to establish a network connection between the network environment in which the data source is deployed and a VPC within the Alibaba Cloud account to which the exclusive resource group for Data Integration belongs.

    2. Establish a network connection between the data source and the exclusive resource group for Data Integration.
      1. Associate the exclusive resource group for Data Integration with the VPC that is connected to the data source.
      2. Add a route for the exclusive resource group for Data Integration in the DataWorks console to connect the resource group to the VPC that is connected to the data source. For more information, see Add a route.
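
After the network path is in place, a plain TCP connection attempt is a quick way to check that the data source port is reachable. The following sketch uses the standard library `socket` module. It assumes that you run it from a machine in the VPC with which the resource group is associated, for example an ECS instance. It does not run on the resource group itself, so a successful result is only an indication. The function name `can_connect`, the host, and the port are hypothetical.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # Hypothetical private address and port of a database in a data center.
    print(can_connect("172.16.3.10", 3306))
```

A result of False can mean a missing route, a whitelist or firewall that blocks the source address, or a database that is not listening, so check each of these in turn.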

What to do next

  1. Configure security settings.
    1. After the network connectivity is configured, check whether the data source is configured with an IP address whitelist. If the data source is configured with an IP address whitelist, you must add the EIP of the exclusive resource group for Data Integration or the CIDR block of the vSwitch with which the resource group is associated to the whitelist. This way, the resource group can be used to read data from or write data to the data source. For more information, see Configure an IP address whitelist.
    2. If you use a self-managed data source that is hosted on an ECS instance, configure a security group rule for the instance. For more information, see Configure a security group for an ECS instance where a self-managed data store resides.
  2. Configure a data synchronization solution or node. For more information, see the following topics: