DataWorks: Establish a network connection between a resource group and a data source

Last Updated: Nov 02, 2023

Before you configure a synchronization task, make sure that network connections are established between your exclusive resource group for Data Integration and your data sources. The appropriate network connectivity solution depends on the network environment in which each data source is deployed. This topic describes the solutions that are available for each type of network environment.

Precautions

  • A synchronization task can run only if network connections are established between its data sources and its resource group. Before you commit a synchronization task to the production environment, make sure that the data sources used by the task pass the network connectivity test. Note that network connectivity alone does not guarantee that a synchronization task runs successfully.

  • You can also follow the instructions in this topic to establish network connections between an exclusive resource group for scheduling and your data sources.

  • An exclusive resource group for Data Integration cannot connect to a data source that is deployed on the classic network. Before you synchronize data from or to such a data source, we recommend that you migrate the data source to a virtual private cloud (VPC).

  • If you synchronize data from a data source over the Internet, the speed of data transmission and the stability of your synchronization task cannot be ensured. We recommend that you synchronize data over a VPC or by using Cloud Enterprise Network (CEN).
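As a rough illustration of what a connectivity check involves, the following Python sketch tests whether a TCP connection to a data source endpoint can be opened. It is not the DataWorks network connectivity test. It only reflects reachability from the machine on which it runs, so the result is meaningful only if you run it from a host in the same VPC and vSwitch as the resource group, which is an assumption of this sketch. The function name `can_reach` and the endpoint in the comment are hypothetical.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        # create_connection resolves the host name and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Hypothetical usage (192.0.2.10 is a documentation-only address):
#   can_reach("192.0.2.10", 3306)
```

A successful handshake shows only that the port is reachable. As the precautions above state, connectivity does not by itself ensure that a synchronization task succeeds.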

Background information

You can use an exclusive resource group for Data Integration to synchronize data between heterogeneous data sources in a complex network environment. Before you run a synchronization task to synchronize data, you must establish network connections between the exclusive resource group for Data Integration and the data sources.

[Figure: Data synchronization]

As shown in the preceding figure, before you perform data synchronization, you must establish network connections between the resource group that you want to use and your data sources. This topic focuses on network connections between an exclusive resource group for Data Integration and data sources.

Purchase an exclusive resource group for Data Integration

For more information about how to purchase an exclusive resource group for Data Integration, see Create and use an exclusive resource group for Data Integration.

Note
  • The maximum number of synchronization tasks that can be run in parallel on a resource group and the maximum number of parallel threads supported by a resource group vary based on the specifications of the resource groups. You must purchase a resource group with appropriate specifications based on your business requirements.

  • We recommend that you use different resource groups to run batch synchronization tasks and real-time synchronization tasks. If a batch synchronization task and a real-time synchronization task run on the same resource group, they compete for CPU, memory, and network resources and affect each other. As a result, the batch synchronization task may slow down, or the real-time synchronization task may be delayed. In the worst case, either task may be terminated by the out-of-memory (OOM) killer.

  • Exclusive resource groups in the same region use the same elastic IP address (EIP). If you block the EIP for a region, none of the exclusive resource groups in that region can access your data.

Configure network connectivity

Step 1: Associate a resource group with a VPC

The network connectivity solution that you can use varies based on the network relationship between your exclusive resource group for Data Integration and a data source. The following figure and overview provide the related information.

[Figure: Network connectivity solutions]

Network type for data synchronization: VPC

Data source type: Alibaba Cloud data sources

  • Self-managed data source hosted on an Elastic Compute Service (ECS) instance

  • Alibaba Cloud database service

Relationship between the data source and the exclusive resource group for Data Integration: Same Alibaba Cloud account and same region

[Figure: VPC]

Common logic for network connections: Associate the exclusive resource group for Data Integration with the VPC in which the data source is deployed.

Sample configuration: Scenario 1: Establish a network connection between an exclusive resource group for Data Integration and a data source that belong to the same Alibaba Cloud account and reside in the same region

Relationship between the data source and the exclusive resource group for Data Integration: Different Alibaba Cloud accounts or regions

[Figure: Cross-region VPC]

Common logic for network connections:

  1. Use a network connection tool, such as a CEN instance, Express Connect circuit, or VPN gateway, to perform one of the following operations:

    • Establish a network connection between the VPC in which the data source is deployed and a VPC (referred to as VPC 1) in the region in which the exclusive resource group for Data Integration resides.

    • Establish a network connection between the VPC in which the data source resides and a VPC (referred to as VPC 2) within the Alibaba Cloud account to which the exclusive resource group for Data Integration belongs.

  2. Associate the exclusive resource group for Data Integration with VPC 1 or VPC 2.

    Note

    If you select an advanced security group when you associate an exclusive resource group for Data Integration with a VPC, you must add security group rules to the advanced security group on the Security Groups page in the ECS console after the association. You must add the following security group rules to the advanced security group:

    • Outbound rule: Add the IP address of the data source that the exclusive resource group for Data Integration needs to access as the authorization object.

    • Inbound rule: Add the CIDR block of the vSwitch with which the exclusive resource group for Data Integration is associated as the authorization object.

    For more information, see Add the EIP or CIDR block of an exclusive resource group for Data Integration to the whitelist of a data source.

  3. Add a route that points to the CIDR block of the data source for the exclusive resource group for Data Integration in the DataWorks console. For more information, see General reference: Add a route.

Data source type: Data sources that do not belong to Alibaba Cloud

  • Data source in a data center

  • Cloud database that does not belong to Alibaba Cloud

[Figure: Database not on Alibaba Cloud]

Sample configuration: Scenario 4: Establish a network connection between an exclusive resource group for Data Integration and a data source that resides in a data center

Network type for data synchronization: Internet

[Figure: Internet]

Common logic for network connections: The exclusive resource group for Data Integration can directly connect to the data source that is accessible over the Internet.

Note

If the data source is configured with an IP address whitelist, you must add the CIDR block of the vSwitch with which the exclusive resource group for Data Integration is associated or the EIP of the exclusive resource group for Data Integration to the IP address whitelist regardless of the scenario in which you want to synchronize data. For information about how to obtain the CIDR block or IP address that must be added to the IP address whitelist, see Configure an IP address whitelist.
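The solutions in Step 1, together with the whitelist requirement in the preceding note, can be summarized in a short Python sketch. The class and function names are hypothetical, and the returned strings only paraphrase the steps in this topic. Nothing here calls a DataWorks API.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DataSource:
    """Hypothetical description of where a data source is deployed."""
    on_alibaba_cloud: bool
    same_account: bool = False   # ignored when on_alibaba_cloud is False
    same_region: bool = False    # ignored when on_alibaba_cloud is False
    has_ip_whitelist: bool = False


def connection_steps(src: DataSource, over_internet: bool = False) -> List[str]:
    """Return the steps that this topic describes for the given data source."""
    steps: List[str] = []
    if over_internet:
        # Internet: the resource group connects directly to the data source.
        whitelist_entry = "EIP of the resource group"
    else:
        if src.on_alibaba_cloud and src.same_account and src.same_region:
            steps.append("Associate the resource group with the VPC of the data source")
        else:
            steps.append(
                "Connect the data source network to a VPC in the resource group's "
                "region or account (CEN, Express Connect, or VPN gateway)"
            )
            steps.append("Associate the resource group with that VPC")
            steps.append("Add a route that points to the CIDR block of the data source")
        whitelist_entry = "CIDR block of the vSwitch associated with the resource group"
    if src.has_ip_whitelist:
        steps.append(
            "Add the " + whitelist_entry + " to the IP address whitelist of the data source"
        )
    return steps
```

For example, `connection_steps(DataSource(on_alibaba_cloud=False, has_ip_whitelist=True))` returns four steps, which mirror Scenario 4 in this topic.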

Step 2: Configure the IP address whitelist of the data source

If the data source is configured with an IP address whitelist, you must add one of the following entries to the whitelist, regardless of the scenario in which you want to synchronize data:

  • If you want to synchronize data over a VPC, you must add the CIDR block of the vSwitch with which the exclusive resource group for Data Integration is associated to the IP address whitelist of the data source.

  • If you want to synchronize data over the Internet, you must add the EIP of the exclusive resource group for Data Integration to the IP address whitelist of the data source.
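To sanity-check a whitelist before you run a task, you can test whether the required entry (the vSwitch CIDR block for a VPC, or the EIP for the Internet) is already covered. The following Python sketch uses the standard `ipaddress` module. It assumes that whitelist entries are plain IP addresses or CIDR blocks. Some data sources may accept other formats, which this sketch does not handle. The function name and all addresses are hypothetical.

```python
import ipaddress
from typing import List


def is_whitelisted(required: str, whitelist: List[str]) -> bool:
    """Return True if `required` (an IP address or CIDR block) is fully
    covered by a single whitelist entry.

    Coverage by a combination of several smaller entries is not detected.
    """
    need = ipaddress.ip_network(required, strict=False)
    for entry in whitelist:
        allowed = ipaddress.ip_network(entry, strict=False)
        # subnet_of raises TypeError across IP versions, so compare versions first.
        if need.version == allowed.version and need.subnet_of(allowed):
            return True
    return False


# Hypothetical values: a vSwitch CIDR block for VPC access and an EIP for Internet access.
vswitch_cidr = "192.168.10.0/24"
eip = "203.0.113.10"
whitelist = ["192.168.0.0/16"]

vpc_ok = is_whitelisted(vswitch_cidr, whitelist)   # covered by 192.168.0.0/16
internet_ok = is_whitelisted(eip, whitelist)       # not covered, the EIP must be added
```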

You can use one of the following methods to obtain the CIDR block or IP address that must be added to the IP address whitelist:

  • To synchronize data over a VPC, add the CIDR block of the vSwitch to which the exclusive resource group for Data Integration is bound to the IP address whitelist of the data source:
    On the Exclusive Resource Groups tab of the DataWorks console, find the desired exclusive resource group for Data Integration and click Network Settings in the Actions column to view the CIDR block of the vSwitch to which the resource group is bound. Then, add the CIDR block to the IP address whitelist of the data source.
    [Figure: View the CIDR block of the vSwitch to which the resource group is bound]
  • To synchronize data over the Internet, add the EIP of the exclusive resource group for Data Integration to the IP address whitelist of the data source:
    On the Exclusive Resource Groups tab of the DataWorks console, find the exclusive resource group for Data Integration whose EIP you want to view and click View Information in the Actions column. In the Exclusive Resource Groups dialog box, copy the EIP. Then, add the copied EIP to the IP address whitelist of the data source.
    [Figure: View the EIP of the exclusive resource group for Data Integration]
    Note: If you upgrade the configuration of the exclusive resource group for Data Integration, check whether the EIP of the resource group has changed. If it has, add the new EIP to the IP address whitelist of the data source after the upgrade so that your synchronization node continues to run as expected.
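The check described in the note about configuration upgrades can be sketched as follows. This is a minimal illustration with hypothetical names and addresses. It only computes the whitelist that you would then apply to the data source yourself. The old EIP is left in place because this topic does not instruct you to remove it.

```python
from typing import List


def whitelist_after_upgrade(whitelist: List[str], eip_before: str, eip_after: str) -> List[str]:
    """Return the whitelist to apply after a configuration upgrade.

    The new EIP is appended only if it differs from the old one and is not
    already present. Existing entries are left untouched.
    """
    updated = list(whitelist)
    if eip_after != eip_before and eip_after not in updated:
        updated.append(eip_after)
    return updated


# Hypothetical EIPs recorded before and after the upgrade.
current = ["203.0.113.10"]
updated = whitelist_after_upgrade(current, "203.0.113.10", "203.0.113.20")
```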

Sample configurations for different scenarios

The first three scenarios that are described in this section demonstrate how to establish a network connection between an exclusive resource group for Data Integration and an ApsaraDB RDS instance over a VPC. For information about how to obtain the VPC information of an ApsaraDB RDS instance, see Change the VPC and vSwitch.

Note

In the following scenarios, the exclusive resource group for Data Integration is associated with a basic security group. For information about basic security groups, see Overview.

Scenario 1: Establish a network connection between an exclusive resource group for Data Integration and an ApsaraDB RDS instance that belong to the same Alibaba Cloud account and reside in the same region

Instruction on establishing a network connection

  1. Associate the exclusive resource group for Data Integration with the VPC in which the ApsaraDB RDS instance resides.

  2. Add the CIDR block of the vSwitch with which the exclusive resource group for Data Integration is associated to the IP address whitelist of the ApsaraDB RDS instance.

Illustration

[Figure: Same Alibaba Cloud account and same region]

Scenario 2: Establish a network connection between an exclusive resource group for Data Integration and an ApsaraDB RDS instance that belong to the same Alibaba Cloud account but reside in different regions

Instruction on establishing a network connection

  1. Establish a network connection between the region in which the exclusive resource group for Data Integration resides and the region in which the ApsaraDB RDS instance resides.

    Use a network connection tool, such as a CEN instance or VPN gateway, to establish a network connection between the VPC in which the ApsaraDB RDS instance resides and a VPC (referred to as VPC 1) in the region in which the exclusive resource group for Data Integration resides.

  2. Establish a network connection between the exclusive resource group for Data Integration and the ApsaraDB RDS instance.

    1. Associate the exclusive resource group for Data Integration with VPC 1.

    2. Add a route that points to the CIDR block of the ApsaraDB RDS instance for the exclusive resource group for Data Integration in the DataWorks console. For more information, see General reference: Add a route.

  3. Add the CIDR block of the vSwitch with which the exclusive resource group for Data Integration is associated to the IP address whitelist of the ApsaraDB RDS instance.

Illustration

[Figure: Same Alibaba Cloud account, different regions]
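When you add the route in this scenario, it can help to confirm that the destination CIDR block actually contains the address of the ApsaraDB RDS instance. The following Python sketch does this with the standard `ipaddress` module. It also includes an overlap check between two VPC CIDR blocks. This topic does not mention overlapping CIDR blocks, so treat that check as a general networking precaution assumed here, not as a documented DataWorks requirement. All names and addresses are hypothetical.

```python
import ipaddress


def route_covers(route_cidr: str, data_source_ip: str) -> bool:
    """Return True if the route's destination CIDR block contains the data source address."""
    return ipaddress.ip_address(data_source_ip) in ipaddress.ip_network(route_cidr)


def cidrs_overlap(cidr_a: str, cidr_b: str) -> bool:
    """Return True if two CIDR blocks share at least one address."""
    return ipaddress.ip_network(cidr_a).overlaps(ipaddress.ip_network(cidr_b))


# Hypothetical values: VPC 1 hosts the resource group, the other VPC hosts the RDS instance.
vpc1_cidr = "192.168.0.0/16"
rds_vpc_cidr = "172.16.0.0/16"
rds_ip = "172.16.5.20"

route_ok = route_covers(rds_vpc_cidr, rds_ip)           # the route reaches the instance
address_clash = cidrs_overlap(vpc1_cidr, rds_vpc_cidr)  # the two VPCs use distinct ranges
```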

Scenario 3: Establish a network connection between an exclusive resource group for Data Integration and an ApsaraDB RDS instance that belong to different Alibaba Cloud accounts

Instruction on establishing a network connection

  1. Establish a network connection between the Alibaba Cloud accounts.

    Use a network connection tool, such as a CEN instance or VPN gateway, to establish a network connection between the VPC in which the ApsaraDB RDS instance resides and a VPC (referred to as VPC 1) within the Alibaba Cloud account to which the exclusive resource group for Data Integration belongs.

  2. Establish a network connection between the exclusive resource group for Data Integration and the ApsaraDB RDS instance.

    1. Associate the exclusive resource group for Data Integration with VPC 1.

    2. Add a route that points to the CIDR block of the data source for the exclusive resource group for Data Integration in the DataWorks console. For more information, see General reference: Add a route.

  3. Add the CIDR block of the vSwitch with which the exclusive resource group for Data Integration is associated to the IP address whitelist of the ApsaraDB RDS instance.

Illustration

[Figure: Different Alibaba Cloud accounts]

Scenario 4: Establish a network connection between an exclusive resource group for Data Integration and a data source that resides in a data center

If the data source that you want to use does not belong to Alibaba Cloud, you can refer to this scenario to establish a network connection between the data source and the resource group that you want to use.

  1. Establish a network connection between the network environment in which the data source resides and Alibaba Cloud.

    Use an Express Connect circuit to establish a network connection between the network environment in which the data source resides and a VPC within the Alibaba Cloud account to which the exclusive resource group for Data Integration belongs.

  2. Establish a network connection between the exclusive resource group for Data Integration and the data source.

    1. Associate the exclusive resource group for Data Integration with the VPC that is connected to the data source.

    2. Add a route that points to the CIDR block of the data source for the exclusive resource group for Data Integration in the DataWorks console. For more information, see General reference: Add a route.

  3. Add the CIDR block of the vSwitch with which the exclusive resource group for Data Integration is associated to the IP address whitelist of the data source.

What to do next

Configure a synchronization task. For information about the capabilities supported by the full and incremental synchronization feature, batch synchronization feature, and real-time synchronization feature, see the following topics: