DataWorks: Network connectivity solutions

Last Updated: Mar 13, 2025

To ensure the smooth operation of data synchronization and scheduling tasks in DataWorks, a network connection must be established between the virtual private cloud (VPC) associated with your resource group and the data source you want to access. This topic describes network connectivity solutions for data sources deployed across various network environments.

Background information

DataWorks tasks such as data synchronization, development, and scheduling may need to access a data source that is outside the VPC associated with the DataWorks resource group, for example in another VPC or in a data center (IDC). In that case, select a network connectivity solution based on your network conditions to connect the VPC of the resource group to the network where the data source resides. For more information, see the referenced document.

For instance, during data synchronization, a network connection must be established between the VPC linked to the resource group and both the data source and destination.

Prerequisites

You have purchased a resource group with the required specifications. For more information, see Add and use serverless resource groups.

Note
  • For more information about resource groups, see DataWorks resource group overview.

  • The network connectivity solutions in this topic apply only to serverless resource groups and to the following legacy resource groups: exclusive resource groups for Data Integration, exclusive resource groups for scheduling, and exclusive resource groups for DataService Studio.

Usage notes

  • You can associate a serverless resource group with a virtual private cloud (VPC) to enable the resource group to access a data source or an address in a complex network environment over an internal network. By default, serverless resource groups cannot access the Internet. If you want to use a serverless resource group to access a data source or a network environment over the Internet, you must configure an Internet NAT gateway for the VPC with which the resource group is associated and associate the Internet NAT gateway with an elastic IP address (EIP). For more information, see the Scenario 5: Data source on the Internet section in this topic.

  • The speed and stability of tasks that run over the Internet cannot be guaranteed. We recommend that you synchronize data over an internal network or Cloud Enterprise Network (CEN).

  • Establishing network connectivity between the resource group and the data source is a prerequisite for the successful execution of subsequent tasks.

  • A resource group cannot connect to data sources in the classic network. If the data source or service that you need to access is in the classic network, we recommend that you migrate it to a VPC.

Configure network connectivity

Note

If you are unable to establish a connection between the resource group and the data source by following the guidance in this topic due to a complex network environment, please submit a ticket for assistance.

Step 1: Associate a resource group with a VPC

The network connectivity solution depends on where the data source is deployed relative to the resource group of the DataWorks workspace. The following scenarios apply:

  • Network selection: VPC (internal network)

    • Data source on Alibaba Cloud (hosted on Elastic Compute Service (ECS) instances, or an Alibaba Cloud service), in the same Alibaba Cloud account and the same region as the DataWorks workspace

      General logic: Link the resource group to the VPC where the data source is deployed.

      Configuration example: Scenario 1: Database and DataWorks workspace in the same Alibaba Cloud account and region

    • Data source on Alibaba Cloud, in a different Alibaba Cloud account or a different region

      General logic:

      1. Use a network connectivity tool (Cloud Enterprise Network (CEN), Express Connect, or VPN Gateway) to establish a network connection between the region where the data source resides and the region where the DataWorks workspace is located, or between the accounts to which they belong.

      2. Link the resource group to the VPC that is connected to the data source in the current Alibaba Cloud account.

        Note

        If you select an advanced security group when you link the resource group to a VPC, go to the security group management page after the association and add the following rules to the advanced security group:

        • Outbound rule: Add the IP address of the data source that the resource group needs to access.

        • Inbound rule: Add the CIDR block of the vSwitch associated with the resource group.

      3. Add a custom route for the resource group that points to the IP address of the data source. For more information, see General reference: Add a route.

      Configuration examples: Scenario 2: Database and DataWorks workspace in the same Alibaba Cloud account but different regions, and Scenario 3: Database and DataWorks workspace in different Alibaba Cloud accounts

    • Data source not on Alibaba Cloud (data sources or services in data centers, or data sources in other cloud environments)

      General logic: The same three steps as for a data source in a different Alibaba Cloud account or a different region.

      Configuration example: Scenario 4: Database in a data center

  • Network selection: Public network

    • Data source on the Internet

      General logic:

      • Serverless resource groups cannot access the Internet by default. To access the data source over the Internet, configure an Internet NAT gateway and an EIP for the associated VPC.

      • Old-version resource groups can access the Internet and connect to the data source directly.

      Configuration example: Scenario 5: Data source on the Internet
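The selection logic above can be sketched as a small decision function. This is only an illustration of the scenarios listed in this topic: the `DataSourceLocation` type, its field names, and `pick_scenario` are hypothetical helpers introduced here and are not part of any DataWorks API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSourceLocation:
    """Hypothetical description of where a data source is deployed."""
    over_internet: bool     # reached through a public endpoint
    on_alibaba_cloud: bool  # ECS-hosted or an Alibaba Cloud service
    same_account: bool      # same Alibaba Cloud account as the workspace
    same_region: bool       # same region as the workspace


def pick_scenario(loc: DataSourceLocation) -> str:
    """Map a data source location to one of the scenarios in this topic."""
    if loc.over_internet:
        return "Scenario 5: data source on the Internet"
    if not loc.on_alibaba_cloud:
        return "Scenario 4: database in a data center"
    if not loc.same_account:
        return "Scenario 3: different Alibaba Cloud accounts"
    if not loc.same_region:
        return "Scenario 2: same account, different regions"
    return "Scenario 1: same account and region"


# Example: an RDS instance in another region of the same account.
print(pick_scenario(DataSourceLocation(False, True, True, False)))
```

The checks are ordered so that Internet access and off-cloud deployments are handled first, because the account and region comparisons only apply to data sources on Alibaba Cloud.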

Step 2: Configure the IP address whitelist of the data source

In any of the above scenarios, if the data source has IP address whitelist access control, you must add the CIDR block of the vSwitch associated with the resource group, the EIP of the old-version resource group itself, or the EIP configured for the VPC associated with the serverless resource group to its IP address whitelist.

  • Access over an internal network

    Add the CIDR block of the vSwitch associated with the resource group to the data source's IP address whitelist.

    In the DataWorks console, go to the Resource Group page and open the Exclusive Resource Group tab. Click Network Settings next to the desired resource group, find the Vswitch CIDR Block, and add it to the IP address whitelist of the data source.

  • Access over the Internet

    • Serverless resource group: Add the EIP configured for the VPC associated with the serverless resource group to the data source's IP address whitelist.

      In the Internet NAT gateway console, locate the configured SNAT entry to retrieve the public IP address linked to the vSwitch.

    • Old-version resource group: Add the EIP of the resource group itself to the data source's IP address whitelist.

      On the resource group list page, open the Exclusive Resource Group tab and click Details in the Actions column of the desired resource group. On the Resource Group Details page, find the EIP Address in the Basic Information section.

Note

If you later scale out the resource group, check whether the EIP has changed. If it has, promptly add the new EIP to the IP address whitelist of the data source so that tasks continue to run.
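Before you run a task, it can help to confirm that the address the resource group will use is actually covered by the whitelist you configured. The following is a minimal sketch that uses only the Python standard library. The function name `is_whitelisted` and the sample addresses are assumptions for illustration, and the sketch assumes that whitelist entries are plain IP addresses or CIDR blocks. It does not query any data source.

```python
import ipaddress
from typing import Iterable


def is_whitelisted(source_ip: str, whitelist: Iterable[str]) -> bool:
    """Return True if source_ip matches any whitelist entry (IP or CIDR block)."""
    addr = ipaddress.ip_address(source_ip)
    for entry in whitelist:
        # strict=False accepts entries such as 192.168.0.10/24 with host bits set.
        network = ipaddress.ip_network(entry, strict=False)
        if addr.version == network.version and addr in network:
            return True
    return False


# Hypothetical values: a vSwitch CIDR block for internal access and an EIP
# for Internet access.
whitelist = ["192.168.0.0/24", "203.0.113.10"]
print(is_whitelisted("192.168.0.15", whitelist))  # address inside the vSwitch CIDR block
print(is_whitelisted("192.168.1.15", whitelist))  # address outside every entry
```

A single IP address in the whitelist is treated as a /32 (or /128) network, so the same check covers both CIDR blocks and EIPs.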

Step 3: Test connectivity

Note

If your data source is hosted on an ECS instance, configure the security group of the instance to allow ICMP traffic and traffic on the ports of the data source service from the CIDR block of the VPC associated with the DataWorks resource group, or from the public IP address linked to that VPC. Otherwise, the connectivity test may fail.

  • If the resource group requires access to a data source supported by DataWorks, you can verify connectivity by adding the DataWorks data source.

    1. Go to the Data Integration page.

      Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose Data Integration > Data Integration. On the page that appears, select the desired workspace from the drop-down list and click Go to Data Integration.

    2. In the left-side navigation pane, click Data Sources. Next, from the list of data sources, click Add Data Source. Choose the appropriate data source for your needs and configure the necessary connection parameters.

    3. In the resource group list at the bottom, select the purchased resource group and click Test Connectivity.

      Note

      If the connectivity test result is Failed, you can troubleshoot the issue using the Connectivity Diagnostic Tool. If the connection between the resource group and the data source still cannot be established, submit a ticket for further assistance.

  • If the data source that the resource group needs to access is a service deployed in another network, test connectivity to the service from your business code in a way that fits your environment.

    Note

    If your service is deployed on an ECS instance, configure the security group to allow access from the CIDR block of the vSwitch associated with the resource group or the public IP configured for the VPC linked to the resource group.
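For a service that is not a DataWorks data source, a connectivity test in business code can be as simple as attempting a TCP connection. The following is a minimal sketch that uses the Python standard library. The function name `can_connect`, the host, and the port are placeholders. The result is meaningful only if the code runs in a task on the resource group itself, because a check from any other machine follows a different network path.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # Hypothetical endpoint: replace with the address and port of your service.
    print(can_connect("192.168.0.20", 3306))
```

A successful TCP connection shows that routing, security group rules, and the listening port are in place. It does not verify whitelist enforcement at the application layer or credentials.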

Sample configurations for different scenarios

This section provides configuration examples for establishing network connections between a resource group and data sources deployed in various network environments, using an ApsaraDB RDS database or a self-managed database in a data center or on the Internet as the data source.

Note

The following examples are for scenarios where the resource group is associated with a basic security group.

Scenario 1: Database and DataWorks workspace in the same Alibaba Cloud account and region

Network connectivity configuration operations

  1. Associate the resource group with the VPC in which the data source is deployed.

  2. Configure the IP address whitelist of the data source: Add the CIDR block of the vSwitch with which the resource group is associated to the IP address whitelist of the data source.

Configuration operation illustration: same account, same region (image)

Scenario 2: Database and DataWorks workspace in the same Alibaba Cloud account but different regions

Network connectivity configuration operations

  1. Establish a network connection between the two regions.

    Use Cloud Enterprise Network (CEN) or VPN Gateway to connect the VPCs in the two regions.

  2. Establish a network connection between the data source and the resource group.

    1. Associate the resource group with the VPC that is connected to the database in the current account.

    2. In the console, add a route to the IP address segment of the target data source. For more information, see General reference: Add a route.

  3. Configure the IP address whitelist of the data source: Add the CIDR block of the vSwitch with which the resource group is associated to the IP address whitelist of the data source.

Configuration operation illustration: same account, different regions (image)
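When you add a route to the IP address segment of the data source, it can be useful to work out which CIDR blocks cover the addresses you need to reach. The following sketch uses the Python standard library `ipaddress` module to collapse a list of data source addresses into the fewest covering blocks. The function name `route_destinations` and the sample addresses are hypothetical, and whether the console accepts these exact values as route destinations is an assumption that you should verify against General reference: Add a route.

```python
import ipaddress
from typing import Iterable, List


def route_destinations(data_source_addrs: Iterable[str]) -> List[str]:
    """Collapse IP addresses or CIDR blocks into the fewest covering CIDR blocks.

    All inputs must use the same IP version, otherwise collapse_addresses
    raises TypeError.
    """
    networks = [ipaddress.ip_network(a, strict=False) for a in data_source_addrs]
    return [str(net) for net in ipaddress.collapse_addresses(networks)]


# Hypothetical data source addresses in the peer VPC.
print(route_destinations(["10.0.1.0/25", "10.0.1.128/25", "10.0.2.7"]))
```

Adjacent blocks are merged only when they form an exact larger block, so the result never covers addresses that were not in the input.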

Scenario 3: Database and DataWorks workspace in different Alibaba Cloud accounts

Network connectivity configuration operations

  1. Establish a network connection between the two Alibaba Cloud accounts.

    Use Cloud Enterprise Network (CEN) or VPN Gateway to connect the VPCs in the two Alibaba Cloud accounts.

  2. Establish a network connection between the data source and the resource group.

    1. Associate the resource group with the VPC that is connected to the data source in the current account.

    2. In the console, add a route to the IP address segment of the target data source. For more information, see General reference: Add a route.

  3. Configure the IP address whitelist of the data source: Add the CIDR block of the vSwitch with which the resource group is associated to the IP address whitelist of the data source.

Configuration operation illustration: different accounts (image)

Scenario 4: Database in a data center

Refer to this scenario for configuration if the data source is outside the Alibaba Cloud environment.

  1. Create a network connection between the two environments.

    Connect the data center to the Alibaba Cloud VPC using Express Connect.

  2. Connect the data source to the resource group.

    1. Link the resource group to the VPC that is connected to the database within the current account.

    2. In the console, add a route to connect to the target data source IP segment. For more information, see General reference: Add a route.

  3. Configure the data source's IP address whitelist: Add the CIDR block of the vSwitch associated with the resource group to the whitelist.

Scenario 5: Data source on the Internet

Note
  • This solution applies only to serverless resource groups. EIPs are automatically associated with old-version resource groups, so no additional configuration is needed for them.

  • Serverless resource groups do not have public access capabilities by default. To access data sources or networks over the Internet, configure an Internet NAT gateway and EIP for the VPC associated with the serverless resource group.

Network connectivity configuration operations

  1. Configure an Internet NAT gateway for the VPC and vSwitch with which the resource group is associated. For specific operations, see Use the SNAT function of an Internet NAT gateway to access the Internet.

  2. Configure the IP address whitelist of the data source to allow the public IP address associated with the VPC and vSwitch to access the database.

  3. Add the data source to the workspace, enter the public endpoint of the data source, and test the network connectivity.

Configuration operation illustration (image)

References