To ensure the smooth operation of data synchronization and scheduling tasks in DataWorks, a network connection must be established between the virtual private cloud (VPC) associated with your resource group and the data source you want to access. This topic describes network connectivity solutions for data sources deployed across various network environments.
Background information
In DataWorks tasks such as data synchronization, development, and scheduling, the data source may reside outside the VPC linked to the current DataWorks resource group, for example in another VPC or in a data center (IDC). In that case, select a network connectivity solution that matches your network conditions to connect the VPC of the resource group to the network where the data source resides. For more information, see the referenced document.
For instance, during data synchronization, a network connection must be established between the VPC linked to the resource group and both the data source and destination.
Prerequisites
You have purchased a resource group with the required specifications. For more information, see Add and use serverless resource groups.
- For more information about resource groups, see DataWorks resource group overview.
- The network connectivity solutions described here apply only to serverless resource groups and to the following legacy resource groups: exclusive resource groups for Data Integration, exclusive resource groups for scheduling, and exclusive resource groups for DataService Studio.
Usage notes
- You can associate a serverless resource group with a virtual private cloud (VPC) so that the resource group can access a data source or an address in a complex network environment over an internal network. By default, serverless resource groups cannot access the Internet. To use a serverless resource group to access a data source or a network environment over the Internet, configure an Internet NAT gateway for the VPC with which the resource group is associated, and associate the Internet NAT gateway with an elastic IP address (EIP). For more information, see the Scenario 5: Data source on the Internet section in this topic.
- The speed and stability of tasks that run over a public network cannot be guaranteed. We recommend that you synchronize data over an internal network or Cloud Enterprise Network (CEN).
- Subsequent tasks can run successfully only after network connectivity is established between the resource group and the data source.
- A resource group cannot connect to data sources in the classic network. If the data source or service that you need to access is in the classic network, we recommend that you migrate it to a VPC.
Configure network connectivity
If you are unable to establish a connection between the resource group and the data source by following the guidance in this topic due to a complex network environment, please submit a ticket for assistance.
Step 1: Associate a resource group with a VPC
The selection of a network connectivity solution depends on the relationship between the data source and the DataWorks workspace resource group, including the following scenarios:
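Network Selection | Data Source Environment | Relationship between Data Source and DataWorks Workspace | General Logic for Network Connectivity | Configuration Example |
VPC (Internal Network) | On Alibaba Cloud | Same Alibaba Cloud account and same region | Link the resource group to the VPC where the data source is deployed. | Scenario 1: Database and DataWorks workspace in the same Alibaba Cloud account and region |
VPC (Internal Network) | On Alibaba Cloud | Same Alibaba Cloud account but different regions | | Scenario 2: Database and DataWorks workspace in the same Alibaba Cloud account but different regions |
VPC (Internal Network) | On Alibaba Cloud | Different Alibaba Cloud accounts | | Scenario 3: Database and DataWorks workspace in different Alibaba Cloud accounts |
VPC (Internal Network) | Not on Alibaba Cloud | | Connect the data center to the Alibaba Cloud VPC by using Express Connect, and then link the resource group to that VPC. | Scenario 4: Database in a data center |
Public Network | On the Internet | | Configure an Internet NAT gateway and an EIP for the VPC associated with the serverless resource group. | Scenario 5: Data source on the Internet |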
Step 2: Configure the IP address whitelist of the data source
In any of the preceding scenarios, if the data source enforces an IP address whitelist, add one of the following to the whitelist: the CIDR block of the vSwitch associated with the resource group, the EIP of the old-version resource group, or the EIP configured for the VPC associated with the serverless resource group.
- Access over an internal network
  Add the CIDR block of the vSwitch associated with the resource group to the data source's IP address whitelist.
  In the DataWorks console, go to the Resource Group page and open the Exclusive Resource Group tab. Click Network Settings next to the desired resource group, check the Vswitch CIDR Block, and add it to the data source's IP address whitelist.
- Access over the Internet
  - Serverless resource group: Add the EIP configured for the VPC associated with the serverless resource group to the data source's IP address whitelist.
    In the Internet NAT gateway console, locate the configured SNAT entry to find the public IP address linked to the vSwitch.
  - Old-version resource group: Add the EIP of the resource group itself to the data source's IP address whitelist.
    On the resource group list page, on the Exclusive Resource Group tab, click Details in the Actions column of the desired resource group. On the Resource Group Details page, find the EIP Address in the Basic Information section.
  - If you later scale out the resource group, check whether the EIP changes. If it does, promptly add the new EIP to the data source's IP address whitelist so that tasks keep running.
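Before you edit the whitelist, it can help to check which entry would match the address that the data source actually sees. The following is a minimal sketch that uses Python's standard `ipaddress` module. The function name `is_whitelisted` and every address in it are hypothetical placeholders, not values taken from DataWorks; replace them with the vSwitch CIDR block or EIP that you looked up in the console.

```python
import ipaddress


def is_whitelisted(source_ip: str, whitelist: list[str]) -> bool:
    """Return True if source_ip matches any whitelist entry (single IP or CIDR block)."""
    addr = ipaddress.ip_address(source_ip)
    # strict=False tolerates entries such as 192.168.10.5/24 that have host bits set.
    return any(addr in ipaddress.ip_network(entry, strict=False) for entry in whitelist)


# Hypothetical values: a vSwitch CIDR block (internal access) and an EIP (Internet access).
whitelist = ["192.168.10.0/24", "203.0.113.10"]

print(is_whitelisted("192.168.10.25", whitelist))  # address inside the vSwitch CIDR block
print(is_whitelisted("203.0.113.10", whitelist))   # the EIP itself
print(is_whitelisted("192.168.11.7", whitelist))   # address outside every entry
```

This check only mirrors how a typical CIDR-based whitelist is evaluated. The data source's own whitelist syntax may differ, so treat the result as a sanity check rather than a guarantee.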
Step 3: Test connectivity
If your data source is hosted on an ECS instance, configure the security group to allow ICMP traffic and to open the ports used by the data source service for the CIDR block of the VPC associated with the DataWorks resource group, or for the public IP address linked to that VPC. Otherwise, the connectivity test may fail.
- If the resource group needs to access a data source type that DataWorks supports, you can verify connectivity by adding the data source in DataWorks.
  - Go to the Data Integration page.
    Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose . On the page that appears, select the desired workspace from the drop-down list and click Go to Data Integration.
  - In the left-side navigation pane, click Data Sources. In the list of data sources, click Add Data Source. Select the data source type that you need and configure the connection parameters.
  - In the resource group list at the bottom, select the purchased resource group and click Test Connectivity.
    Note: If the connectivity test result is Failed, you can troubleshoot the issue by using the Connectivity Diagnostic Tool. If the connection between the resource group and the data source still cannot be established, submit a ticket for further assistance.
- If the data source that the resource group needs to access is a service deployed in another network, test connectivity to the data source from your business code based on your actual situation.
  Note: If your service is deployed on an ECS instance, configure the security group to allow access from the CIDR block of the vSwitch associated with the resource group or from the public IP address configured for the VPC linked to the resource group.
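For a service that you test from business code, a plain TCP probe is often enough to tell a network problem apart from an application problem. The following is a minimal sketch that uses only Python's standard `socket` module. The function name `can_connect` and the endpoint in the comment are hypothetical. The probe reports reachability only from the machine where it runs, so it reflects the resource group's connectivity only if it runs in a task that executes on that resource group.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


# Example call with a hypothetical database endpoint:
#   can_connect("192.168.10.20", 3306)
```

A result of False does not say which layer blocked the connection. Check the routes, the security group rules, and the IP address whitelist described in the preceding steps.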
Sample configurations for different scenarios
This section provides configuration examples for establishing network connections between a resource group and data sources deployed in various network environments, using an ApsaraDB RDS database or a self-managed database in a data center or on the Internet as the data source.
The following examples are for scenarios where the resource group is associated with a basic security group.
Scenario 1: Database and DataWorks workspace in the same Alibaba Cloud account and region
Network connectivity configuration operations | Configuration operation illustration |
|
Scenario 2: Database and DataWorks workspace in the same Alibaba Cloud account but different regions
Network connectivity configuration operations | Configuration operation illustration |
|
Scenario 3: Database and DataWorks workspace in different Alibaba Cloud accounts
Network connectivity configuration operations | Configuration operation illustration |
|
Scenario 4: Database in a data center
Refer to this scenario for configuration if the data source is outside the Alibaba Cloud environment.
- Create a network connection between the two environments.
  Connect the data center to the Alibaba Cloud VPC by using Express Connect.
- Connect the data source to the resource group.
  - Link the resource group to the VPC that is connected to the database within the current account.
  - In the console, add a route to the IP address range of the target data source. For more information, see General reference: Add a route.
- Configure the data source's IP address whitelist: add the CIDR block of the vSwitch associated with the resource group to the whitelist.
Scenario 5: Data source on the Internet
- This solution applies only to serverless resource groups. Old-version resource groups are automatically associated with EIPs.
- By default, serverless resource groups cannot access the Internet. To access data sources or networks over the Internet, configure an Internet NAT gateway and an EIP for the VPC associated with the serverless resource group.
Network connectivity configuration operations | Configuration operation illustration |
|
References
- For more details about resource groups, see DataWorks Resource Group Overview.
- For guidance on creating and using resource groups, see Add and Use Serverless Resource Groups.
- For instructions on associating a resource group with a VPC, see Associate a VPC.
- For steps on configuring an Internet NAT gateway for the VPC and vSwitch associated with the resource group, see Use the SNAT Function of an Internet NAT Gateway to Access the Internet.