DataWorks: Solution 6: Network connectivity for public network data sources

Last Updated: Jun 30, 2026

This topic describes how to connect DataWorks to a data source over the internet, using a MySQL instance with a public endpoint as an example.

Use case

This solution is recommended if your data source meets the following condition:

  • The data source has a public endpoint.

How it works

  • Serverless resource groups cannot access the public network by default. You must configure an Internet NAT Gateway and EIPs for the VPC attached to the resource group before you can access data sources over the Internet.

  • Legacy resource groups have public network access by default and can connect directly.

    Note

    Legacy resource groups are being phased out. We recommend that you use serverless resource groups.

Network connectivity diagram

[Figure: network connectivity for public data sources]

Alibaba Cloud uniformly maintains and allocates the egress IP address for each resource group. You cannot modify or customize the egress IP address of a resource group by using a proxy IP address or other methods. Even with proxy settings configured in your Python code or other programs, DataWorks tasks still use the IP address that is allocated to the resource group at runtime. To access geo-restricted data sources, use network products like VPN Gateway or Express Connect to establish cross-border connections.

Prerequisites

If your data source is not an Alibaba Cloud database, such as a self-managed database or a third-party cloud database like TencentDB for MySQL or PostgreSQL, you can choose any region for your DataWorks workspace, including any region within the Chinese mainland. The workspace region does not need to match the region of the data source. We recommend that you select a region that is geographically close to your data source to minimize network latency.

Billing

To enable internet access for a serverless resource group, you must configure an Internet NAT Gateway and bind an EIP for the VPC in which the resource group resides. For more information about Internet NAT Gateway and EIP billing, see NAT Gateway billing and EIP billing overview.

Configure network connectivity

Note

The following steps outline the general procedure for establishing network connectivity and explain the core logic. For a worked example with specific values, see the Configuration example section of this topic.

Step 1: Get basic information

Data source side

  • Public IP address of the server where the data source resides

    Obtain the public IP address of the server that hosts the data source, or ask your network administrator for it.

DataWorks side

Serverless resource group

VPC and vSwitch information associated with the resource group

  1. Go to the DataWorks Resource Group List page, find the target resource group, and click Network Settings in the Operation column.

  2. View the associated VPC and vSwitch under the corresponding feature module.

    For example, if you want to connect a MySQL instance with a public endpoint to DataWorks for data synchronization, view the corresponding VPC and vSwitch information under Task Scheduling & Data Integration.

    Go to the Network Settings page of the resource group, click the VPC Binding tab, and view the associated VPC information in the Bound VPC column under the corresponding feature module.

Legacy exclusive resource group

EIP address of the resource group

  1. Go to the DataWorks Resource Group List page, find the target resource group, and click Details in the Operation column to go to the resource group details page.

  2. Obtain the EIP address.

    In the Basic Information panel of the Data Integration resource group, find the EIP Address field and record the address. If you need to transfer data over the internet, add this EIP to the allowlist on the data source side.

Step 2: Establish network connectivity

  • Serverless resource group: A serverless resource group does not have internet access by default. You must configure an Internet NAT Gateway and bind an EIP for the VPC associated with the resource group before it can access data sources over the internet.

  • Legacy exclusive resource group: A legacy exclusive resource group has internet access by default and can connect directly.

Note

If you encounter issues during network configuration, submit a ticket to contact Alibaba Cloud technical support.

Note

If the existing NAT gateway already has DNAT entries that conflict with the SNAT requirements for DataWorks internet access (for example, the same EIP is used for both DNAT and SNAT), and the DNAT entries cannot be deleted because other services depend on them, create a separate SNAT entry on the current NAT gateway. When you create the SNAT entry, configure the parameters as described in "Step 2: Establish network connectivity" in this topic so that the VPC where the serverless resource group resides can access external data sources through the new SNAT entry.
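
The conflict described in the note above can be spotted before you change anything. The following is a minimal Python sketch, assuming you have copied the EIPs used by the DNAT and SNAT entries from the NAT gateway console into plain lists. The function name and the sample addresses are hypothetical, and the code does not call any Alibaba Cloud API.

```python
def shared_eips(dnat_eips, snat_eips):
    """Return the EIPs that appear in both DNAT and SNAT entries, sorted."""
    return sorted(set(dnat_eips) & set(snat_eips))


# Hypothetical addresses taken from a documentation-only range.
dnat = ["203.0.113.10", "203.0.113.11"]
snat = ["203.0.113.11", "203.0.113.12"]

conflicts = shared_eips(dnat, snat)
print(conflicts)  # ['203.0.113.11']
```

An empty result suggests that no EIP is shared between the two kinds of entries. A non-empty result lists the addresses to move to a separate SNAT entry.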

Step 3: (Optional) Add IP addresses to the allowlist

If the data source has allowlist controls, add the public IP address associated with the resource group to the data source allowlist to allow access from the resource group.

This topic uses MySQL IP allowlist configuration as an example, where a specific user is allowed to access the database only from the public IP address associated with the resource group.

Important

Note the following when configuring the allowlist:

  • When a serverless resource group accesses a data source over the internet, the egress IP is the EIP bound to the NAT gateway, not the CIDR block IP of the vSwitch where the resource group resides. You must add this EIP address to the security group or firewall allowlist on the data source side, instead of only configuring the vSwitch CIDR block IP.

  • If the allowlist becomes invalid due to a source IP change, check whether the DNS resolution or routing policy correctly maps to the EIP of the NAT gateway. Common causes include: the VPC route table is not properly configured, the SNAT entry of the NAT gateway has not taken effect, or multiple NAT gateway instances cause an uncertain egress IP.

  • The egress IP of a legacy exclusive resource group is the EIP bound to the resource group itself. Obtain this address from the resource group details page and add it to the data source allowlist.

  1. Log on to the database as an administrator.

  2. Create an account for DataWorks to access the data source and configure the required permissions.

    -- "dataworks_user" is the username, which you can customize.
    -- "StrongPassword123!" is the user password, which you can customize.
    CREATE USER 'dataworks_user'@'<Public IP address bound to the resource group>' IDENTIFIED BY 'StrongPassword123!';
    -- Grant the user access to the specified database (e.g., mydatabase) from the public IP address bound to the resource group.
    GRANT ALL PRIVILEGES ON mydatabase.* TO 'dataworks_user'@'<Public IP address bound to the resource group>' WITH GRANT OPTION;
  3. Run the FLUSH PRIVILEGES; command to refresh permissions, and then exit the database (exit).
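
The first allowlist note above is easy to get wrong, so here is a minimal Python sketch of the check, using only the standard `ipaddress` module. The function name, the vSwitch CIDR block, and the EIP are hypothetical sample values. The sketch only shows that an allowlist containing the vSwitch CIDR block does not match traffic that leaves through the NAT gateway EIP.

```python
import ipaddress


def allowlist_covers(egress_ip, allowlist):
    """Return True if egress_ip matches any allowlist entry (single IP or CIDR block)."""
    ip = ipaddress.ip_address(egress_ip)
    return any(ip in ipaddress.ip_network(entry, strict=False) for entry in allowlist)


# Hypothetical values: a vSwitch CIDR block and the EIP bound to the NAT gateway.
vswitch_cidr = "192.168.10.0/24"
nat_eip = "203.0.113.25"

print(allowlist_covers(nat_eip, [vswitch_cidr]))           # False
print(allowlist_covers(nat_eip, [vswitch_cidr, nat_eip]))  # True
```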

Verify network connectivity

  1. Log on to the DataWorks console. In the target region, click Data Integration > Data Integration in the left-side navigation pane. Select a workspace from the drop-down list and click Go to Data Integration.

  2. In the left-side navigation pane, click Data Sources. On the data source list page, click the button to add a data source, select the data source type, and configure the connection parameters as needed.

  3. In the Connection Configuration section, select the resource group that has network connectivity with the data source, and click Test Connectivity next to the target resource group. If a green check mark and Connectable are displayed, the network connectivity between the resource group and the data source is normal.

    Note

    If the connectivity test result is Not Connectable, use the Network Connectivity Diagnostic Tool to troubleshoot the issue. If you still cannot connect the resource group to the data source, submit a ticket.
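
As a complement to the console test, a plain TCP probe can confirm that the data source port is reachable at all. The following is a minimal sketch that uses Python's standard `socket` module. The function name is hypothetical. The probe runs from the machine where you execute it, not from the resource group, so it cannot replace Test Connectivity in DataWorks. The demo probes a temporary local listener so that it runs without network access.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Self-contained demo: listen on a free local port, then probe it.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    port = server.getsockname()[1]
    reachable = can_connect("127.0.0.1", port)

print(reachable)  # True
```

To probe a real data source, call `can_connect` with its public IP address and port (for example, port 3306 for MySQL). A result of False indicates either a network path problem or a firewall rule that blocks the machine running the probe.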

Configuration example

This section uses a MySQL instance with a public endpoint and a DataWorks workspace in the China (Shanghai) region as an example to describe how to configure network connectivity.

1. Basic information

| Parameter | Data source (RDS MySQL) | DataWorks resource group |
| --- | --- | --- |
| Region | - | China (Shanghai) |
| Network information | Public IP address: 47.117.XX.XX | VPC name: vpc-shanghai; vSwitch: sh-l |

On the serverless resource group details page, click the VPC Binding tab. In the Task Scheduling & Data Integration section, verify the associated Bound VPC (such as vpc-uf***jvg), vSwitch, and security group information. If no VPC is associated, click Add Binding to add one.

2. Establish network connectivity

This procedure applies only to serverless resource groups. It uses an Internet NAT Gateway to enable internet access for the VPC associated with the resource group. Legacy resource groups already have a bound EIP by default and do not require this configuration.

Note

If you encounter issues during network configuration, submit a ticket to contact Alibaba Cloud technical support.

  1. Go to the DataWorks Resource Group List page, find the target resource group, and click Network Settings in the Operation column.

  2. Under the corresponding feature module, find the associated VPC and click the icon next to the VPC to go to the VPC Basic Information page.

    For example, if you want to connect a MySQL instance with a public endpoint to DataWorks for data synchronization, find the corresponding VPC under Task Scheduling & Data Integration and click the icon next to the VPC.

  3. Switch to the Resource management tab. In the Internet Access Service section, click Create under Internet NAT Gateway to enable internet access for the VPC associated with the resource group.

    Configure the following key parameters:

    | Parameter | Value |
    | --- | --- |
    | VPC and Associated vSwitch | Use the same VPC and vSwitch that are associated with the resource group. |
    | Access mode | Select Full VPC mode (SNAT). |
    | EIP Instance | Select Purchase New EIP. |
    | Associated Role | When you create a NAT gateway for the first time, you must create a service-linked role. Click Create Associated Role. |

  4. Click Buy Now, complete the payment, and create the NAT gateway instance.

    After the purchase is complete, the page displays a success message for the NAT gateway instance purchase, along with the results of three resource operations: Create EIP (created), Create NAT Gateway (created), and Bind EIP (bound).

  5. After the NAT gateway instance is purchased, click the button to return to the console and create an SNAT entry for the NAT gateway instance.

    Note

    Only after an SNAT entry is configured can resource groups using this VPC access the internet.

    SNAT entries support four granularity levels: VPC granularity, vSwitch granularity, ECS/ENI granularity, and custom CIDR block granularity. The vSwitch granularity only covers the selected vSwitches. If the DataWorks resource group is deployed on a vSwitch that is not selected, it cannot access the internet through this SNAT entry. We recommend that you use VPC granularity to configure SNAT entries. With this granularity, the source CIDR block is 0.0.0.0/0, which covers all CIDR blocks in the VPC and ensures that resource groups on all vSwitches can access the internet through the NAT gateway.

    1. Click Management in the Operation column of the newly purchased instance to go to the management page of the target NAT gateway instance, and switch to the SNAT Management tab.

    2. In the SNAT Entry List section, click Create SNAT Entry to create an SNAT entry. The following are the key configurations:

      | Parameter | Value |
      | --- | --- |
      | SNAT entry granularity | Select VPC granularity to ensure that all resource groups in the VPC of the NAT gateway can access the internet through the configured EIP. |
      | Select EIP address | Configure the EIP address bound to the current NAT gateway instance. |

      After you configure the SNAT entry parameters, click OK to create the SNAT entry.

    In the SNAT Entry List section, when the Status of the newly created SNAT entry changes to Enabled, the VPC associated with the resource group has internet access.

    If you previously configured a vSwitch-granularity SNAT entry, you can delete it after creating a VPC-granularity SNAT entry. The VPC-granularity SNAT entry has a source CIDR block of 0.0.0.0/0, which already covers all vSwitch CIDR blocks in the VPC. The old vSwitch-granularity entry no longer needs to be retained, and deleting it does not affect the VPC-granularity SNAT entry.
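
The difference between vSwitch granularity and VPC granularity described above comes down to CIDR containment. The following is a minimal Python sketch that uses the standard `ipaddress` module. The function name and the CIDR blocks are hypothetical, and the sketch models only the source CIDR block of each SNAT entry, not the NAT gateway itself.

```python
import ipaddress


def snat_covers(vswitch_cidr, snat_source_cidrs):
    """Return True if any SNAT source CIDR block contains the whole vSwitch CIDR block."""
    vswitch = ipaddress.ip_network(vswitch_cidr)
    return any(
        vswitch.subnet_of(ipaddress.ip_network(source))
        for source in snat_source_cidrs
    )


# Hypothetical layout: the resource group runs on vSwitch B.
vswitch_a = "192.168.1.0/24"
vswitch_b = "192.168.2.0/24"

print(snat_covers(vswitch_b, [vswitch_a]))    # False: entry covers vSwitch A only
print(snat_covers(vswitch_b, ["0.0.0.0/0"]))  # True: VPC-granularity entry
```

This also illustrates why an old vSwitch-granularity entry becomes redundant once a VPC-granularity entry exists: every vSwitch CIDR block is a subnet of 0.0.0.0/0.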

3. Add IP addresses to the allowlist

  1. Obtain the public IP address associated with the resource group.

    Serverless resource group

    1. Go to the VPC console. In the left-side navigation pane, click NAT Gateway > Internet NAT Gateway to go to the Internet NAT Gateway list page.

    2. Find the Internet NAT Gateway that you created and view the EIP column to obtain the EIP address.

    Legacy resource group

    1. Go to the DataWorks Resource Group List page, find the target resource group, and click Details in the Operation column to go to the resource group details page.

    2. Obtain the EIP address.

  2. Log on to the database as an administrator.

  3. Create an account for DataWorks to access the data source and configure the required permissions.

    -- "dataworks_user" is the username, which you can customize.
    -- "StrongPassword123!" is the user password, which you can customize.
    CREATE USER 'dataworks_user'@'<Public IP address bound to the resource group>' IDENTIFIED BY 'StrongPassword123!';
    -- Grant the user access to the specified database (e.g., mydatabase) from the public IP address bound to the resource group.
    GRANT ALL PRIVILEGES ON mydatabase.* TO 'dataworks_user'@'<Public IP address bound to the resource group>' WITH GRANT OPTION;
  4. Run the FLUSH PRIVILEGES; command to refresh permissions, and then exit the database (exit).
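
Because the host part of the MySQL account must be the public egress IP address, it can help to validate the value before pasting it into the statements above. The following is a minimal Python sketch, not part of any DataWorks or MySQL tooling. The function name, the private ranges checked (RFC 1918), and the sample EIP are assumptions made for illustration.

```python
import ipaddress

# RFC 1918 private ranges; a vSwitch CIDR block normally falls inside one of these.
PRIVATE_RANGES = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def mysql_account(user, egress_ip):
    """Return "'user'@'ip'" after checking that egress_ip is a single non-private address."""
    if "'" in user:
        raise ValueError("user name must not contain a single quote")
    ip = ipaddress.ip_address(egress_ip)  # raises ValueError for CIDR blocks or typos
    if any(ip in block for block in PRIVATE_RANGES):
        raise ValueError(f"{egress_ip} is a private address, not the public egress IP")
    return f"'{user}'@'{ip}'"


# Hypothetical EIP; substitute the address obtained in step 1.
account = mysql_account("dataworks_user", "203.0.113.25")
print(account)  # 'dataworks_user'@'203.0.113.25'
```

Passing a vSwitch address such as 192.168.10.5, or a CIDR block such as 192.168.10.0/24, raises `ValueError`, which catches the most common allowlist mistake early.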

4. Test connectivity

  1. Log on to the DataWorks console. In the target region, click Data Integration > Data Integration in the left-side navigation pane. Select a workspace from the drop-down list and click Go to Data Integration.

  2. In the left-side navigation pane, click Data Sources to go to the Data Sources page, and then click Add Connection.

  3. Select the MySQL data source type and configure the data source connection information.

    • Set Configuration Mode to User-created Data Store with Public IP Addresses.

    • Set Host Address to the public IP address of the server where MySQL resides (in this example, 47.117.XX.XX).

    • Set Port Number to 3306.

    • Set Database Name to the name of an existing database.

    • Set Username and Password to the dataworks_user user and password created in the Add IP addresses to the allowlist step.

  4. In the Connection Configuration section, click Test Connectivity next to the resource group associated with the workspace and check whether the result is Connectable.

    Note

    If the test result is not Connectable, use the Network Connectivity Diagnostic Tool to troubleshoot. If connectivity still fails, submit a ticket for assistance.

FAQ

What do I do if a serverless resource group times out when connecting to a public FTP data source?

A serverless resource group does not have internet access by default. If a connection timeout occurs when you access a public FTP data source, follow these steps to troubleshoot:

  1. Verify that a NAT gateway and SNAT entry have been configured for the VPC where the serverless resource group resides. If not, complete the NAT gateway binding and SNAT entry creation as described in "Step 2: Establish network connectivity" in this topic.

  2. Check whether the NAT gateway is bound to the VPC and vSwitch actually used by the serverless resource group. Ensure that the binding relationship is consistent with the resource group network settings.

  3. In the VPC route table, add a route entry for the CIDR block corresponding to the FTP server IP address. For example, if the FTP server address is 121.4.x.x, add a route entry with the destination CIDR block 121.4.0.0/16 and set the next hop to the NAT gateway instance.

  4. Check the allowlist restrictions on the FTP server side and add the egress IP of the serverless resource group (the EIP bound to the NAT gateway) to the FTP server access allowlist.
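
The destination CIDR block for the route entry in step 3 can be derived from the server address. The following is a minimal Python sketch that uses the standard `ipaddress` module. The function name and the full server address 121.4.10.20 are hypothetical, chosen only to fall in the 121.4.x.x range of the example.

```python
import ipaddress


def enclosing_block(ip, prefix_len=16):
    """Return the CIDR block of the given prefix length that contains ip."""
    return str(ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False))


# Hypothetical FTP server address in the 121.4.x.x range.
print(enclosing_block("121.4.10.20"))      # 121.4.0.0/16
print(enclosing_block("121.4.10.20", 32))  # 121.4.10.20/32
```

A /32 block routes only the single server address, while a /16 block routes the whole range. Which prefix length to use is a judgment call for your network.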

Why does the table search feature fail to load the list but scripts can run when I connect to PostgreSQL over the internet?

When you connect to a PostgreSQL data source over the internet, the table search feature in the console requires multiple interactions to retrieve metadata, which requires high network stability. When the public network environment is unstable or has transmission limitations, metadata retrieval interactions may time out or fail, causing the table search to fail to load the list. In contrast, script mode directly submits and runs synchronization tasks without relying on multiple metadata retrieval interactions, and is therefore not affected by public network instability.

We recommend the following solutions:

  • Use a private network connection preferentially: Access the PostgreSQL data source through a VPC connection to avoid issues caused by public network instability. For more information about network solutions, see Overview of network connectivity solutions.

  • Use the SDK to create tasks: If you cannot switch to a private network connection at the moment, create data synchronization tasks by using the DataWorks OpenAPI or SDK to avoid the limitations of the table search feature in the console.

What do I do if the DataWorks console for an overseas region is inaccessible from a Chinese mainland network?

When you access the DataWorks console for an overseas region (such as US (Silicon Valley)) from the Chinese mainland, the page may fail to load or load slowly due to cross-border network latency or restrictions. We recommend that you follow these steps:

  1. Try switching to a different computer or network environment (such as a mobile hotspot) to rule out local network issues.

  2. If the issue is confirmed to be caused by cross-border network conditions, contact your company's network administration team to adjust network policies to allow access to the DataWorks console in the overseas region.

  3. If you need long-term stable access to the console of an overseas region, we recommend that you use Alibaba Cloud Global Accelerator (GA) to optimize cross-border access. Global Accelerator provides low-latency, highly available cross-border network acceleration.

Can I use a proxy IP to disguise a resource group as an overseas IP?

No. Alibaba Cloud uniformly maintains and allocates the egress IP address for each resource group. You cannot modify or customize the egress IP address of a resource group by using a proxy IP address or other methods. Even with proxy settings configured in your Python code or other programs, DataWorks tasks still use the IP address that is allocated to the resource group at runtime. To access geo-restricted data sources, use network products like VPN Gateway or Express Connect to establish cross-border connections.