When you synchronize data, you must choose a network connectivity solution based on the network where your data source resides, so that the resource group that runs your sync nodes can connect to the data source. This topic describes the network connectivity solutions that are available for data sources that are deployed on different types of networks.

Overview

Figure: Network connectivity solutions

As shown in the preceding figure, the network connectivity solution depends on the network environment of the data source and the type of resource group that runs the sync nodes.
  • Exclusive resource groups for Data Integration:

    An exclusive resource group for Data Integration can connect to a data source that is accessible over the Internet or deployed in a virtual private cloud (VPC) or an Internet data center (IDC). However, an exclusive resource group for Data Integration cannot connect to a data source that is deployed on the classic network. For more information, see Use an exclusive resource group for Data Integration.

  • Public resource groups for Data Integration:

    A public resource group for Data Integration can connect to a data source that is accessible over the Internet or deployed on the classic network. However, a public resource group for Data Integration cannot connect to a data source that is deployed in a VPC or an IDC. For more information, see Use public resource groups for Data Integration.

  • Custom resource groups for Data Integration:

    A custom resource group can directly connect to a data source that is accessible over the Internet or deployed on the same network as the custom resource group. If the data source and the custom resource group are deployed on different networks, the custom resource group cannot directly connect to the data source, and you must first connect the two networks by using Express Connect circuits or VPN gateways. For more information, see Use custom resource groups.
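The connectivity rules in the preceding list can be summarized as a small lookup. The following Python sketch is illustrative only: the GroupType and Network names and the can_connect function are hypothetical and are not part of any DataWorks API. For exclusive resource groups, a True result for a VPC or an IDC still assumes that you complete the binding and routing steps that are described later in this topic.

```python
from enum import Enum


class GroupType(Enum):
    """Hypothetical labels for the three resource group types."""
    EXCLUSIVE = "exclusive"
    PUBLIC = "public"
    CUSTOM = "custom"


class Network(Enum):
    """Hypothetical labels for the network where the data source resides."""
    INTERNET = "internet"
    VPC = "vpc"
    IDC = "idc"
    CLASSIC = "classic"


# Networks that exclusive and public resource groups can reach, per the list above.
SUPPORTED = {
    GroupType.EXCLUSIVE: {Network.INTERNET, Network.VPC, Network.IDC},
    GroupType.PUBLIC: {Network.INTERNET, Network.CLASSIC},
}


def can_connect(group: GroupType, source: Network, same_network: bool = False) -> bool:
    """Return True if a resource group of type `group` can reach a data source on `source`.

    `same_network` only matters for custom resource groups: it states whether the
    data source is on the same network as the custom resource group.
    """
    if group is GroupType.CUSTOM:
        # Custom groups connect directly to Internet sources or to sources on their own network.
        return source is Network.INTERNET or same_network
    return source in SUPPORTED[group]


if __name__ == "__main__":
    print(can_connect(GroupType.PUBLIC, Network.VPC))     # False: use an exclusive resource group
    print(can_connect(GroupType.EXCLUSIVE, Network.IDC))  # True, after binding and routing
```

Modeling the custom case with an explicit same_network flag reflects that, for custom resource groups, the deciding factor is whether the data source shares a network with the resource group rather than the network type itself.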

Use an exclusive resource group for Data Integration

Exclusive resource groups are deployed in the VPC where DataWorks is hosted and are isolated from other network environments. To connect an exclusive resource group to a data source that is deployed in a VPC or an IDC, you must bind the exclusive resource group to a VPC that can connect to the data source. The exclusive resource group then connects to the data source through that VPC.

For each network environment of the data source, the following list describes the network connectivity solution and the instructions on connectivity configurations.
  • The data source is accessible over the Internet.
    Exclusive resource groups for Data Integration can directly connect to the data source.
    Figure: Exclusive resource groups
    For more information about exclusive resource groups for Data Integration, see Exclusive resource groups for Data Integration.
    Notice Pay attention to the costs of Internet traffic. For more information, see Internet traffic generated by Data Integration.
  • The data source is deployed in a VPC, and the VPC and the DataWorks workspace are in the same region.
    The following figure shows the network connectivity solution.
    Figure: VPC
      • If the DataWorks workspace and the VPC where the data source resides are in the same zone, bind the exclusive resource group for Data Integration to the VPC.
        Note You can bind the resource group to any vSwitch in the VPC. After the resource group is bound to a vSwitch, the system automatically adds a route to the entire VPC. This way, the resource group can connect to the data source.
      • If the DataWorks workspace and the VPC where the data source resides are in different zones, bind the exclusive resource group for Data Integration to the VPC and add a route between the zones in the DataWorks console. For more information, see Add a route.
  • The data source is deployed in a VPC, and the VPC and the DataWorks workspace are in different regions.
    The following figure shows the network connectivity solution.
    Figure: Access to a data store in another VPC
      1. Bind the exclusive resource group for Data Integration to a VPC.
        1. Create a VPC in the region where the DataWorks workspace resides.
        2. Bind the exclusive resource group for Data Integration to the VPC.
      2. Connect the resource group to the data source.
        1. Connect the VPC that you created in Step 1 to the VPC where the data source resides by using Express Connect circuits or VPN gateways.
        2. Add a route in the DataWorks console so that the resource group can reach the VPC where the data source resides. For more information, see Add a route.
  • The data source is deployed in an IDC.
    The following figure shows the network connectivity solution.
    Figure: IDC
      1. Bind the exclusive resource group for Data Integration to a VPC.
        1. Create a VPC in the region where the DataWorks workspace resides.
        2. Bind the exclusive resource group for Data Integration to the VPC.
      2. Connect the resource group to the data source.
        1. Connect the VPC that you created in Step 1 to the IDC where the data source resides by using Express Connect circuits or VPN gateways.
        2. Add a route in the DataWorks console so that the resource group can reach the IDC where the data source resides. For more information, see Add a route.
  • The data source is deployed on the classic network.
    Exclusive resource groups for Data Integration cannot connect to the data source.
    Note We recommend that you migrate the data source from the classic network of Alibaba Cloud to a VPC.
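Whether an exclusive resource group can reach a data source in a VPC or an IDC comes down to whether one of its routes covers the IP address of the data source: either the route that the system adds for the bound VPC or a route that you add in the DataWorks console. The following Python sketch models only that check by using the standard ipaddress module, and it picks the most specific matching route, as ordinary IP routing does. It does not call any DataWorks API, and the find_route name and all CIDR blocks are hypothetical.

```python
import ipaddress
from typing import Iterable, Optional


def find_route(routes: Iterable[str], source_ip: str) -> Optional[str]:
    """Return the most specific route destination that covers source_ip, or None.

    `routes` holds destination CIDR blocks, for example the CIDR block of the VPC
    that the resource group is bound to plus any routes that you added manually.
    """
    ip = ipaddress.ip_address(source_ip)
    # Membership is False when the IP versions differ, so mixed lists are safe.
    matches = [net for net in map(ipaddress.ip_network, routes) if ip in net]
    if not matches:
        return None  # no route: the resource group cannot reach the data source
    # Longest-prefix match: the route with the largest prefix length wins.
    return str(max(matches, key=lambda net: net.prefixlen))


# Hypothetical setup: the bound VPC uses 192.168.0.0/16, and a route to another
# VPC (172.16.0.0/12) was added for a data source in a different region.
routes = ["192.168.0.0/16", "172.16.0.0/12"]
print(find_route(routes, "172.16.5.20"))  # 172.16.0.0/12
print(find_route(routes, "10.0.0.8"))     # None: add a route first
```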

Use public resource groups for Data Integration

Public resource groups share a public resource pool. Nodes that run on public resource groups may not run as scheduled if the pool does not have sufficient resources. If you need your nodes to run as scheduled, use an exclusive resource group. For more information, see Exclusive resource groups for Data Integration and Create and use an exclusive resource group for scheduling.
Note
  • For more information about public resource groups for Data Integration, see Public resource groups.
  • After the connectivity is configured, check whether the data source is configured with a whitelist. If yes, you must add the classless inter-domain routing (CIDR) block of the resource group to the whitelist of the data source. This way, the resource group can read data from and write data to the data source. For more information, see Configure whitelists.
  • If you use a self-managed data source that is deployed on an ECS instance, you must configure a security group. For more information, see Configure a security group for an ECS instance where a self-managed data store resides.
For each network environment of the data source, the following list describes the network connectivity solution and the instructions on connectivity configurations.
  • The data source is accessible over the Internet or deployed on the classic network.
    Public resource groups for Data Integration can directly connect to the data source.
    Figure: Public resource groups
    Note We recommend that you migrate the data source from the classic network of Alibaba Cloud to a VPC.
    For more information about public resource groups, see Public resource groups.
    Notice Pay attention to the costs of Internet traffic. For more information, see Internet traffic generated by Data Integration.
  • The data source is deployed in a VPC, and the VPC and the DataWorks workspace are in the same region.
    Public resource groups for Data Integration cannot connect to the data source.
    Note We recommend that you use exclusive resource groups for Data Integration.
  • The data source is deployed in a VPC, and the VPC and the DataWorks workspace are in different regions.
    Public resource groups for Data Integration cannot connect to the data source.
    Note We recommend that you use exclusive resource groups for Data Integration.
  • The data source is deployed in an IDC.
    Public resource groups for Data Integration cannot connect to the data source.
    Note We recommend that you use exclusive resource groups for Data Integration.

Use custom resource groups

DataWorks supports custom resource groups. If you have surplus server resources, you can configure the resources as a custom resource group and run sync nodes on the resource group.
Notice
  • You must activate DataWorks Professional Edition to use custom resource groups. For more information about custom resource groups, see Custom resource groups.
  • After the connectivity is configured, check whether the data source is configured with a whitelist. If yes, you must add the classless inter-domain routing (CIDR) block of the resource group to the whitelist of the data source. This way, the resource group can read data from and write data to the data source. For more information, see Configure whitelists.
  • If you use a self-managed data source that is deployed on an ECS instance, you must configure a security group. For more information, see Configure a security group for an ECS instance where a self-managed data store resides.
For each network environment of the data source, the following list describes the network connectivity solution and the instructions on connectivity configurations.
  • The data source is accessible over the Internet.
    Custom resource groups can directly connect to the data source.
    Figure: Internet
    For more information about custom resource groups, see Create a custom resource group for Data Integration.
    Notice Pay attention to the costs of Internet traffic. For more information, see Internet traffic generated by Data Integration.
  • The data source and the custom resource group are on the same classic network, or are in the same VPC or IDC.
    Custom resource groups can directly connect to the data source.
    Figure: Same network
  • The data source and the custom resource group are on different classic networks, or are in different VPCs or IDCs.
    The following figure shows the network connectivity solution.
    Figure: Different networks
    Connect the custom resource group to the data source by using Express Connect circuits or VPN gateways.

Additional information

  • The following services may be involved in network connectivity solutions:
  • Check which resource group runs a sync node by viewing the logs of the sync node.
    • If the logs contain information that is similar to the following example, the sync node runs on the default resource group:
      running in Pipeline[basecommon_group_xxxxxxxxx]
      - If RDS databases are involved, an OXS cluster is used to run the sync node. The logs contain information in the format of running in Pipeline[basecommon_group_xxx_oxs].
      - If RDS databases are not involved, an Elastic Compute Service (ECS) cluster is used to run the sync node. The logs contain information in the format of running in Pipeline[basecommon_group_xxx_ecs].
    • If the logs contain information that is similar to the following example, the sync node runs on an exclusive resource group for Data Integration:
      running in Pipeline[basecommon_S_res_group_xxx]
    • If the logs contain information that is similar to the following example, the sync node runs on a custom resource group for Data Integration:
      running in Pipeline[basecommon_xxxxxxxxx]
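If you check many logs, the log patterns above can be matched with a regular expression. In the following Python sketch, the resource_group_from_log function and its return labels are hypothetical. Because every pipeline name starts with basecommon_, the more specific prefixes are tested first. Treat the result as a heuristic: a custom resource group whose identifier happens to begin with group_ would be reported as the default resource group.

```python
import re
from typing import Optional

# Captures the pipeline name inside "running in Pipeline[...]".
_PIPELINE = re.compile(r"running in Pipeline\[([^\]]+)\]")


def resource_group_from_log(log_text: str) -> Optional[str]:
    """Return a label for the resource group that ran a sync node, or None if unknown."""
    match = _PIPELINE.search(log_text)
    if match is None:
        return None
    # Drop whitespace so that "basecommon_ group_..." and "basecommon_group_..." match alike.
    name = re.sub(r"\s+", "", match.group(1))
    # Test the most specific prefix first: every name starts with "basecommon_".
    if name.startswith("basecommon_S_res_group_"):
        return "exclusive"
    if name.startswith("basecommon_group_"):
        if name.endswith("_oxs"):
            return "default (OXS cluster)"
        if name.endswith("_ecs"):
            return "default (ECS cluster)"
        return "default"
    if name.startswith("basecommon_"):
        return "custom"
    return None


print(resource_group_from_log("... running in Pipeline[basecommon_S_res_group_123] ..."))  # exclusive
```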

What to do next

  1. After you select an appropriate network connectivity solution, you can refer to the corresponding instructions on connectivity configurations to connect a resource group to a data source.
  2. After the connectivity is configured, check whether the data source is configured with a whitelist. If yes, you must add the classless inter-domain routing (CIDR) block of the resource group to the whitelist of the data source. This way, the resource group can read data from and write data to the data source. For more information, see Configure whitelists.
  3. If you use a self-managed data source that is deployed on an ECS instance, you must configure a security group. For more information, see Configure a security group for an ECS instance where a self-managed data store resides.
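Before you edit a whitelist, you can check which CIDR blocks of the resource group are not yet covered by it. The following Python sketch uses the standard ipaddress module. The missing_from_whitelist name and all addresses are hypothetical, so substitute the CIDR blocks of your resource group and the actual whitelist entries of your data source.

```python
import ipaddress
from typing import List


def missing_from_whitelist(group_cidrs: List[str], whitelist: List[str]) -> List[str]:
    """Return the resource group CIDR blocks that no whitelist entry fully covers."""
    # strict=False accepts entries with host bits set; a bare IP becomes a /32 (or /128).
    allowed = [ipaddress.ip_network(entry, strict=False) for entry in whitelist]
    missing = []
    for cidr in group_cidrs:
        net = ipaddress.ip_network(cidr, strict=False)
        # subnet_of() raises TypeError for mixed IP versions, so compare versions first.
        if not any(net.version == a.version and net.subnet_of(a) for a in allowed):
            missing.append(cidr)
    return missing


print(missing_from_whitelist(["10.0.1.0/24", "100.64.0.0/24"], ["10.0.0.0/16"]))
# ['100.64.0.0/24']: add this CIDR block to the whitelist
```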