DataWorks provides resource groups for Data Integration that you can use to synchronize data. Before you synchronize data, you must make sure that the resource group for Data Integration that you use is connected to the related data sources. You can select a network connectivity solution based on the network environments of the data sources and the type of resource group that you use. This topic describes the network connectivity solutions that are available when data sources are deployed in different types of network environments.

Figure: Data synchronization
Before you synchronize data, you must connect the data sources to your resource group for Data Integration, as shown in the preceding figure. DataWorks allows you to use exclusive or custom resource groups for Data Integration to synchronize data. You can select a resource group type based on your business scenario.
Resource group type: Exclusive resource group for Data Integration
  Description: Exclusive resource groups for Data Integration are managed by DataWorks. After you purchase an exclusive resource group for Data Integration, you can use the resources in the resource group in an exclusive manner. For more information, see Create and use an exclusive resource group for Data Integration.
  Scenario:
    • You need to enable DataWorks to access a data source that resides in a different network environment from DataWorks over an internal network.
    • A large number of data synchronization nodes in Data Integration must be run in parallel. In this case, you must use exclusive computing resources to ensure fast and reliable data transmission.
Resource group type: Custom resource group for Data Integration
  Description: A type of resource group for Data Integration that consists of idle servers. For more information about how to create a custom resource group for Data Integration, see Create a custom resource group for Data Integration.
  Scenario: If you have idle servers, you can create a custom resource group based on the idle servers to run your nodes. You must make sure that your data sources can be connected to the custom resource group.
Network connectivity solutions vary based on the network environments where your data sources and resource group reside. The following section describes the network connectivity solutions that can be used in different scenarios. For more information, see Overview of network connectivity solutions.

Overview of network connectivity solutions

Figure: Network connectivity solutions
Network connectivity solutions vary based on the network environments of the data sources and the type of resource group that you use to run data synchronization nodes, as shown in the preceding figure.

Use an exclusive resource group for Data Integration

Exclusive resource groups are deployed in the VPC in which DataWorks is hosted. Exclusive resource groups are disconnected from other network environments. To use an exclusive resource group, you must configure network settings for the exclusive resource group to associate it with a VPC that can connect to data sources. This way, the exclusive resource group can access the data sources over the VPC.

  • Access a database over the Internet
    Table 1. Synchronize data in a database over the Internet
    Network connectivity solution: The exclusive resource group for Data Integration can directly connect to the data source.
    Figure: Exclusive resource groups
    Instruction on connectivity configuration: The following figure shows how to configure the network connectivity between an ApsaraDB RDS database and an exclusive resource group for Data Integration.
    Figure: Access over the Internet
    For more information about exclusive resource groups for Data Integration, see Create and use an exclusive resource group for Data Integration. For more information about how to obtain the VPC information of an ApsaraDB RDS instance, see Switch an ApsaraDB RDS for MySQL instance to a new VPC and a new vSwitch.
    Notice Take note of the Internet traffic cost. For more information, see Internet traffic generated by Data Integration.
  • Access a database over a VPC
    Table 2. Synchronize data in a database that belongs to the same Alibaba Cloud account and resides in the same region as a workspace over a VPC
    Network connectivity solution: The following figure shows a network connectivity solution.
    Figure: VPC
    Instruction on connectivity configuration: The following figure shows the architecture of the network connectivity solution.
    Figure: Same Alibaba Cloud account and same region - a self-managed database hosted on an ECS instance
    Associate the exclusive resource group for Data Integration with the VPC where the data source resides.
    Note
    • You can associate the resource group with a vSwitch in the VPC. After the resource group is associated with a vSwitch, the system automatically adds a route to the VPC. This way, the resource group can connect to the data source.
    • If you associate the exclusive resource group for Data Integration with the VPC where the data source resides when you add the data source, the exclusive resource group can access only the vSwitch to which the data source belongs and cannot connect to the VPC. In this case, you must manually add a route. For more information, see Add a route.
    The following figure shows how to configure the network connectivity between an ApsaraDB RDS database and an exclusive resource group for Data Integration.
    Figure: Same Alibaba Cloud account and same region - ApsaraDB RDS
    For more information about how to obtain the VPC information of an ApsaraDB RDS instance, see Switch an ApsaraDB RDS for MySQL instance to a new VPC and a new vSwitch.
    Table 3. Synchronize data over a VPC in a database that belongs to the same Alibaba Cloud account as a workspace but resides in a different region
    Network connectivity solution: The following figure shows a network connectivity solution.
    Figure: Access to a data source that resides in a different region
    Instruction on connectivity configuration: The following figure shows the architecture of the network connectivity solution.
    Figure: Same Alibaba Cloud account and different regions - ECS
    1. Associate the exclusive resource group for Data Integration with a VPC.
      1. Create a VPC in the region where the DataWorks workspace resides.
      2. Associate the exclusive resource group for Data Integration with the VPC.
    2. Connect the resource group to the data source.
      1. Connect the VPC that is created in the previous step to the VPC where the data source resides by using Express Connect circuits or VPN gateways.
      2. Add a route in the DataWorks console to connect the VPC where the DataWorks workspace resides to the VPC where the data source resides. For more information, see Add a route.
    The following figure shows how to configure the network connectivity between an ApsaraDB RDS database and an exclusive resource group for Data Integration.
    Figure: Same Alibaba Cloud account and different regions - ApsaraDB RDS
    For more information about how to obtain the VPC information of an ApsaraDB RDS instance, see Switch an ApsaraDB RDS for MySQL instance to a new VPC and a new vSwitch.
    Table 4. Synchronize data in a database that belongs to a different Alibaba Cloud account from a workspace over a VPC
    Network connectivity solution: The following figure shows a network connectivity solution.
    Figure: Access to a data source that resides in a different region
    Instruction on connectivity configuration: The following figure shows the architecture of the network connectivity solution.
    Figure: Different Alibaba Cloud accounts - ECS
    1. Associate the exclusive resource group for Data Integration with a VPC.
      1. Create a VPC in the region where the DataWorks workspace resides.
      2. Associate the exclusive resource group for Data Integration with the VPC.
    2. Connect the resource group to the data source.
      1. Connect the VPC that is created in the previous step to the VPC where the data source resides by using Express Connect circuits or VPN gateways.
      2. Add a route in the DataWorks console to connect the VPC where the DataWorks workspace resides to the VPC where the data source resides. For more information, see Add a route.
    The following figure shows how to configure the network connectivity between an ApsaraDB RDS database and an exclusive resource group for Data Integration.
    Figure: Different Alibaba Cloud accounts - ApsaraDB RDS
    For more information about how to obtain the VPC information of an ApsaraDB RDS instance, see Switch an ApsaraDB RDS for MySQL instance to a new VPC and a new vSwitch.
  • Synchronize data in a database that does not belong to Alibaba Cloud
    Table 5. Synchronize data in a database that resides in a data center or belongs to other cloud service providers
    Network connectivity solution: The following figure shows a network connectivity solution.
    Figure: IDC
    Instruction on connectivity configuration: The following figure shows the architecture of the network connectivity solution.
    Figure: IDC
    1. Associate the exclusive resource group for Data Integration with a VPC.
      1. Create a VPC in the region where the DataWorks workspace resides.
      2. Associate the exclusive resource group for Data Integration with the VPC.
    2. Connect the resource group to the data source.
      1. Connect the VPC that is created in the previous step to the data center where the data source resides by using Express Connect circuits or VPN gateways.
      2. Add a route in the DataWorks console to connect the VPC where the DataWorks workspace resides to the data center where the data source resides. For more information, see Add a route.
  • Synchronize data in a database over the classic network
    The exclusive resource group for Data Integration cannot connect to the data source.
    Note We recommend that you migrate the data source to a VPC and do not use the classic network of Alibaba Cloud.
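The scenarios above for exclusive resource groups can be summarized as a lookup from the network environment of the data source to the configuration steps. The following Python sketch is illustrative only: the enum labels and the function name `exclusive_group_steps` are hypothetical and are not part of any DataWorks API. The step text paraphrases Tables 1 to 5.

```python
from enum import Enum
from typing import List


class DataSourceNetwork(Enum):
    """Hypothetical labels for the network environments described above."""
    INTERNET = "internet"
    VPC_SAME_ACCOUNT_SAME_REGION = "vpc-same-account-same-region"
    VPC_SAME_ACCOUNT_OTHER_REGION = "vpc-same-account-other-region"
    VPC_OTHER_ACCOUNT = "vpc-other-account"
    DATA_CENTER_OR_OTHER_CLOUD = "data-center-or-other-cloud"
    CLASSIC_NETWORK = "classic-network"


def _bridge_steps(target: str) -> List[str]:
    # Steps shared by Tables 3, 4, and 5: only the connection target differs.
    return [
        "Create a VPC in the region where the DataWorks workspace resides.",
        "Associate the exclusive resource group with that VPC.",
        f"Connect that VPC to {target} by using Express Connect circuits or VPN gateways.",
        f"Add a route in the DataWorks console to {target}.",
    ]


def exclusive_group_steps(network: DataSourceNetwork) -> List[str]:
    """Return the configuration steps for an exclusive resource group."""
    if network is DataSourceNetwork.INTERNET:
        # Table 1: direct connection; Internet traffic is billed.
        return ["No extra network setup: the resource group connects directly."]
    if network is DataSourceNetwork.VPC_SAME_ACCOUNT_SAME_REGION:
        # Table 2: associate with the VPC of the data source.
        return ["Associate the exclusive resource group with the VPC where the data source resides."]
    if network in (DataSourceNetwork.VPC_SAME_ACCOUNT_OTHER_REGION,
                   DataSourceNetwork.VPC_OTHER_ACCOUNT):
        return _bridge_steps("the VPC where the data source resides")
    if network is DataSourceNetwork.DATA_CENTER_OR_OTHER_CLOUD:
        return _bridge_steps("the data center where the data source resides")
    # Classic network: an exclusive resource group cannot connect.
    raise ValueError("Not supported: migrate the data source to a VPC.")


if __name__ == "__main__":
    for step in exclusive_group_steps(DataSourceNetwork.VPC_OTHER_ACCOUNT):
        print(step)
```

Raising an error for the classic network mirrors the statement that an exclusive resource group cannot connect to such a data source. Returning an empty step list instead would hide that limitation.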

Use a custom resource group for Data Integration

If you have idle servers, you can create a custom resource group based on the idle servers to run your nodes.
Notice
  • You must activate DataWorks Professional Edition before you can use custom resource groups. For more information about custom resource groups, see Custom resource groups.
  • After the connectivity is configured, check whether the data source is configured with a whitelist. If the data source is configured with a whitelist, you must add the Classless Inter-Domain Routing (CIDR) block of the resource group to the whitelist of the data source. This way, the resource group can read data from and write data to the data source. For more information, see Configure an IP address whitelist.
  • If you use a self-managed data source that is hosted on an ECS instance, configure a security group for the instance. For more information, see Configure a security group for an ECS instance where a self-managed data store resides.
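The whitelist check described above can be done offline before you edit the whitelist. The following minimal sketch uses the Python standard library `ipaddress` module and assumes that you already have the CIDR blocks of the resource group and the current whitelist entries as strings. The function name and all addresses are hypothetical examples.

```python
import ipaddress
from typing import Iterable, List


def missing_from_whitelist(resource_group_cidrs: Iterable[str],
                           whitelist: Iterable[str]) -> List[str]:
    """Return the resource group CIDR blocks that no whitelist entry covers."""
    # strict=False accepts entries with host bits set, such as 10.1.2.3/16.
    # A bare IP address is treated as a single-address network.
    allowed = [ipaddress.ip_network(entry, strict=False) for entry in whitelist]
    missing = []
    for cidr in resource_group_cidrs:
        block = ipaddress.ip_network(cidr, strict=False)
        covered = any(
            # subnet_of() raises TypeError across IP versions, so compare versions first.
            block.version == entry.version and block.subnet_of(entry)
            for entry in allowed
        )
        if not covered:
            missing.append(cidr)
    return missing


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    group_cidrs = ["192.168.1.0/24", "10.1.0.0/16"]
    current_whitelist = ["192.168.0.0/16", "10.1.2.3"]
    print(missing_from_whitelist(group_cidrs, current_whitelist))
```

The check treats a block as covered only if a single whitelist entry contains it entirely. A block that is covered piecewise by several smaller entries is reported as missing, which is the conservative choice for this purpose.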
Network environment of the data source: The data source is accessible over the Internet.
  Network connectivity solution: The custom resource group for Data Integration can directly connect to the data source.
  Figure: Access over the Internet
  Instruction on connectivity configuration: For more information about custom resource groups for Data Integration, see Create a custom resource group for Data Integration.
  Note: Take note of the Internet traffic cost. For more information, see Internet traffic generated by Data Integration.
Network environment of the data source: The data source and custom resource group for Data Integration use the same IP address on the classic network or are in the same VPC or data center.
  Network connectivity solution: The custom resource group for Data Integration can directly connect to the data source.
  Figure: Same network environment
Network environment of the data source: The data source and the custom resource group for Data Integration use different IP addresses on the classic network or are in different VPCs or data centers.
  Network connectivity solution: The following figure shows a network connectivity solution.
  Figure: Different network environments
  Instruction on connectivity configuration: Connect the custom resource group for Data Integration to the data source by using Express Connect circuits or VPN gateways.
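Because you must make sure that your data sources can be connected to the custom resource group, a quick TCP probe from a server in the resource group can confirm basic reachability. The following sketch is a generic check and not a DataWorks tool. The function name, host name, and port are placeholder assumptions. A successful TCP handshake shows only that the network path and port are open; it does not show that the whitelist or the credentials are correct.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection resolves the host name and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # Hypothetical endpoint: replace with the address and port of your data source.
    print(can_reach("db.example.internal", 3306))
```

Run the probe on a server that belongs to the custom resource group, because the result reflects the network path from the machine on which the code runs.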

Additional information

  • The following services may be involved in network connectivity solutions:
  • View the resource group on which a synchronization node is run.
    • If the logs contain information that is similar to the following example, the synchronization node is run on the shared resource group:
      running in Pipeline[basecommon_ group_xxxxxxxxx]
      - If ApsaraDB RDS databases are involved, an OXS cluster is used to run the synchronization node. The logs are in the format of running in Pipeline[basecommon_ group_xxx_oxs].
      - If ApsaraDB RDS databases are not involved, an ECS cluster is used to run the synchronization node. The logs are in the format of running in Pipeline[basecommon_ group_xxx_ecs].
    • If the logs contain information that is similar to the following example, the synchronization node is run on an exclusive resource group for Data Integration:
      running in Pipeline[basecommon_S_res_group_xxx]
    • If the logs contain information that is similar to the following example, the synchronization node is run on a custom resource group for Data Integration:
      running in Pipeline[basecommon_xxxxxxxxx]
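The log patterns above can be checked programmatically. The following sketch classifies a log line by the name inside `Pipeline[...]`. It assumes that real logs follow the example formats shown in this topic, and the function name is hypothetical. The shared resource group example contains a space after `basecommon_`, so the pattern tolerates optional whitespace there.

```python
import re
from typing import Optional

# Capture the pipeline name inside the square brackets.
_PIPELINE = re.compile(r"running in Pipeline\[(?P<name>[^\]]+)\]")


def resource_group_type(log_line: str) -> Optional[str]:
    """Classify a log line as 'exclusive', 'shared', or 'custom'; None if no match."""
    match = _PIPELINE.search(log_line)
    if match is None:
        return None
    name = match.group("name")
    # Test the most specific prefix first, because all three start with basecommon_.
    if name.startswith("basecommon_S_res_group_"):
        return "exclusive"
    if re.match(r"basecommon_\s*group_", name):
        return "shared"
    if name.startswith("basecommon_"):
        return "custom"
    return None


if __name__ == "__main__":
    for line in (
        "running in Pipeline[basecommon_ group_xxx_oxs]",
        "running in Pipeline[basecommon_S_res_group_xxx]",
        "running in Pipeline[basecommon_xxxxxxxxx]",
    ):
        print(line, "->", resource_group_type(line))
```

This is a heuristic: a custom resource group whose name happens to begin with `group_` would be reported as shared, so treat the result as a hint and confirm it in the DataWorks console when in doubt.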

What to do next

  1. Configure network connectivity between a resource group and data sources.
    1. After you select an appropriate network connectivity solution, connect a resource group to a data source by following the related instructions on connectivity configuration.
    2. After the connectivity is configured, check whether the data source is configured with a whitelist. If the data source is configured with a whitelist, you must add the Classless Inter-Domain Routing (CIDR) block of the resource group to the whitelist of the data source. This way, the resource group can read data from and write data to the data source. For more information, see Configure an IP address whitelist.
    3. If you use a self-managed data source that is hosted on an ECS instance, configure a security group for the instance. For more information, see Configure a security group for an ECS instance where a self-managed data store resides.
  2. Configure a data synchronization solution or node. For more information, see the following topics: