When you use the Data Integration service of DataWorks to synchronize data, sync nodes run on resource groups and consume their resources. This topic describes what a resource group is, how the connectivity and performance of a resource group affect data synchronization, and which types of resource groups are available, so that you can select an appropriate resource group based on your needs.

Basic concepts

A resource group is a collection of computing resources on which the batch sync nodes of Data Integration run. Generally, a resource group consists of one or more servers that provide CPU, memory, and network resources.

In the process of running a sync node, the resource group pulls data from the data source and pushes the data to the data destination.
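
The pull-then-push flow described above can be pictured with a small, purely conceptual sketch. It is not DataWorks code: the names `pull` and `run_sync`, the batch size, and the use of in-memory lists as the "source" and "destination" are all assumptions made for illustration.

```python
from typing import Iterable, Iterator, List


def pull(source: Iterable, batch_size: int) -> Iterator[List]:
    """Read records from the source in batches of at most batch_size."""
    batch: List = []
    for record in source:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly smaller, batch.
        yield batch


def run_sync(source: Iterable, destination: List, batch_size: int = 2) -> int:
    """Pull each batch from the source, then push it to the destination.

    Returns the number of records written.
    """
    written = 0
    for batch in pull(source, batch_size):
        destination.extend(batch)  # the "push" step (hypothetical in-memory target)
        written += len(batch)
    return written
```

The point of the sketch is that both the read and the write happen on the machine that runs the node, which is why the CPU, memory, and network capacity of the resource group bound the sync rate.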

Types of resource groups

Resource groups can be divided into public, exclusive, and custom resource groups.
  • Public resource groups:

    Public resource groups are provided by DataWorks and shared among all users. During peak hours, users may compete for these shared resources.

    For more information about how to use public resource groups, see Public resource groups.
  • Exclusive resource groups:

    Exclusive resource groups can be used only by the users who purchase them. If you need to run a large number of sync nodes concurrently during peak hours, you can use exclusive resource groups to ensure that data is transmitted in a fast and stable manner.

    For more information about exclusive resource groups, see Exclusive resource groups for Data Integration. For more information about how to use exclusive resource groups, see Exclusive resource groups for data integration.
  • Custom resource groups:

    DataWorks supports custom resource groups. If you have surplus server resources, you can configure the resources as a custom resource group and run sync nodes on the resource group.

    For more information about custom resource groups, see Custom resource groups. For more information about how to use custom resource groups, see Create a custom resource group for Data Integration.

Key to resource planning: connectivity and performance

When you use resource groups, you must pay attention to their connectivity and performance.
  • Connectivity

    Before you use Data Integration to synchronize data, make sure that the resource group can reach the data source and destination over the network and that the resource group is allowed to access them. If the resource group cannot connect to the data source or destination, sync nodes cannot be run.

    Connectivity is the most important factor to consider when you select a resource group. You can select an appropriate resource group based on the network environment of the data source and destination and the network connectivity solutions of different resource groups. For more information about the network connectivity solutions of different resource groups, see Test data store connectivity.

  • Performance

    Sync nodes consume the CPU, memory, and network resources of the servers on which they run. Insufficient resources may lead to various issues. For example, nodes may fail to start, wait for resources for a prolonged period after startup, transmit data at a low rate, or fail to generate results as scheduled.

    To ensure that sync nodes can be properly run, you must allocate adequate resources to them. We recommend that you use exclusive resource groups to run sync nodes so that the nodes do not need to compete for resources in the public resource pool. For more information about the performance metrics of exclusive resource groups, see Performance metrics and billing standards of exclusive resource groups for Data Integration.
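
For a custom resource group, where you control the servers, the network half of the connectivity requirement can be spot-checked from one of those servers. The following is a minimal sketch that uses only the Python standard library. The function name `can_reach` and the default timeout are assumptions for illustration, and the check is not a substitute for the connectivity test that DataWorks provides.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection resolves the host name and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For instance, `can_reach("db.example.internal", 3306)` would probe a hypothetical MySQL endpoint. A successful TCP handshake shows only that the network path is open. It does not show that the resource group is allowed to access the data store (for example, through a whitelist or credentials), which the sync node also needs.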

Comparison and recommendation of resource groups

The three types of resource groups are applicable to different scenarios. The following table compares the resource groups based on dimensions such as the ownership of resources, network connectivity, and billing method. When you run sync nodes, you can select appropriate resource groups based on your needs.

| Type | Exclusive resource group | Public resource group | Custom resource group |
| --- | --- | --- | --- |
| Ownership of resources | The resources are maintained by DataWorks and exclusively used by the tenant that purchases the exclusive resources. | The resources are maintained by DataWorks and shared among all tenants. | The resources are maintained by yourself and reside in your Internet data center (IDC). |
| Network connectivity | Can reach Alibaba Cloud data stores on all types of networks and data stores that are not provided by Alibaba Cloud but are deployed on virtual private clouds (VPCs) or the Internet. | Can reach Alibaba Cloud data stores on all types of networks and data stores that are not provided by Alibaba Cloud but are deployed on the classic network or the Internet. | Can reach Alibaba Cloud data stores on all types of networks and data stores that are not provided by Alibaba Cloud but are deployed on VPCs or the Internet. |
| Billing method | Subscription based on the server specifications | Tiered pricing based on the number of node instances | Monthly billing in pay-as-you-go mode based on the DataWorks edition |
| Supported data stores | All data stores | Specific data stores | All data stores |
| Security | High | High | Depending on the environment where your server resides |
| Node execution efficiency (whether nodes can be allocated sufficient computing resources to deliver the optimal performance) | High | Low | Depending on the environment where your server resides |
| Reliability (whether nodes can be started and generate results as scheduled when network resources are occupied by other tenants) | High | Low | Depending on the environment where your server resides |
| Scenario | Suitable for the execution of a large number of important production nodes | Suitable for the execution of a small number of non-important, non-urgent, or testing nodes | Suitable for the following scenarios: (1) You want to make full use of the computing resources you have purchased. (2) Both the data source and destination are in the same IDC as the custom resource group. |
| Recommendation rating | ★★★★★ | ★★ | |

Based on the preceding comparison, we recommend that you use exclusive resource groups to run sync nodes.
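
The scenario guidance in the comparison can be condensed into a small decision helper. This is only an illustrative sketch: the `Workload` fields, the function name `suggest_resource_group`, and the order in which the rules are checked are assumptions, not an official selection algorithm. In particular, the sketch treats both custom-group scenarios as required, which is a conservative reading.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    important_production: bool  # many important production nodes to run
    has_surplus_servers: bool   # you own servers with spare capacity
    data_in_same_idc: bool      # source and destination share the IDC with those servers


def suggest_resource_group(w: Workload) -> str:
    """Map the scenarios from the comparison to a resource group type."""
    if w.important_production:
        # Matches the overall recommendation for production workloads.
        return "exclusive"
    if w.has_surplus_servers and w.data_in_same_idc:
        # Assumption: require both custom-group scenarios to hold.
        return "custom"
    # Small numbers of non-important, non-urgent, or testing nodes.
    return "public"
```

Connectivity still takes priority over any such rule: a resource group that cannot reach the data source and destination is not a candidate, whatever the workload looks like.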