DataWorks supports shared resource groups, exclusive resource groups, and custom resource groups. This topic provides an overview of how these resource groups are used.

Background information

  • When you activate DataWorks, you are provided with shared resource groups by default. You can purchase a subscription exclusive resource group. You can also use servers in your data center as a resource group after you upgrade the DataWorks service to a more advanced edition. For more information about different types of resource groups, see Description of resource groups.
  • Different types of resource groups are used in different execution phases of a node. This topic uses a batch synchronization node as an example to describe how DataWorks issues a node and which resource groups are used in each execution phase of the node. For more information, see Mechanism for issuing nodes.
  • A node can use different types of resource groups in a specific execution phase. For example, in the scheduling phase, a batch synchronization node can use a subscription exclusive resource group for scheduling or the pay-as-you-go shared resource group for scheduling. In the data integration phase, a batch synchronization node can use a subscription exclusive resource group for Data Integration or a custom resource group for Data Integration, which requires an advanced DataWorks edition. For more information about how to select a resource group, see Details of resource groups.

Billing

For more information about the billable items and billing methods of DataWorks, see Overview.

Limits

Only an Alibaba Cloud account or a RAM user to which the AliyunBSSOrderAccess and AliyunDataWorksFullAccess policies are attached can purchase a resource group.

Description of resource groups

DataWorks resource groups are classified into exclusive resource groups, shared resource groups, and custom resource groups. Based on the scenario that they serve, such as node scheduling, data integration, or DataService Studio API calls, they are further classified into the following types:
  • Exclusive resource group for scheduling
  • Exclusive resource group for Data Integration
  • Exclusive resource group for DataService Studio
  • Shared resource group for scheduling
  • Shared resource group for DataService Studio
  • Custom resource group for scheduling
  • Custom resource group for Data Integration
A resource group for scheduling is used to schedule nodes. A resource group for Data Integration is used to transmit data in Data Integration nodes. A resource group for DataService Studio is used to serve DataService Studio API calls.

The following list describes each resource group type and the resource groups for scheduling and for Data Integration that it provides.

Exclusive resource group
  • This type of resource group is managed by Alibaba Cloud. After you purchase an exclusive resource group, only you can use it.
  • You can associate an exclusive resource group with a workspace. This way, resources are isolated between workspaces.
  • You can flexibly scale out, scale in, upgrade, or change the specifications of the resource group.
  • The subscription billing method is used.
  Resource group for scheduling: Exclusive resource group for scheduling
  Resource group for Data Integration: Exclusive resource group for Data Integration
Shared resource group
  • After you activate DataWorks, DataWorks provides you with a shared resource group for scheduling. This resource group is shared by tenants in DataWorks. If you run nodes on it, the nodes may enter the Waiting for resources state during peak hours.
  • The pay-as-you-go billing method is used.
  Resource group for scheduling: Shared resource group for scheduling
  Resource group for Data Integration: Not available
Custom resource group
  If you have idle servers, you can create a custom resource group based on these servers to run your nodes.
  Note You can create a custom resource group for scheduling only in DataWorks Professional Edition or a more advanced edition. For more information, see Custom resource groups.
  Resource group for scheduling: Custom resource group for scheduling
  Resource group for Data Integration: Custom resource group for Data Integration

Mechanism for issuing nodes

DataWorks uses resource groups for scheduling to issue nodes to compute engine instances or servers, which then run the nodes. For example, DataWorks uses a resource group for scheduling to issue a node to a MaxCompute compute engine instance, and the MaxCompute compute engine instance runs the node. For batch synchronization nodes in Data Integration, DataWorks uses resource groups for scheduling to issue the nodes to resource groups for Data Integration, which then run the nodes.
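The two-phase mechanism for a batch synchronization node can be modeled as a small validation step: the first phase must use a resource group for scheduling, and the second phase must use a resource group for Data Integration. The following Python sketch is illustrative only. The `Phase`, `ResourceGroup`, and `plan_batch_sync_node` names are hypothetical and are not part of any DataWorks API or SDK. The allowed types are taken from the tables in this topic, and the edition requirement for custom resource groups is not modeled.

```python
from enum import Enum


class Phase(Enum):
    """Execution phases of a batch synchronization node, as described in this topic."""
    SCHEDULING = "scheduling"
    DATA_INTEGRATION = "data integration"


class ResourceGroup(Enum):
    """Resource group types named in this topic (illustrative identifiers only)."""
    SHARED_SCHEDULING = "shared resource group for scheduling"
    EXCLUSIVE_SCHEDULING = "exclusive resource group for scheduling"
    CUSTOM_SCHEDULING = "custom resource group for scheduling"
    EXCLUSIVE_DATA_INTEGRATION = "exclusive resource group for Data Integration"
    CUSTOM_DATA_INTEGRATION = "custom resource group for Data Integration"


# Resource group types that can serve each phase, according to this topic.
# No shared resource group for Data Integration is listed, so none is allowed here.
ALLOWED = {
    Phase.SCHEDULING: {
        ResourceGroup.SHARED_SCHEDULING,
        ResourceGroup.EXCLUSIVE_SCHEDULING,
        ResourceGroup.CUSTOM_SCHEDULING,
    },
    Phase.DATA_INTEGRATION: {
        ResourceGroup.EXCLUSIVE_DATA_INTEGRATION,
        ResourceGroup.CUSTOM_DATA_INTEGRATION,
    },
}


def plan_batch_sync_node(scheduling: ResourceGroup, integration: ResourceGroup) -> list:
    """Return the (phase, resource group) steps for one run of a batch synchronization node.

    The resource group for scheduling issues the node, and the resource group for
    Data Integration then runs it. Raises ValueError for an invalid pairing.
    """
    steps = [(Phase.SCHEDULING, scheduling), (Phase.DATA_INTEGRATION, integration)]
    for phase, group in steps:
        if group not in ALLOWED[phase]:
            raise ValueError(f"{group.value} cannot serve the {phase.value} phase")
    return steps
```

For example, `plan_batch_sync_node(ResourceGroup.SHARED_SCHEDULING, ResourceGroup.EXCLUSIVE_DATA_INTEGRATION)` is accepted, while passing a resource group for scheduling as the second argument raises `ValueError`, because a resource group for scheduling only issues the node and does not run the data transmission.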

Details of resource groups

The following tables describe different types of resource groups from various dimensions, such as users of server resources, network, timeliness requirements for node execution, use scenarios, and billing methods. This helps you select a resource group based on your business requirements.
  • Resource group for scheduling
    Server resource user
      • Shared resource group for scheduling: The resources are maintained by DataWorks and shared by tenants.
      • Exclusive resource group for scheduling: The resources are maintained by DataWorks and used exclusively by each tenant.
      • Custom resource group for scheduling: The resources are maintained by users and reside in their data centers.
    Network
      • Shared resource group for scheduling: This type of resource group can connect to data sources of Alibaba Cloud, except for the following data sources:
        • Data sources that are deployed on the Internet and protected by whitelists that block access from unknown IP addresses
        • Data sources that are deployed in virtual private clouds (VPCs) of Alibaba Cloud
      • Exclusive resource group for scheduling: You can select a network connectivity solution to connect this type of resource group to a data source that is deployed in a complex network environment. For more information, see Exclusive resource groups for scheduling.
      • Custom resource group for scheduling: You can select a network connectivity solution based on the environment where your server resides.
    Timeliness requirements for node execution
      • Shared resource group for scheduling: This type of resource group is shared by DataWorks users, so DataWorks cannot guarantee that nodes run on time.
      • Exclusive resource group for scheduling: This type of resource group is used exclusively by one tenant, and you can control the maximum number of nodes that run in parallel on it. This ensures that nodes are scheduled on time.
      • Custom resource group for scheduling: You can select a network connectivity solution based on the environment where your server resides.
    Use scenario
      • Shared resource group for scheduling: Tenants share the resources in this type of resource group and may compete for them when a large number of nodes need to run. As a result, some nodes in a workspace may be blocked. This type of resource group is suitable for scenarios in which a small number of nodes run in parallel and are scheduled at a low frequency.
      • Exclusive resource group for scheduling: A tenant can control the maximum number of nodes that run in parallel on this type of resource group and can scale out, scale in, or change the specifications of the resource group. This type of resource group is suitable for scenarios in which a large number of nodes need to run and must be scheduled on time.
      • Custom resource group for scheduling: Custom development scenarios.
    Billing method
      • Shared resource group for scheduling: Pay-as-you-go
      • Exclusive resource group for scheduling: Subscription
      • Custom resource group for scheduling: You are billed for the DataWorks edition that you use on a monthly basis based on the pay-as-you-go billing method.
    Star rating ★★★★★
    Selection guideline
    • If you want nodes to be run on time, we recommend that you select an exclusive resource group for scheduling. The shared resource group for scheduling is not recommended because the resource group is shared by tenants in DataWorks.
    • If you want to access a data source that is deployed in a VPC, or a data source that is deployed on the Internet and protected by a whitelist, we recommend that you select an exclusive resource group for scheduling. The shared resource group for scheduling cannot access these data sources.
    • If you need to schedule a large number of nodes, we recommend that you select a subscription exclusive resource group for scheduling.
  • Resource group for Data Integration
    Server resource user
      • Exclusive resource group for Data Integration: The resources are maintained by DataWorks and used exclusively by each tenant.
      • Custom resource group for Data Integration: The resources are maintained by users and reside in their data centers.
    Network
      • Exclusive resource group for Data Integration:
        • This type of resource group can access data sources that are deployed on the Internet.
        • You can select a network connectivity solution to connect this type of resource group to a data source that is deployed in any network environment.
      • Custom resource group for Data Integration: You can select a network connectivity solution based on the environment where your server resides.
    Supported data source
      • Exclusive resource group for Data Integration: All data sources.
      • Custom resource group for Data Integration: All data sources.
    Timeliness requirements for node execution
      • Exclusive resource group for Data Integration: This type of resource group is used exclusively by one tenant, and you can control the maximum number of nodes that run in parallel on it. This ensures that nodes are scheduled on time.
      • Custom resource group for Data Integration: You can select a network connectivity solution based on the environment where your server resides.
    Use scenario
      • Exclusive resource group for Data Integration: This type of resource group is suitable for scenarios in which a large number of important production nodes need to run.
      • Custom resource group for Data Integration: This type of resource group is suitable for the following scenarios:
        • You have computing resources in your data center. In this case, you can connect your data center to Alibaba Cloud and use these computing resources instead of purchasing Alibaba Cloud computing resources.
        • The data sources from which you want to synchronize data are deployed in your data center.
    Billing method
      • Exclusive resource group for Data Integration: Subscription
      • Custom resource group for Data Integration: You are billed for the DataWorks edition that you use on a monthly basis based on the pay-as-you-go billing method.
    Star rating ★★★★★
    Selection guideline
    • If a large number of Data Integration nodes must be run in parallel, exclusive computing resources are required to ensure fast and reliable data transmission. We recommend that you select an exclusive resource group for Data Integration in this case.
    • An exclusive resource group for Data Integration can access data sources that are deployed on the Internet. However, some data sources that are deployed on the Internet cannot be accessed directly. For more information about how to access these data sources, see Supported data sources, readers, and writers.
    • If you want to access a data source that is deployed in a complex network environment, we recommend that you select a subscription exclusive resource group for Data Integration. For more information about network connectivity solutions, see Select a network connectivity solution.
    • If you want to synchronize data in real time, we recommend that you select an exclusive resource group for Data Integration.
  • Resource group for DataService Studio

    The shared resource group for DataService Studio is shared by tenants. If you need to call DataService Studio APIs frequently and with high concurrency and require timely responses, exclusive computing resources are required to ensure the availability and stability of DataService Studio. In this case, we recommend that you use an exclusive resource group for DataService Studio.
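The selection guidelines for the three kinds of resource groups reduce to a few yes/no questions about a workload. The following Python sketch encodes them as simple rules. The `Workload` fields and the `recommend_*` function names are hypothetical and do not correspond to any DataWorks API or console setting. Custom resource groups for scheduling are left out because the guidelines do not address them. The rule that prefers a custom resource group for Data Integration when you want to reuse data center servers on an advanced DataWorks edition is an assumption drawn from the use-scenario descriptions, not an official recommendation.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical workload description; the field names are illustrative only."""
    must_run_on_time: bool = False
    many_nodes: bool = False                    # many nodes, or many running in parallel
    data_source_in_vpc: bool = False
    data_source_behind_whitelist: bool = False  # Internet data source with an IP whitelist
    complex_network: bool = False
    real_time_sync: bool = False
    reuse_data_center_servers: bool = False     # idle servers in your own data center
    advanced_edition: bool = False              # custom resource groups need an advanced edition
    frequent_concurrent_api_calls: bool = False # DataService Studio APIs, timely responses needed


def recommend_scheduling_group(w: Workload) -> str:
    """Selection guideline for resource groups for scheduling."""
    # The shared resource group for scheduling cannot reach these data sources.
    unreachable_from_shared = w.data_source_in_vpc or w.data_source_behind_whitelist
    if w.must_run_on_time or w.many_nodes or unreachable_from_shared:
        return "exclusive resource group for scheduling"
    return "shared resource group for scheduling"


def recommend_integration_group(w: Workload) -> str:
    """Selection guideline for resource groups for Data Integration."""
    if w.real_time_sync or w.many_nodes or w.complex_network:
        return "exclusive resource group for Data Integration"
    if w.reuse_data_center_servers and w.advanced_edition:
        return "custom resource group for Data Integration"
    # This topic lists no shared resource group for Data Integration.
    return "exclusive resource group for Data Integration"


def recommend_dataservice_group(w: Workload) -> str:
    """Selection guideline for resource groups for DataService Studio."""
    if w.frequent_concurrent_api_calls:
        return "exclusive resource group for DataService Studio"
    return "shared resource group for DataService Studio"


# Example: a nightly batch workload with many nodes that must run on time.
nightly_batch = Workload(must_run_on_time=True, many_nodes=True)
print(recommend_scheduling_group(nightly_batch))   # exclusive resource group for scheduling
print(recommend_integration_group(nightly_batch))  # exclusive resource group for Data Integration
```

The rules are checked in order so that the conditions the guidelines tie to exclusive resource groups (timeliness, scale, real-time synchronization, and restricted networks) take precedence over the cost-oriented options.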