DataWorks:Overview

Last Updated:Jan 03, 2024

DataWorks supports shared resource groups and exclusive resource groups. This topic describes these resource groups and how to select one based on your business requirements.

Background information

  • When you activate DataWorks, you are provided with pay-as-you-go shared resource groups by default. You can also purchase a subscription exclusive resource group. You can use servers in your data center as a resource group after you upgrade the DataWorks service to a more advanced edition. For more information about different types of resource groups, see Description of resource groups.

  • In different execution phases of a node, different types of resource groups are used. In this topic, a batch synchronization node is used to describe the mechanism for issuing a node and resource groups used in different execution phases of the node. For more information, see Mechanism for issuing nodes.

  • A node can use different types of resource groups in a specific execution phase. For example, in the scheduling phase, a batch synchronization node can use a subscription exclusive resource group for scheduling or the pay-as-you-go shared resource group for scheduling. In the data integration phase, a batch synchronization node can use a subscription exclusive resource group for Data Integration. For more information about how to select a resource group, see Details of resource groups.

  • By default, shared resource groups and exclusive resource groups that are provided by DataWorks are automatically protected by Security Center Basic. Security Center Basic provides you with basic security protection features to harden the security of your assets. You can use the features to detect risks on your assets. The risks include unusual logons, DDoS attacks, and common vulnerabilities. For more information, see Introduction to Security Center Basic.

Billing

For more information about the billable items and billing methods of DataWorks, see Billing overview.

Limits

Only an Alibaba Cloud account or a RAM user to which the AliyunBSSOrderAccess and AliyunDataWorksFullAccess policies are attached can purchase a resource group. For more information, see Grant permissions to a RAM user.
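
The permission rule above can be expressed as a small check. The following is a minimal sketch, not a DataWorks or RAM API call: the function name `can_purchase_resource_group` and its parameters are hypothetical, and the list of attached policy names is assumed to have been obtained elsewhere. Only the two policy names come from this topic.

```python
# Policy names taken from the Limits section of this topic.
REQUIRED_POLICIES = frozenset({"AliyunBSSOrderAccess", "AliyunDataWorksFullAccess"})


def can_purchase_resource_group(is_alibaba_cloud_account, attached_policies=()):
    """Return True if the identity may purchase a resource group.

    Hypothetical helper: an Alibaba Cloud account always qualifies, and a
    RAM user qualifies only when both required policies are attached.
    """
    if is_alibaba_cloud_account:
        return True
    return REQUIRED_POLICIES.issubset(attached_policies)
```

For example, a RAM user that has only AliyunDataWorksFullAccess attached would not pass this check until AliyunBSSOrderAccess is also attached.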

Description of resource groups

DataWorks resource groups are classified into exclusive resource groups and shared resource groups. They can be further classified into the following types based on scenarios such as node scheduling, data integration, and data service provision: exclusive resource group for scheduling, exclusive resource group for Data Integration, exclusive resource group for DataService Studio, shared resource group for scheduling, and shared resource group for DataService Studio. A resource group for scheduling is used to schedule nodes. A resource group for Data Integration is used to transmit data in Data Integration nodes. A resource group for DataService Studio is used to call API operations.

Resource group type: Exclusive resource group

Billing method: Subscription

Description:

  • This type of resource group is managed by Alibaba Cloud. After you purchase an exclusive resource group, you can exclusively use this resource group.

  • You can associate an exclusive resource group with a workspace. This way, resources are isolated between workspaces.

  • Flexible configurations for a resource group such as scale-out, scale-in, upgrade, or change of specifications are supported.

Resource group for scheduling: Exclusive resource group for scheduling

Resource group for Data Integration: Exclusive resource group for Data Integration

Resource group for DataService Studio: Exclusive resource group for DataService Studio

Note: Exclusive resource groups for DataService Studio are available only in the China (Shanghai) region.

Resource group type: Shared resource group

Billing method: Pay-as-you-go

Description: After you activate DataWorks, DataWorks provides you with the shared resource group for DataService Studio and the shared resource group for scheduling. The shared resource groups are shared by tenants in DataWorks. If you run nodes on the shared resource groups, the nodes may wait for resources during peak hours.

Resource group for scheduling: Use a shared resource group

Resource group for Data Integration: -

Resource group for DataService Studio: Use a shared resource group

Mechanism for issuing nodes

DataWorks uses resource groups for scheduling to issue batch synchronization nodes to compute engine instances or servers, where the nodes are run. For example, DataWorks uses a resource group for scheduling to issue a batch synchronization node to a MaxCompute compute engine instance, and the MaxCompute compute engine instance runs the node. For batch data synchronization nodes in Data Integration, DataWorks uses resource groups for scheduling to issue the nodes to resource groups for Data Integration, which then run the nodes.
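
The issuing mechanism can be pictured as a two-step hand-off: a resource group for scheduling issues the node, and a different resource runs it. The following is a minimal sketch of that routing only. The `Node` class, the `issue_node` function, and the resource group names are hypothetical and are not part of any DataWorks API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    # Hypothetical flag: True for a batch synchronization node in Data Integration.
    is_data_integration: bool
    # Hypothetical label for the target compute engine, for example "MaxCompute".
    compute_engine: str = ""


def issue_node(node, scheduling_group, data_integration_group=None):
    """Return (issuer, runner): which resource issues the node and which runs it."""
    if node.is_data_integration:
        if data_integration_group is None:
            raise ValueError("a resource group for Data Integration is required")
        # The scheduling group issues the node; the Data Integration group runs it.
        return scheduling_group, data_integration_group
    # Otherwise the node is issued to, and run by, a compute engine instance.
    return scheduling_group, f"{node.compute_engine} compute engine instance"
```

Under these assumptions, `issue_node(Node("sync_orders", True), "sched-rg", "di-rg")` returns the scheduling group as the issuer and the resource group for Data Integration as the runner.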

Details of resource groups

The following tables describe different types of resource groups from various dimensions, such as users of server resources, network, timeliness requirements for node execution, use scenarios, and billing methods. This helps you select a resource group based on your business requirements.

Resource group for scheduling

Dimension: Server resource user

  • Shared resource group for scheduling: The resources are maintained by DataWorks and shared by tenants.

  • Exclusive resource group for scheduling: The resources are maintained by DataWorks and exclusively used by each tenant.

Dimension: Networking

  • Shared resource group for scheduling: The network connectivity between specific data sources and this type of resource group cannot be ensured. The data sources include but are not limited to data sources that are deployed on the Internet and for which whitelists are configured to limit access from unknown IP addresses, and data sources that are deployed in virtual private clouds (VPCs) of Alibaba Cloud.

  • Exclusive resource group for scheduling: You can select a network connectivity solution to connect this type of resource group to a data source that is deployed in a complex network environment. For more information, see Exclusive resource groups for scheduling.

Dimension: Timeliness requirements for node execution

  • Shared resource group for scheduling: This type of resource group is shared by DataWorks users, and the service cannot ensure that timeliness requirements for node execution are met.

  • Exclusive resource group for scheduling: This type of resource group is exclusively used by a tenant, and the maximum number of nodes that can be run in parallel on the resource group can be controlled. This ensures that the nodes are scheduled on time.

Dimension: Use scenario

  • Shared resource group for scheduling: If this type of resource group is used, tenants share resources in the resource group and may preempt the resources when a large number of nodes need to be run. As a result, some nodes in a workspace are blocked from running. This type of resource group is suitable for the scenario where a small number of nodes are run in parallel and the nodes are scheduled at a low frequency.

  • Exclusive resource group for scheduling: A tenant can control the maximum number of nodes that can be run in parallel on this type of resource group and scale out, scale in, or change the specifications of the resource group. This type of resource group is suitable for the scenario where a large number of nodes need to be run and the nodes must be scheduled on time.

Dimension: Billing method

  • Shared resource group for scheduling: Pay-as-you-go. For more information, see Billing of the shared resource group for scheduling (pay-as-you-go).

  • Exclusive resource group for scheduling: Subscription. For more information, see Billing of exclusive resource groups for scheduling (subscription).

Star rating

★★★★★

Selection guideline

  • If you want nodes to be run on time, we recommend that you select an exclusive resource group for scheduling. The shared resource group for scheduling is not recommended because the resource group is shared by tenants in DataWorks.

  • If you want to access a data source that is deployed on a network other than the Internet and for which a whitelist is configured, we recommend that you select an exclusive resource group for scheduling. The shared resource group for scheduling is not suitable for this scenario.

  • If you need to schedule a large number of nodes every day, we recommend that you select a subscription exclusive resource group for scheduling.
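
The selection guidelines above reduce to a simple rule: any one of the three conditions points to an exclusive resource group for scheduling. The following is a minimal sketch of that rule. The function name and its three boolean flags are hypothetical, and the fallback to the shared resource group is an assumption based on the use scenario described for it (few parallel nodes, low scheduling frequency).

```python
SHARED = "shared resource group for scheduling"
EXCLUSIVE = "exclusive resource group for scheduling"


def recommend_scheduling_group(must_run_on_time=False,
                               restricted_network_source=False,
                               many_nodes_every_day=False):
    """Return the resource group type suggested by the selection guidelines.

    Hypothetical flags:
      must_run_on_time: nodes have to run on schedule.
      restricted_network_source: a data source is not reachable over the
        Internet or is protected by a whitelist.
      many_nodes_every_day: a large number of nodes are scheduled every day.
    """
    if must_run_on_time or restricted_network_source or many_nodes_every_day:
        return EXCLUSIVE
    # Assumed default when none of the guidelines applies.
    return SHARED
```

A workspace that runs a handful of low-frequency nodes against Internet-accessible data sources would get the shared resource group from this sketch. Setting any single flag switches the recommendation to the exclusive resource group.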

Resource group for Data Integration

Dimension: Server resource user

  • Exclusive resource group for Data Integration: The resources are maintained by DataWorks and exclusively used by each tenant.

Dimension: Networking

  • Exclusive resource group for Data Integration: This type of resource group can access a data source that is deployed on the Internet. You can also select a network connectivity solution to connect this type of resource group to a data source that is deployed in any network environment.

Dimension: Supported data source

  • Exclusive resource group for Data Integration: All data sources.

Dimension: Timeliness requirements for node execution

  • Exclusive resource group for Data Integration: This type of resource group is exclusively used by a tenant, and the maximum number of nodes that can be run in parallel on the resource group can be controlled. This ensures that the nodes are scheduled on time.

Dimension: Use scenario

  • Exclusive resource group for Data Integration: This type of resource group is suitable for the scenario where a large number of important production nodes need to be run.

Dimension: Billing method

  • Exclusive resource group for Data Integration: Subscription. For more information, see Billing of exclusive resource groups for Data Integration (subscription).

Dimension: Star rating

  • Exclusive resource group for Data Integration: ★★★★★

Selection guideline

  • If a large number of Data Integration nodes must be run in parallel, exclusive computing resources are required to ensure fast and reliable data transmission. We recommend that you select an exclusive resource group for Data Integration in this case.

  • An exclusive resource group for Data Integration is suitable if you want to access a data source that is deployed on the Internet. Note that some data sources that are deployed on the Internet cannot be accessed directly. For more information about how to access these data sources, see Supported data source types, Reader plug-ins, and Writer plug-ins.

  • If you want to access a data source that is deployed in a complex network environment, we recommend that you select a subscription exclusive resource group for Data Integration. For more information about network connectivity solutions, see Establish a network connection between a resource group and a data source.

  • If you want to synchronize data in real time, we recommend that you select an exclusive resource group for Data Integration.

  • Specific data sources can be connected to only exclusive resource groups for Data Integration. For more information, see Data source types that support real-time synchronization.

Resource group for DataService Studio

The shared resource group for DataService Studio is shared by tenants. If you need to call DataService Studio APIs at a high frequency and with high parallelism, and you require data to be returned in a timely manner, you must use exclusive computing resources to ensure the availability and stability of DataService Studio. In this case, we recommend that you use an exclusive resource group for DataService Studio. For more information about billing, see Billing of exclusive resource groups for DataService Studio (subscription) and DataService Studio.