DataWorks: Overview

Last Updated: Oct 16, 2024

A DataWorks resource group is a collection of computing resources that can be used by DataWorks services. DataWorks resource groups are a prerequisite to use DataWorks. The status of a resource group affects the stability of DataWorks services, and the quota of a resource group affects the execution efficiency of tasks or services. This topic describes the properties and characteristics of DataWorks resource groups.

Background information

To address user experience issues such as complex billing logic and inconsistent purchase management of old-version resource groups, DataWorks has been gradually releasing serverless resource groups in different regions since June 10, 2024. Old-version resource groups include the shared resource group for scheduling, exclusive resource groups for Data Integration, exclusive resource groups for scheduling, and exclusive resource groups for DataService Studio. All DataWorks features support serverless resource groups. The billing logic of serverless resource groups is clear and simple. You can use a serverless resource group to complete various operations, such as data synchronization, task scheduling, and calling and management of DataService Studio APIs.

Characteristics of serverless resource groups

  • General-purpose use: Serverless resource groups apply to all DataWorks services without differentiation.

  • Flexible billing method: The pay-as-you-go and subscription billing methods are supported.

  • Dynamic scaling: Tasks that are run on the resource groups are not affected when the resource groups are scaled.

  • On-demand use: You can purchase resources based on your business requirements to reduce resource waste. The smallest unit of purchase is two compute units (CUs).

  • High isolation and security: The resources in serverless resource groups are exclusive to you and you can control the network of the resource groups. This provides better security and isolation.

Billing methods of serverless resource groups

A serverless resource group uses CU as the billing unit instead of resource specifications. The performance of 1 CU is approximately equal to 1 vCPU core and 4 GiB of memory.
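The CU equivalence above can be turned into a quick sizing estimate. The following is a minimal sketch that assumes the approximate ratio of 1 vCPU core and 4 GiB of memory per CU; the function name `approximate_capacity` is hypothetical and is not part of any DataWorks API.

```python
# Approximate capacity per CU, taken from the statement above.
# These are rough equivalents, not guaranteed specifications.
VCPUS_PER_CU = 1
MEMORY_GIB_PER_CU = 4


def approximate_capacity(cus: int) -> tuple[int, int]:
    """Return the approximate (vCPU cores, GiB of memory) for a number of CUs."""
    if cus < 0:
        raise ValueError("cus must be non-negative")
    return cus * VCPUS_PER_CU, cus * MEMORY_GIB_PER_CU


vcpus, memory_gib = approximate_capacity(2)
print(f"2 CUs are roughly {vcpus} vCPUs and {memory_gib} GiB of memory")
```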

For more information about the billing of serverless resource groups, see Billing of serverless resource groups.

Billing examples

A user in the China (Hangzhou) region needs to run 20 DataWorks synchronization tasks to synchronize data from a MySQL database to a MaxCompute data warehouse every day in the morning. Each task runs for one hour.

  • If you use a serverless resource group to run the synchronization tasks and each task consumes 1 CU per hour, the 20 one-hour tasks consume a total of 20 CU-hours per day.

    Note

    The unit price of a pay-as-you-go serverless resource group in the China (Hangzhou) region is USD 0.077399 per CU-hour.

    Fees of the serverless resource group for one day = CUs consumed by a single task per hour × Unit price of CUs × Number of tasks × Running duration of tasks = 1 CU × USD 0.077399 per CU-hour × 20 tasks × 1 hour = USD 1.54798.

  • If you use an exclusive resource group for Data Integration, the minimum specifications that you can purchase are 4 vCPUs and 8 GiB of memory. The unit price of an exclusive resource group for Data Integration with the minimum specifications in the China (Hangzhou) region is USD 76.23 per month, which is approximately USD 2.541 per day based on a 30-day month.

Conclusion

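In this example, the serverless resource group costs about USD 1.55 per day, and the exclusive resource group for Data Integration costs about USD 2.54 per day. Serverless resource groups help avoid unnecessary resource waste and reduce the resource cost by about 40%.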
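The arithmetic in this example can be reproduced with a short script. The following is a minimal sketch that uses only the prices quoted above for the China (Hangzhou) region. The 30-day month is an assumption inferred from the quoted daily price, and the function and constant names are hypothetical.

```python
# Prices quoted in this topic for the China (Hangzhou) region, in USD.
SERVERLESS_PRICE_PER_CU_HOUR = 0.077399  # pay-as-you-go serverless resource group
EXCLUSIVE_PRICE_PER_MONTH = 76.23        # exclusive resource group, minimum specifications
DAYS_PER_MONTH = 30                      # assumption: 76.23 / 30 = 2.541


def serverless_daily_fee(cus_per_task: float, tasks: int, hours_per_task: float,
                         unit_price: float = SERVERLESS_PRICE_PER_CU_HOUR) -> float:
    """CUs per task x unit price x number of tasks x running duration."""
    return cus_per_task * unit_price * tasks * hours_per_task


serverless = serverless_daily_fee(cus_per_task=1, tasks=20, hours_per_task=1)
exclusive = EXCLUSIVE_PRICE_PER_MONTH / DAYS_PER_MONTH
saving = 1 - serverless / exclusive

print(f"Serverless: USD {serverless:.5f} per day")
print(f"Exclusive:  USD {exclusive:.3f} per day")
print(f"Saving:     {saving:.1%}")
```

The saving depends entirely on the workload. If the tasks ran for more hours per day or consumed more CUs, the pay-as-you-go fee would rise while the subscription price would stay fixed.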

Limits

  • Serverless resource groups are supported in the following regions: China (Beijing), China (Shanghai), China (Shenzhen), China (Hangzhou), China (Hong Kong), China (Zhangjiakou), Singapore, Malaysia (Kuala Lumpur), Indonesia (Jakarta), Japan (Tokyo), UK (London), US (Silicon Valley), Germany (Frankfurt), and US (Virginia).

  • You must be granted the required permissions before you can use serverless resource groups.

  • A pay-as-you-go serverless resource group can use up to 500 CUs.

  • For all task types other than real-time synchronization tasks, a maximum of 16 CUs can be allocated to a single task.
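The limits above can be expressed as a pre-flight check. The following is a minimal sketch, not a DataWorks API: the function `check_limits` and its parameters are hypothetical. The minimum purchase of two CUs comes from the characteristics described earlier in this topic.

```python
# Regions listed in this topic as supporting serverless resource groups.
SUPPORTED_REGIONS = {
    "China (Beijing)", "China (Shanghai)", "China (Shenzhen)", "China (Hangzhou)",
    "China (Hong Kong)", "China (Zhangjiakou)", "Singapore", "Malaysia (Kuala Lumpur)",
    "Indonesia (Jakarta)", "Japan (Tokyo)", "UK (London)", "US (Silicon Valley)",
    "Germany (Frankfurt)", "US (Virginia)",
}
MIN_GROUP_CUS = 2        # smallest unit of purchase
MAX_PAYG_GROUP_CUS = 500  # pay-as-you-go serverless resource group
MAX_CUS_PER_TASK = 16     # does not apply to real-time synchronization tasks


def check_limits(region: str, group_cus: int, task_cus: int,
                 realtime_sync: bool = False) -> list[str]:
    """Return a list of violated limits for a pay-as-you-go serverless resource group."""
    problems = []
    if region not in SUPPORTED_REGIONS:
        problems.append(f"region not supported: {region}")
    if not MIN_GROUP_CUS <= group_cus <= MAX_PAYG_GROUP_CUS:
        problems.append(
            f"group size must be {MIN_GROUP_CUS} to {MAX_PAYG_GROUP_CUS} CUs, got {group_cus}")
    if not realtime_sync and task_cus > MAX_CUS_PER_TASK:
        problems.append(f"task exceeds {MAX_CUS_PER_TASK} CUs: {task_cus}")
    return problems


print(check_limits("China (Hangzhou)", group_cus=20, task_cus=1))
```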

Precautions

You have exclusive rights to use DataWorks resource groups, including serverless resource groups and old-version resource groups. Accordingly, any legal obligations and liabilities arising from the code logic executed or scheduled based on the resource groups will also be borne by you. We recommend that you adhere to relevant laws and regulations, and use resources reasonably to maintain a good community environment and protect your own rights and interests.

Comparison between serverless resource groups and old-version resource groups

Each comparison item below lists the old-version resource group (exclusive resource groups and shared resource groups) first and the serverless resource group second.

Classification

  • Old-version resource group: Resource groups are classified into resource groups for Data Integration, resource groups for scheduling, and resource groups for DataService Studio based on their purposes.

  • Serverless resource group: Resource groups are used for general purposes and are not classified.

Support for features

  • Old-version resource group: Some capabilities of DataWorks are not supported.

  • Serverless resource group: All capabilities of DataWorks are supported.

Support for mixed use

  • Old-version resource group: Each type of resource group serves only one purpose.

  • Serverless resource group: A resource group can be used in data synchronization, scheduling, and DataService Studio at the same time.

Sales mode

  • Old-version resource group: Resource groups are charged based on the specifications and the number of machines. A resource group must contain at least one machine, and the minimum specifications of a machine are 4 vCPUs and 8 GiB of memory. The minimum step size for scaling out a resource group is one machine whose specifications are 4 vCPUs and 8 GiB of memory.

  • Serverless resource group: Resource groups are sold by compute unit (CU). A resource group must contain at least two CUs. The minimum step size for scaling out a resource group is one CU.

Billing method

  • Old-version resource group: Exclusive resource groups support only the subscription billing method. Shared resource groups support only the pay-as-you-go billing method.

  • Serverless resource group: Both the subscription and pay-as-you-go billing methods are supported.

Resource waste

  • Old-version resource group: DataWorks provides only limited types of specifications for resource groups. This causes a specific amount of resource fragments to be generated on machines of each type of specifications. As a result, resources are wasted.

  • Serverless resource group: You can determine the number of CUs based on your business requirements. This prevents resource waste.

Scalability

  • Old-version resource group: You can upgrade or downgrade the specifications of a resource group. You can also increase or reduce the number of machines in a resource group.

  • Serverless resource group: You can directly change the number of CUs for a resource group.

Impact generated by scale-out or scale-in

  • Old-version resource group: Running tasks are affected.

  • Serverless resource group: Running tasks are not affected.

Network security

  • Old-version resource group: DataWorks manages inbound and outbound Internet traffic for resource groups. The Internet bandwidth of resource groups is shared by multiple users. This causes resource competition.

  • Serverless resource group: Users use their own Internet capabilities, which makes the behavior of users controllable.

Development trend

  • Old-version resource group: Old-version resource groups will be discontinued in the future.

  • Serverless resource group: Serverless resource groups will become the only resource groups that are supported by DataWorks.

Support for custom images

  • Old-version resource group: Custom images are not supported.

  • Serverless resource group: Custom images are supported. If you use a serverless resource group to deploy tasks, you can create an image that contains all components required for running tasks. This helps meet more conditions for running tasks.
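The difference in sales mode and resource waste can be illustrated with a small capacity calculation. The following is a simplified sketch under stated assumptions: it compares vCPUs only, treats 1 CU as roughly 1 vCPU, ignores memory, and assumes that the old-version resource group uses only machines of the minimum specifications (4 vCPUs). The function names are hypothetical.

```python
import math

MACHINE_VCPUS = 4  # minimum machine specifications of an old-version resource group
MIN_CUS = 2        # minimum size of a serverless resource group


def old_version_vcpus(required_vcpus: float) -> int:
    """vCPUs provisioned when capacity is added one 4-vCPU machine at a time."""
    machines = max(1, math.ceil(required_vcpus / MACHINE_VCPUS))
    return machines * MACHINE_VCPUS


def serverless_cus(required_vcpus: float) -> int:
    """CUs provisioned when capacity is added one CU at a time (1 CU ~ 1 vCPU)."""
    return max(MIN_CUS, math.ceil(required_vcpus))


for need in (1, 5, 9):
    print(f"need {need} vCPUs: old-version provisions {old_version_vcpus(need)}, "
          f"serverless provisions {serverless_cus(need)}")
```

Under these assumptions, a workload that needs 5 vCPUs leaves 3 vCPUs idle on machine-based resource groups, whereas CU-based scaling matches the requirement exactly.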

Appendix: Old-version resource groups

Note

Serverless resource groups support more capabilities than old-version resource groups (exclusive resource groups and shared resource groups), can be purchased in a more unified manner, and can effectively utilize resource fragments to avoid waste. We recommend that you use serverless resource groups.

Introduction to old-version resource groups

DataWorks old-version resource groups include exclusive resource groups and shared resource groups. Old-version resource groups can be further classified into the following types based on scenarios such as task scheduling, data integration, and data service provision: exclusive resource group for scheduling, exclusive resource group for Data Integration, exclusive resource group for DataService Studio, shared resource group for scheduling, and shared resource group for DataService Studio. A resource group for scheduling is used to schedule tasks. A resource group for Data Integration is used to transmit data in Data Integration tasks. A resource group for DataService Studio is used to call API operations.

Exclusive resource groups

  • Billing method: Subscription

  • Description: This type of resource group is managed by Alibaba Cloud. After you purchase an exclusive resource group, you can exclusively use this resource group. You can associate an exclusive resource group with a workspace. This way, resources are isolated between workspaces. Flexible configurations for a resource group such as scale-out, scale-in, upgrade, or change of specifications are supported.

  • Resource group for scheduling: Exclusive resource group for scheduling

  • Resource group for Data Integration: Exclusive resource group for Data Integration

  • Resource group for DataService Studio: Exclusive resource group for DataService Studio

    Note

    Exclusive resource groups for DataService Studio are available only in the China (Shanghai) region.

Shared resource groups

  • Billing method: Pay-as-you-go

  • Description: After you activate DataWorks, DataWorks provides you with the shared resource group for DataService Studio and the shared resource group for scheduling. The shared resource groups are shared by tenants in DataWorks. If you run tasks on the shared resource groups, the tasks may enter the state of waiting for resources during peak hours.

  • Resource group for scheduling: See Use a shared resource group.

  • Resource group for Data Integration: None

  • Resource group for DataService Studio: See Use a shared resource group.

Comparison between shared resource groups and exclusive resource groups

Resource group for scheduling

Each dimension below lists the shared resource group for scheduling first and the exclusive resource group for scheduling second.

Ownership of resources

  • Shared resource group for scheduling: The resources are maintained by DataWorks and shared among all tenants.

  • Exclusive resource group for scheduling: The resources are maintained by DataWorks and exclusively used by each tenant.

Network connectivity

  • Shared resource group for scheduling: The network connectivity between specific data sources and this type of resource group cannot be ensured. The data sources include but are not limited to data sources that are deployed on the Internet and for which whitelists are configured to limit access from unknown IP addresses, and data sources that are deployed in virtual private clouds (VPCs) of Alibaba Cloud.

  • Exclusive resource group for scheduling: You can select a network connectivity solution to connect this type of resource group to a data source that is deployed in a complex network environment. For more information, see Exclusive resource groups for scheduling.

Timeliness requirements for task execution

  • Shared resource group for scheduling: This type of resource group is shared by DataWorks users and the timeliness of task execution cannot be guaranteed.

  • Exclusive resource group for scheduling: This type of resource group is exclusively used by a tenant, and the maximum number of tasks that can be run in parallel on the resource group can be controlled. This ensures that the tasks are scheduled on time.

Use scenario

  • Shared resource group for scheduling: If this type of resource group is used, tenants in the workspace with which the resource group is associated share resources in the resource group and may compete for the resources when a large number of tasks need to be run. As a result, some tasks in the workspace are blocked from running. This type of resource group is suitable for the scenario where a small number of tasks are run in parallel and the tasks are scheduled at a low frequency.

  • Exclusive resource group for scheduling: A tenant can control the maximum number of tasks that can be run in parallel on this type of resource group and scale out, scale in, or change the specifications of the resource group. This type of resource group is suitable for the scenario where a large number of tasks need to be run and the tasks must be scheduled on time.

Billing method

  • Shared resource group for scheduling: Pay-as-you-go. For more information, see Billing of the shared resource group for scheduling (pay-as-you-go).

  • Exclusive resource group for scheduling: Subscription. For more information, see Billing of exclusive resource groups for scheduling (subscription).

Other dimensions

  • If you want tasks to be run on time, we recommend that you select an exclusive resource group for scheduling. The shared resource group for scheduling is not recommended because the resource group is shared by tenants in DataWorks.

  • If you want to access a data source that is deployed on a network other than the Internet and for which an IP address whitelist is configured, we recommend that you select an exclusive resource group for scheduling. The shared resource group for scheduling is not suitable for this scenario.

  • If you need to schedule a large number of tasks by day, we recommend that you select a subscription exclusive resource group for scheduling.
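The recommendations above for old-version resource groups for scheduling can be condensed into a simple decision rule. The following is an illustrative sketch only: the function `recommend_scheduling_group` and its parameters are hypothetical, and the rule reflects only the three recommendations listed above, not a complete selection policy.

```python
def recommend_scheduling_group(must_run_on_time: bool,
                               restricted_network_source: bool,
                               many_daily_tasks: bool) -> str:
    """Pick an old-version resource group for scheduling.

    restricted_network_source: the data source is in a VPC or another
    non-Internet network, or is protected by an IP address whitelist.
    """
    if must_run_on_time or restricted_network_source or many_daily_tasks:
        return "exclusive resource group for scheduling"
    # Few, low-frequency tasks with no special network requirements.
    return "shared resource group for scheduling"


print(recommend_scheduling_group(must_run_on_time=True,
                                 restricted_network_source=False,
                                 many_daily_tasks=False))
```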

Resource group for Data Integration

Dimension

Exclusive resource group for Data Integration

Ownership of resources

The resources are maintained by DataWorks and exclusively used by each tenant.

Network connectivity

  • This type of resource group can access a data source that is deployed on the Internet.

  • You can select a network connectivity solution to connect this type of resource group to a data source that is deployed in any network environments.

Supported data source

All data sources.

Timeliness requirements for task execution

This type of resource group is exclusively used by a tenant, and the maximum number of tasks that can be run in parallel on the resource group can be controlled. This ensures that the tasks are scheduled on time.

Use scenario

This type of resource group is suitable for the scenario where a large number of important production tasks need to be run.

Billing method

Subscription. For more information, see Billing of exclusive resource groups for Data Integration (subscription).

Selection guideline

  • If a large number of Data Integration tasks must be run in parallel, exclusive computing resources are required to ensure fast and reliable data transmission. We recommend that you select an exclusive resource group for Data Integration in this case.

  • An exclusive resource group for Data Integration is suitable for the scenario where you want to access a data source that is deployed on the Internet. You cannot directly access specific data sources that are deployed on the Internet. For more information about how to access the data sources, see Supported data source types, Reader plug-ins, and Writer plug-ins.

  • If you want to access a data source that is deployed in a complex network environment, we recommend that you select a subscription exclusive resource group for Data Integration. For more information about network connectivity solutions, see Network connectivity solutions.

  • If you want to synchronize data in real time, we recommend that you select an exclusive resource group for Data Integration.

  • Specific data sources can be connected to only exclusive resource groups for Data Integration. For more information, see Data source types that support real-time synchronization.

Resource group for DataService Studio

The shared resource group for DataService Studio is shared by tenants. If you need to call DataService Studio APIs at a high frequency and with a high parallelism and require data to be returned in a timely manner, you must use exclusive computing resources to ensure the availability and stability of DataService Studio. We recommend that you use an exclusive resource group for DataService Studio in this case. For information about the billing of resource groups for DataService Studio, see Billing of exclusive resource groups for DataService Studio (subscription) and DataService Studio.

Billing

For more information about the billing of different types of resource groups, see Billing of old-version resource groups.

Precautions

  • By default, DataWorks provides pay-as-you-go shared resource groups for existing users when they activate DataWorks. Existing users can also purchase subscription exclusive resource groups or upgrade the DataWorks service from DataWorks Basic Edition to DataWorks Standard Edition, DataWorks Professional Edition, or DataWorks Enterprise Edition to use the machines in a data center as resource groups to run DataWorks tasks. For more information, see the Introduction to old-version resource groups section in this topic.

  • Different types of old-version resource groups are used in different execution phases of a task. In this topic, a batch synchronization task is used to describe the mechanism for issuing a task and how resource groups are used in different execution phases of the task. For more information, see the Mechanism for issuing tasks that are run on old-version resource groups section in this topic.

  • A task can use different types of resource groups in a specific execution phase. For example, in the scheduling phase, a batch synchronization task can use a subscription exclusive resource group for scheduling or the pay-as-you-go shared resource group for scheduling. In the data integration phase, the batch synchronization task can use a subscription exclusive resource group for Data Integration. For information about how to select a resource group, see the Comparison between shared resource groups and exclusive resource groups section in this topic.

  • By default, shared resource groups and exclusive resource groups that are provided by DataWorks are automatically protected by Security Center Basic. Security Center Basic provides you with basic security protection features to harden the security of your assets. You can use the features to detect risks on your assets. The risks include unusual logons, DDoS attacks, and common vulnerabilities. For more information, see Introduction to Security Center Basic.

Mechanism for issuing tasks that are run on old-version resource groups

DataWorks uses resource groups for scheduling to issue batch synchronization tasks to compute engines or servers, which then run the tasks. For example, DataWorks uses a resource group for scheduling to issue a batch synchronization task to a MaxCompute compute engine, and the MaxCompute compute engine runs the task. For batch synchronization tasks in Data Integration, DataWorks uses resource groups for scheduling to issue the tasks to resource groups for Data Integration, and the resource groups for Data Integration run the tasks.
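The relationship between execution phases and old-version resource groups for a batch synchronization task can be summarized as a lookup table. The following is a minimal sketch based only on the phases and resource groups named in this topic. The phase keys and the function `allowed_groups` are hypothetical names used for illustration.

```python
# Old-version resource groups that a batch synchronization task can use
# in each execution phase, as described in this topic.
PHASE_RESOURCE_GROUPS = {
    "scheduling": [
        "exclusive resource group for scheduling (subscription)",
        "shared resource group for scheduling (pay-as-you-go)",
    ],
    "data integration": [
        "exclusive resource group for Data Integration (subscription)",
    ],
}


def allowed_groups(phase: str) -> list[str]:
    """Return the resource groups that can serve the given execution phase."""
    try:
        return PHASE_RESOURCE_GROUPS[phase]
    except KeyError:
        raise ValueError(f"unknown phase: {phase!r}") from None


for phase in PHASE_RESOURCE_GROUPS:
    print(f"{phase}: {', '.join(allowed_groups(phase))}")
```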