DataWorks has launched serverless resource groups, which consolidate the core functions of exclusive resource groups for scheduling, Data Integration, and Data Service. With a single resource group, you can perform data synchronization, task scheduling, and API service invocation and management. This document is an end-to-end guide for using serverless resource groups in DataWorks. By following this guide, you can complete the entire process from creation and network configuration to workspace binding and daily monitoring and management.
Make sure that you have read Resource group management and understand the core concepts and benefits of serverless resource groups.
Prerequisites
-
You have the required permissions for resource groups:
-
Purchase permissions: You need the AliyunBSSOrderAccess and AliyunDataWorksFullAccess permissions. The AliyunVPCReadOnlyAccess permission is also required to read VPC information.
-
Management permissions: Only a workspace administrator for the target DataWorks workspace can bind a resource group to that workspace or modify the binding.
Unsure how to grant permissions? See View the permissions of a RAM user and Manage the permissions of a RAM user. For information about permission control for other resource group operations, see Object-level permission control policies in the console.
-
-
Environment and network planning:
-
Same-region principle: The serverless resource group must be in the same region as your DataWorks workspace.
-
VPC planning (required): Serverless resource groups require a Virtual Private Cloud (VPC). You must have a VPC and a VSwitch ready.
Important
-
Do not change the VPC and VSwitch environment bound to a serverless resource group. Changing the environment can cause tasks in DataWorks to fail.
-
Serverless resource groups do not support VPCs that use IP addresses in the 21.0.0.0/8 CIDR block, which means IP addresses from 21.0.X.X to 21.255.XXX.XXX are prohibited.
-
-
Network connectivity: To ensure the resource group can access your data sources (databases, data services, or other data in target network environments), configure network connectivity according to your data source environment.
Important: By default, serverless resource groups do not have internet access. To access data sources over the internet, you must configure an Internet NAT Gateway and an EIP for the VPC that is bound to the serverless resource group.
-
-
Relationship between DataWorks resource groups and MaxCompute compute resources: To run ODPS data synchronization tasks, you need the following two types of resources:
-
DataWorks serverless resource group: Used to schedule and run synchronization tasks. Purchase the resource group in the DataWorks console.
-
MaxCompute compute CUs: Used to scan and process data in MaxCompute (ODPS) projects. Purchase and configure compute CUs separately in the MaxCompute console.
These two types of resources are independent and must be purchased separately. If you purchase only a DataWorks serverless resource group but the MaxCompute project does not have sufficient compute CUs, ODPS synchronization tasks may fail or run slowly due to insufficient resources during the data scanning phase.
-
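The prerequisites above can be turned into a quick pre-flight check before you open the purchase page. The following is a minimal sketch, not an official tool: the function name `preflight` and its parameters are hypothetical, and the caller is assumed to already know which policies are granted and which CIDR block the VPC uses. The policy names and the 21.0.0.0/8 restriction come from the list above.

```python
import ipaddress

# Policy names taken from the prerequisites above.
REQUIRED_POLICIES = {
    "AliyunBSSOrderAccess",
    "AliyunDataWorksFullAccess",
    "AliyunVPCReadOnlyAccess",
}
# CIDR block that serverless resource groups do not support.
UNSUPPORTED_BLOCK = ipaddress.ip_network("21.0.0.0/8")


def preflight(granted_policies, vpc_cidr, resource_group_region, workspace_region):
    """Return a list of problems found; an empty list means none were found.

    Hypothetical helper: all inputs are supplied by the caller, nothing is
    fetched from Alibaba Cloud.
    """
    problems = []
    missing = REQUIRED_POLICIES - set(granted_policies)
    if missing:
        problems.append("missing policies: " + ", ".join(sorted(missing)))
    # overlaps() is True when any address of the VPC falls inside 21.0.0.0/8.
    if ipaddress.ip_network(vpc_cidr).overlaps(UNSUPPORTED_BLOCK):
        problems.append(f"VPC CIDR {vpc_cidr} overlaps 21.0.0.0/8")
    if resource_group_region != workspace_region:
        problems.append("resource group region differs from workspace region")
    return problems


if __name__ == "__main__":
    print(preflight(REQUIRED_POLICIES, "192.168.0.0/16", "cn-shanghai", "cn-shanghai"))
```

The check covers only the items that can be verified offline. Network connectivity to data sources still has to be tested from the resource group itself.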
Create a serverless resource group
Log on to the DataWorks console. In the target region, click Resource Group in the left-side navigation pane to open the resource group list.
-
Click Create Resource Group to go to the serverless resource group purchase page. The key parameters are as follows:
For users of earlier versions, the operation is on the default Exclusive Resource Group tab.
Parameter
Description
Region and zone
Select a region. The region must be the same as that of your DataWorks workspace.
Billing method
Two billing methods are available: prepaid Subscription and postpaid Pay-As-You-Go.
-
Pay-As-You-Go: Suitable for workloads with large fluctuations or testing scenarios. You are not charged when no resources are used. Supports switching to Subscription.
Important: Pay-as-you-go CU limit: The maximum specification for a single pay-as-you-go resource group is 500 CUs.
-
Subscription: Suitable for long-term stable production workloads at a lower cost. Switching from Subscription to Pay-As-You-Go is not supported.
You can purchase multiple resource groups with different billing methods to meet your business requirements.
Resource group specifications
When the billing method is Subscription, set the resource group specifications. The minimum purchase quantity is 2 CUs, and the minimum scaling increment is 1 CU.
1 CU = 1 CPU core + 4 GiB memory. For purchase recommendations and the minimum specifications required to run various tasks, see Resource group specifications and pricing.
VPC
Select the VPC based on the network that the resource group needs to connect to. If no options are available in the drop-down list, go to the VPC console to create one.
For more information about VPCs, see What is a VPC?.
-
Data source is in the same account and region as the serverless resource group: Configure the VPC and VSwitch where the data source resides.
-
Data source is in other complex network environments: You also need to connect the VPC bound to the serverless resource group to the VPC where the data source resides by using VPN Gateway or Express Connect. For more information, see Network connectivity.
Important
-
A resource group supports binding to multiple VPCs. You can bind additional VPCs after the purchase is complete.
-
If the billing method of the resource group is Subscription, the VPC configured here cannot be changed or replaced for Data Service after it is applied to Data Service, data computing, and Data Integration. Plan ahead.
VSwitch
Billing cycle
When the billing method is Subscription, you need to set the billing cycle.
Important: We recommend that you select Auto Renewal to avoid business interruptions caused by resource expiration and shutdown or release. After you select this option, the auto-renewal cycle is monthly, and fees are automatically deducted at the real-time price before the instance expires.
Service-linked role
For the first purchase, you need to Create Service-linked Role (AliyunServiceRoleForDataWorks). Subsequent purchases will automatically use the created role.
This role is used to access VPC, Elastic Network Interface (ENI), and security group resources. If you see the "Please create AliyunServiceRoleForDataWorks" prompt, provide this authorization URL to the Alibaba Cloud account owner or other authorized personnel for authorization, and then proceed.
Important: After a serverless resource group is created, you cannot change the vSwitch that is bound to the resource group. If you select an incorrect vSwitch during creation, you cannot directly modify the binding. You must unsubscribe from the current resource group and purchase a new one with the correct vSwitch. Make sure that you verify the vSwitch configuration before you create a resource group.
-
After you complete the purchase, the resource group list may take 1 to 2 minutes to display the newly created resource group. If the resource group does not appear in the list immediately, manually refresh the page. After creation, the resource group status is displayed as Creating. You can use the resource group only after the status changes to Running.
Make sure that you associate the resource group with a workspace immediately after the status changes to Running. Otherwise, the resource group cannot be selected in task configurations.
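The specification rules in the parameter table (1 CU = 1 CPU core + 4 GiB memory, a 2 CU minimum for Subscription, a 500 CU ceiling for pay-as-you-go) can be expressed as a small check. This is an illustrative sketch only: the function name `describe_spec` and the billing-method strings are hypothetical labels, not values used by any DataWorks API.

```python
GIB_PER_CU = 4               # 1 CU = 1 CPU core + 4 GiB memory
MIN_SUBSCRIPTION_CUS = 2     # minimum Subscription purchase quantity
MAX_PAY_AS_YOU_GO_CUS = 500  # maximum for a single pay-as-you-go group


def describe_spec(billing_method, cus):
    """Validate a CU count and return the equivalent CPU and memory.

    billing_method is a hypothetical label: "subscription" or "pay-as-you-go".
    """
    # The scaling increment is 1 CU, so only whole numbers are accepted.
    if not isinstance(cus, int) or cus < 1:
        raise ValueError("CUs must be a positive whole number")
    if billing_method == "subscription" and cus < MIN_SUBSCRIPTION_CUS:
        raise ValueError("Subscription requires at least 2 CUs")
    if billing_method == "pay-as-you-go" and cus > MAX_PAY_AS_YOU_GO_CUS:
        raise ValueError("a pay-as-you-go group is limited to 500 CUs")
    return {"cpu_cores": cus, "memory_gib": cus * GIB_PER_CU}


if __name__ == "__main__":
    print(describe_spec("subscription", 8))
```

For example, an 8 CU Subscription group corresponds to 8 CPU cores and 32 GiB of memory.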
Resource group configuration and usage
1. Associate a resource group with a workspace
After you create a resource group, you must associate it with a workspace. After the association is complete, you can select and use the serverless resource group when you create tasks in the target workspace.
-
Associate a resource group when creating a workspace
-
Log on to the DataWorks console. In the target region, click Workspace in the left-side navigation pane to open the workspace list.
-
Click Create a workspace. On the Create a workspace page, change the Default Resource Group of DataWorks Workspace parameter in the advanced settings to the target resource group you created.
-
-
Associate a resource group with an existing workspace
Log on to the DataWorks console. In the target region, click Resource Group in the left-side navigation pane to open the resource group list.
-
In the Operation column of the target resource group, click Associate Workspace. Find the workspace you want to bind, and then click Operation > Bind.
2. Configure network connectivity
You need to configure network connectivity to ensure that the resource group can access your data sources. This is a critical step for tasks to run properly. Serverless resource groups do not have internet access by default.
Scenario 1: Access data sources within a VPC (such as RDS or self-managed databases on ECS)
Make sure that the VPC bound to the resource group is the same as the VPC of the data source, or that the VPCs are connected through CEN or VPC peering.
Scenario 2: Access public IP addresses
You must configure a NAT gateway and an EIP for the VPC bound to the resource group to enable internet access.
Scenario 3: Access an on-premises IDC
You need to connect your VPC to the IDC network through VPN Gateway or Express Connect.
For more complex network scenarios, see Network connectivity.
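The three scenarios above amount to a lookup from where the data source lives to what must be configured on the VPC bound to the resource group. The sketch below encodes that lookup for planning purposes. The enum and function names are hypothetical, and the table covers only the scenarios listed in this section.

```python
from enum import Enum


class DataSourceLocation(Enum):
    """Where the data source lives (hypothetical labels for the scenarios above)."""
    SAME_VPC = "same VPC as the resource group"
    OTHER_VPC = "a different VPC"
    INTERNET = "public IP address"
    ON_PREMISES_IDC = "on-premises IDC"


# Connectivity each scenario calls for, as described in this section.
REQUIREMENTS = {
    DataSourceLocation.SAME_VPC: [],
    DataSourceLocation.OTHER_VPC: ["CEN or VPC peering"],
    DataSourceLocation.INTERNET: ["NAT gateway", "EIP"],
    DataSourceLocation.ON_PREMISES_IDC: ["VPN Gateway or Express Connect"],
}


def required_connectivity(location):
    """Return the extra network components needed to reach the data source."""
    return list(REQUIREMENTS[location])


if __name__ == "__main__":
    for location in DataSourceLocation:
        print(location.value, "->", required_connectivity(location) or "nothing extra")
```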
3. Use the resource group in tasks
After you create and configure a serverless resource group, you need to specify the resource group for Data Integration, data scheduling, Data Service, and other tasks so that the serverless resource group is used in those tasks.
-
For Data Integration tasks: In the Resource group configuration section of the synchronization task, select the serverless resource group you created.
-
For data development tasks (Shell, Python, etc.): On the right side of the node editing page, go to , and select the serverless resource group you created.
-
For Data Service APIs: In the Resource Group for DataService Studio section on the right side of the API configuration page, select the serverless resource group you created.
For all scenarios that involve resource groups, see Use resource groups.
Resource group O&M and monitoring
Allocate CU quotas to tasks
You can configure Maximum CUs or Minimum CUs separately for data computing, Data Integration, Data Service, and personal development environments to ensure that tasks run smoothly.
-
For pay-as-you-go resource groups, configure the CU limit to prevent excessive resource usage.
-
For Subscription resource groups, configure the minimum CU guarantee to set the minimum guaranteed CU quota.
Instructions: On the resource group list page, click
> Manage Quota in the Operation column of the target resource group, and then modify the Maximum CUs or Minimum CUs values for different purposes.
You can also click the target Resource Group Name on the resource group list page to go to the resource group details page. Click Manage Quota in the upper-right corner, and then modify the Maximum CUs or Minimum CUs values for different purposes.
Recommended CUs per task: For compute tasks such as Python, Notebook, and PyODPS, we recommend that you configure no more than 16 CUs per task (with an upper limit of 64 CUs) for optimal startup and runtime stability. For synchronization tasks, except real-time synchronization tasks, each task can be allocated a maximum of 16 CUs.
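The per-task guidance above can be checked before a task is configured. The following sketch assumes two hypothetical task labels, "compute" (Python, Notebook, PyODPS) and "sync" (synchronization tasks other than real-time synchronization). The numeric limits are the ones stated in this section.

```python
RECOMMENDED_COMPUTE_CUS = 16  # recommended ceiling for compute tasks
MAX_COMPUTE_CUS = 64          # upper limit for compute tasks
MAX_SYNC_CUS = 16             # maximum for non-real-time synchronization tasks


def check_task_cus(task_type, cus):
    """Classify a per-task CU setting as "ok", "above recommended", or "exceeds limit".

    task_type is a hypothetical label: "compute" or "sync".
    """
    if task_type == "compute":
        if cus > MAX_COMPUTE_CUS:
            return "exceeds limit"
        if cus > RECOMMENDED_COMPUTE_CUS:
            return "above recommended"
        return "ok"
    if task_type == "sync":
        return "ok" if cus <= MAX_SYNC_CUS else "exceeds limit"
    raise ValueError(f"unknown task type: {task_type}")


if __name__ == "__main__":
    print(check_task_cus("compute", 32))
```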
Multi-task resource isolation
Different tasks that share the same pay-as-you-go serverless resource group compete for resources, which may cause some tasks to be delayed. We recommend that you isolate resources for critical tasks by using the following methods:
-
Configure Maximum CUs for different purposes (data computing, Data Integration, and Data Service) to limit the maximum resource usage for each purpose.
-
For subscription resource groups, configure Minimum CUs to guarantee minimum resources for critical tasks.
-
During peak business hours, scale out the resource group specifications to increase the total available CUs and prevent tasks from queuing due to insufficient resources.
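To see how a Maximum CUs setting isolates purposes from each other, the sketch below caps the CUs each purpose asks for at its configured maximum. It is a simplified model for reasoning about quotas, not a description of the DataWorks scheduler. The function name and the purpose keys are hypothetical.

```python
def cap_usage(demand, max_cus):
    """Cap the CUs requested per purpose at the configured Maximum CUs.

    demand and max_cus map a purpose name to a CU count. A purpose without
    a configured maximum is left uncapped in this simplified model.
    """
    return {
        purpose: min(requested, max_cus.get(purpose, requested))
        for purpose, requested in demand.items()
    }


if __name__ == "__main__":
    demand = {"data_computing": 40, "data_integration": 10, "data_service": 5}
    limits = {"data_computing": 24, "data_integration": 16}
    print(cap_usage(demand, limits))
```

In this example data computing is held to 24 CUs, so a burst of compute tasks cannot consume the CUs that Data Integration and Data Service rely on.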
Adjust the concurrency limit for scheduling
In data scheduling scenarios, you can manually set the task concurrency limit to control the maximum number of tasks that can run simultaneously. This configuration is unrelated to task execution and does not restrict task execution behavior. By default, a single resource group can run up to 50 scheduled tasks concurrently, and this value can be increased to a maximum of 200.
Instructions: On the resource group list page, click
> Specify Threshold for Parallel Threads of Data Scheduling in the Operation column of the target resource group, and then modify the Specify Threshold for Parallel Threads of Data Scheduling value.
You can also click the target Resource Group Name on the resource group list page to go to the resource group details page. Click Specify Threshold for Parallel Threads of Data Scheduling in the upper-right corner, and then modify the Specify Threshold for Parallel Threads of Data Scheduling value.
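The effect of the concurrency limit can be pictured as splitting the scheduled tasks that are ready to run into those that start and those that wait. The sketch below uses the default (50) and maximum (200) stated above. The lower bound of 1 and the function name are assumptions made for the example.

```python
DEFAULT_LIMIT = 50  # default concurrent scheduled tasks per resource group
MAX_LIMIT = 200     # highest value the limit can be raised to


def split_by_limit(ready_tasks, limit=DEFAULT_LIMIT):
    """Return (running, waiting) for a list of ready tasks under a limit.

    Assumption: the limit must be between 1 and MAX_LIMIT.
    """
    if not 1 <= limit <= MAX_LIMIT:
        raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")
    return ready_tasks[:limit], ready_tasks[limit:]


if __name__ == "__main__":
    running, waiting = split_by_limit([f"task_{i}" for i in range(60)])
    print(len(running), "running,", len(waiting), "waiting")
```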
View resource group utilization
When a Subscription resource group has high compute resource utilization, newly submitted tasks may enter a queue and wait until resources become available. You can use the following methods to view tasks running on the resource group, the current utilization of the resource group, historical resource usage at specific points in time, and the resource usage of each task.
Instructions: On the resource group list page, view the resource group utilization displayed in the Used CUs column of the target resource group.
You can also click the target Resource Group Name on the resource group list page to go to the resource group details page. Use the Resource Usage chart to review historical resource usage at specific points in time, and view details of tasks that are running or waiting to run by resource group usage scenario.
Scale a resource group
If you observe high utilization on the details page of a Subscription resource group, you can manually scale out the resource group to improve the task processing performance of Data Integration, task scheduling, and Data Service. If the actual utilization of a Subscription resource group is low, you can manually scale in the resource group to reduce costs.
-
On the resource group list page, click
> Scale Out or Scale In in the Operation column of the target resource group.
Important: Scaling in may cause tasks to run slower. Evaluate the impact before you proceed.
-
On the resource group configuration change page, adjust the Resource Group Specifications, select Terms of Service, and click Buy Now.
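The scaling decision described above can be sketched as a simple rule on utilization. The thresholds below (80% and 30%) are hypothetical values chosen for illustration and do not come from DataWorks. Only the 2 CU minimum for Subscription resource groups is taken from this guide.

```python
MIN_SUBSCRIPTION_CUS = 2  # a Subscription group cannot go below 2 CUs


def recommend_scaling(used_cus, total_cus, high=0.8, low=0.3):
    """Suggest "scale out", "scale in", or "keep" from current utilization.

    high and low are hypothetical thresholds; tune them to your workload.
    """
    if total_cus <= 0:
        raise ValueError("total_cus must be positive")
    utilization = used_cus / total_cus
    if utilization >= high:
        return "scale out"
    # Only suggest scaling in when the group is above the minimum size.
    if utilization <= low and total_cus > MIN_SUBSCRIPTION_CUS:
        return "scale in"
    return "keep"


if __name__ == "__main__":
    print(recommend_scaling(used_cus=9, total_cus=10))
```

Because scaling in may slow tasks down, treat a "scale in" result as a prompt to review historical usage on the details page rather than as a decision.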
Resource group cost management
Freeze and unfreeze pay-as-you-go resource groups
-
Freeze a pay-as-you-go serverless resource group: If a pay-as-you-go serverless resource group has not been used within 7 days, the resource group will be frozen. You can view the resource group status on the resource group list page. The following scenarios indicate that a resource group is not being used:
-
Scheduled tasks: The resource group is not used to run any scheduled tasks.
-
Data compute tasks: The resource group is not used to execute compute tasks.
-
Data Integration tasks: The resource group is not used to run Data Integration tasks.
-
Data Analysis queries: The resource group is not used to execute Data Analysis queries.
-
Connectivity tests: The resource group is not used to perform connectivity tests.
-
Metadata collection: The resource group is not used for metadata collection tasks.
-
Personal development environment: The resource group is not used for the personal development environment.
-
Data Service: The resource group is not used to support Data Service.
-
Large model service: The resource group is not used to support large model services.
-
-
Start a frozen serverless resource group: To start a frozen serverless resource group, find the target resource group on the resource group list page, and click in the Operation column.
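The freeze rule above depends on the most recent use across all listed scenarios. The sketch below checks whether a resource group has been idle for 7 days. It makes two assumptions that this guide does not state: the creation time counts as activity for a group that was never used, and exactly 7 days of idleness already qualifies. The function name and scenario keys are hypothetical.

```python
from datetime import datetime, timedelta

IDLE_WINDOW = timedelta(days=7)  # unused for 7 days leads to freezing


def is_freeze_candidate(created_at, last_used, now):
    """Return True if no scenario used the resource group in the last 7 days.

    last_used maps a scenario name (scheduled tasks, Data Integration,
    connectivity tests, and so on) to the datetime of its most recent use.
    Scenarios that never ran are omitted from the mapping.
    """
    latest_activity = max([created_at, *last_used.values()])
    return now - latest_activity >= IDLE_WINDOW


if __name__ == "__main__":
    now = datetime(2024, 1, 10)
    print(is_freeze_candidate(datetime(2023, 12, 1),
                              {"connectivity_test": datetime(2024, 1, 8)}, now))
```

Note that any one of the listed scenarios, including a connectivity test, resets the idle period in this model.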
Switch from pay-as-you-go to Subscription
You can switch the billing method of a pay-as-you-go serverless resource group to Subscription. After the switch, the resource group will be billed at the Subscription resource group unit price.
-
On the resource group list page, click the
icon in the Operation column of the target resource group, and select Billing Type Conversion to open the Switch to Subscription dialog.
-
In the dialog, you can adjust the Destination Instance Type and Duration of the resource group as needed.
-
After you finish the configuration, click Confirm to go to the Alibaba Cloud checkout page and complete the Order.
When switching a pay-as-you-go serverless resource group to Subscription, the switch takes approximately 1 to 2 minutes, and tasks are not affected.
Renew and unsubscribe from resource groups
On the resource group list page, if the status of a resource group is Expired, click
> Renew in the Operation column of the target resource group.
To unsubscribe from a serverless resource group, click
> Unsubscribe in the Operation column of the target resource group. For more information, see Unsubscribe from a resource group.
Security group release after unsubscription or deletion
-
After you unsubscribe from or delete a serverless resource group, the associated security groups are typically cleaned up automatically. We recommend that you refresh the console immediately after unsubscription to verify that the security groups have been released.
-
If you encounter an error message stating that a security group has associated resources when you try to delete a VPC, but the console shows no associated resources in the security group, the DataWorks backend resources may not have been fully released. Make sure that all related DataWorks serverless resource groups have been unsubscribed from or frozen, and wait a few minutes before retrying.
-
Deleting an expired serverless resource group or one with a zero-cost renewal does not incur additional charges.
FAQ
Q: Are there region restrictions for purchases?
A: The purchase restrictions are as follows.
-
If you want to use serverless resource groups in a reseller environment, first confirm whether your provider supports selling this product.
-
Purchases are not supported in the Thailand (Bangkok) region.
Q: After purchasing a free trial resource package for serverless resource groups, I cannot bind a resource group in DataWorks and no resource groups appear in the list. What should I do?
A: A free trial resource package is a cost deduction voucher and does not automatically create a resource group. You need to manually create a pay-as-you-go serverless resource group, and the system will automatically use the resource package to offset the costs generated by that resource group. After the resource package quota is exhausted, charges will be deducted from your account balance on a pay-as-you-go basis.
Q: How can a serverless resource group access a host address?
A: Serverless resource groups do not support direct host access. If needed, you can use PrivateZone DNS to resolve and access the host.
-
Activate the PrivateZone DNS service
Note: If PrivateZone DNS is already activated, you can skip this step.
-
Create a zone.
Using the host domain name header-1-cn-shanghai as an example, perform authoritative resolution for the domain name header-1-cn-shanghai. You can adjust this parameter based on your host domain name configuration.
-
Set the Record Value to the private IP address corresponding to the host bound to the domain name.
-
When setting the VPC for the domain name to take effect, select the VPC bound to the resource group.
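After the record takes effect, you can confirm that the host name resolves to a private address. The following is a minimal sketch that uses only the Python standard library. It must run in an environment that uses the VPC's PrivateZone resolution, and the function name and the injectable `resolver` parameter are hypothetical conveniences for testing.

```python
import ipaddress
import socket


def resolves_to_private_ip(hostname, resolver=socket.gethostbyname):
    """Resolve hostname and report whether the result is a private address.

    Returns a tuple (ip, is_private). The resolver can be replaced for testing.
    """
    ip = resolver(hostname)
    return ip, ipaddress.ip_address(ip).is_private


if __name__ == "__main__":
    # header-1-cn-shanghai is the example host domain name used above.
    try:
        print(resolves_to_private_ip("header-1-cn-shanghai"))
    except socket.gaierror as error:
        print("resolution failed:", error)
```

If resolution fails or returns a public address, recheck that the zone is associated with the VPC bound to the resource group.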
Q: I claimed a new user resource package or deduction package, but I cannot select a resource group in the binding page or task configuration. Why?
A: New user resource packages and deduction packages are cost deduction vouchers only. They do not automatically create resource group instances. You need to go to the Resource Group List page, click Create Resource Group, and select the Pay-As-You-Go billing method to create a resource group instance. After the resource group is created, refresh the page and the resource group will appear in the drop-down list.
If the page redirects to the free trial page during creation, manually navigate back to the serverless resource group purchase page to create the resource group. The system will automatically use your claimed resource package to offset the costs generated by the resource group.
Q: A Data Integration task has been running for a long time with 0% progress, or an error message says the resource group type is not supported. What should I do?
A: This issue is typically caused by using a legacy exclusive resource group (such as an exclusive resource group for Data Integration), where machine slot fragmentation prevents tasks from obtaining sufficient resources. We recommend that you create a serverless resource group and switch the task to run on the new serverless resource group. If the task still cannot run, try the following:
-
Reduce the concurrency configuration of the synchronization task.
-
Increase the serverless resource group specifications to allocate more CUs.
Legacy exclusive resource groups are no longer recommended. We recommend that you migrate to serverless resource groups for better elasticity and stability.
References
-
You can use the intelligent monitoring feature in Operation Center to monitor the utilization of a resource group and the number of instances waiting for resources. For more information, see Create a custom rule.
-
If tasks running on a serverless resource group require a specific development environment (such as third-party library dependencies), you can create a custom image that integrates the necessary packages and dependencies, and then specify the serverless resource group as the execution resource and the image as the runtime environment when running the task.