To facilitate the management of resources in DataWorks and improve user experience, DataWorks introduces serverless resource groups. A serverless resource group can implement the core features of an exclusive resource group for scheduling, an exclusive resource group for Data Integration, and an exclusive resource group for DataService Studio at the same time. You can perform operations such as data synchronization, task scheduling and running, and API calling and management by using only one serverless resource group. This topic describes how to create and use a serverless resource group.
Prerequisites
You are familiar with the details about serverless resource groups, such as the specifications, performance, and billing standards. You have determined the specifications and subscription duration that you require based on your business scenario. For more information, see Overview of DataWorks resource groups and Billing of serverless resource groups.
Serverless resource groups are supported in the following regions: China (Beijing), China (Shanghai), China (Shenzhen), China (Hangzhou), China (Hong Kong), China (Zhangjiakou), Singapore, Malaysia (Kuala Lumpur), Indonesia (Jakarta), Japan (Tokyo), UK (London), US (Silicon Valley), Germany (Frankfurt), and US (Virginia).
Note: If you have activated DataWorks of any edition in the Germany (Frankfurt) region and want to use serverless resource groups, you must submit a ticket to contact technical support.
You are granted the required permissions.
Only RAM users to which the AliyunBSSOrderAccess and AliyunDataWorksFullAccess policies are attached can purchase resource groups. For information about how to grant permissions to a RAM user and view the permissions of a RAM user, see Grant permissions to a RAM user and View the permissions of a RAM user.
Only a workspace administrator can associate a resource group with a workspace and change the workspace with which a resource group is associated.
For information about the permissions required to perform other operations on a resource group, see Custom policies used to manage permissions on the entities in the DataWorks console.
If you want to use a serverless resource group in a virtual network operator (VNO) environment, you must first check whether your service provider allows you to purchase serverless resource groups.
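The region and permission prerequisites above can be sketched as a small preflight check. This is only an illustration: the function name `preflight_issues` and its inputs are hypothetical, and the sketch does not call any Alibaba Cloud API. It assumes you already know the region name and the names of the policies attached to the RAM user.

```python
# Regions listed in this topic as supporting serverless resource groups.
SUPPORTED_REGIONS = {
    "China (Beijing)", "China (Shanghai)", "China (Shenzhen)", "China (Hangzhou)",
    "China (Hong Kong)", "China (Zhangjiakou)", "Singapore", "Malaysia (Kuala Lumpur)",
    "Indonesia (Jakarta)", "Japan (Tokyo)", "UK (London)", "US (Silicon Valley)",
    "Germany (Frankfurt)", "US (Virginia)",
}

# Policies that must be attached to a RAM user before it can purchase resource groups.
REQUIRED_PURCHASE_POLICIES = {"AliyunBSSOrderAccess", "AliyunDataWorksFullAccess"}


def preflight_issues(region, attached_policies):
    """Return a list of problems found; an empty list means none were found.

    Hypothetical helper for illustration only. It checks local inputs and
    does not query any cloud service.
    """
    issues = []
    if region not in SUPPORTED_REGIONS:
        issues.append(f"Region not supported: {region}")
    elif region == "Germany (Frankfurt)":
        # Per the note above, this region requires a ticket to technical support.
        issues.append("Germany (Frankfurt): submit a ticket to technical support first")
    missing = REQUIRED_PURCHASE_POLICIES - set(attached_policies)
    for policy in sorted(missing):
        issues.append(f"Missing RAM policy: {policy}")
    return issues
```

For example, calling `preflight_issues("Singapore", ["AliyunBSSOrderAccess"])` reports only the missing AliyunDataWorksFullAccess policy.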
Comparison between serverless resource groups and old-version resource groups
Comparison item | Old-version resource group (exclusive resource groups and shared resource groups) | Serverless resource group |
Classification | Resource groups are classified into resource groups for Data Integration, resource groups for scheduling, and resource groups for DataService Studio based on their purposes. | Resource groups are used for general purposes and are not classified. |
Support for features | Some capabilities of DataWorks are not supported. | All capabilities of DataWorks are supported. |
Support for mixed use | Each type of resource group serves only one purpose. | A resource group can be used in data synchronization, scheduling, and DataService Studio at the same time. |
Sales mode | Resource groups are charged based on the specifications and the number of machines. A resource group must contain at least one machine, and the minimum specifications of a machine are 4 vCPUs and 8 GiB of memory. The minimum step size for scaling out a resource group is one machine whose specifications are 4 vCPUs and 8 GiB of memory. | Resource groups are sold by compute unit (CU). A resource group must contain at least two CUs. The minimum step size for scaling out a resource group is one CU. |
Billing method | | Both the subscription and pay-as-you-go billing methods are supported. |
Resource waste | DataWorks provides only a limited set of specifications for resource groups. As a result, part of the capacity on each machine can remain unused, and resources are wasted. | You can determine the number of CUs based on your business requirements. This prevents resource waste. |
Scalability | | You can directly change the number of CUs for a resource group. |
Impact generated by scale-out or scale-in | Running tasks are affected. | Running tasks are not affected. |
Network security | DataWorks manages inbound and outbound Internet traffic for resource groups. The Internet bandwidth of resource groups is shared by multiple users. This causes resource competition. | Users access the Internet through their own network resources, which keeps Internet traffic under each user's control. |
Development trend | Old-version resource groups will be discontinued in the future. | Serverless resource groups will become the only resource groups that are supported by DataWorks. |
Support for custom images | Custom images are not supported. | Custom images are supported. If you use a serverless resource group to deploy tasks, you can create an image that contains all components required for running tasks. This helps meet more conditions for running tasks. |
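The difference in the Sales mode and Resource waste rows can be illustrated with a little arithmetic. The sketch below is a simplification that compares vCPUs only and ignores memory; the function names are hypothetical. It uses the 4-vCPU minimum machine size from the table and the definition 1 CU = 1 vCPU + 4 GiB of memory that is stated in the purchase parameters of this topic.

```python
import math

MACHINE_VCPUS = 4        # minimum machine size of an old-version resource group
MIN_SERVERLESS_CUS = 2   # a serverless resource group must contain at least two CUs


def old_version_vcpus(required_vcpus):
    """vCPUs purchased when capacity grows in whole 4-vCPU machines."""
    machines = max(1, math.ceil(required_vcpus / MACHINE_VCPUS))
    return machines * MACHINE_VCPUS


def serverless_cus(required_vcpus):
    """CUs purchased when capacity grows one CU (1 vCPU) at a time."""
    return max(MIN_SERVERLESS_CUS, math.ceil(required_vcpus))
```

Under these assumptions, a workload that needs 5 vCPUs requires two machines (8 vCPUs) in an old-version resource group but only 5 CUs in a serverless resource group.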
Precautions
To ensure that your resource group can access the desired data source, such as a database, a data service, or other data in a specific network environment, you must establish a network connection between the resource group and the data source in advance, based on the network environment in which the data source resides. For more information, see Network connectivity solutions.
Important: You can associate a serverless resource group with a virtual private cloud (VPC) to enable the resource group to access a data source or an address in a complex network environment over an internal network. By default, serverless resource groups cannot access the Internet. If you want to use a serverless resource group to access a data source or a network environment over the Internet, you must configure an Internet NAT gateway for the VPC with which the resource group is associated and associate the Internet NAT gateway with an elastic IP address (EIP). For more information, see the Scenario 5: Establish a network connection between a resource group and a data source that is deployed on the Internet section in this topic.
If you have associated a serverless resource group with a VPC and a vSwitch, do not modify the configurations of the VPC and vSwitch. Otherwise, DataWorks tasks that run on the serverless resource group may fail.
Billing of serverless resource groups
For information about billing of serverless resource groups, see Billing of serverless resource groups.
Step 1: Create a serverless resource group
Go to the Resource Groups page.
Log on to the DataWorks console. In the top navigation bar, select the desired region. Then, click Resource Groups in the left-side navigation pane.
On the Exclusive Resource Groups tab, click Create Resource Group to go to the buy page of serverless resource groups.
Parameter
Description
Region and Zone
The region in which you want to create the resource group. The region must be the same as the region in which the workspace resides.
Billing Method
Subscription: You must pay for the resource group before you use it.
Pay-as-you-go: You can use the resource group before you pay for it.
Resource Group Specifications
This parameter is required only if you set the Billing Method parameter to Subscription.
Valid values: 2 to 99999999. Unit: CU.
Note: 1 CU = 1 vCPU core + 4 GiB of memory. If you want to use the resource group in DataService Studio, you must purchase at least four CUs. The value 99999999 indicates that the number of CUs you can purchase is not limited. However, the number of CUs that you can purchase may be affected by the inventory. If the inventory is insufficient, you must pay attention to the prompt that is displayed when you purchase a serverless resource group.
Resource Group Name
The name of the resource group.
Resource Group Description
The description of the resource group.
VPC
The VPC and vSwitch with which you want to associate the resource group. You can select a VPC based on the network that the resource group needs to access.
If the resource group needs to access a data source that belongs to the same Alibaba Cloud account and resides in the same region as the resource group, you can select the VPC where the data source resides and a vSwitch in the VPC.
If the resource group needs to access a data source that resides in a complex network environment, you must use a connection tool such as VPN Gateway or Express Connect to establish a network connection between the VPC with which the resource group is associated and the VPC where the data source resides. For more information, see Network connectivity solutions.
Note: If no VPCs or vSwitches are available, you must go to the VPC console to create a VPC or a vSwitch. For more information about VPCs, see What is a VPC?
You can associate the resource group with one or more other VPCs after the resource group is created.
If you set the Billing Method parameter to Subscription and the VPC that you select is used in DataService Studio, data computing, and data synchronization, you cannot associate the resource group with another VPC or change the associated VPC for the resource group when you use the resource group in DataService Studio. Plan your VPC associations in advance.
If you have associated a serverless resource group with a VPC and a vSwitch, do not modify the configurations of the VPC and vSwitch. Otherwise, DataWorks tasks that run on the serverless resource group may fail.
vSwitch
Billing Cycle
The subscription duration of the resource group. This parameter is required only if you set the Billing Method parameter to Subscription.
Important: To prevent your business from being affected due to service suspension or resource release when the resource group expires, we recommend that you select Auto-renewal. After you select Auto-renewal, fees are automatically deducted from your Alibaba Cloud account based on the actual prices before the resource group expires. The auto-renewal cycle is one month. You can disable auto-renewal if you no longer require this feature.
Service-linked Role
The service-linked role. The first time you create a serverless resource group, you must create a service-linked role named AliyunServiceRoleForDataWorks. When you create a serverless resource group in subsequent operations, the system automatically assigns the service-linked role.
Note: The service-linked role AliyunServiceRoleForDataWorks is used to access resources such as VPCs, elastic network interfaces (ENIs), and security groups. For more information about the service-linked role, see DataWorks service-linked role.
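The constraints in the parameter descriptions above can be summarized as a validation sketch. This is not the behavior of the buy page itself: `PurchaseRequest`, `validate`, and `capacity` are hypothetical names, and the billing cycle is modeled as a number of months purely for illustration. Only the numeric limits and the "required only for Subscription" rules come from this topic.

```python
from dataclasses import dataclass
from typing import List, Optional

MIN_CUS = 2              # lower bound of Resource Group Specifications
MAX_CUS = 99999999       # upper bound; means "not limited", subject to inventory
MIN_CUS_DATASERVICE = 4  # minimum when the group is used in DataService Studio
GIB_PER_CU = 4           # 1 CU = 1 vCPU core + 4 GiB of memory


@dataclass
class PurchaseRequest:
    """Hypothetical container for buy-page values; not a DataWorks API type."""
    billing_method: str                         # "Subscription" or "Pay-as-you-go"
    cus: Optional[int] = None                   # Resource Group Specifications
    billing_cycle_months: Optional[int] = None  # Billing Cycle (assumed unit: months)
    use_dataservice_studio: bool = False


def validate(req: PurchaseRequest) -> List[str]:
    """Return the rule violations for a request; an empty list means it passes."""
    if req.billing_method not in ("Subscription", "Pay-as-you-go"):
        return ["Unknown billing method"]
    errors: List[str] = []
    # Specifications and billing cycle are required only for Subscription.
    if req.billing_method == "Subscription":
        if req.cus is None:
            errors.append("Resource Group Specifications is required")
        elif not MIN_CUS <= req.cus <= MAX_CUS:
            errors.append("CUs must be between 2 and 99999999")
        elif req.use_dataservice_studio and req.cus < MIN_CUS_DATASERVICE:
            errors.append("DataService Studio requires at least 4 CUs")
        if req.billing_cycle_months is None:
            errors.append("Billing Cycle is required")
    return errors


def capacity(cus: int) -> dict:
    """Convert a CU count into vCPUs and GiB of memory."""
    return {"vcpus": cus, "memory_gib": cus * GIB_PER_CU}
```

For example, a subscription request for 2 CUs that is intended for DataService Studio fails validation, while `capacity(4)` shows that 4 CUs correspond to 4 vCPUs and 16 GiB of memory.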
Step 2: Associate the resource group with a workspace
After the resource group is created, you must associate the resource group with a workspace. Then, you can select the resource group when you create tasks in the workspace.
Associate the resource group when you create a workspace
Log on to the DataWorks console. In the top navigation bar, select the desired region. Then, click Workspaces in the left-side navigation pane.
On the Workspaces page, click Create Workspace. In the Create Workspace panel, select the created resource group from the Default Resource Group drop-down list.
Associate the resource group with an existing workspace
Go to the Resource Groups page.
Log on to the DataWorks console. In the top navigation bar, select the desired region. Then, click Resource Groups in the left-side navigation pane.
On the Resource Groups page, find the created resource group, click the icon in the Actions column, and then click Change Workspace. In the Change Workspace panel, find the workspace with which you want to associate the resource group and click Associate in the Actions column.
Step 3: Configure network connectivity
To ensure that your task can run as expected, you must complete network connectivity configuration. This way, the resource group can access the desired data source. For more information, see Network connectivity solutions.
You can associate a serverless resource group with a virtual private cloud (VPC) to enable the resource group to access a data source or an address in a complex network environment over an internal network. By default, serverless resource groups cannot access the Internet. If you want to use a serverless resource group to access a data source or a network environment over the Internet, you must configure an Internet NAT gateway for the VPC with which the resource group is associated and associate the Internet NAT gateway with an elastic IP address (EIP). For more information, see the Scenario 5: Establish a network connection between a resource group and a data source that is deployed on the Internet section in this topic.
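The choice of connectivity solution described in this topic can be sketched as a small decision helper. The function `connectivity_steps` and its boolean inputs are hypothetical, and the real selection depends on the details in Network connectivity solutions. The sketch only restates the three cases mentioned here: Internet access, a data source in the same account and region, and a complex network environment.

```python
def connectivity_steps(same_account, same_region, over_internet):
    """Return the configuration steps suggested by this topic for a data source.

    Hypothetical helper for illustration; it performs no network operations.
    """
    if over_internet:
        # Serverless resource groups cannot access the Internet by default.
        return [
            "Associate the resource group with a VPC",
            "Configure an Internet NAT gateway for that VPC",
            "Associate the Internet NAT gateway with an EIP",
        ]
    if same_account and same_region:
        return [
            "Associate the resource group with the VPC where the data source resides "
            "and a vSwitch in that VPC",
        ]
    # Any other case is treated as a complex network environment.
    return [
        "Associate the resource group with a VPC",
        "Connect that VPC to the data source network by using VPN Gateway or Express Connect",
    ]
```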
Step 4: Modify configuration items for the resource group
Manage quotas
If you want to use the resource group in data computing, data synchronization, and DataService Studio, you can configure the maximum CU quotas or the minimum CU quotas for the resource group. This ensures that your tasks can run as expected.
If the billing method of the resource group is pay-as-you-go, you can configure the maximum CU quotas for the resource group to prevent excessive resource consumption.
If the billing method of the resource group is subscription, you can configure the minimum CU quotas for the resource group.
Go to the Resource Groups page.
Log on to the DataWorks console. In the top navigation bar, select the desired region. Then, click Resource Groups in the left-side navigation pane.
Change quotas for the resource group.
Change quotas for the resource group on the Resource Groups page.
On the Resource Groups page, find the resource group, click the icon in the Actions column, and then click Manage Quota. In the Manage Quota dialog box, change the maximum CU quotas or the minimum CU quotas for different purposes.
Change quotas for the resource group on the details page of the resource group.
On the Resource Groups page, find the resource group and click the resource group name to go to the details page of the resource group. In the upper-right corner of the details page, click Manage Quota. In the Manage Quota dialog box, change the maximum CU quotas or the minimum CU quotas for the resource group.
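The quota rules above can be sketched as follows. The mapping from billing method to quota type comes from this section. The check that the minimum quotas must not add up to more than the purchased CUs is an assumption made for this illustration, not a rule stated in this topic, and all function names are hypothetical.

```python
PURPOSES = ("data computing", "data synchronization", "DataService Studio")


def configurable_quota(billing_method):
    """Return which quota type this section describes for a billing method."""
    if billing_method == "Pay-as-you-go":
        return "maximum"   # caps consumption to prevent excessive resource usage
    if billing_method == "Subscription":
        return "minimum"   # reserves CUs for a purpose
    raise ValueError(f"Unknown billing method: {billing_method}")


def minimum_quota_problems(purchased_cus, minimum_quotas):
    """Check minimum CU quotas per purpose for a subscription resource group.

    ASSUMPTION: the sum of minimum quotas cannot exceed the purchased CUs.
    """
    problems = []
    for purpose, cus in minimum_quotas.items():
        if purpose not in PURPOSES:
            problems.append(f"Unknown purpose: {purpose}")
        elif cus < 0:
            problems.append(f"Negative quota for {purpose}")
    total = sum(minimum_quotas.values())
    if total > purchased_cus:
        problems.append(
            f"Minimum quotas total {total} CUs but only {purchased_cus} CUs are purchased"
        )
    return problems
```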
Change the maximum number of parallel tasks allowed in data scheduling
If you want to use the resource group in data scheduling, you can specify the maximum number of parallel tasks that are allowed to run on the resource group.
By default, the maximum number of parallel tasks that are allowed is 50. The upper limit for the maximum number of parallel tasks that are allowed is 200.
Go to the Resource Groups page.
Log on to the DataWorks console. In the top navigation bar, select the desired region. Then, click Resource Groups in the left-side navigation pane.
Change the maximum number of parallel tasks allowed in data scheduling.
Change the maximum number of parallel tasks on the Resource Groups page.
On the Resource Groups page, find the resource group, click the icon in the Actions column, and then click Specify Threshold for Parallel Threads of Data Scheduling. In the Specify Threshold for Parallel Threads of Data Scheduling dialog box, change the value of the Specify Threshold for Parallel Threads of Data Scheduling parameter.
Change the maximum number of parallel tasks on the details page of the resource group.
On the Resource Groups page, find the resource group and click the resource group name to go to the details page of the resource group. In the upper-right corner of the details page, click Specify Threshold for Parallel Threads of Data Scheduling. In the Specify Threshold for Parallel Threads of Data Scheduling dialog box, change the value of the Specify Threshold for Parallel Threads of Data Scheduling parameter.
Note: The value that you specify for the Specify Threshold for Parallel Threads of Data Scheduling parameter is the upper limit for the number of tasks that can be scheduled in parallel on the resource group. The value is not related to task running and does not limit task running behavior.
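The effect of the threshold can be sketched as a simple split of ready tasks into those that can be scheduled now and those that wait. The default of 50 and the upper limit of 200 come from this section. The lower bound of 1 and the function names are assumptions for this illustration, and the sketch does not model how DataWorks actually schedules tasks.

```python
DEFAULT_MAX_PARALLEL = 50   # default stated in this section
UPPER_LIMIT = 200           # highest value that can be configured


def validate_threshold(value):
    """Return the value if it is an integer within the allowed range.

    ASSUMPTION: the smallest valid value is 1.
    """
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError("threshold must be an integer")
    if not 1 <= value <= UPPER_LIMIT:
        raise ValueError(f"threshold must be between 1 and {UPPER_LIMIT}")
    return value


def split_by_threshold(ready_tasks, threshold=DEFAULT_MAX_PARALLEL):
    """Split ready tasks into (scheduled now, waiting) according to the threshold."""
    threshold = validate_threshold(threshold)
    return ready_tasks[:threshold], ready_tasks[threshold:]
```

With the default threshold, 60 ready tasks are split into 50 that can be scheduled in parallel and 10 that wait.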
Next step: Configure the resource group for different tasks
After the resource group is created and configured, you must select the resource group for your data synchronization, data scheduling, and DataService Studio tasks so that the tasks run on the resource group. For more information, see General reference: Change the resource groups used by tasks.
Other operations
References
For more information about resource groups, see Overview of DataWorks resource groups.
You can use the intelligent monitoring feature provided in Operation Center to monitor the resource usage of a resource group and the number of instances that are waiting for resources in a resource group. For more information about how to use the intelligent monitoring feature, see Create a custom alert rule.
When you view the status of a resource group on the Resource Groups page, take note of the following items:
If the resource group is expired, you can click the icon in the Actions column and then click Renew to renew the resource group.
If the resource usage of the resource group reaches the warning threshold, you can click the icon in the Actions column and then click Scale Out to scale out the resource group. For more information, see the Scale out or in a resource group section in this topic.
If a specific development environment, such as an environment with third-party library dependencies, is required for running your tasks on a serverless resource group, you can create a custom image that contains all required development packages and dependencies. Then, you can use the custom image as the runtime environment when you run tasks on the serverless resource group.
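The status checks described for the Resource Groups page can be sketched as a mapping from status to the suggested action. The function name and the default warning threshold of 0.8 are hypothetical; this topic does not state the actual threshold value, so treat it as a placeholder.

```python
def recommended_action(expired, usage_ratio, warning_threshold=0.8):
    """Return the action in the Actions column that this topic suggests, or None.

    ASSUMPTION: usage is expressed as a ratio between 0 and 1, and 0.8 is a
    placeholder warning threshold, not a documented value.
    """
    if expired:
        return "Renew"
    if usage_ratio >= warning_threshold:
        return "Scale Out"
    return None
```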