DataWorks supports shared resource groups, exclusive resource groups, and custom resource groups. This topic describes the scenarios and methods of using these resource groups.

DataWorks provides shared, exclusive, and custom resource groups both for scheduling and for Data Integration. The scheduling system manages the resources for both. If you use Data Integration nodes in DataWorks, you must distinguish between resource groups for scheduling and resource groups for Data Integration.

Shared resource groups

When a tenant activates DataWorks, the system automatically creates shared resource groups, which all workspaces of the tenant share. You can use shared resource groups to run nodes without any additional configuration. However, you must specify which nodes run on the shared resource groups.

This section describes shared resource groups in detail.
  • Limits:
    • The maximum amount of resources that shared resource groups can schedule is fixed, and all workspaces of a tenant share these resource groups. As a result, some workspaces may preempt resources, and nodes in other workspaces may be delayed because of insufficient resources.
    • Shared resource groups reside in the DataWorks shared cluster, in which all tenants share resources. During peak hours, some tenants may preempt resources, and your nodes may be delayed because of insufficient resources.
  • Scenarios:
    • Shared resource groups are automatically created when a tenant activates DataWorks. You can use them for operations such as data development and testing.
      Note DataWorks provides shared resource groups for free.
    • Shared resource groups are suitable for scenarios in which a small number of nodes are run and the timeliness of data output is not critical.
  • Features:
    Shared resource groups support all node types except the following:
    • Nodes that must be configured with a whitelist to use external services through the Internet.
    • Nodes that must connect to a virtual private cloud (VPC).
    • For nodes that must connect to the Internet, we recommend that you use other resource groups.

    Shared resource groups provide the security sandbox feature for nodes.

  • Billing:

    You are charged for instances and data synchronization threads based on the pay-as-you-go billing method.
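The restrictions listed under Features can be checked before you assign a node to a shared resource group. The following Python sketch is illustrative only: `NodeRequirements` and `shared_group_blockers` are hypothetical names, not part of any DataWorks SDK, and the check encodes only the restrictions stated in this section.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeRequirements:
    # Hypothetical description of what a node needs; not a DataWorks object.
    needs_whitelist: bool = False  # must be whitelisted to use external services
    needs_vpc: bool = False        # must connect to a VPC
    needs_internet: bool = False   # must connect to the Internet


def shared_group_blockers(node: NodeRequirements) -> List[str]:
    """Return the reasons, if any, why a node should not run on a shared resource group."""
    reasons = []
    if node.needs_whitelist:
        reasons.append("requires a whitelist for external services (not supported)")
    if node.needs_vpc:
        reasons.append("requires a VPC connection (not supported)")
    if node.needs_internet:
        reasons.append("requires Internet access (use another resource group)")
    return reasons


if __name__ == "__main__":
    print(shared_group_blockers(NodeRequirements()))
    print(shared_group_blockers(NodeRequirements(needs_vpc=True)))
```

An empty list means that none of the documented restrictions apply; it does not guarantee that the node will run on time during peak hours.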

Exclusive resource groups for scheduling

DataWorks provides exclusive resource groups for scheduling and exclusive resource groups for Data Integration. Data Integration is a DataWorks service. You must purchase exclusive resources for Data Integration together with exclusive resources for scheduling.

This section describes exclusive resource groups for scheduling in detail.
  • Exclusive resource groups for scheduling are available for all DataWorks editions.
  • Limits:
    • Exclusive resource groups do not support cross-tenant collaboration nodes or Machine Learning experiment nodes.
    • Before you change the resource group for a node, you must confirm the network connectivity between the node and the selected resource group.
  • Scenarios:
    • The resources in exclusive resource groups are available for scheduling at any time, which ensures that nodes produce data output as expected. We recommend that you use exclusive resource groups for production nodes.
    • You must use exclusive resource groups in scenarios in which a large number of nodes are run and data must be output as early as possible.
    • If nodes must connect to the Internet or a VPC, use exclusive resource groups.
    • If you need to configure a whitelist for nodes to use external services, use exclusive resource groups.
    • If your VPC is connected to a data center and your nodes need to connect to the data center, bind exclusive resource groups to the VPC.
  • Billing:

    If more than 500 instances are scheduled on exclusive resource groups, you are charged only a resource fee and no instance fee.
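The scenarios for shared and exclusive resource groups can be summarized as a decision rule. This is a minimal sketch under stated assumptions: `Workload`, `recommend_resource_group`, and the `LARGE_NODE_COUNT` threshold are hypothetical, because this topic does not define a numeric cut-off for "a large number of nodes".

```python
from dataclasses import dataclass

# Assumed cut-off for "a large number of nodes"; DataWorks does not define one here.
LARGE_NODE_COUNT = 100


@dataclass
class Workload:
    # Hypothetical workload profile used only for this illustration.
    is_production: bool
    node_count: int
    output_is_time_critical: bool
    needs_network_access: bool  # Internet, VPC, whitelist, or data center access


def recommend_resource_group(w: Workload) -> str:
    """Return "exclusive" or "shared" following the scenarios in this topic."""
    if w.needs_network_access:
        # Shared resource groups cannot serve whitelist or VPC scenarios.
        return "exclusive"
    if w.is_production:
        # Exclusive resource groups are recommended for production nodes.
        return "exclusive"
    if w.node_count >= LARGE_NODE_COUNT and w.output_is_time_critical:
        return "exclusive"
    # Small workloads without strict timeliness, such as development and testing.
    return "shared"
```

Adjust `LARGE_NODE_COUNT` to your own workload; the rule only restates the documented scenarios and is not a DataWorks recommendation engine.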

When you change the resource group for a node to an exclusive resource group for scheduling, take note of the following items:
  • If you change the resource group for an auto triggered node in Operation Center, the changed resource group takes effect on the next day. You can run a test or generate retroactive data to ensure that the changed resource group takes effect immediately.
  • Before you change the resource group for multiple nodes, run a test for each node separately.
  • If network connectivity is required, you must configure it and verify it during testing.
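The timing rule for resource group changes made in Operation Center can be expressed as simple date arithmetic. The following sketch assumes that "the next day" means the next calendar day after the change; `scheduling_change_effective_date` is a hypothetical helper, not a DataWorks API.

```python
from datetime import date, timedelta


def scheduling_change_effective_date(changed_on: date, run_test_or_backfill: bool) -> date:
    """Return the date from which an auto triggered node uses a resource group
    that was changed in Operation Center (hypothetical helper)."""
    if run_test_or_backfill:
        # A test run or a retroactive data run uses the new group immediately.
        return changed_on
    # Otherwise, the change applies from the next day.
    return changed_on + timedelta(days=1)


if __name__ == "__main__":
    print(scheduling_change_effective_date(date(2024, 1, 31), False))
```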
To change the resource group for a node to an exclusive resource group for scheduling, perform the following steps:
  1. Purchase exclusive resources for scheduling.
    1. Log on to the DataWorks console.
    2. In the left-side navigation pane, click Resource Groups to go to the Exclusive Resource Groups tab.
    3. Click Create a dedicated resource group.
    4. In the Create a dedicated resource group panel, select Exclusive Resource Groups for the Resource Group Type parameter and click Purchase next to the Order Number parameter to go to the buy page.
    5. On the buy page, set the Region, Type, Exclusive scheduling resources, Units, and Duration parameters as required and click Buy Now.
      Note
      • You must select Exclusive scheduling resources for the Type parameter.
      • You can determine the quantity of resources based on the actual number of concurrent nodes. For more information, see Scenario 3 in the Purchase guide topic.
      • We recommend that you purchase at least two exclusive resource groups for disaster recovery.
    6. After you confirm that the order information is correct, select the check box to agree to the DataWorks Exclusive Resources Agreement of Service and click Pay.
  2. Create an exclusive resource group for scheduling.
    1. On the Resource Groups > Exclusive Resource Groups tab, click Create a dedicated resource group.
    2. In the Create a dedicated resource group panel, set the parameters as required.
      The parameters are described as follows:
      • Resource Group Type: The type of the exclusive resource group. A value of Exclusive Resource Groups indicates an exclusive resource group that is used to schedule general nodes. A value of Exclusive Resource Groups for Data Integration indicates an exclusive resource group that is used to synchronize data.
      • Resource Group Name: The name of the exclusive resource group, which must be unique among all resource groups of a tenant.
        Note A tenant indicates an Alibaba Cloud account. Multiple RAM users may exist under a tenant.
      • Resource Group Description: The description of the exclusive resource group.
      • Order Number: The order number of the purchased exclusive resources. If you have not purchased exclusive resources, click Purchase to go to the buy page and purchase exclusive resources.
    3. After you complete the configuration, click OK to create the exclusive resource group.
      Note The exclusive resource group is initialized within 20 minutes. Wait and then click the Refresh icon to confirm that its status is updated to Running.
  3. Find the required exclusive resource group and click Change Workspace.
  4. In the Modify home workspace dialog box, select the required workspace and click OK.
  5. Change the resource group for scheduling for a node.
    You can change the resource group for scheduling for a node in one of the following ways:
    • Change the resource group for scheduling for multiple nodes at a time on the Nodes tab in DataStudio.
      1. Go to the DataStudio page.
      2. Click the Nodes icon on the right side of Business Flow.
      3. On the Nodes tab, select one or more nodes for which you want to change the resource group for scheduling and click Modify Scheduling Resource Group in the lower-right corner.
      4. In the Modify Scheduling Resource Group dialog box, select the required resource group, set the Force modification parameter, and then enter yes to indicate that you are aware of the risk and want to confirm the action.
      5. Click OK to change the resource group for scheduling for multiple nodes at a time.
    • Change the resource group for scheduling for a node in Operation Center
      1. Click the icon in the upper-left corner and choose All Products > Task Operation > Operation Center. In Operation Center, switch to the required workspace.
      2. In the left-side navigation pane, choose Cycle Task Maintenance > Cycle Task.
      3. Click the rightwards arrow in the middle of the page to show the node list. Find the required node and choose More > Modifying a scheduling Resource Group.
        Notice You are not allowed to change the resource group for zero-load nodes, workflow nodes, or Machine Learning experiment nodes.
        To change the resource group for multiple nodes at a time, select the required nodes on the Cycle Task page and click Modifying a scheduling Resource Group in the lower part of the page.
      4. In the Modify scheduling resource groups in batches dialog box, select the required resource group for scheduling and click OK.
    • Change the resource group for scheduling for a node on the Properties tab
      1. Click the icon in the upper-left corner and choose All Products > Data Development > DataStudio. Switch to the required workspace.
      2. On the Data Analytics tab, double-click the required node to go to the node configuration tab.
      3. In the right-side navigation pane, click the Properties tab. In the Resource Group section, select the required resource group for node scheduling.

      You can also click the Run icon on the node configuration tab. In the Arguments dialog box, select the required resource group for scheduling to run a test for the node on the DataStudio page and click OK.

  6. After the change is complete, go to the Cycle Task Maintenance > Cycle Task page. Right-click the node in the directed acyclic graph (DAG), and then select View node details to view the resource group of the node.

    After you change the resource group for a node, proceed to run a test or generate retroactive data for the node. You can view the operational logs to determine whether the node is running on the exclusive resource group.

    If the node runs successfully and the logs are as expected, the node is running on the exclusive resource group.
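The purchase note recommends sizing exclusive resources by the actual number of concurrent nodes and buying at least two exclusive resource groups. The sketch below shows that arithmetic under an explicit assumption: `concurrency_per_unit` (how many nodes one unit can run concurrently) is a hypothetical input that you must take from the Purchase guide for your specification, and splitting the units evenly across groups is one possible design choice. `plan_exclusive_resources` is not a DataWorks API.

```python
import math
from typing import Tuple


def plan_exclusive_resources(peak_concurrent_nodes: int,
                             concurrency_per_unit: int,
                             min_groups: int = 2) -> Tuple[int, int, int]:
    """Return (total_units, group_count, units_per_group).

    concurrency_per_unit is an assumed input: look up the real value for your
    resource specification in the Purchase guide.
    """
    if peak_concurrent_nodes < 0 or concurrency_per_unit <= 0 or min_groups < 1:
        raise ValueError("invalid sizing input")
    # Enough units to cover the peak number of concurrent nodes (at least one).
    total_units = max(1, math.ceil(peak_concurrent_nodes / concurrency_per_unit))
    # At least two groups are recommended for disaster recovery.
    group_count = min_groups
    # Split the units evenly, rounding up so that no group is undersized.
    units_per_group = math.ceil(total_units / group_count)
    return total_units, group_count, units_per_group


if __name__ == "__main__":
    print(plan_exclusive_resources(peak_concurrent_nodes=25, concurrency_per_unit=10))
```

If each group must be able to carry the full peak load on its own during a failover, size every group for `total_units` instead of splitting it.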

Exclusive resource groups for Data Integration

This section describes exclusive resource groups for Data Integration in detail.
  • Exclusive resource groups for Data Integration are available for all DataWorks editions.
  • Limits: Exclusive resource groups for Data Integration are deployed in VPCs. If a data store is deployed on the classic network, you cannot use an exclusive resource group for Data Integration to migrate data from or to this data store.
  • Scenarios:
    • The resources in exclusive resource groups for Data Integration are available for scheduling at any time, which ensures that nodes produce data output as expected. We recommend that you use exclusive resource groups for production nodes.
    • You must use exclusive resource groups in scenarios in which a large number of nodes are run and data must be output as early as possible.
    • If nodes must connect to the Internet or a VPC, use exclusive resource groups.
    • If you need to configure a whitelist for nodes to use external services, use exclusive resource groups.
    • If your VPC is connected to a data center and your nodes need to connect to the data center, bind exclusive resource groups to the VPC.
Notice
  • If network connectivity is required, you must configure it and verify it during testing.
  • You must change the resource group for a sync node to an exclusive resource group for Data Integration on the DataStudio page. After you deploy the node, the changed resource group immediately takes effect.
  • You cannot change the resource group for multiple nodes to an exclusive resource group for Data Integration at a time.
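The network limit for exclusive resource groups for Data Integration can be applied as a pre-check on your data stores. In this sketch, `NetworkType` and `exclusive_di_precheck` are hypothetical names, and the returned messages only restate the rules in this section.

```python
from enum import Enum


class NetworkType(Enum):
    # Hypothetical labels for the network in which a data store is deployed.
    CLASSIC = "classic"
    VPC = "vpc"
    INTERNET = "internet"


def exclusive_di_precheck(store_network: NetworkType) -> str:
    """Say whether an exclusive resource group for Data Integration can serve a data store."""
    if store_network is NetworkType.CLASSIC:
        # Exclusive resource groups for Data Integration are deployed in VPCs and
        # cannot migrate data from or to data stores on the classic network.
        return "unsupported"
    if store_network is NetworkType.VPC:
        return "bind the resource group to the VPC and verify connectivity during testing"
    return "configure network connectivity and verify it during testing"


if __name__ == "__main__":
    for network in NetworkType:
        print(network.value, "->", exclusive_di_precheck(network))
```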
To change the resource group for a node to an exclusive resource group for Data Integration, perform the following steps:
  1. Purchase exclusive resources for Data Integration and create an exclusive resource group for Data Integration. For more information, see Exclusive resource group for Data Integration.
  2. Change the resource group for Data Integration for a sync node.
    You can change the resource group for Data Integration for a sync node in one of the following ways:
    • Change the resource group for Data Integration for multiple sync nodes at a time on the Nodes tab in DataStudio.
      1. Go to the DataStudio page.
      2. Click the Nodes icon on the right side of Business Flow.
      3. On the Nodes tab, select one or more sync nodes for which you want to change the resource group for Data Integration and click Modify a data integration Resource Group in the lower-right corner.
      4. In the Modify a data integration Resource Group dialog box, select the required resource group, specify whether to enable Force modification, and then enter yes to indicate that you are aware of the risk and want to confirm the action.
      5. Click OK to change the resource group for Data Integration for multiple sync nodes at a time.
    • Change the resource group for Data Integration for a node in Operation Center
      1. Click the icon in the upper-left corner and choose All Products > Task Operation > Operation Center. In Operation Center, switch to the required workspace.
      2. In the left-side navigation pane, choose Cycle Task Maintenance > Cycle Task.
      3. On the Cycle Task page, select one or more nodes for which you want to change the resource group for Data Integration and click Modify a data integration Resource Group in the lower part of the page.
      4. In the Modify a data integration Resource Group dialog box, select the required resource group and click OK.
    • Change the resource group for a sync node on the Resource Group configuration tab in DataStudio.
      1. Click the icon in the upper-left corner and choose All Products > Data Development > DataStudio. Switch to the required workspace.
      2. On the Data Analytics tab, double-click the required node to go to the node configuration tab.
      3. Click the Resource Group configuration tab in the right-side navigation pane.
      4. On the Resource Group configuration tab, select the required programme and resource group.
      5. After you complete the configuration, click the Save icon in the top toolbar.
      You can also click the Run icon on the node configuration tab. In the Arguments dialog box, select the required resource group for scheduling to run a test for the node on the DataStudio page and click OK.
  3. After you change the resource group for the sync node, click the Save icon and then click the Submit icon.
  4. Deploy the node. For more information, see Deploy a node.
  5. After the node is deployed, go to the Operation Center > Cycle Task Maintenance > Cycle Task page. Run a test or generate retroactive data for the node. The operations are the same as those for a node whose resource group is changed to an exclusive resource group for scheduling.

VPC binding

  • Limits

    You must create exclusive resource groups in the same zone of the same region as your VPC.

    If an exclusive resource group is created in the same region but a different zone from your VPC, add a route between them. For more information, see Add a route.

  • Scenarios
    • If nodes must connect to your VPC, bind exclusive resource groups to the VPC.
    • If your VPC is connected to a data center and your nodes need to connect to the data center, bind exclusive resource groups to the VPC.
  • Procedure
    1. Find the required resource group and click Add VPC Binding.

      Before the binding, authorize DataWorks to access your cloud resources in the RAM console.

    2. After the authorization is complete, click Add Binding.
    3. In the Add VPC Binding panel, set the parameters as required and click OK.
      Notice
      • The exclusive resource group and the data store to be connected must be in the same zone. When you bind a VPC, select the vSwitch to which the data store is bound.
      • If no vSwitch or security group can be used, click Create VSwitch or Create Security Group. Make sure that you create a vSwitch or a security group in the same zone of the VPC.
    4. To configure a whitelist for your cloud service instance, go back to the Exclusive Resource Groups tab. Find the required exclusive resource group and click View Information.
      • Add the elastic IP address (EIP) and Classless Inter-Domain Routing (CIDR) block in the dialog box to the whitelist of the required cloud service instance.
      • Add the internal CIDR block of the vSwitch that is bound to the cloud service instance to the whitelist of the cloud service instance.
      • If a problem occurs with the whitelist that is enabled for MaxCompute after a node is changed to run on an exclusive resource group, submit a ticket.
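The region and zone rules under Limits and the whitelist steps above can be verified with a short script before and after you bind a VPC. This sketch uses Python's standard `ipaddress` module; `placement_action` and `missing_from_whitelist` are hypothetical helpers, and the region IDs, zone IDs, and addresses in the example are placeholders.

```python
import ipaddress
from typing import Iterable, List


def placement_action(group_region: str, group_zone: str,
                     vpc_region: str, vpc_zone: str) -> str:
    """Apply the documented region and zone rules for binding a resource group to a VPC."""
    if group_region != vpc_region:
        return "not supported: create the resource group in the same region as the VPC"
    if group_zone != vpc_zone:
        return "add a route between the resource group and the VPC"
    return "ok"


def missing_from_whitelist(required: Iterable[str], whitelist: Iterable[str]) -> List[str]:
    """Return the required IP addresses or CIDR blocks that the whitelist does not cover."""
    allowed = [ipaddress.ip_network(entry, strict=False) for entry in whitelist]
    missing = []
    for item in required:
        net = ipaddress.ip_network(item, strict=False)  # a bare IP becomes a /32 or /128
        # subnet_of() requires both networks to be the same IP version.
        if not any(net.subnet_of(a) for a in allowed if a.version == net.version):
            missing.append(item)
    return missing


if __name__ == "__main__":
    print(placement_action("cn-hangzhou", "cn-hangzhou-g", "cn-hangzhou", "cn-hangzhou-h"))
    # EIP and CIDR block from View Information plus the vSwitch CIDR block (placeholders).
    print(missing_from_whitelist(["192.168.1.10", "10.0.0.0/24"], ["192.168.1.0/24"]))
```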

O&M Assistant

O&M Assistant is suitable for scenarios in which resource packages, such as third-party Python packages, are not installed, and for scenarios in which some special scripts are run regularly.

Take note of the specified installation directory. For more information, see O&M Assistant.

Custom resource groups

Notice Custom resource groups support only data integration nodes and Shell nodes.
  • Limits
    • You must activate DataWorks Enterprise Edition or a more advanced edition before you can submit a ticket to enable the whitelist and use custom resource groups for scheduling.
    • You must activate DataWorks Professional Edition or a more advanced edition to use custom resource groups for Data Integration.
  • Scenarios
    • Network: Data synchronization requires a connection to your data center.
    • Environment: If the latest Python version or a Java Development Kit (JDK) is required, you can use an Elastic Compute Service (ECS) instance that meets the environment requirements to create a custom resource group.
    • Migration: If a local node already exists, you can directly schedule the node on your own server to reduce the workload of script migration.
  • Operations
    • For more information about custom resource groups for Data Integration, see Create a custom resource group for Data Integration.
    • To create a custom resource group for scheduling, perform the following steps:
      1. In the left-side navigation pane of the DataWorks console, choose Resource Groups > Custom Resource Groups.
      2. Click Add scheduling resources in the upper-right corner.
      3. In the Add scheduling resources dialog box, enter a resource name, select the required workspace, and then click Confirm.
      4. Find the created custom resource group and click Server Management. In the Management Server dialog box, click Add server.
      5. In the Add scheduling resources dialog box, set the parameters as required and click Confirm.
        The parameters are described as follows:
        • Network type: Only VPC is supported.
        • ECS UUID: The universally unique identifier (UUID) of the ECS instance. You can run the dmidecode | grep UUID command to query the UUID.
        • Machine IP: The internal IP address of the ECS instance. You can log on to the ECS instance and run the hostname -i command to query the internal IP address.
      6. After the server is added, go back to the Custom Resource Groups tab and refresh the page. Find the created custom resource group and click Server initialization. Log on to the ECS instance and follow the initialization procedure as prompted.
  • The procedure for changing the resource group for a node to a custom resource group is the same as the procedure for changing it to an exclusive resource group.
    • To change the resource group for a node to a custom resource group for scheduling, go to Operation Center.
    • To change the resource group for a node to a custom resource group for Data Integration, go to the DataStudio page, change the resource group, and then commit and deploy the node.
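When you add a server to a custom resource group for scheduling, the ECS UUID and Machine IP values come from the dmidecode | grep UUID and hostname -i commands. The sketch below parses that output in Python. `parse_ecs_uuid` and `read_ecs_identity` are hypothetical helpers; `read_ecs_identity` assumes that it runs on the ECS instance with root permissions, which dmidecode usually requires, and that the first address printed by hostname -i is the internal IP address. Verify both values before you enter them.

```python
import re
import subprocess
from typing import Optional, Tuple

# Matches a line such as "\tUUID: 1A2B3C4D-0000-1111-2222-333344445555".
UUID_LINE = re.compile(
    r"^\s*UUID:\s*([0-9A-Fa-f]{8}-(?:[0-9A-Fa-f]{4}-){3}[0-9A-Fa-f]{12})\s*$",
    re.MULTILINE,
)


def parse_ecs_uuid(dmidecode_output: str) -> Optional[str]:
    """Extract the system UUID from dmidecode output, or return None if absent."""
    match = UUID_LINE.search(dmidecode_output)
    return match.group(1) if match else None


def read_ecs_identity() -> Tuple[Optional[str], str]:
    """Run the two commands on the ECS instance and return (uuid, internal_ip)."""
    dmi = subprocess.run(["dmidecode"], capture_output=True, text=True, check=True).stdout
    ips = subprocess.run(["hostname", "-i"], capture_output=True, text=True, check=True).stdout
    # hostname -i may print several addresses; this sketch takes the first one.
    return parse_ecs_uuid(dmi), ips.split()[0]
```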