DataWorks supports shared resource groups, exclusive resource groups, and custom resource groups. This topic describes the scenarios for each type of resource group and how to use them.

Each type of resource group is available both for scheduling and for Data Integration. If you use data integration nodes in DataWorks, you must distinguish between resource groups for scheduling and resource groups for Data Integration.

Shared resource groups

When a tenant activates DataWorks, the system automatically creates shared resource groups, which are shared by all workspaces of the tenant. You can use shared resource groups to run nodes without any additional configuration. However, you must specify which nodes run on shared resource groups.

This section describes the limits, scenarios, features, and billing of shared resource groups.
  • Limits:
    • The maximum amount of resources that shared resource groups can schedule is fixed, and all workspaces of a tenant share these resource groups. As a result, some workspaces may preempt resources, and nodes in other workspaces may fail to obtain resources and be scheduled in a timely manner.
    • Shared resource groups reside in the DataWorks shared cluster, where resources are shared by all tenants. During peak hours, some tenants may preempt resources, and your nodes may fail to obtain resources and be scheduled in a timely manner.
  • Scenarios:
    • Shared resource groups are automatically created when a tenant activates DataWorks. You can use them for tasks such as data development and testing.
      Note DataWorks provides shared resource groups for free.
    • Shared resource groups are suitable for scenarios where a small number of nodes run and timely data output is not required.
  • Features:
    Shared resource groups do not support the following nodes:
    • Nodes that require a whitelist to be configured before they can access external services.
    • Nodes that need to connect to a virtual private cloud (VPC).
    For nodes that need to connect to the Internet, we recommend that you use exclusive resource groups or custom resource groups.

    Shared resource groups provide the security sandbox feature for nodes.

  • Billing:

    Shared resource groups are billed for instances and data synchronization threads on a pay-as-you-go basis.
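The selection rules in this topic can be summarized as a small decision function. The following Python sketch is illustrative only: `NodeNeeds` and `recommend_group` are hypothetical names, not part of any DataWorks API, and the rules encode only the limits and scenarios that this topic describes for shared and exclusive resource groups.

```python
from dataclasses import dataclass


@dataclass
class NodeNeeds:
    """Hypothetical summary of what a node requires (not a DataWorks API type)."""
    needs_whitelist: bool = False   # an external service must whitelist the node
    needs_vpc: bool = False         # the node connects to a VPC
    needs_internet: bool = False    # the node connects to the Internet
    timely_output: bool = False     # production node, or data must be output on time


def recommend_group(node: NodeNeeds) -> str:
    """Suggest a resource group type based on the limits and scenarios in this topic."""
    if node.needs_whitelist or node.needs_vpc:
        # Shared resource groups do not support these nodes.
        return "exclusive"
    if node.needs_internet:
        # Exclusive or custom resource groups are recommended for Internet access.
        return "exclusive or custom"
    if node.timely_output:
        # Shared resources can be preempted, so timely output is not guaranteed.
        return "exclusive"
    # A small number of nodes with no timeliness requirement.
    return "shared"
```

For example, `recommend_group(NodeNeeds(needs_vpc=True))` returns `"exclusive"`. Keep in mind that custom resource groups support only data integration nodes and Shell nodes, so the node type also limits which option applies.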

Exclusive resource groups for scheduling

DataWorks provides exclusive resource groups for scheduling and exclusive resource groups for Data Integration. Data Integration is a DataWorks service. You must purchase exclusive resources for Data Integration together with exclusive resources for scheduling.

This section describes the limits, scenarios, and billing of exclusive resource groups for scheduling.
  • Exclusive resource groups for scheduling are available for all DataWorks editions.
  • Limits:
    • Exclusive resource groups do not support cross-tenant collaboration nodes and Machine Learning experiment nodes.
    • Create exclusive resource groups in the same zone as your VPC. If an exclusive resource group and your VPC are in the same region but different zones, add a route between them. For more information, see Add a route.
    • Before you change the resource group for a node, make sure that the target resource group has the network connectivity that the node requires.
  • Scenarios:
    • Resources in exclusive resource groups are available to your nodes at any time, which ensures the data output of nodes. We recommend that you use exclusive resource groups for production nodes.
    • You must use exclusive resource groups in scenarios where a large number of nodes run and timely data output is required.
    • If nodes need to connect to the Internet or a VPC, use exclusive resource groups.
    • If you need to configure a whitelist for nodes to use external services, use exclusive resource groups.
    • If your VPC is connected to an Internet data center (IDC) and your nodes need to connect to the IDC, bind exclusive resource groups to the VPC.
  • Billing:

    When you use exclusive resource groups and more than 500 instances are scheduled, you are charged only a resource fee and no instance fee.
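The zone limit described above can be checked before you purchase resources. The following Python sketch assumes that an Alibaba Cloud zone ID consists of the region ID followed by a letter suffix (for example, `cn-shanghai-e` in the region `cn-shanghai`); `region_of` and `placement_action` are hypothetical helper names.

```python
def region_of(zone_id: str) -> str:
    """Derive the region ID from a zone ID.

    Assumption: a zone ID is the region ID plus a trailing letter, optionally
    separated by a hyphen, for example "cn-shanghai-e" -> "cn-shanghai".
    """
    if not zone_id or not zone_id[-1].isalpha():
        raise ValueError(f"unexpected zone ID: {zone_id!r}")
    return zone_id[:-1].rstrip("-")


def placement_action(group_zone: str, vpc_zone: str) -> str:
    """Describe what an exclusive resource group needs in order to reach a VPC."""
    if group_zone == vpc_zone:
        return "none"              # same zone: no extra configuration
    if region_of(group_zone) == region_of(vpc_zone):
        return "add route"         # same region, different zone: add a route
    return "different region"      # this topic describes routes only within a region
```

For example, `placement_action("cn-shanghai-e", "cn-shanghai-f")` returns `"add route"`.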

To change the resource group for one or more nodes to an exclusive resource group for scheduling, take note of the following items:
  • If you change the resource group for an auto triggered node in Operation Center, the change takes effect on the next day. To make the new resource group take effect immediately, run a test or generate retroactive data for the node.
  • Before you change the resource group for multiple nodes, run a test for each node separately.
  • If a node requires network connectivity, configure the network first and verify connectivity during testing.
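The timing rule in the first item can be modeled as follows. This Python sketch is a simplified, hypothetical model: it assumes that scheduled instances for a given day are generated in advance and therefore keep the old resource group on the day of the change, while test and retroactive instances created after the change use the new group. The function and argument names are illustrative.

```python
from datetime import date


def effective_group(kind: str, instance_date: date, change_date: date,
                    old_group: str, new_group: str) -> str:
    """Return the resource group that an instance of an auto triggered node uses
    after its resource group is changed in Operation Center (simplified model)."""
    if kind in ("test", "retroactive"):
        # Test runs and retroactive data use the new group immediately.
        return new_group
    if kind == "scheduled":
        # Scheduled instances pick up the change from the next day.
        return new_group if instance_date > change_date else old_group
    raise ValueError(f"unknown instance kind: {kind!r}")
```

Under this model, a scheduled instance on the day of the change still runs on the old group, which is why a test run or retroactive data is the way to confirm the new group right away.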
To change the resource group for one or more nodes to an exclusive resource group for scheduling, perform the following steps:
  1. Purchase exclusive resources for scheduling.
    1. Log on to the DataWorks console.
    2. In the left-side navigation pane, click Resource Groups. The Exclusive Resource Groups tab appears.
    3. Click Create a dedicated resource group.
    4. In the Create a dedicated resource group pane, select Exclusive Resource Groups for Resource Group Type and click Purchase next to Order Number to go to the buy page.
    5. On the buy page, set Region, Type, Exclusive scheduling resources, Units, and Duration as required and click Buy Now.
      Note
      • You must select Exclusive scheduling resources for Type.
      • You can determine the quantity of resources based on the actual number of concurrent nodes. For more information, see Scenario 3 in the Purchase guide topic.
      • We recommend that you create at least two exclusive resource groups for disaster recovery.
      • The exclusive resources that you purchase must be in the same region and zone as the data stores to be connected. In addition, exclusive resources can be used only by workspaces in the region where they are purchased. For example, exclusive resources that you purchase in the China (Shanghai) region can be used only by workspaces in the China (Shanghai) region.
    6. After you confirm that the order information is correct, select the check box to agree to the DataWorks Exclusive Resources Agreement of Service and click Pay.
  2. Create an exclusive resource group for scheduling.
    1. On the Exclusive Resource Groups tab, click Create a dedicated resource group.
    2. In the Create a dedicated resource group pane, set the parameters as required.
      • Resource Group Type: The type of the exclusive resource group. Valid values: Exclusive Resource Groups and Exclusive Resource Groups for Data Integration. The two types are used to schedule general nodes and sync nodes, respectively.
      • Resource Group Name: The name of the exclusive resource group. The name must be unique among all resource groups of a tenant.
        Note A tenant is an Alibaba Cloud account. A tenant may have multiple Resource Access Management (RAM) users.
      • Resource Group Description: The description of the exclusive resource group.
      • Order Number: The order number of the purchased exclusive resources. If you have not purchased exclusive resources, click Purchase next to Order Number to go to the buy page and purchase them.
      • Zone: The zone of the exclusive resource group. Select the zone where the data stores to be connected reside.
    3. After the configuration is completed, click OK.
      Note Initializing the exclusive resource group takes up to 20 minutes. Click Refresh to check whether its status has changed to Running.
  3. Find the target exclusive resource group and click Change Workspace in the Actions column.
  4. In the Modify home workspace dialog box, select the desired workspace and click OK.
  5. Change the resource group for a node to the created exclusive resource group for scheduling.
    You can change the resource group for a node in Operation Center or on the Scheduling configuration tab of the node.
    • To change the resource group for a node in Operation Center, perform the following steps:
       1. Go to the DataStudio page of a workspace. Click the icon in the upper-left corner and choose All Products > Task Operation > Operation Center. In Operation Center, switch to the target workspace.
      2. In the left-side navigation pane, choose Cycle Task Maintenance > Cycle Task.
       3. On the page that appears, click the rightwards arrow in the middle of the page to show the node list. Find the target node, click More in the Actions column, and then select Modifying a scheduling Resource Group. The Modifying a scheduling Resource Group dialog box appears.
         Notice You cannot change the resource group for zero-load nodes, workflow nodes, or Machine Learning experiment nodes.
         To change the resource group for multiple nodes at a time, select the nodes on the Cycle Task page and click Modifying a scheduling Resource Group in the lower part of the page. The Modify scheduling resource groups in batches dialog box appears.
       4. In the dialog box that appears, select the created exclusive resource group for scheduling and click OK.
    • To change the resource group for a node on the Scheduling configuration tab, perform the following steps:
       1. Click the icon in the upper-left corner and choose All Products > Data Development > DataStudio. Switch to the target workspace.
      2. On the Data Development tab of the DataStudio page, double-click the target workflow. On the workflow dashboard that appears, double-click the target node to go to the node configuration tab.
       3. In the right-side navigation pane, click the Scheduling configuration tab. In the Resource properties section, select the created exclusive resource group for scheduling.

      To run a test for the node on the DataStudio page, click the Run icon on the node configuration tab. In the Parameters dialog box, select the exclusive resource group for scheduling and click Confirm.

  6. After you change the resource group for the node, go to Operation Center and choose Cycle Task Maintenance > Cycle Task. On the page that appears, click the target node, right-click the node in the directed acyclic graph (DAG), and then select View Node Details to view the resource group of the node.

    After you change the resource group for a node, run a test or generate retroactive data for the node. Then, view the run logs to determine whether the node runs on the target exclusive resource group.

    If the node runs successfully and the logs show that it ran on the target exclusive resource group, the change has taken effect.

Exclusive resource groups for Data Integration

This section describes the limits and scenarios of exclusive resource groups for Data Integration.
  • Exclusive resource groups for Data Integration are available for all DataWorks editions.
  • Limits:
    • Exclusive resource groups for Data Integration are deployed in VPCs. If a data store is deployed on the classic network, you cannot use an exclusive resource group for Data Integration to migrate data from or to this data store.
    • Create exclusive resource groups for Data Integration in the same zone as your VPC. If an exclusive resource group and your VPC are in the same region but different zones, add a route between them. For more information, see Add a route.
  • Scenarios:
    • Resources in exclusive resource groups for Data Integration are available to your nodes at any time, which ensures the data output of nodes. We recommend that you use exclusive resource groups for production nodes.
    • You must use exclusive resource groups in scenarios where a large number of nodes run and timely data output is required.
    • If nodes need to connect to the Internet or a VPC, use exclusive resource groups.
    • If you need to configure a whitelist for nodes to use external services, use exclusive resource groups.
    • If your VPC is connected to an IDC and your nodes need to connect to the IDC, bind exclusive resource groups to the VPC.
Notice
  • If a node requires network connectivity, configure the network first and verify connectivity during testing.
  • You must change the resource group for a sync node to an exclusive resource group for Data Integration on the DataStudio page. After you deploy the node, the changed resource group immediately takes effect.
  • You cannot change the resource group for multiple nodes to an exclusive resource group for Data Integration in a single batch operation.
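To verify network connectivity during testing, you can run a simple TCP check from a node that runs on the target resource group, provided that the node type can execute Python code. The following sketch uses only the Python standard library; `can_connect` is a hypothetical helper, and the host and port of your data store are placeholders that you must supply.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        # create_connection resolves the host name and tries each returned address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, `can_connect("192.168.0.10", 3306)` checks whether a hypothetical MySQL data store at that internal address is reachable. A successful TCP connection shows only that the network path is open; it does not verify database credentials.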
To change the resource group for a node to an exclusive resource group for Data Integration, perform the following steps:
  1. Purchase exclusive resources for Data Integration and create an exclusive resource group for Data Integration. For more information, see Use exclusive resource groups for data integration.
  2. Change the resource group for a node to the created exclusive resource group for Data Integration.
    You can change the resource group for a sync node on the Data integration resource group configuration tab.
    1. Click the icon in the upper-left corner and choose All Products > Data Development > DataStudio. Switch to the target workspace.
    2. On the Data Development tab of the DataStudio page, double-click the target workflow. On the workflow dashboard that appears, double-click the target node to go to the node configuration tab.
    3. In the right-side navigation pane, click the Data integration resource group configuration tab.
    4. On the Data integration resource group configuration tab, set Programme to Exclusive data integration Resource Group and select the created exclusive resource group for Data Integration from the Exclusive data integration Resource Group drop-down list.
    5. Click the Save icon in the toolbar.
  3. After you change the resource group for the sync node, click the Save icon and the Submit icon.
  4. Deploy the node. For more information, see Deploy a node.
  5. After the node is deployed, go to the Cycle Task page in Operation Center. Run a test or generate retroactive data for the node. The operations are the same as those for a node whose resource group is changed to an exclusive resource group for scheduling.

VPC binding

  • Limits

    Create exclusive resource groups in the same zone as your VPC.

    If an exclusive resource group and your VPC are in the same region but different zones, add a route between them. For more information, see Add a route.

  • Scenarios
    • If nodes need to connect to your VPC, bind exclusive resource groups to the VPC.
    • If your VPC is connected to an IDC and your nodes need to connect to the IDC, bind exclusive resource groups to the VPC.
  • Procedure
    1. Go to the Resource Groups page in the DataWorks console. On the Exclusive Resource Groups tab, find the target exclusive resource group and click Add VPC Binding in the Actions column.

      Before you bind an exclusive resource group to your VPC, authorize DataWorks to access your cloud resources in the RAM console.

    2. After the authorization is completed, click Add Binding.
    3. In the Add VPC Binding pane, set the parameters as required and click Create.
      Notice
      • The exclusive resource group and the data stores to be connected must be in the same zone. Select the VSwitch to which the data stores are bound.
       • If no VSwitch or security group is available, click Create VSwitch or Create Security Group. Create the VSwitch or security group in the VPC, and make sure that the VSwitch is in the same zone as the data stores.
    4. To configure a whitelist for your cloud service instance, go back to the Exclusive Resource Groups tab. Find the target exclusive resource group and click View Information in the Actions column.
      • In the dialog box that appears, view the elastic IP address (EIP) and Classless Inter-Domain Routing (CIDR) block. Add the EIP and CIDR block to the whitelist of the target cloud service instance.
      • Add the internal CIDR block of the VSwitch that is bound to the cloud service instance to the whitelist of the cloud service instance.
      • If a problem occurs in the whitelist that is enabled for MaxCompute after a node is changed to run on an exclusive resource group, submit a ticket.
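Before you run nodes, you can check whether an address is covered by a whitelist. The following Python sketch uses the standard `ipaddress` module; `is_whitelisted` is a hypothetical helper, and the entries in the usage example are made-up stand-ins for the EIP and CIDR blocks shown in the View Information dialog box.

```python
import ipaddress
from typing import Iterable


def is_whitelisted(ip: str, whitelist: Iterable[str]) -> bool:
    """Return True if ip falls inside any whitelist entry.

    Each entry can be a single address ("47.100.1.2") or a CIDR block
    ("192.168.0.0/24"). A single address is treated as a /32 (or /128) network.
    """
    addr = ipaddress.ip_address(ip)
    for entry in whitelist:
        # strict=False tolerates entries written with host bits set, such as "192.168.0.5/24".
        if addr in ipaddress.ip_network(entry, strict=False):
            return True
    return False
```

For example, with the hypothetical whitelist `["47.100.1.2", "192.168.0.0/24"]`, `is_whitelisted("192.168.0.37", whitelist)` returns `True` because the address falls inside the CIDR block.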

O&M Assistant

O&M Assistant applies to scenarios where you need to install resource packages that are not preinstalled, such as third-party Python packages, or to regularly run special scripts.

Note that the installation directory is fixed. For more information, see O&M Assistant.

Custom resource groups

Notice Custom resource groups support only data integration nodes and Shell nodes.
  • Limits
    • You must activate DataWorks Enterprise Edition or a more advanced edition before you can submit a ticket to enable the whitelist and use custom resource groups for scheduling.
    • You must activate DataWorks Professional Edition or a more advanced edition to use custom resource groups for Data Integration.
  • Scenarios
    • Network: Your nodes need to connect to your IDC to synchronize data.
    • Environment: If your nodes require a specific environment, such as the latest Python version or a Java Development Kit (JDK), you can create a custom resource group from an Elastic Compute Service (ECS) instance that meets the requirements.
    • Migration: If your nodes already run on a local server, you can schedule them directly on that server, which reduces the workload of migrating scripts.
  • Operations
    • For more information about custom resource groups for Data Integration, see Add a custom resource group.
    • To create a custom resource group for scheduling, perform the following steps:
      1. In the DataWorks console, click Resource Groups in the left-side navigation pane. On the page that appears, click the Custom Resource Groups tab.
      2. Click Add scheduling resources in the upper-right corner.
      3. In the Add scheduling resources dialog box, enter a resource name, select the workspace, and then click Confirm.
      4. On the Custom Resource Groups tab, find the created custom resource group and click Server Management in the Operation column. In the Management Server dialog box, click Add server.
      5. In the Add scheduling resources dialog box, set the parameters as required and click Confirm.
        • Network type: The network type of the ECS instance to be used to create the custom resource group. Only VPC is supported.
        • ECS UUID: The universally unique identifier (UUID) of the ECS instance. To query the UUID, run the dmidecode | grep UUID command on the ECS instance.
        • Machine IP: The internal IP address of the ECS instance. To query it, log on to the ECS instance and run the hostname -i command.
      6. After the custom resource group is created and custom resources are added, go back to the Custom Resource Groups tab and refresh the page. Find the created custom resource group and click Server initialization in the Operation column. Log on to the ECS instance and follow the initialization procedure as prompted.
  • The procedure for changing the resource group for a node to a custom resource group is the same as the procedure for changing it to an exclusive resource group.
    • To change the resource group for a node to a custom resource group for scheduling, go to Operation Center.
    • To change the resource group for a node to a custom resource group for Data Integration, go to the DataStudio page, change the resource group, and then commit and deploy the node.
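The ECS UUID and Machine IP values needed when you add a server to a custom resource group can also be collected with a script on the ECS instance. The following Python sketch runs the same commands that the parameter descriptions mention (`dmidecode` and `hostname -i`) and parses their output. It assumes a Linux instance on which `dmidecode` is installed and run with root privileges; the function names are hypothetical, and you should confirm that the reported address is the internal VPC address of the instance.

```python
import ipaddress
import subprocess


def parse_uuid(dmidecode_output: str) -> str:
    """Return the value of the first "UUID:" line, like `dmidecode | grep UUID`."""
    for line in dmidecode_output.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key == "UUID":
            return value.strip()
    raise ValueError("no UUID line found in dmidecode output")


def parse_internal_ip(hostname_output: str) -> str:
    """Return the first non-loopback IPv4 address printed by `hostname -i`."""
    for token in hostname_output.split():
        try:
            addr = ipaddress.ip_address(token)
        except ValueError:
            continue  # skip tokens that are not plain IP addresses
        if addr.version == 4 and not addr.is_loopback:
            return str(addr)
    raise ValueError("no non-loopback IPv4 address found")


def collect_server_parameters() -> tuple:
    """Run the commands on the ECS instance and return (ECS UUID, Machine IP)."""
    dmi = subprocess.run(["dmidecode"], capture_output=True, text=True, check=True)
    host = subprocess.run(["hostname", "-i"], capture_output=True, text=True, check=True)
    return parse_uuid(dmi.stdout), parse_internal_ip(host.stdout)
```

Parsing is kept separate from running the commands so that the parsing logic can be checked against saved command output without root privileges.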