DataWorks supports default resource groups, exclusive resource groups, and custom resource groups. This topic describes how to use these resource groups and their scenarios.

DataWorks manages resources separately for scheduling and for Data Integration, and it supports default resource groups, exclusive resource groups, and custom resource groups for each. If you use Data Integration nodes in DataWorks, you must distinguish between resource groups for scheduling and resource groups for Data Integration.

Default resource groups

When a tenant activates DataWorks, the system automatically creates default resource groups. All workspaces of the tenant share default resource groups. You can use default resource groups to run nodes without additional configuration. However, you must specify which nodes run on default resource groups.
  • Limits
    • The maximum resources that default resource groups can schedule are fixed. Because all workspaces of a tenant share default resource groups, some workspaces may preempt resources. In this case, nodes may fail to obtain resources and may not be scheduled in a timely manner.
    • Default resource groups reside in the DataWorks shared cluster, where all tenants share resources. During peak hours, some tenants may preempt resources. In this case, nodes may fail to obtain resources and may not be scheduled in a timely manner.
  • Scenarios
    • Default resource groups are automatically created when a tenant activates DataWorks. You can use default resource groups to perform operations, for example, to develop data and run tests.
      Note Currently, DataWorks provides default resource groups for free.
    • Default resource groups are suitable for scenarios where a small number of nodes run and data output is not time-sensitive.
  • Features
    Default resource groups do not support the following nodes:
    • Nodes that need to be configured with a whitelist to access external services.
    • Nodes that need to access a Virtual Private Cloud (VPC).

    For nodes that need to access the Internet, we recommend that you use other resource groups.

    Default resource groups provide the security sandbox feature for nodes.

  • Billing

    When you use default resource groups, you are billed for instances and data synchronization threads in pay-as-you-go mode.

Exclusive resource groups for scheduling

Currently, DataWorks provides exclusive resource groups for scheduling and exclusive resource groups for Data Integration. Exclusive resource groups for Data Integration are MaxCompute computing resource groups. Data Integration is a DataWorks service. You must purchase exclusive resources for Data Integration together with exclusive resources for scheduling.
  • Exclusive resource groups for scheduling are available for all DataWorks editions.
  • Limits
    • Currently, exclusive resource groups do not support cross-tenant collaboration nodes and Machine Learning experiment nodes.
    • You must purchase exclusive resources in the same zone as your VPC.
    • Before you change the resource group for a node, make sure that the network connectivity required by the node is available on the target resource group.
  • Scenarios
    • The resources in exclusive resource groups can be scheduled at any time to guarantee the data output of nodes. We recommend that you use exclusive resource groups for production nodes.
    • You must use exclusive resource groups in scenarios where a large number of nodes are run and data output is required in a timely manner.
    • If nodes need to access the Internet or a VPC, use exclusive resource groups.
    • If you need to configure a whitelist for nodes to access external services, use exclusive resource groups.
    • If your VPC is connected to an Internet data center (IDC) and your nodes need to access the IDC, bind exclusive resource groups to the VPC.
  • Billing

    If more than 500 instances are scheduled, you are charged only a resource fee, not an instance fee, when you use exclusive resources.
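The Features list for default resource groups and the Scenarios list for exclusive resource groups amount to a small decision rule. The following is a minimal sketch of that rule, not a DataWorks API: the `NodeRequirements` class, its field names, and the `recommend_resource_group` function are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class NodeRequirements:
    """Hypothetical description of what a node needs; not a DataWorks type."""
    needs_vpc_access: bool = False
    needs_internet_access: bool = False
    needs_whitelist: bool = False
    is_production: bool = False
    needs_timely_output: bool = False


def recommend_resource_group(req: NodeRequirements) -> str:
    """Return 'exclusive' or 'default' based on the guidance in this topic."""
    # Default resource groups do not support VPC access or whitelisted access.
    if req.needs_vpc_access or req.needs_whitelist:
        return "exclusive"
    # Internet access, production nodes, and timely data output are the cases
    # for which this topic recommends exclusive resource groups.
    if req.needs_internet_access or req.is_production or req.needs_timely_output:
        return "exclusive"
    # Small development and test workloads with no demand for timeliness.
    return "default"


print(recommend_resource_group(NodeRequirements(needs_vpc_access=True)))
print(recommend_resource_group(NodeRequirements()))
```

The sketch covers only default and exclusive resource groups. Custom resource groups, edition limits, and the unsupported node types listed under Limits would need additional checks.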

To change the resource group for one or more nodes to an exclusive resource group for scheduling, follow these steps:
Notice
  • If you change the resource group for an auto triggered node in Operation Center, the new resource group takes effect on the next day. To use the new resource group immediately, run a test or generate retroactive data for the node.
  • Before you change the resource group for multiple nodes, run a test for each node separately.
  • If a node requires network connectivity, configure the network connectivity first and verify it during testing.
  1. Purchase exclusive resources for scheduling.
    1. Log on to the DataWorks console.
    2. In the left-side navigation pane, click Resource Groups. The Exclusive Resource Groups tab appears.
    3. Click Add Exclusive Resource Group.
    4. In the Add Exclusive Resource Group pane that appears, select Exclusive Resource Groups for Resource Group Type and click Purchase next to Order Number to go to the purchase page.
    5. On the purchase page, set the region, exclusive resource type, exclusive resource specifications, quantity of resources, and billing cycle as required and click Buy Now.
      Note
      • You must select exclusive resources for scheduling as the exclusive resource group type.
      • You can determine the quantity of resources based on the actual number of concurrent nodes. For more information, see Scenario 3 in the Purchase guide topic.
      • We recommend that you create at least two exclusive resource groups to implement disaster recovery.
      • The exclusive resources that you purchase and the data stores to be accessed must be in the same region and zone. For example, the exclusive resources purchased in the China (Shanghai) region can only be used by workspaces in the China (Shanghai) region.
    6. After you confirm that the order information is correct, select the check box for DataWorks Exclusive Resources Agreement of Service and click Pay.
  2. Create an exclusive resource group for scheduling.
    1. On the Exclusive Resource Groups tab, click Add Exclusive Resource Group.
    2. In the Add Exclusive Resource Group pane that appears, set relevant parameters.
      Parameter | Description
      Resource Group Type | The type of the exclusive resource group. The valid values are Exclusive Resource Groups and Exclusive Resource Groups for Data Integration. The two types of exclusive resource groups are applicable to general node scheduling and data synchronization, respectively.
      Resource Group Name | The name of the exclusive resource group. The name must be unique within all resource groups of a tenant. Note: A tenant account indicates an Alibaba Cloud account. Multiple Resource Access Management (RAM) users may exist under a tenant account.
      Resource Group Description | The description of the exclusive resource group.
      Order Number | The order number of the exclusive resource group. If you have not purchased exclusive resources, click Purchase next to Order Number to go to the purchase page and purchase exclusive resources.
      Zone | The zone of the exclusive resource group. Select a zone where the data stores to be accessed reside.
    3. After the configuration is completed, click Create to create the exclusive resource group.
      Note The exclusive resource group is initialized within 20 minutes. Wait and then click the Refresh button to confirm that its status is updated to Running.
  3. Click Change Workspace in the Actions column of the exclusive resource group. In the Change Workspace dialog box that appears, select the desired workspace and click OK.
  4. Go to the DataStudio page of a workspace. Click the icon in the upper-left corner and choose All Products > Operation Center. On the Operation Center page that appears, switch to the target workspace.
  5. In the left-side navigation pane, choose Cycle Task Maintenance > Cycle Task. Select one or more nodes whose resource group you want to change and choose More > Change Resource Group in the lower-left corner of the page.
    Notice Changing the resource group is not supported for nodes such as zero-load nodes, workflow nodes, and Machine Learning experiment nodes. Do not select these nodes.
  6. In the Change Resource Group dialog box that appears, select an exclusive resource group for scheduling and click OK.
  7. View the DAG of the target auto triggered node, right-click the node in the DAG, and then select View Node Details to view the resource group information.

    After you change the resource group for a node, proceed to run a test or generate retroactive data for the node. You can view the operational logs to determine whether the node runs on the target exclusive resource group.

    If the node runs as expected and the logs show the target exclusive resource group, the node is running on the target exclusive resource group.
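The Note in step 2 says that a new exclusive resource group is initialized within 20 minutes and that you refresh the page until its status is Running. If you track that wait in a script, a bounded polling loop is one option. The following is a minimal sketch: `wait_until_running` and the `get_status` callable are hypothetical, the "Initializing" status string is an assumption, and the sketch does not call any DataWorks API.

```python
import time
from typing import Callable


def wait_until_running(get_status: Callable[[], str],
                       timeout_s: float = 20 * 60,
                       interval_s: float = 30.0,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll get_status until it returns 'Running' or the timeout elapses."""
    waited = 0.0
    while True:
        if get_status() == "Running":
            return True
        if waited >= timeout_s:
            return False
        # Wait before the next check; the sleep function is injectable so the
        # loop can be exercised without real delays.
        sleep(interval_s)
        waited += interval_s


# Simulated status source; a real one would read the status from wherever
# you obtain it (hypothetical).
statuses = iter(["Initializing", "Initializing", "Running"])
print(wait_until_running(lambda: next(statuses), sleep=lambda s: None))
```

The 20-minute default mirrors the initialization window stated in the Note. The 30-second interval is an arbitrary choice.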

To change the resource group for multiple nodes of the same type to an exclusive resource group, go to the Cycle Task page in Operation Center. Select the target node type from the Node Type drop-down list, select multiple nodes, and then click Change Resource Group at the bottom of the page.

In the Change Resource Group dialog box that appears, select an exclusive resource group for scheduling and click OK.
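Before a batch change, it can help to separate the nodes that cannot change their resource group from the ones that can. The following is a minimal sketch of that filtering step. The node records and the type labels `zero_load`, `workflow`, and `ml_experiment` are hypothetical stand-ins for the node types named in the Notice in step 5, and that Notice says "such as", so the set may be incomplete.

```python
from typing import Dict, List, Tuple

# Node types that the Notice lists as not supporting a resource group change.
# The string labels are hypothetical; DataWorks may use different identifiers.
UNSUPPORTED_TYPES = {"zero_load", "workflow", "ml_experiment"}


def split_by_eligibility(nodes: List[Dict[str, str]]) -> Tuple[List[str], List[str]]:
    """Return (eligible, skipped) node names for a batch resource group change."""
    eligible: List[str] = []
    skipped: List[str] = []
    for node in nodes:
        target = skipped if node["type"] in UNSUPPORTED_TYPES else eligible
        target.append(node["name"])
    return eligible, skipped


# Illustrative node list; names and types are made up.
nodes = [
    {"name": "daily_orders", "type": "odps_sql"},
    {"name": "start", "type": "zero_load"},
    {"name": "cleanup", "type": "shell"},
]
eligible, skipped = split_by_eligibility(nodes)
print("change:", eligible)
print("skip:", skipped)
```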

Exclusive resource groups for Data Integration

  • Exclusive resource groups for Data Integration are available for all DataWorks editions.
  • Limits
    • Exclusive resource groups for Data Integration are deployed in VPCs. If a data store is deployed on the classic network, you cannot use an exclusive resource group for Data Integration to migrate data from or to this data store.
    • You must purchase exclusive resources in the same zone as your VPC.
  • Scenarios
    • The resources in exclusive resource groups for Data Integration can be scheduled at any time to guarantee the data output of nodes. We recommend that you use exclusive resource groups for production nodes.
    • You must use exclusive resource groups in scenarios where a large number of nodes are run and data output is required in a timely manner.
    • If nodes need to access the Internet or a VPC, use exclusive resource groups.
    • If you need to configure a whitelist for nodes to access external services, use exclusive resource groups.
    • If your VPC is connected to an IDC and your nodes need to access the IDC, bind exclusive resource groups to the VPC.
To change the resource group for a node to an exclusive resource group for Data Integration, follow these steps:
Notice
  • Before you change the resource group for multiple nodes, run a test for each node separately.
  • If a node requires network connectivity, configure the network connectivity first and verify it during testing.
  • You must change the resource group for a sync node to an exclusive resource group for Data Integration on the DataStudio page. After you deploy the node, the changed resource group takes effect immediately.
  1. Purchase exclusive resources for Data Integration and create an exclusive resource group for Data Integration. For more information, see Use exclusive resource groups for data integration.
  2. Go to the DataStudio page. Double-click the target sync node in the workflow.
  3. After you change the resource group for the sync node, click the Save icon and then the Submit icon.
  4. Deploy the node. For more information, see Deploy a node.
  5. After the node is deployed, go to the Cycle Task page in Operation Center. Run a test or generate retroactive data for the node. The operations are the same as those for a node whose resource group is changed to an exclusive resource group for scheduling.
Notice Currently, you cannot change the resource group for multiple nodes to an exclusive resource group for Data Integration in a single operation.

VPC binding

  • Limits

    You must create exclusive resource groups in the same zone as your VPC.

    If an exclusive resource group is created in a different zone from your VPC, we recommend that you re-create the exclusive resource group. If you cannot re-create the exclusive resource group, submit a ticket.

  • Scenarios
    • If nodes need to access your VPC, bind exclusive resource groups to the VPC.
    • If your VPC is connected to an IDC and your nodes need to access the IDC, bind exclusive resource groups to the VPC.
  • Procedure
    1. Go to the Resource Groups page in the DataWorks console. On the Exclusive Resource Groups tab, find the target exclusive resource group and click Add VPC Binding in the Actions column.

      Before binding the exclusive resource group to your VPC, authorize DataWorks to access your cloud resources in the RAM console.

    2. After the authorization is completed, click Add Binding.
    3. In the Add VPC Binding dialog box that appears, set relevant parameters and click Create.
      Notice
      • The exclusive resource group and the data stores to be accessed must be in the same zone. Select the VSwitch to which the data stores are bound.
      • If no VSwitch or security group is available, click Create VSwitch or Create Security Group. Be sure to create a VSwitch or a security group in the same zone of the VPC.
    4. To configure a whitelist for your cloud service instance, go back to the Exclusive Resource Groups tab. Find the target exclusive resource group and click View Information in the Actions column.
      • In the dialog box that appears, view the Elastic IP Address (EIP) and Classless Inter-Domain Routing (CIDR) block. Add the EIP and CIDR block to the whitelist of the target cloud service instance.
      • Add the internal CIDR block of the VSwitch bound to the cloud service instance to the whitelist of the cloud service instance.
      • If a problem occurs in the whitelist enabled for MaxCompute after a node is changed to run on an exclusive resource group, submit a ticket.
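Step 4 adds the EIP and CIDR blocks of the exclusive resource group to the whitelist of the cloud service instance. Before you paste the values, you can check that they are well formed and that a given source address would be covered. The following is a minimal sketch that uses Python's standard `ipaddress` module. The function names and all addresses are illustrative assumptions, not values from DataWorks.

```python
import ipaddress
from typing import Iterable, List, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def build_whitelist(eip: str, cidr_blocks: Iterable[str]) -> List[Network]:
    """Combine the EIP (as a single-address network) with the CIDR blocks."""
    # ip_network() raises ValueError for malformed input or for a CIDR block
    # that has host bits set, which catches typos early.
    entries: List[Network] = [ipaddress.ip_network(eip)]
    entries.extend(ipaddress.ip_network(block) for block in cidr_blocks)
    return entries


def is_allowed(address: str, whitelist: Iterable[Network]) -> bool:
    """Return True if the address falls inside any whitelist entry."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in whitelist)


# Illustrative values only: a documentation-range EIP and a private CIDR block.
whitelist = build_whitelist("203.0.113.10", ["192.168.0.0/24"])
print(is_allowed("192.168.0.17", whitelist))
print(is_allowed("203.0.113.11", whitelist))
```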

O&M Assistant

O&M Assistant is intended for scenarios where required packages, such as third-party Python packages, are not installed and special scripts are used regularly.

Note that the installation directory is fixed. For more information, see O&M Assistant.

Custom resource groups

Notice Currently, custom resource groups support only Data Integration nodes and Shell nodes.
  • Limits on editions
    • You must activate DataWorks Enterprise Edition or higher before you can submit a ticket to enable the whitelist and use custom resource groups for scheduling.
    • You must activate DataWorks Professional Edition or higher to use custom resource groups for Data Integration.
  • Scenarios
    • Network: When synchronizing data, you need to access your IDC.
    • Environment: If the latest Python version or a Java Development Kit (JDK) is required, you can use an Elastic Compute Service (ECS) instance that meets the environment requirements as custom resources.
    • Migration: If a local node already exists, you can directly schedule the node on your own server to reduce the workload of script migration.
  • Operations
    • For more information about custom resource groups for Data Integration, see Add a custom resource group.
    • To create a custom resource group for scheduling, follow these steps:
      1. In the DataWorks console, click Resource Groups in the left-side navigation pane. On the page that appears, click the Custom Resource Groups tab.
      2. Click Add Scheduling Resource in the upper-right corner. In the Add Scheduling Resource dialog box that appears, enter a resource name, select the workspace, and then click OK.
      3. On the Custom Resource Groups tab, find the created custom resource group and click Manage Server in the Actions column. In the Manage Server dialog box that appears, click Add Server.
      4. In the Add Scheduling Resource dialog box that appears, set relevant parameters and click OK.
        Parameter | Description
        Network Type | The network type of the ECS instance to be added as custom resources. Currently, only VPC is supported.
        ECS UUID | The universally unique identifier (UUID) of the ECS instance. You can run the dmidecode | grep UUID command on the ECS instance to query the UUID.
        Server IP address | The internal IP address of the ECS instance. You can log on to the ECS instance and run the hostname -i command to query the internal IP address.
      5. After the custom resource group is created and custom resources are added, go back to the Custom Resource Groups tab and click the Refresh button. Find the created custom resource group and click Initialize Server in the Actions column. Log on to the ECS instance and follow the initialization procedure as prompted.
  • The procedure for changing the resource group for a node to a custom resource group is the same as the procedure for changing it to an exclusive resource group.
    • To change the resource group for a node to a custom resource group for scheduling, go to Operation Center.
    • To change the resource group for a node to a custom resource group for Data Integration, go to the DataStudio page to change the resource group and then commit and deploy the node.
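When you add a server to a custom resource group, you enter the ECS UUID and the internal IP address that the dmidecode | grep UUID and hostname -i commands return. The following is a minimal sketch that checks both values before you enter them. The function names and the sample command output are assumptions for illustration. The sketch assumes that the dmidecode output contains a line of the form "UUID: <value>" and that hostname -i returns a single address.

```python
import ipaddress
import re
import uuid

# Matches a line such as "UUID: 4C4C4544-0032-4A10-8034-B9C04F4B4D32".
UUID_PATTERN = re.compile(r"UUID:\s*([0-9A-Fa-f-]{36})")


def parse_ecs_uuid(dmidecode_output: str) -> str:
    """Extract the UUID from the output of `dmidecode | grep UUID`."""
    match = UUID_PATTERN.search(dmidecode_output)
    if match is None:
        raise ValueError("no UUID line found")
    # uuid.UUID raises ValueError if the captured text is not a valid UUID.
    uuid.UUID(match.group(1))
    return match.group(1)


def is_internal_ip(address: str) -> bool:
    """Return True if the address is private and not a loopback address."""
    ip = ipaddress.ip_address(address.strip())
    return ip.is_private and not ip.is_loopback


# Made-up sample values, not taken from a real ECS instance.
sample_output = "\tUUID: 4C4C4544-0032-4A10-8034-B9C04F4B4D32\n"
print(parse_ecs_uuid(sample_output))
print(is_internal_ip("192.168.1.20"))
```

The loopback check guards against a host configuration in which hostname -i resolves to 127.0.0.1, which would not be reachable from DataWorks.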