This topic describes how to create a custom resource group for Data Integration and use this resource group to run batch synchronization nodes.

Prerequisites

DataWorks Professional Edition or a more advanced edition is purchased so that you can use a custom resource group for Data Integration.

Background information

If the shared resource group cannot connect to the data source that you want to access, you can create a custom resource group to run synchronization nodes to speed up data transmission.

A workspace administrator can create or modify a custom resource group on the Custom Resource Groups page of Data Integration.
Note
  • When you register an Elastic Compute Service (ECS) instance to host a custom resource group, you can set the network type to Classic Network only if the ECS instance is in the China (Shanghai) region. In this case, you must enter the hostname of the ECS instance. For ECS instances in other regions, you can set the network type only to VPC. In this case, you must enter the universally unique identifier (UUID) of the ECS instance to be registered. We recommend that you set the network type to VPC.
  • Only an administrator has permissions to access specific files on the ECS instance that hosts a custom resource group. For example, a workspace administrator can call shell or Structured Query Language (SQL) files on the purchased ECS instance when the workspace administrator runs a shell node.
  • Resource groups for scheduling are used to schedule nodes. These resource groups have limited resources and are not suitable for computing nodes. Therefore, we recommend that you do not create a custom resource group on the ECS instances that host a resource group for scheduling. MaxCompute can process large amounts of data. We recommend that you use MaxCompute for big data computing.
A custom resource group for Data Integration is subject to the following limits:
  • The system time of the ECS instance that hosts a custom resource group for Data Integration must be within 2 minutes of the current Internet time. Otherwise, service requests may time out, and nodes that are run on the custom resource group for Data Integration may fail.
  • You can add only one custom resource group for Data Integration to one ECS instance. You can select only one network type for each custom resource group for Data Integration.
  • A custom resource group for Data Integration that you added on the Custom Resource Groups page of Data Integration can only run synchronization nodes in the current workspace. The custom resource group for Data Integration does not appear in the resource group list on the Resource Groups page.

    A custom resource group for Data Integration that you added on the Custom Resource Groups page cannot run synchronization nodes in a manually triggered workflow.
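The 2-minute time limit above can be checked with a small script before you register the ECS instance. The following is a minimal sketch, not part of DataWorks: the function name clock_drift_ok is hypothetical, both arguments are epoch timestamps in seconds, and a difference of exactly 120 seconds is assumed to be acceptable.

```shell
#!/bin/sh
# Hypothetical helper (not a DataWorks command): succeeds when two epoch
# timestamps, in seconds, differ by no more than 120 seconds.
clock_drift_ok() {
    local_epoch=$1
    reference_epoch=$2
    drift=$((local_epoch - reference_epoch))
    # Take the absolute value of the difference.
    if [ "$drift" -lt 0 ]; then
        drift=$((0 - drift))
    fi
    # Assumption: a drift of exactly 120 seconds still counts as within the limit.
    [ "$drift" -le 120 ]
}
```

For example, run clock_drift_ok "$(date +%s)" "$reference_epoch", where reference_epoch is a timestamp that you obtain from a time source you trust. How you obtain that reference value is left open here.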

If the timeout error message "response code is not 200" appears in the log file of alisatasknode, the custom resource group for Data Integration was not accessible within a specific period of time. This usually occurs because the service request API was unstable during that period. The ECS instance that hosts the custom resource group for Data Integration can continue to work if the error persists for no more than 10 minutes. To find the error details, view the heartbeat.log file in the /home/admin/alisatasknode/logs directory.
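To see how often the timeout message occurs, you can count the matching lines in the log file. The following is a minimal sketch: the function name count_timeout_errors is hypothetical, and it assumes that the message text appears literally in each affected log line. The exact format of the log lines is not described in this topic.

```shell
#!/bin/sh
# Hypothetical helper: prints the number of lines in a log file that
# contain the timeout message quoted in this topic.
count_timeout_errors() {
    log_file=$1
    # -c prints the count of matching lines; -F matches a fixed string.
    # grep exits with status 1 when the count is 0, so "|| true" keeps
    # the function from failing in scripts that use "set -e".
    grep -c -F "response code is not 200" "$log_file" || true
}
```

For example, run count_timeout_errors /home/admin/alisatasknode/logs/heartbeat.log on the ECS instance that hosts the resource group.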

Purchase an ECS instance

Purchase an ECS instance.
Note
  • CentOS V6, CentOS V7, or AliOS is recommended.
  • If the added ECS instance needs to run MaxCompute nodes or synchronization nodes, verify that the current Python version of the ECS instance is 2.6 or 2.7. The Python version of CentOS V5 is 2.4, whereas the Python version for other operating systems is later than 2.6.
  • Make sure that the ECS instance can access the Internet. To check, run the ping www.alibabacloud.com command on the ECS instance.
  • We recommend that you purchase an ECS instance with 8 vCPUs and 16 GiB of memory.
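The Python version requirement in the preceding notes can be verified with a short script. The following is a minimal sketch: the function name python_version_ok is hypothetical, and it accepts a version string such as 2.7.5 and succeeds only for 2.6 and 2.7 releases.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the given version string is a
# Python 2.6 or 2.7 release, for example "2.6", "2.6.6", or "2.7.5".
python_version_ok() {
    case "$1" in
        2.6|2.6.*|2.7|2.7.*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Python 2 prints its version to standard error, so one way to pass the version of the default interpreter is python_version_ok "$(python -V 2>&1 | awk '{print $2}')". This assumes that the python command on the ECS instance is the interpreter that the nodes use.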

View the hostname and internal IP address of the ECS instance

Log on to the ECS console. In the left-side navigation pane, choose Instances & Images > Instances. On the Instances page, view the hostname and IP address of the purchased ECS instance.

Enable port 8000

Note If your ECS instance is in a virtual private cloud (VPC), you do not need to enable port 8000. The steps in this section apply only to ECS instances in the classic network.

To enable port 8000 for reading logs, perform the following steps:

  1. Log on to the ECS console. In the left-side navigation pane, choose Network & Security > Security Groups.
  2. On the Security Groups page, find the security group that you want to manage and click Add Rules in the Actions column.
  3. On the Security Group Rules page, click the Inbound tab and click Add Rule.
  4. In the new row that appears, set the Port Range parameter to 8000/8000 and the Authorization Object parameter to the fixed IP address of Data Integration.
  5. Click Save.

Create a custom resource group for Data Integration

  1. Go to the Data Integration page.
    1. Log on to the DataWorks console.
    2. In the left-side navigation pane, click Workspaces.
    3. After you select the region where the required workspace resides, find the workspace and click Data Integration in the Actions column.
  2. In the left-side navigation pane, choose Configuration > Custom Resource Group. The Custom Resource Groups page appears.
  3. On the Custom Resource Groups page, click Add Resource Group in the upper-right corner.
    Notice By default, the Custom Resource Groups page displays only the custom resource group that you create and does not display the shared resource group.
  4. In the Add Resource Group wizard, perform the following steps:
    1. In the Create Resource Group step, configure the Resource Group Name parameter.
      Note The name can contain letters, digits, and underscores (_) and must start with a letter.
    2. Click Next.
    3. In the Add Server step, configure the parameters.
      Network Type: The network type of the ECS instance that hosts your custom resource group. You can set Network Type to Classic Network only if the ECS instance is in the China (Shanghai) region. If your ECS instance is in a region other than the China (Shanghai) region, set this parameter to VPC.
      Server Name or ECS UUID: The hostname or the UUID of the ECS instance that hosts the custom resource group.
      • If you set Network Type to Classic Network, you must configure the Server Name parameter.

        To obtain the hostname, log on to the ECS instance and run the hostname command.

      • If you set Network Type to VPC, you must configure the ECS UUID parameter.

        To obtain the UUID, log on to the ECS instance and run the dmidecode | grep UUID command.

      Server IP Address: The private IP address of the ECS instance.
      Server CPU (Cores): The number of vCPUs of the ECS instance. We recommend that you configure at least four vCPUs for an ECS instance that hosts a custom resource group.
      Server RAM (GB): The memory of the ECS instance. We recommend that you configure at least 8 GB of random access memory (RAM) and 80 GB of disk space for an ECS instance that hosts a custom resource group.
    4. Click Next.
    5. Perform the steps that are listed in the Install Agent step.
      Note If an error occurs when you run the install.sh command or you need to run it again, run the rm -rf install.sh command in the directory that contains the install.sh file to delete the generated file. Then, run the install.sh command again.

      The commands required during the installation and initialization process differ for each user. Run relevant commands based on the instructions on the initialization interface.

    6. Click Next.
    7. In the Test Connection step, click Refresh and check the status of the ECS instance.
    8. Click Complete.
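The naming rule in the Create Resource Group step can be checked before you open the wizard. The following is a minimal sketch: the function name valid_group_name is hypothetical, and "letters" is assumed to mean ASCII letters. This topic does not state a length limit, so none is enforced.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the name starts with a letter and
# contains only letters, digits, and underscores.
valid_group_name() {
    # LC_ALL=C keeps the A-Z and a-z ranges limited to ASCII letters.
    printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[A-Za-z][A-Za-z0-9_]*$'
}
```

For example, valid_group_name "di_group_01" succeeds, whereas valid_group_name "01_group" fails because the name starts with a digit.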
If the ECS instance remains in the Stopped state after the preceding steps are performed, the hostname may not be bound to an IP address, as shown in the following figure. To bind the hostname to the IP address, perform the following steps:
  1. Log on to the ECS instance by using Secure Shell (SSH) as the admin user.
  2. Run the hostname -i command to view the binding information of the hostname.
  3. Run the vim /etc/hosts command to add the internal IP address and hostname of the ECS instance.
  4. Refresh the page to check whether the ECS instance is registered.
    If the ECS instance is still in the Stopped state after you refresh the page, perform the following steps to restart alisatasknode:
    1. Log on to the ECS instance by using SSH as the admin user.
    2. Run the following command:
      /home/admin/alisatasknode/target/alisatasknode/bin/serverctl restart
      Note You must enter your AccessKey pair when you run this command. Keep your AccessKey secret strictly confidential.
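After you edit /etc/hosts, you can confirm that the binding is in place. The following is a minimal sketch: the function name host_is_bound is hypothetical, and it assumes the standard hosts file layout, in which each line starts with an IP address that is followed by one or more names.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when a hosts-format file contains a line
# that maps the given IP address to the given hostname.
host_is_bound() {
    hosts_file=$1
    ip=$2
    name=$3
    awk -v ip="$ip" -v name="$name" '
        $1 == ip {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break      # ignore a trailing comment
                if ($i == name) found = 1
            }
        }
        END { exit found ? 0 : 1 }
    ' "$hosts_file"
}
```

For example, run host_is_bound /etc/hosts "$ip" "$(hostname)", where ip is the internal IP address of the ECS instance. Commented-out lines are ignored because their first field is not an IP address.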

Configure the resource group for Data Integration

  1. Click the Icon icon in the upper-left corner of the Data Integration page and choose All Products > Data Development > DataStudio.
  2. In the upper-left corner of the page that appears, select the workspace to which your resource group for Data Integration belongs.
  3. On the DataStudio page, find a batch synchronization node that is run on the resource group for Data Integration that you want to configure and double-click the node.
  4. In the right-side navigation pane of the configuration tab of the node, click Resource Group configuration.
  5. In the lower-right corner of the Resource Group configuration tab, click More.
  6. On the Resource Group configuration tab, configure the Programme parameter and select a resource group based on your business requirements.
  7. On the configuration tab of the node, click the Save icon in the toolbar.

Manage custom resource groups for Data Integration

After you create and configure a custom resource group for Data Integration on the Custom Resource Groups page of Data Integration, you can view the information about and perform operations on the resource group. You can view the network type of the resource group and the ECS instance that hosts the resource group. You can perform the following operations on the resource group: initialize the ECS instance that hosts the resource group, manage the resource group, and delete the resource group.
  • Manage: allows you to view the IP address, status, and usage of the ECS instance that hosts the resource group. You can also change or delete the ECS instance that hosts the resource group or add an ECS instance for the resource group. You can check how to add an ECS instance in the Add Resource Group wizard.
    Note
    • If the value of Resource Usage for an ECS instance is not 0%, nodes are running on the ECS instance that hosts the resource group.
    • After you add an ECS instance for a resource group, you must initialize the ECS instance.
  • Initialize Server: After you add an ECS instance for the resource group, you must initialize the ECS instance.
    Click Initialize Server and perform the steps that are described in the following figure to initialize the ECS instance.
  • Delete: allows you to delete a custom resource group for Data Integration.
    Note DataWorks does not allow you to delete a resource group on which nodes are run. Before you delete a resource group, make sure that no nodes that are in the Running state exist in the resource group.

    To view the status of a node, choose Operation Center > Cycle Task Maintenance > Cycle Task, filter nodes by resource group name, and then view the status of the nodes. For more information, see View auto triggered nodes.