This topic describes how to create a custom resource group for Data Integration and use this resource group to run batch synchronization nodes.

Prerequisites

DataWorks Professional Edition or a more advanced edition is purchased so that you can use custom resource groups for Data Integration.

Background information

If the shared resource groups of DataWorks do not support your data sources or you want to speed up data transmission, you can create custom resource groups to run your synchronization nodes.

A workspace administrator can create or modify custom resource groups on the Custom Resource Groups page of Data Integration.
Note
  • When you register an Elastic Compute Service (ECS) instance to host a custom resource group, we recommend that you set the network type to VPC. You can set the network type to Classic Network only if the ECS instance is in the China (Shanghai) region. In this case, you must enter the hostname of the ECS instance. For ECS instances in other regions, VPC is the only available network type. If you set the network type to VPC, you must enter the universally unique identifier (UUID) of the ECS instance to be registered.
  • Only the administrator has the permission to access specific files on the ECS instance that hosts a custom resource group. For example, you must use the admin account to call shell or Structured Query Language (SQL) files on the ECS instance when you write a shell script for a node.
  • Resource groups for scheduling are used to run nodes. These resource groups have limited resources and are not suitable for computing nodes. Therefore, we recommend that you do not create custom resource groups on the ECS instances that host a resource group for scheduling. MaxCompute can process large amounts of data. We recommend that you use MaxCompute for big data computing.
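The network type rule in the first note can be summarized as a small lookup. The following Python sketch is illustrative only and is not part of DataWorks. The function name required_identifier is hypothetical, and the region ID cn-shanghai for the China (Shanghai) region is an assumption about how regions are identified in your own tooling.

```python
# Assumed region ID for the China (Shanghai) region.
CLASSIC_NETWORK_REGION = "cn-shanghai"


def required_identifier(region_id, network_type):
    """Return the field to fill in when you register the ECS instance.

    Raises ValueError for a combination that the note rules out.
    """
    if network_type == "VPC":
        # VPC registrations are identified by the UUID of the ECS instance.
        return "ECS UUID"
    if network_type == "Classic Network":
        if region_id != CLASSIC_NETWORK_REGION:
            raise ValueError(
                "Classic Network is available only in the China (Shanghai) region"
            )
        # Classic network registrations are identified by the hostname.
        return "Server Name"
    raise ValueError("unknown network type: %r" % (network_type,))
```

For example, required_identifier("cn-shanghai", "Classic Network") returns "Server Name", whereas any region combined with "VPC" returns "ECS UUID".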
Custom resource groups for Data Integration are subject to the following limits:
  • The system time of the ECS instance that hosts a custom resource group for Data Integration must be within 2 minutes of the current Internet time. Otherwise, service requests may time out, and nodes that run on the custom resource group for Data Integration may fail.
  • You can add only one custom resource group for Data Integration to one ECS instance. You can select only one network type for each custom resource group for Data Integration.
  • Custom resource groups for Data Integration that you added on the Custom Resource Groups page of Data Integration can only run synchronization nodes in the current workspace. They do not appear in the resource group list.

    Custom resource groups for Data Integration that you added on the Custom Resource Groups page cannot run synchronization nodes in a manually triggered workflow.
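The 2-minute time limit can be checked before you register an ECS instance. The following Python sketch is one possible way to do so and rests on assumptions: it treats the Date header of an HTTP response as the reference Internet time, and the function names are hypothetical. A Network Time Protocol (NTP) client gives a more precise reference if one is available.

```python
from email.utils import parsedate_to_datetime

MAX_SKEW_SECONDS = 120  # the 2-minute limit stated in the list above


def clock_skew_seconds(local_epoch, http_date_header):
    """Return the absolute difference, in seconds, between a local
    timestamp and the time in an HTTP Date header."""
    reference = parsedate_to_datetime(http_date_header).timestamp()
    return abs(local_epoch - reference)


def clock_within_limit(local_epoch, http_date_header, limit=MAX_SKEW_SECONDS):
    """Return True if the local clock is within the allowed skew."""
    return clock_skew_seconds(local_epoch, http_date_header) <= limit
```

To use the sketch, pass the value of time.time() taken on the ECS instance together with a Date header that you received at the same moment, for example a string in the form "Wed, 01 Jan 2020 00:00:00 GMT".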

If the timeout error message "response code is not 200" appears in the log file of alisatasknode, the custom resource group for Data Integration was not accessible within a specific period. The ECS instance that hosts the custom resource group for Data Integration can continue to work if the error persists for no more than 10 minutes. To find the error details, view the heartbeat.log file in the /home/admin/alisatasknode/logs directory.
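To locate the timeout message quickly, you can filter a log file for it. The following Python sketch is a minimal example that assumes the message text appears verbatim on a single line. The layout of the log lines is not described in this topic, so the sketch does not parse timestamps, and the function names are hypothetical.

```python
TIMEOUT_MESSAGE = "response code is not 200"


def find_timeout_lines(lines):
    """Return (line_number, text) pairs for lines that contain the message."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if TIMEOUT_MESSAGE in line
    ]


def scan_log(path):
    """Print every matching line of the log file at the given path."""
    with open(path) as log_file:
        for number, text in find_timeout_lines(log_file):
            print("%d: %s" % (number, text))
```

For example, call scan_log("/home/admin/alisatasknode/logs/heartbeat.log") on the ECS instance as a user that has permission to read the file.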

Purchase an ECS instance

Purchase an ECS instance.
Note
  • CentOS V6, CentOS V7, or AliOS is recommended.
  • If the added ECS instance needs to run MaxCompute nodes or synchronization nodes, verify that the current Python version of the ECS instance is 2.6 or 2.7. The Python version of CentOS V5 is 2.4, whereas those of other operating systems are later than 2.6.
  • Make sure that the ECS instance can access the Internet. To check the connectivity, run the ping www.alibabacloud.com command on the ECS instance.
  • We recommend that you purchase an ECS instance with 8 vCPUs and 16 GiB of memory.
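The Python version requirement in the preceding note can be verified from the text that the python -V command prints, such as "Python 2.7.5". The following sketch is one way to do this. The function name python_version_supported is hypothetical, and the check itself is written for a Python 3 interpreter on any machine, not for the interpreter on the ECS instance.

```python
import re

# Versions that the note above lists as supported.
SUPPORTED_VERSIONS = {(2, 6), (2, 7)}


def python_version_supported(version_output):
    """Return True if `python -V` output reports Python 2.6 or 2.7."""
    match = re.match(r"Python (\d+)\.(\d+)", version_output.strip())
    if match is None:
        return False
    major_minor = (int(match.group(1)), int(match.group(2)))
    return major_minor in SUPPORTED_VERSIONS
```

Python 2 writes the output of python -V to standard error, so capture it with python -V 2>&1 before you pass it to the function.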

View the hostname and internal IP address of the ECS instance

Log on to the ECS console. In the left-side navigation pane, choose Instances & Images > Instances. On the Instances page, view the hostname and IP address of the purchased ECS instance.

Enable port 8000

Note You do not need to enable port 8000 if your ECS instance is in a virtual private cloud (VPC). The steps in this section apply only to ECS instances in the classic network.

To enable port 8000 for reading logs, perform the following steps:

  1. Log on to the ECS console. In the left-side navigation pane, choose Network & Security > Security Groups.
  2. Find the security group that you want to manage and click Add Rules in the Actions column.
  3. On the Security Group Rules page, click the Inbound tab and click Add Rule.
  4. In the new row that appears, set the Port Range parameter to 8000/8000 and the Authorization Object parameter to the fixed IP address of Data Integration.
  5. Click Save.
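After you save the rule, you can test whether port 8000 accepts TCP connections. The following Python sketch is a generic reachability check and is not a DataWorks tool. The function name port_is_open is hypothetical. Because the rule authorizes only the fixed IP address of Data Integration, a check that is run from any other address is expected to fail.

```python
import socket


def port_is_open(host, port=8000, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False
```

A result of False does not identify the cause. The port may be blocked by the security group, or no process may be listening on it.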

Create a custom resource group for Data Integration

  1. Go to the Data Integration page.
    1. Log on to the DataWorks console.
    2. In the left-side navigation pane, click Workspaces.
    3. After you select the region where the required workspace resides, find the workspace and click Data Integration.
  2. In the left-side navigation pane, click Custom Resource Group.
  3. On the Custom Resource Groups page, click Add Resource Group in the upper-right corner.
    Notice By default, the Custom Resource Groups page displays only your custom resource groups and does not display your shared resource groups.
  4. In the Add Resource Group wizard, perform the following steps:
    1. In the Create Resource Group step, set the Resource Group Name parameter.
      Note The name can contain letters, digits, and underscores (_) and must start with a letter.
    2. Click Next.
    3. In the Add Server step, set the parameters.
      Parameter: Description
      Network Type: The network type of the ECS instance that hosts your custom resource group. You can set the network type to Classic Network only if the ECS instance is in the China (Shanghai) region. If your ECS instance is in a region other than the China (Shanghai) region, set this parameter to VPC.
      Server Name or ECS UUID: The hostname or the UUID of the ECS instance that hosts the custom resource group.
        • If you set Network Type to Classic Network, you must set the Server Name parameter. To obtain the hostname, log on to the ECS instance and run the hostname command.
        • If you set Network Type to VPC, you must set the ECS UUID parameter. To obtain the UUID, log on to the ECS instance and run the dmidecode | grep UUID command.
      Server IP Address: The private IP address of the ECS instance.
      Server CPU (Cores): The number of vCPUs of the ECS instance. We recommend that you configure at least four vCPUs for an ECS instance that hosts a custom resource group.
      Server RAM (GB): The memory of the ECS instance. We recommend that you configure at least 8 GB of RAM and 80 GB of disk space for an ECS instance that hosts a custom resource group.
    4. Click Next.
    5. Perform the steps that are listed in the Install Agent step.
      Note If an error occurs when you run the install.sh script or you need to run it again, run the rm -rf install.sh command in the same directory as the install.sh script to delete the generated file. Then, run the install.sh script again.

      The commands required during the installation and initialization process differ for each user. Run relevant commands based on the instructions on the initialization interface.

    6. Click Next.
    7. In the Test Connection step, click Refresh and check the status of the instance.
    8. Click Complete.
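The naming rule for the Resource Group Name parameter in the Create Resource Group step can be checked before you open the wizard. The following Python sketch is a minimal example that assumes "letters" means ASCII letters and that no length limit applies, because this topic does not state one. The function name is_valid_group_name is hypothetical.

```python
import re

# A letter first, then any number of letters, digits, or underscores.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_valid_group_name(name):
    """Return True if the whole name follows the naming rule."""
    return NAME_PATTERN.fullmatch(name) is not None
```

The sketch uses fullmatch so that a trailing newline or any other extra character causes the name to be rejected.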
If the instance remains in the Stopped state after the preceding steps, the hostname may not be bound to an IP address. To bind the hostname to an IP address, perform the following steps:
  1. Log on to the ECS instance by using Secure Shell (SSH) as the admin user.
  2. Run the hostname -i command to view the binding information of the hostname.
  3. Run the vim /etc/hosts command and add an entry that binds the IP address to the hostname.
  4. Refresh the page to check whether the ECS instance is registered.
    If the ECS instance is still in the Stopped state after you refresh the page, perform the following steps to restart alisatasknode:
    1. Log on to the ECS instance by using SSH as the admin user.
    2. Run the following command:
      /home/admin/alisatasknode/target/alisatasknode/bin/serverctl restart
      Note You must enter your AccessKey pair when you run this command. Keep your AccessKey secret strictly confidential.
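The hostname binding that the preceding steps add to the /etc/hosts file can also be verified with a script. The following Python sketch is one way to do this. It assumes the standard hosts file layout, in which each line contains an IP address followed by one or more names and # starts a comment. The function names are hypothetical.

```python
import socket


def hosts_binding(hosts_text, hostname):
    """Return the IP address bound to hostname in hosts file content, or None."""
    for raw_line in hosts_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        fields = line.split()
        # fields[0] is the IP address; the remaining fields are names.
        if hostname in fields[1:]:
            return fields[0]
    return None


def check_local_binding(path="/etc/hosts"):
    """Look up the hostname of the local machine in the hosts file."""
    with open(path) as hosts_file:
        return hosts_binding(hosts_file.read(), socket.gethostname())
```

If check_local_binding() returns None on the ECS instance, the hostname has no entry in the file, and you must add the binding as described in the preceding steps.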

Configure the resource group for Data Integration

  1. Click the icon in the upper-left corner of the Data Integration page and choose All Products > Data Development > DataStudio.
  2. In the upper-left corner of the page that appears, select the workspace to which your resource group for Data Integration belongs.
  3. On the Data Analytics tab, expand the workflow where the batch synchronization node you want to configure resides, find the batch synchronization node in the Data Integration folder, and then double-click it.
  4. On the configuration tab of the node, click the Resource Group configuration tab in the right-side navigation pane.
  5. In the lower-right corner of the Resource Group configuration tab, click More.
  6. On the Resource Group configuration tab, set Programme and select a resource group based on your business requirements.
  7. On the configuration tab of the node, click the Save icon in the toolbar.