This topic describes how to create a custom resource group for Data Integration and select a resource group for Data Integration to run batch sync nodes.

Prerequisites

DataWorks Professional Edition or a more advanced edition is purchased so that you can use custom resource groups for Data Integration.

Background information

You can create custom resource groups to run your sync nodes if the shared resource group of DataWorks does not support your connection, or you want to improve the data transmission speed.

A workspace administrator can create or modify custom resource groups on the Custom Resource Group page of Data Integration.
Note
  • When you register an Elastic Compute Service (ECS) instance to host a custom resource group, you can set the network type to Classic Network only if the ECS instance is in the China (Shanghai) region. In this case, you must enter the hostname of the ECS instance. For ECS instances in other regions, you can set the network type only to VPC. In this case, you must enter the universally unique identifier (UUID) of the ECS instance. We recommend that you set the network type to VPC.
  • The admin permission is required to access some files on the ECS instance that hosts a custom resource group. For example, the admin permission is required to call shell or SQL files on the ECS instance when you write a shell script for a node.
  • A resource group for scheduling is used to schedule nodes. It has limited resources and is not suitable for computing nodes. Therefore, we recommend that you do not create custom resource groups on the ECS instances of a resource group for scheduling. MaxCompute can process large amounts of data. We recommend that you use MaxCompute for big data computing.
Custom resource groups for Data Integration are subject to the following limits:
  • The time of the ECS instance that hosts a custom resource group for Data Integration must be within 2 minutes of the current Internet time. Otherwise, service requests may time out and nodes may fail to run on the custom resource group for Data Integration.
  • You can add only one custom resource group for Data Integration on an ECS instance. You can select only one network type for each custom resource group for Data Integration.
  • Custom resource groups for Data Integration that you added on the Custom Resource Group page of Data Integration can only run sync nodes in the current workspace. They do not appear in the resource group list.

    Custom resource groups for Data Integration that you added on the Custom Resource Group page cannot run sync nodes in a manually triggered workflow.
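
The 2-minute time limit can be checked on the ECS instance before you register it. The following is a minimal sketch, not an official tool. It assumes that the HTTP Date header returned by www.alibabacloud.com is an acceptable reference for the Internet time, which the limit above does not specify. The function names skew_ok and remote_epoch are hypothetical. The sketch requires curl and GNU date.

```shell
# Sketch: check that the local clock is within 2 minutes of a reference time.
# Assumption: the HTTP Date header of a public site serves as the reference.

MAX_SKEW_SECONDS=120

# Succeeds when two epoch timestamps differ by at most MAX_SKEW_SECONDS.
skew_ok() {
  local_ts=$1
  remote_ts=$2
  diff=$(( local_ts - remote_ts ))
  if [ "$diff" -lt 0 ]; then diff=$(( -diff )); fi
  [ "$diff" -le "$MAX_SKEW_SECONDS" ]
}

# Prints the Date header of a URL as an epoch timestamp.
remote_epoch() {
  header=$(curl -sI "$1" | tr -d '\r' | sed -n 's/^[Dd]ate: //p' | head -n 1)
  [ -n "$header" ] && date -d "$header" +%s
}

# Example usage (not run automatically):
#   if skew_ok "$(date +%s)" "$(remote_epoch https://www.alibabacloud.com)"; then
#     echo "clock skew is within 2 minutes"
#   else
#     echo "clock skew exceeds 2 minutes; synchronize the clock"
#   fi
```

If the check fails, synchronize the clock of the ECS instance before you run nodes on the custom resource group.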

If the timeout error message "response code is not 200" appears in the log file of alisatasknode, the custom resource group for Data Integration was inaccessible during that period. The ECS instance that hosts the custom resource group for Data Integration can continue to work if the exception lasts no more than 10 minutes. To find the exception details, view the heartbeat.log file in the /home/admin/alisatasknode/logs directory.
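
To see how often the timeout occurred, you can search the log file for the message. The following is a minimal sketch that assumes the message is written verbatim to heartbeat.log. The function name count_timeouts is hypothetical.

```shell
# Assumption: the timeout message appears verbatim in the log file.
# Prints the number of lines that contain the message (0 if there are none).
count_timeouts() {
  grep -c "response code is not 200" "$1" || true
}

# Example usage (not run automatically):
#   count_timeouts /home/admin/alisatasknode/logs/heartbeat.log
#   grep -n "response code is not 200" /home/admin/alisatasknode/logs/heartbeat.log | tail -n 5
```

The second usage line prints the five most recent matching lines with their line numbers, which helps you locate the surrounding details in the log.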

Purchase an ECS instance

Purchase an ECS instance
Note
  • CentOS V6, CentOS V7, or AliOS is recommended.
  • If the ECS instance needs to run MaxCompute nodes or sync nodes, verify that the Python version on the ECS instance is V2.6 or V2.7. CentOS V5 provides Python V2.4, whereas the other recommended operating systems provide Python V2.6 or later.
  • Make sure that the ECS instance can access the Internet. To check, ping www.alibabacloud.com from the ECS instance.
  • We recommend that you configure the ECS instance with an 8-core CPU and 16 GB memory.
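
The checks in the preceding note can be run on the ECS instance from a shell. The following is a minimal sketch. It assumes that the interpreter is available as python on the PATH, and the function name python_version_supported is hypothetical.

```shell
# Succeeds when the version string is Python 2.6.x or 2.7.x.
python_version_supported() {
  case "$1" in
    2.6|2.6.*|2.7|2.7.*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example usage on the ECS instance (not run automatically):
#   ver=$(python -V 2>&1 | awk '{print $2}')   # Python 2 prints its version to stderr
#   python_version_supported "$ver" || echo "unsupported Python version: $ver"
#   ping -c 3 www.alibabacloud.com             # checks Internet access
#   nproc                                      # prints the number of CPU cores
#   free -g                                    # prints memory in GB
```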

View the hostname and internal IP address of the ECS instance

Log on to the ECS console. In the left-side navigation pane, choose Instances & Images > Instances. On the page that appears, view the hostname and IP address of the purchased ECS instance.

Enable port 8000

Note You do not need to enable port 8000 if your ECS instance is in a virtual private cloud (VPC). Steps in this section apply to ECS instances on the classic network only.

To enable port 8000 for reading logs, perform the following steps:

  1. Log on to the ECS console. In the left-side navigation pane, choose Network & Security > Security Groups.
  2. Find the security group that you want to manage and click Add Rules in the Actions column.
  3. On the page that appears, click Add Security Group Rule in the upper-right corner.
  4. In the dialog box that appears, set the Port Range parameter to 8000/8000 and the Authorization Object parameter to the fixed IP address of Data Integration.
  5. Click OK.
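
After the agent is installed in a later step, you may want to confirm on the ECS instance that a process is listening on port 8000. The following is a minimal sketch. It assumes that the agent listens on port 8000 on the ECS instance, which the steps above imply but do not state. The function name listens_on_port is hypothetical, and it reads the output of the ss -ltn command from standard input.

```shell
# Reads `ss -ltn` output on stdin and succeeds when a listening socket
# uses the given local port (the local address is the fourth column).
listens_on_port() {
  awk -v port="$1" '$4 ~ (":" port "$") { found = 1 } END { exit !found }'
}

# Example usage on the ECS instance (not run automatically):
#   if ss -ltn | listens_on_port 8000; then
#     echo "a process is listening on port 8000"
#   else
#     echo "no process is listening on port 8000"
#   fi
```

This check covers only the ECS instance itself. Whether Data Integration can reach the port still depends on the security group rule.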

Create a custom resource group for Data Integration

  1. Go to the Data Integration page.
    1. Log on to the DataWorks console.
    2. In the left-side navigation pane, click Workspaces.
    3. In the top navigation bar, select the region where your workspace resides. Find the workspace and click Data Integration in the Actions column.
  2. In the left-side navigation pane, click Custom Resource Group.
  3. On the Custom Resource Groups page, click Add Resource Group in the upper-right corner.
    Notice By default, the Custom Resource Groups page lists only your custom resource groups, but not shared resource groups.
  4. In the Add Resource Group wizard, perform the following steps:
    1. In the Create Resource Group step, set the Resource Group Name parameter.
      Note The name can contain letters, digits, and underscores (_), and must start with a letter.
    2. Click Next.
    3. In the Add Server step, set the parameters that are described in the following table.
      GUI element Description
      Network Type The network type of the ECS instance that hosts your custom resource group. You can set the network type to Classic Network only when the ECS instance is in the China (Shanghai) region. If your ECS instance is in a region other than the China (Shanghai) region, set this parameter to VPC.
      Server Name or ECS UUID The hostname or the UUID of the ECS instance that hosts the custom resource group, depending on the network type of your ECS instance:
      • The hostname when Network Type is set to Classic Network.

        To obtain the hostname, log on to the ECS instance and run the hostname command.

      • The UUID when Network Type is set to VPC.

        To obtain the UUID, log on to the ECS instance and run the dmidecode | grep UUID command.

      Server IP Address The internal IP address of the ECS instance.
      Server CPU (Cores) The number of CPU cores on the ECS instance. We recommend that you configure at least four CPU cores for an ECS instance that hosts a custom resource group.
      Server RAM (GB) The memory of the ECS instance. We recommend that you configure at least 8 GB RAM and 80 GB disk space for an ECS instance that hosts a custom resource group.
    4. Click Next.
    5. Perform the steps that are listed in the Install Agent step.
      Note If an error occurs when you run the install.sh script or you need to run it again, run the rm -rf install.sh command in the same directory as the install.sh script to delete the generated file. Then, run the install.sh script again.

      The commands to run during the installation and initialization process differ for each user. Run relevant commands based on the instructions on the initialization interface.

    6. Click Next.
    7. In the Test Connection step, click Refresh and check the status of the instance.
    8. Click Complete.
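
The values for the Create Resource Group and Add Server steps can be checked and collected on the ECS instance before you open the wizard. The following is a minimal sketch. The function names valid_group_name and print_server_info are hypothetical. The hostname and dmidecode commands are the ones named in the table above, and nproc, free, and hostname -I are assumed to be available, as they are on typical Linux distributions.

```shell
# Succeeds when the name contains only letters, digits, and underscores
# and starts with a letter.
valid_group_name() {
  case "$1" in
    ""|[!A-Za-z]*|*[!A-Za-z0-9_]*) return 1 ;;
    *) return 0 ;;
  esac
}

# Prints the values to enter in the Add Server step.
# dmidecode usually requires root privileges.
print_server_info() {
  echo "Hostname (Classic Network): $(hostname)"
  echo "UUID (VPC): $(dmidecode | grep UUID)"
  echo "IP addresses: $(hostname -I)"
  echo "CPU cores: $(nproc)"
  echo "RAM (GB): $(free -g | awk '/^Mem:/ {print $2}')"
}

# Example usage (not run automatically):
#   valid_group_name my_group_1 && echo "name is valid"
#   print_server_info
```

The hostname -I command prints all addresses of the instance. Enter the internal IP address in the wizard.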
If the instance status remains Stopped after the preceding steps, the hostname may not be bound to an IP address. In this case, perform the following steps:
  1. Log on to the ECS instance by using Secure Shell (SSH) as the admin user.
  2. Run the hostname -i command to view the hostname binding information.
  3. Run the vim /etc/hosts command to add the binding of the IP address and hostname.
  4. Refresh the instance status and check whether the ECS instance is registered.
    If the ECS instance is still in the Stopped state after you refresh the page, perform the following steps to restart alisatasknode:
    1. Log on to the ECS instance by using SSH as the admin user.
    2. Run the following command:
      /home/admin/alisatasknode/target/alisatasknode/bin/serverctl restart
      Note You must enter your AccessKey pair when you run this command. Keep the AccessKey pair information secure.
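
The hostname binding can also be checked from a script before you edit the /etc/hosts file. The following is a minimal sketch. The function name hostname_bound is hypothetical, and it checks only whether the name appears as a hostname or alias on a line of the hosts file that is not commented out.

```shell
# Succeeds when the given name is listed as a hostname or alias in the
# hosts file. Comments are ignored, and the first field (the IP address)
# is not treated as a name.
hostname_bound() {
  awk -v name="$2" '
    { sub(/#.*/, "") }
    { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
    END { exit !found }
  ' "$1"
}

# Example usage on the ECS instance (not run automatically):
#   if hostname_bound /etc/hosts "$(hostname)"; then
#     echo "hostname is bound in /etc/hosts"
#   else
#     echo "add a line to /etc/hosts: <internal IP address> $(hostname)"
#   fi
```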

Select a resource group for Data Integration for your sync node

  1. Click the icon in the upper-left corner of the Data Integration page and choose All Products > Data Development > DataStudio.
  2. In the upper-left corner of the page that appears, select the workspace where your resource group for Data Integration resides.
  3. On the Data Analytics tab, expand the workflow where the batch sync node to be configured resides. Under Data Integration, double-click the batch sync node.
  4. On the node configuration tab that appears, click the Resource Group configuration tab in the right-side navigation pane.
  5. On the Resource Group configuration tab, set Programme and select a resource group as required.
  6. On the node configuration tab, click the Save icon in the toolbar.