This topic describes how to create a custom resource group for Data Integration and
select a resource group for Data Integration to run batch sync nodes.
Prerequisites
DataWorks Professional Edition or a more advanced edition is purchased so that you
can use custom resource groups for Data Integration.
Background information
You can create custom resource groups to run your sync nodes if the shared resource
group of DataWorks does not support your connection, or you want to improve the data
transmission speed.
A workspace administrator can create or modify custom resource groups on the page of Data Integration.
Note
- When you register an Elastic Compute Service (ECS) instance to host a custom resource
group, the available network type depends on the region of the ECS instance. You can
set the network type to Classic Network only when the ECS instance is in the China
(Shanghai) region. In this case, you must enter the hostname of the ECS instance. For
ECS instances in other regions, you can set the network type only to VPC. In this case,
you must enter the universally unique identifier (UUID) of the ECS instance to be
registered. We recommend that you set the network type to VPC.
- The admin permission is required to access some files on the ECS instance that hosts
a custom resource group. For example, the admin permission is required to call shell
or SQL files on the ECS instance when you write a shell script for a node.
- A resource group for scheduling is used to schedule nodes. It has limited resources
and is not suitable for computing nodes. Therefore, we recommend that you do not
create custom resource groups on ECS instances of a resource group for scheduling.
MaxCompute can process a large amount of data. We recommend that you use MaxCompute
for big data computing.
Custom resource groups for Data Integration are subject to the following limits:
If the timeout error message "response code is not 200" appears in the log file of
alisatasknode, the custom resource group for Data Integration was inaccessible during
the corresponding time period. The ECS instance that hosts the custom resource group for
Data Integration can continue to work if the exception lasts no more than 10 minutes.
To find the exception details, view the heartbeat.log file in the
/home/admin/alisatasknode/logs directory.
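To see how often the timeout occurred, you can count the matching lines in the log. The following is a minimal sketch, not an official tool: the function name count_timeouts is made up for illustration, and it assumes that the message text appears verbatim in the log file you pass in.

```shell
# Count lines that contain the timeout message in an alisatasknode log file.
# Assumption: the message appears verbatim; the default path is the
# heartbeat.log location named in this topic.
count_timeouts() {
    log_file="${1:-/home/admin/alisatasknode/logs/heartbeat.log}"
    # -c prints the number of matching lines, -F treats the pattern as a
    # literal string. grep exits with 1 when nothing matches, which is not
    # an error here, so that status is mapped to success.
    grep -cF "response code is not 200" "$log_file" || [ "$?" -eq 1 ]
}

# Usage on the ECS instance:
#   count_timeouts /home/admin/alisatasknode/logs/heartbeat.log
```

A count that keeps growing over more than 10 minutes suggests that the exception persists beyond the period the ECS instance can tolerate.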
Purchase an ECS instance
Note
- CentOS V6, CentOS V7, or AliOS is recommended.
- If the added ECS instance needs to run MaxCompute nodes or sync nodes, verify that
the current Python version of the ECS instance is V2.6 or V2.7. The Python version
of CentOS V5 is V2.4, whereas those of other operating systems are later than V2.6.
- Make sure that the ECS instance can access the Internet. To check, ping
www.alibabacloud.com from the ECS instance and verify that replies are returned.
- We recommend that you configure the ECS instance with an 8-core CPU and 16 GB memory.
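The Python and network checks in the preceding note can be run directly on the ECS instance. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the python command is assumed to be on PATH, and the ping options are those of the Linux ping utility.

```shell
# Return 0 if a version string such as "2.7.5" is Python 2.6 or 2.7.
python_version_ok() {
    case "$1" in
        2.6|2.6.*|2.7|2.7.*) return 0 ;;
        *) return 1 ;;
    esac
}

# Print the version of the python command found on PATH, for example "2.7.5".
# Python 2 writes "Python X.Y.Z" to stderr, so stderr is merged into stdout.
current_python_version() {
    python -V 2>&1 | awk '{print $2}'
}

# Return 0 if the host answers at least one of three ICMP echo requests.
can_reach() {
    ping -c 3 "$1" >/dev/null 2>&1
}

# Usage on the ECS instance:
#   python_version_ok "$(current_python_version)" && echo "Python version OK"
#   can_reach www.alibabacloud.com && echo "Internet access OK"
```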
View the hostname and internal IP address of the ECS instance
Log on to the ECS console. In the left-side navigation pane, choose . On the page that appears, view the hostname and IP address of the purchased ECS
instance.

Enable port 8000
Note You do not need to enable port 8000 if your ECS instance is in a virtual private cloud
(VPC). Steps in this section apply to ECS instances on the classic network only.
To enable port 8000 for reading logs, perform the following steps:
- Log on to the ECS console. In the left-side navigation pane, choose .
- Find the security group that you want to manage and click Add Rules in the Actions column.
- On the page that appears, click in the upper-right corner.
- In the dialog box that appears, set the Port Range parameter to 8000/8000 and the Authorization Object parameter to the fixed IP address of Data Integration.
- Click OK.
Create a custom resource group for Data Integration
- Go to the Data Integration page.
- Log on to the DataWorks console.
- In the left-side navigation pane, click Workspaces.
- In the top navigation bar, select the region where your workspace resides. Find the
workspace and click Data Integration in the Actions column.
- In the left-side navigation pane, click Custom Resource Group.
- On the Custom Resource Groups page, click Add Resource Group in the upper-right corner.

Notice By default, the Custom Resource Groups page lists only your custom resource groups,
but not shared resource groups.
- In the Add Resource Group wizard, perform the following steps:
- In the Create Resource Group step, set the Resource Group Name parameter.
Note The name can contain letters, digits, and underscores (_), and must start with a letter.
- Click Next.
- In the Add Server step, set the parameters that are described in the following table.

GUI element | Description |
Network Type | The network type of the ECS instance that hosts your custom resource group. You can set the network type to Classic Network only when the ECS instance is in the China (Shanghai) region. If your ECS instance is in a region other than the China (Shanghai) region, set this parameter to VPC. |
Server Name or ECS UUID | The hostname or the UUID of the ECS instance that hosts the custom resource group, depending on the network type of your ECS instance. If Network Type is set to Classic Network, enter the hostname. To obtain the hostname, log on to the ECS instance and run the hostname command. If Network Type is set to VPC, enter the UUID. To obtain the UUID, log on to the ECS instance and run the dmidecode | grep UUID command. |
Server IP Address | The internal IP address of the ECS instance. |
Server CPU (Cores) | The number of CPU cores on the ECS instance. We recommend that you configure at least four CPU cores for an ECS instance that hosts a custom resource group. |
Server RAM (GB) | The memory of the ECS instance. We recommend that you configure at least 8 GB RAM and 80 GB disk space for an ECS instance that hosts a custom resource group. |
- Click Next.
- Perform the steps that are listed in the Install Agent step.

Note If an error occurs when you run the install.sh script or you need to run it again,
run the rm -rf install.sh command in the same directory as the install.sh script to
delete the generated file. Then, run the install.sh script again.
The commands to run during the installation and initialization process differ for
each user. Run relevant commands based on the instructions on the initialization interface.
- Click Next.
- In the Test Connection step, click Refresh and check the status of the instance.
- Click Complete.
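The values that the wizard asks for can be checked or collected on the ECS instance before you enter them. The following is a minimal sketch, not part of DataWorks: the function names and the classic and vpc labels are assumptions made for illustration, and only the hostname and dmidecode commands come from this topic.

```shell
# Return 0 if the name starts with a letter and contains only letters,
# digits, and underscores, as required for Resource Group Name.
valid_group_name() {
    case "$1" in
        ''|[!A-Za-z]*|*[!A-Za-z0-9_]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Read dmidecode output on stdin and print the value of the first UUID line.
extract_uuid() {
    awk -F': *' '/UUID/ {print $2; exit}'
}

# Print the value for the "Server Name or ECS UUID" field.
# $1 is the network type: "classic" or "vpc" (labels chosen for this sketch).
server_identifier() {
    case "$1" in
        classic) hostname ;;
        vpc)     dmidecode | extract_uuid ;;  # dmidecode usually needs root
        *)       echo "unknown network type: $1" >&2; return 1 ;;
    esac
}

# Usage on the ECS instance:
#   valid_group_name "my_group1" && echo "name OK"
#   server_identifier vpc
```

Separating extract_uuid from the dmidecode call keeps the parsing step usable on saved output, which helps when you cannot run dmidecode as a privileged user.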
If the instance status remains Stopped after the preceding steps, the hostname may not
be bound to an IP address, as shown in the following figure. In this case, perform the
following steps:

- Log on to the ECS instance by using Secure Shell (SSH) as the admin user.
- Run the hostname -i command to view the hostname binding information.
- Run the vim /etc/hosts command to add the binding of the IP address and hostname.
- Refresh the instance status and check whether the ECS instance is registered.
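Before you edit /etc/hosts, you can check whether a binding for the hostname already exists. The following is a minimal sketch: the function names are hypothetical, and the line format (IP address followed by the hostname) is the standard hosts file layout rather than something this topic specifies.

```shell
# Return 0 if hosts file $1 contains an entry that lists hostname $2.
hosts_has_binding() {
    awk -v name="$2" '
        { sub(/#.*/, "") }                 # drop comments
        { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$1"
}

# Print the line to add to the hosts file: "<internal IP> <hostname>".
binding_line() {
    printf '%s %s\n' "$1" "$2"
}

# Usage on the ECS instance (replace <internal-ip> with the internal IP
# address of the instance):
#   hosts_has_binding /etc/hosts "$(hostname)" || binding_line <internal-ip> "$(hostname)"
```

The sketch only prints the line to add. Appending it to /etc/hosts is left as a manual step so that the file is never changed without review.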
If the ECS instance is still in the Stopped state after you refresh the page, perform
the following steps to restart alisatasknode:
- Log on to the ECS instance by using SSH as the admin user.
- Run the following command:
/home/admin/alisatasknode/target/alisatasknode/bin/serverctl restart
Note You must enter your AccessKey pair when you run this command. Keep the AccessKey pair
information secure.
Select a resource group for Data Integration for your sync node
- Click the icon in the upper-left corner of the Data Integration page and choose .
- In the upper-left corner of the page that appears, select the workspace where your
resource group for Data Integration resides.
- On the Data Analytics tab, expand the workflow where the batch sync node to be configured resides. Under
Data Integration, double-click the batch sync node.
- On the node configuration tab that appears, click the Resource Group configuration tab in the right-side navigation pane.
- On the Resource Group configuration tab, set Programme and select a resource group as required.
- On the node configuration tab, click the icon in the toolbar.