
DataWorks: Implement load balancing and high availability for SSH nodes

Last Updated: Mar 27, 2026

DataWorks SSH nodes connect to a fixed Elastic Compute Service (ECS) instance by default. If that instance fails or runs out of resources, SSH tasks stall or stop. By routing SSH traffic through Alibaba Cloud Network Load Balancer (NLB), you can distribute tasks across multiple ECS instances and remove this single point of failure (SPOF) without changing any task logic.

This topic walks you through the full setup: creating ECS instances, configuring NLB, adding the NLB instance as an SSH data source, and verifying that tasks are distributed across instances.

How it works

When you configure an SSH data source with the NLB instance's domain name as its host address, each task request from a DataWorks SSH node is received by the NLB listener. NLB selects a healthy ECS instance based on its load balancing policy and forwards the request to that instance. The ECS instance executes the task, and the result is returned transparently through NLB and displayed in the SSH node's run log.
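NLB balances TCP connections. Assuming each SSH node run opens its own connection, a run can be treated as one request. The following shell sketch is a toy model of "select a healthy instance based on the load balancing policy" using plain round-robin. It is not how NLB is implemented: NLB uses the scheduling algorithm configured for the server group, and the backend names and health flags here are hypothetical.

```bash
#!/usr/bin/env bash
# Toy model of the routing described above; NOT how NLB is implemented.
# Each call picks the next backend in turn and skips backends that are
# failing health checks. Backend names and health flags are hypothetical.

BACKENDS=("ecs-1" "ecs-2")   # members of the server group
HEALTHY=(1 1)                # 1 = passing health checks, 0 = failing
next=0                       # index to try first on the next call

# Sets PICKED to the chosen backend. A global is used instead of echo so
# that the rotation state in `next` is not lost in a subshell.
pick_backend() {
  local i idx n=${#BACKENDS[@]}
  PICKED=""
  for (( i = 0; i < n; i++ )); do
    idx=$(( (next + i) % n ))
    if (( HEALTHY[idx] == 1 )); then
      next=$(( (idx + 1) % n ))
      PICKED="${BACKENDS[idx]}"
      return 0
    fi
  done
  return 1   # no healthy backend: the request cannot be served
}
```

With both instances healthy, successive calls alternate between ecs-1 and ecs-2. After `HEALTHY[0]=0` (ecs-1 fails its health check), every call selects ecs-2, which is the failover behavior the table that follows summarizes.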


The following table summarizes the difference between running SSH tasks with and without NLB:

Dimension | Without load balancing | With load balancing
--- | --- | ---
Failure impact | Tasks on the affected ECS instance are delayed or stopped | NLB automatically routes new task requests to healthy instances
Task continuity | Interrupted when the single instance fails | Subsequent tasks continue on the remaining instances in the server group
Resource saturation | Tasks fail when one instance is exhausted | Requests are distributed across instances
O&M effort | Manual intervention required to recover | NLB health checks handle failover automatically

When to use this feature

Configure NLB-backed load balancing for your SSH nodes when:

  • SSH task failures caused by ECS instance failures or resource exhaustion are unacceptable

  • New SSH tasks must keep running when an ECS instance restarts or fails, instead of waiting for that instance to recover

  • You manage multiple ECS instances for SSH workloads and want automatic failover without changing task logic

Prerequisites

Before you begin, make sure that:

  • A RAM user is added to your workspace with the Develop or Workspace Administrator role. For more information, see Add members to a workspace.

    Important

    The Workspace Administrator role has broader permissions than typically needed. Assign it with caution.

  • A serverless resource group is associated with your workspace. For more information, see Use serverless resource groups.

Limitations

  • SSH node code cannot exceed 128 KB.

  • DataWorks resource groups, ECS instances, and NLB instances must all be in the same virtual private cloud (VPC) in the same region. Mismatches in VPC or region will cause connectivity failures.
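To check a script against the 128 KB code limit before pasting it into an SSH node, you could use a small helper such as the following sketch. It assumes that 128 KB means 131,072 bytes (128 × 1024); the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical pre-check for the SSH node code size limit.
# Assumption: 128 KB = 128 * 1024 = 131072 bytes.
SSH_NODE_MAX_BYTES=$((128 * 1024))

check_ssh_node_size() {
  local file="$1" size
  size=$(wc -c < "$file") || return 2      # byte count only, no file name
  size=${size//[[:space:]]/}               # BSD wc pads the number with spaces
  if (( size > SSH_NODE_MAX_BYTES )); then
    echo "too large: ${size} bytes (limit ${SSH_NODE_MAX_BYTES})"
    return 1
  fi
  echo "ok: ${size} bytes"
}

# Example: check_ssh_node_size ./my_task.sh
```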

Step 1: Create ECS instances

Create at least two ECS instances. This example uses two instances with different sample data to verify that load balancing works.

Create the instances

  1. Go to the ECS buy page and click Custom Launch.

  2. Configure the following parameters. For more information about parameter settings, see Create an instance on the Custom Launch tab.

    Parameter | Description
    --- | ---
    Region | Select the region where your DataWorks workspace resides
    Network and Zone | Select the VPC and vSwitch used by your serverless resource group
    Security Group | Select the security group associated with the serverless resource group's vSwitch
    Other parameters | Configure based on your business requirements

  3. Set Quantity to 2 on the right side of the buy page.

  4. Click Create Order and follow the prompts to complete the purchase.

Set up sample data

Log on to each ECS instance and write different sample data so you can verify load balancing later.

  1. Log on to the ECS console and select the region where the instances reside.

  2. On the Instances page, click Connect in the Actions column for an instance. In the Remote connection dialog box, click Sign in now.

  3. In the Instance Login dialog box, enter the authentication information to log on.

  4. Run the following command on ECS Instance 1:

    echo "I am the first server" > /tmp/a.txt
  5. Repeat steps 2–3 for ECS Instance 2, then run:

    echo "I am the second server" > /tmp/a.txt
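If you prefer a single command that works unchanged on every instance, the following sketch embeds each instance's host name in the sample data instead of a hand-written label. The function name and the optional path argument are hypothetical; the default path is the /tmp/a.txt file used throughout this topic, and the approach assumes the two instances have different host names.

```bash
#!/usr/bin/env bash
# Sketch: write instance-specific sample data with one command for all
# instances. `uname -n` prints the host name, so each instance's file differs
# as long as the host names differ.
write_sample_data() {
  local path="${1:-/tmp/a.txt}"
  echo "I am the server $(uname -n)" > "$path"
}

# Run on each ECS instance:
# write_sample_data
```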

Step 2: Create an NLB instance

Create an NLB instance that routes SSH traffic to your ECS instances. For more information, see Create and manage an NLB instance.

Important

Create the NLB instance in the same region as your DataWorks workspace. A region mismatch will cause connectivity failures.

  1. Log on to the NLB console. In the top navigation bar, select the target region.

  2. In the left-side navigation pane, click Instances. On the Instances page, click Create NLB. The Cloud Service Buy Page appears.

  3. Configure the following parameters:

    Section | Parameter | Value
    --- | --- | ---
    Network | Network Type | Internal-facing
    Network | VPC | The VPC where your serverless resource group is deployed
    Network | Zone | The zone where the serverless resource group's vSwitch resides
    Network | IP Version | IPv4
    Management Settings | Instance Name | Enter a custom name
    Management Settings | Resource Group | Select the default resource group

  4. Click Create Now. On the Confirm Order page, click Activate Now.

Step 3: Create a backend server group

After the NLB instance enters the Active state, create a server group and add the ECS instances to it. For more information, see NLB server groups.

  1. In the NLB console, find the NLB instance and click its ID to open the instance details page.

  2. In the Quick Start with NLB section, click Create Server Group.

  3. In the Create Server Group dialog box, set Server Group Name to ECS_NLB and click Create.

  4. In the confirmation dialog box, click Add Backend Server.

  5. On the Backend Servers tab, click Add Backend Server. In the panel that appears, select both ECS instances.

  6. Click Next to proceed to the Ports/Weights step. Set Port to 22 for each instance, then click OK.

Wait for the addition to complete before continuing.

Step 4: Configure a listener

Add a TCP listener on port 22 to route SSH traffic from NLB to the server group. For more information, see Add a TCP listener.

  1. On the NLB instance details page, click Create Listener in the Quick Start with NLB section.

  2. On the Configure Server Load Balancer page, set Listener Protocol to TCP and Listener Port to 22. Click Next.

  3. In the Select Server Group step, select Server Type and choose ECS_NLB from the server group list. Click Next.

  4. In the Configuration Review step, confirm the ECS instances and listener port are correct. Click Submit.
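Before you configure DataWorks, you can optionally confirm that the listener accepts TCP connections on port 22. The following sketch uses bash's built-in /dev/tcp redirection and the coreutils timeout command, and must run on a host in the same VPC as the NLB instance. The function name is hypothetical, and the domain name is a placeholder that you copy from the NLB console.

```bash
#!/usr/bin/env bash
# Sketch: check that host:port accepts TCP connections within a time limit.
# Relies on bash's /dev/tcp redirection and coreutils `timeout`.
check_tcp_port() {
  local host="$1" port="${2:-22}" secs="${3:-3}"
  if timeout "$secs" bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "reachable: ${host}:${port}"
  else
    echo "unreachable: ${host}:${port}"
    return 1
  fi
}

# Example (replace the placeholder with the NLB domain name):
# check_tcp_port "<nlb-domain-name>" 22
```

A successful TCP connection only shows that the listener forwards traffic to a backend; SSH authentication is verified later by the connectivity test in DataWorks.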

Step 5: Add the NLB instance as an SSH data source

Configure an SSH data source in DataWorks using the NLB instance's domain name as the host address. For more information, see SSH data source.

Configure the data source

  1. Log on to the DataWorks console. In the left-side navigation pane, choose More > Management Center.

  2. Select the workspace from the drop-down list and click Go to Management Center.

  3. In the left-side navigation pane, click Data Sources. On the Data Sources tab, click Add Data Source.

  4. In the Add Data Source dialog box, click SSH. On the Add SSH Data Source page, configure the following parameters:

    Parameter | Value
    --- | ---
    Data Source Name | A custom name. Example: SSH_DB
    Configuration Mode | Connection String Mode (fixed)
    Authentication Method | DataWorks SSH Public Key Authentication (recommended)
    Host Address | The domain name of the NLB instance
    Host Port | 22
    Username | root
    Public Key | Click Generate Key Pair to generate the key

    To get the NLB instance's domain name, open the instance details page in the NLB console, go to the Basic Information section on the Instance Details tab, and click Copy next to Domain Name.

Add the public key to each ECS instance

Important

Complete this step before testing network connectivity. If you skip it, the connectivity test will fail.

After you click Generate Key Pair, add the generated public key to the ~/.ssh/authorized_keys file of the root user (the username configured for the data source) on each ECS instance:

  1. Copy the public key displayed on the Add SSH Data Source page.

  2. Log on to ECS Instance 1 (follow the steps in Step 1: Create ECS instances) and run:

    echo "<public-key>" >> ~/.ssh/authorized_keys

    Replace <public-key> with the key you copied.

  3. Repeat for ECS Instance 2.
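The echo command appends the key every time it runs and assumes that ~/.ssh already exists. As an alternative, the following sketch adds the key only once and sets the conventional permissions (700 on ~/.ssh, 600 on authorized_keys) that OpenSSH's StrictModes check accepts. The function name and the optional home-directory argument are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: add the DataWorks public key once, with conventional permissions
# (700 on ~/.ssh, 600 on authorized_keys) that OpenSSH's StrictModes accepts.
add_authorized_key() {
  local key="$1" home="${2:-$HOME}"
  local dir="${home}/.ssh"
  local file="${dir}/authorized_keys"
  mkdir -p "$dir" && chmod 700 "$dir" || return 1
  touch "$file" && chmod 600 "$file" || return 1
  # -x: whole-line match, -F: fixed string, so the key is not a regex.
  if grep -qxF -- "$key" "$file"; then
    echo "already present"
    return 0
  fi
  # Start on a new line if the file does not end with a newline.
  if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
    printf '\n' >> "$file"
  fi
  printf '%s\n' "$key" >> "$file"
  echo "added"
}

# On each ECS instance (as root), with the key copied from DataWorks:
# add_authorized_key "<public-key>"
```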

Test connectivity

In the Connection Configuration section, find the serverless resource group associated with your workspace and click Test Network Connectivity in the Connection Status column.

Step 6: Create and run an SSH node

Create an SSH node in Data Studio, run it multiple times, and confirm that NLB distributes requests across both ECS instances.

Create the node

  1. On the Workspaces page, find the workspace and choose Shortcuts > Data Studio in the Actions column.

  2. In the left-side navigation pane of Data Studio, click the data development icon. Then click the create icon next to Workspace Directories and choose Create Node > General > SSH.

  3. In the Create Node dialog box, set Name and click OK.

Configure and run the node

  1. In the code editor, enter:

    cat /tmp/a.txt
  2. From the Select DataSource drop-down list at the top of the configuration tab, select SSH_DB.

  3. In the right-side navigation pane, click Debugging Configurations. Select the serverless resource group from the Resource Group drop-down list.

  4. Click Save in the toolbar, then click Run.
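Optionally, the node code can also report which ECS instance ran the task, so that each run log names its backend server even if the sample files were written with similar content. The following is a sketch, not required by this topic; the function name is hypothetical, and `cat /tmp/a.txt` alone is enough for the check.

```bash
#!/usr/bin/env bash
# Optional variant of the SSH node code: print the host name of the ECS
# instance that ran the task, followed by its sample data.
show_sample_data() {
  local path="${1:-/tmp/a.txt}"
  echo "served by: $(uname -n)"
  cat "$path"
}

show_sample_data
```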

Verify the results

Run the node several times. NLB assigns each new connection to an ECS instance according to the scheduling algorithm of the server group, so different runs may return different results:

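Execution result 1: the run log shows the content of /tmp/a.txt on one ECS instance.

Execution result 2: the run log shows the content of /tmp/a.txt on the other ECS instance.
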
Note

Seeing the same result in consecutive runs is normal, because NLB may route several runs to the same instance. Seeing results from both instances across multiple runs confirms that load balancing is working.
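To tally results over many runs, you can copy the line printed by each run into a text file (one line per run) and summarize it with a sketch such as the following. The function name and the input file are hypothetical; two or more distinct lines mean that requests reached more than one ECS instance.

```bash
#!/usr/bin/env bash
# Sketch: tally results copied from several SSH node run logs.
# Input: a text file with one result line per run (blank lines ignored).
summarize_runs() {
  local file="$1" results distinct
  results=$(grep -v '^[[:space:]]*$' "$file")
  if [ -z "$results" ]; then
    echo "no results"
    return 1
  fi
  printf '%s\n' "$results" | sort | uniq -c          # runs per distinct result
  distinct=$(( $(printf '%s\n' "$results" | sort -u | wc -l) ))
  if (( distinct >= 2 )); then
    echo "distinct results: ${distinct} (load balancing observed)"
  else
    echo "distinct results: ${distinct} (only one instance seen so far)"
  fi
}

# Example: summarize_runs runs.txt
```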

If all runs consistently return the same result, check the following:

  • The /tmp/a.txt file contains different content on each ECS instance

  • The public key is correctly added to .ssh/authorized_keys on both ECS instances

  • Both instances show a healthy status in the NLB server group

  • The listener port (22) matches the backend server port
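Part of this checklist can be run on each ECS instance with a diagnostic such as the following sketch (hypothetical function name and arguments). Pass the public key copied from DataWorks and compare the output of both instances. It checks the key, whether anything listens on port 22 (using ss from iproute2, skipped when ss is not installed), and the sample data. The health status of the server group is shown only in the NLB console and is not checked here.

```bash
#!/usr/bin/env bash
# Sketch: per-instance diagnostics for the checklist above.
# Arguments: public key, home directory (default $HOME), sample data file.
diagnose_instance() {
  local key="$1" home="${2:-$HOME}" data="${3:-/tmp/a.txt}"
  echo "host: $(uname -n)"
  if grep -qxF -- "$key" "${home}/.ssh/authorized_keys" 2>/dev/null; then
    echo "public key: present"
  else
    echo "public key: MISSING"
  fi
  if command -v ss >/dev/null 2>&1; then
    # Column 4 of `ss -ltn` is the local address:port of each listener.
    if ss -ltn | awk '{print $4}' | grep -qE ':22$'; then
      echo "port 22: listening"
    else
      echo "port 22: not listening"
    fi
  fi
  echo "sample data: $(cat "$data" 2>/dev/null || echo '<missing>')"
}

# Example (as root on each ECS instance):
# diagnose_instance "<public-key>"
```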


Appendix: Implementation principles

The following figure shows how NLB distributes tasks in DataWorks to ensure stable running of SSH tasks.

[Figure: How NLB distributes DataWorks SSH tasks across healthy ECS instances]

After you add multiple ECS instances to a server group of an NLB instance and specify the domain name of the NLB instance as the host address of an SSH data source in DataWorks, task requests from SSH nodes that use this data source are received by the listener of the NLB instance. NLB distributes each request to a healthy ECS instance based on its load balancing policy. The running results are returned transparently through the NLB instance and displayed in real time in the run log of the SSH node.