DataWorks SSH nodes connect to a fixed Elastic Compute Service (ECS) instance by default. If that instance fails or runs out of resources, SSH tasks stall or stop. By routing SSH traffic through Alibaba Cloud Network Load Balancer (NLB), you can distribute tasks across multiple ECS instances and eliminate the single point of failure (SPOF), without changing any task logic.
This topic walks you through the full setup: creating ECS instances, configuring NLB, adding the NLB instance as an SSH data source, and verifying that tasks are distributed across instances.
How it works
When you configure an SSH data source with the NLB instance's domain name, every task request from a DataWorks SSH node is intercepted by the NLB listener. NLB selects a healthy ECS instance based on its load balancing policy and forwards the request to that instance. The ECS instance executes the task, and the result is transparently returned through NLB and displayed in the SSH node's run log.
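A toy simulation can make the forwarding behavior more concrete. It is a hypothetical model, not NLB's implementation. It assumes plain round-robin over the backends that pass health checks, whereas NLB actually uses whatever scheduling algorithm is configured on the server group. The backend names are placeholders.

```bash
#!/usr/bin/env bash
# Toy model of NLB-style forwarding (hypothetical, for illustration only):
# skip backends that fail health checks and rotate across the healthy ones.

BACKENDS=("ecs-instance-1" "ecs-instance-2")   # placeholder backend names
HEALTHY=(1 1)                                   # 1 = passes health check, 0 = failed
NEXT=0                                          # rotation cursor
PICKED=""                                       # backend chosen by the last call

# pick_backend: sets PICKED to the next healthy backend in rotation order.
# Returns 1 (and clears PICKED) if no backend is healthy.
pick_backend() {
  local n=${#BACKENDS[@]} tries i
  for (( tries = 0; tries < n; tries++ )); do
    i=$(( (NEXT + tries) % n ))
    if [[ ${HEALTHY[i]} -eq 1 ]]; then
      PICKED=${BACKENDS[i]}
      NEXT=$(( (i + 1) % n ))
      return 0
    fi
  done
  PICKED=""
  return 1
}
```

Calling `pick_backend` repeatedly alternates between the two instances. If you set `HEALTHY[0]=0` to simulate a failed health check, every later pick returns `ecs-instance-2`. That is the automatic failover summarized in the following table. The function sets a global variable instead of printing, because a call inside `$(...)` would run in a subshell and lose the rotation state.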
The following table summarizes the difference between running SSH tasks with and without NLB:
| Dimension | Without load balancing | With load balancing |
|---|---|---|
| Failure impact | Tasks on the affected ECS instance are delayed or stopped | NLB reroutes tasks to healthy instances automatically |
| Task continuity | Interrupted when the single instance fails | Maintained across the server group |
| Resource saturation | Tasks fail when one instance is exhausted | Requests are distributed across instances |
| O&M effort | Manual intervention required to recover | NLB health checks handle failover automatically |
When to use this feature
Configure NLB-backed load balancing for your SSH nodes when:
- SSH task failures caused by ECS instance failures or resource exhaustion are unacceptable.
- Tasks run long enough that an instance restart during execution would break the workflow.
- You manage multiple ECS instances for SSH workloads and want automatic failover without changing task logic.
Prerequisites
Before you begin, make sure that:
- A RAM user is added to your workspace with the Develop or Workspace Administrator role. For more information, see Add members to a workspace.
  Important: The Workspace Administrator role has broader permissions than typically needed. Assign it with caution.
- A serverless resource group is associated with your workspace. For more information, see Use serverless resource groups.
Limitations
- SSH node code cannot exceed 128 KB.
- DataWorks resource groups, ECS instances, and NLB instances must all be in the same virtual private cloud (VPC) in the same region. Mismatches in VPC or region will cause connectivity failures.
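You may want to check a script against the 128 KB limit before pasting it into the node editor. The following is a minimal sketch that assumes the limit is measured in bytes with 1 KB = 1024 bytes. This document does not say how DataWorks counts the size. `check_node_size` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Check a local copy of SSH node code against the 128 KB limit.
# Assumption: the limit is in bytes and 1 KB = 1024 bytes.
MAX_SSH_NODE_BYTES=$(( 128 * 1024 ))

# check_node_size FILE: prints the size; returns 1 if over the limit,
# 2 if the file cannot be read.
check_node_size() {
  local file=$1 size
  size=$(wc -c < "$file") || return 2
  size=${size//[[:space:]]/}   # some wc implementations pad the number
  if (( size > MAX_SSH_NODE_BYTES )); then
    echo "$file: $size bytes, exceeds the $MAX_SSH_NODE_BYTES-byte limit" >&2
    return 1
  fi
  echo "$file: $size bytes, within the limit"
}
```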
Step 1: Create ECS instances
Create at least two ECS instances. This example uses two instances with different sample data to verify that load balancing works.
Create the instances
- Go to the ECS buy page and click Custom Launch.
- Configure the following parameters. For more information about parameter settings, see Create an instance on the Custom Launch tab.

| Parameter | Description |
|---|---|
| Region | Select the region where your DataWorks workspace resides. |
| Network and Zone | Select the VPC and vSwitch used by your serverless resource group. |
| Security Group | Select the security group associated with the serverless resource group's vSwitch. |
| Other parameters | Configure based on your business requirements. |

- Set Quantity to 2 on the right side of the buy page.
- Click Create Order and follow the prompts to complete the purchase.
Set up sample data
Log on to each ECS instance and write different sample data so you can verify load balancing later.
- Log on to the ECS console and select the region where the instances reside.
- On the Instances page, click Connect in the Actions column for an instance. In the Remote connection dialog box, click Sign in now.
- In the Instance Login dialog box, enter the authentication information to log on.
- Run the following command on ECS Instance 1:
  echo "I am the first server" > /tmp/a.txt
- Log on to ECS Instance 2 in the same way, then run:
  echo "I am the second server" > /tmp/a.txt
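With more than two ECS instances, writing a distinct sentence on each one by hand gets tedious. The sketch below is a possible variant, not part of the original procedure. It writes the host name into the sample file, so the output of every run identifies the instance that served it. `write_sample_data` is a hypothetical helper. The path `/tmp/a.txt` matches the node code used in Step 6. If you use this variant, the run results in Step 6 show host names instead of "first" and "second".

```bash
#!/usr/bin/env bash
# Hypothetical variant of the sample-data step: include the host name in the
# file so every SSH node run reveals which ECS instance served it.
write_sample_data() {
  local target=${1:-/tmp/a.txt}   # same path the SSH node reads in Step 6
  echo "I am the server on host $(hostname)" > "$target"
}

# Run once on each ECS instance:
write_sample_data
cat /tmp/a.txt
```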
Step 2: Create an NLB instance
Create an NLB instance that routes SSH traffic to your ECS instances. For more information, see Create and manage an NLB instance.
Create the NLB instance in the same region as your DataWorks workspace. A region mismatch will cause connectivity failures.
- Log on to the NLB console. In the top navigation bar, select the target region.
- In the left-side navigation pane, click Instances. On the Instances page, click Create NLB. The Cloud Service Buy Page appears.
- Configure the following parameters:

| Section | Parameter | Value |
|---|---|---|
| Network | Network Type | Internal-facing |
| Network | VPC | The VPC where your serverless resource group is deployed |
| Network | Zone | The zone where the serverless resource group's vSwitch resides |
| Network | IP Version | IPv4 |
| Management Settings | Instance Name | Enter a custom name |
| Management Settings | Resource Group | Select the default resource group |

- Click Create Now. On the Confirm Order page, click Activate Now.
Step 3: Create a backend server group
After the NLB instance enters the Active state, create a server group and add the ECS instances to it. For more information, see NLB server groups.
- In the NLB console, find the NLB instance and click its ID to open the instance details page.
- In the Quick Start with NLB section, click Create Server Group.
- In the Create Server Group dialog box, set Server Group Name to ECS_NLB and click Create.
- In the confirmation dialog box, click Add Backend Server.
- On the Backend Servers tab, click Add Backend Server. In the panel that appears, select both ECS instances.
- Click Next to proceed to the Ports/Weights step. Set Port to 22 for each instance, then click OK.
Wait for the addition to complete before continuing.
Step 4: Configure a listener
Add a TCP listener on port 22 to route SSH traffic from NLB to the server group. For more information, see Add a TCP listener.
- On the NLB instance details page, click Create Listener in the Quick Start with NLB section.
- On the Configure Server Load Balancer page, set Listener Protocol to TCP and Listener Port to 22. Click Next.
- In the Select Server Group step, select Server Type and choose ECS_NLB from the server group list. Click Next.
- In the Configuration Review step, confirm that the ECS instances and listener port are correct. Click Submit.
Step 5: Add the NLB instance as an SSH data source
Configure an SSH data source in DataWorks using the NLB instance's domain name as the host address. For more information, see SSH data source.
Configure the data source
- Log on to the DataWorks console. In the left-side navigation pane, choose More > Management Center.
- Select the workspace from the drop-down list and click Go to Management Center.
- In the left-side navigation pane, click Data Sources. On the Data Sources tab, click Add Data Source.
- In the Add Data Source dialog box, click SSH. On the Add SSH Data Source page, configure the following parameters:

| Parameter | Value |
|---|---|
| Data Source Name | A custom name. Example: SSH_DB |
| Configuration Mode | Connection String Mode (fixed) |
| Authentication Method | DataWorks SSH Public Key Authentication (recommended) |
| Host Address | The domain name of the NLB instance (see the note below this table) |
| Host Port | 22 |
| Username | root |
| Public Key | Click Generate Key Pair to generate the key |

To get the NLB instance's domain name, open the instance details page in the NLB console, go to the Basic Information section on the Instance Details tab, and click Copy next to Domain Name.
Add the public key to each ECS instance
Complete this step before testing network connectivity. If you skip it, the connectivity test will fail.
After clicking Generate Key Pair, copy the generated public key and add it to the .ssh/authorized_keys file on each ECS instance:
- Copy the public key displayed on the Add SSH Data Source page.
- Log on to ECS Instance 1 (follow the steps in Step 1: Create ECS instances) and run the following command, replacing <public-key> with the key you copied:
  echo "<public-key>" >> ~/.ssh/authorized_keys
- Repeat for ECS Instance 2.
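The single `echo ... >>` command works. However, running it twice adds the key twice, and it does not create `~/.ssh` if that directory is missing. The sketch below is one way to make this step repeatable. `add_authorized_key` is a hypothetical helper name. The 700/600 permissions satisfy the StrictModes check that OpenSSH's sshd enables by default.

```bash
#!/usr/bin/env bash
# Add a public key to authorized_keys idempotently, with permissions that
# satisfy OpenSSH's StrictModes check (enabled by default in sshd).
# Run on each ECS instance as the data source user (root in this example).
add_authorized_key() {
  local key=$1
  local ssh_dir=${2:-$HOME/.ssh}
  local auth_file="$ssh_dir/authorized_keys"

  [[ -n "$key" ]] || { echo "no key given" >&2; return 1; }
  mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir" || return 1
  touch "$auth_file" && chmod 600 "$auth_file" || return 1

  # If the file does not end with a newline, add one so the new key is not
  # glued onto the previous entry.
  if [[ -s "$auth_file" && -n "$(tail -c 1 "$auth_file")" ]]; then
    echo >> "$auth_file"
  fi
  # -x: match whole lines, -F: fixed string (keys contain + and /).
  grep -qxF -- "$key" "$auth_file" || printf '%s\n' "$key" >> "$auth_file"
}

# Usage: paste the key copied from the Add SSH Data Source page.
# add_authorized_key "<public-key>"
```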
Test connectivity
In the Connection Configuration section, find the serverless resource group associated with your workspace and click Test Network Connectivity in the Connection Status column.
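If the connectivity test fails, you can narrow down the cause by checking plain TCP reachability of port 22 from another machine in the same VPC, such as a separate ECS instance. Probe the NLB domain name first, then each ECS instance directly. This separates a listener or server group problem from a backend problem. The following is a minimal sketch that uses bash's `/dev/tcp` redirection and coreutils `timeout`. `probe_tcp` and the placeholders are hypothetical. A successful probe only shows that the port accepts connections, not that key authentication will succeed.

```bash
#!/usr/bin/env bash
# Check whether HOST accepts TCP connections on PORT within SECS seconds.
# Host and port are passed as positional arguments to the inner bash
# instead of being spliced into the command string.
probe_tcp() {
  local host=$1 port=${2:-22} secs=${3:-5}
  if timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null; then
    echo "OK   $host:$port"
  else
    echo "FAIL $host:$port" >&2
    return 1
  fi
}

# Placeholders: replace with the NLB domain name and the ECS private IP addresses.
# probe_tcp "<nlb-domain-name>"
# probe_tcp "<ecs-instance-1-private-ip>"
# probe_tcp "<ecs-instance-2-private-ip>"
```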
Step 6: Create and run an SSH node
Create an SSH node in Data Studio, run it multiple times, and confirm that NLB distributes requests across both ECS instances.
Create the node
- On the Workspaces page, find the workspace and choose Shortcuts > Data Studio in the Actions column.
- In the left-side navigation pane of Data Studio, click the icon. Click the icon next to Workspace Directories and choose Create Node > General > SSH.
- In the Create Node dialog box, set Name and click OK.
Configure and run the node
- In the code editor, enter:
  cat /tmp/a.txt
- From the Select DataSource drop-down list at the top of the configuration tab, select SSH_DB.
- In the right-side navigation pane, click Debugging Configurations. Select the serverless resource group from the Resource Group drop-down list.
- Click Save in the toolbar, then click Run.
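As an option, a variant of the node code can print the host name next to the sample data. This makes repeated runs easier to compare when several instances hold similar data. The sketch is illustrative and not part of the original procedure. `report_backend` is a hypothetical function, and the sketch assumes the `hostname` command is available on the ECS instances. It avoids bash-only syntax, so it also works if the remote login shell is a plain POSIX shell.

```bash
# Illustrative SSH node code: report which host served the run, then print
# the sample data written in Step 1.
report_backend() {
  data_file=${1:-/tmp/a.txt}
  echo "served by: $(hostname)"
  if [ -r "$data_file" ]; then
    cat "$data_file"
  else
    echo "sample data file $data_file not found on this host"
  fi
}

report_backend
```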
Verify the results
Run the node several times. NLB forwards each new connection according to the scheduling algorithm of the server group, so different runs may be served by different ECS instances and print different results:
- A run served by ECS Instance 1 prints I am the first server.
- A run served by ECS Instance 2 prints I am the second server.
Getting the same result in several consecutive runs is normal, because NLB can forward consecutive requests to the same instance. Seeing both results across multiple runs confirms that load balancing is working.
If all runs consistently return the same result, check the following:
- The public key is correctly added to .ssh/authorized_keys on both ECS instances.
- Both instances show a healthy status in the NLB server group.
- The listener port (22) matches the backend server port.
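Counting results by eye gets error-prone after more than a few runs. The following is a minimal sketch that tallies one output line per run and reports whether more than one distinct result appeared. `summarize_runs` is a hypothetical helper. The commented-out loop is an optional way to generate runs from a host in the same VPC instead of from Data Studio. It assumes that host has its own key authorized for root on both ECS instances. Host key checking is relaxed in the loop because the backends behind one NLB domain name present different host keys, which ssh would otherwise report as a changed host key. Relax it only for this test.

```bash
#!/usr/bin/env bash
# Tally run outputs (one line per run, read from stdin) and report whether
# more than one distinct result, and therefore more than one backend, appeared.
summarize_runs() {
  local tally distinct
  tally=$(sort | uniq -c | sort -rn)
  distinct=$(printf '%s\n' "$tally" | grep -c .)
  printf '%s\n' "$tally"
  if (( distinct >= 2 )); then
    echo "requests reached $distinct different backends"
  else
    echo "only $distinct distinct result(s); see the checklist above" >&2
    return 1
  fi
}

# Optional: generate runs from a host in the same VPC (placeholder domain name).
# for i in $(seq 1 10); do
#   ssh -o BatchMode=yes -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null \
#       root@"<nlb-domain-name>" 'cat /tmp/a.txt'
# done | summarize_runs
```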
What's next
- To monitor the health status of backend servers in your NLB server group, see NLB server groups.
- To learn more about SSH node development, see Node development.
Appendix: Implementation principles
The following figure shows how NLB distributes tasks in DataWorks to ensure stable running of SSH tasks.
Suppose you add multiple ECS instances to the server group of an NLB instance and specify the NLB instance's domain name as the host address of an SSH data source in DataWorks. Each time an SSH node that uses this data source runs, the listener of the NLB instance receives the task request. NLB then forwards the request to a healthy ECS instance according to its load balancing policy. The ECS instance runs the task, and NLB transparently returns the results, which appear in real time in the run log of the SSH node.