Connect DataWorks to a remote host via SSH so that SSH nodes can trigger script execution on that host. A common use case is scheduling periodic scripts on an Elastic Compute Service (ECS) instance.
Limitations
- SSH data sources support Connection String Mode only.
- SSH scheduling tasks require an exclusive resource group for scheduling. Upgrade your resource group before running tasks, or the tasks will fail.
Prerequisites
Before you begin, make sure you have:
- The host address and port of the target SSH server
- An exclusive resource group for scheduling, purchased and configured with network access to the target host. See Use exclusive resource groups for scheduling and Network connectivity solutions.
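Before creating the data source, it can help to confirm that the SSH server is actually listening on the address and port you plan to enter. The following is a minimal sketch, not part of DataWorks: `port_is_open` is a hypothetical helper name, and the check only reflects reachability from the machine where you run it. It does not replace the resource group connectivity test in Step 3.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, `port_is_open("192.0.2.10", 22)` checks a placeholder address on the conventional SSH port. Substitute your own host address and port.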
Permissions
To create a data source as a RAM user or RAM role, you need one of the following:
- A workspace role of Project Owner, Workspace Administrator, or O&M. See Add and manage workspace members and their role permissions.
- The AliyunDataWorksFullAccess or AdministratorAccess policy. See Grant permissions to a RAM user and Manage permissions for a RAM role.
Usage notes
In standard mode workspaces, create separate data sources for the development and production environments. Both environments must use the same authentication mode. For details, see Basic mode vs. standard mode workspaces.
Create an SSH data source
Step 1: Open the data source creation page
- Log on to the DataWorks console. In the top navigation bar, select your region. In the left-side navigation pane, choose More > Management Center. Select your workspace from the drop-down list and click Go to Management Center.
- In the left-side navigation pane of the SettingCenter page, click Data Sources.
- Click Add Data Source, select SSH, and follow the on-screen instructions.
Step 2: Configure basic information
On the Create SSH Data Source page, enter the data source name and select an authentication mode.
The Configuration Mode is fixed to Connection String Mode. Choose an Authentication Mode and fill in the corresponding parameters:
Host password authentication
| Parameter | Description | Required |
|---|---|---|
| Host Address | The host address of the SSH server | Yes |
| Host Port | The port of the SSH server | Yes |
| Username | The username used to log on to the SSH server | Yes |
| Password | The logon password of the SSH server | Yes |
Host SSH key authentication
| Parameter | Description | Required |
|---|---|---|
| Host Address | The host address of the SSH server | Yes |
| Host Port | The port of the SSH server | Yes |
| Username | The username used to log on to the SSH server | Yes |
| Private Key | Upload the private key authentication file. For details, see Manage third-party authentication files | Yes |
| Passphrase | The passphrase for the private key file, if the file is encrypted | No |
DataWorks SSH public key authentication (recommended)
DataWorks generates a key pair for the data source. Add the public key to your server to establish a secure connection between DataWorks and the SSH server.
| Parameter | Description | Required |
|---|---|---|
| Host Address | The host address of the SSH server | Yes |
| Host Port | The port of the SSH server | Yes |
| Username | The username used to log on to the SSH server | Yes |
| Public Key | Click Generate Key Pair to generate a public key for the specified username | Yes |
After generating the public key, add it to the .ssh/authorized_keys file on your server before testing connectivity, or the connection will fail.
The generated key pair takes effect only after the data source is created. If you click Generate Key Pair again while editing, a new key replaces the previous one. Add the new public key to your server promptly, or existing tasks may fail.
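Adding the public key to the server can be scripted. The sketch below shows one way to append a key to `authorized_keys` without creating duplicates. It assumes that it runs on the SSH server as the user named in Username and that the server runs OpenSSH. `add_authorized_key` is a hypothetical helper, not a DataWorks or OpenSSH API.

```python
import os
from pathlib import Path


def add_authorized_key(public_key: str, ssh_dir: Path) -> bool:
    """Append public_key to ssh_dir/authorized_keys unless it is already there.

    Returns True if the key was appended, False if it was already present.
    """
    key = public_key.strip()
    if not key:
        raise ValueError("public key is empty")

    # OpenSSH with StrictModes (the default) ignores keys when the directory
    # or file is writable by other users, so set restrictive modes explicitly.
    ssh_dir.mkdir(parents=True, exist_ok=True)
    os.chmod(ssh_dir, 0o700)

    auth_file = ssh_dir / "authorized_keys"
    text = auth_file.read_text() if auth_file.exists() else ""
    if key in (line.strip() for line in text.splitlines()):
        return False

    # Start on a fresh line if the existing file lacks a trailing newline.
    prefix = "" if text == "" or text.endswith("\n") else "\n"
    with auth_file.open("a") as f:
        f.write(prefix + key + "\n")
    os.chmod(auth_file, 0o600)
    return True
```

A typical call is `add_authorized_key(key_text, Path.home() / ".ssh")`, where `key_text` holds the public key copied from the data source page. If you regenerate the key pair, run the call again with the new key and remove the old entry from the file by hand.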
Step 3: Test resource group connectivity
In the Connection Configuration section, test the connectivity between the data source and your exclusive resource group for scheduling. Tasks fail if the resource group cannot reach the data source.
If the connectivity test fails, add the IP address of the resource group to the inbound rules of the server's security group. Both the public and internal IP addresses of the resource group are supported.
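When a connectivity test fails, a quick way to narrow down the cause is to check whether the resource group's IP address falls inside any source range that the security group already allows. The sketch below does this with Python's standard `ipaddress` module. The function name and the addresses in the example are illustrative placeholders, and the check covers source ranges only, not the port or protocol of each rule.

```python
import ipaddress
from typing import Iterable


def covered_by_rules(source_ip: str, allowed_cidrs: Iterable[str]) -> bool:
    """Return True if source_ip falls inside at least one allowed CIDR block."""
    ip = ipaddress.ip_address(source_ip)
    for cidr in allowed_cidrs:
        # strict=False accepts entries such as 192.0.2.10/24 with host bits set.
        network = ipaddress.ip_network(cidr, strict=False)
        if ip in network:
            return True
    return False
```

For example, `covered_by_rules("198.51.100.7", ["192.0.2.0/24"])` returns `False`, which suggests that an inbound rule for that address still needs to be added.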
What's next
After creating the SSH data source:
- Develop and schedule SSH tasks: In DataStudio, select the SSH data source in an SSH node to connect to the host. Deploy the node in Operation Center for periodic scheduling. See Data development and scheduling.
- Manage the data source: Edit or delete the data source on the Data Sources page. See Manage data sources.
For high availability and load balancing across SSH nodes, see High availability for SSH nodes with load balancing.