Use Data Lake Analytics (DLA) nodes in DataWorks to run SQL tasks against DLA and incorporate them into scheduled extract, transform, and load (ETL) data processing flows.
Background information
Data Lake Analytics nodes are used to run tasks in Data Lake Analytics, an Alibaba Cloud product. For more information, see What is Data Lake Analytics?
Data Lake Analytics nodes can run tasks on Serverless resource groups (recommended) or previous-generation exclusive resource groups for scheduling.
Prerequisites
Before you begin, make sure you have:
A DataWorks workspace with Data Development access
A DLA data source configured in DataWorks. If you don't have one, see Configure a Data Lake Analytics (DLA) data source
A Serverless resource group that has passed the connectivity test with your DLA network. To purchase and set up one, see Use Serverless resource groups
Supported regions
China (Hangzhou), China (Shanghai), China (Beijing), China (Shenzhen), China (Hong Kong), Japan (Tokyo), Singapore, Germany (Frankfurt), UK (London), US (Silicon Valley), and US (Virginia).
Create and configure a DLA node
Step 1: Create the node
Go to the DataStudio (Data Development) page: log on to the DataWorks console, select the target region in the top navigation bar, choose Data Development and O&M > Data Development in the left-side navigation pane, select the target workspace from the drop-down list, and click Go to Data Development.
Hover over the Create icon and choose New Node > Custom > Data Lake Analytics. Alternatively, open the target business flow, right-click Custom, and choose New Node > Data Lake Analytics. In the Create Node dialog box, configure the following parameters and click Confirm.
Name (required): The node name. It must be 1 to 128 characters in length and can contain letters, digits, underscores (_), and periods (.).
Path (required): The path where the node is saved in the directory tree.
Step 2: Edit the node
Select a data source from the drop-down list. If the data source you need isn't listed, click New Data Source to create one on the Data Source Management page. For details, see Configure a Data Lake Analytics (DLA) data source.
Write your SQL statements using DLA syntax. Both DML and DDL statements are supported.
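For example, a node might define an external table over data in Object Storage Service (OSS) and then query it. The following is a sketch only: the database name, table name, columns, and OSS path are hypothetical placeholders, not values from this document.

```sql
-- Hypothetical DDL: demo_db, orders, and the OSS location are
-- placeholders; replace them with your own database, table, and path.
CREATE EXTERNAL TABLE IF NOT EXISTS demo_db.orders (
    order_id    BIGINT,
    customer_id BIGINT,
    amount      DOUBLE,
    order_date  STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 'oss://your-bucket/path/to/orders/';

-- Hypothetical DML: aggregate the table defined above.
SELECT order_date, SUM(amount) AS daily_amount
FROM demo_db.orders
GROUP BY order_date;
```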
Click the Save icon in the toolbar to save your changes. Click the Run icon in the toolbar to run the SQL statements. To test against a specific resource group, click the Run with Parameters icon instead and select the target Serverless resource group. If your data source is in a VPC, you must select a Serverless resource group that has passed the connectivity test.
Step 3: Configure scheduling
Click Schedule on the right side of the node editing area to configure scheduling properties. When setting the resource group, select a Serverless resource group that is connected to the DLA network. This resource group handles all recurring scheduled executions.
For all available scheduling options, see Configure basic properties.
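As a sketch of how a scheduled run might parameterize its SQL, DataWorks scheduling parameters can be referenced in the node code (for example, an assignment such as bizdate=$bizdate configured in the node's scheduling properties). The table and column names below are hypothetical:

```sql
-- Assumes a scheduling parameter named bizdate is configured for the
-- node; ${bizdate} then resolves to the data timestamp of each
-- scheduled instance. demo_db.daily_summary, demo_db.orders, and the
-- columns are placeholders.
INSERT INTO demo_db.daily_summary
SELECT '${bizdate}' AS ds, SUM(amount) AS total_amount
FROM demo_db.orders
WHERE order_date = '${bizdate}';
```

With this setup, each recurring instance processes only the data for its own scheduling date rather than the whole table.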
Step 4: Commit and publish the node
Before committing, set the Rerun property and configure the ancestor node dependencies.
Click the Save icon in the toolbar to save the node. Click the Submit icon in the toolbar. In the Commit New Version dialog box, enter a Change description and click Confirm.
If your workspace is in standard mode, click Publish in the upper-right corner after the commit completes. For details, see Publish tasks.
What's next
After the node is committed and published, monitor and manage it from the O&M center. For details, see Perform basic O&M operations on auto triggered nodes.