Use Data Lake Analytics (DLA) nodes in DataWorks to run SQL tasks against DLA and incorporate them into scheduled extract, transform, and load (ETL) data processing flows.
Background information
Data Lake Analytics nodes are used to run tasks in Data Lake Analytics, an Alibaba Cloud product. For more information, see What is Data Lake Analytics?
Data Lake Analytics nodes can run tasks on Serverless resource groups (recommended) or previous-generation exclusive resource groups for scheduling.
Prerequisites
Before you begin, make sure you have:
A DataWorks workspace with Data Development access
A DLA data source configured in DataWorks. If you don't have one, see Configure a Data Lake Analytics (DLA) data source
A Serverless resource group that has passed the connectivity test with your DLA network. To purchase and set up one, see Use Serverless resource groups
Supported regions
China (Hangzhou), China (Shanghai), China (Beijing), China (Shenzhen), China (Hong Kong), Japan (Tokyo), Singapore, Germany (Frankfurt), UK (London), US (Silicon Valley), and US (Virginia).
Create and configure a DLA node
Step 1: Create the node
Go to the DataStudio (Data Development) page: log on to the DataWorks console, select the target region in the top navigation bar, choose Data Development and O&M > Data Development in the left-side navigation pane, select the target workspace from the drop-down list, and click Go to Data Development.
Hover over the Create icon and choose New Node > Custom > Data Lake Analytics. Alternatively, open the target business flow, right-click Custom, and choose New Node > Data Lake Analytics. In the Create Node dialog box, configure the following parameters and click Confirm.
Name (required): The node name. It must be 1 to 128 characters in length and can contain letters, digits, underscores (_), and periods (.).
Path (required): The path where the node is saved in the directory tree.
Step 2: Edit the node
Select a data source from the drop-down list. If the data source you need isn't listed, click New Data Source to create one on the Data Source Management page. For details, see Configure a Data Lake Analytics (DLA) data source.
Write your SQL statements using DLA syntax. Both DML and DDL statements are supported.
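For example, a node might define an external table over data in Object Storage Service (OSS) and then query it. The following is a sketch only: the database name, table name, columns, and OSS path are hypothetical placeholders, not values from this document.

```sql
-- Hypothetical DDL: demo_db, orders, and the OSS location are
-- placeholders; replace them with your own database, table, and path.
CREATE EXTERNAL TABLE IF NOT EXISTS demo_db.orders (
    order_id    BIGINT,
    customer_id BIGINT,
    amount      DOUBLE,
    order_date  STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 'oss://your-bucket/path/to/orders/';

-- Hypothetical DML: aggregate the table defined above.
SELECT order_date, SUM(amount) AS daily_amount
FROM demo_db.orders
GROUP BY order_date;
```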
Click the Save icon in the toolbar to save your changes. Click the Run icon in the toolbar to run the SQL statements. To test against a specific resource group, click the Run with Parameters icon instead and select the target Serverless resource group. If your data source is in a VPC, you must select a Serverless resource group that has passed the connectivity test.
Step 3: Configure scheduling
Click Schedule on the right side of the node editing area to configure scheduling properties. When setting the resource group, select a Serverless resource group that is connected to the DLA network. This resource group handles all recurring scheduled executions.
For all available scheduling options, see Configure basic properties.
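As a sketch of how a scheduled run might parameterize its SQL, DataWorks scheduling parameters can be referenced in the node code (for example, an assignment such as bizdate=$bizdate configured in the node's scheduling properties). The table and column names below are hypothetical:

```sql
-- Assumes a scheduling parameter named bizdate is configured for the
-- node; ${bizdate} then resolves to the data timestamp of each
-- scheduled instance. demo_db.daily_summary, demo_db.orders, and the
-- columns are placeholders.
INSERT INTO demo_db.daily_summary
SELECT '${bizdate}' AS ds, SUM(amount) AS total_amount
FROM demo_db.orders
WHERE order_date = '${bizdate}';
```

With this setup, each recurring instance processes only the data for its own scheduling date rather than the whole table.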
Step 4: Commit and publish the node
Before committing, set the Rerun property and configure the ancestor node dependencies.
Click the Save icon in the toolbar to save the node. Click the Submit icon in the toolbar. In the Commit New Version dialog box, enter a Change description and click Confirm.
If your workspace is in standard mode, click Publish in the upper-right corner after the commit completes. For details, see Publish tasks.
What's next
After the node is committed and published, monitor and manage it from the O&M center. For details, see Perform basic O&M operations on auto triggered nodes.