
DataWorks: Data Lake Analytics node

Last Updated: Aug 14, 2023

DataWorks supports Data Lake Analytics nodes. You can create a Data Lake Analytics node in the DataWorks console to build an online extract, transform, and load (ETL) process.

Background information

Data Lake Analytics nodes are used to connect to Data Lake Analytics (DLA), an interactive query and analytics service that is provided by Alibaba Cloud. For more information, see DLA documentation.
Important You can run Data Lake Analytics nodes only on exclusive resource groups for scheduling. For more information, see Create and use an exclusive resource group for scheduling.

Procedure

  1. Go to the DataStudio page.

    1. Log on to the DataWorks console.

    2. In the left-side navigation pane, click Workspaces.

    3. In the top navigation bar, select the region where the workspace resides. On the Workspaces page, find the workspace in which you want to create a node, and choose Shortcuts > Data Development in the Actions column.

  2. On the Data Development tab, move the pointer over the Create icon and choose Customize > Data Lake Analytics.
    Alternatively, you can click the target workflow, right-click UserDefined, and then choose New > Data Lake Analytics.
  3. In the Create Node dialog box, configure the Name and Path parameters.
    Note The node name must be 1 to 128 characters in length and can contain letters, digits, underscores (_), and periods (.).
  4. Click Confirm.
  5. Configure the Data Lake Analytics node.
    1. Select a connection.
      On the configuration tab of the Data Lake Analytics node, select a connection from the Select data source drop-down list. If you cannot find the required connection in the drop-down list, click New data source to add a connection on the Data Source page. For more information, see Add a DLA data source.
    2. Write the SQL statements of the node.
      After you select a connection, write SQL statements based on the syntax that is supported by DLA. You can write data manipulation language (DML) or data definition language (DDL) statements in the code editor. For a minimal example, see the Sample statements section at the end of this topic.
    3. Click the Save icon in the toolbar.
    4. Click the Run icon in the toolbar to run the SQL statements that you saved.
    If you want to change the resource group that is used to test the Data Lake Analytics node on the DataStudio page, click the Advanced run (run with parameters) icon in the toolbar and select the desired exclusive resource group for scheduling.
    Note To access a data store in a virtual private cloud (VPC), a node must be run on an exclusive resource group for scheduling. In this example, the data store is in a VPC. You must select an exclusive resource group for scheduling that is connected to the target DLA data store.
  6. Click the Scheduling configuration tab in the right-side navigation pane and set the scheduling properties for the node. For more information, see Configure basic properties.
    You must select an exclusive resource group for scheduling that is connected to the target DLA data store to periodically run the node.
  7. Save and commit the node.
    Important You must configure the Rerun and Parent Nodes parameters on the Scheduling configuration tab before you commit the node.
    1. Click the Save icon in the top toolbar to save the node.
    2. Click the Submit icon in the toolbar.
    3. In the Commit Node dialog box, configure the Change description parameter.
    4. Click OK.
    If the workspace that you use is in standard mode, you must click Deploy in the upper-right corner to deploy the node after you commit it. For more information, see Deploy nodes.
  8. Perform O&M operations on the node. For more information, see Perform basic O&M operations on auto triggered nodes.
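
Sample statements

The following statements are a minimal sketch of what you can write in the code editor of a Data Lake Analytics node, as described in Step 5. The demo_db database, the orders table, the oss://your-bucket/orders/ path, and the bizdate scheduling parameter are hypothetical placeholders, not values from this topic. Replace them with values from your own environment.

-- DDL sketch: create an external table over CSV files in an OSS directory.
-- All object names and the OSS path below are hypothetical placeholders.
CREATE EXTERNAL TABLE IF NOT EXISTS demo_db.orders (
    order_id    BIGINT,
    customer_id BIGINT,
    amount      DOUBLE,
    order_date  STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 'oss://your-bucket/orders/';

-- DML sketch: aggregate one day of data from the external table.
-- ${bizdate} assumes that a scheduling parameter named bizdate is
-- configured on the Scheduling configuration tab of the node.
SELECT order_date,
       COUNT(*)    AS order_cnt,
       SUM(amount) AS total_amount
FROM demo_db.orders
WHERE order_date = '${bizdate}'
GROUP BY order_date;

If you define a scheduling parameter named bizdate, DataWorks replaces the ${bizdate} expression with the value of the parameter before the statements are submitted to DLA. This way, the same node can process the data of a different day in each scheduling cycle.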