
DataWorks:Data Lake Formation data source

Last Updated: Feb 03, 2026

Alibaba Cloud Data Lake Formation (DLF) is a fully managed platform that provides unified metadata, data storage, and data management. DLF offers features such as metadata management, storage management, permission management, storage analysis, and storage optimization. You can use DataWorks Data Integration to write data to DLF data sources. This topic describes how to use a DLF data source.

Limits

You can use Data Lake Formation data sources only in Data Integration and only with serverless resource groups.

Create a data source

  1. Go to the Data Sources page.

    1. Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose More > Management Center. On the page that appears, select the desired workspace from the drop-down list and click Go to Management Center.

    2. In the left-side navigation pane of the Management Center page, click Data Sources.

  2. Click Add Data Source. Search for and select Data Lake Formation. Configure the parameters as described in the following table:

    | Parameter | Description |
    | --- | --- |
    | Data Source Name | The custom name of the data source. The name must be unique within the workspace, can contain only letters, digits, and underscores (_), and cannot start with a digit or an underscore. |
    | Configuration Mode | Only Alibaba Cloud Instance Mode is supported. |
    | Endpoint | The endpoint of the DLF engine instance. Select the endpoint from the drop-down list. |
    | Access Identity | The identity used to access DLF. Valid values: Alibaba Cloud Account, Alibaba Cloud RAM User, and Alibaba Cloud RAM Role. Note: If you select a RAM user or a RAM role, you must grant the required permissions to that identity. |
    | DLF Data Catalog | The DLF data catalog. The data catalog must be in the same region as your DataWorks workspace. |
    | Database Name | The database in the selected data catalog. |

    After you configure the parameters, test the connectivity between the data source and the serverless resource group in the Connection Configuration section. If the connectivity test is successful, click Complete Creation. If the connectivity test fails, see Network connectivity configuration to troubleshoot the issue.

Create a data integration task

You can use a Data Lake Formation data source in a DataWorks data integration task. For more information, see Synchronize data to Data Lake Formation.

Appendix: Script examples and parameter descriptions

Configure an offline task script

If you use the code editor to configure an offline synchronization task, you must configure the parameters in the task script in the standard JSON format. For more information, see Configure a task in the code editor. The following sections describe the script parameters for the DLF data source. The // comments in the examples explain the parameters; delete the comments before you run a script, because the script must be valid JSON.

Reader script example

{
   "type": "job",
   "version": "2.0",
   "steps": [
      {
         "stepType": "dlf",
         "parameter": {
            "datasource": "guxuan_dlf",
            "table": "auto_ob_3088545_0523",
            "column": [
               "id",
               "col1",
               "col2",
               "col3"
            ],
            "where": "id > 1"
         },
         "name": "Reader",
         "category": "reader"
      },
      {
         "stepType": "stream",
         "parameter": {
            "print": false
         },
         "name": "Writer",
         "category": "writer"
      }
   ],
   "setting": {
      "errorLimit": {
         "record": "" // The number of error records.
      },
      "speed": {
         "throttle": true, // If set to false, the mbps parameter does not take effect, which means the rate is not limited. If set to true, the rate is limited.
         "concurrent": 20, // The job concurrency.
         "mbps": "12" // The rate limit. 1 mbps = 1 MB/s.
      }
   },
   "order": {
      "hops": [
         {
            "from": "Reader",
            "to": "Writer"
         }
      ]
   }
}

Reader script parameters

| Parameter | Description | Required |
| --- | --- | --- |
| datasource | The name of the DLF data source. | Yes |
| table | The name of the table from which you want to read data. | Yes |
| column | The names of the columns from which you want to read data. | Yes |
| where | The filter condition used to select rows, such as id > 1. | No |
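
The where parameter is passed to the source as a row filter. The following snippet is a minimal sketch of a reader step with a compound filter. The data source, table, and column names are hypothetical, and the snippet assumes that the filter accepts standard SQL comparison and logical operators:

{
   "stepType": "dlf",
   "parameter": {
      "datasource": "my_dlf_source", // Hypothetical data source name.
      "table": "orders", // Hypothetical table name.
      "column": [
         "id",
         "amount"
      ],
      "where": "id > 100 and amount is not null" // Only rows that match this filter are read.
   },
   "name": "Reader",
   "category": "reader"
}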

Writer script example

{
   "type": "job",
   "version": "2.0",
   "steps": [
      {
         "stepType": "stream",
         "parameter": {
         },
         "name": "Reader",
         "category": "reader"
      },
      {
         "stepType": "dlf",
         "parameter": {
            "datasource": "guxuan_dlf",
            "column": [
               "id",
               "col1",
               "col2",
               "col3"
            ],
            "table": "auto_ob_3088545_0523"
         },
         "name": "Writer",
         "category": "writer"
      }
   ],
   "setting": {
      "errorLimit": {
         "record": "" // The number of error records.
      },
      "speed": {
         "throttle": true, // If set to false, the mbps parameter does not take effect, which means the rate is not limited. If set to true, the rate is limited.
         "concurrent": 20, // The job concurrency.
         "mbps": "12" // The rate limit. 1 mbps = 1 MB/s.
      }
   },
   "order": {
      "hops": [
         {
            "from": "Reader",
            "to": "Writer"
         }
      ]
   }
}

Writer script parameters

| Parameter | Description | Required | Default Value |
| --- | --- | --- | --- |
| datasource | The name of the DLF data source. | Yes | None |
| table | The name of the table to which you want to write data. | Yes | None |
| column | The names of the columns to which you want to write data. | Yes | None |
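
In the code editor, the reader and writer column lists are mapped by position: the first column that is read is written to the first column in the writer's column list, and so on. The following snippet is a minimal sketch of a writer step; the data source, table, and column names are hypothetical:

{
   "stepType": "dlf",
   "parameter": {
      "datasource": "my_dlf_source", // Hypothetical data source name.
      "table": "orders_copy", // Hypothetical destination table name.
      "column": [ // Mapped by position to the columns that the reader reads.
         "id",
         "amount"
      ]
   },
   "name": "Writer",
   "category": "writer"
}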