
DataWorks: Data Lake Formation

Last Updated: Mar 31, 2026

Alibaba Cloud Data Lake Formation (DLF) is a fully managed platform that provides unified metadata, data storage, and data management. DLF includes features such as Metadata Management, Storage Management, Permission Management, Storage Analysis, and Storage Optimization. DataWorks Data Integration supports writing data to a DLF data source.

Limitations

  • A DLF data source can only be used in Data Integration.

  • A Serverless resource group is required.

Prerequisites

Before you begin, ensure that you have:

  • A Serverless resource group created in your DataWorks workspace, and a DLF data catalog available in the same region as the workspace.

  • If you plan to use a RAM user or RAM role as the access identity, the required DLF permissions granted to that identity in advance.

Note

If you select Alibaba Cloud Account as the access identity, no additional permissions are required.

Add a DLF data source

  1. Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose More > Management Center. Select the desired workspace from the drop-down list and click Go to Management Center.

  2. In the left-side navigation pane of the Management Center page, click Data Sources.

  3. Click Add Data Source, search for and select DLF, then configure the following parameters:

    • Data Source Name: Enter a custom name. The name must be unique within the workspace and can contain only letters, digits, and underscores (_). It cannot start with a digit or an underscore.

    • Configuration Mode: Only Alibaba Cloud Instance Mode is supported.

    • Endpoint: Select the endpoint of the DLF engine instance from the drop-down list.

    • Access Identity: Select Alibaba Cloud Account, Alibaba Cloud RAM User, or Alibaba Cloud RAM Role. If you select RAM User or RAM Role, complete the permission setup described in Prerequisites before proceeding.

    • DLF Data Catalog: Select a DLF data catalog in the same region as your DataWorks workspace.

    • Database Name: Select a database in the data catalog.

  4. Test the connectivity between the data source and the Serverless resource group. If the test passes, click Complete Modification. If the test fails, see Network connectivity configuration for troubleshooting.
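The data source naming rule in step 3 can be checked locally before you submit the form. The sketch below is illustrative only (workspace-wide uniqueness cannot be verified offline):

```python
import re

# Letters, digits, and underscores only; the first character must be a
# letter (not a digit or an underscore), per the naming rule above.
NAME_PATTERN = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")

def is_valid_datasource_name(name: str) -> bool:
    """Return True if the name satisfies the documented character rules."""
    return bool(NAME_PATTERN.fullmatch(name))

print(is_valid_datasource_name("guxuan_dlf"))   # True
print(is_valid_datasource_name("_hidden"))      # False: starts with underscore
print(is_valid_datasource_name("1st_source"))   # False: starts with digit
```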

Create a data integration task

To use the DLF data source in a Data Integration task, see Data synchronization to Data Lake Formation.

Appendix: Script mode reference

When configuring an offline task in script mode, format the script parameters as shown below. For an overview of script mode, see Use the code editor.

All examples use "stepType": "dlf" to identify the DLF Reader or Writer step.

Reader script example

{
   "type": "job",
   "version": "2.0",
   "steps": [
      {
         "stepType": "dlf",
         "parameter": {
            "datasource": "guxuan_dlf",
            "table": "auto_ob_3088545_0523",
            "column": [
               "id",
               "col1",
               "col2",
               "col3"
            ],
            "tableType": "table",
            "where": "id > 1"
         },
         "name": "Reader",
         "category": "reader"
      },
      {
         "stepType": "stream",
         "parameter": {
            "print": false
         },
         "name": "Writer",
         "category": "writer"
      }
   ],
   "setting": {
      "errorLimit": {
         "record": ""
      },
      "speed": {
         "throttle": true,
         "concurrent": 20,
         "mbps": "12"
      }
   },
   "order": {
      "hops": [
         {
            "from": "Reader",
            "to": "Writer"
         }
      ]
   }
}

Reader parameters

  • datasource (required): Name of the DLF data source.

  • table (required): Name of the source table.

  • tableType (optional; default: table): Table type. Valid values: table (Paimon table), format-table (format table), iceberg-table (Iceberg table).

  • column (required): Columns to read from the source table.

  • where (optional): Filter condition.
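Putting the reader parameters together: the sketch below assembles the same reader step as the script example programmatically, which can help avoid hand-editing errors in script mode. The data source, table, and column names are the placeholder values from the example above, and the helper function is illustrative, not part of any SDK:

```python
import json

def dlf_reader_step(datasource, table, columns, table_type="table", where=None):
    """Build a DLF reader step dict mirroring the parameter list above."""
    parameter = {
        "datasource": datasource,
        "table": table,
        "column": columns,
        "tableType": table_type,   # table | format-table | iceberg-table
    }
    if where is not None:
        parameter["where"] = where  # optional filter condition
    return {
        "stepType": "dlf",
        "parameter": parameter,
        "name": "Reader",
        "category": "reader",
    }

step = dlf_reader_step("guxuan_dlf", "auto_ob_3088545_0523",
                       ["id", "col1", "col2", "col3"], where="id > 1")
print(json.dumps(step, indent=3))
```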

Writer script example

{
   "type": "job",
   "version": "2.0",
   "steps": [
      {
         "stepType": "stream",
         "parameter": {
         },
         "name": "Reader",
         "category": "reader"
      },
      {
         "stepType": "dlf",
         "parameter": {
            "datasource": "guxuan_dlf",
            "column": [
               "id",
               "col1",
               "col2",
               "col3"
            ],
            "tableType": "table",
            "table": "auto_ob_3088545_0523"
         },
         "name": "Writer",
         "category": "writer"
      }
   ],
   "setting": {
      "errorLimit": {
         "record": ""
      },
      "speed": {
         "throttle": true,
         "concurrent": 20,
         "mbps": "12"
      }
   },
   "order": {
      "hops": [
         {
            "from": "Reader",
            "to": "Writer"
         }
      ]
   }
}

Writer parameters

  • datasource (required): Name of the DLF data source.

  • table (required): Name of the destination table.

  • tableType (optional; default: table): Table type. Valid values: table (Paimon table), format-table (format table), iceberg-table (Iceberg table).

  • column (required): Columns to write to the destination table.

Speed and error limit settings

  • errorLimit.record: Maximum number of error records allowed before the job fails. Example: "".

  • speed.throttle: Enables (true) or disables (false) transfer rate throttling. When false, the mbps value is ignored. Example: true.

  • speed.concurrent: Number of concurrent job threads. Example: 20.

  • speed.mbps: Maximum transfer rate in MB/s. Applies only when throttle is true. Example: "12".
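The throttle semantics above can be summarized in a few lines. The helper below is a local illustration of the documented behavior, not part of any DataWorks API: mbps caps the rate only when throttle is true; otherwise it is ignored.

```python
def effective_rate_cap(setting):
    """Return the MB/s cap implied by a 'speed' setting, or None if unthrottled.

    Mirrors the documented semantics: 'mbps' applies only when
    'throttle' is true.
    """
    speed = setting["speed"]
    if not speed.get("throttle", False):
        return None  # throttling disabled: mbps is ignored
    return float(speed["mbps"])  # mbps is given as a string, e.g. "12"

setting = {"errorLimit": {"record": ""},
           "speed": {"throttle": True, "concurrent": 20, "mbps": "12"}}
print(effective_rate_cap(setting))  # 12.0

setting["speed"]["throttle"] = False
print(effective_rate_cap(setting))  # None
```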