Alibaba Cloud Data Lake Formation (DLF) is a fully managed platform that provides unified metadata, data storage, and data management. DLF includes features such as Metadata Management, Storage Management, Permission Management, Storage Analysis, and Storage Optimization. DataWorks Data Integration supports writing data to a DLF data source.
Limitations
A DLF data source can only be used in Data Integration.
A Serverless resource group is required.
Prerequisites
Before you begin, ensure that you have:
A Serverless resource group created in your DataWorks workspace, and a DLF Data Catalog created in Data Lake Formation.
If you plan to use a RAM user or RAM role as the access identity, grant the following permissions in advance:
In the RAM console, attach the AliyunDataWorksDIAccessDLF system policy to the RAM user or RAM role. For details, see Grant permissions to a RAM user.
In the Data Lake Formation console, grant the Data Editor permission on the target tables to the RAM user or RAM role.
If you select Alibaba Cloud Account as the access identity, no additional permissions are required.
Add a DLF data source
Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose More > Management Center. Select the desired workspace from the drop-down list and click Go to Management Center.
In the left-side navigation pane of the Management Center page, click Data Sources.
Click Add Data Source, search for and select DLF, then configure the following parameters:
| Parameter | Description |
| --- | --- |
| Data Source Name | Enter a custom name. The name must be unique within the workspace and can contain only letters, digits, and underscores (_). It cannot start with a digit or an underscore. |
| Configuration Mode | Only Alibaba Cloud Instance Mode is supported. |
| Endpoint | Select the endpoint of the DLF engine instance from the drop-down list. |
| Access Identity | Select Alibaba Cloud Account, Alibaba Cloud RAM User, or Alibaba Cloud RAM Role. If you select RAM User or RAM Role, complete the permission setup described in Prerequisites before proceeding. |
| DLF Data Catalog | Select a DLF Data Catalog in the same region as your DataWorks workspace. |
| Database Name | Select a database in the Data Catalog. |
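The Data Source Name rules above can be expressed as a simple pattern check. The following is an illustrative sketch, not part of DataWorks or any Alibaba Cloud SDK; the helper name is hypothetical.

```python
import re

# Documented rules: letters, digits, and underscores only,
# and the name must not start with a digit or an underscore.
NAME_PATTERN = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")

def is_valid_datasource_name(name: str) -> bool:
    """Illustrative check of the DLF data source naming rules."""
    return bool(NAME_PATTERN.match(name))

print(is_valid_datasource_name("guxuan_dlf"))  # valid
print(is_valid_datasource_name("_dlf"))        # invalid: starts with underscore
print(is_valid_datasource_name("1dlf"))        # invalid: starts with digit
```

Note that DataWorks performs its own validation when you save the data source; this sketch only mirrors the documented rules.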
Test the connectivity between the data source and the Serverless resource group. If the test passes, click Complete Creation. If the test fails, see Network connectivity configuration for troubleshooting.
Create a data integration task
To use the DLF data source in a Data Integration task, see Data synchronization to Data Lake Formation.
Appendix: Script mode reference
When configuring an offline task in script mode, format the script parameters as shown below. For an overview of script mode, see Use the code editor.
All examples use "stepType": "dlf" to identify the DLF Reader or Writer step.
Reader script example
{
"type": "job",
"version": "2.0",
"steps": [
{
"stepType": "dlf",
"parameter": {
"datasource": "guxuan_dlf",
"table": "auto_ob_3088545_0523",
"column": [
"id",
"col1",
"col2",
"col3"
],
"tableType": "table",
"where": "id > 1"
},
"name": "Reader",
"category": "reader"
},
{
"stepType": "stream",
"parameter": {
"print": false
},
"name": "Writer",
"category": "writer"
}
],
"setting": {
"errorLimit": {
"record": ""
},
"speed": {
"throttle": true,
"concurrent": 20,
"mbps": "12"
}
},
"order": {
"hops": [
{
"from": "Reader",
"to": "Writer"
}
]
}
}
Reader parameters
| Parameter | Required | Default | Description |
| --- | --- | --- | --- |
| datasource | Yes | — | Name of the DLF data source. |
| table | Yes | — | Name of the source table. |
| tableType | No | — | Table type. The script example uses `table`. |
| column | Yes | — | Columns to read from the source table. |
| where | No | — | Filter condition, for example `id > 1`. |
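The required Reader parameters can be sanity-checked before a script is submitted. This is an illustrative sketch, not a DataWorks API; the function name is an assumption for demonstration.

```python
# Required keys for the DLF Reader "parameter" block, per the table above.
REQUIRED_READER_KEYS = {"datasource", "table", "column"}

def missing_reader_keys(parameter: dict) -> list:
    """Return the sorted list of required keys absent from the block."""
    return sorted(REQUIRED_READER_KEYS - parameter.keys())

# Parameter block from the Reader script example above.
reader_parameter = {
    "datasource": "guxuan_dlf",
    "table": "auto_ob_3088545_0523",
    "column": ["id", "col1", "col2", "col3"],
    "tableType": "table",
    "where": "id > 1",
}
print(missing_reader_keys(reader_parameter))            # []
print(missing_reader_keys({"datasource": "guxuan_dlf"}))  # ['column', 'table']
```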
Writer script example
{
"type": "job",
"version": "2.0",
"steps": [
{
"stepType": "stream",
"parameter": {
},
"name": "Reader",
"category": "reader"
},
{
"stepType": "dlf",
"parameter": {
"datasource": "guxuan_dlf",
"column": [
"id",
"col1",
"col2",
"col3"
],
"tableType": "table",
"table": "auto_ob_3088545_0523"
},
"name": "Writer",
"category": "writer"
}
],
"setting": {
"errorLimit": {
"record": ""
},
"speed": {
"throttle": true,
"concurrent": 20,
"mbps": "12"
}
},
"order": {
"hops": [
{
"from": "Reader",
"to": "Writer"
}
]
}
}
Writer parameters
| Parameter | Required | Default | Description |
| --- | --- | --- | --- |
| datasource | Yes | — | Name of the DLF data source. |
| table | Yes | — | Name of the destination table. |
| tableType | No | — | Table type. The script example uses `table`. |
| column | Yes | — | Columns to write to the destination table. |
Speed and error limit settings
| Parameter | Description | Example |
| --- | --- | --- |
| errorLimit.record | Maximum number of error records allowed before the job fails. | `""` |
| speed.throttle | Enables (`true`) or disables (`false`) throttling. | `true` |
| speed.concurrent | Number of concurrent job threads. | `20` |
| speed.mbps | Maximum transfer rate in MB/s. Applies only when `throttle` is `true`. | `"12"` |
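The interaction between `throttle` and `mbps` can be linted before submission. The following is a minimal sketch under the rules stated above; it is not part of any Alibaba Cloud SDK, and the helper name is hypothetical.

```python
def check_speed_setting(speed: dict) -> list:
    """Illustrative lint of the "speed" block from the script examples."""
    problems = []
    # "concurrent" should be a positive integer thread count.
    if not isinstance(speed.get("concurrent"), int) or speed["concurrent"] < 1:
        problems.append("concurrent must be a positive integer")
    # "mbps" only takes effect when throttling is enabled.
    if speed.get("throttle") and "mbps" not in speed:
        problems.append("mbps is required when throttle is true")
    if not speed.get("throttle") and "mbps" in speed:
        problems.append("mbps is ignored when throttle is false")
    return problems

print(check_speed_setting({"throttle": True, "concurrent": 20, "mbps": "12"}))  # []
print(check_speed_setting({"throttle": True, "concurrent": 20}))
```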