Data Lake Formation: Manage data import tasks

Last Updated: Feb 28, 2022

In Data Lake Formation (DLF), you can use data import tasks to define how data is imported into DLF and the resources that the import consumes.

Data ingestion tasks page in the DLF console

On the Data ingestion tasks page, you can view the status of data import tasks. You can stop a data import task that is running. You can also start or delete a data import task.

Procedure

To create a data import task, perform the following steps:

  1. Log on to the DLF console. In the left-side navigation pane, choose Data into the lake > Data ingestion tasks.

  2. Specify the data import template. The following data import templates are provided to suit different data import scenarios:

    • Relational database full into the lake: You can use this template to synchronize full data from a specified table in an ApsaraDB RDS for MySQL data source to DLF. Select an ApsaraDB RDS for MySQL data source that you have created and specify the table whose full data you want to synchronize. Then, DLF synchronizes the full data from the table to an Object Storage Service (OSS) bucket. Synchronizing a large amount of data consumes a large amount of resources. We recommend that you synchronize large amounts of data during off-peak hours so that your business is not interrupted. Make sure that the specified table contains a primary key. Otherwise, an error is returned. You can verify this requirement in advance, as shown in the sketch after this list.

    • Relational database enters the lake in real time: You can use this template to synchronize incremental data from a specified table in an ApsaraDB RDS for MySQL data source to DLF. Select an ApsaraDB RDS for MySQL data source that you have created and specify the table whose incremental data you want to synchronize. Then, DLF parses the binary logs of the table and synchronizes the incremental data to DLF in real time. This template also requires that the specified table contain a primary key. Otherwise, an error is returned.

    • SLS log into the lake in real time: You can use this template to synchronize data from Log Service to DLF in real time. Select a Log Service project within your Alibaba Cloud account and specify the Logstore whose data you want to synchronize. Then, DLF synchronizes data from the specified Logstore to DLF in real time.

    • TableStore enters the lake in real time: You can use this template to synchronize data from a specified table in Tablestore to DLF. DLF obtains the binary logs of the specified table and synchronizes its data to DLF in real time.

    • OSS data format conversion: You can use this template to convert data formats in OSS. For example, you can convert the TEXT format to the Parquet format.
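
    Both relational database templates require a primary key on the source table. The following sketch checks for one before you create the task. It assumes the pymysql package; the host, credentials, database, and table names are placeholders.

      import pymysql

      # Hypothetical connection details for the ApsaraDB RDS for MySQL instance.
      conn = pymysql.connect(host="rm-example.mysql.rds.aliyuncs.com",
                             user="dlf_reader", password="***",
                             database="sales_db")

      def has_primary_key(connection, schema, table):
          """Return True if the table defines a PRIMARY KEY constraint."""
          sql = (
              "SELECT COUNT(*) FROM information_schema.TABLE_CONSTRAINTS "
              "WHERE TABLE_SCHEMA = %s AND TABLE_NAME = %s "
              "AND CONSTRAINT_TYPE = 'PRIMARY KEY'"
          )
          with connection.cursor() as cursor:
              cursor.execute(sql, (schema, table))
              (count,) = cursor.fetchone()
          return count > 0

      if not has_primary_key(conn, "sales_db", "orders"):
          raise SystemExit("Table has no primary key; the DLF import would fail.")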

  3. Specify the OSS path in which you want to store the imported data.
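
    Before you configure the task, you can confirm that the target path is writable. A minimal sketch, assuming the oss2 package; the credentials, endpoint, bucket name, and prefix below are placeholders.

      import oss2

      # Placeholder credentials and location; replace with your own values.
      auth = oss2.Auth("<access_key_id>", "<access_key_secret>")
      bucket = oss2.Bucket(auth, "https://oss-cn-hangzhou.aliyuncs.com",
                           "my-dlf-bucket")

      # Write a zero-byte marker object to confirm the prefix is writable.
      bucket.put_object("dlf/imports/sales_db/_dlf_path_check", b"")
      print("Target OSS path is writable.")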

  4. Set the parameters of the data import workflow, including the workflow name and the Resource Access Management (RAM) role that is used to run DLF tasks. By default, the AliyunDLFWorkFlowDefaultRole role is used. You can also create a custom role in the RAM console based on your business needs.
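
    If you create a custom role, it must trust the DLF service so that DLF can assume it to run tasks. The sketch below only assembles a standard RAM trust-policy document as a Python dict; the service principal name dlf.aliyuncs.com is an assumption that you should confirm in the RAM console.

      import json

      # Standard RAM trust-policy layout. The service principal below is an
      # assumption; confirm the exact name in the RAM console before use.
      trust_policy = {
          "Version": "1",
          "Statement": [
              {
                  "Action": "sts:AssumeRole",
                  "Effect": "Allow",
                  "Principal": {"Service": ["dlf.aliyuncs.com"]},  # assumed
              }
          ],
      }

      print(json.dumps(trust_policy, indent=2))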

  5. Specify the resource quota that is required to run the workflow. DLF calculates resource usage based on compute units (CUs). A CU consists of 2 vCPUs and 8 GiB of memory.
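
    Because one CU equals 2 vCPUs and 8 GiB of memory, you can translate a resource estimate into a quota with simple arithmetic. A minimal helper with illustrative numbers:

      import math

      VCPUS_PER_CU = 2
      MEM_GIB_PER_CU = 8

      def cus_needed(vcpus, mem_gib):
          """Smallest whole number of CUs that covers both vCPUs and memory."""
          return max(math.ceil(vcpus / VCPUS_PER_CU),
                     math.ceil(mem_gib / MEM_GIB_PER_CU))

      # Example: a workflow estimated at 6 vCPUs and 20 GiB of memory needs 3 CUs.
      print(cus_needed(6, 20))  # 3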

  6. Specify the mode in which the workflow is triggered. The workflow can be triggered manually or at a scheduled time.
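
    For scheduled triggering, an off-peak window such as a nightly run pairs well with the recommendation in the full-import template. This sketch previews the next run of a hypothetical daily 02:00 schedule using the croniter package; the schedule format that the DLF console accepts may differ.

      from datetime import datetime
      from croniter import croniter

      # Hypothetical schedule: every day at 02:00, an off-peak window.
      schedule = croniter("0 2 * * *", datetime(2022, 2, 28, 12, 0))
      print(schedule.get_next(datetime))  # 2022-03-01 02:00:00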


Delete a data import task

1. In the left-side navigation pane of the DLF console, choose Data into the lake > Data ingestion tasks.

2. Find the data import task that you want to delete and click Delete in the Operation column. In the message that appears, click OK.
