Data Lake Formation (DLF) helps you build cloud-based data lakes with ease and manage all of your data lakes in a centralized manner.
DLF is in public preview. You can activate and use DLF at any time. At present, all features of DLF are free of charge.
Usage procedure
DLF can be used to extract source data to data lakes with ease. The process of using DLF includes the following steps:
1. After you activate DLF, which is in public preview, log on to the Alibaba Cloud Management Console. Choose Products > Analytics Computing > Data Lake Formation (DLF). On the page that appears, click Console to go to the DLF console.
2. Create a data source. In this step, select the data source whose data you want to import to a data lake. For more information, see Manage data sources.
3. Create a data import template to periodically extract data from the data source to the data lake. For more information, see Manage data import tasks.
4. Define a metadatabase and a metadata table in the data lake. For more information, see Manage metadata.
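Taken together, the steps above amount to assembling three pieces of configuration: a data source connection, an import template, and a metadata target. A minimal sketch of how these fit together (all class and field names here are illustrative, not actual DLF SDK objects):

```python
from dataclasses import dataclass

# Illustrative only: plain data classes that model the DLF setup steps,
# not real DLF API types.

@dataclass
class DataSource:
    connection_name: str
    connection_type: str  # currently ApsaraDB RDS for MySQL

@dataclass
class ImportTemplate:
    source: DataSource
    target_oss_path: str  # the data lake location in OSS
    schedule: str         # e.g. a daily extraction

@dataclass
class MetadataTable:
    database: str
    table: str

def build_pipeline(source: DataSource,
                   template: ImportTemplate,
                   meta: MetadataTable) -> dict:
    """Bundle the three configuration steps into one pipeline description."""
    return {
        "source": source.connection_name,
        "target": template.target_oss_path,
        "metadata": f"{meta.database}.{meta.table}",
    }
```

In the actual product these objects are created through the console pages named in the steps above; the sketch only shows how the pieces reference one another.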
Overview of the DLF console
The homepage of the DLF console consists of the left-side navigation pane and the DLF information section. The DLF console provides quick links for you to use the major features of DLF. This helps you get started with DLF with ease.
Data lake location
All the data in data lakes that are created by using DLF is stored in Object Storage Service (OSS). You must specify an OSS bucket or an OSS path to store the data of your data lake.
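Because every data lake location resolves to an OSS bucket or a path inside one, it can be useful to validate the location string before configuring it. A small sketch, assuming the common `oss://bucket/path` URI form for illustration:

```python
def parse_oss_location(location: str) -> tuple:
    """Split an OSS location such as 'oss://my-bucket/lake/raw/' into
    (bucket, path). Raises ValueError for malformed locations.

    The 'oss://' URI form is an assumption for this sketch; confirm the
    exact format accepted by the DLF console.
    """
    prefix = "oss://"
    if not location.startswith(prefix):
        raise ValueError(f"not an OSS location: {location!r}")
    rest = location[len(prefix):]
    bucket, _, path = rest.partition("/")
    if not bucket:
        raise ValueError("missing bucket name")
    return bucket, path
```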
Metadata management
Metadata management in a DLF data lake covers both metadatabases and the metadata tables that they contain.
Data sources
Data is extracted from data sources to the specified data lake location. Currently, ApsaraDB RDS for MySQL is the supported data source type. The following table describes the parameters that you configure when you create a data source connection.
| Parameter | Description |
| --- | --- |
| Connection Name | The unique name of the connection in DLF. |
| Connection Type | Only ApsaraDB RDS for MySQL data sources are supported. |
| Username | The username that is used to connect to the ApsaraDB RDS for MySQL data source. |
| Password | The password that is used to connect to the ApsaraDB RDS for MySQL data source. |
| VPC | The virtual private cloud (VPC) in which the ApsaraDB RDS for MySQL data source resides. |
| VSwitch | The vSwitch in which the ApsaraDB RDS for MySQL data source resides. |
| Security Group | The security group to which the ApsaraDB RDS for MySQL data source belongs. |
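The connection parameters above map naturally onto a configuration record, and a quick completeness check can catch a missing field before the connection is created. A hedged sketch (the snake_case field names are illustrative, not the DLF API's):

```python
# Required connection fields, mirroring the parameter table above.
# Field names are illustrative; they are not DLF API identifiers.
REQUIRED_FIELDS = (
    "connection_name",  # unique name of the connection in DLF
    "connection_type",  # currently only ApsaraDB RDS for MySQL
    "username",
    "password",
    "vpc",
    "vswitch",
    "security_group",
)

def missing_connection_fields(config: dict) -> list:
    """Return the required connection fields that are absent or empty."""
    return [f for f in REQUIRED_FIELDS if not config.get(f)]
```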
Data import templates
You can use data import templates to manually import data from data sources to a data lake in DLF, or schedule the import to run at a specified time.
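A scheduled import behaves like a periodic job: given a start time and an interval, each run is due at a fixed offset from the last. A minimal standard-library sketch of computing the next run time (this scheduling model is an assumption for illustration, not DLF's actual implementation):

```python
from datetime import datetime, timedelta

def next_run(start: datetime, interval: timedelta, now: datetime) -> datetime:
    """Return the first scheduled run at or after `now`, where runs occur
    at start, start + interval, start + 2 * interval, and so on.

    Assumed scheduling model; DLF's real scheduler may differ.
    """
    if now <= start:
        return start
    elapsed = now - start
    # Ceiling division on timedeltas: number of whole intervals needed
    # to reach or pass `now`.
    periods = -(-elapsed // interval)
    return start + periods * interval
```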