Data Lake Analytics (DLA) is a serverless, cloud-native, interactive search and analytics service that lets you use the SQL and Spark engines to analyze data from a variety of data sources. This Quick Start explains the basic procedure for using DLA and guides you through activating DLA, building a data lake, and using the SQL and Spark engines to analyze and compute data.
If you are using DLA for the first time, we recommend that you read the following references first:
- Product Introduction: This document describes the concepts, benefits, and scenarios of DLA.
- Pricing: This document describes the pricing and billing methods of DLA.
- Activate DLA.
- Optional: Create a virtual cluster. DLA CU Edition is suitable for scenarios in which large amounts of data are queried frequently, and it makes the costs of using DLA predictable. We recommend that you use DLA CU Edition to analyze and compute data.
Note: If you use the default edition to analyze and compute data, skip this step. The default edition is billed based on the number of bytes scanned. For more information about the differences between the default edition and DLA CU Edition, see Differences between billing methods.
- Build a data lake. You can use one of the following methods to build a data lake:
- Manually upload files to Object Storage Service (OSS). Then, use the metadata crawling feature to create tables over those files. For more information, see Upload objects and Crawl metadata. A sample upload script is sketched after this list.
- Use another service to deliver files to OSS. For example, use the ActionTrail console to deliver log files to OSS. Then, use the metadata crawling feature to create tables to build a data lake. For more information, see Create a single-account trail and Crawl metadata.
- Build a data lake through one-click data warehousing or by merging multiple databases, or build a real-time data lake based on databases and message logs. For more information, see One-click data warehousing, Create a data warehouse by merging databases, and Build a real-time data lake.
- Access data sources. You can use DLA to access OSS or other data sources to analyze and compute data. For more information, see Use the serverless SQL engine to access data sources and Use the serverless Spark engine to access data sources. A sample SQL query against OSS data is sketched after this list.
- Analyze and compute data. You can use the serverless SQL or Spark engine to analyze and compute data. For more information, see Serverless SQL and Serverless Spark. A sample Spark job is sketched after this list.
- Implement data applications. You can use DataWorks or Data Management (DMS) to schedule DLA SQL and Spark tasks, and display the query and analysis results of OSS data as business intelligence (BI) reports. For more information, see ETL scheduling and Create Quick BI visualized reports.
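
The following sketch illustrates the first data lake option above: uploading a local file to OSS with the OSS Python SDK (oss2) so that the metadata crawler can later discover it and create tables. The bucket name, endpoint, object key, and file path are hypothetical placeholders.

```python
import os

import oss2  # OSS Python SDK

# Long-lived keys are read from environment variables here for brevity;
# RAM roles or STS tokens are preferable in production.
auth = oss2.Auth(os.environ["OSS_ACCESS_KEY_ID"], os.environ["OSS_ACCESS_KEY_SECRET"])

# Hypothetical bucket name and region endpoint; replace with your own values.
bucket = oss2.Bucket(auth, "https://oss-cn-hangzhou.aliyuncs.com", "my-datalake-bucket")

# Upload a local CSV file so that the metadata crawler can discover it later.
bucket.put_object_from_file("warehouse/orders/orders.csv", "./orders.csv")
```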
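To illustrate accessing data sources with the serverless SQL engine, the next sketch connects to DLA over its MySQL-protocol-compatible endpoint with PyMySQL and queries a table created over OSS objects. The endpoint, port, credentials, database, and table names are assumptions; substitute the values shown in your DLA console.

```python
import pymysql

# Hypothetical connection parameters; copy the real endpoint from the DLA console.
conn = pymysql.connect(
    host="service.cn-hangzhou.datalakeanalytics.aliyuncs.com",
    port=10000,
    user="your_dla_username",
    password="your_dla_password",
    database="my_lake_db",  # the schema created by the metadata crawler or by DDL
)
try:
    with conn.cursor() as cursor:
        # Query the external table that DLA mapped onto the OSS objects.
        cursor.execute("SELECT order_id, amount FROM orders LIMIT 10")
        for row in cursor.fetchall():
            print(row)
finally:
    conn.close()
```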
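Finally, a sketch of the kind of job you might submit to the serverless Spark engine: a PySpark script that reads the OSS objects uploaded earlier and runs a simple aggregation. The path and column names are hypothetical, and the oss:// scheme assumes the OSS connector that the DLA Spark runtime provides.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("dla-quickstart").getOrCreate()

# Read the raw CSV objects uploaded to OSS earlier; inferSchema turns the
# amount column into a numeric type so that it can be aggregated.
df = (
    spark.read.option("header", "true")
    .option("inferSchema", "true")
    .csv("oss://my-datalake-bucket/warehouse/orders/")
)

# A simple aggregation: total order amount per day.
summary = df.groupBy("order_date").agg(F.sum("amount").alias("total_amount"))
summary.show()

spark.stop()
```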