What is DLA? - Data Lake Analytics - Deprecated - Alibaba Cloud Documentation Center

Important

Data Lake Analytics (DLA) is discontinued. AnalyticDB for MySQL supports the features of DLA and provides additional features and enhanced performance. For more information about how to use AnalyticDB for MySQL, see What is AnalyticDB for MySQL?

DLA is a next-generation big data solution that separates data computing from data storage. DLA can archive messages and database data and build data warehouses in real time. The supported databases include relational databases, PolarDB databases, and NoSQL databases. In addition, DLA provides the serverless Spark and Presto engines to meet the requirements for online interactive search, stream processing, batch processing, and machine learning. As a robust alternative to traditional Hadoop solutions, DLA facilitates a seamless transition to cloud-based analytics.

Data sources supported by DLA

For more information about the data sources that are supported by DLA, see Compatibility matrix for data sources and SQL statements.

Data source	Serverless Presto engine	Serverless Spark engine
Object Storage Service (OSS)	Supported	Supported
ApsaraDB RDS	Supported	Supported
PolarDB	Supported	Supported
ApsaraDB for HBase	To be supported	Supported
ApsaraDB for MongoDB	Supported	To be supported
Tablestore	Supported	Supported
AnalyticDB for MySQL V2.0	Supported	Supported
AnalyticDB for MySQL V3.0	Supported	Supported
AnalyticDB for PostgreSQL	Supported	Supported
MaxCompute	Supported	Supported
Elasticsearch	Supported	Supported
ApsaraDB for Cassandra	Supported	Supported
Kudu	Supported	Supported
Self-managed Druid database hosted on an Elastic Compute Service (ECS) instance	Supported	Supported
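
The matrix above can also be read as a simple lookup. The following is a minimal sketch in Python, not part of any DLA SDK: the names SUPPORT_MATRIX and engines_for are hypothetical and exist only for illustration, and the values are transcribed from the table as shown here.

```python
# Illustrative only: the support matrix above, transcribed into a dict.
# SUPPORT_MATRIX and engines_for are hypothetical names, not a DLA API.
S = "Supported"
T = "To be supported"

# Each value is (serverless Presto status, serverless Spark status).
SUPPORT_MATRIX = {
    "Object Storage Service (OSS)": (S, S),
    "ApsaraDB RDS": (S, S),
    "PolarDB": (S, S),
    "ApsaraDB for HBase": (T, S),
    "ApsaraDB for MongoDB": (S, T),
    "Tablestore": (S, S),
    "AnalyticDB for MySQL V2.0": (S, S),
    "AnalyticDB for MySQL V3.0": (S, S),
    "AnalyticDB for PostgreSQL": (S, S),
    "MaxCompute": (S, S),
    "Elasticsearch": (S, S),
    "ApsaraDB for Cassandra": (S, S),
    "Kudu": (S, S),
    "Self-managed Druid database on ECS": (S, S),
}


def engines_for(source):
    """Return the engines that the table lists as supported for a data source."""
    presto_status, spark_status = SUPPORT_MATRIX[source]
    engines = []
    if presto_status == S:
        engines.append("presto")
    if spark_status == S:
        engines.append("spark")
    return engines
```

For example, engines_for("ApsaraDB for HBase") yields only the Spark engine, because the table lists Presto support for ApsaraDB for HBase as "To be supported".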

Features

DLA provides an end-to-end, cloud-native data lake analytics and computing solution for data that is stored in OSS. DLA offers the following benefits:

End-to-end data lake solution: DLA enables efficient data ingestion; extract, transform, and load (ETL) operations; machine learning; and interactive analytics. DLA provides Data Lake Formation (DLF) and the serverless Presto and Spark engines.
Secure data processing: Database tables and the data stored in DLA are each protected by their own security solution. This helps prevent data misuse.
Cost-effective data processing: The serverless cloud-native data processing solution of DLA is cost-effective.
Smooth evolution: DLA ensures a smooth evolution from a Hadoop system to a data lake solution.

Support for serverless Presto and Spark engines

The serverless Presto engine of DLA is built on Apache Presto and runs all computing jobs in memory. It delivers a high-performance, interactive analysis experience and returns analysis results in seconds. The serverless Spark engine is compatible with all the API operations provided by Apache Spark.

We recommend that you use the serverless Spark engine of DLA in the following scenarios:

You need to write custom code, or SQL statements cannot meet your business requirements.
A large amount of data needs to be cleansed. For example, one terabyte to one petabyte of data stored in OSS must be cleansed per day.
A wide range of algorithms must be supported. The serverless Spark engine of DLA supports all Spark algorithms.
Streaming must be supported.
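
The recommendations above can be sketched as a small decision helper. This is a hedged illustration in Python, not a DLA feature: Workload, LARGE_CLEANSING_TB, and recommend_engine are hypothetical names, the 1 TB threshold is taken from the example range in the list, and falling back to the Presto engine when no scenario applies is an assumption based on its description as the interactive analysis engine.

```python
from dataclasses import dataclass

# Assumption: the lower bound of the "one terabyte to one petabyte per day"
# example in the list above is used as the "large cleansing job" threshold.
LARGE_CLEANSING_TB = 1.0


@dataclass
class Workload:
    """Hypothetical description of a workload; not a DLA data type."""
    needs_custom_code: bool = False       # SQL alone cannot express the logic
    daily_cleansing_tb: float = 0.0       # data to cleanse per day, in TB
    needs_spark_algorithms: bool = False  # relies on algorithms available in Spark
    needs_streaming: bool = False         # stream processing is required


def recommend_engine(workload):
    """Return "spark" if any listed scenario applies, otherwise "presto"."""
    if (
        workload.needs_custom_code
        or workload.daily_cleansing_tb >= LARGE_CLEANSING_TB
        or workload.needs_spark_algorithms
        or workload.needs_streaming
    ):
        return "spark"
    # Assumption: interactive SQL analysis is the default use of Presto.
    return "presto"
```

A call such as recommend_engine(Workload(needs_streaming=True)) selects the Spark engine, whereas a workload with none of the listed requirements falls back to the Presto engine under the assumption stated above.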