Data Lake Analytics (DLA) is a next-generation big data solution that separates compute from storage. DLA can archive messages and database data and build data warehouses in real time. The supported databases include relational databases, PolarDB databases, and NoSQL databases. In addition, DLA provides the serverless Spark and Presto engines to meet the requirements for online interactive queries, stream processing, batch processing, and machine learning. DLA is a competitive option for migrating a traditional Hadoop solution to the cloud.
Data sources supported by DLA
For more information about the data sources that are supported by the serverless Spark and Presto engines of DLA, see Compatibility matrix for data sources and SQL statements.
Data source | Serverless Presto engine | Serverless Spark engine |
---|---|---|
OSS | Supported | Supported |
ApsaraDB RDS | Supported | Supported |
PolarDB | Supported | Supported |
ApsaraDB for HBase | To be supported | Supported |
MongoDB | Supported | To be supported |
Tablestore | Supported | Supported |
AnalyticDB MySQL 2.0 | Supported | Supported |
AnalyticDB MySQL 3.0 | Supported | Supported |
AnalyticDB PostgreSQL | Supported | Supported |
MaxCompute | Supported | Supported |
Elasticsearch | Supported | Supported |
Cassandra | Supported | Supported |
Kudu | Supported | Supported |
Self-managed Druid database hosted on an Elastic Compute Service (ECS) instance | Supported | Supported |
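The table above can also be expressed as a small lookup, which may be convenient when a script needs to decide which engine can read a given source. The following is a minimal sketch, not a DLA API: the names `MATRIX` and `engines_for` are hypothetical, and the statuses are copied from the table as it stands on this page.

```python
from typing import Dict, List, Tuple

SUPPORTED = "Supported"
PLANNED = "To be supported"

# (Presto status, Spark status) per data source, transcribed from the table above.
# This is an illustrative structure, not something DLA itself exposes.
MATRIX: Dict[str, Tuple[str, str]] = {
    "OSS": (SUPPORTED, SUPPORTED),
    "ApsaraDB RDS": (SUPPORTED, SUPPORTED),
    "PolarDB": (SUPPORTED, SUPPORTED),
    "ApsaraDB for HBase": (PLANNED, SUPPORTED),
    "MongoDB": (SUPPORTED, PLANNED),
    "Tablestore": (SUPPORTED, SUPPORTED),
    "AnalyticDB MySQL 2.0": (SUPPORTED, SUPPORTED),
    "AnalyticDB MySQL 3.0": (SUPPORTED, SUPPORTED),
    "AnalyticDB PostgreSQL": (SUPPORTED, SUPPORTED),
    "MaxCompute": (SUPPORTED, SUPPORTED),
    "Elasticsearch": (SUPPORTED, SUPPORTED),
    "Cassandra": (SUPPORTED, SUPPORTED),
    "Kudu": (SUPPORTED, SUPPORTED),
    "Self-managed Druid on ECS": (SUPPORTED, SUPPORTED),
}


def engines_for(source: str) -> List[str]:
    """Return the engines that currently support the given data source."""
    presto_status, spark_status = MATRIX[source]
    engines = []
    if presto_status == SUPPORTED:
        engines.append("presto")
    if spark_status == SUPPORTED:
        engines.append("spark")
    return engines
```

For example, `engines_for("MongoDB")` returns only the Presto engine, because Spark support for MongoDB is listed as "To be supported".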
Features
DLA provides an end-to-end, cloud-native data lake analytics and computing solution for data that is stored in Object Storage Service (OSS). DLA offers the following benefits:
- End-to-end data lake solution: This solution enables efficient data ingestion; extract, transform, and load (ETL); machine learning; and interactive analytics. DLA provides Data Lake Formation (DLF) and serverless Presto and Spark engines.
- Secure data processing: DLA helps prevent data misuse because the database tables and the stored data in DLA are protected by separate security solutions.
- Cost-effective data processing: The serverless cloud-native data processing solution of DLA is cost-effective.
- Smooth evolution: DLA ensures a smooth evolution from a Hadoop system to a data lake solution.
Support for serverless Presto and Spark engines
The serverless Presto engine of DLA is developed based on Apache Presto. All computation is performed in memory. The serverless Presto engine delivers a high-performance, interactive analytics experience in which analytic results are returned within seconds. The serverless Spark engine is developed based on Apache Spark and is compatible with all the APIs provided by Apache Spark.
We recommend that you use the serverless Spark engine of DLA in the following scenarios:
- You need custom code, or SQL statements alone cannot meet your business requirements.
- A large amount of data needs to be cleansed. For example, one terabyte to one petabyte of data stored in OSS must be cleansed per day.
- A wide range of algorithms must be supported. The serverless Spark engine of DLA supports all Spark algorithms.
- Stream processing must be supported.
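The scenarios above can be read as a simple decision rule. The following is a minimal sketch under stated assumptions, not an official selection tool: `Workload`, `recommend_engine`, and the 1 TB threshold are hypothetical names and values taken from the "one terabyte to one petabyte" example, and falling back to the Presto engine for all other workloads is an assumption based on its description as the interactive analytics engine.

```python
from dataclasses import dataclass

# Assumption: the lower bound of the "one terabyte to one petabyte per day"
# example is used as the threshold for "a large amount of data".
LARGE_CLEANSING_TB_PER_DAY = 1.0


@dataclass
class Workload:
    """Hypothetical description of a workload; not a DLA type."""
    needs_custom_code: bool = False       # SQL alone is not enough
    daily_cleansing_tb: float = 0.0       # volume of data cleansed per day, in TB
    needs_spark_algorithms: bool = False  # relies on Spark algorithms
    needs_streaming: bool = False         # requires stream processing


def recommend_engine(workload: Workload) -> str:
    """Return "spark" if any scenario from the list applies, else "presto"."""
    if (
        workload.needs_custom_code
        or workload.daily_cleansing_tb >= LARGE_CLEANSING_TB_PER_DAY
        or workload.needs_spark_algorithms
        or workload.needs_streaming
    ):
        return "spark"
    # Assumption: everything else is treated as interactive SQL analytics.
    return "presto"
```

A workload that cleanses 5 TB per day, for instance, maps to the Spark engine, while an ad hoc SQL query with none of the flags set maps to the Presto engine.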