Data Lake Analytics (DLA) is a next-generation big data solution that separates compute from storage. DLA can archive messages and database data and build data warehouses in real time. Supported databases include relational databases, PolarDB databases, and NoSQL databases. DLA also provides serverless Spark and Presto engines for online interactive queries, stream processing, batch processing, and machine learning. DLA is a competitive option for migrating a traditional Hadoop deployment to the cloud.

DLA supports two billing types: pay-per-byte and pay-per-CU. The serverless Presto engine supports both billing types. The serverless Spark engine supports only pay-per-CU. For more information about the differences between the two billing types, see Differences between pay-per-byte and pay-per-CU.
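
The billing rule above can be written down as a small lookup. The following is a minimal sketch, not part of any DLA SDK: the names `Engine`, `Billing`, `SUPPORTED_BILLING`, and `supports_billing` are hypothetical and exist only for illustration.

```python
from enum import Enum


class Engine(Enum):
    PRESTO = "serverless Presto"
    SPARK = "serverless Spark"


class Billing(Enum):
    PAY_PER_BYTE = "pay-per-byte"
    PAY_PER_CU = "pay-per-CU"


# Billing types each engine supports, as stated in the paragraph above.
# This mapping is an illustration, not an official DLA data structure.
SUPPORTED_BILLING = {
    Engine.PRESTO: {Billing.PAY_PER_BYTE, Billing.PAY_PER_CU},
    Engine.SPARK: {Billing.PAY_PER_CU},
}


def supports_billing(engine: Engine, billing: Billing) -> bool:
    """Return True if the engine can be used with the given billing type."""
    return billing in SUPPORTED_BILLING[engine]
```

For example, `supports_billing(Engine.SPARK, Billing.PAY_PER_BYTE)` evaluates to `False`, because the serverless Spark engine supports only pay-per-CU.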

Data sources supported by DLA

For more information about the data sources that are supported by the serverless Spark and Presto engines of DLA, see Compatibility matrix for data sources and SQL statements.

Data source | Serverless Presto engine | Serverless Spark engine
OSS | Supported | Supported
ApsaraDB RDS | Supported | Supported
PolarDB | Supported | Supported
ApsaraDB for HBase | To be supported | Supported
MongoDB | Supported | To be supported
Tablestore | Supported | Supported
AnalyticDB MySQL 2.0 | Supported | Supported
AnalyticDB MySQL 3.0 | Supported | Supported
AnalyticDB PostgreSQL | Supported | Supported
MaxCompute | Supported | Supported
Elasticsearch | Supported | Supported
Cassandra | Supported | Supported
Kudu | Supported | Supported
Self-managed Druid database hosted on an Elastic Compute Service (ECS) instance | Supported | Supported
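
If you need to check engine support programmatically, the table above can be encoded as a lookup. This is a minimal sketch under stated assumptions: `SUPPORT_MATRIX` and `engines_for` are hypothetical names, the values are copied from the table as shown here, and the linked compatibility matrix remains the authoritative source.

```python
# Support status per data source as (Presto, Spark).
# True means "Supported"; False means "To be supported".
# Values are copied from the table above; names are illustrative only.
SUPPORT_MATRIX = {
    "OSS": (True, True),
    "ApsaraDB RDS": (True, True),
    "PolarDB": (True, True),
    "ApsaraDB for HBase": (False, True),
    "MongoDB": (True, False),
    "Tablestore": (True, True),
    "AnalyticDB MySQL 2.0": (True, True),
    "AnalyticDB MySQL 3.0": (True, True),
    "AnalyticDB PostgreSQL": (True, True),
    "MaxCompute": (True, True),
    "Elasticsearch": (True, True),
    "Cassandra": (True, True),
    "Kudu": (True, True),
    "Self-managed Druid on ECS": (True, True),
}


def engines_for(source: str) -> list:
    """Return the serverless engines that currently support a data source."""
    presto, spark = SUPPORT_MATRIX[source]  # raises KeyError for unknown sources
    engines = []
    if presto:
        engines.append("presto")
    if spark:
        engines.append("spark")
    return engines
```

With this encoding, `engines_for("MongoDB")` returns only the Presto engine and `engines_for("ApsaraDB for HBase")` returns only the Spark engine, which matches the two "To be supported" cells in the table.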

Features

DLA provides an end-to-end, cloud-native data lake analytics and computing solution for data that is stored in Object Storage Service (OSS). DLA offers the following benefits:

  • End-to-end data lake solution: DLA provides Data Lake Formation (DLF) and the serverless Presto and Spark engines, which together cover data ingestion, extract, transform, and load (ETL) processing, machine learning, and interactive analytics.
  • Secure data processing: Database tables and the data stored in DLA are protected by separate security mechanisms, which helps prevent data misuse.
  • Cost-effective data processing: DLA is serverless and cloud-native, so you do not provision or maintain clusters, and you are billed on a pay-per-byte or pay-per-CU basis.
  • Smooth evolution: DLA ensures a smooth evolution from a Hadoop system to a data lake solution.

Support for serverless Presto and Spark engines

The serverless Presto engine of DLA is developed based on Apache Presto. All computation is performed in memory, so the engine delivers high-performance interactive analytics and typically returns results within seconds. The serverless Spark engine is developed based on Apache Spark and is compatible with all the APIs provided by Apache Spark.

We recommend that you use the serverless Spark engine of DLA in the following scenarios:

  • You need custom code, or SQL statements cannot meet your business requirements.
  • A large amount of data needs to be cleansed. For example, 1 TB to 1 PB of data stored in OSS must be cleansed per day.
  • A wide range of algorithms must be supported. The serverless Spark engine of DLA supports all algorithms that are available in Apache Spark.
  • Streaming must be supported.
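
The scenarios above can be read as a simple decision rule. The sketch below is only an illustration of that rule. `Workload`, `LARGE_CLEANSING_TB`, and `recommend_engine` are hypothetical names; the 1 TB threshold is an assumption taken from the lower bound of the example above; and falling back to the Presto engine when no scenario applies is also an assumption, based on its description as the interactive analytics engine.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of a workload, for illustration only."""
    needs_custom_code: bool = False
    daily_cleansing_tb: float = 0.0   # data cleansed per day, in TB
    needs_spark_algorithms: bool = False
    needs_streaming: bool = False


# Assumption: treat 1 TB per day as the start of "a large amount of data",
# matching the lower bound of the 1 TB to 1 PB example.
LARGE_CLEANSING_TB = 1.0


def recommend_engine(workload: Workload) -> str:
    """Return "spark" if any listed scenario applies, otherwise "presto"."""
    if (
        workload.needs_custom_code
        or workload.daily_cleansing_tb >= LARGE_CLEANSING_TB
        or workload.needs_spark_algorithms
        or workload.needs_streaming
    ):
        return "spark"
    # Assumption: interactive SQL analytics is the default use of Presto.
    return "presto"
```

For instance, `recommend_engine(Workload(needs_streaming=True))` returns `"spark"`, while a workload with none of the flags set falls back to `"presto"`.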