Data Lake Analytics (DLA) is a next-generation big data solution that separates computing from storage. DLA can archive messages and database data and create data warehouses in real time. Supported databases include ApsaraDB RDS, PolarDB, and NoSQL databases. In addition, DLA provides serverless Spark and Presto-compatible SQL engines to meet the requirements of online interactive search, stream processing, batch processing, and machine learning. DLA is also a competitive alternative to a traditional Hadoop solution.

DLA uses the pay-as-you-go billing method and charges you based on either the number of bytes scanned or the number of compute units (CUs) used. The serverless SQL engine supports both billing methods. The serverless Spark engine supports only billing based on the number of CUs used. For more information, see Differences between billing methods.
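The pairing of engines and billing methods described above can be sketched as a small lookup with a validity check. This is an illustrative sketch only, not a DLA API: the names `Engine`, `Billing`, `SUPPORTED_BILLING`, and `estimate_cost` are hypothetical, and the usage and unit price are inputs that you must take from the official pricing documentation. No real prices are assumed here.

```python
from enum import Enum


class Engine(Enum):
    SQL = "serverless SQL"
    SPARK = "serverless Spark"


class Billing(Enum):
    BYTES_SCANNED = "bytes scanned"
    CU = "CUs used"


# Billing methods that each engine supports, as stated in the text above.
SUPPORTED_BILLING = {
    Engine.SQL: {Billing.BYTES_SCANNED, Billing.CU},
    Engine.SPARK: {Billing.CU},
}


def estimate_cost(engine, billing, usage, unit_price):
    """Return usage * unit_price after validating the engine/billing pairing.

    Both `usage` and `unit_price` are hypothetical inputs; their units
    (for example, data volume scanned or CU usage) must match each other.
    """
    if billing not in SUPPORTED_BILLING[engine]:
        raise ValueError(
            f"The {engine.value} engine does not support billing by {billing.value}"
        )
    if usage < 0 or unit_price < 0:
        raise ValueError("usage and unit_price must be non-negative")
    return usage * unit_price
```

Requesting bytes-scanned billing for the Spark engine raises a `ValueError`, which mirrors the restriction that the serverless Spark engine supports only CU-based billing.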

Data sources supported by DLA

For the full matrix of data sources supported by DLA, see Compatibility matrix for data sources and SQL statements.

Data source | Serverless Presto-compatible SQL engine | Serverless Spark
OSS | Supported | Supported
RDS | Supported | Supported
PolarDB | Supported | Supported
HBase | Supported later | Supported
MongoDB | Supported | Supported later
Tablestore | Supported | Supported
AnalyticDB MySQL | Supported | Supported
AnalyticDB PostgreSQL | Supported | Supported
MaxCompute | Supported | Supported
Elasticsearch | Supported | Supported
Self-managed Druid database hosted on an ECS instance | Supported | Supported

Features

DLA provides an end-to-end cloud-native data lake analytics and computing solution for the data stored in Object Storage Service (OSS). DLA offers the following features to help you tackle the challenges you face:

  • End-to-end data lake solution that enables efficient data access, extract, transform, and load (ETL) operations, machine learning, and interactive analytics. DLA provides a platform for you to build a data lake and offers the serverless Presto-compatible SQL engine and the serverless Spark engine.
  • Secure data processing. DLA helps prevent data misuse because the tables in databases and the data stored in DLA are each protected by their own security solutions.
  • Cost-effective data processing. DLA is a serverless, cloud-native data processing solution that is billed on a pay-as-you-go basis.
  • Smooth migration. DLA supports a smooth transition from a Hadoop system to a data lake solution.

Support for both serverless SQL and serverless Spark engines

The serverless SQL engine of DLA is developed based on the open source Presto engine. All computation is performed in memory. The serverless SQL engine delivers a high-performance, interactive analytics experience in which SQL queries return results within seconds. The serverless Spark engine is developed based on the open source Apache Spark engine and is compatible with all Apache Spark APIs.

We recommend that you use the DLA serverless Spark engine in the following scenarios:

  • You need to write custom code, or SQL statements alone cannot meet your business requirements.
  • Large amounts of data need to be cleansed. For example, one terabyte to one petabyte of data stored in OSS needs to be cleansed in a day.
  • A wide range of algorithms need to be supported. The DLA serverless Spark engine supports a complete library of Spark algorithms.
  • Streaming is required.
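
The scenarios above can be read as a simple checklist. The following is a minimal sketch of that checklist, not part of DLA: `Workload`, `recommend_engine`, and the threshold constant are hypothetical names, the threshold is taken from the lower bound of the one-terabyte-to-one-petabyte example, and falling back to the serverless SQL engine when no scenario applies is an assumption made for illustration.

```python
from dataclasses import dataclass

# Lower bound of the "one terabyte to one petabyte in a day" example above.
LARGE_CLEANSING_TB_PER_DAY = 1.0


@dataclass
class Workload:
    """Hypothetical description of a workload, one field per scenario."""
    needs_custom_code: bool = False
    cleansing_tb_per_day: float = 0.0
    needs_spark_algorithms: bool = False
    needs_streaming: bool = False


def recommend_engine(workload):
    """Return (engine name, list of reasons) for the given workload."""
    reasons = []
    if workload.needs_custom_code:
        reasons.append("custom code is required")
    if workload.cleansing_tb_per_day >= LARGE_CLEANSING_TB_PER_DAY:
        reasons.append("large-scale data cleansing")
    if workload.needs_spark_algorithms:
        reasons.append("Spark algorithm library is needed")
    if workload.needs_streaming:
        reasons.append("streaming is required")
    if reasons:
        return "serverless Spark", reasons
    # Assumption: with no Spark scenario present, default to the SQL engine.
    return "serverless SQL", ["none of the Spark scenarios apply"]


if __name__ == "__main__":
    engine, why = recommend_engine(
        Workload(cleansing_tb_per_day=50, needs_streaming=True)
    )
    print(engine, why)
```

Any single matching scenario is enough to recommend the serverless Spark engine. The returned reasons make the recommendation easy to audit.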