This topic introduces the basic concepts used in Data Lake Analytics (DLA).

data lake

A data lake is a centralized repository that allows you to store all of your structured and unstructured data at any scale. A data lake can store up to exabytes of data from a variety of data sources, such as Object Storage Service (OSS).

DLA

DLA is a next-generation big data solution that separates computing from storage. DLA can archive messages and database data, including data from ApsaraDB RDS and PolarDB databases, and build data warehouses in real time. In addition, DLA provides serverless Spark and Presto-compatible SQL engines to meet the requirements of online interactive queries, stream processing, batch processing, and machine learning. DLA also competes with traditional Hadoop solutions as a cloud-based alternative. Scalability is the core strength of DLA.

virtual cluster

A virtual cluster (VC) is an abstraction of underlying resources. You can configure network connections and basic information for a VC. If you use the billing method based on the number of compute units (CUs) used, you must create a VC. If you use the billing method based on the number of bytes scanned, you are not charged for creating VCs. Instead, you are charged only for the number of bytes scanned. This ensures that your queries receive an immediate response even if you have not purchased resources.
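The relationship between the two billing methods and VCs can be summarized in a short Python sketch. This is only an illustration of the rules described above: the names `BillingMethod`, `must_create_vc`, and `charged_for` are hypothetical and are not part of any DLA SDK or API.

```python
from enum import Enum


class BillingMethod(Enum):
    """Hypothetical labels for the two billing methods described above."""
    CU_BASED = "number of CUs used"
    SCAN_BASED = "number of bytes scanned"


def must_create_vc(method: BillingMethod) -> bool:
    """A VC must be created when billing is based on the number of CUs used."""
    return method is BillingMethod.CU_BASED


def charged_for(method: BillingMethod) -> str:
    """Return what the charge is based on for the given billing method."""
    return method.value


for m in BillingMethod:
    print(f"{m.name}: VC required = {must_create_vc(m)}, charged for {charged_for(m)}")
```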

DLA accounts

DLA supports two types of accounts: DLA accounts and Resource Access Management (RAM) users. You can associate DLA accounts with RAM users.

DLA metadata

Metadata refers to schemas, tables, columns, and views. A schema is a set of tables and corresponds to only one data source. A table is a set of homogeneous rows. A column describes a property of a row of data. A view is a table that is abstracted from a query result. You can use the serverless SQL or Spark engine to securely access metadata.
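To make the hierarchy concrete, the following Python sketch models the four metadata objects as plain data classes. It is a conceptual illustration only, not how DLA stores metadata. The class and field names, and the sample schema, table, and OSS path, are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Column:
    """Describes one property of a row of data."""
    name: str
    data_type: str


@dataclass
class Table:
    """A set of homogeneous rows; every row shares the same columns."""
    name: str
    columns: List[Column] = field(default_factory=list)


@dataclass
class View:
    """A table abstracted from a query result."""
    name: str
    query: str


@dataclass
class Schema:
    """A set of tables that corresponds to exactly one data source."""
    name: str
    data_source: str  # one data source per schema
    tables: Dict[str, Table] = field(default_factory=dict)
    views: Dict[str, View] = field(default_factory=dict)

    def add_table(self, table: Table) -> None:
        self.tables[table.name] = table

    def add_view(self, view: View) -> None:
        self.views[view.name] = view


# Hypothetical example: one schema backed by a single OSS location.
schema = Schema(name="sales", data_source="oss://example-bucket/sales/")
schema.add_table(Table("orders", [Column("order_id", "bigint"), Column("amount", "double")]))
schema.add_view(View("large_orders", "SELECT * FROM orders WHERE amount > 100"))

print(sorted(schema.tables), sorted(schema.views))
```

Holding the data source as a single field on the schema reflects the rule that a schema corresponds to only one data source.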

DLA syntax standards

  • DDL: Refer to the Hive standards.
  • DCL: Refer to the MySQL database standards.
  • DML: For the serverless SQL engine, refer to the Presto standards. For the serverless Spark engine, refer to the Spark standards.
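
The list above can be read as a lookup from statement category and engine to a syntax standard. The following Python sketch encodes that lookup. The function name `syntax_standard` and the engine labels `"sql"` and `"spark"` are hypothetical and chosen only for illustration.

```python
def syntax_standard(category: str, engine: str = "sql") -> str:
    """Return the syntax standard to consult for a statement category.

    category: "DDL", "DCL", or "DML" (case-insensitive).
    engine:   "sql" for the serverless SQL engine, "spark" for the
              serverless Spark engine. Only DML depends on the engine.
    """
    category = category.upper()
    if category == "DDL":
        return "Hive"
    if category == "DCL":
        return "MySQL"
    if category == "DML":
        dml_standards = {"sql": "Presto", "spark": "Spark"}
        if engine not in dml_standards:
            raise ValueError(f"unknown engine: {engine!r}")
        return dml_standards[engine]
    raise ValueError(f"unknown statement category: {category!r}")


print(syntax_standard("DDL"))           # Hive
print(syntax_standard("DCL"))           # MySQL
print(syntax_standard("DML", "sql"))    # Presto
print(syntax_standard("DML", "spark"))  # Spark
```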