E-MapReduce:Overview

Last Updated: Jun 29, 2023

Presto (also known as PrestoDB) is a flexible and scalable distributed SQL query engine. This topic describes the basic features, architecture, and benefits of Presto.

Precautions

Presto is supported in E-MapReduce (EMR) V3.45.0, V5.11.0, and their later minor versions. In these versions, the version number of Presto is 0.2XX. In EMR clusters of a minor version earlier than V3.45.0 or V5.11.0, the Presto service uses version 3XX, and its kernel is PrestoSQL or Trino. For more information, see Trino overview.

Basic features

Presto is implemented in Java. It is easy to use and offers high performance and strong scalability. Presto provides the following features:

  • Supports American National Standards Institute (ANSI) SQL.

  • Supports a wide range of data sources, such as Hive, Hudi, Iceberg, Delta Lake, MySQL, and PostgreSQL.

  • Supports advanced data structures:

    • Array and map data

    • JSON data

    • Geographic information system (GIS) data

    • Color data

  • Delivers strong extensibility:

    • Various data connectors

    • Custom data types

    • Custom SQL functions

  • Uses a pipeline model to process data and return data in real time.

  • Provides monitoring interfaces:

    • Provides a web UI, on which you can view the execution processes of queries.

    • Supports Java Management Extensions (JMX) protocols.
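The pipeline model in the list above can be pictured with a small analogy. The following is a minimal sketch that uses Python generators, not Presto's actual execution engine: each stage pulls rows from the previous stage one at a time, so the first result is available before the whole input has been read. The stage names (`scan`, `filter_rows`, `project`) and the sample data are hypothetical.

```python
from typing import Callable, Iterable, Iterator, List


def scan(rows: Iterable[dict]) -> Iterator[dict]:
    """Source stage: yields rows one at a time instead of materializing them."""
    for row in rows:
        yield row


def filter_rows(rows: Iterable[dict], predicate: Callable[[dict], bool]) -> Iterator[dict]:
    """Filter stage: passes a row downstream as soon as it matches."""
    for row in rows:
        if predicate(row):
            yield row


def project(rows: Iterable[dict], columns: List[str]) -> Iterator[dict]:
    """Projection stage: keeps only the requested columns."""
    for row in rows:
        yield {column: row[column] for column in columns}


# Hypothetical sample data standing in for a table.
orders = [
    {"id": 1, "region": "eu", "amount": 40},
    {"id": 2, "region": "us", "amount": 75},
    {"id": 3, "region": "eu", "amount": 120},
]

# Roughly: SELECT id, amount FROM orders WHERE region = 'eu'
pipeline = project(
    filter_rows(scan(orders), lambda row: row["region"] == "eu"),
    ["id", "amount"],
)

first = next(pipeline)   # produced after reading only the first input row
rest = list(pipeline)    # remaining results are produced on demand
```

Because no stage buffers its full input, a client can start consuming results while later rows are still being processed, which is the intuition behind returning data in real time.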

Architecture

The following figure shows the architecture of Presto.

[Figure: Presto system components]

Presto has a typical master/slave architecture that comprises a coordinator and multiple workers. The coordinator has the following responsibilities:

  • Receives and parses query requests, generates an execution plan, and then delivers the execution plan to workers for execution.

  • Monitors the status of workers. Each worker maintains a heartbeat connection with the coordinator.

  • Maintains metastore data.

Workers run the tasks that are assigned by the coordinator, use connectors to read data from external storage systems, process the data, and send the processing results to the coordinator.
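The division of labor between the coordinator and the workers can be illustrated with a toy model. The following sketch is not Presto's scheduler: the class `ToyCoordinator`, the heartbeat timeout, the round-robin assignment, and the split names are all assumptions made for illustration. It only shows the idea that the coordinator tracks worker heartbeats and hands out work to workers that are still alive.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyCoordinator:
    """Toy model: tracks heartbeats and assigns work to live workers."""

    heartbeat_timeout: float = 10.0                      # assumed value, in seconds
    last_seen: Dict[str, float] = field(default_factory=dict)

    def heartbeat(self, worker_id: str, now: float) -> None:
        """Record that a worker reported in at time `now`."""
        self.last_seen[worker_id] = now

    def alive_workers(self, now: float) -> List[str]:
        """Workers whose last heartbeat is within the timeout."""
        return sorted(
            worker
            for worker, seen in self.last_seen.items()
            if now - seen <= self.heartbeat_timeout
        )

    def assign(self, splits: List[str], now: float) -> Dict[str, List[str]]:
        """Distribute units of work round-robin over the live workers."""
        workers = self.alive_workers(now)
        if not workers:
            raise RuntimeError("no live workers available")
        plan: Dict[str, List[str]] = {worker: [] for worker in workers}
        for index, split in enumerate(splits):
            plan[workers[index % len(workers)]].append(split)
        return plan


coordinator = ToyCoordinator(heartbeat_timeout=10.0)
for worker in ("worker-1", "worker-2", "worker-3"):
    coordinator.heartbeat(worker, now=0.0)

# worker-3 stops sending heartbeats; the other two keep reporting.
coordinator.heartbeat("worker-1", now=12.0)
coordinator.heartbeat("worker-2", now=12.0)

plan = coordinator.assign(["split-a", "split-b", "split-c"], now=15.0)
```

In this run, `worker-3` receives no work because its last heartbeat is older than the timeout.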

Presto does not support a high-availability architecture. In an EMR cluster, the coordinator is deployed only on the master-1-1 node, and workers are deployed on all core and task nodes.

Scenarios

Presto is a distributed SQL query engine for data warehousing and data analytics services. Presto is suitable for the following scenarios:

  • Extract, transform, and load (ETL)

  • Ad hoc queries

  • Analysis of large amounts of structured or semi-structured data

  • Aggregation of large amounts of multidimensional data, and report analysis

Important

Presto is a data warehousing product. It offers limited support for transactions and is not suitable for online business scenarios.

Benefits

EMR Presto has the following advantages over open source Presto:

  • You can quickly deploy a Presto cluster that has hundreds of nodes.

  • EMR Presto supports auto scaling. You can easily scale a Presto cluster.

  • EMR Presto can process data stored in Data Lake Formation (DLF), Object Storage Service (OSS), or OSS-HDFS.

  • EMR Presto provides a one-stop service. No O&M is required.

Terms

Data model

A data model defines how data is organized. Presto manages data in three levels: catalogs, schemas, and tables.

  • Catalog: A catalog contains multiple schemas and references an external data source, which is accessed by using a connector. A single SQL statement in Presto can access one or more catalogs.

  • Schema: A schema is a database instance that contains multiple data tables.

  • Table: A data table is the same as a common database table.
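The three levels combine into a fully qualified table name of the form `catalog.schema.table`. The following minimal sketch builds and parses such names and uses them in a query that spans two catalogs. The catalog, schema, table, and column names are hypothetical, and the parser is simplified: it does not handle quoted identifiers that contain dots.

```python
from typing import NamedTuple


class TableName(NamedTuple):
    """A table addressed by the three levels of the data model."""

    catalog: str
    schema: str
    table: str

    def qualified(self) -> str:
        """Return the name in catalog.schema.table form."""
        return f"{self.catalog}.{self.schema}.{self.table}"


def parse_table_name(name: str) -> TableName:
    """Split a catalog.schema.table string into its three parts."""
    parts = name.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected catalog.schema.table, got {name!r}")
    return TableName(*parts)


# Hypothetical tables that live in two different catalogs.
orders = TableName("hive", "sales", "orders")
customers = parse_table_name("mysql.crm.customers")

# One SQL statement that reads from both catalogs.
query = (
    "SELECT o.order_id, c.name "
    f"FROM {orders.qualified()} AS o "
    f"JOIN {customers.qualified()} AS c ON o.customer_id = c.id"
)
```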

Connector

Presto uses built-in connectors to connect to various external data sources. Presto provides a standard service provider interface (SPI), which allows you to develop your own connectors to access custom data sources.

A catalog is typically associated with a specific type of connector, which is configured in the properties file of the catalog.
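As an illustration of how a catalog is bound to a connector, the following sketch parses the text of a catalog properties file and reads the `connector.name` property, which open source Presto uses to select the connector. The file content, the metastore host name, and the helper `parse_properties` are assumptions for illustration. The parser is simplified and handles only `key=value` lines and `#` comments. The location of catalog files in an EMR cluster may differ from the open source layout.

```python
from typing import Dict


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and # comments."""
    properties: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        properties[key.strip()] = value.strip()
    return properties


# Hypothetical content of a catalog file such as hive.properties.
# In open source Presto, the catalog name is taken from the file name.
HIVE_CATALOG = """
# Connector that backs this catalog
connector.name=hive-hadoop2
hive.metastore.uri=thrift://metastore.example.internal:9083
"""

props = parse_properties(HIVE_CATALOG)
connector = props["connector.name"]
```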

References

You can visit https://prestodb.io/docs/XXX/ in a web browser to view the open source Presto documentation. Replace XXX with the version number of the Presto service that you use.

For example, you can visit https://prestodb.io/docs/0.279/ to view Presto 0.279 documentation.
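The URL pattern can also be expressed as a small helper. This is a minimal sketch: the function name `presto_docs_url` and the version check are assumptions, and the check only verifies that the string looks like a 0.x version number, not that documentation for that version exists.

```python
import re


def presto_docs_url(version: str) -> str:
    """Build the open source Presto documentation URL for a version."""
    # Assumed format: "0." followed by digits, optionally a patch part.
    if not re.fullmatch(r"0\.\d+(\.\d+)?", version):
        raise ValueError(f"unexpected Presto version: {version!r}")
    return f"https://prestodb.io/docs/{version}/"


url = presto_docs_url("0.279")
```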