Presto is an open source distributed SQL query engine. It is used to run interactive analytic queries.

Background information

This topic describes the basic features, architecture, scenarios, and benefits of Presto.

Basic features

Presto is implemented in Java. It is easy to use and offers high performance and strong scalability. Presto provides the following features:
  • Supports American National Standards Institute (ANSI) SQL.
  • Supports various data sources:
    • Hive
    • Cassandra
    • Kafka
    • MongoDB
    • MySQL
    • PostgreSQL
    • SQL Server
    • Redis
    • Redshift
    • Local files
  • Supports advanced data structures:
    • Array and map data
    • JSON data
    • GIS data
    • Color data
  • Delivers strong extensibility:
    • Supports various data connectors
    • Supports custom data types
    • Supports custom SQL functions
  • Uses a pipeline model to process data and return data in real time.
  • Provides a monitoring interface:
    • Provides a web UI, on which you can view the execution processes of queries.
    • Supports Java Management Extensions (JMX) protocols.

Architecture

The following figure shows the architecture of Presto.

Figure: Architecture of Presto

Presto has a typical master/slave architecture that comprises one coordinator node and multiple worker nodes. The coordinator node performs the following tasks:
  • Receives and parses query requests, generates an execution plan, and delivers the execution plan to worker nodes for execution.
  • Monitors the running status of worker nodes. Each worker node maintains a heartbeat connection with the coordinator node.
  • Maintains metastore data.

Worker nodes run the tasks that are assigned by the coordinator node, use connectors to read data from external storage systems, process the data, and send the processing results to the coordinator node.
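
The division of work between the coordinator node and worker nodes can be illustrated with a toy model. The following sketch is not Presto code. The function names, the round-robin split strategy, and the sample rows are assumptions made only for illustration. It mimics a coordinator that divides input into splits, workers that compute partial counts, and a coordinator that merges the partial results.

```python
from collections import Counter

def plan_splits(rows, num_workers):
    """Coordinator role (toy): divide rows into one split per worker, round-robin."""
    return [rows[i::num_workers] for i in range(num_workers)]

def worker_partial_count(split):
    """Worker role (toy): count the rows in a split, grouped by key."""
    return Counter(key for key, _value in split)

def coordinator_merge(partials):
    """Coordinator role (toy): merge the partial counts returned by workers."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts instead of replacing them
    return dict(total)

# Hypothetical input: (key, value) pairs that stand in for table rows.
rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("a", 5)]
splits = plan_splits(rows, num_workers=2)
result = coordinator_merge(worker_partial_count(s) for s in splits)
print(result)
```

In a real cluster, the splits are read from external storage through connectors and the tasks run on separate machines. This sketch runs everything in one process.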

Scenarios

Presto is a distributed SQL query engine for data warehousing and data analytics services. It can be used in the following scenarios:
  • Extract, transform, load (ETL)
  • Ad hoc queries
  • Analysis of large amounts of structured or semi-structured data
  • Aggregation of large amounts of multidimensional data, and report analysis
Notice: Presto is a data warehousing product. It offers limited support for transactions and is not suitable for online business scenarios.

Benefits

Presto in E-MapReduce (EMR) has the following advantages over open source Presto:
  • You can quickly deploy a Presto cluster that has hundreds of nodes.
  • EMR Presto supports auto scaling, so you can easily scale out a Presto cluster.
  • EMR Presto can process data stored in Object Storage Service (OSS) buckets.
  • EMR Presto provides an end-to-end service that requires no operations and maintenance (O&M).

Basic concepts

Data source model

The data source model defines how data is organized in Presto. Presto manages data at three levels: catalogs, schemas, and tables.
  • Catalog

    A catalog contains one or more schemas and references an external data source, which is accessed through a connector. A single SQL statement in Presto can access one or more catalogs.

  • Schema

    A schema is a database instance that contains multiple data tables.

  • Table

    A data table is the same as a general database table.

The following figure shows the relationships among catalogs, schemas, and tables.

Figure: Relationships among catalogs, schemas, and tables
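
Because data is organized in three levels, a table in Presto can be addressed by a fully qualified name in the form catalog.schema.table. The following sketch shows how such a name maps to the three levels. The TableRef type, the parse_table_ref helper, and the name hive.default.orders are hypothetical and are used only for illustration. The sketch also ignores quoted identifiers that contain periods.

```python
from typing import NamedTuple

class TableRef(NamedTuple):
    """The three levels of the data source model."""
    catalog: str
    schema: str
    table: str

def parse_table_ref(name: str) -> TableRef:
    """Split a fully qualified name of the form catalog.schema.table."""
    parts = name.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected catalog.schema.table, got {name!r}")
    return TableRef(*parts)

# Hypothetical table name used only for illustration.
ref = parse_table_ref("hive.default.orders")
print(ref.catalog, ref.schema, ref.table)
```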

Connector

Presto uses connectors to connect to various external data sources. Presto provides a standard service provider interface (SPI), which allows you to develop your own connectors to access custom data sources.

A catalog is typically associated with a specific type of connector, which is configured in the properties file of the catalog. Presto contains multiple built-in connectors.
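
In open source Presto, the connector of a catalog is specified by the connector.name property in the properties file of the catalog. The following sketch reads that property from the text of a properties file. The parse_properties helper is hypothetical, and the sample file content, including the host name, is an assumed example rather than the configuration of a real cluster.

```python
def parse_properties(text: str) -> dict:
    """Parse simple key=value lines. Blank lines and # comments are skipped."""
    props = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # keep only lines that contain "="
            props[key.strip()] = value.strip()
    return props

# Assumed content of a catalog properties file. The values are illustrative.
sample = """
# hypothetical catalog file for a catalog named hive
connector.name=hive-hadoop2
hive.metastore.uri=thrift://example-host:9083
"""

props = parse_properties(sample)
connector = props["connector.name"]
print(connector)
```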

References

The Presto version depends on the EMR version that you select when you create a cluster. For the mapping between EMR versions and Presto versions, see Overview.

The URL of open source Presto documentation varies based on the Presto version.
  • If the Presto version is 3XX, visit http://trino.io/docs/3XX/.

    For example, visit http://trino.io/docs/331/ to view the documentation of Presto 331.

  • If the Presto version is 0.2XX, visit http://prestodb.io/docs/0.2XX/.

    For example, visit http://prestodb.io/docs/0.228/ to view the documentation of Presto 0.228.
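
The two URL rules above can be expressed as a small helper. This is a minimal sketch: the presto_docs_url function is hypothetical, and it assumes that a version string matches either the 3XX or the 0.2XX pattern exactly.

```python
import re

def presto_docs_url(version: str) -> str:
    """Return the open source documentation URL for a Presto version string."""
    if re.fullmatch(r"3\d\d", version):
        # Versions in the form 3XX
        return f"http://trino.io/docs/{version}/"
    if re.fullmatch(r"0\.2\d\d", version):
        # Versions in the form 0.2XX
        return f"http://prestodb.io/docs/{version}/"
    raise ValueError(f"unsupported Presto version: {version!r}")

print(presto_docs_url("331"))
print(presto_docs_url("0.228"))
```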