Presto-Alibaba Cloud Developer Community

Presto is a distributed system that runs on a cluster. A complete installation consists of one coordinator process and multiple worker processes. Queries are submitted to the coordinator through a client such as the Presto CLI. The coordinator parses and analyzes each query, generates an execution plan, and then distributes the execution to the worker processes.
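
To make the client-to-coordinator step concrete, here is a minimal sketch of how a client might build the HTTP request that submits a query. It assumes the coordinator exposes the `/v1/statement` REST endpoint and accepts the `X-Presto-User`, `X-Presto-Catalog`, and `X-Presto-Schema` headers. The host name, port 8080, and the user `alice` are placeholders, not values from this article. The request is only built, not sent.

```python
import urllib.request


def build_statement_request(coordinator_url, sql, user, catalog=None, schema=None):
    """Build (but do not send) the HTTP request that submits a query."""
    headers = {"X-Presto-User": user}
    # Catalog and schema are optional session defaults.
    if catalog:
        headers["X-Presto-Catalog"] = catalog
    if schema:
        headers["X-Presto-Schema"] = schema
    return urllib.request.Request(
        url=coordinator_url.rstrip("/") + "/v1/statement",
        data=sql.encode("utf-8"),  # the SQL text is the request body
        headers=headers,
        method="POST",
    )


# Hypothetical coordinator address and user, for illustration only.
request = build_statement_request(
    "http://coordinator.example.com:8080", "SELECT 1", user="alice"
)
```

Passing `request` to `urllib.request.urlopen` would submit the query. A real client then has to keep fetching the `nextUri` link returned in each JSON response until no further link is given, which this sketch leaves out.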

The following is an architecture diagram (from http://www.dw4e.com/?p=141; this figure slightly modifies the architecture diagram on the official website by adding the Discovery service, which may look clearer):

The Presto query engine has a master-slave architecture that consists of one Coordinator node, one Discovery Server node, and multiple Worker nodes. The Discovery Server is usually embedded in the Coordinator node. The Coordinator parses SQL statements, generates execution plans, and distributes execution tasks to Worker nodes. The Worker nodes carry out the actual query tasks. After a Worker node starts, it registers with the Discovery Server, and the Coordinator obtains the list of properly working Worker nodes from the Discovery Server. If the Hive connector is configured, a Hive MetaStore service must also be configured to provide Hive metadata to Presto, and the Worker nodes interact with HDFS to read data.
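
The registration and distribution flow described above can be illustrated with a toy model. This is a simplified sketch and not Presto's implementation. The class names, the `mark_failed` helper, and the round-robin assignment are all assumptions made for illustration, and the real scheduler is considerably more involved.

```python
import itertools


class DiscoveryServer:
    """Toy registry: workers announce themselves and can be marked failed."""

    def __init__(self):
        self._workers = {}

    def register(self, worker_id):
        self._workers[worker_id] = True

    def mark_failed(self, worker_id):
        self._workers[worker_id] = False

    def active_workers(self):
        # Only workers currently considered healthy are returned.
        return sorted(w for w, healthy in self._workers.items() if healthy)


class Coordinator:
    """Toy coordinator: asks discovery for workers, then hands out tasks."""

    def __init__(self, discovery):
        self._discovery = discovery

    def distribute(self, tasks):
        workers = self._discovery.active_workers()
        if not workers:
            raise RuntimeError("no active workers registered")
        assignment = {worker: [] for worker in workers}
        # Round-robin is an illustrative choice, not Presto's scheduling policy.
        for task, worker in zip(tasks, itertools.cycle(workers)):
            assignment[worker].append(task)
        return assignment


discovery = DiscoveryServer()
for worker_id in ("worker-1", "worker-2", "worker-3"):
    discovery.register(worker_id)
discovery.mark_failed("worker-2")

plan = Coordinator(discovery).distribute(["scan-0", "scan-1", "scan-2", "scan-3"])
# plan == {"worker-1": ["scan-0", "scan-2"], "worker-3": ["scan-1", "scan-3"]}
```

Because the coordinator only ever sees the workers that discovery reports as healthy, the failed `worker-2` receives no tasks.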

Presto has the following basic requirements:

  • Linux or Mac OS X
  • Java 8, 64-bit
  • Python 2.4+

2.1 Connectors

Presto supports pluggable connectors for data query. Different connectors have different requirements.

HADOOP/HIVE

Presto supports reading Hive data from the following Hadoop versions:

  • Apache Hadoop 1.x: use the hive-hadoop1 connector
  • Apache Hadoop 2.x: use the hive-hadoop2 connector
  • Cloudera CDH 4: use the hive-cdh4 connector
  • Cloudera CDH 5: use the hive-cdh5 connector
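
The version-to-connector mapping in the list above can be written down as a small lookup. This is only an illustrative sketch: the function name and the `"apache"`/`"cdh"` keys are hypothetical, while the connector names are the ones listed above.

```python
# Connector names as listed in this section, keyed by (distribution, major version).
HIVE_CONNECTORS = {
    ("apache", 1): "hive-hadoop1",
    ("apache", 2): "hive-hadoop2",
    ("cdh", 4): "hive-cdh4",
    ("cdh", 5): "hive-cdh5",
}


def hive_connector_for(distribution, major_version):
    """Return the Hive connector name for a Hadoop distribution and major version."""
    key = (distribution.lower(), major_version)
    try:
        return HIVE_CONNECTORS[key]
    except KeyError:
        raise ValueError(
            f"no Hive connector listed for {distribution} {major_version}"
        ) from None


connector = hive_connector_for("Apache", 2)
```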

The following file formats are supported: Text, SequenceFile, RCFile, and ORC.

In addition, a remote Hive metastore service is required. Local or embedded mode is not supported. Presto does not use MapReduce and only requires HDFS.
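
As a sketch of what the remote-metastore requirement means in practice, the helper below renders the text of a Hive catalog properties file. It assumes the `connector.name` and `hive.metastore.uri` properties used by Presto catalog files. The function name, the host `metastore.example.com`, and port 9083 are placeholders, not values from this article.

```python
def render_hive_catalog(connector_name, metastore_uri):
    """Render the contents of a Hive catalog properties file as text."""
    # A remote metastore is addressed over Thrift; reject anything else early.
    if not metastore_uri.startswith("thrift://"):
        raise ValueError("expected a remote metastore URI such as thrift://host:port")
    lines = [
        f"connector.name={connector_name}",
        f"hive.metastore.uri={metastore_uri}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical metastore address, for illustration only.
catalog_text = render_hive_catalog(
    "hive-hadoop2", "thrift://metastore.example.com:9083"
)
```

The resulting text would typically be saved as a file such as `etc/catalog/hive.properties` on every Presto node. The exact location depends on the installation.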

CASSANDRA

Cassandra 2.x is required. This connector is completely independent of the Hive connector and requires only an existing Cassandra installation.

TPC-H

The TPC-H connector dynamically generates data that can be used for experimenting with and testing Presto. This connector has no additional requirements.

Of course, Presto also supports some other connectors, including:

  • JMX
  • Kafka
  • MySQL
  • PostgreSQL

3.1 What Presto Is Not

Presto supports SQL and offers the syntax features of a standard database, but it is not a general-purpose relational database. It is not a substitute for relational databases such as MySQL, PostgreSQL, or Oracle. Presto is not designed for online transaction processing (OLTP).

3.2 What Presto Is

Presto is a tool for efficiently querying large amounts of data through distributed queries. It was designed as an alternative to tools that query HDFS through pipelines of MapReduce jobs, such as Hive and Pig. However, Presto is not limited to querying HDFS data: it can also query other kinds of data sources, including relational databases and stores such as Cassandra.

Presto is designed for data warehousing and analytics: analyzing data, aggregating large amounts of data, and generating reports. These workloads are usually classified as online analytical processing (OLAP).

3.3 Who uses Presto?

Overseas:

Domestic:

The following materials may help you understand Presto:

  • Official Presto documentation: http://prestodb.io/
  • shib: a web client written in Node.js for querying Presto and Hive.
  • The Facebook Presto team's article introducing Presto: https://www.facebook.com/notes/facebook-engineering/presto-interacting-with-petabytes-of-data-at-facebook/10151786197628920
  • Presto slides on SlideShare: http://www.slideshare.net/zhusx/presto-overview?from_search=1 and http://www.slideshare.net/frsyuki/hadoop-source-code-reading-15-in-japan-presto
  • Presto single-node and multi-node configuration
  • Impala Presto wiki: this article mainly introduces the architecture, principles, and workflow of Presto, and compares it with Impala.
  • A record of the configuration process of the Presto data query engine.