Presto is an open source, distributed SQL query engine for running interactive analytic queries.

Presto provides the following features:
- Supports American National Standards Institute (ANSI) SQL.
- Supports various data sources:
  - SQL Server
  - Local files
- Supports advanced data structures:
  - Array and map data
  - JSON data
  - GIS data
  - Color data
- Is highly extensible:
  - Support for additional data connectors
  - Custom data types
  - Custom SQL functions
- Uses a pipelined execution model to process data and return results in real time.
- Provides monitoring interfaces:
  - A web UI on which you can view how queries are executed.
  - Support for the Java Management Extensions (JMX) protocol.
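
Results reach clients incrementally. Under the Presto REST client protocol, a client submits SQL with `POST /v1/statement`. It then keeps requesting the `nextUri` returned in each JSON response and reads any rows in that response's `data` field. It stops when a response contains no `nextUri`. The Python sketch below shows that loop using only the standard library. It assumes a PrestoDB-style coordinator that accepts the `X-Presto-User` header. The coordinator address and user name in the usage comment are hypothetical. The sketch also omits retries, timeouts, and cancellation, which a JDBC driver or client library would normally handle for you.

```python
import json
import urllib.request
from typing import Any, Callable, Dict, Iterator, List

# Sketch of the Presto REST client protocol (assumed PrestoDB-style coordinator):
# POST /v1/statement submits the query; each JSON response may carry rows in
# "data" and a "nextUri" to poll; no "nextUri" means the query has finished.
# No retries, timeouts, or query cancellation are handled here.

Page = Dict[str, Any]


def build_statement_request(base_url: str, sql: str, user: str) -> urllib.request.Request:
    """Build (but do not send) the request that submits `sql` to the coordinator."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/statement",
        data=sql.encode("utf-8"),
        headers={"X-Presto-User": user},
        method="POST",
    )


def http_get_json(uri: str, user: str) -> Page:
    """GET one result page and decode its JSON body."""
    req = urllib.request.Request(uri, headers={"X-Presto-User": user})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def iter_rows(first_page: Page, fetch: Callable[[str], Page]) -> Iterator[List[Any]]:
    """Yield result rows page by page, following nextUri until the query finishes."""
    page = first_page
    while True:
        if "error" in page:
            raise RuntimeError(page["error"].get("message", "query failed"))
        # Rows are handed to the caller as soon as their page arrives,
        # rather than after the whole result set has been collected.
        for row in page.get("data", []):
            yield row
        next_uri = page.get("nextUri")
        if next_uri is None:
            return
        page = fetch(next_uri)


# Usage against a live coordinator (hypothetical address and user):
# req = build_statement_request("http://localhost:8080", "SELECT 1", "alice")
# with urllib.request.urlopen(req) as resp:
#     first = json.loads(resp.read().decode("utf-8"))
# for row in iter_rows(first, lambda uri: http_get_json(uri, "alice")):
#     print(row)
```

Passing `fetch` as a parameter keeps the paging logic separate from HTTP, so the same loop can be exercised against canned responses.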

Presto is suitable for the following scenarios:
- Extract, transform, load (ETL)
- Ad hoc queries
- Analysis of large amounts of structured or semi-structured data
- Aggregation of large amounts of multidimensional data and report analysis
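
Reports over multidimensional data often need totals at several levels of detail at once. Presto SQL supports `GROUP BY ROLLUP` for this purpose. The Python sketch below does not run on Presto. It only reproduces which groups a query such as `SELECT region, product, sum(amount) FROM sales GROUP BY ROLLUP (region, product)` returns for a hypothetical `sales` table of `(region, product, amount)` rows. The result contains one row per (region, product) pair, one subtotal per region, and one grand total.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

# Illustrates the result of a hypothetical report query:
#   SELECT region, product, sum(amount) FROM sales GROUP BY ROLLUP (region, product)
# None stands in for the NULL that ROLLUP emits in a rolled-up column.
# Assumes region and product are never None in the input rows.
RollupKey = Tuple[Optional[str], Optional[str]]


def rollup_sum(rows: Iterable[Tuple[str, str, float]]) -> Dict[RollupKey, float]:
    """Sum `amount` over the grouping sets (region, product), (region), and ()."""
    totals: Dict[RollupKey, float] = defaultdict(float)
    for region, product, amount in rows:
        totals[(region, product)] += amount  # detail rows
        totals[(region, None)] += amount     # subtotal per region
        totals[(None, None)] += amount       # grand total
    return dict(totals)


sales = [("east", "a", 10.0), ("east", "b", 5.0), ("west", "a", 7.0)]
print(rollup_sum(sales))
```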

EMR Presto provides the following benefits:
- You can quickly deploy a Presto cluster with hundreds of nodes.
- EMR Presto supports auto scaling, so you can scale out a Presto cluster as your workload grows.
- EMR Presto can process data stored in Object Storage Service (OSS) buckets.
- EMR Presto is an end-to-end service that requires no operations and maintenance (O&M) on your part.

The Presto version of a cluster depends on the EMR version that you select when you create the cluster. For the mapping between EMR versions and Presto versions, see Overview.
For more information about how to use Presto in Zeppelin, see Zeppelin.