Presto is an open-source distributed SQL query engine for running interactive analytic queries.
Basic features
- Supports American National Standards Institute (ANSI) SQL.
- Supports various data sources:
  - Hive
  - Cassandra
  - Kafka
  - MongoDB
  - MySQL
  - PostgreSQL
  - SQL Server
  - Redis
  - Redshift
  - Local files
- Supports advanced data structures:
  - Array and map data
  - JSON data
  - GIS data
  - Color data
- Is highly extensible:
  - Data connector expansion
  - Customization of data types
  - Customization of SQL functions
  You can extend these modules to fit your business requirements and improve processing efficiency.
- Uses a pipeline model to process data, so results are returned in real time.
- Provides monitoring interfaces:
  - A web UI on which you can view the execution process of each query.
  - Support for Java Management Extensions (JMX).
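The pipeline model in the feature list above can be pictured with a toy example. The following is only a conceptual sketch that uses Python generators. It is not how Presto is implemented, and the function names (`scan`, `filter_rows`, `project`) and the sample rows are hypothetical. It shows how chained stages can hand the first result to the caller before the whole input has been processed.

```python
from typing import Dict, Iterable, Iterator, List

Row = Dict[str, object]

def scan(rows: Iterable[Row]) -> Iterator[Row]:
    # Hypothetical source stage: emits rows one at a time.
    for row in rows:
        yield row

def filter_rows(rows: Iterable[Row], min_amount: int) -> Iterator[Row]:
    # Passes a row downstream as soon as it satisfies the predicate.
    for row in rows:
        if row["amount"] >= min_amount:
            yield row

def project(rows: Iterable[Row], columns: List[str]) -> Iterator[Row]:
    # Keeps only the requested columns of each row.
    for row in rows:
        yield {c: row[c] for c in columns}

# Illustrative sample data (an assumption, not taken from the document).
orders = [
    {"id": 1, "amount": 5, "region": "east"},
    {"id": 2, "amount": 50, "region": "west"},
    {"id": 3, "amount": 75, "region": "east"},
]

# Stages are chained lazily; nothing runs until a result is requested.
pipeline = project(filter_rows(scan(orders), 10), ["id", "region"])

first = next(pipeline)   # available before the third row is even read
rest = list(pipeline)    # drain the remaining results
print(first)
print(rest)
```

Because each stage is a generator, a row flows through all stages as soon as it is read, which is the general idea behind returning results while a query is still running.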
Scenarios
- Extract, transform, load (ETL)
- Ad hoc queries
- Analysis of large volumes of structured or semi-structured data
- Aggregation and report analysis of large volumes of multidimensional data
Benefits
- You can quickly deploy a Presto cluster with hundreds of nodes.
- EMR Presto supports auto scaling, so you can easily scale out a Presto cluster.
- EMR Presto can process data stored in Object Storage Service (OSS) buckets.
- EMR Presto requires no operations and maintenance (O&M) work from you and offers 24/7 service.
References
The Presto version depends on the EMR version that you select when you create a cluster. For the mapping between EMR versions and Presto versions, see Version overview.
- If the Presto version is 3XX, visit prestosql.io/docs/3XX/.
  For example, visit prestosql.io/docs/331/ to view the Presto 331 documentation.
- If the Presto version is 0.2XX, visit prestodb.io/docs/0.2XX/.
  For example, visit prestodb.io/docs/0.228/ to view the Presto 0.228 documentation.
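The version-to-documentation rule above can be sketched as a small helper. This is a minimal sketch: the function name `presto_docs_url` is hypothetical, and it assumes that "3XX" means a three-digit version starting with 3 and "0.2XX" means a version of the form 0.2 followed by two digits. The addresses are built exactly as written in the list above.

```python
import re

def presto_docs_url(version: str) -> str:
    """Return the documentation address for a Presto version string.

    Hypothetical helper; the patterns are assumptions based on the
    3XX and 0.2XX forms described in the text.
    """
    version = version.strip()
    if re.fullmatch(r"3\d{2}", version):
        # 3XX versions are documented on prestosql.io.
        return f"prestosql.io/docs/{version}/"
    if re.fullmatch(r"0\.2\d{2}", version):
        # 0.2XX versions are documented on prestodb.io.
        return f"prestodb.io/docs/{version}/"
    raise ValueError(f"Unrecognized Presto version: {version!r}")

print(presto_docs_url("331"))
print(presto_docs_url("0.228"))
```

Versions outside these two patterns raise a `ValueError` instead of guessing an address. In that case, check Version overview for the Presto version that matches your EMR version.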