
E-MapReduce:Specification and memory configuration recommendations for EMR Trino clusters

Last Updated: Jan 19, 2026

This topic describes how to select appropriate instance specifications and configure memory settings for an E-MapReduce (EMR) Trino cluster based on your business requirements.

Recommendations for EMR Trino cluster specifications

A Trino cluster consists of a coordinator and multiple workers. In EMR, the coordinator runs on the master node, while workers run on core or task nodes. Note: Trino does not support high availability (HA). Even in an HA EMR cluster, only one master node hosts the Trino coordinator. A single coordinator can manage hundreds of workers.

When creating an EMR Trino cluster, select only the Trino engine and necessary data lake components. Use the following guidelines to choose appropriate instance specifications:

  • If master and core nodes use the same specifications, purchase 5 to 20 worker nodes, or more if needed.

  • For clusters with few worker nodes, you can use master nodes with half the specifications of core nodes. However, ensure the master node has sufficient vCPUs to maintain performance and stability.

  • For large-scale clusters, increase the number of worker nodes instead of scaling up individual node specifications.

  • To prevent memory issues and optimize costs, enable auto scaling to automatically adjust the cluster size based on load. For more information, see Overview.

  • If you only need Trino, avoid installing unrelated components like HDFS or Hive. Select only Trino and required data lake components to save resources.

Query speed depends mainly on the number of vCPUs, while available memory determines whether queries can complete successfully. General-purpose instances are suitable for most scenarios; adjust specifications based on your SQL complexity and business needs. The following table lists typical cluster specifications:

| Node type and quantity | Number of vCPUs per node | Memory size per node |
| --- | --- | --- |
| 1 Master | 16 vCPUs | 64 GB |
| 5 Core | 16 vCPUs | 64 GB |

Memory configuration recommendations for EMR Trino clusters

Insufficient memory is a common cause of query task failures. The following parameters control memory usage (in GB or MB):

  • query.max-memory-per-node

  • query.max-memory

  • query.max-total-memory-per-node (Removed in Trino Release 369; not applicable to DataLake clusters.)

  • query.max-total-memory

  • memory.heap-headroom-per-node

To modify these parameters:

  1. Modify JVM parameters.

    You must configure JVM parameters before adjusting other memory settings.

    On the Trino service page in the EMR console, go to the Configure tab, click the jvm.config subtab, and edit the value that follows -Xmx.

    • Set the value to approximately 70% of the node's physical memory. This balances efficiency and stability.

    • For nodes with large memory (>128 GB), you can increase this value but avoid setting it too high. Excessive heap memory can leave insufficient space for native method requests, causing the OS to kill the process (OOM). Adjust based on actual stability.

    • If master and core nodes use different specifications, you can configure each node group separately.

    For the clusters listed in Recommendations for EMR Trino cluster specifications, set -Xmx to 45G–50G.
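For reference, the jvm.config for one of the 64 GB nodes above might look like the following sketch. Only the -Xmx value reflects this topic's recommendation; the other flags are commonly used defaults from Trino's deployment guidance and may differ in your cluster:

```
-server
-Xmx48G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+ExplicitGCInvokesConcurrent
-XX:+ExitOnOutOfMemoryError
-XX:+HeapDumpOnOutOfMemoryError
```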

  2. Configure memory parameters.

    On the Configure tab, click the config.properties tab.

    | Parameter | Description | Default value | Recommended value |
    | --- | --- | --- | --- |
    | query.max-memory-per-node | The maximum amount of memory that a query can use on a single worker node. | 2 GB | Set this parameter to a value less than or equal to (JVM heap memory - memory.heap-headroom-per-node), so that the query memory on a single worker node does not exceed 70% of the JVM memory. If concurrency is high, decrease this value accordingly. |
    | query.max-memory | The maximum amount of memory that a query can use across the entire cluster. | 4 GB | Recommended formula: query.max-memory-per-node × Number of worker nodes. |
    | query.max-total-memory | The maximum amount of memory (including revocable memory) that a query can use on the cluster. Must be greater than or equal to query.max-memory. | 6 GB | If concurrency is low, set this parameter equal to query.max-memory. If concurrency is high, this value cannot exceed the cluster's total maximum memory (70% of JVM memory × Number of worker nodes). You can set query.max-memory to half of query.max-total-memory, and query.max-memory-per-node to query.max-memory / Number of worker nodes. |
    | memory.heap-headroom-per-node | The amount of JVM heap memory reserved as headroom for allocations that Trino does not track. | 30% of the JVM memory | Retain the default value unless you have special requirements. |

    Configure query.max-memory-per-node and memory.heap-headroom-per-node for each node. Configure query.max-memory and query.max-total-memory globally for the cluster.

    For the cluster specifications in Recommendations for EMR Trino cluster specifications, set query.max-memory-per-node to 30–35 GB, and query.max-memory and query.max-total-memory to 150–165 GB. If services frequently stop during queries, gradually decrease query.max-memory-per-node.
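The sizing rules in this step can be sketched as a small calculation. The helper below is hypothetical (it is not part of EMR or Trino); with the sample cluster of five 64 GB worker nodes it lands inside the ranges recommended in this topic (roughly 45 GB for -Xmx, 30–35 GB per node, and 150–165 GB for the cluster):

```python
def trino_memory_settings(node_mem_gb: float, workers: int,
                          heap_fraction: float = 0.7,
                          headroom_fraction: float = 0.3) -> dict:
    """Suggest Trino memory settings (in GB) from node size and worker count.

    Hypothetical helper applying the rules in this topic:
    -Xmx is ~70% of physical memory, headroom defaults to 30% of the heap,
    and per-query memory scales with the number of worker nodes.
    """
    xmx = node_mem_gb * heap_fraction        # -Xmx: ~70% of physical memory
    headroom = xmx * headroom_fraction       # memory.heap-headroom-per-node
    per_node = xmx - headroom                # upper bound for query.max-memory-per-node
    max_memory = per_node * workers          # query.max-memory
    max_total = max_memory                   # low concurrency: equal to query.max-memory
    return {
        "-Xmx": round(xmx, 1),
        "memory.heap-headroom-per-node": round(headroom, 1),
        "query.max-memory-per-node": round(per_node, 1),
        "query.max-memory": round(max_memory, 1),
        "query.max-total-memory": round(max_total, 1),
    }

# Five worker nodes with 64 GB of memory each, as in the sample cluster:
print(trino_memory_settings(64, 5))
```

For a high-concurrency cluster, you would instead halve query.max-memory relative to query.max-total-memory, as described in the table above.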

  3. After you save the configurations, restart the Trino service on all nodes for the changes to take effect.

Related configurations

To limit resources for a single query, use the Resource Groups feature.
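As a sketch of Trino's file-based resource group manager (the group name, limits, and selector regex below are illustrative; the EMR console may expose this configuration differently), you point resource-groups.config-file in resource-groups.properties at a JSON definition such as:

```json
{
  "rootGroups": [
    {
      "name": "adhoc",
      "softMemoryLimit": "60%",
      "hardConcurrencyLimit": 10,
      "maxQueued": 100
    }
  ],
  "selectors": [
    {
      "source": ".*adhoc.*",
      "group": "adhoc"
    }
  ]
}
```

With this sketch, queries whose client source matches the regex are placed in the adhoc group, which caps their combined memory and concurrent query count.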

To improve query speed, adjust concurrency using the task.concurrency parameter. For more details, see Task Properties.
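For example, raising local task parallelism is a one-line change in config.properties (32 here is an illustrative value; Trino requires a power of two, and the default is 16):

```
task.concurrency=32
```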