All Products
Search
Document Center

E-MapReduce:Configure Auto Scaling rules for a cluster

Last Updated:Mar 26, 2026

When cluster workloads fluctuate, manually resizing node counts is slow and error-prone. Auto scaling in E-MapReduce (EMR) adds or removes task nodes automatically based on rules you define — keeping jobs running smoothly during peaks and reducing costs during off-peak hours.

Prerequisites

Before you begin, ensure that you have:

  • A DataLake, Dataflow, online analytical processing (OLAP), DataServing, or custom cluster. For more information, see Create a cluster.

  • A task node group containing pay-as-you-go instances or preemptible instances. For more information, see Create a node group.

Step 1: Select a trigger mode

Choose a trigger mode based on your workload characteristics:

Scenario Trigger mode
Workloads fluctuate on a predictable schedule, or you need a fixed node count within a specific period. Time-based scaling
Workloads fluctuate without a predictable schedule, and the required node count changes with the actual workload. Load-based scaling
Workloads have both predictable patterns and dynamic fluctuations. Time-based scaling combined with load-based scaling

Step 2: Configure auto scaling rules

When multiple auto scaling rules are configured and their conditions are met simultaneously, the system applies the following execution order:
Scale-out rules take precedence over scale-in rules.
Time-based and load-based rules execute based on trigger sequence.
Load-based rules trigger based on the time the cluster load metrics are evaluated.
Load-based rules with the same cluster load metrics trigger in the order they were configured.

Time-based scaling

Configure a time-based scale-out rule to run repeatedly or once at a specific time when workload is expected to increase. Pair it with a scale-in rule to reduce nodes during off-peak hours.

For a rule that repeats, set Rule Expiration Time to stop triggering scaling after a specified date.

Example: If workloads increase at 22:00 and decrease at 04:00 daily, configure a repeated scale-out rule at 22:00 and a repeated scale-in rule at 04:00.

For parameter details, see Configure custom auto scaling rules.

Load-based scaling

Load-based scaling monitors cluster load metrics and triggers scaling when thresholds are crossed. After configuration, click OK, then click Save and Apply in the Configure Auto Scaling panel.

Getting started

1. Select cluster load metrics

On the Metric Monitoring subtab of the Monitoring tab, select YARN-HOME from the Dashboard drop-down list. Observe how metrics change across different periods and business workloads.

View cluster load metrics

Select metrics whose values are inversely related to cluster capacity — after a scaling activity, metric values should decrease as the number of instances increases.

The following YARN metrics are recommended:

Metric Service Description
yarn_resourcemanager_queue_AvailableMBPercentage YARN Percentage of available memory to total memory in the root queue
yarn_resourcemanager_queue_AvailableVCores YARN Number of available vCPUs in the root queue
yarn_resourcemanager_queue_AvailableMB YARN Available memory in the root queue (MB)
yarn_resourcemanager_queue_AppsPending YARN Number of pending tasks in the root queue
yarn_resourcemanager_queue_PendingContainers YARN Number of containers waiting to be allocated in the root queue
yarn_resourcemanager_queue_AvailableVCoresPercentage YARN Percentage of available vCPUs in the root queue

For your first load-based rule, use pending-related metrics (such as yarn_resourcemanager_queue_AppsPending) for scale-out rules and available-related metrics for scale-in rules.

Example: Set a scale-out rule to trigger when yarn_resourcemanager_queue_AppsPending is greater than or equal to 1 over 60 seconds. When the condition is met, one node is added and the number of pending tasks decreases.

2. Configure the rule parameters

Parameter Description Recommendation
Trigger conditions One or more metric thresholds with AND or OR logic Use multiple conditions with AND to reduce false triggers
Statistical Period Time window for evaluating metric values Set to 1 minute. Larger values may trigger scaling on stale data.
Cooldown Time Minimum time between scaling activities Set to 100–300 seconds for scale-out rules. Adding nodes takes an average of 1.55 minutes (or 1.83 minutes for 100 nodes), so a cooldown lets new nodes stabilize before re-evaluating.
Number of nodes Instances to add or remove per activity Estimate based on existing node capacity and expected workload growth
Effective time period Hours within a day when the rule is active Configure different rules for different time periods as needed

3. Set node group limits

The Limits on Node Quantity of Current Node Group section defines the node count boundaries:

  • Maximum Number of Instances — upper limit for the node group, preventing indefinite scale-out.

  • Minimum Number of Instances — lower limit; if instances are released unexpectedly, the system adds them back to meet this minimum.

4. Tune after initial deployment

After the rule is active, review metric trends and scaling activity records to adjust the configuration:

  • Too many scaling events, with added nodes idling then being removed — add more metric conditions using the AND operator, or increase the cooldown period value.

  • Scale-out too slow or insufficient — increase the number of instances added per scale-out activity.

What's next