E-MapReduce: Configure auto scaling rules

Last Updated: Mar 17, 2025

The auto scaling feature is a core capability of the cloud-based big data platform E-MapReduce (EMR). After you configure auto scaling rules, the system automatically adds or removes nodes in an EMR cluster based on the rules. This helps you handle business workload fluctuations and reduce costs. This topic describes how to configure appropriate auto scaling rules for an EMR cluster based on your business requirements.

Prerequisites

  • A DataLake, Dataflow, online analytical processing (OLAP), DataServing, or custom cluster is created. For more information, see Create a cluster.
  • A task node group that contains pay-as-you-go instances or preemptible instances is created in the cluster. For more information, see Create a node group.

Step 1: Select a trigger mode

You can select a trigger mode based on your business requirements.
  • Scenario: Your business workloads regularly fluctuate over time, or you require a fixed number of nodes within a specific period of time.
    Trigger mode: We recommend that you use time-based scaling.
  • Scenario: Your business workloads fluctuate without significant time patterns, and the number of required nodes changes with the business workloads.
    Trigger mode: We recommend that you use load-based scaling. This helps detect business workload fluctuations based on the cluster load metrics that you configure and improves the running efficiency of jobs.
  • Scenario: Your business has the characteristics of both time-based scaling and load-based scaling.
    Trigger mode: We recommend that you use time-based scaling together with load-based scaling.

Step 2: Configure auto scaling rules

Note If multiple auto scaling rules are configured and their trigger conditions are met at the same time, the system triggers and executes the rules based on the following precedence:
  • Scale-out rules take precedence over scale-in rules.
  • Time-based scaling rules and load-based scaling rules are executed in the order in which they are triggered.
  • Load-based scaling rules are triggered based on the time at which their trigger conditions on the cluster load metrics are met.
  • Load-based scaling rules that are configured with the same cluster load metrics are triggered in the order in which the rules are configured.
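
The exact scheduling of simultaneously triggered rules is handled internally by EMR. The following Python sketch only illustrates the documented precedence; the rule names, fields, and ordering function are illustrative assumptions, not EMR's implementation.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class TriggeredRule:
    name: str
    activity: str           # "SCALE_OUT" or "SCALE_IN"
    trigger_time: datetime  # when the rule's trigger conditions were met
    config_order: int       # the order in which the rule was configured

def order_rules(rules):
    """Order simultaneously triggered rules by the documented precedence:
    scale-out before scale-in, then by trigger time, then by the order in
    which the rules were configured (for rules on the same metrics)."""
    return sorted(
        rules,
        key=lambda r: (r.activity != "SCALE_OUT", r.trigger_time, r.config_order),
    )

# A scale-in rule that triggered earlier still runs after the scale-out rule.
rules = [
    TriggeredRule("nightly-scale-in", "SCALE_IN", datetime(2025, 3, 17, 4, 0), 1),
    TriggeredRule("pending-apps-scale-out", "SCALE_OUT", datetime(2025, 3, 17, 4, 1), 2),
]
for rule in order_rules(rules):
    print(rule.name)
```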

Time-based scaling

You can configure a time-based scale-out rule that is repeatedly executed or executed only once based on the point in time when your business volume is likely to increase. You can also configure a scale-in rule to reduce the number of nodes during off-peak hours. If the time-based scaling rule that you configured is repeatedly executed, you can configure the Rule Expiration Time parameter to specify the expiration time of the rule. After the time-based scaling rule expires, no scaling activity is triggered.

For example, your business workloads increase at 22:00 and decrease at 04:00 every day. In this case, you can configure a time-based scale-out rule that is repeatedly executed at 22:00 every day and a time-based scale-in rule that is repeatedly executed at 04:00 every day.
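
The following sketch shows one way to represent the two rules from this example as plain data in Python. The field names and values are illustrative assumptions only and do not correspond to EMR console parameters or API fields.

```python
# Hypothetical, simplified representation of the two time-based rules above.
time_based_rules = [
    {
        "name": "evening-scale-out",
        "activity": "SCALE_OUT",
        "recurrence": "daily",
        "launch_time": "22:00",
        "node_count": 5,             # assumed number of nodes to add
        "expiration": "2025-12-31",  # optional rule expiration time
    },
    {
        "name": "early-morning-scale-in",
        "activity": "SCALE_IN",
        "recurrence": "daily",
        "launch_time": "04:00",
        "node_count": 5,             # assumed number of nodes to remove
        "expiration": "2025-12-31",
    },
]

def rules_due_at(hhmm: str):
    """Return the rules whose daily launch time matches the given time."""
    return [r for r in time_based_rules if r["launch_time"] == hhmm]

print(rules_due_at("22:00"))  # -> the evening scale-out rule
```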

For more information about the parameters and cluster load metrics, see Configure custom auto scaling rules.

Load-based scaling

By default, common cluster load metrics are automatically specified when you configure a load-based scaling rule. You must specify thresholds for these metrics based on how the metric values change. After you complete the configurations, click OK. Then, in the Configure Auto Scaling panel, click Save and Apply. When your business workloads fluctuate, the configured load-based scaling rules are triggered.
You can perform the following steps to configure a load-based scaling rule based on your business requirements.
  1. Select cluster load metrics.
    On the Metric Monitoring subtab of the Monitoring tab, select YARN-HOME from the Dashboard drop-down list. Observe the changes in cluster load metrics based on your business workloads in different time periods and select appropriate metrics.

    The values of the metrics must be inversely related to changes in cluster capacity. After a scale-out activity adds instances, the values of the metrics should decrease.

    For example, you configure a load-based scale-out rule that is triggered when the average value of the yarn_resourcemanager_queue_AppsPending metric is greater than or equal to 1 within 60 seconds. If this condition is met, one node is added. After the scale-out activity completes, the number of pending tasks decreases. A minimal sketch that evaluates this kind of condition follows the metric list below.

    We recommend that you use the following cluster load metrics. All of the metrics are provided by the YARN service.
    • yarn_resourcemanager_queue_AvailableMBPercentage: The percentage of the available memory to the total memory in a root queue.
    • yarn_resourcemanager_queue_AvailableVCores: The number of available vCPUs in a root queue.
    • yarn_resourcemanager_queue_AvailableMB: The amount of available memory in a root queue. Unit: MB.
    • yarn_resourcemanager_queue_AppsPending: The number of pending tasks in a root queue.
    • yarn_resourcemanager_queue_PendingContainers: The number of to-be-allocated containers in a root queue.
    • yarn_resourcemanager_queue_AvailableVCoresPercentage: The percentage of available vCPUs in a root queue.
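
    The following sketch evaluates the kind of trigger condition described in the yarn_resourcemanager_queue_AppsPending example above. The sampling interval and sample values are assumptions; EMR evaluates these conditions internally, so this sketch is for illustration only.

    ```python
    def condition_met(samples, threshold, comparison="GE"):
        """Return True if the average of the sampled metric values satisfies
        the threshold, for example avg(AppsPending) >= 1 within 60 seconds."""
        if not samples:
            return False
        average = sum(samples) / len(samples)
        return average >= threshold if comparison == "GE" else average <= threshold

    # yarn_resourcemanager_queue_AppsPending sampled every 10 seconds for 60 seconds.
    pending_apps_last_60s = [0, 1, 2, 2, 1, 1]

    if condition_met(pending_apps_last_60s, threshold=1):
        print("Scale-out rule triggered: add 1 node")
    ```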
  2. Configure a load-based scaling rule.
    • The first time you configure a load-based scaling rule, you can select pending-related metrics for scale-out rules and available-related metrics for scale-in rules.
    • You can configure multiple load metric-based trigger conditions for a load-based scaling rule and specify the logical operator AND or OR between the conditions. This allows you to manage the metric-based trigger conditions in a fine-grained manner.
    • To prevent resource waste that is caused by frequent scaling activities, you can specify the cooldown time for load-based scaling rules. During the cooldown time, load-based scaling rules are not triggered even if the conditions are met.

      On average, adding nodes takes approximately 1.55 minutes, and adding 100 nodes takes only approximately 1.83 minutes. Therefore, you can set the cooldown time for a scale-out rule to a value that ranges from 100 to 300 seconds. This way, after the new nodes start to process jobs, you can check whether the values of the configured cluster load metrics decrease and determine whether another scale-out activity is required. This helps prevent resource waste. For how the cooldown time interacts with combined trigger conditions, see the sketch after this list.

    • To respond to changes in cluster load metrics more quickly, we recommend that you set the Statistical Period parameter to 1 minute. If the value of the parameter is excessively large, scaling activities may be triggered by expired data. This results in unnecessary resource waste.
    • You can specify the number of nodes that you want to add or remove based on the capability of existing nodes to process jobs and expected business growth. You can also estimate the number of nodes that are required to decrease the values of related cluster load metrics.
    • You can specify the effective time period in which load-based scaling rules take effect within one day. You can configure different load-based scaling rules in different time periods.
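
    As mentioned in the cooldown discussion above, the following sketch shows how multiple metric-based trigger conditions combined with AND or OR interact with a cooldown interval. The condition tuples, metric averages, and cooldown value are assumptions for illustration, not the EMR implementation.

    ```python
    import time

    COOLDOWN_SECONDS = 300    # assumed cooldown time for the scale-out rule
    last_scale_out = 0.0      # timestamp of the previous scale-out activity

    def conditions_met(conditions, operator="AND"):
        """Each condition is (current_average, comparison, threshold)."""
        results = [
            avg >= threshold if cmp_ == "GE" else avg <= threshold
            for avg, cmp_, threshold in conditions
        ]
        return all(results) if operator == "AND" else any(results)

    def should_scale_out(conditions, operator="AND"):
        """Trigger only if the conditions hold and the rule is not cooling down."""
        global last_scale_out
        if time.time() - last_scale_out < COOLDOWN_SECONDS:
            return False      # suppressed during the cooldown time
        if conditions_met(conditions, operator):
            last_scale_out = time.time()
            return True
        return False

    # Scale out only if pending apps >= 1 AND available memory percentage <= 20.
    conditions = [
        (2.0, "GE", 1),    # avg yarn_resourcemanager_queue_AppsPending
        (15.0, "LE", 20),  # avg yarn_resourcemanager_queue_AvailableMBPercentage
    ]
    print(should_scale_out(conditions, "AND"))
    ```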
  3. Specify the maximum and minimum number of nodes in a node group.

    The Limits on Node Quantity of Current Node Group parameter specifies the limits on the number of nodes in the current node group. The Maximum Number of Instances parameter specifies the upper limit for the number of nodes in the current node group. This prevents your node group from being indefinitely scaled out. The Minimum Number of Instances parameter specifies the lower limit for the number of nodes that are required to process your business. If your instances are released due to unexpected factors, the system adds instances to meet the minimum number of instances.
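
    The following sketch shows how the Maximum Number of Instances and Minimum Number of Instances limits bound the outcome of a scaling activity. The limit values are assumptions for illustration.

    ```python
    MIN_INSTANCES = 2    # assumed Minimum Number of Instances
    MAX_INSTANCES = 20   # assumed Maximum Number of Instances

    def apply_limits(current_nodes: int, delta: int) -> int:
        """Clamp the target node count of the node group to the configured limits."""
        target = current_nodes + delta
        return max(MIN_INSTANCES, min(MAX_INSTANCES, target))

    print(apply_limits(18, 5))   # scale-out capped at the maximum of 20
    print(apply_limits(3, -4))   # scale-in floored at the minimum of 2
    ```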

  4. Modify a load-based scaling rule.

    After you configure a load-based scaling rule, you can modify the parameters of the rule based on the metrics and scaling activity records in a specific period of time.

    • If scaling activities are frequently triggered and the added instances become idle and are frequently removed, you can use the AND logical operator to add load metric-based trigger conditions to the rule to reduce the frequency of scaling activities. You can also increase the value of the Cooldown Time parameter.
    • If your jobs require multiple scale-out activities or the scale-out speed does not meet your requirements of processing jobs, you can increase the number of instances that are added for each scale-out activity.