E-MapReduce: Add auto scaling rules

Last Updated: Sep 25, 2023

If your business workloads fluctuate, we recommend that you enable auto scaling for your E-MapReduce (EMR) cluster and configure auto scaling rules to increase or decrease the number of task nodes based on your business requirements. Auto scaling helps you reduce costs and ensures that your jobs have sufficient computing resources. This topic describes how to configure auto scaling in the EMR console.

Prerequisites

  • A DataLake cluster, a Dataflow cluster, an online analytical processing (OLAP) cluster, a DataServing cluster, or a custom cluster is created. For more information, see Create a cluster.

  • A task node group that contains pay-as-you-go instances or preemptible instances is created in the cluster. For more information, see Create a node group.

Limits

  • To prevent auto scaling failures due to insufficient Elastic Compute Service (ECS) instances, you can select up to 10 ECS instance types. By default, the instance type that you select first is preferentially used when your cluster is scaled out. Nodes are created based on the order of the instance types in the list. If an instance type is unavailable, the next instance type is used, as shown in the sketch after this list. The actual instance types that are used to create nodes are subject to inventory availability.

  • Only clusters in which the YARN service is deployed support load-based scaling rules.
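
The following Python sketch illustrates the fallback behavior described in the first item above. The instance type names and the inventory check are hypothetical and only mirror the described behavior; this is not an EMR API.

    # Hypothetical illustration of the scale-out fallback order described above.
    # The instance types and the inventory check are made up for this example.

    def pick_instance_type(preferred_order, has_inventory):
        """Return the first instance type that has inventory, following list order."""
        for instance_type in preferred_order:  # the type selected first is tried first
            if has_inventory(instance_type):
                return instance_type
        return None  # scale-out fails if no selected type has inventory

    # Up to 10 types may be selected; the list order expresses preference.
    preferred = ["ecs.g7.2xlarge", "ecs.g6.2xlarge", "ecs.c7.4xlarge"]
    chosen = pick_instance_type(preferred, has_inventory=lambda t: t != "ecs.g7.2xlarge")
    print(chosen)  # ecs.g6.2xlarge -- the next type is used when the first is unavailable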

Precautions

  • You can add auto scaling rules to node groups that meet the requirements. When the auto scaling rules are triggered, the node groups are automatically scaled in or out based on the auto scaling configurations. If no auto scaling rule is configured, no auto scaling activity is triggered.

  • The system automatically searches for the instance types that match the vCPU and memory specifications that you configure and displays these instance types in the Instance Type section. You must select one or more of the instance types so that the cluster can be scaled based on your selection. A sketch of this matching logic follows this list.
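
The following sketch shows how instance types can be matched by vCPU and memory specifications. The catalog below is made up for illustration; the EMR console performs this matching for you and displays the results in the Instance Type section.

    # Hypothetical catalog of instance types: name -> (vCPUs, memory in GiB).
    CATALOG = {
        "ecs.g7.2xlarge": (8, 32),
        "ecs.c7.2xlarge": (8, 16),
        "ecs.r7.2xlarge": (8, 64),
    }

    def matching_types(vcpus, memory_gib):
        """Return the instance types that exactly match the given specifications."""
        return [t for t, spec in CATALOG.items() if spec == (vcpus, memory_gib)]

    print(matching_types(8, 32))  # ['ecs.g7.2xlarge']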

Procedure

  1. Go to the Auto Scaling tab.

    1. Log on to the EMR console. In the left-side navigation pane, click EMR on ECS.

    2. In the top navigation bar, select the region in which your cluster resides and select a resource group based on your business requirements.

    3. On the EMR on ECS page, find the desired cluster and click the name of the cluster in the Cluster ID/Name column.

    4. On the page that appears, click the Auto Scaling tab.

  2. Configure auto scaling rules.

    1. On the Configure Auto Scaling tab, find the desired node group and click Edit in the Actions column.

    2. In the Configure Auto Scaling panel, configure the following parameters:

      • Limits on Node Quantity of Current Node Group: specifies the range of the number of nodes in the current node group. This parameter defines the upper and lower limits for the number of nodes in the node group when the node group is automatically scaled in or scaled out based on the auto scaling rules.

        • Maximum Number of Instances: specifies the upper limit for the number of nodes in the current node group. The node group can no longer be scaled out when the number of nodes in the node group reaches the upper limit.

        • Minimum Number of Instances: specifies the lower limit for the number of nodes in the current node group. The node group can no longer be scaled in when the number of nodes in the node group reaches the lower limit.

        To change the maximum and minimum numbers of nodes in a node group, click Modify Limit.

      • Trigger Rule: displays the auto scaling rules that are configured for the current node group. You can also add more auto scaling rules for the node group.

        You can select a trigger mode and auto scaling rules.

        • Time-based scaling

          If the computing workloads of a cluster fluctuate on a regular basis, you can add or remove a specific number of task nodes at fixed points in time every day, every week, or every month to supplement or save computing resources. This ensures that jobs can be completed at low costs. Auto scaling rules are divided into scale-out rules and scale-in rules. The following parameters are available for a scale-out rule. For a consolidated example of these parameters, see the sketch after this procedure.

          • Scale Out Type: The trigger mode in which the scale-out rule is triggered. Set this parameter to Scale Out by Time.

          • Rule Name: The name of the scale-out rule. The names of auto scaling rules must be unique in a cluster.

          • Frequency:

            • Execute Repeatedly: Auto scaling is performed at a specific point in time every day, every week, or every month.

            • Execute Only Once: Auto scaling is performed only once at a specific point in time.

          • Execution Time: The specific point in time at which the scale-out rule is executed.

          • Rule Expiration Time: The expiration time of the scale-out rule. After the scale-out rule expires, no scale-out activity is triggered.

          • Retry Time Range: The retry interval. Auto scaling may fail at the specified point in time for various reasons. If a retry interval is specified, the system retries auto scaling every 30 seconds during the specified interval until auto scaling is performed. Valid values: 0 to 3600. Unit: seconds.

            For example, auto scaling operation A needs to be performed in a specified time period. If auto scaling operation B is still in progress or is in the cooldown state during this period, operation A cannot be performed. In this case, the system tries to perform operation A every 30 seconds within the retry interval that you specified. If the required conditions are met, the cluster immediately performs auto scaling.

          • Nodes for Each Scale Out: The number of nodes that are added to the node group each time the scale-out rule is triggered.

        • Load-based scaling

          Note

          Only clusters in which the YARN service is deployed support load-based scaling rules.

          You can add load-based scaling rules for the task node groups in a cluster if you cannot accurately estimate the fluctuation of your big data computing workloads. Auto scaling rules are divided into scale-out rules and scale-in rules. The following parameters are available for a scale-out rule. For a consolidated example of these parameters, see the sketch after this procedure.

          • Scale Out Type: The trigger mode in which the scale-out rule is triggered. Set this parameter to Scale Out by Load.

          • Rule Name: The name of the scale-out rule. The names of auto scaling rules must be unique in a cluster.

          • Load Metric-based Trigger Conditions: The conditions that must be met to trigger the scale-out rule. You must select at least one system-defined load metric. To select multiple system-defined load metrics, click Add Metric. You must configure the following parameters:

            • Load metrics: Select system-defined YARN load metrics. For information about the mappings between EMR auto scaling metrics and YARN load metrics, see Description of EMR auto scaling metrics that match YARN load metrics.

              Note: System-defined load metrics vary based on the cluster type. You can view the supported system-defined load metrics in the EMR console.

            • Statistical method: The aggregation method and threshold for each load metric. The scale-out rule is triggered once if the value of the specified aggregation dimension (average, maximum, or minimum) for each selected load metric reaches the threshold within a statistical period.

          • Multi-metric Relationship: The relationship among the metrics that are used to trigger the scale-out rule. Valid values: All Metrics Meet the Conditions and Any Metric Meets the Condition.

          • Statistical Period: The period over which the system measures whether the value of the specified aggregation dimension (average, maximum, or minimum) for each selected load metric reaches the threshold. Data is collected, summarized, and compared based on the specified statistical period. The shorter the statistical period, the more frequently the scale-out rule is likely to be triggered. Specify the statistical period based on your business requirements.

          • Repetitions That Trigger Scale Out: The number of times that the trigger conditions must be met. When this number is reached, the scale-out rule is triggered.

          • Nodes for Each Scale Out: The number of nodes that are added to the node group each time the scale-out rule is triggered.

          • Cooldown Time: The interval between two scale-out activities. During the cooldown time, the scale-out rule is not triggered even if the scale-out conditions are met. A scale-out activity is performed after the cooldown time ends and the scale-out conditions are met again.

            After the node group is scaled out and reaches the expected state, the cooldown time allows the cluster load metrics that trigger subsequent scale-out activities to stabilize.

          • Effective Time Period: The time range in which the scale-out rule takes effect. This parameter is optional. By default, the scale-out rule is in effect 24 hours a day. A scale-out activity is triggered only within the specified time range.

    3. Click Save and Apply.

      Scale-out activities are triggered for the node group when the scale-out conditions are met.
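
The following sketch consolidates the parameters described in this procedure into plain Python data. All field names and values are invented for illustration; this is not the EMR API or its request format, and it only mirrors the console parameters above.

    # A minimal sketch of an auto scaling configuration, assuming invented field
    # names that mirror the console parameters. Not the EMR API.

    node_group_scaling = {
        # Limits on Node Quantity of Current Node Group
        "min_instances": 2,    # scale-in stops at this lower limit
        "max_instances": 20,   # scale-out stops at this upper limit
        "rules": [
            {   # Scale Out by Time
                "name": "workday-morning-scale-out",  # names must be unique in a cluster
                "trigger": "time",
                "frequency": "repeat",        # Execute Repeatedly
                "execution_time": "08:00",    # point in time at which the rule runs
                "expires_at": "2024-12-31",   # no scale-out activity after expiration
                "retry_seconds": 600,         # retried every 30 s within this window (0-3600)
                "nodes_per_scale_out": 5,
            },
            {   # Scale Out by Load (requires the YARN service)
                "name": "pending-memory-scale-out",
                "trigger": "load",
                "conditions": [
                    # (load metric, aggregation over the statistical period, threshold)
                    ("yarn_resourcemanager_queue_PendingMB", "avg", 10240),
                ],
                "multi_metric_relationship": "any",  # Any Metric Meets the Condition
                "statistical_period_seconds": 300,   # shorter period -> more frequent triggers
                "repetitions_to_trigger": 3,         # conditions must be met this many times
                "nodes_per_scale_out": 2,
                "cooldown_seconds": 600,  # rule is suppressed during the cooldown time
                "effective_time": None,   # default: in effect 24 hours a day
            },
        ],
    }

    # Node counts are always clamped to the configured limits:
    def clamp(desired, group=node_group_scaling):
        return max(group["min_instances"], min(group["max_instances"], desired))

    print(clamp(25))  # 20 -- the upper limit caps the scale-out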

Description of EMR auto scaling metrics that match YARN load metrics

All of the following metrics are provided by the YARN service.

  • yarn_resourcemanager_queue_AvailableVCores: The number of available vCPUs in a root queue.

  • yarn_resourcemanager_queue_PendingVCores: The number of to-be-allocated vCPUs in a root queue.

  • yarn_resourcemanager_queue_AllocatedVCores: The number of allocated vCPUs in a root queue.

  • yarn_resourcemanager_queue_ReservedVCores: The number of reserved vCPUs in a root queue.

  • yarn_resourcemanager_queue_AvailableMB: The amount of available memory, in MB, in a root queue.

  • yarn_resourcemanager_queue_PendingMB: The amount of to-be-allocated memory, in MB, in a root queue.

  • yarn_resourcemanager_queue_AllocatedMB: The amount of allocated memory, in MB, in a root queue.

  • yarn_resourcemanager_queue_ReservedMB: The amount of reserved memory, in MB, in a root queue.

  • yarn_resourcemanager_queue_AppsRunning: The number of running tasks in a root queue.

  • yarn_resourcemanager_queue_AppsPending: The number of pending tasks in a root queue.

  • yarn_resourcemanager_queue_AppsKilled: The number of terminated tasks in a root queue.

  • yarn_resourcemanager_queue_AppsFailed: The number of failed tasks in a root queue.

  • yarn_resourcemanager_queue_AppsCompleted: The number of completed tasks in a root queue.

  • yarn_resourcemanager_queue_AppsSubmitted: The number of submitted tasks in a root queue.

  • yarn_resourcemanager_queue_AllocatedContainers: The number of allocated containers in a root queue.

  • yarn_resourcemanager_queue_PendingContainers: The number of to-be-allocated containers in a root queue.

  • yarn_resourcemanager_queue_ReservedContainers: The number of reserved containers in a root queue.

  • yarn_resourcemanager_queue_AvailableMBPercentage: The percentage of available memory out of the total memory in a root queue. The value is calculated by using the following formula: MemoryAvailablePercentage = AvailableMemory/TotalMemory.

    Note: This metric is available for EMR clusters of V3.43, V5.9, and minor versions later than V3.43 or V5.9.

  • yarn_resourcemanager_queue_PendingContainersRatio: The ratio of to-be-allocated containers to allocated containers in a root queue. The value is calculated by using the following formula: ContainerPendingRatio = PendingContainers/AllocatedContainers.

    Note: This metric is available for EMR clusters of V3.43, V5.9, and minor versions later than V3.43 or V5.9.

  • yarn_resourcemanager_queue_AvailableVCoresPercentage: The percentage of available vCPUs in a root queue. The value is calculated by using the following formula: AvailableVCoresPercentage = AvailableVCores/(ReservedVCores + AvailableVCores + AllocatedVCores) × 100.

    Note: This metric is available for EMR clusters of V3.43, V5.9, and minor versions later than V3.43 or V5.9.

  • yarn_cluster_numContainersByPartition: The number of containers for a specified partition. The partition_name parameter specifies the partition name.

    Note: This metric is available for EMR clusters of V3.44, V5.10, and minor versions later than V3.44 or V5.10.

  • yarn_cluster_usedMemoryMBByPartition: The amount of used memory, in MB, for a specified partition. The partition_name parameter specifies the partition name. If you leave the partition_name parameter empty, the default partition is used.

    Note: This metric is available for EMR clusters of V3.44, V5.10, and minor versions later than V3.44 or V5.10.

  • yarn_cluster_availMemoryMBByPartition: The amount of available memory, in MB, for a specified partition. The partition_name parameter specifies the partition name. If you leave the partition_name parameter empty, the default partition is used.

    Note: This metric is available for EMR clusters of V3.44, V5.10, and minor versions later than V3.44 or V5.10.

  • yarn_cluster_usedVirtualCoresByPartition: The number of used vCPUs for a specified partition. The partition_name parameter specifies the partition name. If you leave the partition_name parameter empty, the default partition is used.

    Note: This metric is available for EMR clusters of V3.44, V5.10, and minor versions later than V3.44 or V5.10.

  • yarn_cluster_availableVirtualCoresByPartition: The number of available vCPUs for a specified partition. The partition_name parameter specifies the partition name. If you leave the partition_name parameter empty, the default partition is used.

    Note: This metric is available for EMR clusters of V3.44, V5.10, and minor versions later than V3.44 or V5.10.
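
As a worked illustration of the three formulas above, the following Python snippet computes the derived metrics from hypothetical queue statistics. All numbers are made up for this example.

    # Worked example of the derived-metric formulas above, using made-up numbers.

    available_mb, total_mb = 40960, 102400
    pending_containers, allocated_containers = 25, 100
    available_vcores, reserved_vcores, allocated_vcores = 16, 4, 44

    # MemoryAvailablePercentage = AvailableMemory/TotalMemory (x 100 to express
    # the fraction as a percentage)
    memory_available_percentage = available_mb / total_mb * 100

    # ContainerPendingRatio = PendingContainers/AllocatedContainers
    container_pending_ratio = pending_containers / allocated_containers

    # AvailableVCoresPercentage =
    #   AvailableVCores/(ReservedVCores + AvailableVCores + AllocatedVCores) x 100
    available_vcores_percentage = (
        available_vcores / (reserved_vcores + available_vcores + allocated_vcores) * 100
    )

    print(memory_available_percentage)   # 40.0
    print(container_pending_ratio)       # 0.25
    print(available_vcores_percentage)   # 25.0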