If your business workloads fluctuate, we recommend that you enable auto scaling and configure scaling rules. This allows E-MapReduce (EMR) to add or remove task nodes in response to workload fluctuations, ensuring jobs are completed while saving costs. This topic describes how to configure auto scaling in the EMR on ECS console.
Prerequisites
A Hadoop cluster is created. For more information, see Create a cluster.
Usage notes
You can specify the hardware specifications, or instance types, for the nodes to be scaled. You can configure the instance types only when auto scaling is disabled. If you need to modify the instance types, disable auto scaling, modify the specifications, and then re-enable auto scaling.
The system automatically matches and lists instance types that meet the vCPU and memory specifications you select. You must select the desired instance types from this list to allow the cluster to scale using those specifications.
To prevent auto scaling failures due to insufficient ECS inventory, you can select up to three ECS instance types.
Whether you select an ultra disk or a standard SSD, the minimum data disk size is 40 GB.
Load-based scaling is a dynamic management feature for auto scaling groups that relies on CloudMonitor. After you successfully configure a scaling rule, the system automatically creates a corresponding alarm rule in CloudMonitor. To ensure that EMR auto scaling activities function correctly, do not modify, delete, or disable these system-generated alarm rules.
Procedure
Go to the auto scaling page.
Log on to the EMR on ECS console.
In the top navigation bar, select a region and a resource group as needed.
On the EMR on ECS page, click the cluster ID of your target cluster.
Click the Auto Scaling tab.
Create an auto scaling group.
On the Configure Scaling tab, click Create Auto Scaling Group.
Note: You can manage and configure auto scaling groups only on the Auto Scaling page. You cannot manage them from the node management page.
In the Add Auto Scaling Group dialog box, enter a Node group name and click OK.
On the Configure Scaling tab, find the target node group and click Configure Rule in the Actions column.
In the Configure Auto Scaling panel, configure the parameters in the Basic Information section.
Parameter
Description
Maximum Number of Instances
The maximum number of task nodes in the auto scaling group. The group will not scale out beyond this limit, even if a scaling rule is triggered. The maximum value is 1,000.
Minimum Number of Instances
The minimum number of task nodes in the auto scaling group. The group will not scale below this number.
For example, if a rule to add one node is triggered when the current count is zero, but the minimum is set to 3, the system adds three nodes to meet the minimum requirement.
Graceful Shutdown
You can set a timeout to decommission a task node that is running YARN jobs. A node is decommissioned if it has no running jobs, or if a running job exceeds the timeout. The maximum timeout is 3,600 seconds.
Important: When you enable graceful shutdown, first change the value of the yarn.resourcemanager.nodes.exclude-path YARN parameter to /etc/ecm/hadoop-conf/yarn-exclude.xml.
After you change the timeout, restart YARN ResourceManager during off-peak hours for the change to take effect.
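The instance limits and the graceful shutdown rule described in the table above can be sketched in a few lines. This is only an illustration of the documented behavior, not EMR's implementation: the function names `clamp_target` and `can_decommission` are hypothetical, and treating "exceeds the timeout" as a strict comparison is an assumption.

```python
def clamp_target(current: int, delta: int, minimum: int, maximum: int) -> int:
    """Return the task node count after a rule fires, kept within the group limits."""
    if minimum > maximum:
        raise ValueError("minimum must not exceed maximum")
    return max(minimum, min(maximum, current + delta))


def can_decommission(running_jobs: int, waited_s: int, timeout_s: int) -> bool:
    """A node may be removed when it is idle, or once a job has run past the timeout.

    Assumption: 'exceeds the timeout' is modeled as waited_s > timeout_s.
    """
    return running_jobs == 0 or waited_s > timeout_s


# Example from the table: the current count is 0, the rule adds 1, the minimum is 3.
print(clamp_target(current=0, delta=1, minimum=3, maximum=1000))  # 3
```

The clamp also explains why a triggered scale-out rule can appear to do nothing: if the group is already at Maximum Number of Instances, the target count does not change.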
In the instance section of the Configure Auto Scaling panel, select an instance mode, billing method, and instance types.
Single Billing Method
The system automatically finds instance types that match your vCPU and memory specifications and displays them in the Instance Type section. You must select one or more instance types to allow the cluster to scale.
Pay-as-you-go
The order in which you select instance types determines their provisioning priority. The hourly price of each instance, which includes both the EMR and ECS instance prices, is displayed.
Preemptible Instance
Important: If your jobs have strict service level agreement (SLA) requirements, use this billing method with caution. Spot instances can be reclaimed for reasons such as bid failures, which may interrupt your jobs.
The order in which you select instance types determines their provisioning priority. The hourly pay-as-you-go price is displayed for each instance type. You can also set a maximum hourly price (your bid) for each instance type. An instance is launched only if its current market price is at or below your bid. For more information, see What is a spot instance?.
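As a rough mental model of how selection order and bids interact, the sketch below walks the instance types in priority order and returns the first one whose market price is at or below its bid. The function name `pick_instance_type` and all prices are hypothetical, and falling through to the next type when a bid is too low is an assumption about the behavior, not something this topic states.

```python
from typing import Dict, List, Optional


def pick_instance_type(
    priority: List[str],
    market_price: Dict[str, float],
    bid: Dict[str, float],
) -> Optional[str]:
    """Return the first instance type, in selection order, whose price is within the bid."""
    for instance_type in priority:
        price = market_price.get(instance_type)
        limit = bid.get(instance_type)
        if price is not None and limit is not None and price <= limit:
            return instance_type
    return None  # no type can be launched at the current market prices


# Hypothetical hourly prices, for illustration only.
priority = ["ecs.g6.xlarge", "ecs.g6.2xlarge"]
market = {"ecs.g6.xlarge": 0.25, "ecs.g6.2xlarge": 0.30}
bids = {"ecs.g6.xlarge": 0.20, "ecs.g6.2xlarge": 0.35}
print(pick_instance_type(priority, market, bids))  # ecs.g6.2xlarge
```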
Cost Optimization Mode
In this mode, you can create a detailed cost control policy to balance cost and stability.
Parameter
Description
Minimum Pay-As-You-Go Nodes in Auto Scaling Group
The minimum number of pay-as-you-go instances required in the auto scaling group. If the current number of pay-as-you-go instances is below this value, new instances are provisioned as pay-as-you-go instances first.
Percentage of Pay-As-You-Go Nodes
After the minimum number of pay-as-you-go instances is met, this percentage determines the proportion of new instances to provision as pay-as-you-go.
Lowest-Cost Instance Types
The number of lowest-cost instance types to use. When creating spot instances, the system distributes them evenly across the specified number of instance types. The maximum value is 3.
Replace Preemptible Instances
Specifies whether to enable the compensation mechanism for spot instances. If enabled, the system proactively replaces a spot instance approximately 5 minutes before it is reclaimed.
If you do not specify the Minimum Pay-As-You-Go Nodes in Auto Scaling Group, Percentage of Pay-As-You-Go Nodes, and Lowest-Cost Instance Types parameters, you create a standard cost-optimized scaling group. Otherwise, you create a cost-optimized scaling group with a mixed-instance policy. The two types are fully compatible in terms of API operations and features.
For a cost-optimized scaling group with a mixed-instance policy, you can configure the policy to replicate the behavior of a standard cost-optimized group. For example:
To create only pay-as-you-go instances:
Set Minimum Pay-As-You-Go Nodes in Auto Scaling Group to 0, Percentage of Pay-As-You-Go Nodes to 100, and Lowest-Cost Instance Types to 1.
To prioritize creating spot instances:
Set Minimum Pay-As-You-Go Nodes in Auto Scaling Group to 0, Percentage of Pay-As-You-Go Nodes to 0, and Lowest-Cost Instance Types to 1.
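The interaction between the minimum number of pay-as-you-go nodes and the pay-as-you-go percentage can be illustrated with a small calculation. This is a sketch under stated assumptions: the function name `split_new_instances` is hypothetical, and rounding the pay-as-you-go share up is an assumption, because this topic does not specify how fractions are handled.

```python
from typing import Tuple


def split_new_instances(
    to_add: int, current_payg: int, min_payg: int, payg_percent: int
) -> Tuple[int, int]:
    """Split a scale-out of `to_add` nodes into (pay-as-you-go, spot) counts."""
    if not 0 <= payg_percent <= 100:
        raise ValueError("payg_percent must be between 0 and 100")
    # First fill the gap up to the minimum number of pay-as-you-go nodes.
    base = min(to_add, max(0, min_payg - current_payg))
    remaining = to_add - base
    # Then apply the percentage to the rest (assumption: round up).
    extra_payg = (remaining * payg_percent + 99) // 100
    return base + extra_payg, remaining - extra_payg


print(split_new_instances(to_add=10, current_payg=0, min_payg=2, payg_percent=50))  # (6, 4)
```

With a minimum of 0, a percentage of 100 yields only pay-as-you-go instances and a percentage of 0 yields only spot instances, which matches the two configurations listed above.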
In the Trigger Mode section of the Configure Auto Scaling panel, select a trigger mode and configure its rules.
Time-based Scaling: If your Hadoop cluster workloads have predictable peaks and troughs, you can schedule scaling activities to add a specific number of task nodes at fixed times. This ensures job completion while saving costs.
Scaling rules include scale-out rules and scale-in rules. This topic uses a scale-out rule as an example. If you disable auto scaling for the cluster, all rules are deleted. You must reconfigure them if you re-enable auto scaling.
Parameter
Description
Rule Name
The names of scaling rules, including scale-out and scale-in rules, must be unique within a cluster.
Execution Rule
Execute Repeatedly: You can schedule a scaling activity to run at a specific time daily, weekly, or monthly.
Execute Only Once: The scaling activity runs only once at the specified time.
Execution Time
The time when the rule is executed.
Rule Expiration Time
The expiration date and time of the rule.
Retry timeout (seconds)
A scheduled scaling activity may fail to execute at the specified time for various reasons. If you set a retry timeout, the system retries the activity every 30 seconds during this period until it succeeds. The value ranges from 0 to 21,600 seconds.
For example, suppose scaling activity A is scheduled to run, but another scaling activity B is already in progress or in its cooldown period. In this case, activity A cannot be executed. The system retries activity A every 30 seconds during the specified retry period. Once the conditions are met, the cluster immediately performs the scaling activity.
Number of Adjusted Instances
The number of task nodes to add each time the rule is triggered.
Cooldown Time (s)
The interval after a scaling activity is completed during which no other scaling activities can be triggered.
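To see how the retry timeout plays out, the sketch below computes when a blocked scheduled activity would finally run, given the 30-second retry interval. The function name `first_successful_retry` is hypothetical, and counting a retry that falls exactly on the end of the timeout window as valid is an assumption.

```python
from typing import Optional

RETRY_INTERVAL_S = 30        # retry interval stated in the table above
MAX_RETRY_TIMEOUT_S = 21600  # upper bound of the retry timeout


def first_successful_retry(blocked_for_s: int, retry_timeout_s: int) -> Optional[int]:
    """Return how many seconds after the scheduled time the activity runs, or None.

    blocked_for_s: how long another activity or its cooldown blocks this one.
    """
    if not 0 <= retry_timeout_s <= MAX_RETRY_TIMEOUT_S:
        raise ValueError("retry timeout must be between 0 and 21600 seconds")
    if blocked_for_s <= 0:
        return 0  # nothing blocks the activity, so it runs on schedule
    offset = RETRY_INTERVAL_S
    while offset <= retry_timeout_s:
        if offset >= blocked_for_s:
            return offset
        offset += RETRY_INTERVAL_S
    return None  # the retry window ended while the activity was still blocked


print(first_successful_retry(blocked_for_s=70, retry_timeout_s=600))  # 90
```

In this model, a retry timeout shorter than the cooldown of a conflicting rule means the scheduled activity is skipped entirely, so the timeout should be sized with the longest expected block in mind.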
Load-based Scaling: If you cannot accurately predict the peaks and troughs of your big data workloads, you can use a load-based scaling policy.
Scaling rules include scale-out rules and scale-in rules. This topic uses a scale-out rule as an example. If you disable auto scaling, all rules are deleted. You must reconfigure them if you re-enable auto scaling. When you switch scaling policies, such as from load-based to time-based scaling, the rules of the previous policy become inactive and are not triggered. However, any nodes that were added based on those rules are retained and not released.
Parameter
Description
Rule Name
The names of scaling rules, including scale-out and scale-in rules, must be unique within a cluster.
Cluster Load Metrics
The metrics are obtained from YARN. For more information, see the official Hadoop documentation.
For the mapping between EMR auto scaling metrics and YARN metrics, see Mapping EMR auto scaling metrics to YARN.
Statistical Period and Statistical Rule
In each statistical period, the system aggregates the selected cluster load metric by using the specified statistical rule (for example, Average, Maximum, or Minimum) and compares the result with the threshold. Each period in which the threshold is met counts as one trigger.
Condition Repetition Threshold
The number of consecutive times the metric must meet the threshold before a scaling activity is triggered.
Adjustment value
The number of task nodes to add each time the rule is triggered.
Cooldown Time (s)
The period after a scaling activity completes during which other scaling activities cannot be triggered. The system ignores any triggers that occur during this time and waits for the next valid trigger after the cooldown ends.
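The way the statistical period, the repetition threshold, and the cooldown combine can be sketched as a small simulation. It rests on several assumptions that this topic does not confirm: the comparison is "greater than or equal to", a scaling activity completes instantly, and the consecutive counter resets during cooldown. The function name `simulate_rule` is hypothetical.

```python
from typing import List


def simulate_rule(
    samples: List[float],
    threshold: float,
    repeat: int,
    period_s: int,
    cooldown_s: int,
) -> List[int]:
    """Return the indexes of the statistical periods in which a scaling activity fires.

    samples holds one aggregated metric value per statistical period.
    """
    fired = []
    consecutive = 0
    cooldown_until = 0
    for i, value in enumerate(samples):
        now = (i + 1) * period_s  # the check happens at the end of period i
        if now < cooldown_until:
            consecutive = 0  # triggers during cooldown are ignored
            continue
        consecutive = consecutive + 1 if value >= threshold else 0
        if consecutive >= repeat:
            fired.append(i)
            consecutive = 0
            cooldown_until = now + cooldown_s
    return fired


# Periods of 300 s, two consecutive hits required, cooldown of 600 s.
print(simulate_rule([12, 15, 20, 20, 20, 20, 3], 10, 2, 300, 600))  # [1, 4]
```

In the example the metric stays above the threshold from the first period onward, yet the second activity fires only in period 4 because the cooldown suppresses the checks in between.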
Click Save.
You can enable auto scaling as needed. For more information, see Enable or disable auto scaling (Hadoop clusters only).
Mapping EMR auto scaling metrics to YARN
Auto scaling metric | Service | Description |
YARN.AvailableVCores | YARN | The number of available virtual cores. |
YARN.PendingVCores | YARN | The number of virtual cores pending allocation. |
YARN.AllocatedVCores | YARN | The number of allocated virtual cores. |
YARN.ReservedVCores | YARN | The number of reserved virtual cores. |
YARN.AvailableMemory | YARN | The amount of available memory, in MB. |
YARN.PendingMemory | YARN | The amount of memory pending allocation, in MB. |
YARN.AllocatedMemory | YARN | The amount of allocated memory, in MB. |
YARN.ReservedMemory | YARN | The amount of reserved memory, in MB. |
YARN.AppsRunning | YARN | The number of running applications. |
YARN.AppsPending | YARN | The number of pending applications. |
YARN.AppsKilled | YARN | The number of killed applications. |
YARN.AppsFailed | YARN | The number of failed applications. |
YARN.AppsCompleted | YARN | The number of completed applications. |
YARN.AppsSubmitted | YARN | The number of submitted applications. |
YARN.AllocatedContainers | YARN | The number of allocated containers. |
YARN.PendingContainers | YARN | The number of containers pending allocation. |
YARN.ReservedContainers | YARN | The number of reserved containers. |
YARN.MemoryAvailablePercentage | YARN | The percentage of available memory. |
YARN.ContainerPendingRatio | YARN | The ratio of pending containers to allocated containers. |
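The last two metrics in the table are derived values. The sketch below shows one plausible way to compute them from the base counts, which can help when choosing thresholds. The function names are hypothetical, and both the use of total cluster memory as the denominator and the handling of zero allocated containers are assumptions, because this topic does not give the exact formulas.

```python
def container_pending_ratio(pending: int, allocated: int) -> float:
    """Ratio of pending containers to allocated containers."""
    if allocated == 0:
        # Assumption: with nothing allocated, any pending container is an unbounded ratio.
        return float("inf") if pending > 0 else 0.0
    return pending / allocated


def memory_available_percentage(available_mb: int, total_mb: int) -> float:
    """Available memory as a percentage of total memory (assumed definition)."""
    if total_mb <= 0:
        raise ValueError("total_mb must be positive")
    return 100.0 * available_mb / total_mb


print(container_pending_ratio(pending=5, allocated=10))  # 0.5
print(memory_available_percentage(available_mb=2048, total_mb=8192))  # 25.0
```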