Auto scaling automatically adjusts the number of nodes in your E-MapReduce (EMR) cluster, and therefore the computing capabilities of the cluster, based on your business requirements and scaling policies. You can add managed or custom auto scaling rules for a node group. When the volume of your business data increases, nodes are added to the node group to provide sufficient computing capabilities. When the volume of your business data decreases, a number of nodes are removed from the node group to reduce costs.
Comparison between managed auto scaling and custom auto scaling
Item | Managed auto scaling | Custom auto scaling |
Auto scaling rule | Auto scaling is performed for EMR clusters based on continuous evaluation of resources in the clusters. You do not need to manually configure rules. | You must configure custom auto scaling rules, such as time-based or load-based scaling rules. |
Supported EMR versions | EMR V3.43.0 and later minor versions, and EMR V5.9.0 and later minor versions. | EMR V3.42.0 and later minor versions, and EMR V5.8.0 and later minor versions. |
Scaling granularity | Cluster level and intelligent node group selection. | Node group level. |
Metric collection frequency | 5 seconds. | 30 seconds. |
Monitoring frequency | 5 to 10 seconds. | 30 seconds. |
Based on custom metrics | No. | Yes. |
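The version rows in the comparison table can be read as a minimum version per release line. The following Python sketch encodes that reading so you can check a cluster version against either auto scaling mode. It assumes version strings in the form `EMR V3.43.0` and only considers the V3.x and V5.x release lines listed in the table; the names `MANAGED_MIN`, `CUSTOM_MIN`, `parse_version`, and `supports` are hypothetical and are not part of any EMR SDK.

```python
from typing import Dict, Tuple

Version = Tuple[int, int, int]

# Minimum supported version per major release line, taken from the table above.
# Assumption: only the V3.x and V5.x release lines listed in the table are modeled.
MANAGED_MIN: Dict[int, Version] = {3: (3, 43, 0), 5: (5, 9, 0)}
CUSTOM_MIN: Dict[int, Version] = {3: (3, 42, 0), 5: (5, 8, 0)}


def parse_version(text: str) -> Version:
    """Parse a string such as 'EMR V3.43.0' or '5.9.0' into (major, minor, patch)."""
    raw = text.upper().replace("EMR", "").strip().lstrip("V")
    major, minor, patch = (int(part) for part in raw.split("."))
    return major, minor, patch


def supports(version: str, minimums: Dict[int, Version]) -> bool:
    """Return True if the version meets the minimum for its own release line."""
    parsed = parse_version(version)
    minimum = minimums.get(parsed[0])
    # Tuples compare element by element as integers, so 5.10.1 > 5.9.0.
    return minimum is not None and parsed >= minimum
```

For example, `supports("EMR V3.42.0", CUSTOM_MIN)` is true while `supports("EMR V3.42.0", MANAGED_MIN)` is false, which matches the one-version gap between the two columns. Parsing the components as integers rather than comparing strings avoids treating V5.10.0 as earlier than V5.9.0.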
Scenarios
You can use auto scaling in the following scenarios to reduce costs and improve task execution efficiency:
The curve of your business computing load has noticeable peaks and troughs. In this case, you can add managed auto scaling rules or load-based scaling rules.
You need to add nodes at scheduled times to temporarily supplement computing capabilities. In this case, you can add time-based scaling rules.
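To make the two kinds of custom rules concrete, the following Python sketch models a load-based rule, which fires when a metric crosses a threshold, and a time-based rule, which fires at a scheduled time. It is a simplified illustration under stated assumptions, not how EMR actually evaluates rules: the classes `LoadRule` and `TimeRule`, the metric name `cpu_utilization`, and the minimum and maximum node bounds are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict


@dataclass
class LoadRule:
    """Hypothetical load-based rule."""
    metric_name: str   # hypothetical metric, for example "cpu_utilization"
    threshold: float
    above: bool        # True: fire when metric > threshold; False: when metric < threshold
    delta: int         # positive adds nodes (scale out), negative removes nodes (scale in)


@dataclass
class TimeRule:
    """Hypothetical time-based rule that fires once a day at hour:minute."""
    hour: int
    minute: int
    delta: int


def clamp(value: int, low: int, high: int) -> int:
    """Keep the node count within the configured bounds."""
    return max(low, min(high, value))


def apply_load_rule(current: int, metrics: Dict[str, float], rule: LoadRule,
                    min_nodes: int, max_nodes: int) -> int:
    """Return the node count after evaluating one load-based rule."""
    value = metrics[rule.metric_name]
    fired = value > rule.threshold if rule.above else value < rule.threshold
    return clamp(current + rule.delta, min_nodes, max_nodes) if fired else current


def apply_time_rule(current: int, now: datetime, rule: TimeRule,
                    min_nodes: int, max_nodes: int) -> int:
    """Return the node count after evaluating one time-based rule."""
    fired = (now.hour, now.minute) == (rule.hour, rule.minute)
    return clamp(current + rule.delta, min_nodes, max_nodes) if fired else current
```

For example, a rule `LoadRule("cpu_utilization", 80.0, True, 2)` grows a 4-node group to 6 nodes when the metric reads 90, and a rule `TimeRule(9, 0, 5)` adds 5 nodes at 09:00 unless that would exceed the maximum. Clamping to a minimum and maximum keeps a burst of triggers from shrinking the group to zero or growing it without bound.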
Limits
Cluster type | Limit |
DataLake, Dataflow, online analytical processing (OLAP), and custom clusters | Only task node groups that contain preemptible instances or pay-as-you-go instances support auto scaling. |
Hadoop clusters | |
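The first row of the Limits table can be expressed as a simple eligibility check. The following Python sketch assumes that each node group has a single role (such as `task`) and a single billing method; the `Billing` enum and the `supports_auto_scaling` function are hypothetical names used for illustration only. The limits for Hadoop clusters are not modeled, so the function refuses to answer for them.

```python
from enum import Enum


class Billing(Enum):
    """Hypothetical billing methods for the instances in a node group."""
    SUBSCRIPTION = "subscription"
    PAY_AS_YOU_GO = "pay-as-you-go"
    PREEMPTIBLE = "preemptible"


# Cluster types covered by the first row of the Limits table.
COVERED_CLUSTER_TYPES = {"DataLake", "Dataflow", "OLAP", "custom"}
# Billing methods that the table allows for auto scaling.
ELIGIBLE_BILLING = {Billing.PAY_AS_YOU_GO, Billing.PREEMPTIBLE}


def supports_auto_scaling(cluster_type: str, group_role: str, billing: Billing) -> bool:
    """Apply the Limits rule for DataLake, Dataflow, OLAP, and custom clusters."""
    if cluster_type not in COVERED_CLUSTER_TYPES:
        # Hadoop cluster limits are listed separately and are not encoded here.
        raise ValueError(f"limits for {cluster_type!r} clusters are not modeled")
    return group_role == "task" and billing in ELIGIBLE_BILLING
```

Under these assumptions, a DataLake task node group of preemptible instances is eligible, while a core node group or a subscription task node group is not.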
Usage notes
Cluster type | References |
DataLake, Dataflow, OLAP, and custom clusters | |
Hadoop clusters | |