Select appropriate Elastic Compute Service (ECS) instance types for your cluster nodes to ensure cluster stability and reliability. This topic describes the recommended ECS instance specifications for creating an Alibaba Cloud Container Service for Kubernetes (ACK) cluster.
Instance types not supported by ACK
Before selecting instance types, check whether they are compatible with ACK. The following instance types cannot be used as worker or master nodes.
General restrictions
| Unsupported instance family | Example | Reason |
|---|---|---|
| t5, burstable | ecs.t5-lc2m1.nano | Unstable performance may cause cluster instability |
| t6, burstable | ecs.t6-c4m1.large | Unstable performance may cause cluster instability |
| Instance types with fewer than 4 vCPU cores | ecs.g6.large | Specs too low for stable cluster operation |
| c6t, security-enhanced compute-optimized | ecs.c6t.large | Not supported |
| g6t, security-enhanced general-purpose | ecs.g6t.large | Not supported |
| Super Computing Cluster (SCC) | ecs.sccg7.32xlarge | Not supported |
To use low-specification ECS instance types for clusters and node pools, submit a request in Quota Center.
For GPU-accelerated workloads, see GPU-accelerated instance families supported by ACK.
Terway network plugin restrictions
If you use the Terway network plugin, the maximum number of pods per node depends on the number of Elastic Network Interfaces (ENIs) the instance type supports. Instance types that cannot meet the minimum pod count threshold cannot be used.
- Shared ENI mode or Shared ENI + Trunk ENI mode: The pod limit per node must exceed 11. Formula: (Number of ENIs - 1) × Number of private IPs per ENI > 11. Example: ecs.g6.large supports 2 ENIs with 6 private IPv4 addresses each, so the pod limit is (2 - 1) × 6 = 6. This instance type cannot be used.
- Exclusive ENI mode: The pod limit per node must exceed 6. Formula: Number of ENIs - 1 > 6. Example: ecs.g6.xlarge supports 3 ENIs, so the pod limit is 3 - 1 = 2. This instance type cannot be used.
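The two formulas above can be checked with a short script. This is a sketch: the helper names are illustrative, and the ENI counts are the example values quoted above, not an authoritative spec lookup.

```python
def shared_eni_pod_limit(enis: int, ips_per_eni: int) -> int:
    """Pod limit in shared ENI (or shared ENI + trunk ENI) mode.
    One ENI is reserved for the node itself."""
    return (enis - 1) * ips_per_eni

def exclusive_eni_pod_limit(enis: int) -> int:
    """Pod limit in exclusive ENI mode. One ENI is reserved."""
    return enis - 1

# ecs.g6.large: 2 ENIs, 6 private IPv4 addresses per ENI
print(shared_eni_pod_limit(2, 6))   # 6 -> below the threshold of 11, unusable
# ecs.g6.xlarge: 3 ENIs
print(exclusive_eni_pod_limit(3))   # 2 -> below the threshold of 6, unusable
```

Any instance type whose computed limit does not exceed the mode's threshold is rejected by Terway.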
For a complete list of compatible instance types by Terway mode, see Use the Terway network plugin.
Why fewer, larger instances perform better
Using many small ECS instances creates compounding problems at scale. The system reserves CPU, memory, and disk resources on every node for cluster management components. On small instances, this reservation consumes a significant share of total capacity, leaving less room for workloads.
Resource fragmentation compounds this problem. After a container is allocated CPU and memory, the remaining resources on a small instance may be too small to run another container. Those resources sit idle but cannot be reclaimed.
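A rough illustration of the reservation overhead: assuming a hypothetical flat reservation of 1 CPU core per node (the actual ACK reservation varies with node size), the reserved share shrinks sharply as nodes grow.

```python
def reserved_share(node_cores: float, reserved_cores: float) -> float:
    """Fraction of a node's CPU consumed by the system reservation."""
    return reserved_cores / node_cores

# Hypothetical 1-core reservation; the real ACK reservation policy
# scales with node size.
print(f"{reserved_share(2, 1):.0%}")   # 50% of a 2-core node
print(f"{reserved_share(16, 1):.0%}")  # ~6% of a 16-core node
```

The same reasoning applies to fragmentation: a leftover of 1.5 cores is dead capacity on a 2-core node but a small rounding loss on a 16-core node.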
Larger instances shrink both overheads, and bring additional benefits:
- Better network efficiency: More containers communicate within a single instance, reducing cross-node traffic. Large instances also provide higher network bandwidth for bandwidth-intensive applications.
- Faster image pulls: On a large instance, a container image is pulled once and shared across all containers on that node. With many small instances, the same image must be pulled once per instance, slowing scale-out.
Select worker node specifications
Minimum specifications
Use instance types with at least 4 CPU cores and 8 GB of memory.
Sizing for fault tolerance
Calculate the total CPU cores required for your daily workload, then size nodes to absorb instance failures without service disruption.
Example: two configurations that each provide 160 CPU cores of total capacity.
| Fault tolerance target | Node count | Node size | Max operating load |
|---|---|---|---|
| 10% (one node can fail) | at least 10 | 16 CPU cores | 144 cores (160 × 90%) |
| 20% (one node can fail) | at least 5 | 32 CPU cores | 128 cores (160 × 80%) |
If one instance fails in either configuration, the remaining instances continue handling the peak load.
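The table's numbers can be reproduced with a small sizing helper. This is a sketch under the example's assumptions: the peak loads (144 and 128 cores) are the max operating loads from the table, and the function name is illustrative.

```python
import math

def size_nodes(peak_cores: int, fault_tolerance: float, cores_per_node: int):
    """Return (node_count, max_operating_load) so that losing one node's
    worth of capacity (the fault-tolerance fraction) still covers the peak."""
    usable_per_node = cores_per_node * (1 - fault_tolerance)
    nodes = math.ceil(peak_cores / usable_per_node)
    max_load = int(nodes * cores_per_node * (1 - fault_tolerance))
    return nodes, max_load

print(size_nodes(144, 0.10, 16))  # (10, 144): ten 16-core nodes run at <= 90%
print(size_nodes(128, 0.20, 32))  # (5, 128): five 32-core nodes run at <= 80%
```

Both configurations total 160 cores; the difference is how much headroom each node must keep free to absorb a single-node failure.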
CPU-to-memory ratio
Select the ratio that matches your workload type:
- 1:2 or 1:4: General-purpose workloads
- 1:8: Memory-intensive applications such as Java services
GPU workloads
To maintain service stability and ensure accurate resource scheduling, do not mix GPU-accelerated and non-GPU instance types in the same node pool.
Persistent memory-optimized instances
Instance families such as re6p use a hybrid memory architecture combining regular memory and persistent memory. To enable persistent storage on these nodes, see Non-volatile memory volumes. For more information, see Instance families.
Large-scale clusters: ECS Bare Metal Instances
At approximately 1,000 CPU cores of daily scale, use ECS Bare Metal Instances. A single ECS Bare Metal Instance provides at least 96 CPU cores, so a 1,000-core cluster requires only 10–11 instances. For more information, see ECS Bare Metal Instances.
Select master node specifications
Master nodes run etcd, kube-apiserver, and kube-controller-manager. For production ACK dedicated clusters, master node specifications must scale with cluster size: larger clusters require higher-spec master nodes.
Cluster size in this topic is measured by the number of nodes. In practice, cluster size can also be measured by pod count, deployment frequency, or request volume.
Use small instances for personal testing and learning only. For production clusters, select master node specifications based on the following table.
| Number of nodes | Recommended master node specifications |
|---|---|
| 1–5 | 4 CPU cores, 8 GB memory (2 cores/4 GB or lower not recommended) |
| 6–20 | 4 CPU cores, 16 GB memory |
| 21–100 | 8 CPU cores, 32 GB memory |
| 101–200 | 16 CPU cores, 64 GB memory |
| 201–500 (assess blast radius risk) | 64 CPU cores, 128 GB memory |
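The table above can be expressed as a simple lookup. The function name is illustrative, and the thresholds are taken directly from the table:

```python
def recommended_master_spec(node_count: int) -> str:
    """Map worker-node count to a recommended master node spec,
    per the sizing table above."""
    if node_count <= 5:
        return "4 CPU cores, 8 GB memory"
    if node_count <= 20:
        return "4 CPU cores, 16 GB memory"
    if node_count <= 100:
        return "8 CPU cores, 32 GB memory"
    if node_count <= 200:
        return "16 CPU cores, 64 GB memory"
    if node_count <= 500:
        return "64 CPU cores, 128 GB memory"  # assess blast radius risk
    raise ValueError("clusters above 500 nodes need a dedicated sizing review")

print(recommended_master_spec(50))  # 8 CPU cores, 32 GB memory
```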
ECS Bare Metal Instances
ECS Bare Metal Instances are built on Alibaba Cloud's virtualization 2.0 technology. They combine the elasticity of virtual machines with the performance and features of physical servers, and support nested virtualization.
ECS Bare Metal Instances are suited for dedicated compute resources, encrypted computing, and hybrid cloud deployments. For an overview and supported instance families, see Overview of ECS Bare Metal Instances.
When to use ECS Bare Metal Instances:
- Large cluster scale: At approximately 1,000 CPU cores of daily scale, each ECS Bare Metal Instance contributes at least 96 CPU cores, so you can build the cluster with only 10–11 instances.
- Traffic spikes requiring rapid scale-out: ECS Bare Metal Instances deliver better performance than equivalently specced physical servers and can provide millions of vCPUs to handle sudden traffic increases, for example during e-commerce sales promotions.