Best practices for spot instance node pools - Container Service for Kubernetes

Limitations

Before setting up a spot instance node pool, confirm that you meet the following requirements:

Your cluster must run Kubernetes 1.9 or later. For upgrade instructions, see Manually upgrade a cluster.
The ECS instances in your Virtual Private Cloud (VPC) must have an elastic IP address (EIP) bound to them, or the VPC must have a NAT Gateway configured. This is required for nodes to access the public internet.
Your account must have sufficient node quota. For quota information and how to request an increase, see Quotas and limits.

How spot instances work

A spot instance is a type of pay-as-you-go ECS instance with a market-driven price that fluctuates based on supply and demand. The system creates a spot instance only when two conditions are both true: the real-time market price for the instance type is below your bid, and enough inventory is available.

Once created, a spot instance behaves like a standard pay-as-you-go instance. It has a one-hour protection period during which it cannot be reclaimed. After that, the system checks the market price and inventory every five minutes. If the price exceeds your bid or inventory drops too low, the instance is released.
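
The creation and release rules above can be restated as two small predicates. This is only an illustrative Python sketch of the rules as described in this section, not the actual ECS implementation; the function and parameter names are hypothetical.

```python
from datetime import datetime, timedelta

# One-hour protection period, as described in this section.
PROTECTION_PERIOD = timedelta(hours=1)


def can_create(market_price: float, bid: float, inventory_available: bool) -> bool:
    """A spot instance is created only if the price is below the bid AND inventory exists."""
    return market_price < bid and inventory_available


def may_be_released(created_at: datetime, now: datetime,
                    market_price: float, bid: float,
                    inventory_available: bool) -> bool:
    """After the protection period, release if the price exceeds the bid or inventory runs out."""
    if now - created_at < PROTECTION_PERIOD:
        return False  # still protected, cannot be reclaimed yet
    return market_price > bid or not inventory_available
```

In this model the five-minute check interval would simply be how often `may_be_released` is evaluated with fresh price and inventory data.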

A spot instance node pool mixes spot and pay-as-you-go instances. This hybrid composition lets you maximize savings while maintaining a stable baseline.

Use cases

Spot instance node pools work best for stateless, fault-tolerant workloads that can tolerate brief resource unavailability:

Batch processing jobs
Machine learning training
Big data ETL pipelines (such as Apache Spark)
Queue processing applications
Stateless API services

Avoid using spot instance node pools for workloads that cannot tolerate interruption:

Cluster management tools, such as monitoring and operations systems
Stateful workloads, such as databases

Spot instance node pools with auto scaling are a good fit when your workload has distinct peak and off-peak periods. When auto scaling is enabled, the node auto-scaler monitors the pool, scales out to schedule pending pods, and scales in when nodes are underutilized. This rapid scaling helps offset the impact of spot instance reclamation and improves overall cost efficiency.

Select a spot instance mix

Choosing the right instance mix is the most important step in a spot instance strategy. A good mix balances inventory availability, cost, and performance. The goal is to spread demand across multiple instance types so that if one type becomes unavailable or expensive, others can absorb the workload. There is no one-size-fits-all solution—select the configuration that best fits your business needs.

Use one of the following methods to identify good candidates.

Use console recommendations

When you create or edit a node pool in the ACK console, the console displays the instance types currently in stock in the selected region. Filter by your resource requirements, then review the elasticity strength score and price range that the console calculates. Add more instance types to improve elasticity, and set a maximum price for your instances.

For instructions on creating or editing a node pool, see Create and manage node pools.

Use the spot-instance-advisor CLI

When you are unsure which instance types offer the best price stability, use spot-instance-advisor—an open-source CLI tool provided by ACK. It retrieves historical price data and current pricing for a region via API, then ranks instance types by cost per core-hour. The ratio column measures price volatility: a lower value means more stable pricing.

Download spot-instance-advisor from the spot-instance-advisor repository.

The tool supports the following parameters:

Usage of ./spot-instance-advisor:
  -accessKeyId string
        Your accessKeyId of cloud account
  -accessKeySecret string
        Your accessKeySecret of cloud account
  -cutoff int
        Discount of the spot instance prices (default 2)
  -family string
        The spot instance family you want (e.g. ecs.n1,ecs.n2)
  -limit int
        Limit of the spot instances (default 20)
  -maxcpu int
        Max cores of spot instances  (default 32)
  -maxmem int
        Max memory of spot instances (default 64)
  -mincpu int
        Min cores of spot instances (default 1)
  -minmem int
        Min memory of spot instances (default 2)
  -region string
        The region of spot instances (default "cn-hangzhou")
  -resolution int
        The window of price history analysis (default 7)
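
The optional flags listed above can be combined with the required ones to narrow the search. As a minimal sketch, the following Python helper assembles an argument list using only flag names taken from the usage text; the helper name `build_advisor_args` and the example filter values are hypothetical, and the script only prints the command rather than running it.

```python
import shlex


def build_advisor_args(access_key_id, access_key_secret, region,
                       family=None, mincpu=None, maxcpu=None,
                       minmem=None, maxmem=None, limit=None):
    """Return the argv list for ./spot-instance-advisor using flags from the usage text."""
    args = [
        "./spot-instance-advisor",
        f"--accessKeyId={access_key_id}",
        f"--accessKeySecret={access_key_secret}",
        f"--region={region}",
    ]
    optional = {"family": family, "mincpu": mincpu, "maxcpu": maxcpu,
                "minmem": minmem, "maxmem": maxmem, "limit": limit}
    # Only append the filters the caller actually set, so tool defaults apply otherwise.
    for flag, value in optional.items():
        if value is not None:
            args.append(f"--{flag}={value}")
    return args


if __name__ == "__main__":
    argv = build_advisor_args("<your-access-key-id>", "<your-access-key-secret>",
                              "cn-hangzhou", family="ecs.c6,ecs.g6", mincpu=2, maxcpu=8)
    # Print a shell-quoted command line; pass argv to subprocess.run(argv) to execute it.
    print(shlex.join(argv))
```

Building the command as a list keeps credentials out of shell string interpolation if the list is later handed to `subprocess.run`.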

Run the following command to get ranked instance types for your region. The accessKeyId, accessKeySecret, and region parameters are required.

./spot-instance-advisor --accessKeyId=<your-access-key-id> --accessKeySecret=<your-access-key-secret> --region=<region-id>

Sample output:

Initialize cache ready with 619 kinds of instanceTypes
Filter 93 of 98 kinds of instanceTypes.
Fetch 93 kinds of instanceTypes prices successfully.
Successfully compare 199 kinds of instanceTypes
      instanceTypeId               ZoneId     Price(Core)        Discount           ratio
        ecs.c6.large     cn-zhangjiakou-c          0.0135             1.0             0.0
        ecs.c6.large     cn-zhangjiakou-a          0.0135             1.0             0.0
      ecs.c6.2xlarge     cn-zhangjiakou-a          0.0136             1.0             0.0
      ecs.c6.2xlarge     cn-zhangjiakou-c          0.0136             1.0             0.0
      ecs.c6.3xlarge     cn-zhangjiakou-a          0.0137             1.0             0.0
      ecs.c6.3xlarge     cn-zhangjiakou-c          0.0137             1.0             0.0
       ecs.c6.xlarge     cn-zhangjiakou-c          0.0138             1.0             0.0
       ecs.c6.xlarge     cn-zhangjiakou-a          0.0138             1.0             0.0
     ecs.hfc6.xlarge     cn-zhangjiakou-a          0.0158             1.0             0.0
      ecs.hfc6.large     cn-zhangjiakou-a          0.0160             1.0             0.0
      ecs.hfc6.large     cn-zhangjiakou-c          0.0160             1.0             0.0
      ecs.g6.3xlarge     cn-zhangjiakou-a          0.0175             1.0             0.0
      ecs.g6.3xlarge     cn-zhangjiakou-c          0.0175             1.0             0.0
        ecs.g6.large     cn-zhangjiakou-a          0.0175             1.0             0.0
       ecs.g6.xlarge     cn-zhangjiakou-a          0.0175             1.0             0.0
      ecs.g6.2xlarge     cn-zhangjiakou-a          0.0175             1.0             1.0
      ecs.g6.2xlarge     cn-zhangjiakou-c          0.0175             1.0             3.0
        ecs.g6.large     cn-zhangjiakou-c          0.0175             1.0            30.8
       ecs.g6.xlarge     cn-zhangjiakou-c          0.0175             1.0             9.7
      ecs.hfg6.large     cn-zhangjiakou-c          0.0195             1.0             0.2

Prioritize instance types with a low Price(Core) and a low ratio. A Discount value of 1.0 means the spot price is 10% of the pay-as-you-go price, that is, a 90% discount. Instance types with higher ratio values have more frequent price fluctuations, even when priced at the same discount.
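
To apply this selection rule to saved tool output, something like the following Python sketch could be used. It assumes the five-column layout shown in the sample output; the `Offer` type, the function names, and the `max_ratio` threshold of 1.0 are illustrative choices, not part of spot-instance-advisor.

```python
from typing import NamedTuple


class Offer(NamedTuple):
    instance_type: str
    zone: str
    price_per_core: float
    discount: float
    ratio: float


def parse_offers(output: str) -> list[Offer]:
    """Parse the data rows of the tool output; log lines and the header are skipped."""
    offers = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) != 5 or not fields[0].startswith("ecs."):
            continue
        offers.append(Offer(fields[0], fields[1], float(fields[2]),
                            float(fields[3]), float(fields[4])))
    return offers


def rank_offers(offers: list[Offer], max_ratio: float = 1.0) -> list[Offer]:
    """Keep offers whose ratio is at most max_ratio, then sort by price, then ratio."""
    stable = [o for o in offers if o.ratio <= max_ratio]
    return sorted(stable, key=lambda o: (o.price_per_core, o.ratio))


# Three rows copied from the sample output in this section.
sample = """\
      instanceTypeId               ZoneId     Price(Core)        Discount           ratio
        ecs.c6.large     cn-zhangjiakou-c          0.0135             1.0             0.0
        ecs.g6.large     cn-zhangjiakou-c          0.0175             1.0            30.8
      ecs.hfg6.large     cn-zhangjiakou-c          0.0195             1.0             0.2
"""
for offer in rank_offers(parse_offers(sample)):
    print(offer.instance_type, offer.zone, offer.price_per_core, offer.ratio)
```

With these rows, ecs.g6.large in cn-zhangjiakou-c is filtered out because of its ratio of 30.8, even though it is cheaper per core than ecs.hfg6.large.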

Configure the spot-to-pay-as-you-go ratio

Log on to the ACK console. In the left navigation pane, click Clusters.
On the Clusters page, find the cluster to manage and click its name. In the left navigation pane, choose Nodes > Node Pools.

On the Node Pools page, click Create Node Pool and configure the parameters below. For full configuration instructions, see Create and manage a node pool.

Configuration item	Description
vSwitch	Select vSwitches in different zones to improve cluster high availability.
Billing Method	Select Spot instance.
Scaling Policy (under Advanced Options)	Priority: scales based on the priority order of the vSwitches you select. If a zone runs out of capacity, the system falls back to the next zone in priority order. Cost Optimization: creates instances by ascending vCPU unit price. Spot instances are preferred. If spot instances are unavailable, pay-as-you-go instances are created as a supplement. Distribution Balancing: distributes instances evenly across zones. If distribution becomes uneven, you can trigger a rebalancing operation.
Use On-Demand Instances To Supplement Spot Capacity	When spot instances cannot be created due to price or inventory constraints, ACK automatically creates pay-as-you-go instances to meet the target count. Requires Billing Method set to Spot instance.
Enable Supplemental Spot Instance	When ACK receives a reclamation notice (5 minutes before reclamation), it attempts to launch a replacement instance proactively. See Enable supplemental spot instances for details.

After saving the configuration, go to the node pool list, click Details in the Actions column, and select the Overview tab. In the Node Configurations section, you can view the current percentage of pay-as-you-go instances.

Check the expiration status of spot instances

ACK uses the ack-node-problem-detector (NPD) component to detect impending spot instance releases and surface that information on the node.

To install NPD, see Step 1: Install the ack-node-problem-detector component.

If a spot instance is released without prior action—such as pod eviction, node draining, or node replacement—the services running on it may be disrupted. In the worst case, reclamation of a control plane node can cause a cluster-level failure. Use the InstanceExpired status in NPD to detect an impending release early.

Log on to the ACK console. In the left navigation pane, click Clusters.
On the Clusters page, click the name of the cluster. In the left navigation pane, choose Nodes > Nodes.
On the Nodes page, click the name of the target node, or click More > Details in the Actions column.

In the Status section, view the InstanceExpired condition.

[Figure: Whether the instance has expired]

`InstanceExpired` status	Content	Meaning
`True`	`InstanceToBeTerminated`	The spot instance is about to expire and will be released.
`False`	`InstanceNotToBeTerminated`	The spot instance has not yet expired and can continue to be used.
`Unknown`	—	Plugin error. Submit a ticket for assistance.

If the status is True, an event also appears in the Events section.

[Figure: Instance expired event]

If InstanceExpired is True, migrate your workload to another node before the instance is released. For instructions, see Schedule applications to a specific node.
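
The same check can be scripted instead of done in the console. The Python sketch below assumes that NPD exposes InstanceExpired as an ordinary node condition under `.status.conditions`, as in the JSON produced by `kubectl get nodes -o json`; the function names and the sample data are hypothetical.

```python
import json


def instance_expired_status(node: dict) -> str:
    """Return the InstanceExpired condition status ("True", "False", or "Unknown")."""
    for condition in node.get("status", {}).get("conditions", []):
        if condition.get("type") == "InstanceExpired":
            return condition.get("status", "Unknown")
    return "Unknown"  # condition absent, for example when NPD is not installed


def nodes_to_migrate(node_list: dict) -> list[str]:
    """Names of nodes whose InstanceExpired condition is True."""
    return [n["metadata"]["name"] for n in node_list.get("items", [])
            if instance_expired_status(n) == "True"]


# Hand-written sample shaped like `kubectl get nodes -o json` output (not real data).
sample = json.loads("""
{"items": [
  {"metadata": {"name": "node-a"},
   "status": {"conditions": [{"type": "InstanceExpired", "status": "True"}]}},
  {"metadata": {"name": "node-b"},
   "status": {"conditions": [{"type": "InstanceExpired", "status": "False"}]}}
]}
""")
print(nodes_to_migrate(sample))
```

Nodes returned by `nodes_to_migrate` are the ones whose workloads should be moved before the instance is released.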

Handle spot instance interruptions

Interruption handling combines three strategies: monitoring notifications, proactive node replacement, and custom decommissioning logic.

Understand interruption notifications

ACK uses the NPD component to monitor pre-release signals for spot instances.

When no pre-release message is detected, the InstanceExpired value on the node is False.
When InstanceExpired changes to True, ACK sends a Kubernetes Event to notify you that the instance is about to be released.

Enable supplemental spot instances

When ACK receives a reclamation notice—5 minutes before the instance is released—it can automatically launch a replacement instance to maintain cluster capacity and give workloads time to migrate.

To enable this feature, turn on Enable Supplemental Spot Instance in your node pool configuration.

Once enabled, the behavior depends on whether replacement succeeds before reclamation:

If replacement succeeds: ACK drains and removes the expiring node from the cluster.
If replacement fails (due to inventory or pricing constraints): ACK does not drain the old node. The spot instance is still released when it is reclaimed, which may interrupt the services running on it. Once conditions allow, ACK automatically purchases new instances to restore the target node count.

Important

To improve replacement success rates, also enable Use Pay-as-you-go Instances When Spot Instances Are Insufficient (listed in the node pool configuration table as Use On-Demand Instances To Supplement Spot Capacity).

Add custom decommissioning logic

For some workloads, node decommissioning requires more than a standard graceful shutdown—for example, removing the node's registration from a DNS service. To support this, monitor the InstanceExpired status or listen for the InstanceToBeTerminated event. When the notification arrives, treat the node as pending decommissioning and run your custom handling logic.

For instructions on monitoring expiration status, see Check the expiration status of spot instances.
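
One way to structure such custom handling is a small watcher that runs user-supplied hooks once per node when its InstanceExpired status becomes True. The Python sketch below is an assumption-laden illustration: `DecommissionWatcher`, `observe`, and `deregister_dns` are hypothetical names, and how the statuses are collected (polling the node condition or listening for the event) is left to the caller.

```python
from typing import Callable, Iterable

Hook = Callable[[str], None]


class DecommissionWatcher:
    """Run custom hooks once for each node whose InstanceExpired status becomes True."""

    def __init__(self, hooks: Iterable[Hook]):
        self.hooks = list(hooks)
        self.handled: set[str] = set()

    def observe(self, statuses: dict[str, str]) -> list[str]:
        """statuses maps node name -> InstanceExpired status; returns newly handled nodes."""
        newly_handled = []
        for name, status in statuses.items():
            if status == "True" and name not in self.handled:
                for hook in self.hooks:
                    hook(name)
                self.handled.add(name)
                newly_handled.append(name)
        return newly_handled


removed_from_dns = []  # stand-in for a real DNS deregistration call


def deregister_dns(node_name: str) -> None:
    """Hypothetical hook: record the node instead of calling a real DNS service."""
    removed_from_dns.append(node_name)


watcher = DecommissionWatcher([deregister_dns])
watcher.observe({"node-a": "False", "node-b": "True"})
watcher.observe({"node-a": "False", "node-b": "True"})  # repeated poll, hooks do not rerun
print(removed_from_dns)
```

Tracking handled nodes keeps the hooks idempotent across repeated polls, which matters because the notice window before release is short.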
