Container Service for Kubernetes: Best practices for preemptible instance-based node pools

Last Updated: Jun 12, 2023

Preemptible instances are on-demand instances that are offered at a discount compared with pay-as-you-go instances. By specifying a ratio of preemptible instances to pay-as-you-go instances in a node pool, you can reduce the resource cost of the node pool. This topic describes preemptible instance-based node pools and their use scenarios. It also describes how to select instance types for a preemptible instance-based node pool, set the ratio of preemptible instances to pay-as-you-go instances in a node pool, check whether a preemptible instance is about to expire, and gracefully shut down a preemptible instance.

Background information

Preemptible instances use the pay-as-you-go billing method. You pay for preemptible instances after you use them. Bills are calculated based on the market price and billing duration. For more information, see Preemptible instances.

Introduction

A preemptible instance-based node pool consists of preemptible instances and pay-as-you-go instances. You must specify a ratio of preemptible instances to pay-as-you-go instances in a preemptible instance-based node pool.

Preemptible instances are pay-as-you-go instances whose market prices fluctuate with supply and demand for the instance type, including the available resource stock. The price of a preemptible instance can be up to 90% lower than the price of a pay-as-you-go instance. When you create a preemptible instance, you must select a bidding mode to bid for a specified instance type. If your bid price is higher than the market price and the stock of the instance type is sufficient, a preemptible instance is created.

After a preemptible instance is created, you can use the preemptible instance in the same way you use a pay-as-you-go instance. You can also use the preemptible instance together with other cloud resources such as cloud disks and elastic IP addresses (EIPs). The default protection period of a preemptible instance is 1 hour. After the protection period ends, the system checks the market price and resource stock of the instance type every 5 minutes. If the market price is higher than your bid price or if the stock of the instance type is insufficient, the preemptible instance is released.
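The release rule described above can be sketched in a few lines of Python. This is only an illustration of the documented behavior (a 1-hour default protection period, then a check every 5 minutes), not the actual ECS implementation. The names `MarketSnapshot` and `should_release` are hypothetical.

```python
from dataclasses import dataclass

PROTECTION_PERIOD_S = 60 * 60  # default protection period: 1 hour
CHECK_INTERVAL_S = 5 * 60      # market price and stock are checked every 5 minutes


@dataclass
class MarketSnapshot:
    """Hypothetical view of the market for one instance type at one check."""
    market_price: float
    stock_sufficient: bool


def should_release(age_s: int, bid_price: float, snap: MarketSnapshot) -> bool:
    """Return True if the instance would be released at this check."""
    if age_s < PROTECTION_PERIOD_S:
        # The instance is still inside its protection period.
        return False
    # After the protection period, the instance is released when the market
    # price exceeds the bid price or the instance type is out of stock.
    return snap.market_price > bid_price or not snap.stock_sufficient
```

For example, an instance that is 30 minutes old is never released by this rule, whereas a 2-hour-old instance with a bid price of 0.02 is released as soon as the market price rises to 0.05.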

Scenarios

  • Preemptible instance-based node pools

    Preemptible instances in a preemptible instance-based node pool may be unexpectedly reclaimed. Therefore, preemptible instance-based node pools are suitable for stateless applications with high fault tolerance. You can use preemptible instance-based node pools for workloads that use batch processing, machine learning training jobs, queued transaction processing applications, applications that use the REST API, and extract, transform, load (ETL) workloads of big data computing, such as Apache Spark jobs.

    If you want to deploy workloads in a preemptible instance-based node pool, you must make sure that the workloads can tolerate node resource unavailability. If the workloads cannot tolerate node resource unavailability, we recommend that you deploy them in node pools that consist of pay-as-you-go instances or subscription instances. Workloads that cannot tolerate node resource unavailability include but are not limited to the following types:

    • Cluster management tools, such as monitoring tools and O&M tools.

    • Stateful workloads or applications, such as database services.

  • Preemptible instance-based node pools that have auto scaling enabled

    If your workloads are deployed in a preemptible instance-based node pool and process user traffic that fluctuates in a specific pattern, we recommend that you enable auto scaling for the preemptible instance-based node pool in which your workloads are deployed.

    After you enable auto scaling, the auto scaling component automatically checks whether the node pool needs to be scaled out to host more pods and automatically scales in the node pool when the scale-in threshold is reached. Preemptible instance-based node pools that have auto scaling enabled can scale out faster than preemptible instance-based node pools that have auto scaling disabled. Idle resources are released more promptly after you enable auto scaling for a preemptible instance-based node pool. Faster scale-out helps ensure sufficient node resources when preemptible instances are reclaimed. The automatic release of idle resources helps reduce the resource cost.

Select instance types for preemptible instances

We recommend that you select instance types based on your business requirements and balance the instance resource stock, resource cost, and instance performance. Elastic Compute Service (ECS) provides a large variety of instance types to meet the requirements of different workloads. When you use a preemptible instance-based node pool, you must select a combination of instance types that has minimal impact on your workloads when preemptible instances are reclaimed.

You can use one of the following methods to select instance types:

  • Use the ACK console

    When you create or modify a node pool in the Container Service for Kubernetes (ACK) console, the instance types that are available in the region that you selected are displayed on the page. You can select instance types based on your business requirements. After you select instance types, the ACK console automatically provides suggestions on the selected instance types, and displays the scalability of the node pool and the price range of each preemptible instance. You can change the instance types and specify the price upper limit based on this information. For more information about how to create or modify a node pool, see Procedure.

  • Use the spot-instance-advisor command-line tool

    spot-instance-advisor is an open source command-line tool provided by ACK. You can use spot-instance-advisor to query the historical prices and current price of a preemptible instance. spot-instance-advisor calls API operations to obtain the historical prices of the instance types in a region. spot-instance-advisor calculates the hourly vCPU unit price based on the obtained statistics and lists the instance types with the lowest hourly vCPU unit prices. spot-instance-advisor also calculates the entropy of the hourly vCPU unit price of each instance type. A higher value of entropy indicates more frequent fluctuation in the price. We recommend that you select instance types whose hourly vCPU unit prices have a low value of entropy.

    Note

    Download spot-instance-advisor before you use it. For more information, see spot-instance-advisor.

    spot-instance-advisor supports the following parameters:

    Usage of ./spot-instance-advisor:
      -accessKeyId string
            Your accessKeyId of cloud account
      -accessKeySecret string
            Your accessKeySecret of cloud account
      -cutoff int
            Discount of the spot instance prices (default 2)
      -family string
            The spot instance family you want (e.g. ecs.n1,ecs.n2)
      -limit int
            Limit of the spot instances (default 20)
      -maxcpu int
            Max cores of spot instances  (default 32)
      -maxmem int
            Max memory of spot instances (default 64)
      -mincpu int
            Min cores of spot instances (default 1)
      -minmem int
            Min memory of spot instances (default 2)
      -region string
            The region of spot instances (default "cn-hangzhou")
      -resolution int
            The window of price history analysis (default 7)

    Run the following command to query the prices of the instance types in the current region:

    ./spot-instance-advisor --accessKeyId=<id> --accessKeySecret=<secret> --region=<cn-zhangjiakou>

    Note

    The accessKeyId, accessKeySecret, and region parameters are required. Set the parameters to the actual values.

    Expected output:

    Initialize cache ready with 619 kinds of instanceTypes
    Filter 93 of 98 kinds of instanceTypes.
    Fetch 93 kinds of InstanceTypes prices successfully.
    Successfully compare 199 kinds of instanceTypes
          InstanceTypeId               ZoneId     Price(Core)        Discount           ratio
            ecs.c6.large     cn-zhangjiakou-c          0.0135             1.0             0.0
            ecs.c6.large     cn-zhangjiakou-a          0.0135             1.0             0.0
          ecs.c6.2xlarge     cn-zhangjiakou-a          0.0136             1.0             0.0
          ecs.c6.2xlarge     cn-zhangjiakou-c          0.0136             1.0             0.0
          ecs.c6.3xlarge     cn-zhangjiakou-a          0.0137             1.0             0.0
          ecs.c6.3xlarge     cn-zhangjiakou-c          0.0137             1.0             0.0
           ecs.c6.xlarge     cn-zhangjiakou-c          0.0138             1.0             0.0
           ecs.c6.xlarge     cn-zhangjiakou-a          0.0138             1.0             0.0
         ecs.hfc6.xlarge     cn-zhangjiakou-a          0.0158             1.0             0.0
          ecs.hfc6.large     cn-zhangjiakou-a          0.0160             1.0             0.0
          ecs.hfc6.large     cn-zhangjiakou-c          0.0160             1.0             0.0
          ecs.g6.3xlarge     cn-zhangjiakou-a          0.0175             1.0             0.0
          ecs.g6.3xlarge     cn-zhangjiakou-c          0.0175             1.0             0.0
            ecs.g6.large     cn-zhangjiakou-a          0.0175             1.0             0.0
           ecs.g6.xlarge     cn-zhangjiakou-a          0.0175             1.0             0.0
          ecs.g6.2xlarge     cn-zhangjiakou-a          0.0175             1.0             1.0
          ecs.g6.2xlarge     cn-zhangjiakou-c          0.0175             1.0             3.0
            ecs.g6.large     cn-zhangjiakou-c          0.0175             1.0             30.8
           ecs.g6.xlarge     cn-zhangjiakou-c          0.0175             1.0             9.7
          ecs.hfg6.large     cn-zhangjiakou-c          0.0195             1.0             0.2

    The value in the ratio column is the price entropy. The top-ranking instance types in the output have both a low hourly vCPU unit price and a low price entropy. Instance types further down the list also offer a 90% discount but have a higher price entropy. When you select instance types, we recommend that you select the top-ranking instance types, which have lower prices and lower price entropy.
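The selection advice above can be applied to the tool output with a short script. The following Python sketch parses rows in the format shown in the expected output and keeps the instance types whose price entropy (the ratio column) is at or below a threshold. The function names and the `max_ratio` threshold of 0.5 are assumptions chosen for illustration and are not part of spot-instance-advisor.

```python
from typing import NamedTuple


class Row(NamedTuple):
    instance_type: str
    zone: str
    price_per_core: float
    discount: float
    ratio: float  # price entropy, as described in this topic


def parse_rows(output: str) -> list[Row]:
    """Parse the data rows of spot-instance-advisor output, skipping other lines."""
    rows = []
    for line in output.splitlines():
        parts = line.split()
        # Data rows have exactly five columns and start with an ECS type name.
        if len(parts) == 5 and parts[0].startswith("ecs."):
            rows.append(Row(parts[0], parts[1], float(parts[2]),
                            float(parts[3]), float(parts[4])))
    return rows


def stable_and_cheap(rows: list[Row], max_ratio: float = 0.5) -> list[Row]:
    """Keep rows with low price entropy, sorted by price and then by entropy."""
    kept = [r for r in rows if r.ratio <= max_ratio]
    return sorted(kept, key=lambda r: (r.price_per_core, r.ratio))
```

With the sample output above, this filter would drop entries such as ecs.g6.large in cn-zhangjiakou-c (ratio 30.8) and keep entries such as ecs.c6.large (ratio 0.0).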

Set the ratio of preemptible instances to pay-as-you-go instances

You can set the ratio of preemptible instances to pay-as-you-go instances in a node pool. This allows you to reduce costs by controlling the number of preemptible instances when the node pool contains sufficient pay-as-you-go instances.

Important

You must first create an ACK cluster whose Kubernetes version is later than 1.9. If the Kubernetes version of your cluster is 1.9 or earlier, update the Kubernetes version of the cluster. For more information, see Update the Kubernetes version of an ACK cluster.

  • Make sure that you have a sufficient node quota in the cluster. If you want to increase the node quota, submit an application. For more information, see Limits.

  • When you add an existing Elastic Compute Service (ECS) instance to a node pool, make sure that the ECS instance can access the Internet: either the ECS instance is associated with an elastic IP address (EIP), or a NAT gateway is configured for the virtual private cloud (VPC) in which the ECS instance resides. Otherwise, the ECS instance cannot be added to the cluster.

  1. Log on to the ACK console and click Clusters in the left-side navigation pane.

  2. On the Clusters page, click the name of a cluster and choose Nodes > Node Pools in the left-side navigation pane.

  3. In the upper-right corner of the Node Pools page, click Create Node Pool.

  4. In the Create Node Pool dialog box, select multiple vSwitches for vSwitch and select Preemptible Instance for Billing Method.

    For more information about other node pool parameters, see Procedure.

  5. Click Show Advanced Options. Select Cost Optimization for Scaling Policy and set the Percentage of Pay-as-you-go Instances and Enable Supplemental Pay-as-you-go Instances parameters based on your business requirements.

    The Scaling Policy parameter supports the following policies:

    • Priority: scales the node pool based on the priorities of the vSwitches that you specify. If Auto Scaling fails to create ECS instances in the zone of the vSwitch with the highest priority, Auto Scaling attempts to create ECS instances in the zone of the vSwitch with a lower priority.

    • Cost Optimization: creates ECS instances in ascending order of vCPU unit price. When multiple instance types are specified, the system preferentially creates preemptible instances. If Auto Scaling fails to create preemptible instances, for example, because preemptible instances are out of stock, Auto Scaling attempts to create pay-as-you-go ECS instances. If you set the scaling policy to Cost Optimization, you can configure the following parameters:

      • Percentage of Pay-as-you-go Instances: Specify the percentage of pay-as-you-go instances in the node pool. Valid values: 0 to 100.

      • Enable Supplemental Preemptible Instances: After you enable this feature, Auto Scaling automatically creates the same number of preemptible instances 5 minutes before the system reclaims the existing preemptible instances. The system sends a notification to Auto Scaling 5 minutes before it reclaims preemptible instances.

      • Enable Supplemental Pay-as-you-go Instances: After you enable this feature, Auto Scaling attempts to create pay-as-you-go ECS instances to meet the scaling requirement if it fails to create preemptible instances, for example, because the unit price is too high or preemptible instances are out of stock.

    • Distribution Balancing: evenly distributes ECS instances across the zones of the vSwitches that are specified for the scaling group. If ECS instances are unevenly distributed across zones, for example, because ECS resources are out of stock, you can select this policy to evenly distribute the ECS instances across zones.

      Note

      This policy takes effect only when you have specified multiple vSwitches in the VPC.

    Note

    After you click Confirm Order, take note of the following items:

    • The value of Scaling Policy cannot be changed.

    • If you select Cost Optimization for Scaling Policy, you can change the values of Percentage of Pay-as-you-go Instances and Enable Supplemental Pay-as-you-go Instances.

  6. Click Confirm Order.

After you create the node pool, find the node pool on the Node Pools page and click Details in the Actions column. Then, click the Overview tab. In the Node Configurations section, you can view the percentage of pay-as-you-go instances in the node pool.
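To get a feel for what a given Percentage of Pay-as-you-go Instances value means for a node pool of a certain size, the split can be estimated as follows. This is a rough Python sketch. The function name is hypothetical, and rounding the pay-as-you-go share up is an assumption made for illustration, because this topic does not state how Auto Scaling rounds fractional results.

```python
def split_node_count(total_nodes: int, payg_percentage: int) -> tuple[int, int]:
    """Estimate (pay_as_you_go, preemptible) node counts for a node pool."""
    if total_nodes < 0:
        raise ValueError("total_nodes must not be negative")
    if not 0 <= payg_percentage <= 100:
        raise ValueError("payg_percentage must be between 0 and 100")
    # Assumption: round up so that the pay-as-you-go share never falls
    # below the configured percentage (integer ceiling division).
    payg = (total_nodes * payg_percentage + 99) // 100
    return payg, total_nodes - payg
```

For example, `split_node_count(10, 30)` yields 3 pay-as-you-go nodes and 7 preemptible nodes under this assumption.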

Check whether a preemptible instance is about to expire

To avoid the unexpected expiration and release of preemptible instances, ACK provides the ack-node-problem-detector component to send you notifications when preemptible instances are about to be released.

Note

You must first create an ACK cluster and install the latest version of ack-node-problem-detector in the cluster. For more information, see Create an ACK managed cluster.

  • If you choose to create a new cluster, make sure that Install node-problem-detector and Create Event Center is selected.

  • If you choose to use an existing cluster, make sure that the latest version of ack-node-problem-detector is installed. For more information, see Manage system components.

In ACK clusters, ECS instances are used as nodes to host services in the clusters. If you specify preemptible instance or subscription as the billing method when you create an ECS instance in a cluster, the instance will be automatically released at the specified time of expiration. When the instance is automatically released, if pod eviction, node draining, or node replacement is not completed in advance, services that run on the instance may be interrupted. If the instance of a master node is released, cluster-level exceptions may occur. To prevent such issues caused by the automatic release of preemptible instances, ACK uses ack-node-problem-detector to send you notifications when preemptible instances are about to be released.

  1. Log on to the ACK console and click Clusters in the left-side navigation pane.

  2. On the Clusters page, click the name of a cluster and choose Nodes > Nodes in the left-side navigation pane.

  3. On the Nodes page, find the node that you want to check and click the node name or choose More > Details in the Actions column.

  4. On the node details page, check the status of the InstanceExpired condition.

    The InstanceExpired condition is displayed in the Status section.

    The following list describes the states of the InstanceExpired condition.

    • True: The content is InstanceToBeTerminated. This state indicates that the preemptible instance is about to expire and be released.

    • False: The content is InstanceNotToBeTerminated. This state indicates that the preemptible instance is not about to expire.

    • Unknown: This state indicates that an error occurred on ack-node-problem-detector. Submit a ticket for solutions.

    If the InstanceExpired condition is in the True state, an event is generated in the Events section.

If the InstanceExpired condition is in the True state, it indicates that the preemptible instance is about to expire and be released. To prevent applications that run on this node from being interrupted, schedule the applications to other nodes. For more information, see Schedule pods to specific nodes.
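The same check can be scripted instead of being performed in the console. The following Python sketch scans a node list in the JSON format that the Kubernetes API returns (for example, the output of `kubectl get nodes -o json`) and returns the nodes whose InstanceExpired condition is True. It assumes that ack-node-problem-detector reports InstanceExpired as a standard entry in `status.conditions`. The function names are hypothetical.

```python
import json


def expiring_nodes(node_list: dict) -> list[str]:
    """Return the names of nodes whose InstanceExpired condition is True."""
    names = []
    for node in node_list.get("items", []):
        for cond in node.get("status", {}).get("conditions", []):
            # Condition status values are the strings "True", "False", "Unknown".
            if cond.get("type") == "InstanceExpired" and cond.get("status") == "True":
                names.append(node["metadata"]["name"])
                break
    return names


def expiring_nodes_from_json(text: str) -> list[str]:
    """Convenience wrapper for raw JSON text, such as saved kubectl output."""
    return expiring_nodes(json.loads(text))
```

Nodes in the False or Unknown state are ignored by this sketch. In practice, you may also want to report nodes in the Unknown state, because that state indicates a problem with ack-node-problem-detector.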

Graceful shutdown of preemptible instances

The graceful shutdown of preemptible instances includes the following processes: monitoring and notification, supplementation of preemptible instances before the preemptible instances are reclaimed, and custom operations on nodes to be released.

  • Monitoring and notification

    ACK uses Node Problem Detector (NPD) to monitor the status of preemptible instances and sends notifications when preemptible instances are about to expire.

    • If a preemptible instance is not about to expire, the value of InstanceExpired is False.

    • If a preemptible instance is about to expire, the value of InstanceExpired is True. In this case, ACK generates a cluster event to remind you that the preemptible instance is about to expire.

  • Supplementation of preemptible instances before the preemptible instances are reclaimed

    After a preemptible instance expires and is released, services that are deployed on the instance are suspended. ACK provides various methods to help you respond to the expiration and release of preemptible instances at the earliest opportunity. You can configure auto scaling for preemptible instances, monitor the status of preemptible instances, and configure the system to send notifications when preemptible instances are about to expire. However, these methods take effect only after preemptible instances are reclaimed, and the available resources in the cluster are not increased until new instances are added to the cluster. To resolve this issue, ACK supplements preemptible instances before they are reclaimed. This feature enables ACK to create new preemptible instances before existing preemptible instances expire.

    After you enable this feature, ACK automatically checks whether preemptible instances are about to expire. If a preemptible instance is about to expire, ACK automatically triggers a scale-out activity to add a new preemptible instance. The newly created instance is known as a supplemental instance. After the supplemental instance starts running, the release process of the preemptible instance starts. The status of the preemptible instance is set to unschedulable. Then, the preemptible instance is drained and removed. This way, workloads on the preemptible instance are smoothly migrated to other nodes in the cluster to avoid service interruptions.

    Note

    The supplementation of preemptible instances does not interrupt the reclaim of preemptible instances that are about to expire. A preemptible instance will be reclaimed 5 minutes after ACK notifies that the instance is about to expire, regardless of whether supplemental instances are created.

  • Custom operations on nodes to be released

    When you gracefully shut down a node, you may need to perform other operations on the node, for example, deleting the node information from the DNS records. To perform custom operations on nodes to be released, we recommend that you monitor the value of the InstanceExpired field in the node status or configure a listener to listen for InstanceToBeTerminated events. If you receive a notification that a preemptible instance is about to expire or be released, you can perform a graceful shutdown for the instance and then perform custom operations on the instance. For more information about how to check whether a preemptible instance is about to expire, see Check whether a preemptible instance is about to expire.
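A handler for such a notification might look like the following Python sketch. It marks the node as unschedulable, drains it, and then calls a custom hook, matching the order described above (graceful shutdown first, custom operations second). The function names, the `custom_hook` callback, and the 240-second drain timeout are assumptions. The timeout is chosen only to stay within the 5-minute window before the instance is reclaimed. The `--delete-emptydir-data` flag requires a recent kubectl version.

```python
import subprocess
from typing import Callable, Optional


def graceful_shutdown_commands(node: str, timeout_s: int = 240) -> list[list[str]]:
    """Build the kubectl commands that cordon and then drain a node."""
    return [
        ["kubectl", "cordon", node],
        ["kubectl", "drain", node, "--ignore-daemonsets",
         "--delete-emptydir-data", f"--timeout={timeout_s}s"],
    ]


def shut_down_node(node: str,
                   custom_hook: Optional[Callable[[str], None]] = None,
                   run: Callable = subprocess.run) -> None:
    """Cordon and drain the node, then run the custom operation, if any."""
    for cmd in graceful_shutdown_commands(node):
        run(cmd, check=True)  # raises CalledProcessError if kubectl fails
    if custom_hook is not None:
        # Hypothetical custom step, such as removing the node from DNS records.
        custom_hook(node)
```

The `run` parameter exists so that the command execution can be replaced in tests. In normal use, call `shut_down_node("<node-name>", custom_hook=my_cleanup)`, where `my_cleanup` is your own function.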