
Container Service for Kubernetes:Recommended configurations for high availability of disk volumes

Last Updated: Apr 01, 2025

When you deploy a StatefulSet that has a disk volume mounted in a Container Service for Kubernetes (ACK) cluster, the system may fail to create the StatefulSet due to issues related to zone configurations or disk categories specified for the cluster. This topic provides the recommended configurations for applications that are deployed across zones. The recommended configurations help you prevent issues caused by underlying configurations and minimize the risks of application release interruptions.

Background information

Kubernetes provides powerful container orchestration capabilities that allow you to develop and deploy large-scale stateful applications with ease by using StatefulSets. Kubernetes greatly simplifies application distribution and deployment. However, it also hides the underlying hardware logic from users, which may cause the following issues:

  • Your application in a cross-zone cluster is accidentally deployed in Zone B instead of Zone A, which is the desired zone.

  • You fail to create a dynamically provisioned persistent volume (PV) used to mount a disk, and the system prompts an error, such as InvalidDataDiskCatagory.NotSupported.

  • When you mount a disk to an application, the system prompts the following error message: The instanceType of the specified instance does not support this disk category.

  • When you debug an application, the system prompts the following error message: 0/x nodes are available: x node(s) had volume node affinity conflict.

The preceding issues can interrupt application releases. To reduce the risks of these issues, you can use the recommended configurations provided by this topic.

Recommended configurations

Requirements

  • Use disks instead of File Storage NAS (NAS) file systems to persist data. Compared with NAS file systems, disks are more stable and provide higher bandwidth for data transfer.

  • Deploy your cluster across three zones to ensure sufficient node and storage resources.

  • Enable node auto scaling so that nodes can be added when all nodes in the cluster become unavailable.

  • To avoid mount failures, we recommend that you specify multiple disk categories in the StorageClass that you use.

  • Make sure that the pods of your application can be evenly distributed to the nodes in different zones.


Recommended node pool configurations

  • Deploy each node pool only in a single zone.

    • If you want to add nodes in a new zone to the cluster, we recommend that you create a new node pool in the new zone. For more information, see Create and manage a node pool.

    • When you create node pools in a cluster, make sure that each node pool is deployed in a separate zone. To help you identify the zone of a node pool, we recommend that you specify the zone ID in the node pool name.

  • Enable auto scaling for node pools. For more information, see Enable node auto scaling.

    After you enable auto scaling for a node pool, the system automatically adds a node to the node pool when all nodes in the node pool become unavailable for pod scheduling. The following figure shows an example of node auto scaling.

  • Use the same type of Elastic Compute Service (ECS) instance to deploy node pools in different zones, or use ECS instances that support the same type of block storage to deploy node pools in different zones.

    The ECS instance types to which a cloud disk can be attached depend on the category of the disk. Therefore, you may fail to launch a pod because the category of the disk mounted to the pod is not supported by the ECS instance that hosts the pod, even though the disk and the ECS instance reside in the same zone.

  • Add taints to all nodes in a node pool to prevent irrelevant applications from being scheduled to the nodes.
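
    As a sketch, a node provisioned by such a node pool would carry a taint like the following. The taint key app matches the toleration used in the StatefulSet template later in this topic; the key name itself is an example, not a required value. In ACK, you typically configure taints in the node pool settings so that every node in the pool inherits them.

    ```yaml
    # Simplified excerpt of a node spec with an example taint. Only pods that
    # tolerate app:NoSchedule can be scheduled to this node.
    apiVersion: v1
    kind: Node
    metadata:
      name: example-node        # placeholder node name
    spec:
      taints:
      - key: app                # example taint key; align it with your workload tolerations
        effect: NoSchedule
    ```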

Recommended cluster configurations

  • Make sure that the Kubernetes version of the cluster is 1.20 or later.

  • Make sure that the version of the Container Storage Interface (CSI) plug-in installed in the cluster is 1.22 or later. For more information, see Manage the CSI plug-in.

  • Specify multiple disk categories in the StorageClass that you use.

    Sample YAML template:

    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: alicloud-disk-topology-alltype
    parameters:
      type: cloud_essd,cloud_ssd,cloud_efficiency
    provisioner: diskplugin.csi.alibabacloud.com
    reclaimPolicy: Delete
    volumeBindingMode: WaitForFirstConsumer
    allowVolumeExpansion: true
    allowedTopologies:
    - matchLabelExpressions:
      - key: topology.diskplugin.csi.alibabacloud.com/zone
        values:
        - cn-beijing-a
        - cn-beijing-b

    Parameter description:

    • type: cloud_essd,cloud_ssd,cloud_efficiency: The system attempts to create a disk in the following order of categories: Enterprise SSD (ESSD), standard SSD, and ultra disk. The system first attempts to create an ESSD. If ESSDs are out of stock, it attempts to create a standard SSD; if standard SSDs are also out of stock, it attempts to create an ultra disk. This reduces the risk of disk creation failures caused by insufficient inventory and helps you prevent pod startup failures.

    • volumeBindingMode: WaitForFirstConsumer: After a pod that uses the StorageClass is scheduled to a node, the system attempts to create a disk in the zone where the node is deployed based on the StorageClass. This reduces the risks of disk mount failures caused by zone inconsistency and helps you prevent pod startup failures.

    • allowedTopologies: You can use this parameter to restrict the topology domains of the volumes provisioned by using the StorageClass in specific regions and zones. If you set volumeBindingMode to WaitForFirstConsumer, the scheduler schedules the pods that use the StorageClass in the specified topology domains to meet the requirements for disk creation.
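
    For reference, a standalone PersistentVolumeClaim (PVC) that requests a disk from this StorageClass might look like the following sketch. The claim name and requested size are examples, not required values.

    ```yaml
    # Hypothetical PVC that uses the alicloud-disk-topology-alltype StorageClass
    # defined above. Because the StorageClass sets
    # volumeBindingMode: WaitForFirstConsumer, the disk is not created until a
    # pod that uses this claim is scheduled to a node.
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: disk-pvc-example    # example claim name
    spec:
      accessModes:
      - ReadWriteOnce           # disks support single-node access only
      storageClassName: alicloud-disk-topology-alltype
      resources:
        requests:
          storage: 20Gi         # example size; minimum size depends on the disk category
    ```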

Recommended application configurations

The following sample code provides an example of a standard StatefulSet template. You can customize the template based on your business requirements.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: "mysql"
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      topologySpreadConstraints:
      - labelSelector:
          matchLabels:
            app: mysql
        maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: ScheduleAnyway
      containers:
      - image: mysql:5.6
        name: mysql
        env:
        - name: MYSQL_ROOT_PASSWORD
          value: "mysql"
        volumeMounts:
        - name: disk-csi
          mountPath: /var/lib/mysql
      tolerations:
      - key: "app"
        operator: "Exists"
        effect: "NoSchedule"
  volumeClaimTemplates:
  - metadata:
      name: disk-csi
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: alicloud-disk-topology-alltype
      resources:
        requests:
          storage: 40Gi

Parameter description:

  • topologySpreadConstraints: The system attempts to spread the pods provisioned by the application to different zones. For more information, see Topology Spread Constraints.

  • volumeClaimTemplates: The system automatically creates a disk for each pod replica of the StatefulSet. This helps you quickly scale out the application.

Important

When a persistent volume (PV) is dynamically provisioned, the YAML file of the PV records the zone of the node to which the disk can be attached. The PV, and the persistent volume claim (PVC) bound to it, can be used only by pods that run on nodes in that zone. This ensures that disks can be successfully mounted to pods.
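
For example, a dynamically provisioned PV for a disk created in zone cn-beijing-a would carry a node affinity constraint similar to the following simplified excerpt. The volume name and disk ID are placeholders, and fields unrelated to topology are omitted.

```yaml
# Simplified excerpt of a dynamically provisioned PV. The nodeAffinity section
# restricts the PV to nodes in the zone where the disk was created, so pods
# that use the bound PVC can be scheduled only to that zone.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: d-example               # placeholder volume name
spec:
  capacity:
    storage: 40Gi
  accessModes:
  - ReadWriteOnce
  csi:
    driver: diskplugin.csi.alibabacloud.com
    volumeHandle: d-example     # placeholder disk ID
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: topology.diskplugin.csi.alibabacloud.com/zone
          operator: In
          values:
          - cn-beijing-a
```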

References