Regional Enterprise SSD (ESSD) disks automatically and synchronously replicate data across multiple zones within the same region, so your stateful applications keep running when a zone fails — without any code changes. If a node or an entire zone becomes unavailable, Kubernetes reschedules the affected pods to another zone, where they remount the same volume and resume immediately.
Before you begin
Review these topics before proceeding:
- Disk overview — introduction to regional ESSDs
- Limits — supported regions and other restrictions
- Elastic Block Storage billing — regional ESSDs are billed by disk capacity on a pay-as-you-go basis when used as Kubernetes volumes
When to use regional ESSDs
| | Standard disk | Regional ESSD |
|---|---|---|
| Replication | Single zone | Synchronous across multiple zones |
| Zone failure handling | — | Pod is rescheduled to another zone automatically |
| Code changes required | — | None |
| Billing model | — | Pay-as-you-go only (as Kubernetes volume) |
Use a regional ESSD when your stateful application must stay available across zone failures without manual intervention.
Prerequisites
Before you begin, ensure that you have:
- An ACK managed cluster running Kubernetes 1.26 or later
- csi-plugin and csi-provisioner at version 1.33.4 or later
Use regional ESSDs in ACK
Step 1: Confirm node support
List all nodes that support regional ESSDs:
```shell
kubectl get node -l node.csi.alibabacloud.com/disktype.cloud_regional_disk_auto=available
```
To verify cross-zone failover, you need at least two supported nodes in different zones. The steps below use cn-beijing-i and cn-beijing-l as examples.
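The two-zone requirement can be checked mechanically. The following sketch counts distinct zones in a list of node zone labels; the zone names below are sample data, and in a real cluster you would collect the list from the last column of `kubectl get node -l node.csi.alibabacloud.com/disktype.cloud_regional_disk_auto=available -L topology.kubernetes.io/zone --no-headers`:

```shell
# Sketch: verify that supported nodes span at least two zones.
# The zone list below is sample data; in practice, take the ZONE column
# from the kubectl command shown above (e.g. with `awk '{print $NF}'`).
zones="cn-beijing-i
cn-beijing-l
cn-beijing-i"

# Count distinct zones (grep -c . gives an unpadded count, unlike wc -l).
distinct=$(printf '%s\n' "$zones" | sort -u | grep -c .)

if [ "$distinct" -ge 2 ]; then
  echo "OK: supported nodes span $distinct zones; cross-zone failover can be tested"
else
  echo "Need supported nodes in at least 2 zones"
fi
```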
Step 2: Create a StorageClass
- Create `sc-regional.yaml`:

  ```yaml
  apiVersion: storage.k8s.io/v1
  kind: StorageClass
  metadata:
    name: alibabacloud-disk-regional
  parameters:
    type: cloud_regional_disk_auto
  provisioner: diskplugin.csi.alibabacloud.com
  reclaimPolicy: Delete
  volumeBindingMode: WaitForFirstConsumer
  allowVolumeExpansion: true
  ```

  Key parameters:

  | Parameter | Value | Purpose |
  |---|---|---|
  | `type` | `cloud_regional_disk_auto` | Provisions a regional ESSD that replicates data synchronously across multiple zones |
  | `volumeBindingMode` | `WaitForFirstConsumer` | Delays volume creation until a pod is scheduled, so the disk is provisioned in the correct zone. Without this, the disk may be locked to the wrong zone and block cross-zone failover |
  | `reclaimPolicy` | `Delete` | Deletes the underlying disk when the PersistentVolumeClaim (PVC) is deleted |
  | `allowVolumeExpansion` | `true` | Enables disk capacity expansion |

- Apply the StorageClass:

  ```shell
  kubectl apply -f sc-regional.yaml
  ```
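The walkthrough below consumes this StorageClass through a StatefulSet's `volumeClaimTemplates`, but Deployments or standalone pods can use it through an ordinary PVC. A minimal sketch — the claim name `pvc-regional-demo` is illustrative, not part of the walkthrough:

```yaml
# Illustrative standalone claim; only storageClassName must match the
# StorageClass created above.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-regional-demo
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: alibabacloud-disk-regional
  resources:
    requests:
      storage: 20Gi
```

Because the StorageClass uses `WaitForFirstConsumer`, such a PVC stays `Pending` until a pod that references it is scheduled.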
Step 3: Deploy a stateful application
- Create `disk-test.yaml` to define a StatefulSet that uses the StorageClass:

  ```yaml
  apiVersion: apps/v1
  kind: StatefulSet
  metadata:
    name: disk-test
  spec:
    replicas: 1
    selector:
      matchLabels:
        app: nginx
    template:
      metadata:
        labels:
          app: nginx
      spec:
        containers:
          - name: nginx
            image: anolis-registry.cn-zhangjiakou.cr.aliyuncs.com/openanolis/nginx:1.14.1-8.6
            ports:
              - containerPort: 80
            volumeMounts:
              - name: pvc-disk
                mountPath: /data
    volumeClaimTemplates:
      - metadata:
          name: pvc-disk
        spec:
          accessModes: [ "ReadWriteOnce" ]
          storageClassName: alibabacloud-disk-regional
          resources:
            requests:
              storage: 20Gi
  ```

- Deploy the application:

  ```shell
  kubectl apply -f disk-test.yaml
  ```

  When the pod is scheduled, the CSI driver provisions a 20 GiB regional ESSD, creates a PersistentVolume (PV), and mounts it to the pod. The disk then begins synchronously replicating data across multiple zones.
Step 4: Verify the application is running
- Check the PVC and pod status:

  ```shell
  kubectl get pvc pvc-disk-disk-test-0
  kubectl get pod disk-test-0
  ```

  Expected output:

  ```
  NAME                   STATUS   VOLUME                   CAPACITY   ACCESS MODES   STORAGECLASS                 VOLUMEATTRIBUTESCLASS   AGE
  pvc-disk-disk-test-0   Bound    d-2ze5xxxxxxxxxxxxxxxx   20Gi       RWO            alibabacloud-disk-regional   <unset>                 14m

  NAME          READY   STATUS    RESTARTS   AGE
  disk-test-0   1/1     Running   0          14m
  ```

  The PVC is bound and the pod is running, confirming that the regional ESSD was provisioned and mounted successfully.
- Identify the node and zone where the pod is running:

  ```shell
  kubectl get node $(kubectl get pod disk-test-0 -o jsonpath='{.spec.nodeName}') -L topology.kubernetes.io/zone
  ```

  Expected output:

  ```
  NAME                       STATUS   ROLES    AGE     VERSION            ZONE
  cn-beijing.172.25.xxx.xx   Ready    <none>   6m32s   v1.32.1-aliyun.1   cn-beijing-i
  ```

  The pod is scheduled to `cn-beijing-i`.
Step 5: Simulate a zone failure and verify cross-zone failover
This operation affects all pods running in the target zone. Do not perform this in a production environment.
- Taint all nodes in the pod's current zone to simulate a zone failure:

  ```shell
  kubectl taint node -l topology.kubernetes.io/zone=cn-beijing-i testing=regional:NoExecute
  ```

  The Kubernetes Controller Manager (KCM) detects the `NoExecute` taint and evicts the pod from the affected nodes, and the scheduler then places it on a node in another zone.
- Check the pod's new status and location:

  ```shell
  kubectl get pod disk-test-0
  kubectl get node $(kubectl get pod disk-test-0 -o jsonpath='{.spec.nodeName}') -L topology.kubernetes.io/zone
  ```

  Expected output:

  ```
  NAME          READY   STATUS    RESTARTS   AGE
  disk-test-0   1/1     Running   0          20s

  NAME                       STATUS   ROLES    AGE   VERSION            ZONE
  cn-beijing.172.26.xxx.xx   Ready    <none>   32m   v1.32.1-aliyun.1   cn-beijing-l
  ```

  The pod is now running in `cn-beijing-l`. The regional ESSD is reattached automatically and the data remains intact, with no manual synchronization required.
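How quickly the pod leaves the tainted nodes depends on its tolerations: the test pod has no toleration for the custom `testing=regional:NoExecute` taint, so eviction starts immediately. As a sketch, a pod spec could instead delay eviction with a toleration like the following (all values illustrative, not part of the walkthrough):

```yaml
# Illustrative toleration for the taint used above; tolerationSeconds
# lets the pod remain on the tainted node for a bounded time before
# the NoExecute eviction takes effect.
tolerations:
  - key: "testing"
    operator: "Equal"
    value: "regional"
    effect: "NoExecute"
    tolerationSeconds: 60   # evicted at most 60s after the taint appears
```

Omitting `tolerationSeconds` would let the pod tolerate the taint indefinitely, which would prevent the failover you are trying to observe.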
Step 6: Clean up
- Remove the taint to restore normal scheduling in `cn-beijing-i`:

  ```shell
  kubectl taint node -l topology.kubernetes.io/zone=cn-beijing-i testing-
  ```

- Delete the test resources, including the StorageClass created in Step 2:

  ```shell
  kubectl delete sts disk-test
  kubectl delete pvc pvc-disk-disk-test-0
  kubectl delete sc alibabacloud-disk-regional
  ```

  Because the StorageClass sets `reclaimPolicy: Delete`, deleting the PVC also deletes the underlying regional ESSD.