
Container Service for Kubernetes: Migrate stateful applications with cloud disks across zones

Last Updated: Mar 26, 2026

The storage-operator component automates cross-zone migration and multi-zone spreading of StatefulSets that use disk volumes. If an error occurs during migration, the component's precheck and rollback mechanisms stop the task and restore the application in its original zone. This topic describes how to migrate stateful applications that use disk volumes across zones.

Use cases

Scenario | Description
Zone planning changes | Move workloads to a different zone due to infrastructure or capacity updates.
Multi-zone spreading | Distribute replicas and their disks across multiple zones to improve availability.
Resource constraints | The current zone has insufficient capacity to support continued operation or scale-out.

NAS and OSS volumes support cross-zone access and multi-mount usage. Disks are bound to a single zone: they cannot be moved across zones, and their existing persistent volume claims (PVCs) and persistent volumes (PVs) cannot be reused in another zone. When your StatefulSet uses disk volumes, you must create new disks in the target zone from snapshots.
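
For reference, a PV that the disk CSI driver provisions is pinned to its zone through node affinity, which is why it cannot simply be reused in another zone. The following snippet is illustrative only; the topology key and zone value depend on your CSI driver version and region:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: d-example                                                  # placeholder PV name
spec:
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: topology.diskplugin.csi.alibabacloud.com/zone   # topology key used by the disk CSI driver
              operator: In
              values:
                - cn-shanghai-m                                    # the zone the disk was created in; pods using this PV must run here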

Key constraints

Before you start, note the following constraints:

  • Business interruption required: Cross-zone migration scales the StatefulSet to 0 replicas before migration, then restores all replicas at once after disk migration completes — not as a rolling update. Plan for downtime. The duration depends on replica count, container startup time, and disk capacity used.

  • ESSD disks required: All storage used by the StatefulSet must be ESSD disks. The migration feature uses instant access snapshots, which support only ESSD disks, to minimize snapshot creation time. For more information, see Snapshot instant access.

  • Target zone requirements: The target zone must support ESSD disks, and the cluster must have nodes in that zone available for scheduling.

If your application uses non-ESSD disks, convert them to ESSD disks before migrating.
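
To check which StorageClass each PVC in your StatefulSet uses, you can run a command such as the following (the namespace is a placeholder). A class such as alicloud-disk-essd provisions ESSD disks.

kubectl get pvc -n <statefulset-namespace> -o custom-columns=NAME:.metadata.name,STORAGECLASS:.spec.storageClassName,VOLUME:.spec.volumeName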

How it works

Cross-zone migration relies on disk snapshots and uses instant access snapshots to minimize snapshot creation time. For more information, see Introduction to snapshots and Snapshot billing.

The storage-operator component runs the following steps:

  1. Precheck: Verifies that the application is running properly and identifies disks to migrate. Migration stops if the precheck fails.

  2. Scale to zero: Scales the StatefulSet to 0 replicas, pausing the application.

  3. Create snapshots: Creates instant access snapshots for all mounted disks. Snapshots are zone-agnostic and can be used in any zone.

  4. Provision new disks: After confirming snapshots are available, creates new disks in the target zone with the same data.

  5. Rebuild PVCs and PVs: Rebuilds PVCs with the same names and their corresponding PVs, bound to the new disks.

  6. Restore replicas: Restores the StatefulSet to its original replica count. Replicas automatically bind to the rebuilt PVCs and mount the new disks.

  7. (Optional) Delete original resources: After confirming the application is healthy, delete the original PVs and disks. For disk billing details, see Block storage billing.

Important

Each step after the precheck has a specific rollback strategy. Confirm the StatefulSet is running correctly after migration before deleting the original disks — this ensures the application can remount the original disks if a rollback is needed.
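
If you retained the original PVs and disks, a typical cleanup after you confirm the application is healthy looks like the following (the PV name is a placeholder; also delete the corresponding source disks in the ECS console if they are no longer needed):

kubectl get pv | grep Released            # original PVs left over from the migration
kubectl delete pv <original-pv-name>      # safe only after confirming the migrated application is healthy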

Prerequisites

Before you begin, ensure that you have:

  • A cluster running Kubernetes 1.20 or later with the Container Storage Interface (CSI) driver installed

  • storage-operator v1.26.2-1de13b6-aliyun or later installed in the cluster. For installation instructions, see Manage the storage-operator component

  • The csi-plugin and csi-provisioner components installed, with csi-provisioner using the non-managed version

    If the managed version of csi-provisioner is currently installed, uninstall it and install the non-managed version. After switching, restart the storage controller: kubectl delete pod -n kube-system <storage-controller-pod-name>
  • (ACK dedicated clusters only) The worker RAM role and master RAM role of your cluster must have permission to call the ModifyDiskSpec operation of the Elastic Compute Service (ECS) API. ACK managed clusters do not require this permission. For instructions, see Create a custom policy. The required RAM policy is as follows:

    {
        "Version": "1",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "ecs:CreateSnapshot",
                    "ecs:DescribeSnapshot",
                    "ecs:DeleteSnapshot",
                    "ecs:ModifyDiskSpec",
                    "ecs:DescribeTaskAttribute"
                ],
                "Resource": "*"
            }
        ]
    }


Migrate a StatefulSet across zones

Step 1: Enable the storage controller

Patch the ConfigMap to activate the storage controller:

kubectl patch configmap/storage-operator \
  -n kube-system \
  --type merge \
  -p '{"data":{"storage-controller":"{\"imageRep\":\"acs/storage-controller\",\"imageTag\":\"\",\"install\":\"true\",\"template\":\"/acs/templates/storage-controller/install.yaml\",\"type\":\"deployment\"}"}}'

Step 2: Create a migration task

Create a ContainerStorageOperator resource to define the migration task:

cat <<EOF | kubectl apply -f -
apiVersion: storage.alibabacloud.com/v1beta1
kind: ContainerStorageOperator
metadata:
  name: default
spec:
  operationType: APPMIGRATE
  operationParams:
    stsName: web
    stsNamespace: default
    stsType: kube
    targetZone: cn-beijing-h,cn-beijing-j
    checkWaitingMinutes: "1"
    healthDurationMinutes: "1"
    snapshotRetentionDays: "2"
    retainSourcePV: "true"
EOF

Parameters:

Parameter | Required | Default | Description
operationType | Required | None | Set to APPMIGRATE for stateful application migration.
stsName | Required | None | Name of the StatefulSet to migrate. Only a single StatefulSet can be specified. When multiple migration tasks are deployed, the component migrates them sequentially in deployment order.
stsNamespace | Required | None | Namespace of the StatefulSet.
targetZone | Required | None | Comma-separated list of target zones, for example, cn-beijing-h,cn-beijing-j. If a disk is already in a listed zone, it is not migrated. When multiple zones are listed, remaining disks are distributed across zones in list order.
stsType | Optional | kube | StatefulSet type. Valid values: kube (native StatefulSet) and kruise (Advanced StatefulSet provided by OpenKruise).
checkWaitingMinutes | Optional | "1" | Polling interval, in minutes, for checking replica availability after migration. Increase this value for StatefulSets with many replicas, slow image pulls, or long startup times to avoid premature rollback.
healthDurationMinutes | Optional | "0" | After available replicas match the expected count, the system waits this many minutes before running a secondary health check. Set to "0" to skip the secondary check.
snapshotRetentionDays | Optional | "1" | Retention period for instant access snapshots created during migration. Valid values: "1" (one day) and "-1" (permanent).
retainSourcePV | Optional | "false" | Whether to keep the original disk and PV after migration. "false" deletes both. "true" retains them: the disk remains visible in the ECS console and the PV enters the Released state.
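
After you apply the ContainerStorageOperator resource, you can track the task through the same resource (cso is the short name used in the commands later in this topic):

kubectl get cso default
kubectl describe cso default | grep Status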

Examples

The following examples use an ACK Pro cluster with nodes in three zones:

  • Zone B: cn-shanghai.192.168.5.245

  • Zone G: cn-shanghai.192.168.2.214

  • Zone M: cn-shanghai.192.168.3.236, cn-shanghai.192.168.3.237


Step 1: Create a StatefulSet with ESSD disks

Create a test StatefulSet with ESSD disks. Skip this step if you already have a StatefulSet to migrate.

  1. Deploy the Nginx StatefulSet:

    cat << EOF | kubectl apply -f -
    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: web
    spec:
      selector:
        matchLabels:
          app: nginx
      serviceName: "nginx"
      replicas: 2
      template:
        metadata:
          labels:
            app: nginx
        spec:
          containers:
            - name: nginx
              image: anolis-registry.cn-zhangjiakou.cr.aliyuncs.com/openanolis/nginx:1.14.1-8.6
              ports:
                - containerPort: 80
                  name: web
              volumeMounts:
                - name: www
                  mountPath: /usr/share/nginx/html
      volumeClaimTemplates:
        - metadata:
            name: www
            labels:
              app: nginx
          spec:
            accessModes: [ "ReadWriteOnce" ]
            storageClassName: "alicloud-disk-essd"
            resources:
              requests:
                storage: 20Gi
    EOF


  2. Verify that both pods are running:

    kubectl get pod -o wide -l app=nginx

    The output shows both pods scheduled to zone M (actual placement depends on the scheduler):

    NAME       READY   STATUS    RESTARTS   AGE   IP              NODE                        NOMINATED NODE   READINESS GATES
    web-0      1/1     Running   0          2m    192.168.3.243   cn-shanghai.192.168.3.237   <none>           <none>
    web-1      1/1     Running   0          2m    192.168.3.246   cn-shanghai.192.168.3.236   <none>           <none>
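
    Optionally, list the PVCs created from the volume claim template. StatefulSet PVC names follow the <volume-claim-template-name>-<statefulset-name>-<ordinal> pattern, so you should see www-web-0 and www-web-1, both Bound:

    kubectl get pvc -l app=nginx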

Step 2: Create a migration task

Example 1: Cross-zone migration

Migrate all pods to a single target zone (zone B in this example).

Important

Before migration, confirm that the target zone has sufficient node resources and that the node specifications support ESSD disks.

  1. Create the migration task:

    cat <<EOF | kubectl apply -f -
    apiVersion: storage.alibabacloud.com/v1beta1
    kind: ContainerStorageOperator
    metadata:
      name: migrate-to-b
    spec:
      operationType: APPMIGRATE
      operationParams:
        stsName: web
        stsNamespace: default
        stsType: kube
        targetZone: cn-shanghai-b     # Target zone for migration.
        healthDurationMinutes: "1"    # Wait 1 minute after migration to confirm the application is running properly.
        snapshotRetentionDays: "-1"   # Retain snapshots permanently until manually deleted.
        retainSourcePV: "true"        # Retain the original disks and PVs.
    EOF
  2. Check the migration status:

    kubectl describe cso migrate-to-b | grep Status

    A SUCCESS status confirms the migration completed:

      Status:
        Status:   SUCCESS

    If the status is FAILED, see FAQ for troubleshooting.
  3. Verify pod placement after migration:

    kubectl get pod -o wide -l app=nginx

    Both pods are now on the cn-shanghai.192.168.5.245 node in zone B:

    NAME    READY   STATUS    RESTARTS   AGE     IP              NODE                        NOMINATED NODE   READINESS GATES
    web-0   1/1     Running   0          2m36s   192.168.5.250   cn-shanghai.192.168.5.245   <none>           <none>
    web-1   1/1     Running   0          2m14s   192.168.5.2     cn-shanghai.192.168.5.245   <none>           <none>
  4. Confirm the results in the ECS console:

    • Snapshots page: 2 new snapshots created with permanent retention.

    • Block Storage page: 2 new disks in zone B; the 2 original disks in zone M are retained (because retainSourcePV is "true").
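
    You can also verify the volume state from within the cluster: the rebuilt PVs are Bound to the PVCs, and because retainSourcePV is "true", the original PVs remain in the Released state:

    kubectl get pv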

Example 2: Multi-zone spreading

Distribute pods across two zones (zones B and G) to improve availability.

  1. Create the migration task:

    cat <<EOF | kubectl apply -f -
    apiVersion: storage.alibabacloud.com/v1beta1
    kind: ContainerStorageOperator
    metadata:
      name: migrate
    spec:
      operationType: APPMIGRATE
      operationParams:
        stsName: web
        stsNamespace: default
        stsType: kube
        targetZone: cn-shanghai-b,cn-shanghai-g   # Target zones. Multiple zones trigger automatic spreading.
        healthDurationMinutes: "1"                # Wait 1 minute after migration to confirm the application is running properly.
        snapshotRetentionDays: "-1"               # Retain snapshots permanently until manually deleted.
        retainSourcePV: "true"                    # Retain the original disks and PVs.
    EOF
  2. Check the migration status:

    kubectl describe cso migrate | grep Status

    A SUCCESS status confirms the migration completed:

      Status:
        Status:   SUCCESS

    If the status is FAILED, see FAQ for troubleshooting.
  3. Verify pod placement after migration:

    kubectl get pod -o wide -l app=nginx

    The pods are spread across zone B (cn-shanghai.192.168.5.245) and zone G (cn-shanghai.192.168.2.214):

    NAME    READY   STATUS    RESTARTS   AGE     IP              NODE                        NOMINATED NODE   READINESS GATES
    web-0   1/1     Running   0          4m59s   192.168.2.215   cn-shanghai.192.168.2.214   <none>           <none>
    web-1   1/1     Running   0          4m38s   192.168.5.250   cn-shanghai.192.168.5.245   <none>           <none>
  4. Confirm the results in the ECS console:

    • Snapshots page: 2 new snapshots created with permanent retention.

    • Block Storage page: 2 new disks across zones B and G; the 2 original disks in zone M are retained.

FAQ

If a migration task returns FAILED, run the following command to get the error message:

kubectl describe cso <ContainerStorageOperator-name> | grep Message -A 1

Example output:

  Message:
    Consume: failed to get target pvc, err: no pvc mounted in statefulset or no pvc need to migrated web

This error means the component could not find a PVC to migrate. Common causes:

  • The StatefulSet has no mounted storage.

  • All disks are already in the target zone — no migration is needed.

  • The component could not retrieve PVC information.

Resolve the issue based on the error message, then reapply the migration task.
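
To confirm whether the StatefulSet actually declares disk-backed volume claim templates (the first cause above), you can inspect it directly. For the web StatefulSet used in the examples:

kubectl get sts web -n default -o jsonpath='{.spec.volumeClaimTemplates[*].metadata.name}'

An empty result means the StatefulSet has no volume claim templates, so there is nothing for the migration task to move.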