When multiple disks back a stateful application in Container Service for Kubernetes (ACK), taking individual VolumeSnapshots from each disk one at a time can produce inconsistent recovery points — each snapshot captures a slightly different moment in time. Group snapshots solve this by triggering all snapshots simultaneously, so every disk in the group is captured at the same instant. This reduces the risk of data inconsistency when you need to restore an application across multiple volumes.
How it works
Group snapshots are built on ECS snapshot-consistent groups. ACK provides the following CustomResourceDefinitions (CRDs) that you can use to manage group snapshots:
| CRD | Analogous to | Purpose |
|---|---|---|
| VolumeGroupSnapshotClass | StorageClass | Defines the driver and deletion policy for group snapshots |
| VolumeGroupSnapshot | PersistentVolumeClaim (PVC) | A request to snapshot a set of disks, identified by a label selector |
| VolumeGroupSnapshotContent | PersistentVolume (PV) | Records the ECS snapshot-consistent group created for the request |
| VolumeSnapshot | — | A request for a volume snapshot; automatically created for each disk after a VolumeGroupSnapshot is created |
| VolumeSnapshotContent | — | Records information about an ECS snapshot |
When you create a VolumeGroupSnapshot, the Container Storage Interface (CSI) plug-in creates an ECS snapshot-consistent group and automatically generates a VolumeSnapshot and VolumeSnapshotContent for each selected disk. If a disk fails, restore it from the corresponding VolumeSnapshot.
Billing
Snapshot-consistent groups are free. The snapshots inside the group are billed based on storage consumed. For details, see Snapshot billing.
Limitations
- Group snapshots follow the same limits as ECS snapshots. See Limits.
- Expiration times are not supported for ECS snapshot-consistent groups. Delete group snapshots manually when they are no longer needed.
Prerequisites
Before you begin, ensure that you have:
- An ACK managed cluster running Kubernetes 1.28 or later. See Create an ACK managed cluster.
- The CSI plug-in installed at version 1.31.4 or later. To update csi-plugin and csi-provisioner, see Update csi-plugin and csi-provisioner.
- The Elastic Compute Service (ECS) Snapshot service activated (free to activate). See Activate ECS Snapshot.
If the cluster uses FlexVolume (deprecated in ACK), upgrade to CSI before creating group snapshots. See Upgrade from FlexVolume to CSI. To check which volume plug-in is installed, go to the cluster details page in the ACK console, choose Operations > Add-ons in the left-side navigation pane, and click the Storage tab.
Step 1: Enable the feature gate
Enable the EnableVolumeGroupSnapshots feature gate in csi-provisioner before creating group snapshots.
1. Log on to the ACK console. In the left-side navigation pane, click Clusters.
2. On the Clusters page, click the name of the cluster. In the left-side navigation pane, choose Operations > Add-ons.
3. On the Add-ons page, find the csi-provisioner card and click Configuration in the lower-right corner.
4. In the csi-provisioner Parameters dialog box, set the FeatureGate field to `EnableVolumeGroupSnapshots=true` and click OK. If other feature gates are already enabled, append the new gate to the comma-separated list: `xxxxxx=true,yyyyyy=false,EnableVolumeGroupSnapshots=true`.
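The FeatureGate field is a single comma-separated string of `name=bool` pairs. As an illustration of how the final value is composed (the gate names other than `EnableVolumeGroupSnapshots` are placeholders, as in the example above), here is a minimal Python sketch:

```python
# Compose the comma-separated FeatureGate value from a map of gates.
# Gate names other than EnableVolumeGroupSnapshots are placeholders.
gates = {
    "xxxxxx": True,
    "yyyyyy": False,
    "EnableVolumeGroupSnapshots": True,
}
feature_gate = ",".join(f"{name}={str(on).lower()}" for name, on in gates.items())
print(feature_gate)  # xxxxxx=true,yyyyyy=false,EnableVolumeGroupSnapshots=true
```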
Step 2: Deploy a MySQL StatefulSet
This example uses a two-replica MySQL StatefulSet. Each replica mounts a 20 GiB ESSD disk as a PVC, giving you two disks to snapshot as a group.
1. Create `mysql.yaml` with the following content. The volume claim template is named `disk`, so the StatefulSet creates PVCs named `disk-mysql-0` and `disk-mysql-1`:

   ```yaml
   apiVersion: apps/v1
   kind: StatefulSet
   metadata:
     name: mysql
   spec:
     selector:
       matchLabels:
         app: mysql
     serviceName: "mysql"
     replicas: 2
     template:
       metadata:
         labels:
           app: mysql
       spec:
         containers:
         - name: mysql
           image: anolis-registry.cn-zhangjiakou.cr.aliyuncs.com/openanolis/mysql:8.0.30-8.6
           env:
           - name: MYSQL_ROOT_PASSWORD
             valueFrom:
               secretKeyRef:
                 name: mysql-pass
                 key: password
           imagePullPolicy: IfNotPresent
           volumeMounts:
           - name: disk
             mountPath: /var/lib/mysql  # MySQL data directory
             subPath: mysql
     volumeClaimTemplates:
     - metadata:
         name: disk
       spec:
         accessModes: [ "ReadWriteOnce" ]
         storageClassName: "alicloud-disk-essd"
         resources:
           requests:
             storage: 20Gi
   ---
   apiVersion: v1
   kind: Secret
   metadata:
     name: mysql-pass
   type: Opaque
   data:
     password: MTIzNDU2  # Base64-encoded value of "123456"
     username: cm9vdA==  # Base64-encoded value of "root"
   ```

2. Deploy the StatefulSet:

   ```shell
   kubectl apply -f mysql.yaml
   ```

3. Write test data to both pod replicas. Log on to the `mysql-0` pod and insert a row:

   ```shell
   kubectl exec -it mysql-0 -- bash   # Open a shell in the pod
   mysql -uroot -p123456              # Connect to MySQL
   ```

   ```sql
   CREATE DATABASE test;
   USE test;
   CREATE TABLE scores (
     name VARCHAR(50) NOT NULL,
     score INT AUTO_INCREMENT PRIMARY KEY
   );
   INSERT INTO scores(name, score) VALUES ("Amy", 95);
   SELECT * FROM scores;
   ```

   Expected output:

   ```
   +------+-------+
   | name | score |
   +------+-------+
   | Amy  |    95 |
   +------+-------+
   ```

   Repeat the same steps for the `mysql-1` pod.
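The Secret above stores base64-encoded credentials. As a quick check, the encoded values can be reproduced with a short Python sketch:

```python
import base64

# Encode the credentials exactly as they appear in the Secret manifest.
password = base64.b64encode(b"123456").decode()  # value of the "password" key
username = base64.b64encode(b"root").decode()    # value of the "username" key

print(password)  # MTIzNDU2
print(username)  # cm9vdA==
```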
Step 3: Create group snapshots
Create a VolumeGroupSnapshotClass
A default VolumeGroupSnapshotClass named alibabacloud-disk-group-snapshot is already available in the cluster. To use a custom class, create group-snapshot-class-demo.yaml:
```yaml
apiVersion: groupsnapshot.storage.k8s.io/v1alpha1
kind: VolumeGroupSnapshotClass
metadata:
  name: group-snapshot-class-demo
deletionPolicy: Delete
driver: diskplugin.csi.alibabacloud.com
```
The deletionPolicy field controls what happens to the underlying ECS snapshot-consistent group when you delete the VolumeGroupSnapshot:
| Value | Behavior |
|---|---|
| `Delete` | Deletes the VolumeGroupSnapshotContent and the ECS snapshot-consistent group |
| `Retain` | Keeps the VolumeGroupSnapshotContent and the ECS snapshot-consistent group |
Apply the class:

```shell
kubectl apply -f group-snapshot-class-demo.yaml
```
Create a VolumeGroupSnapshot
Create group-snapshot-demo.yaml to snapshot both PVCs mounted to the MySQL application:
```yaml
apiVersion: groupsnapshot.storage.k8s.io/v1alpha1
kind: VolumeGroupSnapshot
metadata:
  name: group-snapshot-demo
  namespace: default
spec:
  source:
    selector:
      matchLabels:
        app: mysql  # Selects all PVCs with this label
  volumeGroupSnapshotClassName: group-snapshot-class-demo
```
The source.selector field uses a label selector to identify the PVCs to back up. PVCs created from a StatefulSet's volumeClaimTemplates are automatically labeled with the StatefulSet's pod labels. If you created your PVCs manually, add the corresponding labels before running this step.
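The matchLabels semantics are simple set inclusion: a PVC matches when its labels contain every key/value pair in the selector, and any extra labels are ignored. The following Python sketch illustrates the rule (it is not the actual controller code, and the label values are examples):

```python
def matches(match_labels: dict, labels: dict) -> bool:
    # A matchLabels selector matches when every selector pair is
    # present in the object's labels; extra labels are ignored.
    return all(labels.get(key) == value for key, value in match_labels.items())

# PVCs created from a StatefulSet's volumeClaimTemplates inherit the pod labels.
print(matches({"app": "mysql"}, {"app": "mysql", "env": "prod"}))  # True
print(matches({"app": "mysql"}, {"app": "web"}))                   # False
```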
Apply the VolumeGroupSnapshot:

```shell
kubectl apply -f group-snapshot-demo.yaml
```
Step 4: Verify the group snapshots
1. Watch the VolumeGroupSnapshot until `READYTOUSE` changes from `false` to `true`:

   ```shell
   kubectl get vgs group-snapshot-demo -w
   ```

2. Inspect the snapshot details to see which PVC maps to which VolumeSnapshot:

   ```shell
   kubectl describe vgs group-snapshot-demo
   ```

   | Field | Description |
   |---|---|
   | Bound Volume Group Snapshot Content Name | The VolumeGroupSnapshotContent bound to this VolumeGroupSnapshot |
   | Persistent Volume Claim Ref | The name of the backed-up PVC |
   | Volume Snapshot Ref | The VolumeSnapshot created for that PVC |

   Expected output (abbreviated):

   ```
   Status:
     Bound Volume Group Snapshot Content Name:  groupsnapcontent-adcef6ef-811a-4e9d-ba51-3927caxxxxxx
     Creation Time:                             2024-11-27T06:02:56Z
     Pvc Volume Snapshot Ref List:
       Persistent Volume Claim Ref:
         Name:  disk-mysql-0
       Volume Snapshot Ref:
         Name:  snapshot-1c2c5bcaf47ee2bffcc5b2f52dff65a4aacaaea38032c05d75acd536f7xxxxxx-2024-11-27-6.4.7
       Persistent Volume Claim Ref:
         Name:  disk-mysql-1
       Volume Snapshot Ref:
         Name:  snapshot-37a2fbf634d68cd2103f261313c2ed781fbd2bd52b5a0d0e0c0ef7c339xxxxxx-2024-11-27-6.4.9
   ```

3. Confirm both VolumeSnapshots are ready:

   ```shell
   kubectl get volumesnapshot
   ```

   Both snapshots should show `READYTOUSE: true`.

4. Get the ECS snapshot-consistent group ID:

   ```shell
   kubectl describe vgsc groupsnapcontent-adcef6ef-811a-4e9d-ba51-3927caxxxxxx
   ```

   Expected output (abbreviated):

   ```
   Volume Group Snapshot Handle:  ssg-2zeg72d1qym6vnxxxxxx
   ```

   To view the snapshot-consistent group in the ECS console, go to the ECS console, choose Storage & Snapshots > Snapshots in the left-side navigation pane, click the Snapshot-consistent Groups tab, and search for `ssg-2zeg72d1qym6vnxxxxxx`.
Step 5: Restore disk volumes from snapshots
Two restore options are available — restore a single volume manually or restore all volumes in one batch using a script.
Option 1: Restore a single volume
The following example restores the disk volume bound to the `disk-mysql-0` PVC.
Create disk-mysql-0-copy.yaml. The dataSource field tells CSI to provision a new disk from the specified VolumeSnapshot:
```yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: disk-mysql-0-copy
spec:
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
  storageClassName: "alicloud-disk-essd"
  dataSource:
    name: snapshot-1c2c5bcaf47ee2bffcc5b2f52dff65a4aacaaea38032c05d75acd536f7xxxxxx-2024-11-27-6.4.7  # [1]
    kind: VolumeSnapshot  # [2]
    apiGroup: snapshot.storage.k8s.io  # [3]
```
- [1] The name of the VolumeSnapshot that backs up `disk-mysql-0` (from the `kubectl describe vgs` output above).
- [2] The resource kind, which is always `VolumeSnapshot` when restoring from a snapshot.
- [3] The API group for the CSI snapshot resources.
Apply the PVC:

```shell
kubectl apply -f disk-mysql-0-copy.yaml
```

Verify the PVC is bound:

```shell
kubectl get pvc disk-mysql-0-copy
```
Expected output:
```
NAME                STATUS   VOLUME                   CAPACITY   ACCESS MODES   STORAGECLASS         VOLUMEATTRIBUTESCLASS   AGE
disk-mysql-0-copy   Bound    d-2ze0iwwqg0s6b0xxxxxx   20Gi       RWO            alicloud-disk-essd   <unset>                 69s
```
To confirm the disk was created from the snapshot, go to the ECS console, choose Storage & Snapshots > Block Storage in the left-side navigation pane, click the Cloud Disk tab, and search for d-2ze0iwwqg0s6b0xxxxxx. Open the disk details page to see that it was created from a snapshot.
Option 2: Restore all volumes in one batch
Use the generate_pvc.sh script to automatically generate PVC manifests from a VolumeGroupSnapshot and apply them all at once.
1. Install the `jq` command-line tool:

   - CentOS: `yum install jq`
   - Ubuntu: `apt-get install jq`

2. Scale the MySQL StatefulSet to zero replicas to stop all writes before restoring:

   ```shell
   kubectl scale sts mysql --replicas=0
   ```

3. Delete the existing PVCs:

   ```shell
   kubectl delete pvc disk-mysql-0 disk-mysql-1
   ```

4. Create `generate_pvc.sh` with the following content. The script accepts six positional parameters:

   | Parameter | Description | Example |
   |---|---|---|
   | 1 | Namespace of the VolumeGroupSnapshot | `default` |
   | 2 | Name of the VolumeGroupSnapshot | `group-snapshot-demo` |
   | 3 | StorageClass name for the restored PVCs | `alicloud-disk-essd` |
   | 4 | Disk capacity | `20Gi` |
   | 5 | Path to the kubeconfig file | `.kube/config` |
   | 6 | Output path for the generated PVC YAML | `./output.yaml` |

   ```shell
   #!/bin/bash

   # Input parameters
   NAMESPACE=$1           # Namespace of the VolumeGroupSnapshot
   VGS_NAME=$2            # Name of the VolumeGroupSnapshot
   STORAGE_CLASS_NAME=$3  # StorageClass for the restored PVCs
   CAPACITY=$4            # Disk capacity (e.g., 20Gi)
   KUBECONFIG_PATH=$5     # Path to the kubeconfig file
   OUTPUT_FILE=$6         # Output path for the generated YAML

   # Get VolumeGroupSnapshot details
   VGS_INFO=$(kubectl --kubeconfig=${KUBECONFIG_PATH} -n ${NAMESPACE} get vgs ${VGS_NAME} -o json)

   # Validate that the PVC-snapshot mapping exists
   if ! echo ${VGS_INFO} | jq -e '.status.pvcVolumeSnapshotRefList' &>/dev/null; then
     echo "Error: .status.pvcVolumeSnapshotRefList not found in VolumeGroupSnapshot."
     exit 1
   fi

   # Extract PVC names and their corresponding snapshot names
   PVCS=($(echo ${VGS_INFO} | jq -r '.status.pvcVolumeSnapshotRefList[].persistentVolumeClaimRef.name'))
   SNAPSHOTS=($(echo ${VGS_INFO} | jq -r '.status.pvcVolumeSnapshotRefList[].volumeSnapshotRef.name'))

   # Clear the output file
   > ${OUTPUT_FILE}

   # Generate one PVC manifest per disk
   for i in "${!PVCS[@]}"; do
     PVC_NAME=${PVCS[$i]}
     SNAPSHOT_NAME=${SNAPSHOTS[$i]}
     cat <<EOF >> ${OUTPUT_FILE}
   ---
   kind: PersistentVolumeClaim
   apiVersion: v1
   metadata:
     name: ${PVC_NAME}
   spec:
     volumeMode: Filesystem
     accessModes:
       - ReadWriteOnce
     resources:
       requests:
         storage: ${CAPACITY}
     storageClassName: "${STORAGE_CLASS_NAME}"
     dataSource:
       name: ${SNAPSHOT_NAME}
       kind: VolumeSnapshot
       apiGroup: snapshot.storage.k8s.io
   EOF
   done

   echo "PVC YAML files have been written to ${OUTPUT_FILE}"
   ```
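The core of the script is the jq extraction that pairs each PVC in `pvcVolumeSnapshotRefList` with the snapshot that backs it up. The same logic can be sketched in Python; the JSON below is an illustrative sample of a VolumeGroupSnapshot status, with shortened snapshot names:

```python
import json

# Sample VolumeGroupSnapshot status, as returned by
# `kubectl get vgs <name> -o json` (names are illustrative).
vgs = json.loads("""
{
  "status": {
    "pvcVolumeSnapshotRefList": [
      {"persistentVolumeClaimRef": {"name": "disk-mysql-0"},
       "volumeSnapshotRef": {"name": "snapshot-aaa"}},
      {"persistentVolumeClaimRef": {"name": "disk-mysql-1"},
       "volumeSnapshotRef": {"name": "snapshot-bbb"}}
    ]
  }
}
""")

# Validate that the PVC-snapshot mapping exists, as the script does.
refs = vgs.get("status", {}).get("pvcVolumeSnapshotRefList")
if refs is None:
    raise SystemExit("Error: .status.pvcVolumeSnapshotRefList not found")

# Pair each PVC with its snapshot; one restore PVC is generated per pair.
pairs = [(r["persistentVolumeClaimRef"]["name"], r["volumeSnapshotRef"]["name"])
         for r in refs]
print(pairs)  # [('disk-mysql-0', 'snapshot-aaa'), ('disk-mysql-1', 'snapshot-bbb')]
```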
5. Run the script to generate the PVC manifests:

   ```shell
   bash generate_pvc.sh default group-snapshot-demo alicloud-disk-essd 20Gi .kube/config ./output.yaml
   ```

   The generated `output.yaml` looks similar to:

   ```yaml
   ---
   kind: PersistentVolumeClaim
   apiVersion: v1
   metadata:
     name: disk-mysql-0
   spec:
     volumeMode: Filesystem
     accessModes:
       - ReadWriteOnce
     resources:
       requests:
         storage: 20Gi
     storageClassName: "alicloud-disk-essd"
     dataSource:
       name: snapshot-1c2c5bcaf47ee2bffcc5b2f52dff65a4aacaaea38032c05d75acd536f7adb850-2024-11-27-6.4.7
       kind: VolumeSnapshot
       apiGroup: snapshot.storage.k8s.io
   ---
   kind: PersistentVolumeClaim
   apiVersion: v1
   metadata:
     name: disk-mysql-1
   spec:
     volumeMode: Filesystem
     accessModes:
       - ReadWriteOnce
     resources:
       requests:
         storage: 20Gi
     storageClassName: "alicloud-disk-essd"
     dataSource:
       name: snapshot-37a2fbf634d68cd2103f261313c2ed781fbd2bd52b5a0d0e0c0ef7c3396fea1c-2024-11-27-6.4.9
       kind: VolumeSnapshot
       apiGroup: snapshot.storage.k8s.io
   ```

   Edit `output.yaml` if needed, then apply it:

   ```shell
   kubectl apply -f output.yaml
   ```

6. Scale the StatefulSet back to two replicas:

   ```shell
   kubectl scale sts mysql --replicas=2
   ```

7. After the pods restart, verify that the data is restored:

   ```shell
   kubectl exec -it mysql-0 -- bash   # Open a shell in the pod
   mysql -uroot -p123456              # Connect to MySQL
   ```

   ```sql
   USE test;
   SELECT * FROM scores;
   ```

   Expected output:

   ```
   +------+-------+
   | name | score |
   +------+-------+
   | Amy  |    95 |
   +------+-------+
   ```