Use multi-attach to share a single ESSD, ESSD AutoPL, or other NVMe-capable cloud disk across up to 16 ECS instances in the same zone simultaneously — or share a zone-redundant storage ESSD across nodes in the same region. Combined with NVMe Persistent Reservation (PR), multi-attach gives your workloads shared storage access with precise write-permission control, enabling efficient data sharing and fast failover in an ACK cluster.
Use cases
- Data sharing: After one node writes data to a shared NVMe disk, all other attached nodes can read it immediately. A single container image stored on an NVMe disk can be loaded by multiple instances running the same OS, reducing storage costs and improving read/write performance.
- High-availability failover: Traditional clustered databases, including Oracle Real Application Clusters (RAC), SAP High-performance ANalytic Appliance (HANA), and cloud-native high-availability (HA) databases, are vulnerable to single points of failure (SPOFs). A shared NVMe disk keeps storage accessible when a compute node fails. Deploy your workload in primary/secondary mode: when the primary instance fails, run an NVMe PR command to revoke its write permissions, then promote the secondary instance. This prevents split-brain writes and ensures data consistency. The failover sequence:
  1. The primary database instance (Database Instance 1) fails and stops serving traffic.
  2. Run an NVMe PR command to block writes from Database Instance 1 and grant write access to Database Instance 2 (see the nvme-cli sketch after this list).
  3. Restore Database Instance 2 to the same state as Database Instance 1 (for example, by replaying logs).
  4. Database Instance 2 takes over as the primary instance.
PR is part of the NVMe specification. It controls read and write permissions at the disk level to ensure compute nodes write data as expected. For details, see NVM Express Base Specification.
- Distributed data cache acceleration: Data lakes built on Object Storage Service (OSS) offer high append-write throughput but suffer from high latency and low random read/write performance. Attach a high-speed, multi-attach-enabled cloud disk as a shared cache layer across compute nodes to significantly improve access performance.
- Machine learning: After sample data is labeled and written, distribute it across nodes for parallel training without copying data over the network. Each compute node reads directly from the shared disk, reducing transfer latency and accelerating large-scale model training.
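The PR takeover step in the failover sequence above can be issued from the command line on the node that takes over. The following is a minimal nvme-cli sketch, not a prescribed procedure: the device path, namespace ID, and reservation keys are placeholders, and the keys must match what each node actually registered.

# On the node hosting Database Instance 2 (the new primary).
# Register this host's reservation key on the shared namespace (rrega=0: register).
nvme resv-register /dev/nvme1n1 --namespace-id=1 --nrkey=0xBBBB --rrega=0

# Preempt the failed primary's key and take a Write Exclusive reservation
# (rtype=1: write exclusive, racqa=1: preempt). Database Instance 1 can no
# longer write to the disk after this command completes.
nvme resv-acquire /dev/nvme1n1 --namespace-id=1 --crkey=0xBBBB --prkey=0xAAAA --rtype=1 --racqa=1

The same takeover can also be performed programmatically, as the application example later in this topic does through the kernel's Reservation ioctls.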
Billing
The multi-attach feature does not incur additional fees. Resources that support the NVMe protocol are billed according to their existing billing methods. For cloud disk pricing, see Elastic Block Storage volumes.
Limitations
- A single NVMe cloud disk can be attached to a maximum of 16 ECS instances in the same zone at the same time.
- To read and write to a cloud disk from multiple nodes concurrently, mount the cloud disk using volumeDevices. This exposes the disk as a raw block device and does not support file system access. Use volumeMode: Block and accessModes: ReadWriteMany in your PersistentVolumeClaim (PVC).
- For the full list of limits, see Limits of the multi-attach feature.
Prerequisites
Before you begin, ensure that you have:
- An ACK managed cluster running Kubernetes 1.20 or later. To create one, see Create an ACK managed cluster.
- csi-plugin and csi-provisioner at version v1.24.10-7ae4421-aliyun or later. To upgrade, see Manage the csi-plugin and csi-provisioner components.
- At least two nodes in the same zone that support the multi-attach feature. For supported instance families, see Limits of the multi-attach feature.
- A containerized application that meets both of the following requirements:
  - Supports concurrent access to the same cloud disk from multiple replicas.
  - Ensures data consistency using NVMe Reservation or an equivalent mechanism.
- For background reading, see:
Application example
The following example application demonstrates lease-based leader election over a shared NVMe block device. Multiple replicas compete for a lease written directly to the disk. Only one replica holds the lease at a time — if it stops refreshing, another replica preempts it using NVMe Reservation commands.
Key design notes:
- O_DIRECT is used to open the block device, bypassing the page cache and ensuring reads reflect what was actually written to disk.
- The example uses the Linux kernel's simplified Reservation interface (<linux/pr.h> ioctls); a minimal sketch follows this list. Alternatives that require elevated privileges:
  - C: ioctl(fd, NVME_IOCTL_IO_CMD, &cmd);
  - CLI: nvme-cli
- For the full NVMe Reservation specification, see NVMe Specification.
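The following is a minimal sketch of the <linux/pr.h> interface, not the example application's actual source. It opens the shared block device with O_DIRECT, registers a reservation key, and attempts a Write Exclusive reservation; the device path and key value are placeholders.

#define _GNU_SOURCE          /* required for O_DIRECT */
#include <fcntl.h>
#include <linux/pr.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void) {
    /* Open the raw block device with O_DIRECT so reads bypass the page
     * cache and reflect what peer nodes actually wrote to the disk.
     * /dev/data-disk and the key below are placeholders. */
    int fd = open("/dev/data-disk", O_RDWR | O_DIRECT);
    if (fd < 0) { perror("open"); return 1; }

    /* Register this replica's reservation key on the namespace. */
    struct pr_registration reg = { .old_key = 0, .new_key = 0x4745d0c5cd9a2fa4ULL };
    if (ioctl(fd, IOC_PR_REGISTER, &reg) < 0) { perror("IOC_PR_REGISTER"); close(fd); return 1; }

    /* Try to take a Write Exclusive reservation: only the holder may write.
     * If a peer already holds it, this call fails; the lease logic would then
     * watch the peer's on-disk lease and preempt (IOC_PR_PREEMPT) once the
     * lease goes stale. */
    struct pr_reservation rsv = { .key = 0x4745d0c5cd9a2fa4ULL, .type = PR_WRITE_EXCLUSIVE };
    if (ioctl(fd, IOC_PR_RESERVE, &rsv) < 0)
        perror("IOC_PR_RESERVE");

    close(fd);
    return 0;
}

Registrations and reservations are held per node (host), so pods on the same node share reservation state; this is why the example schedules replicas onto different nodes.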
Step 1: Deploy the application and configure multi-attach
Create a StorageClass that enables multi-attach, a PVC configured as a block device, and a StatefulSet that uses the lease application image.
- Create a file named lease.yaml with the following content. Replace the container image address with your actual image address. (An illustrative manifest sketch is shown after these steps.)
  Important:
  - NVMe Reservation takes effect at the node level. If multiple pods run on the same node, they can interfere with each other. This example uses podAntiAffinity to prevent that.
  - If your cluster has nodes that do not use the NVMe protocol, configure node affinity to restrict scheduling to NVMe-capable nodes.
  The following table summarizes the key differences between multi-attach and standard mounting configurations:
  | Resource | Field | Multi-attach | Standard mounting |
  | --- | --- | --- | --- |
  | StorageClass | parameters.multiAttach | "true" | Not required |
  | PVC | accessModes | ReadWriteMany | ReadWriteOnce |
  | PVC | volumeMode | Block | Filesystem |
  | Volume mounting | Method | volumeDevices (direct block device access) | volumeMounts (file system mount) |
- Deploy the application:
  kubectl apply -f lease.yaml
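The exact lease.yaml shipped with the example is not reproduced here. The following illustrative sketch matches the table above; the StorageClass name, disk type and size, device path, and image address are placeholders to adapt to your environment.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: alicloud-disk-multi-attach        # placeholder name
provisioner: diskplugin.csi.alibabacloud.com
parameters:
  type: cloud_essd
  multiAttach: "true"                     # enable multi-attach on the provisioned disk
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-disk
spec:
  accessModes: ["ReadWriteMany"]          # multiple nodes read and write concurrently
  volumeMode: Block                       # raw block device, no file system
  storageClassName: alicloud-disk-multi-attach
  resources:
    requests:
      storage: 20Gi
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: lease-test
spec:
  serviceName: lease-test
  replicas: 2
  selector:
    matchLabels:
      app: lease-test
  template:
    metadata:
      labels:
        app: lease-test
    spec:
      # Keep replicas on different nodes: NVMe Reservation takes effect per node.
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - topologyKey: kubernetes.io/hostname
            labelSelector:
              matchLabels:
                app: lease-test
      containers:
      - name: lease
        image: <your-lease-image-address> # replace with your actual image address
        volumeDevices:                    # block device access instead of volumeMounts
        - name: data
          devicePath: /dev/data-disk      # placeholder device path inside the container
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: data-disk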
Step 2: Verify multi-attach and Reservation
Verify that multiple nodes can read and write to the same disk
Run the following command to view pod logs:
kubectl logs -l app=lease-test --prefix -f
Expected output:
[pod/lease-test-0/lease] Register as key 4745d0c5cd9a2fa4
[pod/lease-test-0/lease] Refreshed lease
[pod/lease-test-0/lease] Refreshed lease
[pod/lease-test-1/lease] Remote lease-test-0 refreshed lease
[pod/lease-test-0/lease] Refreshed lease
[pod/lease-test-1/lease] Remote lease-test-0 refreshed lease
[pod/lease-test-0/lease] Refreshed lease
[pod/lease-test-1/lease] Remote lease-test-0 refreshed lease
[pod/lease-test-0/lease] Refreshed lease
[pod/lease-test-1/lease] Remote lease-test-0 refreshed lease
lease-test-1 immediately reads the data written by lease-test-0, confirming multi-attach is working.
Verify that NVMe Reservation is active
- Get the cloud disk ID:
  kubectl get pvc data-disk -ojsonpath='{.spec.volumeName}'
- Log in to either of the two nodes and run the following command. Replace 2zxxxxxxxxxxx with the portion after d- in the disk ID from the previous step.
  nvme resv-report -c 1 /dev/disk/by-id/nvme-Alibaba_Cloud_Elastic_Block_Storage_2zxxxxxxxxxxx
  Expected output:
  NVME Reservation status:
  gen          : 3
  rtype        : 1
  regctl       : 1
  ptpls        : 1
  regctlext[0] :
    cntlid     : ffff
    rcsts      : 1
    rkey       : 4745d0c5cd9a2fa4
    hostid     : 4297c540000daf4a4*****
  rtype: 1 (Write Exclusive) and regctl: 1 confirm that NVMe Reservation is active.
Verify that Reservation blocks writes from a failed node
- Log in to the node running lease-test-0 and pause the process to simulate a failure:
  pkill -STOP -f /usr/local/bin/lease
- Wait 30 seconds, then check the logs:
  kubectl logs -l app=lease-test --prefix -f
  Expected output:
  [pod/lease-test-1/lease] Remote lease-test-0 refreshed lease
  [pod/lease-test-1/lease] Remote is dead, preempting
  [pod/lease-test-1/lease] Register as key 4745d0c5cd9a2fa4
  [pod/lease-test-1/lease] Refreshed lease
  [pod/lease-test-1/lease] Refreshed lease
  [pod/lease-test-1/lease] Refreshed lease
  lease-test-1 has preempted the lease and taken over as the primary.
- Resume the paused process on the lease-test-0 node:
  pkill -CONT -f /usr/local/bin/lease
- Check the logs again:
  kubectl logs -l app=lease-test --prefix -f
  Expected output:
  [pod/lease-test-0/lease] failed to write lease: Invalid exchange
  lease-test-0 can no longer write to the disk. The Invalid exchange error confirms that Reservation has successfully blocked write I/O from the formerly primary node, and the lease container restarts automatically.
What's next
If your NVMe cloud disk runs out of space, see Expand a cloud disk volume.