Cloud Paralleled File System (CPFS) is a parallel file system provided by Alibaba Cloud. CPFS stores data across multiple data nodes in a cluster and allows multiple clients to access the data simultaneously. As a result, CPFS can provide storage with high input/output operations per second (IOPS), high throughput, and low latency for large high-performance computing clusters. This topic describes how to configure a shared CPFS volume.
Background information
Data scientists often need to keep their data after a job finishes and to read the same copy of training data across jobs. To meet this requirement, we recommend that you configure a shared NAS volume and mount it to the runtime from which you use Arena to submit jobs. Code and data stored in the shared NAS volume are not deleted together with containers, which prevents data loss.
You can also let the developers in a team share the same storage pool. After you declare a shared NAS volume and specify the path to which it is mounted, the volume is automatically mounted to that path. You can then use the data and code stored in the volume by specifying the --data parameter each time you submit a job.
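The following minimal sketch shows how the --data parameter might be passed when a job is submitted with Arena. It only assembles and prints the command line and does not run it. The job name, PVC name, mount path, image, and training script are hypothetical placeholders, and the sketch assumes that --data takes the form <PVC name>:<mount path in the container>.

```python
import shlex


def build_arena_command(job_name, pvc_name, mount_path, image, script):
    """Return the argument list for an `arena submit tf` call that mounts a PVC."""
    return [
        "arena", "submit", "tf",
        f"--name={job_name}",
        f"--image={image}",
        # Assumed format: <PVC name>:<mount path inside the job container>
        f"--data={pvc_name}:{mount_path}",
        script,
    ]


# All values below are hypothetical placeholders.
cmd = build_arena_command(
    job_name="demo-job",
    pvc_name="training-data",
    mount_path="/data",
    image="tensorflow/tensorflow:latest",
    script="python /data/code/train.py",
)

# Print a shell-safe version of the command for review before running it.
print(shlex.join(cmd))
```

Building the command as a list keeps the quoting of the training script explicit, which helps when the same volume is reused across many job submissions.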
In Kubernetes, storage volumes are declared by using persistent volumes (PVs) and persistent volume claims (PVCs). As the administrator of a Kubernetes cluster, you must create a PVC for each data scientist in your team. For example, the PVs of User A and User B can be backed by the same NAS or Cloud Paralleled File System (CPFS) file system. However, the PVs must be mounted to different subdirectories to isolate the runtimes of User A and User B.
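The per-user isolation described above can be sketched as follows. This example generates one PV and one PVC per user, each pointing to a different subdirectory of the same file system, and prints them as JSON that can be reviewed before being applied. It is an illustration under stated assumptions, not the required configuration: it assumes an NFS-mounted NAS file system and uses the standard Kubernetes nfs volume source (a CPFS-backed PV needs the CPFS volume plug-in settings instead), and the server address, base path, user names, and capacity are hypothetical placeholders.

```python
import json


def make_pv_and_pvc(user, server, base_path, size="10Gi"):
    """Build a PV/PVC pair whose PV points to a per-user subdirectory."""
    sub_path = f"{base_path.rstrip('/')}/{user}"
    name = f"{user}-training-data"
    pv = {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": name},
        "spec": {
            "capacity": {"storage": size},
            "accessModes": ["ReadWriteMany"],
            "persistentVolumeReclaimPolicy": "Retain",
            # Assumption: NFS-mounted NAS; replace for a CPFS-backed volume.
            "nfs": {"server": server, "path": sub_path},
        },
    }
    pvc = {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteMany"],
            "storageClassName": "",   # disable dynamic provisioning
            "volumeName": name,       # bind to the PV defined above
            "resources": {"requests": {"storage": size}},
        },
    }
    return pv, pvc


# Hypothetical mount target and users.
SERVER = "example-mount-target.invalid"
manifests = {}
for user in ("user-a", "user-b"):
    manifests[user] = make_pv_and_pvc(user, SERVER, "/shared")

for pv, pvc in manifests.values():
    print(json.dumps(pv, indent=2))
    print(json.dumps(pvc, indent=2))
```

Both PVs use the same server, so the users share one storage pool, while the distinct paths keep each user's runtime separate.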
Step 1: Create a CPFS file system
When you create a CPFS file system, select the same region, virtual private cloud (VPC), and vSwitch as those of the target Alibaba Cloud Container Service for Kubernetes (ACK) cluster.