Container Service for Kubernetes: Use EFC to accelerate access to NAS

Last Updated: Jun 25, 2023

Fluid is an open source Kubernetes-native distributed dataset orchestrator and accelerator for data-intensive applications in cloud-native scenarios. Fluid enables observability, scalability, and access acceleration for datasets by managing and scheduling EFCRuntimes. This topic describes how to use Fluid EFCRuntime to accelerate access to Apsara File Storage NAS (NAS) file systems.

Prerequisites

  • Alibaba Cloud Linux 2 is used as the OS of an Elastic Compute Service (ECS) instance and the kernel version of the OS is 4.19.91-23 or later.

  • A Container Service for Kubernetes (ACK) Pro cluster that runs Kubernetes 1.18 or later is created. For more information, see Create an ACK Pro cluster.

  • NAS is activated, and a Capacity or Performance NAS file system is created.

    Note

    In AI training scenarios, we recommend that you select NAS file system types based on the throughput that is required by the training jobs. For more information, see How do I select file systems?.

  • A kubectl client is connected to the ACK Pro cluster. For more information, see Obtain the kubeconfig file of a cluster and use kubectl to connect to the cluster.

Introduction to EFC

Elastic File Client (EFC) is a FUSE-based POSIX client developed by the NAS technical team. You can use EFC to replace the kernel space NFS client. EFC allows you to access data through multiple connections, cache metadata, and cache data in a distributed manner to increase read speeds. EFC also supports performance monitoring based on Managed Service for Prometheus. Compared with the kernel space NFS v3 and v4.x clients and other FUSE-based clients, EFC has the following advantages:

  • Strong semantic consistency: EFC uses the distributed locking mechanism to ensure strong consistency for files and directories. After you write data to a file, the data can be immediately read by other clients. After you create a file, the file can be immediately accessed by other clients. This advantage allows you to synchronize data among multiple nodes.

  • Cache reads and writes on individual servers: EFC optimizes the caching logic of FUSE and occupies a small amount of memory on a node to accelerate reads and writes on small files. Compared with traditional NFS clients, EFC improves the performance of cache reads and writes by more than 50%.

  • Distributed read-only caches: EFC supports distributed read-only caches and uses the memory of multiple nodes to create a cache pool that can automatically scale out to meet the increasing compute demand.

  • Small file prefetching: EFC prefetches hot data in frequently accessed directories to accelerate access.

  • Hot updates and failover capabilities: EFC can perform a failover within seconds and perform hot updates for clients without interrupting your services.

Use Fluid EFCRuntime to accelerate access to NAS file systems

Fluid interfaces with EFC through EFCRuntime custom Kubernetes resources. This enables dataset observability and scalability.

Limits

The following limits apply to Fluid EFCRuntime:

  • Fluid EFCRuntime does not support DataLoad cache prefetching. Fluid EFCRuntime caches data only when the data is accessed for the first time.

  • Fluid EFCRuntime does not expose the caching status of Datasets.

  • Fluid EFCRuntime is supported only in the following regions: China (Zhangjiakou), China (Beijing), China (Guangzhou), China (Shenzhen), and China (Shanghai).

How Fluid EFCRuntime works

The following figure shows how Fluid EFCRuntime caches data from NAS to the local storage to accelerate data access. The following section describes how Fluid EFCRuntime works:

  1. You can create Datasets and EFCRuntimes to specify information about the source NAS file systems.

  2. Fluid controllers deploy the EFC Cache Worker and EFC FUSE components based on the information about the source NAS file systems.

  3. When you create a pod, you can use a persistent volume claim (PVC) to mount the NAS file system that is exposed by the EFC FUSE client to the pod.

  4. When you access the data in a mounted NAS file system, the EFC FUSE client forwards the request to EFC Cache Worker. EFC Cache Worker checks whether the data is cached in the local storage. If the data is cached in the local storage, you can directly access the cache. If the data is not cached in the local storage, EFC Cache Worker reads the data from the NAS file system and caches the data to the local storage. Then, you can access the cache in the local storage.

Figure: How Fluid EFCRuntime caches data from a NAS file system to local storage.

  • Dataset: a CustomResourceDefinition (CRD) defined by Fluid. A Dataset is a collection of logically related data that is used by upper-layer compute engines.

  • EFCRuntime: a runtime that accelerates access to Datasets. EFCRuntimes use EFC as the caching engine. The EFC caching engines include the EFC Cache Worker and EFC FUSE components.

    • EFC Cache Worker: a server-side component that enables caching based on consistent hashing. You can disable this component based on your requirements. After you disable this component, distributed read-only caches are disabled. Other features are not affected.

    • EFC FUSE: a client-side component of EFC that exposes data access interfaces through the POSIX protocol.

Procedure

Step 1: Install ack-fluid

Install the cloud-native AI suite and ack-fluid 0.9.10 or later.

Important

If you have already installed open source Fluid, uninstall Fluid and deploy the ack-fluid component.
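
If you are not sure whether open source Fluid is installed, the following commands are a minimal sketch for checking and removing it. The release name fluid and the fluid-system namespace are assumptions based on a default Helm installation; adjust them to match your environment.

    # List Helm releases in all namespaces and look for an open source Fluid release.
    helm list -A | grep fluid

    # Uninstall the open source Fluid release before you deploy ack-fluid.
    helm uninstall fluid -n fluid-system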

Method 1: Install ack-fluid when the cloud-native AI suite is not installed

You can enable Fluid data acceleration when you install the cloud-native AI suite. For more information, see Deploy the cloud-native AI suite.

Method 2: Install ack-fluid when the cloud-native AI suite is installed

  1. Log on to the ACK console and click Clusters in the left-side navigation pane.

  2. On the Clusters page, click the name of a cluster and choose Applications > Cloud-native AI Suite in the left-side navigation pane.

  3. On the Cloud-native AI Suite page, find ack-fluid and click Deploy in the Actions column.

  4. In the Install Component message, click Confirm.

Method 3: Update ack-fluid to 0.9.10 or later

  1. Log on to the ACK console and click Clusters in the left-side navigation pane.

  2. On the Clusters page, click the name of a cluster and choose Applications > Cloud-native AI Suite in the left-side navigation pane.

  3. On the Cloud-native AI Suite page, find ack-fluid and click Upgrade in the Actions column.

  4. In the Upgrade Component message, click Confirm.
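
After you install or upgrade the component, you can verify that Fluid is ready. The following commands are a minimal sketch; the fluid-system namespace is an assumption based on the default Fluid deployment.

    # Confirm that the Fluid CRDs, including Dataset and EFCRuntime, are registered.
    kubectl get crd | grep data.fluid.io

    # Confirm that the Fluid controllers are running. The namespace is an assumption.
    kubectl get pods -n fluid-system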

Step 2: Write data to the NAS file system

Note

If data is already stored in the NAS file system, you can skip this step.

  1. Mount the NAS file system to an ECS instance. For more information, see Mount a file system to an ECS instance by using the NAS console. An example mount command is shown at the end of this step.

  2. Run the following command to query the mount target of the NAS file system:

    findmnt /mnt/nfs

    Expected output:

    TARGET   SOURCE                                         FSTYPE OPTIONS
    /mnt/nfs xxxxxxxxxxx-xxxxx.cn-beijing.nas.aliyuncs.com:/ nfs    rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,nolock,no
  3. Run the following command to create a file of 10 GB in size in the mount directory of the NAS file system:

    dd if=/dev/zero of=/mnt/nfs/allzero-demo count=1024 bs=10M

    Expected output:

    1024+0 records in
    1024+0 records out
    10737418240 bytes (11 GB) copied, 50.9437 s, 211 MB/s
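
For reference, step 1 typically mounts the NAS file system with an NFS v3 mount command similar to the following sketch. The mount target URL and the /mnt/nfs directory are placeholders; follow the linked NAS documentation for the exact command and options recommended for your file system.

    # Create the local mount directory and mount the NAS file system (placeholder mount target).
    sudo mkdir -p /mnt/nfs
    sudo mount -t nfs -o vers=3,nolock,proto=tcp,noresvport xxxxxxxxxxx-xxxxx.cn-beijing.nas.aliyuncs.com:/ /mnt/nfs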

Step 3: Create a Dataset and an EFCRuntime

  1. Create a file named dataset.yaml and copy the following content to the file:

    Sample template for a NAS file system:

    apiVersion: data.fluid.io/v1alpha1
    kind: Dataset
    metadata:
      name: efc-demo
    spec:
      mounts:
        - mountPoint: "nfs://<nas_url>:<nas_dir>"
          name: efc
          path: "/"
    ---
    apiVersion: data.fluid.io/v1alpha1
    kind: EFCRuntime
    metadata:
      name: efc-demo
    spec:
      replicas: 3
      master:
        networkMode: ContainerNetwork
      worker:
        networkMode: ContainerNetwork
      fuse:
        networkMode: ContainerNetwork
      tieredstore:
        levels:
          - mediumtype: MEM
            path: /dev/shm
            quota: 15Gi

    The dataset.yaml file is used to create a Dataset and an EFCRuntime.

    • The Dataset specifies information about the NAS file system, such as the URL of the NAS file system and the directory that you want to mount.

    • The EFCRuntime launches an EFC caching system to provide caching services. You can specify the number of replicated pods for the worker component of the EFC caching system and the cache capacity of each worker component.

    The following list describes the parameters:

    • placement: Valid values are Shared and Exclusive. A sketch of where this parameter can be set is shown after this list.

      • Shared: allows multiple pods of the EFC Cache Worker component to run on one ECS instance. This way, the EFCRuntime can fully utilize the bandwidth of the ECS instance and accelerate data access. We recommend that you set the value to Shared.

      • Exclusive: ensures that an ECS instance runs at most one pod of the EFC Cache Worker component.

    • mountPoint: If you use a NAS file system, set the value in the nfs://<nas_url>:<nas_dir> format.

      • nas_url: the URL of the NAS file system. Log on to the NAS console and choose File System > File System List in the left-side navigation pane. On the File System List page, find the NAS file system that you want to mount and click Manage in the Actions column. On the page that appears, click Mount Targets to obtain the URL of the NAS file system. For more information, see Manage mount targets.

      • nas_dir: the subdirectory that you want to mount. In most cases, you can set the value to the root directory. For example, a value of nfs://xxxxxxxxxxx-xxxxx.cn-beijing.nas.aliyuncs.com:/ specifies the root directory of a NAS file system.

    • replicas: the number of replicated pods that are created for the worker component of the EFC caching system. Set the value based on the memory size of the compute nodes and the size of the dataset. We recommend that you ensure that the product of quota and replicas is greater than the size of the dataset. In the sample template, quota is 15Gi and replicas is 3, which provides 45 Gi of cache in total, larger than the 10 GB test file written in Step 2.

    • networkMode: Valid values are ContainerNetwork and HostNetwork. In the ACK environment, we recommend that you set the value to ContainerNetwork. This network mode does not compromise network performance.

    • mediumtype: the cache type. Valid values are HDD, SSD, and MEM. A value of MEM indicates memory. In AI training scenarios, we recommend that you set the value to MEM. If you set the value to MEM, you must set the path parameter to a memory file system, such as tmpfs.

    • path: the cache directory in the pods of the worker component of the EFC caching system. We recommend that you set the value to /dev/shm.

    • quota: the cache capacity of each worker pod. Set the value based on the memory size of the compute nodes and the size of the dataset. We recommend that you ensure that the product of quota and replicas is greater than the size of the dataset.
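
    The placement parameter does not appear in the sample template above. The following fragment is a minimal sketch of where it can be set; it assumes that placement is a field of the Dataset spec in the Fluid version you deploy, so verify the field before you use it.

    apiVersion: data.fluid.io/v1alpha1
    kind: Dataset
    metadata:
      name: efc-demo
    spec:
      # Assumption: placement is set in the Dataset spec; Shared is the recommended value.
      placement: "Shared"
      mounts:
        - mountPoint: "nfs://<nas_url>:<nas_dir>"
          name: efc
          path: "/"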

  2. Run the following command to create an EFCRuntime and a Dataset:

    kubectl create -f dataset.yaml
  3. Run the following command to check whether the Dataset is deployed:

    kubectl get dataset efc-demo

    Expected output:

    NAME       UFS TOTAL SIZE   CACHED   CACHE CAPACITY   CACHED PERCENTAGE   PHASE   AGE
    efc-demo                                                                  Bound   24m

    The Dataset is in the Bound state. This indicates that the EFC caching system runs as expected in the cluster and application pods can access the data provided by the Dataset.

  4. Run the following command to check whether the EFCRuntime is deployed:

    kubectl get efcruntime

    Expected output:

    NAME       MASTER PHASE   WORKER PHASE   FUSE PHASE   AGE
    efc-demo   Ready          Ready          Ready        27m

    The output shows that the master, worker, and FUSE components are in the Ready state.

  5. Run the following command to check whether the persistent volume (PV) and the PVC are created:

    After the Dataset and the EFC caching system are ready, Fluid automatically creates a PVC and a PV.

    kubectl get pv,pvc

    Expected output:

    NAME                                CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                    STORAGECLASS   REASON   AGE
    persistentvolume/default-efc-demo   100Gi      ROX            Retain           Bound    default/efc-demo         fluid                   94m
    
    NAME                                   STATUS   VOLUME             CAPACITY   ACCESS MODES   STORAGECLASS   AGE
    persistentvolumeclaim/efc-demo         Bound    default-efc-demo   100Gi      ROX            fluid          94m
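
    You can also list the pods that back the EFC caching system. The following command is a minimal sketch; it assumes that Fluid prefixes the names of the worker and FUSE pods with the runtime name, which is efc-demo in this example.

    kubectl get pods -o wide | grep efc-demo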

Step 4: Create an application to access data

Create an application to check whether access to data is accelerated. In this example, an application that provisions two pods is created and used to access the NAS file system multiple times from two nodes. You can evaluate the acceleration performance of Fluid EFCRuntime based on the time required for accessing data.

  1. Create a file named app.yaml and copy the following content to the file:

    The following content defines a StatefulSet named efc-app. The StatefulSet provisions two pods, each of which has the efc-demo PVC mounted to the /data directory.

    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: efc-app
      labels:
        app: nginx
    spec:
      serviceName: nginx
      replicas: 2
      selector:
        matchLabels:
          app: nginx
      template:
        metadata:
          labels:
            app: nginx
        spec:
          containers:
          - name: nginx
            image: nginx
            command: ["/bin/bash"]
            args: ["-c", "sleep inf"]
            volumeMounts:
            - mountPath: "/data"
              name: data-vol
          volumes:
            - name: data-vol
              persistentVolumeClaim:
                claimName: efc-demo
  2. Run the following command to create a StatefulSet named efc-app:

    kubectl create -f app.yaml
  3. Run the following command to query the size of the specified file:

    kubectl exec -it efc-app-0 -- du -h /data/allzero-demo

    Expected output:

    10G     /data/allzero-demo
  4. Query the time required for reading the specified file from the application.

    Note

    The time and throughput may vary based on the runtime environment and measuring method. In this topic, the cluster has three ECS instances of the ecs.g7ne.8xlarge type. The efc-demo EFCRuntime has three worker pods that run on the same ECS instance. The efc-app StatefulSet has two pods that separately run on the other two ECS instances. The time required for data access is not affected by the kernel cache of the node on which the EFC FUSE client runs.

    1. Run the following command to check the time required for reading the specified file from the efc-app-0 pod of the StatefulSet:

      Note

      If you want to read another file, replace /data/allzero-demo with the path of the file.

      kubectl exec -it efc-app-0 -- bash -c "time cat /data/allzero-demo > /dev/null"

      Expected output:

      real    0m15.792s
      user    0m0.023s
      sys     0m2.404s

      The output shows that 15.792 seconds is required for reading a file of 10 GB in size and the read speed is 648 MiB/s.

    2. Run the following command to check the time required for reading the same file of 10 GB in size from the other pod of the StatefulSet:

      Note

      If you want to read another file, replace /data/allzero-demo with the path of the file.

      kubectl exec -it efc-app-1 -- bash -c "time cat /data/allzero-demo > /dev/null"

      Expected output:

      real    0m9.970s
      user    0m0.012s
      sys     0m2.283s

      The output shows that 9.970 seconds is required for reading a file of 10 GB in size and the read speed is 1,034.3 MiB/s.

    With Fluid EFCRuntime, the read speed increases from 648 MiB/s for the first, uncached read to 1,034.3 MiB/s for the subsequent cached read, an increase of about 60%.
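
After you complete the test, you can release the cache memory and the pods by deleting the resources created in this example. The following commands are a minimal sketch; they assume that Fluid removes the cache worker pods, the FUSE pods, and the automatically created PV and PVC when the Dataset and EFCRuntime are deleted.

    # Delete the test application first so that the PVC is no longer mounted.
    kubectl delete -f app.yaml

    # Delete the Dataset and EFCRuntime created from dataset.yaml.
    kubectl delete -f dataset.yaml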