Non-volatile memory (NVM) is a type of persistent memory (PMEM) product provided by Intel. You can use NVM to expand memory capacity at lower cost and access persistent data with lower latency. NVM combines the benefits of memory and storage products. This topic describes how to use NVM volumes in Container Service for Kubernetes (ACK) clusters and provides examples.

Background information

PMEM is high-performance memory that supports data persistence. PMEM resides on the memory bus and allows you to access data in the same way as dynamic random access memory (DRAM). PMEM provides almost the same speed and latency as DRAM together with the non-volatility of NAND flash. PMEM provides the following benefits:

  • Lower latency than flash SSDs when you access data.
  • Higher throughput than flash storage.
  • Lower costs than DRAM.
  • Data caching in the CPU. This resolves the issue that data transmitted over Peripheral Component Interconnect Express (PCIe) cannot be cached by the CPU.
  • Real-time access to data and ultra-high-speed access to large datasets.
  • Data is retained in memory after the machine is powered off. This provides the same benefit as flash memory.

The re6p Elastic Compute Service (ECS) instance family supports the first generation of PMEM, and the re7p ECS instance family supports the second generation of PMEM.

How to use NVM volumes

You can use the Container Storage Interface (CSI) driver that is provided by Alibaba Cloud to manage the lifecycle of NVM devices in ACK clusters. This allows you to allocate, mount, and use NVM resources by using declarative claims.

You can use NVM volumes in ACK clusters by using one of the following methods:
  • PMEM-LVM (use NVM as non-intrusive block storage)

    You can directly claim NVM resources without modifying your applications. Logical Volume Manager (LVM) virtualizes the PMEM resources on a node into volume groups (VGs), from which you can create persistent volume claims (PVCs) of the required type and capacity. This method suits serverless applications, low-latency and high-throughput data computing applications, and CI/CD applications that require high-speed temporary storage, and can improve I/O throughput by 2 to 10 times. For more examples, see Use AEP non-volatile memory to improve read and write performance.

  • PMEM-direct memory

    You can use PMEM as direct memory by making minor modifications to the memory allocation functions of your application. This allows you to access data in almost the same way as with DRAM. This way, you can provision TB-level NVM resources as direct memory and reduce costs by 30% to 50%, which meets the requirements of in-memory databases such as Redis and SAP HANA for large, cost-effective memory. For more examples, see Deploy a Redis instance that has an NVM volume mounted as direct memory.

Note
  • PMEM-LVM: NVM resources can be used as block storage or file systems in ACK clusters without intrusion into or modification of your applications. The I/O throughput is 2 to 10 times that of SSDs.
  • PMEM-direct memory: NVM resources can be used as direct memory in ACK clusters. You must modify your applications so that they adapt to the memory allocation logic of the PMEM SDK. This offers throughput and latency comparable to DRAM.
Table 1. Comparison between PMEM and SSD

| Method      | Supports fragmented storage | Supports online expansion | Supports memory persistence | Requires application modification | Latency (4K/RW) | Throughput (4K/RW) | Maximum capacity of a single ECS instance (ecs.ebmre6p.26xlarge) |
| ----------- | --------------------------- | ------------------------- | --------------------------- | --------------------------------- | --------------- | ------------------ | ---------------------------------------------------------------- |
| PMEM-LVM    | No                          | Yes                       | Yes                         | No                                | 10 us           | 100,000 IOPS       | 1,536 GB                                                         |
| PMEM-Direct | Yes                         | No                        | No                          | Yes                               | 1.2 us          | 560,000 IOPS       | 768 GB                                                           |
| SSD         | No                          | Yes                       | Yes                         | No                                | 100 us          | 10,000 IOPS        | 32 TB                                                            |

Deploy CSI components

To use NVM in ACK clusters, you must deploy the following components:
  • CSI-Plugin: initializes PMEM devices and creates, deletes, mounts, and unmounts volumes.
  • CSI-Provisioner: watches for volume claims and initiates volume creation and deletion requests.
  • CSI-Scheduler: schedules storage. (The ACK scheduler is a preinstalled component.)
Note
When you deploy CSI-Plugin, take note of the following limits:
  • To enable automatic O&M for NVM devices, you must add a pmem.csi.alibabacloud.com/type label to the node that uses NVM.
  • To use the PMEM-LVM method, you must add the pmem.csi.alibabacloud.com/type: lvm label to the node that uses NVM.
  • To use the PMEM-direct memory method, you must add the pmem.csi.alibabacloud.com/type: direct label to the node that uses NVM.
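For reference, after you add one of these labels, the node object carries it in metadata.labels. The following is a minimal sketch of the relevant node fields only; the node name is hypothetical:

```yaml
# Sketch of the relevant node fields after labeling (node name is hypothetical).
apiVersion: v1
kind: Node
metadata:
  name: cn-zhangjiakou.192.168.XX.XX
  labels:
    # Use "direct" instead of "lvm" for the PMEM-direct memory method.
    pmem.csi.alibabacloud.com/type: lvm
```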
  1. Create an ACK cluster.
    Create an ACK cluster that contains ECS instances with PMEM resources. For example, create an ACK cluster that contains ECS instances of ecs.ebmre6p.26xlarge. For more information, see Create an ACK managed cluster.
  2. Configure the node to use PMEM resources.
    To ensure that the CSI plug-in works as expected, you must add the required label to the node.
    To use the PMEM-direct memory method, add the following label to the node:
    pmem.csi.alibabacloud.com/type: direct
    To use the PMEM-LVM method, add the following label instead:
    pmem.csi.alibabacloud.com/type: lvm
    For example, you can run kubectl label nodes <node-name> pmem.csi.alibabacloud.com/type=lvm to add the label.
  3. Deploy the CSI plug-in for PMEM with the following template. The template defines the CSIDriver object, the plug-in DaemonSet, the provisioner Deployment, and the StorageClasses:
    apiVersion: storage.k8s.io/v1
    kind: CSIDriver
    metadata:
      name: localplugin.csi.alibabacloud.com
    spec:
      attachRequired: false
      podInfoOnMount: true
    ---
    kind: DaemonSet
    apiVersion: apps/v1
    metadata:
      name: csi-local-plugin
      namespace: kube-system
    spec:
      selector:
        matchLabels:
          app: csi-local-plugin
      template:
        metadata:
          labels:
            app: csi-local-plugin
        spec:
          tolerations:
            - operator: Exists
          serviceAccount: admin
          priorityClassName: system-node-critical
          hostNetwork: true
          hostPID: true
          containers:
            - name: driver-registrar
              image: registry.cn-hangzhou.aliyuncs.com/acs/csi-node-driver-registrar:v1.3.0-6e9fff3-aliyun
              imagePullPolicy: Always
              args:
                - "--v=5"
                - "--csi-address=/csi/csi.sock"
                - "--kubelet-registration-path=/var/lib/kubelet/csi-plugins/localplugin.csi.alibabacloud.com/csi.sock"
              env:
                - name: KUBE_NODE_NAME
                  valueFrom:
                    fieldRef:
                      apiVersion: v1
                      fieldPath: spec.nodeName
              volumeMounts:
                - name: plugin-dir
                  mountPath: /csi
                - name: registration-dir
                  mountPath: /registration
    
            - name: csi-localplugin
              securityContext:
                privileged: true
                capabilities:
                  add: ["SYS_ADMIN"]
                allowPrivilegeEscalation: true
              image: registry.cn-hangzhou.aliyuncs.com/acs/csi-plugin:v1.20.6-2be29b1-aliyun 
              imagePullPolicy: "Always"
              args:
                - "--endpoint=$(CSI_ENDPOINT)"
                - "--v=5"
                - "--nodeid=$(KUBE_NODE_NAME)"
                - "--driver=localplugin.csi.alibabacloud.com"
              env:
                - name: KUBE_NODE_NAME
                  valueFrom:
                    fieldRef:
                      apiVersion: v1
                      fieldPath: spec.nodeName
                - name: CSI_ENDPOINT
                  value: unix://var/lib/kubelet/csi-plugins/localplugin.csi.alibabacloud.com/csi.sock
              volumeMounts:
                - name: pods-mount-dir
                  mountPath: /var/lib/kubelet
                  mountPropagation: "Bidirectional"
                - mountPath: /dev
                  mountPropagation: "HostToContainer"
                  name: host-dev
                - mountPath: /var/log/
                  name: host-log
          volumes:
            - name: plugin-dir
              hostPath:
                path: /var/lib/kubelet/csi-plugins/localplugin.csi.alibabacloud.com
                type: DirectoryOrCreate
            - name: registration-dir
              hostPath:
                path: /var/lib/kubelet/plugins_registry
                type: DirectoryOrCreate
            - name: pods-mount-dir
              hostPath:
                path: /var/lib/kubelet
                type: Directory
            - name: host-dev
              hostPath:
                path: /dev
            - name: host-log
              hostPath:
                path: /var/log/
      updateStrategy:
        rollingUpdate:
          maxUnavailable: 10%
        type: RollingUpdate
    ---
    kind: Deployment
    apiVersion: apps/v1
    metadata:
      name: csi-local-provisioner
      namespace: kube-system
    spec:
      selector:
        matchLabels:
          app: csi-local-provisioner
      replicas: 2
      template:
        metadata:
          labels:
            app: csi-local-provisioner
        spec:
          tolerations:
          - operator: "Exists"
          affinity:
            nodeAffinity:
              preferredDuringSchedulingIgnoredDuringExecution:
              - weight: 1
                preference:
                  matchExpressions:
                  - key: node-role.kubernetes.io/master
                    operator: Exists
          priorityClassName: system-node-critical
          serviceAccount: admin
          hostNetwork: true
          containers:
            - name: external-local-provisioner
              image: registry.cn-hangzhou.aliyuncs.com/acs/csi-provisioner:v1.6.0-b6f763a43-ack
              args:
                - "--csi-address=$(ADDRESS)"
                - "--feature-gates=Topology=True"
                - "--volume-name-prefix=disk"
                - "--strict-topology=true"
                - "--timeout=150s"
                - "--extra-create-metadata=true"
                - "--enable-leader-election=true"
                - "--leader-election-type=leases"
                - "--retry-interval-start=500ms"
                - "--v=5"
              env:
                - name: ADDRESS
                  value: /socketDir/csi.sock
              imagePullPolicy: "Always"
              volumeMounts:
                - name: socket-dir
                  mountPath: /socketDir
            - name: external-local-resizer
              image: registry.cn-hangzhou.aliyuncs.com/acs/csi-resizer:v0.3.0
              args:
                - "--v=5"
                - "--csi-address=$(ADDRESS)"
                - "--leader-election"
              env:
                - name: ADDRESS
                  value: /socketDir/csi.sock
              imagePullPolicy: "Always"
              volumeMounts:
                - name: socket-dir
                  mountPath: /socketDir/
          volumes:
            - name: socket-dir
              hostPath:
                path: /var/lib/kubelet/csi-plugins/localplugin.csi.alibabacloud.com
                type: DirectoryOrCreate
    ---
    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
        name: pmem-direct
    provisioner: localplugin.csi.alibabacloud.com
    mountOptions:
    - dax
    parameters:
        volumeType: PMEM
        pmemType: "direct"
    reclaimPolicy: Delete
    volumeBindingMode: WaitForFirstConsumer
    allowVolumeExpansion: true
    
    ---
    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
        name: pmem-lvm
    provisioner: localplugin.csi.alibabacloud.com
    mountOptions:
    - dax
    parameters:
        volumeType: PMEM
        nodeAffinity: "true"
        pmemType: "lvm"
    reclaimPolicy: Delete
    volumeBindingMode: WaitForFirstConsumer
    allowVolumeExpansion: true

Examples

Use NVM as block storage volumes

  1. Create a PVC with the following template:
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      annotations:
        volume.kubernetes.io/selected-node: cn-zhangjiakou.192.168.XX.XX
      name: pmem-lvm
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi
      storageClassName: pmem-lvm

    The volume.kubernetes.io/selected-node annotation schedules the PVC to a specific NVM node. Set the annotation value to the name of the node.

  2. Deploy a workload with the following template:
    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: sts-lvm
      labels:
        app: busybox-lvm
    spec:
      selector:
        matchLabels:
          app: busybox-lvm
      serviceName: "busybox"
      template:
        metadata:
          labels:
            app: busybox-lvm
        spec:
          containers:
          - name: busybox
            image: busybox
            command: ["sh", "-c"]
            args: ["sleep 10000"]
            volumeMounts:
              - name: pmem-pvc
                mountPath: "/data"
          volumes:
            - name: pmem-pvc
              persistentVolumeClaim:
                claimName: pmem-lvm
  3. View the results.
    • Run the following command to query the created PVC:
      kubectl get pvc 

      Expected output:

      NAME               STATUS    VOLUME     CAPACITY   ACCESS MODES   STORAGECLASS              AGE
      pmem-lvm           Bound    disk-****   10Gi       RWO            pmem-lvm                  10m
    • Run the following command to query the created pod:
      kubectl get pod

      Expected output:

      NAME                                READY   STATUS    RESTARTS   AGE
      sts-lvm-0                           1/1     Running   0          10m
  4. Run the following command to access the application and check the mount path of the volume:
    kubectl exec -ti sts-lvm-0 -- df /data

    Expected output:

    Filesystem                            1K-blocks  Used   Available Use% Mounted on
    /dev/mapper/pmemvgregion0-disk--****  10255636   36888  10202364  1%   /data

    The output shows that a block storage volume is created and mounted to the application pod.
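Because the pmem-lvm StorageClass sets allowVolumeExpansion: true, and Table 1 lists online expansion as supported for PMEM-LVM, you can expand the volume by raising the requested storage on the PVC. The following is a sketch of the edited PVC; the new size of 20Gi is an illustrative value:

```yaml
# Sketch: expand the pmem-lvm volume by increasing spec.resources.requests.storage.
# Apply the change with kubectl apply, or modify the live object with kubectl edit pvc pmem-lvm.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pmem-lvm
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi   # increased from 10Gi; example value
  storageClassName: pmem-lvm
```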

Use NVM as direct memory volumes

  1. Create a PVC with the following template:
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      annotations:
        volume.kubernetes.io/selected-node: cn-zhangjiakou.192.168.XX.XX
      name: pmem-direct
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 9Gi
      storageClassName: pmem-direct

    The volume.kubernetes.io/selected-node annotation schedules the PVC to a specific NVM node. Set the annotation value to the name of the node.

  2. Deploy a workload with the following template:
    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: sts-direct
      labels:
        app: busybox-direct
    spec:
      selector:
          matchLabels:
            app: busybox-direct
      serviceName: "busybox"
      template:
        metadata:
          labels:
            app: busybox-direct
        spec:
          containers:
          - name: busybox
            image: busybox
            command: ["sh", "-c"]
            args: ["sleep 1000"]
            volumeMounts:
              - name: pmem-pvc
                mountPath: "/data"
          volumes:
            - name: pmem-pvc
              persistentVolumeClaim:
                claimName: pmem-direct
  3. View the results.
    • Run the following command to query information about the PVC:
      kubectl get pvc pmem-direct

      Expected output:

      NAME          STATUS   VOLUME      CAPACITY   ACCESS MODES   STORAGECLASS   AGE
      pmem-direct   Bound    disk-****   9Gi        RWO            pmem-direct    17m
    • Run the following command to query the pod:
      kubectl get pod

      Expected output:

      NAME                                READY   STATUS    RESTARTS   AGE
      sts-direct-0                        1/1     Running   0          17m
  4. Run the following command to access the application and check the mount path of the volume:
    kubectl exec -ti sts-direct-0 -- df /data

    Expected output:

    Filesystem     1K-blocks  Used    Available  Use%  Mounted on
    /dev/pmem0     9076344    36888   9023072    1%    /data

    The output shows that a PMEM volume is created and mounted to the application pod.
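The examples above pre-create a PVC and reference it from the StatefulSet. Alternatively, a StatefulSet can create one PVC per replica through volumeClaimTemplates. The following is a minimal sketch that uses the same StorageClass as the direct memory example; names such as sts-direct-tpl are hypothetical:

```yaml
# Sketch: let the StatefulSet create a PMEM PVC per replica via volumeClaimTemplates.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: sts-direct-tpl          # hypothetical name
spec:
  selector:
    matchLabels:
      app: busybox-direct-tpl
  serviceName: "busybox"
  template:
    metadata:
      labels:
        app: busybox-direct-tpl
    spec:
      containers:
      - name: busybox
        image: busybox
        command: ["sh", "-c"]
        args: ["sleep 1000"]
        volumeMounts:
        - name: pmem-data       # matches the claim template below
          mountPath: "/data"
  volumeClaimTemplates:
  - metadata:
      name: pmem-data
    spec:
      accessModes:
      - ReadWriteOnce
      storageClassName: pmem-direct
      resources:
        requests:
          storage: 9Gi
```

With this pattern, each replica receives its own PVC named pmem-data-sts-direct-tpl-N, and the WaitForFirstConsumer binding mode in the StorageClass still ensures that each volume is provisioned on the node where its pod is scheduled.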