Kubernetes Extended Resource and Device Plugin Modules

Recent reports on Kubernetes GPU scheduling and operation mechanisms suggest that the Kubernetes community will deprecate the traditional alpha.kubernetes.io/nvidia-gpu resource in version 1.11 and completely remove the GPU-related scheduling and deployment code from the main code base.

Instead, two built-in Kubernetes modules, Extended Resource and Device Plugin, will work together with the device plugins developed by device vendors to schedule workloads across the cluster onto worker nodes that have the required devices, and then bind those devices to containers.

Let me briefly introduce the two modules of Kubernetes:

Extended Resource:
This is a way to define custom resources. The names and total quantities of the resources must be reported to the API server. The scheduler decreases or increases the available quantity as pods that use the resource are created or deleted, and during scheduling it determines which nodes satisfy the resource requirements. Extended Resources can only be incremented and decremented in integer steps. For example, you can allocate 1 GPU but not 0.5 GPUs. This feature is stable in version 1.8 because it only replaces Opaque Integer Resources and changes some names. Now that the keyword "integer" has been dropped from the name, will it become possible to allocate 0.5 GPUs in the future?
Device Plugin:
It provides a general device plugin mechanism and a standard device API. Device vendors can add support for devices such as GPUs, FPGAs, high-performance NICs, and InfiniBand by implementing the API, without modifying the Kubelet main code. This feature is alpha in Kubernetes 1.8 and 1.9 and will become beta in Kubernetes 1.10. Because the feature is still new, it must be enabled by setting --feature-gates=DevicePlugins=true.
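
To make the integer accounting of Extended Resource concrete, here is a minimal Python sketch. It is not scheduler code: Node, valid_quantity, and schedule are hypothetical names, and the first-fit choice is a simplification of the real scheduler, which filters and scores nodes on many other criteria.

```python
def valid_quantity(amount):
    """Extended resources are counted in whole units, so 0.5 is rejected."""
    return isinstance(amount, int) and not isinstance(amount, bool) and amount >= 0


class Node:
    """Toy node that tracks extended resources (hypothetical model)."""

    def __init__(self, name, capacity):
        self.name = name
        self.capacity = dict(capacity)   # e.g. {"nvidia.com/gpu": 2}
        self.allocated = {}

    def available(self, resource):
        return self.capacity.get(resource, 0) - self.allocated.get(resource, 0)

    def fits(self, requests):
        # Every requested quantity must be a whole number that is still available.
        return all(valid_quantity(n) and n <= self.available(r)
                   for r, n in requests.items())

    def bind(self, requests):
        # Creating a pod decreases the available quantity.
        for r, n in requests.items():
            self.allocated[r] = self.allocated.get(r, 0) + n

    def release(self, requests):
        # Deleting a pod gives the quantity back.
        for r, n in requests.items():
            self.allocated[r] = self.allocated.get(r, 0) - n


def schedule(nodes, requests):
    """Bind the pod to the first node that fits and return it, else None."""
    for node in nodes:
        if node.fits(requests):
            node.bind(requests)
            return node
    return None
```

With two nodes that hold 1 and 2 GPUs, a request for 2 GPUs can only land on the second node, and a request for 0.5 GPUs fits nowhere because the quantity is not an integer.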

Device Plugin Design

API Design

A Device Plugin is actually a simple gRPC server that implements two methods, ListAndWatch and Allocate, and listens on a Unix socket under /var/lib/kubelet/device-plugins/, such as /var/lib/kubelet/device-plugins/nvidia.sock.

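service DevicePlugin {
    // returns a stream of []Device; a new list is sent whenever device state changes
    rpc ListAndWatch(Empty) returns (stream ListAndWatchResponse) {}
    // called when Kubelet creates a container that uses a device
    rpc Allocate(AllocateRequest) returns (AllocateResponse) {}
}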

Of these methods:

ListAndWatch: Kubelet calls this API to discover devices and to receive device status updates (for example, when a device becomes unhealthy).
Allocate: When Kubelet creates a container that uses a device, it calls this API so that the plugin can perform device-specific operations and return the device, volume, and environment variable configurations required for container initialization.
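
The behavior of the two methods can be modeled in a few lines of Python. This is only an in-memory simulation under stated assumptions: it does not use gRPC, ToyDevicePlugin and its method names are hypothetical, and the device paths and the EXAMPLE_VISIBLE_DEVICES variable are made up for illustration rather than taken from any real plugin.

```python
import queue
from dataclasses import dataclass

HEALTHY = "Healthy"
UNHEALTHY = "Unhealthy"


@dataclass(frozen=True)
class Device:
    id: str
    health: str


class ToyDevicePlugin:
    """In-memory model of ListAndWatch and Allocate (no gRPC, no hardware)."""

    def __init__(self, device_ids):
        self._health = {dev_id: HEALTHY for dev_id in device_ids}
        self._changes = queue.Queue()

    def _snapshot(self):
        return [Device(i, h) for i, h in sorted(self._health.items())]

    def mark(self, device_id, health):
        """Record a health change and push a fresh list to the stream."""
        self._health[device_id] = health
        self._changes.put(self._snapshot())

    def stop(self):
        """End the stream."""
        self._changes.put(None)

    def list_and_watch(self):
        """Yield the full device list first, then a new list after every change."""
        yield self._snapshot()
        while True:
            update = self._changes.get()   # blocks until something changes
            if update is None:
                return
            yield update

    def allocate(self, device_ids):
        """Return what the container needs in order to use the devices."""
        for dev_id in device_ids:
            if self._health.get(dev_id) != HEALTHY:
                raise ValueError(f"device {dev_id} is unknown or unhealthy")
        return {
            # Hypothetical device nodes and environment variable.
            "devices": [f"/dev/example{dev_id}" for dev_id in device_ids],
            "envs": {"EXAMPLE_VISIBLE_DEVICES": ",".join(device_ids)},
            "mounts": [],
        }
```

The generator mirrors the streaming nature of ListAndWatch: the consumer reads an initial list and then receives a complete new list, not a diff, each time a device changes state.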

Plugin Lifecycle Management

When the plugin starts, it registers with Kubelet over gRPC through /var/lib/kubelet/device-plugins/kubelet.sock and provides the Unix socket that the plugin listens on, the API version, and the device name (for example, nvidia.com/gpu). Kubelet exposes the devices in the Node status and reports them to the API server as Extended Resources. The scheduler then schedules pods based on this information.
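
The registration data described above is small enough to sketch as a Python data class. The field names follow the three items listed in the text; the validate and node_capacity helpers are hypothetical, the checks are deliberately simplified, and the version string used in the usage note is only an example value.

```python
from dataclasses import dataclass

DEVICE_PLUGIN_DIR = "/var/lib/kubelet/device-plugins/"
KUBELET_SOCKET = DEVICE_PLUGIN_DIR + "kubelet.sock"


@dataclass(frozen=True)
class RegisterRequest:
    version: str        # device plugin API version the plugin was built against
    endpoint: str       # file name of the plugin's socket inside DEVICE_PLUGIN_DIR
    resource_name: str  # extended resource name, e.g. "nvidia.com/gpu"


def validate(request):
    """Return a list of problems; an empty list means the request looks well formed."""
    problems = []
    if not request.version:
        problems.append("version is empty")
    if not request.endpoint or "/" in request.endpoint:
        problems.append("endpoint must be a bare socket file name")
    domain, _, name = request.resource_name.partition("/")
    if not domain or not name or "." not in domain:
        problems.append("resource_name must look like vendor-domain/resource")
    return problems


def node_capacity(request, healthy_device_count):
    """Simplified view of what Kubelet would expose in the Node status."""
    return {request.resource_name: healthy_device_count}
```

For example, RegisterRequest("v1beta1", "nvidia.sock", "nvidia.com/gpu") passes validation, and with four healthy devices the node would advertise {"nvidia.com/gpu": 4}.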

After the plugin starts, Kubelet establishes a persistent ListAndWatch connection to the plugin. When the plugin detects an unhealthy device, it notifies Kubelet over this connection. If the device is idle, Kubelet removes it from the allocatable list; if the device is used by a pod, Kubelet kills the pod.
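
The two rules for an unhealthy device can be written down as a small Python function. This is a sketch of the decision only, not Kubelet code: handle_unhealthy and its idle and in_use arguments are hypothetical names.

```python
def handle_unhealthy(device_id, idle, in_use):
    """Apply the unhealthy-device rules to one device.

    idle:   set of device IDs that are allocatable and not assigned to a pod.
    in_use: dict mapping device ID -> name of the pod that uses it.
    Returns the name of the pod that has to be killed, or None.
    """
    if device_id in idle:
        # Idle device: just stop offering it.
        idle.remove(device_id)
        return None
    # Device in use: the pod that holds it must be killed.
    return in_use.pop(device_id, None)
```

An unknown device ID falls through to the last line and returns None, so the function is safe to call for devices that were never registered.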

The plugin monitors the Kubelet status by using the Kubelet socket. If Kubelet restarts, the plugin also restarts and registers with Kubelet again.
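
One way a plugin might notice a Kubelet restart is to watch whether the Kubelet socket disappears and comes back. The Python sketch below polls for that; KubeletSocketWatcher is a hypothetical helper, and polling is an assumption made for simplicity. A very fast restart between two polls would be missed, which an event-based file watcher would avoid.

```python
import os

KUBELET_SOCKET = "/var/lib/kubelet/device-plugins/kubelet.sock"


class KubeletSocketWatcher:
    """Reports when the Kubelet socket reappears after having been missing."""

    def __init__(self, path=KUBELET_SOCKET, exists=os.path.exists):
        self.path = path
        self._exists = exists            # injectable so the logic can be tested
        self._seen_missing = not exists(path)

    def restarted(self):
        """Return True once each time the socket comes back after being gone."""
        if not self._exists(self.path):
            self._seen_missing = True
            return False
        if self._seen_missing:
            self._seen_missing = False
            return True                  # time to register with Kubelet again
        return False
```

A plugin would call restarted() periodically from its main loop and, whenever it returns True, recreate its own socket and register again.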

Deployment Methods

A device plugin can be deployed either as a DaemonSet or as a non-containerized process on the node. The Kubernetes project officially recommends the DaemonSet deployment.
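
As an illustration of the DaemonSet option, the following Python sketch builds a manifest that runs one plugin pod per node and mounts the Kubelet device plugin directory from the host. The function name, the kube-system namespace, and any plugin name or image passed in are assumptions for illustration; apps/v1 assumes Kubernetes 1.9 or later.

```python
import json

DEVICE_PLUGIN_DIR = "/var/lib/kubelet/device-plugins"


def device_plugin_daemonset(name, image):
    """Build a DaemonSet manifest (as a dict) for a device plugin."""
    labels = {"name": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": name, "namespace": "kube-system"},
        "spec": {
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": name,
                        "image": image,
                        # The plugin creates its socket inside the Kubelet directory.
                        "volumeMounts": [{
                            "name": "device-plugin",
                            "mountPath": DEVICE_PLUGIN_DIR,
                        }],
                    }],
                    "volumes": [{
                        "name": "device-plugin",
                        "hostPath": {"path": DEVICE_PLUGIN_DIR},
                    }],
                },
            },
        },
    }


def to_json(manifest):
    """Serialize the manifest; kubectl accepts JSON as well as YAML."""
    return json.dumps(manifest, indent=2)
```

Calling to_json(device_plugin_daemonset("example-device-plugin", "example.com/device-plugin:latest")) with these placeholder values produces a document that could be piped to kubectl apply -f -.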

Implementation Example

NVIDIA Official GPU Plugin
NVIDIA provides a user-friendly GPU device plugin, NVIDIA/k8s-device-plugin, that is based on the Device Plugin interface. Unlike with the traditional alpha.kubernetes.io/nvidia-gpu, you do not need to use volumes to specify the libraries required by CUDA. A pod simply requests GPUs through the nvidia.com/gpu resource limit:

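apiVersion: apps/v1
kind: Deployment
metadata:
  name: tf-notebook
  labels:
    app: tf-notebook
spec:
  # apps/v1 requires a selector that matches the pod template labels
  selector:
    matchLabels:
      app: tf-notebook
  template: # define the pods specifications
    metadata:
      labels:
        app: tf-notebook
    spec:
      containers:
      - name: tf-notebook
        image: tensorflow/tensorflow:1.4.1-gpu-py3
        resources:
          limits:
            nvidia.com/gpu: 1 # one whole GPU, exposed by the device plugin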

Conclusion

Now that Kubernetes has established its position in the ecosystem, extensibility will be its main battlefield, and heterogeneous computing is an important new front. Heterogeneous computing requires powerful computing capability and high-performance networks, so Kubernetes needs to integrate with high-performance hardware such as GPUs, FPGAs, NICs, and InfiniBand in a unified manner. The Kubernetes Device Plugin mechanism is simple, elegant, and still evolving. Alibaba Cloud Container Service will launch Kubernetes 1.9.3 GPU clusters based on the Device Plugin.

Alibaba Clouder