
Container Service for Kubernetes: Deploy KServe

Last Updated: Sep 13, 2023

This topic introduces KServe and describes how to deploy KServe.


Introduction to KServe

KServe is a Kubernetes-based machine learning model serving framework. KServe provides simple Kubernetes CustomResourceDefinitions (CRDs) that allow you to deploy one or more trained models onto model serving runtimes such as TFServing, TorchServe, and Triton Inference Server. ModelServer and MLServer are the two model serving runtimes that KServe uses to deploy and manage machine learning models out of the box. ModelServer is a Python-based model serving runtime that implements the KServe v1 prediction protocol. MLServer implements the KServe v2 prediction protocol over both REST and gRPC. For complex use cases, you can build custom model servers. KServe also provides basic API primitives that make it easy to build custom model serving runtimes, and you can use other tools such as BentoML to build custom model serving images.
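
As a concrete example, the following manifest sketches how a trained scikit-learn model could be deployed through the InferenceService CRD. It follows the upstream KServe scikit-learn sample; the resource name and storageUri are illustrative placeholders:

  apiVersion: serving.kserve.io/v1beta1
  kind: InferenceService
  metadata:
    name: sklearn-iris            # illustrative name
  spec:
    predictor:
      model:
        modelFormat:
          name: sklearn           # KServe matches this format to a serving runtime
        storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"  # sample model location

After you apply the manifest with kubectl apply -f, KServe selects a serving runtime that declares support for the sklearn model format.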

After you deploy models by using the InferenceService CRD on Knative, you can use the following serverless features provided by KServe. A configuration sketch that maps several of these features to InferenceService fields follows the list.

  • Scale to zero

  • Auto scaling based on requests per second (RPS), concurrency, and CPU and GPU metrics

  • Version management

  • Traffic management

  • Security authentication

  • Out-of-the-box metrics
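
The sketch below shows how scale-to-zero, request-based autoscaling, and traffic management surface as fields of the InferenceService spec. The field names come from the KServe v1beta1 API; the values and the model location are illustrative assumptions:

  apiVersion: serving.kserve.io/v1beta1
  kind: InferenceService
  metadata:
    name: sklearn-iris
  spec:
    predictor:
      minReplicas: 0             # scale to zero when the service is idle
      maxReplicas: 5             # upper bound for autoscaling
      scaleMetric: concurrency   # autoscale on concurrent in-flight requests
      scaleTarget: 10            # target concurrency per pod
      canaryTrafficPercent: 10   # traffic management: 10% of traffic to the latest revision
      model:
        modelFormat:
          name: sklearn
        storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"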

KServe Controller

The KServe controller is a key component of KServe. It manages the InferenceService custom resources and creates and deploys Knative Services to automate resource scaling. The KServe controller scales the Deployment that backs a Knative Service based on traffic volume. When no requests are sent to the Knative Service, the controller automatically scales the Service pods to zero. Autoscaling makes the use of model serving resources more efficient and avoids resource waste.
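
To observe this behavior, you can watch the pods that back an InferenceService and confirm that they terminate shortly after traffic stops. The command below assumes the standard serving.kserve.io/inferenceservice label that KServe adds to its pods; sklearn-iris is an illustrative service name:

  kubectl get pods -l serving.kserve.io/inferenceservice=sklearn-iris --watch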


Deploy KServe

  1. Log on to the ACK console. In the left-side navigation pane, click Clusters.

  2. On the Clusters page, click the name of the cluster that you want to manage and choose Applications > Knative in the left-side navigation pane.

  3. On the Components tab, find KServe and click Deploy. In the message that appears, click Confirm.

    If the Status column of the KServe component displays Deployed, the component is deployed.
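
You can also verify the deployment from the command line. The following commands assume that the component installs the standard KServe controller into the kserve namespace; a managed installation may use a different namespace:

  kubectl get pods -n kserve
  kubectl get crd inferenceservices.serving.kserve.io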

References

Quickly deploy an inference Service based on KServe