KServe is a Kubernetes-based framework for serving machine learning models. It lets you deploy trained models, served by inference servers such as TFServing, TorchServe, or Triton, as Kubernetes custom resources defined by CustomResourceDefinitions (CRDs). This simplifies and accelerates deploying, updating, and scaling models. The core component, the KServe controller, can be installed through the ACS console.
How KServe works
The KServe controller manages InferenceService custom resources and creates Knative Services to automate resource scaling.
When traffic increases, the Deployment behind the Knative Service is scaled out accordingly. When no requests are received, the Service's pods are scaled to zero. Autoscaling in this way keeps resource consumption in line with actual demand and reduces waste.
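As an illustration of the custom resource that the controller watches, the following minimal sketch builds an InferenceService manifest as a Python dictionary and prints it as JSON, which `kubectl apply -f` accepts alongside YAML. The service name `sklearn-iris`, the model format `sklearn`, and the storage URI are hypothetical placeholders, not values from this document. Setting `minReplicas` to 0 is assumed here as the way to permit scale to zero.

```python
import json


def inference_service(name, model_format, storage_uri, min_replicas=0):
    """Build an InferenceService manifest as a plain dict.

    min_replicas=0 allows the predictor to scale to zero when idle.
    """
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name},
        "spec": {
            "predictor": {
                "minReplicas": min_replicas,
                "model": {
                    "modelFormat": {"name": model_format},
                    "storageUri": storage_uri,
                },
            }
        },
    }


# Hypothetical example values; replace them with your own model details.
manifest = inference_service(
    name="sklearn-iris",
    model_format="sklearn",
    storage_uri="gs://example-bucket/models/iris",
)
print(json.dumps(manifest, indent=2))
```

Saving the printed JSON to a file and applying it with `kubectl apply -f <file>` is one way to create the resource. Check the field names against the KServe version installed in your cluster.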
Model serving runtimes
KServe includes two built-in model serving runtimes:
| Runtime | Description |
|---|---|
| ModelServer | A Python-based runtime that implements KServe prediction protocol v1 |
| MLServer | A runtime that implements KServe prediction protocol v2 with both REST and gRPC support |
Both runtimes provide out-of-the-box model serving. For more complex use cases, build a custom model server using KServe's API primitives or tools such as BentoML.
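The two prediction protocols differ mainly in URL path and request body. The sketch below builds both request shapes for comparison, without sending anything over the network. The model name `sklearn-iris`, the tensor name `input-0`, and the sample values are hypothetical. The paths and body layout reflect the commonly documented v1 and v2 REST conventions, so verify them against the runtime you deploy.

```python
import json


def v1_request(model_name, instances):
    """Return the (path, body) pair for a v1 protocol predict call."""
    path = f"/v1/models/{model_name}:predict"
    body = {"instances": instances}
    return path, body


def v2_request(model_name, input_name, shape, datatype, data):
    """Return the (path, body) pair for a v2 protocol infer call.

    `data` is given as a flat list; `shape` describes its dimensions.
    """
    path = f"/v2/models/{model_name}/infer"
    body = {
        "inputs": [
            {
                "name": input_name,
                "shape": shape,
                "datatype": datatype,
                "data": data,
            }
        ]
    }
    return path, body


# Two hypothetical samples with four features each.
rows = [[6.8, 2.8, 4.8, 1.4], [6.0, 3.4, 4.5, 1.6]]
flat = [value for row in rows for value in row]

v1_path, v1_body = v1_request("sklearn-iris", rows)
v2_path, v2_body = v2_request("sklearn-iris", "input-0", [2, 4], "FP32", flat)

print(v1_path, json.dumps(v1_body))
print(v2_path, json.dumps(v2_body))
```

In v1 the samples are sent as a nested list under `instances`. In v2 each input tensor carries an explicit name, shape, and data type.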
Serverless features
After you deploy a model as an InferenceService on Knative, the following serverless features become available:
Scale to zero
Autoscaling based on requests per second (RPS), concurrency, or CPU and GPU metrics
Version management
Traffic management
Security authentication
Out-of-the-box metrics
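Several of these features are configured on the predictor section of an InferenceService. The sketch below shows how autoscaling and traffic splitting might be expressed, assuming the `scaleMetric`, `scaleTarget`, and `canaryTrafficPercent` fields of the v1beta1 API. The helper name `with_serverless_settings`, the metric list, and all sample values are illustrative assumptions, so confirm which fields your installed KServe version supports.

```python
import copy
import json

# Metric names assumed to be accepted by the scaleMetric field.
ASSUMED_SCALE_METRICS = {"concurrency", "rps", "cpu", "memory"}


def with_serverless_settings(manifest, scale_metric="concurrency",
                             scale_target=10, canary_percent=None):
    """Return a copy of the manifest with autoscaling and canary settings."""
    if scale_metric not in ASSUMED_SCALE_METRICS:
        raise ValueError(f"unsupported scale metric: {scale_metric}")
    if canary_percent is not None and not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")

    updated = copy.deepcopy(manifest)  # leave the caller's manifest untouched
    predictor = updated["spec"]["predictor"]
    predictor["scaleMetric"] = scale_metric
    predictor["scaleTarget"] = scale_target
    if canary_percent is not None:
        # Share of traffic routed to the newest revision.
        predictor["canaryTrafficPercent"] = canary_percent
    return updated


# Hypothetical base manifest; the name and storage URI are placeholders.
base = {
    "apiVersion": "serving.kserve.io/v1beta1",
    "kind": "InferenceService",
    "metadata": {"name": "sklearn-iris"},
    "spec": {
        "predictor": {
            "minReplicas": 0,
            "model": {
                "modelFormat": {"name": "sklearn"},
                "storageUri": "gs://example-bucket/models/iris",
            },
        }
    },
}

tuned = with_serverless_settings(base, scale_metric="rps",
                                 scale_target=50, canary_percent=10)
print(json.dumps(tuned, indent=2))
```

Here the predictor is asked to scale on requests per second with a target of 50, while 10 percent of traffic goes to the latest revision.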
Deploy KServe
Prerequisites
Before you begin, ensure that you have:
Knative deployed in your ACS cluster. For more information, see Deploy Knative.
Procedure
Log on to the ACS console. In the left-side navigation pane, click Clusters.
On the Clusters page, find the target cluster and click its ID. In the left-side navigation pane, choose Applications > Knative.
On the Components tab, find KServe and click Deploy in the Actions column. Click Confirm in the dialog box. The deployment may take several minutes to complete.
Verify the deployment
After the deployment completes, check the Status column of the KServe component on the Components tab. A status of Deployed confirms that the component is installed.