Container Service for Kubernetes:KServe overview

Last Updated:Feb 27, 2025

KServe is an open source, cloud-native model serving platform designed to simplify deploying and running machine learning models on Kubernetes. KServe supports multiple machine learning frameworks and provides auto scaling. Its declarative APIs let you deploy models by defining simple YAML configuration files, which makes model services easy to configure and manage.
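As an illustration of this declarative style, the sketch below defines an InferenceService for a scikit-learn model. The resource name is a placeholder, and the storage URI points to a sample model from the public KServe examples:

```yaml
# Minimal InferenceService manifest (illustrative; name and storageUri are placeholders).
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      # Sample model from the public KServe examples bucket.
      storageUri: gs://kfserving-examples/models/sklearn/1.0/model
```

Applying this manifest with `kubectl apply -f` is all that is needed to create the model service; KServe handles the underlying Deployment, networking, and scaling configuration.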

Framework

KServe provides a set of CustomResourceDefinitions (CRDs) to manage and deliver machine learning model services. It offers easy-to-use, high-level interfaces and standardized data plane protocols for a wide range of model frameworks, including TensorFlow, XGBoost, scikit-learn, PyTorch, and Hugging Face Transformers/LLMs. In addition, KServe encapsulates the complex operations of auto scaling, networking, health checking, and server configuration to provide features such as GPU auto scaling, Scale to Zero, and Canary Rollouts. These features simplify the deployment and maintenance of AI models.

For more information, see KServe.
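To illustrate how features such as Scale to Zero and Canary Rollouts surface in the API, the hedged sketch below assumes a Serverless-mode cluster and a previously deployed revision of the service; the resource name and storage URI are placeholders:

```yaml
# Illustrative sketch: Scale to Zero and a canary rollout on an InferenceService.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris
spec:
  predictor:
    # Scale to Zero: allow the predictor to scale down to no pods when idle
    # (effective in the Serverless deployment mode, which is backed by Knative).
    minReplicas: 0
    # Canary Rollout: route 10% of traffic to the latest revision,
    # keeping 90% on the previously promoted revision.
    canaryTrafficPercent: 10
    model:
      modelFormat:
        name: sklearn
      # Placeholder URI for the new model version being canaried.
      storageUri: gs://kfserving-examples/models/sklearn/1.0/model-2
```

Once the canary revision is verified, removing `canaryTrafficPercent` (or setting it to 100) promotes the new revision to receive all traffic.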


Deployment modes

KServe provides the following three deployment modes: Raw Deployment, Serverless, and ModelMesh. The supported KServe features vary based on the deployment mode.

| Deployment mode | Description | References |
| --- | --- | --- |
| Raw Deployment | Raw Deployment is the simplest deployment mode of KServe and depends only on cert-manager and gateways. This mode supports features such as auto scaling, Prometheus monitoring, Canary Rollouts with specific gateways, and GPU auto scaling. | |
| Serverless | The Serverless deployment mode depends on cert-manager, gateways, and Knative. This mode supports features such as auto scaling, Scale to Zero, Canary Rollouts, and GPU auto scaling. | Deploy a Serverless mode model as an inference service |
| ModelMesh | The ModelMesh deployment mode depends on cert-manager, Knative, and ModelMesh, and can be used together with Service Mesh (ASM). This mode supports features such as auto scaling, Scale to Zero, Canary Rollouts, and GPU auto scaling. | N/A |
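As a sketch of how a mode is chosen for an individual service, KServe lets you override the cluster default with the `serving.kserve.io/deploymentMode` annotation. The example below selects Raw Deployment; the resource name and storage URI are placeholders:

```yaml
# Illustrative sketch: pin one InferenceService to the Raw Deployment mode.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris
  annotations:
    # Override the cluster-wide default deployment mode for this service.
    serving.kserve.io/deploymentMode: RawDeployment
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      # Placeholder model URI.
      storageUri: gs://kfserving-examples/models/sklearn/1.0/model
```

In Raw Deployment mode, KServe creates plain Kubernetes Deployment, Service, and HPA resources instead of Knative revisions, which keeps the dependency footprint small.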

ack-kserve installation

For more information about how to deploy and manage ack-kserve in a Container Service for Kubernetes (ACK) cluster, see Install ack-kserve.