Deploying serverless workloads on Kubernetes typically requires managing Deployments, Services, and Ingress objects separately—and wiring them together for each update. Knative Service consolidates this into a single resource that handles deployment, traffic routing, request-based auto scaling, and multi-version management automatically.
Key concepts
A Knative Service is built around four concepts, the first three of which it manages automatically as sub-resources:
| Resource | Role |
|---|---|
| Configuration | Holds the workload spec—container image, environment variables, and resource limits. Every change triggers a new Revision. |
| Revision | An immutable, point-in-time snapshot of a Configuration. Each update produces a new Revision, giving you a version history to roll back to or split traffic across. |
| Route | Maps incoming traffic to one or more Revisions at configurable percentages. |
| Tag | A named label on a specific Revision. Assigning a tag causes Knative to generate an independent access URL for that Revision—useful for canary validation without exposing the Revision to live traffic. |
How it works
Request-based auto scaling
CPU and memory metrics often lag behind real user load. Knative scales on concurrency and requests per second (RPS) instead, which directly reflect service throughput.
Knative Serving injects a queue-proxy container into every pod. The queue-proxy collects concurrency and RPS metrics; the autoscaler periodically reads them and adjusts the Deployment's pod count accordingly.
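The scaling metric and per-pod target can be tuned per Revision with annotations on the Service template. A minimal sketch, assuming the default Knative Pod Autoscaler (the annotation keys are standard Knative Serving; the target value and image are illustrative):

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld-go
spec:
  template:
    metadata:
      annotations:
        # Scale on in-flight requests per pod; use "rps" to scale on requests per second
        autoscaling.knative.dev/metric: "concurrency"
        # Target roughly 10 concurrent requests per pod (illustrative value)
        autoscaling.knative.dev/target: "10"
    spec:
      containers:
      - image: registry-vpc.cn-hangzhou.aliyuncs.com/knative-sample/helloworld-go:73fbdd56
```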
The request flow:

- A request arrives at the HTTP Router, which forwards it to Knative's Serverless Service (SKS). SKS is Knative's abstraction over Kubernetes Service resources and routes requests to different backend endpoints.
- SKS chooses a routing mode based on the number of active pods:
  - Serve mode: active pods exist, and requests go directly to them.
  - Proxy mode: the pod count has been scaled to zero, and requests are routed to the activator, which buffers them.
- The activator records concurrency metrics for the requests it receives and reports them to the autoscaler.
- The autoscaler compares the metrics against preset thresholds. When scale-out is needed, it sends a request to the API server.
- The API server updates the Deployment and creates new pods.
- Once the activator detects the new pods are ready, it forwards the buffered requests to them.
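How aggressively the activator stays on the request path can be tuned with the target-burst-capacity annotation (a standard Knative Serving key; the values shown are illustrative, not defaults you must change):

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld-go
spec:
  template:
    metadata:
      annotations:
        # "0": remove the activator from the path once pods are up (pure Serve mode)
        # "-1": keep the activator in the path at all times to buffer bursts
        autoscaling.knative.dev/target-burst-capacity: "0"
    spec:
      containers:
      - image: registry-vpc.cn-hangzhou.aliyuncs.com/knative-sample/helloworld-go:73fbdd56
```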
For configuration details, see Enable auto scaling to withstand traffic fluctuations.
Automatic scale-to-zero
When a service receives no traffic, Knative automatically scales the pod count to zero, freeing resources. When the next request arrives, the autoscaler scales pods back up and the activator holds the request until a pod is ready.
Mode transitions:

- Serve to Proxy: when the request rate drops to zero, the autoscaler switches to Proxy mode and scales pods to zero.
- Proxy to Serve: when a new request arrives, the autoscaler scales out. After pods become ready, the mode switches back to Serve and the activator forwards the request.
For configuration details, see Enable auto scaling to withstand traffic fluctuations.
Multi-version management and canary releases
Each Configuration update produces a unique, immutable Revision. Routes distribute traffic across Revisions at configurable percentages, giving you rollback and canary release capabilities without redeployment.
Example: Create Revision V1, then update the Configuration to produce Revision V2. Configure a Route that sends 70% of traffic to V1 and 30% to V2. Gradually shift the percentage until V2 handles 100% of traffic.
YAML examples
The following examples cover common Knative Service configurations.
Basic service
A minimal Knative Service with a single container:
```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld-go
spec:
  template:
    spec:
      containers:
      - image: registry-vpc.cn-hangzhou.aliyuncs.com/knative-sample/helloworld-go:73fbdd56
        env:
        - name: TARGET
          value: "Knative"
```
Traffic splitting (canary release)
Route 70% of traffic to a stable Revision and 30% to a new one:
```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld-go
spec:
  template:
    spec:
      containers:
      - image: registry-vpc.cn-hangzhou.aliyuncs.com/knative-sample/helloworld-go:73fbdd56
  traffic:
  - revisionName: helloworld-go-v1
    percent: 70
  - revisionName: helloworld-go-v2
    percent: 30
```
Canary validation with a tag
Deploy a new Revision for testing without routing live traffic to it. The staging tag generates an independent endpoint you can use for validation:
```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld-go
spec:
  template:
    spec:
      containers:
      - image: registry-vpc.cn-hangzhou.aliyuncs.com/knative-sample/helloworld-go:73fbdd56
  traffic:
  - revisionName: helloworld-go-v1
    percent: 100
  - revisionName: helloworld-go-v2
    percent: 0
    tag: staging
```
Setting `percent: 0` together with a tag keeps the Revision out of live traffic, while Knative still generates a dedicated, tag-prefixed URL for that Revision that you can use to test it directly.
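A tag can also be combined with `latestRevision: true` so that the newest Revision always has its own test URL, even while a pinned Revision serves all live traffic. A sketch (the service and revision names are illustrative):

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld-go
spec:
  template:
    spec:
      containers:
      - image: registry-vpc.cn-hangzhou.aliyuncs.com/knative-sample/helloworld-go:73fbdd56
  traffic:
  # All live traffic stays pinned to the known-good Revision
  - revisionName: helloworld-go-v1
    percent: 100
  # The most recent Revision gets a tagged URL but no live traffic
  - latestRevision: true
    percent: 0
    tag: latest
```

With this pattern, each new deployment is immediately reachable at the tagged endpoint for validation, and live traffic only moves when you raise its percentage.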
For canary release procedures, see Perform a canary release based on traffic splitting.