Deploying serverless workloads on Kubernetes typically requires managing Deployments, Services, and Ingress objects separately—and wiring them together for each update. Knative Service consolidates this into a single resource that handles deployment, traffic routing, request-based auto scaling, and multi-version management automatically.
Key concepts
A Knative Service is built around four concepts, the first three of which it manages automatically as sub-resources:
| Resource | Role |
|---|---|
| Configuration | Holds the workload spec—container image, environment variables, and resource limits. Every change triggers a new Revision. |
| Revision | An immutable, point-in-time snapshot of a Configuration. Each update produces a new Revision, giving you a version history to roll back to or split traffic across. |
| Route | Maps incoming traffic to one or more Revisions at configurable percentages. |
| Tag | A named label on a specific Revision. Assigning a tag causes Knative to generate an independent access URL for that Revision—useful for canary validation without exposing the Revision to live traffic. |
How it works
Request-based auto scaling
CPU and memory metrics often lag behind real user load. Knative scales on concurrency and requests per second (RPS) instead, which directly reflect service throughput.
Knative Serving injects a queue-proxy container into every pod. The queue-proxy collects concurrency and RPS metrics; the autoscaler periodically reads them and adjusts the Deployment's pod count accordingly.
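The scaling metric and per-pod target can be tuned per Revision with annotations on the Service template. A minimal sketch, assuming the default Knative Pod Autoscaler (the annotation keys are standard Knative Serving; the target value and image are illustrative):

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld-go
spec:
  template:
    metadata:
      annotations:
        # Scale on in-flight requests per pod; use "rps" to scale on requests per second
        autoscaling.knative.dev/metric: "concurrency"
        # Target roughly 10 concurrent requests per pod (illustrative value)
        autoscaling.knative.dev/target: "10"
    spec:
      containers:
      - image: registry-vpc.cn-hangzhou.aliyuncs.com/knative-sample/helloworld-go:73fbdd56
```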
The request flow:

- A request arrives at the HTTP Router, which forwards it to Knative's Serverless Service (SKS). SKS is Knative's abstraction over Kubernetes Service resources and routes requests to different backend endpoints.
- SKS chooses a routing mode based on the number of active pods:
  - Serve mode: active pods exist, and requests go directly to them.
  - Proxy mode: the pod count has been scaled to zero, and requests are routed to the activator, which buffers them.
- The activator records concurrency metrics for the requests it receives and reports them to the autoscaler.
- The autoscaler compares the metrics against preset thresholds. When scale-out is needed, it sends a request to the API server.
- The API server updates the Deployment and creates new pods.
- Once the activator detects the new pods are ready, it forwards the buffered requests to them.
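How aggressively the activator stays on the request path can be tuned with the target-burst-capacity annotation (a standard Knative Serving key; the values shown are illustrative, not defaults you must change):

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld-go
spec:
  template:
    metadata:
      annotations:
        # "0": remove the activator from the path once pods are up (pure Serve mode)
        # "-1": keep the activator in the path at all times to buffer bursts
        autoscaling.knative.dev/target-burst-capacity: "0"
    spec:
      containers:
      - image: registry-vpc.cn-hangzhou.aliyuncs.com/knative-sample/helloworld-go:73fbdd56
```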
For configuration details, see Enable auto scaling to withstand traffic fluctuations.
Automatic scale-to-zero
When a service receives no traffic, Knative automatically scales the pod count to zero, freeing resources. When the next request arrives, the autoscaler scales pods back up and the activator holds the request until a pod is ready.
Mode transitions:

- Serve to Proxy: when the request rate drops to zero, the autoscaler switches to Proxy mode and scales pods to zero.
- Proxy to Serve: when a new request arrives, the autoscaler scales out. After pods become ready, the mode switches back to Serve and the activator forwards the request.
For configuration details, see Enable auto scaling to withstand traffic fluctuations.
Multi-version management and canary releases
Each Configuration update produces a unique, immutable Revision. Routes distribute traffic across Revisions at configurable percentages, giving you rollback and canary release capabilities without redeployment.
Example: Create Revision V1, then update the Configuration to produce Revision V2. Configure a Route that sends 70% of traffic to V1 and 30% to V2. Gradually shift the percentage until V2 handles 100% of traffic.
YAML examples
The following examples cover common Knative Service configurations.
Basic service
A minimal Knative Service with a single container:
```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld-go
spec:
  template:
    spec:
      containers:
      - image: registry-vpc.cn-hangzhou.aliyuncs.com/knative-sample/helloworld-go:73fbdd56
        env:
        - name: TARGET
          value: "Knative"
```
Traffic splitting (canary release)
Route 70% of traffic to a stable Revision and 30% to a new one:
```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld-go
spec:
  template:
    spec:
      containers:
      - image: registry-vpc.cn-hangzhou.aliyuncs.com/knative-sample/helloworld-go:73fbdd56
  traffic:
  - revisionName: helloworld-go-v1
    percent: 70
  - revisionName: helloworld-go-v2
    percent: 30
```
Canary validation with a tag
Deploy a new Revision for testing without routing live traffic to it. The staging tag generates an independent endpoint you can use for validation:
```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld-go
spec:
  template:
    spec:
      containers:
      - image: registry-vpc.cn-hangzhou.aliyuncs.com/knative-sample/helloworld-go:73fbdd56
  traffic:
  - revisionName: helloworld-go-v1
    percent: 100
  - revisionName: helloworld-go-v2
    percent: 0
    tag: staging
```
Setting `percent: 0` together with a tag keeps the Revision out of live traffic, while Knative still generates a dedicated, tag-prefixed URL for that Revision that you can use to test it directly.
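A tag can also be combined with `latestRevision: true` so that the newest Revision always has its own test URL, even while a pinned Revision serves all live traffic. A sketch (the service and revision names are illustrative):

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld-go
spec:
  template:
    spec:
      containers:
      - image: registry-vpc.cn-hangzhou.aliyuncs.com/knative-sample/helloworld-go:73fbdd56
  traffic:
  # All live traffic stays pinned to the known-good Revision
  - revisionName: helloworld-go-v1
    percent: 100
  # The most recent Revision gets a tagged URL but no live traffic
  - latestRevision: true
    percent: 0
    tag: latest
```

With this pattern, each new deployment is immediately reachable at the tagged endpoint for validation, and live traffic only moves when you raise its percentage.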
For canary release procedures, see Perform a canary release based on traffic splitting.