For applications with slower startup times, such as Java services, community Knative's default scale-to-zero policy introduces high cold start latency. ACK Knative solves this with reserved instances -- low-cost, always-on pods that serve requests immediately while standard instances spin up.
How it works
When a Knative Service scales to zero, the first incoming request triggers a cold start: resource scheduling, image pulling, and application startup must all complete before the request is served.
Reserved instances change this behavior by keeping one or more low-specification pods running during idle periods:
1. Scale-in with a safety net. When traffic ceases, the Service scales in, but at least one reserved instance stays online to handle potential new requests.
2. Parallel response and scale-out. When a request arrives, two things happen simultaneously:
   - The request is routed to the active reserved instance for immediate processing -- no cold start.
   - Knative creates standard-specification instances to handle ongoing traffic.
3. Traffic handover. Once the first standard instance is ready, all subsequent traffic is routed to it.
4. Cleanup. After the reserved instance finishes processing its initial request, it is automatically terminated.
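This entire lifecycle is driven by annotations on the Knative Service. As a minimal sketch, the following manifest enables a single reserved instance with default settings (ECI type, 1 replica); the Service name is illustrative, and the image is the sample application used in the examples later in this topic:

```yaml
# Minimal sketch: enable one reserved instance with defaults (ECI, 1 replica).
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hello-reserved   # illustrative name
spec:
  template:
    metadata:
      annotations:
        knative.aliyun.com/reserve-instance: enable
    spec:
      containers:
        - image: registry.cn-hangzhou.aliyuncs.com/knative-sample/helloworld-go:160e4dc8
```

The remaining annotations described below refine this behavior: the resource type, the instance specification, and the number of reserved replicas.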
Prerequisites
Before you begin, make sure that you have:
Knative deployed in your ACK cluster.
ACK Virtual Node installed (required for ECI and ACS reserved instances; not required for ECS). For more information, see Components.
Annotation reference
Enable reserved instances by adding annotations to your Knative Service manifest. The following table lists all supported annotations.
| Annotation | Description | Default |
|---|---|---|
| knative.aliyun.com/reserve-instance | Enable or disable reserved instances. Set to "enable". | Disabled |
| knative.aliyun.com/reserve-instance-type | Resource type for the reserved instance. Supported values: eci, ecs, acs. | eci |
| knative.aliyun.com/reserve-instance-replicas | Number of reserved instances to maintain. | 1 |
| knative.aliyun.com/reserve-instance-eci-use-specs | Instance type or CPU-memory specification for ECI. Accepts instance type names (e.g., ecs.t6-c1m1.large) or a CPU-memory format (e.g., 1-2Gi). Separate multiple instance types with commas. | - |
| knative.aliyun.com/reserve-instance-ecs-use-specs | ECS instance type for the reserved instance (e.g., ecs.gn6i-c4g1.xlarge). | - |
| knative.aliyun.com/reserve-instance-acs-compute-class | Compute class for the ACS pod (e.g., general-purpose). | - |
| knative.aliyun.com/reserve-instance-acs-compute-qos | Compute quality for the ACS pod (e.g., default). | - |
| knative.aliyun.com/reserve-instance-cpu-resource-request | CPU request for the reserved instance (e.g., "1"). | - |
| knative.aliyun.com/reserve-instance-cpu-resource-limit | CPU limit for the reserved instance (e.g., "1"). | - |
| knative.aliyun.com/reserve-instance-memory-resource-request | Memory request for the reserved instance (e.g., "2Gi"). | - |
| knative.aliyun.com/reserve-instance-memory-resource-limit | Memory limit for the reserved instance (e.g., "2Gi"). | - |
Configure ECI reserved instances
Elastic Container Instance (ECI) is the default resource type for reserved instances. ECI pods run on serverless infrastructure, so no node capacity planning is required.
Specify by instance type
To use specific ECS instance types for the underlying ECI, add the knative.aliyun.com/reserve-instance-eci-use-specs annotation. You can specify multiple instance types separated by commas.
The following example specifies the ecs.t6-c1m1.large and ecs.t5-lc1m2.small instance types:
```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hello-spec-1
spec:
  template:
    metadata:
      annotations:
        knative.aliyun.com/reserve-instance: enable
        knative.aliyun.com/reserve-instance-eci-use-specs: "ecs.t6-c1m1.large,ecs.t5-lc1m2.small"
    spec:
      containers:
        - image: registry.cn-hangzhou.aliyuncs.com/knative-sample/helloworld-go:160e4dc8
```
Specify by CPU and memory
If you are unsure about the specific instance types, define the required CPU and memory resources.
The following example specifies a 1-core, 2 GiB instance:
```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hello-spec-2
spec:
  template:
    metadata:
      annotations:
        knative.aliyun.com/reserve-instance: enable
        knative.aliyun.com/reserve-instance-eci-use-specs: "1-2Gi"
    spec:
      containers:
        - image: registry.cn-hangzhou.aliyuncs.com/knative-sample/helloworld-go:160e4dc8
```
Configure ACS reserved instances
To use Alibaba Cloud Container Compute Service (ACS) for reserved instances, first install ACK Virtual Node (for more information, see Components), then add the knative.aliyun.com/reserve-instance-type: acs annotation.
Specify by compute class and quality
The following is a basic configuration for an ACS reserved instance. You can specify the compute class and compute quality:
```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld-go
spec:
  template:
    metadata:
      annotations:
        knative.aliyun.com/reserve-instance: enable
        knative.aliyun.com/reserve-instance-type: acs
        # (Optional) Compute class for the ACS pod
        knative.aliyun.com/reserve-instance-acs-compute-class: "general-purpose"
        # (Optional) Compute quality for the ACS pod
        knative.aliyun.com/reserve-instance-acs-compute-qos: "default"
    spec:
      containers:
        - image: registry-vpc.cn-hangzhou.aliyuncs.com/knative-sample/helloworld-go:73fbdd56
          env:
            - name: TARGET
              value: "Knative"
```
Specify by CPU and memory
```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld-go-resource
spec:
  template:
    metadata:
      annotations:
        knative.aliyun.com/reserve-instance: enable
        knative.aliyun.com/reserve-instance-type: acs
        knative.aliyun.com/reserve-instance-cpu-resource-request: "1"
        knative.aliyun.com/reserve-instance-memory-resource-request: "2Gi"
    spec:
      containers:
        - image: registry-vpc.cn-hangzhou.aliyuncs.com/knative-sample/helloworld-go:73fbdd56
          env:
            - name: TARGET
              value: "Knative"
```
Configure ECS reserved instances
Specify a lower-cost Elastic Compute Service (ECS) instance type for your reserved instance to reduce costs during idle periods.
GPU workloads
The following example configures a low-specification GPU-accelerated instance as a reserved instance for a GPU inference service:
```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  labels:
    release: qwen
  name: qwen
  namespace: default
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/metric: "concurrency"
        # Enable and configure an ECS reserved instance. You can configure one or more instance types.
        knative.aliyun.com/reserve-instance: enable
        knative.aliyun.com/reserve-instance-type: ecs
        knative.aliyun.com/reserve-instance-ecs-use-specs: ecs.gn6i-c4g1.xlarge
      labels:
        release: qwen
    spec:
      containers:
        - command:
            - sh
            - -c
            - python3 -m vllm.entrypoints.openai.api_server --port 8080 --trust-remote-code
              --served-model-name qwen --model /mnt/models/Qwen-7B-Chat-Int8 --gpu-memory-utilization
              0.95 --quantization gptq --max-model-len=6144
          image: kube-ai-registry.cn-shanghai.cr.aliyuncs.com/kube-ai/vllm:0.4.1
          imagePullPolicy: IfNotPresent
          name: vllm-container
          resources:
            # Resource configuration for the standard instance
            limits:
              cpu: "16"
              memory: 60Gi
              nvidia.com/gpu: "1"
            requests:
              cpu: "8"
              memory: 36Gi
              nvidia.com/gpu: "1"
          volumeMounts:
            - mountPath: /mnt/models/Qwen-7B-Chat-Int8
              name: qwen-7b-chat-int8
      volumes:
        - name: qwen-7b-chat-int8
          persistentVolumeClaim:
            claimName: qwen-7b-chat-int8-dataset
```
CPU workloads
The following example specifies a 1-core, 2 GiB instance:
```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld-resource
spec:
  template:
    metadata:
      annotations:
        knative.aliyun.com/reserve-instance: enable
        knative.aliyun.com/reserve-instance-type: ecs
        knative.aliyun.com/reserve-instance-cpu-resource-request: "1"
        knative.aliyun.com/reserve-instance-cpu-resource-limit: "1"
        knative.aliyun.com/reserve-instance-memory-resource-request: "2Gi"
        knative.aliyun.com/reserve-instance-memory-resource-limit: "2Gi"
    spec:
      containers:
        - image: registry-vpc.cn-hangzhou.aliyuncs.com/knative-sample/helloworld-go:73fbdd56
          env:
            - name: TARGET
              value: "Knative"
```
Configure a reserved instance pool
To handle high burst traffic, expand a single reserved instance into a resource pool by specifying the number of replicas with the knative.aliyun.com/reserve-instance-replicas annotation.
The following example creates a reserved pool of 3 low-specification instances:
```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hello-reserve-pool
spec:
  template:
    metadata:
      annotations:
        knative.aliyun.com/reserve-instance: enable
        knative.aliyun.com/reserve-instance-replicas: "3"
        knative.aliyun.com/reserve-instance-eci-use-specs: "ecs.t6-c1m1.large,ecs.t5-lc1m2.small"
    spec:
      containers:
        - image: registry.cn-hangzhou.aliyuncs.com/knative-sample/helloworld-go:160e4dc8
```
Verify the configuration
After applying the Service manifest, verify that the reserved instance is running:
```shell
# Check whether the reserved instance pod is running
kubectl get pods -l serving.knative.dev/service=<your-service-name>

# View the reserved instance annotations on the Knative Service
kubectl get ksvc <your-service-name> -o yaml | grep reserve-instance
```
A successfully configured reserved instance remains in the Running state even when no traffic is being served.
Apply in production
- Choose the right specification. Select the lowest-cost instance type that can reliably run your application and serve at least one request.
- Use a reserved pool for high bursts. If your service is likely to experience sudden, high-traffic events, configure a reserved instance pool to better absorb the initial load.
Billing
Reserved instances run continuously and incur charges according to the billing rules of the resource type you select (ECI, ECS, or ACS).
References
To implement automatic workload scaling in Knative, see Use HPA in Knative, Automatically scale Services based on the number of traffic requests, and Use AHPA to implement scheduled auto scaling.