
Container Service for Kubernetes:Knative FAQ and Solutions

Last Updated: Mar 26, 2026

Common questions about using Knative in an ACK (Container Service for Kubernetes) cluster.

What are the differences between Alibaba Cloud Knative and open-source Knative?

Alibaba Cloud Knative extends the open-source project across six dimensions: operations and maintenance, ease of use, elasticity, gateways, event-driven capabilities, and monitoring and alerting. See Comparison of Alibaba Cloud Knative and open-source Knative for a full comparison.

Which gateway should I choose when installing Knative?

Alibaba Cloud Knative supports three gateways:

- Application Load Balancer (ALB): Application-layer load balancing with advanced traffic management.
- ASM (Alibaba Cloud Service Mesh): Fine-grained traffic policies and cross-service observability, built on Istio.
- Kourier: Basic routing when you have no specific requirements for ALB or ASM.

If you have no specific requirements, choose Kourier. See Choose a gateway for Knative for configuration details.

What permissions are required to use Knative with a RAM user or role?

The Resource Access Management (RAM) user or role must have access to all namespaces in the cluster.

  1. Log on to the Container Service Management Console. In the left navigation pane, click Authorizations.

  2. Click the RAM Users tab. Next to the target RAM user, click Manage Permissions.

  3. In the Add Permissions area, select the cluster, set the namespace to all namespaces, and complete the authorization.

How long does it take for a Knative pod to scale down to zero?

Three parameters control scale-to-zero timing:

- stable-window: Observation window before scaling begins. The autoscaler monitors metrics during this period without acting.
- scale-to-zero-grace-period: Timeout for scaling down to zero. During this period, the system retains the last pod even without new requests, to handle sudden traffic spikes.
- scale-to-zero-pod-retention-period: Retention period for the last pod before scaling to zero. This enables rapid response to traffic spikes without starting a new pod from scratch.

All three conditions must be met before a pod scales to zero:

  1. No requests are received within the stable-window.

  2. The scale-to-zero-pod-retention-period has expired.

  3. The time for SKS (Serverless Kubernetes Service) to switch to proxy mode exceeds the scale-to-zero-grace-period.

The maximum time a pod can be retained is stable-window + max(scale-to-zero-grace-period, scale-to-zero-pod-retention-period). To enforce a specific minimum retention time, set scale-to-zero-pod-retention-period.
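As a sketch, all three parameters can be set cluster-wide in Knative's config-autoscaler ConfigMap in the knative-serving namespace (the values below are illustrative, not recommendations):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-autoscaler
  namespace: knative-serving
data:
  stable-window: "60s"                      # no-traffic observation window
  scale-to-zero-grace-period: "30s"         # timeout for scaling down to zero
  scale-to-zero-pod-retention-period: "1m"  # minimum retention for the last pod
```

With these illustrative values, the last pod could be retained for up to 60s + max(30s, 60s) = 120s after the final request. The same settings can also be applied per Revision through autoscaling.knative.dev/* annotations on the Service template.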

How do I use GPU resources in Knative?

In the Knative Service spec, add the k8s.aliyun.com/eci-use-specs annotation under spec.template.metadata.annotations to specify the GPU instance type. Then declare the GPU resource limit using nvidia.com/gpu under spec.template.spec.containers[].resources.limits.

See Use GPUs for an example.
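A minimal sketch of such a Service follows; the instance type and image name are placeholders, so check the linked example for values valid in your region:

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: gpu-demo
spec:
  template:
    metadata:
      annotations:
        # GPU-capable ECI instance type (illustrative value)
        k8s.aliyun.com/eci-use-specs: "ecs.gn6i-c4g1.xlarge"
    spec:
      containers:
      - image: registry.example.com/gpu-app:latest  # placeholder image
        resources:
          limits:
            nvidia.com/gpu: "1"  # number of GPUs requested
```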

How do I use shared GPUs in Knative?

  1. Enable shared GPU scheduling for the nodes. See Run a shared GPU scheduling example.

  2. In your Knative Service, configure the GPU memory limit using aliyun.com/gpu-mem under spec.template.spec.containers[].resources.limits.

See Enable shared GPU scheduling for details.
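The second step can be sketched as follows; the Service name and image are placeholders, and the memory unit (GiB) should be confirmed against the linked document:

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: shared-gpu-demo
spec:
  template:
    spec:
      containers:
      - image: registry.example.com/inference-app:latest  # placeholder image
        resources:
          limits:
            aliyun.com/gpu-mem: "4"  # GPU memory to allocate (GiB)
```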

How do I reduce cold start latency when scaling to zero?

When no requests arrive, Knative scales instances to zero to reduce costs. When a new request comes in, the startup sequence involves IaaS resource allocation, container scheduling, image layer downloads, and application startup—each step adds latency.

Two approaches can reduce this latency:

Reserved instances (recommended for latency-sensitive workloads)

Reserve a low-cost burstable instance that stays running while your main instances are at zero. When the first request arrives, the reserved instance handles it immediately and triggers a scale-out of full-spec instances. Once those instances are ready, traffic shifts to them and the reserved instance goes offline. This balances cost savings with fast initial response.

See Configure reserved instances.
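As a hedged sketch, a reserved instance is typically declared through an annotation on the Service template that names a low-cost burstable spec; verify the exact annotation key and supported specs against Configure reserved instances before use:

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: reserved-demo
spec:
  template:
    metadata:
      annotations:
        # Burstable ECI spec to keep running while full instances are at zero.
        # Annotation key and value are assumptions -- confirm them in the
        # "Configure reserved instances" documentation for your cluster version.
        knative.aliyun.com/reserve-instance-eci-use-specs: "ecs.t5-lc1m2.small"
    spec:
      containers:
      - image: registry.example.com/app:latest  # placeholder image
```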

ECI image cache (reduces image pull time)

Pre-create cache snapshots of your application images. ECI (Elastic Container Instance) uses these snapshots when creating pods, skipping or shortening the image layer download step. This cuts instance creation time regardless of whether reserved instances are used.

See Use image acceleration.
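A cache snapshot can be pre-created with the ECI ImageCache custom resource; the sketch below uses a placeholder image and assumes the eci.alibabacloud.com/v1 API group described in the ECI documentation, so confirm the schema in Use image acceleration:

```yaml
apiVersion: eci.alibabacloud.com/v1
kind: ImageCache
metadata:
  name: app-image-cache
spec:
  images:
  - registry.example.com/app:latest  # images to snapshot into the cache
```

Once the cache exists, ECI can match it when creating pods for your Knative revisions, shortening or skipping the image download step.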

Is the Activator component of ACK Knative billed?

Yes. The Activator runs as a pod in the data plane and consumes instance resources, which are billed accordingly.

How do I configure the listening port for Knative services?

The port your application listens on must match the containerPort field in the Knative Service definition. The default is 8080. To use a different port, see Configure a custom listening port.
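For illustration, a Service listening on a non-default port could look like this (image name is a placeholder):

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: custom-port-demo
spec:
  template:
    spec:
      containers:
      - image: registry.example.com/app:latest  # placeholder image
        ports:
        - containerPort: 9000  # must match the port the application listens on
```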