You can use the Horizontal Pod Autoscaler (HPA) in Knative to scale pods automatically. By setting a CPU metric threshold for a Knative Service, you let the number of pods follow fluctuations in user traffic. This topic describes how to use HPA in Knative.
Prerequisites
Deploy Knative in an ASK cluster
Procedure
- Create the ksvc-hpa.yaml file.
Configure an HPA scaling policy for a Knative Service. The following code block is an example:
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld-go-hpa
spec:
  template:
    metadata:
      labels:
        app: helloworld-go-hpa
      annotations:
        autoscaling.knative.dev/class: "hpa.autoscaling.knative.dev"
        autoscaling.knative.dev/metric: "cpu"
        autoscaling.knative.dev/target: "75"
        autoscaling.knative.dev/minScale: "1"
        autoscaling.knative.dev/maxScale: "10"
    spec:
      containers:
        - image: registry.cn-hangzhou.aliyuncs.com/knative-samples/helloworld-go:160e4dc8
          resources:
            requests:
              cpu: '200m'
  - The autoscaling.knative.dev/class: "hpa.autoscaling.knative.dev" annotation specifies HPA as the autoscaler for the Knative Service.
  - The autoscaling.knative.dev/metric annotation specifies the metric that drives scaling, which is CPU in this example.
  - The autoscaling.knative.dev/target annotation specifies the threshold of the CPU metric, which is 75 (percent CPU utilization) in this example.
  - The autoscaling.knative.dev/minScale: "1" annotation specifies the minimum number of pods that must be guaranteed.
  - The autoscaling.knative.dev/maxScale: "10" annotation specifies the maximum number of pods that are allowed.
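How the target, minScale, and maxScale annotations interact can be illustrated with the replica calculation that the Kubernetes HPA documents: the desired pod count is the current count scaled by the ratio of observed to target utilization, rounded up. The sketch below is only an approximation under stated assumptions: it ignores the tolerance and stabilization behavior of the real controller, and desired_replicas is a hypothetical helper name, not a kubectl or Knative command. CPU utilization here is measured relative to the container's CPU request (200m in the example).

```shell
#!/bin/sh
# Simplified sketch of the HPA replica calculation:
#   desired = ceil(current_replicas * current_utilization / target_utilization)
# The result is clamped to the range [min_scale, max_scale].
# desired_replicas is a hypothetical helper used for illustration only.
desired_replicas() {
  current="$1"      # current number of pods
  utilization="$2"  # observed average CPU utilization, in percent of the request
  target="$3"       # autoscaling.knative.dev/target
  min="$4"          # autoscaling.knative.dev/minScale
  max="$5"          # autoscaling.knative.dev/maxScale

  # Integer ceiling division: (a + b - 1) / b
  desired=$(( (current * utilization + target - 1) / target ))

  if [ "$desired" -lt "$min" ]; then desired="$min"; fi
  if [ "$desired" -gt "$max" ]; then desired="$max"; fi
  echo "$desired"
}

desired_replicas 2 150 75 1 10   # 2 pods at 150% with a 75% target: prints 4
desired_replicas 5 200 75 1 10   # the raw result 14 is capped by maxScale: prints 10
desired_replicas 1 30 75 1 10    # low load never drops below minScale: prints 1
```

With the values from the example file, two pods running at 150% of their requested CPU would be scaled to four pods, and the pod count never leaves the range of 1 to 10.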
- Apply the HPA scaling policy.
kubectl apply -f ksvc-hpa.yaml
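After the file is applied, it can be useful to confirm that the Knative Service exists and that an HPA object was created for it. The following is a minimal sketch that assumes kubectl is configured for the cluster and that the Service was created in the current namespace. The KUBECTL variable and the check_hpa_service function are hypothetical names introduced here so that the commands can be dry-run with KUBECTL=echo.

```shell
#!/bin/sh
# KUBECTL can be overridden, for example with "echo", to print the
# commands instead of running them against a cluster.
KUBECTL="${KUBECTL:-kubectl}"

# check_hpa_service is a hypothetical helper used for illustration only.
check_hpa_service() {
  name="$1"
  # Show the Knative Service (ksvc is the short name for Knative Services).
  $KUBECTL get ksvc "$name"
  # List the HorizontalPodAutoscaler objects in the current namespace.
  $KUBECTL get hpa
}

# Usage (requires access to the cluster):
#   check_hpa_service helloworld-go-hpa
```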
Result
Check the changes in the number of pods after HPA is enabled for the Knative Service. The following trend chart shows an example.
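The change in the number of pods can also be followed from the command line. The sketch below assumes kubectl access to the cluster and relies on the app: helloworld-go-hpa label set in the example file. The KUBECTL variable and the watch_service_pods function are hypothetical names, introduced only so that the command can be dry-run with KUBECTL=echo.

```shell
#!/bin/sh
# KUBECTL can be overridden, for example with "echo", for a dry run.
KUBECTL="${KUBECTL:-kubectl}"

# watch_service_pods is a hypothetical helper used for illustration only.
# It streams pod changes for the given app label until interrupted.
watch_service_pods() {
  app_label="$1"
  $KUBECTL get pods -l "app=$app_label" --watch
}

# Usage (requires access to the cluster; stop with Ctrl+C):
#   watch_service_pods helloworld-go-hpa
```

Under load, new pods should appear in the output as CPU utilization rises above the threshold, up to the configured maximum.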