
Container Compute Service: Use MSE Ingresses in Knative to implement auto scaling

Last Updated: May 29, 2025

We recommend that you use MSE Ingresses in Knative to distribute and route traffic in scenarios where a microservices architecture is used. MSE Ingresses are O&M-free, fully managed Ingresses. You can use MSE Ingresses to implement request-based auto scaling and precisely control the number of concurrent requests that a single pod processes, which helps you meet the traffic governance requirements of large-scale cloud-native distributed applications.

Prerequisites

An ACS cluster is created, and you can connect to the cluster by using kubectl.

How it works

The Knative Pod Autoscaler (KPA) obtains the total number of concurrent requests from the MSE Ingress, calculates the number of pods required to process those requests, and scales the pods accordingly. This implements load-aware auto scaling. The MSE Ingress can also route requests to different services or versions based on routing rules and conditions.

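For example, traffic splitting between revisions can be declared directly in the Knative Service specification. The following is a minimal sketch; the revision names are hypothetical placeholders:

    apiVersion: serving.knative.dev/v1
    kind: Service
    metadata:
      name: helloworld-go
    spec:
      template:
        metadata:
          name: helloworld-go-v2 # Hypothetical name for the new revision.
        spec:
          containers:
          - image: registry-vpc.{REGION-ID}.aliyuncs.com/knative-sample/helloworld-go:73fbdd56
      traffic:
      - revisionName: helloworld-go-v1 # Hypothetical existing revision that keeps 80% of the traffic.
        percent: 80
      - revisionName: helloworld-go-v2 # The new revision receives 20% of the traffic.
        percent: 20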

Step 1: Deploy an MSE Ingress

  1. Log on to the ACS console. In the left-side navigation pane, click Clusters.

  2. On the Clusters page, find the cluster that you want to manage and click its ID. In the left-side navigation pane of the cluster details page, choose Applications > Knative.

  3. On the Components tab, click Deploy Knative, select MSE for the Gateway parameter, and finish the deployment as prompted.
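
After the deployment is complete, you can verify that the Knative Serving components are running. The following check assumes that the components are installed in the knative-serving namespace:

    kubectl get deployment -n knative-serving

If the components are running, each deployment reports its replicas as ready.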

Step 2: Use the MSE Ingress to access Services

  1. Log on to the ACS console. In the left-side navigation pane, click Clusters.

  2. On the Clusters page, find the cluster that you want to manage and click its ID. In the left-side navigation pane of the cluster details page, choose Applications > Knative.

  3. On the Services tab of the Knative page, set Namespace to default, click Create from Template, copy the following YAML content to the template editor, and then click Create.

    The template creates a Service named helloworld-go.

    Important

    Replace {REGION-ID} with the region where your cluster resides so that the Knative Service can pull images as expected.

    apiVersion: serving.knative.dev/v1
    kind: Service
    metadata:
      name: helloworld-go
    spec:
      template:
        metadata:
          annotations:
            autoscaling.knative.dev/class: kpa.autoscaling.knative.dev # Scale pods based on MSE metrics. Pods can be scaled to zero. 
            autoscaling.knative.dev/max-scale: '20' # Set the maximum number of pods allowed to 20. 
        spec:
          containerConcurrency: 5 # Set the maximum number of concurrent requests that each pod can process to 5. 
          containers:
          - image: registry-vpc.{REGION-ID}.aliyuncs.com/knative-sample/helloworld-go:73fbdd56 # {REGION-ID} is the region where your cluster resides, such as cn-hangzhou.
            env:
            - name: TARGET
              value: "Knative"

    If the Status column of the Service displays Created, the Service is deployed.
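
    Alternatively, you can save the preceding YAML to a file and create the Service with kubectl, assuming kubectl is configured to connect to your cluster:

    kubectl apply -f helloworld-go.yaml
    kubectl get ksvc helloworld-go # ksvc is the short name for Knative Services.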

  4. On the Services page, record the domain name and gateway IP address of the helloworld-go Service in the Default Domain and Gateway columns, respectively.

  5. Run the following command to access the helloworld-go Service:

    curl -H "Host: helloworld-go.default.example.com" http://8.141.XX.XX # Replace the gateway IP and domain name with your actual data.

    Expected output:

    Hello Knative!
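
    If you do not want to pass the Host header with every request, you can instead map the domain name to the gateway IP address in /etc/hosts (illustrative values; replace them with your own):

    echo "8.141.XX.XX helloworld-go.default.example.com" | sudo tee -a /etc/hosts
    curl http://helloworld-go.default.example.com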

Step 3: Perform auto scaling based on the number of concurrent requests

  1. Install the load testing tool hey.

    For more information about hey, see Hey.
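
    For example, if Go is installed, you can install hey by running the following command (one common installation method; see the hey documentation for alternatives):

    go install github.com/rakyll/hey@latest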

  2. Run the following command to perform a stress test on the Service:

    # Send 100,000 requests, and set the concurrency to 50 and request timeout period to 180 seconds. 
    hey -n 100000 -c 50 -t 180 -host "helloworld-go.default.example.com" "http://8.141.XX.XX"

    Expected output:

    Summary:
      Total:        86.0126 secs
      Slowest:      0.1672 secs
      Fastest:      0.0276 secs
      Average:      0.0337 secs
      Requests/sec: 1162.6199
      
      Total data:   1500000 bytes
      Size/request: 15 bytes
    
    Response time histogram:
      0.028 [1]     |
      0.042 [95291] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
      0.056 [4573]  |■■
      0.069 [64]    |
      0.083 [19]    |
      0.097 [2]     |
      0.111 [0]     |
      0.125 [0]     |
      0.139 [18]    |
      0.153 [23]    |
      0.167 [9]     |
    
    
    Latency distribution:
      10% in 0.0294 secs
      25% in 0.0305 secs
      50% in 0.0327 secs
      75% in 0.0367 secs
      90% in 0.0386 secs
      95% in 0.0405 secs
      99% in 0.0433 secs
    
    Details (average, fastest, slowest):
      DNS+dialup:   0.0000 secs, 0.0276 secs, 0.1672 secs
      DNS-lookup:   0.0000 secs, 0.0000 secs, 0.0000 secs
      req write:    0.0000 secs, 0.0000 secs, 0.0009 secs
      resp wait:    0.0336 secs, 0.0276 secs, 0.1671 secs
      resp read:    0.0000 secs, 0.0000 secs, 0.0009 secs
    
    Status code distribution:
      [200] 100000 responses

    The output indicates that all 100,000 requests were sent and processed successfully. Because containerConcurrency is set to 5 and hey sends 50 concurrent requests, KPA is expected to scale the Service out to approximately 50 / 5 = 10 pods, which is well within the max-scale limit of 20.

  3. Run the following command to watch how the number of pods changes.

    Note

    The command continues to run until you manually terminate it. You can press Ctrl+C to terminate the command.

    kubectl get pods --watch

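    To watch only the pods that belong to the helloworld-go Service, you can filter by the serving.knative.dev/service label, which Knative Serving adds to the pods of each Service:

    kubectl get pods --watch -l serving.knative.dev/service=helloworld-go

    During the stress test, the number of helloworld-go pods increases. After the load stops, the pods are scaled in again, down to zero because the KPA class is used.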

(Optional) Step 4: View the Knative monitoring dashboard

Knative provides out-of-the-box monitoring features. On the Knative page, click the Monitoring Dashboards tab to view the monitoring data of the specified Service. For more information about the Knative dashboard, see View the Knative dashboard.

