Configure an MSE cloud-native gateway for an end-to-end canary release

In a microservices architecture, service-to-service calls are load-balanced randomly. When you release a new version of one service, traffic with specific characteristics may not stay on that version throughout the entire call chain. For example, a request tagged for canary testing might reach the canary version of Service A at the gateway but then be routed to the base version of Service B downstream -- breaking the end-to-end test.

The end-to-end canary release feature in Microservices Engine (MSE) solves this by grouping services into lanes -- isolated runtime environments tagged by version. You define routing rules that steer matching traffic to the tagged (canary) versions across every service in the chain, without any code changes. If a service has no canary version deployed, traffic automatically falls back to its base version.

The following example shows how to configure an MSE cloud-native gateway to route canary traffic across three backend Spring Cloud services.

How it works

End-to-end canary release combines the MSE cloud-native gateway (north-south traffic control) with MSE Microservices Governance (east-west traffic control). The gateway routes incoming requests to the correct lane based on your rules. Within the cluster, MSE Microservices Governance ensures that tagged traffic stays in the same lane as it flows through downstream services. This gives you consistent canary coverage across the full call chain.

Sample scenario

An e-commerce platform has three backend Spring Cloud services behind an MSE cloud-native gateway:

Application A -- Transaction center
Application B -- Commodity center
Application C -- Inventory center

The call chain is: Client > MSE cloud-native gateway > A > B > C.

A new feature requires updated versions of Application A and Application C. Before the full rollout, you want to canary-test both new versions simultaneously, while Application B stays on its current version.

With end-to-end canary release, traffic that matches your canary rules flows through the canary versions of A and C, while still hitting the base version of B. Traffic that does not match the rules stays entirely on base versions.
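
The fallback behavior in this scenario can be modeled in a few lines of Python. This is only an illustrative sketch, not the MSE implementation: the `DEPLOYED` table and the `resolve_chain` function are hypothetical names, and the model assumes that the lane tag travels with the request even after a hop falls back to the base version.

```python
# Hypothetical model of lane routing with fallback; not the MSE implementation.
CALL_CHAIN = ["A", "B", "C"]

# Versions deployed per application; "base" is the untagged version.
DEPLOYED = {
    "A": {"base", "gray"},
    "B": {"base"},
    "C": {"base", "gray"},
}


def resolve_chain(lane_tag, chain=CALL_CHAIN, deployed=DEPLOYED):
    """Return the version each hop uses for traffic carrying lane_tag.

    lane_tag is None for traffic that matches no canary rule.
    """
    path = []
    for app in chain:
        if lane_tag is not None and lane_tag in deployed[app]:
            path.append((app, lane_tag))  # stay in the lane
        else:
            path.append((app, "base"))    # fall back to the base version
    return path


print(resolve_chain("gray"))  # canary traffic
print(resolve_chain(None))    # ordinary traffic
```

In this model, canary traffic uses the gray versions of A and C and the base version of B, because the tag is checked again at every hop rather than being dropped after the first fallback.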

Key concepts

Term	Description
MSE cloud-native gateway	A Kubernetes Ingress-compatible gateway that supports service discovery from ACK clusters and Nacos instances, with built-in authentication and security.
Lane	An isolated runtime environment for a specific version of your applications. Only traffic that matches routing rules reaches the tagged applications in a lane. Applications and lanes have a many-to-many relationship.
Lane group	A collection of lanes, typically scoped to a team or scenario.
Base environment	The environment where untagged (default) application versions run. Serves as a fallback when a lane has no tagged version of a service.

Limitations

End-to-end canary release is integrated with MSE tag-based routing. Do not configure separate canary release rules and tag-based routing rules for the same applications.
For supported Java frameworks, see Java frameworks supported by Microservices Governance.
The cloud-native gateway must be version 2.0.6 or later. To upgrade, see Upgrade an MSE cloud-native gateway.

Prerequisites

Before you begin, make sure that you have:

An ACK managed cluster. See Create an ACK managed cluster
MSE Microservices Governance Professional Edition, activated on the Microservices Governance page
Microservices Governance enabled for your ACK applications. See Enable Microservices Governance for Java microservice applications in an ACK or ACS cluster
An MSE cloud-native gateway, created and associated with a service source (ACK cluster or MSE Nacos instance). See Create an MSE cloud-native gateway and Add a service source

Important

Make sure the MSE Java agent version is 3.2.3 or later. Earlier versions may cause unexpected behavior.
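
When you script a check against the minimum versions stated in this topic (gateway 2.0.6, Java agent 3.2.3), compare the versions numerically rather than as strings, because a plain string comparison ranks "2.0.10" below "2.0.6". The helper below is a minimal sketch; the function names are hypothetical, and it assumes purely numeric dotted versions.

```python
def parse_version(text):
    """Turn a dotted version such as '3.2.3' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().lstrip("v").split("."))


def meets_minimum(actual, minimum):
    """Return True if the dotted version `actual` is at least `minimum`."""
    return parse_version(actual) >= parse_version(minimum)


# Minimum versions stated in this topic.
MIN_GATEWAY = "2.0.6"
MIN_AGENT = "3.2.3"

print(meets_minimum("2.0.10", MIN_GATEWAY))  # 10 is compared with 6 as a number
print(meets_minimum("3.2.2", MIN_AGENT))
```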

Note

End-to-end canary release supports ACK clusters or MSE Nacos instances as service sources.
The MSE cloud-native gateway must be in the same VPC as your ACK cluster or MSE Nacos instance.

Step 1: Deploy the base versions

Log on to the ACK console, navigate to your cluster, and choose Workloads > Deployments. Select the target namespace and click Create from YAML.

Use the YAML below to deploy the base versions of Application A, B, and C. Choose the section that matches your service source.

MSE Nacos instance as the service source

Replace {nacos server address} with the internal endpoint of your MSE Nacos instance (remove the curly braces).

Important

The YAML uses the MSE Nacos instance endpoint for service registration.

# Base version of Application A (Transaction center)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spring-cloud-a
  namespace: default
spec:
  selector:
    matchLabels:
      app: spring-cloud-a
  template:
    metadata:
      labels:
        app: spring-cloud-a
        msePilotCreateAppName: spring-cloud-a
        msePilotAutoEnable: 'on'
    spec:
      containers:
      - name: spring-cloud-a
        image: registry.cn-hangzhou.aliyuncs.com/mse-governance-demo/spring-cloud-a:3.0.1
        imagePullPolicy: Always
        ports:
          - containerPort: 20001
        livenessProbe:
          tcpSocket:
            port: 20001
          initialDelaySeconds: 30
          periodSeconds: 60
        env:
        - name: spring.cloud.nacos.discovery.server-addr
          value: {nacos server address}
        - name: dubbo.registry.address
          value: 'nacos://{nacos server address}:8848'
---
# Base version of Application B (Commodity center)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spring-cloud-b
  namespace: default
spec:
  selector:
    matchLabels:
      app: spring-cloud-b
  template:
    metadata:
      labels:
        app: spring-cloud-b
        msePilotCreateAppName: spring-cloud-b
        msePilotAutoEnable: 'on'
    spec:
      containers:
      - name: spring-cloud-b
        image: registry.cn-hangzhou.aliyuncs.com/mse-governance-demo/spring-cloud-b:3.0.1
        imagePullPolicy: Always
        ports:
          - containerPort: 20002
        livenessProbe:
          tcpSocket:
            port: 20002
          initialDelaySeconds: 30
          periodSeconds: 60
        env:
        - name: spring.cloud.nacos.discovery.server-addr
          value: {nacos server address}
        - name: dubbo.registry.address
          value: 'nacos://{nacos server address}:8848'
---
# Base version of Application C (Inventory center)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spring-cloud-c
  namespace: default
spec:
  selector:
    matchLabels:
      app: spring-cloud-c
  template:
    metadata:
      labels:
        app: spring-cloud-c
        msePilotCreateAppName: spring-cloud-c
        msePilotAutoEnable: 'on'
    spec:
      containers:
      - name: spring-cloud-c
        image: registry.cn-hangzhou.aliyuncs.com/mse-governance-demo/spring-cloud-c:3.0.1
        imagePullPolicy: Always
        ports:
          - containerPort: 20003
        livenessProbe:
          tcpSocket:
            port: 20003
          initialDelaySeconds: 30
          periodSeconds: 60
        env:
        - name: spring.cloud.nacos.discovery.server-addr
          value: {nacos server address}
        - name: dubbo.registry.address
          value: 'nacos://{nacos server address}:8848'
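
If you generate the manifest from a script instead of editing it by hand, the placeholder replacement described above can be sketched as follows. The function name and the endpoint `mse-example.nacos.internal` are made up for illustration. Leaving the curly braces in place would make YAML read the value as a flow mapping instead of a string, so the sketch rejects addresses that still contain braces.

```python
PLACEHOLDER = "{nacos server address}"


def fill_nacos_address(manifest, address):
    """Replace every placeholder occurrence with the Nacos endpoint."""
    address = address.strip()
    if not address or "{" in address or "}" in address:
        raise ValueError("address must be a plain host name without curly braces")
    return manifest.replace(PLACEHOLDER, address)


# Two env entries copied from the manifest above.
snippet = (
    "- name: spring.cloud.nacos.discovery.server-addr\n"
    "  value: {nacos server address}\n"
    "- name: dubbo.registry.address\n"
    "  value: 'nacos://{nacos server address}:8848'\n"
)

# "mse-example.nacos.internal" is a made-up endpoint used only for illustration.
print(fill_nacos_address(snippet, "mse-example.nacos.internal"))
```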

ACK cluster as the service source

When using an ACK cluster as the service source, first deploy a self-managed Nacos instance as the service registry, then deploy the three applications.

1. Deploy the self-managed Nacos instance:

Important

The applications register with this self-managed Nacos instance.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nacos-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nacos-server
  template:
    metadata:
      labels:
        msePilotAutoEnable: "off"
        app: nacos-server
    spec:
      containers:
        - name: nacos-server
          image: 'registry.cn-hangzhou.aliyuncs.com/mse-governance-demo/nacos-server:v2.1.2'
          env:
            - name: MODE
              value: standalone
            - name: JVM_XMS
              value: 512M
            - name: JVM_XMX
              value: 512M
            - name: JVM_XMN
              value: 256M
          imagePullPolicy: Always
          livenessProbe:
            failureThreshold: 3
            initialDelaySeconds: 15
            periodSeconds: 10
            successThreshold: 1
            tcpSocket:
              port: 8848
            timeoutSeconds: 3
          readinessProbe:
            failureThreshold: 5
            initialDelaySeconds: 15
            periodSeconds: 15
            successThreshold: 1
            tcpSocket:
              port: 8848
            timeoutSeconds: 3
          resources:
            requests:
              cpu: '1'
              memory: 2Gi
      dnsPolicy: ClusterFirst
      restartPolicy: Always

---
apiVersion: v1
kind: Service
metadata:
  name: nacos-server
spec:
  type: ClusterIP
  ports:
    - name: nacos-server-8848-8848
      port: 8848
      protocol: TCP
      targetPort: 8848
    - name: nacos-server-9848-9848
      port: 9848
      protocol: TCP
      targetPort: 9848
  selector:
    app: nacos-server

2. Deploy the base versions of Application A, B, and C:

# Base version of Application A (Transaction center)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spring-cloud-a
  namespace: default
spec:
  selector:
    matchLabels:
      app: spring-cloud-a
  template:
    metadata:
      labels:
        app: spring-cloud-a
        msePilotCreateAppName: spring-cloud-a
        msePilotAutoEnable: 'on'
    spec:
      containers:
      - name: spring-cloud-a
        image: registry.cn-hangzhou.aliyuncs.com/mse-governance-demo/spring-cloud-a:3.0.1
        imagePullPolicy: Always
        ports:
          - containerPort: 20001
        livenessProbe:
          tcpSocket:
            port: 20001
          initialDelaySeconds: 30
          periodSeconds: 60
        # Connect to the self-managed Nacos instance
        env:
        - name: spring.cloud.nacos.discovery.server-addr
          value: nacos-server
        - name: dubbo.registry.address
          value: 'nacos://nacos-server:8848'
---
# Base version of Application B (Commodity center)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spring-cloud-b
  namespace: default
spec:
  selector:
    matchLabels:
      app: spring-cloud-b
  template:
    metadata:
      labels:
        app: spring-cloud-b
        msePilotCreateAppName: spring-cloud-b
        msePilotAutoEnable: 'on'
    spec:
      containers:
      - name: spring-cloud-b
        image: registry.cn-hangzhou.aliyuncs.com/mse-governance-demo/spring-cloud-b:3.0.1
        imagePullPolicy: Always
        ports:
          - containerPort: 20002
        livenessProbe:
          tcpSocket:
            port: 20002
          initialDelaySeconds: 30
          periodSeconds: 60
        # Connect to the self-managed Nacos instance
        env:
        - name: spring.cloud.nacos.discovery.server-addr
          value: nacos-server
        - name: dubbo.registry.address
          value: 'nacos://nacos-server:8848'
---
# Base version of Application C (Inventory center)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spring-cloud-c
  namespace: default
spec:
  selector:
    matchLabels:
      app: spring-cloud-c
  template:
    metadata:
      labels:
        app: spring-cloud-c
        msePilotCreateAppName: spring-cloud-c
        msePilotAutoEnable: 'on'
    spec:
      containers:
      - name: spring-cloud-c
        image: registry.cn-hangzhou.aliyuncs.com/mse-governance-demo/spring-cloud-c:3.0.1
        imagePullPolicy: Always
        ports:
          - containerPort: 20003
        livenessProbe:
          tcpSocket:
            port: 20003
          initialDelaySeconds: 30
          periodSeconds: 60
        # Connect to the self-managed Nacos instance
        env:
        - name: spring.cloud.nacos.discovery.server-addr
          value: nacos-server
        - name: dubbo.registry.address
          value: 'nacos://nacos-server:8848'

3. Create a Kubernetes Service to expose Application A as the ingress application:

apiVersion: v1
kind: Service
metadata:
  name: sc-a
  namespace: default
spec:
  ports:
    - port: 20001
      protocol: TCP
      targetPort: 20001
  selector:
    app: spring-cloud-a
  type: ClusterIP

Step 2: Expose Application A through the gateway

Add Application A as a backend service and create a route on the MSE cloud-native gateway.

Add a new service and route

If Application A has not been added to the gateway:

Log on to the MSE console. In the left-side navigation pane, choose Cloud-native Gateway > Gateways and click the gateway name.
Choose Routes > Services tab and click Add Service. For more information, see Add a service.
Configure the service based on your service source:
ACK cluster as service source:
MSE Nacos instance as service source:
- Service Source: Select MSE Nacos.
- Namespace: Select public.
- Services: Select sc-A.
On the Routes tab, click Add Route and configure the route:

Parameter	Value
Path	Select Prefix and enter `/a`.
Route Point	Select Single Service.
Backend Service	Select sc-A.

Use an existing service

If Application A has already been imported, modify the existing route on the Routes tab:

Parameter	Value
Path	Select Prefix and enter `/a`.
Route Point	Select Single Service.
Backend Service	Select sc-A.

Step 3: Verify base traffic

In the MSE console, choose Cloud-native Gateway > Gateways and click the gateway name.
Click Overview. On the Gateway Ingress tab, note the Ingress IP Address of the SLB instance.
Run a cURL command to verify that traffic flows through the base versions of all three applications:

curl <SLB-IP-address>/a

Expected output:

A[10.0.3.178][config=base] -> B[10.0.3.195] -> C[10.0.3.201]

None of the applications shows a version suffix (such as gray), which confirms that all traffic is routed to the base versions.
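
To check this programmatically rather than by eye, you can split the response into hops and read the suffix of each application name. The parser below is a sketch based only on the sample outputs in this topic: it assumes that each application name is a single uppercase letter and that any characters between that letter and the first `[` are the version tag. `parse_chain` is a hypothetical helper, not part of any MSE tooling.

```python
import re

# One hop looks like "A[10.0.3.178][config=base]" or "Cgray[10.0.3.180]".
# Assumption: the application name is a single uppercase letter and any
# characters that follow it, before the first "[", are the version tag.
HOP_PATTERN = re.compile(r"^([A-Z])([A-Za-z0-9_-]*)\[")


def parse_chain(text):
    """Return [(app, tag), ...] for each hop; tag is 'base' when absent."""
    hops = []
    for part in text.strip().split(" -> "):
        match = HOP_PATTERN.match(part)
        if match is None:
            raise ValueError(f"unrecognized hop: {part!r}")
        app, tag = match.groups()
        hops.append((app, tag or "base"))
    return hops


base_response = "A[10.0.3.178][config=base] -> B[10.0.3.195] -> C[10.0.3.201]"
print(parse_chain(base_response))
assert all(tag == "base" for _, tag in parse_chain(base_response))
```

The same helper can be reused after the canary versions are deployed, where the expected tags for A and C change to gray.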

Step 4: Deploy the canary versions

New versions of Application A and Application C need canary testing. Go back to the ACK console, navigate to Workloads > Deployments, select the namespace, and click Create from YAML.

Apart from the -gray suffix on the Deployment name (and, in the MSE Nacos variant, on the app label), the canary YAML differs from the base YAML only by the added label alicloud.service.tag: gray.

MSE Nacos instance as the service source

Replace {nacos server address} with the internal endpoint of your MSE Nacos instance.

Important

Compared to the base version YAML, the canary version adds alicloud.service.tag: gray to spec.template.metadata.labels.

# Canary version of Application A (Transaction center)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spring-cloud-a-gray
  namespace: default
spec:
  selector:
    matchLabels:
      app: spring-cloud-a-gray
  template:
    metadata:
      labels:
        alicloud.service.tag: gray
        app: spring-cloud-a-gray
        msePilotCreateAppName: spring-cloud-a
        msePilotAutoEnable: 'on'
    spec:
      containers:
      - name: spring-cloud-a
        image: registry.cn-hangzhou.aliyuncs.com/mse-governance-demo/spring-cloud-a:3.0.1
        imagePullPolicy: Always
        ports:
          - containerPort: 20001
        livenessProbe:
          tcpSocket:
            port: 20001
          initialDelaySeconds: 30
          periodSeconds: 60
        env:
        - name: spring.cloud.nacos.discovery.server-addr
          value: {nacos server address}
        - name: dubbo.registry.address
          value: 'nacos://{nacos server address}:8848'
---
# Canary version of Application C (Inventory center)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spring-cloud-c-gray
  namespace: default
spec:
  selector:
    matchLabels:
      app: spring-cloud-c-gray
  template:
    metadata:
      labels:
        alicloud.service.tag: gray
        app: spring-cloud-c-gray
        msePilotCreateAppName: spring-cloud-c
        msePilotAutoEnable: 'on'
    spec:
      containers:
      - name: spring-cloud-c
        image: registry.cn-hangzhou.aliyuncs.com/mse-governance-demo/spring-cloud-c:3.0.1
        imagePullPolicy: Always
        ports:
          - containerPort: 20003
        livenessProbe:
          tcpSocket:
            port: 20003
          initialDelaySeconds: 30
          periodSeconds: 60
        env:
        - name: spring.cloud.nacos.discovery.server-addr
          value: {nacos server address}
        - name: dubbo.registry.address
          value: 'nacos://{nacos server address}:8848'

ACK cluster as the service source

Important

Compared to the base version YAML, the canary version adds alicloud.service.tag: gray to spec.template.metadata.labels.

# Canary version of Application A (Transaction center)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spring-cloud-a-gray
  namespace: default
spec:
  selector:
    matchLabels:
      app: spring-cloud-a
  template:
    metadata:
      labels:
        alicloud.service.tag: gray
        app: spring-cloud-a
        msePilotCreateAppName: spring-cloud-a
        msePilotAutoEnable: 'on'
    spec:
      containers:
      - name: spring-cloud-a
        image: registry.cn-hangzhou.aliyuncs.com/mse-governance-demo/spring-cloud-a:3.0.1
        imagePullPolicy: Always
        ports:
          - containerPort: 20001
        livenessProbe:
          tcpSocket:
            port: 20001
          initialDelaySeconds: 30
          periodSeconds: 60
        # Connect to the self-managed Nacos instance
        env:
        - name: spring.cloud.nacos.discovery.server-addr
          value: nacos-server
        - name: dubbo.registry.address
          value: 'nacos://nacos-server:8848'

---
# Canary version of Application C (Inventory center)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spring-cloud-c-gray
  namespace: default
spec:
  selector:
    matchLabels:
      app: spring-cloud-c
  template:
    metadata:
      labels:
        alicloud.service.tag: gray
        app: spring-cloud-c
        msePilotCreateAppName: spring-cloud-c
        msePilotAutoEnable: 'on'
    spec:
      containers:
      - name: spring-cloud-c
        image: registry.cn-hangzhou.aliyuncs.com/mse-governance-demo/spring-cloud-c:3.0.1
        imagePullPolicy: Always
        ports:
          - containerPort: 20003
        livenessProbe:
          tcpSocket:
            port: 20003
          initialDelaySeconds: 30
          periodSeconds: 60
        # Connect to the self-managed Nacos instance
        env:
        - name: spring.cloud.nacos.discovery.server-addr
          value: nacos-server
        - name: dubbo.registry.address
          value: 'nacos://nacos-server:8848'

Step 5: Create a lane group

Log on to the MSE console and select a region.
In the left-side navigation pane, choose Microservices Governance > Full link grayscale.
Click Create Lane Group and Lane (or Create Lane Group if a lane group already exists).
Configure the lane group:

Parameter	Value
Lane Group Name	Enter a name for the lane group.
Ingress Type	Select MSE Cloud-native Gateway.
Ingress Gateway	Select your cloud-native gateway.
Lane Group Application	Select spring-cloud-a, spring-cloud-b, and spring-cloud-c.

After the lane group is created, it appears in the Lane Group section of the Full link grayscale page. Click the Edit icon to modify it.

Step 6: Create a lane and configure routing rules

Note

Tag canary application nodes with alicloud.service.tag: ${tag} in spec.template.metadata.labels (container environment) or -Dalicloud.service.tag=${tag} as a Java startup parameter (ECS environment).
MSE cloud-native gateways support two lane routing modes:
- Routing by request content -- Routes traffic based on request attributes (headers, parameters). Best when requests carry identifiers that distinguish canary traffic.
- Routing by percentage -- Routes a percentage of all traffic to the canary lane. Use this when requests carry no distinguishing attributes. Requests from the same source may hit different lanes.
All lanes in a lane group must use the same routing mode. The mode is locked when the first lane is created.

On the Full link grayscale page, click Create First Split Lane (or Create Lane if a lane already exists).

Option A: Route by request content

Parameter	Value
Add Node Tag	Tag your canary application nodes.
Lane Tag	Enter the tag for this lane (for example, `gray`). Use Confirm Matching Relationship to verify the number of tagged nodes.
Canary Release Mode	Select Canary Release by Content.
Canary Release Condition	Select Meet All Conditions.
Condition details	Parameter Type: Header, Parameter: `canary`, Condition: `==`, Value: `gray`

Condition matching modes

Mode	Behavior
Meet All Conditions	Traffic must match every condition to reach the lane.
Meet Any Condition	Traffic that matches at least one condition reaches the lane.

Supported condition operators

Operator	Description
`==`	Exact match. Traffic value must equal the condition value.
`!=`	Not-equal match. Traffic value must differ from the condition value.
`in`	Inclusion match. Traffic value must be in the specified list.
Percentage	Hash-based match. Evaluates `hash(get(key)) % 100 < value`.
Regular expression	Regex match. Traffic value must match the specified pattern.
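
The matching modes and operators above can be illustrated with a small evaluator. This is a sketch under several assumptions, not the gateway's actual logic: CRC32 stands in for the unspecified `hash` function, a missing key never matches, the regular expression must match the whole value, and the `x-user-id` header is a hypothetical example.

```python
import re
import zlib


def stable_bucket(value):
    """Map a string to a bucket in 0..99. CRC32 is a stand-in hash (assumption)."""
    return zlib.crc32(value.encode("utf-8")) % 100


def matches(condition, headers):
    """Evaluate one condition dict against the request headers."""
    actual = headers.get(condition["key"])
    if actual is None:
        return False  # assumption: a missing key never matches
    op, expected = condition["op"], condition["value"]
    if op == "==":
        return actual == expected
    if op == "!=":
        return actual != expected
    if op == "in":
        return actual in expected
    if op == "percentage":
        return stable_bucket(actual) < expected
    if op == "regex":
        return re.fullmatch(expected, actual) is not None
    raise ValueError(f"unknown operator: {op}")


def reaches_lane(conditions, headers, mode="all"):
    """mode='all' mirrors Meet All Conditions; mode='any' mirrors Meet Any Condition."""
    results = (matches(c, headers) for c in conditions)
    return all(results) if mode == "all" else any(results)


rules = [
    {"key": "canary", "op": "==", "value": "gray"},
    {"key": "x-user-id", "op": "in", "value": ["1001", "1002"]},
]
print(reaches_lane(rules, {"canary": "gray", "x-user-id": "1001"}))         # True
print(reaches_lane(rules, {"canary": "gray", "x-user-id": "9999"}))         # False
print(reaches_lane(rules, {"canary": "gray", "x-user-id": "9999"}, "any"))  # True
```

Because the percentage operator hashes the value of a key, requests that carry the same value always land in the same bucket, which differs from routing by percentage at the lane level.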

Option B: Route by percentage

Note

Routing by percentage requires ack-onepilot version 3.0.18 or later and agent version 3.2.3 or later.

Parameter	Value
Add Node Tag	Tag your canary application nodes.
Lane Tag	Enter the tag for this lane. Use Confirm Matching Relationship to verify the number of tagged nodes.
Canary Release Mode	Select Canary Release by Ratio.
Flow ratio	Enter `30` (percent).

Note

You can also configure different traffic percentages for each gateway base route. If enabled, the sum of traffic percentages across all lane groups for a given base route must not exceed 100%.
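
The 100% limit is easy to check before you enter the values in the console. The function below is a hypothetical helper: the lane name `blue` is invented for the example, and treating the remainder as the share of the base version is an assumption.

```python
def validate_lane_ratios(ratios):
    """Check the percentages configured for one base route.

    ratios maps a lane name to its traffic percentage (0-100).
    Returns the remainder, assumed here to stay on the base version.
    """
    for lane, percent in ratios.items():
        if not 0 <= percent <= 100:
            raise ValueError(f"{lane}: percentage must be between 0 and 100")
    total = sum(ratios.values())
    if total > 100:
        raise ValueError(f"percentages add up to {total}, which exceeds 100")
    return 100 - total


print(validate_lane_ratios({"gray": 30}))              # 70
print(validate_lane_ratios({"gray": 30, "blue": 50}))  # 20
```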

Manage lanes after creation

After the lane is created, it appears in the Traffic Distribution section of the Full link grayscale page. Available actions:

Enable -- Activates the lane. Matching traffic routes to the tagged application versions. If no tagged version exists for a service, traffic falls back to untagged versions.
Disable -- Deactivates the lane. All traffic routes to untagged versions.
Click the icon to view the traffic percentage.
Click the icon to configure application status within the lane.

Step 7: Verify canary traffic

Verify routing by request content

Send a request with the canary: gray header:

curl -H "canary: gray" <SLB-IP-address>/a

Expected output:

Agray[10.0.3.177][config=base] -> B[10.0.3.195] -> Cgray[10.0.3.180]

The output confirms that traffic flows through the canary versions of Application A (Agray) and Application C (Cgray), while Application B (no canary version) stays on the base version.

Verify routing by percentage

Run the following Python script to test traffic distribution. Replace x.x.x.x with the SLB IP address of your cloud-native gateway.

# Install the requests package: pip3 install requests
# Run the script: python3 traffic.py
from collections import Counter

import requests


TOTAL_REQUEST = 100
ENTRY_URL = 'http://x.x.x.x/a'

def parse_tag(text: str) -> str:
    '''
    Parse the version tag of the last application in the call chain.
    Example responses:
      A[10.0.23.64][config=base] -> B[10.0.23.65] -> C[10.0.23.61]
      Agray[10.0.23.64][config=base] -> B[10.0.23.65] -> Cgray[10.0.23.61]
    '''
    print(text)
    last_app = text.split(' -> ')[-1]   # for example, 'Cgray[10.0.23.61]'
    name = last_app.split('[')[0]       # for example, 'Cgray'
    tag = name[1:]                      # drop the one-letter application name
    return tag if tag else 'base'

def get_tag(url: str) -> str:
    # A timeout keeps the loop from hanging if the gateway is unreachable.
    resp = requests.get(url, timeout=10)
    resp.encoding = resp.apparent_encoding
    return parse_tag(resp.text)

def cal_tag_count(url: str, total_request: int) -> None:
    # Count how many responses came from each version.
    count_map = Counter(get_tag(url) for _ in range(total_request))

    print()
    print('Total Request:', total_request)
    print('Traffic Distribution:', dict(count_map))


if __name__ == '__main__':
    cal_tag_count(ENTRY_URL, TOTAL_REQUEST)

The output should show that approximately 30% of requests are routed to the canary environment.
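
With only 100 requests, the observed share rarely equals exactly 30%. One way to decide whether a result is plausible is to compare it with the spread expected from a binomial distribution. The sketch below uses the normal approximation with a tolerance of three standard deviations; the function name, the tolerance, and the sample counts are assumptions chosen for illustration, not measured output.

```python
import math


def canary_share_ok(count_map, expected_ratio, canary_tag="gray", z=3.0):
    """Check whether the observed canary share is consistent with expected_ratio.

    Uses the normal approximation to the binomial distribution: the observed
    share should lie within z standard deviations of expected_ratio.
    """
    total = sum(count_map.values())
    if total == 0:
        raise ValueError("no requests recorded")
    observed = count_map.get(canary_tag, 0) / total
    std_dev = math.sqrt(expected_ratio * (1 - expected_ratio) / total)
    return abs(observed - expected_ratio) <= z * std_dev, observed, std_dev


# Hypothetical counts, not measured output.
ok, observed, std_dev = canary_share_ok({"base": 73, "gray": 27}, 0.30)
print(ok, observed, round(std_dev, 4))
```

For 100 requests and a 30% ratio, one standard deviation is about 4.6 percentage points, so a result anywhere from roughly 16% to 44% passes this check. Sending more requests narrows the range.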

(Optional) Monitor canary traffic

Use MSE observability features to monitor canary traffic and quickly locate issues.

Gateway-level monitoring

In the MSE console, open your cloud-native gateway and go to Routes > Services tab. Click the service name to view per-service metrics on the Monitor tab.

Microservices Governance monitoring

On the Full link grayscale page, click the target application. The QPS Data section shows traffic metrics for both base (untagged) and canary versions:

Total QPS -- Total queries per second for the application.
Exception QPS -- Error request rate.
GrayQPS -- Queries per second for the canary version.

ARMS monitoring

If your application is connected to Application Real-Time Monitoring Service (ARMS), view canary traffic data on the Full link grayscale tab under Scenario-based Analysis in the ARMS console. Use these metrics to decide whether to proceed with the full rollout or roll back.