Create service groups and configure traffic distribution for canary releases

Use cases

Canary release

Add a production service and a canary service to the same group, and allocate a small share of the group's traffic to the canary service. When you release a new version, update the canary service first and monitor its performance. If issues occur, roll back the canary service, or stop it and shift its traffic to the production service. If the new version runs as expected, deploy it to the production service. After the update, you can scale the canary service down to zero replicas or keep it running to handle a small share of traffic.
Subscription-based and pay-as-you-go elastic scaling

To handle baseline demand, deploy a subscription-based service with a fixed replica count in a dedicated resource group. To absorb traffic spikes at lower cost, add a pay-as-you-go service that runs in a public resource group. Both services belong to the same service group and share its traffic endpoint.
Use heterogeneous hardware resources

In GPU-accelerated scenarios, if a specific GPU model is out of stock in a region, a service that depends on it cannot scale out. A service group lets you work around this by creating additional services that use other GPU models, each with a CUDA environment compatible with its hardware. Multiple services running on heterogeneous resources can then support the same business scenario. Because the service group's traffic endpoint remains unchanged, these changes are transparent to the front end.

Create a service group

You can assign a service to a service group when you create the service.

Note

When you create a service and assign it to a group, the system creates the group if it does not exist. If the group already exists, the service is simply added to it. The group is automatically deleted when its last member service is removed.
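To make these lifecycle rules concrete, here is a small in-memory Python model of them. It is a hypothetical illustration of the documented behavior (a group is created with its first service and deleted with its last), not an EAS API; the class and method names are invented for this sketch.

```python
class ServiceGroupRegistry:
    """Hypothetical model of service group lifecycle; not the EAS API."""

    def __init__(self) -> None:
        # Maps group name -> set of member service names.
        self.groups: dict[str, set[str]] = {}

    def add_service(self, service: str, group: str) -> bool:
        """Assign a service to a group. Returns True if the group was newly created."""
        created = group not in self.groups
        self.groups.setdefault(group, set()).add(service)
        return created

    def remove_service(self, service: str, group: str) -> bool:
        """Remove a service. Returns True if the group was deleted because it became empty."""
        members = self.groups[group]
        members.discard(service)
        if not members:
            del self.groups[group]
            return True
        return False


registry = ServiceGroupRegistry()
registry.add_service("pmml_prod", "pmml")     # True: group pmml is created
registry.add_service("pmml_grey", "pmml")     # False: pmml_grey joins the existing group
registry.remove_service("pmml_grey", "pmml")  # False: pmml_prod is still a member
registry.remove_service("pmml_prod", "pmml")  # True: last member removed, group deleted
```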

The following example shows how to create a service group named pmml and assign two services, pmml_prod and pmml_grey, to it.

Console

1. Log on to the PAI console. Select a region at the top of the page, select the desired workspace, and then click Elastic Algorithm Service (EAS).
2. On the Canary Release tab, click Create Group and Service.
3. On the Custom Deployment page, configure the parameters and click Deploy.
   Key parameters:
   - Service Name: Configure the service name based on the on-screen prompts. In this example, set it to pmml_prod.
   - Group: Select New Group and set the value to pmml.
   For information about other parameters, see Custom Deployment.
4. Repeat steps 2 and 3 to create the pmml_grey service and assign it to the pmml service group.
5. After the services are created, go to the Canary Release tab and click the group name pmml to open the group details page, which lists the services that belong to the group.

The group details page lists two services: pmml_grey, which has traffic allocation disabled and no assigned weight, and pmml_prod, which has traffic allocation enabled and a weight of 100.

Important

By default, newly added services do not receive traffic from the group. To enable traffic, see Modify the traffic allocation policy.

EASCMD

Prepare a service.json configuration file for each service. The configuration files for the two services in this example are as follows.

Configuration file for the pmml_prod service:

```json
{
  "name":"pmml_prod",
  "model_path":"http://examplebucket.oss-cn-shanghai.aliyuncs.com/models/lr_xxxx.pmml",
  "processor":"pmml",
  "metadata":{
    "cpu":1,
    "instance":4,
    "group":"pmml"
  }
}
```

Configuration file for the pmml_grey service:

```json
{
  "name":"pmml_grey",
  "model_path":"http://examplebucket.oss-cn-shanghai.aliyuncs.com/models/lr_xxxx.pmml",
  "processor":"pmml",
  "metadata":{
    "cpu":1,
    "instance":1,
    "group":"pmml"
  }
}
```

The group field specifies the name of the service group to which the service belongs. For information about other parameters, see JSON deployment.
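The two files differ only in the service name and replica count. The following Python sketch generates them with the standard json module, using exactly the fields from the examples above. The helper names and output file names (pmml_prod.json, pmml_grey.json) are hypothetical, and lr_xxxx is kept as the placeholder it is in the original path.

```python
import json
from pathlib import Path

# Placeholder model path copied from the example configuration files.
MODEL_PATH = "http://examplebucket.oss-cn-shanghai.aliyuncs.com/models/lr_xxxx.pmml"


def service_config(name: str, instances: int, group: str, cpu: int = 1) -> dict:
    """Build a service configuration with the same fields as the examples."""
    return {
        "name": name,
        "model_path": MODEL_PATH,
        "processor": "pmml",
        # The group field assigns the service to a service group.
        "metadata": {"cpu": cpu, "instance": instances, "group": group},
    }


def write_configs(out_dir: Path) -> list[Path]:
    """Write one JSON file per service into out_dir and return the file paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, instances in [("pmml_prod", 4), ("pmml_grey", 1)]:
        path = out_dir / f"{name}.json"
        path.write_text(json.dumps(service_config(name, instances, "pmml"), indent=2))
        paths.append(path)
    return paths
```

For example, write_configs(Path("configs")) would produce two files that you can then pass to eascmd create one at a time.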

Create the services and the service group.
Log on to the EASCMD client (see Download and authenticate the client), and then run the create command once for each service's configuration file. The group is created together with the first service that is assigned to it. The following example shows how to use the command:
```
$ eascmd create service.json
```

View the details of the services and the group.

Run the ls command to view the details of the services and the group. The following example shows how to use the command:

```
$ eascmd ls
```

The following output is returned:

```
[RequestId]: 716BEBFC-E8A4-51FD-A3F7-56376B167923
+---------------------------+----------+-----+--------+----------------------+----------------------+---------+--------+--------------+---------------------------+
|        SERVICENAME        | INSTANCE | CPU | MEMORY |      CREATETIME      |      UPDATETIME      | STATUS  | WEIGHT | TRAFFICSTATE |       SERVICEGROUP        |
+---------------------------+----------+-----+--------+----------------------+----------------------+---------+--------+--------------+---------------------------+
| pmml_prod                 |        4 |   1 | 1000M  | 2022-06-05T14:30:49Z | 2022-06-05T14:30:49Z | Running |     80 | grouping     | pmml                      |
| pmml_grey                 |        1 |   1 | 1000M  | 2022-06-05T14:31:38Z | 2022-06-05T14:31:38Z | Running |     20 | grouping     | pmml                      |
+---------------------------+----------+-----+--------+----------------------+----------------------+---------+--------+--------------+---------------------------+
```

Description:

The value of SERVICEGROUP is pmml, which indicates that both services belong to the pmml service group.
The value of TRAFFICSTATE is grouping, which indicates that both services receive group traffic. The traffic is split 80/20 between the services in proportion to their replica counts (4 and 1).
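To check the split programmatically, the following Python sketch parses the ASCII table printed by eascmd ls and computes each grouping service's share from the WEIGHT column. It relies on the column layout shown in the sample output above; that layout is an assumption and may differ between EASCMD versions. The function names are hypothetical.

```python
# Sample `eascmd ls` output copied from this page.
SAMPLE = """\
[RequestId]: 716BEBFC-E8A4-51FD-A3F7-56376B167923
+---------------------------+----------+-----+--------+----------------------+----------------------+---------+--------+--------------+---------------------------+
|        SERVICENAME        | INSTANCE | CPU | MEMORY |      CREATETIME      |      UPDATETIME      | STATUS  | WEIGHT | TRAFFICSTATE |       SERVICEGROUP        |
+---------------------------+----------+-----+--------+----------------------+----------------------+---------+--------+--------------+---------------------------+
| pmml_prod                 |        4 |   1 | 1000M  | 2022-06-05T14:30:49Z | 2022-06-05T14:30:49Z | Running |     80 | grouping     | pmml                      |
| pmml_grey                 |        1 |   1 | 1000M  | 2022-06-05T14:31:38Z | 2022-06-05T14:31:38Z | Running |     20 | grouping     | pmml                      |
+---------------------------+----------+-----+--------+----------------------+----------------------+---------+--------+--------------+---------------------------+
"""


def parse_eascmd_ls(output: str) -> list[dict[str, str]]:
    """Turn the ASCII table into one dict per service row, keyed by column name."""
    # Keep only header and data rows; separator rows start with "+".
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in output.splitlines()
        if line.strip().startswith("|")
    ]
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


def weight_shares(rows: list[dict[str, str]]) -> dict[str, float]:
    """Fraction of group traffic per service, computed from WEIGHT for grouping services."""
    active = [r for r in rows if r["TRAFFICSTATE"] == "grouping"]
    total = sum(int(r["WEIGHT"]) for r in active)
    if total == 0:
        return {}
    return {r["SERVICENAME"]: int(r["WEIGHT"]) / total for r in active}


rows = parse_eascmd_ls(SAMPLE)
print(weight_shares(rows))  # {'pmml_prod': 0.8, 'pmml_grey': 0.2}
```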

View traffic endpoints

A service group has a unified traffic endpoint, and each service within the group also has its own individual endpoint.

Service group endpoint

<endpoint>/api/predict/<group_name>

Example:

http://182848887922****.vpc.cn-shanghai.pai-eas.aliyuncs.com/api/predict/pmml

On the Canary Release tab, you can view the traffic endpoint for the service group. Traffic sent to this endpoint is distributed among the services based on the allocation policy. The endpoint address stays the same when services are added to or removed from the group, so clients can keep using it for online requests while the group changes.

Click Call Information for the group. In the dialog box that appears, you can switch between the Public Endpoint for the Service Group and VPC Endpoint for the Service Group tabs to view the corresponding invocation addresses and token information.
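As an illustration of calling the group endpoint, the following Python sketch builds a POST request with the standard urllib module. It assumes that the token shown in Call Information is passed in the Authorization header and that the request body is JSON; the payload is a hypothetical placeholder, because the expected input format depends on the processor. The request is only constructed here, not sent.

```python
import json
import urllib.request

# Example group endpoint from this page; the account ID is masked in the source.
GROUP_URL = "http://182848887922****.vpc.cn-shanghai.pai-eas.aliyuncs.com/api/predict/pmml"


def build_group_request(url: str, token: str, payload: object) -> urllib.request.Request:
    """Build (but do not send) a POST request to a service group endpoint.

    Assumption: the token goes in the Authorization header and the body is JSON.
    """
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Authorization": token},
        method="POST",
    )


# Hypothetical token and feature payload, for illustration only.
req = build_group_request(GROUP_URL, "<your-token>", [{"x": 1.0}])
# To send it against a real endpoint: urllib.request.urlopen(req, timeout=10).read()
```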

Individual service endpoint

<endpoint>/api/predict/<group_name>.<service_name>

Example:

http://182848887922****.vpc.cn-shanghai.pai-eas.aliyuncs.com/api/predict/pmml.pmml_prod

On the Inference Service tab, you can view the traffic endpoint for an individual service. This endpoint directs traffic exclusively to that service and is deleted when the service is removed. You can use this individual endpoint for targeted debugging, even if the service is not receiving traffic from the group.

In the Actions column of the target service, click Call Information. In the dialog box that appears, select the Public Endpoint Invocation tab to obtain the public invocation URL and token.
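The two endpoint formats differ only in the path suffix: the group name alone, or the group name and service name joined by a period. A minimal Python sketch of building both from the same base endpoint (the function names are hypothetical):

```python
def group_endpoint(endpoint: str, group: str) -> str:
    """<endpoint>/api/predict/<group_name>: traffic is split across the group's services."""
    return f"{endpoint.rstrip('/')}/api/predict/{group}"


def service_endpoint(endpoint: str, group: str, service: str) -> str:
    """<endpoint>/api/predict/<group_name>.<service_name>: traffic goes to one service only."""
    return f"{endpoint.rstrip('/')}/api/predict/{group}.{service}"


base = "http://182848887922****.vpc.cn-shanghai.pai-eas.aliyuncs.com"
print(group_endpoint(base, "pmml"))
print(service_endpoint(base, "pmml", "pmml_prod"))
```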

Modify the traffic allocation policy

EAS currently supports two traffic allocation methods:

Replica Allocation: Traffic is dynamically allocated based on the replica count of each service. For example, if Service A has 1 replica and Service B has 3 replicas, Service A receives 25% of the traffic, and Service B receives 75%.
Custom Allocation: Traffic is allocated based on a custom weight assigned to each service. For example, if Service A has a traffic weight of 100 and Service B has a traffic weight of 400, Service A receives 20% of the traffic, and Service B receives 80%.

Important

Regardless of the allocation method, a service with traffic allocation disabled stops receiving traffic from the group but remains accessible via its individual endpoint.
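Both methods reduce to the same proportional split: each enabled service receives its value (replica count or traffic weight) divided by the sum of the values of all enabled services. The following Python sketch reproduces the examples above. The function is an illustrative model of the documented behavior, not part of EAS.

```python
from typing import Optional


def traffic_shares(values: dict[str, int], disabled: Optional[set[str]] = None) -> dict[str, float]:
    """Fraction of group traffic each service receives.

    values: replica count per service (Replica Allocation) or traffic
    weight per service (Custom Allocation).
    disabled: services with traffic allocation turned off; they get 0.0
    from the group but remain reachable through their individual endpoint.
    """
    disabled = disabled or set()
    total = sum(v for name, v in values.items() if name not in disabled)
    return {
        name: 0.0 if name in disabled or total == 0 else v / total
        for name, v in values.items()
    }


# Replica Allocation: A has 1 replica, B has 3 replicas.
print(traffic_shares({"A": 1, "B": 3}))                  # {'A': 0.25, 'B': 0.75}
# Custom Allocation: A has weight 100, B has weight 400.
print(traffic_shares({"A": 100, "B": 400}))              # {'A': 0.2, 'B': 0.8}
# Disabling traffic allocation for B sends all group traffic to A.
print(traffic_shares({"A": 1, "B": 3}, disabled={"B"}))  # {'A': 1.0, 'B': 0.0}
```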

The following sections describe how to modify the traffic allocation policy.

Note

You can also adjust the service traffic weight and state by using an API. For more information, see Adjust service traffic weight and traffic state.

Replica count

Console

Turn on the switch in the traffic allocation column to include the service in traffic distribution. Turn it off to exclude the service.

At the top of the page, click Allocate Traffic by Instance Count to distribute traffic based on replica count instead of custom traffic weights.

EASCMD

Run the release command in the following format. For information about how to log on to the EASCMD client, see Download and authenticate the client.

```
$ eascmd release <service_name> -s grouping|standalone
```

Parameters:

<service_name>: The name of the service to be modified.
grouping|standalone: The new traffic state. grouping indicates that the service receives group traffic. standalone indicates that the service does not receive group traffic but remains reachable through its individual endpoint.
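If you script state changes, a small Python wrapper can validate the state before invoking the CLI. This is a sketch: the helper names are hypothetical, and answering the [Y/n] confirmation prompt through standard input is an assumption based on the sample output on this page, not documented EASCMD behavior.

```python
import subprocess

# Traffic states accepted by `eascmd release ... -s`, per the parameter list above.
VALID_STATES = ("grouping", "standalone")


def release_command(service_name: str, state: str) -> list[str]:
    """Build the argument list for `eascmd release <service_name> -s <state>`."""
    if state not in VALID_STATES:
        raise ValueError(f"state must be one of {VALID_STATES}, got {state!r}")
    return ["eascmd", "release", service_name, "-s", state]


def release(service_name: str, state: str) -> subprocess.CompletedProcess:
    """Run eascmd release and answer its [Y/n] confirmation with "yes".

    Assumption: eascmd reads the confirmation from standard input.
    """
    return subprocess.run(
        release_command(service_name, state),
        input="yes\n",
        text=True,
        capture_output=True,
        check=True,
    )


# Example (requires a logged-on EASCMD client):
# release("pmml_grey", "standalone")
```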

Examples:

Change the state of the pmml_grey service to standalone so that it no longer receives group traffic. Run the following command:

```
$ eascmd release pmml_grey -s standalone
```

The following output is returned:

```
Confirmed to release service [pmml_grey] to group traffic [Y/n]yes
[RequestId]: 40C787DF-8900-5F7A-8A01-30F7D5A8BF3B
[OK] Service [pmml_grey] has entered the traffic state: standalone
```

Run the eascmd ls command to view the running status of the service. The following output is returned:

```
[RequestId]: 83BE3FBB-8CE2-5008-B435-1938A20B13AA
+---------------------------+----------+-----+--------+----------------------+----------------------+---------+--------+--------------+---------------------------+
|        SERVICENAME        | INSTANCE | CPU | MEMORY |      CREATETIME      |      UPDATETIME      | STATUS  | WEIGHT | TRAFFICSTATE |       SERVICEGROUP        |
+---------------------------+----------+-----+--------+----------------------+----------------------+---------+--------+--------------+---------------------------+
| pmml_prod                 |        4 |   1 | 1000M  | 2022-06-05T14:30:49Z | 2022-06-05T14:30:49Z | Running |    100 | grouping     | pmml                      |
| pmml_grey                 |        1 |   1 | 1000M  | 2022-06-05T14:42:41Z | 2022-06-05T14:42:41Z | Running |      0 | standalone   | pmml                      |
+---------------------------+----------+-----+--------+----------------------+----------------------+---------+--------+--------------+---------------------------+
```

The pmml_grey service's TRAFFICSTATE is updated to standalone, and its WEIGHT is 0. This indicates that all group traffic is now handled by the pmml_prod service.

Change the state of the pmml_grey service to grouping so that it receives group traffic again. Run the following command:

```
$ eascmd release pmml_grey -s grouping
```

The following output is returned:

```
Confirmed to release service [pmml_grey] to group traffic [Y/n]yes
[RequestId]: 40C787DF-8900-5F7A-8A01-30F7D5A8BF3B
[OK] Service [pmml_grey] has entered the traffic state: grouping
```

Run the eascmd ls command to view the running status of the service. The following output is returned:

```
[RequestId]: 83BE3FBB-8CE2-5008-B435-1938A20B13AA
+---------------------------+----------+-----+--------+----------------------+----------------------+---------+--------+--------------+---------------------------+
|        SERVICENAME        | INSTANCE | CPU | MEMORY |      CREATETIME      |      UPDATETIME      | STATUS  | WEIGHT | TRAFFICSTATE |       SERVICEGROUP        |
+---------------------------+----------+-----+--------+----------------------+----------------------+---------+--------+--------------+---------------------------+
| pmml_prod                 |        4 |   1 | 1000M  | 2022-06-05T14:30:49Z | 2022-06-05T14:30:49Z | Running |     80 | grouping     | pmml                      |
| pmml_grey                 |        1 |   1 | 1000M  | 2022-06-05T14:42:41Z | 2022-06-05T14:42:41Z | Running |     20 | grouping     | pmml                      |
+---------------------------+----------+-----+--------+----------------------+----------------------+---------+--------+--------------+---------------------------+
```

The pmml_grey service's TRAFFICSTATE is updated to grouping, and the service now receives 20% of the group traffic, in proportion to its replica count (1 of 5 replicas).

Custom weight

You can directly edit the values in the Traffic Weight column.

Click the edit icon next to a service's Traffic Weight value, or click Allocate Traffic by Custom Weight to enter batch allocation mode.
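To choose a weight for a target canary share, solve the custom-allocation ratio for the canary weight. With production weight W and target canary percentage p, the canary weight c must satisfy c / (W + c) = p / 100, so c = W * p / (100 - p). The following Python sketch assumes a group with exactly two enabled services (production and canary) and integer weights; both are assumptions for illustration, not documented limits, and the function name is hypothetical.

```python
def canary_weight(prod_weight: int, canary_percent: float) -> int:
    """Canary weight that gives the canary about canary_percent % of group traffic.

    Assumes two enabled services under Custom Allocation and integer weights.
    """
    if not 0 <= canary_percent < 100:
        raise ValueError("canary_percent must be in [0, 100)")
    # Solve c / (prod_weight + c) = p / 100 for c, then round to an integer weight.
    return round(prod_weight * canary_percent / (100 - canary_percent))


print(canary_weight(400, 20))  # 100: shares 400/500 = 80% and 100/500 = 20%
print(canary_weight(100, 10))  # 11: canary share 11/111, about 9.9%
```

Rounding means the achieved share is close to, but not always exactly, the target; larger production weights give finer control.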