Platform For AI: Service groups and traffic management

Last Updated: Apr 01, 2026

A service group gives multiple inference services a single, shared traffic entry point. Traffic is distributed across services based on an allocation policy, letting you run canary releases, mix billing models for elastic scaling, and schedule heterogeneous GPU resources — all without changing the endpoint your clients call.

Use cases

Canary release

Add a production service and a canary service to the same group, with the canary receiving a small fraction of traffic. Deploy new model versions to the canary service first. If issues arise, roll back the canary service or set it to standalone so all traffic shifts to production. When the canary is stable, update the production service and scale the canary down to zero replicas or retain it at a low traffic share.

Mixed billing for elastic scaling

Run subscription-based services in dedicated resource groups with a fixed replica count to cover baseline demand. Add pay-as-you-go services in public resource groups to absorb traffic spikes. Both services share the same group endpoint, so clients are unaffected as the mix scales.

Heterogeneous GPU scheduling

In GPU-accelerated inference, a specific GPU type can become unavailable or run out of inventory in a region. Create services that target different GPU types within the same group to adapt to various CUDA environments. Because the group entry point stays constant, the frontend is unaware of the backend resource changes.

How service groups work

  • Each service group has one shared endpoint that routes traffic to member services according to the active allocation policy.

  • Each member service also has its own direct endpoint that bypasses group traffic rules — useful for zero-traffic testing of a new version before it receives any live traffic.

  • A service group is created automatically when you specify a new group name on a service. It is deleted automatically when all member services are removed.

  • Services added to a group do not receive group traffic by default. Enable traffic participation explicitly after adding a service.

Endpoint formats:

Scope | Format | Example
Service group | <endpoint>/api/predict/<group_name> | http://182848887922****.vpc.cn-shanghai.pai-eas.aliyuncs.com/api/predict/pmml
Individual service | <endpoint>/api/predict/<group_name>.<service_name> | http://182848887922****.vpc.cn-shanghai.pai-eas.aliyuncs.com/api/predict/pmml.pmml_prod

Use the group endpoint in production so that traffic rules apply. Use the individual service endpoint to test a specific service directly — for example, to validate a new version while it is still in standalone mode (zero traffic). Once a service is deleted, its individual endpoint is destroyed; the group endpoint remains unchanged.
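The two endpoint formats differ only in the `.<service_name>` suffix. The following is a minimal sketch of that rule. The helper names and the `BASE` placeholder are illustrative and are not part of EAS.

```python
def group_endpoint(base: str, group: str) -> str:
    """URL that routes through the group's traffic rules."""
    return f"{base.rstrip('/')}/api/predict/{group}"


def service_endpoint(base: str, group: str, service: str) -> str:
    """URL that reaches one member service directly, bypassing group rules."""
    return f"{group_endpoint(base, group)}.{service}"


# Placeholder base endpoint (an assumption); substitute the one shown for your service.
BASE = "http://example.vpc.cn-shanghai.pai-eas.aliyuncs.com"

print(group_endpoint(BASE, "pmml"))
print(service_endpoint(BASE, "pmml", "pmml_prod"))
```

Keeping the group URL in client configuration and deriving per-service URLs only in test tooling mirrors the recommendation above.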

Prerequisites

Before you begin, ensure that you have:

Create a service group

Specify the group name when creating a service. If the group does not exist, EAS creates it. If it already exists, the service is added to it.

The following steps create a group named pmml and add two services to it: pmml_prod (4 replicas) and pmml_grey (1 replica).

PAI console

  1. Log in to the PAI console. Select a region, then select your workspace and click Elastic Algorithm Service (EAS).

  2. On the Canary Release tab, click Create Group and Service.

  3. On the Custom Deployment page, set the following parameters, then click Deploy. For all other parameters, see Deploy a model service in the PAI console.

    Parameter | Description | Example
    Service Name | A valid name for the service. | pmml_prod
    Group | The service group to assign this service to. Select New Group and enter the group name. | pmml
  4. Repeat steps 2–3 to create the pmml_grey service, assigning it to the same pmml group.

After both services are created, click pmml on the Canary Release tab to open the group details page and verify that both services appear.

Important

Newly added services do not receive group traffic by default. Adjust the traffic allocation policy to include them.

EASCMD

  1. Create two service configuration files. Configuration for pmml_prod:

    {
      "name": "pmml_prod",
      "model_path": "http://examplebucket.oss-cn-shanghai.aliyuncs.com/models/lr_xxxx.pmml",
      "processor": "pmml",
      "metadata": {
        "cpu": 1,
        "instance": 4,
        "group": "pmml"
      }
    }

    Configuration for pmml_grey:

    {
      "name": "pmml_grey",
      "model_path": "http://examplebucket.oss-cn-shanghai.aliyuncs.com/models/lr_xxxx.pmml",
      "processor": "pmml",
      "metadata": {
        "cpu": 1,
        "instance": 1,
        "group": "pmml"
      }
    }

    The group field specifies which service group the service belongs to. For other parameters, see Parameters of model services.

  2. Create each service by running the create command with its configuration file:

    $ eascmd create service.json
  3. Verify the services and confirm the group was created:

    $ eascmd ls

    Expected output:

    [RequestId]: 716BEBFC-E8A4-51FD-A3F7-56376B167923
    +---------------------------+----------+-----+--------+----------------------+----------------------+---------+--------+--------------+---------------------------+
    |        SERVICENAME        | INSTANCE | CPU | MEMORY |      CREATETIME      |      UPDATETIME      | STATUS  | WEIGHT | TRAFFICSTATE |       SERVICEGROUP        |
    +---------------------------+----------+-----+--------+----------------------+----------------------+---------+--------+--------------+---------------------------+
    | pmml_prod                 |        4 |   1 | 1000M  | 2022-06-05T14:30:49Z | 2022-06-05T14:30:49Z | Running |     80 | grouping     | pmml                      |
    | pmml_grey                 |        1 |   1 | 1000M  | 2022-06-05T14:31:38Z | 2022-06-05T14:31:38Z | Running |     20 | grouping     | pmml                      |
    +---------------------------+----------+-----+--------+----------------------+----------------------+---------+--------+--------------+---------------------------+

    Both services show pmml in SERVICEGROUP and grouping in TRAFFICSTATE. Traffic is split 80%/20% because pmml_prod has 4 replicas and pmml_grey has 1. Each service receives traffic proportional to its replica count relative to the group total: 4 ÷ 5 = 80%, 1 ÷ 5 = 20%.
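The two configuration files in step 1 differ only in the service name and the replica count, so they can be generated from one template. This is a rough sketch under that assumption. The `service_config` helper and the `<name>.json` file names are hypothetical. The field values are copied from the examples above.

```python
import json

# Same placeholder model path as in the example configurations.
MODEL_PATH = "http://examplebucket.oss-cn-shanghai.aliyuncs.com/models/lr_xxxx.pmml"


def service_config(name: str, replicas: int, group: str = "pmml") -> dict:
    """Build one service configuration with the fields used in the examples."""
    return {
        "name": name,
        "model_path": MODEL_PATH,
        "processor": "pmml",
        "metadata": {"cpu": 1, "instance": replicas, "group": group},
    }


configs = {
    name: service_config(name, replicas)
    for name, replicas in [("pmml_prod", 4), ("pmml_grey", 1)]
}

for name, cfg in configs.items():
    print(f"# {name}.json (file name is an assumption)")
    print(json.dumps(cfg, indent=2))
```

Each printed document can be saved to its own file and passed to `eascmd create` as in step 2.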

Modify traffic allocation

EAS supports two traffic allocation methods:

Method | How traffic is calculated | When to use
Instance-based allocation | Each service receives traffic proportional to its replica count. Traffic % = replicas ÷ total replicas in group. | Default behavior; no configuration needed.
Custom weight-based allocation | Each service receives traffic proportional to its assigned weight. Traffic % = weight ÷ sum of all weights in group. | When you need precise control over traffic shares independent of replica counts.

Example — instance-based: Service A has 1 replica, Service B has 3 replicas. Total = 4. Service A gets 25%, Service B gets 75%.

Example — weight-based: Service A has weight 100, Service B has weight 400. Total = 500. Service A gets 20%, Service B gets 80%.
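Both methods use the same arithmetic: divide each service's value by the group total. The following sketch reproduces the two examples above. The function name `traffic_shares` is illustrative only and does not correspond to any EAS API.

```python
def traffic_shares(values: dict[str, float]) -> dict[str, float]:
    """Return each service's share of group traffic as a percentage.

    `values` maps a service name to its replica count (instance-based)
    or to its assigned weight (weight-based).
    """
    total = sum(values.values())
    if total <= 0:
        raise ValueError("at least one service must have a positive value")
    return {name: 100 * value / total for name, value in values.items()}


print(traffic_shares({"A": 1, "B": 3}))      # {'A': 25.0, 'B': 75.0}
print(traffic_shares({"A": 100, "B": 400}))  # {'A': 20.0, 'B': 80.0}
```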

Important

When a service's traffic participation is disabled (set to standalone), it no longer receives group traffic but remains accessible via its individual service endpoint. This applies to both allocation methods.

Note

To adjust traffic weights and traffic state via the API, see ReleaseService — Adjust service traffic weights and traffic state.

Instance-based allocation

PAI console

Toggle the traffic switch in the Traffic Weight column for the target service. Turn it on to include the service in group traffic distribution; turn it off to exclude it.

EASCMD

Run the release command to change a service's traffic state:

$ eascmd release <service_name> -s grouping|standalone

Parameters:

Parameter | Description
<service_name> | Name of the service to update.
grouping | The service participates in group traffic distribution.
standalone | The service is excluded from group traffic distribution but remains accessible via its individual endpoint.

Example: exclude a service from group traffic

$ eascmd release pmml_grey -s standalone

Output:

Confirmed to release service [pmml_grey] to group traffic [Y/n]yes
[RequestId]: 40C787DF-8900-5F7A-8A01-30F7D5A8BF3B
[OK] Service [pmml_grey] has entered the traffic state: standalone

Run eascmd ls to verify. pmml_grey now shows TRAFFICSTATE=standalone and WEIGHT=0; all traffic goes to pmml_prod.

[RequestId]: 83BE3FBB-8CE2-5008-B435-1938A20B13AA
+---------------------------+----------+-----+--------+----------------------+----------------------+---------+--------+--------------+---------------------------+
|        SERVICENAME        | INSTANCE | CPU | MEMORY |      CREATETIME      |      UPDATETIME      | STATUS  | WEIGHT | TRAFFICSTATE |       SERVICEGROUP        |
+---------------------------+----------+-----+--------+----------------------+----------------------+---------+--------+--------------+---------------------------+
| pmml_prod                 |        4 |   1 | 1000M  | 2022-06-05T14:30:49Z | 2022-06-05T14:30:49Z | Running |    100 | grouping     | pmml                      |
| pmml_grey                 |        1 |   1 | 1000M  | 2022-06-05T14:42:41Z | 2022-06-05T14:42:41Z | Running |      0 | standalone   | pmml                      |
+---------------------------+----------+-----+--------+----------------------+----------------------+---------+--------+--------------+---------------------------+
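When this check is scripted, the table can be parsed into one record per service. The sketch below assumes the output keeps the layout shown in this document, which may vary between EASCMD versions. The sample is abridged (the CPU, MEMORY, and timestamp columns are omitted), and `parse_ls_table` is a hypothetical helper.

```python
def parse_ls_table(text: str) -> list[dict[str, str]]:
    """Parse the ASCII table printed by `eascmd ls` into one dict per service."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.splitlines()
        if line.strip().startswith("|")  # skip borders and the RequestId line
    ]
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


# Abridged sample in the same layout as the output above.
SAMPLE = """
[RequestId]: 83BE3FBB-8CE2-5008-B435-1938A20B13AA
+-------------+----------+---------+--------+--------------+--------------+
| SERVICENAME | INSTANCE | STATUS  | WEIGHT | TRAFFICSTATE | SERVICEGROUP |
+-------------+----------+---------+--------+--------------+--------------+
| pmml_prod   |        4 | Running |    100 | grouping     | pmml         |
| pmml_grey   |        1 | Running |      0 | standalone   | pmml         |
+-------------+----------+---------+--------+--------------+--------------+
"""

services = parse_ls_table(SAMPLE)
standalone = [s["SERVICENAME"] for s in services if s["TRAFFICSTATE"] == "standalone"]
print(standalone)  # ['pmml_grey']
```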

While pmml_grey is in standalone mode, use its direct endpoint to test it without affecting production traffic:

http://182848887922****.vpc.cn-shanghai.pai-eas.aliyuncs.com/api/predict/pmml.pmml_grey
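A test call to the direct endpoint could be prepared as in the sketch below. It only builds the request and does not send it. The `Authorization` token header, the JSON payload, and the `build_request` helper are assumptions for illustration. Check your service's invocation details for the actual authentication method and input format.

```python
import urllib.request

# Masked endpoint copied from this document; replace it with your real endpoint.
URL = "http://182848887922****.vpc.cn-shanghai.pai-eas.aliyuncs.com/api/predict/pmml.pmml_grey"


def build_request(url: str, payload: bytes, token: str) -> urllib.request.Request:
    """Prepare a POST to a service endpoint without sending it."""
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Authorization": token},  # assumed token-based authentication
        method="POST",
    )


# Payload shape is an assumption; use whatever input your model expects.
req = build_request(URL, b'[{"x": 1}]', token="<your-service-token>")
print(req.get_method(), req.full_url)

# To send the request once the placeholders are filled in:
# urllib.request.urlopen(req, timeout=10).read()
```

Because the URL targets `pmml.pmml_grey` rather than `pmml`, the request bypasses the group's traffic rules and reaches only the canary service.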

Example: re-include a service in group traffic

$ eascmd release pmml_grey -s grouping

Output:

Confirmed to release service [pmml_grey] to group traffic [Y/n]yes
[RequestId]: 40C787DF-8900-5F7A-8A01-30F7D5A8BF3B
[OK] Service [pmml_grey] has entered the traffic state: grouping

Run eascmd ls to verify. pmml_grey returns to TRAFFICSTATE=grouping and resumes receiving 20% of traffic.

[RequestId]: 83BE3FBB-8CE2-5008-B435-1938A20B13AA
+---------------------------+----------+-----+--------+----------------------+----------------------+---------+--------+--------------+---------------------------+
|        SERVICENAME        | INSTANCE | CPU | MEMORY |      CREATETIME      |      UPDATETIME      | STATUS  | WEIGHT | TRAFFICSTATE |       SERVICEGROUP        |
+---------------------------+----------+-----+--------+----------------------+----------------------+---------+--------+--------------+---------------------------+
| pmml_prod                 |        4 |   1 | 1000M  | 2022-06-05T14:30:49Z | 2022-06-05T14:30:49Z | Running |     80 | grouping     | pmml                      |
| pmml_grey                 |        1 |   1 | 1000M  | 2022-06-05T14:42:41Z | 2022-06-05T14:42:41Z | Running |     20 | grouping     | pmml                      |
+---------------------------+----------+-----+--------+----------------------+----------------------+---------+--------+--------------+---------------------------+

Custom weight-based allocation

Edit the value directly in the Traffic Weight column on the Canary Release tab.

What's next