A service group gives multiple inference services a single, shared traffic entry point. Traffic is distributed across services based on an allocation policy, letting you run canary releases, mix billing models for elastic scaling, and schedule heterogeneous GPU resources — all without changing the endpoint your clients call.
Use cases
Canary release
Add a production service and a canary service to the same group, with the canary receiving a small fraction of traffic. Deploy new model versions to the canary service first. If issues arise, roll back the canary service or set it to standalone so all traffic shifts to production. When the canary is stable, update the production service and scale the canary down to zero replicas or retain it at a low traffic share.
Mixed billing for elastic scaling
Run subscription-based services in dedicated resource groups with a fixed replica count to cover baseline demand. Add pay-as-you-go services in public resource groups to absorb traffic spikes. Both services share the same group endpoint, so clients are unaffected as the mix scales.
Heterogeneous GPU scheduling
In GPU-accelerated inference, a specific GPU type can become unavailable or run out of inventory in a region. Create services that target different GPU types within the same group to adapt to various CUDA environments. Because the group entry point stays constant, the frontend is unaware of the backend resource changes.
How service groups work
Each service group has one shared endpoint that routes traffic to member services according to the active allocation policy.
Each member service also has its own direct endpoint that bypasses group traffic rules — useful for zero-traffic testing of a new version before it receives any live traffic.
A service group is created automatically when you specify a new group name on a service. It is deleted automatically when all member services are removed.
Services added to a group do not receive group traffic by default. Enable traffic participation explicitly after adding a service.
Endpoint formats:
| Scope | Format | Example |
|---|---|---|
| Service group | <endpoint>/api/predict/<group_name> | http://182848887922****.vpc.cn-shanghai.pai-eas.aliyuncs.com/api/predict/pmml |
| Individual service | <endpoint>/api/predict/<group_name>.<service_name> | http://182848887922****.vpc.cn-shanghai.pai-eas.aliyuncs.com/api/predict/pmml.pmml_prod |
Use the group endpoint in production so that traffic rules apply. Use the individual service endpoint to test a specific service directly — for example, to validate a new version while it is still in standalone mode (zero traffic). Once a service is deleted, its individual endpoint is destroyed; the group endpoint remains unchanged.
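The two endpoint formats in the table can be sketched as simple string templates. This is a minimal illustration only: the helper names `group_endpoint` and `service_endpoint` are hypothetical and not part of any EAS SDK, and the base URL is the masked example from the table.

```python
def group_endpoint(base_url: str, group_name: str) -> str:
    """Shared group endpoint: <endpoint>/api/predict/<group_name>."""
    return f"{base_url.rstrip('/')}/api/predict/{group_name}"


def service_endpoint(base_url: str, group_name: str, service_name: str) -> str:
    """Direct service endpoint: <endpoint>/api/predict/<group_name>.<service_name>."""
    return f"{group_endpoint(base_url, group_name)}.{service_name}"


# Base URL copied from the table; the account ID is masked in the source.
BASE = "http://182848887922****.vpc.cn-shanghai.pai-eas.aliyuncs.com"

print(group_endpoint(BASE, "pmml"))                 # traffic rules apply
print(service_endpoint(BASE, "pmml", "pmml_prod"))  # bypasses group traffic rules
```

Keeping the group name as the only variable part of the production URL is what lets member services change without any client-side update.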
Prerequisites
Before you begin, ensure that you have:
A PAI workspace with Elastic Algorithm Service (EAS) enabled
The EASCMD client installed and authenticated (required for the CLI steps). See Download and authenticate the client.
Create a service group
Specify the group name when creating a service. If the group does not exist, EAS creates it. If it already exists, the service is added to it.
The following steps create a group named pmml and add two services to it: pmml_prod (4 replicas) and pmml_grey (1 replica).
PAI console
Log in to the PAI console. Select a region, then select your workspace and click Elastic Algorithm Service (EAS).
On the Canary Release tab, click Create Group and Service.
On the Custom Deployment page, set the following parameters and click Deploy. For all other parameters, see Deploy a model service in the PAI console.
| Parameter | Description | Example |
|---|---|---|
| Service Name | A valid name for the service. | pmml_prod |
| Group | The service group to assign this service to. Select New Group and enter the group name. | pmml |
Repeat steps 2–3 to create the pmml_grey service, assigning it to the same pmml group.
After both services are created, click pmml on the Canary Release tab to open the group details page and verify that both services appear.

Newly added services do not receive group traffic by default. Adjust the traffic allocation policy to include them.
EASCMD
Create two service configuration files.
Configuration for pmml_prod:
{
  "name": "pmml_prod",
  "model_path": "http://examplebucket.oss-cn-shanghai.aliyuncs.com/models/lr_xxxx.pmml",
  "processor": "pmml",
  "metadata": {
    "cpu": 1,
    "instance": 4,
    "group": "pmml"
  }
}
Configuration for pmml_grey:
{
  "name": "pmml_grey",
  "model_path": "http://examplebucket.oss-cn-shanghai.aliyuncs.com/models/lr_xxxx.pmml",
  "processor": "pmml",
  "metadata": {
    "cpu": 1,
    "instance": 1,
    "group": "pmml"
  }
}
The group field specifies which service group the service belongs to. For other parameters, see Parameters of model services.
Create each service by running the create command with its configuration file:
$ eascmd create service.json
Verify the services and confirm the group was created:
$ eascmd ls
Expected output:
[RequestId]: 716BEBFC-E8A4-51FD-A3F7-56376B167923
+---------------------------+----------+-----+--------+----------------------+----------------------+---------+--------+--------------+---------------------------+
| SERVICENAME | INSTANCE | CPU | MEMORY | CREATETIME | UPDATETIME | STATUS | WEIGHT | TRAFFICSTATE | SERVICEGROUP |
+---------------------------+----------+-----+--------+----------------------+----------------------+---------+--------+--------------+---------------------------+
| pmml_prod | 4 | 1 | 1000M | 2022-06-05T14:30:49Z | 2022-06-05T14:30:49Z | Running | 80 | grouping | pmml |
| pmml_grey | 1 | 1 | 1000M | 2022-06-05T14:31:38Z | 2022-06-05T14:31:38Z | Running | 20 | grouping | pmml |
+---------------------------+----------+-----+--------+----------------------+----------------------+---------+--------+--------------+---------------------------+
Both services show pmml in SERVICEGROUP and grouping in TRAFFICSTATE. Traffic is split 80%/20% because pmml_prod has 4 replicas and pmml_grey has 1. Each service receives traffic proportional to its replica count relative to the group total: 4 ÷ 5 = 80%, 1 ÷ 5 = 20%.
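If the verification step needs to be scripted, the table printed by eascmd ls could be parsed as plain text. The sketch below is an assumption-laden illustration: `parse_eascmd_table` is a hypothetical helper, the parsing relies only on the pipe-delimited layout shown in the sample output, and the borders in the sample string are shortened for brevity.

```python
def parse_eascmd_table(text: str) -> list[dict[str, str]]:
    """Turn the pipe-delimited table into one dict per service row."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip the [RequestId] line and the +---+ borders
        rows.append([cell.strip() for cell in line.strip("|").split("|")])
    if not rows:
        return []
    header, *body = rows  # first pipe row holds the column names
    return [dict(zip(header, row)) for row in body]


# Sample taken from the expected output; borders shortened for brevity.
SAMPLE = """\
[RequestId]: 716BEBFC-E8A4-51FD-A3F7-56376B167923
+-------------+----------+
| SERVICENAME | INSTANCE | CPU | MEMORY | CREATETIME | UPDATETIME | STATUS | WEIGHT | TRAFFICSTATE | SERVICEGROUP |
+-------------+----------+
| pmml_prod | 4 | 1 | 1000M | 2022-06-05T14:30:49Z | 2022-06-05T14:30:49Z | Running | 80 | grouping | pmml |
| pmml_grey | 1 | 1 | 1000M | 2022-06-05T14:31:38Z | 2022-06-05T14:31:38Z | Running | 20 | grouping | pmml |
+-------------+----------+
"""

services = parse_eascmd_table(SAMPLE)
# Collect the WEIGHT of every service that participates in group traffic.
weights = {s["SERVICENAME"]: int(s["WEIGHT"])
           for s in services if s["TRAFFICSTATE"] == "grouping"}
print(weights)
```

Text parsing like this is fragile if the CLI output layout changes, so it is best treated as a quick check rather than a stable interface.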
Modify traffic allocation
EAS supports two traffic allocation methods:
| Method | How traffic is calculated | When to use |
|---|---|---|
| Instance-based allocation | Each service receives traffic proportional to its replica count. Traffic % = replicas ÷ total replicas in group. | Default behavior; no configuration needed. |
| Custom weight-based allocation | Each service receives traffic proportional to its assigned weight. Traffic % = weight ÷ sum of all weights in group. | When you need precise control over traffic shares independent of replica counts. |
Example — instance-based: Service A has 1 replica, Service B has 3 replicas. Total = 4. Service A gets 25%, Service B gets 75%.
Example — weight-based: Service A has weight 100, Service B has weight 400. Total = 500. Service A gets 20%, Service B gets 80%.
When a service's traffic participation is disabled (set to standalone), it no longer receives group traffic but remains accessible via its individual service endpoint. This applies to both allocation methods.
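The two formulas and the standalone rule can be checked with a short calculation. This is a minimal sketch of the arithmetic described above, not EAS code: the function name `traffic_shares` and the dictionary layout (`replicas`, `weight`, `state`) are assumptions made for illustration.

```python
def traffic_shares(services: dict[str, dict], method: str = "instance") -> dict[str, float]:
    """Return each service's share of group traffic, in percent.

    method="instance": share = replicas / total replicas in the group.
    method="weight":   share = weight / sum of all weights in the group.
    Services whose state is "standalone" are excluded and get 0%.
    """
    if method not in ("instance", "weight"):
        raise ValueError(f"unknown allocation method: {method!r}")
    key = "replicas" if method == "instance" else "weight"
    # Only services in the "grouping" state take part in the split.
    active = {name: svc[key] for name, svc in services.items()
              if svc.get("state", "grouping") == "grouping"}
    total = sum(active.values())
    return {name: (100.0 * active[name] / total if name in active and total else 0.0)
            for name in services}


# Instance-based example from the text: 1 replica vs. 3 replicas.
print(traffic_shares({"A": {"replicas": 1}, "B": {"replicas": 3}}))

# Weight-based example from the text: weight 100 vs. weight 400.
print(traffic_shares({"A": {"weight": 100}, "B": {"weight": 400}}, method="weight"))

# A standalone service drops out of the split entirely.
print(traffic_shares({
    "pmml_prod": {"replicas": 4, "state": "grouping"},
    "pmml_grey": {"replicas": 1, "state": "standalone"},
}))
```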
To adjust traffic weights and traffic state via the API, see ReleaseService — Adjust service traffic weights and traffic state.
Instance-based allocation
PAI console
Toggle the traffic switch in the Traffic Weight column for the target service. Turn it on to include the service in group traffic distribution; turn it off to exclude it.

EASCMD
Run the release command to change a service's traffic state:
$ eascmd release <service_name> -s grouping|standalone
Parameters:
| Parameter | Description |
|---|---|
| <service_name> | Name of the service to update. |
| grouping | The service participates in group traffic distribution. |
| standalone | The service is excluded from group traffic distribution but remains accessible via its individual endpoint. |
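When the release command is driven from a script, it may help to validate the traffic state before building the command line. The sketch below only assembles the arguments documented above and does not run anything; `release_command` is a hypothetical helper name. The sample output in this section shows a [Y/n] confirmation prompt, so an unattended run would also need to answer that prompt, which is left out here.

```python
import shlex

# The two traffic states accepted by "eascmd release -s", per the table above.
VALID_STATES = ("grouping", "standalone")


def release_command(service_name: str, state: str) -> list[str]:
    """Build the argument list for: eascmd release <service_name> -s <state>."""
    if state not in VALID_STATES:
        raise ValueError(f"state must be one of {VALID_STATES}, got {state!r}")
    if not service_name:
        raise ValueError("service_name must not be empty")
    return ["eascmd", "release", service_name, "-s", state]


cmd = release_command("pmml_grey", "standalone")
print(shlex.join(cmd))  # shell-safe rendering of the argument list
```

Returning a list rather than a single string keeps the arguments safe to pass to a process launcher without shell quoting issues.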
Example: exclude a service from group traffic
$ eascmd release pmml_grey -s standalone
Output:
Confirmed to release service [pmml_grey] to group traffic [Y/n]yes
[RequestId]: 40C787DF-8900-5F7A-8A01-30F7D5A8BF3B
[OK] Service [pmml_grey] has entered the traffic state: standalone
Run eascmd ls to verify. pmml_grey now shows TRAFFICSTATE=standalone and WEIGHT=0; all traffic goes to pmml_prod.
[RequestId]: 83BE3FBB-8CE2-5008-B435-1938A20B13AA
+---------------------------+----------+-----+--------+----------------------+----------------------+---------+--------+--------------+---------------------------+
| SERVICENAME | INSTANCE | CPU | MEMORY | CREATETIME | UPDATETIME | STATUS | WEIGHT | TRAFFICSTATE | SERVICEGROUP |
+---------------------------+----------+-----+--------+----------------------+----------------------+---------+--------+--------------+---------------------------+
| pmml_prod | 4 | 1 | 1000M | 2022-06-05T14:30:49Z | 2022-06-05T14:30:49Z | Running | 100 | grouping | pmml |
| pmml_grey | 1 | 1 | 1000M | 2022-06-05T14:42:41Z | 2022-06-05T14:42:41Z | Running | 0 | standalone | pmml |
+---------------------------+----------+-----+--------+----------------------+----------------------+---------+--------+--------------+---------------------------+
While pmml_grey is in standalone mode, use its direct endpoint to test it without affecting production traffic:
http://182848887922****.vpc.cn-shanghai.pai-eas.aliyuncs.com/api/predict/pmml.pmml_grey
Example: re-include a service in group traffic
$ eascmd release pmml_grey -s grouping
Output:
Confirmed to release service [pmml_grey] to group traffic [Y/n]yes
[RequestId]: 40C787DF-8900-5F7A-8A01-30F7D5A8BF3B
[OK] Service [pmml_grey] has entered the traffic state: grouping
Run eascmd ls to verify. pmml_grey returns to TRAFFICSTATE=grouping and resumes receiving 20% of traffic.
[RequestId]: 83BE3FBB-8CE2-5008-B435-1938A20B13AA
+---------------------------+----------+-----+--------+----------------------+----------------------+---------+--------+--------------+---------------------------+
| SERVICENAME | INSTANCE | CPU | MEMORY | CREATETIME | UPDATETIME | STATUS | WEIGHT | TRAFFICSTATE | SERVICEGROUP |
+---------------------------+----------+-----+--------+----------------------+----------------------+---------+--------+--------------+---------------------------+
| pmml_prod | 4 | 1 | 1000M | 2022-06-05T14:30:49Z | 2022-06-05T14:30:49Z | Running | 80 | grouping | pmml |
| pmml_grey | 1 | 1 | 1000M | 2022-06-05T14:42:41Z | 2022-06-05T14:42:41Z | Running | 20 | grouping | pmml |
+---------------------------+----------+-----+--------+----------------------+----------------------+---------+--------+--------------+---------------------------+
Custom weight-based allocation
Edit the value directly in the Traffic Weight column on the Canary Release tab.
