Configure elastic scaling for PAI EAS dedicated resource groups

An elastic resource pool is a service-level hybrid resource scheduling strategy provided by EAS. It allows services deployed in a dedicated resource group to automatically “overflow” new replicas to a pay-as-you-go public resource group during traffic spikes when dedicated resources are insufficient. This ensures your service retains elastic scaling capability.

Benefits

Overcome resource group capacity limits: When resources in the dedicated resource group run low, service scale-out can automatically use on-demand instances from the public resource pool. This ensures horizontal auto-scaling (based on metrics such as QPS or CPU) is not limited by the number of physical nodes in the dedicated resource group.
Optimize costs: Use low-cost subscription resources to handle baseline traffic and only consume on-demand resources during peak periods. This avoids holding redundant dedicated resources long-term just to cover peak loads, reducing overall resource costs.

How it works

The elastic resource pool uses a clear priority order for scheduling to balance cost and stability.

Scale-out
1. When a service triggers scale-out, the EAS scheduler first tries to create new instances within the service’s dedicated resource group.
2. If the dedicated resource group lacks sufficient resources—such as no available nodes or insufficient remaining CPU, memory, or GPU—the scheduler creates new instances in the public resource group using the resource specification (instance type) you defined when configuring the elastic resource pool.
Scale-in
1. When a service triggers scale-in, the system preferentially selects and terminates instances running in the public resource group.
2. Only after all elastic instances in the public resource group are scaled in does the system begin scaling in instances from the dedicated resource group. This ensures higher-cost elastic resources are released first while retaining more stable baseline instances.
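
The priority order above can be illustrated with a small simulation. This is only a sketch of the documented behavior, not the EAS scheduler: the `Service` class, its fields, and the idea of a fixed replica capacity for the dedicated resource group are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Service:
    """Toy model of a service whose replicas live in two resource groups."""
    dedicated_capacity: int  # assumed: max replicas the dedicated group can host
    dedicated: int = 0       # replicas currently in the dedicated resource group
    public: int = 0          # replicas currently in the public resource group

    def scale_out(self, count: int) -> None:
        for _ in range(count):
            if self.dedicated < self.dedicated_capacity:
                self.dedicated += 1  # prefer the dedicated resource group
            else:
                self.public += 1     # overflow to the public resource group

    def scale_in(self, count: int) -> None:
        for _ in range(count):
            if self.public > 0:
                self.public -= 1     # release public (elastic) replicas first
            elif self.dedicated > 0:
                self.dedicated -= 1  # then shrink the dedicated resource group


svc = Service(dedicated_capacity=3)
svc.scale_out(5)
print(svc.dedicated, svc.public)  # 3 dedicated, 2 public
svc.scale_in(3)
print(svc.dedicated, svc.public)  # 2 dedicated, 0 public
```

In this model, scaling out by five replicas with room for three fills the dedicated group and places the remaining two in the public group. Scaling in by three then removes both public replicas before touching a dedicated one.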

Prerequisites

You have created a dedicated resource group. For more information, see Use EAS resource groups.

Configuration methods

Enable elastic resource pool during service deployment

Console configuration

Log on to the PAI console. Select a region at the top of the page. Then, select the desired workspace and click Elastic Algorithm Service (EAS).
Click Deploy Service. In the Custom Model Deployment section, click Custom Deployment.

On the Custom Deployment page, go to the Resource Information section and configure the following key parameters. For other parameters, see Custom deployment.

Parameter	Description
Resource Type	Select EAS Resource Group.
Resource Group	Select an existing dedicated resource group.
Elastic Resource Pool	Turn on the Elastic Resource Pool switch and select a Resource Specification. After configuring the elastic resource pool, if machine resources become insufficient during scale-out, newly created instances automatically start on the configured pay-as-you-go public resources and are billed under the public resource group. During scale-in, instances in the public resource group are reduced first.

Click Deploy.

JSON configuration

The key parameters for enabling the elastic resource pool are listed below. For other parameters, see JSON deployment.

resource_burstable: Set to true to enable the elastic resource pool.
cloud.networking: To ensure network availability during elastic scaling, configure the service’s virtual private cloud (VPC) using this field when enabling the elastic resource pool.

Important
When using a dedicated resource group, you can configure a virtual private cloud (VPC) at the resource group level, but for the elastic resource pool you can configure the VPC only at the service level (through the cloud.networking field).
cloud.computing: Specifies the instance types available for scale-out to public resources. For more information, see Use public resources.

The following example shows a JSON configuration file.

{
  "model_path": "http://examplebucket.oss-cn-shanghai.aliyuncs.com/models/lr.pmml",
  "name": "test_burstable_service",
  "processor": "pmml",
  "metadata": {
    "instance": 1,
    "cpu": 1,
    "resource": "eas-r-xxx",
    "resource_burstable": true
  },
  "cloud": {
    "computing": {
      "instance_type": "ecs.r7.2xlarge"
    },
    "networking": {
      "security_group_id": "sg-uf68iou5an8j7sxd****",
      "vswitch_id": "vsw-uf6nji7pzztuoe9i7****"
    }
  }
}
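
Before deploying, it can help to confirm that a configuration with resource_burstable enabled also carries the cloud.computing and cloud.networking fields described above. The following is a minimal sketch of such a check, not an official EAS validator. The function name `check_burstable_config` is hypothetical, and it assumes the single instance_type form and the two networking keys shown in the example.

```python
from typing import List


def check_burstable_config(config: dict) -> List[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    metadata = config.get("metadata", {})
    if metadata.get("resource_burstable") is not True:
        return problems  # elastic resource pool is off; nothing to check

    if not metadata.get("resource"):
        problems.append("metadata.resource (dedicated resource group ID) is missing")

    cloud = config.get("cloud", {})
    # Assumption: the instance type is given as cloud.computing.instance_type.
    if not cloud.get("computing", {}).get("instance_type"):
        problems.append("cloud.computing.instance_type is missing")

    networking = cloud.get("networking", {})
    for key in ("security_group_id", "vswitch_id"):
        if not networking.get(key):
            problems.append(f"cloud.networking.{key} is missing")
    return problems


example = {
    "name": "test_burstable_service",
    "metadata": {"instance": 1, "cpu": 1, "resource": "eas-r-xxx",
                 "resource_burstable": True},
    "cloud": {"computing": {"instance_type": "ecs.r7.2xlarge"}},
}
for problem in check_burstable_config(example):
    print(problem)
```

Here the example deliberately omits cloud.networking, so the check reports the two missing networking keys.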

Enable or disable elastic resource pool after service deployment

Using the console

Log on to the PAI console. Select a region at the top of the page. Then, select the desired workspace and click Elastic Algorithm Service (EAS).
Click Update in the Actions column for your deployed service.
On the Update Service page, go to the Resource Information section to enable or disable elastic scaling.
- Enable elastic scaling
  
  In the Resource Information section, turn on the Elastic Resource Pool switch and configure the public resource group specification.
- Disable elastic scaling
  
  In the Resource Information section, turn off the Elastic Resource Pool switch.
Click Update.

Using a local client

Use the following commands to quickly enable or disable the elastic resource pool for an already deployed service. The example uses the Windows 64-bit version of the client:

Important

If you did not configure the cloud.networking parameter when deploying the service in a dedicated resource group, instances scaled out to the public resource group will not support direct network connectivity after you enable the elastic resource pool.

# Enable the elastic resource pool for the service.
eascmdwin64.exe modify <service_name> -Dmetadata.resource_burstable=true
# Disable the elastic resource pool for the service.
eascmdwin64.exe modify <service_name> -Dmetadata.resource_burstable=false

Replace <service_name> with your deployed service name.
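
If you toggle this setting from a script, the same command can be assembled programmatically. The sketch below only builds and runs the command line shown above. The helper names `build_burstable_command` and `set_burstable` are hypothetical, and it assumes the client executable is on your PATH and already configured with credentials.

```python
import subprocess
from typing import List


def build_burstable_command(service_name: str, enable: bool,
                            client: str = "eascmdwin64.exe") -> List[str]:
    """Build the argument list for toggling metadata.resource_burstable."""
    flag = "true" if enable else "false"
    return [client, "modify", service_name,
            f"-Dmetadata.resource_burstable={flag}"]


def set_burstable(service_name: str, enable: bool,
                  client: str = "eascmdwin64.exe") -> None:
    """Run the client; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(build_burstable_command(service_name, enable, client),
                   check=True)


# Print the command without running it.
print(" ".join(build_burstable_command("test_burstable_service", True)))
```

Passing the arguments as a list avoids shell quoting issues with the service name.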

Important

Enabling or disabling the elastic resource pool only affects newly created service instances. Existing instances are not automatically migrated. For example, if two instances are left in the pending state after a scale-out, enabling the elastic resource pool afterward will not move those pending instances to the public resource group. You can restart them in the console so that they are rescheduled to public resources. Similarly, if you disable the elastic resource pool for a service that already has instances running on public resources, those instances will not automatically move back to the dedicated resource group.

References

Enable horizontal auto-scaling to let the system automatically scale based on your configured metrics. For more information, see Horizontal auto-scaling.
To automatically scale the number of replicas to a specific value on a schedule, see Scheduled auto-scaling.