
Platform For AI: Deploy services with JSON configuration files

Last Updated: Mar 24, 2026

Define and deploy Elastic Algorithm Service (EAS) online services using JSON configuration files with customizable resource, networking, and container parameters.

Quick start

Prepare a JSON configuration file

Create a JSON configuration file with required settings. For first-time users, configure parameters on the Custom Model Deployment > Custom Deployment page. The system automatically generates the corresponding JSON, which can then be modified and extended.

The following example shows a service.json file. For parameter details, see JSON parameters.

{
    "metadata": {
        "name": "demo",
        "instance": 1,
        "workspace_id": "your-workspace-id"
    },
    "cloud": {
        "computing": {
            "instances": [
                {
                    "type": "ecs.c7a.large"
                }
            ]
        }
    },
    "containers": [
        {
            "image": "eas-registry-vpc.cn-hangzhou.cr.aliyuncs.com/pai-eas/python-inference:py39-ubuntu2004",
            "script": "python app.py",
            "port": 8000
        }
    ]
}

Deploy the service

  1. Log on to the PAI console. Select a region on the top of the page. Then, select the desired workspace and click Elastic Algorithm Service (EAS).

  2. On the Inference Service tab, click Deploy Service. In the Custom Model Deployment section, select JSON Deployment.

  3. Paste your JSON configuration file content and click Deploy. Deployment is complete when the service status changes to running.

JSON parameters

Parameter

Required

Description

metadata

Yes

Service metadata. For parameter details, see metadata parameter descriptions.

cloud

No

Compute resources and VPC configuration. For details, see cloud parameter descriptions.

containers

No

Container image configuration. For details, see containers parameter descriptions.

dockerAuth

No

Authentication credentials for a private repository. Value is a Base64-encoded string of the repository's username:password.
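For example, you can generate this value with base64. The credentials below are placeholders; replace them with your registry username and password:

```shell
# Base64-encode "username:password" for the dockerAuth field.
# "myuser" and "mypassword" are placeholder credentials.
printf 'myuser:mypassword' | base64
# → bXl1c2VyOm15cGFzc3dvcmQ=
```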

networking

No

Network and invocation configuration. For parameter details, see networking parameter descriptions.

storage

No

Storage mount configuration to mount data from OSS or NAS into the container. For configuration details, see Mount storage.

token

No

Access token for service authentication. If omitted, the system automatically generates a token.

aimaster

No

Enables computing power checks and fault tolerance for multi-node distributed inference services.

model_path

No

Required when deploying a service with a processor. The model_path and processor_path parameters specify the input data sources for the model and the processor, respectively. Supported address formats:

  • OSS address: Points to a specific file or directory.

  • HTTP address: File must be a compressed package, such as TAR.GZ, TAR, BZ2, or ZIP.

  • Local path: Use a local path if using the test command for local debugging.
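For example, a minimal processor-based deployment that loads a model from OSS might look like the following (the bucket, directory, and processor code are illustrative):

```json
{
  "metadata": {
    "name": "demo",
    "instance": 1
  },
  "processor": "tensorflow_cpu_1.12",
  "model_path": "oss://examplebucket/exampledir/"
}
```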

oss_endpoint

No

OSS endpoint. Example: oss-cn-beijing.aliyuncs.com. For other values, see Regions and endpoints.

Note

By default, this parameter can be omitted. The system uses the internal OSS endpoint of the current region to download model or processor files. This parameter is required for cross-region OSS access. For example, if you deploy a service in the China (Hangzhou) region and specify an OSS address in the China (Beijing) region for the model_path parameter, specify the public OSS endpoint for the China (Beijing) region.
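For example, for a service deployed in China (Hangzhou) whose model is stored in a China (Beijing) bucket (the bucket name is illustrative):

```json
"model_path": "oss://beijing-bucket/exampledir/",
"oss_endpoint": "oss-cn-beijing.aliyuncs.com"
```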

model_entry

No

Entry file of the model. Can be any file. If unspecified, the system uses the file name from model_path. The system passes the main file path to the initialize() function in the processor.

model_config

No

Model configuration. Any text is supported. The system passes this value as the second argument to the initialize() function in the processor.

processor

No

  • If using an official pre-built processor, specify the processor code. For information about processor codes used in eascmd, see Pre-built processors.

  • If using a custom processor, omit this parameter. Instead, configure the processor_path, processor_entry, processor_mainclass, and processor_type parameters.

processor_path

No

Path of the processor's package. For more information, see the model_path parameter description.

processor_entry

No

Main file of the processor. Examples: libprocessor.so and app.py. File must contain implementations of the initialize() and process() functions required for inference.

This parameter is required when processor_type is set to cpp or python.
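A minimal sketch of such an entry file in Python is shown below. This is illustrative only: the exact signatures that the EAS runtime uses may differ, and EchoProcessor is a hypothetical name.

```python
# Illustrative custom processor entry file (e.g. app.py).
# EAS calls initialize() once at startup and process() per request;
# the exact interface may differ from this sketch.

class EchoProcessor:
    def initialize(self, model_path, model_config):
        # model_path comes from model_entry/model_path;
        # model_config is the free-form text from the model_config field.
        self.config = model_config

    def process(self, data):
        # Echo the request back; a real processor runs inference here.
        return data, 200
```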

processor_mainclass

No

Main class of the processor in the JAR package. Example: com.aliyun.TestProcessor.

This parameter is required when processor_type is set to java.

processor_type

No

Implementation language of the processor. Valid values:

  • cpp

  • java

  • python

warm_up_data_path

No

Path to the request file used for model prefetch. For more information about model prefetch, see Prefetch a model service.

runtime.enable_crash_block

No

Specifies whether a service instance automatically restarts after crashing due to an exception in processor code. Valid values:

  • true: Service instance does not automatically restart. This allows you to retain the environment for troubleshooting.

  • false: Service instance automatically restarts. This is the default value.

autoscaler

No

Configuration for horizontal auto scaling. For parameter details, see Horizontal auto scaling.

labels

No

Labels for the EAS service. Format is key:value.
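An illustrative configuration, assuming labels is a JSON object of string key-value pairs (the keys shown are examples):

```json
"labels": {
  "team": "nlp",
  "env": "staging"
}
```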

unit.size

No

Number of machines deployed for a single instance in a distributed inference configuration. Default value is 2.

sinker

No

Persists all service requests and responses to MaxCompute or Simple Log Service (SLS). For parameter details, see sinker parameter descriptions.

confidential

No

Enables a secure and verifiable inference service by configuring a system trust management service. This ensures data, models, and code are securely encrypted during service deployment and invocation.

Note

The secure encryption environment is primarily for files on your mounted storage. Mount storage before enabling this feature.

Format:

"confidential": {
    "trustee_endpoint": "xxxx",
    "decryption_key": "xxxx"
}

The following list describes the parameters.

  • trustee_endpoint: URI of the system trust management service, Trustee.

  • decryption_key: KBS URI of the decryption key. Example: kbs:///default/key/test-key.

Metadata parameters

General parameters

Parameter

Required

Description

name

Yes

A unique name for the service within the region.

instance

Yes

Number of service instances.

workspace_id

No

Workspace ID. If specified, the service is restricted to this PAI workspace. Example: 1405**.

cpu

No

Number of CPU cores required by each instance.

memory

No

Memory required for each instance, specified as an integer in MB. For example, "memory": 4096 indicates each instance requires 4 GB of memory.

gpu

No

Number of GPUs required by each instance.

gpu_memory

No

Enables GPU slicing, allowing multiple instances to share a single GPU. This feature requires an EAS resource group or resource quotas.

gpu_core_percentage

No

Percentage of a single GPU's computing power allocated to each instance. Use this parameter together with gpu_memory when sharing a GPU.

qos

No

Quality of service (QoS) for the instance. Valid values are empty or BestEffort. When qos is set to BestEffort, CPU sharing mode is enabled. In this mode, instances are scheduled based on system memory and GPU memory requirements and are not limited by the number of CPU cores on the node. All instances on the node share CPU resources. The cpu parameter then specifies the maximum CPU quota that a single instance can use.
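For example, to enable CPU sharing mode with a per-instance CPU cap (the values are illustrative):

```json
"metadata": {
  "qos": "BestEffort",
  "cpu": 4,
  "memory": 8192
}
```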

resource

No

ID of the resource group. Set as follows:

  • If the service is deployed to a public resource group, omit this parameter. The service is billed on a pay-as-you-go basis.

  • If the service is deployed to a dedicated resource group, set this parameter to the resource group ID. Example: eas-r-6dbzve8ip0xnzt****.

cuda

No

CUDA version required by the service. At runtime, the specified CUDA version is automatically mounted to the /usr/local/cuda directory of the instance.

Supported versions: 8.0, 9.0, 10.0, 10.1, 10.2, 11.0, 11.1, and 11.2. Example: "cuda":"11.2".

rdma

No

Set to 1 to enable RDMA networking for distributed inference. If omitted, RDMA networking is disabled.

Note

Currently, RDMA networking is available only for services deployed with Lingjun intelligent computing resources.

enable_grpc

No

Enables gRPC for the service gateway. Valid values:

  • false (default): Service gateway does not enable gRPC and supports HTTP requests by default.

  • true: Service gateway enables gRPC connections.

Note

If you deploy a service with a custom image where the server-side implementation is gRPC, set this parameter to true to switch the gateway protocol.

enable_webservice

No

Specifies whether to deploy the service as an AI web application. Valid values:

  • false (default): A web server is not enabled.

  • true: A web server is enabled.

type

No

Set this parameter to LLMGatewayService to deploy an LLM intelligent router. For information on how to configure the JSON file, see Deploy an LLM intelligent router.

Advanced parameters

Important

Adjust these parameters with caution.

Parameter

Required

Description

rpc

batching

No

Specifies whether to enable server-side batching to accelerate GPU models. This feature is supported only in pre-built processor mode. Valid values:

  • false (default): Disables server-side batching.

  • true: Enables server-side batching.

keepalive

No

Maximum processing time for a single request, in milliseconds. If a request exceeds this time, the server returns a 408 timeout error and closes the connection. Default value: 600000 for a dedicated gateway. This parameter is not supported for Application Load Balancer (ALB) dedicated gateways.

io_threads

No

Number of network I/O threads per instance. Default value: 4.

max_batch_size

No

Maximum size of each batch. Default value: 16. This parameter is supported only in pre-built processor mode and takes effect only when rpc.batching is set to true.

max_batch_timeout

No

Maximum timeout for each batch, in milliseconds. Default value: 50. This parameter is supported only in pre-built processor mode and takes effect only when rpc.batching is set to true.

max_queue_size

No

For an asynchronous inference service, this parameter specifies the maximum queue length. Default value: 64. If the queue is full, the server returns a 450 error and closes the connection. To prevent server overload, the queue can proactively notify the client to retry the request on other instances. For services with long response times, consider reducing the queue length to avoid request backlogs and timeouts.

worker_threads

No

Number of worker threads per instance for concurrent request processing. Default value: 5. This parameter is supported only in pre-built processor mode.

rate_limit

No

Enables QPS rate limiting and specifies the maximum QPS that an instance can handle. A value of 0 (the default) disables this feature.

For example, if you set this parameter to 2000, requests are rejected with a 429 (Too Many Requests) error when the QPS exceeds 2000.

enable_sigterm

No

Specifies whether to send a SIGTERM signal when an instance enters the terminating state. Valid values:

  • false (default): A SIGTERM signal is not sent.

  • true: When an instance enters the terminating state, the system sends a SIGTERM signal to the main process. Your service process must handle this signal to perform a custom graceful termination. If the signal is not handled, the main process might exit immediately, causing the graceful termination to fail.

rolling_strategy

max_surge

No

During a rolling update, this is the maximum number of additional instances that can be created above the desired instance count. This value can be a positive integer or a percentage, such as "2%". The default value is "2%". Increasing this value can accelerate the update process.

For example, if the service has 100 instances and this parameter is set to 20, the system immediately creates 20 new instances when the update begins.

max_unavailable

No

During a rolling update, this is the maximum number of instances that can be unavailable. This parameter frees up resources for new instances, which prevents updates from stalling due to insufficient capacity. The default value is 1 in a dedicated resource group and 0 in a public resource group.

For example, if this parameter is set to N, N old instances are stopped immediately when the update begins.

Note

If you have sufficient idle resources, set this parameter to 0. A high value might affect service stability because the reduced number of available instances increases the load on each remaining instance. Balance service stability with resource availability when configuring this parameter.

eas.termination_grace_period

No

Graceful termination period for an instance, in seconds. Default value: 30.

EAS services use a rolling update strategy. An instance first enters a Terminating state, during which traffic is routed away from it. The instance then waits for the specified period to finish processing in-flight requests before it shuts down. If your requests take a long time to process, you can increase this value to ensure all requests are completed during an update.

Important

Reducing this value can affect service stability. Setting it too high can slow down the update process. Do not change this parameter unless you have a specific requirement.

scheduling

spread.policy

No

Distribution policy for scheduling service instances. Supported policies:

  • host: Spreads instances across different nodes.

  • zone: Spreads instances across different availability zones.

  • default: Uses the default scheduling policy without an active distribution strategy.

Configuration example:

{
  "metadata": {
    "scheduling": {
      "spread": {
        "policy": "host"
      }
    }
  }
}

resource_rebalancing

No

Specifies whether to enable resource rebalancing. Valid values:

  • false (default): Disables this feature.

  • true: Enables EAS to periodically create probe instances on high-priority resources. If a probe instance is scheduled successfully, the system creates more probe instances exponentially until scheduling fails. A successfully scheduled probe instance, once ready, replaces an instance running on a lower-priority resource.

This feature helps resolve the following issues:

  • During a rolling update, terminating instances can occupy resources, forcing new instances to start in a public resource group before being rescheduled to the dedicated resource group.

  • When you use both spot instances and regular instances, the system periodically checks for available spot instances. If spot instances become available, it migrates workloads from regular instances to spot instances.

workload_type

No

To deploy an EAS service as a job, set this parameter to elasticjob. For more information about the Elastic Job service, see Elastic Job service.

resource_burstable

No

Enables the elastic resource pool feature for an EAS service that is deployed in a dedicated resource group. Valid values:

  • true: Enables the feature.

  • false: Disables the feature.

shm_size

No

Shared memory size for the instance, in GB. Shared memory provides direct memory access without data replication or transfer.

Cloud parameters

Parameter

Required

Description

computing

instances

No

Specifies the list of instance specifications for deploying a service in a public resource group. If a bid for an instance specification fails or its inventory is insufficient, the system attempts to create the service by using the next instance specification in the configured order.

  • type: The instance specification type.

  • spot_price_limit:

    • If set, this parameter defines the instance as a spot instance with the specified maximum price limit. The unit is USD.

    • If this parameter is omitted, the instance is a regular pay-as-you-go instance.

  • capacity: The maximum number of instances of this type. This can be a number, such as 500, or a percentage in a string, such as "20%". If this limit is reached, the system stops creating instances of this type, even if more inventory is available.

    For example, if a service has a total of 200 instances and the capacity for instance type A is set to "20%", the service uses a maximum of 40 instances of type A. The system creates the remaining instances by using other specifications.

disable_spot_protection_period

No

This parameter applies only to spot instances. Valid values:

  • false (default): After a spot instance is created, it has a default 1-hour protection period. During this period, the instance is not reclaimed even if the market price exceeds your bid.

  • true: Disables the protection period. An instance without a protection period is typically about 10% cheaper than one with a protection period.

networking

vpc_id

No

The VPC ID. Specify vpc_id, vswitch_id, and security_group_id together to connect the EAS service to your VPC.

vswitch_id

No

The vSwitch ID.

security_group_id

No

The security group ID.

Example:

{
    "cloud": {
        "computing": {
            "instances": [
                {
                    "type": "ecs.c8i.2xlarge",
                    "spot_price_limit": 1
                },
                {
                    "type": "ecs.c8i.xlarge",
                    "capacity": "20%"
                }
            ],
            "disable_spot_protection_period": false
        },
        "networking": {
            "vpc_id": "vpc-bp1oll7xawovg9*****",
            "vswitch_id": "vsw-bp1jjgkw51nsca1e****",
            "security_group_id": "sg-bp1ej061cnyfn0b*****"
        }
    }
}

Container parameters

To deploy a service using a custom image, see Deploy services with custom images.

Parameter

Required

Description

image

Yes

The URI of the container image for the model service.

env

name

No

The name of the environment variable.

value

No

The value of the environment variable.

command

Either command or script is required.

The entry point command for the container. Only single commands are supported. For complex scripts, such as cd xxx && python app.py, use the script parameter instead. Use this field for images that do not have a /bin/sh command.

script

Either command or script is required.

The entry point script for the container. Use \n or a semicolon (;) to separate multiple commands.
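For example, a script value that chains a setup command and the startup command (the paths are illustrative):

```json
"containers": [
  {
    "image": "eas-registry-vpc.cn-hangzhou.cr.aliyuncs.com/pai-eas/python-inference:py39-ubuntu2004",
    "script": "cd /app\npython app.py",
    "port": 8000
  }
]
```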

port

No

The container port.

Important
  • Avoid ports 8080 and 9090, which are reserved for the EAS engine.

  • This port must match the port that the application in your command or script listens on.

prepare

pythonRequirements

No

List of Python packages to install before the service instance starts. The image must have python and pip commands in its system path. Example:

"prepare": {
  "pythonRequirements": [
    "numpy==1.16.4",
    "absl-py==0.11.0"
  ]
}

pythonRequirementsPath

No

The path to a requirements.txt file. Packages from this file are installed before the service instance starts. The image must have python and pip commands in its system path. The requirements.txt file can be included in the image or mounted from external storage. Example:

"prepare": {
  "pythonRequirementsPath": "/data_oss/requirements.txt"
}

Networking parameters

Parameter

Required

Description

gateway

No

Specifies the dedicated gateway for the EAS service.

gateway_policy

No

  • rate_limit: The maximum number of requests per second that the service can receive.

    • enable: Specifies whether to enable rate limiting.

    • limit: The maximum number of requests per second.

      Note

      Services using a shared gateway have a default single-service limit of 1,000 QPS and a server group limit of 10,000 QPS. Dedicated gateways do not have a default value.

  • concurrency_limit: The maximum number of in-flight requests that can be processed concurrently. This setting is not supported for dedicated gateways that use an Application Load Balancer (ALB).

    • enable: Specifies whether to enable concurrency control.

    • limit: The maximum number of concurrent requests.

Example configuration:

{
    "networking": {
        "gateway_policy": {
            "rate_limit": {
                "enable": true,
                "limit": 100
            },
            "concurrency_limit": {
                "enable": true,
                "limit": 50
            }
        }
    }
}

Sinker parameters

Parameter

Required

Description

type

No

The storage type to persist records. The following types are supported:

  • maxcompute: MaxCompute

  • sls: Simple Log Service (SLS)

config

maxcompute.project

No

The MaxCompute project name.

maxcompute.table

No

The name of the MaxCompute table.

sls.project

No

The SLS project name.

sls.logstore

No

The name of the SLS Logstore.

The following sections provide configuration examples.

MaxCompute

"sinker": {
        "type": "maxcompute",
        "config": {
            "maxcompute": {
                "project": "cl****",
                "table": "te****"
            }
        }
    }

Simple Log Service

"sinker": {
        "type": "sls",
        "config": {
            "sls": {
                "project": "k8s-log-****",
                "logstore": "d****"
            }
        }
    }

Appendix: JSON configuration example

The following JSON example uses the parameters described above:

{
  "token": "****M5Mjk0NDZhM2EwYzUzOGE0OGMx****",
  "processor": "tensorflow_cpu_1.12",
  "model_path": "oss://examplebucket/exampledir/",
  "oss_endpoint": "oss-cn-beijing.aliyuncs.com",
  "model_entry": "",
  "model_config": "",
  "processor_path": "",
  "processor_entry": "",
  "processor_mainclass": "",
  "processor_type": "",
  "warm_up_data_path": "",
  "runtime": {
    "enable_crash_block": false
  },
  "unit": {
        "size": 2
    },
  "sinker": {
        "type": "maxcompute",
        "config": {
            "maxcompute": {
                "project": "cl****",
                "table": "te****"
            }
        }
    },
  "cloud": {
    "computing": {
      "instances": [
        {
          "capacity": 800,
          "type": "dedicated_resource"
        },
        {
          "capacity": 200,
          "type": "ecs.c7.4xlarge",
          "spot_price_limit": 3.6
        }
      ],
      "disable_spot_protection_period": true
    },
    "networking": {
            "vpc_id": "vpc-bp1oll7xawovg9t8****",
            "vswitch_id": "vsw-bp1jjgkw51nsca1e****",
            "security_group_id": "sg-bp1ej061cnyfn0b****"
        }
  },
  "autoscaler": {
    "min": 2,
    "max": 5,
    "strategies": {
      "qps": 10
    }
  },
  "storage": [
    {
      "mount_path": "/data_oss",
      "oss": {
        "endpoint": "oss-cn-shanghai-internal.aliyuncs.com",
        "path": "oss://bucket/path/"
      }
    }
  ],
  "confidential": {
        "trustee_endpoint": "xx",
        "decryption_key": "xx"
    },
  "metadata": {
    "name": "test_eascmd",
    "resource": "eas-r-9lkbl2jvdm0puv****",
    "instance": 1,
    "workspace_id": "1405**",
    "gpu": 0,
    "cpu": 1,
    "memory": 2000,
    "gpu_memory": 10,
    "gpu_core_percentage": 10,
    "qos": "",
    "cuda": "11.2",
    "enable_grpc": false,
    "enable_webservice": false,
    "rdma": 1,
    "rpc": {
      "batching": false,
      "keepalive": 5000,
      "io_threads": 4,
      "max_batch_size": 16,
      "max_batch_timeout": 50,
      "max_queue_size": 64,
      "worker_threads": 5,
      "rate_limit": 0,
      "enable_sigterm": false
    },
    "rolling_strategy": {
      "max_surge": 1,
      "max_unavailable": 1
    },
    "eas.termination_grace_period": 30,
    "scheduling": {
      "spread": {
        "policy": "host"
      }
    },
    "resource_rebalancing": false,
    "workload_type": "elasticjob",
    "shm_size": 100
  },
  "features": {
    "eas.aliyun.com/extra-ephemeral-storage": "100Gi",
    "eas.aliyun.com/gpu-driver-version": "tesla=550.127.08"
  },
  "networking": {
    "gateway": "gw-m2vkzbpixm7mo****"
  },
  "containers": [
    {
      "image": "registry-vpc.cn-shanghai.aliyuncs.com/xxx/yyy:zzz",
      "prepare": {
        "pythonRequirements": [
          "numpy==1.16.4",
          "absl-py==0.11.0"
        ]
      },
      "command": "python app.py",
      "port": 8000
    }
  ],
  "dockerAuth": "dGVzdGNhbzoxM*******"
}