Platform For AI: Service deployment via JSON

Last Updated: Apr 03, 2026

Define and deploy an online inference service in EAS by using a JSON configuration file.

Quick start

1. Prepare a JSON file

A JSON file with all configurations is required to deploy a service. First-time users can auto-generate this file by configuring parameters under Custom Model Deployment > Custom Deployment, and then modify the generated JSON as needed.

The following is an example service.json file. For all parameters, see Appendix: JSON parameters.

{
    "metadata": {
        "name": "demo",
        "instance": 1,
        "workspace_id": "your-workspace-id"
    },
    "cloud": {
        "computing": {
            "instances": [
                {
                    "type": "ecs.c7a.large"
                }
            ]
        }
    },
    "containers": [
        {
            "image": "eas-registry-vpc.cn-hangzhou.cr.aliyuncs.com/pai-eas/python-inference:py39-ubuntu2004",
            "script": "python app.py",
            "port": 8000
        }
    ]
}
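As a minimal sketch (not an official tool), the same configuration can be assembled as a Python dict and sanity-checked before it is pasted into the console. The helper name check_minimal_config is hypothetical, and the checks cover only the two metadata fields marked as required in the appendix.

```python
import json

# The quick-start configuration from this page, expressed as a Python dict.
service = {
    "metadata": {"name": "demo", "instance": 1, "workspace_id": "your-workspace-id"},
    "cloud": {"computing": {"instances": [{"type": "ecs.c7a.large"}]}},
    "containers": [
        {
            "image": "eas-registry-vpc.cn-hangzhou.cr.aliyuncs.com/pai-eas/python-inference:py39-ubuntu2004",
            "script": "python app.py",
            "port": 8000,
        }
    ],
}


def check_minimal_config(config):
    """Hypothetical helper: return a list of problems found in the config."""
    metadata = config.get("metadata")
    if not isinstance(metadata, dict):
        return ["metadata is required"]
    problems = []
    if not metadata.get("name"):
        problems.append("metadata.name is required")
    instance = metadata.get("instance")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(instance, int) or isinstance(instance, bool):
        problems.append("metadata.instance must be an integer")
    return problems


problems = check_minimal_config(service)
# Serialize to the text that would be pasted into the JSON Deployment editor.
service_json = json.dumps(service, indent=4)
print(problems or "no problems found")
```

Building the file from a dict and serializing it with json.dumps also rules out syntax slips such as a missing brace or a trailing comma.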

2. Deploy the service

  1. Log on to the PAI console. Select a region at the top of the page. Then, select the desired workspace and click Elastic Algorithm Service (EAS).

  2. On the Inference Service tab, click Deploy Service. In the Custom Model Deployment section, select JSON Deployment.

  3. Paste the JSON content and click Deploy. The deployment has succeeded when the service status changes to Running.

Appendix: JSON parameters

Parameter

Required

Description

metadata

Yes

Service metadata. See metadata parameter description.

cloud

No

Compute and VPC resource configuration. See cloud parameter description.

containers

No

Image configuration. See containers parameter description.

dockerAuth

No

Required to access a private repository that requires authentication. The value is a Base64-encoded string of username:password.

networking

No

Service invocation configuration. See networking parameter description.

storage

No

Mounts data from sources such as OSS or NAS into the container. See storage mount.

token

No

The access token for service authentication. If omitted, the system generates one automatically.

aimaster

No

Enables computing power check and fault tolerance for multi-node distributed inference.

model_path

Yes

Required when deploying a service with a processor. The model_path and processor_path parameters specify the data source paths for the model and processor. Both parameters support the following path formats:

  • OSS path: Points to a specific file or directory.

  • HTTP URL: The file must be a compressed archive, such as TAR.GZ, TAR, BZ2, or ZIP.

  • Local path: Use a local path with the test command for local debugging.

oss_endpoint

No

The OSS endpoint. Example: oss-cn-beijing.aliyuncs.com. For other valid values, see Regions and endpoints.

Note

By default, this parameter is not required. The system uses the internal OSS endpoint in the current region to download the model or processor files. Specify this parameter when accessing OSS across regions. For example, if the service is deployed in the China (Hangzhou) region and model_path specifies an OSS path in the China (Beijing) region, set this parameter to the public OSS endpoint of the China (Beijing) region.

model_entry

No

The entry file for the model. Can be any file. If not specified, the filename in model_path is used. The entry file path is passed to the initialize() function in the Processor.

model_config

No

Model configuration. Supports any text. The value is passed to the second parameter of the initialize() function in the Processor.

processor

No

  • If you use a pre-built processor, specify its code. For the codes used in eascmd, see pre-built processors.

  • If you use a custom processor, omit this parameter and configure processor_path, processor_entry, processor_mainclass, and processor_type parameters instead.

processor_path

No

The path to the processor package. See the model_path parameter description.

processor_entry

No

The main file of the Processor, such as libprocessor.so or app.py, which contains the implementations of the initialize() and process() functions required for prediction.

This parameter is required if processor_type is set to cpp or python.

processor_mainclass

No

The processor's main class in the JAR package, for example, com.aliyun.TestProcessor.

This parameter is required if processor_type is set to java.

processor_type

No

The language in which the processor is implemented. Valid values:

  • cpp

  • java

  • python

warm_up_data_path

No

The path to the request file for model prefetch. See model prefetch.

runtime.enable_crash_block

No

Whether a service instance automatically restarts after it crashes due to an exception in the processor code. Valid values:

  • true: The service instance does not automatically restart. This preserves the runtime environment for troubleshooting.

  • false: The service instance automatically restarts. This is the default value.

autoscaler

No

Horizontal auto scaling configuration. See horizontal auto scaling.

labels

No

Labels for the EAS service, in key:value format.

unit.size

No

The number of machines per service instance in a distributed inference deployment. The default value is 2.

sinker

No

Persists all service requests and responses to MaxCompute or Simple Log Service (SLS). See sinker parameter description.

confidential

No

Enables secure, encrypted inference through a trust management service. Data, models, and code remain encrypted during service deployment and invocation. The format is as follows:

Note

This secure encryption feature applies to files on your mounted storage. Ensure you mount the required storage files before enabling this feature.

"confidential": {
        "trustee_endpoint": "xxxx",
        "decryption_key": "xxxx"
    }

The parameters are as follows.

  • trustee_endpoint: The URI of the system trust management service, Trustee.

  • decryption_key: The KBS URI of the decryption key. For example, kbs:///default/key/test-key.
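For the dockerAuth parameter described in the table above, the value can be produced with a few lines of Python. This is a sketch: the function name and the credentials are placeholders, and UTF-8 encoding of the username:password string is an assumption.

```python
import base64


def make_docker_auth(username, password):
    """Return the Base64 encoding of 'username:password' (UTF-8 assumed)."""
    raw = f"{username}:{password}".encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


# Placeholder credentials; substitute the ones for your private repository.
docker_auth = make_docker_auth("example-user", "example-password")
print(docker_auth)
```

The resulting string is what goes into the top-level "dockerAuth" field. Base64 is an encoding, not encryption, so treat the JSON file as a secret once it contains this value.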

Metadata parameters

General parameters

Parameter

Required

Description

name

Yes

The service name, which must be unique within a region.

instance

Yes

The number of service instances to launch.

workspace_id

No

The ID of the PAI workspace. If set, the service is restricted to the specified PAI workspace. For example: 1405**.

cpu

No

The number of CPU cores required by each instance.

memory

No

The memory for each instance, in megabytes (MB). For example, "memory": 4096 means that each instance requires 4 GB of memory.

gpu

No

The number of GPUs required by each instance.

gpu_memory

No

Used for GPU slicing, which allows multiple instances to share a single GPU. This feature is available only with an EAS resource group or resource quotas.

gpu_core_percentage

qos

No

The quality of service (QoS) for the instance. Valid values are an empty value or BestEffort. Setting qos to BestEffort enables CPU sharing mode. In this mode, instances are scheduled based on memory and GPU memory, and CPU cores no longer limit scheduling. All instances on the node share CPU resources. The cpu parameter then specifies the maximum CPU quota per instance.

resource

No

The ID of the resource group.

  • If the service is deployed in a public resource group, omit this parameter. The service is billed on a pay-as-you-go basis.

  • If the service is deployed in a dedicated resource group, set this parameter to the ID of the resource group. Example: eas-r-6dbzve8ip0xnzt****.

cuda

No

The CUDA version required by the service. At runtime, EAS automatically mounts the specified CUDA version to the /usr/local/cuda directory of the instance.

Supported versions: 8.0, 9.0, 10.0, 10.1, 10.2, 11.0, 11.1, and 11.2. Example: "cuda":"11.2".

rdma

No

Set to 1 to enable Remote Direct Memory Access (RDMA) networking for distributed inference. If omitted, RDMA is disabled.

Note

Currently, RDMA networking is available only for services deployed with Lingjun intelligent computing resources.

enable_grpc

No

Enables or disables gRPC connections for the service gateway. Valid values:

  • false: The default value. gRPC connections are disabled, and the gateway supports HTTP requests by default.

  • true: gRPC connections are enabled for the gateway.

Note

To deploy a service by using a custom image with a gRPC server-side implementation, set this parameter to true to switch the gateway protocol to gRPC.

enable_webservice

No

Whether to enable a web server to deploy the service as an AI-Web application.

  • false: The default value. A web server is not enabled.

  • true: A web server is enabled.

type

No

Set this parameter to LLMGatewayService to deploy an LLM intelligent router. For information about how to configure the JSON file, see Step 1: Deploy an LLM intelligent router.

Advanced parameters

Important

Adjust these parameters with caution.

Parameter

Required

Description

rpc

batching

No

Whether to enable server-side batching to accelerate GPU model inference. Supported only in pre-built processor mode. Valid values:

  • false: The default value. Server-side batching is disabled.

  • true: Server-side batching is enabled.

keepalive

No

The maximum processing time for a single request, in milliseconds. If a request exceeds this processing time, the server returns a 408 timeout error and closes the connection. The default value is 600,000 for dedicated gateways. This parameter is not supported for dedicated gateways that use Application Load Balancer (ALB).

io_threads

No

The number of network I/O threads per instance. The default value is 4.

max_batch_size

No

The maximum size of each batch. The default value is 16. This parameter is supported only in pre-built processor mode and takes effect only when rpc.batching is set to true.

max_batch_timeout

No

The maximum timeout for each batch, in milliseconds. The default value is 50. This parameter is supported only in pre-built processor mode and takes effect only when rpc.batching is set to true.

max_queue_size

No

For an asynchronous inference service, this parameter specifies the maximum queue length. The default value is 64. If the queue is full, the server returns a 450 error and closes the connection. To prevent server overload, the queue can proactively notify the client to retry the request on other instances. For services with long response times, consider reducing the queue length to avoid request backlogs and timeouts.

worker_threads

No

The number of worker threads per instance for concurrent request processing. The default value is 5. This parameter is supported only in pre-built processor mode.

rate_limit

No

Enables QPS rate limiting and sets the maximum QPS per instance. The default value is 0, which disables QPS rate limiting.

For example, if you set this parameter to 2000, requests are rejected with a 429 (Too Many Requests) error when the QPS exceeds 2000.

enable_sigterm

No

Valid values:

  • false: The default value. The system does not send a SIGTERM signal when an instance enters the terminating state.

  • true: When a service instance enters the terminating state, the system immediately sends a SIGTERM signal to the main process. The process in your service must handle this signal to perform a custom graceful termination. If the signal is not handled, the main process might exit immediately, which causes the graceful termination to fail.
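When enable_sigterm is set to true, the main process has to catch SIGTERM itself. The following is a minimal sketch of one way to do that in a Python service, assuming a Linux container and a handler installed from the main thread. The serving loop and the names serve_until_terminated and handle_one_request are hypothetical.

```python
import signal
import threading

# Set when SIGTERM arrives; the serving loop checks it to stop taking new work.
shutdown_requested = threading.Event()


def handle_sigterm(signum, frame):
    """Only record the request here; do the real cleanup outside the handler."""
    shutdown_requested.set()


# Must be called from the main thread of the main process.
signal.signal(signal.SIGTERM, handle_sigterm)


def serve_until_terminated(handle_one_request, poll_seconds=0.1):
    """Hypothetical serving loop: process work until SIGTERM is received."""
    while not shutdown_requested.is_set():
        handle_one_request()
        shutdown_requested.wait(poll_seconds)
    # The request in progress has finished at this point;
    # release resources (connections, files) here before the process exits.
```

Keeping the handler to a single flag update avoids running complex logic in signal context. The loop finishes the request in progress and then exits, which should complete within eas.termination_grace_period.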

rolling_strategy

max_surge

No

During a rolling update, this parameter specifies the maximum number of additional instances created above the desired count. This value can be a positive integer or a percentage, such as 2%. The default is 2%. Increasing this value can accelerate the service update.

For example, if the service has 100 instances and this parameter is set to 20, the system immediately creates 20 new instances when the update begins.

max_unavailable

No

During a rolling update, this parameter specifies the maximum number of instances that can be unavailable. This parameter allows the system to free up resources for new instances and prevents the update from stalling due to insufficient idle resources. The default value is 1 in a dedicated resource group and 0 in a public resource group.

For example, if this parameter is set to N, the system stops N instances when the update begins.

Note

If you have sufficient idle resources, set this parameter to 0. A high value might affect service stability by reducing the number of available instances during the update, thereby increasing the traffic load on each remaining instance. You must balance service stability with resource availability when you configure this parameter.
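The effect of max_surge and max_unavailable on instance counts during an update can be worked out with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and rounding a percentage up to a whole instance is an assumption, not documented EAS behavior.

```python
import math


def resolve_max_surge(max_surge, total_instances):
    """Turn max_surge (an integer or a percentage string such as '2%')
    into a number of instances."""
    if isinstance(max_surge, str) and max_surge.endswith("%"):
        # Assumption: fractional results are rounded up.
        return math.ceil(total_instances * float(max_surge[:-1]) / 100)
    return int(max_surge)


def rolling_update_bounds(total_instances, max_surge, max_unavailable):
    """Upper bound on total instances and lower bound on available
    instances while the rolling update is in progress."""
    surge = resolve_max_surge(max_surge, total_instances)
    return {
        "max_total_during_update": total_instances + surge,
        "min_available_during_update": total_instances - int(max_unavailable),
    }


# The example from the text: 100 instances with max_surge set to 20.
bounds = rolling_update_bounds(100, 20, 0)
print(bounds)
```

With max_unavailable set to 0, the available count never drops below the desired count, which is why the note recommends 0 when idle resources are sufficient.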

eas.termination_grace_period

No

The graceful termination period for an instance, in seconds. Default: 30.

EAS services use a rolling update strategy. An instance first enters a Terminating state, during which traffic is routed away from it. The instance then waits for the specified period to finish processing in-flight requests before shutting down. If requests take a long time to process, increase this value to ensure all in-flight requests complete during an update.

Important

Reducing this value can affect service stability. Setting it too high can slow down the update process. Do not change this parameter unless you have a specific requirement.

scheduling

spread.policy

No

The distribution policy for scheduling service instances. Supported policies:

  • host: Spreads instances across different nodes.

  • zone: Spreads instances across different availability zones.

  • default: Uses the default scheduling policy without an active distribution strategy.

Configuration example:

{
  "metadata": {
    "scheduling": {
      "spread": {
        "policy": "host"
      }
    }
  }
}

resource_rebalancing

No

Valid values:

  • false: The default value. This feature is disabled.

  • true: EAS periodically creates probe instances on high-priority resources. If a probe instance is scheduled successfully, EAS creates more probe instances exponentially until scheduling fails. When a successfully scheduled probe instance is ready, it replaces an instance that runs on a lower-priority resource.

This feature helps resolve the following issues:

  • During a rolling update, terminating instances can occupy resources, forcing new instances to be scheduled in a public resource group. These instances are later rescheduled to the dedicated resource group.

  • When you use both spot instances and regular instances, EAS periodically checks whether spot instances are available. If they are, EAS migrates regular instances to spot instances.

workload_type

No

To deploy an EAS service as a job, set this parameter to elasticjob. For more information about the Elastic Job service, see Elastic Job service.

resource_burstable

No

Enables the elastic resource pool feature for an EAS service that is deployed in a dedicated resource group:

  • true: Enables the feature.

  • false: Disables the feature.

shm_size

No

The size of shared memory for the instance, in GB. Shared memory enables direct read and write operations without data replication.

Cloud parameters

Parameter

Required

Description

computing

instances

No

Instance types for deploying a service in a public resource group. If a bid for a spot instance fails or an instance type has insufficient inventory, the system tries the next instance type in order.

  • type: instance type

  • spot_price_limit: Optional.

    • If set, the instance becomes a spot instance with this value as its maximum price limit in USD.

    • If omitted, the corresponding instance is a regular pay-as-you-go instance.

  • capacity: The maximum number of instances for this instance type. You can specify a number, such as "500", or a percentage as a string, such as "20%". If this limit is reached, the system stops creating instances of this type, even if more inventory is available. 

    For example, if a service has a total of 200 instances and you set the capacity for instance type A to "20%", the service uses a maximum of 40 instances of type A. The system uses other instance types for the remaining instances.

disable_spot_protection_period

No

This parameter applies to spot instances. Valid values:

  • false (default): After a spot instance is created, it has a default 1-hour protection period. During this period, the instance is not reclaimed even if the market price exceeds your bid.

  • true: Disables the protection period. Instances without a protection period are typically about 10% cheaper.

networking

vpc_id

No

The ID of the VPC.

vswitch_id

No

The ID of the VSwitch.

security_group_id

No

The ID of the security group.

destination_cidrs

No

If the CIDR block of the configured VSwitch conflicts with the EAS management CIDR blocks (10.224.0.0/16 or 10.240.0.0/12), set this field to the CIDR block of the VSwitch.
Example:

"cloud": {
    "networking": {
      "destination_cidrs": "10.241.28.0/22"
    }
  } 

Replace 10.241.28.0/22 with the actual CIDR block of your VSwitch.
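Whether a VSwitch CIDR block collides with the EAS management ranges can be checked with Python's standard ipaddress module. This sketch interprets "conflicts" as "overlaps", which is an assumption. The function names are hypothetical.

```python
import ipaddress

# EAS management CIDR blocks named in the description above.
EAS_MANAGEMENT_CIDRS = [
    ipaddress.ip_network("10.224.0.0/16"),
    ipaddress.ip_network("10.240.0.0/12"),
]


def needs_destination_cidrs(vswitch_cidr):
    """Return True if the VSwitch CIDR overlaps an EAS management block."""
    vswitch = ipaddress.ip_network(vswitch_cidr)
    return any(vswitch.overlaps(block) for block in EAS_MANAGEMENT_CIDRS)


def networking_fragment(vswitch_cidr):
    """Build the cloud.networking fragment only when it is needed."""
    if needs_destination_cidrs(vswitch_cidr):
        return {"cloud": {"networking": {"destination_cidrs": vswitch_cidr}}}
    return {}


print(needs_destination_cidrs("10.241.28.0/22"))
print(needs_destination_cidrs("192.168.0.0/24"))
```

The example block 10.241.28.0/22 falls inside 10.240.0.0/12, so the first check reports an overlap, while 192.168.0.0/24 does not overlap either range.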

Example:

{
    "cloud": {
        "computing": {
            "instances": [
                {
                    "type": "ecs.c8i.2xlarge",
                    "spot_price_limit": 1
                },
                {
                    "type": "ecs.c8i.xlarge",
                    "capacity": "20%"
                }
            ],
            "disable_spot_protection_period": false
        },
        "networking": {
            "vpc_id": "vpc-bp1oll7xawovg9*****",
            "vswitch_id": "vsw-bp1jjgkw51nsca1e****",
            "security_group_id": "sg-bp1ej061cnyfn0b*****"
        }
    }
}
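The capacity value accepts either a count or a percentage of the service's total instances. The following sketch shows how such a value could be resolved to an instance count. The function name is hypothetical, and rounding down when a percentage does not divide evenly is an assumption.

```python
def capacity_limit(capacity, total_instances):
    """Resolve capacity given as a number, a numeric string such as "500",
    or a percentage string such as "20%"."""
    text = str(capacity).strip()
    if text.endswith("%"):
        # Integer arithmetic; rounding down is an assumption.
        return total_instances * int(text[:-1]) // 100
    return int(text)


# The example from the text: 200 instances in total, capacity "20%".
print(capacity_limit("20%", 200))
```

For a service with 200 instances, "20%" resolves to 40, matching the worked example in the capacity description.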

Containers parameters

To deploy a service with a custom image, see Custom Images.

Parameter

Required

Description

image

Yes

The address of the image for the model service.

env

name

No

The name of the environment variable.

value

No

The value of the environment variable.

command

Either command or script is required.

The entry point command for the container. Only single commands are supported. For complex scripts, such as cd xxx && python app.py, use the script parameter instead. Use the command parameter if the image does not contain a /bin/sh command.

script

The entry point script that runs in the container. This parameter supports complex scripts. Use \n or a semicolon (;) to separate multiple commands.

port

No

The container port.

Important
  • Avoid ports 8080 and 9090, which are reserved for the EAS engine.

  • This port must match the port that the program started by command or script (for example, app.py) listens on.

prepare

pythonRequirements

No

A list of Python packages to install before the service instance starts. The python and pip commands must be in the image's system path. For example:

"prepare": {
  "pythonRequirements": [
    "numpy==1.16.4",
    "absl-py==0.11.0"
  ]
}

pythonRequirementsPath

No

The path to a requirements.txt file. Packages in this file are installed before the service instance starts. The python and pip commands must be in the image's system path. The requirements.txt file can be built into the image or mounted into the service instance from external storage. For example:

"prepare": {
  "pythonRequirementsPath": "/data_oss/requirements.txt"
}
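The rules in this table (image is required, one of command or script is required, and ports 8080 and 9090 are reserved) can be checked before deployment. The following is a sketch with a hypothetical helper name. It checks only what the table states.

```python
# Ports reserved for the EAS engine, per the Important note on the port parameter.
RESERVED_PORTS = {8080, 9090}


def check_container(container):
    """Hypothetical helper: return a list of problems for one containers entry."""
    problems = []
    if not container.get("image"):
        problems.append("image is required")
    if not (container.get("command") or container.get("script")):
        problems.append("either command or script is required")
    if container.get("port") in RESERVED_PORTS:
        problems.append("ports 8080 and 9090 are reserved for the EAS engine")
    return problems


container = {
    "image": "registry-vpc.cn-shanghai.aliyuncs.com/xxx/yyy:zzz",
    "command": "python app.py",
    "port": 8000,
}
print(check_container(container) or "no problems found")
```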

Networking parameters

Parameter

Required

Description

gateway

No

The dedicated gateway configured for the EAS service.

gateway_policy

No

  • rate_limit: Defines the global rate limiting for the service, specifying the maximum number of requests it can receive per second.

    • enable: Whether to enable rate limiting. Set to true to enable or false to disable.

    • limit: The maximum number of requests per second.

      Note

      For services that use a shared gateway, the default rate limit is 1,000 QPS for a single service and 10,000 QPS for a server group. Dedicated gateways do not have a default value.

  • concurrency_limit: Defines the global concurrency control for the service, which is the maximum number of concurrent requests. Dedicated gateways that use an Application Load Balancer (ALB) do not support this setting.

    • enable: Whether to enable concurrency control. Set to true to enable or false to disable.

    • limit: The maximum number of concurrent requests.

Example configuration:

{
    "networking": {
        "gateway_policy": {
            "rate_limit": {
                "enable": true,
                "limit": 100
            },
            "concurrency_limit": {
                "enable": true,
                "limit": 50
            }
        }
    }
}
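When the configuration is generated programmatically, the gateway_policy fragment can be built by a small helper. This is a sketch: the function name and its keyword arguments are hypothetical, and a limit of None simply leaves that policy out.

```python
import json


def gateway_policy(rate_limit=None, concurrency_limit=None):
    """Hypothetical helper: build the networking.gateway_policy fragment."""
    policy = {}
    if rate_limit is not None:
        policy["rate_limit"] = {"enable": True, "limit": rate_limit}
    if concurrency_limit is not None:
        # Not supported by dedicated gateways that use ALB, per the table above.
        policy["concurrency_limit"] = {"enable": True, "limit": concurrency_limit}
    return {"networking": {"gateway_policy": policy}}


fragment = gateway_policy(rate_limit=100, concurrency_limit=50)
print(json.dumps(fragment, indent=4))
```

Called with 100 and 50, the helper reproduces the example configuration shown above.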

Sinker parameters

| Parameter | Required | Description |
| --- | --- | --- |
| type | No | The storage type. Supported types: maxcompute (MaxCompute) and sls (Log Service (SLS)). |
| config.maxcompute.project | No | The MaxCompute project name. |
| config.maxcompute.table | No | The MaxCompute table name. |
| config.sls.project | No | The Log Service (SLS) project name. |
| config.sls.logstore | No | The Logstore name. |

Configuration examples:

MaxCompute

"sinker": {
        "type": "maxcompute",
        "config": {
            "maxcompute": {
                "project": "cl****",
                "table": "te****"
            }
        }
    }

Log Service (SLS)

"sinker": {
        "type": "sls",
        "config": {
            "sls": {
                "project": "k8s-log-****",
                "logstore": "d****"
            }
        }
    }

JSON configuration example

Sample JSON configuration:

{
  "token": "****M5Mjk0NDZhM2EwYzUzOGE0OGMx****",
  "processor": "tensorflow_cpu_1.12",
  "model_path": "oss://examplebucket/exampledir/",
  "oss_endpoint": "oss-cn-beijing.aliyuncs.com",
  "model_entry": "",
  "model_config": "",
  "processor_path": "",
  "processor_entry": "",
  "processor_mainclass": "",
  "processor_type": "",
  "warm_up_data_path": "",
  "runtime": {
    "enable_crash_block": false
  },
  "unit": {
        "size": 2
    },
  "sinker": {
        "type": "maxcompute",
        "config": {
            "maxcompute": {
                "project": "cl****",
                "table": "te****"
            }
        }
    },
  "cloud": {
    "computing": {
      "instances": [
        {
          "capacity": 800,
          "type": "dedicated_resource"
        },
        {
          "capacity": 200,
          "type": "ecs.c7.4xlarge",
          "spot_price_limit": 3.6
        }
      ],
      "disable_spot_protection_period": true
    },
    "networking": {
            "vpc_id": "vpc-bp1oll7xawovg9t8****",
            "vswitch_id": "vsw-bp1jjgkw51nsca1e****",
            "security_group_id": "sg-bp1ej061cnyfn0b****"
        }
  },
  "autoscaler": {
    "min": 2,
    "max": 5,
    "strategies": {
      "qps": 10
    }
  },
  "storage": [
    {
      "mount_path": "/data_oss",
      "oss": {
        "endpoint": "oss-cn-shanghai-internal.aliyuncs.com",
        "path": "oss://bucket/path/"
      }
    }
  ],
  "confidential": {
        "trustee_endpoint": "xx",
        "decryption_key": "xx"
    },
  "metadata": {
    "name": "test_eascmd",
    "resource": "eas-r-9lkbl2jvdm0puv****",
    "instance": 1,
    "workspace_id": "1405**",
    "gpu": 0,
    "cpu": 1,
    "memory": 2000,
    "gpu_memory": 10,
    "gpu_core_percentage": 10,
    "qos": "",
    "cuda": "11.2",
    "enable_grpc": false,
    "enable_webservice": false,
    "rdma": 1,
    "rpc": {
      "batching": false,
      "keepalive": 5000,
      "io_threads": 4,
      "max_batch_size": 16,
      "max_batch_timeout": 50,
      "max_queue_size": 64,
      "worker_threads": 5,
      "rate_limit": 0,
      "enable_sigterm": false
    },
    "rolling_strategy": {
      "max_surge": 1,
      "max_unavailable": 1
    },
    "eas.termination_grace_period": 30,
    "scheduling": {
      "spread": {
        "policy": "host"
      }
    },
    "resource_rebalancing": false,
    "workload_type": "elasticjob",
    "shm_size": 100
  },
  "features": {
    "eas.aliyun.com/extra-ephemeral-storage": "100Gi",
    "eas.aliyun.com/gpu-driver-version": "tesla=550.127.08"
  },
  "networking": {
    "gateway": "gw-m2vkzbpixm7mo****"
  },
  "containers": [
    {
      "image": "registry-vpc.cn-shanghai.aliyuncs.com/xxx/yyy:zzz",
      "prepare": {
        "pythonRequirements": [
          "numpy==1.16.4",
          "absl-py==0.11.0"
        ]
      },
      "command": "python app.py",
      "port": 8000
    }
  ],
  "dockerAuth": "dGVzdGNhbzoxM*******"
}