Elastic Algorithm Service (EAS) provides both shared gateways and dedicated gateways, which support service invocations over the internet and internal networks. The invocation process is largely the same for both. You can choose the gateway type and endpoint that best fits your requirements.
Select a gateway
EAS offers two types of gateways: shared gateway and dedicated gateway. Their key differences are summarized below:
| Comparison | Shared gateway | Dedicated gateway |
| --- | --- | --- |
| Internet invocation | Supported by default. | Supported. Must be enabled first. |
| Internal network invocation | Supported by default. | Supported. Must be enabled first. |
| Cost | Free of charge. | Incurs additional fees. |
| Use cases | Shared bandwidth, suitable for low-traffic services that do not require custom access policies. Recommended for testing scenarios. | Dedicated bandwidth, ideal for high-traffic services that demand greater security, stability, and performance. Recommended for production environments. |
| Configuration | Default configuration, ready to use out of the box. | Create the gateway first, and then select it when deploying the service. For more information, see Use a dedicated gateway. |
Select an endpoint
Internet endpoint: Use this in any environment with internet access. Requests are forwarded through an EAS shared gateway to your service.
VPC endpoint: Use this when your client application and the EAS service are deployed in the same region. You can also establish a connection between two VPCs located in the same region.
Invoking services over a VPC is faster than over the internet because it eliminates public network latency. It is also more cost-effective, as intra-VPC traffic is typically free.
How to invoke a service
To invoke a service, you first need its endpoint and token. Then, construct a request based on the specifications of the deployed model.
1. Get the endpoint and token
After you deploy a service, the system automatically generates the required endpoint and token.
The Endpoint provided in the Console is a base address. You must append the correct Request Path to this base address to build the complete request URL. An incorrect path is the most common cause of 404 Not Found errors.
On the Inference Services tab, click the name of the target service to go to its Overview page. In the Basic Information section, click View Endpoint Information.
In the Invocation Method panel, get the endpoint and token. Choose an Internet or VPC endpoint based on your requirements. The following examples use <EAS_ENDPOINT> and <EAS_TOKEN> as placeholders for these values.
2. Construct and send the request
For both Internet and VPC endpoints, the request construction is the same; only the URL differs. A standard invocation request typically includes four core components:
Method: The most common methods are POST and GET.
URL: The base <EAS_ENDPOINT> with the specific Request Path appended.
Request Header: At a minimum, this must include your authentication token, such as Authorization: <EAS_TOKEN>.
Request Body: The API of the deployed model determines the format, such as JSON.
Important: When invoking a service through a gateway, the request body cannot exceed 1 MB.
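These components can be sketched in Python. The endpoint, token, and request path below are hypothetical placeholders, not real values; copy the actual endpoint and token from the console, and take the request path and body format from your model's API documentation:

```python
import json

# Hypothetical values; use the endpoint and token from the console.
EAS_ENDPOINT = "http://demo.cn-hangzhou.pai-eas.aliyuncs.com/api/predict/demo"
EAS_TOKEN = "example-token"

method = "POST"
# Full URL = base endpoint + the request path required by the model.
# Appending the wrong path is the most common cause of 404 errors.
url = EAS_ENDPOINT + "/v1/chat/completions"

headers = {
    "Authorization": EAS_TOKEN,          # authentication token
    "Content-Type": "application/json",  # matches a JSON request body
}

# The body format is defined by the deployed model's API.
body = json.dumps({"prompt": "hello"})

# A gateway rejects request bodies larger than 1 MB.
assert len(body.encode("utf-8")) <= 1024 * 1024
```

To send the request, pass these components to any HTTP client, for example requests.post(url, headers=headers, data=body).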
Scenarios
Scenario 1: Invoke a model deployed from Model Gallery
Refer directly to the Overview page in the Model Gallery. It provides the most accurate API call examples, typically in cURL or Python, including the complete URL path and Request Body format.
cURL command
The basic syntax for a cURL command is curl [options] [URL]:
options are optional parameters. Common options include -X to specify the request method, -H for the Request Header, and -d for the Request Body.
URL is the HTTP endpoint you want to access.
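As a hedged sketch (the endpoint, token, and body below are placeholders, not values from a real Overview page), a Model Gallery invocation command typically has this shape:

```shell
# Placeholder endpoint, token, and body; copy the real values
# from the model's Overview page in Model Gallery.
EAS_ENDPOINT="http://demo.cn-hangzhou.pai-eas.aliyuncs.com/api/predict/demo"
EAS_TOKEN="example-token"

# -X sets the method, -H adds request headers, -d supplies the request body.
curl "$EAS_ENDPOINT" \
  -X POST \
  -H "Authorization: $EAS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"input": "hello"}'
```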

Python code
This Python code example uses the Qwen3-Reranker-8B model. Note that its URL and Request Body differ from the cURL example. Always refer to the corresponding Overview page.
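For illustration only, a sketch of such a request follows. The body shape (a query plus candidate documents) is an assumption for a reranking model, not the authoritative format, and the endpoint and token are placeholders; copy the exact URL and Request Body from the model's Overview page:

```python
import json

# Hypothetical endpoint and token; use the real values from the console.
EAS_ENDPOINT = "http://demo.cn-hangzhou.pai-eas.aliyuncs.com/api/predict/reranker-demo"
EAS_TOKEN = "example-token"

headers = {
    "Authorization": EAS_TOKEN,
    "Content-Type": "application/json",
}

# Assumed body shape for a reranking model: a query and the candidate
# documents to score against it. Verify against the Overview page.
body = {
    "query": "What is machine learning?",
    "documents": [
        "Machine learning is a branch of artificial intelligence.",
        "It is sunny today.",
    ],
}
payload = json.dumps(body)
# To send: requests.post(EAS_ENDPOINT, headers=headers, data=payload)
```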

Scenario 2: Invoke a large language model
Large Language Model (LLM) services typically provide OpenAI-compatible API endpoints, such as the chat completions endpoint (/v1/chat/completions) and the completions endpoint (/v1/completions).
For example, to call the chat completions endpoint of a DeepSeek-R1-Distill-Qwen-7B model service deployed with vLLM, you need the following components. For more information, see Invoke an LLM:
Method: POST
URL: <EAS_ENDPOINT>/v1/chat/completions
Request Header: Authorization: <EAS_TOKEN> and Content-Type: application/json
Request Body:

```json
{
  "model": "DeepSeek-R1-Distill-Qwen-7B",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "hello!"}
  ]
}
```
Example: Invoke with cURL and Python
Assume that <EAS_ENDPOINT> is http://16********.cn-hangzhou.pai-eas.aliyuncs.com/api/predict/test.
cURL:

```shell
curl http://16********.cn-hangzhou.pai-eas.aliyuncs.com/api/predict/test/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: *********5ZTM1ZDczg5OT**********" \
  -X POST \
  -d '{
        "model": "DeepSeek-R1-Distill-Qwen-7B",
        "messages": [
          {"role": "system", "content": "You are a helpful assistant."},
          {"role": "user", "content": "hello!"}
        ]
      }'
```

Python:

```python
import requests

# Replace with the actual endpoint.
url = 'http://16********.cn-hangzhou.pai-eas.aliyuncs.com/api/predict/test/v1/chat/completions'
# The value of Authorization in the header is the actual token.
headers = {
    "Content-Type": "application/json",
    "Authorization": "*********5ZTM1ZDczg5OT**********",
}
# Construct the service request based on the data format required by the specific model.
data = {
    "model": "DeepSeek-R1-Distill-Qwen-7B",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "hello!"}
    ]
}
# Send the request.
resp = requests.post(url, json=data, headers=headers)
print(resp)
print(resp.content)
```

More scenarios
For services deployed with a universal Processor (including TensorFlow, Caffe, and PMML): See Construct a service request based on a universal processor.
Models you trained yourself: The invocation method is the same as for the original model.
Other custom services: The data input format that you define in your custom image or code determines the request format.
FAQ
For more information, see Service invocation FAQ.