EAS provides shared and dedicated gateways for calling deployed model inference services over the Internet or a VPC. This topic describes how to choose a gateway type and access method, and how to call EAS services.
Choose a gateway type
EAS offers two gateway types. The following table compares their features:
| Comparison | Shared Gateway | Dedicated Gateway |
| --- | --- | --- |
| Internet Access | Supported (default) | Supported (manual configuration required) |
| VPC Access | Supported (default) | Supported (manual configuration required) |
| Cost | Free | Paid |
| Bandwidth | Shared | Dedicated |
| Scenarios | Development and testing environments with low traffic | Production environments requiring high security, stability, and performance |
| Setup | No setup required. Available by default. | Must be created before deployment. See Use a dedicated gateway. |
Recommendations:
- Use a shared gateway for development and testing.
- Use a dedicated gateway for production.
Choose an access method
Internet endpoint
Use this method when your application has Internet access. Requests are routed through the EAS gateway to your deployed service.
Scenarios:
- Calling services from outside Alibaba Cloud
- Local development and testing
- Integration with external applications
VPC address
Use this method when your application and EAS service are deployed in the same region. VPC connections enable secure communication between resources in the same region.
Scenarios:
- Your application runs on Alibaba Cloud in the same region as EAS.
- You need lower latency and reduced costs.
- Your service should not be publicly accessible.
VPC access offers lower latency by avoiding Internet routing overhead and reduced costs since VPC traffic is typically free.
How to call a service
To call an EAS service, you need three components:
- Service endpoint
- Authorization token
- Request structure that conforms to the model's API specification
Step 1: Get the endpoint and token
After deploying a service, the system automatically generates an endpoint and authorization token.
The console displays the base endpoint URL. Append the appropriate API path to create the complete request URL. An incorrect path is the most common cause of 404 Not Found errors.
1. On the Inference Service tab, click the name of the target service to go to the Overview page.
2. In the Basic Information section, click View Endpoint Information.
3. In the Invocation Method panel, copy the endpoint and token:
   - Choose the Internet endpoint or VPC endpoint as needed.
   - The following examples use <EAS_ENDPOINT> as the endpoint and <EAS_TOKEN> as the token.
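As noted above, appending the API path to the base endpoint incorrectly is the most common cause of 404 Not Found errors. A small helper can join the two without a doubled or missing slash; this is a minimal sketch, and the endpoint below is a hypothetical placeholder, not a real service:

```python
# Join the base endpoint and the API path, normalizing the slash between
# them so that neither a doubled nor a missing slash can slip in.
def build_url(endpoint: str, api_path: str) -> str:
    return endpoint.rstrip("/") + "/" + api_path.lstrip("/")

# Both spellings produce the same well-formed URL:
print(build_url("http://example.pai-eas.aliyuncs.com/api/predict/test/", "/v1/chat/completions"))
print(build_url("http://example.pai-eas.aliyuncs.com/api/predict/test", "v1/chat/completions"))
# → http://example.pai-eas.aliyuncs.com/api/predict/test/v1/chat/completions
```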
Step 2: Construct and send the request
The request format is identical for both Internet and VPC endpoints. A standard request includes four core elements:
- Method: Common methods include POST and GET.
- URL:
  - Format: <EAS_ENDPOINT> + API path
  - Example: http://16********.cn-hangzhou.pai-eas.aliyuncs.com/api/predict/test/v1/chat/completions
- Headers:
  - Authorization: <EAS_TOKEN> (required for authorization)
  - Content-Type: application/json (typically required for POST requests)
- Body: The request payload format depends on the deployed model's API specification (e.g., JSON).

Important: The gateway limits the request body to 1 MB.
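Because the gateway rejects request bodies larger than 1 MB, it can be useful to check the serialized payload size before sending. A minimal sketch; the payload below is illustrative, not a real model's request format:

```python
import json

MAX_BODY_BYTES = 1024 * 1024  # the gateway's 1 MB request-body limit

# Illustrative payload; the real structure depends on the deployed model.
payload = {
    "model": "my-model",
    "messages": [{"role": "user", "content": "hello!"}],
}
body = json.dumps(payload).encode("utf-8")

if len(body) > MAX_BODY_BYTES:
    raise ValueError(f"Body is {len(body)} bytes; the gateway limit is {MAX_BODY_BYTES}.")
print(f"Body size OK: {len(body)} bytes")
```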
Invocation example
This example demonstrates calling a DeepSeek-R1-Distill-Qwen-7B model service deployed with vLLM:
- Method: POST
- Request path: <EAS_ENDPOINT>/v1/chat/completions (chat API)
- Headers:
  - Authorization: <Token>
  - Content-Type: application/json
- Request body:

{
  "model": "DeepSeek-R1-Distill-Qwen-7B",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "hello!"}
  ]
}
Code example:
Assume <EAS_ENDPOINT> is http://16********.cn-hangzhou.pai-eas.aliyuncs.com/api/predict/test.

cURL:
curl http://16********.cn-hangzhou.pai-eas.aliyuncs.com/api/predict/test/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: *********5ZTM1ZDczg5OT**********" \
-X POST \
-d '{
    "model": "DeepSeek-R1-Distill-Qwen-7B",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "hello!"}
    ]
}'

Python:
import requests

# Replace with your actual endpoint.
url = 'http://16********.cn-hangzhou.pai-eas.aliyuncs.com/api/predict/test/v1/chat/completions'
# Set the Authorization header to your actual token.
headers = {
    "Content-Type": "application/json",
    "Authorization": "*********5ZTM1ZDczg5OT**********",
}
# Construct the request body in the format required by the specific model.
data = {
    "model": "DeepSeek-R1-Distill-Qwen-7B",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "hello!"}
    ]
}
# Send the request.
resp = requests.post(url, json=data, headers=headers)
print(resp)
print(resp.content)

For more information about calling Large Language Model (LLM) services, see LLM service invocation.
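vLLM serves an OpenAI-compatible chat API, so the response body is JSON containing a `choices` list. A sketch of extracting the assistant's reply; the `sample` dict below is a trimmed illustration, not real model output:

```python
# Trimmed sample of an OpenAI-compatible chat completions response body.
# In real code this would come from resp.json() and contain more fields.
sample = {
    "choices": [
        {"message": {"role": "assistant", "content": "Hello! How can I help you?"}}
    ]
}

# The assistant's reply is the content of the first choice's message.
reply = sample["choices"][0]["message"]["content"]
print(reply)  # → Hello! How can I help you?
```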
More scenarios
- Model Gallery deployments: The Overview page typically provides complete API call examples, including URL paths and request formats.

cURL command
Basic syntax:
curl [options] [URL]

Common parameters (options):
- -X: Specifies the HTTP method, such as -X POST.
- -H: Adds a request header, such as -H "Content-Type: application/json".
- -d: Adds a request body, such as -d '{"key": "value"}'.

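The cURL options above map one-to-one onto arguments of Python's `requests` library. A minimal sketch using a hypothetical URL; nothing is sent over the network, the request is only prepared:

```python
import requests

# Each cURL option corresponds to an argument of requests.Request:
#   -X POST -> method, -H -> headers, -d -> data
req = requests.Request(
    method="POST",                                  # -X POST
    url="http://example.com/api/predict/demo",      # [URL] (hypothetical)
    headers={"Content-Type": "application/json"},   # -H "Content-Type: ..."
    data='{"key": "value"}',                        # -d '{"key": "value"}'
).prepare()

print(req.method, req.url)  # → POST http://example.com/api/predict/demo
print(req.body)             # → {"key": "value"}
```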
Python code
A Python example is also available for the Qwen3-Reranker-8B model; its URL and request body differ from the cURL example above. Always refer to the specific model's documentation.

- Scenario-based deployments:
  - Services deployed using a generic processor (TensorFlow, Caffe, PMML): See Construct a service request based on a generic processor.
  - Custom services: The request format depends on the data input format defined in your custom image or code.
  - Self-trained models: Use the same invocation method as the base model.
FAQ
For common issues and troubleshooting, see Service Invocation FAQ.