After an EAS service is deployed, a shared gateway is provided by default. You can use this gateway to invoke the deployed model inference service via a public endpoint or a VPC address.
We recommend using a shared gateway for development and testing environments. For production environments, use a dedicated gateway.
Choose an invocation address
By default, the shared gateway provides two types of invocation addresses:
Invocation address | Description | Use cases |
Public endpoint | The EAS shared gateway forwards requests to the target service. | Any environment with access to the public internet. |
VPC address | Requests are sent over a virtual private cloud (VPC) instead of the public internet. | Your application and the EAS service are deployed in the same region. |
Important: Compared with public endpoint invocation, VPC invocation offers lower latency because it avoids public network overhead. It is also more cost-effective because traffic within a VPC is typically free.
Invoke the service
Step 1: Get the endpoint and token
After you deploy a service, the system automatically generates an endpoint and an authorization token.
The console provides the base endpoint. When you construct the full request URL, you must append the correct API path to this base endpoint. An incorrect path is the most common cause of 404 Not Found errors.
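A wrong path is the usual cause of a 404, so it can help to build the full URL in one place. The following is a minimal Python sketch that assumes you already have the base endpoint as a string. The function name build_request_url is hypothetical. The API path depends on the model; this sketch uses the /v1/chat/completions path from the invocation example in this topic.

```python
def build_request_url(base_endpoint: str, api_path: str) -> str:
    """Join the console's base endpoint and an API path with exactly one slash."""
    # Strip a trailing slash from the endpoint and a leading slash from the path,
    # so the result contains neither a missing separator nor a doubled "//".
    return base_endpoint.rstrip("/") + "/" + api_path.lstrip("/")


# Example base endpoint taken from the invocation example in this topic.
base = "http://16********.cn-hangzhou.pai-eas.aliyuncs.com/api/predict/test"
print(build_request_url(base, "/v1/chat/completions"))
# http://16********.cn-hangzhou.pai-eas.aliyuncs.com/api/predict/test/v1/chat/completions
```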
On the Inference Service tab, click the name of the target service to go to the Overview page.
In the Basic Information section, click View Endpoint Information.
In the Invocation Method panel, select the public endpoint or VPC address as needed, and then copy the endpoint and token.
The following examples use <EAS_ENDPOINT> as a placeholder for the endpoint and <EAS_TOKEN> as a placeholder for the token.
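You can keep both values outside your code instead of pasting the token into scripts. The following is a minimal sketch that assumes you have exported them as environment variables named EAS_ENDPOINT and EAS_TOKEN. These are hypothetical names chosen for this sketch; EAS does not set them for you.

```python
import os


def load_eas_credentials() -> tuple[str, str]:
    """Read the endpoint and token from environment variables.

    EAS_ENDPOINT and EAS_TOKEN are hypothetical variable names used only in
    this sketch; export them yourself before calling this function.
    """
    endpoint = os.environ.get("EAS_ENDPOINT")
    token = os.environ.get("EAS_TOKEN")
    # Fail early with a clear message instead of sending an unauthenticated request.
    if not endpoint or not token:
        raise RuntimeError("Set EAS_ENDPOINT and EAS_TOKEN before running.")
    return endpoint, token
```

A script can then call `endpoint, token = load_eas_credentials()` and use the two values wherever the examples show <EAS_ENDPOINT> and <EAS_TOKEN>. Avoid printing or logging the token.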

Step 2: Construct and send the request
The request format is the same whether you use a public endpoint or a VPC address. A standard request includes the following elements:
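Element | Description |
Method | The most common methods are POST and GET. |
Request path (URL) | Format: <EAS_ENDPOINT> + API path. Example: <EAS_ENDPOINT>/v1/chat/completions, the chat completions path used in the invocation example below. |
Authorization (Required) | The token, passed as a request header: Authorization: <EAS_TOKEN>. |
Content-Type | The format of the request body, passed as a request header. Example: Content-Type: application/json. |
Request body | The format is determined by the API specification of the deployed model. When sent through a gateway, the request body cannot exceed 1 MB. |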
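The elements in the table above can be assembled into one request object before anything is sent. The following sketch uses only the Python standard library (urllib.request) so it runs without extra packages. The function name build_eas_request is hypothetical. The sketch also assumes the 1 MB limit means 1,048,576 bytes; confirm the exact limit that applies to your gateway.

```python
import json
import urllib.request

# Assumption: "1 MB" is interpreted as 1,048,576 bytes in this sketch.
MAX_GATEWAY_BODY_BYTES = 1024 * 1024


def build_eas_request(url: str, token: str, payload: dict) -> urllib.request.Request:
    """Assemble (but do not send) a POST request with a JSON body."""
    body = json.dumps(payload).encode("utf-8")
    # Reject bodies that the shared gateway would not accept.
    if len(body) > MAX_GATEWAY_BODY_BYTES:
        raise ValueError(f"Request body is {len(body)} bytes; the gateway limit is 1 MB.")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": token,  # The token is the header value as-is.
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send the request, you could pass the result to `urllib.request.urlopen(req, timeout=30)` and read the response body. The requests library used in the examples below is an equally valid choice.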
Invocation example
The following example shows how to invoke the DeepSeek-R1-Distill-Qwen-7B model service deployed with vLLM. Assume that <EAS_ENDPOINT> is http://16********.cn-hangzhou.pai-eas.aliyuncs.com/api/predict/test.
Request body:
{
"model": "DeepSeek-R1-Distill-Qwen-7B",
"messages": [
{
"role": "system",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": "hello!"
}
]
}
Code example (cURL):
curl http://16********.cn-hangzhou.pai-eas.aliyuncs.com/api/predict/test/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: *********5ZTM1ZDczg5OT**********" \
-X POST \
-d '{
"model": "DeepSeek-R1-Distill-Qwen-7B",
"messages": [
{
"role": "system",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": "hello!"
}
]
}'
Code example (Python):
import requests
# Replace with your actual endpoint.
url = 'http://16********.cn-hangzhou.pai-eas.aliyuncs.com/api/predict/test/v1/chat/completions'
# Replace the Authorization value with your actual token.
headers = {
"Content-Type": "application/json",
"Authorization": "*********5ZTM1ZDczg5OT**********",
}
# Construct the service request based on the data format required by the specific model.
data = {
"model": "DeepSeek-R1-Distill-Qwen-7B",
"messages": [
{
"role": "system",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": "hello!"
}
]
}
# Send the request, then print the HTTP status code and the response body as text.
resp = requests.post(url, json=data, headers=headers)
print(resp.status_code)
print(resp.text)
For more information on invoking Large Language Model (LLM) services, see LLM service invocation.
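The Python example above prints the raw response body. vLLM exposes an OpenAI-compatible chat completions API, so the generated text is typically found at choices[0].message.content. The following is a minimal sketch that assumes this response shape. The function name extract_reply and the sample body are hypothetical and heavily trimmed for illustration.

```python
import json


def extract_reply(response_body: str) -> str:
    """Return the assistant message from an OpenAI-style chat completions response."""
    result = json.loads(response_body)
    # Generated messages are listed under "choices"; take the first one.
    return result["choices"][0]["message"]["content"]


# Hypothetical, trimmed response body used only for illustration.
sample = '{"choices": [{"index": 0, "message": {"role": "assistant", "content": "Hello! How can I help you?"}}]}'
print(extract_reply(sample))
# Hello! How can I help you?
```

With the requests example, you could call `extract_reply(resp.text)` after checking that `resp.status_code` is 200.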
Other deployment scenarios
Models deployed from Model Gallery: The Overview page for the model typically provides API invocation examples, including the full URL path and request format.
cURL command
Common parameters:
Parameter | Description | Example |
-X | Specifies the HTTP method. | -X POST |
-H | Adds a request header. | -H "Content-Type: application/json" |
-d | Adds a request body. | -d '{"key": "value"}' |
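If you generate cURL commands from a script, quoting the JSON body is the error-prone part. The following is a minimal Python sketch that assumes a POSIX shell. It combines the -X, -H, and -d parameters above and uses the standard shlex module for quoting. The function name build_curl_command is hypothetical, and the headers follow the cURL example in this topic.

```python
import json
import shlex


def build_curl_command(url: str, token: str, payload: dict) -> str:
    """Compose a cURL command from the -X, -H, and -d parameters."""
    parts = [
        "curl", url,
        "-X", "POST",
        "-H", "Content-Type: application/json",
        "-H", f"Authorization: {token}",
        "-d", json.dumps(payload),
    ]
    # shlex.join quotes each argument so JSON quotes and spaces survive a POSIX shell.
    return shlex.join(parts)


print(build_curl_command("<EAS_ENDPOINT>/v1/chat/completions", "<EAS_TOKEN>", {"key": "value"}))
```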
Python code
The following example uses the Qwen3-Reranker-8B model to show how to invoke the service with Python code. Note that its URL and request body differ from the cURL example. Always follow the instructions on the model's Overview page.

Scenario-based deployments:
Services deployed with a generic processor (such as TensorFlow, Caffe, and PMML): See Construct a service request based on a generic processor.
Other custom services: The request format is determined by the data input format defined in your custom image or code.
Self-trained models: The invocation method is the same as for the original model.
FAQ
For common issues and solutions related to service invocation, see Service Invocation FAQ.