Invoke service via shared gateway (public endpoint/VPC) - Platform For AI

After an EAS service is deployed, a shared gateway is provided by default. You can use this gateway to invoke the deployed model inference service via a public endpoint or a VPC address.

Important

We recommend using a shared gateway for development and testing environments. For production environments, use a dedicated gateway.

Choose an invocation address

After the service is deployed, the shared gateway provides two types of invocation addresses by default:

Invocation address	Description	Use cases
Public endpoint	The EAS shared gateway forwards requests to the target service. This method is suitable for any environment with access to the public internet.	Invoking from outside Alibaba Cloud; local development and testing
VPC address	Suitable for scenarios where your application and the EAS service are deployed in the same region. Important: Compared to public internet invocation, VPC invocation offers lower latency by avoiding public network overhead and is more cost-effective because traffic within a VPC is typically free.	Invoking from within Alibaba Cloud (in the same region as the EAS service); requiring lower latency and cost; preventing services from being exposed to the public internet

If your application and the EAS service are in different regions, you cannot use the shared gateway's VPC address to access the service, even if the VPCs are connected. In this case, you can only access the service by using the instance IP address and port. However, because the IP address changes when the service is restarted or updated, we recommend using a dedicated gateway.

Invoke the service

Step 1: Get the endpoint and token

After you deploy a service, the system automatically generates an endpoint and an authorization token.

Important

The console provides the base endpoint. When you construct the full request URL, you must append the correct API path to this base endpoint. An incorrect path is the most common cause of 404 Not Found errors.

On the Inference Service tab, click the name of the target service to go to the Overview page.
In the Basic Information section, click View Endpoint Information.
In the Invocation Method panel, copy the endpoint and token:
- Depending on your needs, select the public endpoint or VPC address.
- The following examples use <EAS_ENDPOINT> as a placeholder for the endpoint and <EAS_TOKEN> as a placeholder for the token.
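
The base endpoint and the API path can be joined in code so that a stray or missing slash does not produce a wrong path. The following is a minimal sketch, not an official helper: the function names `build_request_url` and `build_headers` are hypothetical, and `/v1/chat/completions` is the API path of the vLLM example in this topic, so your service's path may differ.

```python
def build_request_url(base_endpoint: str, api_path: str) -> str:
    """Join the base endpoint from the console with the model's API path."""
    # Strip slashes on both sides so exactly one "/" separates the two parts.
    return base_endpoint.rstrip("/") + "/" + api_path.lstrip("/")


def build_headers(token: str) -> dict:
    """Return headers for a JSON request; the token is sent as-is in Authorization."""
    return {
        "Content-Type": "application/json",
        "Authorization": token,
    }


# Placeholders: replace with the values copied from the Invocation Method panel.
EAS_ENDPOINT = "<EAS_ENDPOINT>"
EAS_TOKEN = "<EAS_TOKEN>"

url = build_request_url(EAS_ENDPOINT, "/v1/chat/completions")
headers = build_headers(EAS_TOKEN)
print(url)
```

Normalizing the slashes in one place means the same code works whether or not the copied endpoint ends with a trailing slash.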

Step 2: Construct and send the request

The request format is the same whether you use a public endpoint or a VPC address. A standard request includes the following elements:

Element	Description
Method	The most common methods are POST and GET.
Request path (URL)	Format: <EAS_ENDPOINT> + API path. Example: `http://16********.cn-hangzhou.pai-eas.aliyuncs.com/api/predict/test/v1/chat/completions`
Authorization (Required)	`Authorization: <EAS_TOKEN>`, used for authentication.
Content-Type	`Content-Type: application/json`, typically required for POST requests.
Request body	The API specification of the deployed model determines the format. The request body cannot exceed 1 MB when sent through a gateway.
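
Because the gateway limits the request body size, it can help to check the serialized body before sending it. The following is a minimal sketch, assuming the 1 MB limit means 1024 * 1024 bytes of UTF-8 encoded JSON; the exact byte count the gateway enforces is not stated here, and the names `MAX_BODY_BYTES` and `check_body_size` are hypothetical.

```python
import json

# Assumption: "1 MB" is interpreted as 1024 * 1024 bytes.
MAX_BODY_BYTES = 1024 * 1024


def check_body_size(body: dict, limit: int = MAX_BODY_BYTES) -> int:
    """Serialize the body to JSON and raise ValueError if it exceeds the limit.

    Returns the size in bytes of the UTF-8 encoded JSON.
    """
    encoded = json.dumps(body).encode("utf-8")
    if len(encoded) > limit:
        raise ValueError(
            f"Request body is {len(encoded)} bytes, which exceeds the {limit}-byte limit."
        )
    return len(encoded)


body = {"messages": [{"role": "user", "content": "hello!"}]}
print(check_body_size(body))
```

A check like this fails fast on the client, which is easier to diagnose than a rejected request.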

Invocation example

The following example shows how to invoke the DeepSeek-R1-Distill-Qwen-7B model service deployed with vLLM. Assume that <EAS_ENDPOINT> is http://16********.cn-hangzhou.pai-eas.aliyuncs.com/api/predict/test.

Request body:

{
    "model": "DeepSeek-R1-Distill-Qwen-7B",
    "messages": [
    {
        "role": "system",
        "content": "You are a helpful assistant."
    },
    {
        "role": "user",
        "content": "hello!"
    }
    ]
}

Code examples (cURL first, then Python):

curl http://16********.cn-hangzhou.pai-eas.aliyuncs.com/api/predict/test/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: *********5ZTM1ZDczg5OT**********" \
-X POST \
-d '{
    "model": "DeepSeek-R1-Distill-Qwen-7B",
    "messages": [
    {
        "role": "system",
        "content": "You are a helpful assistant."
    },
    {
        "role": "user",
        "content": "hello!"
    }
    ]
}'

import requests

# Replace with your actual endpoint.
url = 'http://16********.cn-hangzhou.pai-eas.aliyuncs.com/api/predict/test/v1/chat/completions'
# The value of Authorization in the header is your actual token.
headers = {
    "Content-Type": "application/json",
    "Authorization": "*********5ZTM1ZDczg5OT**********",
}
# Construct the service request based on the data format required by the specific model.
data = {
    "model": "DeepSeek-R1-Distill-Qwen-7B",
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user",
            "content": "hello!"
        }
    ]
}
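# Send the request. The json= argument serializes the body; the timeout prevents the call from hanging indefinitely.
resp = requests.post(url, json=data, headers=headers, timeout=60)
# Print the HTTP status code and the response body.
print(resp.status_code)
print(resp.text)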
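
The response body can then be parsed to get the generated text. The following is a minimal sketch, assuming the service returns the OpenAI-compatible chat completion format that vLLM's `/v1/chat/completions` API produces (a `choices` list whose items contain a `message` with `content`). The helper name `extract_reply` and the sample payload are illustrative, not real service output.

```python
def extract_reply(payload: dict) -> str:
    """Return the assistant's text from an OpenAI-compatible chat completion payload."""
    choices = payload.get("choices") or []
    if not choices:
        raise ValueError(f"No choices in response: {payload}")
    return choices[0]["message"]["content"]


# Illustrative payload only; a real response contains more fields.
sample = {
    "choices": [
        {"index": 0, "message": {"role": "assistant", "content": "Hello! How can I help?"}}
    ]
}
print(extract_reply(sample))
```

With the `resp` object from the Python example, this would typically be called as `extract_reply(resp.json())` after confirming that `resp.status_code` is 200.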

For more information on invoking Large Language Model (LLM) services, see LLM service invocation.

Other deployment scenarios

Models deployed from Model Gallery: The Overview page for the model typically provides API invocation examples, including the full URL path and request format.
cURL command
Common parameters:
Parameter	Description	Example
`-X`	Specifies the HTTP method.	`-X POST`
`-H`	Adds a request header.	`-H "Content-Type: application/json"`
`-d`	Adds a request body.	`-d '{"key": "value"}'`
Python code
For example, the Python code for the Qwen3-Reranker-8B model uses a URL and request body that differ from those of the cURL example. Always follow the instructions on the model's Overview page.
Scenario-based deployments:
Services deployed with a generic processor (such as TensorFlow, Caffe, and PMML): See Construct a service request based on a generic processor.
Other custom services: The request format is determined by the data input format defined in your custom image or code.
Self-trained models: The invocation method is the same as for the original model.
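
The cURL parameters `-X`, `-H`, and `-d` map directly onto the request elements described in Step 2. As a rough sketch, the hypothetical helper `build_curl_command` below assembles an equivalent cURL command line from a URL, a token, and a body by using only the Python standard library. It only builds the command string and does not send a request.

```python
import json
import shlex


def build_curl_command(url: str, token: str, body: dict) -> str:
    """Build a shell-safe cURL command string for a JSON POST request."""
    args = [
        "curl", url,
        "-X", "POST",                            # HTTP method
        "-H", "Content-Type: application/json",  # request header
        "-H", f"Authorization: {token}",         # authentication header
        "-d", json.dumps(body),                  # request body
    ]
    # shlex.join quotes each argument so the command can be pasted into a POSIX shell.
    return shlex.join(args)


print(build_curl_command("<EAS_ENDPOINT>/v1/chat/completions", "<EAS_TOKEN>", {"key": "value"}))
```

Quoting through `shlex.join` (available in Python 3.8 and later) avoids hand-escaping the JSON body, which is a common source of malformed requests.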

FAQ

For common issues and solutions related to service invocation, see Service Invocation FAQ.

