Platform For AI: Invoke service via shared gateway (public endpoint/VPC)

Last Updated: Jun 25, 2026

After an EAS service is deployed, a shared gateway is provided by default. You can use this gateway to invoke the deployed model inference service via a public endpoint or a VPC address.

Important

We recommend using a shared gateway for development and testing environments. For production environments, use a dedicated gateway.

Choose an invocation address

After the service is deployed, the shared gateway provides two types of invocation addresses by default:

| Invocation address | Description | Use cases |
| --- | --- | --- |
| Public endpoint | The EAS shared gateway forwards requests to the target service. This method is suitable for any environment with access to the public internet. | Invoking from outside Alibaba Cloud; local development and testing |
| VPC address | Suitable for scenarios where your application and the EAS service are deployed in the same region. Important: Compared to public internet invocation, VPC invocation offers lower latency by avoiding public network overhead and is more cost-effective because traffic within a VPC is typically free. | Invoking from within Alibaba Cloud (in the same region as the EAS service); requiring lower latency and cost; preventing services from being exposed to the public internet |

If your application and the EAS service are in different regions, you cannot use the shared gateway's VPC address to access the service, even if the VPCs are connected. In this case, you can only access the service by using the instance IP address and port. However, because the IP address changes when the service is restarted or updated, we recommend using a dedicated gateway.
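
The selection rules above can be condensed into a small decision helper. This is only a sketch: the function name `choose_invocation_address`, its parameters, and the returned labels are hypothetical and are not part of any EAS SDK.

```python
def choose_invocation_address(
    caller_in_alibaba_cloud: bool,
    same_region_as_service: bool,
    has_public_internet: bool,
) -> str:
    """Return which address type fits the caller (illustrative labels only)."""
    # The shared gateway's VPC address is only usable from inside Alibaba Cloud,
    # in the same region as the EAS service.
    if caller_in_alibaba_cloud and same_region_as_service:
        return "vpc"
    # The public endpoint is suitable for any environment with public internet access.
    if has_public_internet:
        return "public"
    # Neither shared-gateway address applies; a dedicated gateway is the
    # recommended alternative to calling the instance IP address and port.
    return "dedicated-gateway"


print(choose_invocation_address(True, True, False))
print(choose_invocation_address(False, False, True))
```

The helper prefers the VPC address whenever it is available, which matches the lower latency and cost described in the table.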

Invoke the service

Step 1: Get the endpoint and token

After you deploy a service, the system automatically generates an endpoint and an authorization token.

Important

The console provides the base endpoint. When you construct the full request URL, you must append the correct API path to this base endpoint. An incorrect path is the most common cause of 404 Not Found errors.

  1. On the Inference Service tab, click the name of the target service to go to the Overview page.

  2. In the Basic Information section, click View Endpoint Information.

  3. In the Invocation Method panel, copy the endpoint and token:

    • Depending on your needs, select the public endpoint or VPC address.

    • The following examples use <EAS_ENDPOINT> as a placeholder for the endpoint and <EAS_TOKEN> as a placeholder for the token.

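
Because an incorrect path is the most common cause of 404 errors, it can help to build the full request URL in one place. The following is a minimal sketch; the helper name `build_request_url` and the `example.invalid` endpoint are placeholders introduced here, not values from the console.

```python
def build_request_url(endpoint: str, api_path: str) -> str:
    """Join the base endpoint and the API path with exactly one slash."""
    base = endpoint.rstrip("/")
    path = api_path.lstrip("/")
    # If no API path is needed, return the base endpoint unchanged.
    return f"{base}/{path}" if path else base


# Hypothetical base endpoint; substitute the value copied from the console.
EAS_ENDPOINT = "http://example.invalid/api/predict/test"
print(build_request_url(EAS_ENDPOINT, "/v1/chat/completions"))
```

Normalizing the slashes avoids accidental `//` or missing `/` between the endpoint and the path, whichever way the two values were copied.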

Step 2: Construct and send the request

The request format is the same whether you use a public endpoint or a VPC address. A standard request includes the following elements:

| Element | Description |
| --- | --- |
| Method | The most common methods are POST and GET. |
| Request path (URL) | Format: <EAS_ENDPOINT> + API path. Example: http://16********.cn-hangzhou.pai-eas.aliyuncs.com/api/predict/test/v1/chat/completions |
| Authorization (required) | Authorization: <EAS_TOKEN>, used for authentication. |
| Content-Type | Content-Type: application/json, typically required for POST requests. |
| Request body | The API specification of the deployed model determines the format. The request body cannot exceed 1 MB when sent through a gateway. |
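
The elements above can be assembled and checked before anything is sent. The sketch below uses only the standard library; `prepare_request` and `MAX_BODY_BYTES` are hypothetical names, and reading "1 MB" as 1,048,576 bytes is an assumption rather than a documented value.

```python
import json

# Assumption: the 1 MB gateway limit is interpreted as 1024 * 1024 bytes.
MAX_BODY_BYTES = 1024 * 1024


def prepare_request(url: str, token: str, payload: dict) -> dict:
    """Build the method, headers, and JSON body for a POST request."""
    body = json.dumps(payload).encode("utf-8")
    # Fail early instead of letting the gateway reject an oversized body.
    if len(body) > MAX_BODY_BYTES:
        raise ValueError(
            f"request body is {len(body)} bytes; the limit is {MAX_BODY_BYTES}"
        )
    return {
        "method": "POST",
        "url": url,
        "headers": {
            "Authorization": token,  # the raw token value
            "Content-Type": "application/json",
        },
        "body": body,
    }


req = prepare_request("http://example.invalid/api/predict/test", "<EAS_TOKEN>", {"key": "value"})
print(req["headers"], len(req["body"]))
```

The returned dictionary could then be passed to an HTTP client, for example `requests.request(req["method"], req["url"], headers=req["headers"], data=req["body"])`.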

Invocation example

The following example shows how to invoke the DeepSeek-R1-Distill-Qwen-7B model service deployed with vLLM. Assume that <EAS_ENDPOINT> is http://16********.cn-hangzhou.pai-eas.aliyuncs.com/api/predict/test.

Request body:

{
    "model": "DeepSeek-R1-Distill-Qwen-7B",
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user",
            "content": "hello!"
        }
    ]
}

Code example (cURL):

curl http://16********.cn-hangzhou.pai-eas.aliyuncs.com/api/predict/test/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: *********5ZTM1ZDczg5OT**********" \
-X POST \
-d '{
    "model": "DeepSeek-R1-Distill-Qwen-7B",
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user",
            "content": "hello!"
        }
    ]
}'

Code example (Python):

import requests

# Replace with your actual endpoint.
url = 'http://16********.cn-hangzhou.pai-eas.aliyuncs.com/api/predict/test/v1/chat/completions'
# The value of Authorization in the header is your actual token.
headers = {
    "Content-Type": "application/json",
    "Authorization": "*********5ZTM1ZDczg5OT**********",
}
# Construct the service request based on the data format required by the specific model.
data = {
    "model": "DeepSeek-R1-Distill-Qwen-7B",
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user",
            "content": "hello!"
        }
    ]
}
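# Send the request. The json= argument serializes `data` as the JSON request body.
resp = requests.post(url, json=data, headers=headers, timeout=60)
# Print the HTTP status code and the response body.
print(resp.status_code)
print(resp.text)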

For more information on invoking Large Language Model (LLM) services, see LLM service invocation.
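
Once the request above returns, the reply text still has to be pulled out of the JSON body. The sketch below assumes the service returns an OpenAI-compatible chat completion object (which is what vLLM's `/v1/chat/completions` route is designed to emit); the helper names and the sample response are illustrative and are not real service output.

```python
def describe_failure(status_code: int) -> str:
    """Turn a non-200 status code into a short troubleshooting hint."""
    if status_code == 404:
        return "404 Not Found: check that the correct API path was appended to the endpoint"
    return f"HTTP {status_code}: see the response body for details"


def extract_reply(response_json: dict) -> str:
    """Return the assistant message from an OpenAI-style chat completion."""
    choices = response_json.get("choices") or []
    if not choices:
        raise ValueError("response contains no 'choices'")
    return choices[0]["message"]["content"]


# Hand-written sample in the assumed response shape (not real output).
sample = {
    "choices": [
        {"message": {"role": "assistant", "content": "Hello! How can I help?"}}
    ]
}
print(extract_reply(sample))
print(describe_failure(404))
```

With the `requests` example, `extract_reply(resp.json())` would be called only after confirming that `resp.status_code` is 200.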

Other deployment scenarios

  • Models deployed from Model Gallery: The Overview page for the model typically provides API invocation examples, including the full URL path and request format.

    cURL command

    Common parameters:

    | Parameter | Description | Example |
    | --- | --- | --- |
    | -X | Specifies the HTTP method. | -X POST |
    | -H | Adds a request header. | -H "Content-Type: application/json" |
    | -d | Adds a request body. | -d '{"key": "value"}' |

    [Image: cURL invocation example shown on the model's Overview page]

    Python code

    The following example uses the Qwen3-Reranker-8B model to show how to invoke the service with Python code. Note that its URL and request body differ from the cURL example. Always follow the instructions on the model's Overview page.

    [Image: Python invocation example for Qwen3-Reranker-8B shown on the model's Overview page]

  • Scenario-based deployments:

  • Services deployed with a generic processor (such as TensorFlow, Caffe, and PMML): See Construct a service request based on a generic processor.

  • Other custom services: The request format is determined by the data input format defined in your custom image or code.

  • Self-trained models: The invocation method is the same as for the original model.
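
When translating a cURL command from a model's Overview page into Python, the common cURL parameters listed earlier (`-X`, `-H`, `-d`) map directly onto the arguments of a request object. The following sketch uses the standard library's `urllib.request`; the URL is a placeholder, and the request is only constructed, not sent.

```python
import json
import urllib.request

# Hypothetical values; substitute the endpoint and token from the console.
url = "http://example.invalid/api/predict/demo"
token = "<EAS_TOKEN>"

req = urllib.request.Request(
    url,
    # -d '{"key": "value"}' : the request body, encoded as bytes
    data=json.dumps({"key": "value"}).encode("utf-8"),
    # -H "...": one dictionary entry per request header
    headers={
        "Content-Type": "application/json",
        "Authorization": token,
    },
    # -X POST : the HTTP method
    method="POST",
)

print(req.get_method(), req.full_url)
# To actually send the request: urllib.request.urlopen(req, timeout=30)
```

The exact URL path and body still depend on the model, so the values shown on the model's Overview page take precedence over this generic mapping.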

FAQ

For common issues and solutions related to service invocation, see Service Invocation FAQ.