
Platform For AI: Call services over the Internet or a private network through a gateway

Last Updated: Mar 04, 2026

EAS provides shared and dedicated gateways for calling deployed model inference services over the Internet or a VPC. This topic describes how to choose a gateway type and access method, and how to call EAS services.

Choose a gateway type

EAS offers two gateway types. The following table compares their features:

| Comparison | Shared gateway | Dedicated gateway |
| --- | --- | --- |
| Internet access | Supported (default) | Supported (manual configuration required) |
| VPC access | Supported (default) | Supported (manual configuration required) |
| Cost | Free | Paid |
| Bandwidth | Shared | Dedicated |
| Scenarios | Development and testing environments with low traffic | Production environments requiring high security, stability, and performance |
| Setup | No setup required. Available by default. | Must be created before deployment. See Use a dedicated gateway. |

Recommendations:

  • Use a shared gateway for development and testing.

  • Use a dedicated gateway for production.

Choose an access method

Internet endpoint

Use this method when your application has Internet access. Requests are routed through the EAS gateway to your deployed service.

Scenarios:

  • Calling services from outside Alibaba Cloud

  • Local development and testing

  • Integration with external applications

VPC address

Use this method when your application and the EAS service are deployed in the same region. A VPC connection lets resources in the same region communicate securely without traversing the Internet.

Scenarios:

  • Your application runs on Alibaba Cloud in the same region as EAS.

  • You need lower latency and reduced costs.

  • Your service should not be publicly accessible.

Important

VPC access offers lower latency by avoiding Internet routing overhead, and lower costs, because VPC traffic is typically free.

How to call a service

To call an EAS service, you need three components:

  • Service endpoint

  • Authorization token

  • Request structure that conforms to the model's API specification

Step 1: Get the endpoint and token

After deploying a service, the system automatically generates an endpoint and authorization token.

Important

The console displays the base endpoint URL. Append the appropriate API path to create the complete request URL. An incorrect path is the most common cause of 404 Not Found errors.
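Joining the base endpoint and the API path can be sketched in a few lines of Python. The endpoint and path below are illustrative placeholders, not real values:

```python
def build_url(endpoint: str, api_path: str) -> str:
    # Join base endpoint and API path, avoiding a doubled or missing
    # slash -- a malformed path is the usual source of 404 errors.
    return endpoint.rstrip("/") + "/" + api_path.lstrip("/")

# Placeholder endpoint and path for illustration.
url = build_url("http://example.pai-eas.aliyuncs.com/api/predict/test",
                "/v1/chat/completions")
print(url)  # http://example.pai-eas.aliyuncs.com/api/predict/test/v1/chat/completions
```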

  1. On the Inference Service tab, click the name of the target service to go to the Overview page.

  2. In the Basic Information section, click View Endpoint Information.

  3. In the Invocation Method panel, copy the endpoint and token:

    • Choose the Internet endpoint or VPC endpoint as needed.

    • The following examples use <EAS_ENDPOINT> as the endpoint and <EAS_TOKEN> as the token.


Step 2: Construct and send the request

The request format is identical for both Internet and VPC endpoints. A standard request includes four core elements:

  • Method: Common methods include POST and GET.

  • URL:

    • Format: <EAS_ENDPOINT> + API path

    • Example: http://16********.cn-hangzhou.pai-eas.aliyuncs.com/api/predict/test + /v1/chat/completions

  • Headers:

    • Authorization: <EAS_TOKEN> (Required for authorization)

    • Content-Type: application/json (Typically required for POST requests)

  • Body: The request payload format depends on the deployed model's API specification (e.g., JSON).

    Important

    Gateway requests are limited to 1 MB for the request body.
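The four elements above can be assembled and validated before sending. A minimal stdlib-only sketch (the endpoint and token are hypothetical placeholders), including a client-side check against the 1 MB body limit:

```python
import json

MAX_BODY_BYTES = 1 * 1024 * 1024  # gateway limit on the request body: 1 MB

def build_request(endpoint, api_path, token, payload):
    """Return (method, url, headers, body) for an EAS service call."""
    # URL: base endpoint + API path.
    url = endpoint.rstrip("/") + "/" + api_path.lstrip("/")
    headers = {
        "Authorization": token,              # required for authorization
        "Content-Type": "application/json",  # typical for POST requests
    }
    body = json.dumps(payload).encode("utf-8")
    if len(body) > MAX_BODY_BYTES:
        raise ValueError(f"request body is {len(body)} bytes; gateway limit is 1 MB")
    return "POST", url, headers, body

# Placeholder endpoint and token for illustration.
method, url, headers, body = build_request(
    "http://example.pai-eas.aliyuncs.com/api/predict/test",
    "/v1/chat/completions",
    "<EAS_TOKEN>",
    {"model": "DeepSeek-R1-Distill-Qwen-7B",
     "messages": [{"role": "user", "content": "hello!"}]},
)
```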

Invocation example

This example demonstrates calling a DeepSeek-R1-Distill-Qwen-7B model service deployed with vLLM:

  • Method: POST

  • Request path: <EAS_ENDPOINT>/v1/chat/completions (chat API)

  • Headers:

    • Authorization: <EAS_TOKEN>

    • Content-Type: application/json

  • Request body:

    {
        "model": "DeepSeek-R1-Distill-Qwen-7B",
        "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user",
            "content": "hello!"
        }
        ]
    }

Code example:

Assume <EAS_ENDPOINT> is http://16********.cn-hangzhou.pai-eas.aliyuncs.com/api/predict/test.

cURL:

curl http://16********.cn-hangzhou.pai-eas.aliyuncs.com/api/predict/test/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: *********5ZTM1ZDczg5OT**********" \
-X POST \
-d '{
    "model": "DeepSeek-R1-Distill-Qwen-7B",
    "messages": [
    {
        "role": "system",
        "content": "You are a helpful assistant."
    },
    {
        "role": "user",
        "content": "hello!"
    }
    ]
}'

Python:

import requests

# Replace with your actual endpoint.
url = 'http://16********.cn-hangzhou.pai-eas.aliyuncs.com/api/predict/test/v1/chat/completions'
# Replace the Authorization value with your actual token.
headers = {
    "Content-Type": "application/json",
    "Authorization": "*********5ZTM1ZDczg5OT**********",
}
# Construct the service request based on the data format required by the specific model.
data = {
    "model": "DeepSeek-R1-Distill-Qwen-7B",
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user",
            "content": "hello!"
        }
    ]
}
# Send the request.
resp = requests.post(url, json=data, headers=headers)
print(resp)
print(resp.content)
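Because the service in this example exposes an OpenAI-compatible chat completions API (served by vLLM), the assistant's reply can be extracted from the JSON response roughly as follows. The sample response below is illustrative, not real model output:

```python
# Illustrative shape of an OpenAI-compatible chat completions response;
# the field values here are made up for demonstration.
sample = {
    "choices": [
        {"message": {"role": "assistant", "content": "Hello! How can I help you?"}}
    ]
}

def extract_reply(response_json: dict) -> str:
    # The assistant's text lives under choices[0].message.content.
    return response_json["choices"][0]["message"]["content"]

print(extract_reply(sample))  # Hello! How can I help you?
```

With the requests example above, this would be called as extract_reply(resp.json()).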

For more information about calling Large Language Model (LLM) services, see LLM service invocation.

More scenarios

  • Model Gallery deployments: The Overview page typically provides complete API call examples, including URL paths and request formats.

    cURL command

    Basic syntax: curl [options] [URL]

    Common parameters (options):

    • -X: Specifies the HTTP method, such as -X POST.

    • -H: Adds a request header, such as -H "Content-Type: application/json".

    • -d: Adds a request body, such as -d '{"key": "value"}'.

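    For reference, these three options map directly onto requests arguments in Python (-X to the method, -H to headers, -d to data). A sketch that builds, without sending, a request equivalent to the cURL form, using a hypothetical URL:

```python
import requests

# Hypothetical URL for illustration only.
req = requests.Request(
    method="POST",                                  # curl -X POST
    url="http://example.com/v1/chat/completions",   # curl [URL]
    headers={"Content-Type": "application/json"},   # curl -H "..."
    data='{"key": "value"}',                        # curl -d '...'
)
prepared = req.prepare()  # build the request without sending it
print(prepared.method, prepared.url)
```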

    Python code

    The console also provides Python code examples, such as one for the Qwen3-Reranker-8B model. Note that the URL and request body differ from the cURL example. Always refer to the documentation for your specific model.

  • Scenario-based deployments:

    • Services deployed using a generic processor (TensorFlow, Caffe, PMML): See Construct a service request based on a generic processor.

    • Custom services: The request format depends on the data input format defined in your custom image or code.

    • Self-trained models: Use the same invocation method as the base model.

FAQ

For common issues and troubleshooting, see Service Invocation FAQ.