EAS provides shared and dedicated gateways for calling deployed model inference services over the Internet or a VPC. This topic describes how to choose a gateway type and access method, and how to call EAS services.
Choose a gateway type
EAS offers two gateway types. The following table compares their features:
| Comparison | Shared Gateway | Dedicated Gateway |
| --- | --- | --- |
| Internet Access | Supported (default) | Supported (manual configuration required) |
| VPC Access | Supported (default) | Supported (manual configuration required) |
| Cost | Free | Paid |
| Bandwidth | Shared | Dedicated |
| Scenarios | Development and testing environments with low traffic | Production environments requiring high security, stability, and performance |
| Setup | No setup required. Available by default. | Must be created before deployment. See Use a dedicated gateway. |
Recommendations:
- Use a shared gateway for development and testing.
- Use a dedicated gateway for production.
Choose an access method
Internet endpoint
Use this method when your application has Internet access. Requests are routed through the EAS gateway to your deployed service.
Scenarios:
- Calling services from outside Alibaba Cloud
- Local development and testing
- Integration with external applications
VPC address
Use this method when your application and EAS service are deployed in the same region. VPC connections enable secure communication between resources in the same region.
Scenarios:
- Your application runs on Alibaba Cloud in the same region as EAS.
- You need lower latency and reduced costs.
- Your service should not be publicly accessible.
VPC access offers lower latency by avoiding Internet routing overhead and reduced costs since VPC traffic is typically free.
How to call a service
To call an EAS service, you need three components:
- Service endpoint
- Authorization token
- Request structure that conforms to the model's API specification
Step 1: Get the endpoint and token
After deploying a service, the system automatically generates an endpoint and authorization token.
The console displays the base endpoint URL. Append the appropriate API path to create the complete request URL. An incorrect path is the most common cause of 404 Not Found errors.
1. On the Inference Service tab, click the name of the target service to go to the Overview page.
2. In the Basic Information section, click View Endpoint Information.
3. In the Invocation Method panel, copy the endpoint and token:
   - Choose the Internet endpoint or VPC endpoint as needed.
   - The following examples use <EAS_ENDPOINT> as the endpoint and <EAS_TOKEN> as the token.
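As noted above, appending the API path to the base endpoint incorrectly is the most common cause of 404 Not Found errors. A small helper can join the two without a doubled or missing slash; this is a minimal sketch, and the endpoint below is a hypothetical placeholder, not a real service:

```python
# Join the base endpoint and the API path, normalizing the slash between
# them so that neither a doubled nor a missing slash can slip in.
def build_url(endpoint: str, api_path: str) -> str:
    return endpoint.rstrip("/") + "/" + api_path.lstrip("/")

# Both spellings produce the same well-formed URL:
print(build_url("http://example.pai-eas.aliyuncs.com/api/predict/test/", "/v1/chat/completions"))
print(build_url("http://example.pai-eas.aliyuncs.com/api/predict/test", "v1/chat/completions"))
# → http://example.pai-eas.aliyuncs.com/api/predict/test/v1/chat/completions
```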
Step 2: Construct and send the request
The request format is identical for both Internet and VPC endpoints. A standard request includes four core elements:
- Method: Common methods include POST and GET.
- URL:
  - Format: <EAS_ENDPOINT> + API path
  - Example: http://16********.cn-hangzhou.pai-eas.aliyuncs.com/api/predict/test/v1/chat/completions
- Headers:
  - Authorization: <EAS_TOKEN> (required for authorization)
  - Content-Type: application/json (typically required for POST requests)
- Body: The request payload format depends on the deployed model's API specification (e.g., JSON).

Important: The gateway limits the request body to 1 MB.
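Because the gateway rejects request bodies larger than 1 MB, it can be useful to check the serialized payload size before sending. A minimal sketch; the payload below is illustrative, not a real model's request format:

```python
import json

MAX_BODY_BYTES = 1024 * 1024  # the gateway's 1 MB request-body limit

# Illustrative payload; the real structure depends on the deployed model.
payload = {
    "model": "my-model",
    "messages": [{"role": "user", "content": "hello!"}],
}
body = json.dumps(payload).encode("utf-8")

if len(body) > MAX_BODY_BYTES:
    raise ValueError(f"Body is {len(body)} bytes; the gateway limit is {MAX_BODY_BYTES}.")
print(f"Body size OK: {len(body)} bytes")
```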
Invocation example
This example demonstrates calling a DeepSeek-R1-Distill-Qwen-7B model service deployed with vLLM:
- Method: POST
- Request path: <EAS_ENDPOINT>/v1/chat/completions (chat API)
- Headers:
  - Authorization: <Token>
  - Content-Type: application/json
- Request body:

{
  "model": "DeepSeek-R1-Distill-Qwen-7B",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "hello!"}
  ]
}
Code example:
Assume <EAS_ENDPOINT> is http://16********.cn-hangzhou.pai-eas.aliyuncs.com/api/predict/test.

cURL:
curl http://16********.cn-hangzhou.pai-eas.aliyuncs.com/api/predict/test/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: *********5ZTM1ZDczg5OT**********" \
-X POST \
-d '{
    "model": "DeepSeek-R1-Distill-Qwen-7B",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "hello!"}
    ]
}'

Python:
import requests

# Replace with your actual endpoint.
url = 'http://16********.cn-hangzhou.pai-eas.aliyuncs.com/api/predict/test/v1/chat/completions'
# Set the Authorization header to your actual token.
headers = {
    "Content-Type": "application/json",
    "Authorization": "*********5ZTM1ZDczg5OT**********",
}
# Construct the request body in the format required by the specific model.
data = {
    "model": "DeepSeek-R1-Distill-Qwen-7B",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "hello!"}
    ]
}
# Send the request.
resp = requests.post(url, json=data, headers=headers)
print(resp)
print(resp.content)

For more information about calling Large Language Model (LLM) services, see LLM service invocation.
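vLLM serves an OpenAI-compatible chat API, so the response body is JSON containing a `choices` list. A sketch of extracting the assistant's reply; the `sample` dict below is a trimmed illustration, not real model output:

```python
# Trimmed sample of an OpenAI-compatible chat completions response body.
# In real code this would come from resp.json() and contain more fields.
sample = {
    "choices": [
        {"message": {"role": "assistant", "content": "Hello! How can I help you?"}}
    ]
}

# The assistant's reply is the content of the first choice's message.
reply = sample["choices"][0]["message"]["content"]
print(reply)  # → Hello! How can I help you?
```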
More scenarios
- Model Gallery deployments: The Overview page typically provides complete API call examples, including URL paths and request formats.

cURL command
Basic syntax:
curl [options] [URL]

Common parameters (options):
- -X: Specifies the HTTP method, such as -X POST.
- -H: Adds a request header, such as -H "Content-Type: application/json".
- -d: Adds a request body, such as -d '{"key": "value"}'.

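The cURL options above map one-to-one onto arguments of Python's `requests` library. A minimal sketch using a hypothetical URL; nothing is sent over the network, the request is only prepared:

```python
import requests

# Each cURL option corresponds to an argument of requests.Request:
#   -X POST -> method, -H -> headers, -d -> data
req = requests.Request(
    method="POST",                                  # -X POST
    url="http://example.com/api/predict/demo",      # [URL] (hypothetical)
    headers={"Content-Type": "application/json"},   # -H "Content-Type: ..."
    data='{"key": "value"}',                        # -d '{"key": "value"}'
).prepare()

print(req.method, req.url)  # → POST http://example.com/api/predict/demo
print(req.body)             # → {"key": "value"}
```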
Python code
A Python example is also available for the Qwen3-Reranker-8B model; its URL and request body differ from the cURL example above. Always refer to the specific model's documentation.

- Scenario-based deployments:
  - Services deployed using a generic processor (TensorFlow, Caffe, PMML): See Construct a service request based on a generic processor.
  - Custom services: The request format depends on the data input format defined in your custom image or code.
  - Self-trained models: Use the same invocation method as the base model.
FAQ
For common issues and troubleshooting, see Service Invocation FAQ.