
Platform For AI: Invoke services over the Internet or an internal network through a gateway

Last Updated: Dec 16, 2025

Elastic Algorithm Service (EAS) provides both shared gateways and dedicated gateways, which support service invocations over the internet and internal networks. The invocation process is largely the same for both. You can choose the gateway type and endpoint that best fits your requirements.

Select a gateway

EAS offers two types of gateways: shared gateways and dedicated gateways. Their key differences are summarized below:

| Comparison | Shared gateway | Dedicated gateway |
| --- | --- | --- |
| Internet invocation | Supported by default. | Supported. Must be enabled first. |
| Internal network invocation | Supported by default. | Supported. Must be enabled first. |
| Cost | Free of charge. | Incurs additional fees. |
| Use cases | Shared bandwidth, suitable for low-traffic services that do not require custom access policies. Recommended for testing scenarios. | Dedicated bandwidth, ideal for high-traffic services that demand greater security, stability, and performance. Recommended for production environments. |
| Configuration | Default configuration. Ready to use out of the box. | Create the gateway first, and then select it when deploying the service. For more information, see Use a dedicated gateway. |

Select an endpoint

  • Internet endpoint: Use this in any environment with internet access. Requests are forwarded through an EAS shared gateway to your service.

  • VPC endpoint: Use this when your client application and the EAS service are deployed in the same region. You can establish a connection between two VPCs in the same region.

Important

Invoking services over a VPC is faster than over the internet because it eliminates public network latency. It is also more cost-effective, as intra-VPC traffic is typically free.

How to invoke a service

To invoke a service, you first need its endpoint and token. Then, construct a request based on the specifications of the deployed model.

1. Get the endpoint and token

After you deploy a service, the system automatically generates the required endpoint and token.

Important

The Endpoint provided in the Console is a base address. You must append the correct Request Path to this base address to build the complete request URL. An incorrect path is the most common cause of 404 Not Found errors.

  1. On the Inference Services tab, click the name of the target service to go to its Overview page. In the Basic Information section, click View Endpoint Information.

  2. In the Invocation Method panel, get the endpoint and token. Choose an Internet or VPC endpoint based on your requirements. The following examples use <EAS_ENDPOINT> and <EAS_TOKEN> as placeholders for these values.


2. Construct and send the request

For both Internet and VPC endpoints, the request construction is the same; only the URL differs. A standard invocation request typically includes four core components:

  • Method: The most common methods are POST and GET.

  • URL: The base <EAS_ENDPOINT> with the specific Request Path appended.

  • Request Header: At a minimum, this must include your authentication token, such as Authorization: <EAS_TOKEN>.

  • Request Body: The API of the deployed model determines the format, such as JSON.

    Important

    When invoking a service through a gateway, the request body cannot exceed 1 MB.
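The four components above can be assembled programmatically. The following sketch uses Python's standard library to build (but not send) a request; the endpoint value is a placeholder, and the size check mirrors the 1 MB gateway limit:

```python
import json
import urllib.request

# Placeholders: substitute the endpoint and token shown in the console.
EAS_ENDPOINT = "http://example.pai-eas.aliyuncs.com/api/predict/demo"  # hypothetical
EAS_TOKEN = "<EAS_TOKEN>"

def build_request(request_path, payload):
    """Assemble the four core components of an invocation request."""
    body = json.dumps(payload).encode("utf-8")   # Request Body (JSON)
    if len(body) > 1024 * 1024:                  # gateway limit: 1 MB
        raise ValueError("request body exceeds the 1 MB gateway limit")
    return urllib.request.Request(
        url=EAS_ENDPOINT + request_path,         # URL: endpoint + Request Path
        data=body,
        headers={                                # Request Header
            "Authorization": EAS_TOKEN,
            "Content-Type": "application/json",
        },
        method="POST",                           # Method
    )

req = build_request("/v1/chat/completions", {"prompt": "hello"})
print(req.get_method(), req.full_url)
```

Sending the request (for example with `urllib.request.urlopen(req)`) is omitted here because the endpoint is a placeholder.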

Scenarios

Scenario 1: Invoke a model deployed from Model Gallery

Refer directly to the Overview page in the Model Gallery. It provides the most accurate API call examples, typically in cURL or Python, including the complete URL path and Request Body format.

cURL command

The basic syntax for a cURL command is curl [options] [URL]:

  • options are optional parameters. Common options include -X to specify the request method, -H for the Request Header, and -d for the Request Body.

  • URL is the HTTP endpoint you want to access.

(Image: an example cURL command from the model's Overview page.)

Python code

This Python code example uses the Qwen3-Reranker-8B model. Note that its URL and Request Body differ from the cURL example. Always refer to the corresponding Overview page.

(Image: example Python code for invoking Qwen3-Reranker-8B.)
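As an illustrative sketch only, the following assembles a rerank-style request body. The path /v1/rerank and the query/documents fields are assumptions for illustration; always take the exact Request Path and body schema from the model's Overview page:

```python
import json

EAS_ENDPOINT = "<EAS_ENDPOINT>"  # base endpoint from the console

# Hypothetical request path and body schema for a reranking model;
# confirm both against the Overview page before use.
url = EAS_ENDPOINT + "/v1/rerank"
payload = {
    "model": "Qwen3-Reranker-8B",
    "query": "What is deep learning?",
    "documents": [
        "Deep learning is a subset of machine learning.",
        "Bananas are rich in potassium.",
    ],
}
body = json.dumps(payload)
print(url)
print(body)
```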

Scenario 2: Invoke a large language model

Large Language Model (LLM) services typically provide OpenAI-compatible API endpoints, such as the chat completions endpoint (/v1/chat/completions) and the completions endpoint (/v1/completions).

For example, to call the chat completions endpoint of a DeepSeek-R1-Distill-Qwen-7B model service deployed with vLLM, you need the following components (for more information, see Invoke an LLM):

  • Method: POST

  • URL: <EAS_ENDPOINT>/v1/chat/completions

  • Request Header: Authorization: <EAS_TOKEN> and Content-Type: application/json

  • Request Body:

    {
        "model": "DeepSeek-R1-Distill-Qwen-7B",
        "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user",
            "content": "hello!"
        }
        ]
    }

Example: Invoke with cURL and Python

Assume that <EAS_ENDPOINT> is http://16********.cn-hangzhou.pai-eas.aliyuncs.com/api/predict/test.

cURL:

curl http://16********.cn-hangzhou.pai-eas.aliyuncs.com/api/predict/test/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: *********5ZTM1ZDczg5OT**********" \
-X POST \
-d '{
    "model": "DeepSeek-R1-Distill-Qwen-7B",
    "messages": [
    {
        "role": "system",
        "content": "You are a helpful assistant."
    },
    {
        "role": "user",
        "content": "hello!"
    }
    ]
}'

Python:

import requests

# Replace with the actual endpoint.
url = 'http://16********.cn-hangzhou.pai-eas.aliyuncs.com/api/predict/test/v1/chat/completions'
# Replace the Authorization value with the actual token.
headers = {
    "Content-Type": "application/json",
    "Authorization": "*********5ZTM1ZDczg5OT**********",
}
# Construct the service request based on the data format required by the specific model.
data = {
    "model": "DeepSeek-R1-Distill-Qwen-7B",
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user",
            "content": "hello!"
        }
    ]
}
# Send the request.
resp = requests.post(url, json=data, headers=headers)
print(resp)
print(resp.content)
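A successful call returns a response in the OpenAI-compatible chat completions format. The following sketch extracts the assistant's reply from a hard-coded, abbreviated sample (a real response also carries fields such as id and usage):

```python
import json

# Abbreviated sample of an OpenAI-compatible chat completions response.
sample = '{"choices": [{"message": {"role": "assistant", "content": "Hello!"}}]}'
resp_json = json.loads(sample)
answer = resp_json["choices"][0]["message"]["content"]
print(answer)  # Hello!
```

With the `requests` example above, the same extraction applies to `resp.json()`.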


FAQ

For more information, see Service invocation FAQ.