
API Gateway: Create a gateway instance

Last Updated: Dec 04, 2025

This topic describes how to create an AI Gateway instance.

Procedure

  1. Log on to the AI Gateway console.

  2. In the navigation pane on the left, choose Instance. In the top menu bar, select a region.

  3. Click Create Instance. On the AI Gateway purchase page, select the required configurations and then click Buy Now.

    Configuration Item

    Description

    Product Type

    Supports Dedicated Instance (pay-as-you-go), Dedicated Instance (subscription), and Serverless (pay-as-you-go). For more information about the billing methods for these three types, see Billing.

    Region

    Select the gateway's region.

    Important

    After the resource is created, you cannot change its region.

    Instance Name

    Enter a custom name for the gateway. The name can be up to 64 characters long. We recommend naming the gateway after its environment, optionally combined with its business domain, such as `test` or `order-prod`.

    Instance Specification

    Select a node specification based on your requirements. For the capacity of each specification, see Product selection. The Serverless edition does not have gateway specifications.

    Resource Group

    Use the default resource group or an existing resource group. To create a new resource group, click Create Resource Group.

    Note

    Use resource groups to classify and manage resources under your Alibaba Cloud account. This lets you manage permissions, deploy resources, and monitor resources by group instead of managing each resource individually.

    Network Type

    Supports three access types: Internet, Private Network, and Internet + Private Network.

    • Internet: Access over the Internet incurs data transfer fees. Internet traffic is billed uniformly through Cloud Data Transfer (CDT) and uses multi-line Border Gateway Protocol (BGP) connectivity. For more information, see Internet data transfers.

    • Private Network: Access over a private network incurs no data transfer fees.

    • Internet + Private Network: Access over the Internet incurs data transfer fees, billed through CDT with multi-line BGP connectivity. Access over the private network incurs no data transfer fees.

    Private Network

    Select the virtual private cloud (VPC) where the gateway instance runs. To create a new VPC, go to the VPC console.

    Note
    • The gateway must be in the same VPC as the backend service.

    Select Zone

    Select Auto-assign or Manually Select.

    • Auto-assign: Select a vSwitch, and the system automatically allocates two zones for deploying gateway nodes.

    • Manually Select: Manually select the zones and vSwitches for deploying gateway nodes.

    vSwitch

    Select the vSwitch where the gateway instance runs. To create a new vSwitch, go to the VPC console.

    Simple Log Service

    Select Use Simple Log Service to activate Simple Log Service (SLS) and enable gateway log delivery, which provides log analysis and dashboards. For more information, see Enable gateway log delivery.

    Service-linked Role

    Automatically created. This role allows AI Gateway to access other Alibaba Cloud services.

  4. On the Confirm Order page, review the AI Gateway configuration details and then click Activate Now.

    Note

    Creating the gateway instance takes 1 to 5 minutes.

  5. Return to the AI Gateway Instance page. Verify that the gateway information is correct and that the Status is Running. This indicates that the gateway was created successfully.

Advanced features

When you create a gateway instance, you can configure advanced features: use Simple Log Service (SLS) to monitor and analyze log data, or enable Gzip hardware acceleration to compress requests and responses and reduce gateway traffic. Gzip hardware acceleration can be enabled only when you create the instance. SLS can be enabled at any time.

Enable Gzip hardware acceleration

Gzip hardware acceleration is a technology that uses dedicated hardware devices for fast data compression and decompression. By offloading Gzip compression and decompression tasks from the CPU to dedicated hardware, this technology significantly improves processing efficiency and reduces CPU load.

Note

The Serverless edition does not support Gzip hardware acceleration.

Procedure

  1. On the AI Gateway purchase page, set the following parameters and click Buy Now to create a gateway instance:

    • Region: Gzip hardware acceleration is supported in the China (Hangzhou), China (Beijing), China (Shanghai), China (Shenzhen), China (Ulanqab), China (Hong Kong), and Singapore regions.

      Not all zones in these regions support this feature. For the latest availability, see the purchase page.

    • Instance Specification: Select aigw.medium.x1 or higher.

    • GZIP Hardware Accelerator: Select this option to enable Gzip hardware acceleration.

    • Zone: Select a zone that supports Gzip hardware acceleration, and then select a vSwitch.

  2. After the instance is created, click the ID or name of the instance. In the navigation pane on the left, click Parameters. In the Gateway Engine Parameters section, edit the EnableGzipHardwareAccelerate parameter to turn on the feature.

    Note

    If you did not select GZIP Hardware Accelerator when you purchased the instance, you cannot enable this feature later.

  3. After you enable this feature, clients must be able to decompress Gzip-compressed data. Clients that support Gzip should send the `Accept-Encoding: gzip` request header.
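A minimal client-side sketch of step 3, using only the Python standard library: it advertises Gzip support with the `Accept-Encoding: gzip` header and decompresses the body when the response carries `Content-Encoding: gzip`. The helper names `fetch` and `decode_body` are hypothetical, and the URL you pass in is an assumption (any route exposed by your gateway); many HTTP libraries handle this automatically, so treat this as an illustration of the header exchange rather than a required implementation.

```python
import gzip
import urllib.request
from typing import Optional


def decode_body(body: bytes, content_encoding: Optional[str]) -> bytes:
    """Decompress a response body if the server marked it as gzip-encoded."""
    if content_encoding and content_encoding.strip().lower() == "gzip":
        return gzip.decompress(body)
    # No (or a different) content encoding: return the body unchanged.
    return body


def fetch(url: str) -> bytes:
    """Request url while advertising gzip support; return the decoded body.

    Hypothetical helper: url is any route exposed by your gateway.
    """
    # urllib does not ask for or decode gzip on its own, so the header is
    # set explicitly and the body is decoded by hand.
    request = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})
    with urllib.request.urlopen(request) as response:
        return decode_body(response.read(), response.headers.get("Content-Encoding"))
```

To try it, call `fetch()` with the URL of a route on your gateway and compare the response size with and without the header.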

Performance reference

How much traffic can be saved by enabling Gzip compression?

The Gzip compression ratio, defined as the compressed data size divided by the original data size, depends heavily on the data itself. A lower ratio means better compression; a higher ratio means poorer compression.

Data with many repeated patterns or structures, such as the recurring letters, words, and punctuation in text, compresses well and yields a low ratio. Data with high randomness (entropy), such as images, videos, and already-compressed files, contains little internal repetition, so compression has limited effect and the ratio stays high.

The compression ratio varies significantly among customers depending on their business. Statistics from Gzip-enabled instances in core regions show that most instances have a compression ratio between 10% and 50%. In other words, most instances save at least 50% of traffic (50% to 90%) after Gzip is enabled.
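To see how the data type drives the compression ratio, the following sketch compresses two synthetic payloads of equal size with Python's `gzip` module: a repetitive JSON array and random bytes that stand in for already-compressed content. The payloads are made up for illustration, and the gateway may use a different compression level than Python's default (level 9), so the exact percentages will differ from production traffic; only the contrast between the two cases is the point.

```python
import gzip
import json
import os


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower is better)."""
    return len(gzip.compress(data)) / len(data)


def traffic_saved(ratio: float) -> float:
    """Fraction of traffic saved for a given compression ratio."""
    return 1 - ratio


# Repetitive, text-like data: a JSON array of similar records (synthetic).
records = [{"id": i, "status": "ok", "message": "request processed"} for i in range(1000)]
json_bytes = json.dumps(records).encode("utf-8")

# High-entropy data of the same size: random bytes behave like already-compressed files.
random_bytes = os.urandom(len(json_bytes))

for label, payload in [("JSON", json_bytes), ("random", random_bytes)]:
    ratio = compression_ratio(payload)
    print(f"{label}: ratio {ratio:.1%}, traffic saved {traffic_saved(ratio):.1%}")
```

The `traffic_saved` helper also shows the arithmetic behind the statistic above: a ratio of 50% saves 50% of traffic, and a ratio of 10% saves 90%.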

With Gzip already enabled, how many instance resources can be saved using hardware acceleration?

After you enable Gzip hardware acceleration, the gateway performs compression on dedicated hardware, which saves CPU resources. The following stress test data compares the CPU consumption of a single-node instance with Gzip hardware acceleration enabled against a 4-node instance that uses software-based Gzip, with both instances handling the same number of queries per second (QPS).

In this test, the compressed data is JSON text of approximately 120 KB.

| QPS | CPU consumption: hardware-accelerated Gzip (aigw.medium.x1, 1 node) | CPU consumption: software-based Gzip (aigw.medium.x1, 4 nodes) |
| --- | --- | --- |
| 2000 | 9% | 11% |
| 5000 | 26% | 28% |
| 10000 | 56% | 56% |
| 13000 | 69% | 72% |

At every QPS level tested, the CPU consumption of the single-node instance with Gzip hardware acceleration is nearly the same as that of the 4-node instance with software-based Gzip. A workload that would otherwise require four nodes can therefore be handled by a single node after you enable Gzip hardware acceleration, which saves about 75% of instance resources.

Enable gateway log delivery

To collect, store, and analyze gateway operational logs, you can activate Simple Log Service (SLS) when you create a gateway instance. This allows for log analysis and dashboard monitoring.

When you create the gateway instance, select Use Simple Log Service. This action activates SLS and enables the gateway log delivery feature.

After you enable log delivery, you can go to Observability & Analysis > Logs to view gateway logs.

Log field descriptions

Field name

Type

Description

__time__

long

The time when the log was generated.

cluster_id

string

The ID of the gateway instance.

ai_log

json

A JSON-formatted field that is populated for Model APIs, Agent APIs, and MCP APIs. It is empty for other types of APIs. It contains the following keys:

  • api: The name of the AI API.

  • cache_status: If content caching is enabled for the Model API, this field indicates whether the request hits the cache.

  • consumer: If consumer authentication is enabled, this field records the identity of the consumer for the current request.

  • fallback_from: If a fallback policy is enabled for the Model API, this field records the route from which the request falls back.

  • input_token: The number of input tokens in the LLM request.

  • llm_first_token_duration: The response time (RT) of the first packet of the LLM request.

  • llm_service_duration: The overall RT of the LLM request.

  • model: The model name in the LLM request.

  • output_token: The number of output tokens in the LLM request.

  • response_type: The response type of the LLM request, such as streaming or non-streaming.

  • safecheck_status: The Content Moderation status of the LLM request.

  • token_ratelimit_status: Indicates whether the LLM request is blocked by token-based rate limiting.

authority

string

The Host header in the request message.

bytes_received

long

The size of the request body, excluding headers. Unit: bytes.

bytes_sent

long

The size of the response body, excluding headers. Unit: bytes.

downstream_local_address

string

The gateway pod address.

downstream_remote_address

string

The address of the client that connects to the gateway.

duration

long

The total time taken to process the request. This is the period from when the gateway receives the first byte from the downstream service to when it sends the last byte of the response. Unit: milliseconds.

method

string

The HTTP method.

path

string

The path in the HTTP request.

protocol

string

The HTTP protocol version.

request_duration

long

The period from when the gateway receives the first byte from the downstream service to when it receives the last byte from the downstream service. Unit: milliseconds.

request_id

string

The gateway generates an ID for each request and includes it in the x-request-id header. The backend can use this field for logging and troubleshooting.

requested_server_name

string

The server name requested through Server Name Indication (SNI) on the SSL/TLS connection.

response_code_details

string

Provides additional information about the response code. For example, `via_upstream` indicates that the response code is returned by the backend service, and `route_not_found` indicates that no matching route is found for the request.

response_tx_duration

long

The period from when the gateway receives the first byte from the upstream service to when it sends the last byte to the downstream service. Unit: milliseconds.

route_name

string

The route name.

start_time

string

The time when the request is initiated, in UTC.

trace_id

string

The trace ID.

upstream_cluster

string

The upstream cluster.

upstream_host

string

The upstream IP address.

upstream_local_address

string

The local address used to connect to the upstream service.

upstream_service_time

long

The time taken by the upstream service to process the request, in milliseconds. This includes the network latency for the gateway to access the upstream service and the processing time of the upstream service itself.

upstream_transport_failure_reason

string

The reason why the connection to the upstream service failed.

user_agent

string

The User-Agent header in the HTTP request.

x_forwarded_for

string

The x-forwarded-for header in the HTTP request. This header usually indicates the originating IP address of the HTTP client.
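Because `ai_log` is a nested JSON value inside each log entry, downstream analysis usually starts by parsing it. The sketch below assumes that an exported log record is available as a Python dict in which `ai_log` is either a JSON string or empty; the record itself, its values, and the helper names `parse_ai_log` and `total_tokens` are hypothetical and only reuse field names from the table above.

```python
import json
from typing import Any, Dict


def parse_ai_log(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return the ai_log field of a gateway log record as a dict.

    Assumption: ai_log is a JSON string for Model, Agent, and MCP APIs
    and empty for other API types.
    """
    raw = record.get("ai_log") or ""
    if isinstance(raw, dict):  # already parsed upstream
        return raw
    return json.loads(raw) if raw.strip() else {}


def total_tokens(record: Dict[str, Any]) -> int:
    """Sum input_token and output_token; missing keys count as 0."""
    ai_log = parse_ai_log(record)
    return int(ai_log.get("input_token", 0)) + int(ai_log.get("output_token", 0))


# Hypothetical record: values are illustrative, not real gateway output.
sample = {
    "route_name": "example-route",
    "method": "POST",
    "duration": "1250",
    "ai_log": json.dumps({"model": "example-model", "input_token": 120, "output_token": 380}),
}
print(parse_ai_log(sample)["model"], total_tokens(sample))  # prints: example-model 500
```

Records for non-AI APIs return an empty dict and a token total of 0, so the same code can run over a mixed log stream.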