API Gateway: Policies and plugins

Last Updated: Jun 21, 2026

AI Gateway lets you add policies and configure plugins at the API level to improve the security, performance, and maintainability of your APIs.

Important

Policy configuration changes take effect immediately. You do not need to republish the API.

Procedure

  1. Go to the Instance page of the AI Gateway console. In the top navigation bar, select the region where your target instance is located, and then click the target instance ID.

  2. In the left-side navigation pane, click Model API, and then click the target API Name to go to the API Details page.

  3. Click the Policies and Plug-ins tab. In the More policies and plugins section, select where you want to configure the policy or plugin (Inbound Processing/Outbound Processing), and then click Enable Policy/Plug-in.

  4. In the Enable Policy/Plug-in panel, select and configure a policy or plugin. For more information, see Policy configurations and Plugin configurations.

Policy configurations

Concurrency control

Concurrency control protects your backend service by limiting the number of simultaneous requests. The gateway counts the requests it is processing, and when this count reaches a specified threshold, it immediately blocks subsequent traffic to prevent the backend service from being overloaded and to ensure its availability.
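
The counting behavior described above can be illustrated with a minimal Python sketch. It is not the gateway's implementation: the class name `ConcurrencyLimiter`, its methods, and the fallback message are hypothetical, and the sketch assumes that a request is rejected once the number of in-flight requests has reached the threshold.

```python
import threading


class ConcurrencyLimiter:
    """Hypothetical sketch: count in-flight requests, reject at the threshold."""

    def __init__(self, threshold: int, fallback_status: int = 429):
        self.threshold = threshold
        self.fallback_status = fallback_status
        self._in_flight = 0
        self._lock = threading.Lock()

    def try_acquire(self) -> bool:
        # Admit the request only while fewer than `threshold` are in flight.
        with self._lock:
            if self._in_flight >= self.threshold:
                return False
            self._in_flight += 1
            return True

    def release(self) -> None:
        # Called when a request finishes, freeing one slot.
        with self._lock:
            self._in_flight -= 1

    def handle(self, backend):
        # Return the fallback response immediately when the limit is reached.
        if not self.try_acquire():
            return self.fallback_status, "Too many concurrent requests"
        try:
            return 200, backend()
        finally:
            self.release()
```

With `threshold=2`, two overlapping calls are admitted and a third is answered with the fallback status until one of the first two completes.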

Procedure

On the Add Policy tab, click the Concurrency Control card. In the Add Policy panel, configure the following parameters.

Parameter

Description

Enable or Not

When enabled, the concurrency control rule takes effect.

Overall Concurrency Threshold

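The maximum number of requests that the gateway processes for this API at the same time. When the number of in-flight requests reaches this value, subsequent requests are blocked.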

Web Fallback Behavior

Return Specific Content

HTTP Status Code

Set the HTTP Status Code. The default is 429.

Type of Returned Content

Specifies whether the Type of Returned Content is Regular Text or JSON.

HTTP Text

The content of the response body to return when the threshold is exceeded.

Redirect to Specified Page

Redirect URL

Enter the Redirect URL.

Traffic shaping

Traffic shaping monitors an API's queries per second (QPS). When the QPS reaches a specified threshold, the gateway immediately blocks traffic. This prevents sudden traffic spikes from overwhelming the backend service and ensures high availability.
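
One simple way to picture a QPS threshold is a fixed one-second counting window, sketched below in Python. This is an illustration only: the source does not say which algorithm the gateway uses, and the names `QpsLimiter`, `allow`, and the injectable `clock` parameter are hypothetical.

```python
import time


class QpsLimiter:
    """Hypothetical sketch: fixed one-second window request counter."""

    def __init__(self, qps_threshold: int, clock=time.monotonic):
        self.qps_threshold = qps_threshold
        self._clock = clock  # injectable so the sketch can be tested
        self._window_start = clock()
        self._count = 0

    def allow(self) -> bool:
        now = self._clock()
        # Start a new window once a full second has elapsed.
        if now - self._window_start >= 1.0:
            self._window_start = now
            self._count = 0
        # Block as soon as the count for this window has reached the threshold.
        if self._count >= self.qps_threshold:
            return False
        self._count += 1
        return True
```

A caller would return the configured fallback (for example, the HTTP status code 429) whenever `allow()` returns `False`.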

Procedure

On the Add Policy tab, click the Traffic Shaping card. In the Add Policy panel, configure the following parameters.

Parameter

Description

Enable or Not

When enabled, the traffic shaping rule takes effect.

Overall QPS Threshold

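The maximum number of queries per second (QPS) allowed for this API. When the QPS reaches this value, the gateway blocks subsequent requests.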

Web Fallback Behavior

Return Specific Content

HTTP Status Code

Set the HTTP Status Code. The default value is 429.

Type of Returned Content

Specifies whether the Type of Returned Content is Regular Text or JSON.

HTTP Text

The content of the response body to return when the threshold is exceeded.

Redirect to Specified Page

Redirect URL

Enter the Redirect URL.

Circuit breaking policy

A circuit breaking policy protects your backend service by monitoring its response time or error ratio. When a specified threshold is exceeded, the gateway "trips the circuit" and stops sending requests to the service for a set duration. This action prevents cascading failures and ensures high availability. After the duration passes, the gateway cautiously resumes sending requests to the resource.

Procedure

On the Add Policy tab, click the Circuit Breaking card. In the Add Policy panel, configure the following parameters.

Parameter

Description

Enable or Not

When enabled, the circuit breaking rule takes effect.

Statistical Window Duration

The duration of the time window for collecting request statistics. Valid values range from 1 second to 120 minutes.

Minimum Number of Requests

The minimum number of requests within the statistics window required to evaluate the circuit breaking rule. If the request count is below this value, the circuit will not trip, even if the threshold is exceeded.

Threshold Type

Select whether to use the Slow Call Ratio (%) or Exception Ratio (%) as the threshold.

  1. When you select Slow Call Ratio (%), you must also specify a Slow Call RT (response time). A request counts as a slow call if its response time exceeds this value. When the rule is enabled, the circuit trips if the request count in a statistics window exceeds the Minimum Number of Requests and the ratio of slow calls exceeds the threshold. While the circuit is tripped, all subsequent requests fail immediately for the configured duration. After this period, the circuit enters a half-open state and allows a single probing request. If this request succeeds (its response time is less than the Slow Call RT), the circuit closes and normal operation resumes. If it fails, the circuit trips again.

  2. When you select Exception Ratio (%), you must specify the error ratio that trips the circuit. The circuit trips if two conditions are met within a statistics window: the request count exceeds the minimum, and the error ratio exceeds the defined threshold. When tripped, subsequent requests fail immediately for the configured duration.

Slow Call RT

Set the allowed Slow Call RT (the maximum response time).

Circuit Breaking Ratio Threshold

The slow call ratio or exception ratio, depending on the selected Threshold Type, that trips the circuit. Valid values: 0 to 100 (representing 0% to 100%).

Circuit Breaking Duration (s)

The duration, in seconds, that the circuit remains open after tripping. During this time, all requests to the resource fail immediately.

Web Fallback Behavior

Return Specific Content

HTTP Status Code

Set the HTTP Status Code. The default is 429.

Type of Returned Content

Specifies whether the Type of Returned Content is Regular Text or JSON.

HTTP Text

The content of the response body to return when the threshold is exceeded.

Redirect to Specified Page

Redirect URL

Enter the Redirect URL.
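
The slow call ratio behavior described in this section (trip, stay open for the configured duration, then allow one probe in a half-open state) can be sketched as a small state machine. This is a simplified Python illustration, not the gateway's implementation: the class and parameter names are hypothetical, time is passed in explicitly as `now` in seconds, and the statistics window is modeled as a simple fixed window. An exception ratio variant would count failed calls instead of slow calls.

```python
class SlowCallCircuitBreaker:
    """Hypothetical sketch: closed -> open -> half-open, driven by slow calls."""

    def __init__(self, window_s, min_requests, slow_call_rt_s,
                 ratio_threshold_pct, break_duration_s):
        self.window_s = window_s
        self.min_requests = min_requests
        self.slow_call_rt_s = slow_call_rt_s
        self.ratio_threshold_pct = ratio_threshold_pct
        self.break_duration_s = break_duration_s
        self.state = "closed"
        self._window_start = None
        self._total = 0
        self._slow = 0
        self._opened_at = 0.0

    def allow(self, now):
        if self.state == "open":
            if now - self._opened_at < self.break_duration_s:
                return False  # still tripped: fail immediately
            self.state = "half-open"  # duration passed: admit a single probe
            return True
        # While half-open, the one probe is already in flight.
        return self.state == "closed"

    def record(self, now, rt_s):
        if self.state == "open":
            return
        if self.state == "half-open":
            # The probe succeeds only if it is faster than the Slow Call RT.
            if rt_s < self.slow_call_rt_s:
                self._close(now)
            else:
                self._trip(now)
            return
        if self._window_start is None or now - self._window_start >= self.window_s:
            self._window_start, self._total, self._slow = now, 0, 0
        self._total += 1
        if rt_s > self.slow_call_rt_s:
            self._slow += 1
        ratio_pct = 100.0 * self._slow / self._total
        # Both conditions must hold: enough requests and too many slow calls.
        if self._total > self.min_requests and ratio_pct > self.ratio_threshold_pct:
            self._trip(now)

    def _trip(self, now):
        self.state = "open"
        self._opened_at = now
        self._total = self._slow = 0

    def _close(self, now):
        self.state = "closed"
        self._window_start, self._total, self._slow = now, 0, 0
```

A caller would invoke `allow(now)` before forwarding a request, return the configured fallback when it yields `False`, and call `record(now, rt_s)` after each forwarded request completes.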

IP blacklist and whitelist policy

The IP blacklist and whitelist policy controls client access to your services based on a pre-configured list of IP addresses that are either allowed (whitelist) or denied (blacklist).

Procedure

On the Add Policy tab, click the IP Blacklist/Whitelist card. In the Add Policy panel, configure the following parameters.

Parameter

Description

Enable

When enabled, the IP blacklist and whitelist policy takes effect.

Name

A unique name for the policy. This helps with identification, especially when managing multiple policies.

Remarks

A description for the policy to help with identification and management.

Type

Specifies the access control type.

  • Whitelist: Allows access only from the specified IP addresses. Access from all other IP addresses is denied by default.

  • Blacklist: Blocks access from the specified IP addresses. Access from all other IP addresses is allowed by default.

IP Address/CIDR Block

The list of IP addresses or CIDR blocks to which this policy applies. Multiple entries are supported. For example, 192.168.1.1/24.
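
The whitelist and blacklist semantics above can be checked locally with Python's standard `ipaddress` module. The function below is a sketch under stated assumptions: `is_allowed` and its parameters are hypothetical names, and how the gateway itself parses entries is not documented here. The sketch passes `strict=False` so that an entry with host bits set, such as 192.168.1.1/24, is treated as the network 192.168.1.0/24.

```python
import ipaddress


def is_allowed(client_ip: str, list_type: str, entries: list[str]) -> bool:
    """Hypothetical sketch of whitelist/blacklist evaluation."""
    addr = ipaddress.ip_address(client_ip)
    # strict=False accepts entries whose host bits are set (e.g. 192.168.1.1/24).
    networks = [ipaddress.ip_network(entry, strict=False) for entry in entries]
    matched = any(addr in net for net in networks)
    if list_type == "whitelist":
        return matched      # only listed addresses are allowed
    if list_type == "blacklist":
        return not matched  # listed addresses are blocked, all others allowed
    raise ValueError(f"unknown list type: {list_type}")
```

For example, with the single entry 192.168.1.1/24, the client 192.168.1.77 is allowed by a whitelist and blocked by a blacklist, while 10.0.0.1 gets the opposite result.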

Timeout policy

AI Gateway lets you set API-level timeouts, defining the maximum time the gateway waits for a response from a backend service. If no response is received within this period, the gateway returns an HTTP 504 Gateway Timeout status code to the client.

Procedure

On the Add Policy tab, click the Timeout card. In the Add Policy panel, configure the following parameters.

Note

After you configure and enable the timeout policy, verify that the timeout rule works as expected for your service.

Parameter

Description

Enable

Specifies whether to enable the timeout policy.

  • Enable: The API timeout policy takes effect.

  • Disable: The API timeout policy is disabled.

Timeout Period

Specifies the timeout period for the API, in seconds.

Note

If you set this parameter to 0 or disable the timeout policy, the gateway waits indefinitely for a response.
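
The timeout semantics can be sketched with Python's `asyncio.wait_for`, where a timeout of `None` means waiting without a limit. This is an illustration only: `call_with_timeout` and its parameters are hypothetical names, and `backend` stands in for any coroutine function that calls the backend service.

```python
import asyncio


async def call_with_timeout(backend, enabled: bool, timeout_s: float):
    """Hypothetical sketch: 504 on timeout, no limit when disabled or set to 0."""
    # A disabled policy or a value of 0 means waiting indefinitely.
    effective = timeout_s if enabled and timeout_s > 0 else None
    try:
        body = await asyncio.wait_for(backend(), timeout=effective)
    except asyncio.TimeoutError:
        return 504, "Gateway Timeout"
    return 200, body
```

Because a value of 0 removes the limit entirely, a slow or hung backend can hold a request open for as long as it takes to respond.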

Retry policy

AI Gateway can automatically retry failed requests at the API level. You can configure retries to be triggered by specific conditions, such as a connection failure, an unavailable backend service, or a specific HTTP status code.

API retry conditions

When the backend service returns a 5xx error, AI Gateway automatically retries the failed request based on the configured number of retries.

  • Retry conditions for HTTP:

    • 5xx: If the backend service returns any 5xx response, or if a connection is lost, reset, or a read timeout occurs, AI Gateway attempts to retry the request.

      Note

      The 5xx condition includes the connect-failure and refused-stream conditions.

    • reset: If a connection is lost, reset, or a read timeout occurs, AI Gateway attempts to retry the request.

    • connect-failure: If a connection to the backend service cannot be established, AI Gateway attempts to retry the failed request.

    • refused-stream: If the backend service resets the stream with a REFUSED_STREAM error code, AI Gateway attempts to retry the request.

    • retriable-status-codes: If the HTTP status code of the backend service response matches one of the specified retry status codes, AI Gateway attempts to retry the request.

      Note

      You can use retry status codes only if you specify retriable-status-codes in the retry conditions.

  • Retry conditions for gRPC:

    • cancelled: If the gRPC status code in the response header from the backend gRPC service is cancelled, AI Gateway attempts to retry the request.

    • deadline-exceeded: If the gRPC status code in the response header from the backend gRPC service is deadline-exceeded, AI Gateway attempts to retry the request.

    • internal: If the gRPC status code in the response header from the backend gRPC service is internal, AI Gateway attempts to retry the request.

    • resource-exhausted: If the gRPC status code in the response header from the backend gRPC service is resource-exhausted, AI Gateway attempts to retry the request.

    • unavailable: If the gRPC status code in the response header from the backend gRPC service is unavailable, AI Gateway attempts to retry the request.
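
The way these conditions combine can be sketched as a matching function. This Python sketch is an interpretation of the list above, not the gateway's code: the `Failure` type, its `kind` values, and `should_retry` are hypothetical names. It treats the 5xx condition as also covering reset, connect-failure, and refused-stream, as the descriptions above state.

```python
from dataclasses import dataclass
from typing import Optional

CONNECTION_KINDS = ("reset", "connect-failure", "refused-stream")
GRPC_CONDITIONS = {"cancelled", "deadline-exceeded", "internal",
                   "resource-exhausted", "unavailable"}


@dataclass
class Failure:
    """Hypothetical description of one failed attempt."""
    kind: str  # "status", "grpc", or one of CONNECTION_KINDS
    http_status: Optional[int] = None
    grpc_status: Optional[str] = None


def should_retry(failure, conditions, retry_status_codes=frozenset()):
    for cond in conditions:
        if cond == "5xx":
            # 5xx also covers connection-level failures.
            if failure.kind in CONNECTION_KINDS:
                return True
            if (failure.kind == "status" and failure.http_status is not None
                    and 500 <= failure.http_status <= 599):
                return True
        elif cond in CONNECTION_KINDS:
            if failure.kind == cond:
                return True
        elif cond == "retriable-status-codes":
            # Status codes only matter when this condition is selected.
            if failure.kind == "status" and failure.http_status in retry_status_codes:
                return True
        elif cond in GRPC_CONDITIONS:
            if failure.kind == "grpc" and failure.grpc_status == cond:
                return True
    return False
```

For instance, a 429 response is retried only when `retriable-status-codes` is among the conditions and 429 is in the configured status codes.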

Procedure

On the Add Policy tab, click the Retry card. In the Add Policy panel, configure the following parameters.

Note

After you configure and enable the retry policy, verify that the retry rule works as expected for your service.

Parameter

Description

Enable

Specifies whether to enable the retry policy.

  • Enable: The API retry policy takes effect.

  • Disable: The API retry policy is disabled.

    If this policy is disabled, the gateway falls back to a default internal retry configuration. The number of retries is 2, and the retry conditions are connect-failure, refused-stream, unavailable, cancelled, non_idempotent, and retriable-status-codes.

Retry Times

The maximum number of retries for a failed request. Valid values: 0 to 10. A value of 2 or less is recommended.

If you set this parameter to 0, failed requests are not retried.

Retry Condition

Select one or more conditions that trigger a retry.

Retry Status Code

The specific HTTP status codes that trigger a retry. You can specify multiple codes.

Important

You can configure Retry Status Code only if you select retriable-status-codes for the Retry Condition.
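
To make the meaning of Retry Times concrete, the following Python sketch shows a retry loop in which the value counts retries after the first attempt, so 0 means a single attempt and 2 means at most three. The function and parameter names are hypothetical, and `send` and `is_retriable` stand in for the request call and the configured retry conditions.

```python
def call_with_retries(send, retry_times, is_retriable):
    """Hypothetical sketch: one initial attempt plus up to `retry_times` retries."""
    if not 0 <= retry_times <= 10:
        raise ValueError("retry_times must be between 0 and 10")
    attempts = 0
    while True:
        result = send()
        attempts += 1
        # Stop on success, or once the retry budget is used up.
        if not is_retriable(result) or attempts > retry_times:
            return result, attempts
```

Each retry adds load to a backend that may already be struggling, which is one reason to keep the value small, in line with the recommendation of 2 or less.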

Header modification policy

The header modification feature lets you modify request headers before they are sent to the backend service, and response headers before they are returned to the client.

Procedure

On the Add Policy tab, click the Edit Header card. In the Add Policy panel, configure the following parameters.

Parameter

Description

Enable

Specifies whether to enable the header modification policy.

  • Enable: When enabled, the gateway modifies the request and response headers as configured.

  • Disable: When disabled, the gateway does not modify the request or response headers.

Header Type

Select the header type to modify.

  • Request: Modifies the request header.

  • Response: Modifies the response header.

Operation Type

Select the operation to perform.

  • Add: Adds a header to the request or response.

    Note

    If the header already exists, the new value is appended to the existing value, separated by a comma (,).

  • Modify: Modifies a specified header in the request or response.

    Note

    • If the specified header does not exist, it is added with the specified key and value.

    • If the specified header exists, its value is overwritten.

  • Delete: Deletes a specified header from the request or response.

Header Key

Enter the name of the request or response header.

Header Value

Enter the value of the request or response header.
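
The three operation types can be summarized in a short Python sketch. It is illustrative only: `apply_header_op` is a hypothetical helper, headers are modeled as a plain dictionary, and header names are assumed to be already normalized to one case.

```python
def apply_header_op(headers: dict, operation: str, key: str, value: str = "") -> dict:
    """Hypothetical sketch of the Add, Modify, and Delete operations."""
    result = dict(headers)  # leave the caller's headers untouched
    if operation == "add":
        # Append to an existing value with a comma, otherwise create the header.
        result[key] = f"{result[key]},{value}" if key in result else value
    elif operation == "modify":
        # Overwrite if present, add if absent.
        result[key] = value
    elif operation == "delete":
        result.pop(key, None)
    else:
        raise ValueError(f"unknown operation: {operation}")
    return result
```

The practical difference between Add and Modify shows up only when the header already exists: Add keeps the old value and appends the new one, while Modify replaces it.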

Plugin configurations

  1. Click the Add Plug-in tab.

  2. In the Quick Navigation section, find the desired plugin by type or by searching its name, and then click the plugin card:

    • If the plugin is not installed, click Install and Configure in the dialog box that appears. Then, configure the plugin rules and enable it.

    • If the plugin is already installed, configure the plugin rules and enable it in the dialog box that appears.

  3. Click OK. You are redirected to the API attachment list, where you can view its attachment status.

    In the attachment list, the Inbound Processing section shows jwt-auth (disabled) and key-auth (enabled). The Outbound Processing section shows key-auth (enabled) and jwt-auth (disabled). The backend service address is test.static. The request flows from the front-end API, through Inbound Processing, to the backend service, and then through Outbound Processing.