More policies and plugins

AI Gateway lets you add policies and configure plugins at the API level to improve the security, performance, and maintainability of your APIs.

Important

Policy configuration changes take effect immediately. You do not need to republish the API.

Procedure

Go to AI Gateway Instance, select the region, and click the ID of the target instance.
In the navigation pane on the left, click LLM API. Then, click the name of the API to go to the API details page.
Click the Policies & Plugins tab. In the More Policies & Plugins section, select where you want to configure the policy or plugin (Inbound Processing or Outbound Processing), and then click Enable Policy/Plugin.
In the Enable Policy/Plugin panel, select and configure a policy or plugin. For more information, see Policy configurations and Plugin configurations.

Policy configurations

Concurrency control

Concurrency control rules count the total number of requests being processed by the gateway. When this number reaches a specified threshold, the gateway immediately blocks traffic. You can set this threshold to the maximum number of concurrent requests that your backend service can handle. This protects the availability of your backend service during periods of high concurrency.

Procedure

On the Add Policy tab, click the Concurrency Control card. In the Add Policy: Concurrency Control panel, configure the parameters.

Configuration item			Description
Enable			If enabled, the concurrency control rule takes effect.
Overall Concurrency Threshold			Set the Overall Concurrency Threshold.
Web Fallback Behavior	Return Specified Content	HTTP Status Code	Set the HTTP Status Code. The default value is 429.
		Return Content-type	Set Return Content-type to Plain Text or JSON.
		HTTP Response Body	Enter the response body text.
	Redirect To A Specific Page	Redirect URL	Enter the Redirect URL.
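
The counting behavior described above can be illustrated with a minimal Python sketch. It is not the gateway's implementation: the class name `ConcurrencyLimiter`, the example threshold, and the fallback body are hypothetical and only mirror the Overall Concurrency Threshold and Return Specified Content parameters.

```python
import threading


class ConcurrencyLimiter:
    """Illustrative in-flight request counter (hypothetical, not gateway code)."""

    def __init__(self, threshold, status_code=429, body="Too many concurrent requests"):
        self.threshold = threshold      # mirrors Overall Concurrency Threshold
        self.status_code = status_code  # mirrors HTTP Status Code (default 429)
        self.body = body                # mirrors HTTP Response Body
        self._in_flight = 0
        self._lock = threading.Lock()

    def try_acquire(self):
        """Admit the request only while the in-flight count is below the threshold."""
        with self._lock:
            if self._in_flight >= self.threshold:
                return False
            self._in_flight += 1
            return True

    def release(self):
        """Call when a previously admitted request finishes."""
        with self._lock:
            if self._in_flight > 0:
                self._in_flight -= 1

    def handle(self, backend_call):
        """Return (status, body): the fallback when blocked, else the backend result."""
        if not self.try_acquire():
            return self.status_code, self.body
        try:
            return 200, backend_call()
        finally:
            self.release()
```

With `ConcurrencyLimiter(threshold=2)`, a third request that arrives while two are still being processed receives the fallback response instead of reaching the backend.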

Traffic shaping

Traffic shaping rules monitor the queries per second (QPS) of an API. When the QPS reaches a specified threshold, the gateway immediately blocks traffic. This prevents sudden traffic spikes from overwhelming the backend service and ensures high availability.

Procedure

On the Add Policy tab, click the Traffic Shaping card. In the Add Policy: Traffic Shaping panel, configure the parameters.

Configuration item			Description
Enable			If enabled, the traffic shaping rule takes effect.
Overall QPS Threshold			Set the Overall QPS Threshold.
Web Fallback Behavior	Return Specified Content	HTTP Status Code	Set the HTTP Status Code. The default value is 429.
		Return Content-type	Set Return Content-type to Plain Text or JSON.
		HTTP Response Body	Enter the response body text.
	Redirect To A Specific Page	Redirect URL	Enter the Redirect URL.
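
As a rough illustration of a QPS threshold, the following Python sketch counts requests in one-second windows. The fixed-window counting method, the class name `QpsLimiter`, and the injectable `clock` are assumptions made for the example; the source does not state how the gateway measures QPS internally.

```python
import time


class QpsLimiter:
    """Illustrative fixed-window QPS counter (hypothetical, not gateway code)."""

    def __init__(self, qps_threshold, clock=time.monotonic):
        self.qps_threshold = qps_threshold  # mirrors Overall QPS Threshold
        self._clock = clock                 # injectable so the sketch is testable
        self._window_start = None
        self._count = 0

    def allow(self):
        """Return True if the request fits in the current one-second window."""
        now = self._clock()
        # Start a new window on the first call or once a full second has passed.
        if self._window_start is None or now - self._window_start >= 1.0:
            self._window_start = now
            self._count = 0
        if self._count >= self.qps_threshold:
            return False  # the caller would return the configured fallback here
        self._count += 1
        return True
```

A request for which `allow()` returns `False` would receive the configured Web Fallback Behavior, for example a response with status code 429.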

Circuit breaking policy

Circuit breaking rules monitor the response time or error rate of an API. When a threshold is reached, the gateway immediately trips the circuit. For a specified period, the gateway stops calling the unstable resource. This prevents the backend service from being affected and ensures its high availability. After the specified time, the gateway resumes calls to the resource.

Procedure

On the Add Policy tab, click the Circuit Breaking card. In the Add Policy: Circuit Breaking panel, configure the parameters.

Configuration item			Description
Enable			If enabled, the circuit breaking rule takes effect.
Statistics Window Duration			The length of the time window for statistics. The value can be from 1 second to 120 minutes.
Minimum Number Of Requests			The minimum number of requests required to trigger circuit breaking. If the number of requests in the current statistics window is less than this value, the rule is not triggered, even if the circuit breaking conditions are met.
Threshold Type			Select Slow Call Ratio (%) or Error Ratio (%) as the threshold. Slow Call Ratio (%): You must also set Slow Call RT (the maximum allowed response time). A request counts as a slow call if its response time is greater than this value. Set the slow call ratio that triggers circuit breaking in Circuit Breaking Ratio Threshold. After the rule is enabled, if the number of requests in the statistics window reaches Minimum Number Of Requests and the slow call ratio exceeds the threshold, requests are automatically blocked for the circuit breaking duration. After that duration, the circuit breaker enters a probing recovery state. If the response time of the next request is less than Slow Call RT, circuit breaking ends. If it is greater than Slow Call RT, the circuit is broken again. Error Ratio (%): Set the error ratio that triggers circuit breaking in Circuit Breaking Ratio Threshold. After the rule is enabled, if the number of requests in the statistics window reaches Minimum Number Of Requests and the error ratio exceeds the threshold, requests are automatically blocked for the circuit breaking duration.
Slow Call RT			Set the allowed Slow Call RT (maximum response time).
Circuit Breaking Ratio Threshold			The slow call ratio or error ratio that triggers circuit breaking, depending on the selected Threshold Type. The value can be from 0 to 100, which represents 0% to 100%.
Circuit Breaking Duration (s)			The duration for which the circuit remains broken after being triggered. After a resource enters the circuit breaking state, requests fail fast during the configured circuit breaking duration.
Web Fallback Behavior	Return Specified Content	HTTP Status Code	Set the HTTP Status Code. The default value is 429.
		Return Content-type	Set Return Content-type to Plain Text or JSON.
		HTTP Response Body	Enter the response body text.
	Redirect To A Specific Page	Redirect URL	Enter the Redirect URL.
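
The interaction between the statistics window, the minimum number of requests, the ratio threshold, and the probing recovery state can be sketched in Python as follows. This is a simplified model for the Slow Call Ratio (%) threshold type only, not the gateway's implementation. The class name, the state names (`closed`, `open`, `half_open`), and the single-probe behavior are assumptions made for the example.

```python
import time


class SlowCallCircuitBreaker:
    """Illustrative slow-call-ratio breaker (hypothetical, not gateway code)."""

    def __init__(self, window_s, min_requests, slow_rt_ms,
                 ratio_threshold_pct, break_duration_s, clock=time.monotonic):
        self.window_s = window_s                        # Statistics Window Duration
        self.min_requests = min_requests                # Minimum Number Of Requests
        self.slow_rt_ms = slow_rt_ms                    # Slow Call RT
        self.ratio_threshold_pct = ratio_threshold_pct  # Circuit Breaking Ratio Threshold
        self.break_duration_s = break_duration_s        # Circuit Breaking Duration (s)
        self._clock = clock
        self.state = "closed"
        self._samples = []  # (timestamp, is_slow) pairs inside the window
        self._opened_at = 0.0

    def allow_request(self):
        """Return False while the circuit is broken or a probe is outstanding."""
        if self.state == "open":
            if self._clock() - self._opened_at >= self.break_duration_s:
                self.state = "half_open"  # let exactly one probe request through
                return True
            return False
        if self.state == "half_open":
            return False
        return True

    def record(self, rt_ms):
        """Record the response time of a request that was allowed through."""
        now = self._clock()
        is_slow = rt_ms > self.slow_rt_ms
        if self.state == "half_open":
            # The probe decides: a slow probe re-opens, a fast one closes.
            if is_slow:
                self._trip(now)
            else:
                self.state = "closed"
                self._samples.clear()
            return
        # Drop samples that have fallen out of the statistics window.
        self._samples = [(ts, s) for ts, s in self._samples if now - ts < self.window_s]
        self._samples.append((now, is_slow))
        total = len(self._samples)
        if total < self.min_requests:
            return  # too few requests: never trip, whatever the ratio
        slow_pct = 100.0 * sum(1 for _, s in self._samples if s) / total
        if slow_pct > self.ratio_threshold_pct:
            self._trip(now)

    def _trip(self, now):
        self.state = "open"
        self._opened_at = now
        self._samples.clear()
```

While `allow_request()` returns `False`, the request fails fast and the caller would return the configured Web Fallback Behavior.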

IP blacklist and whitelist policy

The IP blacklist and whitelist policy controls client access to services based on a pre-configured list of allowed (whitelist) or denied (blacklist) IP addresses.

Procedure

On the Add Policy tab, click the IP Blacklist/Whitelist card. In the Add Policy: IP Blacklist/Whitelist panel, configure the parameters.

Parameter	Description
Enable	If enabled, the IP blacklist and whitelist policy takes effect.
Name	A custom ID to distinguish and manage multiple policies.
Notes	A description of the policy for easy identification and management.
Type	Specify whether the list is a blacklist or a whitelist to control the access policy type. Whitelist: Allows access only from specified IP addresses. All other IP addresses are denied by default. Blacklist: Blocks access from specific IP addresses. All other IP addresses are allowed by default.
IP Address/CIDR Block	Configure the list of IP addresses or CIDR blocks to allow or deny. Multiple entries are supported. Use a format such as `192.168.1.1/24`.
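
The whitelist and blacklist semantics can be checked with a short Python sketch built on the standard `ipaddress` module. The function name `is_allowed` and the `"whitelist"`/`"blacklist"` strings are hypothetical. Parsing entries with `strict=False` is an assumption that lets an entry such as `192.168.1.1/24` be read as the network `192.168.1.0/24`.

```python
import ipaddress


def is_allowed(client_ip, list_type, entries):
    """Decide whether client_ip may access the API (illustrative only).

    list_type is "whitelist" or "blacklist"; entries are IP addresses or
    CIDR blocks, as in the IP Address/CIDR Block parameter.
    """
    ip = ipaddress.ip_address(client_ip)
    # strict=False accepts entries whose host bits are set, e.g. 192.168.1.1/24.
    networks = [ipaddress.ip_network(entry, strict=False) for entry in entries]
    matched = any(ip in net for net in networks)
    if list_type == "whitelist":
        return matched       # only listed addresses are allowed
    if list_type == "blacklist":
        return not matched   # only listed addresses are denied
    raise ValueError(f"unknown list type: {list_type!r}")
```

For example, with the entries `["192.168.1.1/24", "10.0.0.5"]`, the address `192.168.1.77` passes a whitelist and `10.0.0.5` is rejected by a blacklist.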

Timeout policy

AI Gateway provides API-level timeout settings. You can configure the maximum time the gateway waits for a response from a backend service for a specific API. If the gateway does not receive a response from the backend service within the specified time, it returns an HTTP status code of 504 (Gateway Timeout) to the client.

Procedure

On the Add Policy tab, click the Timeout card. In the Add Policy: Timeout panel, configure the parameters.

Note

After you configure and enable the timeout policy, verify that the timeout rule works as expected.

Parameter	Description
Enable	Specifies whether to enable the timeout policy. Enable: The gateway API timeout policy takes effect. Disable: The gateway API timeout policy is disabled.
Timeout Period	Set the timeout period for the current API in seconds. Note If you set this parameter to 0 or disable the timeout policy, the gateway waits indefinitely for a response.
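
The timeout semantics, including the special meaning of 0, can be modeled with the Python sketch below. It is an illustration rather than the gateway's implementation; `call_with_timeout` and `slow_backend` are hypothetical names, and the backend is simulated with a sleep.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def call_with_timeout(backend_call, enabled, timeout_s):
    """Return (status, body) using the timeout semantics described above."""
    # A period of 0, or a disabled policy, means "wait indefinitely" (None).
    wait = timeout_s if enabled and timeout_s > 0 else None
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(backend_call)
    try:
        return 200, future.result(timeout=wait)
    except FutureTimeout:
        return 504, "Gateway Timeout"
    finally:
        # Do not block on the abandoned call; the worker thread ends on its own.
        pool.shutdown(wait=False)


def slow_backend(delay_s):
    """Hypothetical backend that answers after delay_s seconds."""
    time.sleep(delay_s)
    return "done"
```

Calling `call_with_timeout(lambda: slow_backend(0.3), True, 0.05)` yields a 504 result, whereas the same call with a Timeout Period of 0 waits for the backend and returns its response.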

Retry policy

AI Gateway provides API-level retry settings that allow you to automatically retry failed requests. You can configure the conditions that trigger a retry, such as a connection failure, an unavailable backend service, or a specific HTTP status code.

API retry conditions

When the backend service returns a 5xx error, AI Gateway automatically retries the failed request based on the configured number of retries.

The retry conditions for the HTTP Protocol are as follows:
- 5xx: If the backend service returns any 5xx response, or if a connection is lost, reset, or a read timeout event occurs, AI Gateway attempts to retry the failed request.
  Note
  5xx includes the conditions for connect-failure and refused-stream.
- reset: If a connection is lost, reset, or a read timeout event occurs, AI Gateway attempts to retry the failed request.
- connect-failure: If a connection to the backend service cannot be established, AI Gateway attempts to retry the failed request.
- refused-stream: If the backend service resets the stream with a REFUSED_STREAM error code, AI Gateway attempts to retry the failed request.
- retriable-status-codes: If the HTTP status code of the backend service response matches one of the specified retry status codes, AI Gateway attempts to retry the request.
  Note
  You can use retry status codes only if you specify retriable-status-codes in the retry conditions.

The retry conditions for the gRPC Protocol are as follows:
- cancelled: If the gRPC status code in the response header from the backend gRPC service is cancelled, AI Gateway attempts to retry the request.
- deadline-exceeded: If the gRPC status code in the response header from the backend gRPC service is deadline-exceeded, AI Gateway attempts to retry the request.
- internal: If the gRPC status code in the response header from the backend gRPC service is internal, AI Gateway attempts to retry the request.
- resource-exhausted: If the gRPC status code in the response header from the backend gRPC service is resource-exhausted, AI Gateway attempts to retry the request.
- unavailable: If the gRPC status code in the response header from the backend gRPC service is unavailable, AI Gateway attempts to retry the request.

Procedure

On the Add Policy tab, click the Retry card. In the Add Policy: Retry panel, configure the parameters.

Note

After you configure and enable the retry policy, verify that the retry rule works as expected.

Parameter	Description
Enable	Specifies whether to enable the retry policy. Enable: The gateway API retry policy takes effect. Disable: The gateway API retry policy does not take effect, and the gateway falls back to its default internal retry configuration. By default, the number of retries is 2 and the retry conditions are `connect-failure`, `refused-stream`, `unavailable`, `cancelled`, `non_idempotent`, or `retriable-status-codes`.
Number Of Retries	The maximum number of retries for a failed request. You can set this parameter to an integer from 0 to 10. We recommend that you set this parameter to 0, 1, or 2. If you set this parameter to 0, failed requests are not retried.
Retry Conditions	Select the appropriate conditions. You can select multiple conditions.
Retry Status Codes	Retry the request for responses with specific HTTP status codes. You can configure multiple HTTP status codes. Important You can configure Retry Status Codes only if you specify `retriable-status-codes` for Retry Conditions.
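
How the HTTP retry conditions combine with the number of retries can be sketched as a decision function in Python. This is a simplified model, not the gateway's logic. The function name `should_retry` and the `event` labels are hypothetical, and `"reset"` stands for a lost or reset connection or a read timeout.

```python
def should_retry(conditions, attempt, max_retries,
                 status_code=None, event=None, retry_status_codes=()):
    """Decide whether a failed HTTP request is retried (illustrative only).

    conditions: selected Retry Conditions, e.g. ["5xx", "retriable-status-codes"].
    attempt: number of retries already performed for this request.
    event: None, "reset", "connect-failure", or "refused-stream".
    """
    if attempt >= max_retries:
        return False  # Number Of Retries is exhausted (0 disables retries)
    selected = set(conditions)
    if event is not None:
        if event == "reset":
            # Covered by both the 5xx and the reset condition.
            return bool(selected & {"5xx", "reset"})
        if event in ("connect-failure", "refused-stream"):
            # 5xx includes connect-failure and refused-stream.
            return event in selected or "5xx" in selected
        return False
    if status_code is not None:
        if "5xx" in selected and 500 <= status_code <= 599:
            return True
        # Retry Status Codes only apply when retriable-status-codes is selected.
        if "retriable-status-codes" in selected and status_code in retry_status_codes:
            return True
    return False
```

For instance, `should_retry(["retriable-status-codes"], 0, 2, status_code=429, retry_status_codes=[429])` returns `True`, while the same call without `retry_status_codes` returns `False`.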

Header modification policy

The header modification feature lets you modify the headers in the original request before it is forwarded to the backend service, or in the response from the backend service before it is returned to the client.

Procedure

On the Add Policy tab, click the Header Modification card. In the Add Policy: Header Modification panel, configure the parameters.

Configuration item	Description
Enable	Specifies whether to enable the header modification policy. Enable: The gateway modifies the request and response headers according to this policy. Disable: The gateway does not modify the request and response headers.
Header Type	Select the header type. Request: Modifies the request header. Response: Modifies the response header.
Operation Type	Select the operation type. Add: Adds a header to the request or response. Note If the header to be added already exists, the new header value is appended to the existing value, separated by a comma (,). Modify: Modifies a specified header in the request or response. Note If the specified header does not exist, it is added with the specified header key and value. If the specified header exists, its value is overwritten. Delete: Deletes a specified header from the request or response.
Header Key	Enter the name of the request or response header.
Header Value	Enter the value of the request or response header.
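
The Add, Modify, and Delete semantics in the table above can be expressed as a small Python function. The function name `apply_header_rule` and the lowercase operation strings are hypothetical. Matching header names case-insensitively is an assumption based on HTTP header names being case-insensitive.

```python
def apply_header_rule(headers, operation, key, value=None):
    """Return a copy of headers with one rule applied (illustrative only).

    operation is "add", "modify", or "delete", mirroring Operation Type.
    """
    result = dict(headers)
    # Find an existing header whose name matches, ignoring case.
    existing = next((name for name in result if name.lower() == key.lower()), None)
    if operation == "add":
        if existing is not None:
            # Append to the existing value, separated by a comma.
            result[existing] = f"{result[existing]},{value}"
        else:
            result[key] = value
    elif operation == "modify":
        # Overwrite if present, otherwise add with the given key and value.
        result[existing if existing is not None else key] = value
    elif operation == "delete":
        if existing is not None:
            del result[existing]
    else:
        raise ValueError(f"unknown operation: {operation!r}")
    return result
```

Applying `"add"` with key `X-Env` and value `canary` to `{"X-Env": "prod"}` gives `{"X-Env": "prod,canary"}`, whereas `"modify"` with the same arguments gives `{"X-Env": "canary"}`.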

Plugin configurations

Click the Add Plugin tab.
In the Quick Navigation section, select the type of plugin to install or search for the plugin by name, and then click the plugin card:
- If the plugin is not installed, click Install and Configure in the dialog box that appears. Then, configure the plugin rules and set the status to enabled.
- If the plugin is already installed, configure the plugin rules and set the status to enabled in the dialog box that appears.
Click OK. You are redirected to the API attachment list, where you can view the attachment and enabled status of the plugin for the API.