AI Gateway lets you add policies and configure plugins at the API level to improve the security, performance, and maintainability of your APIs.
Policy configuration changes take effect immediately. You do not need to republish the API.
Procedure
Go to AI Gateway Instance, select the region, and click the target instance ID.
In the navigation pane on the left, click LLM API. Then, click the name of the API to go to the API details page.
Click the Policies & Plugins tab. In the More Policies & Plugins section, select where you want to configure the policy or plugin (Inbound Processing or Outbound Processing), and then click Enable Policy/Plugin.
In the Enable Policy/Plugin panel, select and configure a policy or plugin. For more information, see Policy configurations and Plugin configurations.
Policy configurations
Concurrency control
Concurrency control rules count the total number of requests being processed by the gateway. When this number reaches a specified threshold, the gateway immediately blocks traffic. You can set this threshold to the maximum number of concurrent requests that your backend service can handle. This protects the availability of your backend service during periods of high concurrency.
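The counting behavior described above can be sketched as a small in-flight counter. This is only an illustration of the idea, not the gateway's implementation; the class name `ConcurrencyLimiter` and its methods are hypothetical.

```python
import threading


class ConcurrencyLimiter:
    """Hypothetical sketch: reject requests once max_concurrent are in flight."""

    def __init__(self, max_concurrent: int) -> None:
        self.max_concurrent = max_concurrent
        self._in_flight = 0
        self._lock = threading.Lock()

    def try_acquire(self) -> bool:
        """Return True and count the request, or False if the threshold is reached."""
        with self._lock:
            if self._in_flight >= self.max_concurrent:
                return False
            self._in_flight += 1
            return True

    def release(self) -> None:
        """Call when a request finishes so that its slot becomes free again."""
        with self._lock:
            if self._in_flight > 0:
                self._in_flight -= 1
```

A caller would wrap each request in `try_acquire()` and `release()`, and reject the request when `try_acquire()` returns `False`.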
Traffic shaping
Traffic shaping rules monitor the queries per second (QPS) of an API. When the QPS reaches a specified threshold, the gateway immediately blocks further requests. This prevents sudden traffic spikes from overwhelming the backend service and helps keep it available.
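One simple way to picture a QPS threshold is a fixed one-second window counter, sketched below. The gateway may measure QPS differently (for example, with a sliding window); the `QpsLimiter` class and its `clock` parameter are assumptions made for illustration.

```python
import time
from typing import Callable


class QpsLimiter:
    """Hypothetical sketch: fixed one-second window request counter."""

    def __init__(
        self, qps_threshold: int, clock: Callable[[], float] = time.monotonic
    ) -> None:
        self.qps_threshold = qps_threshold
        self._clock = clock
        self._window_start = clock()
        self._count = 0

    def allow(self) -> bool:
        """Return True if the request fits in the current one-second window."""
        now = self._clock()
        if now - self._window_start >= 1.0:
            # A new window begins: reset the counter.
            self._window_start = now
            self._count = 0
        if self._count >= self.qps_threshold:
            return False
        self._count += 1
        return True
```

The injectable `clock` exists only so that the behavior can be exercised without waiting for real time to pass.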
Circuit breaking policy
Circuit breaking rules monitor the response time or error rate of an API. When a threshold is reached, the gateway immediately trips the circuit and stops calling the unstable resource for a specified period. This protects the backend service from further load while it is unstable and helps keep it available. After the specified period elapses, the gateway resumes calls to the resource.
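The trip-and-resume cycle can be sketched as follows for the error-rate case. This is a simplified model under stated assumptions: the `CircuitBreaker` class, the `min_requests` guard, and the choice to reset statistics when calls resume are all hypothetical and are not taken from the gateway.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Hypothetical sketch: trip on error rate, resume after open_seconds."""

    def __init__(
        self,
        error_rate_threshold: float,
        min_requests: int,
        open_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.error_rate_threshold = error_rate_threshold
        self.min_requests = min_requests  # avoid tripping on tiny samples
        self.open_seconds = open_seconds
        self._clock = clock
        self._total = 0
        self._errors = 0
        self._opened_at: Optional[float] = None

    def allow(self) -> bool:
        """Return False while the circuit is open."""
        if self._opened_at is None:
            return True
        if self._clock() - self._opened_at >= self.open_seconds:
            # The break period is over: resume calls with fresh statistics.
            self._opened_at = None
            self._total = 0
            self._errors = 0
            return True
        return False

    def record(self, success: bool) -> None:
        """Record a call result and trip the circuit if the error rate is too high."""
        self._total += 1
        if not success:
            self._errors += 1
        if (
            self._total >= self.min_requests
            and self._errors / self._total >= self.error_rate_threshold
        ):
            self._opened_at = self._clock()
```

A response-time rule would follow the same shape, with `record` comparing measured latency against a threshold instead of counting errors.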
IP blacklist and whitelist policy
The IP blacklist and whitelist policy controls client access to services based on a pre-configured list of allowed (whitelist) or denied (blacklist) IP addresses.
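The allow and deny decision can be sketched with Python's standard `ipaddress` module. The function name `is_client_allowed`, the `mode` strings, and support for CIDR ranges are assumptions for illustration; check the console for the entry formats that the policy actually accepts.

```python
import ipaddress
from typing import Iterable


def is_client_allowed(client_ip: str, entries: Iterable[str], mode: str) -> bool:
    """Hypothetical sketch: decide access from a whitelist or a blacklist.

    entries may hold single addresses or CIDR ranges (an assumption).
    """
    addr = ipaddress.ip_address(client_ip)
    listed = any(
        addr in ipaddress.ip_network(entry, strict=False) for entry in entries
    )
    if mode == "whitelist":
        return listed  # only listed clients may pass
    if mode == "blacklist":
        return not listed  # listed clients are denied
    raise ValueError(f"unknown mode: {mode!r}")
```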
Timeout policy
AI Gateway provides API-level timeout settings. You can configure the maximum time the gateway waits for a response from a backend service for a specific API. If the gateway does not receive a response from the backend service within the specified time, it returns an HTTP status code of 504 (Gateway Timeout) to the client.
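The timeout behavior can be modeled roughly as shown below: wait for the backend up to a limit, and answer with 504 if the limit passes. The helper `call_with_timeout` and the sample `slow_backend` are hypothetical, and a real gateway would cancel the upstream connection rather than leave a worker thread running.

```python
import concurrent.futures
import time
from typing import Callable, Tuple

Response = Tuple[int, str]  # (HTTP status code, body)


def call_with_timeout(
    backend: Callable[[], Response], timeout_seconds: float
) -> Response:
    """Hypothetical sketch: return 504 if the backend does not answer in time."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(backend)
    try:
        return future.result(timeout=timeout_seconds)
    except concurrent.futures.TimeoutError:
        return 504, "Gateway Timeout"
    finally:
        # Do not block on the slow call; the client already has its answer.
        pool.shutdown(wait=False)


def slow_backend() -> Response:
    """Illustrative backend that takes longer than a short timeout."""
    time.sleep(0.3)
    return 200, "late"
```

For example, `call_with_timeout(slow_backend, 0.05)` yields the 504 response, while a generous timeout lets the backend's own response through.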
Retry policy
AI Gateway provides API-level retry settings that allow you to automatically retry failed requests. You can configure the conditions that trigger a retry, such as a connection failure, an unavailable backend service, or a specific HTTP status code.
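A retry rule of this kind might behave like the sketch below, which retries on a connection failure or on a configured status code. The function `call_with_retries` and its parameters are hypothetical, and the sketch omits details such as backoff between attempts, which the gateway may or may not apply.

```python
from typing import Callable, Collection, Tuple

Response = Tuple[int, str]  # (HTTP status code, body)


def call_with_retries(
    backend: Callable[[], Response],
    max_retries: int,
    retry_on_status: Collection[int],
) -> Response:
    """Hypothetical sketch: retry on connection errors or listed status codes."""
    if max_retries < 0:
        raise ValueError("max_retries must be non-negative")
    attempts = max_retries + 1  # the first call plus the retries
    for attempt in range(attempts):
        is_last = attempt == attempts - 1
        try:
            status, body = backend()
        except ConnectionError:
            if is_last:
                raise  # no attempts left: surface the failure
            continue
        if status in retry_on_status and not is_last:
            continue
        return status, body
    raise AssertionError("unreachable: the loop always returns or raises")
```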
Header modification policy
The header modification feature lets you modify the headers in the original request before it is forwarded to the backend service, or in the response from the backend service before it is returned to the client.
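As a rough illustration, a header modification rule can be thought of as a pure function over a header map, applied either to the request or to the response. The function `modify_headers` and its set and remove operations are assumptions; the operations that the policy actually offers are shown in the console. Header names are lowercased here because HTTP header names are case-insensitive.

```python
from typing import Dict, Iterable, Mapping, Optional


def modify_headers(
    headers: Mapping[str, str],
    set_headers: Optional[Mapping[str, str]] = None,
    remove_headers: Iterable[str] = (),
) -> Dict[str, str]:
    """Hypothetical sketch: return a modified copy of a header map."""
    # Normalize names so that matching ignores case.
    result = {name.lower(): value for name, value in headers.items()}
    for name in remove_headers:
        result.pop(name.lower(), None)
    for name, value in (set_headers or {}).items():
        result[name.lower()] = value  # add a header or overwrite its value
    return result
```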
Plugin configurations
Click the Add Plugin tab.
In the Quick Navigation section, select the type of plugin to install or search for the plugin by name, and then click the plugin card:
If the plugin is not installed, click Install and Configure in the dialog box that appears. Then, configure the plugin rules and set the status to enabled.
If the plugin is already installed, configure the plugin rules and set the status to enabled in the dialog box that appears.
Click OK. You are redirected to the API attachment list, where you can view the attachment and enabled status of the plugin for the API.
