AI fallback - API Gateway - Alibaba Cloud Documentation Center

If a model service for a Model API cannot respond because of an exception, fault, or high load, you can configure a fallback to a backup model. This configuration prevents response failures caused by service interruptions. This topic describes how to enable and configure fallback for a Model API.

What is AI fallback

AI fallback allows a Model API to switch to a backup model when the primary model service is unavailable. This improves API availability and prevents request failures caused by service exceptions or high load.

Model API supports multi-level fallback. You can enable and properly configure fallback to increase the success rate of AI requests.

The AI gateway lets you configure one or more fallback models. If the primary model service is unavailable, the gateway calls the fallback models in sequence. The gateway returns a response immediately after a successful call.

A fallback model includes the following configuration items:

Service name: The name of the backup model service. You can select a service from the list of services for the instance.
Model name: You can either use pass-through or specify a model name, such as Qwen-plus.

Trigger conditions

AI fallback is triggered when a call to a model service returns any HTTP 4xx or 5xx error status code.

Prerequisites

A gateway instance is created.
A service is created.

Configure AI fallback

Go to the Instance page in the AI Gateway console. In the top menu bar, select the region of the target instance, and then click the target Instance ID.

In the navigation pane on the left, click Model API, where you can enable fallback when you create or edit a Model API.

Create a Model API: Click Create Model API and enable Fallback on the Model API configuration page.
Edit a Model API: Click Edit in the Actions column for the target API. On the Model API configuration page, enable Fallback.

Configuration item		Description
Fallback		Enable this feature to add fallback services. The services are executed in descending order of priority. Note You can reuse the same service to create multiple fallback policies.
Fallback on backend service errors only		If you enable this option, fallback is triggered only when the backend service returns an error. If this option is disabled, fallback is also triggered when the gateway rate-limits or blocks the request.
Fallback List	Service Name	Select a fallback service.
	Model Name	The default value is pass-through. This passes the model name from the original request directly to the fallback model service.
First packet timeout		The timeout period, in milliseconds, for the first packet of a streaming response. This setting applies only to streaming responses. A value of 0 disables this feature. If you set a non-zero timeout, the gateway falls back to the backup service when the first packet response is too slow.