
API Gateway: Connect to Vertex AI via AI Gateway

Last Updated: Feb 28, 2026

This topic describes how to connect to Google Vertex AI through Alibaba Cloud AI Gateway to centrally manage and call models, such as Gemini.

Background

Google Vertex AI is an enterprise-grade AI platform from Google Cloud that supports advanced large language models (LLMs), such as Gemini and PaLM. By connecting to Vertex AI through Alibaba Cloud AI Gateway, you gain:

  • Unified access: Call Vertex AI using the OpenAI-compatible protocol without adapting your code to the native APIs.

  • Flexible authentication: Authenticate with either a GCP Service Account or a Vertex AI Express Mode API key.

  • Protocol transformation: Let AI Gateway convert protocols automatically, or use the natively OpenAI-compatible Vertex AI endpoints.

  • High availability: Use the fallback capability of AI Gateway for cross-provider disaster recovery.

Prerequisites

  • You have created a Virtual Private Cloud (VPC) and attached a public NAT Gateway. For more information, see Virtual Private Cloud and vSwitch and Public network access.

  • You have created an AI Gateway instance. For more information, see Create a gateway instance.

  • You have obtained access credentials for your Google Cloud project, which can be a Service Account key or a Vertex AI Express Mode API key.

Scenarios overview

| Scenario | Description |
| --- | --- |
| Scenario 1: Connect using a GCP Service Account | Authenticate using a GCP Service Account key. This method supports all Vertex AI features and lets you choose the OpenAI protocol compatibility mode. |
| Scenario 2: Connect using Vertex AI Express Mode | Use the Express Mode of Vertex AI to connect directly with an API key. This simplifies the configuration. |
| Scenario 3: Connect using the Vertex AI REST API | Proxy the native REST API of Vertex AI. This is suitable for non-Gemini models, such as Imagen and Veo. |
| Scenario 4: Connect using the Google GenAI SDK | Use the official GenAI SDK from Google. Clients handle OAuth authentication, and the gateway only observes and meters requests. This method is not recommended. |
| Scenario 5: Connect for multimodal understanding | Use Gemini models through the Chat Completions API to understand and analyze multimodal content, such as images and videos. |
| Scenario 6: Connect for image generation | Use models, such as Gemini Nano Banana Pro, to generate, edit, and create variations of images using the OpenAI-compatible protocol. |


Scenario 1: Connect using GCP Service Account

A GCP Service Account is the standard authentication method for Google Cloud. It is suitable for enterprise-level applications that require full Vertex AI functionality.

1. Prepare a GCP Service Account

Before you start the configuration, you must create a Service Account in the Google Cloud Console and download its JSON key file.

  1. Log on to the Google Cloud Console.

  2. In the navigation pane on the left, choose IAM & Admin > Service Accounts.

  3. Click Create service account. Enter a name and description.

  4. Assign the Vertex AI User role or a higher-level role to the service account.

  5. On the KEYS tab, choose Add key > Create new key, select JSON, and download the key file.

Note The key file contains sensitive information. You must store it securely. The following code block shows a sample JSON format:

{
  "type": "service_account",
  "project_id": "your-project-id",
  "private_key_id": "your-private-key-id",
  "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
  "client_email": "your-sa@your-project.iam.gserviceaccount.com",
  "token_uri": "https://oauth2.googleapis.com/token"
}

For more information about how to manage GCP Service Accounts, see the Google documentation.
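Before pasting the key into the console, you can sanity-check that it contains the fields shown in the sample above. The following is a minimal local sketch; the required-field list mirrors the sample JSON in this topic, and the gateway performs its own validation, which may differ:

```python
import json

# Fields shown in the sample Service Account key above (assumed minimum set).
REQUIRED_FIELDS = {"type", "project_id", "private_key", "client_email", "token_uri"}

def check_service_account_key(raw_json: str) -> list:
    """Return the list of required fields missing from a key JSON string."""
    key = json.loads(raw_json)
    missing = sorted(REQUIRED_FIELDS - key.keys())
    if key.get("type") != "service_account":
        missing.append('type != "service_account"')
    return missing

sample = ('{"type": "service_account", "project_id": "your-project-id", '
          '"private_key": "...", "client_email": "sa@p.iam.gserviceaccount.com", '
          '"token_uri": "https://oauth2.googleapis.com/token"}')
print(check_service_account_key(sample))  # [] means all required fields are present
```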

2. Create an AI service

  1. Log on to the AI Gateway console.

  2. In the navigation pane on the left, choose Instances. Then, select a region from the top navigation bar.

  3. On the Instances page, click the ID of the target Instance.

  4. In the navigation pane on the left, click Services, and then click the Services tab.

  5. Click Create service. In the Create service panel, configure the AI service with the following parameters:

| Configuration item | Description |
| --- | --- |
| Service source | Select AI service. |
| Service name | Enter a service name, such as vertex-ai. |
| Large language model provider | Select Vertex AI. |
| Authentication method | Select GCP Service Account. |
| GCP Service Account KEY | Paste the full Service Account JSON key content. |
| vertexLocation | The Vertex AI service region. Default: global. You can also choose other regions, such as us-central1 or europe-west4. |
| vertexProjectId | Your GCP project ID (read-only). The system automatically parses the project_id field from the JSON key. |
| OpenAI protocol compatibility mode | Select a protocol conversion mode. For more information, see the following table. |

OpenAI protocol compatibility modes:

| Option | Description | Use case |
| --- | --- | --- |
| AI Gateway conversion | The gateway converts OpenAI-formatted requests to the Vertex AI native format during request processing and converts responses back to the OpenAI format. | When you need Vertex AI native features, such as specific security settings. |
| Vertex AI native compatible endpoint | Uses the OpenAI-compatible Chat Completions endpoint of Vertex AI, which accepts OpenAI-formatted requests directly. | When you need Vertex AI-specific fields, such as extra_content. |

Important:

  • If the JSON key format is incorrect or lacks required fields, the system displays an error message.

  • The vertexProjectId field is read-only. The frontend automatically parses the project_id field from the JSON key.

  • When you use the Vertex AI native compatible endpoint mode, you must add the provider prefix google/ to the model parameter in the request body. For example, you must change gemini-3-flash-preview to google/gemini-3-flash-preview.

For more information about the supported parameters and models in the Vertex AI native compatible endpoint mode, see the Google documentation.
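The provider-prefix rule can be captured in a small helper so that model names do not need to be rewritten by hand. The following is a sketch; the helper name is hypothetical, and only the google/ prefix convention comes from this topic:

```python
def to_native_endpoint_model(model: str, publisher: str = "google") -> str:
    """Add the provider prefix required by the Vertex AI native compatible
    endpoint mode, leaving already-prefixed names unchanged."""
    prefix = publisher + "/"
    return model if model.startswith(prefix) else prefix + model

print(to_native_endpoint_model("gemini-3-flash-preview"))  # google/gemini-3-flash-preview
```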

3. Create a Model API

  1. Log on to the AI Gateway console.

  2. In the navigation pane on the left, choose Instances. Then, select a region from the top navigation bar.

  3. On the Instances page, click the ID of the target Instance.

  4. In the navigation pane on the left, click Model API, and then click Create Model API.

  5. In the Create Model API panel, configure the following basic settings:

    • Domain name: Configure a custom domain name. If you use the default environment domain, rate limiting may be triggered.

    • Base Path: The base path for the API, such as /gemini.

    • Model type: Select Text generation.

    • Protocol: Select OpenAI-compatible.

    • Routing configuration: Select the /v1/chat/completions route.

    • AI request monitoring: Enable this feature.

    • Service model: Select Single-model service.

    • Service list:

      • Service name: Select the Vertex AI service that you configured in the previous step.

      • Model name: Select pass-through or specify a model, such as gemini-2.0-flash.

  6. Click OK.

4. Debug the Model API

  1. In the Actions column of the target Model API, click Test.

  2. In the Test panel, enter the name of a Gemini-series model, such as gemini-3-flash-preview, in the Model name field. Then, you can use the Model response tab to chat with the large language model.

Important The Model response tab uses the /v1/chat/completions chat interface. To test other interfaces, use the cURL command or Raw output options in the panel, or call the API directly with cURL or an SDK.

cURL example:

curl -X POST "http://your-gateway-domain/gemini/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-3-flash-preview",
    "messages": [
      {"role": "user", "content": "Hello. Please introduce yourself."}
    ],
    "stream": false
  }'
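The same request can also be issued from Python with only the standard library. The following sketch builds a request whose body mirrors the cURL example above; the gateway domain is a placeholder:

```python
import json
import urllib.request

def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a /v1/chat/completions request with the same body as the
    cURL example above."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request(
    "http://your-gateway-domain/gemini",  # placeholder gateway address
    "gemini-3-flash-preview",
    "Hello. Please introduce yourself.",
)
# Uncomment to send once the gateway is reachable:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```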

Scenario 2: Connect using Vertex AI Express Mode

Vertex AI Express Mode is a quick-connect method provided by Google. You can use an API key to directly access Vertex AI without needing to set up a Service Account. This mode is suitable for rapid validation and lightweight use cases.

1. Get a Vertex AI Express Mode API key

For more information, see the Vertex AI in express mode documentation.

Note Express Mode API keys may have usage limits. For more information, see the Google documentation.

2. Create an AI service

  1. Log on to the AI Gateway console.

  2. In the navigation pane on the left, choose Instances. Then, select a region from the top navigation bar.

  3. On the Instances page, click the ID of the target Instance.

  4. In the navigation pane on the left, click Services, and then click the Services tab.

  5. Click Create service. In the Create service panel, configure the AI service with the following parameters:

| Configuration item | Description |
| --- | --- |
| Service source | Select AI service. |
| Service name | Enter a service name, such as vertex-express. |
| Large language model provider | Select Vertex AI. |
| Authentication method | Select Vertex AI Express Mode. |
| API key | Enter the Vertex AI Express Mode API key that you obtained in the previous step. |

Note The configuration for Express Mode is similar to that for other large language model providers, such as OpenAI and Claude. You only need to enter the API key.

3. Create a Model API

  1. In the navigation pane on the left, click Model API, and then click Create Model API.

  2. In the Create Model API panel, configure the following basic settings:

    • Domain name: Configure a custom domain name. If you use the default environment domain, rate limiting may be triggered.

    • Base Path: The base path for the API, such as /gemini.

    • Model type: Select Text generation.

    • Protocol: Select OpenAI-compatible.

    • Routing configuration: Select the /v1/chat/completions route.

    • AI request monitoring: Enable this feature.

    • Service model: Select Single-model service.

    • Service list:

      • Service name: Select the Vertex AI Express service that you configured in the previous step.

      • Model name: Select pass-through.

  3. Click OK.

4. Test the Model API

  1. In the Actions column of the target Model API, click Test.

  2. In the Test panel, select a Gemini model to test the chat function.


Scenario 3: Connect using Vertex AI REST API

This scenario describes how to proxy the native REST API of Vertex AI through AI Gateway. This method applies to non-Gemini models, such as Imagen 4 and Veo 3. Unlike the first two scenarios, this mode uses the native protocol of Vertex AI, not the OpenAI-compatible protocol.

Use cases

  • Use image generation models, such as Imagen 4.

  • Use video generation models, such as Veo 3.

  • Use third-party models that are hosted in the Vertex AI Model Garden.

  • Call the native REST API of Vertex AI directly.

1. Create an AI service

The procedure for creating an AI service is similar to that in Scenario 1 or Scenario 2, but with the following key differences:

  1. Log on to the AI Gateway console.

  2. In the navigation pane on the left, choose Instances. Then, select a region from the top navigation bar.

  3. On the Instances page, click the ID of the target Instance.

  4. In the navigation pane on the left, click Services, and then click the Services tab.

  5. Click Create service. In the Create service panel, configure the AI service with the following parameters:

| Configuration item | Description |
| --- | --- |
| Service source | Select AI service. |
| Service name | Enter a service name, such as vertex-rest-api. |
| Large language model provider | Select Vertex AI. |
| Authentication method | Select GCP Service Account or Vertex AI Express Mode. For more information, see Scenario 1 and Scenario 2. |
| Model protocol | Select Native Protocol. |

Important:

  • You must select Native Protocol. This means that you will use the native REST API format of Vertex AI.

  • After you select Native Protocol, the OpenAI protocol compatibility option does not appear because no protocol conversion occurs.

2. Create a Model API

  1. In the navigation pane on the left, click Model API, and then click Create Model API.

  2. In the Create Model API panel, configure the following basic settings:

    • Domain name: Configure a custom domain name. If you use the default environment domain, rate limiting may be triggered.

    • Base Path: The base path for the API, such as /vertex-api.

    • Use case: Select a use case as needed. This setting is used only for categorization in the console and does not affect functionality.

      • Text generation: For text models, such as Llama.

      • Image generation: For image generation models, such as Imagen 4.

      • Video generation: For video generation models, such as Veo 3.

    • Protocol: Select VertexAI.

    • Routing configuration: After you select the VertexAI protocol, the gateway automatically configures the following two built-in routes:

      | Route path | Purpose |
      | --- | --- |
      | /{api-version}/publishers/{publisher}/models/{model}:{action} | Path format used by Express Mode. |
      | /{api-version}/projects/{project}/locations/{location}/publishers/{publisher}/models/{model}:{action} | Path format used by Service Account mode. |

    • Service model: Select Single-model service.

    • Service list:

      • Service name: Select the AI service that you created in the previous step, such as vertex-rest-api.

      • Model name: Select pass-through.

  3. Click OK.

3. Call examples

After the configuration is complete, you can make calls using the native REST API format of Vertex AI.

Generate an image using Imagen 4 (Express Mode):

curl -X POST "https://your-gateway-domain/vertex-api/v1/publishers/google/models/imagen-4.0-generate-preview-06-06:predict" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "instances": [
      {
        "prompt": "A cute orange cat napping in the sunshine."
      }
    ],
    "parameters": {
      "sampleCount": 1,
      "aspectRatio": "1:1"
    }
  }'

Generate a video using Veo 3 (Express Mode):

curl -X POST "https://your-gateway-domain/vertex-api/v1/publishers/google/models/veo-3.0-generate-preview:predictLongRunning" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "instances": [
      {
        "prompt": "Slow-motion video of an orange cat running on grass."
      }
    ],
    "parameters": {
      "aspectRatio": "16:9",
      "durationSeconds": 8
    }
  }'
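The predictLongRunning call returns an operation rather than the finished video. Following the Vertex AI long-running operation pattern, the result is then polled through the model's fetchPredictOperation action using the returned operation name. The sketch below only builds that poll request; the action name and body shape are assumptions to verify against the Veo documentation:

```python
import json

def build_poll_request(operation_name: str) -> tuple:
    """Build the path and JSON body to poll a Veo long-running operation.

    The predictLongRunning response carries the operation name, e.g.
    {"name": "projects/.../operations/1234"}; that name is echoed back
    in the poll body (assumed shape; see the Veo documentation)."""
    path = ("/vertex-api/v1/publishers/google/models/"
            "veo-3.0-generate-preview:fetchPredictOperation")
    body = json.dumps({"operationName": operation_name})
    return path, body

path, body = build_poll_request("projects/your-project/operations/1234")
print(path)
```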

Supported models

| Model category | Example models | Publisher |
| --- | --- | --- |
| Image generation | Imagen 4 (imagen-4.0-generate-preview-06-06) | google |
| Video generation | Veo 3 (veo-3.0-generate-preview) | google |
| Text generation | Llama 4 (llama-4-maverick-17b-128e-instruct-maas) | meta |

Note For a complete list of supported models, see the Google documentation.


Scenario 4: Connect using Google GenAI SDK

Not recommended: In this mode, the gateway acts as a transparent proxy. It cannot use the authentication information that is configured in AI services. Functionality is limited. We recommend that you use Scenario 1, Scenario 2, or Scenario 3 for full feature support.

If you develop applications using the official google-genai SDK from Google, you can use this mode to proxy SDK requests through AI Gateway. This enables request monitoring and metering.

Important The GenAI SDK uses OAuth 2.0 for authentication. The client first contacts the Google authentication server and then calls the inference API. Therefore, the authentication information that is configured in AI services is not used. The gateway acts only as a transparent proxy for monitoring and metering inference API calls.

1. Create a DNS service

Because the SDK handles authentication, you must create a DNS-type service that points directly to the Vertex AI backend.

  1. Log on to the AI Gateway console.

  2. In the navigation pane on the left, choose Instances. Then, select a region from the top navigation bar.

  3. On the Instances page, click the ID of the target Instance.

  4. In the navigation pane on the left, click Services, and then click the Services tab.

  5. Click Create service. In the Create service panel, configure the DNS service with the following parameters:

| Configuration item | Description |
| --- | --- |
| Service source | Select DNS domain name. |
| Service name | Enter a service name, such as vertex-dns. |
| Service endpoint | Enter the Vertex AI API endpoint in one of the following formats: aiplatform.googleapis.com:443 (global endpoint) or {location}-aiplatform.googleapis.com:443 (regional endpoint, such as us-central1-aiplatform.googleapis.com:443). |
| TLS mode | Select One-way. |
| SNI | Defaults to the domain name that you entered. No manual change is needed. |

Note

  • Use the global endpoint aiplatform.googleapis.com:443 for Express Mode.

  • Use the regional endpoint {location}-aiplatform.googleapis.com:443 for Service Account mode. Fill in the region where your GCP project resides.

2. Create a Model API

  1. In the navigation pane on the left, click Model API, and then click Create Model API.

  2. In the Create Model API panel, configure the following basic settings:

    • Domain name: Configure a custom domain name. If you use the default environment domain, rate limiting may be triggered.

    • Base Path: The base path for the API, such as /vertex. This parameter is optional.

    • Model type: Select Text generation or Image generation.

    • Protocol: Select VertexAI.

    • Routing configuration: After you select the VertexAI protocol, the gateway automatically configures the following two built-in routes:

      | Route path | Purpose |
      | --- | --- |
      | /{api-version}/publishers/{publisher}/models/{model}:{action} | Path format used by Express Mode. |
      | /{api-version}/projects/{project}/locations/{location}/publishers/{publisher}/models/{model}:{action} | Path format used by Service Account mode. |

    • Service model: Select Single-model service.

    • Service list:

      • Service name: Select the DNS service that you created in the previous step, such as vertex-dns.

      • Model name: Select pass-through.

  3. Click OK.

3. Configure the GenAI SDK

In your Python code, set the base_url of the GenAI SDK to the AI Gateway address:

from google import genai

# Configure API key (Express Mode) or use default credentials (Service Account)
API_KEY = "your-api-key"

# Gateway address = domain name + Base Path (if configured)
# Example: If domain name is your-gateway-domain.com and Base Path is /vertex
base_url = "https://your-gateway-domain.com/vertex"

# Create a client to proxy requests to AI Gateway
client = genai.Client(
    vertexai=True,  # Currently supports only vertexai=True
    api_key=API_KEY,
    http_options={
        'base_url': base_url,
    },
)

# Call the model
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="Hello. Please introduce yourself."
)

print(response.text)

Configuration notes:

| Parameter | Description |
| --- | --- |
| vertexai | Must be set to True to indicate that the Vertex AI backend is used. |
| api_key | The API key for Express Mode. Leave it empty to use the default Service Account credentials. |
| base_url | The AI Gateway access address, in the format https://{domain-name}{Base Path}. |

Using Service Account mode:

If you authenticate using a Service Account, you can configure credentials using environment variables:

Set the path to the Service Account key file:

export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account-key.json"

Then configure the client in Python:

from google import genai

# No api_key needed in Service Account mode
base_url = "https://your-gateway-domain.com/vertex"

client = genai.Client(
    vertexai=True,
    http_options={
        'base_url': base_url,
    },
)

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="Hello. Please introduce yourself."
)

print(response.text)

4. Verify the connection

After the configuration is complete, you can view request metrics in the AI Gateway console:

  1. In the navigation pane on the left, click Observability & Analysis > Log Center.

  2. Review the requests that are sent through the GenAI SDK, including the following information:

    • Request path

    • Response status code

    • Response time

    • Token usage (if included in the response)

Note Because authentication occurs on the client side, the gateway cannot proxy authentication. For full AI Gateway functionality, we recommend that you use Scenario 1, Scenario 2, or Scenario 3.


Scenario 5: Connect for multimodal understanding

This scenario describes how to use Gemini models through the Chat Completions API to understand and analyze multimodal content, such as images and videos. Multimodal understanding uses the text-based chat interface. Therefore, you can reuse the AI service and Model API that you created in Scenario 1 or Scenario 2.

Key configuration points

  • AI service: Select Vertex AI as the large language model provider. You can use either a GCP Service Account or Express Mode.

  • Model API: Select Text generation as the use case. Select OpenAI-compatible as the protocol. Select the /v1/chat/completions route.

Image understanding

Method 1: Use an image URL

from openai import OpenAI

client = OpenAI(
    base_url="http(s)://{your-domain}/{model-api-base-path}/v1",
    api_key="YOUR_API_KEY"  # the SDK requires a key value; the gateway may ignore it
)

completion = client.chat.completions.create(
    model="gemini-3-flash-preview",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "How do I solve this problem?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://img.alicdn.com/imgextra/i1/O1CN01gDEY8M1W114Hi3XcN_!!6000000002727-0-tps-1024-406.jpg"
                    },
                },
            ],
        }
    ],
)

print(completion.choices[0].message.content)

Method 2: Use Base64-encoded images

import base64
from openai import OpenAI

client = OpenAI(
    base_url="http(s)://{your-domain}/{model-api-base-path}/v1",
    api_key="YOUR_API_KEY"  # the SDK requires a key value; the gateway may ignore it
)

# Encode a local image as Base64
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

# Specify the image path
image_path = "<your-image-path>"
base64_image = encode_image(image_path)

completion = client.chat.completions.create(
    model="gemini-3-flash-preview",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{base64_image}",
                    },
                },
            ],
        }
    ],
)

print(completion.choices[0].message.content)

Supported image formats: image/png, image/jpeg, image/webp, image/heic, and image/heif

For more information about model-specific capabilities, see the Google documentation.

Video understanding

Note: Although the OpenAI API specification defines only the image_url type, you can still send video content. AI Gateway handles compatibility conversion and correctly passes video content to Vertex AI.

Method 1: Use a video URL

from openai import OpenAI

client = OpenAI(
    base_url="http(s)://{your-domain}/{model-api-base-path}/v1",
    api_key="YOUR_API_KEY"  # the SDK requires a key value; the gateway may ignore it
)

completion = client.chat.completions.create(
    model="gemini-3-flash-preview",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Briefly describe the video content."},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241115/cqqkru/1.mp4"
                    },
                },
            ],
        }
    ],
)

print(completion.choices[0].message.content)

Method 2: Use Base64-encoded videos

import base64
from openai import OpenAI

client = OpenAI(
    base_url="http(s)://{your-domain}/{model-api-base-path}/v1",
    api_key="YOUR_API_KEY"  # the SDK requires a key value; the gateway may ignore it
)

# Encode a local video as Base64
def encode_video(video_path):
    with open(video_path, "rb") as video_file:
        return base64.b64encode(video_file.read()).decode("utf-8")

# Specify the video path
video_path = "example.mp4"
base64_video = encode_video(video_path)

completion = client.chat.completions.create(
    model="gemini-3-flash-preview",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Briefly describe the video content."},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:video/mp4;base64,{base64_video}",
                    },
                },
            ],
        }
    ],
)

print(completion.choices[0].message.content)

Supported video formats: video/x-flv, video/quicktime, video/mpeg, video/mpegs, video/mpg, video/mp4, video/webm, video/wmv, and video/3gpp

For more information about model-specific capabilities, see the Google documentation.

Code Description

| Parameter | Description |
| --- | --- |
| model | A Gemini model that supports multimodal understanding, such as gemini-3-flash-preview or gemini-2.0-flash. |
| content | An array of message content that contains both text and media (images or videos). |
| type: "text" | The text prompt that describes the analysis you want the model to perform on the media. |
| type: "image_url" | The media content. Supports URLs or Base64-encoded images and videos. |


Scenario 6: Connect for image generation

This scenario describes how to use Vertex AI models, such as Gemini Nano Banana Pro (model ID: gemini-3-pro-image-preview), to generate, edit, and create variations of images using the OpenAI-compatible protocol.

Key configuration points

The configuration is similar to that in Scenario 1 or Scenario 2, but with the following key differences:

| Configuration item | Image generation configuration |
| --- | --- |
| AI service – Large language model provider | Select Vertex AI. |
| AI service – Authentication method | Either GCP Service Account or Express Mode works. |
| AI service – OpenAI protocol compatibility mode (GCP Service Account only) | Select AI Gateway conversion. |
| Model API – Use case | Select Image generation. |
| Model API – Protocol | Select OpenAI-compatible. |
| Model API – Routing configuration | Select /v1/images/generations, /v1/images/edits, or /v1/images/variations as needed. |

Important: When you generate, edit, or create variations of images using the GCP Service Account mode, you must select AI Gateway conversion for the OpenAI protocol compatibility mode. The native compatible endpoints of Vertex AI support only the Chat Completions interface. The protocol conversion of AI Gateway supports image-related interfaces.

Interface support matrix

| Interface | Recommended OpenAI SDK call | image_url (non-file upload) support |
| --- | --- | --- |
| /v1/images/generations | client.images.generate() (JSON) | Not applicable. This interface usually does not require image_url. |
| /v1/images/edits | client.images.edit(image=&lt;file&gt;) (multipart) | The OpenAI SDK does not support image_url. To use image_url, send an HTTP JSON request. Multipart file uploads through the SDK work normally. |
| /v1/images/variations | client.images.create_variation(image=&lt;file&gt;) (multipart) | The OpenAI SDK does not support image_url. To use image_url, send an HTTP JSON request. Multipart file uploads through the SDK work normally. |

Image generation (/v1/images/generations)

After the configuration is complete, you can call the image generation interface using the OpenAI Python SDK:

from openai import OpenAI
from PIL import Image
from io import BytesIO
import base64

# Create a client pointing to the AI Gateway address
client = OpenAI(
    base_url="http(s)://{your-domain}/{model-api-base-path}/v1",
    api_key="YOUR_API_KEY"  # the SDK requires a key value; the gateway may ignore it
)

# Call the image generation interface
response = client.images.generate(
    model="gemini-3-pro-image-preview",  # Gemini Nano Banana Pro model
    prompt="A cute orange cat napping in the sunshine.",
    size="1024x1024",
    n=1
)

# Get the generated image (Base64 encoded)
image_data = response.data[0].b64_json
if image_data:
    # Decode and save the image locally
    image = Image.open(BytesIO(base64.b64decode(image_data)))
    image.save("output_folder/example-image-cat.png")
    print("Image saved to output_folder/example-image-cat.png")

Code notes:

| Parameter | Description |
| --- | --- |
| base_url | The AI Gateway access address, in the format http(s)://{domain-name}/{Base Path}. |
| model | The model ID. The model ID of Gemini Nano Banana Pro is gemini-3-pro-image-preview. |
| prompt | The prompt that describes the image content that you want to generate. |
| size | The size of the generated image, such as 1024x1024. For more information, see the following size parameter mapping. |
| n | The number of images to generate. |

Size parameter mapping:

The OpenAI API uses resolution, such as 1024x1024, while Vertex AI uses aspect ratio (aspectRatio) and image size level (imageSize). Because of this difference, AI Gateway automatically performs parameter conversion.

Aspect ratios supported by Vertex AI: 1:1, 3:2, 2:3, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, and 21:9

Resolution levels supported by Vertex AI: 1k, 2k, and 4k

| OpenAI size parameter | Vertex AI aspectRatio | Vertex AI imageSize |
| --- | --- | --- |
| 256x256 | 1:1 | 1k |
| 512x512 | 1:1 | 1k |
| 1024x1024 | 1:1 | 1k |
| 1792x1024 | 16:9 | 2k |
| 1024x1792 | 9:16 | 2k |
| 2048x2048 | 1:1 | 2k |
| 4096x4096 | 1:1 | 4k |
| 1536x1024 | 3:2 | 2k |
| 1024x1536 | 2:3 | 2k |
| 1024x768 | 4:3 | 1k |
| 768x1024 | 3:4 | 1k |
| 1280x1024 | 5:4 | 1k |
| 1024x1280 | 4:5 | 1k |
| 2560x1080 | 21:9 | 2k |
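The mapping can also be kept client-side as a lookup table, for example to validate a size before sending a request. The following sketch mirrors the mapping above; the function name is hypothetical:

```python
# OpenAI size -> (Vertex AI aspectRatio, imageSize), mirroring the mapping above.
SIZE_MAP = {
    "256x256": ("1:1", "1k"),
    "512x512": ("1:1", "1k"),
    "1024x1024": ("1:1", "1k"),
    "1792x1024": ("16:9", "2k"),
    "1024x1792": ("9:16", "2k"),
    "2048x2048": ("1:1", "2k"),
    "4096x4096": ("1:1", "4k"),
    "1536x1024": ("3:2", "2k"),
    "1024x1536": ("2:3", "2k"),
    "1024x768": ("4:3", "1k"),
    "768x1024": ("3:4", "1k"),
    "1280x1024": ("5:4", "1k"),
    "1024x1280": ("4:5", "1k"),
    "2560x1080": ("21:9", "2k"),
}

def validate_size(size: str) -> tuple:
    """Return (aspectRatio, imageSize) for a supported OpenAI size,
    or raise ValueError for an unmapped one."""
    try:
        return SIZE_MAP[size]
    except KeyError:
        raise ValueError(f"Unsupported size: {size}") from None

print(validate_size("1792x1024"))  # ('16:9', '2k')
```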

cURL example:

curl -X POST "https://your-gateway-domain/gemini-image/v1/images/generations" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-3-pro-image-preview",
    "prompt": "A cute orange cat napping in the sunshine.",
    "size": "1024x1024",
    "n": 1
  }'

Image editing (/v1/images/edits)

After the configuration is complete, you can use the OpenAI Python SDK to upload the original image and edit it using multipart:

from openai import OpenAI
from PIL import Image
from io import BytesIO
import base64

client = OpenAI(
    base_url="http(s)://{your-domain}/{model-api-base-path}/v1",
    api_key="YOUR_API_KEY"
)

# Upload the original image using the file upload method (recommended by OpenAI SDK)
with open("assets/image-cat.png", "rb") as image_file:
    response = client.images.edit(
        model="gemini-3-pro-image-preview",
        prompt="Change the cat breed to Ragdoll while keeping the same pose.",
        image=image_file,
        size="1024x1024",
        n=1
    )

image_data = response.data[0].b64_json
if image_data:
    image = Image.open(BytesIO(base64.b64decode(image_data)))
    image.save("output_folder/edited-image-cat.png")
    print("Image editing completed. Saved to output_folder/edited-image-cat.png")

Pass image_url using HTTP requests (SDK does not support this syntax)

The /v1/images/edits interface supports data URLs in HTTP JSON requests. The following example shows how to convert a local image to Base64 and call the interface directly using curl:

# 1) Convert a local image to Base64 (remove line breaks)
BASE64_IMAGE=$(base64 < assets/image-cat.png | tr -d '\n')

# 2) Call the edits interface using image_url (data URL)
curl -X POST "https://your-gateway-domain/{model-api-base-path}/v1/images/edits" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d "{
    \"model\": \"gemini-3-pro-image-preview\",
    \"prompt\": \"Change the cat breed to Ragdoll while keeping the same pose.\",
    \"image_url\": {
      \"url\": \"data:image/png;base64,${BASE64_IMAGE}\"
    },
    \"size\": \"1024x1024\",
    \"n\": 1
  }"

Image variations (/v1/images/variations)

After the configuration is complete, you can use the OpenAI Python SDK to generate image variations using multipart:

from openai import OpenAI
from PIL import Image
from io import BytesIO
import base64

client = OpenAI(
    base_url="http(s)://{your-domain}/{model-api-base-path}/v1",
    api_key="YOUR_API_KEY"
)

with open("assets/image-cat.png", "rb") as image_file:
    response = client.images.create_variation(
        model="gemini-3-pro-image-preview",
        image=image_file,
        size="1792x1024",
        n=1
    )

image_data = response.data[0].b64_json
if image_data:
    image = Image.open(BytesIO(base64.b64decode(image_data)))
    image.save("output_folder/variation-image-cat.png")
    print("Image variation completed. Saved to output_folder/variation-image-cat.png")

If you prefer to pass image_url (a URL or data URL) instead of uploading a file, send an HTTP JSON request directly:

# 1) Convert a local image to Base64 (remove line breaks)
BASE64_IMAGE=$(base64 < assets/image-cat.png | tr -d '\n')

# 2) Call the variations endpoint using image_url (data URL)
curl -X POST "https://your-gateway-domain/{model-api-base-path}/v1/images/variations" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d "{
    \"model\": \"gemini-3-pro-image-preview\",
    \"image_url\": {
      \"url\": \"data:image/png;base64,${BASE64_IMAGE}\"
    },
    \"size\": \"1792x1024\",
    \"n\": 1
  }"

Supported image-generation models

| Model name | Model ID | Description |
| --- | --- | --- |
| Gemini Nano Banana Pro | gemini-3-pro-image-preview | Google’s latest image-generation model. Supports high-quality image generation. |
| Gemini Nano Banana | gemini-2.5-flash-image | Google’s previous-generation image-generation model. Supports high-quality image generation. |

Note For a complete list of supported image-generation models, see the Google documentation.


Advanced configuration

Multi-provider high availability

To improve availability, you can configure Vertex AI and other providers in a fallback relationship:

  1. Create multiple AI services, for example, Vertex AI and Alibaba Cloud Model Studio.

  2. When you create a Model API, select Multi-model service (by model name).

  3. Configure routing rules:

    • Service name: vertex-ai. Model name pattern: gemini*

    • Service name: bailian. Model name pattern: qwen*

  4. Enable Fallback and select a backup service.
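For intuition, the model-name patterns above behave like glob-style matches on the requested model name. The following local sketch is illustrative only (the real matching and fallback happen inside AI Gateway); it shows how a request's model name would select a service, with a hypothetical backup service used when the primary is unhealthy:

```python
from fnmatch import fnmatch

# Routing rules mirroring the configuration above (illustrative only; the
# gateway evaluates these server-side).
ROUTES = [
    ("vertex-ai", "gemini*"),
    ("bailian", "qwen*"),
]
FALLBACK_SERVICE = "bailian"  # hypothetical backup service chosen in step 4

def route(model_name: str, healthy: set) -> str:
    """Return the service a request would be sent to, honoring fallback."""
    for service, pattern in ROUTES:
        if fnmatch(model_name, pattern):
            if service in healthy:
                return service
            return FALLBACK_SERVICE  # primary unavailable: fall back
    raise ValueError(f"no route for model {model_name!r}")
```

For example, `route("gemini-3-pro-preview", {"vertex-ai", "bailian"})` selects vertex-ai, while the same model name falls back to the backup service when vertex-ai is unhealthy.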

Security best practices

  1. Credential management: Store Service Account keys or API keys in Alibaba Cloud KMS instead of in plain text to reduce the risk of credential exposure.

  2. Access control: Configure IP address whitelists or other access control policies to restrict the sources of API access.

  3. Rate limiting: Set reasonable rate limiting rules to prevent unexpectedly high costs.


FAQ

401 or 403 authentication failure

  • Check the integrity of the Service Account JSON key. Make sure that it includes required fields, such as private_key, client_email, and token_uri.

  • Confirm that the Service Account has the Vertex AI User role or a higher-level role.

  • For Express Mode, confirm that the API key is valid and has not expired.

404 model not found

  • Confirm that the model name is correct, such as gemini-3-flash-preview or gemini-3-pro-preview.

  • Check whether the model is supported in the selected region.

JSON parsing failure

  • Make sure that you pasted the complete content of the Service Account JSON file. Do not omit the opening { or closing }.

  • You can verify the JSON format using an online JSON validator.
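Instead of an online validator, you can also run a quick local sanity check. The sketch below verifies that the pasted key parses as JSON and contains the fields named above; the required-field list reflects the standard GCP Service Account key format and is illustrative:

```python
import json

# Fields expected in a standard GCP Service Account JSON key (illustrative subset).
REQUIRED_FIELDS = {"type", "project_id", "private_key", "client_email", "token_uri"}

def check_service_account_key(raw_json: str) -> list:
    """Return a list of problems found in a Service Account JSON key (empty list = OK)."""
    try:
        key = json.loads(raw_json)
    except json.JSONDecodeError as e:
        return [f"invalid JSON: {e}"]
    problems = [f"missing field: {f}" for f in sorted(REQUIRED_FIELDS - key.keys())]
    if key.get("type") not in (None, "service_account"):
        problems.append(f"unexpected type: {key['type']}")
    return problems
```

Running `check_service_account_key(open("key.json").read())` on a complete key returns an empty list; a truncated paste reports the missing braces or fields immediately.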

Request timeout

  • Check whether the VPC is configured with a public NAT Gateway.

  • Confirm the network connectivity to the target region.


Supported models

Note For a complete list of supported models, see the Google Vertex AI documentation.


Summary

With Alibaba Cloud AI Gateway, you can easily connect to Google Vertex AI and benefit from unified API management, flexible authentication, and enterprise-grade high availability:

  • GCP Service Account mode: This mode is best for enterprise applications that require full Vertex AI features. It supports OpenAI protocol conversion.

  • Vertex AI Express Mode: This mode provides a quick connection using an API key. It features a simple configuration and is ideal for rapid validation.

  • Vertex AI REST API mode: This mode proxies the native REST API of Vertex AI. It works with all models and is commonly used for non-Gemini models, such as Imagen, Veo, and Claude.

  • Multimodal understanding: You can use Gemini models through the Chat Completions API to understand and analyze multimodal content, such as images and videos.

  • Image generation, editing, and variations: You can use models, such as Gemini Nano Banana Pro. The protocol conversion of AI Gateway enables OpenAI compatibility.

Regardless of the connection method you choose, AI Gateway provides reliable proxy services and rich observability features.