Use AI Gateway to access a deployed model in PAI

Last Updated: Mar 25, 2025

This topic describes how to use the AI Gateway feature of Cloud-native API Gateway to access a model deployed in Alibaba Cloud Platform for AI (PAI).

Prerequisites

  • PAI is activated and necessary permissions are granted. For more information, see Preparations.

  • A Cloud-native API Gateway instance is created. For more information, see Create a gateway instance.

Important

If you use a private endpoint, make sure that your instance resides in the same virtual private cloud (VPC) as the PAI model to be accessed.

Use PAI to deploy a DeepSeek-R1 model

Deploy the model

  1. Go to the Model Gallery page.

    1. Log on to the PAI console. Select a region in the upper left corner.

    2. From the left-side navigation pane, choose Workspaces and click the name of the desired workspace.

    3. In the left-side navigation pane, choose QuickStart > Model Gallery.

  2. Choose a model.

    On the Model Gallery page, find the model that you want to deploy and click it to go to the model details page.

    For example, DeepSeek-R1-Distill-Qwen-7B is a small distilled model that is ideal for quick practice: it has low computational resource requirements and can be deployed by using free trial resources.

    You can find more DeepSeek models in the model list, where you can also learn about their deployment methods and token limits.

  3. Configure deployment parameters.

    Click Deploy in the upper right corner. The system provides default deployment parameters, which you can modify as needed. After confirming all settings, click Deploy and wait for the deployment to complete.

    Important

    If you deploy the model with public resources, billing starts when the service enters the Running state, and fees are incurred even if the service is not called. Stop unused model services promptly to avoid unnecessary charges.

    Deployment Method: We recommend SGLang accelerated deployment or vLLM accelerated deployment, both of which are fully compatible with the OpenAI API standard and mainstream AI applications. For more information, see Deployment methods.

    Resource Deployment: The default settings use public resources and recommended specifications.

    • When you deploy the model with public resources, the system automatically filters the specifications that are available for the model. If inventory is insufficient, consider switching to another region. To deploy DeepSeek-R1 or DeepSeek-V3, select specifications based on the resource requirements of the full-size models.

    • When you deploy the model with resource quotas, you must select a deployment method that matches the node type. For GP7V nodes, select Single-Node-GP7V under SGLang Accelerated Deployment; otherwise, the deployment fails.

  4. View more information.

    1. Go to Model Gallery > Job Management > Deployment Jobs.

    2. Click the name of the deployed service.

    3. View the deployment progress and call information.

    4. You can also click More Info in the upper right corner to go to the service details page in Elastic Algorithm Service (EAS) of PAI.

Obtain call information

After the model is deployed, perform the following steps to obtain the information that is required to create an AI service that invokes EAS.

  1. Choose Model Gallery > Job Management > Deployment Jobs and click the name of the deployed service to go to the service details page.

  2. Click View Call Information. On the VPC Endpoint tab of the Invocation Method dialog box, obtain the endpoint and token.

    Note

    A VPC endpoint is displayed by default and is recommended. For more information, see Create an AI service.

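Before you configure the gateway, you can optionally verify the deployed service by calling the VPC endpoint directly from a machine in the same VPC. The following cURL command is a minimal sketch rather than the official calling method: <EAS_ENDPOINT>, <EAS_TOKEN>, and the model name are placeholders for the values shown in the Invocation Method dialog box, and the command assumes an SGLang or vLLM accelerated deployment, which exposes an OpenAI-compatible /v1/chat/completions interface.

    # Placeholders: replace <EAS_ENDPOINT> and <EAS_TOKEN> with the endpoint
    # and token from the Invocation Method dialog box. EAS reads the token
    # from the Authorization header.
    curl -X POST "<EAS_ENDPOINT>/v1/chat/completions" \
      -H "Content-Type: application/json" \
      -H "Authorization: <EAS_TOKEN>" \
      -d '{
            "model": "DeepSeek-R1-Distill-Qwen-7B",
            "messages": [{"role": "user", "content": "Hello!"}]
          }'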

Create and configure an AI gateway

1. Create an AI service

  1. Log on to the Cloud-native API Gateway console.

  2. In the left-side navigation pane, click Instance. In the top navigation bar, select a region.

  3. On the Instance page, click the name of the gateway instance that you want to manage.

  4. In the left-side navigation tree, click Service. Then, click the Services tab.

  5. Click Create Service. In the Create Service panel, configure the following parameters:

    • Service Source: AI Services.

    • Large Model Supplier: Compatibility with OpenAI.

    • Service URL: Enter the endpoint obtained in Obtain call information and add /v1 to the end.

    • API-KEY: Enter the token obtained in Obtain call information.

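    For example, assuming a hypothetical endpoint of http://deepseek-svc.123456.cn-hangzhou.pai-eas.aliyuncs.com (your actual endpoint will differ), you would enter the following values:

      Service URL: http://deepseek-svc.123456.cn-hangzhou.pai-eas.aliyuncs.com/v1
      API-KEY: <EAS_TOKEN> (the token from the Invocation Method dialog box)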

2. Create and publish an AI API

  1. Log on to the Cloud-native API Gateway console. In the left-side navigation pane, choose API.

  2. Click the AI API tab and click Create AI API.

  3. In the Create AI API panel, configure the following parameters:

    • Instance: Select your Cloud-native API Gateway instance.

    • Services: Select the AI service that you created in Create an AI service.

  4. Click OK.
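
After the AI API is published, clients can call the model through the gateway instead of calling EAS directly. The following cURL command is a minimal sketch: <GATEWAY_DOMAIN> is a placeholder for the domain name that is bound to your AI API, the model name is the example used in this topic, and the command assumes that no consumer authentication is enabled on the gateway.

    # Placeholders: replace <GATEWAY_DOMAIN> with the domain name of your
    # AI API. If consumer authentication is enabled, also pass the required
    # API key header.
    curl -X POST "http://<GATEWAY_DOMAIN>/v1/chat/completions" \
      -H "Content-Type: application/json" \
      -d '{
            "model": "DeepSeek-R1-Distill-Qwen-7B",
            "messages": [{"role": "user", "content": "What is AI Gateway?"}]
          }'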

3. Debug the AI API

  1. Find the created AI API and click Debugging on the AI API tab.

  2. In the Debugging panel, select the deployed model from the Select Model drop-down list. On the Model Returned tab on the right, you can interact with the large language model (LLM).

    Important

    The Model Returned tab uses the /v1/chat/completions chat interface. To use another interface, run cURL commands or use an SDK to debug the API on the cURL Command or Raw Output tab.

  3. Example: Perform the following steps to run a cURL command to call the /v1/completions interface:

    1. On the cURL Command tab, copy the sample code provided by Cloud-native API Gateway.

    2. In the sample code, replace the URL path with /v1/completions.

    3. Change the request body (data) in the sample code to the format that is required by /v1/completions, as shown in the sketch below.

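    For reference, the adjusted command might look like the following sketch. The gateway domain and parameter values are hypothetical; the key differences from the chat interface are the /v1/completions path and a request body that uses a prompt string instead of a messages array.

      # Hypothetical sketch of a /v1/completions call; placeholder values.
      curl -X POST "http://<GATEWAY_DOMAIN>/v1/completions" \
        -H "Content-Type: application/json" \
        -d '{
              "model": "DeepSeek-R1-Distill-Qwen-7B",
              "prompt": "Write a haiku about cloud computing.",
              "max_tokens": 256
            }'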