Use a data cache to deploy Qwen in console mode or API mode - Elastic Container Instance

This topic describes how to use a data cache to deploy the Qwen-7B-Chat application that is provided by the ModelScope community. Before you deploy the application, you can store the Qwen-7B-Chat model data in a data cache. When you create the elastic container instance for the application, you mount the cached model data to the instance. This way, the instance does not need to pull the model data at startup, and the application is deployed faster.

Background information

Tongyi Qianwen-7b (Qwen-7B) is a 7-billion-parameter model in the Tongyi Qianwen foundation model series that is developed by Alibaba Cloud. Qwen-7B is a Transformer-based large language model (LLM) that is trained on ultra-large-scale pre-training data. Qwen-7B-Chat is an LLM-based AI assistant that is built on Qwen-7B by using the alignment mechanism.

Prerequisites

The virtual private cloud (VPC) that you use is associated with an Internet NAT gateway. An SNAT entry is configured for the NAT gateway to allow resources in the VPC or resources connected to vSwitches in the VPC to access the Internet.

Note

If the VPC is not associated with an Internet NAT gateway, you must associate an elastic IP address (EIP) with the VPC when you create the DataCache and deploy the application. This way, you can pull data from the Internet.

Prepare a runtime environment

Hardware requirements
The GPU-accelerated Elastic Compute Service (ECS) instance family that is used to create the Elastic Container Instance-based pod must meet the following conditions. For information about the GPU-accelerated ECS instance families that can be used to create pods, see Supported instance families.
- CPU: no strict limits
- Memory size: greater than 16 GiB
- Number of GPUs: 1 or more
- Size of the GPU memory: greater than 16 GB. If the GPU memory size is less than 16 GB, an out-of-memory (OOM) error may occur.
Software requirements
Qwen-7B-Chat depends on a large number of libraries and configurations. Elastic Container Instance provides a public container image that contains a Gradio-based Qwen WebUI. You can directly use the public container image or use it as a base image for secondary development. The image address is registry.cn-hangzhou.aliyuncs.com/eci_open/qwen-webui:1.0.0, and the image size is about 15 GB.

Procedure

Create a data cache

Console mode

Access ModelScope and obtain the ID of the Qwen-7B-Chat model.
In this topic, the version of Qwen-7B-Chat is v1.1.4. Find the model that you want to use in ModelScope, and then copy the model ID at the top of the model details page.
Log on to the Elastic Container Instance console.
In the top navigation bar, select a region.
In the left-side navigation pane, click Data Cache.
Create a data cache for the Qwen-7B-Chat model.
1. Click Create Data Cache.
2. Configure the parameters that are used to create the data cache.
  The following table describes the sample parameters. The values shown for the Cache Data Source parameter are fixed configurations for pulling the Qwen-7B-Chat model and must be used as specified. You can specify other parameters based on your business requirements. For more information, see Create a data cache.
  Parameter	Example
  Cache Bucket	test
  Cache Directory	/model/qwen/
  Cache Name	qwen
  Cache Size	20 GiB
  Cache Data Source	Type: URL Parameters repoSource: ModelScope/Model repoId: qwen/Qwen-7B-Chat revision: v1.1.4
3. Click OK.
View the status of the data cache.
On the Data Cache page, refresh the page to view the status of the data cache. If the status becomes Available, the data cache is ready for use.

API mode

Access ModelScope and obtain the ID of the Qwen-7B-Chat model.
In this topic, the version of Qwen-7B-Chat is v1.1.4. Find the model that you want to use in ModelScope, and then copy the model ID at the top of the model details page.
Create a data cache for the Qwen-7B-Chat model.
The following code shows the sample parameters that are used if you call the CreateDataCache API operation to create a data cache. The system pulls the model data from ModelScope and saves the model data to the /model/qwen/ directory of the bucket named test. The data cache is named qwen and is retained for one day.
Important
If you use an SDK to create a DataCache, you do not need to prefix the length of the parameter name to each parameter in DataSource.Options. For example, write repoSource instead of #10#repoSource. Write repoId instead of #6#repoId.
```
{
  "RegionId": "cn-beijing",
  "SecurityGroupId": "sg-2ze7l1o0ql1cbk******",
  "VSwitchId": "vsw-2ze23nqzig8inpr******",
  "Bucket": "test",
  "Path": "/model/qwen/",
  "Name": "qwen",
  "DataSource": {
    "Type": "URL",
    "Options": {
      "#6#repoId": "qwen/Qwen-7B-Chat",
      "#10#repoSource": "ModelScope/Model",
      "#8#revision": "v1.1.4"
    }
  },
  "RetentionDays": 1
}
```
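The `#N#` prefixes in the request above follow the rule stated in the Important note: `N` is the length of the parameter name (`repoId` has 6 characters, so the key becomes `#6#repoId`). The following Python sketch applies that rule to plain option names. The helper name `prefix_option_keys` is hypothetical and is not part of any SDK; it only illustrates how the prefixed keys are derived.

```python
import json


def prefix_option_keys(options: dict) -> dict:
    """Prefix each option name with #<length of the name>#, as in the sample request."""
    return {f"#{len(name)}#{name}": value for name, value in options.items()}


# Plain option names, as they would be written when an SDK is used.
plain_options = {
    "repoId": "qwen/Qwen-7B-Chat",
    "repoSource": "ModelScope/Model",
    "revision": "v1.1.4",
}

# Build the DataSource part of the request with length-prefixed keys.
data_source = {"Type": "URL", "Options": prefix_option_keys(plain_options)}
print(json.dumps(data_source, indent=2))
```

Deriving the prefix from `len(name)` instead of typing it by hand avoids mismatches when an option name is changed.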
Query the status of the data cache.
Use the returned DataCache ID to call the DescribeDataCaches API operation and query the information of the DataCache. If the status of the DataCache that is indicated by DataCaches.Status is Available, the DataCache is ready for use.
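Because the data cache takes time to become ready, the status query is typically repeated. The following Python sketch shows one way to poll until the status is Available. It assumes a caller-supplied function `fetch_status` that wraps the DescribeDataCaches call and returns the value of DataCaches.Status as a string; that function, the timeout, and the interval are illustrative assumptions, not documented defaults.

```python
import time
from typing import Callable


def wait_for_status(
    fetch_status: Callable[[], str],
    target: str = "Available",
    timeout_s: float = 1800.0,
    interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll fetch_status until it returns target; return False if the timeout elapses first."""
    deadline = time.monotonic() + timeout_s
    while True:
        if fetch_status() == target:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)


# Stand-in for a real DescribeDataCaches call. "NotReady" is a placeholder,
# not an actual status value; the third poll reports "Available".
responses = iter(["NotReady", "NotReady", "Available"])
ready = wait_for_status(lambda: next(responses), interval_s=0, sleep=lambda s: None)
print(ready)
```

Passing `sleep` as a parameter keeps the loop testable without real waiting.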

Deploy Qwen-7B-Chat

Console mode

In the left-side navigation pane of the Elastic Container Instance console, click Container Group. On the Container Group page, click Create Container Group.

Configure the parameters that are used to create the container group (elastic container instance) and then click Confirm Configuration.

The following table describes the sample parameters that are used to create the elastic container instance. The instance is created based on a GPU-accelerated ECS instance type and is mounted with a Qwen-7B-Chat model. The container in the elastic container instance uses the image that contains the Qwen WebUI application. After the container is started, the container runs the python Qwen-7B/web_demo.py --server_port 8888 command to start the Qwen WebUI.

Important

If the VPC to which the elastic container instance belongs is associated with an Internet NAT gateway, you do not need to associate an EIP with the elastic container instance when you create the instance. After the instance is created, you can configure DNAT entries to allow external access to the instance.

Section	Parameter	Example
Container Group Configurations	Specify Instance Type	ecs.gn6i-c16g1.4xlarge
Container Group Configurations	Name	qwen-web
Container Configurations	Container Name	qwen
Container Configurations	Image	Image: registry.cn-hangzhou.aliyuncs.com/eci_open/qwen-webui Image Tag: 1.0.0
Container Configurations	Executable Command	/bin/sh
Container Configurations	Parameter	-c python Qwen-7B/web_demo.py --server_port 8888
Data Cache	Data Cache Bucket	test
Data Cache	Click Add to mount the data cache of the Qwen-7B-Chat model.	Data Cache Directory: /model/qwen/ Destination Container: qwen Container Mount Directory: /data/model/
Data Cache	Enable Burst	Select Enable Burst.
EIP	EIP	Auto Create Maximum Bandwidth: 5 Mbit/s

Check the configurations of the instance, read and select the Terms of Service, and then click Confirm Order.
Return to the Container Group page, check whether the Qwen-7B-Chat application is deployed, and view the EIP of the instance.
On the Container Group page, you can view the status of the elastic container instance. You can click the instance ID to go to the instance details page and view the status of the container. If the status of the instance and the status of the container are Running, the application is deployed. You can obtain the EIP of the instance in the IP Address column.

API mode

Use the data cache to create an elastic container instance and deploy the Qwen-7B-Chat application.
The following code shows the parameters that are used if you call the CreateContainerGroup API operation to create the elastic container instance. The instance uses a GPU-accelerated ECS instance type and is mounted with the Qwen-7B-Chat model. The container in the elastic container instance uses the image that contains the Qwen WebUI application. After the container is started, the container runs the python Qwen-7B/web_demo.py --server_port 8888 command to start the Qwen WebUI.
Note
In the following example, the system automatically creates an EIP and associates the EIP with the elastic container instance. If the VPC to which the elastic container instance belongs is associated with an Internet NAT gateway, you do not need to associate an EIP with the elastic container instance when you create the instance. After the instance is created, you can configure DNAT entries to allow external access to the instance.
```
{
  "RegionId": "cn-beijing",
  "SecurityGroupId": "sg-2ze7l1o0ql1cbk******",
  "VSwitchId": "vsw-2ze23nqzig8inpr******",
  "ContainerGroupName": "qwen-web",
  "InstanceType": "ecs.gn6i-c16g1.4xlarge",
  "DataCacheBucket": "test",
  "Container": [
    {
      "Arg": [
        "-c",
        "python Qwen-7B/web_demo.py --server_port 8888"
      ],
      "Command": [
        "/bin/sh"
      ],
      "Gpu": 1,
      "Name": "qwen",
      "Image": "registry.cn-hangzhou.aliyuncs.com/eci_open/qwen-webui:1.0.0",
      "VolumeMount": [
        {
          "Name": "model-qwen",
          "MountPath": "/data/model"
        }
      ]
    }
  ],
  "Volume": [
    {
      "Type": "HostPathVolume",
      "HostPathVolume.Path": "/model/qwen/",
      "Name": "model-qwen"
    }
  ],
  "DataCacheProvisionedIops": 35000,
  "DataCacheBurstingEnabled": true,
  "AutoCreateEip": true
}
```
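In the request above, the container's VolumeMount refers to the volume by name (`model-qwen`), and the volume's HostPathVolume.Path is the same directory that was used as Path when the data cache was created (`/model/qwen/`). The following Python sketch checks that wiring before a request is sent. The function name `check_volume_wiring` is hypothetical, and the rule that the volume path equals the data cache path is an assumption drawn from this example rather than a documented validation.

```python
def check_volume_wiring(request: dict, cache_path: str) -> list:
    """Return a list of wiring problems; an empty list means the request is consistent."""
    problems = []
    volumes = {volume["Name"]: volume for volume in request.get("Volume", [])}

    # Every mount must refer to a volume that is declared in the request.
    for container in request.get("Container", []):
        for mount in container.get("VolumeMount", []):
            if mount["Name"] not in volumes:
                problems.append(
                    f"container {container['Name']!r} mounts undefined volume {mount['Name']!r}"
                )

    # Assumption from this example: the volume path equals the data cache Path.
    for name, volume in volumes.items():
        if volume.get("Type") == "HostPathVolume" and volume.get("HostPathVolume.Path") != cache_path:
            problems.append(f"volume {name!r} does not point at cache path {cache_path!r}")
    return problems


request = {
    "Container": [
        {"Name": "qwen", "VolumeMount": [{"Name": "model-qwen", "MountPath": "/data/model"}]}
    ],
    "Volume": [
        {"Type": "HostPathVolume", "HostPathVolume.Path": "/model/qwen/", "Name": "model-qwen"}
    ],
}
print(check_volume_wiring(request, "/model/qwen/"))  # prints []
```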
Check whether the application is deployed.
Use the returned instance ID to call the DescribeContainerGroupStatus API operation and query the status of the instance and the container. If the status of the instance that is indicated by Status and the status of the container that is indicated by ContainerStatuses.State are Running, the instance is created and the container is running.
Query the EIP of the elastic container instance.
Use the returned instance ID to call the DescribeContainerGroups API operation and query the instance details. You can obtain the EIP of the instance from the InternetIP parameter.
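The deployment check described above (the instance Status and every container State are Running) can be sketched in Python as follows. The dictionary shape used here, with a top-level `Status` and a `ContainerStatuses` list whose items carry `State`, is an assumption based on the parameter names in this topic; adapt it to the actual response of DescribeContainerGroupStatus.

```python
def is_deployed(status: dict) -> bool:
    """Return True if the instance and all of its containers report Running."""
    containers = status.get("ContainerStatuses", [])
    return (
        status.get("Status") == "Running"
        and bool(containers)  # an instance without container statuses is not treated as deployed
        and all(container.get("State") == "Running" for container in containers)
    )


# Hypothetical, simplified response for one instance.
sample = {"Status": "Running", "ContainerStatuses": [{"Name": "qwen", "State": "Running"}]}
print(is_deployed(sample))  # prints True
```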

Test the model

Add an inbound rule to the security group to which the elastic container instance belongs to allow traffic on port 8888.
Use a browser to visit the web page of Qwen.
In this example, an EIP is associated with the Qwen-7B-Chat application. You can enter the EIP of the elastic container instance and the allowed port of the container to access the application. Example: 123.57.XX.XX:8888.
Enter text to test the effect of the Qwen-7B-Chat model.
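If the page does not load, it can help to confirm that port 8888 is reachable before you look at the application itself. The following Python sketch attempts a TCP connection to the instance. The function name `is_port_open` is hypothetical, and the host must be replaced with the actual EIP of your instance because the address in this topic is masked.

```python
import socket


def is_port_open(host: str, port: int, timeout_s: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Example usage (replace <EIP> with the EIP of your instance):
# is_port_open("<EIP>", 8888)
```

A result of False usually points to the security group rule or the EIP association, while True with a failing page points to the container or the WebUI process.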