AI Container Image Deployment: Qwen-VL-Chat

This article introduces how to quickly build a personal AI vision assistant service based on Alibaba Cloud AMD servers and OpenAnolis AI container service.

By Alibaba Cloud ECS Team


Qwen-VL is a large-scale vision-language model developed by Alibaba Cloud. It accepts images, text, and detection boxes as input and produces text and detection boxes as output. Qwen-VL-Chat is an AI vision assistant built on Qwen-VL using alignment mechanisms based on the large language model. It supports flexible interaction, including multiple images, multi-round question answering, and content creation. It natively supports multilingual dialogue (for example, English and Chinese), input and comparison of multiple images, question answering about a specific image, and creative writing based on multiple images.

Create an ECS Instance

When you create an ECS instance, select an instance type that matches the size of the model. Model inference is compute-intensive and consumes a large amount of memory at run time, so select the ecs.g8a.4xlarge instance type to make sure that the model runs stably. Running Qwen-VL-Chat also requires downloading multiple large model files, so allocate a disk of at least 100 GB. Finally, allocate 100 Mbit/s of bandwidth to speed up environment installation and model download.
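Before you deploy, you can confirm that the disk actually has room for the model files. The following is a minimal preflight sketch, not part of the original procedure: the function names free_gb and has_min_gb are hypothetical, the check measures free space (in GiB) rather than allocated disk size, and the threshold of 100 mirrors the disk size recommended above. It checks $HOME because both the source code and the model are cloned there later.

```shell
#!/usr/bin/env bash
# Hypothetical preflight check: is there enough free disk space for the model files?

# free_gb PATH: print the free space, in whole GiB, of the filesystem that contains PATH
free_gb() {
  # -P: one data line per filesystem (POSIX format); -k: sizes in KiB.
  # Column 4 of the data row is the available space.
  df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# has_min_gb PATH MIN: exit status 0 if the filesystem of PATH has at least MIN GiB free
has_min_gb() {
  [ "$(free_gb "$1")" -ge "$2" ]
}

if has_min_gb "${HOME}" 100; then
  echo "OK: at least 100 GiB free under ${HOME}"
else
  echo "WARNING: only $(free_gb "${HOME}") GiB free under ${HOME}; the model files may not fit"
fi
```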

Select Alibaba Cloud Linux 3.2104 LTS 64-bit as the operating system of the instance.

Create a Docker Runtime Environment

Install Docker

For more information about how to install Docker on Alibaba Cloud Linux 3, see Install and use Docker (Linux). After the installation is complete, make sure that the Docker daemon is running:

systemctl status docker

Create and Run a PyTorch AI Container

The OpenAnolis community provides a variety of container images based on Anolis OS, including AMD-optimized PyTorch images. You can use these images to create a PyTorch runtime environment.

docker pull registry.openanolis.cn/openanolis/pytorch-amd:1.13.1-23-zendnn4.1
docker run -d -it --name pytorch-amd --net host -v $HOME:/root registry.openanolis.cn/openanolis/pytorch-amd:1.13.1-23-zendnn4.1

These commands pull the container image and then use it to create a container named pytorch-amd. The container runs in detached mode (-d), shares the host's network stack (--net host), and mounts your home directory on the host to /root in the container (-v $HOME:/root) so that your work persists outside the container.

Manual Deployment Procedure

Enter the Container Environment

After the PyTorch container is created and run, run the following command to access the container environment:

docker exec -it -w /root pytorch-amd /bin/bash

Run all subsequent commands inside the container. If you exit the container unexpectedly, run the command above again to re-enter it. To check whether your current shell is inside the container, run the following command:

grep docker /proc/1/cgroup
# Any output indicates that you are inside a container
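On hosts that use cgroup v2, /proc/1/cgroup inside a container may contain only a line such as 0::/, so the grep check can print nothing even when you are inside the container. A minimal sketch of a more tolerant check, assuming Docker's usual convention of creating a /.dockerenv file at the root of every container; in_container is a hypothetical helper, and its optional arguments exist only to make it easy to test.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether the current shell runs inside a Docker container.

# in_container [CGROUP_FILE] [MARKER_FILE]
# Exit status 0 if MARKER_FILE exists or CGROUP_FILE mentions "docker".
in_container() {
  local cgroup_file="${1:-/proc/1/cgroup}"
  local marker="${2:-/.dockerenv}"
  # Docker normally creates /.dockerenv at the root of a container's filesystem (assumption).
  if [ -f "$marker" ]; then
    return 0
  fi
  # Fall back to the cgroup check used in the guide (reliable on cgroup v1 hosts).
  grep -q docker "$cgroup_file" 2>/dev/null
}

if in_container; then
  echo "Inside a Docker container"
else
  echo "Not inside a Docker container: run docker exec -it -w /root pytorch-amd /bin/bash"
fi
```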

Install and Configure Software

Before you deploy Qwen-VL-Chat, install the required software:

yum install -y git git-lfs wget gperftools-libs anolis-epao-release

Downloading the pre-trained model later requires Git LFS, so enable it:

git lfs install

Download the Source Code and Pre-trained Models

Clone the Qwen-VL source code from GitHub and the pre-trained Qwen-VL-Chat model from ModelScope:

git clone https://github.com/QwenLM/Qwen-VL.git
git clone https://www.modelscope.cn/qwen/Qwen-VL-Chat.git qwen-vl-chat
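If Git LFS was not enabled before cloning, the model directory contains small pointer files instead of the real weights, and loading the model fails. A minimal sketch to detect this, assuming the model was cloned to ~/qwen-vl-chat as above; is_lfs_pointer is a hypothetical helper that relies on the fact that every Git LFS pointer file starts with the line version https://git-lfs.github.com/spec/v1.

```shell
#!/usr/bin/env bash
# Hypothetical check: find files in the model directory that are still Git LFS pointers.

# is_lfs_pointer FILE: exit status 0 if FILE is a Git LFS pointer rather than real content.
# Only the first 100 bytes are read, so large weight files are not scanned in full.
is_lfs_pointer() {
  head -c 100 "$1" 2>/dev/null | grep -q '^version https://git-lfs.github.com/spec/v1'
}

model_dir="${HOME}/qwen-vl-chat"
pointers=0
for f in "${model_dir}"/*; do
  # Skip subdirectories, and the unexpanded pattern if the directory is empty or missing
  [ -f "$f" ] || continue
  if is_lfs_pointer "$f"; then
    echo "Not downloaded (LFS pointer): $f"
    pointers=$(( pointers + 1 ))
  fi
done

if [ "$pointers" -eq 0 ]; then
  echo "No Git LFS pointer files found in ${model_dir}"
else
  echo "${pointers} file(s) still need to be fetched with: cd ${model_dir} && git lfs pull"
fi
```

git lfs pull downloads the LFS objects for the current checkout, replacing the pointer files with the actual model files.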

Deploy the Runtime Environment

Before you install the Python dependencies, you can optionally point pip at a package mirror to speed up dependency downloads:

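mkdir -p ~/.config/pip && cat > ~/.config/pip/pip.conf <<EOF
[global]
# Example mirror (an assumption; substitute any PyPI mirror you prefer)
index-url = https://mirrors.aliyun.com/pypi/simple/
EOF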

Install Python runtime dependencies.

yum install -y python3-transformers python-einops
pip install tiktoken transformers_stream_generator accelerate gradio
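To confirm that the installation succeeded before you start the demo, you can try importing each dependency. A minimal sketch: check_py_modules is a hypothetical helper, and the module list is an assumption derived from the packages installed above (using their Python import names) plus torch, which the PyTorch image already provides.

```shell
#!/usr/bin/env bash
# Hypothetical check: confirm that the Python modules the web demo needs can be imported.

# check_py_modules MODULE...: print ok/missing for each module;
# exit status 0 only if every module imports successfully.
check_py_modules() {
  local missing=0 m
  for m in "$@"; do
    if python3 -c "import ${m}" >/dev/null 2>&1; then
      echo "ok:      ${m}"
    else
      echo "missing: ${m}"
      missing=1
    fi
  done
  return "$missing"
}

# Import names of the packages installed above, plus torch from the container image
check_py_modules torch transformers einops tiktoken transformers_stream_generator accelerate gradio \
  || echo "Some modules are missing; re-run the yum and pip commands above."
```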

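To let ZenDNN make full use of the CPU, set two environment variables: OMP_NUM_THREADS, which sets the number of OpenMP threads to the number of CPUs, and GOMP_CPU_AFFINITY, which binds those threads to CPUs 0 through N-1. Writing them to /etc/profile.d makes them apply to every login shell.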

cat > /etc/profile.d/env.sh <<EOF
export OMP_NUM_THREADS=\$(nproc --all)
export GOMP_CPU_AFFINITY=0-\$(( \$(nproc --all) - 1 ))
EOF
source /etc/profile
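To confirm that the variables took effect in the current shell, you can compare them with the values they should have. A minimal sketch; cpu_range is a hypothetical helper that reproduces the 0-(N-1) CPU list written to env.sh above.

```shell
#!/usr/bin/env bash
# Hypothetical sanity check: compare the exported variables with their expected values.

# cpu_range N: print "0-(N-1)", the CPU list that GOMP_CPU_AFFINITY should contain
cpu_range() {
  echo "0-$(( $1 - 1 ))"
}

n="$(nproc --all)"
echo "OMP_NUM_THREADS:   expected ${n}, current ${OMP_NUM_THREADS:-unset}"
echo "GOMP_CPU_AFFINITY: expected $(cpu_range "$n"), current ${GOMP_CPU_AFFINITY:-unset}"
```

On an ecs.g8a.4xlarge instance, both lines should show matching expected and current values; "unset" means /etc/profile has not been sourced in this shell.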

Run the Web Demo

The project source code includes a web demo that you can use to interact with Qwen-VL-Chat.

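cd ~/Qwen-VL
# Preload tcmalloc (from gperftools-libs, installed earlier) as the memory allocator
export LD_PRELOAD=/usr/lib64/libtcmalloc.so.4
# --server-name=0.0.0.0 listens on all interfaces so that the demo is reachable through the public IP
python3 web_demo_mm.py -c=${HOME}/qwen-vl-chat/ --cpu-only --server-name=0.0.0.0 --server-port=7860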

After the service starts, open http://<ECS public IP address>:7860 in a browser to access it. Make sure that the security group of the instance allows inbound traffic on TCP port 7860.
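Loading the model on the CPU can take a while, so the page may not respond immediately after you start the demo. A minimal sketch of a readiness check; wait_for_http is a hypothetical helper built on curl, and the 5-second polling interval and default timeout are arbitrary choices.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait until the web demo answers HTTP requests.

# wait_for_http URL [TIMEOUT_SECONDS]: poll URL every 5 seconds until it returns a
# successful HTTP status, or give up after roughly TIMEOUT_SECONDS (default 300).
wait_for_http() {
  local url="$1" timeout="${2:-300}" elapsed=0
  # -f: treat HTTP errors as failures; -s: silent; -o /dev/null: discard the body
  until curl -fs -o /dev/null --max-time 5 "$url"; do
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "Timed out waiting for ${url}" >&2
      return 1
    fi
    sleep 5
    elapsed=$(( elapsed + 5 ))
  done
  echo "${url} is up"
}
```

For example, running wait_for_http http://127.0.0.1:7860/ 600 from another shell on the instance waits roughly ten minutes for the demo to come up. Because the container uses --net host, the same URL works both on the host and inside the container. If this local check succeeds but the public address does not respond, the security group rule for port 7860 is the likely cause.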

Alibaba Cloud Community
