Deploy Qwen-7B-Chat on an NVIDIA GPU - Alibaba Cloud Linux

Deploy the Qwen-7B-Chat AI container image on an NVIDIA GPU using Alibaba Cloud AI Containers (AC2).

Background

Qwen-7B is a 7-billion-parameter model in the Qwen large language model (LLM) series developed by Alibaba Cloud. It is a Transformer-based LLM that is pre-trained on a massive dataset containing a wide range of data, including web text, professional books, and code. Qwen-7B-Chat is an AI assistant created by applying alignment techniques to the base Qwen-7B model.

Important

The code for Qwen-7B-Chat is open source under the terms of its LICENSE. To use it for commercial purposes free of charge, you must submit a commercial license application. You must comply with the user agreements, usage specifications, and all applicable laws and regulations for any third-party models you use. You are solely responsible for ensuring the legality and compliance of your use of these models.

Step 1: Create an ECS instance

Go to the instance buy page.
Configure the parameters to create an ECS instance.
Set the following key parameters. Other parameters are described in Custom launch.
- Instance: Qwen-7B-Chat requires more than 16 GiB of GPU memory. Select at least ecs.gn6i-c4g1.xlarge.
- Image: Alibaba Cloud Linux 3.2104 LTS 64-bit.
- Public IP Address: Select Assign Public IPv4 Address. Set Bandwidth Billing Method to Pay-by-traffic and Maximum Bandwidth to 100 Mbps to accelerate model downloads.
- Data Disk: Qwen-7B-Chat model files require significant storage. Set the data disk size to at least 100 GiB.
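The GPU memory requirement can be sanity-checked with simple arithmetic. The sketch below estimates the memory needed for the model weights alone, assuming exactly 7 billion parameters and 2 bytes per parameter for 16-bit weights. The function name estimate_weights_gib is hypothetical, and the estimate ignores activations, the KV cache, and CUDA context overhead, so treat it as a lower bound rather than a sizing rule.

```shell
# Rough GPU memory needed for the model weights alone, in GiB.
# Usage: estimate_weights_gib <params_in_billions> <bytes_per_param>
# (hypothetical helper; ignores activations, KV cache, and CUDA overhead)
estimate_weights_gib() {
    awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * 1e9 * b / (1024 ^ 3) }'
}

estimate_weights_gib 7 2   # 16-bit weights (fp16/bf16)
estimate_weights_gib 7 1   # 8-bit weights, for comparison
```

With 16-bit weights the result is about 13 GiB, which is why a GPU in the 16 GiB class is the practical minimum here.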

Step 2: Set up the Docker environment

Install Docker.
Install Docker on Alibaba Cloud Linux 3 by following Install and use Docker and Docker Compose.
Check the Docker daemon status.
```
sudo systemctl status docker
```

Install the NVIDIA driver and CUDA components.

```
sudo dnf install -y anolis-epao-release
sudo dnf install -y kernel-devel-$(uname -r) nvidia-driver{,-cuda}
```
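After the driver is installed, it can help to confirm that the GPU is visible and has enough memory before continuing. The following is a minimal sketch: the helper gpu_mem_at_least and the 15000 MiB threshold are assumptions chosen for illustration, not values from this guide. If nvidia-smi cannot communicate with the driver right after installation, a reboot may be needed.

```shell
# Succeeds if any GPU reports at least <min_mib> MiB of total memory.
# Reads one integer (MiB) per line on stdin, the format produced by:
#   nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits
gpu_mem_at_least() {
    awk -v min="$1" '$1 + 0 >= min { found = 1 } END { exit found ? 0 : 1 }'
}

if command -v nvidia-smi >/dev/null 2>&1; then
    # Human-readable summary of the detected GPUs.
    nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv
    # 15000 MiB is an illustrative threshold, not an official requirement.
    if nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits | gpu_mem_at_least 15000; then
        echo "GPU memory looks sufficient"
    else
        echo "No GPU with at least 15000 MiB of memory was found" >&2
    fi
else
    echo "nvidia-smi not found; the driver may not be installed yet" >&2
fi
```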

Install the NVIDIA Container Toolkit.

```
sudo dnf install -y nvidia-container-toolkit
```

The NVIDIA Container Toolkit adds a prestart hook that exposes GPUs to containers. Restart Docker to apply the changes.
```
sudo systemctl restart docker
```
After the restart, pass the --gpus <gpu-request> option to docker run to expose GPUs to a container, for example --gpus all.
Create and run a PyTorch AI container.
AC2 provides container images for AI scenarios. Use the following image to create a PyTorch runtime environment.
```
sudo docker pull ac2-registry.cn-hangzhou.cr.aliyuncs.com/ac2/pytorch:2.2.0.1-3.2304-cu121
sudo docker run -itd --name pytorch --gpus all --net host -v $HOME/workspace:/workspace ac2-registry.cn-hangzhou.cr.aliyuncs.com/ac2/pytorch:2.2.0.1-3.2304-cu121
```
These commands pull the image and create a detached container named pytorch, with the host directory $HOME/workspace mounted at /workspace in the container.
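Before moving on, you may want to confirm that GPU passthrough actually works inside the container. The sketch below asks PyTorch in the container whether a CUDA device is available. The function name container_sees_gpu and the DOCKER_CMD variable are hypothetical helpers, and the check assumes python3 in the image can import torch.

```shell
# Command used to reach Docker; override with DOCKER_CMD=docker if sudo is not needed.
DOCKER_CMD="${DOCKER_CMD:-sudo docker}"

# Succeeds if PyTorch inside the named container can see a CUDA device.
container_sees_gpu() {
    # $DOCKER_CMD is left unquoted on purpose so "sudo docker" splits into two words.
    $DOCKER_CMD exec "$1" python3 -c 'import sys, torch; sys.exit(0 if torch.cuda.is_available() else 1)'
}

# Example, run on the host once the container is up:
#   container_sees_gpu pytorch && echo "GPU visible in container"
```

If the check fails, verifying that Docker was restarted after the NVIDIA Container Toolkit installation and that the container was created with --gpus is a reasonable first step.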

Step 3: Manually deploy Qwen-7B-Chat

Start a shell in the container.
```
sudo docker exec -it -w /workspace pytorch /bin/bash
```
Run all subsequent commands inside the container. If you exit, re-enter with the same command. To verify you are in the container, run cat /proc/1/cgroup | grep docker.
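On hosts that use cgroup v2, /proc/1/cgroup typically contains only a line such as 0::/, so the grep check may print nothing even inside a container. A slightly more tolerant sketch is shown below. It also looks for the /.dockerenv file that Docker normally creates in containers. The function name in_docker_container is hypothetical, and its optional root-directory argument exists only to make the function easy to test.

```shell
# Succeeds if the environment looks like a Docker container.
# Optional argument: a root directory to inspect instead of / (testing aid only).
in_docker_container() {
    local root="${1:-}"
    # Docker normally creates this marker file at the container root.
    [ -f "$root/.dockerenv" ] && return 0
    # Fallback: the cgroup v1 style check used above.
    grep -q docker "$root/proc/1/cgroup" 2>/dev/null
}

if in_docker_container; then
    echo "Running inside a Docker container"
else
    echo "Not inside a Docker container (or it could not be detected)"
fi
```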
Install required software.
```
yum install -y git git-lfs wget tmux
```
Enable Git LFS.
The pretrained model download requires Git LFS.
```
git lfs install
```
Download the source code and model.
1. Create a new tmux session.
```
tmux
```
  The model download may take a long time. Use tmux so you can resume with tmux attach if the connection drops.
2. Download the Qwen-7B source code and pretrained model.
```
git clone https://github.com/QwenLM/Qwen.git
git clone https://www.modelscope.cn/qwen/Qwen-7B-Chat.git qwen-7b-chat --depth=1
```
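If Git LFS was not active during the clone, the weight files may be small text pointers instead of the real data. The sketch below detects that case by looking for the standard Git LFS pointer header. The function names are hypothetical, and the .safetensors and .bin extensions are assumptions about how the weights are stored.

```shell
# Succeeds if FILE is a Git LFS pointer (a small text stub) rather than real content.
is_lfs_pointer() {
    # The pointer header is 42 bytes long; tr strips NUL bytes from binary files.
    [ "$(head -c 42 "$1" 2>/dev/null | tr -d '\0')" = "version https://git-lfs.github.com/spec/v1" ]
}

# Returns non-zero if any weight file in DIR is still an LFS pointer.
check_model_dir() {
    local dir="$1" f status=0
    for f in "$dir"/*.safetensors "$dir"/*.bin; do
        [ -e "$f" ] || continue   # skip patterns that matched nothing
        if is_lfs_pointer "$f"; then
            echo "LFS pointer only (not downloaded): $f" >&2
            status=1
        fi
    done
    return "$status"
}

# Example, inside the container:
#   check_model_dir /workspace/qwen-7b-chat || (cd /workspace/qwen-7b-chat && git lfs pull)
```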

Set up the runtime environment.

AC2 containers include prepackaged Python AI dependencies. Install additional dependencies with yum or dnf.

```
dnf install -y python-einops \
    python3-datasets \
    python3-gradio \
    python3-mdtex2html \
    python3-protobuf \
    python3-psutil \
    python3-pyyaml \
    python3-rich \
    python3-scikit-learn \
    python3-scipy \
    python3-sentencepiece \
    python3-tensorboard \
    python3-tiktoken \
    python3-transformers \
    python3-transformers-stream-generator \
    yum-utils
```

Some dependencies must be installed manually to avoid overwriting AC2 image components.

```
yumdownloader --destdir ./rpmpkgs python3-timm python3-accelerate
rpm -ivh --nodeps rpmpkgs/*.rpm && rm -rf rpmpkgs
```
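Because some packages are installed with --nodeps, a quick import check can catch missing modules before the chatbot is started. The following is a minimal sketch. The function check_imports and the PYTHON_BIN variable are hypothetical, and the module names in the example are assumed import names for the packages installed above.

```shell
# Python interpreter to use; override if the image uses a different name.
PYTHON_BIN="${PYTHON_BIN:-python3}"

# Tries to import each named module; reports failures and returns non-zero if any failed.
check_imports() {
    local m status=0
    for m in "$@"; do
        if ! "$PYTHON_BIN" -c "import $m" >/dev/null 2>&1; then
            echo "missing: $m" >&2
            status=1
        fi
    done
    return "$status"
}

# Example, inside the container:
#   check_imports torch transformers tiktoken einops accelerate transformers_stream_generator
```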

Start the AI chatbot.
1. Start the chatbot.
```
cd /workspace/Qwen
python3 cli_demo.py -c ../qwen-7b-chat
```
  After startup, enter text at the User: prompt to interact with Qwen-7B-Chat.
  Note
  Enter the :exit command to exit the chatbot.