Deploy the Qwen-7B-Chat AI container image on an NVIDIA GPU using Alibaba Cloud AI Containers (AC2).
Background
Qwen-7B is a 7-billion-parameter model in the Qwen large language model (LLM) series developed by Alibaba Cloud. It is a Transformer-based LLM that is pre-trained on a massive dataset containing a wide range of data, including web text, professional books, and code. Qwen-7B-Chat is an AI assistant created by applying alignment techniques to the base Qwen-7B model.
The code for Qwen-7B-Chat is open source under the terms of its LICENSE. Free commercial use requires that you submit a commercial license application. You must comply with the user agreements, usage specifications, and all applicable laws and regulations for any third-party models you use. You are solely responsible for ensuring the legality and compliance of your use of these models.
Step 1: Create an ECS instance
Go to the instance buy page.
Configure the parameters to create an ECS instance.
Set the following key parameters. Other parameters are described in Custom launch.
Instance: Qwen-7B-Chat requires more than 16 GiB of GPU memory. Select at least ecs.gn6i-c4g1.xlarge.
Image: Alibaba Cloud Linux 3.2104 LTS 64-bit.
Public IP Address: Select Assign Public IPv4 Address. Set Bandwidth Billing Method to Pay-by-traffic and Maximum Bandwidth to 100 Mbps to accelerate model downloads.

Data Disk: Qwen-7B-Chat model files require significant storage. Set the data disk size to at least 100 GiB.
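Once the instance is running, it can help to confirm that the filesystem that will hold the model files actually has the space you expect. The following is a minimal sketch, not part of the official procedure. The helper names free_gib and has_free_gib are hypothetical, and the path / is only a placeholder for the mount point of your data disk.

```shell
#!/usr/bin/env bash
# Sketch: report free space on a filesystem in whole GiB.
# free_gib and has_free_gib are hypothetical helper names, not part of any tool.

free_gib() {
    # df -Pk prints POSIX-format output in 1 KiB blocks; column 4 of the
    # second line is the available space.
    df -Pk "$1" | awk 'NR==2 { printf "%d\n", $4 / 1024 / 1024 }'
}

has_free_gib() {
    # Succeeds when the filesystem holding $1 has at least $2 GiB free.
    [ "$(free_gib "$1")" -ge "$2" ]
}

# Example: replace / with the mount point of your data disk.
echo "Free space on /: $(free_gib /) GiB"
```

Calling has_free_gib with a path and a minimum size in GiB returns success or failure, so it can gate a download script.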
Step 2: Set up the Docker environment
Install Docker.
Install Docker on Alibaba Cloud Linux 3 by following Install and use Docker and Docker Compose.
Check the Docker daemon status.
    sudo systemctl status docker
Install the NVIDIA driver and CUDA components.
    sudo dnf install -y anolis-epao-release
    sudo dnf install -y kernel-devel-$(uname -r) nvidia-driver{,-cuda}
Install the NVIDIA Container Toolkit.
    sudo dnf install -y nvidia-container-toolkit
The NVIDIA Container Toolkit adds a prestart hook that exposes GPUs to containers. Restart Docker to apply the changes.
    sudo systemctl restart docker
After the restart, use the --gpus <gpu-request> parameter when creating containers to specify GPU passthrough.
Create and run a PyTorch AI container.
AC2 provides container images for AI scenarios. Use the following image to create a PyTorch runtime environment.
    sudo docker pull ac2-registry.cn-hangzhou.cr.aliyuncs.com/ac2/pytorch:2.2.0.1-3.2304-cu121
    sudo docker run -itd --name pytorch --gpus all --net host -v $HOME/workspace:/workspace ac2-registry.cn-hangzhou.cr.aliyuncs.com/ac2/pytorch:2.2.0.1-3.2304-cu121
These commands pull the image and create a detached container named pytorch, with $HOME/workspace on the host mounted at /workspace in the container.
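Before moving on, you may want to confirm that GPU passthrough took effect. The sketch below counts the GPUs reported by nvidia-smi -L, which prints one line per device. It assumes nvidia-smi is available inside the container, which the NVIDIA Container Toolkit normally arranges. The helper name count_gpus is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: confirm that the container can see at least one GPU.
# count_gpus is a hypothetical helper; it counts lines of `nvidia-smi -L`
# output, which lists one GPU per line in the form "GPU 0: <name> (UUID: ...)".

count_gpus() {
    # Reads `nvidia-smi -L` output on stdin and prints the number of GPUs.
    # grep -c exits non-zero when the count is 0, so the status is masked.
    grep -c '^GPU [0-9][0-9]*:' || true
}

# Usage on the host, assuming the container is named pytorch:
#   sudo docker exec pytorch nvidia-smi -L | count_gpus
```

A result of 0 suggests that the container was created without --gpus or that Docker was not restarted after the toolkit was installed.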
Step 3: Manually deploy Qwen-7B-Chat
Start a shell in the container.
    sudo docker exec -it -w /workspace pytorch /bin/bash
Run all subsequent commands inside the container. If you exit, re-enter with the same command. To verify that you are in the container, run cat /proc/1/cgroup | grep docker.
Install required software.
    yum install -y git git-lfs wget tmux
Enable Git LFS.
The pretrained model download requires Git LFS.
    git lfs install
Download the source code and model.
Create a new tmux session.
    tmux
The model download may take a long time. Use tmux so that you can resume the session with tmux attach if the connection drops.
Download the Qwen-7B source code and pretrained model.
    git clone https://github.com/QwenLM/Qwen.git
    git clone https://www.modelscope.cn/qwen/Qwen-7B-Chat.git qwen-7b-chat --depth=1
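If the clone is interrupted or Git LFS was not enabled first, large model files can be left as small text pointers instead of real weights. The sketch below is one possible way to spot that. It relies on the fact that a Git LFS pointer file begins with the line "version https://git-lfs.github.com/spec/v1". The helper names is_lfs_pointer and list_lfs_pointers are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: detect Git LFS pointer files left behind by an incomplete download.
# is_lfs_pointer and list_lfs_pointers are hypothetical helper names.

is_lfs_pointer() {
    # A Git LFS pointer is a small text file whose first line starts with
    # "version https://git-lfs.github.com/spec/v1". Only the first 200 bytes
    # are read so that large binary files are not scanned in full.
    head -c 200 "$1" 2>/dev/null | head -n 1 \
        | grep -q '^version https://git-lfs\.github\.com/spec/v1'
}

list_lfs_pointers() {
    # Prints every regular file directly inside directory $1 that is
    # still a pointer rather than real content.
    local f
    for f in "$1"/*; do
        [ -f "$f" ] || continue
        if is_lfs_pointer "$f"; then
            echo "$f"
        fi
    done
}

# Usage, assuming the clone directory used in this guide:
#   list_lfs_pointers /workspace/qwen-7b-chat
```

Empty output means no pointer files were found at the top level of the directory. If any are listed, running git lfs pull inside the clone is the usual way to fetch the missing content.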
Set up the runtime environment.
AC2 containers include prepackaged Python AI dependencies. Install additional dependencies with yum or dnf.
    dnf install -y python-einops python3-datasets python3-gradio python3-mdtex2html python3-protobuf python3-psutil python3-pyyaml python3-rich python3-scikit-learn python3-scipy python3-sentencepiece python3-tensorboard python3-tiktoken python3-transformers python3-transformers-stream-generator yum-utils
Some packages must be downloaded and installed manually with rpm --nodeps so that their dependencies do not overwrite components already in the AC2 image.
    yumdownloader --destdir ./rpmpkgs python3-timm python3-accelerate
    rpm -ivh --nodeps rpmpkgs/*.rpm && rm -rf rpmpkgs
Start the AI chatbot.
Start the chatbot.
    cd /workspace/Qwen
    python3 cli_demo.py -c ../qwen-7b-chat
After startup, enter text at the User: prompt to interact with Qwen-7B-Chat.
Note: Enter the :exit command to exit the chatbot.
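If the chatbot fails at startup with a missing-module error, a quick import check can narrow down which dependency is absent. The following is a sketch under stated assumptions rather than an official diagnostic. The helper name missing_modules is hypothetical, and the import names in the usage comment are assumed to match the packages installed earlier.

```shell
#!/usr/bin/env bash
# Sketch: report which Python modules fail to import.
# missing_modules is a hypothetical helper; PYTHON defaults to python3.

missing_modules() {
    # Prints each module name from the arguments that cannot be imported.
    # Returns non-zero if at least one module is missing.
    local py="${PYTHON:-python3}" m status=0
    for m in "$@"; do
        if ! "$py" -c "import $m" >/dev/null 2>&1; then
            echo "$m"
            status=1
        fi
    done
    return "$status"
}

# Usage inside the container. The import names are assumptions:
#   missing_modules torch transformers tiktoken einops accelerate
```

Any name that is printed can then be traced back to its RPM package and reinstalled with the commands from the runtime environment step.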