Deploy the Qwen-7B-Chat large language model on an Intel CPU by using an Alibaba Cloud AI Containers (AC2) image.
Background
Qwen-7B is a 7-billion-parameter model in the Qwen large language model (LLM) series developed by Alibaba Cloud. It is a Transformer-based LLM that is pre-trained on a massive dataset containing a wide range of data, including web text, professional books, and code. Qwen-7B-Chat is an AI assistant created by applying alignment techniques to the base Qwen-7B model.
The code for Qwen-7B-Chat is open-sourced under the LICENSE. To use it for commercial purposes free of charge, you must submit a commercial license application. You must comply with the user agreements, usage specifications, and all applicable laws and regulations for any third-party models you use. You are solely responsible for ensuring the legality and compliance of your use of these models.
Step 1: Create an ECS instance
-
Go to the instance buy page.
-
Configure the following parameters to create an ECS instance.
For other parameters, refer to Custom launch.
-
Instance: Qwen-7B-Chat requires approximately 30 GiB of memory. Select an instance type of at least ecs.g8i.4xlarge (64 GiB memory) to ensure stable model operation.
-
Image: Alibaba Cloud Linux 3.2104 LTS 64-bit.
-
Public IP Address: Select Assign Public IPv4 Address. Set Bandwidth Billing Method to Pay-by-traffic and the
to 100 Mbps to accelerate the model download.
-
Data Disk: The model files require significant storage. Set the data disk size to 100 GiB.
-
Step 2: Create a Docker runtime environment
-
Install Docker.
To install Docker on Alibaba Cloud Linux 3, follow the steps in Install and use Docker and Docker Compose.
-
Verify that the Docker daemon is running:
sudo systemctl status docker -
Pull and run the PyTorch AI container:
AC2 provides PyTorch images optimized for Intel CPUs. Use this image to set up a PyTorch runtime environment.
sudo docker pull ac2-registry.cn-hangzhou.cr.aliyuncs.com/ac2/pytorch:2.0.1-3.2304 sudo docker run -itd --name pytorch --net host -v $HOME/workspace:/workspace ac2-registry.cn-hangzhou.cr.aliyuncs.com/ac2/pytorch:2.0.1-3.2304
Step 3: Manually deploy Qwen-7B-Chat
-
Enter the container environment:
sudo docker exec -it -w /workspace pytorch /bin/bashAll subsequent commands run inside the container. If you exit accidentally, re-run the preceding command. To verify you are in the container, run
cat /proc/1/cgroup | grep docker. -
Install the required tools:
yum install -y tmux git git-lfs wget -
Enable Git LFS:
Git LFS is required to download the pre-trained model.
git lfs install -
Download the source code and model.
-
Create a tmux session:
tmuxNoteDownloading the pre-trained model may take a long time depending on network conditions. Use a tmux session to prevent interruption if your SSH connection drops.
-
Download the Qwen-7B project source code and the pre-trained model:
git clone https://github.com/QwenLM/Qwen.git git clone https://www.modelscope.cn/qwen/Qwen-7B-Chat.git qwen-7b-chat --depth=1 -
Verify the directory contents:
ls -l
-
-
Set up the runtime environment.
The AC2 container includes many Python AI dependencies. Install the remaining dependencies with
yumordnf.yum install -y python3-{transformers{,-stream-generator},tiktoken,accelerate} python-einops -
Start the chatbot.
-
Modify the model loading parameters.
The source code includes a terminal demo script for running Qwen-7B-Chat as a local chatbot. Modify the model loading parameters to use BF16 precision, which leverages the CPU's AVX-512 instruction set for acceleration.
cd /workspace/Qwen grep "torch.bfloat16" cli_demo.py 2>&1 >/dev/null || sed -i "57i orch_dtype=torch.bfloat16," cli_demo.py -
Start the chatbot:
cd /workspace/Qwen python3 cli_demo.py -c ../qwen-7b-chat --cpu-onlyAfter deployment, interact with the Qwen-7B-Chat LLM by entering messages at the
User>prompt.
NoteRun
:exitto exit the chatbot.
-