Deploy Qwen-7B-Chat on Intel CPU with AC2 container image - Alibaba Cloud Linux

Background

Qwen-7B is a 7-billion-parameter model in the Qwen large language model (LLM) series developed by Alibaba Cloud. It is a Transformer-based LLM that is pre-trained on a massive dataset containing a wide range of data, including web text, professional books, and code. Qwen-7B-Chat is an AI assistant created by applying alignment techniques to the base Qwen-7B model.

Important

The code for Qwen-7B-Chat is open-sourced under the LICENSE. To use it for commercial purposes free of charge, you must submit a commercial license application. You must comply with the user agreements, usage specifications, and all applicable laws and regulations for any third-party models you use. You are solely responsible for ensuring the legality and compliance of your use of these models.

Step 1: Create an ECS instance

Go to the instance buy page.
Configure the following parameters to create an ECS instance.

For other parameters, refer to Custom launch.
- Instance: Qwen-7B-Chat requires approximately 30 GiB of memory. Select an instance type of at least ecs.g8i.4xlarge (64 GiB memory) to ensure stable model operation.
- Image: Alibaba Cloud Linux 3.2104 LTS 64-bit.
- Public IP Address: Select Assign Public IPv4 Address. Set Bandwidth Billing Method to Pay-by-traffic and the
  
  to 100 Mbps to accelerate the model download.
- Data Disk: The model files require significant storage. Set the data disk size to 100 GiB.

Step 2: Create a Docker runtime environment

Install Docker.

To install Docker on Alibaba Cloud Linux 3, follow the steps in Install and use Docker and Docker Compose.
Verify that the Docker daemon is running:
```
sudo systemctl status docker
```

Pull and run the PyTorch AI container:

AC2 provides PyTorch images optimized for Intel CPUs. Use this image to set up a PyTorch runtime environment.

sudo docker pull ac2-registry.cn-hangzhou.cr.aliyuncs.com/ac2/pytorch:2.0.1-3.2304
sudo docker run -itd --name pytorch --net host -v $HOME/workspace:/workspace   ac2-registry.cn-hangzhou.cr.aliyuncs.com/ac2/pytorch:2.0.1-3.2304

Step 3: Manually deploy Qwen-7B-Chat

Enter the container environment:
```
sudo docker exec -it -w /workspace pytorch /bin/bash
```
All subsequent commands run inside the container. If you exit accidentally, re-run the preceding command. To verify you are in the container, run cat /proc/1/cgroup | grep docker.
Install the required tools:
```
yum install -y tmux git git-lfs wget
```
Enable Git LFS:

Git LFS is required to download the pre-trained model.
```
git lfs install
```
Download the source code and model.
1. Create a tmux session:
```
tmux
```
  Note
  Downloading the pre-trained model may take a long time depending on network conditions. Use a tmux session to prevent interruption if your SSH connection drops.
2. Download the Qwen-7B project source code and the pre-trained model:
```
git clone https://github.com/QwenLM/Qwen.git
git clone https://www.modelscope.cn/qwen/Qwen-7B-Chat.git qwen-7b-chat --depth=1
```
3. Verify the directory contents:
```
ls -l
```
Set up the runtime environment.

The AC2 container includes many Python AI dependencies. Install the remaining dependencies with yum or dnf.
```
yum install -y python3-{transformers{,-stream-generator},tiktoken,accelerate} python-einops
```
Start the chatbot.
1. Modify the model loading parameters.
  
  The source code includes a terminal demo script for running Qwen-7B-Chat as a local chatbot. Modify the model loading parameters to use BF16 precision, which leverages the CPU's AVX-512 instruction set for acceleration.
```
cd /workspace/Qwen
grep "torch.bfloat16" cli_demo.py 2>&1 >/dev/null || sed -i "57i	orch_dtype=torch.bfloat16," cli_demo.py
```
2. Start the chatbot:
```
cd /workspace/Qwen
python3 cli_demo.py -c ../qwen-7b-chat --cpu-only
```
  After deployment, interact with the Qwen-7B-Chat LLM by entering messages at the User> prompt.
  
  Note
  Run :exit to exit the chatbot.