AI Container Image Deployment: Qwen-Audio-Chat

Background

Qwen-Audio is a large audio language model developed by Alibaba Cloud. It accepts a variety of audio inputs, including human speech, natural sounds, music, and singing, and produces text output. Qwen-Audio-Chat is an AI voice assistant built on Qwen-Audio by applying alignment techniques to the underlying large language model. It supports flexible interaction, such as multiple audio inputs, multi-turn question answering, and creative tasks.

This article describes how to quickly build an AI voice assistant service on an AMD-based Alibaba Cloud server by using the OpenAnolis AI container service.

Create an ECS Instance

Choose the instance type based on the size of the model. Inference requires substantial computing resources and runtime memory, so ecs.g8a.4xlarge is recommended for stable operation. Qwen-Audio-Chat also downloads several model files, so allocate at least 100 GB of storage when you create the instance. To speed up environment installation and model downloads, set the instance bandwidth to 100 Mbit/s.

Select Alibaba Cloud Linux 3.2104 LTS 64-bit as the operating system.
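
Once the instance is running, it can help to confirm that it matches the sizing above before installing anything. The following is a minimal sketch and not part of the original procedure. The function names total_gib and has_min_gib and the MIN_DISK_GIB threshold of 90 are illustrative assumptions (a 100 GB disk reports somewhat less usable space once it is formatted).

```shell
#!/usr/bin/env bash
# Sketch: report the resources of the current machine so they can be
# compared with the sizing guidance for the instance.

# Print the total size of the filesystem that holds the given path, in whole GiB.
total_gib() {
    df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $2 / 1048576 }'
}

# Succeed if the filesystem that holds the given path has at least MIN GiB in total.
has_min_gib() {
    [ "$(total_gib "$1")" -ge "$2" ]
}

echo "vCPUs:    $(nproc --all)"
echo "Memory:   $(awk '/^MemTotal:/ { printf "%d GiB", $2 / 1048576 }' /proc/meminfo)"
echo "Disk (/): $(total_gib /) GiB"

# MIN_DISK_GIB is a hypothetical threshold, chosen slightly below 100
# to leave room for filesystem overhead.
MIN_DISK_GIB=90
has_min_gib / "$MIN_DISK_GIB" || echo "Warning: root filesystem is smaller than ${MIN_DISK_GIB} GiB" >&2
```

If the warning appears, the disk can be resized before the model files are downloaded, which is easier than moving them afterwards.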

Create a Docker Runtime Environment

Install Docker

For more information about how to install Docker on Alibaba Cloud Linux 3, see Install and use Docker (Linux). After the installation is complete, make sure that the Docker daemon is running.

systemctl status docker
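
The status command only reports the current state. If the daemon turns out to be stopped or disabled, something like the following sketch could start it and enable it at boot. The function name ensure_docker_running is hypothetical, the systemctl subcommands are the standard systemd ones, and the commands are assumed to run as root.

```shell
#!/usr/bin/env bash
# Sketch: make sure the Docker daemon is running now and starts on boot.
ensure_docker_running() {
    # "enable --now" enables the unit at boot and starts it immediately;
    # "is-active --quiet" then confirms that the daemon actually came up.
    systemctl enable --now docker && systemctl is-active --quiet docker
}
```

Running ensure_docker_running after sourcing the snippet returns a non-zero status if the daemon could not be started.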

Create and Run a PyTorch AI Container

The OpenAnolis community provides a variety of container images based on Anolis OS, including AMD-optimized PyTorch images. You can use these images to create a PyTorch runtime environment.

docker pull registry.openanolis.cn/openanolis/pytorch-amd:1.13.1-23-zendnn4.1
docker run -d -it --name pytorch-amd --net host -v $HOME:/root registry.openanolis.cn/openanolis/pytorch-amd:1.13.1-23-zendnn4.1

The first command pulls the container image. The second command uses the image to create a container named pytorch-amd that runs in detached mode (-d), shares the host's network stack (--net host), and mounts your home directory at /root in the container so that your work persists outside the container.
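
The docker run command above does not set a restart policy, so the container stays stopped after the instance reboots. A minimal sketch for bringing it back, assuming the container name pytorch-amd from above (the function name ensure_container_running is hypothetical):

```shell
#!/usr/bin/env bash
# Sketch: start an existing container again if it is stopped.
ensure_container_running() {
    local name="$1" running
    # docker inspect fails if no container with this name exists.
    running="$(docker inspect -f '{{.State.Running}}' "$name" 2>/dev/null)" || {
        echo "container '$name' does not exist; create it with docker run first" >&2
        return 1
    }
    if [ "$running" = "true" ]; then
        echo "container '$name' is running"
    else
        docker start "$name"
    fi
}
```

Calling ensure_container_running pytorch-amd on the host either reports that the container is running or starts it. It never creates a new container, so the existing container state is kept.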

Manual Deployment Procedure

Enter the Container Environment

After the PyTorch container is up and running, run the following command to enter the container environment:

docker exec -it -w /root pytorch-amd /bin/bash

You must run all subsequent commands in the container environment. If you exit unexpectedly, re-enter the container environment. To check whether the current environment is a container, run the following command:

cat /proc/1/cgroup | grep docker
# Any output indicates that you are inside the container
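
The check above depends on the cgroup layout. On hosts that use cgroup v2, /proc/1/cgroup inside a container may contain only 0::/, in which case grep prints nothing even though you are in the container. A slightly more robust sketch also looks for the /.dockerenv file that Docker creates in its containers. The function name in_container is hypothetical, and its two path parameters exist only so that the function can be tried against test files.

```shell
#!/usr/bin/env bash
# Sketch: detect a Docker container without relying only on /proc/1/cgroup.
in_container() {
    local marker="${1:-/.dockerenv}" cgroup="${2:-/proc/1/cgroup}"
    # Either the Docker marker file exists, or the cgroup file mentions docker.
    [ -e "$marker" ] || grep -qs docker "$cgroup"
}

if in_container; then
    echo "inside a container"
else
    echo "not inside a container"
fi
```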

Software Installation Configuration

Before deploying Qwen-Audio-Chat, you need to install some required software.

yum install -y git git-lfs wget xz gperftools-libs anolis-epao-release

Downloading the pre-trained model requires Git LFS, so enable it first.

git lfs install

Download the Source Code and Pre-trained Models

Download the GitHub project source code and the pre-trained model.

git clone https://github.com/QwenLM/Qwen-Audio.git
git clone https://www.modelscope.cn/qwen/Qwen-Audio-Chat.git qwen-audio-chat
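
If Git LFS was not enabled when the model repository was cloned, the large weight files are checked out as small text pointer files, and the model fails to load later. The following sketch lists any such pointer files. It assumes the model was cloned to ~/qwen-audio-chat as above; the function names is_lfs_pointer and find_lfs_pointers are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list files in a cloned model directory that are still Git LFS pointers.

# A pointer file is a small text file whose first line names the LFS spec.
is_lfs_pointer() {
    head -c 64 "$1" 2>/dev/null | grep -q '^version https://git-lfs.github.com/spec/v1'
}

# Print every pointer file below the given directory, skipping .git itself.
find_lfs_pointers() {
    local dir="$1" f
    # Pointer files are tiny, so only files of at most 1 KiB are inspected.
    find "$dir" -name .git -prune -o -type f -size -2k -print |
        while IFS= read -r f; do
            is_lfs_pointer "$f" && echo "$f"
        done
    return 0
}

MODEL_DIR="${HOME}/qwen-audio-chat"
if [ -d "$MODEL_DIR" ]; then
    find_lfs_pointers "$MODEL_DIR"
fi
```

No output means that the real files were downloaded. If files are listed, running git lfs pull inside the model directory should fetch their contents.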

Deploy the Runtime Environment

Before deploying the Python environment, you can change the pip index to a mirror to speed up the download of dependency packages.

mkdir -p ~/.config/pip && cat > ~/.config/pip/pip.conf <<EOF
[global]
index-url=http://mirrors.cloud.aliyuncs.com/pypi/simple/
[install]
trusted-host=mirrors.cloud.aliyuncs.com
EOF

Install Python runtime dependencies.

yum install -y python3-transformers python-einops
pip install typing_extensions==4.5.0 tiktoken transformers_stream_generator accelerate gradio

Install ffmpeg.

wget https://johnvansickle.com/ffmpeg/releases/ffmpeg-6.1-amd64-static.tar.xz
tar -xf ffmpeg-6.1-amd64-static.tar.xz
cp ffmpeg-6.1-amd64-static/{ffmpeg,ffprobe} /usr/local/bin
rm -rf ffmpeg-6.1-amd64-static*

To allow ZenDNN to make full use of the CPU, set two environment variables: OMP_NUM_THREADS and GOMP_CPU_AFFINITY.

cat > /etc/profile.d/env.sh <<EOF
export OMP_NUM_THREADS=\$(nproc --all)
export GOMP_CPU_AFFINITY=0-\$(( \$(nproc --all) - 1 ))
EOF
source /etc/profile
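
In the profile script above, the dollar signs are escaped so that nproc is evaluated each time the profile is sourced rather than when the file is written. OMP_NUM_THREADS sets the number of OpenMP threads, and GOMP_CPU_AFFINITY binds those threads to the listed CPUs. The following sketch shows the values that result for a given CPU count; the function name cpu_affinity_range is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: show the values that the profile script produces for a given CPU count.

# Print the affinity range "0-(N-1)" for N logical CPUs.
cpu_affinity_range() {
    local n="$1"
    echo "0-$(( n - 1 ))"
}

n="$(nproc --all)"
echo "OMP_NUM_THREADS=$n"
echo "GOMP_CPU_AFFINITY=$(cpu_affinity_range "$n")"
```

On a machine with 16 logical CPUs, this prints OMP_NUM_THREADS=16 and GOMP_CPU_AFFINITY=0-15, which can be compared with the output of env | grep OMP in a new shell.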

Run the Web Demo

The project source code includes a web demo that you can use to interact with Qwen-Audio-Chat in real time.

cd ~/Qwen-Audio
export LD_PRELOAD=/usr/lib64/libtcmalloc.so.4
python3 web_demo_audio.py -c=${HOME}/qwen-audio-chat/ --cpu-only --server-name=0.0.0.0 --server-port=7860

After the service starts, open http://<ECS public IP address>:7860 in a browser to access it. The instance's security group must allow inbound traffic on port 7860.
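
If the page does not load, a first step is to check whether the demo is listening at all. Because the container uses the host network, the check can be run either in the container or on the host. The following is a minimal sketch that uses bash's built-in /dev/tcp redirection; the function name port_open and the 3-second timeout are assumptions.

```shell
#!/usr/bin/env bash
# Sketch: check whether a TCP port accepts connections (bash only, no curl needed).
port_open() {
    local host="$1" port="$2"
    # /dev/tcp is a bash feature; timeout guards against ports that never answer.
    timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}

if port_open 127.0.0.1 7860; then
    echo "web demo is listening on port 7860"
else
    echo "nothing is listening on port 7860 yet"
fi
```

If the port is open locally but the page is still unreachable from outside, the cause is more likely the security group or the public IP address than the demo itself.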
