Deploy ChatGLM on ECS

Overview

This topic provides a tutorial on how to deploy ChatGLM on Alibaba Cloud Elastic Compute Service (ECS).

Contents

Step 1: Install the GPU driver

GPUs are used to accelerate the inference and fine-tuning workloads for ChatGLM. To take advantage of GPU capabilities, you must install a GPU driver on the ECS instance on which you want to deploy ChatGLM.

Step 2: Install NVIDIA Docker

ChatGLM requires multiple dependencies and may involve complex rollback operations in case of installation failure. To prevent this situation, we recommend that you deploy an NVIDIA Docker environment and then deploy ChatGLM in the environment.

Step 3: Deploy ChatGLM

Use an interactive demo, a web demo, or an API demo to deploy ChatGLM.

ChatGLM

ChatGLM-6B is an open-source, bilingual (Chinese and English) conversational language model that is based on the General Language Model (GLM) architecture and has 6.2 billion parameters. You can use ChatGLM to power a chatbot that streamlines communication between enterprises and customers and helps promote customer engagement. The model and its demos are suitable for organizations of all sizes from various industries.

For more information about ChatGLM, visit the following websites:

Code: https://github.com/THUDM/ChatGLM-6B

Model files: https://huggingface.co/THUDM/chatglm-6b/tree/main

Prerequisites

Before you start, you will need an ECS instance that meets the following requirements:

  • vCPUs: 16

  • GPU memory: ≥13 GB for inference workloads, or ≥24 GB for fine-tuning workloads

  • Memory: ≥16 GB

  • Operating system: Ubuntu x86_64

  • Networking: VPC-connected with Internet access

In this tutorial, an ECS instance with 16 vCPUs and 62 GB of memory is used. The instance runs Ubuntu 22.04 64-bit and uses NVIDIA T4 GPUs. The instance resides in a virtual private cloud (VPC) and can access the Internet to download resources.
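
To quickly check whether an instance meets these requirements, you can run the following commands. This is a minimal check; lspci lists an NVIDIA device only if the instance type has a GPU attached.

nproc                   # number of vCPUs
free -h                 # memory size
lspci | grep -i nvidia  # attached NVIDIA GPUs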

Step 1: Install the GPU driver

1. View the GPU model and the recommended GPU driver version.

ubuntu-drivers devices
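
Note: The ubuntu-drivers utility is provided by the ubuntu-drivers-common package. If the command is not found, install the package first:

sudo apt-get install ubuntu-drivers-common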

The command output indicates that the GPU model is NVIDIA T4 and that the recommended GPU driver is nvidia-driver-525.

2. Install the NVIDIA GPU driver.

To install a specific driver, run the following command. In this example, the recommended driver (nvidia-driver-525) is installed.

sudo apt install nvidia-driver-525

Alternatively, run the following command to install the recommended GPU driver automatically:

sudo ubuntu-drivers autoinstall
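
After the installation is complete, restart the instance and run nvidia-smi to verify the driver. If the driver is installed, information about the GPU is displayed in the command output.

sudo reboot
nvidia-smi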

Step 2: Install NVIDIA Docker

1. Install Docker CE

a. Update the apt packages index and obtain the latest version information.

sudo apt-get update

b. Install dependencies.

sudo apt-get install apt-transport-https ca-certificates curl gnupg-agent software-properties-common

c. Add the GPG key from Docker.

curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -

d. Add the apt repository from Docker.

sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"

e. Update the apt packages index again.

sudo apt-get update

f. Install Docker CE.

sudo apt-get install docker-ce
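
To verify that Docker is installed and that the daemon is running, check the Docker version and the service status:

sudo docker --version
sudo systemctl status docker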

2. Install the NVIDIA Container Toolkit

a. Add the GPG key from NVIDIA.

curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | \
  sudo apt-key add -

b. Add the apt repository from NVIDIA.

distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-docker.list

c. Update the apt packages index.

sudo apt-get update

d. Install the NVIDIA Container Toolkit.

sudo apt-get install nvidia-container-toolkit

e. Restart Docker.

sudo systemctl restart docker

f. Run nvidia-smi to check whether the GPU is accessible. If the GPU driver works as expected, information about the GPU is displayed in the command output.

nvidia-smi

You have installed the NVIDIA Container Toolkit and can start to use NVIDIA GPUs in Docker containers.
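
Note that nvidia-smi on the host verifies only the GPU driver. To confirm that containers can access GPUs through the NVIDIA Container Toolkit, you can run nvidia-smi inside a temporary CUDA container. This command pulls the image that is also used in the next step:

sudo docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi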

3. Start and run NVIDIA Docker

a. View the image that you need at https://hub.docker.com/r/nvidia/cuda.

In this tutorial, the 12.1.0-base-ubuntu22.04 image is used.

b. Download the image version that you need.

sudo docker pull nvidia/cuda:12.1.0-base-ubuntu22.04

c. Start a Docker container in the background. The --gpus all option makes all GPUs available to the container, and /bin/bash keeps an interactive shell running so that you can attach to the container in a later step.

sudo docker run -itd --gpus all --name my-container nvidia/cuda:12.1.0-base-ubuntu22.04 /bin/bash

d. View the running Docker containers.

sudo docker ps -a

The ID of the Docker container (CONTAINER ID) that you started is displayed in the command output. In this tutorial, the container ID is 257e976ff487. If the container is not in the Running state, run the following command to start the container:

sudo docker start 257e976ff487

Note: Replace the container ID with your actual container ID.

e. Access the Docker container.

sudo docker attach 257e976ff487
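
Note: To detach from the container without stopping it, press Ctrl+P and then Ctrl+Q. If you run the exit command, the main shell process of the container terminates and the container stops.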

You have installed NVIDIA Docker and can proceed to deploy ChatGLM in NVIDIA Docker.

Step 3: Deploy ChatGLM
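
Perform the following operations inside the Docker container that you accessed in the previous step. Because the container runs as the root user, sudo is not required.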

1. Install Python.

a. Update the apt packages index.

apt-get update

b. Install Python 3 and enter Y at every prompt during the installation process.

apt-get install python3

c. Check whether Python 3 is installed. If Python 3 is installed, a version number is displayed in the command output.

python3 -V

2. Install Python pip.

a. Install pip for Python 3 and enter Y at every prompt during the installation process.

apt-get install python3-pip

b. Check whether pip for Python 3 is installed. If pip for Python 3 is installed, a version number is displayed in the command output.

pip3 -V

3. Install Git.

a. Install Git and enter Y at every prompt during the installation process.

apt-get install git

b. Check whether Git is installed. If Git is installed, a version number is displayed in the command output.

git --version

4. Deploy ChatGLM.

a. Download ChatGLM from GitHub and go to the ChatGLM-6B directory.

git clone https://github.com/THUDM/ChatGLM-6B.git
cd ChatGLM-6B

b. Use pip to install dependencies.

pip install -r requirements.txt
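
The requirements include PyTorch. After the installation is complete, you can check whether PyTorch detects the GPU. The following command prints True if CUDA is available:

python3 -c "import torch; print(torch.cuda.is_available())"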

c. Use the interactive demo, web demo, or API demo of ChatGLM to deploy ChatGLM.

  • Use the interactive demo of ChatGLM to deploy ChatGLM.

I. Run the interactive demo cli_demo.py to download the model and test it.

python3 cli_demo.py

II. Enter any text. If responses are returned, the model is deployed. Note that on the first run, cli_demo.py downloads the model files (approximately 13 GB) from Hugging Face, which can take some time.

  • Use the web demo of ChatGLM to deploy ChatGLM.

I. Use pip to install Gradio.

pip install gradio

II. Run web_demo.py to start a web server.

python3 web_demo.py

III. A URL is returned. Access the URL in a browser to use the web demo.

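Note: By default, the Gradio server that web_demo.py starts may listen only on the local host. If you access the ECS instance remotely, you may need to modify web_demo.py to launch the server with server_name="0.0.0.0" and open the listening port in the security group of the instance.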

  • Use the API demo of ChatGLM to deploy ChatGLM.

I. Install FastAPI and Uvicorn.

pip install fastapi uvicorn

II. Run api.py to start the API.

python3 api.py

III. Find the URL of the API in the command output. By default, the API listens on port 8000.

IV. Use cURL to test the API.

Note: Replace the URL with the URL that you obtained.

curl -X POST "http://127.0.0.1:8000" \
    -H 'Content-Type: application/json' \
    -d '{"prompt": "who are you?", "history": []}'

You have deployed ChatGLM on the ECS instance and can proceed to explore more ChatGLM features.

Summary

By the time you complete this tutorial, you will have learned how to:

  • Install the GPU driver

  • Install NVIDIA Docker

  • Deploy ChatGLM

ChatGLM makes it easy to set up a chatbot. You can use ChatGLM to bridge the communication gap between your business and customers, streamlining communications and providing a customized interactive chat experience.
