Best Practice: Create a Chatbot with LLM and AnalyticDB for PostgreSQL on Alibaba Cloud

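This article walks through one way to create a chatbot on Alibaba Cloud, using a large language model (ChatGLM-6B) on ECS and AnalyticDB for PostgreSQL as the vector store, with examples and code.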

1. Deploy ECS and AnalyticDB for PostgreSQL Using Terraform Scripts

To build a chatbot, you need a GPU-equipped ECS instance and an AnalyticDB for PostgreSQL instance with vector database capabilities. This article uses Terraform scripts to launch the required resources, so make sure Terraform is installed before you start.

1.1. Develop Terraform Scripts

The 'ap-southeast-1' value in the script is the region code for the Singapore region on Alibaba Cloud. The following blog lists all Alibaba Cloud region codes.

Blog: What are the region code and Availability Zone (AZ) code of Alibaba Cloud ApsaraDB?

Terraform:main.tf:

provider "alicloud" {
access_key = "Your AK ID"
secret_key = "Your AK secret"

#Configure a region to deploy the resources
region = "ap-southeast-1"
}

Terraform:alicloud_llm_adb.tf:

variable "name" {
default = "auto_provisioning_group"
}

#Create a security group in the VPC and open ports 7860 (web UI) and 22 (SSH)
resource "alicloud_security_group" "group" {
name        = "tf_test_llm"
description = "llm"
vpc_id      = alicloud_vpc.vpc.id
}

resource "alicloud_security_group_rule" "allow_http_7860" {
type              = "ingress"
ip_protocol       = "tcp"
nic_type          = "intranet"
policy            = "accept"
port_range        = "7860/7860"
priority          = 1
security_group_id = alicloud_security_group.group.id
cidr_ip           = "0.0.0.0/0"
}

resource "alicloud_security_group_rule" "allow_ssh_22" {
type              = "ingress"
ip_protocol       = "tcp"
nic_type          = "intranet"
policy            = "accept"
port_range        = "22/22"
priority          = 1
security_group_id = alicloud_security_group.group.id
cidr_ip           = "0.0.0.0/0"
}

data "alicloud_zones" "default" {
available_disk_category     = "cloud_essd"
available_resource_creation = "VSwitch"
}

#Create the VPC and the vSwitch

resource "alicloud_vpc" "vpc" {
vpc_name   = var.name
cidr_block = "172.16.0.0/16"
}

resource "alicloud_vswitch" "vswitch" {
vpc_id       = alicloud_vpc.vpc.id
cidr_block   = "172.16.0.0/24"
zone_id      = data.alicloud_zones.default.zones.0.id
vswitch_name = var.name
}

######## ECS
resource "alicloud_instance" "instance" {
availability_zone = data.alicloud_zones.default.zones.0.id
security_groups   = alicloud_security_group.group.*.id
instance_type              = "ecs.gn6i-c8g1.2xlarge"
system_disk_category       = "cloud_essd"
system_disk_name           = "test_llm_system_disk_name"
system_disk_size           = 50
image_id                   = "ubuntu_22_04_x64_20G_alibase_20230515.vhd"
instance_name              = "test_llm"
password                   = "llm_adbpg1234"
vswitch_id                 = alicloud_vswitch.vswitch.id
}

######## Bind an EIP so that the ECS instance can be accessed from the internet
resource "alicloud_eip" "setup_ecs_access" {
bandwidth            = "100"
internet_charge_type = "PayByBandwidth"
}

resource "alicloud_eip_association" "eip_ecs" {
allocation_id = alicloud_eip.setup_ecs_access.id
instance_id   = alicloud_instance.instance.id
}

######## AnalyticDB for PostgreSQL
resource "alicloud_gpdb_instance" "adb_pg_instance" {
db_instance_category           = "HighAvailability"
db_instance_class              = "gpdb.group.seghdx4"
db_instance_mode               = "StorageElastic"
description                    = "Vector store"
engine                         = "gpdb"
engine_version                 = "6.0"
zone_id                        = data.alicloud_zones.default.zones.0.id
seg_storage_type               = "cloud_essd"
seg_node_num                   = 4
storage_size                   = 50
instance_spec                  = "4C32G"
master_node_num                = 1
instance_network_type          = "VPC"
payment_type                   = "PayAsYouGo"
vpc_id                         = alicloud_vpc.vpc.id
vswitch_id                     = alicloud_vswitch.vswitch.id
vector_configuration_status    = "enabled"
ip_whitelist {
    security_ip_list =  alicloud_instance.instance.private_ip 
  }
}
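
The addressing used in the script can be sanity-checked before deployment. The following is a minimal sketch using only Python's standard ipaddress module; the helper names (vswitch_fits_vpc, parse_port_range, is_open_to_world) are illustrative and not part of Terraform or the alicloud provider. It checks that the vSwitch CIDR block lies inside the VPC CIDR block, parses the "start/end" port_range format, and flags rules whose cidr_ip admits any source address.

```python
import ipaddress


def vswitch_fits_vpc(vpc_cidr: str, vswitch_cidr: str) -> bool:
    """Return True if the vSwitch CIDR block is contained in the VPC CIDR block."""
    vpc = ipaddress.ip_network(vpc_cidr)
    vswitch = ipaddress.ip_network(vswitch_cidr)
    return vswitch.subnet_of(vpc)


def parse_port_range(port_range: str) -> tuple[int, int]:
    """Parse a "start/end" port range such as "7860/7860"."""
    start, end = (int(part) for part in port_range.split("/"))
    if not 1 <= start <= end <= 65535:
        raise ValueError(f"invalid port range: {port_range}")
    return start, end


def is_open_to_world(cidr_ip: str) -> bool:
    """Return True if the rule's source range covers every IPv4 address."""
    return ipaddress.ip_network(cidr_ip).prefixlen == 0


if __name__ == "__main__":
    # Values copied from the Terraform script above.
    print(vswitch_fits_vpc("172.16.0.0/16", "172.16.0.0/24"))
    print(parse_port_range("7860/7860"))
    print(is_open_to_world("0.0.0.0/0"))
```

Both security group rules in the script use 0.0.0.0/0, so ports 22 and 7860 accept traffic from any source; narrowing cidr_ip to a known address range is a common hardening step.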

1.2. Check Scripts and Deploy Resources

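Run the following commands to initialize the working directory and deploy the resources defined in the Terraform scripts.

shell:

terraform init
terraform plan
terraform apply
  • terraform init: Initialize the working directory and download the required provider.
  • terraform plan: Show the changes required by the current configuration.
  • terraform apply: Create or update infrastructure.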
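
Later steps need three values from the deployed resources: the AnalyticDB for PostgreSQL instance ID, the private IP of the ECS instance, and its public IP. As a rough sketch, they can be read from the JSON that `terraform show -json` prints after a successful apply. The function names are hypothetical, and the attribute names `private_ip`, `ip_address`, and `id` are assumptions about what the alicloud provider records in state (only `private_ip` appears in the script above), so verify them against your own state output.

```python
import json
import subprocess


def load_state() -> dict:
    """Run `terraform show -json` in the current directory and parse the result."""
    result = subprocess.run(
        ["terraform", "show", "-json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def find_resource_values(state: dict, resource_type: str) -> dict:
    """Return the attribute map of the first root-module resource of a type."""
    resources = state.get("values", {}).get("root_module", {}).get("resources", [])
    for resource in resources:
        if resource.get("type") == resource_type:
            return resource.get("values", {})
    raise KeyError(f"no resource of type {resource_type} found in state")


def chatbot_settings(state: dict) -> dict:
    """Collect the three values that chatbot.py expects to be filled in."""
    ecs = find_resource_values(state, "alicloud_instance")
    eip = find_resource_values(state, "alicloud_eip")
    adb = find_resource_values(state, "alicloud_gpdb_instance")
    return {
        "adbpgInstID": adb.get("id"),           # assumed attribute name
        "ecsPriIpAddr": ecs.get("private_ip"),
        "ecsPubIpAddr": eip.get("ip_address"),  # assumed attribute name
    }
```

Calling `chatbot_settings(load_state())` from the directory that holds the Terraform scripts would return a dictionary keyed by the variable names used in chatbot.py. The same values are also visible in the Alibaba Cloud console.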

2. Install Dependencies on ECS

Log on to the ECS instance and run the following commands to install the dependencies of the ChatBot (the application layer). Then export the AccessKey ID (AK) and AccessKey secret (SK) of your Alibaba Cloud account.

shell:

apt-get update
add-apt-repository ppa:graphics-drivers/ppa
apt-get install nvidia-driver-525
apt-get install postgresql-server-dev-all
pip3.10 install   langchain==0.0.146
pip3.10 install   transformers==4.27.1
pip3.10 install   unstructured
pip3.10 install   layoutparser
pip3.10 install   nltk
pip3.10 install   sentence-transformers
pip3.10 install   beautifulsoup4
pip3.10 install   icetk
pip3.10 install   cpm_kernels
pip3.10 install   faiss-cpu
pip3.10 install   accelerate
pip3.10 install   gradio==3.28.3
pip3.10 install   fastapi
pip3.10 install   uvicorn
pip3.10 install   peft
pip3.10 install   alibabacloud-bpstudio20210931==1.0.11
pip3.10 install   alibabacloud-gpdb20160503==1.1.21
pip3.10 install   psycopg2cffi

# Export AK and SK
export ALIBABA_CLOUD_ACCESS_KEY_ID="Your AK ID"
export ALIBABA_CLOUD_ACCESS_KEY_SECRET="Your AK SECRET"
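
Before running anything that depends on these variables, it can help to confirm they are set without echoing the secret in full. The following is a minimal sketch under that assumption; `mask` and `missing_vars` are hypothetical helper names and are not part of the chatbot script in this article.

```python
import os

REQUIRED_VARS = (
    "ALIBABA_CLOUD_ACCESS_KEY_ID",
    "ALIBABA_CLOUD_ACCESS_KEY_SECRET",
)


def mask(value: str, visible: int = 4) -> str:
    """Keep the first `visible` characters and replace the rest with asterisks."""
    if len(value) <= visible:
        return "*" * len(value)
    return value[:visible] + "*" * (len(value) - visible)


def missing_vars(environ=os.environ) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_vars()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        for name in REQUIRED_VARS:
            print(name, "=", mask(os.environ[name]))
```

Masking matters when the output ends up in a terminal scrollback or a log file that other users can read.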

3. Deploy Large Language Model on ECS

3.1. Develop Python Scripts to Run the Application

Create a Python file and develop startup scripts to run the ChatBot application.

shell:

vim /root/chatbot.py

Copy the entire script below and edit lines 187 to 189 (the variables adbpgInstID, ecsPriIpAddr, and ecsPubIpAddr), replacing the values to match your environment. You need the instance ID of AnalyticDB for PostgreSQL, the private IP address of the ECS instance, and the public IP address of the ECS instance.

In this article, we load the Large Language Model (LLM) ChatGLM-6B into the ChatBot. ChatGLM-6B was developed by a team from Tsinghua University. It is an open-source, bilingual dialogue model with 6.2 billion parameters, based on the General Language Model (GLM) architecture, and it supports both Chinese and English. You can edit the script around line 231 to use other Large Language Models, such as Dolly or GPT.

chatbot.py:

#!/usr/bin/env python 
# -*- coding: utf-8 -*-

import os, sys
import time
from typing import List
import json

from alibabacloud_bpstudio20210931.client import Client as BPStudio20210931Client
from alibabacloud_bpstudio20210931 import models as bpstudio_20210931_models
from alibabacloud_gpdb20160503.client import Client as gpdb20160503Client
from alibabacloud_gpdb20160503 import models as gpdb_20160503_models

from alibabacloud_tea_openapi import models as open_api_models
from alibabacloud_tea_util import models as util_models
from alibabacloud_tea_util.client import Client as UtilClient

from subprocess import Popen, PIPE

import logging
import urllib3
import warnings

urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
warnings.filterwarnings("ignore")

logging.basicConfig(level=logging.DEBUG,
                    format='%(asctime)s %(levelname)s %(funcName)s %(message)s',
                    datefmt='%a, %d %b %Y %H:%M:%S',
                    filename='chatbot.log',
                    filemode='w')
console = logging.StreamHandler()
console.setLevel(logging.WARN)
formatter = logging.Formatter('%(asctime)s %(levelname)s %(funcName)s %(message)s')
console.setFormatter(formatter)
logging.getLogger('').addHandler(console)

class ADBPG(object):
    def __init__(self, access_key_id, access_key_secret, instId, ):
        self.access_key_id = access_key_id
        self.access_key_secret = access_key_secret
        self.instId = instId
        config = open_api_models.Config(access_key_id=self.access_key_id, access_key_secret=self.access_key_secret)
        config.endpoint = f'gpdb.aliyuncs.com'
        self.client = gpdb20160503Client(config)
        return

    def getInsAttrs(self):
        describe_dbinstance_on_ecsattribute_request = gpdb_20160503_models.DescribeDBInstanceOnECSAttributeRequest( dbinstance_id=self.instId )
        runtime = util_models.RuntimeOptions()
        status, connstr = None, None
        try:
            rawData = self.client.describe_dbinstance_on_ecsattribute_with_options(describe_dbinstance_on_ecsattribute_request, runtime)
            retData = rawData.body.to_map().get("Items").get("DBInstanceAttribute")[0]
            status  = retData.get("DBInstanceStatus")
            connstr = retData.get("ConnectionString")
        except Exception as error:
            UtilClient.assert_as_string(error.message)
        logging.debug("ADBPG-get-attr => status=[%s], connstr=[%s]" % (status, connstr))
        return status, connstr

    def allocPubConn(self, connection_string_prefix='adbpgvector', port=5432):
        allocate_instance_public_connection_request = gpdb_20160503_models.AllocateInstancePublicConnectionRequest(
            dbinstance_id=self.instId,
            connection_string_prefix=connection_string_prefix,
            port=port
        )
        runtime = util_models.RuntimeOptions()
        try:
            self.client.allocate_instance_public_connection_with_options(allocate_instance_public_connection_request, runtime)
        except Exception as error:
            UtilClient.assert_as_string(error.message)
        return

    def crtAcc(self, userName='demouser', userPwd='DemoUser123'):
        create_account_request = gpdb_20160503_models.CreateAccountRequest(
            dbinstance_id=self.instId,
            account_name=userName,
            account_password=userPwd
        )
        runtime = util_models.RuntimeOptions()
        try:
            self.client.create_account_with_options(create_account_request, runtime)
        except Exception as error:
            UtilClient.assert_as_string(error.message)
        return

    def modifySecIps(self, ip):
        modify_security_ips_request = gpdb_20160503_models.ModifySecurityIpsRequest(
            dbinstance_id=self.instId,
            security_iplist=ip
        )
        runtime = util_models.RuntimeOptions()
        try:
            self.client.modify_security_ips_with_options(modify_security_ips_request, runtime)
        except Exception as error:
            UtilClient.assert_as_string(error.message)
        return

    def checkIns(self):
        runtime = util_models.RuntimeOptions()
        describe_accounts_request = gpdb_20160503_models.DescribeAccountsRequest( dbinstance_id=self.instId )
        try:
            retData = self.client.describe_accounts_with_options(describe_accounts_request, runtime)
            logging.debug("ADBPG-checkIns DescribeAccounts => \n {%s} \n " % (retData.body.to_map().get("Accounts").get("DBInstanceAccount")))
        except Exception as error:
            UtilClient.assert_as_string(error.message)

        describe_dbinstance_iparray_list_request = gpdb_20160503_models.DescribeDBInstanceIPArrayListRequest( dbinstance_id=self.instId )
        try:
            retData = self.client.describe_dbinstance_iparray_list_with_options(describe_dbinstance_iparray_list_request, runtime)
            logging.debug("ADBPG-checkIns DescribeDBInstanceIPArrayList => \n {%s} \n " % (retData.body.to_map().get("Items").get("DBInstanceIPArray")))
        except Exception as error:
            UtilClient.assert_as_string(error.message)


def LocalShellCmd(cmd, env=None, shell=True):
    p = Popen(
        cmd,
        stdin = PIPE,
        stdout = PIPE,
        stderr = PIPE,
        env = env,
        shell = shell
    )
    stdout, stderr = p.communicate()
    rc = p.wait()
    logging.debug("LocalShellCmd => cmd = [%s] \n stdout => [%s] \n" % (cmd, stdout))
    assert (rc == 0)
    return stdout.strip()


def envCheck():

    if os.environ.get('ALIBABA_CLOUD_ACCESS_KEY_SECRET', None) is None or os.environ.get('ALIBABA_CLOUD_ACCESS_KEY_ID', None) is None:
        print("""\tERROR! Unable to read the environment variables ALIBABA_CLOUD_ACCESS_KEY_ID and ALIBABA_CLOUD_ACCESS_KEY_SECRET. Please set them as follows: => \n
                export ALIBABA_CLOUD_ACCESS_KEY_ID=<ak>
                export ALIBABA_CLOUD_ACCESS_KEY_SECRET=<sk>

                If you have difficulty obtaining AK/SK, please refer to the documentation for instructions: https://help.aliyun.com/document_detail/53045.html
        """)
        exit(-1)
    else:
        logging.debug( "ALIBABA_CLOUD_ACCESS_KEY_ID = [%s], ALIBABA_CLOUD_ACCESS_KEY_SECRET = [%s]" % (os.environ['ALIBABA_CLOUD_ACCESS_KEY_ID'], os.environ['ALIBABA_CLOUD_ACCESS_KEY_SECRET']) )



    cmd = "nvidia-smi > /dev/null 2>&1"
    LocalShellCmd(cmd)

    cmd = "dpkg -l | grep nvidia-driver-525"
    LocalShellCmd(cmd)

    cmd = "dpkg -l | grep postgresql-server-dev-all"
    LocalShellCmd(cmd)

    for pkg in [ "langchain","transformers","unstructured","layoutparser","nltk","sentence-transformers","beautifulsoup4","icetk","cpm-kernels","faiss-cpu","accelerate","gradio","fastapi","uvicorn","peft","alibabacloud-bpstudio20210931","alibabacloud-gpdb20160503" ]:
        cmd = "pip3.10 list | grep %s"  % pkg
        LocalShellCmd(cmd)


if __name__ == '__main__':

    print("\n" + "*"*30 + """ Tips:\n
            1) If an error occurs during script execution, you can troubleshoot it by viewing the /root/chatbot.log file.
            2) If you need to restart services such as the web UI or view database information, refer to the /root/env.txt file.\n"""+ "*"*30 + "\n")
    print("*"*30 + "Step0: Environmental checks are underway, such as driver and installation dependency packages" + "*"*30)
    envCheck()

    access_key_id      = os.environ['ALIBABA_CLOUD_ACCESS_KEY_ID']
    access_key_secret  = os.environ['ALIBABA_CLOUD_ACCESS_KEY_SECRET']
    userName   = "demouser"
    userPwd    = "DemoUser123"
    dbName     = "demouser"



    print("*"*30 + "Step1: Ensure correct AKID/AKSEC information is set (requires instance operation through openAPI)" + "*"*30)
    ans = input("""\nBefore executing chatbot.py, please confirm your AK/SK again. Currently, in your environment:
    \n access_key_id = [%s], access_key_secret = [%s] \n
    Are you sure to continue? Please enter(Y/N): """ % (access_key_id, access_key_secret))
    if ans not in ["Y", "y", "YES", "yes"]:
        print("\n Please set ALIBABA_CLOUD_ACCESS_KEY_ID and ALIBABA_CLOUD_ACCESS_KEY_SECRET first, then run the script again. Thank you for your cooperation!\n")
        exit(-1)

    print("*"*30 + "Step3: Call the ADBPG API to configure ADBPG: create an account, set the whitelist, etc." + "*"*30)
    adbpgInstID = "gp-gs59uf74wi8q4gcvo"        # replace by your AnalyticDB for PostgreSQL instance id
    ecsPriIpAddr = "172.16.0.181"               # replace by your ECS private IP address
    ecsPubIpAddr = "8.222.204.195"              # replace by your ECS public IP address
    adbpg  = ADBPG(access_key_id, access_key_secret, adbpgInstID)
    status, connstr = adbpg.getInsAttrs()
    assert(status  is not None)
    assert(connstr is not None)
    while status == "Creating":
        print("ADBPG instance is still being created; please wait a few minutes. Checking again...")
        logging.debug("ADBPG Instance status=[%s]@[%s]" % (status, time.localtime()))
        time.sleep(120)
        status, connstr = adbpg.getInsAttrs()
    adbpg.crtAcc(userName, userPwd)
    time.sleep(10)
    adbpg.modifySecIps(ecsPriIpAddr)
    time.sleep(10)
    adbpg.checkIns()

    print("*"*30 + "Step4: Configuring operating system environment variables, preparing to download the model, and launching the web program. This will take a long time!" + "*"*30)
    # setting os system variables
    os.chdir("/root")
    os.environ["PG_HOST"] = connstr
    os.environ["PG_PORT"] = "5432"
    os.environ["PG_USER"] = userName
    os.environ["PG_PASSWORD"] = userPwd
    os.environ["PG_DATABASE"] = dbName
    logging.debug("""ADBPG SYSTEM VARIABLE =>
        export PG_HOST=%s
        export PG_PORT=%s
        export PG_USER=%s
        export PG_PASSWORD=%s
        export PG_DATABASE=%s
            """ % (connstr, "5432", userName, userPwd, dbName))

    with open("env.txt", "w") as fw:
        fw.write("export ALIBABA_CLOUD_ACCESS_KEY_ID=%s\n" % os.environ['ALIBABA_CLOUD_ACCESS_KEY_ID'])
        fw.write("export ALIBABA_CLOUD_ACCESS_KEY_SECRET=%s\n" % os.environ['ALIBABA_CLOUD_ACCESS_KEY_SECRET'])
        fw.write("export PG_HOST=%s\n" % connstr)
        fw.write("export PG_PORT=%s\n" % 5432)
        fw.write("export PG_USER=%s\n" % userName)
        fw.write("export PG_PASSWORD=%s\n" % userPwd)
        fw.write("export PG_DATABASE=%s\n" % dbName)
        fw.write("#webui url=> %s:7860\n" % ecsPubIpAddr)

    cmd1 = "cd /root; git clone https://github.com/nankingguo/langchain-ChatGLM.git ; cd langchain-ChatGLM ; git checkout analyticdb_store"
    cmd2 = "nohup python3.10 /root/langchain-ChatGLM/webui_en.py > webui.log 2>&1 &"

    ans = input("""Preparing to download the open source large model ChatGLM. Please read and confirm the following:

    \033[1;5;32;4m The model is based on ChatGLM-6B, developed by a team from Tsinghua University. It is an open source, bilingual dialogue language model that supports both Chinese and English. It is based on the General Language Model (GLM) architecture and has 6.2 billion parameters.

This project is for scientific research use only; please comply with the license at https://huggingface.co/THUDM/chatglm-6b/blob/main/MODEL_LICENSE

    [Special reminder] Alibaba Cloud does not guarantee the legality, security, and accuracy of the third-party models you use on the image, and does not assume responsibility for any damage caused thereby; You should consciously abide by the user agreement, usage specifications, and relevant laws and regulations of the third-party model installed on the image, and bear relevant responsibilities for the legality and compliance of using the third-party model on your own.\033[0m

    Are you sure to continue? Please enter(Y/N): """)
    if ans not in ["Y", "y", "YES", "yes"]:
        print("\n You have not confirmed the above agreement. Thank you for your cooperation and the execution will be terminated!\n")
        exit(-1)


    print("*"*35 + "Step4.1: Download langchain code!" + "*"*30)
    LocalShellCmd(cmd1)

    print("*"*35 + """Step4.2: Starting to run the chatGLM model, due to its large size (around 17GB), downloading it will take a long time and is expected to take about 15 minutes. Please be patient and wait,
            The specific progress can be viewed through  \033[1;5;32;4m tail -f webui.log \033[0m  ...""" + "*"*30)
    LocalShellCmd(cmd2)
    print("*="*30)

    print("""
        【Alibaba Cloud does not guarantee the legality, security, and accuracy of the third-party model you use on the image, and is not responsible for any damage caused thereby; You should consciously abide by the user agreement, usage specifications, and relevant laws and regulations of the third-party model installed on the image, and bear relevant responsibilities for the legality and compliance of using the third-party model on your own.】

        Everything is ready in the environment, and you can access\n\t\t\t=>=>=> %s:7860 <=<=<=\n\t and experience the memory capable Chatbot through a browser!!!
    """ % ecsPubIpAddr)
    print("*="*30)
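
The script above polls the instance status every 120 seconds for as long as it is "Creating", with no upper bound on the wait. As a hedged alternative, the sketch below adds a timeout to the same polling pattern. `wait_until_ready` is a hypothetical helper, not part of the original script, and the one-hour default timeout is an arbitrary assumption.

```python
import time
from typing import Callable


def wait_until_ready(
    get_status: Callable[[], str],
    pending: str = "Creating",
    interval: float = 120.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until it stops returning `pending` or the timeout expires."""
    deadline = time.monotonic() + timeout
    status = get_status()
    while status == pending:
        if time.monotonic() >= deadline:
            raise TimeoutError(f"status is still {pending!r} after {timeout} seconds")
        sleep(interval)
        status = get_status()
    return status
```

With the ADBPG class from the script, the call could look like `wait_until_ready(lambda: adbpg.getInsAttrs()[0])`, since getInsAttrs returns a (status, connstr) pair. The `sleep` parameter exists only so the loop can be exercised quickly with a no-op function.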

3.2. Run the Application

Run the following commands to start the ChatBot and follow the loading progress in the application log.

shell:

python3.10 /root/chatbot.py

# Follow the application log to watch the startup progress
tail -f webui.log
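
chatbot.py writes the database connection settings to /root/env.txt as `export KEY=VALUE` lines, plus a comment line with the web UI address. If you need those settings again, for example to restart the web UI in a new shell, they can be read back. The following is a minimal sketch; `parse_env_file` is a hypothetical helper and assumes values contain no quoting or embedded newlines, which matches what the script writes.

```python
def parse_env_file(text: str) -> dict[str, str]:
    """Parse lines of the form `export KEY=VALUE`, skipping blanks and comments."""
    settings: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, separator, value = line.partition("=")
        if separator:
            settings[key.strip()] = value.strip()
    return settings


if __name__ == "__main__":
    import os

    with open("/root/env.txt") as env_file:
        settings = parse_env_file(env_file.read())
    # Make the PG_* variables available to child processes started from here.
    os.environ.update(settings)
    print(sorted(settings))
```

In a shell, running `source /root/env.txt` achieves the same result, because the file already uses export syntax.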

4. Access the ChatBot and Start a Dialog

Open http://<public IP of ECS>:7860 in a browser to access the ChatBot. If the page loads successfully, congratulations: you have deployed an AI-driven ChatBot on your own. Have fun!
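
If the page does not load, a first check is whether port 7860 accepts TCP connections at all. The sketch below uses only the standard socket module; `port_is_open` is a hypothetical helper, and the address in the usage example is a placeholder from the documentation range, to be replaced with the public IP of your ECS instance.

```python
import socket


def port_is_open(host: str, port: int = 7860, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Placeholder address; replace it with the public IP of your ECS instance.
    print(port_is_open("203.0.113.10"))
```

A False result while webui.log shows the application running usually points at the network path, such as the security group rule or the EIP binding, rather than the application. The model download can also take a while before the web UI starts listening.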

ApsaraDB
