Build an LLM-powered chatbot that is connected to a vector database

Last Updated: Apr 26, 2024

You can use the Elastic Algorithm Service (EAS) module of Platform for AI (PAI) to build a chatbot that is powered by a large language model (LLM). If you want the chatbot to answer questions based on external knowledge, you can store the knowledge in a vector database and connect the database to the LLM by using the open source framework LangChain. This topic describes how to use PAI to build an LLM-powered chatbot that is connected to a vector database.

Background information

LangChain is an open source framework that allows you to combine LLMs, such as Tongyi Qianwen, with external data sources, thereby providing better model performance without additional computing resources. LangChain performs natural language processing on your knowledge base files and then stores the data in a vector database. This way, LangChain retrieves information that is relevant to a user query from the vector database, includes the retrieved information and the query in a prompt, and then sends the prompt to the LLM to generate a response. LangChain also supports user-defined prompts and multi-round conversations.
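
The following Python sketch approximates that flow with open source LangChain components. It is illustrative only: the embedding model, the knowledge file name, and the prompt format are placeholder assumptions, not the exact ones used by the PAI images described later in this topic.

    # A minimal retrieval-augmented generation (RAG) sketch using LangChain.
    # Assumptions: a local file named knowledge.txt and a public embedding model.
    from langchain.embeddings import HuggingFaceEmbeddings
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    from langchain.vectorstores import FAISS

    # Split the knowledge base into chunks and embed them into a vector store.
    splitter = RecursiveCharacterTextSplitter(chunk_size=200, chunk_overlap=0)
    docs = splitter.create_documents([open("knowledge.txt", encoding="utf-8").read()])
    store = FAISS.from_documents(docs, HuggingFaceEmbeddings(
        model_name="sentence-transformers/all-MiniLM-L6-v2"))

    # Retrieve the chunks that are most relevant to the query and build a prompt.
    query = "What is Platform for AI?"
    context = "\n".join(d.page_content for d in store.similarity_search(query, k=3))
    prompt = f"Answer the question based on the context.\nContext:\n{context}\nQuestion: {query}"
    # The prompt is then sent to the LLM service (deployed in Step 2) for the answer.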

Prerequisites

Step 1: Prepare a vector database

You can use one of the following services to build a vector database:

• Hologres

• AnalyticDB for PostgreSQL

• Elasticsearch

• Faiss

Faiss does not require activation or purchase, whereas Hologres, AnalyticDB for PostgreSQL, and Elasticsearch must be activated and then configured on the Web User Interface (WebUI). The WebUI configuration is used to connect to the vector database.

Hologres

  1. Purchase a Hologres instance and create a database. For more information, see Purchase a Hologres instance. You must save the name of the database to your on-premises machine.

  2. View the invocation information in the Hologres console.

    1. Click the name of the instance to go to the Instance Details page.

    2. In the Network Information section, find Select VPC, click Copy in the Endpoint column, and then save the part of the endpoint that precedes :80 to your on-premises machine.

  3. In the left-side navigation pane, click Account Management to create a custom account. Save the account and password to your on-premises machine. This information is used for subsequent connections to the Hologres instance. For information about how to create a custom account, see the "Create a custom account" section in the Manage users topic.

    Set the Select Member Role parameter to Super Administrator (SuperUser).
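
  To verify the database name, endpoint, and custom account that you saved, you can run a quick connectivity check. The following sketch uses the psycopg2 client; the host, database, user, and password values are placeholders for the information that you saved in the preceding steps.

    import psycopg2

    # Hologres is PostgreSQL-compatible and listens on port 80.
    conn = psycopg2.connect(
        host="hgpostcn-cn-xxxxxx-vpc-st.hologres.aliyuncs.com",  # endpoint without ":80"
        port=80,
        dbname="langchain",       # the database that you created
        user="your_account",      # the custom account
        password="your_password",
    )
    with conn.cursor() as cur:
        cur.execute("SELECT version();")
        print(cur.fetchone())
    conn.close()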

AnalyticDB for PostgreSQL

  1. Create an instance in the AnalyticDB for PostgreSQL console. For more information, see Create an instance.

    Set the Vector Engine Optimization parameter to Enabled.

  2. Click the name of the instance to go to the Basic Information page. In the Database Connection Information section, copy the internal and public endpoints of the instance and save them to your on-premises machine.

    Note
    • If no public endpoints are available, click Apply for Public Endpoint. For more information, see Manage public endpoints.

    • If the instance resides in the same VPC as EAS, you need only the internal endpoint.

  3. Create a database account. Save the database account and password to your on-premises machine. This information is used for subsequent connections to the database. For more information, see Create a database account.

  4. Create a whitelist that consists of trusted IP addresses. For more information, see Configure an IP address whitelist.
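
  You can verify the endpoint, account, and whitelist in the same way. The following sketch is a hedged example that uses the psycopg2 client; all connection values are placeholders for the information that you saved in the preceding steps.

    import psycopg2

    # AnalyticDB for PostgreSQL listens on port 5432. The connection fails if the
    # calling IP address is not in the whitelist that you configured.
    conn = psycopg2.connect(
        host="gp-xxxxx.rds.aliyuncs.com",  # internal endpoint if in the same VPC as EAS
        port=5432,
        dbname="your_database",
        user="your_account",
        password="your_password",
    )
    print(conn.server_version)  # prints the server version if the connection succeeds
    conn.close()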

Elasticsearch

  1. Create an Elasticsearch cluster. For more information, see Create an Alibaba Cloud Elasticsearch cluster.

    • Copy the values of the Username and Password parameters and save them to your on-premises machine.

  2. Click the name of the instance to go to the Basic Information page. Copy the values of Internal Endpoint and Internal Port and save them to your on-premises machine.
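
  To confirm that the cluster is reachable and that the credentials work, you can query the cluster root endpoint. The following sketch uses the requests library; the endpoint, username, and password are placeholders for the values that you saved.

    import requests

    # The root endpoint of an Elasticsearch cluster returns basic cluster information.
    resp = requests.get(
        "http://es-cn-xxx.elasticsearch.aliyuncs.com:9200",
        auth=("elastic", "your_password"),
    )
    resp.raise_for_status()
    print(resp.json()["version"]["number"])  # the Elasticsearch version number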

Faiss

Faiss streamlines the process of building a local vector database. You do not need to purchase or activate the service.
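
If you are curious what the chatbot manages for you, the following sketch builds and persists a small Faiss index locally. The folder and file names are illustrative; in this topic, you only specify the names on the WebUI in Step 4, and the chatbot creates the index for you.

    import os

    import faiss
    import numpy as np

    os.makedirs("faiss_path", exist_ok=True)   # the database folder
    dim = 768                                  # must match the embedding dimension
    index = faiss.IndexFlatL2(dim)             # exact L2 nearest-neighbor index
    index.add(np.random.rand(16, dim).astype("float32"))  # stand-in vectors
    faiss.write_index(index, "faiss_path/faiss_index.index")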

Step 2: Use EAS to deploy an LLM for inference

  1. Go to the EAS-Online Model Services page.

    1. Log on to the PAI console.

    2. In the left-side navigation pane, click Workspaces. On the Workspaces page, click the name of the workspace to which the model service that you want to manage belongs.

    3. In the left-side navigation pane, choose Model Deployment > Elastic Algorithm Service (EAS) to go to the EAS-Online Model Services page.

  2. On the EAS-Online Model Services page, click Deploy Service. In the dialog box that appears, select Custom Deployment and click OK.

  3. On the Deploy Service page, configure the parameters. The following list describes the key parameters.

    Service Name: the name of the service.

    Deployment Mode: Select Deploy Web App by Using Image.

    Select Image: Click PAI Image. In the drop-down lists that appear, select chat-llm-webui and then select 2.0.

    Note: The image version is updated frequently. We recommend that you select the latest version.

    Command to Run: The command varies based on the model that you want to use.

    • If you use the ChatGLM2-6B model: python webui/webui_server.py --port=8000 --model-path=THUDM/chatglm2-6b

    • If you use the Qwen-7B model: python webui/webui_server.py --port=8000 --model-path=Qwen/Qwen-7B-Chat

    • If you use the Llama-2-7b model: python webui/webui_server.py --port=8000 --model-path=meta-llama/Llama-2-7b-chat-hf

    • If you use the Llama-2-13b model: python webui/webui_server.py --port=8000 --model-path=meta-llama/Llama-2-13b-chat-hf --precision=fp16

    Port number: Set the value to 8000.

    Resource Group Type: Select Public Resource Group.

    Resource Configuration Mode: Select General.

    Resource Configuration: Select GPU. In the list that appears, select an instance type. We recommend that you select ml.gu7i.c16m60.1-gu30 for cost efficiency. If you use the Llama-2-13b model, we recommend that you select ecs.gn6e-c12g1.3xlarge instead.

    VPC Settings:

    • If you use Hologres, AnalyticDB for PostgreSQL, or Elasticsearch to create the vector database, select the VPC where the vector database resides.

    • If you use Faiss to create the vector database, select any available VPC.

  4. Click Deploy and wait for the deployment to complete.

    If the value in the Service Status column changes to Running, the service is deployed.

  5. Obtain the endpoint and token of the service.

    1. Click the service name to go to the Service Details page.

    2. In the Basic Information section, click View Endpoint Information.

    3. In the Call Information dialog box, click the VPC Endpoint tab. Copy the endpoint and token of the service and save them to your on-premises machine.
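
    You can optionally verify the endpoint and token before you continue. The following sketch assumes that the chat-llm-webui service accepts the prompt as the raw request body; the URL and token are placeholders for the values that you just saved.

      import requests

      EAS_URL = "http://xx.vpc.pai-eas.aliyuncs.com/api/predict/chatllm_demo_glm2"
      EAS_TOKEN = "xxxxxxx=="

      # Send a prompt as the request body; the Authorization header carries the token.
      resp = requests.post(EAS_URL,
                           headers={"Authorization": EAS_TOKEN},
                           data="What is Platform for AI?".encode("utf-8"))
      resp.raise_for_status()
      print(resp.text)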

Step 3: Use EAS to deploy a LangChain chatbot

PAI provides a convenient and efficient method to deploy a LangChain chatbot. You only need to select an image in EAS. For more information about how to use LangChain, see the GitHub sample code.

  1. On the EAS-Online Model Services page, click Deploy Service. In the dialog box that appears, select Custom Deployment and click OK.

  2. On the Deploy Service page, configure the parameters. The following list describes the key parameters.

    Service Name: the name of the service. In this example, the name is chatbot_langchain_vpc.

    Deployment Mode: Select Deploy Web App by Using Image.

    Select Image: Click PAI Image. In the drop-down lists that appear, select chatbot-langchain and then select 1.0.

    Note: The image version is updated frequently. We recommend that you select the latest version.

    Command to Run:

    • Command: python webui.py --port=8000

    • Port number: 8000

    Resource Group Type: Select Public Resource Group.

    Resource Configuration Mode:

    • In the Resource Configuration section, click CPU and select ecs.c7.4xlarge as the instance type.

    • Click Extra System Storage and set the Additional System Disk parameter to 60 GB.

    VPC Settings:

    • If you use Hologres, AnalyticDB for PostgreSQL, or Elasticsearch to create the vector database, select the VPC where the vector database resides.

    • If you use Faiss to create the vector database, select the VPC that you configured when you deployed the LLM.

  3. Click Deploy and wait for the deployment to complete.

    If the value in the Service Status column changes to Running, the service is deployed.

  4. After you deploy the service, click View Web App in the Service Type column to enter the WebUI.

Step 4: Use LangChain to connect the vector database and LLM

WebUI configuration

  1. On the Settings tab of the WebUI, configure the following parameters:

    • Embedding Model: the embedding model. Each model has a corresponding dimension, which is specified by the Embedding Dimension parameter. We recommend that you set this parameter to SGPT-125M-weightedmean-nli-bitfit.

    • Embedding Dimension: After you configure the Embedding Model parameter, the system automatically configures this parameter.

    • EAS Url: the endpoint that you obtained in Step 2.

    • EAS Token: the token that you obtained in Step 2.

    • Vector Store: Configure this parameter based on the vector database that you use.

      Hologres

      • Host: the Hologres invocation information that you obtained in Step 1.

      • Database: the name of the database that you created in Step 1.

      • User: the custom account that you created in Step 1.

      • Password: the password of the custom account that you created in Step 1.

      After you configure the preceding parameters, click Connect Hologres and check whether the Hologres instance is connected.

      AnalyticDB for PostgreSQL

      • Host: the public endpoint that you obtained in Step 1.

        Note

        If the instance resides in the same VPC as EAS, you need only the internal endpoint.

      • User: the database account that you created in Step 1.

      • Database: the name of the database that you created. To view the name of the database, log on to the database. For more information, see Connect to a database.

      • Password: the password of the database that you created in Step 1.

      • Pre_delete: specifies whether to delete the existing database. Valid values: True (delete) and False (do not delete).

      Elasticsearch

      • URL: the internal endpoint and port that you obtained in Step 1. Specify the parameter in the http://<internal endpoint>:<port> format.

      • Index: the name of the index that you want to use.

      • User: the logon name that you configured when you created the Elasticsearch cluster in Step 1.

      • Password: the logon password that you configured when you created the Elasticsearch cluster in Step 1.

      After you configure the preceding parameters, click Connect Elasticsearch and check whether the Elasticsearch cluster is connected.

      Faiss

      • Path: the name of the database folder. Example: faiss_path.

      • Index: the name of the index folder. Example: faiss_index.

    You can also upload a configuration file on the Settings tab and click Parse Config to parse the configuration file. If the parse operation succeeds, the parameters are automatically configured on the WebUI based on the configuration file. Sample configuration files:

    Hologres

    {
      "embedding": {
        "model_dir": "embedding_model/",
        "embedding_model": "SGPT-125M-weightedmean-nli-bitfit",
        "embedding_dimension": 768
      },
      "LLM": "EAS",
      "EASCfg": {
        "url": "http://xx.vpc.pai-eas.aliyuncs.com/api/predict/chatllm_demo_glm2",
        "token": "xxxxxxx=="
      },
    
      "vector_store": "Hologres",
    
      "HOLOCfg": {
        "PG_HOST": "hgpostcn-cn.xxxxxx.vpc.hologres.aliyuncs.com",
        "PG_PORT": "80",
        "PG_DATABASE": "langchain",
        "PG_USER": "user",
        "PG_PASSWORD": "password"
      }
    }

    In the preceding sample file, EASCfg contains the endpoint and token of the LLM service. HOLOCfg contains the configuration of the Hologres instance. You can configure the parameters based on the instructions on the WebUI.

    AnalyticDB

    {
      "embedding": {
        "model_dir": "embedding_model/",
        "embedding_model": "SGPT-125M-weightedmean-nli-bitfit",
        "embedding_dimension": 768
      },
      "LLM": "EAS",
      "EASCfg": {
        "url": "http://xx.pai-eas.aliyuncs.com/api/predict/chatllm_demo_glm2",
        "token": "xxxxxxx=="
      },
    
      "vector_store": "AnalyticDB",
    
      "ADBCfg": {
        "PG_HOST": "gp.xxxxx.rds.aliyuncs.com",
        "PG_USER": "xxxxxxx", 
        "PG_DATABASE": "xxxxxxx", 
        "PG_PASSWORD": "passwordxxxx"
      }
    }

    In the preceding sample file, EASCfg contains the endpoint and token of the LLM service. ADBCfg contains the configuration of the AnalyticDB for PostgreSQL instance. You can configure the parameters based on the instructions on the WebUI.

    Elasticsearch

    {
      "embedding": {
        "model_dir": "embedding_model/",
        "embedding_model": "SGPT-125M-weightedmean-nli-bitfit",
        "embedding_dimension": 768
      },
      "LLM": "EAS",
      "EASCfg": {
        "url": "http://xx.pai-eas.aliyuncs.com/api/predict/chatllm_demo_glm2",
        "token": "xxxxxxx=="
      },
    
      "vector_store": "ElasticSearch",
    
      "ElasticSearchCfg": {
        "ES_URL": "http://es-cn-xxx.elasticsearch.aliyuncs.com:9200",
        "ES_USER": "elastic",
        "ES_PASSWORD": "password",
        "ES_INDEX": "test_index"
      }
    }

    In the preceding sample file, EASCfg contains the endpoint and token of the LLM service. ElasticSearchCfg contains the configuration of the Elasticsearch cluster. You can configure the parameters based on the instructions on the WebUI.

    Faiss

    {
      "embedding": {
        "model_dir": "embedding_model/",
        "embedding_model": "SGPT-125M-weightedmean-nli-bitfit",
        "embedding_dimension": 768
      },
      "LLM": "EAS",
      "EASCfg": {
        "url": "http://xx.vpc.pai-eas.aliyuncs.com/api/predict/chatllm_demo_glm2",
        "token": "xxxxxxx=="
      },
    
      "vector_store": "FAISS",
    
      "FAISS": {
        "index_path": "faiss_index",
        "index_name": "faiss_file"
      }
    }

    In the preceding sample file, EASCfg contains the endpoint and token of the LLM service that you obtained in Step 2. index_path specifies the name of your index folder. index_name specifies the name of your database folder.

  2. Click the Upload tab on the WebUI and configure the following parameters to upload your knowledge base files:

    • Chunk Size: the size of each chunk that is generated by splitting the uploaded files. Default value: 200. Unit: bytes.

    • Chunk Overlap: the portion of overlap between adjacent chunks. Default value: 0. For an illustration of how Chunk Size and Chunk Overlap split a file, see the sketch after this list.

    • Files: Upload a knowledge base file based on the on-screen instructions and then click Upload. You can upload multiple files. Supported file formats: TXT, DOCX, and PDF.

    • Directory: Select a directory that contains the knowledge base file based on the on-screen instructions and then click Upload.
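
    The following sketch illustrates how the Chunk Size and Chunk Overlap parameters split a file, using the open source LangChain text splitter. This is an approximation: the splitter counts characters rather than bytes, and the file name knowledge.txt is a placeholder.

      from langchain.text_splitter import RecursiveCharacterTextSplitter

      # Split the file into chunks of at most 200 characters with no overlap,
      # mirroring the default Chunk Size and Chunk Overlap values on the WebUI.
      splitter = RecursiveCharacterTextSplitter(chunk_size=200, chunk_overlap=0)
      chunks = splitter.split_text(open("knowledge.txt", encoding="utf-8").read())
      print(len(chunks), chunks[0])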

  3. Click the Chat tab on the WebUI and start a conversation. You can configure the following parameters:

    • Query method. Valid values:

      • Vector Store: The chatbot returns the top K relevant results from the vector database.

      • LLM: The chatbot uses only the LLM service to generate a response.

      • Vector Store+LLM: The chatbot combines the retrieved results from the vector database and the user input into a prompt, sends the prompt to the LLM service, and then returns the answer from the LLM service.

    • Retrieval top K answers: the number of the most relevant results that are returned from the vector database. The default value is 3.

    • Please choose the prompt template type: Select a prompt template based on the on-screen instructions.

Inference demo

This section uses Hologres as an example to demonstrate the inference performance. The operations for other types of vector databases are similar.

  1. Configure the required parameters on the Settings tab and check whether the connection is established as expected. For more information about the parameters, see the "WebUI configuration" section of this topic.

  2. On the Upload tab, upload the knowledge base file based on the on-screen instructions and then click Upload.

    After you upload the file, you can view the split data and generated vectors in Hologres. For more information, see Manage an internal table.

  3. On the Chat tab, select a query method and start a conversation.

    (Figures: sample questions and answers for the Vector Store, LLM, and Vector Store + LLM query methods.)

API operations

  1. Obtain the invocation information of the LangChain chatbot.

    1. Click the name of the LangChain chatbot service that you deployed in Step 3 to go to the Service Details page.

    2. In the Basic Information section, click View Endpoint Information.

    3. On the Public Endpoint tab, obtain the endpoint and token.

  2. Call the service by using APIs. The following steps use Hologres as an example.

      Start a conversation by using one of the following methods: chat/vectorstore, chat/llm, and chat/langchain.

      cURL command

      Method 1: chat/vectorstore

      curl -X 'POST' '<service_url>chat/vectorstore' -H 'Authorization: <service_token>' -H 'accept: application/json' -H 'Content-Type: application/json' -d '{"question": "What is Machine Learning Platform for AI?"}'

      Method 2: chat/llm

      curl -X 'POST' '<service_url>chat/llm' -H 'Authorization: <service_token>' -H 'accept: application/json' -H 'Content-Type: application/json' -d '{"question": "What is Machine Learning Platform for AI?"}'

      Method 3: chat/langchain

      curl -X 'POST' '<service_url>chat/langchain' -H 'Authorization: <service_token>' -H 'accept: application/json' -H 'Content-Type: application/json' -d '{"question": "What is Machine Learning Platform for AI?"}'

      Replace <service_url> with the service endpoint that you obtained in the previous step, and <service_token> with the service token that you obtained in the previous step.

      Python script

      import requests
      
      # Endpoint of the LangChain chatbot service that you obtained in the previous step.
      EAS_URL = 'http://chatbot-langchain.xx.cn-beijing.pai-eas.aliyuncs.com'
      
      
      def test_post_api_chat():
          # Select one of the three query methods: vectorstore, llm, or langchain.
          url = EAS_URL + '/chat/vectorstore'
          # url = EAS_URL + '/chat/llm'
          # url = EAS_URL + '/chat/langchain'
          headers = {
              'accept': 'application/json',
              'Content-Type': 'application/json',
              # Service token that you obtained in the previous step.
              'Authorization': 'xxxxx==',
          }
          data = {
              'question': 'What is Platform for AI?'
          }
          response = requests.post(url, headers=headers, json=data)
      
          if response.status_code != 200:
              raise ValueError(f'Error post to {url}, code: {response.status_code}')
          ans = response.json()
          return ans['response']
      
      
      print(test_post_api_chat())

      Set the EAS_URL parameter to the service endpoint that you obtained in the previous step, and the Authorization parameter to the service token that you obtained in the previous step.
