This topic describes how to deploy an enterprise conversational chatbot using Hologres, Platform for AI, and Llama2.
Overview
An enterprise conversational chatbot requires the following components or services:
LangChain: An open-source framework for creating private Q&A knowledge bases. It can integrate LLMs, vector databases (like Hologres), and custom corpora, simplifying AI application development. See Hologres in the LangChain documentation for details.
PAI EAS: A scalable, serverless platform for deploying AI models as online inference services or AI web applications. EAS is ideal for real-time and near-real-time inference.
Hologres: A real-time data warehouse engine that, through deep integration with Proxima (Alibaba DAMO Academy's vector computing library), provides efficient vector computing for retrieving data from corpora. This capability is essential for fine-tuning LLMs. For more information, see Vector processing based on Proxima.
Llama 2: A next-generation open-source LLM available in various parameter sizes (7B, 13B, and 70B). Llama 2 can be deployed using PAI.
Hologres offers a powerful tool that facilitates the integration of these components (Hologres, PAI, Llama 2, custom corpus, LangChain) for rapid enterprise conversational chatbot development. In this architecture, Hologres serves as the real-time vector storage and search engine.
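Conceptually, the retrieval step that Hologres performs in this architecture is: embed the question, rank the stored corpus vectors by similarity, and return the closest documents. The following toy sketch illustrates that idea in plain Python with a brute-force cosine-similarity search standing in for Proxima's indexed search; the vectors and helper names are illustrative, not part of the tool.

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_k(query_vec, corpus, k=2):
    # corpus: list of (document, embedding) pairs.
    # Score every document, then keep the k most similar ones.
    scored = [(cosine_similarity(query_vec, emb), doc) for doc, emb in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]

# Toy 3-dimension embeddings; a real embedding model produces hundreds of dimensions.
corpus = [
    ("What is Hologres ...", [0.9, 0.1, 0.0]),
    ("What is Proxima ...", [0.2, 0.8, 0.1]),
    ("JSONB column storage ...", [0.1, 0.2, 0.9]),
]
print(top_k([0.85, 0.15, 0.05], corpus, k=2))
```

In production, Hologres with Proxima performs this ranking inside the database with vector indexes, so the search stays fast even with millions of stored vectors.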
Prerequisites
Create a Hologres instance.
Note: We recommend using a Hologres instance with 8 CUs. Such an instance can process millions of vectors. If you have a larger volume of vector data, use an instance with higher specifications.
Activate PAI and create a workspace. For more information, see Activate PAI and create a default workspace.
Prepare a basic environment on an ECS instance or your local machine by completing the following steps:
Install Anaconda. For instructions, see Anaconda.
Install Python 3.8 or later:

```shell
conda create --name chatbot python=3.8
conda activate chatbot
```

Install the dependencies for the knowledge base:

```shell
pip install langchain modelscope psycopg2-binary sentence_transformers bottle requests
```

Clone the code and sample data for this topic. The required tool is in the holo-llm folder.

```shell
git clone https://github.com/aliyun/alibabacloud-hologres-connectors.git
```
Procedure
Deploy the Llama 2 model.
You can use PAI-EAS to quickly deploy Llama 2. For more information, see Deploy a Llama model as a web application in EAS.
Note: You can also follow the steps in this topic to use other large models to build a dedicated Q&A knowledge base. For information about how to deploy other large models, see LLM.
After you deploy the large model, view and record the invocation information (endpoint and token) of the service on the Service Details page in the EAS console.
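As a quick sanity check of the recorded invocation information, you can assemble a request against the service before wiring up the tool. The sketch below builds the request pieces by hand; the Authorization header carrying the EAS token is the usual EAS convention, while the payload shape, endpoint, and token values are illustrative assumptions that may differ for your deployment.

```python
def build_eas_request(url, token, question):
    """Assemble the pieces of an HTTP call to an EAS service.

    The Authorization header carries the EAS token recorded in step 1.
    The payload shape here is an illustrative assumption; check your
    deployed service's API for the exact request schema.
    """
    headers = {
        "Authorization": token,
        "Content-Type": "application/json",
    }
    payload = {"prompt": question}
    return url, headers, payload

# Hypothetical endpoint and token, standing in for the values you recorded.
url, headers, payload = build_eas_request(
    "http://example-eas-endpoint/api/predict/llama2",
    "example-token",
    "What is Hologres?",
)
# You could then send it with, for example: requests.post(url, headers=headers, json=payload)
print(headers["Authorization"])
```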
Create a config file.
To build the knowledge base with the tool from this topic, set the configuration items in the config file. These items include connection information for each resource and embedding model details. Run the following command to open the file:

```shell
cd alibabacloud-hologres-connectors/holo-llm
vim config/config.json
```

The following list describes the configuration options. The embedding, query_topk, and prompt_template options affect the fine-tuning result of the large model. Modify their values with caution.

- eas_config: The invocation information for the AI service recorded in step 1. This includes:
  - url: The endpoint of the Llama 2 model.
  - token: The token that corresponds to the invocation endpoint of the Llama 2 model.
- holo_config: The connection information for the Hologres instance. This includes:
  - HOLO_ENDPOINT: The endpoint of the Hologres instance. Go to the Instance Details page in the Hologres console to obtain it.
  - HOLO_PORT: The port of the Hologres instance. Go to the Instance Details page in the Hologres console to obtain the port.
  - HOLO_DATABASE: The name of the database in the Hologres instance.
  - HOLO_USER: The AccessKey ID of your Alibaba Cloud account. Go to the AccessKey Management page to obtain your AccessKey ID.
  - HOLO_PASSWORD: The AccessKey secret of your Alibaba Cloud account. Go to the AccessKey Management page to obtain your AccessKey secret.
- embedding: The information about the embedding model used to vectorize data. This includes:
  - model_id: The path of the embedding model. This example uses the open-source CoRom embedding model from DAMO Academy on ModelScope. The path is damo/nlp_corom_sentence-embedding_english-base.
  - model_dimension: The vector dimension. The embedding model used in this topic generates 768-dimension vector data. Therefore, set the model_dimension value to 768. For more information, see CoROM.
- query_topk: The number of records returned by a vector search. In this topic, this option is set to 4. You can configure this option based on factors such as the maximum number of characters allowed by the LLM and the fine-tuning effect.
- prompt_template: The prompt template used for LLM fine-tuning. The config file already contains a default template; you do not need to modify it.
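Put together, a filled-in config.json might look like the following. All values other than model_id and model_dimension are placeholders to replace with your own, and the exact key layout may differ slightly from the file shipped in the repository; the default prompt_template is elided rather than reproduced here.

```json
{
  "eas_config": {
    "url": "<your-eas-endpoint>",
    "token": "<your-eas-token>"
  },
  "holo_config": {
    "HOLO_ENDPOINT": "<your-hologres-endpoint>",
    "HOLO_PORT": "80",
    "HOLO_DATABASE": "<database-name>",
    "HOLO_USER": "<AccessKey-ID>",
    "HOLO_PASSWORD": "<AccessKey-secret>"
  },
  "embedding": {
    "model_id": "damo/nlp_corom_sentence-embedding_english-base",
    "model_dimension": 768
  },
  "query_topk": 4,
  "prompt_template": "..."
}
```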
Process data.
Vectorize your dataset and import it into Hologres. To do this, use the open source tool from this topic by running the following command:
```shell
# Before you import the corpus vector data, clear the historical data from the database.
python main.py -l --clear
# Vectorize the corpus data in the holo-llm/data/example.csv file and import it into Hologres.
python main.py -l
```

Important: The first time you run the command, it can take some time to download the embedding model (about 400 MB). This download is automatic and not required for subsequent runs.
This topic uses a sample professional corpus for Hologres. An example is shown below:
| title | content |
| --- | --- |
| What is Hologres | Hologres is a one-stop real-time data warehouse independently developed by Alibaba. It supports real-time data write, update, processing and analysis of massive data. |
| What is Proxima | Proxima is a high-performance software library for vector nearest-neighbor search from Alibaba's DAMO Academy. Compared with similar open-source products such as Faiss, Proxima offers better stability and performance. |
| What is the principle of JSONB column storage in Hologres | Hologres supports column-oriented storage for the JSONB type starting from V1.3, which reduces the storage size of JSONB data and accelerates queries. This document introduces the use of columnar JSONB in Hologres. |
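Before import, each (title, content) row of the corpus is flattened into a single document string, which is then embedded and stored alongside its vector. The sketch below shows that flattening step; the exact "title: ... content: ..." layout is an assumption based on the document column visible when you query the table, and the helper name is illustrative.

```python
import csv
import io

def rows_to_documents(csv_text):
    """Flatten each (title, content) CSV row into one document string.

    The 'title: ... content: ...' layout mirrors the document column
    seen when querying the langchain_embedding table; the tool's exact
    formatting may differ.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [f"title: {row['title']} content: {row['content']}" for row in reader]

sample = "title,content\nWhat is Hologres,Hologres is a one-stop real-time data warehouse."
print(rows_to_documents(sample))
```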
You can query the vector data in the langchain_embedding table of your Hologres instance's database:

```sql
SELECT * FROM langchain_embedding LIMIT 1;

-- Sample result
-- id|embedding|metadata|document
-- 2419815e-1407-11ee-bee5-acde48001122|{0.395261,0.123794,0.761932,0.413286,...}|{"source": "data/example_data.csv", "row": 1}|title: ... content: ...
```

Verify the fine-tuning result.
Invoke the native Llama 2 model to perform a Q&A task:
```shell
python main.py -n
```

Sample conversation:

```
Please enter a Question: What is hologres?
PAI-LLM answer:
Hologres is a term used in the context of digital holography, which refers to the holographic image produced by a digital holographic camera. The term "hologres" is derived from the Greek words "holos" meaning "whole" and "graphein" meaning "to record". It refers to the complete or entire holographic image that is recorded by the camera, rather than just a portion of it. Hologres can be used to create three-dimensional images that appear life-like and can be viewed from different angles, providing a more immersive and realistic viewing experience. Please let me know if you need anything else.
```

Invoke the fine-tuned Llama 2 model to perform a Q&A task:
```shell
python main.py
```

Sample conversation:

```
Please enter a Question: What is hologres?
PAI-LLM + Hologres answer:
Hologres is a one-stop real-time data warehouse independently developed by Alibaba. It supports real-time data write, update, processing and analysis of massive data.
```
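The improved answer comes from retrieval-augmented prompting: the top query_topk records retrieved from Hologres are stitched into the prompt before it is sent to Llama 2. The sketch below shows that assembly step; the template text is a hypothetical stand-in for the default prompt_template in config.json, not a copy of it.

```python
def build_prompt(question, retrieved_docs, template=None):
    # Hypothetical stand-in for the default prompt_template in config.json;
    # the real template shipped with the tool differs.
    template = template or (
        "Answer the question based on the following context.\n"
        "Context:\n{context}\n"
        "Question: {question}\n"
        "Answer:"
    )
    # Concatenate the retrieved records (up to query_topk of them) into one context block.
    context = "\n".join(retrieved_docs)
    return template.format(context=context, question=question)

docs = ["title: What is Hologres content: Hologres is a one-stop real-time data warehouse."]
print(build_prompt("What is hologres?", docs))
```

Because the context now contains your corpus records, the model grounds its answer in your data instead of its generic pre-training knowledge, which is exactly the difference between the two sample conversations above.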
You have now completed the basic steps to deploy a private conversational chatbot using Hologres, PAI, and Llama 2.
Next, you can integrate the chatbot into your business scenarios, such as connecting it to a DingTalk group. For more information, see Build a free chatbot with Hologres and LLMs.