Elastic Algorithm Service (EAS) provides simplified deployment methods for different scenarios. By configuring a few parameters, you can deploy a Retrieval-Augmented Generation (RAG)-based large language model (LLM) chatbot, which significantly shortens the deployment time. During inference, the chatbot retrieves relevant information from your knowledge base and combines it with the answers generated by the LLM to produce accurate and informative responses, which significantly improves the quality of Q&A. The chatbot is suitable for Q&A, summarization, and other natural language processing (NLP) tasks that rely on specific knowledge bases. This topic describes how to deploy a RAG-based LLM chatbot and how to perform model inference.
Background information
Standalone LLM applications have limits in generating accurate and up-to-date responses, which makes them unsuitable for scenarios that require precise information, such as customer service or Q&A. The RAG technique resolves these issues and enhances the performance of LLM applications, which significantly improves the quality of Q&A, summarization, and other NLP tasks that rely on specific knowledge bases.
RAG improves answer accuracy and informativeness by combining LLM applications, such as Qwen, with information retrieval components. When a query is initiated, RAG uses an information retrieval component to find documents or information fragments related to the query in the knowledge base, and then passes the retrieved content together with the original query to the LLM application. The LLM application uses its summarization and generation capabilities to produce a factual answer based on the latest information. You do not need to retrain the LLM application.
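The following minimal sketch illustrates this retrieve-then-generate flow. It is a conceptual example only: the vector_db and llm objects and their methods are hypothetical placeholders, not PAI or EAS APIs.

# Conceptual sketch of the RAG flow. The vector_db and llm objects and their
# methods are hypothetical placeholders used only to illustrate the idea.
def rag_answer(query, vector_db, llm, top_k=3):
    # 1. Retrieve the document chunks that are most relevant to the query.
    docs = vector_db.search(query, top_k=top_k)
    # 2. Combine the retrieved chunks and the original query into a prompt.
    context = "\n".join(doc.text for doc in docs)
    prompt = f"Answer the question based on the context.\nContext:\n{context}\nQuestion: {query}"
    # 3. Let the LLM generate an answer that is grounded in the retrieved context.
    return llm.generate(prompt)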
The chatbot that is deployed in EAS integrates LLM applications with RAG to overcome the limits of LLM applications in terms of accuracy and timeliness. This chatbot provides accurate and informative answers in various Q&A scenarios and helps improve the overall performance and user experience of NLP tasks.
Prerequisites
A virtual private cloud (VPC), vSwitch, and security group are created. For more information, see Create and manage a VPC and Create a security group.
Note: If you use Facebook AI Similarity Search (Faiss) to build the vector database, the preceding prerequisites are not required.
An Object Storage Service (OSS) bucket or Apsara File Storage NAS (NAS) file system is created to store fine-tuned model files. This prerequisite must be met if you use a fine-tuned model to deploy the chatbot. For more information, see Get started by using the OSS console or Create a file system.
Note: If you use Faiss to build the vector database, you must prepare an OSS bucket.
Limits
The vector database and EAS must be deployed in the same region.
Precautions
This practice is subject to the maximum number of tokens of an LLM service and is designed to help you understand the basic retrieval feature of a RAG-based LLM chatbot.
The chatbot is limited by the server resource size of the LLM service and the default number of tokens. The conversation length supported by the chatbot is also limited.
If you do not need to perform multiple rounds of conversations, we recommend that you disable the with chat history feature of the chatbot. This effectively reduces the possibility of reaching the limit. For more information, see the How do I disable the with chat history feature of the RAG-based chatbot? section of this topic.
Step 1: Prepare a vector database
You can use one of the following services to build a vector database: Faiss, Elasticsearch, Hologres, OpenSearch, and ApsaraDB RDS for PostgreSQL. When you build a vector database, save the required parameter configurations that are used to connect to the vector database in subsequent operations.
Faiss
Faiss streamlines the process of building an on-premises vector database. You do not need to purchase or activate the service.
Elasticsearch
Create an Elasticsearch cluster. For more information, see Create an Alibaba Cloud Elasticsearch cluster.
Take note of the following items:
Set the Instance Type parameter to Standard Edition.
Copy the values of the Username and Password parameters and save them to your on-premises machine.
Find the cluster that you created and click its name to go to the Basic Information page. Copy the values of the Internal Endpoint and Internal Port parameters and save them to your on-premises machine.
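Optionally, verify that the endpoint, port, username, and password work from a machine in the same VPC before you deploy the chatbot. The following is a minimal sketch that assumes the Python requests package is installed and uses placeholder values; replace them with the values that you saved.

import requests

# Placeholder values: replace them with the internal endpoint, port, username,
# and password that you saved when you created the Elasticsearch cluster.
ES_URL = 'http://es-cn-xxxx.elasticsearch.aliyuncs.com:9200'
USERNAME = 'elastic'
PASSWORD = '<your_password>'

# A GET request to the cluster root returns basic cluster information
# if the endpoint and the credentials are correct.
response = requests.get(ES_URL, auth=(USERNAME, PASSWORD), timeout=10)
print(response.status_code)
print(response.json())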
Hologres
Purchase a Hologres instance and create a database. For more information, see Purchase a Hologres instance and Create a database. You must save the name of the database to your on-premises machine.
View the invocation information in the Hologres console.
Find the instance that you created and click its name to go to the instance details page.
In the Network Information section, find Select VPC, click Copy in the Endpoint column, and then save the part of the endpoint before :80 to your on-premises machine.
In the left-side navigation pane, click Account Management. On the User Management page, create a custom account. Save the account and password to your on-premises machine. This information is used to connect to the Hologres instance in subsequent operations. For information about how to create a custom account, see the Create a custom account section of the "Manage users" topic.
Set the Select Member Role parameter to Instance Super Administrator (SuperUser).
OpenSearch
Create an OpenSearch Vector Search Edition instance. For more information, see Purchase an OpenSearch Vector Search Edition instance.
After the instance is created, the status becomes Pending Configuration.
On the Instances page, view the Instance ID of the OpenSearch Vector Search Edition instance and save the ID to your local computer.
Configure the index table.
On the Instances page, click Configure in the Actions column.
On the Table Management page that appears, perform the following steps to configure the table: Basic Table Information > Data Synchronization > Field Configuration > Index Schema. After the index is built, you can perform searches. For more information, see Getting started for common scenarios.
You can save the following sample as a JSON file. In the Field Configuration step, click Import Field Index Schema and import the JSON file. The fields and index schema are then configured based on the imported file.
{ "schema": { "summarys": { "parameter": { "file_compressor": "zstd" }, "summary_fields": [ "id", "embedding", "file_path", "file_name", "file_type", "node_content", "node_type", "doc_id", "text", "source_type" ] }, "file_compress": [ { "name": "file_compressor", "type": "zstd" }, { "name": "no_compressor", "type": "" } ], "indexs": [ { "index_fields": [ { "boost": 1, "field_name": "id" }, { "boost": 1, "field_name": "embedding" } ], "indexer": "aitheta2_indexer", "index_name": "embedding", "parameters": { "enable_rt_build": "true", "min_scan_doc_cnt": "20000", "vector_index_type": "Qc", "major_order": "col", "builder_name": "QcBuilder", "distance_type": "SquaredEuclidean", "embedding_delimiter": ",", "enable_recall_report": "true", "ignore_invalid_doc": "true", "is_embedding_saved": "false", "linear_build_threshold": "5000", "dimension": "1536", "rt_index_params": "{\"proxima.oswg.streamer.segment_size\":2048}", "search_index_params": "{\"proxima.qc.searcher.scan_ratio\":0.01}", "searcher_name": "QcSearcher", "build_index_params": "{\"proxima.qc.builder.quantizer_class\":\"Int8QuantizerConverter\",\"proxima.qc.builder.quantize_by_centroid\":true,\"proxima.qc.builder.optimizer_class\":\"BruteForceBuilder\",\"proxima.qc.builder.thread_count\":10,\"proxima.qc.builder.optimizer_params\":{\"proxima.linear.builder.column_major_order\":true},\"proxima.qc.builder.store_original_features\":false,\"proxima.qc.builder.train_sample_count\":3000000,\"proxima.qc.builder.train_sample_ratio\":0.5}" }, "index_type": "CUSTOMIZED" }, { "has_primary_key_attribute": true, "index_fields": "id", "is_primary_key_sorted": false, "index_name": "id", "index_type": "PRIMARYKEY64" }, { "index_fields": "file_path", "index_name": "file_path", "index_type": "STRING" }, { "index_fields": "file_name", "index_name": "file_name", "index_type": "STRING" }, { "index_fields": "file_type", "index_name": "file_type", "index_type": "STRING" }, { "index_fields": "node_content", "index_name": "node_content", "index_type": "STRING" }, { "index_fields": "node_type", "index_name": "node_type", "index_type": "STRING" }, { "index_fields": "doc_id", "index_name": "doc_id", "index_type": "STRING" }, { "index_fields": "text", "index_name": "text", "index_type": "STRING" }, { "index_fields": "source_type", "index_name": "source_type", "index_type": "STRING" } ], "attributes": [ { "file_compress": "no_compressor", "field_name": "id" }, { "file_compress": "no_compressor", "field_name": "embedding" }, { "file_compress": "no_compressor", "field_name": "file_path" }, { "file_compress": "no_compressor", "field_name": "file_name" }, { "file_compress": "no_compressor", "field_name": "file_type" }, { "file_compress": "no_compressor", "field_name": "node_content" }, { "file_compress": "no_compressor", "field_name": "node_type" }, { "file_compress": "no_compressor", "field_name": "doc_id" }, { "file_compress": "no_compressor", "field_name": "text" }, { "file_compress": "no_compressor", "field_name": "source_type" } ], "fields": [ { "compress_type": "uniq", "field_type": "STRING", "field_name": "id" }, { "user_defined_param": { "multi_value_sep": "," }, "multi_value": true, "compress_type": "uniq", "field_type": "FLOAT", "field_name": "embedding" }, { "compress_type": "uniq", "field_type": "STRING", "field_name": "file_path" }, { "compress_type": "uniq", "field_type": "STRING", "field_name": "file_name" }, { "compress_type": "uniq", "field_type": "STRING", "field_name": "file_type" }, { "compress_type": "uniq", "field_type": "STRING", "field_name": 
"node_content" }, { "compress_type": "uniq", "field_type": "STRING", "field_name": "node_type" }, { "compress_type": "uniq", "field_type": "STRING", "field_name": "doc_id" }, { "compress_type": "uniq", "field_type": "STRING", "field_name": "text" }, { "compress_type": "uniq", "field_type": "STRING", "field_name": "source_type" } ], "table_name": "abc" }, "extend": { "description": [], "vector": [ "embedding" ], "embeding": [] } }
Configure public access for the virtual private cloud (VPC) that is associated with the instance. For more information, see Use the SNAT feature of an Internet NAT gateway to access the Internet.
Configure public access for the OpenSearch Vector Search Edition instance.
Note: EAS cannot access OpenSearch instances over private endpoints. If you need to configure private access, contact your account manager.
View the associated elastic IP address (EIP).
Go to the Resource Management page of the VPC. For more information, see View a VPC.
Click the number under Internet NAT Gateway to go to the Internet NAT Gateway page.
Click the ID of the Internet NAT gateway instance.
On the Associated EIP tab, view the EIP address and save the address to your local computer.
On the Instances page of the OpenSearch console, click the name of the OpenSearch Vector Search Edition instance to go to the Instance Information page.
In the Network Information section, enable Public Access. In the Modify Public Access Whitelist panel, add the associated EIP address to the whitelist.
In the Network Information section, save Public Endpoint to your local computer.
View the username and password.
In the API Endpoint section, view the Username and Password that you specified when creating the OpenSearch Vector Search Edition instance.
RDS for PostgreSQL
Create an account and a database for the instance. For more information, see Create a database and an account.
When you create the account, select Privileged Account for Account Type.
When you create the database, specify the created privileged account for Authorized By.
Configure the database connection.
Go to the Instances page. In the top navigation bar, select the region in which the RDS instance resides. Then, find the RDS instance and click the ID of the instance.
In the left-side navigation pane, click Database Connection.
On the Database Connection page, view the endpoints and port.
Internal Endpoint: If the RAG application and the database belong to the same VPC, you can use the internal endpoint.
Public Endpoint: If they are not in the same VPC, apply for a public endpoint and add 0.0.0.0/0 to the whitelist. For more information, see Apply for or release a public endpoint.
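Optionally, verify that the endpoint, port, database, and privileged account work before you deploy the chatbot. The following is a minimal sketch that assumes the psycopg2 package is installed and uses placeholder values; replace them with the values that you prepared in this step.

import psycopg2

# Placeholder values: replace them with your endpoint, port, privileged account,
# password, and database name.
conn = psycopg2.connect(
    host='pgm-xxxx.pg.rds.aliyuncs.com',  # internal or public endpoint
    port=5432,
    user='<privileged_account>',
    password='<password>',
    dbname='<database_name>',
)
with conn.cursor() as cur:
    cur.execute('SELECT version();')  # a trivial query to confirm connectivity
    print(cur.fetchone())
conn.close()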
Step 2: Deploy the RAG-based chatbot
Go to the EAS page.
Log on to the Platform for AI (PAI) console.
In the left-side navigation pane, click Workspaces. On the Workspaces page, find the workspace to which you want to deploy the model and click its name to go to the Workspace Details page.
In the left-side navigation pane, choose Model Deployment > Elastic Algorithm Service (EAS) to go to the Elastic Algorithm Service (EAS) page.
On the Elastic Algorithm Service (EAS) page, click Deploy Service. In the Scenario-based Model Deployment section, click RAG-based Smart Dialogue Deployment.
On the RAG-based LLM Chatbot Deployment page, configure the parameters. The following tables describe the key parameters in different sections.
Basic Information
Parameter
Description
Service Name
The name of the service.
Model Source
The source of the model. Valid values: Open Source Model and Custom Fine-tuned Model.
Model Type
The model type. Select a model type based on your business requirements.
If you set the Model Source parameter to Custom Fine-tuned Model, you must configure the parameter quantity and precision for the model type.
Model Settings
If you set the Model Source parameter to Custom Fine-tuned Model, you must configure the path in which the fine-tuned model file is stored. Valid values:
NoteMake sure that the model file format is compatible with Hugging Face transformers.
Mount OSS: the OSS path in which the fine-tuned model file is stored.
Mount NAS: the NAS file system in which the fine-tuned model file is stored and the source path of the NAS file system.
Resource Configuration
Parameter
Description
Resource Configuration
If you set the Model Source parameter to Open Source Model, the system automatically selects an instance type based on the selected model type as the default value.
If you set the Model Source parameter to Custom Fine-tuned Model, select an instance type that matches the model. For more information, see the How do I switch to another open source LLM? section of the "Quickly deploy LLMs in EAS" topic.
Inference Acceleration
Inference acceleration can be enabled for the Qwen, Llama2, ChatGLM, or Baichuan2 model that is deployed on A10 or GU30 instances. The following inference acceleration methods are provided:
BladeLLM Inference Acceleration: The BladeLLM inference acceleration engine ensures high concurrency and low latency. You can use BladeLLM to accelerate LLM inference in a cost-effective manner.
Open-source vLLM Inference Acceleration
Vector Database Settings
Select a service to build a vector database based on your business requirements.
FAISS
Parameter
Description
Vector Database Type
The service that you want to use to build the vector database. Select FAISS.
OSS Path
The OSS path of the vector database. Select an OSS path in the current region. You can create an OSS path if no OSS path is available. For more information, see Get started by using the OSS console.
Elasticsearch
Parameter
Description
Vector Database Type
The service that you want to use to build the vector database. Select Elasticsearch.
Private Endpoint and Port
The private endpoint and port number that you obtained in Step 1. Format: http://<private endpoint>:<port number>.
Index Name
The name of the index. You can enter a new index name or an existing index name. If you use an existing index name, the index schema must meet the requirements of the RAG-based chatbot. For example, you can enter the name of the index that is automatically created when you deploy the RAG-based chatbot by using EAS.
Account
The username that you configured when you created the Elasticsearch cluster in Step 1.
Password
The password that you configured when you created the Elasticsearch cluster in Step 1.
Hologres
Parameter
Description
Vector Database Type
The service that you want to use to build the vector database. Select Hologres.
Invocation Information
The Hologres invocation information that you obtained in Step 1.
Database Name
The name of the database that you created in Step 1.
Account
The custom account that you created in Step 1.
Password
The password of the custom account that you created in Step 1.
Table name
The name of the table. You can enter a new table name or an existing table name. If you use an existing table name, the table schema must meet the requirements of the RAG-based chatbot. For example, you can enter the name of the Hologres table that is automatically created when you deploy the RAG-based chatbot by using EAS.
OpenSearch
Parameter
Description
Endpoint
The public endpoint you obtained in Step 1.
Instance ID
The ID of the OpenSearch Vector Search Edition instance you created in Step 1.
Username
The username that you specified when you created the OpenSearch Vector Search Edition instance in Step 1.
Password
The password that you specified when you created the OpenSearch Vector Search Edition instance in Step 1.
Table Name
The name of the index table you configured for the OpenSearch Vector Search Edition instance in Step 1.
RDS for PostgreSQL
Parameter
Description
Host Address
The internal endpoint or public endpoint you obtained in Step 1.
Port
The actual port number. Default value: 5432.
Database
The name of the database you created in Step 1.
Table Name
Specify a table name.
Account
The privileged account that you created in Step 1.
Password
The password of the privileged account that you created in Step 1.
VPC Configuration
Parameter
Description
VPC
If you use Hologres, Elasticsearch, OpenSearch, or ApsaraDB RDS for PostgreSQL to build the vector database, select the VPC in which the vector database is deployed.
Note: If you use OpenSearch to build the vector database, you can select a different VPC. However, you must make sure that the VPC can access the Internet and that the associated EIP is added to the public access whitelist of the OpenSearch instance. For more information, see Use the SNAT feature of an Internet NAT gateway to access the Internet and Configure the public access whitelist.
If you use Faiss to build the vector database, you do not need to configure a VPC.
vSwitch
The vSwitch that belongs to the selected VPC.
Security Group Name
The security group that is associated with the selected VPC.
Click Deploy.
If the value in the Service Status column changes to Running, the RAG-based chatbot is deployed.
Step 3: Perform model inference on the web UI
This step describes how to debug the RAG-based chatbot on the web UI. After you test the Q&A performance of the RAG-based chatbot on the web UI, you can call API operations provided by PAI to apply the RAG-based chatbot to your business system. For more information, see the Step 4: Call API operations to perform model inference section of this topic.
1. Configure the RAG-based chatbot
After the RAG-based chatbot is deployed, click View Web App in the Service Type column to enter the web UI.
Configure the machine learning model.
Embedding Model Name: Four models are available. By default, the optimal model is selected.
Embedding Dimension: After you configure the Embedding Model Name parameter, the system automatically configures this parameter.
Check whether the vector database is connected.
The system automatically recognizes and applies the vector database settings that are configured when you deploy the chatbot. The settings cannot be modified. If you use Hologres to build the vector database, click Connect Hologres to check whether the vector database in Hologres is connected.
2. Upload business data files
On the Upload tab, upload the specified business data files. You can upload files in the following formats: TXT, PDF, XLSX, XLS, CSV, DOCX, DOC, Markdown, and HTML.
Configure semantic-based chunking parameters.
Configure the following parameters to control the granularity of document chunking and enable automatic Q&A information extraction.
Parameter
Description
Chunk Size
The size of each chunk. Unit: bytes. Default value: 500.
Chunk Overlap
The portion of overlap between adjacent chunks. Default value: 10.
Process with QA Extraction Model
Specifies whether to extract Q&A information. If you select Yes, the system automatically extracts questions and corresponding answers in pairs after business data files are uploaded. This way, more accurate answers are returned in data queries.
On the Files tab, upload one or more business data files. You can also upload a directory that contains the business data files on the Directory tab. For example, you can upload the rag_chatbot_test_doc.txt file.
Click Upload. The system performs data cleansing and semantic-based chunking on the business data files before they are uploaded. Data cleansing includes text extraction and hyperlink replacement.
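To see how the Chunk Size and Chunk Overlap parameters interact, consider the following simplified sketch of fixed-size chunking with overlap. It measures text in characters for simplicity, whereas PAI counts chunk size in bytes and applies semantic-based chunking, so the actual splits may differ.

def chunk_text(text, chunk_size=500, chunk_overlap=10):
    # Each chunk starts (chunk_size - chunk_overlap) characters after the previous
    # one, so adjacent chunks share chunk_overlap characters.
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = chunk_text('A' * 1200, chunk_size=500, chunk_overlap=10)
print([len(c) for c in chunks])  # [500, 500, 220]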
3. Configure model inference parameters
Configure Q&A policies for retrieval-based queries
On the Chat tab, configure Q&A policies for retrieval-based queries.
Parameter | Description |
Streaming Output | Specifies whether to return results in streaming mode. If you select Streaming Output, the results are returned in streaming mode. |
Top K | The number of the most relevant results that are returned from the vector database. |
Re-Rank Model | Most vector databases compromise data accuracy to provide high computing efficiency. As a result, the top K results that are returned from the vector database may not be the most relevant. In this case, you can use the open source model BAAI/bge-reranker-base, BAAI/bge-reranker-large, or llm-reranker based on your business requirements to perform a higher-precision re-rank operation on the top K results that are returned from the vector database to obtain more relevant and accurate knowledge files. Note If you use a model for the first time, you may need to wait for a period of time before the model is loaded. |
Keyword Model | The retrieval method. Valid values:
Note: In most complex scenarios, vector-based retrieval delivers good performance. However, in vertical fields where corpora are scarce, or in scenarios that require exact matching, vector-based retrieval may not perform as well as traditional sparse retrieval, which is simpler and more efficient because it measures the keyword overlap between user queries and knowledge files. PAI provides keyword-based retrieval algorithms, such as BM25, for this purpose. Vector-based retrieval and keyword-based retrieval each have advantages and disadvantages, and combining their results can improve overall accuracy and efficiency. The reciprocal rank fusion (RRF) algorithm combines the ranks that a file receives in the different retrieval methods into a single fused score. If you set the Keyword Model parameter to Hybrid, PAI uses the RRF algorithm by default to combine the results of vector-based retrieval and keyword-based retrieval. |
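The following minimal sketch shows the RRF idea described above: each retrieval method contributes a reciprocal-rank score for every document, and the scores are summed to produce the fused ranking. The constant k=60 is a common default in the RRF literature and is only an assumption here; the value that PAI uses internally is not documented in this topic.

def rrf_fuse(rankings, k=60):
    # rankings: one ranked list of document IDs per retrieval method,
    # for example [vector_results, keyword_results], ordered from most to least relevant.
    scores = {}
    for ranked_docs in rankings:
        for rank, doc_id in enumerate(ranked_docs, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort document IDs by their fused score, highest first.
    return sorted(scores, key=scores.get, reverse=True)

vector_results = ['doc_a', 'doc_b', 'doc_c']
keyword_results = ['doc_b', 'doc_d', 'doc_a']
print(rrf_fuse([vector_results, keyword_results]))  # ['doc_b', 'doc_a', 'doc_d', 'doc_c']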
Configure Q&A policies for RAG-based queries
On the Chat tab, configure Q&A policies for RAG-based queries.
PAI provides various prompt policies. You can select a predefined prompt template or specify a custom prompt template for better inference results.
In addition, you can configure the following parameters in RAG (Retrieval + LLM) query mode: Streaming Output, Re-Rank Model, and Keyword Model. For more information, see the Configure Q&A policies for retrieval-based queries section of this topic.
4. Perform model inference
On the Chat tab, select one of the following query modes to perform model inference:
Retrieval
The chatbot returns the top K relevant results from the vector database.
LLM
The chatbot uses only the LLM application to generate an answer.
RAG (retrieval + LLM)
The chatbot fills the selected prompt template with the query and the results returned from the vector database, and then sends the prompt to the LLM application to generate an answer.
Step 4: Call API operations to perform model inference
Obtain the invocation information of the RAG-based chatbot.
Click the name of the RAG-based chatbot to go to the Service Details page.
In the Basic Information section, click View Endpoint Information.
On the Public Endpoint tab, obtain the service endpoint and token.
Connect to the vector database on the web UI and upload business data files. For more information, see the 1. Configure the RAG-based chatbot and 2. Upload business data files sections of this topic.
Call API operations to perform model inference.
PAI allows you to call the RAG-based chatbot by using the following API operations in different query modes:
service/query/retrieval in retrieval mode
service/query/llm in LLM mode
service/query in RAG mode
Sample code:
cURL command
Initiate a single-round conversational search request
Method 1: Call the service/query/retrieval operation.
curl -X 'POST' '<service_url>service/query/retrieval' -H 'Authorization: <service_token>' -H 'accept: application/json' -H 'Content-Type: application/json' -d '{"question": "What is PAI?"}'
# Replace <service_url> and <service_token> with the service endpoint and service token that you obtained in Step 1.
Method 2: Call the service/query/llm operation.
curl -X 'POST' '<service_url>service/query/llm' -H 'Authorization: <service_token>' -H 'accept: application/json' -H 'Content-Type: application/json' -d '{"question": "What is PAI?"}'
# Replace <service_url> and <service_token> with the service endpoint and service token that you obtained in Step 1.
You can add other adjustable inference parameters, such as {"question":"What is PAI?", "temperature": 0.9}.
Method 3: Call the service/query operation.
curl -X 'POST' '<service_url>service/query' -H 'Authorization: <service_token>' -H 'accept: application/json' -H 'Content-Type: application/json' -d '{"question": "What is PAI?"}'
# Replace <service_url> and <service_token> with the service endpoint and service token that you obtained in Step 1.
You can add other adjustable inference parameters, such as {"question":"What is PAI?", "temperature": 0.9}.
Initiate a multi-round conversational search request
You can initiate a multi-round conversational search request only in RAG and LLM query modes. The following sample code shows an example on how to initiate a multi-round conversational search request in RAG query mode:
# Send the request.
curl -X 'POST' '<service_url>service/query' -H 'Authorization: <service_token>' -H 'accept: application/json' -H 'Content-Type: application/json' -d '{"question": "What is PAI?"}'

# Provide the session ID returned for the request. This ID uniquely identifies a conversation in the conversation history. After the session ID is provided, the corresponding conversation is stored and is automatically included in subsequent requests to call the LLM.
curl -X 'POST' '<service_url>service/query' -H 'Authorization: <service_token>' -H 'accept: application/json' -H 'Content-Type: application/json' -d '{"question": "What are the benefits of PAI?","session_id": "ed7a80e2e20442eab****"}'

# Provide the chat_history parameter, which contains the conversation history between you and the chatbot. The parameter value is a list in which each element indicates a single round of conversation in the {"user":"Inputs","bot":"Outputs"} format. Multiple rounds are sorted in chronological order.
curl -X 'POST' '<service_url>service/query' -H 'Authorization: <service_token>' -H 'accept: application/json' -H 'Content-Type: application/json' -d '{"question":"What are the features of PAI?", "chat_history": [{"user":"What is PAI", "bot":"PAI is an AI platform provided by Alibaba Cloud..."}]}'

# If you provide both the session_id and chat_history parameters, the conversation history is appended to the conversation that corresponds to the specified session ID.
curl -X 'POST' '<service_url>service/query' -H 'Authorization: <service_token>' -H 'accept: application/json' -H 'Content-Type: application/json' -d '{"question":"What are the features of PAI?", "chat_history": [{"user":"What is PAI", "bot":"PAI is an AI platform provided by Alibaba Cloud..."}], "session_id": "1702ffxxad3xxx6fxxx97daf7c"}'
Python script
The following sample code shows an example on how to initiate a single-round conversational search request:
import requests

EAS_URL = 'http://xxxx.****.cn-beijing.pai-eas.aliyuncs.com'
headers = {
    'accept': 'application/json',
    'Content-Type': 'application/json',
    'Authorization': 'MDA5NmJkNzkyMGM1Zj****YzM4M2YwMDUzZTdiZmI5YzljYjZmNA==',
}

def test_post_api_query_llm():
    url = EAS_URL + '/service/query/llm'
    data = {"question": "What is PAI?"}
    response = requests.post(url, headers=headers, json=data)
    if response.status_code != 200:
        raise ValueError(f'Error post to {url}, code: {response.status_code}')
    ans = dict(response.json())
    print(f"======= Question =======\n {data['question']}")
    print(f"======= Answer =======\n {ans['answer']} \n\n")

def test_post_api_query_retrieval():
    url = EAS_URL + '/service/query/retrieval'
    data = {"question": "What is PAI?"}
    response = requests.post(url, headers=headers, json=data)
    if response.status_code != 200:
        raise ValueError(f'Error post to {url}, code: {response.status_code}')
    ans = dict(response.json())
    print(f"======= Question =======\n {data['question']}")
    print(f"======= Answer =======\n {ans['docs']}\n\n")

def test_post_api_query_rag():
    url = EAS_URL + '/service/query'
    data = {"question": "What is PAI?"}
    response = requests.post(url, headers=headers, json=data)
    if response.status_code != 200:
        raise ValueError(f'Error post to {url}, code: {response.status_code}')
    ans = dict(response.json())
    print(f"======= Question =======\n {data['question']}")
    print(f"======= Answer =======\n {ans['answer']}")
    print(f"======= Retrieved Docs =======\n {ans['docs']}\n\n")

# LLM
test_post_api_query_llm()
# Retrieval
test_post_api_query_retrieval()
# RAG (Retrieval + LLM)
test_post_api_query_rag()
Set the EAS_URL parameter to the endpoint of the RAG-based chatbot. Make sure to remove the forward slash (/) at the end of the endpoint. Set the Authorization parameter to the token of the RAG-based chatbot.
Initiate a multi-round conversational search request
You can initiate a multi-round conversational search request only in RAG (Retrieval + LLM) and LLM query modes. Sample code:
import requests

EAS_URL = 'http://xxxx.****.cn-beijing.pai-eas.aliyuncs.com'
headers = {
    'accept': 'application/json',
    'Content-Type': 'application/json',
    'Authorization': 'MDA5NmJkN****jNlMDgzYzM4M2YwMDUzZTdiZmI5YzljYjZmNA==',
}

def test_post_api_query_llm_with_chat_history():
    url = EAS_URL + '/service/query/llm'
    # Round 1 query
    data = {"question": "What is PAI?"}
    response = requests.post(url, headers=headers, json=data)
    if response.status_code != 200:
        raise ValueError(f'Error post to {url}, code: {response.status_code}')
    ans = dict(response.json())
    print(f"=======Round 1: Question =======\n {data['question']}")
    print(f"=======Round 1: Answer =======\n {ans['answer']} session_id: {ans['session_id']} \n")
    # Round 2 query
    data_2 = {"question": "What are the benefits of PAI?", "session_id": ans['session_id']}
    response_2 = requests.post(url, headers=headers, json=data_2)
    if response_2.status_code != 200:
        raise ValueError(f'Error post to {url}, code: {response_2.status_code}')
    ans_2 = dict(response_2.json())
    print(f"=======Round 2: Question =======\n {data_2['question']}")
    print(f"=======Round 2: Answer =======\n {ans_2['answer']} session_id: {ans_2['session_id']} \n\n")

def test_post_api_query_rag_with_chat_history():
    url = EAS_URL + '/service/query'
    # Round 1 query
    data = {"question": "What is PAI?"}
    response = requests.post(url, headers=headers, json=data)
    if response.status_code != 200:
        raise ValueError(f'Error post to {url}, code: {response.status_code}')
    ans = dict(response.json())
    print(f"=======Round 1: Question =======\n {data['question']}")
    print(f"=======Round 1: Answer =======\n {ans['answer']} session_id: {ans['session_id']}")
    print(f"=======Round 1: Retrieved Docs =======\n {ans['docs']}\n")
    # Round 2 query
    data = {"question": "What are the features of PAI?", "session_id": ans['session_id']}
    response = requests.post(url, headers=headers, json=data)
    if response.status_code != 200:
        raise ValueError(f'Error post to {url}, code: {response.status_code}')
    ans = dict(response.json())
    print(f"=======Round 2: Question =======\n {data['question']}")
    print(f"=======Round 2: Answer =======\n {ans['answer']} session_id: {ans['session_id']}")
    print(f"=======Round 2: Retrieved Docs =======\n {ans['docs']}")

# LLM
test_post_api_query_llm_with_chat_history()
# RAG (Retrieval + LLM)
test_post_api_query_rag_with_chat_history()
Set the EAS_URL parameter to the endpoint of the RAG-based chatbot. Make sure to remove the forward slash (/) at the end of the endpoint. Set the Authorization parameter to the token of the RAG-based chatbot.
References
You can also use EAS to deploy the following items:
You can deploy an LLM application that can be called by using the web UI or API operations. After the LLM application is deployed, use the LangChain framework to integrate enterprise knowledge bases into the LLM application to implement intelligent Q&A and automation features. For more information, see Quickly deploy LLMs in EAS.
You can deploy an AI video generation model service by using ComfyUI and Stable Video Diffusion models. This helps you complete tasks such as short video generation and animation on social media platforms. For more information, see Use ComfyUI to deploy an AI video generation model service.
FAQ
How do I disable the with chat history feature of the RAG-based chatbot?
On the web UI of the RAG-based chatbot, do not select the Chat history option.