AI Search Open Platform provides a Retrieval-Augmented Generation (RAG)-based solution for conversational search based on knowledge bases. The solution consists of three modules: data preprocessing, data retrieval, and response generation. For each module, AI Search Open Platform provides services as selectable components, such as document parsing, re-ranking, and text generation, and allows you to use these services by calling API operations. To quickly build a RAG-based conversational search application, download the generated code to your machine and configure information such as the API key, the API endpoint, and your local knowledge base.
How it works
RAG is an AI method that combines retrieval and generation to improve the relevance, accuracy, and diversity of the content generated by large language models (LLMs). During generation, RAG first retrieves the information most relevant to the input from a large amount of external data or a knowledge base. It then passes the retrieved information, together with the original input, to an LLM as a prompt or context to generate a more precise and informative output. This way, the LLM can reference the latest external data or domain-specific information, in addition to its internal parameters and training data, to improve the accuracy of the output.
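The following minimal sketch illustrates the RAG pattern described above. The retrieve and generate callables are placeholders for a retrieval service and an LLM; the prompt format is an assumption for illustration only.

```python
def rag_answer(query, retrieve, generate):
    # 1. Retrieve the passages most relevant to the query from external data
    #    or a knowledge base.
    passages = retrieve(query)
    # 2. Pass the retrieved passages plus the original query to the LLM as context.
    prompt = (
        "Answer the question based on the following context.\n\n"
        "Context:\n" + "\n".join(passages) + "\n\n"
        "Question: " + query
    )
    # 3. Generate a more precise, better-grounded answer.
    return generate(prompt)
```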
Scenarios
Conversational search based on knowledge bases is applicable to multiple scenarios, such as retrieval and summarization of private knowledge bases, or conversational search in industry verticals. AI Search Open Platform combines RAG and LLMs to understand and respond to complex natural language queries based on the domain-specific knowledge bases of enterprises. The RAG-based solution helps enterprises use natural language to quickly retrieve the required information from PDF and Word files, tables, and images.
Prerequisites
AI Search Open Platform is activated. For more information, see Activate AI Search Open Platform.
The service endpoint and API key are obtained. For more information, see Query service endpoint and Manage API key.
AI Search Open Platform allows you to call services over the Internet or a virtual private cloud (VPC), and also supports cross-region service calls over a VPC.
An Alibaba Cloud Elasticsearch cluster of V8.5 or later is created. For more information, see Create an Alibaba Cloud Elasticsearch cluster. The IP address of your device is added to a public or private IP address whitelist of the Elasticsearch cluster that you want to access over the Internet or a VPC. For more information, see Configure a public or private IP address whitelist for an Elasticsearch cluster.
Python 3.7 or later is installed, and the aiohttp 3.8.6 and elasticsearch 8.14 Python dependencies are installed in the development environment.
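You can optionally run the following snippet to verify the prerequisites above. The versions checked are the ones this document names; newer versions may also work.

```python
# Sanity check for the Python environment and dependencies.
import sys

import aiohttp
import elasticsearch

assert sys.version_info >= (3, 7), "Python 3.7 or later is required"
print("aiohttp", aiohttp.__version__)              # expect 3.8.6
print("elasticsearch", elasticsearch.__version__)  # expect 8.14.x
```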
Build a RAG-based development solution
To facilitate access, AI Search Open Platform offers the following development frameworks:
SDK for Java
SDK for Python
LangChain: if your business already uses LangChain, select it as the development framework.
LlamaIndex: if your business already uses LlamaIndex, select it as the development framework.
Step 1: Select service and download code
Select the algorithm services and the development framework to be used in the RAG-based solution based on your knowledge base and business scenario. In this example, SDK for Python is used.
Log on to the AI Search Open Platform console.
Select the Germany (Frankfurt) region. Switch to AI Search Open Platform and select the target workspace.
In the left-side navigation pane, click Scene Center. Click Enter in the RAG Scene-Knowledge Base Online Q & A section.

Select the services that you want to use from the drop-down lists. Then, on the Service Details tab, view the service details.
Note: To use an algorithm service in the RAG-based solution through the API, specify the service ID by using the service_id parameter. For example, the ID of the document content parsing service is ops-document-analyze-001.
After you select a service, the service_id parameter in the generated code is updated accordingly. After you download the code to your local environment, you can modify the service_id parameter in the code to call other services.
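The downloaded code wraps these calls for you, but the pattern looks roughly like the following sketch. The request path, payload shape, and Bearer-style authorization header are assumptions for illustration; check the API reference of each service for the exact format.

```python
import asyncio

import aiohttp

API_KEY = "your-api-key"                 # see Manage API key
ENDPOINT = "your-aisearch-endpoint"      # without "http://"
WORKSPACE = "your-workspace-name"
SERVICE_ID = "ops-document-analyze-001"  # document content parsing

async def call_service(payload):
    # Assumed URL layout: workspace-scoped path ending in the service_id.
    url = f"http://{ENDPOINT}/v3/openapi/workspaces/{WORKSPACE}/document-analyze/{SERVICE_ID}"
    headers = {"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"}
    async with aiohttp.ClientSession() as session:
        async with session.post(url, json=payload, headers=headers) as resp:
            return await resp.json()

if __name__ == "__main__":
    result = asyncio.run(call_service({"document": {"url": "https://example.com/sample.pdf"}}))
    print(result)
```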
Service | Description |
Document content parsing | The document content parsing service (ops-document-analyze-001) is a general-purpose document parsing service. It extracts logical hierarchies, such as titles and paragraphs, together with text, tables, images, and other information from unstructured documents, and outputs the results in a structured format. |
Image content parsing | The image text recognition service (ops-image-analyze-ocr-001) uses OCR to recognize the text in an image and extract the text information for image retrieval and conversational search. |
 | The image content recognition service (ops-image-analyze-vlm-001) parses the content of an image based on multimodal LLMs. You can also use the service to parse the text in an image and use the parsed text for image retrieval and conversational search. |
Document chunking | The general-purpose document chunking service (ops-document-split-001) chunks structured data in the HTML, Markdown, and TXT formats based on paragraphs, semantics, or specific rules. It can also extract code, images, and tables from rich text. |
Text embedding | The general-purpose text embedding service 001 (ops-text-embedding-001) supports more than 40 languages. The maximum input length is 300 tokens, and the generated vectors have 1,536 dimensions. |
 | The general-purpose text embedding service 002 (ops-text-embedding-002) supports more than 100 languages. The maximum input length is 8,192 tokens, and the generated vectors have 1,024 dimensions. |
 | The Chinese text embedding service (ops-text-embedding-zh-001) is optimized for Chinese text. The maximum input length is 1,024 tokens, and the generated vectors have 768 dimensions. |
 | The English text embedding service (ops-text-embedding-en-001) is optimized for English text. The maximum input length is 512 tokens, and the generated vectors have 768 dimensions. |
Sparse text embedding | Sparse text embedding converts text into sparse vectors that occupy less storage space. Sparse vectors capture keywords and frequently used terms. You can combine sparse and dense vectors in a hybrid search to improve retrieval performance. |
 | The general-purpose sparse text embedding service (ops-text-sparse-embedding-001) supports more than 100 languages. The maximum input length is 8,192 tokens. |
Query analysis | The query analysis service 001 (ops-query-analyze-001) is a general-purpose, LLM-based service that identifies user intent and expands a query into similar questions. |
Search engine | Alibaba Cloud Elasticsearch is a fully managed, out-of-the-box cloud service developed based on open source Elasticsearch. It is fully compatible with open source features and supports the pay-as-you-go billing method. Note: If you select Alibaba Cloud Elasticsearch as the search engine, the sparse text embedding service is unavailable due to compatibility issues. In this case, we recommend that you use a dense text embedding service. |
 | OpenSearch Vector Search Edition is a large-scale distributed vector search engine developed by Alibaba Group. It supports multiple vector search algorithms, performs well in high-accuracy retrieval, and allows you to build and query large-scale indexes in a cost-effective manner. It supports horizontal scaling and merging of indexes, pipeline-based index creation, real-time queries upon creation, and dynamic updates of real-time data. Note: To use OpenSearch Vector Search Edition, replace the engine configurations and code in the solution. |
Re-ranking | The BGE re-ranking model (ops-bge-reranker-larger) is a general-purpose document scoring service. It scores documents based on their relevance to a query, sorts them in descending order of score, and returns the scores. |
LLM | OpenSearch-Qwen-Turbo uses qwen-turbo as the base model, with supervised fine-tuning, enhanced retrieval augmentation, and reduced harmfulness. |
 | Qwen-Turbo is the fastest and most cost-effective model in the Qwen series, suitable for simple tasks. For more information, see Qwen LLMs. |
 | Qwen-Plus offers balanced capabilities, with reasoning quality, cost, and speed between those of Qwen-Max and Qwen-Turbo. It is suitable for moderately complex tasks. For more information, see Qwen LLMs. |
 | Qwen-Max (qwen-max) is an ultra-large Qwen language model with hundreds of billions of parameters. It supports input in multiple languages, such as Chinese and English. For more information, see Qwen LLMs. |
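For reference, a service_id_config mapping built from the service IDs in the table above might look like the following. The key names are hypothetical; the generated offline.py and online.py files define the actual structure.

```python
# Hypothetical mapping of pipeline stages to the service IDs listed above.
service_id_config = {
    "extract": "ops-document-analyze-001",       # document content parsing
    "split": "ops-document-split-001",           # document chunking
    "text_embedding": "ops-text-embedding-001",  # dense text embedding
    "query_analyze": "ops-query-analyze-001",    # query analysis
    "rank": "ops-bge-reranker-larger",           # re-ranking
}
```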
After you select the services, click "After the configuration is completed, enter the code query" to view and download the code. The code consists of two parts that correspond to the offline document processing and online conversational search processes of the RAG-based solution.
Offline document processing
This process consists of document parsing, image extraction, document chunking, text embedding, and writing the processing results to an Elasticsearch index. Use the main function document_pipeline_execute to perform the following steps. Pass the document to be processed by using a document URL or a Base64-encoded file. A sketch of the final index-writing step is shown after this list.
1. Parse the document. Call the asynchronous document parsing operation to extract the content from the URL or decode the content from the Base64-encoded file. Create a parsing task with the create_async_extraction_task function and poll the task status with the poll_task_result function. For more information, see Document content parsing.
2. Extract the images. Call the asynchronous image parsing operation to extract the image content from the URL or decode it from the Base64-encoded file. Create an image parsing task with the create_image_analyze_task function and poll the task status with the get_image_analyze_task_status function. For more information, see Image content extraction.
3. Chunk the document. Call the document chunking operation to chunk the parsed document based on a specific policy. Use the document_split function to chunk the document; this step includes text chunking and rich text parsing. For more information, see Document chunking.
4. Embed the text. Call the text embedding operation to convert the chunks into dense vectors. Use the text_embedding function to generate the embedding vector of each chunk. For more information, see Text embedding.
5. Write the processing results to an Elasticsearch index. Create an Elasticsearch index whose configuration includes the embedding and content fields, and call the helpers.async_bulk function to write the embedded results to the index. For more information, see Use the kNN search feature of Elasticsearch. Important: When the code creates an Elasticsearch index, it deletes any existing index that has the same name. To prevent an index from being deleted by mistake, change the index name in the code.
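The following is a minimal sketch of step 5, assuming an index with content and embedding fields and 1,536-dimensional vectors (the dimension of ops-text-embedding-001). The host, credentials, index name, and chunk format are placeholders; the generated offline.py may differ.

```python
import asyncio

from elasticsearch import AsyncElasticsearch, helpers

ES_HOST = "http://localhost:9200"       # your Elasticsearch endpoint
ES_AUTH = ("elastic", "your-password")
INDEX_NAME = "rag_demo_index"           # change this so an existing index is not deleted

async def write_chunks(chunks):
    """Write chunks shaped like {"content": str, "embedding": [float, ...]}."""
    es = AsyncElasticsearch(hosts=[ES_HOST], basic_auth=ES_AUTH)
    # Like the generated code, drop any same-name index before re-creating it.
    await es.indices.delete(index=INDEX_NAME, ignore_unavailable=True)
    await es.indices.create(index=INDEX_NAME, mappings={
        "properties": {
            "content": {"type": "text"},
            "embedding": {"type": "dense_vector", "dims": 1536,
                          "index": True, "similarity": "cosine"},
        },
    })
    # Bulk-write each embedded chunk to the index.
    actions = ({"_index": INDEX_NAME, "_source": chunk} for chunk in chunks)
    await helpers.async_bulk(es, actions)
    await es.close()

# Example: asyncio.run(write_chunks([{"content": "hello", "embedding": [0.1] * 1536}]))
```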
Online conversational search
This process consists of generating a query vector, performing query analysis, retrieving relevant document chunks, re-ranking the search results, and generating an answer based on the search results. Use the main function query_pipeline_execute to process a user query and generate an answer. A sketch of the retrieval and answer-generation steps is shown after this list.
1. Embed the query. Call the text embedding operation to convert the query into a dense vector. Use the text_embedding function to generate the query vector. For more information, see Text embedding.
2. Analyze the query. Call the query analysis operation to identify the user intent and generate similar questions by analyzing the conversation history. For more information, see Query analysis.
3. Retrieve embedded chunks. Use the Elasticsearch index to retrieve the chunks that are most similar to the query vector. Use the search operation of AsyncElasticsearch and the k-nearest neighbor (kNN) search feature to retrieve results based on vector similarity. For more information, see Use the kNN search feature of Elasticsearch.
4. Re-rank the results. Call the re-ranking operation to score the retrieved chunks and sort them by score. Call the documents_ranking function to score and re-rank the documents against the query. For more information, see Re-rank service.
5. Generate an answer. Use the text generation service and call the llm_call function to generate an answer based on the retrieved results and the user query. For more information, see API details.
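The following sketch shows how the online steps fit together. The embed, rerank, and llm callables are placeholders for the platform services described above (text_embedding, documents_ranking, and llm_call); only the Elasticsearch kNN call uses a real client API, and step 2 (query analysis) is omitted for brevity.

```python
from elasticsearch import AsyncElasticsearch

async def answer_query(query, embed, rerank, llm):
    es = AsyncElasticsearch(hosts=["http://localhost:9200"],
                            basic_auth=("elastic", "your-password"))
    query_vector = await embed(query)          # step 1: embed the query
    resp = await es.search(                    # step 3: kNN retrieval
        index="rag_demo_index",
        knn={"field": "embedding", "query_vector": query_vector,
             "k": 5, "num_candidates": 50},
    )
    chunks = [hit["_source"]["content"] for hit in resp["hits"]["hits"]]
    top_chunks = await rerank(query, chunks)   # step 4: re-rank the results
    await es.close()
    return await llm(query, top_chunks)        # step 5: generate an answer
```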
Under Code Query, select Document processing flow or Online Q & A Process. In the code editor, click Copy Code or Download File.
Step 2: Test the code in local environment
After you download the code files to your device, you must specify the parameters in the code. In this example, the online.py and offline.py files are downloaded. The online.py file is used for online conversational search and the offline.py file is used for offline document processing. The following table describes the parameters.
Section | Parameter | Description |
AI Search Open Platform | api_key | The API key. For more information about how to obtain the API key, see Manage API key. |
 | aisearch_endpoint | The API endpoint. For more information about how to obtain the API endpoint, see Query service endpoint. Note: You must remove "http://". You can call API operations over the Internet or a VPC. |
 | workspace_name | The name of the workspace in AI Search Open Platform. |
 | service_id | The service ID. To facilitate code development, you can configure services and specify service IDs separately in the offline.py and online.py files by using the service_id_config parameter. |
Elasticsearch search engine | es_host | The endpoint of the Elasticsearch cluster. If you want to access the cluster over the Internet or a VPC, add the IP address of your device to a public or private IP address whitelist of the cluster. For more information, see Configure a public or private IP address whitelist for an Elasticsearch cluster. |
 | es_auth | The username and password that are used to access the Elasticsearch cluster. The username is elastic, and the password is the one that you set when you created the cluster. If you forget the password, you can reset it. For more information, see Reset the access password for an Elasticsearch cluster. |
Other parameters | - | You do not need to modify other parameters if you use the sample code. |
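Configured near the top of offline.py and online.py, the parameters above might look like the following. All values are placeholders; only the variable names come from the table.

```python
# Placeholder parameter values for offline.py / online.py.
api_key = "your-api-key"                                 # see Manage API key
aisearch_endpoint = "xxx.opensearch.aliyuncs.com"        # no "http://" prefix
workspace_name = "your-workspace"
service_id_config = {"split": "ops-document-split-001"}  # one entry per service used

es_host = "http://es-cn-xxxx.elasticsearch.aliyuncs.com:9200"
es_auth = ("elastic", "your-es-password")
```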
After you specify the parameters, separately run the code in the offline.py and online.py files in Python 3.7 or later to check whether the generated answer is correct.
For example, the document What is AI Search Open Platform? is used as the knowledge base. The following question is asked based on the document: What features does AI Search Open Platform provide?
The following figures show the results that are returned.
Offline document processing results

Online conversational search results

Source code files
