Deploy a retrieval-augmented generation (RAG) chatbot that combines large language models (LLMs) with a vector database. Compute Nest provisions all required resources as a single service instance, including an Elastic Compute Service (ECS) instance, an AnalyticDB for PostgreSQL instance, and a Platform for AI (PAI) Elastic Algorithm Service (EAS) endpoint.
How it works
The chatbot consists of three components that Compute Nest deploys together:
| Component | Role |
|---|---|
| ECS instance | Hosts the LangChain application that provides the web UI and API. Handles document upload, chunking, and embedding. |
| AnalyticDB for PostgreSQL instance (elastic storage mode) | Serves as the vector database. Stores document embeddings and metadata, and performs vector similarity searches during retrieval. |
| PAI-EAS endpoint | Hosts the LLM for inference. Supports Llama 2-7b, Llama 2-13b, ChatGLM2-6b, and Qwen-7B. You can switch models after deployment. |
When a user submits a query, the LangChain service retrieves relevant document chunks from AnalyticDB for PostgreSQL, passes them as context to the LLM on PAI-EAS, and returns the generated answer.
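The retrieve-then-generate flow can be sketched in a few lines of Python. This is an illustrative toy, not the service's actual implementation: the in-memory store stands in for AnalyticDB for PostgreSQL, and the `llm` callable stands in for the HTTP call to PAI-EAS.

```python
def distance(a, b):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def retrieve(query_embedding, store, top_k=3):
    """Return the top_k chunks closest to the query embedding.
    In the deployed service this is a vector similarity search in
    AnalyticDB for PostgreSQL; here a toy in-memory list stands in."""
    scored = sorted(store, key=lambda item: distance(query_embedding, item["embedding"]))
    return [item["chunk"] for item in scored[:top_k]]

def answer(query, embed, store, llm):
    """Embed the query, fetch context chunks, and ask the LLM.
    In the deployed service, llm() is an HTTP call to the PAI-EAS endpoint."""
    chunks = retrieve(embed(query), store)
    prompt = (
        "Answer using the context below.\n\nContext:\n"
        + "\n".join(chunks)
        + f"\n\nQuestion: {query}"
    )
    return llm(prompt)
```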
Capabilities
Multiple LLMs -- Choose from Llama 2-7b, Llama 2-13b, ChatGLM2-6b, and Qwen-7B. Switch between models at any time.
GPU cluster management -- Start with a low-resource GPU instance during testing, then scale elastically as demand grows.
Fine-grained permissions -- Use AnalyticDB for PostgreSQL database-level permissions to control access to the knowledge base. You can also manage the knowledge base programmatically by calling the API operations provided in the open source code.
Web UI and API access -- Interact with the chatbot through a browser-based UI or integrate it into your applications using the API to build AI-generated content (AIGC) workflows.
Data isolation -- Business data, algorithms, and GPU resources remain within your account.
Billing
Creating the chatbot service instance provisions an ECS instance and an AnalyticDB for PostgreSQL instance in elastic storage mode. You are charged for these resources based on the billing method you select during setup.
| Resource | Billing reference |
|---|---|
| Compute Nest | Billing overview of Compute Nest |
| ECS | Billing overview of ECS |
| AnalyticDB for PostgreSQL | Billable items of AnalyticDB for PostgreSQL |
Prerequisites
Before you begin, make sure that you have:
An Alibaba Cloud account with billing enabled
Resource Access Management (RAM) permissions granted to the RAM user who will create the service instance
Create a service instance
This section uses the GenAI-LLM-RAG service template in Compute Nest.
Go to the Service Marketplace page in the Compute Nest console, click GenAI-LLM-RAG, and then click Launch Now.
On the Create Service Instance page, configure the following parameters.
| Category | Parameter | Description |
|---|---|---|
| Service Instance Name | -- | A descriptive name for the service instance. The system generates a name automatically. |
| Region | -- | The region where the service instance, ECS instance, and AnalyticDB for PostgreSQL instance are deployed. |
| PayType Configuration | ECS Instance Charge Type | The billing method for the ECS instance: Pay-as-you-go or Subscription. |
| ECS Configuration | Instance Type | The specifications of the ECS instance. |
| | Instance Password | The password to log on to the ECS instance. |
| | IngressIP | The IP address whitelist for the ECS instance. Add the IP address of any server that needs to access the LLM. |
| PAI-EAS Configuration | ModelType | The LLM to deploy. For example, select llama2-7b. |
| | pai instance type | The GPU specifications for PAI-EAS. |
| AnalyticDB PostgreSQL | DBInstanceSpec | The compute node specifications for the AnalyticDB for PostgreSQL instance. |
| | SegmentStorageSize | The storage capacity per compute node, in GB. |
| | DB Username | The privileged account name for the AnalyticDB for PostgreSQL instance. |
| | Instance Password | The password for the privileged account. |
| Choose model repo | User Name | The logon name for the LLM software (web UI). |
| | Software Login Password | The logon password for the LLM software. |
| Zone Configuration | VSwitch Availability Zone | The zone where the service instance is deployed. |
| Choose existing Infrastructure Configuration | WhetherCreateVpc | Whether to create a new virtual private cloud (VPC) or use an existing one. |
| | VPC ID | The VPC ID. |
| | VSwitch ID | The vSwitch ID. |
| Tags and Resource Groups | Tag | A tag to attach to the service instance. |
| | Resource Group | The resource group for the service instance. For more information, see What is Resource Management? |
Click Next: Confirm Order.
Review the Dependency Check, Service Instance Information, and Price Preview sections.
Note If required role permissions are not granted, click Authorize in the Dependency Check section. After authorization completes, click the refresh button.
Select I have read and agreed to Computing Nest Service Agreement, and then click Create Now.
After the request is submitted, click View Service.
The service instance takes approximately 10 minutes to provision. When the status on the Service Instances page changes from Deploying to Deployed, the instance is ready.
Use the chatbot
Before you can query the chatbot, upload documents to the knowledge base.
On the Service Instances page of the Compute Nest console, click the service instance ID to open the Service Instance Details page.
In the Instance Information section, click the URL in the Endpoint field.
Upload files to the knowledge base.
Click Upload File, Upload File and URL, or Upload Folder.
Supported formats: PDF, Markdown, TXT, and Word.
To remove a file, click Delete File.
Enter a question and click Submit.
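Besides the web UI, you can query the chatbot over its API. The sketch below builds such a request with the Python standard library. The `/chat` path and the `{"question": ...}` payload are assumptions for illustration; consult the service's `/openapi.json` for the actual operation names and schemas.

```python
import json
import urllib.request

def build_chat_request(base_url, question):
    """Build the HTTP request for one chatbot query.
    The /chat path and the request body shape are assumptions; check
    the service's /openapi.json for the real API operations."""
    payload = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/chat",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def ask_chatbot(base_url, question, timeout=60):
    """Send the query and decode the JSON response."""
    with urllib.request.urlopen(build_chat_request(base_url, question), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

The `base_url` is the Endpoint URL shown on the Service Instance Details page; the calling server's IP address must be in the IngressIP whitelist you configured.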
Manage resources
View associated resources
On the Service Instances page of the Compute Nest console, click the service instance ID to open the Service Instance Details page.
Click the Resources tab to see all provisioned resources.
AnalyticDB for PostgreSQL management
On the Resources tab, find the resource whose service is AnalyticDB for PostgreSQL and click the resource ID to open the instance management page.
For vector analysis operations, see:
To adjust storage and compute capacity, see:
View knowledge base data
On the AnalyticDB for PostgreSQL instance management page, click Log On to Database in the upper-right corner. For details, see Use DMS to connect to an AnalyticDB for PostgreSQL instance.
Note When you connect through Data Management (DMS), use the DB Username and Instance Password that you set when you created the service instance.
After you log on, click Instances Connected in the left-side navigation pane, find the AnalyticDB for PostgreSQL instance, and then double-click the public schema in the `chatglmuser` database. The `langchain_collections` table stores knowledge base metadata. Each uploaded knowledge base or document has a corresponding table named after it. This table contains embedding data, chunks, file metadata, and original file names.
For more information about DMS, see What is DMS?
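Instead of DMS, you can inspect the metadata over a direct PostgreSQL connection (for example, with psycopg2). Since the exact column layout of `langchain_collections` is not documented here, the helper below reads the column names from the cursor itself rather than assuming them.

```python
# Hedged sketch: query knowledge base metadata over a direct connection.
# The database name (chatglmuser) and table name (langchain_collections)
# come from the DMS steps above; the column layout is read at run time.

COLLECTIONS_SQL = "SELECT * FROM public.langchain_collections;"

def rows_to_dicts(cursor):
    """Pair each fetched row with the column names from cursor.description,
    so the metadata is readable regardless of the exact schema."""
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]
```

Usage: connect with a PostgreSQL driver such as psycopg2 (`dbname="chatglmuser"`, plus the DB Username and Instance Password set during service creation), execute `COLLECTIONS_SQL` on a cursor, and pass the cursor to `rows_to_dicts`.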
Enable auto scaling for EAS
EAS provides auto scaling, scheduled scaling, and elastic resource pools. Enable horizontal auto scaling to let EAS automatically adjust instance counts based on traffic.
On the Resources tab of the Service Instance Details page, find the resource whose service is PAI and click the resource ID to go to the Service Details page in the PAI console.
Click the Auto Scaling tab, and then click Enable Auto Scaling.
Configure the scaling parameters:
| Parameter | Description |
|---|---|
| Minimum Number of Instances | The minimum number of EAS instances to maintain. |
| Maximum Number of Instances | The upper limit for auto scaling. |
| General Scaling Metrics | The metric that triggers scaling. Select QPS Threshold of Individual Instance. |
| QPS Threshold of Individual Instance | The queries per second (QPS) threshold per instance. |
Example configurations:
| Scenario | Min instances | Max instances | QPS threshold | Behavior |
|---|---|---|---|---|
| Low traffic or testing | 0 | 1 | 1 | Scales up on the first request, scales down when idle. |
| Production with variable load | 5 | 50 | 2 | Scales between 5 and 50 instances based on traffic. |
Click Enable.
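The arithmetic behind QPS-based horizontal scaling can be illustrated as follows. This is a simplified model of the behavior in the table above, not EAS's actual scaling controller, which may also apply cooldowns and smoothing.

```python
import math

def target_instances(current_qps, qps_threshold, min_instances, max_instances):
    """Number of instances needed so that per-instance QPS stays at or
    below the threshold, clamped to the configured bounds. A simplified
    model of QPS-driven auto scaling, not the exact EAS algorithm."""
    needed = math.ceil(current_qps / qps_threshold)
    return max(min_instances, min(needed, max_instances))
```

For example, with the production configuration above (min 5, max 50, threshold 2), a load of 23 QPS yields 12 instances, while any load below 10 QPS stays at the floor of 5.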
Change the LLM
On the Resources tab of the Service Instance Details page, find the resource whose service is PAI and click the resource ID to go to the Service Details page in the PAI console.
Click Update Service.
On the Deploy Service page, modify the Command to Run field and select the corresponding GPU instance type. Keep the default values for other parameters.
| LLM | Command to run | Recommended GPU |
|---|---|---|
| Llama 2-13b | `python api/api_server.py --port=8000 --model-path=meta-llama/Llama-2-13b-chat-hf --precision=fp16` | V100 (gn6e) |
| Llama 2-7b | `python api/api_server.py --port=8000 --model-path=meta-llama/Llama-2-7b-chat-hf` | GU30 and A10 |
| ChatGLM2-6b | `python api/api_server.py --port=8000 --model-path=THUDM/chatglm2-6b` | GU30 and A10 |
| Qwen-7B | `python api/api_server.py --port=8000 --model-path=Qwen/Qwen-7B-Chat` | GU30 and A10 |
Click Deploy.
In the Deploy Service dialog box, click OK.
FAQ
How do I call the vector search API?
How long does deployment take?
The service instance provisions in about 10 minutes, which covers ECS and AnalyticDB for PostgreSQL initialization. The LLM downloads asynchronously and takes an additional 30 to 60 minutes. To check download progress, connect to the ECS instance and view the download logs. After the download finishes, the chatbot appears in the web UI.
How do I connect to the ECS instance?
On the Resources tab of the service instance details page, find the resource whose service is ECS and click the resource ID. On the Instance Details tab, click Connect. For more information, see Connect to an instance.
How do I restart the LangChain service?
Connect to the ECS instance and run:
systemctl restart langchain-chatglm
How do I view LangChain logs?
Connect to the ECS instance and run:
journalctl -ef -u langchain-chatglm
What do I do if the LLM fails to load?
After service instance creation, the system downloads the LLM from Hugging Face. This download takes 30 to 60 minutes in Chinese regions. After it completes, refresh the page to load the LLM.
Where is the deployment code?
See langchain-ChatGLM.
How do I request support?
Apply for the one-stop dedicated enterprise chatbot O&M service.
Where is the LangChain service deployed on ECS?
The service is deployed at /home/admin/langchain-ChatGLM.
How do I enable the LangChain API?
Connect to the ECS instance and run the following commands:
# Create the systemd file for langchain-chatglm-api
cp /lib/systemd/system/langchain-chatglm.service /lib/systemd/system/langchain-chatglm-api.service
# Modify the ExecStart parameter in /lib/systemd/system/langchain-chatglm-api.service
# For EAS:
ExecStart=/usr/bin/python3.9 /home/langchain/langchain-ChatGLM/api.py
# For a GPU single-host:
ExecStart=/usr/bin/python3.9 /home/admin/langchain-ChatGLM/api.py
# Reload the systemd file
systemctl daemon-reload
# Start the API
systemctl restart langchain-chatglm-api
# Verify that the API is running; the service log should show:
# INFO: Uvicorn running on http://0.0.0.0:7861 (Press CTRL+C to quit)
systemctl status langchain-chatglm-api
# List all API operations
curl http://0.0.0.0:7861/openapi.json
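To see the available operations at a glance, you can also parse the OpenAPI document that the curl command above returns. A small sketch with the standard library; the host and port match the Uvicorn output above:

```python
import json
import urllib.request

def list_operations(spec):
    """Extract sorted (method, path) pairs from an OpenAPI document."""
    ops = []
    for path, methods in spec.get("paths", {}).items():
        for method in methods:
            ops.append((method.upper(), path))
    return sorted(ops)

def main(base_url="http://0.0.0.0:7861"):
    """Fetch the schema from the running API and print each operation."""
    with urllib.request.urlopen(f"{base_url}/openapi.json") as resp:
        spec = json.load(resp)
    for method, path in list_operations(spec):
        print(method, path)
```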