Deploy a private enterprise chatbot powered by a large language model (LLM) and AnalyticDB for PostgreSQL as the vector database — without writing any infrastructure code. Compute Nest provisions all required resources, giving you a working Retrieval-Augmented Generation (RAG) chatbot with a web UI in about 10 minutes.
How it works
The chatbot uses RAG to answer questions from your private documents:
Upload documents — PDF, Markdown, TXT, or Word files go into a knowledge base.
Chunk and embed — The system splits documents into segments and converts them to vector embeddings stored in AnalyticDB for PostgreSQL.
Retrieve and generate — When a user asks a question, the system retrieves the most relevant chunks from the vector database, then passes them to the LLM to generate a grounded answer.
AnalyticDB for PostgreSQL handles vector storage and similarity search. The LLM (running on Platform for AI (PAI)) generates responses. LangChain on Elastic Compute Service (ECS) orchestrates the pipeline and serves the web UI.
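The retrieval step boils down to ranking stored chunk embeddings by similarity to the query embedding. The toy sketch below uses made-up three-dimensional vectors and chunk names purely for illustration; the real pipeline computes this search with vector SQL inside AnalyticDB for PostgreSQL:

```shell
# Toy sketch of retrieval: rank chunk embeddings by cosine similarity
# to the query embedding. Vectors and chunk names are invented; the
# deployed system runs this search inside the vector database.
cat > /tmp/chunks.txt <<'EOF'
chunk-a 1 0 0
chunk-b 0.6 0.8 0
EOF
QUERY="0.8 0.6 0"
BEST=$(awk -v q="$QUERY" '
  BEGIN {
    n = split(q, qv, " ")
    for (i = 1; i <= n; i++) qn += qv[i] * qv[i]   # query norm (squared)
  }
  {
    dot = dn = 0
    for (i = 2; i <= NF; i++) {                    # fields 2.. are the vector
      dot += $i * qv[i-1]
      dn  += $i * $i
    }
    sim = dot / (sqrt(dn) * sqrt(qn))              # cosine similarity
    if (sim > best) { best = sim; name = $1 }
  }
  END { printf "%s %.2f", name, best }
' /tmp/chunks.txt)
echo "$BEST"
```

Here the query vector is closest to chunk-b, so that chunk would be passed to the LLM as context.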
What you'll do
Review billing and prerequisites.
Create a service instance using the GenAI-LLM-RAG template.
Upload documents to build a knowledge base.
Ask questions through the web UI.
(Optional) Manage resources: scale PAI-EAS, switch models, or inspect database tables.
Prerequisites
Before you begin, make sure you have:
An Alibaba Cloud account
(If using a RAM user) The RAM permissions required by Compute Nest — see Grant permissions to a RAM user
Billing
When you create the One-stop Enterprise-specific Chatbot Community Edition (Large Language Model + Vector Database) service, the system automatically creates an ECS instance and an AnalyticDB for PostgreSQL instance in elastic storage mode. You are charged for these resources.
For pricing details, see the pricing documentation for ECS and AnalyticDB for PostgreSQL.
Create a service instance
This guide uses the GenAI-LLM-RAG template as an example.
Go to the Create Service Instance page. In the Quick Trial section, click GenAI-LLM-RAG.
On the Create Service Instance page, configure the following parameters.
| Section | Parameter | Description |
|---|---|---|
| Service instance name | — | Enter a name that is easy to identify. The system generates a random name by default. |
| Region | — | The region where all resources (service instance, ECS, and AnalyticDB for PostgreSQL) will be created. |
| Billing method configuration | Billing method | Select Pay-As-You-Go or Subscription. This guide uses pay-as-you-go. |
| ECS configuration | Instance type | Select the ECS instance specifications. |
| ECS configuration | Instance password | The logon password for the ECS instance. |
| ECS configuration | Whitelist settings | Add the IP addresses of servers that need to call the LLM API. |
| PAI-EAS model configuration | Select large model | Select a pre-configured LLM. This guide uses llama2-7b. |
| PAI-EAS model configuration | PAI instance type | Select the GPU specifications for PAI. Unavailable specifications are grayed out. |
| AnalyticDB for PostgreSQL | Instance type | The node specifications for the AnalyticDB for PostgreSQL instance. |
| AnalyticDB for PostgreSQL | Segment storage size | Storage space for compute nodes, in GB. |
| AnalyticDB for PostgreSQL | Database account name | The initial database account name. |
| AnalyticDB for PostgreSQL | Database password | The password for the initial database account. |
| Application configuration | Software logon name | The username for logging in to the LangChain web service. |
| Application configuration | Software logon password | The password for the LangChain web service. |
| Zone configuration | vSwitch zone | The zone where the service instance will be created. |
| Network configuration | Create a new VPC | Create a new VPC or use an existing one. This guide uses a new VPC. |
| Network configuration | VPC IPv4 CIDR block | The IPv4 CIDR block for the VPC. |
| Network configuration | vSwitch subnet CIDR block | The CIDR block for the vSwitch. |
| Tags and resource groups | Tag | Attach a tag to the service instance. |
| Tags and resource groups | Resource group | The resource group for the service instance. See What is Resource Management? |

Click Next: Confirm Order.
Review the Dependency Check, Service Instance Information, and Price Preview sections.
If a role permission shows as disabled in Dependency Check, click Enable Now on the right, then click the refresh button in that section.
Click Create Now.
Click View Service.
The service instance takes about 10 minutes to create. Its status changes to Deployed when ready.
The LLM is downloaded asynchronously from Hugging Face after the service instance is deployed. This download takes an additional 30 to 60 minutes. The chatbot is not usable until the download completes.
Set up the knowledge base and use the chatbot
Before the chatbot can answer questions, upload your documents to a knowledge base.
In the Compute Nest console, go to Service Instance and click the ID of your service instance.
On the service instance details page, in the Use Now section, click the link next to Endpoint.
In the Log On dialog box, enter the Software Logon Name and Software Logon Password you set during service creation, then click Log On.
In the upper-right corner, under Please select a usage mode, select Knowledge Base Q&A.
In the Configure Knowledge Base section on the right, under Please select a knowledge base to load, select Create Knowledge Base. Enter a name for the new knowledge base and click Add to Knowledge Base Options.
Set Sentence Length Limit for Text Storage based on your requirements. The recommended value is 500. Longer segments reduce chunking granularity and can lower retrieval accuracy.
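To get an intuition for what the length limit does, here is a rough stand-in using fold, which wraps text at word boundaries. The service's actual splitter is more sophisticated; the 40-character width below merely plays the role of the recommended 500-character limit:

```shell
# Wrap a sample paragraph at word boundaries to mimic segment
# splitting. 40 chars here stands in for the recommended 500.
printf '%s\n' "A long paragraph is split into shorter segments before it is embedded and stored in the vector database." > /tmp/doc.txt
fold -s -w 40 /tmp/doc.txt > /tmp/segments.txt
SEGMENTS=$(( $(wc -l < /tmp/segments.txt) ))
echo "split into $SEGMENTS segments"
```

Raising the limit produces fewer, longer segments; each one is embedded as a single vector, which is why overly long segments can blur retrieval.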
Upload documents to the knowledge base. Supported file formats: PDF, Markdown, TXT, and Word. To remove a file, use the Delete File interface.
> Tip: Documents with complex layouts (tables, multi-column text, or heavy formatting) may produce lower-quality chunks. For best retrieval accuracy, convert such documents to plain text or structured Markdown before uploading.
The following upload methods are supported:
Upload File — upload individual files
Upload File and URL — upload files or fetch from a URL
Upload Folder — upload an entire folder
After the upload completes, type a question in the lower-left corner and click Submit.
Resource management
View associated resources
In the Compute Nest console, go to Service Instance and click the ID of your service instance.
Click the Resources tab.
Manage AnalyticDB for PostgreSQL
On the Resources tab, find the resource with Product set to AnalyticDB for PostgreSQL and click its Resource ID to open the instance management page.
For more information about vector capabilities:
To scale resources:
View knowledge base data in the database
On the AnalyticDB for PostgreSQL instance management page, click Log On to Database in the upper-right corner. For connection instructions, see Use DMS to log on to a database.
Use the Database Account Name and Database Password you specified when creating the service instance.
In the Logged-in Instances list on the left, find your AnalyticDB for PostgreSQL instance and double-click the public schema under the chatglmuser database. The langchain_collections table lists all knowledge bases. Each knowledge base has its own table (named after the knowledge base) containing embeddings, chunks, file metadata, and original file names.
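Assuming the schema, database, and table names described above (verify them in your own instance), the knowledge base tables can be inspected with ordinary SQL from the DMS SQL console or psql. A sketch of the queries, where my_kb is a placeholder for your knowledge base name:

```shell
# Hypothetical inspection queries; names are taken from the
# description above -- confirm them in your own instance.
QUERIES=$(cat <<'SQL'
-- List all knowledge bases
SELECT * FROM public.langchain_collections;
-- Peek at the chunks and embeddings of one knowledge base
-- (replace my_kb with your knowledge base name)
SELECT * FROM public.my_kb LIMIT 5;
SQL
)
echo "$QUERIES"
```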
For more information about Data Management (DMS), see What is Data Management (DMS).
Manage PAI-EAS resources
Enable auto scaling
PAI-EAS supports horizontal auto scaling, scheduled scaling, and elastic resource pools. For workloads with significant traffic peaks, enable horizontal auto scaling to avoid over-provisioning at low traffic and prevent resource exhaustion at peak traffic.
On the Resources tab, find the resource with Product set to Platform for AI (PAI) and click its Resource ID to open the service details page.
Click the Auto Scaling tab.
In the Elastic Scaling section, click Enable Auto Scaling.
In the Auto Scaling Settings dialog box, configure the parameters based on your workload:
| Scenario | Minimum instances | Maximum instances | Scaling metric | QPS threshold |
|---|---|---|---|---|
| Low-traffic (start on demand, stop when idle) | 0 | 1 | QPS-based scaling threshold per instance | 1 |
| High-traffic (large daily volume with fluctuations) | 5 | 50 | QPS-based scaling threshold per instance | 2 |

Click Enable.
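As a rough mental model (an illustration, not the exact PAI-EAS formula), QPS-based scaling targets approximately the total QPS divided by the per-instance threshold, clamped between the minimum and maximum instance counts:

```shell
# Illustrative only: approximate target instance count for
# QPS-based scaling = ceil(total QPS / per-instance threshold),
# clamped to [MIN, MAX]. Numbers follow the high-traffic row above.
QPS=20; THRESHOLD=2; MIN=5; MAX=50
TARGET=$(( (QPS + THRESHOLD - 1) / THRESHOLD ))   # ceiling division
if [ "$TARGET" -lt "$MIN" ]; then TARGET=$MIN; fi
if [ "$TARGET" -gt "$MAX" ]; then TARGET=$MAX; fi
echo "target instances: $TARGET"
```

A lower threshold scales out earlier (more instances per unit of traffic); a higher threshold packs more traffic onto each instance.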
For a description of each scaling type, see horizontal auto scaling, scheduled scaling, and elastic resource pools.
Switch to a different LLM
On the Resources tab, find the resource with Product set to Platform for AI (PAI) and click its Resource ID.
Click Update Service in the upper-right corner.
On the deployment page, update the Run Command and GPU Instance Type using the values in the following table. Leave all other parameters at their default values.
| Model | Run command | Recommended instance type |
|---|---|---|
| Llama 2 13B | python api/api_server.py --port=8000 --model-path=meta-llama/Llama-2-13b-chat-hf --precision=fp16 | V100 (gn6e) |
| Llama 2 7B | python api/api_server.py --port=8000 --model-path=meta-llama/Llama-2-7b-chat-hf | GU30, A10 |
| Qwen 7B | python api/api_server.py --port=8000 --model-path=Qwen/Qwen-7B-Chat | GU30, A10 |

Click Deploy.
In the Deploy Service dialog box, click OK.
FAQ
How do I check whether the LLM has finished downloading?
After the service instance is deployed, the LLM downloads asynchronously from Hugging Face to the ECS instance. This takes 30 to 60 minutes. To monitor progress, log on to the ECS instance and run:
journalctl -ef -u langchain-chatglm

When you see a log entry indicating the service is listening (for example, the Uvicorn startup message), the model is loaded and the chatbot is ready. Then log on to the web UI to use the chatbot.
Why does the model fail to load after deployment?
The model download from Hugging Face takes 30 to 60 minutes and may be slower in some regions. The chatbot is unavailable until the download completes. Use the log command above to monitor download progress. Wait for the model-loaded message before accessing the web UI.
Why do I see a blank page when I access the service?
This service runs on the Alibaba Cloud China site (www.aliyun.com). If you access it through a proxy from outside China, the page may appear blank. Disable the proxy before creating and accessing the service.
How do I log on to the ECS instance?
On the Resources tab of your service instance, find the resource with Product set to Elastic Compute Service and click its Resource ID. On the ECS basic information page, click Remote Connection. For details, see Connect to an instance.
How do I restart the LangChain service?
Log on to the ECS instance and run:
systemctl restart langchain-chatglm
How do I view LangChain logs?
Log on to the ECS instance and run:
journalctl -ef -u langchain-chatglm
How do I enable the LangChain API?
Log on to the ECS instance and run the following commands:
# Copy the systemd unit file for the API service
cp /lib/systemd/system/langchain-chatglm.service /lib/systemd/system/langchain-chatglm-api.service
# Edit ExecStart in the new unit file:
# For PAI-EAS:
# ExecStart=/usr/bin/python3.9 /home/langchain/langchain-ChatGLM/api.py
# For a single GPU-accelerated instance:
# ExecStart=/usr/bin/python3.9 /home/admin/langchain-ChatGLM/api.py
# Reload systemd and start the API service
systemctl daemon-reload
systemctl restart langchain-chatglm-api
# Verify the API is running (look for the following log entry):
# INFO: Uvicorn running on http://0.0.0.0:7861 (Press CTRL+C to quit)
# List all available API endpoints:
curl http://0.0.0.0:7861/openapi.json
Where is LangChain deployed on the ECS instance?
LangChain is deployed at /home/admin/langchain-ChatGLM.
How do I use the vector search APIs?
See Import and query vector data through APIs (Java).
How do I get backend support from the product team?
Subscribe to the One-stop Enterprise-specific Chatbot Managed Service to request support.
Where can I find the deployment source code?
See the langchain-ChatGLM repository on GitHub.