Deploy a retrieval-augmented generation (RAG) chatbot that combines large language models (LLMs) with a vector database. Compute Nest provisions all required resources as a single service instance, including an Elastic Compute Service (ECS) instance, an AnalyticDB for PostgreSQL instance, and a Platform for AI (PAI) Elastic Algorithm Service (EAS) endpoint.
How it works
The chatbot consists of three components that Compute Nest deploys together:
| Component | Role |
|---|---|
| ECS instance | Hosts the LangChain application that provides the web UI and API. Handles document upload, chunking, and embedding. |
| AnalyticDB for PostgreSQL instance (elastic storage mode) | Serves as the vector database. Stores document embeddings and metadata, and performs vector similarity searches during retrieval. |
| PAI-EAS endpoint | Hosts the LLM for inference. Supports Llama 2-7b, Llama 2-13b, ChatGLM2-6b, and Qwen-7B. You can switch models after deployment. |
When a user submits a query, the LangChain service retrieves relevant document chunks from AnalyticDB for PostgreSQL, passes them as context to the LLM on PAI-EAS, and returns the generated answer.
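The retrieve-then-generate flow can be sketched in a few lines of Python. This is an illustrative toy, not the service's actual implementation: the in-memory store stands in for AnalyticDB for PostgreSQL, and the `llm` callable stands in for the HTTP call to PAI-EAS.

```python
def distance(a, b):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def retrieve(query_embedding, store, top_k=3):
    """Return the top_k chunks closest to the query embedding.
    In the deployed service this is a vector similarity search in
    AnalyticDB for PostgreSQL; here a toy in-memory list stands in."""
    scored = sorted(store, key=lambda item: distance(query_embedding, item["embedding"]))
    return [item["chunk"] for item in scored[:top_k]]

def answer(query, embed, store, llm):
    """Embed the query, fetch context chunks, and ask the LLM.
    In the deployed service, llm() is an HTTP call to the PAI-EAS endpoint."""
    chunks = retrieve(embed(query), store)
    prompt = (
        "Answer using the context below.\n\nContext:\n"
        + "\n".join(chunks)
        + f"\n\nQuestion: {query}"
    )
    return llm(prompt)
```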
Capabilities
Multiple LLMs -- Choose from Llama 2-7b, Llama 2-13b, ChatGLM2-6b, and Qwen-7B. Switch between models at any time.
GPU cluster management -- Start with a low-resource GPU instance during testing, then scale elastically as demand grows.
Fine-grained permissions -- Use AnalyticDB for PostgreSQL database-level permissions to control access to the knowledge base. You can also manage the knowledge base programmatically by calling the API operations provided in the open source code.
Web UI and API access -- Interact with the chatbot through a browser-based UI or integrate it into your applications using the API to build AI-generated content (AIGC) workflows.
Data isolation -- Business data, algorithms, and GPU resources remain within your account.
Billing
Creating the chatbot service instance provisions an ECS instance and an AnalyticDB for PostgreSQL instance in elastic storage mode. You are charged for these resources based on the billing method you select during setup.
| Resource | Billing reference |
|---|---|
| Compute Nest | Billing overview of Compute Nest |
| ECS | Billing overview of ECS |
| AnalyticDB for PostgreSQL | Billable items of AnalyticDB for PostgreSQL |
Prerequisites
Before you begin, make sure that you have:
An Alibaba Cloud account with billing enabled
Resource Access Management (RAM) permissions granted to the RAM user who will create the service instance
Create a service instance
This section uses the GenAI-LLM-RAG service template in Compute Nest.
Go to the Service Marketplace page in the Compute Nest console, click GenAI-LLM-RAG, and then click Launch Now.
On the Create Service Instance page, configure the following parameters.
| Category | Parameter | Description |
|---|---|---|
| Service Instance Name | -- | A descriptive name for the service instance. The system generates a name automatically. |
| Region | -- | The region where the service instance, ECS instance, and AnalyticDB for PostgreSQL instance are deployed. |
| PayType Configuration | ECS Instance Charge Type | The billing method for the ECS instance: Pay-as-you-go or Subscription. |
| ECS Configuration | Instance Type | The specifications of the ECS instance. |
| | Instance Password | The password to log on to the ECS instance. |
| | IngressIP | The IP address whitelist for the ECS instance. Add the IP address of any server that needs to access the LLM. |
| PAI-EAS Configuration | ModelType | The LLM to deploy. For example, select llama2-7b. |
| | pai instance type | The GPU specifications for PAI-EAS. |
| AnalyticDB PostgreSQL | DBInstanceSpec | The compute node specifications for the AnalyticDB for PostgreSQL instance. |
| | SegmentStorageSize | The storage capacity per compute node, in GB. |
| | DB Username | The privileged account name for the AnalyticDB for PostgreSQL instance. |
| | Instance Password | The password for the privileged account. |
| Choose model repo | User Name | The logon name for the LLM software (web UI). |
| | Software Login Password | The logon password for the LLM software. |
| Zone Configuration | VSwitch Availability Zone | The zone where the service instance is deployed. |
| Choose existing Infrastructure Configuration | WhetherCreateVpc | Whether to create a new virtual private cloud (VPC) or use an existing one. |
| | VPC ID | The VPC ID. |
| | VSwitch ID | The vSwitch ID. |
| Tags and Resource Groups | Tag | A tag to attach to the service instance. |
| | Resource Group | The resource group for the service instance. For more information, see What is Resource Management? |
Click Next: Confirm Order.
Review the Dependency Check, Service Instance Information, and Price Preview sections.
Note If required role permissions are not granted, click Authorize in the Dependency Check section. After authorization completes, click the refresh button.
Select I have read and agreed to Computing Nest Service Agreement, and then click Create Now.
After the request is submitted, click View Service.
The service instance takes approximately 10 minutes to provision. When the status on the Service Instances page changes from Deploying to Deployed, the instance is ready.
Use the chatbot
Before you can query the chatbot, upload documents to the knowledge base.
On the Service Instances page of the Compute Nest console, click the service instance ID to open the Service Instance Details page.
In the Instance Information section, click the URL in the Endpoint field.
Upload files to the knowledge base.
Click Upload File, Upload File and URL, or Upload Folder.
Supported formats: PDF, Markdown, TXT, and Word.
To remove a file, click Delete File.
Enter a question and click Submit.
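Besides the web UI, you can query the chatbot over its API. The sketch below builds such a request with the Python standard library. The `/chat` path and the `{"question": ...}` payload are assumptions for illustration; consult the service's `/openapi.json` for the actual operation names and schemas.

```python
import json
import urllib.request

def build_chat_request(base_url, question):
    """Build the HTTP request for one chatbot query.
    The /chat path and the request body shape are assumptions; check
    the service's /openapi.json for the real API operations."""
    payload = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/chat",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def ask_chatbot(base_url, question, timeout=60):
    """Send the query and decode the JSON response."""
    with urllib.request.urlopen(build_chat_request(base_url, question), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

The `base_url` is the Endpoint URL shown on the Service Instance Details page; the calling server's IP address must be in the IngressIP whitelist you configured.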
Manage resources
View associated resources
On the Service Instances page of the Compute Nest console, click the service instance ID to open the Service Instance Details page.
Click the Resources tab to see all provisioned resources.
AnalyticDB for PostgreSQL management
On the Resources tab, find the resource whose service is AnalyticDB for PostgreSQL and click the resource ID to open the instance management page.
For vector analysis operations, see:
To adjust storage and compute capacity, see:
View knowledge base data
On the AnalyticDB for PostgreSQL instance management page, click Log On to Database in the upper-right corner. For details, see Use DMS to connect to an AnalyticDB for PostgreSQL instance.
Note When you connect through Data Management (DMS), use the DB Username and Instance Password that you set when you created the service instance.
After you log on, click Instances Connected in the left-side navigation pane, find the AnalyticDB for PostgreSQL instance, and then double-click the public schema in the `chatglmuser` database. The `langchain_collections` table stores knowledge base metadata. Each uploaded knowledge base or document has a corresponding table named after it. This table contains embedding data, chunks, file metadata, and original file names.
For more information about DMS, see What is DMS?
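Instead of DMS, you can inspect the metadata over a direct PostgreSQL connection (for example, with psycopg2). Since the exact column layout of `langchain_collections` is not documented here, the helper below reads the column names from the cursor itself rather than assuming them.

```python
# Hedged sketch: query knowledge base metadata over a direct connection.
# The database name (chatglmuser) and table name (langchain_collections)
# come from the DMS steps above; the column layout is read at run time.

COLLECTIONS_SQL = "SELECT * FROM public.langchain_collections;"

def rows_to_dicts(cursor):
    """Pair each fetched row with the column names from cursor.description,
    so the metadata is readable regardless of the exact schema."""
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]
```

Usage: connect with a PostgreSQL driver such as psycopg2 (`dbname="chatglmuser"`, plus the DB Username and Instance Password set during service creation), execute `COLLECTIONS_SQL` on a cursor, and pass the cursor to `rows_to_dicts`.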
Enable auto scaling for EAS
EAS provides auto scaling, scheduled scaling, and elastic resource pools. Enable horizontal auto scaling to let EAS automatically adjust instance counts based on traffic.
On the Resources tab of the Service Instance Details page, find the resource whose service is PAI and click the resource ID to go to the Service Details page in the PAI console.
Click the Auto Scaling tab, and then click Enable Auto Scaling.
Configure the scaling parameters:
| Parameter | Description |
|---|---|
| Minimum Number of Instances | The minimum number of EAS instances to maintain. |
| Maximum Number of Instances | The upper limit for auto scaling. |
| General Scaling Metrics | The metric that triggers scaling. Select QPS Threshold of Individual Instance. |
| QPS Threshold of Individual Instance | The queries per second (QPS) threshold per instance. |
Example configurations:
| Scenario | Min instances | Max instances | QPS threshold | Behavior |
|---|---|---|---|---|
| Low traffic or testing | 0 | 1 | 1 | Scales up on the first request, scales down when idle. |
| Production with variable load | 5 | 50 | 2 | Scales between 5 and 50 instances based on traffic. |
Click Enable.
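The arithmetic behind QPS-based horizontal scaling can be illustrated as follows. This is a simplified model of the behavior in the table above, not EAS's actual scaling controller, which may also apply cooldowns and smoothing.

```python
import math

def target_instances(current_qps, qps_threshold, min_instances, max_instances):
    """Number of instances needed so that per-instance QPS stays at or
    below the threshold, clamped to the configured bounds. A simplified
    model of QPS-driven auto scaling, not the exact EAS algorithm."""
    needed = math.ceil(current_qps / qps_threshold)
    return max(min_instances, min(needed, max_instances))
```

For example, with the production configuration above (min 5, max 50, threshold 2), a load of 23 QPS yields 12 instances, while any load below 10 QPS stays at the floor of 5.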
Change the LLM
On the Resources tab of the Service Instance Details page, find the resource whose service is PAI and click the resource ID to go to the Service Details page in the PAI console.
Click Update Service.
On the Deploy Service page, modify the Command to Run field and select the corresponding GPU instance type. Keep the default values for other parameters.
| LLM | Command to run | Recommended GPU |
|---|---|---|
| Llama 2-13b | `python api/api_server.py --port=8000 --model-path=meta-llama/Llama-2-13b-chat-hf --precision=fp16` | V100 (gn6e) |
| Llama 2-7b | `python api/api_server.py --port=8000 --model-path=meta-llama/Llama-2-7b-chat-hf` | GU30 and A10 |
| ChatGLM2-6b | `python api/api_server.py --port=8000 --model-path=THUDM/chatglm2-6b` | GU30 and A10 |
| Qwen-7B | `python api/api_server.py --port=8000 --model-path=Qwen/Qwen-7B-Chat` | GU30 and A10 |
Click Deploy.
In the Deploy Service dialog box, click OK.
FAQ
How do I call the vector search API?
How long does deployment take?
The service instance provisions in about 10 minutes, which covers ECS and AnalyticDB for PostgreSQL initialization. The LLM downloads asynchronously and takes an additional 30 to 60 minutes. To check download progress, connect to the ECS instance and view the download logs. After the download finishes, the chatbot appears in the web UI.
How do I connect to the ECS instance?
On the Resources tab of the service instance details page, find the resource whose service is ECS and click the resource ID. On the Instance Details tab, click Connect. For more information, see Connect to an instance.
How do I restart the LangChain service?
Connect to the ECS instance and run:
systemctl restart langchain-chatglm
How do I view LangChain logs?
Connect to the ECS instance and run:
journalctl -ef -u langchain-chatglm
What do I do if the LLM fails to load?
After service instance creation, the system downloads the LLM from Hugging Face. This download takes 30 to 60 minutes in Chinese regions. After it completes, refresh the page to load the LLM.
Where is the deployment code?
See langchain-ChatGLM.
How do I request support?
Apply for the one-stop dedicated enterprise chatbot O&M service.
Where is the LangChain service deployed on ECS?
The service is deployed at /home/admin/langchain-ChatGLM.
How do I enable the LangChain API?
Connect to the ECS instance and run the following commands:
# Create the systemd file for langchain-chatglm-api
cp /lib/systemd/system/langchain-chatglm.service /lib/systemd/system/langchain-chatglm-api.service
# Modify the ExecStart parameter in /lib/systemd/system/langchain-chatglm-api.service
# For EAS:
ExecStart=/usr/bin/python3.9 /home/langchain/langchain-ChatGLM/api.py
# For a GPU single-host:
ExecStart=/usr/bin/python3.9 /home/admin/langchain-ChatGLM/api.py
# Reload the systemd file
systemctl daemon-reload
# Start the API
systemctl restart langchain-chatglm-api
# Verify that the API is running; the service log should show:
# INFO: Uvicorn running on http://0.0.0.0:7861 (Press CTRL+C to quit)
systemctl status langchain-chatglm-api
# List all API operations
curl http://0.0.0.0:7861/openapi.json
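To see the available operations at a glance, you can also parse the OpenAPI document that the curl command above returns. A small sketch with the standard library; the host and port match the Uvicorn output above:

```python
import json
import urllib.request

def list_operations(spec):
    """Extract sorted (method, path) pairs from an OpenAPI document."""
    ops = []
    for path, methods in spec.get("paths", {}).items():
        for method in methods:
            ops.append((method.upper(), path))
    return sorted(ops)

def main(base_url="http://0.0.0.0:7861"):
    """Fetch the schema from the running API and print each operation."""
    with urllib.request.urlopen(f"{base_url}/openapi.json") as resp:
        spec = json.load(resp)
    for method, path in list_operations(spec):
        print(method, path)
```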