
Tablestore: Use EAS and Tablestore to deploy a RAG-based LLM chatbot

Last Updated: Apr 30, 2025

When you deploy a Retrieval-Augmented Generation (RAG)-based large language model (LLM) chatbot in Platform for AI (PAI), you can use Tablestore as the vector database. This topic describes how to use Tablestore as the vector database of a RAG-based LLM chatbot that is deployed by using JSON configurations.

Background information

EAS

PAI provides a one-stop platform for model development and deployment. The Elastic Algorithm Service (EAS) module of PAI allows you to deploy models as online inference services by using the public resource group or dedicated resource groups. The models are loaded on heterogeneous hardware (CPUs and GPUs) to generate real-time responses.

Tablestore

Tablestore is a cost-effective and high-performance system for massive data storage and retrieval. It provides vector retrieval features with high recall (multimodal retrieval, and mixed scalar-vector retrieval) and high performance (real-time indexing, millisecond-level queries, and support for up to 10 billion vectors per table), and ensures service stability and security (dedicated VPCs, 99.99% availability, and 12 nines of data reliability).

RAG

RAG retrieves relevant information from external knowledge bases, combines the information with user input, and then passes the combined input to LLMs. This enhances the knowledge-based Q&A capability of LLMs in specific domains. The following figure shows how to upload, store, and retrieve knowledge base files when Tablestore is used as the vector database of a RAG application.
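The retrieve-then-generate flow described above can be sketched in a few lines of Python. The toy hash-based embedding below is only a stand-in for a real embedding model such as bge-small-zh-v1.5; the sketch shows the mechanics of retrieval and prompt assembly, not the production implementation.

```python
import math

def embed(text: str, dim: int = 8) -> list[float]:
    # Toy embedding: hash character bigrams into a fixed-size, normalized vector.
    # A real deployment uses an embedding model such as bge-small-zh-v1.5.
    vec = [0.0] * dim
    for i in range(len(text) - 1):
        vec[hash(text[i:i + 2]) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    # Vectors are already normalized, so the dot product is the cosine similarity.
    return sum(x * y for x, y in zip(a, b))

def retrieve(query: str, docs: list[str], top_k: int = 2) -> list[str]:
    # Rank knowledge base entries by similarity to the query; the vector
    # database (Tablestore in this topic) performs this step at scale.
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:top_k]

def build_prompt(query: str, docs: list[str]) -> str:
    # Combine the retrieved context with the user input before calling the LLM.
    context = "\n".join(retrieve(query, docs))
    return f"Answer using this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Tablestore supports vector retrieval.",
    "EAS deploys models as online services.",
    "RAG combines retrieval with generation.",
]
print(build_prompt("How does RAG work?", docs))
```

In the deployed chatbot, the same three steps (embed, retrieve from Tablestore, assemble the prompt) run inside the pai-rag service.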


Usage notes

In this example, the DeepSeek-R1-Distill-Qwen-1.5B model is deployed on an instance of the ecs.gn7i-c16g1.4xlarge type. For information about the billing of EAS, see Billing of EAS.

Important

If you only want to test the deployment process, delete the service immediately after the test to prevent unexpected resource charges.

Procedure

Step 1: Prepare a Tablestore vector database

Activate Tablestore

If Tablestore is activated, skip this step.

  1. Go to the Tablestore product details page.

  2. Click Get it Free.

  3. On the Table Store (Pay-As-You-Go) page, click Buy Now.

  4. On the Confirm Order page, read the agreement carefully, select I have read and agree to Table Store (Pay-As-You-Go) Agreement of Service, and then click Activate Now.

  5. After you activate Tablestore, click Console to go to the Tablestore console.

Create a Tablestore instance

You can also select an existing instance as the vector database. In this case, you must prepare information such as the instance name, the virtual private cloud (VPC) endpoint, and an AccessKey pair with access permissions on the instance.

  1. Log on to the Tablestore console.

  2. In the top navigation bar, select a resource group and a region and click Create Instance.

  3. In the Billing Method dialog box, specify Instance Name and Instance Type, and then click OK.

Obtain connection information

  1. Click the instance name or Manage Instance to go to the Instance Details tab, which shows the instance name and the instance access URLs.

    Note

    Use the VPC endpoint as the instance access URL.

  2. Create an AccessKey pair for your Alibaba Cloud account or RAM user that has the access permissions on Tablestore.

Step 2: Use EAS to deploy the RAG-based LLM chatbot

  1. Activate PAI and create a default workspace.

    Important

    You must activate PAI in the same region as the Tablestore instance.

  2. In the left-side navigation pane of the PAI console, choose Model Deployment > Elastic Algorithm Service (EAS).

  3. On the Elastic Algorithm Service (EAS) page, click Deploy Service. On the page that appears, click JSON Deployment.

  4. On the JSON Deployment page, enter the deployment configurations and click Deploy. In the message that appears, click OK.

After you complete the preceding steps, the system starts deploying the RAG-based LLM chatbot. The deployment takes about 5 minutes. When it is complete, the service status changes to Running.

If the specified vector data table does not exist, the system automatically creates the data table and a search index in the Tablestore instance during deployment.

Sample configuration file and parameter description

Sample configuration file

{
  "SupportedInstanceTypes": [
      "ecs.gn7i-c16g1.4xlarge",
      "ecs.gn7i-c32g1.16xlarge",
      "ecs.gn7i-c32g1.32xlarge",
      "ecs.gn7i-c32g1.8xlarge",
      "ecs.gn7i-c8g1.2xlarge",
      "ecs.gn7i-c8g1.2xlarge.limit",
      "ecs.gn8is-2x.8xlarge",
      "ecs.gn8is-4x.16xlarge",
      "ecs.gn8is-8x.32xlarge",
      "ecs.gn8is.2xlarge",
      "ecs.gn8is.4xlarge",
      "ecs.gn8v.6xlarge",
      "ecs.gn8v-2x.12xlarge",
      "ecs.gn8v-4x.24xlarge",
      "ecs.gn8v-8x.48xlarge",
      "ml.gu7i.c128m752.4-gu30",
      "ml.gu7i.c16m60.1-gu30",
      "ml.gu7i.c32m188.1-gu30",
      "ml.gu7i.c64m376.2-gu30",
      "ml.gu7i.c8m30.1-gu30",
      "ml.gu8is.c128m1024.8-gu60",
      "ml.gu8is.c16m128.1-gu60",
      "ml.gu8is.c32m256.2-gu60",
      "ml.gu8is.c64m512.4-gu60",
      "ml.gu8v.c192m1024.8-gu120",
      "ml.gu8v.c24m128.1-gu120",
      "ml.gu8v.c48m256.2-gu120",
      "ml.gu8v.c96m512.4-gu120"
  ],
  "cloud": {
      "computing": {
          "instances": [
              {
                  "type": "ecs.gn7i-c16g1.4xlarge"
              }
          ]
      },
      "networking": {
          "security_group_id": "sg-bp****************dj",
          "vpc_id": "vpc-bp*****************po",
          "vswitch_id": "vsw-bp*****************eu"
      }
  },
  "containers": [
      {
          "image": "eas-registry-vpc.cn-hangzhou.cr.aliyuncs.com/pai-eas/pai-rag:0.2.0-nginx",
          "port": 8680,
          "script": "/docker-entrypoint.sh nginx"
      },
      {
          "env": [
              {
                  "name": "PAIRAG_RAG__SETTING__interactive",
                  "value": "false"
              }
          ],
          "image": "eas-registry-vpc.cn-hangzhou.cr.aliyuncs.com/pai-eas/pai-rag:0.2.0-ui",
          "port": 8002,
          "script": "pai_rag ui"
      },
      {
          "env": [
              {
                  "name": "PAIRAG_RAG__INDEX__VECTOR_STORE__type",
                  "value": "tablestore"
              },
              {
                  "name": "PAIRAG_RAG__INDEX__VECTOR_STORE__endpoint",
                  "value": "https://d0********9c.cn-hangzhou.vpc.tablestore.aliyuncs.com"
              },
              {
                  "name": "PAIRAG_RAG__INDEX__VECTOR_STORE__instance_name",
                  "value": "d0********9c"
              },
              {
                  "name": "PAIRAG_RAG__INDEX__VECTOR_STORE__access_key_id",
                  "value": "LT********************u7"
              },
              {
                  "name": "PAIRAG_RAG__INDEX__VECTOR_STORE__access_key_secret",
                  "value": "nI**************************GF"
              },
              {
                  "name": "PAIRAG_RAG__INDEX__VECTOR_STORE__table_name",
                  "value": "pai_rag"
              },
              {
                  "name": "PAIRAG_RAG__DATA_READER__enable_image_ocr",
                  "value": "false"
              },
              {
                  "name": "PAIRAG_RAG__LLM__source",
                  "value": "PaiEas"
              },
              {
                  "name": "PAIRAG_RAG__LLM__endpoint",
                  "value": "http://127.0.0.1:8000"
              },
              {
                  "name": "PAIRAG_RAG__LLM__token",
                  "value": "abc"
              },
              {
                  "name": "PAIRAG_RAG__EMBEDDING__source",
                  "value": "HuggingFace"
              },
              {
                  "name": "PAIRAG_RAG__EMBEDDING__model_name",
                  "value": "bge-small-zh-v1.5"
              }
          ],
          "image": "eas-registry-vpc.cn-hangzhou.cr.aliyuncs.com/pai-eas/pai-rag:0.2.0",
          "port": 8001,
          "script": "pai_rag serve"
      },
      {
          "image": "eas-registry-vpc.cn-hangzhou.cr.aliyuncs.com/pai-eas/sglang:v0.4.1.post6-cu124_netcat_accelerated",
          "port": 8000,
          "script": "gpu_count=$(nvidia-smi --query-gpu=count --format=csv,noheader | wc -l); python3 -m sglang.launch_server --model-path /model_dir --host 0.0.0.0 --port 8000 --tp $gpu_count --trust-remote-code --enable-metrics --served-model-name DeepSeek-R1-Distill-Qwen-1.5B"
      }
  ],
  "labels": {
      "PAI_RAG_VERSION": "0.1_custom",
      "system_eas_deployment_type": "rag",
      "system_eas_rag_open_source_model_acc_type": "SGLang-Accelerate:Single-Node-Standard",
      "system_eas_rag_open_source_model_name": "DeepSeek-R1-Distill-Qwen-1.5B"
  },
  "metadata": {
      "cpu": 16,
      "enable_webservice": true,
      "gpu": 1,
      "instance": 1,
      "memory": 60000,
      "name": "rag_tablestore",
      "rpc": {
          "keepalive": 900000000
      },
      "shm_size": 100
  },
  "storage": [
      {
          "mount_path": "/model_dir/",
          "oss": {
              "endpoint": "cn-hangzhou-internal.oss-data-acc.aliyuncs.com",
              "path": "oss://pai-quickstart-cn-hangzhou/modelscope/models/DeepSeek-R1-Distill-Qwen-1.5B/"
          },
          "properties": {
              "resource_type": "model",
              "resource_use": "base"
          }
      }
  ]
}

Parameter description

  • Vector database settings

    You must use environment variables to add the vector database settings. The following environment variables are required.

    PAIRAG_RAG__INDEX__VECTOR_STORE__type: The vector database type. Set the value to tablestore.

    PAIRAG_RAG__INDEX__VECTOR_STORE__endpoint: The VPC endpoint of the Tablestore instance.

    PAIRAG_RAG__INDEX__VECTOR_STORE__instance_name: The name of the Tablestore instance.

    PAIRAG_RAG__INDEX__VECTOR_STORE__access_key_id: The AccessKey ID of your Alibaba Cloud account or RAM user.

    PAIRAG_RAG__INDEX__VECTOR_STORE__access_key_secret: The AccessKey secret of your Alibaba Cloud account or RAM user.

    PAIRAG_RAG__INDEX__VECTOR_STORE__table_name: The name of the data table that stores the vectors.

  • VPC settings

    You must use the cloud.networking section to add the VPC settings. If a specific VPC is not required, delete these settings from the sample configuration file.

    vpc_id: The ID of the VPC.

    vswitch_id: The ID of the vSwitch.

    security_group_id: The ID of the security group.

For more information about other parameters of EAS deployment, see Parameters for JSON deployment.
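The PAIRAG_RAG__ variable names appear to use double underscores to encode nested configuration keys: for example, PAIRAG_RAG__INDEX__VECTOR_STORE__type maps to index.vector_store.type. The following sketch illustrates that apparent convention with a hypothetical parser; the pai-rag image performs its own parsing internally, so this is for understanding the naming scheme only.

```python
def parse_pairag_env(environ: dict) -> dict:
    # Hypothetical illustration of the naming convention: expand
    # PAIRAG_RAG__A__B-style variables into a nested dictionary.
    # The actual pai-rag service parses its environment internally.
    prefix = "PAIRAG_RAG__"
    config: dict = {}
    for name, value in environ.items():
        if not name.startswith(prefix):
            continue
        parts = name[len(prefix):].split("__")
        node = config
        for part in parts[:-1]:
            # Intermediate keys are uppercase in the variable name.
            node = node.setdefault(part.lower(), {})
        node[parts[-1]] = value  # Leaf keys keep their original case.
    return config

env = {
    "PAIRAG_RAG__INDEX__VECTOR_STORE__type": "tablestore",
    "PAIRAG_RAG__LLM__source": "PaiEas",
}
print(parse_pairag_env(env))
```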

Step 3: Use the RAG-based LLM chatbot

  1. In the service list, find the deployed service and click View Web App in the Service Type column. In the dialog box that appears, click Web App.

  2. On the Upload tab of PAI-RAG Dashboard, upload a knowledge base file.


    After the knowledge base file is parsed and uploaded, you can view the vector data written to the data table in the Tablestore console.

  3. On the Chat tab of PAI-RAG Dashboard, enter a question and click Submit to start a conversation.

For more information about web UI debugging, such as changing the vector database, LLM, and file types supported by the knowledge base, see RAG-based LLM chatbot.