All Products
Search
Document Center

Platform For AI:Deploy a RAG chatbot on PAI-EAS

Last Updated:May 27, 2026

Retrieval-Augmented Generation (RAG) enhances LLM answers by retrieving relevant context from an external knowledge base. PAI-EAS provides scenario-based deployment to build a RAG chatbot with flexible LLM and vector database options.

Applicability

Applies to RAG v0.4.x. For v0.3.x, use PAI-RAG (v0.3.x).

Step 1: Deploy the RAG service

  1. Log on to the PAI console. Select a region on the top of the page. Then, select the desired workspace and click Elastic Algorithm Service (EAS).

  2. On the Inference Service tab, click Deploy Service. In the Scenario-based Model Deployment section, click RAG-based Smart Dialogue Deployment.

  3. On the RAG-based LLM Chatbot Deployment page, configure the following key parameters:

    • Version: Select LLM Decoupled Deployment to deploy only the RAG service.

      Note

      LLM Integrated Deployment runs the RAG service and LLM in the same EAS instance. Use this only for smaller models — large models require significant resources.

    • RAG Version: pai-rag:0.4.3.

    • Resource Information:

      • Resource Type: Select Public Resources.

      • Deployment: The RAG service is lightweight. Use at least 8 vCPUs and 16 GB memory, such as ecs.c7.2xlarge or ecs.c7.4xlarge.

    • Vector database settings:

      • Vector Database Type: Select FAISS to build a local vector database and get started quickly. For production, use a production-grade vector database. Use an Alibaba Cloud vector database.

      • OSS Path: Select an existing OSS storage directory in the current region to store uploaded knowledge base files. If no path exists, follow Quick start in the console to create one.

    • Virtual private cloud (VPC): To access Alibaba Cloud Model Studio over the public internet, configure a VPC, public NAT gateway, and SNAT entry. Allow an EAS service to access the public internet.

  4. After you configure the parameters, click Deploy. Deployment typically takes about 5 minutes. When the Service Status changes to Running, the deployment is complete.

Step 2: Use knowledge base Q&A

On the Inference Service tab, find your deployed RAG service and open its details page. In the upper-right corner, click Web applications to open the web UI.

image

2.1 Configure an LLM

Click Settings > Model in the lower-left corner. This example uses qwen3-8b from Alibaba Cloud Model Studio. Configure models.

Note
  • Model ID: An identifier for selecting models during a chat. For this example, enter Qwen3-8B_bailian.

  • Endpoint URL: The model service address. The service address for Alibaba Cloud Model Studio in the China (Beijing) region is https://dashscope.aliyuncs.com/compatible-mode/v1.

    Important

    The URL must end with /v1 or /v2. If you are using an EAS service, append /v1 to the service invocation address.

  • API key: Obtain an API key.

  • Model name: Enter qwen3-8b.

image

2.2 Add a knowledge base

A default embedding model is pre-configured. You can directly create a knowledge base and upload documents.

  1. Create a knowledge base. In the left-side navigation pane, click Knowledge Base, and then click Create Knowledge Base.

    imageFor example, to create a knowledge base about iPhone 16 technical specifications, set the knowledge base name to iPhone16 and keep the default values for other parameters.image

  2. Upload a file. On the File Management tab, click Upload File. After the file is uploaded, click Start Parsing. Example file: iPhone 16 and iPhone 16 Plus - Technical Specifications - Apple (Chinese mainland).pdf.image

  3. View the knowledge base file. After the file uploads, click its name to view the document chunks.

  4. Test retrieval. Switch to the Retrieval Test tab and enter a query, such as iPhone16, to test the retrieval.

    image

2.3 Knowledge base Q&A

  1. In the left-side navigation pane, click New Chat. At the top of the chat page, select a model. At the bottom, click Knowledge Base, select the knowledge base to use (for example, iPhone16), click Activate, and then Save.

    Note

    Test the model configuration in a chat before activating a knowledge base.

    image

  2. Enter your question in the chat box.image

Step 3: Explore advanced Q&A modes

Multimodal Q&A (image and text chat)

To use multimodal Q&A, you must configure OSS environment variables for your RAG service and use a multimodal model.

  1. Configure OSS storage environment variables for the RAG service. Scenario-based deployment does not support environment variables directly. To add them, click Convert to Custom Deployment for a new service or Update for an existing one. In the Environment Information section, add the following environment variables:

    • FILE_STORE_TYPE: Set to oss.

    • OSS_BUCKET: Enter your OSS bucket name.

      Note

      Setting FILE_STORE_TYPE to oss creates a pairag_knowledgebases directory in your OSS_BUCKET for knowledge base files and chat attachments. If unset, files go to the mounted OSS directory.

    • OSS_ENDPOINT: The OSS access endpoint (OSS regions and endpoints). Example: oss-cn-hangzhou.aliyuncs.com.

    • OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET: The AccessKey ID and AccessKey secret of an account with the AliyunOSSFullAccess permission.

  2. Configure a multimodal LLM (such as the Qwen-VL series). The following example uses qwen-vl-plus. Enable the multimodal model switch.image

  3. The following image shows a chat example.image

Agentic Q&A (MCP tool calling)

This mode uses the model's reasoning and tool-calling capabilities, such as search and maps, to answer complex questions.

To use this mode:

  1. Configure a model that supports Deep Thinking. In the model configuration, enable the Deep Thinking option.image

  2. Configure search.

    image.png

  3. Configure Amap MCP. In the lower-left corner, click Settings > MCP and configure the following parameters.

    • MCP name: amaps

    • MCP link: https://mcp-server-amap-jitptfyoyw.cn-hangzhou.fcapp.run/sse

    • MCP type: SSE

    • Test the chat. In the left-side navigation pane, click New Chat. At the top of the page, select the Qwen3-8B model. At the bottom of the page, select Deep Thinking, Search, and MCP (activate amaps).

      image

Step 4: Evaluate RAG performance

The built-in evaluation module analyzes Q&A performance across different configurations.

  1. Create a dataset. In the left-side navigation pane, click Evaluation. On the Evaluation page, click Create Dataset.

    image.png

  2. Import samples. Click the dataset you created to open the evaluation task page. On the Samples tab, click Import Data.

    image

  3. Create a run configuration. On the Run Settings tab, click Create Configuration and configure the settings as needed.

    image.png

  4. Create an evaluation configuration. On the Evaluator Settings tab, click Create Configuration and select the configuration and evaluator type as needed.

    image.png

  5. Run an evaluation experiment. On the Samples tab, select the samples to evaluate and click Run Experiment. Enter a name for the experiment and select a Run Configuration and Evaluation Configuration as needed.

    image.png

  6. View the evaluation results. After the experiment is created, you are automatically redirected to the experiment details page. You can also go to the Run History tab and select the target experiment to view its details.

    image.png

Production applications

Use Alibaba Cloud vector database

PAI-RAG supports vector databases built with Elasticsearch, Hologres, OpenSearch, or RDS for PostgreSQL.

Note
  • Hologres, Elasticsearch, and RDS for PostgreSQL support access over an internal network or the public internet. We recommend using internal network access.

  • OpenSearch supports access only over the public internet.

Elasticsearch

Prepare Elasticsearch instance

If you do not have an Elasticsearch instance, sign in to the Alibaba Cloud Elasticsearch console and create one with the following settings. Create an Alibaba Cloud Elasticsearch cluster.

  • Region and availability zone: Select the same region as your EAS service.

  • VPC: Select the same VPC as your EAS service to allow access over the internal network.

  • Instance type: Select Standard.

  • Scenario initialization configuration: Select General-purpose.

Service configuration
Important

You must enable automatic index creation for your Elasticsearch instance. On the instance's Configuration Management > Cluster Configuration page, click Modify Configurations and set Auto Index Creation to Allowed. Configure YML parameters.

  • Vector Database Type: Select Elasticsearch.

  • Private Endpoint and Port: Go to the Elasticsearch instance details page. In the basic information section, find the private endpoint and port. Use the format http://<private_endpoint>:<port>.

  • Index Name: The system performs different actions based on your input.

    • Enter a new name: EAS automatically creates an index that is compatible with PAI-RAG during deployment.

      Important

      By default, Alibaba Cloud Elasticsearch does not allow automatic index creation. On the instance's Configuration Management > Cluster Configuration page, click Modify Configurations, update the YML file, and set Auto Index Creation to Allowed. Configure YML parameters.

    • Enter an existing name: EAS uses the existing index. Ensure that the index was created by a PAI-RAG service to guarantee structural compatibility.

  • Account and Password: The username and password that you configured when you created the Elasticsearch instance. The default username is elastic. If you forgot the password, see Reset the instance password.

  • OSS Path: Select an existing OSS storage directory in the current region. Knowledge base management relies on this mounted OSS path.

Manage indexes with Kibana

Manage indexes through Kibana. Connect to an Elasticsearch cluster by using a Kibana client.

Hologres

Make sure that you have purchased a Hologres instance.

  • Vector Database Type: Select Hologres.

  • Invocation Information: Go to the instance details page in the Hologres console. In the Network Information section, find the Specified VPC endpoint. Use the part of the endpoint before :80 as the host value.

  • Database Name: The database name of the Hologres instance. If you do not have one, see Create a database.

  • Account: A custom user account. To create one, see Create a custom user. For Select Member Role, select SuperUser.

  • Password: The password for the custom user account.

  • Table Name: The system performs different actions based on your input.

    • Enter a new name: EAS automatically creates a table that is compatible with PAI-RAG during deployment.

    • Enter an existing name: EAS uses the existing table. Ensure that the table was created by a PAI-RAG service to guarantee structural compatibility.

  • OSS Path: Select an existing OSS storage directory in the current region. Knowledge base management relies on this mounted OSS path.

OpenSearch

Prepare OpenSearch Vector Search Edition instance

If you do not have an OpenSearch instance, sign in to the OpenSearch console and create one with the following settings. Purchase an OpenSearch Vector Search Edition instance.

  • Product version: Select Vector Search Edition.

  • Region and availability zone and VPC: OpenSearch supports access only over the public internet, so these settings do not need to match your EAS service.

Service configuration
  • Vector Database Type: Select OpenSearch.

  • Endpoint: The public endpoint of your OpenSearch Vector Search Edition instance.

    Note

    You must enable public access for the OpenSearch Vector Search Edition instance and add the EAS public IP address to the allowlist.

  • Instance ID: Obtain the instance ID from the OpenSearch Vector Search Edition instance list.

  • Username and Password: The username and password that you entered when you created the OpenSearch Vector Search Edition instance.

  • Table Name: You must first create a compatible index table. See Configure an instance for creation steps, using the following key parameters:

    • For the scenario template, select the general-purpose template and use the following JSON to configure the fields.

      Field configuration file

      {
      	"schema": {
      		"summarys": {
      			"parameter": {
      				"file_compressor": "zstd"
      			},
      			"summary_fields": [
      				"id",
      				"embedding",
      				"file_path",
      				"file_name",
      				"file_type",
      				"node_content",
      				"node_type",
      				"doc_id",
      				"text",
      				"source_type"
      			]
      		},
      		"file_compress": [
      			{
      				"name": "file_compressor",
      				"type": "zstd"
      			},
      			{
      				"name": "no_compressor",
      				"type": ""
      			}
      		],
      		"indexs": [
      			{
      				"index_fields": [
      					{
      						"boost": 1,
      						"field_name": "id"
      					},
      					{
      						"boost": 1,
      						"field_name": "embedding"
      					}
      				],
      				"indexer": "aitheta2_indexer",
      				"index_name": "embedding",
      				"parameters": {
      					"enable_rt_build": "true",
      					"min_scan_doc_cnt": "20000",
      					"vector_index_type": "Qc",
      					"major_order": "col",
      					"builder_name": "QcBuilder",
      					"distance_type": "SquaredEuclidean",
      					"embedding_delimiter": ",",
      					"enable_recall_report": "true",
      					"ignore_invalid_doc": "true",
      					"is_embedding_saved": "false",
      					"linear_build_threshold": "5000",
      					"dimension": "1536",
      					"rt_index_params": "{\"proxima.oswg.streamer.segment_size\":2048}",
      					"search_index_params": "{\"proxima.qc.searcher.scan_ratio\":0.01}",
      					"searcher_name": "QcSearcher",
      					"build_index_params": "{\"proxima.qc.builder.quantizer_class\":\"Int8QuantizerConverter\",\"proxima.qc.builder.quantize_by_centroid\":true,\"proxima.qc.builder.optimizer_class\":\"BruteForceBuilder\",\"proxima.qc.builder.thread_count\":10,\"proxima.qc.builder.optimizer_params\":{\"proxima.linear.builder.column_major_order\":true},\"proxima.qc.builder.store_original_features\":false,\"proxima.qc.builder.train_sample_count\":3000000,\"proxima.qc.builder.train_sample_ratio\":0.5}"
      				},
      				"index_type": "CUSTOMIZED"
      			},
      			{
      				"has_primary_key_attribute": true,
      				"index_fields": "id",
      				"is_primary_key_sorted": false,
      				"index_name": "id",
      				"index_type": "PRIMARYKEY64"
      			},
      			{
      				"index_fields": "file_path",
      				"index_name": "file_path",
      				"index_type": "STRING"
      			},
      			{
      				"index_fields": "file_name",
      				"index_name": "file_name",
      				"index_type": "STRING"
      			},
      			{
      				"index_fields": "file_type",
      				"index_name": "file_type",
      				"index_type": "STRING"
      			},
      			{
      				"index_fields": "node_content",
      				"index_name": "node_content",
      				"index_type": "STRING"
      			},
      			{
      				"index_fields": "node_type",
      				"index_name": "node_type",
      				"index_type": "STRING"
      			},
      			{
      				"index_fields": "doc_id",
      				"index_name": "doc_id",
      				"index_type": "STRING"
      			},
      			{
      				"index_fields": "text",
      				"index_name": "text",
      				"index_type": "STRING"
      			},
      			{
      				"index_fields": "source_type",
      				"index_name": "source_type",
      				"index_type": "STRING"
      			}
      		],
      		"attributes": [
      			{
      				"file_compress": "no_compressor",
      				"field_name": "id"
      			},
      			{
      				"file_compress": "no_compressor",
      				"field_name": "embedding"
      			},
      			{
      				"file_compress": "no_compressor",
      				"field_name": "file_path"
      			},
      			{
      				"file_compress": "no_compressor",
      				"field_name": "file_name"
      			},
      			{
      				"file_compress": "no_compressor",
      				"field_name": "file_type"
      			},
      			{
      				"file_compress": "no_compressor",
      				"field_name": "node_content"
      			},
      			{
      				"file_compress": "no_compressor",
      				"field_name": "node_type"
      			},
      			{
      				"file_compress": "no_compressor",
      				"field_name": "doc_id"
      			},
      			{
      				"file_compress": "no_compressor",
      				"field_name": "text"
      			},
      			{
      				"file_compress": "no_compressor",
      				"field_name": "source_type"
      			}
      		],
      		"fields": [
      			{
      				"compress_type": "uniq",
      				"field_type": "STRING",
      				"field_name": "id"
      			},
      			{
      				"user_defined_param": {
      					"multi_value_sep": ","
      				},
      				"multi_value": true,
      				"compress_type": "uniq",
      				"field_type": "FLOAT",
      				"field_name": "embedding"
      			},
      			{
      				"compress_type": "uniq",
      				"field_type": "STRING",
      				"field_name": "file_path"
      			},
      			{
      				"compress_type": "uniq",
      				"field_type": "STRING",
      				"field_name": "file_name"
      			},
      			{
      				"compress_type": "uniq",
      				"field_type": "STRING",
      				"field_name": "file_type"
      			},
      			{
      				"compress_type": "uniq",
      				"field_type": "STRING",
      				"field_name": "node_content"
      			},
      			{
      				"compress_type": "uniq",
      				"field_type": "STRING",
      				"field_name": "node_type"
      			},
      			{
      				"compress_type": "uniq",
      				"field_type": "STRING",
      				"field_name": "doc_id"
      			},
      			{
      				"compress_type": "uniq",
      				"field_type": "STRING",
      				"field_name": "text"
      			},
      			{
      				"compress_type": "uniq",
      				"field_type": "STRING",
      				"field_name": "source_type"
      			}
      		],
      		"table_name": "abc"
      	},
      	"extend": {
      		"description": [],
      		"vector": [
      			"embedding"
      		],
      		"embeding": []
      	}
      }
    • In the Index schema, ensure the vector dimension matches the embedding model's dimension. For Distance type, we recommend selecting InnerProduct.

Manage index tables and data
  1. Sign in to the Alibaba Cloud OpenSearch Vector Search Edition console, click the ID of your instance, and go to the Instance Details page.

  2. Go to the table management page to manage the index table. Table management.image

  3. Go to the vector management page to run query tests or manage data. Vector management.

RDS for PostgreSQL

Prepare RDS for PostgreSQL instance
  1. If you do not have an RDS for PostgreSQL instance, go to the RDS instance creation page. Configure the following key parameters and follow the on-screen instructions to complete the purchase. Create an ApsaraDB RDS for PostgreSQL instance.

    • Engine: Select PostgreSQL.

    • VPC: Select the same VPC as your EAS service to allow access over the internal network.

    • Privileged account: In the Advanced Settings section, configure a privileged account. Select Set Now and configure the database account and password.

  2. Create a database.

    1. Click the name of the instance you created. In the left-side navigation pane, click Database Management, and then click Create Database.

    2. In the Create Database panel, configure the Database (DB) Name. For Authorized Account, select the privileged account that you created. For information about other parameters, see Create a database and an account.

    3. After you configure the parameters, click Create.

Service configuration

Ensure you have an RDS for PostgreSQL instance.

  • Vector Database Type: Select RDS for PostgreSQL.

  • Host address: The internal endpoint of your RDS for PostgreSQL instance. You can find this on the Database Connection page for your instance in the ApsaraDB RDS for PostgreSQL console.

  • Port: The default is 5432. Enter the actual port if it is different.

  • Database: The Authorized Account for the database must be a Privileged Account. For instructions, see Create a database and an account. You must also install the vector and jieba extensions for the database.

  • Table Name: A custom name for the database table.

  • Account and Password: The authorized username and password that you configured when you created the database. To learn how to create a privileged account, see Create a database and an account. For Account type, select Privileged Account.

  • OSS Path: Select an existing OSS storage directory in the current region. Knowledge base management relies on this mounted OSS path.

Manage RDS for PostgreSQL database
  1. Go to the RDS instance list, switch to the region of your instance, and then click the instance name.

  2. In the left navigation bar, select Database Management, and then click SQL Query in the Actions column of the target database.

  3. Enter the Database account and Database password, which are the privileged account credentials you set when creating the instance, and then click Sign in.

  4. After you are logged in, you can query the list of imported knowledge bases in the database instance.image