The rds_ai extension lets you call large language models (LLMs) and run embedding operations directly from SQL in ApsaraDB RDS for PostgreSQL. It integrates with Alibaba Cloud Model Studio models — including Qwen and text-embedding-v3 — so you can build AI features without moving data out of your database.
With rds_ai, you can run LLM Q&A, convert text to vectors, retrieve semantically similar records, and run end-to-end retrieval-augmented generation (RAG) pipelines — all from standard PostgreSQL queries. It also supports custom models, so you can connect any HTTP-compatible model API.
Prerequisites
Before you begin, make sure you have:
An ApsaraDB RDS for PostgreSQL instance that meets the following version requirements. To upgrade, see Upgrade a minor engine version.

| Major version | Minimum minor engine version |
|---|---|
| PostgreSQL 16 | 20241230 |
| PostgreSQL 14, 15, 17 | 20250430 |
| PostgreSQL 18 | 20251130 |

A privileged account for your instance. To create one, see Create an account.
An active Alibaba Cloud Model Studio subscription with an API key. To get one, see Obtain an API key.
Network configuration
By default, an RDS for PostgreSQL instance cannot access external networks. To enable access to external models, you must configure a NAT gateway for the instance's VPC. For more information about NAT gateways, see Use the SNAT feature of an Internet NAT gateway to access the Internet.
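After the NAT gateway is in place, you can sanity-check outbound connectivity directly from SQL. This sketch assumes the pgsql-http extension (installed automatically with rds_ai) is available; receiving any HTTP status code confirms the instance can reach the public endpoint, while a timeout usually means the gateway is not configured correctly:

```sql
-- Returns an HTTP status code only if the instance can reach the endpoint;
-- a connection timeout here points to a missing or misconfigured NAT gateway
SELECT status FROM http.http_get('https://dashscope-intl.aliyuncs.com');
```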
Install the extension
Confirm your instance's major and minor engine versions before installing. See Prerequisites.
Option 1: Install from the console
Go to the RDS Instances page, select a region, and click the target instance ID.
In the left navigation pane, click Plug-ins.
On the Plugin Marketplace page, click Install next to rds_ai.
Select the target database and account, then click Install.
To uninstall, go to the Installed Plugins tab on the Manage Plugins page.
Option 2: Install with SQL
Run the following command as a privileged account:
CREATE EXTENSION IF NOT EXISTS rds_ai CASCADE;
Installing rds_ai also installs the pgvector and pgsql-http extensions.
To verify the installation:
SELECT * FROM pg_extension;
To remove the extension:
DROP EXTENSION rds_ai;
Default models
rds_ai ships with the following default models:
| API | Parameter | Default model | Options |
|---|---|---|---|
| Prompt | rds_ai.default_prompt_model | qwen-plus | qwen-plus, qwen-max, qwen-turbo |
| Embed | rds_ai.default_embed_model | text-embedding-v3 | text-embedding-v3 |
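You can confirm which defaults are in effect for the current session. This assumes rds_ai has been added to shared_preload_libraries as described in this section; otherwise SHOW may report an unrecognized parameter:

```sql
-- Show the default models in effect for the current session
SHOW rds_ai.default_prompt_model;
SHOW rds_ai.default_embed_model;
```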
To change the default prompt model, add rds_ai to the Running Value of the shared_preload_libraries parameter, then set the model. For instructions on updating parameters, see Set instance parameters. For example, set Running Value to 'pg_stat_statements,auto_explain,rds_ai', then run:
SET rds_ai.default_prompt_model TO 'qwen-max';
Configure the extension
Before calling any model, set the API key and model URLs.
Step 1: Set the API key
-- Set the API key for a specific model
SELECT rds_ai.update_model('qwen-plus', 'token', 'sk-****');
-- Set the same API key for all models
SELECT rds_ai.update_model(model_name, 'token', 'sk-****')
FROM rds_ai.model_list;
Step 2: Set the model URLs
-- Prompt models (qwen-plus, qwen-max, qwen-turbo share the same endpoint)
SELECT rds_ai.update_model('qwen-plus', 'uri', 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/text-generation/generation');
SELECT rds_ai.update_model('qwen-max', 'uri', 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/text-generation/generation');
SELECT rds_ai.update_model('qwen-turbo', 'uri', 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/text-generation/generation');
-- Embedding model
SELECT rds_ai.update_model('text-embedding-v3', 'uri', 'https://dashscope-intl.aliyuncs.com/api/v1/services/embeddings/text-embedding/text-embedding');
Step 3: (Optional) Configure request timeouts
rds_ai uses the pgsql-http extension to make remote calls. Set timeouts to interrupt long-running requests.
These settings apply to the current session only. Reset them for each new connection.
-- Request timeout (milliseconds)
SET http.timeout_msec TO 200000;
SELECT http.http_set_curlopt('CURLOPT_TIMEOUT', '200000');
-- Connection timeout (milliseconds)
SELECT http.http_set_curlopt('CURLOPT_CONNECTTIMEOUT_MS', '200000');
LLM Q&A
Use rds_ai.prompt to send a question to a Qwen text generation model and get a response. The default model is qwen-plus.
Simple call:
SELECT rds_ai.prompt('Give me a recipe using carrots, potatoes, and eggplants.');
Full call with explicit parameters:
SELECT rds_ai.prompt(
model_name => 'qwen-plus', -- model name
content => 'Give me a recipe using carrots, potatoes, and eggplants.', -- prompt text
args => '{"top_p": 0.7}'::jsonb -- model parameters
);
Text-to-vector conversion
Use rds_ai.embed to convert text to a vector using the text-embedding-v3 model. The output is a 1024-dimensional dense vector by default. To request a sparse vector, pass {"output_type": "sparse"} — the output is 250,002 dimensions.
Dense vector (default):
SELECT rds_ai.embed(
'A strong wind blows from a high sky, and monkeys cry sadly.'
);
Sparse vector:
SELECT rds_ai.embed(
content => 'A strong wind blows from a high sky, and monkeys cry sadly.',
args => '{"output_type": "sparse"}'::jsonb
);
Store embeddings in a table:
Create the table:
CREATE TABLE test_embed (
    a text,
    b vector(1024),       -- dense vector column
    c sparsevec(250002)   -- sparse vector column
);
Insert text:
INSERT INTO test_embed (a) VALUES ('hello world');
Write the embeddings:
-- Write a dense vector
UPDATE test_embed SET b = rds_ai.embed(a, '{"output_type": "dense"}'::jsonb)::vector;
-- Write a sparse vector
UPDATE test_embed SET c = rds_ai.embed(a, '{"output_type": "sparse"}'::jsonb)::sparsevec;
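If you want new rows embedded automatically, a trigger can call rds_ai.embed on insert. The trigger and function names below are hypothetical, not part of the extension, and every INSERT then makes a synchronous remote model call, which adds latency:

```sql
-- Hypothetical trigger sketch: embed the text column on every insert
CREATE FUNCTION test_embed_fill() RETURNS trigger AS $$
BEGIN
    -- remote call to the embedding model; adds latency to each INSERT
    NEW.b := rds_ai.embed(NEW.a)::vector;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER test_embed_auto
    BEFORE INSERT ON test_embed
    FOR EACH ROW EXECUTE FUNCTION test_embed_fill();
```

For bulk loads, it is usually cheaper to insert the text first and backfill embeddings with a single UPDATE, as shown above.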
Semantic vector retrieval
Use rds_ai.retrieve to find the top-N most similar records to a query. It embeds the query using text-embedding-v3 and compares against stored vectors using the distance type you specify.
Choose a distance type
| Distance type | When to use |
|---|---|
| cosine | Comparing text meaning regardless of length; the most common choice for semantic search |
| L2 | When the magnitude of vectors matters, not just their direction |
| L1 | For sparse or high-dimensional vectors where L2 is computationally expensive |
| negative | For normalized vectors, where the inner product is equivalent to cosine similarity but faster to compute |
The default distance type for rds_ai.retrieve is cosine.
Choose a vector index type
Without an index, PostgreSQL performs an exact sequential scan — accurate but slow for large datasets. Add a vector index for faster approximate nearest neighbor (ANN) search.
| Index type | Speed | Accuracy | Memory | When to use |
|---|---|---|---|---|
| None | Slow | 100% recall | Low | Small tables or when exact results are required |
| HNSW | Fast | High (tunable) | Higher | Most production workloads — best query speed |
| IVFFlat | Moderate | Moderate (tunable) | Lower | Large datasets where memory is constrained |
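Both index types expose pgvector session parameters that trade recall for speed at query time. These are standard pgvector settings, not rds_ai-specific:

```sql
-- HNSW: larger ef_search means better recall but slower queries (default 40)
SET hnsw.ef_search = 100;
-- IVFFlat: more probes means better recall but slower queries (default 1)
SET ivfflat.probes = 10;
```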
Example: Set up a table and run retrieval
Create a table and add an embedding column:
CREATE TABLE test_rag (
    id SERIAL PRIMARY KEY,
    chunk TEXT
);
ALTER TABLE test_rag ADD COLUMN embedding VECTOR(1024);
UPDATE test_rag SET embedding = rds_ai.embed(chunk)::vector;
Create a vector index:
Both index types require the vector_cosine_ops operator class.
-- HNSW index (recommended for most workloads)
CREATE INDEX ON test_rag USING hnsw (embedding vector_cosine_ops);
-- IVFFlat index (for memory-constrained environments)
CREATE INDEX ON test_rag USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
Run a similarity search:
-- Simple call (uses defaults: text-embedding-v3, top 10, cosine distance)
SELECT * FROM rds_ai.retrieve(
    'Why is PostgreSQL considered the most advanced open-source database?',
    'public', 'test_rag', 'chunk', 'embedding'
);
Full call with explicit parameters:
SELECT * FROM rds_ai.retrieve(
    embed_model => 'text-embedding-v3',
    question => 'Why is PostgreSQL considered the most advanced open-source database?',
    source_schema => 'public',    -- schema of the source table
    source_table => 'test_rag',   -- table to search
    chunk_col => 'chunk',         -- text column
    vector_col => 'embedding',    -- vector column
    topn => 10,                   -- number of results to return
    embed_args => '{}'::jsonb,    -- extra parameters for the embedding model
    distance_type => 'cosine'     -- distance algorithm: L1, L2, cosine, or negative
);
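For comparison, the same cosine-distance search can be written directly with pgvector's <=> operator. This is a sketch of what such a query looks like, not necessarily the extension's internal implementation; the query vector is computed once in a subquery to avoid one model call per row:

```sql
-- Manual cosine-distance search over the same table
SELECT chunk, embedding <=> q.v AS distance
FROM test_rag,
     (SELECT rds_ai.embed('Why is PostgreSQL considered the most advanced open-source database?')::vector AS v) AS q
ORDER BY distance
LIMIT 10;
```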
RAG Q&A
rds_ai.rag combines vector retrieval and LLM generation in a single call. It retrieves the top-N relevant chunks from your table and passes them as context to the prompt model.
The default models are text-embedding-v3 for embedding and qwen-plus for text generation.
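Conceptually, this is retrieval followed by generation. The sketch below chains rds_ai.retrieve and rds_ai.prompt by hand; it assumes the retrieved text is exposed in a column named chunk, which may differ from the function's actual output columns:

```sql
-- Hand-rolled RAG sketch: retrieve context, then ask the prompt model to use it
WITH context AS (
    SELECT string_agg(chunk, E'\n') AS ctx  -- column name assumed for illustration
    FROM rds_ai.retrieve(
        'Why is PostgreSQL considered the most advanced open-source database?',
        'public', 'test_rag', 'chunk', 'embedding'
    )
)
SELECT rds_ai.prompt(
    'Answer the question using only this context: ' || ctx ||
    E'\nQuestion: Why is PostgreSQL considered the most advanced open-source database?'
) FROM context;
```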
Simple call:
SELECT * FROM rds_ai.rag(
'Why is PostgreSQL considered the most advanced open-source database?',
'public', 'test_rag', 'chunk', 'embedding'
);
Full call with explicit parameters:
SELECT * FROM rds_ai.rag(
embed_model => 'text-embedding-v3', -- embedding model
prompt_model => 'qwen-plus', -- text generation model
question => 'Why is PostgreSQL considered the most advanced open-source database?',
source_schema => 'public', -- schema of the source table
source_table => 'test_rag', -- table to search
chunk_col => 'chunk', -- text column
vector_col => 'embedding', -- vector column
topn => 10, -- number of chunks to retrieve
embed_args => '{}'::jsonb, -- extra parameters for the embedding model
prompt_args => '{}'::jsonb, -- extra parameters for the prompt model
distance_type => 'L2' -- distance algorithm: L1, L2, cosine, or negative
);
Custom models
rds_ai stores model configurations in the rds_ai.model_list table. Add any HTTP-compatible model by inserting a row into this table.
If you encounter issues when adding a custom model, contact us.
rds_ai.model_list fields
| Field | Type | Description |
|---|---|---|
| model_name | name | Model name. Primary key. |
| request_type | text | HTTP method (e.g., POST) |
| request_header | http.http_header[] | HTTP request headers, used for authentication |
| uri | text | Model endpoint URL |
| content_type | text | Request content type (e.g., application/json) |
| content_template | text | Request body template with %s placeholders filled at call time |
| json_path | text | SQL expression to parse the HTTP response |
| token | text | Authentication token placed into the request header |
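Before adding your own model, it can help to inspect how a built-in model is configured and use that row as a template:

```sql
-- View the configuration of a built-in model as a reference
SELECT model_name, request_type, uri, content_type, content_template
FROM rds_ai.model_list
WHERE model_name = 'qwen-plus';
```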
Add a custom model
SELECT rds_ai.add_model(
'test-model', -- model name
'POST', -- HTTP method
ARRAY[('Authorization', 'Bearer %s')]::http.http_header[], -- request header
'https://****.com', -- model URL
'application/json', -- content type
'{"key":"%s"}', -- request body template
'SELECT %L' -- JSON response parser
);
Call a custom model
Use rds_ai.raw_invoke_model to get the full HTTP response, or rds_ai.invoke_model to extract just the field defined in json_path:
-- Returns the full HTTP response
SELECT * FROM rds_ai.raw_invoke_model('qwen-plus', ARRAY['who are you']);
-- Returns the parsed result from json_path
SELECT * FROM rds_ai.invoke_model('qwen-plus', ARRAY['who are you']);
Delete a custom model
SELECT rds_ai.del_model('test-model');
Add a PAI-RAG model
Deploy your RDS instance and PAI-EAS service in the same VPC to keep all traffic within your private network.
Deploy the RAG service in PAI-EAS. See Deploy an LLM-based RAG chatbot using PAI-EAS and ApsaraDB RDS for PostgreSQL.
Get the service endpoint and token:
Click the RAG service name to open the Service Details page.
In Basic Information, click View Endpoint Information.
In the Invocation Information dialog box, copy the service endpoint and token.
Add the PAI-RAG model. Replace the URL with your actual endpoint and append /service/query:
SELECT rds_ai.add_model(
    'pai-rag',
    'POST',
    ARRAY[('Authorization', '%s')]::http.http_header[],
    'http://rds-pai-rag-demo.****.cn-hangzhou.pai-eas.aliyuncs.com/service/query',
    'application/json',
    '{"question": "%s"}',
    'SELECT (%L::jsonb->''answer'')::text'
);
Set the authentication token:
SELECT rds_ai.update_model('pai-rag', 'token', 'MTFkYjMwZjgzYzA1YmE2N2YyNWMxM2NkNDVjMjEzNjYxMDAzMzE5****');Verify the model:
SELECT rds_ai.invoke_model('pai-rag', ARRAY['What parameters are related to WAL log accumulation?']);
For more information on calling PAI-RAG models, see Call PAI models using APIs.
Add a Function Compute RAG model
Deploy the AgentCraft application in Function Compute. See Cloud Deployment of AgentCraft.
Create an agent and a client access integration (as in Integrate an agent into a DingTalk robot), and save the endpoint and token.
Add the Function Compute RAG model. Replace the URL with your actual endpoint and append /completions:
SELECT rds_ai.add_model(
    'fc-rag',
    'POST',
    ARRAY[('Authorization', 'Bearer %s')]::http.http_header[],
    'https://agentcrckend-de-obnhjsknam.cn-hangzhou.fcapp.run/v1/chat/completions',
    'application/json',
    '{ "messages": [{"role": "user", "content": "%s"}], "stream": false, "max_tokens": 1024 }',
    'SELECT (%L::jsonb->''choices''->0->''message''->>''content'')::text'
);
Set the authentication token:
SELECT rds_ai.update_model('fc-rag', 'token', '8UiGAziWgYGPxM3qR5sAChBfDJRt****');Verify the model:
SELECT rds_ai.invoke_model('fc-rag', ARRAY['What parameters are related to WAL log accumulation?']);
API reference
What's next
pgvector User Guide — learn more about vector storage and search in PostgreSQL
Deploy an LLM-based RAG chatbot using PAI-EAS and ApsaraDB RDS for PostgreSQL — end-to-end RAG chatbot deployment guide