The rds_embedding extension for RDS PostgreSQL converts text to vectors directly within your database. The extension provides custom model configuration and model calling capabilities. This simplifies and accelerates text-to-vector conversion to meet your data processing needs.
Background
Embedding is the process of mapping high-dimensional data to a low-dimensional representation. In machine learning and natural language processing (NLP), an embedding represents discrete symbols or objects as points in a continuous vector space.
The vector generated for a piece of text depends on the model that produces it. RDS PostgreSQL supports the rds_embedding extension, which calls an external model that you register and generates vector data from text in the database. You can then use a vector similarity operator to calculate the similarity between text stored in the database and text that you specify, for example to rank stored entries by how closely they match a query string.
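To make the idea of vector similarity concrete, the following sketch compares a few hand-written vectors with the `<=>` operator that the usage example in this topic relies on. It assumes that the vector extension is already enabled and that it behaves like pgvector, where `<=>` returns the cosine distance (1 minus the cosine similarity). The literal vectors are made up for illustration and do not come from any model.

```sql
-- Same direction: the cosine distance is approximately 0.
SELECT '[1,2,3]'::vector <=> '[2,4,6]'::vector AS same_direction;

-- Orthogonal vectors: the cosine distance is 1.
SELECT '[1,0]'::vector <=> '[0,1]'::vector AS orthogonal;

-- Opposite direction: the cosine distance is 2.
SELECT '[1,0]'::vector <=> '[-1,0]'::vector AS opposite;
```

A smaller distance means that the two vectors, and therefore the texts they represent, are more similar.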
Prerequisites
The RDS instance runs PostgreSQL 14 or later.
If your major engine version meets the requirement but the extension is not supported, upgrade the minor engine version. For example, for an RDS instance that runs PostgreSQL 17, the minor engine version must be 20241030 or later. For more information, see Upgrade the minor engine version.
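The major version requirement can be checked from any SQL session. The following is a minimal sketch that uses only standard PostgreSQL settings. It does not report the RDS minor engine version, which is an RDS-level property and is outside the scope of this check.

```sql
-- Full server version string.
SHOW server_version;

-- server_version_num encodes the version as an integer (140000 = 14.0),
-- so this returns true when the major version is 14 or later.
SELECT current_setting('server_version_num')::int >= 140000 AS meets_major_version;
```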
This topic uses the text embedding model from Alibaba Cloud Model Studio. Activate Alibaba Cloud Model Studio and obtain an API key. For more information, see Get your API key.
Note: In addition to the text embedding model used in this topic, you can use the functions provided by the rds_embedding extension to add other models. For more information, see Functions provided by the rds_embedding extension.
By default, an RDS PostgreSQL database cannot access the Internet. To allow access to external models, configure a NAT Gateway for the VPC where the RDS PostgreSQL instance resides. For more information about NAT Gateways, see Use the SNAT feature of an Internet NAT gateway to access the Internet.
Enable and disable the extensions
Use a privileged account to execute the following commands.
Enable the extensions.
Before you enable the rds_embedding extension, you must enable the vector extension. The vector extension provides the required vector data type support and basic vector operations, such as calculating the distance and similarity between vectors. The rds_embedding extension focuses on converting high-dimensional text data to vectors.
    CREATE EXTENSION vector;
    CREATE EXTENSION rds_embedding;
Disable the extensions.
    DROP EXTENSION rds_embedding;
    DROP EXTENSION vector;
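After you enable or disable the extensions, you may want to confirm their state. The queries below are a sketch that reads only the standard PostgreSQL catalogs, so they make no assumptions about rds_embedding beyond the extension name used in this topic.

```sql
-- Extensions that are available on this instance, with the installed version if any.
SELECT name, default_version, installed_version
FROM pg_available_extensions
WHERE name IN ('vector', 'rds_embedding');

-- Extensions that are currently installed in the current database.
SELECT extname, extversion
FROM pg_extension
WHERE extname IN ('vector', 'rds_embedding');
```

If the first query returns no row for rds_embedding, the instance version probably does not yet support the extension. See the prerequisites.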
Usage examples
This topic uses the text-embedding-v3 model provided by Alibaba Cloud Model Studio as an example. For more information about text embedding models, see Model introduction.
Create a test table named test.
    CREATE TABLE test(info text, vec vector(1024) NOT NULL);
Add a model.
    SELECT rds_embedding.add_model(
        'text-embedding-v3',
        'https://dashscope-intl.aliyuncs.com/api/v1/services/embeddings/text-embedding/text-embedding',
        'Authorization: Bearer sk-****',
        '{"input":{"texts":["%s"]},"model":"text-embedding-v3","parameters":{"text_type":"query"}}',
        '->''output''->''embeddings''->0->>''embedding'''
    );
Note: For more information about the rds_embedding.add_model() function, see rds_embedding.add_model().
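The last two arguments of the add_model() call are easier to read when tried out in isolation. The sketch below uses only built-in PostgreSQL functions and operators. It assumes that the extension substitutes the input text for the %s placeholder in the request template, and that the final argument is applied to the response body as a chain of JSON operators. The response body shown is hypothetical and shortened to a three-element embedding for readability.

```sql
-- Request template: format() fills %s in the way the template is assumed to be filled.
SELECT format(
    '{"input":{"texts":["%s"]},"model":"text-embedding-v3","parameters":{"text_type":"query"}}',
    'Endless Yangtze River rolls on'
) AS request_body;

-- Response path applied to a hypothetical, heavily shortened response body.
-- The ->> operator at the end returns the embedding array as text.
SELECT ('{"output":{"embeddings":[{"text_index":0,"embedding":[0.1,0.2,0.3]}]}}'::json)
       ->'output'->'embeddings'->0->>'embedding' AS embedding_text;
```

When you register a different model, adjust the template and the path to match the request and response format of that model.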
Insert text and its corresponding vector data.
    INSERT INTO test SELECT
        'Windy high sky, apes cry sadly',
        rds_embedding.get_embedding_by_model('text-embedding-v3', 'sk-****', 'Windy high sky, apes cry sadly')::real[];
    INSERT INTO test SELECT
        'Clear islet, white sand, birds fly back',
        rds_embedding.get_embedding_by_model('text-embedding-v3', 'sk-****', 'Clear islet, white sand, birds fly back')::real[];
    INSERT INTO test SELECT
        'Boundless falling leaves rustle down',
        rds_embedding.get_embedding_by_model('text-embedding-v3', 'sk-****', 'Boundless falling leaves rustle down')::real[];
    INSERT INTO test SELECT
        'Endless Yangtze River rolls on',
        rds_embedding.get_embedding_by_model('text-embedding-v3', 'sk-****', 'Endless Yangtze River rolls on')::real[];
Note: For more information about the rds_embedding.get_embedding_by_model() function, see rds_embedding.get_embedding_by_model().
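A quick way to confirm that the rows were stored with the expected dimensionality is to inspect the vec column. This sketch assumes that the vector extension is pgvector-compatible and therefore provides the vector_dims() function.

```sql
-- Each row should report 1024 dimensions, matching the vector(1024) column type.
SELECT info, vector_dims(vec) AS dims
FROM test;
```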
Calculate the vector similarity between the text "Endless Yangtze River rolls on" and each text entry in the test table.
    SELECT
        info,
        vec <=> rds_embedding.get_embedding_by_model(
            'text-embedding-v3', 'sk-****', 'Endless Yangtze River rolls on'
        )::real[]::vector AS distance
    FROM test
    ORDER BY vec <=> rds_embedding.get_embedding_by_model(
        'text-embedding-v3', 'sk-****', 'Endless Yangtze River rolls on'
    )::real[]::vector;
Sample result:
                      info                   |      distance
    -----------------------------------------+---------------------
     Endless Yangtze River rolls on          |                   0
     Boundless falling leaves rustle down    | 0.42740682200152647
     Windy high sky, apes cry sadly          |  0.5247695147991147
     Clear islet, white sand, birds fly back |  0.5161883811726116
    (4 rows)
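The query above writes the model call twice, once in the select list and once in ORDER BY. One possible variation, sketched below, writes the call once in a materialized common table expression and reuses the result. It uses the same functions and arguments as the example in this topic. The alias names q and qvec, the similarity column, and the LIMIT value are illustrative choices. The similarity column assumes that `<=>` is the cosine distance, as in pgvector.

```sql
WITH q AS MATERIALIZED (
    -- Compute the embedding of the query text once.
    SELECT rds_embedding.get_embedding_by_model(
               'text-embedding-v3', 'sk-****', 'Endless Yangtze River rolls on'
           )::real[]::vector AS qvec
)
SELECT t.info,
       t.vec <=> q.qvec       AS distance,
       1 - (t.vec <=> q.qvec) AS similarity
FROM test AS t
CROSS JOIN q
ORDER BY distance
LIMIT 3;  -- keep only the three closest entries
```

The MATERIALIZED keyword asks PostgreSQL to evaluate the common table expression as a separate step rather than folding it into the outer query. This keeps the model call in one place.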
