The rds_embedding extension of ApsaraDB RDS for PostgreSQL allows you to convert text in your ApsaraDB RDS for PostgreSQL instance into vectors. The extension provides custom model configuration and model invocation capabilities to facilitate the conversion and meet specific data processing requirements.
Background information
Embedding is a technique that translates high-dimensional data into a low-dimensional representation. In machine learning and natural language processing (NLP), embedding is a common method that is used to represent sparse symbols or objects as points within a vector space.
Embedding vectors depend on the model that generates them. With the rds_embedding extension, ApsaraDB RDS for PostgreSQL converts text in your RDS instance into vectors by calling an external model that you reference. You can then use a vector similarity operator to calculate the similarity between the text stored in the RDS instance and a specified text, both embedded by the same referenced model.
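As a minimal sketch of what a vector similarity operator computes, the following query applies the `<=>` operator of the vector extension (cosine distance, the same operator used in the example later in this topic) to hand-written two-dimensional vectors. The literal vectors are made up for illustration, and the query assumes that the vector extension is already enabled.

```sql
-- Assumes the vector extension is already enabled (CREATE EXTENSION vector).
-- The literal vectors are illustrative only, not model output.
SELECT '[1, 0]'::vector <=> '[1, 0]'::vector  AS same_direction,  -- cosine distance 0
       '[1, 0]'::vector <=> '[0, 1]'::vector  AS orthogonal,      -- cosine distance 1
       '[1, 0]'::vector <=> '[-1, 0]'::vector AS opposite;        -- cosine distance 2
```

A smaller cosine distance means that the two vectors point in more similar directions, which is why the example later in this topic sorts by distance in ascending order.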
Prerequisites
The RDS instance runs PostgreSQL 14 or later.
If the major engine version of the RDS instance meets the requirement but the extension is still not supported, update the minor engine version of the RDS instance. For example, if your RDS instance runs PostgreSQL 17, the minor engine version of the RDS instance must be 20241030 or later. For more information, see Update the minor engine version.
In this topic, the text embedding model provided by Alibaba Cloud Model Studio is used. You must activate Alibaba Cloud Model Studio and obtain the required API key. For more information, see Obtain an API key.
Note: In addition to the text embedding model, you can use the functions supported by the rds_embedding extension to add other models. For more information, see Functions supported by the rds_embedding extension.
The RDS instance is connected over the Internet. By default, you cannot connect to the RDS instance over the Internet. You must create a NAT gateway for the virtual private cloud (VPC) in which the RDS instance resides. This way, you can connect to the RDS instance over the Internet and the RDS instance can access external models. For more information about NAT gateways, see Use the SNAT feature of an Internet NAT gateway to access the Internet.
Enable or disable the extensions
You must use a privileged account to execute the statements in this section.
Enable the extensions.
Before you enable the rds_embedding extension, you must enable the vector extension. The vector extension supports the required vector data types and basic vector operations, such as calculations of the distance and similarity between vectors. The rds_embedding extension only translates high-dimensional text into vectors.
CREATE EXTENSION vector;
CREATE EXTENSION rds_embedding;
Disable the extensions.
DROP EXTENSION rds_embedding;
DROP EXTENSION vector;
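To confirm the result of either step, a query along the following lines can be run against the standard pg_extension catalog. It is a small sketch that uses only built-in PostgreSQL catalog columns. It returns one row per installed extension, so it returns no rows after both extensions are dropped.

```sql
-- Lists the two extensions if they are installed in the current database.
SELECT extname, extversion
FROM pg_extension
WHERE extname IN ('vector', 'rds_embedding');
```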
Examples
In this example, the text-embedding-v3 model provided by Alibaba Cloud Model Studio is used. For more information about text embedding models, see Model overview.
Create a test table named test.
CREATE TABLE test(info text, vec vector(1024) NOT NULL);
Add a model.
SELECT rds_embedding.add_model(
    'text-embedding-v3',
    'https://dashscope-intl.aliyuncs.com/api/v1/services/embeddings/text-embedding/text-embedding',
    'Authorization: Bearer sk-****',
    '{"input":{"texts":["%s"]},"model":"text-embedding-v3","parameters":{"text_type":"query"}}',
    '->''output''->''embeddings''->0->>''embedding'''
);
Note: For more information about rds_embedding.add_model(), see rds_embedding.add_model().
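The last two arguments of the add_model() call read like a request-body template with a %s placeholder and a chain of PostgreSQL JSON operators. The following sketch reproduces both steps in plain PostgreSQL. It assumes that the extension substitutes the input text for %s and applies the path to the JSON response. That behavior is inferred from the argument shapes, not confirmed here, so check the function reference for the authoritative description. The sample response is hypothetical and contains only the fields that the path touches.

```sql
-- Step 1: fill the %s placeholder in the request-body template.
-- format() is a built-in PostgreSQL function; whether the extension uses it
-- internally is an assumption made for illustration.
SELECT format(
    '{"input":{"texts":["%s"]},"model":"text-embedding-v3","parameters":{"text_type":"query"}}',
    'hello'
) AS request_body;

-- Step 2: apply the same operator chain as the path argument to a
-- hand-written, hypothetical response (not real model output).
SELECT '{"output":{"embeddings":[{"embedding":[0.1,0.2,0.3]}]}}'::jsonb
       ->'output'->'embeddings'->0->>'embedding' AS embedding_text;
```

The second query walks into the output object, then the embeddings array, takes its first element, and returns the embedding field as text. This mirrors the path string after its doubled single quotes are unescaped.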
Insert text and the required vector data.
INSERT INTO test SELECT '风急天高猿啸哀', rds_embedding.get_embedding_by_model('text-embedding-v3', 'sk-****', '风急天高猿啸哀')::real[];
INSERT INTO test SELECT '渚清沙白鸟飞回', rds_embedding.get_embedding_by_model('text-embedding-v3', 'sk-****', '渚清沙白鸟飞回')::real[];
INSERT INTO test SELECT '无边落木萧萧下', rds_embedding.get_embedding_by_model('text-embedding-v3', 'sk-****', '无边落木萧萧下')::real[];
INSERT INTO test SELECT '不尽长江滚滚来', rds_embedding.get_embedding_by_model('text-embedding-v3', 'sk-****', '不尽长江滚滚来')::real[];
Note: For more information about rds_embedding.get_embedding_by_model(), see rds_embedding.get_embedding_by_model().
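As a quick sanity check after the inserts, a query along these lines shows how many dimensions each stored vector has. It relies on vector_dims(), a function provided by the vector extension. Because the vec column is declared as vector(1024), every row should report 1024.

```sql
-- Shows each stored text together with the dimension count of its vector.
SELECT info, vector_dims(vec) AS dims
FROM test;
```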
Calculate the similarities between the text 不尽长江滚滚来 and the vectors of each piece of text in the test table.
SELECT
    info,
    vec <=> rds_embedding.get_embedding_by_model(
        'text-embedding-v3', 'sk-****', '不尽长江滚滚来'
    )::real[]::vector AS distance
FROM test
ORDER BY
    vec <=> rds_embedding.get_embedding_by_model(
        'text-embedding-v3', 'sk-****', '不尽长江滚滚来'
    )::real[]::vector;
Sample output:
      info      |      distance
----------------+---------------------
 不尽长江滚滚来 | 0
 无边落木萧萧下 | 0.42740682200152647
 风急天高猿啸哀 | 0.5247695147991147
 渚清沙白鸟飞回 | 0.5161883811726116
(4 rows)
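The query above writes the embedding call twice, once in the select list and once in ORDER BY. As a hedged alternative, the sketch below computes the query vector in a materialized common table expression so that the call appears only once, then sorts by the distance alias. It uses only the functions shown in this topic. The CTE name q, the column name v, and the LIMIT value are illustrative choices.

```sql
-- Compute the query embedding once, then reuse it for every row.
WITH q AS MATERIALIZED (
    SELECT rds_embedding.get_embedding_by_model(
               'text-embedding-v3', 'sk-****', '不尽长江滚滚来'
           )::real[]::vector AS v
)
SELECT t.info, t.vec <=> q.v AS distance
FROM test AS t
CROSS JOIN q
ORDER BY distance   -- ascending: the most similar text comes first
LIMIT 3;            -- illustrative top-k cutoff
```

AS MATERIALIZED is available in PostgreSQL 12 and later and asks the planner to evaluate the CTE on its own, which should avoid repeating the external model call for each row.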