This topic describes the basic methods for using vector databases.
To use the pgvector extension, make sure that the revision version of your cluster meets the requirements. You can check the revision version of your cluster in the PolarDB console or by executing the SHOW polardb_version;
statement. If the revision version does not meet the requirements, upgrade it.
Enable the pgvector extension and create a vector table
Use the privileged account to enable the pgvector extension in the database where you want to use it.
The functionality and capabilities of the pgvector extension are limited to the database in which it is installed. If you want to use the pgvector extension in multiple databases in the same PostgreSQL cluster, separately install and configure the pgvector extension for each database.
CREATE EXTENSION IF NOT EXISTS vector;
After you enable the pgvector extension, you can perform the following steps to create a vector table.
Create a table with a column to hold 3-dimensional vectors.
CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3));
Insert vectors into the table.
INSERT INTO items (embedding) VALUES ('[1,2,3]'), ('[4,5,6]');
Find the nearest neighbors based on the Euclidean distance (L2 distance).
SELECT * FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 5;
The inner product (
<#>
), cosine distance (<=>
), and L1 distance (<+>
) metrics are also supported.Note<#>
returns the negative inner product because PostgreSQL only supports ascending index scans on operators.
Store
Create a table with a column to hold 3-dimensional vectors.
CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3));
NoteYou can also add a vector column to an existing table by executing the
ALTER TABLE items ADD COLUMN embedding vector(3);
statement.Insert vectors.
Insert vectors into the table.
INSERT INTO items (embedding) VALUES ('[1,2,3]'), ('[4,5,6]');
Use COPY to load vectors in bulk.
COPY items (embedding) FROM STDIN WITH (FORMAT BINARY);
Insert vectors while handling potential conflicts, such as primary key conflicts.
INSERT INTO items (id, embedding) VALUES (1, '[1,2,3]'), (2, '[4,5,6]') ON CONFLICT (id) DO UPDATE SET embedding = EXCLUDED.embedding;
In this case, the
embedding
column of the existing row is updated with the new value if a primary key conflict occurs between the inserted and existing data. This ensures data consistency and uniqueness.
Update vectors.
UPDATE items SET embedding = '[1,2,3]' WHERE id = 1;
Delete vectors.
DELETE FROM items WHERE id = 1;
Query
Distance
Find the nearest neighbors to a vector.
SELECT * FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 5;
The supported distance metrics include:
<->
: Euclidean distance (L2).<#>
: Inner product.<=>
: Cosine distance.<+>
: Manhattan distance (L1).<~>
: Hamming distance (binary vectors).<%>
: Jaccard distance (binary vectors).
Examples
Combine with ORDER BY
and LIMIT
to use an index.
Find the top five nearest neighbors to a row.
SELECT * FROM items WHERE id != 1 ORDER BY embedding <-> (SELECT embedding FROM items WHERE id = 1) LIMIT 5;
Find rows within a specific distance.
SELECT * FROM items WHERE embedding <-> '[3,1,2]' < 5;
Get the distance.
SELECT embedding <-> '[3,1,2]' AS distance FROM items;
Get the inner product.
Note<#>
returns the negative of the inner product. The result is multiplied by-1
to obtain the actual inner product.SELECT (embedding <#> '[3,1,2]') * -1 AS inner_product FROM items;
Get the cosine similarity, which is derived by subtracting the cosine distance from 1.
SELECT 1 - (embedding <=> '[3,1,2]') AS cosine_similarity FROM items;
Aggregate
Average vectors.
SELECT AVG(embedding) FROM items;
Average groups of vectors.
SELECT id, AVG(embedding) FROM items GROUP BY id;