PolarDB: Vectors and index precisions

Last Updated: Apr 25, 2025

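This topic describes the half-precision, binary, and sparse vector types, and how to build indexes at reduced precision by using half-precision indexing and binary quantization.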

Half-precision vectors

Half-precision vectors use 50% less storage space than single-precision vectors, which makes them suitable for scenarios where storage efficiency matters more than high precision. Use the halfvec type to store half-precision vectors.

CREATE TABLE items (id bigserial PRIMARY KEY, embedding halfvec(3));
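The storage saving and the precision cost can be illustrated outside the database. The following Python sketch assumes that each halfvec element is a 16-bit IEEE 754 float and each single-precision element is a 32-bit float. It counts only the element payload and ignores any per-value header the database adds. The helper names are hypothetical.

```python
import struct

def packed_size(values, fmt_char):
    """Return the number of bytes needed to pack the values with a struct format character."""
    return len(struct.pack(f"<{len(values)}{fmt_char}", *values))

def round_trip_half(x):
    """Round a Python float to 16-bit half precision and convert it back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

values = [0.1, 0.2, 0.3]
single_bytes = packed_size(values, "f")  # "f" is a 32-bit float: 4 bytes per element
half_bytes = packed_size(values, "e")    # "e" is a 16-bit float: 2 bytes per element

print(single_bytes, half_bytes)
print(round_trip_half(0.1))  # close to 0.1, but not exactly 0.1
```

Values such as 1.0 survive the round trip unchanged, while values such as 0.1 are rounded, which is the precision trade-off described above.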

Binary vectors

Use the bit type to store binary vectors.

CREATE TABLE items_bit (id bigserial PRIMARY KEY, embedding bit(3));
INSERT INTO items_bit (embedding) VALUES ('000'), ('111');

Find the nearest neighbors based on the Hamming distance.

SELECT * FROM items_bit ORDER BY embedding <~> '101' LIMIT 5;

You can also find the nearest neighbors based on the Jaccard distance by using the <%> operator.
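To make the two metrics concrete, the following Python sketch computes them on bit strings such as '000' and '111'. It is an illustration of the definitions, not the database implementation. Returning 1.0 when the two vectors share no set bits is an assumption made here to avoid division by zero.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("bit strings must have the same length")
    return sum(x != y for x, y in zip(a, b))

def jaccard_distance(a: str, b: str) -> float:
    """Return 1 - |A and B| / |A or B|, where A and B are the sets of set bits."""
    if len(a) != len(b):
        raise ValueError("bit strings must have the same length")
    both = sum(x == "1" and y == "1" for x, y in zip(a, b))
    either = sum(x == "1" or y == "1" for x, y in zip(a, b))
    if both == 0:
        return 1.0  # assumption: no shared set bits means maximum distance
    return 1.0 - both / either

rows = ["000", "111"]
query = "101"

# Order the stored bit strings by Hamming distance to the query.
by_hamming = sorted(rows, key=lambda bits: hamming_distance(bits, query))
print(by_hamming)
```

For the query '101', the row '111' differs in one position and the row '000' differs in two, so '111' is ranked first.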

Sparse vectors

Use the sparsevec type to store sparse vectors.

CREATE TABLE items_spa (id bigserial PRIMARY KEY, embedding sparsevec(5));
INSERT INTO items_spa (embedding) VALUES ('{1:1,3:2,5:3}/5'), ('{1:4,3:5,5:6}/5');

Specify a sparse vector in the {index1:value1,index2:value2}/dimensions format. Indexes start at 1, as in SQL arrays, and elements that are not listed are zero.
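The text format can be read as a list of index and value pairs followed by the total number of dimensions. The following Python sketch expands such a literal into a dense list. The parse_sparsevec function is a hypothetical helper written for this illustration, and it handles only the plain form shown above.

```python
def parse_sparsevec(text: str) -> list[float]:
    """Expand a '{index:value,...}/dimensions' literal into a dense list of floats."""
    body, _, dim_text = text.strip().rpartition("/")
    body = body.strip()
    if not (body.startswith("{") and body.endswith("}")):
        raise ValueError("expected '{...}/dimensions'")
    dimensions = int(dim_text)
    dense = [0.0] * dimensions          # elements that are not listed stay zero
    inner = body[1:-1].strip()
    if inner:
        for pair in inner.split(","):
            index_text, value_text = pair.split(":")
            index = int(index_text)
            if not 1 <= index <= dimensions:
                raise ValueError(f"index {index} is out of range")
            dense[index - 1] = float(value_text)  # indexes in the literal start at 1
    return dense

dense = parse_sparsevec("{1:1,3:2,5:3}/5")
print(dense)
```

Positions 2 and 4 are not listed in the literal, so they are zero in the dense form.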

Find the nearest neighbors based on the L2 distance.

SELECT * FROM items_spa ORDER BY embedding <-> '{1:3,3:1,5:2}/5' LIMIT 5;
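As a worked check of the ordering this query produces, the following Python sketch computes the L2 (Euclidean) distance between sparse vectors held as dictionaries that map 1-based indexes to values. The row IDs 1 and 2 are assumed to be the bigserial values assigned to the two rows inserted above.

```python
import math

def sparse_l2_distance(a: dict, b: dict) -> float:
    """Return the Euclidean distance between two sparse vectors; missing indexes count as zero."""
    indexes = set(a) | set(b)
    return math.sqrt(sum((a.get(i, 0.0) - b.get(i, 0.0)) ** 2 for i in indexes))

# The two rows inserted into items_spa, keyed by their assumed IDs.
rows = {
    1: {1: 1.0, 3: 2.0, 5: 3.0},
    2: {1: 4.0, 3: 5.0, 5: 6.0},
}
query = {1: 3.0, 3: 1.0, 5: 2.0}

# Order the row IDs by distance to the query, nearest first.
ranked_ids = sorted(rows, key=lambda row_id: sparse_l2_distance(rows[row_id], query))
print(ranked_ids)
```

The first row is at distance sqrt(6) from the query and the second is at sqrt(33), so the first row is returned first.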

Half-precision indexing

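Index vectors at half precision for smaller indexes. The cast in the index expression allows a single-precision vector column to be indexed as halfvec. For a column that is already of the halfvec(3) type, such as the embedding column of the items table above, the cast has no effect.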

CREATE INDEX ON items USING hnsw ((embedding::halfvec(3)) halfvec_l2_ops);

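Find the nearest neighbors by using the index. The ORDER BY clause must use the same cast expression as the index definition so that the index can be used.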

SELECT * FROM items ORDER BY embedding::halfvec(3) <-> '[1,2,3]' LIMIT 5;

Binary quantization

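Use an expression index for binary quantization. The binary_quantize function converts each vector to a bit string, and the index is built on that bit string.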

CREATE INDEX ON items USING hnsw ((binary_quantize(embedding)::bit(3)) bit_hamming_ops);

Find the nearest neighbors based on the Hamming distance.

SELECT * FROM items ORDER BY binary_quantize(embedding)::bit(3) <~> binary_quantize('[1,-2,3]'::halfvec) LIMIT 5;
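The following Python sketch shows the idea behind binary quantization: each element is reduced to a single bit based on its sign. Mapping positive elements to 1 and all other elements, including zero, to 0 is an assumption about the exact rule. The function here is a stand-in for illustration and is not the database function itself.

```python
def binary_quantize(vector) -> str:
    """Map each element to '1' if it is positive, otherwise to '0'."""
    return "".join("1" if x > 0 else "0" for x in vector)

query_bits = binary_quantize([1, -2, 3])
print(query_bits)
```

Under this rule, the query vector [1,-2,3] used above becomes a 3-bit string that can be compared with stored bit strings by Hamming distance.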

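Re-rank the candidates by the cosine distance (<=>) between the original vectors and the query vector to improve recall. The inner query retrieves more candidates than needed (20), and the outer query keeps the closest 5.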

SELECT * FROM (
    SELECT * FROM items ORDER BY binary_quantize(embedding)::bit(3) <~> binary_quantize('[1,-2,3]'::halfvec) LIMIT 20
) AS foo ORDER BY embedding <=> '[1,-2,3]' LIMIT 5;
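The two-stage pattern can be sketched in Python as a coarse candidate search on quantized bits followed by an exact re-ranking. The sample rows, the search function, and its parameter names are hypothetical. The quantization rule (positive elements map to 1) is an assumption, and the sketch assumes that no vector is all zeros.

```python
import math

def binary_quantize(vector) -> str:
    """Map each element to '1' if it is positive, otherwise to '0' (assumed rule)."""
    return "".join("1" if x > 0 else "0" for x in vector)

def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))

def cosine_distance(a, b) -> float:
    """Return 1 minus the cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms

def search(rows, query, candidate_limit=20, limit=5):
    """Select candidates by Hamming distance on quantized bits, then re-rank by cosine distance."""
    query_bits = binary_quantize(query)
    candidates = sorted(
        rows, key=lambda row: hamming_distance(binary_quantize(row[1]), query_bits)
    )[:candidate_limit]
    return sorted(candidates, key=lambda row: cosine_distance(row[1], query))[:limit]

# Hypothetical sample data as (id, embedding) pairs.
rows = [
    (1, [1.0, -2.0, 3.0]),
    (2, [3.0, -0.1, 0.2]),
    (3, [-1.0, 2.0, -3.0]),
    (4, [0.9, -2.1, 3.2]),
]
top_ids = [row_id for row_id, _ in search(rows, [1.0, -2.0, 3.0], candidate_limit=3, limit=2)]
print(top_ids)
```

Rows 1, 2, and 4 have the same quantized bits as the query, so the Hamming stage cannot tell them apart. The cosine re-ranking then separates row 2, which points in a noticeably different direction, from rows 1 and 4.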