This topic describes how to perform vector processing in Hologres.

## Background information

Proxima is a high-performance software library developed by Alibaba DAMO Academy for nearest neighbor search over vectors. Proxima provides higher stability and performance than similar open source software such as Facebook AI Similarity Search (Faiss). Its basic modules can be used to search for similar images, videos, or human faces. Hologres is deeply integrated with Proxima to provide a high-performance vector search service.

## Proxima

- Terms
  - Feature vector: A vector is the algebraic representation of an entity or application. The vector abstracts the relationship between entities into the distance in the vector space, and the distance indicates the degree of similarity. Examples: height, age, gender, and region.
  - Vector search: fast search and match performed on a feature vector dataset. K-nearest neighbors (KNN) and radius nearest neighbors (RNN) searches are commonly involved.
    - KNN: searches for the K points nearest a point.
    - RNN: searches for all points within a circle whose center is a specified point and radius is specified.

- Basic model of Proxima

  The basic model of Proxima is divided into two parts: index building and online searches. An index file is built from original vector data and passed to the online search module for loading and use. After the index file is loaded, you can perform vector searches.
  - Index building: supports brute force, k-dimensional (k-d) tree, product quantization, KNN graph, and locality-sensitive hashing (LSH).
  - Online search: performs KNN and RNN searches on a clustered dataset. You set the parameters during the searches.

- Mappings between terms in Proxima and Hologres

  | Term in Proxima | Term in Hologres |
  | --- | --- |
  | Feature vector | Array |
  | Vector index | Index of a special type |
  | Distance calculation | `proxima_distance()`: one type of user-defined function (UDF). Each type of distance calculation corresponds to a UDF. |
  | KNN search | `order by distance(x, [x1, x2]) asc limit k` |
  | RNN search | `where distance(x, [x1,x2]) < r` |
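The two search types in the mapping above can be illustrated with a brute-force reference in Java. This is a minimal sketch for understanding the semantics only: the class `BruteForceSearch` and its methods are hypothetical names, it scans every row, and it does not reflect how Proxima builds or searches an index. The radius `r` is compared against the squared Euclidean distance.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

/** Brute-force reference for KNN and RNN searches (illustration only). */
final class BruteForceSearch {

    /** Squared Euclidean distance between two vectors of equal dimension. */
    static double squaredEuclidean(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException(
                    "dimension mismatch: " + a.length + " vs " + b.length);
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = (double) a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** KNN: indexes of the k rows nearest to the query, like "order by distance asc limit k". */
    static List<Integer> knn(float[][] rows, float[] query, int k) {
        return IntStream.range(0, rows.length)
                .boxed()
                .sorted(Comparator.comparingDouble(
                        (Integer i) -> squaredEuclidean(rows[i], query)))
                .limit(k)
                .collect(Collectors.toList());
    }

    /** RNN: indexes of all rows whose distance to the query is below r, like "where distance < r". */
    static List<Integer> rnn(float[][] rows, float[] query, double r) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < rows.length; i++) {
            if (squaredEuclidean(rows[i], query) < r) {
                hits.add(i);
            }
        }
        return hits;
    }
}
```

A KNN search always returns at most `k` rows, whereas an RNN search returns however many rows fall inside the radius, which may be none.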

## Use Proxima to perform vector processing

To use Proxima to perform vector processing in Hologres, perform the following steps:

## Example

```
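-- Initialize the Proxima plug-in in the current database.
create extension proxima;
-- Create a table group named tg_1 with a shard count of 1.
CALL HG_CREATE_TABLE_GROUP ('tg_1', 1);
begin;
-- The check constraint fixes the feature column to one-dimensional arrays of length 4.
create table feature_tb (
    id bigint,
    feature float4[] check(array_ndims(feature) = 1 and array_length(feature, 1) = 4)
);
-- Configure a Graph index on the feature column that uses the squared Euclidean distance.
call set_table_property('feature_tb', 'proxima_vectors', '{"feature":{"algorithm":"Graph","distance_method":"SquaredEuclidean","builder_params":{"min_flush_proxima_row_count" : 1000}, "searcher_init_params":{}}}');
call set_table_property('feature_tb','table_group','tg_1');
end;
-- Load 10,000 random four-dimensional vectors.
insert into feature_tb select i, array[random(), random(), random(), random()]::float4[] from generate_series(1, 10000) i;
analyze feature_tb;
-- KNN search: return the 10 smallest distances to the query vector.
select pm_approx_squared_euclidean_distance(feature, '{0.1,0.2,0.3,0.4}') as distance from feature_tb order by distance asc limit 10;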
```
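When the query vector comes from application code, the array literal in the query has to be generated. The following Java sketch shows one way to do that. The class `KnnQueryBuilder` and its methods are hypothetical helpers, and the table and column names are taken from the example above. The vector is inlined as a literal, which is safe here only because the text is produced from numeric values. Whether a given query form uses the Proxima index should be verified against the execution plan.

```java
import java.util.Locale;

/** Builds the text of a KNN query for the feature_tb example table (illustration only). */
final class KnnQueryBuilder {

    /** Formats a vector as an array literal, for example {0.1,0.2,0.3,0.4}. */
    static String toArrayLiteral(float[] vector) {
        StringBuilder sb = new StringBuilder("{");
        for (int i = 0; i < vector.length; i++) {
            if (!Float.isFinite(vector[i])) {
                throw new IllegalArgumentException("non-finite value at position " + i);
            }
            if (i > 0) {
                sb.append(',');
            }
            sb.append(Float.toString(vector[i]));
        }
        return sb.append('}').toString();
    }

    /** Returns a query that lists the k rows with the smallest squared Euclidean distance. */
    static String knnQuery(float[] queryVector, int k) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        return String.format(Locale.ROOT,
                "select id, pm_approx_squared_euclidean_distance(feature, '%s') as distance "
                        + "from feature_tb order by distance asc limit %d;",
                toArrayLiteral(queryVector), k);
    }
}
```

The resulting string can be passed to `Statement.executeQuery` on a JDBC connection to the database.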

## Distance calculation functions

Hologres supports the following three functions that are used to calculate the vector distance:

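- The SquaredEuclidean function uses the following calculation formula, where $x$ and $y$ are $n$-dimensional vectors: $d(x, y) = \sum_{i=1}^{n} (x_i - y_i)^2$
- The Euclidean function uses the following calculation formula: $d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$
- The InnerProduct function uses the following calculation formula: $d(x, y) = \sum_{i=1}^{n} x_i y_i$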

**Note** The Euclidean and SquaredEuclidean functions return the same top K list, because taking the square root does not change the order of the distances. The SquaredEuclidean function skips the square root and therefore provides better performance. When the functional requirements are met, we recommend that you use the SquaredEuclidean function.
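The three distance calculations and the ranking claim in the note can be checked with a small Java sketch. The class `DistanceRanking` and its methods are hypothetical names used for illustration. They are plain reimplementations of the standard formulas, not the Hologres functions themselves.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.function.ToDoubleFunction;

/** Reference implementations of the three distance calculations (illustration only). */
final class DistanceRanking {

    /** Sum of squared differences between the components of x and y. */
    static double squaredEuclidean(float[] x, float[] y) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            double d = (double) x[i] - y[i];
            sum += d * d;
        }
        return sum;
    }

    /** Square root of the squared Euclidean distance. */
    static double euclidean(float[] x, float[] y) {
        return Math.sqrt(squaredEuclidean(x, y));
    }

    /** Sum of the component-wise products of x and y. */
    static double innerProduct(float[] x, float[] y) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            sum += (double) x[i] * y[i];
        }
        return sum;
    }

    /** Returns row indexes sorted by ascending distance; the sort is stable. */
    static Integer[] rankBy(float[][] rows, ToDoubleFunction<float[]> distanceToQuery) {
        Integer[] order = new Integer[rows.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, Comparator.comparingDouble(
                (Integer i) -> distanceToQuery.applyAsDouble(rows[i])));
        return order;
    }
}
```

Calling `rankBy` once with `squaredEuclidean` and once with `euclidean` against the same query vector yields the same index order, which is why the cheaper function is sufficient for top K searches.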

## FAQ

- The error message `ERROR: function pm_approx_inner_product_distance(real[], unknown) does not exist` is returned.

  Cause: The `create extension proxima;` statement is not executed in the database to initialize the Proxima plug-in.

  Solution: Execute the `create extension proxima;` statement to initialize the Proxima plug-in.

- The error message `Writting column: feature with array size: 5 violates fixed size list (4) constraint declared in schema` is returned.

  Cause: The dimension of data that is written to the feature vector column is different from the dimension that is defined for the vector field in the table.

  Solution: Check whether dirty data exists.

- The error message `The size of two array must be the same in DistanceFunction, size of left array: 4, size of right array:` is returned.

  Cause: In the pm_xx_distance(left, right) function, the dimension of the left variable is different from that of the right variable.

  Solution: Change the dimension of the left variable to be the same as that of the right variable in the pm_xx_distance(left, right) function.

- How do I write data to a vector column in Java?

  The following sample code provides an example of how to write data to a vector column in Java:

      import java.sql.Array;
      import java.sql.Connection;
      import java.sql.PreparedStatement;

      private static void insertIntoVector(Connection conn) throws Exception {
          try (PreparedStatement stmt = conn.prepareStatement("insert into feature_tb values(?,?);")) {
              for (int i = 0; i < 100; ++i) {
                  stmt.setInt(1, i);
                  // The vector must have the dimension that is declared for the column (4 in this example).
                  Float[] featureVector = {0.1f, 0.2f, 0.3f, 0.4f};
                  Array array = conn.createArrayOf("FLOAT4", featureVector);
                  stmt.setArray(2, array);
                  stmt.execute();
              }
          }
      }

- How do I check based on the execution plan whether the Proxima index is used?

  If `Proxima filter: xxxx` exists in the execution plan, the index is used, as shown in the following figure. Otherwise, the index is not used. Generally, this is because the table creation statement does not match the query statement.
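The dimension-mismatch errors described in this FAQ can also be caught on the client before a write or a query is attempted. The following Java sketch shows such a check. The class `VectorDimensionCheck` is a hypothetical helper, and the expected dimension of 4 is an assumption taken from the `feature_tb` example table.

```java
/** Client-side check that a vector matches the dimension declared for the column (illustration only). */
final class VectorDimensionCheck {

    /** Assumed dimension, taken from the check constraint of the feature_tb example. */
    static final int EXPECTED_DIMENSION = 4;

    /** Throws IllegalArgumentException if the vector is null, has null elements, or has the wrong length. */
    static void requireDimension(Float[] vector, int expected) {
        if (vector == null) {
            throw new IllegalArgumentException("vector is null");
        }
        if (vector.length != expected) {
            throw new IllegalArgumentException(
                    "vector has " + vector.length + " elements, expected " + expected);
        }
        for (int i = 0; i < vector.length; i++) {
            if (vector[i] == null) {
                throw new IllegalArgumentException("element " + i + " is null");
            }
        }
    }
}
```

Calling `requireDimension(featureVector, VectorDimensionCheck.EXPECTED_DIMENSION)` before `conn.createArrayOf` rejects dirty rows early and reports which dimension was found.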