All Products
Search
Document Center

Hologres:Use HGraph indexes (Recommended)

Last Updated:Dec 11, 2025

The vector computation feature in Hologres is useful for similarity searches, image retrieval, and scene recognition. This feature improves data processing and analysis and lets you build more precise search and recommendation functions. This topic provides complete examples to demonstrate how to use vector computation in Hologres.

Notes

  • Hologres V4.0 and later supports the HGraph vector retrieval algorithm.

  • Vector indexes are supported only for column-oriented and row-column hybrid tables. They are not supported for row-oriented tables.

  • If you delete or rebuild a table that has a vector index, for example, using Insert Overwrite, do not enable the recycle bin feature. This is because tables in the recycle bin still consume memory.

  • After you create a vector index, the manifest is built during the compaction process that follows data import.

  • Data in a memory table (Mem Table) does not have a vector index. When you run a vector retrieval request, this data is processed using brute-force computation.

  • You can use Serverless Computing resources to perform batch data imports. Serverless resources synchronously complete compaction and index building during data import. For more information, see Use Serverless Computing to run read and write tasks and Use Serverless Computing to run compaction tasks.

  • If you do not use Serverless resources, you must manually run the following command to trigger compaction after you import data in batches or modify an index. For more information, see Compaction (Beta).

  • SELECT hologres.hg_full_compact_table('<schema_name>.<table_name>', 'max_file_size_mb=4096');
  • You can also use Serverless Computing resources to run vector retrieval queries.

Manage vector indexes

Create an index

You can create a vector index when you create a table. The syntax is as follows:

Note: In Hologres, a vector is represented as a float4 array. The vector dimension is the length of the one-dimensional array, which is specified by `array_length` in the following syntax.

CREATE TABLE <table_name> (
    <vector_column_name> float4[] CHECK (array_ndims(<vector_column_name>) = 1 AND array_length(<vector_column_name>, 1) = <dim>)
)
WITH (
    vectors = '{
    "<vector_column_name>": {
        "algorithm": "<algorithm>",
        "distance_method": "<distance_method>",
        "builder_params": {
            "<builder_parameters_name>": <value>
            [, ...]
        }
    }
    [ , "<vector_column_name_2>": { ... } ]
  }'
);

Parameters:

Parameter

Description

table_name

The name of the target table.

vector_column_name

The name of the target vector column.

dim

The vector dimension of the target column.

The `vectors` parameter for the vector index has the following requirements:

  • The value must be a JSON string. The top level supports only one `vector_column_name` key, which specifies the name of the vector column on which to build an index.

  • The value of the `vector_column_name` key is a JSON object that is used to configure vector index parameters. The following keys are supported.

Key

Description

algorithm

The vector index algorithm. This parameter is required. Only HGraph is supported.

distance_method

The method to calculate vector distance. This parameter is required. The following values are supported:

  • Euclidean: Euclidean distance. Only ascending order is supported, which is `ORDER BY distance ASC`.

  • InnerProduct: Inner product. Only descending order is supported, which is `ORDER BY distance DESC`.

  • Cosine: Cosine distance. Only descending order is supported, which is `ORDER BY distance DESC`.

Note: The distance function used for vector retrieval must correspond to the distance method used for the vector index. The sorting requirement must also be met. Otherwise, the vector index cannot be used.

builder_params

The parameters for index building. The value must be a JSON string. For more information about the parameters, see the following section.

  • max_degree

  • ef_construction

  • base_quantization_type

  • use_reorder

  • precise_quantization_type

  • precise_io_type

The `builder_params` parameter supports the following parameters:

Parameter

Description

max_degree

During the index building process, each vertex attempts to establish connections with its max_degree nearest vertices. This parameter is optional, and the default value is 64. A larger value leads to a wider search scope for each vertex and higher search efficiency, but it also increases the costs of graph building and storage. In general, a value greater than 96 is not recommended.

ef_construction

Controls the search depth during the index building process. This parameter is optional. The default value is 400. A larger value increases the number of candidates considered as neighbors for a vertex during index building. This improves index precision but also increases the time and computational complexity of index building. Do not set this value to more than 600.

base_quantization_type

The quantization method for the low-precision HGraph index. This parameter is required. The following methods are supported:

  • sq8

  • sq8_uniform

  • fp16

  • fp32

  • rabitq

use_reorder

Specifies whether to use a high-precision HGraph index. This parameter is optional. The default value is FALSE.

precise_quantization_type

The quantization method for the high-precision HGraph index. This parameter is optional. The default value is fp32. Do not change this value. The following methods are supported. Select a quantization method with higher precision than `base_quantization_type`.

  • sq8

  • sq8_uniform

  • fp16

  • fp32

precise_io_type

The storage medium for the hybrid high-precision and low-precision HGraph index. This parameter is optional and takes effect only when `use_reorder` is TRUE. The default value is `block_memory_io`. The following values are supported:

  • block_memory_io: Both low-precision and high-precision indexes are stored in memory.

  • reader_io: The low-precision index is stored in memory, and the high-precision index is stored on disk.

Modify an index

The syntax is as follows:

ALTER TABLE <table_name>
SET (
    vectors = '{
    "<vector_column_name>": {
        "algorithm": "<algorithm>",
        "distance_method": "<distance_method>",
        "builder_params": {
            "<builder_parameters_name>": <value>
            [, ...]
        }
    }
  }'
);

Delete an index

-- Delete the vector indexes for all columns in the table
ALTER TABLE <table_name>
SET (
    vectors = '{}'
);

-- If a table has vector indexes on both the col1 and col2 columns, and you need to delete the index for the col2 column, run the ALTER TABLE statement to retain only the index for the col1 column.
ALTER TABLE <table_name>
SET (
    vectors = '{
    "col1": { ... }
  }'
);

View an index

You can query the `hologres.hg_table_properties` system table to view created vector indexes.

SELECT
    *
FROM
    hologres.hg_table_properties
WHERE 
    table_name = '<table_name>'
    AND property_key = 'vectors';

Use vector indexes for vector retrieval

Vector distance functions

Hologres vector retrieval supports both approximate retrieval and exact retrieval. Only approximate retrieval functions can use a created vector index to accelerate queries. The function must correspond to the `distance_method` of the vector index. Exact retrieval functions cannot use vector indexes.

Note: Vector distance functions do not support all-constant request parameters.

Function

Retrieval type

Request parameters

Return value

Description

approx_euclidean_distance

Approximate retrieval




float4[], float4[]

float4

An approximate retrieval function for Euclidean distance.

approx_inner_product_distance

Approximate retrieval

float4[], float4[]

float4

An approximate retrieval function for inner product.

approx_cosine_distance

Approximate retrieval

float4[], float4[]

float4

An approximate retrieval function for cosine distance.

euclidean_distance

Exact retrieval

float4[], float4[]

float4

An exact retrieval function for Euclidean distance.

inner_product_distance

Exact retrieval

float4[], float4[]

float4

An exact retrieval function for inner product.

cosine_distance

Exact retrieval

float4[], float4[]

float4

An exact retrieval function for cosine distance.

Verify vector index usage

You can check the execution plan to determine whether a SQL statement uses a vector index. If the plan contains "Vector Filter", the index is used successfully. For more information, see EXPLAIN and EXPLAIN ANALYZE.

  • Example:

    SELECT
        id,
        approx_euclidean_distance (feature, '{0.1,0.2,0.3,0.4}') AS distance
    FROM
        feature_tb
    ORDER BY
        distance
    LIMIT 40;
  • Execution plan:

    Limit  (cost=0.00..182.75 rows=40 width=12)
      ->  Sort  (cost=0.00..182.75 rows=160 width=12)
            Sort Key: (VectorDistanceRef)
            ->  Gather  (cost=0.00..181.95 rows=160 width=12)
                  ->  Limit  (cost=0.00..181.94 rows=160 width=12)
                        ->  Sort  (cost=0.00..181.94 rows=40000 width=12)
                              Sort Key: (VectorDistanceRef)
                              ->  Local Gather  (cost=0.00..91.53 rows=40000 width=12)
                                    ->  Limit  (cost=0.00..91.53 rows=40000 width=12)
                                          ->  Sort  (cost=0.00..91.53 rows=40000 width=12)
                                                Sort Key: (VectorDistanceRef)
                                                ->  Project  (cost=0.00..1.12 rows=40000 width=12)
                                                      ->  Index Scan using Clustering_index on feature_tb  (cost=0.00..1.00 rows=40000 width=8)
                                                            Vector Filter: VectorCond => KNN: '40'::bigint distance_method: approx_euclidean_distance search_params: {NULL} args: {feature'{0.100000001,0.200000003,0.300000012,0.400000006}'::real[]}
    Query Queue: init_warehouse.default_queue
    Optimizer: HQO version 4.0.0

Examples

  • Create a table.

    -- Create a table group with a shard count of 4.
    CALL HG_CREATE_TABLE_GROUP ('test_tg_shard_4', 4);
    
    -- Create a table.
    CREATE TABLE feature_tb (
        id bigint,
      	feature float4[] CHECK(array_ndims(feature) = 1 AND array_length(feature, 1) = 4)
    )
    WITH (
        table_group = 'test_tg_shard_4',
        vectors = '{
        "feature": {
            "algorithm": "HGraph",
            "distance_method": "Cosine",
            "builder_params": {
                "base_quantization_type": "rabitq",
                "max_degree": 64,
                "ef_construction": 400,
                "precise_quantization_type": "fp32",
                "use_reorder": true,
                "max_total_size_to_merge_mb" : 4096
            }
        }
        }'
    );
  • Import data.

    -- (Optional) Use Serverless Computing to run large-volume offline data import and extract, transform, and load (ETL) jobs. Compaction and index building are synchronously completed during the import.
    SET hg_computing_resource = 'serverless';
    SET hg_serverless_computing_run_compaction_before_commit_bulk_load = on;
    
    INSERT INTO feature_tb SELECT i, array[random(), random(), random(), random()]::float4[] FROM generate_series(1, 100000) i;
    
    -- Reset the configuration to ensure that unnecessary SQL statements do not use Serverless resources.
    RESET hg_computing_resource;
  • Perform an approximate vector retrieval.

    -- Calculate the top 40 results based on cosine distance.
    SELECT
        id,
        approx_cosine_distance (feature, '{0.1,0.2,0.3,0.4}') AS distance
    FROM
        feature_tb
    ORDER BY
        distance DESC
    LIMIT 40;
  • Perform an exact vector retrieval.

    -- Exact retrieval does not use a vector index. Therefore, the distance function does not need to be the same as the distance_method of the vector index.
    SELECT
        id,
        cosine_distance (feature, '{0.1,0.2,0.3,0.4}') AS distance
    FROM
        feature_tb
    ORDER BY
        distance DESC
    LIMIT 40;

Performance tuning

Use vector indexes appropriately

If the data volume is small, such as tens of thousands of rows, or if the instance has sufficient compute resources, do not set a vector index. Use brute-force computation instead. You should use a vector index only when brute-force computation cannot meet your requirements for latency or throughput. The reasons are as follows:

  • Vector indexes are lossy. The result accuracy, or recall rate, cannot reach 100%.

  • A vector index might return fewer items than requested. For example, a query with `LIMIT 1000` might return only 500 items.

When you use a vector index, configure it as follows. The following example uses a non-partitioned table that has a single column of 768-dimension vectors.

  • Latency-sensitive scenarios: Use an in-memory-only index. Set the index quantization method to `sq8_uniform` or `rabitq`. The data volume in a single shard should not exceed 5 million rows.

  • Non-latency-sensitive or large data volume scenarios: Use a hybrid memory-disk index. Set the index quantization method to `rabitq`. The data volume in a single shard should not exceed 30 million to 50 million rows.

  • Note: If you set vector indexes on multiple columns, you must reduce the recommended data volume for a single shard proportionally. The vector dimension also affects this recommendation.

Example:

-- Complete example of a hybrid index
CREATE TABLE feature_tb (
    id bigint,
  	feature float4[] CHECK(array_ndims(feature) = 1 AND array_length(feature, 1) = 4)
)
WITH (
    table_group = 'test_tg_shard_4',
    vectors = '{
    "feature": {
        "algorithm": "HGraph",
        "distance_method": "Cosine",
        "builder_params": {
            "base_quantization_type": "rabitq",
            "max_degree": 64,
            "ef_construction": 400,
            "precise_quantization_type": "fp32",
            "precise_io_type": "reader_io",
            "use_reorder": true,
            "max_total_size_to_merge_mb" : 4096
        }
    }
    }'
);


-- Complete example of an in-memory-only index
CREATE TABLE feature_tb (
    id bigint,
  	feature float4[] CHECK(array_ndims(feature) = 1 AND array_length(feature, 1) = 4)
)
WITH (
    table_group = 'test_tg_shard_4',
    vectors = '{
    "feature": {
        "algorithm": "HGraph",
        "distance_method": "Cosine",
        "builder_params": {
            "base_quantization_type": "sq8_uniform",
            "max_degree": 64,
            "ef_construction": 400,
            "precise_quantization_type": "fp32",
            "use_reorder": true,
            "max_total_size_to_merge_mb" : 4096
        }
    }
    }'
);

Improve the recall rate

This section uses the VectorDBBench dataset as an example to describe how to improve the recall rate.

Multiple factors can affect the recall rate. With the following index parameter settings, the default system recall rate can typically exceed 95%:

  • `base_quantization_type` is `rabitq` or `sq8_uniform`.

  • `precise_quantization_type` is `fp32`.

  • `max_degree` is `64`.

  • `ef_construction` is `400`.

  • `hg_vector_ef_search` is set to the system default of 80.

To further increase the recall rate to over 99%, you can set `hg_vector_ef_search` to 400 while keeping other parameters unchanged. However, improving the recall rate increases query latency and compute resource usage.

To increase the recall rate to a range of 99.5% to 99.7%, you can further adjust the `max_degree`, `ef_construction`, and `hg_vector_ef_search` parameters. This adjustment increases query latency, query resource consumption, index building time, and index building resource consumption. For example:

  • `max_degree` = 96.

  • `ef_construction` = 500 or 600.

  • `hg_vector_ef_search` = 500 or 600.

Set an appropriate shard count

A higher shard count results in more manifest files and lower query throughput. In practice, you should set an appropriate shard count based on your instance resources. You can typically set the shard count to the number of workers. For example, for a 64-core instance, you can set the shard count to 4. To reduce the latency of a single query, you can decrease the shard count, but this also reduces write performance.

-- Create a vector table and place it in a table group with a shard count of 4.
CALL HG_CREATE_TABLE_GROUP ('test_tg_shard_4', 4);

CREATE TABLE feature_tb (
    id bigint,
  	feature float4[] CHECK(array_ndims(feature) = 1 AND array_length(feature, 1) = 4)
)
WITH (
    table_group = 'test_tg_shard_4',
    vectors = '{
    "feature": {
        "algorithm": "HGraph",
        "distance_method": "Cosine",
        "builder_params": {
            "base_quantization_type": "sq8_uniform",
            "max_degree": 64,
            "ef_construction": 400,
            "precise_quantization_type": "fp32",
            "use_reorder": true,
            "max_total_size_to_merge_mb" : 4096
        }
    }
    }'
);

Hybrid vector and scalar query scenarios

For vector retrieval with filter conditions, several common filtering scenarios exist, as described in this section.

Query scenario 1: A string column is used as a filter condition

The following query is an example. A common scenario is to find corresponding vector data within an organization, such as finding face data within a class.

SELECT(feature, '{1,2,3,4}') AS d FROM feature_tb WHERE uuid = 'x' ORDER BY d LIMIT 10;

You can perform the following optimizations:

  • Set `uuid` as the distribution key. This ensures that data with the same filter value is stored in the same shard. A query then accesses only a single shard.

  • Set `uuid` as the clustering key for the table. The data within files is then sorted by the clustering key.

Query scenario 2: A time field is used as a filter condition

The following query is an example. Typically, a time field is used to filter vector data. You can set the time field `time_field` as the segment key for the table. This allows for the rapid location of the files that contain the data.

SELECT xx_distance(feature, '{1,2,3,4}') AS d FROM feature_tb WHERE time_field BETWEEN '2020-08-30 00:00:00' AND '2020-08-30 12:00:00' ORDER BY d LIMIT 10;

Therefore, for vector retrieval with any filter conditions, the `CREATE TABLE` statement is typically as follows:

-- Note: If you do not filter by time, you can remove the indexes related to time_field.
CREATE TABLE feature_tb (
    time_field timestamptz NOT NULL,
    uuid text NOT NULL,
    feature float4[] CHECK(array_ndims(feature) = 1 AND array_length(feature, 1) = 4)
)
WITH (
    distribution_key = 'uuid',
    segment_key = 'time_field',
    clustering_key = 'uuid',
    vectors = '{
    "feature": {
        "algorithm": "HGraph",
        "distance_method": "Cosine",
        "builder_params": {
            "base_quantization_type": "sq8_uniform",
            "max_degree": 64,
            "ef_construction": 400,
            "precise_quantization_type": "fp32",
            "use_reorder": true,
            "max_total_size_to_merge_mb" : 4096
        }
    }
    }'
);

FAQ

  • Q: An error is reported: "Writting column: feature with array size: 5 violates fixed size list (4) constraint declared in schema".

    A: This error occurs because the dimension of the data written to the feature vector column does not match the dimension defined in the table. Check for dirty data.

  • Q: An error is reported: "The size of two array must be the same in DistanceFunction, size of left array: 4, size of right array: x".

    A: This error occurs because the dimension of `left` does not match the dimension of `right` in `xx_distance(left, right)`.

  • Q: How do I write vector data using Java?

    A: You can use the following sample Java code:

    private static void insertIntoVector(Connection conn) throws Exception {
        try (PreparedStatement stmt = conn.prepareStatement("INSERT INTO feature_tb VALUES(?,?);")) {
            for (int i = 0; i < 100; ++i) {
               stmt.setInt(1, i);
               Float[] featureVector = {0.1f,0.2f,0.3f,0.4f};
               Array array = conn.createArrayOf("FLOAT4", featureVector);
               stmt.setArray(2, array);
               stmt.execute();
            }
        }
    }
  • Q: How do I change a Proxima Graph index to an HGraph index?

    A: To change a Proxima Graph index to an HGraph index, you must perform the following two steps in order:

    • Step 1: Delete the existing Proxima Graph index from the table. The SQL command is as follows:

    CALL set_table_property ('<TABLE_NAME>', 'proxima_vectors', '{}');
    Replace `<TABLE_NAME>` with the actual table name.
    • Step 2: After the original Proxima Graph index is deleted, you can create an HGraph index. For more information, see Create an index.