Realtime Compute for Apache Flink: VECTOR_SEARCH

Last Updated: Dec 09, 2025

This topic explains how to use the VECTOR_SEARCH function to run vector searches. Given a high-dimensional numeric vector, the function returns the rows of a vector table that are most semantically similar to it.

Limitations

  • Version support: Ververica Runtime (VVR) 11.3 and later supports stream mode, and VVR 11.4 and later supports batch mode.

  • Vector table: Only Milvus is supported as the vector table.

  • Stream type: Only non-updating (append-only) streams, which contain only INSERT messages, are supported.

  • Execution mode: This function runs only in stream mode; batch mode is not supported.

Syntax

VECTOR_SEARCH(
  TABLE <SEARCH_TABLE>,
  DESCRIPTOR(<COLUMN_TO_SEARCH>),
  <COLUMN_TO_QUERY>,
  <TOP_K>
  [, <CONFIG>]
)

Input parameters

| Parameter | Data type | Description |
| --- | --- | --- |
| TABLE <SEARCH_TABLE> | TABLE | The name of the vector table. |
| DESCRIPTOR(<COLUMN_TO_SEARCH>) | DESC | The indexed vector column within the vector table. Input data is compared against this column to compute similarity. |
| COLUMN_TO_QUERY | ARRAY<FLOAT>/ARRAY<DOUBLE> | The vector feature column from the input data, such as the embedding of an uploaded image or text. This column is matched against the indexed vector column to find similarities. |
| TOP_K | INT | The maximum number of similar data entries to return. |
| CONFIG | MAP<STRING,STRING> | Configurable runtime parameters. |
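As a minimal sketch of how these parameters line up in a call, the query below passes them positionally, in the order given under Syntax, and omits the optional CONFIG argument. The table and column names (query_table, vector_table, embedding, vector_index, user_keyword, topic) are illustrative and match the ones used in the Example section of this topic; substitute your own.

```sql
-- Positional form: arguments follow the order shown under Syntax.
SELECT user_keyword, topic
FROM
  query_table,
  LATERAL TABLE (VECTOR_SEARCH(
    TABLE vector_table,        -- SEARCH_TABLE: the vector table
    DESCRIPTOR(vector_index),  -- COLUMN_TO_SEARCH: the indexed vector column
    query_table.embedding,     -- COLUMN_TO_QUERY: ARRAY<FLOAT> or ARRAY<DOUBLE>
    5                          -- TOP_K: return at most 5 matches per input row
  ));
```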

Return value

The VECTOR_SEARCH function returns a table in which each row contains all columns from the vector table plus an additional score column. The score column is of the DOUBLE data type and indicates how similar the returned row is to the input vector.
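To illustrate the shape of the result, the following sketch selects the score column next to one column from each table. It assumes the added column is exposed under the name score, as the description above suggests, and it reuses the illustrative query_table and vector_table from the Example section of this topic.

```sql
-- Each output row pairs one input row with one matched row from vector_table.
SELECT
  user_keyword,  -- from query_table
  topic,         -- from vector_table
  score          -- similarity value added by VECTOR_SEARCH (column name assumed)
FROM
  query_table,
  LATERAL TABLE (VECTOR_SEARCH(
    SEARCH_TABLE => TABLE vector_table,
    COLUMN_TO_SEARCH => DESCRIPTOR(vector_index),
    COLUMN_TO_QUERY => query_table.embedding,
    TOP_K => 2
  ));
```

The scale of the score, and whether a larger or smaller value means a closer match, may depend on the metric configured for the index, so check it against your own vector table before filtering on this column.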

Runtime parameters

| Parameter | Data type | Default | Description |
| --- | --- | --- | --- |
| async | Boolean | (none) | Specifies whether to enable asynchronous mode. If the connector for the vector table does not support the specified mode, the engine reports an error. By default, the engine selects the execution mode based on what the connector supports. If the connector supports both asynchronous and synchronous modes, the engine prioritizes asynchronous mode to improve throughput. |
| max-concurrent-operations | Integer | 10 | The maximum number of concurrent requests in asynchronous mode. |
| output-mode | Enum | ORDERED | The output mode for asynchronous operations. Valid values: ORDERED and ALLOW_UNORDERED. For more information about these values, see Async I/O. |
| timeout | Duration | 3 min | The timeout for an asynchronous operation, from the first call until completion. This period can include multiple retries and is reset on failover. |
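The runtime parameters are passed as string key-value pairs in the optional CONFIG map. The following sketch, which is not a tuning recommendation, requests asynchronous mode with unordered output. The values '20' and '1 min' are arbitrary, and the duration string is assumed to follow the same format as the default shown above. The table and column names are the illustrative ones from the Example section of this topic.

```sql
SELECT user_keyword, topic
FROM
  query_table,
  LATERAL TABLE (VECTOR_SEARCH(
    TABLE vector_table,
    DESCRIPTOR(vector_index),
    query_table.embedding,
    2,
    MAP[
      'async', 'true',                    -- request asynchronous mode explicitly
      'max-concurrent-operations', '20',  -- allow up to 20 concurrent requests
      'output-mode', 'ALLOW_UNORDERED',   -- do not require output in input order
      'timeout', '1 min'                  -- total time allowed per asynchronous operation
    ]
  ));
```

Because an unsupported mode causes an error rather than a silent fallback, leaving async unset lets the engine choose whatever the connector supports.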

Example

Test data

Assume vector_table contains the following data:

| id | topic | vector_index |
| --- | --- | --- |
| 1 | "BigData" | [1, 1, 0] |
| 2 | "Streaming" | [-5, -12, -13] |
| 3 | "Batch" | [5, 12, 13] |

Assume query_table contains the following data:

| id | user_keyword | embedding |
| --- | --- | --- |
| 1 | "Spark" | [5, 12, 13] |
| 2 | "Flink" | [-5, -12, -13] |

Test statement

The following SQL statement uses each row in query_table to search vector_table and retrieve the top two most similar records.

SELECT user_keyword, topic
FROM
  query_table,
  LATERAL TABLE (VECTOR_SEARCH(
    SEARCH_TABLE => TABLE vector_table,
    COLUMN_TO_SEARCH => DESCRIPTOR(vector_index),
    COLUMN_TO_QUERY => query_table.embedding,
    TOP_K => 2,
    CONFIG => MAP['async', 'false'] -- Disable asynchronous mode (run synchronously)
  ));
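To see which rows this statement should match, the similarity can be worked out by hand. The computation below assumes cosine similarity; the metric actually applied depends on how the index in the vector table is configured, so treat this as an illustration rather than the function's defined scoring. It compares the "Spark" embedding [5, 12, 13] with each row of vector_table.

```latex
\begin{align*}
\cos(\text{Spark}, \text{Batch}) &= \frac{5\cdot 5 + 12\cdot 12 + 13\cdot 13}{\sqrt{338}\,\sqrt{338}} = \frac{338}{338} = 1 \\
\cos(\text{Spark}, \text{BigData}) &= \frac{5\cdot 1 + 12\cdot 1 + 13\cdot 0}{\sqrt{338}\,\sqrt{2}} = \frac{17}{26} \approx 0.654 \\
\cos(\text{Spark}, \text{Streaming}) &= \frac{5\cdot(-5) + 12\cdot(-12) + 13\cdot(-13)}{\sqrt{338}\,\sqrt{338}} = \frac{-338}{338} = -1
\end{align*}
```

With TOP_K set to 2, the two highest values make "Batch" and "BigData" the expected matches for "Spark". The "Flink" embedding is the negation of the "Spark" embedding, so every sign flips and the expected matches become "Streaming" and "BigData".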

Results

| user_keyword | topic |
| --- | --- |
| "Spark" | "Batch" |
| "Spark" | "BigData" |
| "Flink" | "Streaming" |
| "Flink" | "BigData" |