OpenSearch: Hybrid query based on vectors and text

Last Updated: Mar 26, 2025

Query syntax

URL

curl -X POST /search or curl -X POST /vector-service/search

  • The sample URL omits information such as parameters in request headers and the encoding method.

  • The sample URL also omits the endpoint that is used to connect to an OpenSearch instance.

  • For more information about the definitions, usage, and example values of the request parameters, see the "Request parameters" section of this topic.

Protocol

HTTP

Request method

POST

Supported format

JSON

Signature method

You can use the following method to calculate the request signature. The request signature is stored in the authorization header.

| Parameter | Type | Description |
| --- | --- | --- |
| accessUserName | String | The username. You can view the username in the API Endpoint section of the Instance Details page. |
| accessPassWord | String | The password. You can modify the password in the API Endpoint section of the Instance Details page. |

import com.aliyun.darabonba.encode.Encoder;
import com.aliyun.darabonbastring.Client;

public class GenerateAuthorization {

    public static void main(String[] args) throws Exception {
        // Replace these values with the username and password of your instance.
        String accessUserName = "username";
        String accessPassWord = "password";
        // Join the credentials in the "username:password" format.
        String realmStr = accessUserName + ":" + accessPassWord;
        // Base64-encode the UTF-8 bytes to obtain the authorization value.
        String authorization = Encoder.base64EncodeToString(Client.toBytes(realmStr, "UTF-8"));
        System.out.println(authorization);
    }
}

Sample value of the authorization header:

cm9vdDp******mdhbA==

You must add the Basic prefix when you specify the authorization header in an HTTP request.

Example:

authorization: Basic cm9vdDp******mdhbA==
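
The same header value can also be produced with only the JDK standard library. The following is a minimal sketch, assuming that the value is the Base64 encoding of the UTF-8 bytes of username:password, as in the preceding sample. The class name BasicAuthHeader and the method name build are hypothetical and are not part of any OpenSearch SDK.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper; not part of any OpenSearch SDK.
final class BasicAuthHeader {

    /** Returns the header value in the form "Basic <base64(username:password)>". */
    static String build(String accessUserName, String accessPassWord) {
        String realmStr = accessUserName + ":" + accessPassWord;
        String encoded = Base64.getEncoder()
                .encodeToString(realmStr.getBytes(StandardCharsets.UTF_8));
        // The Basic prefix is required when the header is sent in an HTTP request.
        return "Basic " + encoded;
    }

    public static void main(String[] args) {
        // Placeholder credentials; replace them with the values of your instance.
        System.out.println("authorization: " + build("username", "password"));
    }
}
```

Returning the value with the Basic prefix already attached keeps the prefix from being forgotten at the call site.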

Request parameters

| Parameter | Type | Required | Default value | Description |
| --- | --- | --- | --- | --- |
| tableName | String | Yes | No default value | The name of the table to be queried. |
| knn | Object | No | No default value | The k-nearest neighbor (kNN) query parameter. |
| knn.vector | List[Float] | Yes | No default value | The dense vector data to be queried. |
| knn.topK | Integer | No | No default value | The number of results to be returned. |
| knn.filter | String | No | "" | The expression of the filter condition. |
| knn.weight | Float | No | 1.0 | The weight of kNN query results. The score multiplied by the weight is used as the sorting score. |
| text | Object | No | No default value | The text query parameter. |
| text.queryString | String | Yes | No default value | The query string, written in the query clause syntax supported by HA3. Conditions on multiple text indexes can be nested by using the AND and OR operators. |
| text.queryParams | Map<String, String> | No | {} | Parameters for data query. default_op: specifies the logical relationship between the terms that are returned after the system tokenizes the search query by using the default analyzer. Valid values: AND and OR. Default value: AND. no_token_indexes: specifies the fields whose values you do not want the system to convert into terms. This parameter does not affect other operations that are performed on the values in the specified fields, such as normalization and stop word removal. Separate multiple fields with semicolons (;). remove_stopwords: specifies whether to remove the stop words that are configured for the analyzer. Valid values: true and false. Default value: true. |
| text.filter | String | No | "" | The expression of the filter condition. |
| text.weight | Float | No | 1.0 | The weight of text query results. The score multiplied by the weight is used as the sorting score. |
| text.terminateAfter | Integer | No | 0 | The maximum number of documents that meet the query conditions in each shard. After the specified value is reached, the query ends. The default value 0 indicates that no limit is imposed. |
| size | Integer | No | 100 | The number of returned results. |
| from | Integer | No | 0 | The offset in the result set from which the system starts to return documents. |
| outputFields | List[String] | No | [] | The fields to be returned in the results. |
| order | String | No | DESC | The order in which the results are sorted. Valid values: DESC (descending order) and ASC (ascending order). |
| rank | Object | No | {} | The policy that is used to merge the two result sets. Supported policies: the default policy, in which the scores of documents with the same primary key in the two result sets are combined based on weight and the results are sorted based on the weighted scores; and rrf, in which the two result sets are merged based on the reciprocal rank fusion (RRF) algorithm. |

Usage notes

  • kNN only supports single-vector queries, and the query parameters are the same as those of API queries.

  • The kNN vector scores must be converted before they can be combined with the text scores.

    # Euclidean distance
    score = 1.0 / (1.0 + l2_distance^2)

    # Inner product distance
    score = (1.0 + ip_distance) / 2.0

  • The order parameter in the kNN query does not take effect. The order parameter at the outer layer is used.

  • Default policy

    • Configuration method: Do not configure the rank parameter, or leave the rank parameter empty.

      {
        "rank": {}
      }

    • Formula: The scores of documents with the same primary key in the two result sets are combined based on weight, and the results are sorted based on the weighted scores.

      score(i) = knn_score(i) * knn_weight + text_score(i) * text_weight

  • RRF

    • Configuration method:

      {
        "rank": {
          "rrf": {
            "rankConstant": 60
          }
        }
      }

      # rankConstant is optional. Default value: 60.

    • Formula

      score = 0.0
      for q in queries:
          if d in result(q):
              score += 1.0 / (rankConstant + rank(result(q), d))
      return score

      # queries contains the kNN query and the text query. result(q) is the result set of query q.
      # rankConstant is a sorting constant that determines the extent to which the ranking of the documents in a result set influences the final score. The larger the value, the greater the impact of the low-ranked documents on the final score. The value of this parameter must be an integer greater than or equal to 1. Default value: 60.
      # rank(result(q), d) indicates the position of document d in the result set of query q, starting from 1.
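
The score conversion and the two merge policies described in the preceding usage notes can be sketched on the client side as follows. This is an illustration of the formulas only, not the service's implementation. The class and method names are hypothetical. The sketch also assumes that a document missing from one result set contributes 0 for that set under the default policy, which this topic does not state.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the formulas in the usage notes.
final class HybridScoring {

    /** Euclidean distance: score = 1.0 / (1.0 + l2_distance^2). */
    static double l2ToScore(double l2Distance) {
        return 1.0 / (1.0 + l2Distance * l2Distance);
    }

    /** Inner product distance: score = (1.0 + ip_distance) / 2.0. */
    static double ipToScore(double ipDistance) {
        return (1.0 + ipDistance) / 2.0;
    }

    /**
     * Default policy: knn_score * knn_weight + text_score * text_weight per primary key.
     * Assumption: a document absent from one result set contributes 0 for that set.
     */
    static Map<String, Double> weightedMerge(Map<String, Double> knnScores, double knnWeight,
                                             Map<String, Double> textScores, double textWeight) {
        Map<String, Double> merged = new LinkedHashMap<>();
        knnScores.forEach((id, score) -> merged.merge(id, score * knnWeight, Double::sum));
        textScores.forEach((id, score) -> merged.merge(id, score * textWeight, Double::sum));
        return merged;
    }

    /** RRF: each result set adds 1 / (rankConstant + rank), with ranks starting from 1. */
    static Map<String, Double> rrf(List<List<String>> rankedResultSets, int rankConstant) {
        Map<String, Double> scores = new LinkedHashMap<>();
        for (List<String> ranked : rankedResultSets) {
            for (int i = 0; i < ranked.size(); i++) {
                int rank = i + 1; // position in this result set, starting from 1
                scores.merge(ranked.get(i), 1.0 / (rankConstant + rank), Double::sum);
            }
        }
        return scores;
    }

    public static void main(String[] args) {
        // Made-up primary keys and scores, for illustration only.
        System.out.println(weightedMerge(Map.of("doc1", l2ToScore(1.0)), 1.0,
                                         Map.of("doc1", 0.25, "doc2", 1.0), 1.0));
        System.out.println(rrf(List.of(List.of("doc1", "doc2"), List.of("doc2", "doc3")), 60));
    }
}
```

With rankConstant set to 60, a document ranked first in one result set and absent from the other scores 1/61. A document ranked second in one set and first in the other scores 1/62 + 1/61, so appearing in both sets outweighs a single top rank.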

Query example

{
    "tableName": "test",
    "text": {
        "queryString": "title:'Alibaba'",
        "queryParams": {
            "default_op": "OR"
        },
        "filter": "count > 0",
        "terminateAfter": 100000
    },
    "knn": {
        "vector": [0.1, 0.2, 0.3, 0.4, 0.5],
        "namespace": "1",
        "topK": 100
    },
    "order": "DESC",
    "size": 10,
    "rank": {
        "rrf": {
            "rankConstant": 1
        }
    },
    "outputFields": ["title"]
}
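
A request that carries a body like the preceding one could be assembled with the JDK HTTP client (Java 11 or later) roughly as follows. This is a sketch under stated assumptions: the endpoint http://opensearch.example.invalid is a placeholder because this topic omits the instance endpoint, the Content-Type header is an assumption based on the supported JSON format, and the class name HybridQueryRequest is hypothetical. The sketch builds the request and prints its method and URI. It does not send the request.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper; the endpoint and the Content-Type header are assumptions.
final class HybridQueryRequest {

    static HttpRequest build(String endpoint, String userName, String passWord, String jsonBody) {
        String token = Base64.getEncoder()
                .encodeToString((userName + ":" + passWord).getBytes(StandardCharsets.UTF_8));
        return HttpRequest.newBuilder()
                .uri(URI.create(endpoint + "/vector-service/search"))
                .header("authorization", "Basic " + token)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }

    public static void main(String[] args) {
        // A shortened form of the query example in this topic.
        String body = "{"
                + "\"tableName\": \"test\","
                + "\"text\": {\"queryString\": \"title:'Alibaba'\"},"
                + "\"knn\": {\"vector\": [0.1, 0.2, 0.3, 0.4, 0.5], \"topK\": 100},"
                + "\"size\": 10,"
                + "\"rank\": {\"rrf\": {\"rankConstant\": 1}},"
                + "\"outputFields\": [\"title\"]"
                + "}";
        HttpRequest request =
                build("http://opensearch.example.invalid", "username", "password", body);
        System.out.println(request.method() + " " + request.uri());
        // To send the request against a real endpoint:
        // HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    }
}
```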

Returned results

{
  "totalTime": 8.522,
  "coveredPercent": 1.0,
  "totalCount": 5,
  "result": [
    {
      "__source__": 2,
      "score": 0.833333,
      "namespace": 1,
      "id": 2,
      "fields": {
        "title": "a b c"
      }
    },
    {
      "__source__": 3,
      "score": 0.666666,
      "namespace": 1,
      "id": 1,
      "fields": {
        "title": "a b"
      }
    },
    {
      "__source__": 3,
      "score": 0.333333,
      "namespace": 2,
      "id": 5,
      "fields": {
        "title": "c"
      }
    },
    {
      "__source__": 2,
      "score": 0.25,
      "namespace": 2,
      "id": 4,
      "fields": {
        "title": "b c"
      }
    },
    {
      "__source__": 2,
      "score": 0.2,
      "namespace": 1,
      "id": 0,
      "fields": {
        "title": "a"
      }
    }
  ]
}

Result parameters:

| Field | Type | Description |
| --- | --- | --- |
| id | The data type configured in the schema. | The value of the primary key. |
| namespace | The data type configured in the schema. | The namespace of the vector index that is queried by kNN. This field is available only if a namespace is configured. |
| vector | List[Float] | The value of the vector index that is queried by kNN. This field is returned only if you add the "includeVector": true configuration to the kNN query. |
| score | Float | The score for sorting. |
| fields | Map<String, FieldType> | The returned fields, which are in the key-value pair format. |
| totalTime | Float | The response time. Unit: milliseconds. |
| totalCount | Integer | The number of returned results. |
| __source__ | Integer | The type of query from which the data is retrieved. Valid values: 1 (kNN queries), 2 (text queries), and 3 (kNN and text queries). |
| coveredPercent | Float | The percentage of shards that successfully return data. For example, the value 1.0 indicates that all shards successfully return data. |
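
When results are processed on the client side, the __source__ values listed above can be mapped to a named type. The following is a small sketch. The enum name ResultSource and its constants are hypothetical, and only the three documented values are handled.

```java
// Hypothetical client-side mapping of the documented __source__ values.
enum ResultSource {
    KNN(1),           // retrieved by the kNN query
    TEXT(2),          // retrieved by the text query
    KNN_AND_TEXT(3);  // retrieved by both queries

    private final int code;

    ResultSource(int code) {
        this.code = code;
    }

    /** Maps a __source__ value to its constant and rejects undocumented values. */
    static ResultSource fromCode(int code) {
        for (ResultSource source : values()) {
            if (source.code == code) {
                return source;
            }
        }
        throw new IllegalArgumentException("Unknown __source__ value: " + code);
    }

    public static void main(String[] args) {
        System.out.println(fromCode(3)); // prints KNN_AND_TEXT
    }
}
```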