
DashVector: Grouped document search

Last Updated: Apr 18, 2024

This topic describes how to perform grouped similarity searches in a collection by using the HTTP API.

Prerequisites

Method and URL

POST https://{Endpoint}/v1/collections/{CollectionName}/query_group_by
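
As a rough illustration of how the method, URL, and headers fit together, the following Python sketch assembles the request with the standard library only. The helper name build_request is hypothetical, and the endpoint, collection, and API key values are placeholders. The sketch builds the request object without sending it.

```python
import json
import urllib.request


def build_request(endpoint: str, collection: str, api_key: str,
                  body: dict) -> urllib.request.Request:
    """Assemble a POST request for the query_group_by route (hypothetical helper)."""
    url = f"https://{endpoint}/v1/collections/{collection}/query_group_by"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "dashvector-auth-token": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values; replace them before sending anything.
req = build_request(
    "YOUR_CLUSTER_ENDPOINT",
    "group_by_demo",
    "YOUR_API_KEY",
    {"vector": [0.1, 0.2, 0.3, 0.4], "group_by_field": "document_id"},
)
print(req.get_method(), req.full_url)
```

Once real credentials are filled in, passing req to urllib.request.urlopen would send the request. Error handling and timeouts are left out of this sketch.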

Example

Note
  1. Before you run the sample code, replace YOUR_API_KEY with your API key and YOUR_CLUSTER_ENDPOINT with the endpoint of your cluster.

  2. This example uses a collection named group_by_demo. For more information about this collection, see Grouped vector search.

Perform a grouped similarity search by using a vector

curl -XPOST \
  -H 'dashvector-auth-token: YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "vector": [0.1, 0.2, 0.3, 0.4],
    "group_by_field": "document_id",
    "group_topk": 1,
    "group_count": 3,
    "include_vector": true
  }' https://YOUR_CLUSTER_ENDPOINT/v1/collections/group_by_demo/query_group_by

The sample output is as follows:

{
    "code": 0,
    "request_id": "d6df634a-683d-445e-abe0-d547091d6b3a",
    "message": "Success",
    "output": [
        {
            "docs": [
                {
                    "id": "4",
                    "vector": [
                        0.621783971786499,
                        0.5220040082931519,
                        0.8403469920158386,
                        0.995602011680603
                    ],
                    "fields": {
                        "document_id": "paper-02",
                        "content": "xxxD",
                        "chunk_id": 2
                    },
                    "score": 0.028402328
                }
            ],
            "group_id": "paper-02"
        },
        {
            "docs": [
                {
                    "id": "1",
                    "vector": [
                        0.26870301365852356,
                        0.8718249797821045,
                        0.6066280007362366,
                        0.6342290043830872
                    ],
                    "fields": {
                        "document_id": "paper-01",
                        "content": "xxxA",
                        "chunk_id": 1
                    },
                    "score": 0.08141637
                }
            ],
            "group_id": "paper-01"
        },
        {
            "docs": [
                {
                    "id": "6",
                    "vector": [
                        0.661965012550354,
                        0.730430006980896,
                        0.6105219721794128,
                        0.22164000570774078
                    ],
                    "fields": {
                        "document_id": "paper-03",
                        "content": "xxxF",
                        "chunk_id": 1
                    },
                    "score": 0.2513085
                }
            ],
            "group_id": "paper-03"
        }
    ]
}
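
One way to consume this response shape is to flatten it into one row per returned document. The Python sketch below works on an abbreviated copy of the sample output (vectors and fields omitted for brevity). The function name flatten_groups is an assumption for illustration and is not part of any DashVector SDK.

```python
import json

# Abbreviated copy of the sample output (vectors and fields omitted).
sample = json.loads("""
{
  "code": 0,
  "request_id": "d6df634a-683d-445e-abe0-d547091d6b3a",
  "message": "Success",
  "output": [
    {"group_id": "paper-02", "docs": [{"id": "4", "score": 0.028402328}]},
    {"group_id": "paper-01", "docs": [{"id": "1", "score": 0.08141637}]},
    {"group_id": "paper-03", "docs": [{"id": "6", "score": 0.2513085}]}
  ]
}
""")


def flatten_groups(response: dict) -> list:
    """Return one (group_id, doc_id, score) tuple per returned document."""
    rows = []
    for group in response.get("output", []):
        for doc in group.get("docs", []):
            rows.append((group["group_id"], doc["id"], doc["score"]))
    return rows


rows = flatten_groups(sample)
for group_id, doc_id, score in rows:
    print(f"{group_id}\t{doc_id}\t{score}")
```

The rows keep the order in which the groups arrive, so no client-side sorting is assumed here.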

Perform a grouped similarity search by using the vector associated with the primary key

curl -XPOST \
  -H 'dashvector-auth-token: YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "id": "1",
    "group_by_field": "document_id",
    "group_topk": 1,
    "group_count": 3,
    "include_vector": true
  }' https://YOUR_CLUSTER_ENDPOINT/v1/collections/group_by_demo/query_group_by

Perform a grouped similarity search by using the vector or primary key and a conditional filter

curl -XPOST \
  -H 'dashvector-auth-token: YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "vector": [0.1, 0.2, 0.3, 0.4],
    "filter": "chunk_id > 1",
    "group_by_field": "document_id",
    "group_topk": 1,
    "group_count": 3,
    "include_vector": true
  }' https://YOUR_CLUSTER_ENDPOINT/v1/collections/group_by_demo/query_group_by

Perform a grouped search by using both dense and sparse vectors

curl -XPOST \
  -H 'dashvector-auth-token: YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "vector": [0.1, 0.2, 0.3, 0.4],
    "sparse_vector":{"1":0.4, "10000":0.6, "222222":0.8},
    "group_by_field": "document_id",
    "group_topk": 1,
    "group_count": 3,
    "include_vector": true
  }' https://YOUR_CLUSTER_ENDPOINT/v1/collections/group_by_demo/query_group_by

Request parameters

Note

You must specify the vector or id parameter.

| Parameter | Location | Type | Required | Description |
| --- | --- | --- | --- | --- |
| {Endpoint} | path | str | Yes | The endpoint of the cluster. You can view the endpoint on the cluster details page in the console. |
| {CollectionName} | path | str | Yes | The name of the collection. |
| dashvector-auth-token | header | str | Yes | The API key. |
| group_by_field | body | str | Yes | The name of the field by which a grouped search is performed. Schema-free fields are not supported. |
| group_count | body | int | No | The maximum number of groups to be returned. This is a best-effort parameter. In general, the specified number of groups can be returned. |
| group_topk | body | int | No | The number of similar results to be returned per group. This is a best-effort parameter and has a lower priority than group_count. |
| vector | body | array | No | The vector. |
| sparse_vector | body | dict | No | The sparse vector. |
| id | body | str | No | The primary key. The similarity search is performed based on the vector associated with the primary key. |
| filter | body | str | No | The conditional filter, which must comply with the syntax of an SQL WHERE clause. For more information, see Conditional filtering. |
| include_vector | body | bool | No | Specifies whether to return the vector. Default value: false. |
| output_fields | body | array | No | The list of fields to be returned. All fields are returned by default. If the value is [], no fields are returned. |
| partition | body | str | No | The name of the partition. |
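
The body parameters above can be assembled and checked on the client before a request is sent. The Python sketch below is one possible way to do that. The helper build_group_query and its doc_id argument are hypothetical names. It only enforces that at least one of vector and id is present, because whether both may be sent together is not stated here.

```python
from typing import Optional

# Optional body parameters taken from the request parameters above.
OPTIONAL_BODY_PARAMS = {
    "group_count", "group_topk", "sparse_vector", "filter",
    "include_vector", "output_fields", "partition",
}


def build_group_query(group_by_field: str,
                      vector: Optional[list] = None,
                      doc_id: Optional[str] = None,
                      **optional) -> dict:
    """Build a query_group_by request body (hypothetical helper)."""
    if vector is None and doc_id is None:
        raise ValueError("specify the vector or id parameter")
    unknown = set(optional) - OPTIONAL_BODY_PARAMS
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    body = {"group_by_field": group_by_field}
    if vector is not None:
        body["vector"] = vector
    if doc_id is not None:
        body["id"] = doc_id  # the API names this parameter "id"
    body.update(optional)
    return body


body = build_group_query("document_id", doc_id="1", group_count=3, group_topk=1)
print(body)
```

Rejecting unknown keys locally is a design choice of this sketch, meant to catch typos such as group_top_k early. The service may treat unrecognized keys differently.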

Response parameters

| Parameter | Type | Description | Example |
| --- | --- | --- | --- |
| code | int | The returned status code. For more information, see Status codes. | 0 |
| message | str | The returned message. | success |
| request_id | str | The unique ID of the request. | 19215409-ea66-4db9-8764-26ce2eb5bb99 |
| output | array | Grouped similar results. For more information, see the "Group" section of the Data types topic. | |
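
A caller will typically check the status code before reading output. The Python sketch below shows one way to do that, assuming that a code of 0 signals success, as in the sample output, and that any other value signals a failure described in Status codes. The names DashVectorError and unwrap are hypothetical and are not part of any official SDK.

```python
class DashVectorError(RuntimeError):
    """Hypothetical error type for a failed response."""


def unwrap(response: dict) -> list:
    """Return the grouped results, or raise if the status code is not 0."""
    # Assumption: code 0 means success; see Status codes for other values.
    if response.get("code") != 0:
        raise DashVectorError(
            f"request {response.get('request_id')} failed with code "
            f"{response.get('code')}: {response.get('message')}"
        )
    return response.get("output", [])


groups = unwrap({
    "code": 0,
    "message": "Success",
    "request_id": "19215409-ea66-4db9-8764-26ce2eb5bb99",
    "output": [],
})
print(len(groups))
```

Including request_id in the error message keeps the identifier at hand when a failed request needs to be reported.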