Perform grouped similarity searches in a collection by using the HTTP API - DashVector

This topic describes how to perform grouped similarity searches in a collection by using the HTTP API.

Prerequisites

A cluster is created. For more information, see Create a cluster.
An API key is obtained. For more information, see Manage API keys.

Method and URL

HTTP

POST https://{Endpoint}/v1/collections/{CollectionName}/query_group_by
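The URL template and headers can also be assembled programmatically. The following is a minimal sketch that uses only the Python standard library; the helper name build_group_by_request is hypothetical, and the request is built but not sent.

```python
import json
import urllib.request


def build_group_by_request(endpoint, collection_name, api_key, body):
    """Build (but do not send) a POST request for the query_group_by URL."""
    url = f"https://{endpoint}/v1/collections/{collection_name}/query_group_by"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "dashvector-auth-token": api_key,  # the API key goes in this header
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_group_by_request(
    "YOUR_CLUSTER_ENDPOINT",
    "group_by_demo",
    "YOUR_API_KEY",
    {"vector": [0.1, 0.2, 0.3, 0.4], "group_by_field": "document_id"},
)
print(req.get_method(), req.full_url)
```

To send the request, pass req to urllib.request.urlopen after replacing the placeholders with real values.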

Example

Note

Before you run the sample code, replace YOUR_API_KEY with your API key and YOUR_CLUSTER_ENDPOINT with the endpoint of your cluster.
This example uses a collection named group_by_demo. For more information about this collection, see Grouped vector search.

Perform a grouped similarity search by using a vector

Shell

curl -XPOST \
  -H 'dashvector-auth-token: YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "vector": [0.1, 0.2, 0.3, 0.4],
    "group_by_field": "document_id",
    "group_topk": 1,
    "group_count": 3,
    "include_vector": true
  }' https://YOUR_CLUSTER_ENDPOINT/v1/collections/group_by_demo/query_group_by

The sample output is as follows:

{
    "code": 0,
    "request_id": "d6df634a-683d-445e-abe0-d547091d6b3a",
    "message": "Success",
    "output": [
        {
            "docs": [
                {
                    "id": "4",
                    "vector": [
                        0.621783971786499,
                        0.5220040082931519,
                        0.8403469920158386,
                        0.995602011680603
                    ],
                    "fields": {
                        "document_id": "paper-02",
                        "content": "xxxD",
                        "chunk_id": 2
                    },
                    "score": 0.028402328
                }
            ],
            "group_id": "paper-02"
        },
        {
            "docs": [
                {
                    "id": "1",
                    "vector": [
                        0.26870301365852356,
                        0.8718249797821045,
                        0.6066280007362366,
                        0.6342290043830872
                    ],
                    "fields": {
                        "document_id": "paper-01",
                        "content": "xxxA",
                        "chunk_id": 1
                    },
                    "score": 0.08141637
                }
            ],
            "group_id": "paper-01"
        },
        {
            "docs": [
                {
                    "id": "6",
                    "vector": [
                        0.661965012550354,
                        0.730430006980896,
                        0.6105219721794128,
                        0.22164000570774078
                    ],
                    "fields": {
                        "document_id": "paper-03",
                        "content": "xxxF",
                        "chunk_id": 1
                    },
                    "score": 0.2513085
                }
            ],
            "group_id": "paper-03"
        }
    ]
}
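A response of this shape can be walked group by group. The following sketch parses a trimmed copy of the sample output (vectors and fields are omitted for brevity) and collects the first document of each group. The helper name first_doc_per_group is hypothetical.

```python
import json

# Trimmed copy of the sample output; vectors and fields are omitted.
sample_response = """
{
  "code": 0,
  "request_id": "d6df634a-683d-445e-abe0-d547091d6b3a",
  "message": "Success",
  "output": [
    {"group_id": "paper-02", "docs": [{"id": "4", "score": 0.028402328}]},
    {"group_id": "paper-01", "docs": [{"id": "1", "score": 0.08141637}]},
    {"group_id": "paper-03", "docs": [{"id": "6", "score": 0.2513085}]}
  ]
}
"""


def first_doc_per_group(response):
    """Map each group_id to the (id, score) of the first doc in that group."""
    return {
        group["group_id"]: (group["docs"][0]["id"], group["docs"][0]["score"])
        for group in response["output"]
        if group["docs"]  # skip groups that contain no documents
    }


result = first_doc_per_group(json.loads(sample_response))
for group_id, (doc_id, score) in result.items():
    print(group_id, doc_id, score)
```

Because group_topk is 1 in the request, each group in the sample contains a single document.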

Perform a grouped similarity search by using the vector associated with the primary key

Shell

curl -XPOST \
  -H 'dashvector-auth-token: YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "id": "1",
    "group_by_field": "document_id",
    "group_topk": 1,
    "group_count": 3,
    "include_vector": true
  }' https://YOUR_CLUSTER_ENDPOINT/v1/collections/group_by_demo/query_group_by

Perform a grouped similarity search by using the vector or primary key and a conditional filter

Shell

curl -XPOST \
  -H 'dashvector-auth-token: YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "vector": [0.1, 0.2, 0.3, 0.4],
    "filter": "chunk_id > 1",
    "group_by_field": "document_id",
    "group_topk": 1,
    "group_count": 3,
    "include_vector": true
  }' https://YOUR_CLUSTER_ENDPOINT/v1/collections/group_by_demo/query_group_by

Perform a grouped similarity search by using both dense and sparse vectors

Shell

curl -XPOST \
  -H 'dashvector-auth-token: YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "vector": [0.1, 0.2, 0.3, 0.4],
    "sparse_vector":{"1":0.4, "10000":0.6, "222222":0.8},
    "group_by_field": "document_id",
    "group_topk": 1,
    "group_count": 3,
    "include_vector": true
  }' https://YOUR_CLUSTER_ENDPOINT/v1/collections/group_by_demo/query_group_by
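The keys of sparse_vector are quoted in the request because JSON object keys are always strings. If the sparse vector is held in a Python dict with integer keys (an assumption about how a client might store it), json.dumps converts the keys to strings automatically, as this sketch shows:

```python
import json

# Sparse vector held with integer keys on the client side (illustrative).
sparse = {1: 0.4, 10000: 0.6, 222222: 0.8}

body = {
    "vector": [0.1, 0.2, 0.3, 0.4],
    "sparse_vector": sparse,
    "group_by_field": "document_id",
}

# json.dumps turns the integer keys into the quoted keys used in the request.
payload = json.dumps(body)
print(payload)
```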

Request parameters

Note

You must specify the vector or id parameter.
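A client can check this requirement before sending a request. The following is a minimal sketch; the function name check_query_source is hypothetical, and the check is a client-side convenience rather than part of the API.

```python
def check_query_source(body):
    """Return the body unchanged, or raise if it has neither vector nor id."""
    if "vector" not in body and "id" not in body:
        raise ValueError("specify the vector or id parameter")
    return body


# Passes: the body carries a primary key.
ok_body = check_query_source({"id": "1", "group_by_field": "document_id"})

# Fails: a filter alone does not identify a query vector.
try:
    check_query_source({"filter": "chunk_id > 1", "group_by_field": "document_id"})
    rejected = False
except ValueError:
    rejected = True

print(ok_body, rejected)
```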

Parameter	Location	Type	Required	Description
{Endpoint}	path	str	Yes	The endpoint of the cluster. You can view the endpoint on the cluster details page in the console.
{CollectionName}	path	str	Yes	The name of the collection.
dashvector-auth-token	header	str	Yes	The API key.
group_by_field	body	str	Yes	The name of the field by which a grouped search is performed. Schema-free fields are not supported.
group_count	body	int	No	The maximum number of groups to be returned. This is a best-effort parameter: the specified number of groups is usually returned, but this is not guaranteed.
group_topk	body	int	No	The number of similar results to be returned per group. This is a best-effort parameter and has a lower priority than group_count.
vector	body	array	No	The vector.
sparse_vector	body	dict	No	The sparse vector.
id	body	str	No	The primary key. The similarity search is performed based on the vector associated with the primary key.
filter	body	str	No	The conditional filter, which must comply with the syntax of an SQL WHERE clause. For more information, see Conditional filtering.
include_vector	body	bool	No	Specifies whether to return the vector. Default value: false.
output_fields	body	array	No	The list of fields to be returned. All fields are returned by default. If the value is [], no fields are returned.
partition	body	str	No	The name of the partition.
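When a request body is assembled from optional parameters, unset parameters should be omitted so that the documented defaults apply. The sketch below uses a hypothetical helper named make_group_by_body. It compares values against None instead of testing truthiness, because an empty output_fields list ("no fields are returned") has a different meaning from an omitted one ("all fields are returned").

```python
import json

# Optional body parameters listed in the table above.
ALLOWED = {
    "vector", "sparse_vector", "id", "filter", "group_count",
    "group_topk", "include_vector", "output_fields", "partition",
}


def make_group_by_body(group_by_field, **optional):
    """Build a request body, omitting parameters that were left as None."""
    unknown = set(optional) - ALLOWED
    if unknown:
        raise TypeError(f"unknown parameters: {sorted(unknown)}")
    body = {"group_by_field": group_by_field}
    # Keep falsy but meaningful values such as [] and False.
    body.update({k: v for k, v in optional.items() if v is not None})
    return body


body = make_group_by_body(
    "document_id",
    vector=[0.1, 0.2, 0.3, 0.4],
    group_count=3,
    output_fields=[],   # sent: return no fields
    partition=None,     # omitted from the body
)
print(json.dumps(body))
```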

Response parameters

Parameter	Type	Description	Example
code	int	The returned status code. For more information, see Status codes.	0
message	str	The returned message.	Success
request_id	str	The unique ID of the request.	19215409-ea66-4db9-8764-26ce2eb5bb99
output	array	Grouped similar results. For more information, see the "Group" section of the Data types topic.
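A caller typically checks code before reading output. The following sketch assumes that code 0 indicates success, as in the sample output, and that any other value indicates a failure; see Status codes for the authoritative list. The function name unwrap_output is hypothetical.

```python
def unwrap_output(response):
    """Return the list of groups, or raise if the status code is not 0."""
    code = response.get("code")
    if code != 0:  # assumption: any non-zero code is a failure
        raise RuntimeError(
            f"request {response.get('request_id')} failed: "
            f"code={code}, message={response.get('message')}"
        )
    return response.get("output", [])


groups = unwrap_output({
    "code": 0,
    "message": "Success",
    "request_id": "19215409-ea66-4db9-8764-26ce2eb5bb99",
    "output": [{"group_id": "paper-01", "docs": []}],
})
print([g["group_id"] for g in groups])
```

Including request_id in the error message makes a failed request easier to trace.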