This topic describes how to perform grouped similarity searches in a collection by using the HTTP API.
Prerequisites
A cluster is created. For more information, see Create a cluster.
An API key is obtained. For more information, see Manage API keys.
Method and URL
POST https://{Endpoint}/v1/collections/{CollectionName}/query_group_by
Example
Before you run the sample code, replace YOUR_API_KEY with your API key and YOUR_CLUSTER_ENDPOINT with the endpoint of your cluster.
The following examples use a collection named group_by_demo. For more information about this collection, see Grouped vector search.
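If you run several of the following requests, it can be convenient to keep the API key and endpoint in shell variables and send each request body through a small wrapper. The sketch below assumes the hypothetical variable names DASHVECTOR_API_KEY and DASHVECTOR_ENDPOINT and a hypothetical function name query_group_by. It reads the JSON body from standard input with curl's --data @- option, which also avoids quoting problems inside -d '...'.

```shell
# Hypothetical wrapper: POST a JSON body (read from stdin) to the query_group_by
# endpoint of the given collection.
# Assumes DASHVECTOR_API_KEY and DASHVECTOR_ENDPOINT (host only, without https://)
# are set; both names are illustrative. The shell stops with an error if either is unset.
query_group_by() {
  collection=$1
  curl -sS -XPOST \
    -H "dashvector-auth-token: ${DASHVECTOR_API_KEY:?set DASHVECTOR_API_KEY}" \
    -H 'Content-Type: application/json' \
    --data @- \
    "https://${DASHVECTOR_ENDPOINT:?set DASHVECTOR_ENDPOINT}/v1/collections/${collection}/query_group_by"
}

# Example (not executed here):
#   DASHVECTOR_API_KEY=YOUR_API_KEY
#   DASHVECTOR_ENDPOINT=YOUR_CLUSTER_ENDPOINT
#   query_group_by group_by_demo <<'EOF'
#   {"vector": [0.1, 0.2, 0.3, 0.4], "group_by_field": "document_id", "group_topk": 1, "group_count": 3}
#   EOF
```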
Perform a grouped similarity search by using a vector
curl -XPOST \
-H 'dashvector-auth-token: YOUR_API_KEY' \
-H 'Content-Type: application/json' \
-d '{
"vector": [0.1, 0.2, 0.3, 0.4],
"group_by_field": "document_id",
"group_topk": 1,
"group_count": 3,
"include_vector": true
}' https://YOUR_CLUSTER_ENDPOINT/v1/collections/group_by_demo/query_group_by
The sample output is as follows:
{
"code": 0,
"request_id": "d6df634a-683d-445e-abe0-d547091d6b3a",
"message": "Success",
"output": [
{
"docs": [
{
"id": "4",
"vector": [
0.621783971786499,
0.5220040082931519,
0.8403469920158386,
0.995602011680603
],
"fields": {
"document_id": "paper-02",
"content": "xxxD",
"chunk_id": 2
},
"score": 0.028402328
}
],
"group_id": "paper-02"
},
{
"docs": [
{
"id": "1",
"vector": [
0.26870301365852356,
0.8718249797821045,
0.6066280007362366,
0.6342290043830872
],
"fields": {
"document_id": "paper-01",
"content": "xxxA",
"chunk_id": 1
},
"score": 0.08141637
}
],
"group_id": "paper-01"
},
{
"docs": [
{
"id": "6",
"vector": [
0.661965012550354,
0.730430006980896,
0.6105219721794128,
0.22164000570774078
],
"fields": {
"document_id": "paper-03",
"content": "xxxF",
"chunk_id": 1
},
"score": 0.2513085
}
],
"group_id": "paper-03"
}
]
}
Perform a grouped similarity search by using the vector associated with the primary key
curl -XPOST \
-H 'dashvector-auth-token: YOUR_API_KEY' \
-H 'Content-Type: application/json' \
-d '{
"id": "1",
"group_by_field": "document_id",
"group_topk": 1,
"group_count": 3,
"include_vector": true
}' https://YOUR_CLUSTER_ENDPOINT/v1/collections/group_by_demo/query_group_by
Perform a grouped similarity search by using the vector or primary key and a conditional filter
curl -XPOST \
-H 'dashvector-auth-token: YOUR_API_KEY' \
-H 'Content-Type: application/json' \
-d '{
"vector": [0.1, 0.2, 0.3, 0.4],
"filter": "chunk_id > 1",
"group_by_field": "document_id",
"group_topk": 1,
"group_count": 3,
"include_vector": true
}' https://YOUR_CLUSTER_ENDPOINT/v1/collections/group_by_demo/query_group_by
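Because the body is wrapped in single quotes by -d '...', a filter that compares a string field cannot contain a literal single quote without awkward escaping. One workaround, sketched below, is to write the body in a quoted heredoc and pass it to curl with --data @-. This assumes that string literals in filters are single-quoted and that or is supported, as in an SQL WHERE clause; see Conditional filtering for the exact syntax. The function name send_filtered_query is hypothetical.

```shell
# Build the request body in a quoted heredoc ('EOF'): the shell performs no
# expansion, so the single quotes inside the filter are kept literally.
# Assumption: string literals in filters use SQL-style single quotes.
body=$(cat <<'EOF'
{
  "vector": [0.1, 0.2, 0.3, 0.4],
  "filter": "document_id = 'paper-01' or document_id = 'paper-02'",
  "group_by_field": "document_id",
  "group_topk": 1,
  "group_count": 3
}
EOF
)

# Hypothetical helper: send the body from stdin.
# Replace the placeholders as in the examples above before calling it.
send_filtered_query() {
  printf '%s' "$body" | curl -XPOST \
    -H 'dashvector-auth-token: YOUR_API_KEY' \
    -H 'Content-Type: application/json' \
    --data @- \
    https://YOUR_CLUSTER_ENDPOINT/v1/collections/group_by_demo/query_group_by
}
```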
Perform a grouped search by using both dense and sparse vectors
curl -XPOST \
-H 'dashvector-auth-token: YOUR_API_KEY' \
-H 'Content-Type: application/json' \
-d '{
"vector": [0.1, 0.2, 0.3, 0.4],
"sparse_vector":{"1":0.4, "10000":0.6, "222222":0.8},
"group_by_field": "document_id",
"group_topk": 1,
"group_count": 3,
"include_vector": true
}' https://YOUR_CLUSTER_ENDPOINT/v1/collections/group_by_demo/query_group_by
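In the sample above, sparse_vector maps each non-zero dimension index, written as a JSON string key, to its weight. If the indices and weights come from another tool, a small helper such as the hypothetical sparse_vector_json below can assemble that dictionary. It is a sketch that performs no validation and assumes each argument has the form index:weight with a numeric weight.

```shell
# Hypothetical helper: turn index:weight pairs into a sparse_vector JSON object.
# Example: sparse_vector_json 1:0.4 10000:0.6 222222:0.8
#   prints {"1":0.4,"10000":0.6,"222222":0.8}
sparse_vector_json() {
  out='{'
  sep=''
  for pair in "$@"; do
    index=${pair%%:*}   # text before the first colon
    weight=${pair#*:}   # text after the first colon
    out="${out}${sep}\"${index}\":${weight}"
    sep=','
  done
  printf '%s}\n' "$out"
}
```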
Request parameters
You must specify the vector or id parameter.
| Parameter | Location | Type | Required | Description |
| --- | --- | --- | --- | --- |
| {Endpoint} | path | str | Yes | The endpoint of the cluster. You can view the endpoint on the cluster details page in the console. |
| {CollectionName} | path | str | Yes | The name of the collection. |
| dashvector-auth-token | header | str | Yes | The API key. |
| group_by_field | body | str | Yes | The name of the field by which results are grouped. Schema-free fields are not supported. |
| group_count | body | int | No | The maximum number of groups to return. This is a best-effort parameter; in most cases, the specified number of groups is returned. |
| group_topk | body | int | No | The number of similar results to return for each group. This is a best-effort parameter and has a lower priority than group_count. |
| vector | body | array | No | The query vector. |
| sparse_vector | body | dict | No | The sparse vector, which maps dimension indices to weights. |
| id | body | str | No | The primary key. The similarity search is performed by using the vector associated with this primary key. |
| filter | body | str | No | The conditional filter, which must follow the syntax of an SQL WHERE clause. For more information, see Conditional filtering. |
| include_vector | body | bool | No | Specifies whether to return vectors in the results. Default value: false. |
| output_fields | body | array | No | The fields to return. By default, all fields are returned. If the value is [], no fields are returned. |
| partition | body | str | No | The name of the partition to search. |
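The requirement that a request carry vector or id, together with the optional parameters above, can be checked before a request is sent. The hypothetical helper build_group_by_body below is a sketch that covers only group_by_field, vector, id, group_topk, and group_count: it rejects a request without group_by_field or without both vector and id, and omits any optional parameter that is left empty. Values are inserted without JSON escaping.

```shell
# Hypothetical helper: print a compact query_group_by request body.
# Arguments: FIELD VECTOR_JSON ID [GROUP_TOPK] [GROUP_COUNT]
# Pass '' for VECTOR_JSON or ID to leave it out; at least one must be set.
build_group_by_body() {
  field=$1 vector=$2 id=$3 topk=${4:-} count=${5:-}
  if [ -z "$field" ]; then
    echo "error: group_by_field is required" >&2
    return 1
  fi
  if [ -z "$vector" ] && [ -z "$id" ]; then
    echo "error: specify vector or id" >&2
    return 1
  fi
  body="{\"group_by_field\":\"$field\""
  # Optional parameters are added only when a value is supplied.
  if [ -n "$vector" ]; then body="$body,\"vector\":$vector"; fi
  if [ -n "$id" ]; then body="$body,\"id\":\"$id\""; fi
  if [ -n "$topk" ]; then body="$body,\"group_topk\":$topk"; fi
  if [ -n "$count" ]; then body="$body,\"group_count\":$count"; fi
  printf '%s}\n' "$body"
}
```

For example, build_group_by_body document_id '' 1 produces a body that searches by the vector of primary key 1, the same request as the primary-key example but without the optional parameters.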
Response parameters
| Parameter | Type | Description | Example |
| --- | --- | --- | --- |
| code | int | The returned status code. For more information, see Status codes. | 0 |
| message | str | The returned message. | success |
| request_id | str | The unique ID of the request. | 19215409-ea66-4db9-8764-26ce2eb5bb99 |
| output | array | Grouped similar results. For more information, see the "Group" section of the Data types topic. | |
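A client typically checks that code is 0 before reading output. The hypothetical list_group_ids helper below sketches this with grep and sed: it fails when code is not 0 and otherwise prints the group_id of each group in the order returned. It is a text-matching sketch rather than a JSON parser, so it assumes group IDs contain no escaped quotes and relies on grep -o, which GNU and BSD grep provide but POSIX does not require; a JSON tool such as jq is more robust.

```shell
# Hypothetical helper: read a query_group_by response from stdin, fail unless
# code is 0, and print one group_id per line in the order returned.
# Text-matching sketch: assumes group IDs contain no escaped double quotes.
list_group_ids() {
  response=$(cat)
  # Accept both compact ("code":0) and pretty-printed ("code": 0) JSON.
  if ! printf '%s\n' "$response" | grep -Eq '"code": *0([^0-9]|$)'; then
    echo "request failed: code is not 0" >&2
    return 1
  fi
  # Extract each "group_id": "..." pair, then keep only the quoted value.
  printf '%s\n' "$response" \
    | grep -o '"group_id": *"[^"]*"' \
    | sed 's/.*"\([^"]*\)"$/\1/'
}
```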