Perform a grouped similarity search through the HTTP API (DashVector vector retrieval service, Alibaba Cloud)

This topic describes how to perform a grouped similarity search in a Collection through the HTTP API.

Prerequisites

A Cluster has been created. See Create a Cluster.
An API-KEY has been obtained. See API-KEY management.

Method and URL

HTTP

POST https://{Endpoint}/v1/collections/{CollectionName}/query_group_by
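
As a minimal sketch of how the two path placeholders combine, the helper below builds the request URL from a Cluster Endpoint and a Collection name. The function name `query_group_by_url` is hypothetical and not part of DashVector; only the URL pattern comes from this page.

```shell
#!/bin/sh
# Hypothetical helper (not a DashVector tool): prints the query_group_by URL.
#   $1: Cluster Endpoint   $2: Collection name
query_group_by_url() {
  printf 'https://%s/v1/collections/%s/query_group_by\n' "$1" "$2"
}

query_group_by_url "YOUR_CLUSTER_ENDPOINT" "group_by_demo"
```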

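Examples

Note

Replace YOUR_API_KEY in the examples with your api-key and YOUR_CLUSTER_ENDPOINT with your Cluster Endpoint; otherwise the code will not run.
These examples assume that a Collection named group_by_demo has already been created and populated with some data, as described in the grouped vector search topic.

Grouped similarity search by vector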

Shell

curl -XPOST \
  -H 'dashvector-auth-token: YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "vector": [0.1, 0.2, 0.3, 0.4],
    "group_by_field": "document_id",
    "group_topk": 1,
    "group_count": 3,
    "include_vector": true
  }' https://YOUR_CLUSTER_ENDPOINT/v1/collections/group_by_demo/query_group_by

Sample output

{
    "code": 0,
    "request_id": "d6df634a-683d-445e-abe0-d547091d6b3a",
    "message": "Success",
    "output": [
        {
            "docs": [
                {
                    "id": "4",
                    "vector": [
                        0.621783971786499,
                        0.5220040082931519,
                        0.8403469920158386,
                        0.995602011680603
                    ],
                    "fields": {
                        "document_id": "paper-02",
                        "content": "xxxD",
                        "chunk_id": 2
                    },
                    "score": 0.028402328
                }
            ],
            "group_id": "paper-02"
        },
        {
            "docs": [
                {
                    "id": "1",
                    "vector": [
                        0.26870301365852356,
                        0.8718249797821045,
                        0.6066280007362366,
                        0.6342290043830872
                    ],
                    "fields": {
                        "document_id": "paper-01",
                        "content": "xxxA",
                        "chunk_id": 1
                    },
                    "score": 0.08141637
                }
            ],
            "group_id": "paper-01"
        },
        {
            "docs": [
                {
                    "id": "6",
                    "vector": [
                        0.661965012550354,
                        0.730430006980896,
                        0.6105219721794128,
                        0.22164000570774078
                    ],
                    "fields": {
                        "document_id": "paper-03",
                        "content": "xxxF",
                        "chunk_id": 1
                    },
                    "score": 0.2513085
                }
            ],
            "group_id": "paper-03"
        }
    ]
}
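
Each element of `output` is one group, identified by `group_id`. The sketch below lists the group IDs from a saved response using only `grep` and `sed`. The helper name `list_group_ids` is hypothetical, the response is an abridged copy of the sample with the `docs` arrays emptied, and `grep -o` is assumed to be available (it is in GNU and BSD grep).

```shell
#!/bin/sh
# list_group_ids: hypothetical helper. Reads a query_group_by response on
# stdin and prints one group_id per line, in response order.
# This is plain text matching, not real JSON parsing.
list_group_ids() {
  grep -o '"group_id": *"[^"]*"' | sed 's/.*: *"\(.*\)"/\1/'
}

# Abridged sample response (docs arrays emptied for brevity).
response='{
  "code": 0,
  "message": "Success",
  "output": [
    {"docs": [], "group_id": "paper-02"},
    {"docs": [], "group_id": "paper-01"},
    {"docs": [], "group_id": "paper-03"}
  ]
}'

printf '%s\n' "$response" | list_group_ids
```

Text matching like this is fragile if field values contain quotes or colons; where `jq` is installed, `jq -r '.output[].group_id'` is the more robust choice.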

Grouped similarity search by primary key (using the vector stored under it)

Shell

curl -XPOST \
  -H 'dashvector-auth-token: YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "id": "1",
    "group_by_field": "document_id",
    "group_topk": 1,
    "group_count": 3,
    "include_vector": true
  }' https://YOUR_CLUSTER_ENDPOINT/v1/collections/group_by_demo/query_group_by

Grouped similarity search with a filter condition

Shell

curl -XPOST \
  -H 'dashvector-auth-token: YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "vector": [0.1, 0.2, 0.3, 0.4],
    "filter": "chunk_id > 1",
    "group_by_field": "document_id",
    "group_topk": 1,
    "group_count": 3,
    "include_vector": true
  }' https://YOUR_CLUSTER_ENDPOINT/v1/collections/group_by_demo/query_group_by

Grouped vector search with a Sparse Vector

Shell

curl -XPOST \
  -H 'dashvector-auth-token: YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "vector": [0.1, 0.2, 0.3, 0.4],
    "sparse_vector":{"1":0.4, "10000":0.6, "222222":0.8},
    "group_by_field": "document_id",
    "group_topk": 1,
    "group_count": 3,
    "include_vector": true
  }' https://YOUR_CLUSTER_ENDPOINT/v1/collections/group_by_demo/query_group_by

Request parameters

Note

Specify exactly one of the vector and id parameters; one of them must be non-empty.
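
The rule above can be checked on the client before a request is sent. The following is a minimal sketch; `check_query_target` is a hypothetical helper, not a DashVector API, and it only tests whether each argument is empty.

```shell
#!/bin/sh
# check_query_target: hypothetical client-side check.
# Succeeds only when exactly one of the two arguments is non-empty.
#   $1: vector as a JSON array string, or ""
#   $2: primary key (id), or ""
check_query_target() {
  if [ -n "$1" ] && [ -n "$2" ]; then
    echo "error: pass either vector or id, not both" >&2
    return 1
  fi
  if [ -z "$1" ] && [ -z "$2" ]; then
    echo "error: one of vector or id is required" >&2
    return 1
  fi
  return 0
}

check_query_target '[0.1, 0.2, 0.3, 0.4]' '' && echo "vector query: ok"
check_query_target '' '1' && echo "id query: ok"
check_query_target '' '' || echo "rejected: neither given"
```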

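Parameter	Location	Type	Required	Description
{Endpoint}	path	str	Yes	Endpoint of the Cluster; shown on the Cluster details page in the console
{CollectionName}	path	str	Yes	Collection name
dashvector-auth-token	header	str	Yes	api-key
group_by_field	body	str	Yes	Field whose values are used to group the search results; schema-free fields are not currently supported
group_count	body	int	No	Maximum number of groups to return. Best-effort parameter: group_count groups are usually returned.
group_topk	body	int	No	Number of similar results to return per group. Best-effort parameter, with lower priority than group_count.
vector	body	array	No	Vector data
sparse_vector	body	dict	No	Sparse vector
id	body	str	No	Primary key; the search uses the vector stored under this primary key
filter	body	str	No	Filter condition; must follow SQL WHERE clause syntax
include_vector	body	bool	No	Whether to return vector data; default false
output_fields	body	array	No	List of field names to return; all fields are returned by default
partition	body	str	No	Partition name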
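
To show how several body parameters from the table fit together, the sketch below assembles a request body that also sets `output_fields`, prints it, and sends it only when credentials are provided. The helper `build_body` and the environment variable names `DASHVECTOR_API_KEY` and `DASHVECTOR_ENDPOINT` are assumptions made for this example; the field names `content` and `chunk_id` come from the sample output on this page.

```shell
#!/bin/sh
# build_body: hypothetical helper that assembles a request body.
# Values are inserted verbatim, so the vector and the field list must
# already be valid JSON fragments.
#   $1: vector (JSON array)   $2: group_by_field   $3: group_count
#   $4: group_topk            $5: output_fields (JSON array)
build_body() {
  printf '{"vector": %s, "group_by_field": "%s", "group_count": %s, "group_topk": %s, "output_fields": %s}\n' \
    "$1" "$2" "$3" "$4" "$5"
}

body=$(build_body '[0.1, 0.2, 0.3, 0.4]' 'document_id' 3 1 '["content", "chunk_id"]')
printf '%s\n' "$body"

# Send the request only when both (assumed) environment variables are set.
if [ -n "${DASHVECTOR_API_KEY:-}" ] && [ -n "${DASHVECTOR_ENDPOINT:-}" ]; then
  curl -XPOST \
    -H "dashvector-auth-token: ${DASHVECTOR_API_KEY}" \
    -H 'Content-Type: application/json' \
    -d "$body" \
    "https://${DASHVECTOR_ENDPOINT}/v1/collections/group_by_demo/query_group_by"
fi
```

Keeping the body in a variable makes it easy to inspect the exact JSON before any network call is made.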

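Response parameters

Field	Type	Description	Example
code	int	Return code; see the status code reference	0
message	str	Return message	Success
request_id	str	Unique request ID	19215409-ea66-4db9-8764-26ce2eb5bb99
output	array	Grouped similarity search results, as a list of Groups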