Filtered search with the DashVector vector retrieval service

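Background

In most business scenarios, vector similarity search alone cannot meet the requirements. The search usually has to be restricted to items that satisfy specific filter conditions or carry specific "tags".

DashVector supports combining conditional filtering with vector similarity search: it performs efficient vector retrieval while strictly satisfying the filter conditions.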

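Filtered search examples

Note

Replace YOUR_API_KEY in the examples with your api-key and YOUR_CLUSTER_ENDPOINT with your Cluster Endpoint; otherwise the code will not run.
These examples require a Collection named quickstart, created in advance as described in the Create Collection usage example.

Insert data with Fields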

Python

import dashvector
import numpy as np

client = dashvector.Client(
    api_key='YOUR_API_KEY',
    endpoint='YOUR_CLUSTER_ENDPOINT'
)
collection = client.get(name='quickstart')

# Each Doc is a tuple of (id, vector, fields); the vectors here are 4-dimensional
ret = collection.insert([
    ('1', np.random.rand(4), {'name':'zhangsan', 'age': 10, 'male': True, 'weight': 35.0}),
    ('2', np.random.rand(4), {'name':'lisi', 'age': 20, 'male': False, 'weight': 45.0}),
    ('3', np.random.rand(4), {'name':'wangwu', 'age': 30, 'male': True, 'weight': 75.0}),
    ('4', np.random.rand(4), {'name':'zhaoliu', 'age': 5, 'male': False, 'weight': 18.0}),
    ('5', np.random.rand(4), {'name':'sunqi', 'age': 40, 'male': True, 'weight': 70.0})
])
assert ret

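Note

In the Create Collection usage example, a Collection named quickstart was created with 3 Fields defined ({'name': str, 'weight': float, 'age': int}). Because DashVector is Schema Free, a Doc can be inserted with Fields that were not defined when the Collection was created, such as the male Field in the example above.

Run a filtered search with filter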

Python

import dashvector

client = dashvector.Client(
    api_key='YOUR_API_KEY',
    endpoint='YOUR_CLUSTER_ENDPOINT'
)
collection = client.get(name='quickstart')

# Require males (male = true) older than 18 (age > 18) who weigh more than 65.0 (weight > 65.0)
docs = collection.query(
    [0.1, 0.1, 0.1, 0.1],
    topk=10,
    filter='age > 18 and weight > 65.0 and male = true'
)
print(docs)
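
To make the filter concrete, the following sketch applies the same three conditions to the five sample Docs in plain Python, without calling DashVector. It only illustrates which Docs satisfy the filter. `SAMPLE_DOCS` and `matches` are names made up for this illustration, and the ordering and scores of a real query depend on vector similarity, which is not modelled here.

```python
# Local illustration only: no DashVector call is made.
# Field values are copied from the insert example in this document.
SAMPLE_DOCS = {
    '1': {'name': 'zhangsan', 'age': 10, 'male': True, 'weight': 35.0},
    '2': {'name': 'lisi', 'age': 20, 'male': False, 'weight': 45.0},
    '3': {'name': 'wangwu', 'age': 30, 'male': True, 'weight': 75.0},
    '4': {'name': 'zhaoliu', 'age': 5, 'male': False, 'weight': 18.0},
    '5': {'name': 'sunqi', 'age': 40, 'male': True, 'weight': 70.0},
}


def matches(fields):
    """Plain-Python equivalent of: age > 18 and weight > 65.0 and male = true"""
    return fields['age'] > 18 and fields['weight'] > 65.0 and fields['male'] is True


matching_ids = [doc_id for doc_id, fields in SAMPLE_DOCS.items() if matches(fields)]
print(matching_ids)  # ['3', '5']
```

Only Docs 3 and 5 pass all three conditions, so the query above should return at most these two Docs even though topk is 10.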

Data types supported by DashVector

DashVector currently supports the following basic Python data types:

str
float
int
bool
long
list[int]
list[float]
list[str]
list[long]

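Important

Python's int type can represent integers of arbitrary size, but a DashVector int currently supports only 32-bit integers, in the range -2,147,483,648 to 2,147,483,647 (that is, -2^31 to 2^31-1). You must ensure that your data does not overflow. long supports 64-bit integers, in the range -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 (that is, -2^63 to 2^63-1).
list[int], list[float], list[str], and list[long] do not support Schema Free. To use them, predefine the Field when creating the Collection.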
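
Because the overflow check is left to the caller, a small guard before inserting can help. The sketch below checks a Python integer against the two ranges stated above. The helper names `fits_int32` and `fits_int64` are hypothetical and are not part of the DashVector SDK.

```python
# Ranges taken from the note above; the helper names are made up for illustration.
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1
INT64_MIN, INT64_MAX = -2**63, 2**63 - 1


def _is_plain_int(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def fits_int32(value):
    """Return True if value is an integer within the 32-bit signed range."""
    return _is_plain_int(value) and INT32_MIN <= value <= INT32_MAX


def fits_int64(value):
    """Return True if value is an integer within the 64-bit signed range."""
    return _is_plain_int(value) and INT64_MIN <= value <= INT64_MAX


print(fits_int32(2_147_483_647))  # True
print(fits_int32(2_147_483_648))  # False: needs a long Field
print(fits_int64(2**63))          # False: does not fit even in long
```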

Note

The data types supported by the Java SDK and the HTTP API are: str, float, int, long, ARRAY_STRING, ARRAY_INT, ARRAY_FLOAT, ARRAY_LONG.

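Comparison operators

A comparison expression is formed by combining a Field, a comparison operator, and a constant, in that order. Descriptions and examples follow:

Operator	Description	Supported data types	Example expressions	Explanation
<	Less than	int, float, long	age < 10; weight < 60.0; total < 100	`True` if age is less than 10; `True` if weight is less than 60.0; `True` if total is less than 100
<=	Less than or equal to	int, float, long	age <= 10; weight <= 60.0; total <= 100	`True` if age is less than or equal to 10; `True` if weight is less than or equal to 60.0; `True` if total is less than or equal to 100
=	Equal to	int, float, bool, str, long	age = 10; weight = 60.0; male = true; name = 'lisi'; total = 100	`True` if age equals 10; `True` if weight equals 60.0; `True` if male equals true; `True` if name equals lisi; `True` if total equals 100
!=	Not equal to	int, float, bool, str, long	age != 10; weight != 60.0; male != true; name != 'lisi'; total != 100	`True` if age does not equal 10; `True` if weight does not equal 60.0; `True` if male does not equal true; `True` if name does not equal lisi; `True` if total does not equal 100
>=	Greater than or equal to	int, float, long	age >= 10; weight >= 60.0; total >= 100	`True` if age is greater than or equal to 10; `True` if weight is greater than or equal to 60.0; `True` if total is greater than or equal to 100
>	Greater than	int, float, long	age > 10; weight > 60.0; total > 100	`True` if age is greater than 10; `True` if weight is greater than 60.0; `True` if total is greater than 100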
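
When filter strings are built from Python values, the literal has to be written the way the table shows it: lowercase true for booleans and single quotes around strings. The sketch below is one possible way to do that. `to_literal` and `compare` are hypothetical helpers, not SDK functions. Writing `false` for Python's False is an assumption by symmetry with `true`, and because the escaping rules for quotes inside strings are not described here, such values are rejected rather than guessed at.

```python
ORDERING_OPS = {'<', '<=', '>=', '>'}
EQUALITY_OPS = {'=', '!='}


def to_literal(value):
    """Render a Python value as a filter constant (hypothetical helper)."""
    # bool must be checked before int because bool is a subclass of int.
    if isinstance(value, bool):
        return 'true' if value else 'false'  # 'false' is an assumption
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        if "'" in value:
            raise ValueError('quote escaping is not documented; refusing to guess')
        return f"'{value}'"
    raise TypeError(f'unsupported constant type: {type(value).__name__}')


def compare(field, op, value):
    """Build '<field> <op> <constant>' after checking the operator against the table."""
    if op not in ORDERING_OPS | EQUALITY_OPS:
        raise ValueError(f'unsupported operator: {op}')
    # Per the table, <, <=, >=, > apply only to int, float, and long.
    if op in ORDERING_OPS and isinstance(value, (bool, str)):
        raise TypeError(f'operator {op} does not support bool or str')
    return f'{field} {op} {to_literal(value)}'


print(compare('age', '>', 18))         # age > 18
print(compare('male', '=', True))      # male = true
print(compare('name', '!=', 'lisi'))   # name != 'lisi'
```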

Membership operators

A membership expression is formed by combining a Field, a membership operator, and a list of constants, in that order. Descriptions and examples follow:

Operator	Description	Supported data types	Example expressions	Explanation
in	Is one of	int, float, str, long	age in (10,20); float in (89.5,90.5); name in ("lisi","zhangsan"); total in (100,200)	`True` if age is 10 or 20; `True` if float is 89.5 or 90.5; `True` if name is lisi or zhangsan; `True` if total is 100 or 200
not in	Is not one of	int, float, str, long	age not in (10,20); float not in (89.5,90.5); name not in ("lisi","zhangsan"); total not in (100,200)	`True` if age is neither 10 nor 20; `True` if float is neither 89.5 nor 90.5; `True` if name is neither lisi nor zhangsan; `True` if total is neither 100 nor 200
contain_all	Contains all	list[int], list[float], list[str], list[long]	tags contain_all (10,20); tags contain_all (89.5,90.5); tags contain_all ("lisi","zhangsan"); tags contain_all (100,200)	`True` if tags contains both 10 and 20; `True` if tags contains both 89.5 and 90.5; `True` if tags contains both lisi and zhangsan; `True` if tags contains both 100 and 200
not contain_all	Does not contain all	list[int], list[float], list[str], list[long]	tags not contain_all (10,20); tags not contain_all (89.5,90.5); tags not contain_all ("lisi","zhangsan"); tags not contain_all (100,200)	`True` unless tags contains both 10 and 20; `True` unless tags contains both 89.5 and 90.5; `True` unless tags contains both lisi and zhangsan; `True` unless tags contains both 100 and 200
contain_any	Contains any	list[int], list[float], list[str], list[long]	tags contain_any (10,20); tags contain_any (89.5,90.5); tags contain_any ("lisi","zhangsan"); tags contain_any (100,200)	`True` if tags contains 10 or 20; `True` if tags contains 89.5 or 90.5; `True` if tags contains lisi or zhangsan; `True` if tags contains 100 or 200
not contain_any	Contains none	list[int], list[float], list[str], list[long]	tags not contain_any (10,20); tags not contain_any (89.5,90.5); tags not contain_any ("lisi","zhangsan"); tags not contain_any (100,200)	`True` if tags contains neither 10 nor 20; `True` if tags contains neither 89.5 nor 90.5; `True` if tags contains neither lisi nor zhangsan; `True` if tags contains neither 100 nor 200
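
The difference between the scalar operators (in, not in) and the list operators (contain_all, contain_any) may be easier to see in plain Python. The sketch below mirrors the intended semantics with Python sets. The function names are made up for this illustration, and nothing here calls DashVector.

```python
def contain_all(tags, values):
    """True if every value appears in tags (mirrors: tags contain_all (...))."""
    return set(values).issubset(tags)


def contain_any(tags, values):
    """True if at least one value appears in tags (mirrors: tags contain_any (...))."""
    return not set(values).isdisjoint(tags)


age = 10           # a scalar Field value
tags = [10, 30]    # a list Field value

print(age in (10, 20))                 # True:  in tests a scalar against candidates
print(contain_all(tags, (10, 20)))     # False: 20 is missing from tags
print(not contain_all(tags, (10, 20))) # True:  the not contain_all case
print(contain_any(tags, (10, 20)))     # True:  10 is present in tags
print(not contain_any(tags, (10, 20))) # False: the not contain_any case
```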

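String operators

A matching expression is formed by combining a Field, a string operator, and a constant, in that order. Descriptions and examples follow:

Operator	Description	Supported data types	Example expression	Explanation
like	Prefix match	str	name like 'li%'	`True` if name starts with li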
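
As a rough local analogue, prefix matching behaves like Python's `str.startswith`. The sketch below applies it to the names from the insert example. `like_prefix` is a hypothetical helper that accepts only patterns with a single trailing %, since prefix matching is the only form described here. Case-sensitive matching is an assumption.

```python
def like_prefix(value, pattern):
    """Local stand-in for: <field> like '<prefix>%' (prefix patterns only)."""
    if not pattern.endswith('%') or '%' in pattern[:-1]:
        raise ValueError("only prefix patterns such as 'li%' are covered here")
    # Assumption: matching is case-sensitive, as str.startswith is.
    return value.startswith(pattern[:-1])


names = ['zhangsan', 'lisi', 'wangwu', 'zhaoliu', 'sunqi']

li_matches = [n for n in names if like_prefix(n, 'li%')]
zha_matches = [n for n in names if like_prefix(n, 'zha%')]
print(li_matches)   # ['lisi']
print(zha_matches)  # ['zhangsan', 'zhaoliu']
```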

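Logical operators

Logical operators combine multiple expressions.

Operator	Description	Example	Explanation
and	And	expr1 and expr2	`True` only when both expr1 and expr2 are `True`; otherwise `False`
or	Or	expr1 or expr2	`False` only when both expr1 and expr2 are `False`; otherwise `True`

Note

Parentheses () can be used to group logical operators. Expressions in () take precedence: in expr1 and (expr2 or expr3), (expr2 or expr3) is evaluated first.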
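
When a filter is assembled from several conditions, making the grouping explicit avoids depending on operator precedence. The following is a minimal sketch of string helpers for that purpose. `and_`, `or_`, and `group` are hypothetical names and are not part of the DashVector SDK.

```python
def and_(*exprs):
    """Join expressions with the and operator."""
    return ' and '.join(exprs)


def or_(*exprs):
    """Join expressions with the or operator."""
    return ' or '.join(exprs)


def group(expr):
    """Wrap an expression in parentheses so it is evaluated first."""
    return f'({expr})'


# Same shape as: expr1 and (expr2 or expr3)
grouped = and_('male = true', group(or_('age < 18', 'weight < 50.0')))
print(grouped)  # male = true and (age < 18 or weight < 50.0)
```

The resulting string can be passed as the filter argument of collection.query. Against the five sample Docs inserted earlier, this filter should match only Doc 1, the one male Doc that is either under 18 or lighter than 50.0.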