Collection.query() searches a DashVector collection for documents similar to a given vector or a stored document's vector. You can also retrieve documents by metadata filter alone.
Query modes
Collection.query() supports five query modes depending on the parameters you provide:
| Mode | Required parameters | Description |
|---|---|---|
| Vector search | vector | Find documents closest to a given dense vector |
| Primary key search | id | Find documents closest to the vector of an existing document |
| Filtered vector search | vector or id + filter | Combine similarity search with metadata filtering |
| Hybrid search | vector + sparse_vector | Combine dense and sparse vectors for keyword-aware semantic search |
| Match query | filter only | Retrieve documents by metadata filter without similarity ranking |
If neither vector nor id is specified, query() performs a match query using only the conditional filter.
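The mapping from arguments to query mode in the table above can be sketched as a small function. This is only an illustration: query_mode is a hypothetical helper, not part of the DashVector SDK, and the rule that vector and id are alternatives is an assumption of this sketch.

```python
from typing import Dict, List, Optional


def query_mode(
    vector: Optional[List[float]] = None,
    id: Optional[str] = None,          # name mirrors the query() parameter
    filter: Optional[str] = None,      # name mirrors the query() parameter
    sparse_vector: Optional[Dict[int, float]] = None,
) -> str:
    """Return the query mode from the table for a given set of arguments."""
    if vector is not None and id is not None:
        # Assumption of this sketch: vector and id are alternatives.
        raise ValueError("pass either vector or id, not both")
    if sparse_vector is not None:
        if vector is None:
            # The table lists hybrid search as vector + sparse_vector.
            raise ValueError("hybrid search needs a dense vector as well")
        return "hybrid search"
    if vector is None and id is None:
        # Neither vector nor id: only the conditional filter is applied.
        return "match query"
    if filter is not None:
        return "filtered vector search"
    return "vector search" if vector is not None else "primary key search"


print(query_mode(vector=[0.1, 0.2, 0.3, 0.4]))
print(query_mode(filter="age > 18"))
```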
Prerequisites
Before you begin, make sure that you have:
A DashVector cluster. See Create a cluster
An API key. See Manage API keys
DashVector SDK (latest version). See Install DashVector SDK
A collection with documents already inserted. See Create a collection and Insert documents
API signature
Collection.query(
    vector: Optional[Union[List[Union[int, float]], np.ndarray]] = None,
    id: Optional[str] = None,
    topk: int = 10,
    filter: Optional[str] = None,
    include_vector: bool = False,
    partition: Optional[str] = None,
    output_fields: Optional[List[str]] = None,
    sparse_vector: Optional[Dict[int, float]] = None,
    async_req: bool = False
) -> DashVectorResponse
Request parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| vector | Optional[Union[List[Union[int, float]], np.ndarray]] | None | Dense vector for similarity search. |
| id | Optional[str] | None | Primary key of an existing document. The search uses that document's vector. |
| topk | int | 10 | Maximum number of results to return, ranked by similarity. |
| filter | Optional[str] | None | Conditional filter using SQL WHERE clause syntax. See Conditional filtering. |
| include_vector | bool | False | Whether to include vector data in the response. |
| partition | Optional[str] | None | Partition name. Limits the search scope to a specific partition. |
| output_fields | Optional[List[str]] | None | Fields to return. By default, all fields are returned. |
| sparse_vector | Optional[Dict[int, float]] | None | Sparse vector for keyword-aware semantic search. Each key is a dimension index, and each value is the weight. |
| async_req | bool | False | Whether to enable asynchronous mode. |
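To make the meaning of topk concrete, here is a brute-force sketch of "return at most topk documents, ranked by similarity". It is not how DashVector executes a query; the cosine metric, the in-memory dictionary of documents, and the helper names are all assumptions chosen for illustration.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def brute_force_topk(
    query: Sequence[float],
    docs: Dict[str, Sequence[float]],
    topk: int = 10,
) -> List[Tuple[str, float]]:
    """Score every document against the query and keep the topk best, best first."""
    scored = ((doc_id, cosine(query, vec)) for doc_id, vec in docs.items())
    return heapq.nlargest(topk, scored, key=lambda pair: pair[1])


# Hypothetical in-memory documents keyed by primary key.
docs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
for doc_id, score in brute_force_topk([1.0, 0.0], docs, topk=2):
    print(doc_id, round(score, 4))
```

A real vector database typically uses an index rather than scanning every document, so this sketch only shows what the result ordering and the topk cutoff mean.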
Response
query() returns a DashVectorResponse object:
| Field | Type | Description | Example |
|---|---|---|---|
| code | int | Status code. 0 indicates success. See Status codes. | 0 |
| message | str | Status message. | success |
| request_id | str | Unique request identifier. | 19215409-ea66-4db9-8764-26ce2eb5bb99 |
| output | List[Doc] | Similarity search results. | -- |
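The fields in the table suggest a simple handling pattern: treat code 0 as success, and otherwise surface message and request_id. The sketch below uses stand-in classes (FakeDoc and FakeResponse are hypothetical, defined here only so the example runs without a cluster); doc_ids_or_raise is likewise a hypothetical helper, not an SDK function.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class FakeDoc:
    """Stand-in for a returned document (hypothetical, for illustration only)."""
    id: str
    fields: Dict[str, Any] = field(default_factory=dict)
    vector: Optional[List[float]] = None


@dataclass
class FakeResponse:
    """Stand-in carrying the four response fields listed in the table."""
    code: int
    message: str
    request_id: str
    output: List[FakeDoc] = field(default_factory=list)


def doc_ids_or_raise(resp) -> List[str]:
    """Return the ids of the result documents, or raise if the status code is not 0."""
    if resp.code != 0:
        raise RuntimeError(
            f"query failed: code={resp.code}, message={resp.message}, "
            f"request_id={resp.request_id}"
        )
    return [doc.id for doc in resp.output]


resp = FakeResponse(
    code=0,
    message="success",
    request_id="19215409-ea66-4db9-8764-26ce2eb5bb99",
    output=[FakeDoc(id="1", fields={"name": "a", "age": 20}), FakeDoc(id="2")],
)
print(doc_ids_or_raise(resp))
```

Including request_id in the error message makes a failed call easier to trace.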
Examples
All examples below use the following client setup. Replace the placeholders with your actual values:
| Placeholder | Description |
|---|---|
| YOUR_API_KEY | Your API key from the DashVector console |
| YOUR_CLUSTER_ENDPOINT | Your cluster endpoint URL |
import dashvector
import numpy as np
client = dashvector.Client(
    api_key='YOUR_API_KEY',
    endpoint='YOUR_CLUSTER_ENDPOINT'
)
# Get the target collection
collection = client.get(name='quickstart')
Create the quickstart collection and insert documents before running these examples. See Create a collection and Insert documents.
Search by vector
Pass a dense vector to find the most similar documents.
ret = collection.query(
    vector=[0.1, 0.2, 0.3, 0.4]
)
# Check whether the query succeeded.
if ret:
    print('query success')
    print(len(ret))
    for doc in ret:
        print(doc)
        print(doc.id)
        print(doc.vector)
        print(doc.fields)
To customize the result set, specify topk, output_fields, and include_vector:
ret = collection.query(
    vector=[0.1, 0.2, 0.3, 0.4],
    topk=100,
    output_fields=['name', 'age'],  # Return only the name and age fields.
    include_vector=True
)
Search by primary key
Use the id parameter to search with a stored document's vector, without providing the vector values directly.
ret = collection.query(
    id='1'
)
# Check whether the query succeeded.
if ret:
    print('query success')
    print(len(ret))
    for doc in ret:
        print(doc)
        print(doc.id)
        print(doc.vector)
        print(doc.fields)
Combine id with topk, output_fields, and include_vector the same way as in a vector search:
ret = collection.query(
    id='1',
    topk=100,
    output_fields=['name', 'age'],  # Return only the name and age fields.
    include_vector=True
)
Search with a conditional filter
Add a filter parameter to narrow results by metadata. The filter follows SQL WHERE clause syntax.
# Run a similarity search with a vector (or a primary key) plus a conditional filter.
ret = collection.query(
    vector=[0.1, 0.2, 0.3, 0.4],  # Query vector. Pass id instead to search by primary key.
    topk=100,
    filter='age > 18',  # Keep only documents whose age field is greater than 18.
    output_fields=['name', 'age'],  # Return only the name and age fields.
    include_vector=True
)
Tip: Combine filter with id instead of vector to filter results from a primary key-based search.
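When the threshold in a filter such as 'age > 18' comes from a variable, the filter string has to be assembled in code. The following is a minimal sketch of one cautious way to do that for numeric comparisons. numeric_condition is a hypothetical helper, and the set of accepted operators is an assumption based on common SQL WHERE syntax; check Conditional filtering for the operators DashVector actually supports.

```python
# Assumed operator set, taken from common SQL WHERE syntax (not confirmed here).
_ALLOWED_OPS = {"=", "!=", ">", ">=", "<", "<="}


def numeric_condition(field_name: str, op: str, value) -> str:
    """Build a condition like 'age > 18' from validated parts."""
    if not field_name.isidentifier():
        raise ValueError(f"unexpected field name: {field_name!r}")
    if op not in _ALLOWED_OPS:
        raise ValueError(f"unsupported operator: {op!r}")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise TypeError("value must be an int or a float")
    return f"{field_name} {op} {value}"


min_age = 18
print(numeric_condition("age", ">", min_age))
```

The resulting string can be passed as the filter argument of query(). Validating the field name and operator keeps unexpected input out of the filter expression.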
Hybrid search with dense and sparse vectors
Combine a dense vector with a sparse vector to perform keyword-aware semantic search. The sparse vector represents keyword weights that complement the dense embedding.
# Run a similarity search with both a dense and a sparse vector.
ret = collection.query(
    vector=[0.1, 0.2, 0.3, 0.4],  # Dense query vector.
    sparse_vector={1: 0.3, 20: 0.7}  # Dimension index -> weight.
)
See Keyword-aware semantic search for configuration details.
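The sparse_vector argument is a dictionary from dimension index to weight, so only nonzero dimensions need to be listed. As a minimal sketch, assuming the keyword weights are already available as a plain list (how they are produced is outside the scope of this page), the hypothetical helper to_sparse_vector converts that list into the expected shape.

```python
from typing import Dict, Sequence


def to_sparse_vector(weights: Sequence[float]) -> Dict[int, float]:
    """Keep only nonzero entries, keyed by their position in the list."""
    return {index: float(w) for index, w in enumerate(weights) if w != 0}


# Hypothetical keyword weights over a 21-dimensional vocabulary.
weights = [0.0] * 21
weights[1] = 0.3
weights[20] = 0.7
print(to_sparse_vector(weights))
```

With these inputs the helper produces the same dictionary used in the hybrid search example above, {1: 0.3, 20: 0.7}.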
Match query with filter only
Omit both vector and id to retrieve documents based solely on metadata conditions, without similarity ranking.
# Run a match query with only a conditional filter (no vector or primary key).
ret = collection.query(
    topk=100,
    filter='age > 18',  # Match documents whose age field is greater than 18.
    output_fields=['name', 'age'],  # Return only the name and age fields.
    include_vector=True
)
What's next
Keyword-aware semantic search -- Improve retrieval accuracy by combining dense and sparse vectors.
Conditional filtering -- Full filter expression syntax reference.