DashVector FAQ

1. What happens during Doc operations if the Partition parameter is not specified?

Every Collection has a default partition that is created automatically and cannot be deleted. Doc operations that do not specify a partition use the default partition. For example, retrieving a Doc without specifying a partition searches only the default partition, not any other partition.
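A toy in-memory model can make this fallback concrete. The `ToyCollection` class and its method names are hypothetical and are not part of the DashVector SDK. The sketch only illustrates two points: an operation without a partition argument goes to the partition named `default`, and a lookup searches exactly one partition.

```python
from typing import Dict, List, Optional

DEFAULT_PARTITION = "default"


class ToyCollection:
    """Hypothetical in-memory stand-in for a Collection (not the DashVector SDK)."""

    def __init__(self) -> None:
        # The default partition exists from the start.
        self.partitions: Dict[str, Dict[str, List[float]]] = {DEFAULT_PARTITION: {}}

    def create_partition(self, name: str) -> None:
        self.partitions.setdefault(name, {})

    def delete_partition(self, name: str) -> None:
        # The default partition cannot be deleted.
        if name == DEFAULT_PARTITION:
            raise ValueError("the default partition cannot be deleted")
        del self.partitions[name]

    def insert(self, doc_id: str, vector: List[float], partition: Optional[str] = None) -> None:
        # No partition given: write to the default partition.
        self.partitions[partition or DEFAULT_PARTITION][doc_id] = vector

    def fetch(self, doc_id: str, partition: Optional[str] = None) -> Optional[List[float]]:
        # Only one partition is searched: the named one, or default if none is given.
        return self.partitions[partition or DEFAULT_PARTITION].get(doc_id)


col = ToyCollection()
col.create_partition("shoes")
col.insert("1", [0.1, 0.2], partition="shoes")
print(col.fetch("1"))                    # None: the Doc is in "shoes", not in default
print(col.fetch("1", partition="shoes"))  # [0.1, 0.2]
```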

2. What are the differences between the Insert Doc, Update Doc, and Insert or Update Doc operations?

Insert Doc: Fails if the Doc ID already exists. The existing Doc is not overwritten.
Update Doc: Overwrites an existing Doc. Fails if the Doc ID does not exist.
Insert or Update Doc: Updates the Doc if the ID exists; inserts a new Doc if it does not.
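The three rules can be summarized with a small in-memory model of a single partition. `ToyStore` is hypothetical and only mirrors the semantics listed above; in the Python SDK these operations are exposed on a Collection as `insert`, `update`, and `upsert`, and the service reports failures through its response rather than the `False` return value used here.

```python
from typing import Dict, List


class ToyStore:
    """Hypothetical in-memory model of one partition (not the DashVector SDK)."""

    def __init__(self) -> None:
        self.docs: Dict[str, List[float]] = {}

    def insert(self, doc_id: str, vector: List[float]) -> bool:
        # Insert fails if the ID already exists; the stored Doc is left untouched.
        if doc_id in self.docs:
            return False
        self.docs[doc_id] = vector
        return True

    def update(self, doc_id: str, vector: List[float]) -> bool:
        # Update fails if the ID does not exist.
        if doc_id not in self.docs:
            return False
        self.docs[doc_id] = vector
        return True

    def upsert(self, doc_id: str, vector: List[float]) -> bool:
        # Insert or Update: overwrite if the ID exists, otherwise create it.
        self.docs[doc_id] = vector
        return True


store = ToyStore()
print(store.insert("a", [1.0]))  # True
print(store.insert("a", [2.0]))  # False, "a" still maps to [1.0]
print(store.update("b", [3.0]))  # False, "b" does not exist
print(store.upsert("b", [3.0]))  # True, "b" is created
print(store.upsert("a", [4.0]))  # True, "a" is overwritten
```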

3. How do I clear a collection?

You cannot clear a Collection directly. Instead, delete the Collection and create a new Collection.
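A minimal sketch of the delete-then-create workaround, assuming a client object that behaves like `dashvector.Client`, with `delete(name)` and `create(name=..., dimension=..., ...)` methods. The helper name `recreate_collection` is hypothetical, and so is the convention of passing the original creation arguments (metric, field schema, and so on) back in through `**schema` so the new Collection matches the old one. The truthiness check on responses is also an assumption, and the sketch assumes the deletion has taken effect before `create` runs.

```python
from typing import Any


def recreate_collection(client: Any, name: str, dimension: int, **schema: Any) -> Any:
    """Hypothetical helper: "clear" a Collection by deleting and recreating it.

    `client` is assumed to behave like dashvector.Client. `schema` carries any
    other creation arguments so the new Collection matches the old one.
    """
    deleted = client.delete(name)
    # Assumption: a response object is falsy when the call failed.
    if not deleted:
        raise RuntimeError(f"failed to delete collection {name!r}: {deleted}")
    created = client.create(name=name, dimension=dimension, **schema)
    if not created:
        raise RuntimeError(f"failed to recreate collection {name!r}: {created}")
    return created
```

All Docs and partitions in the old Collection are lost, so only use this when that is the intent.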

4. How do I use the asynchronous feature for Doc operations?

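Insert Doc, Update Doc, Insert or Update Doc, Retrieve Doc, Delete Doc, and Get Doc all support asynchronous execution. To run an operation asynchronously, pass async_req=True. The following example compares asynchronous and synchronous batch writes: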

```python
import time

import numpy as np

# Assumes `collection` is an existing DashVector Collection whose vector
# dimension is 20000 (for example, one obtained with client.get).

# Asynchronously write 1,000 times. Dimension = 20000, batch size = 8.
batch_size = 8
loop = 1000
start = time.time()

# With async_req=True, each insert returns immediately with a handle;
# calling get() on the handle blocks until that request has completed.
async_results = [
    collection.insert(
        [(j + i * batch_size, np.random.rand(20000)) for j in range(batch_size)],
        async_req=True
    ) for i in range(loop)
]

# Wait for all write operations to complete.
print([async_result.get() for async_result in async_results])

print(f"async insert {loop} times with batch-size = {batch_size}, cost = {time.time() - start}")

# output:
# async insert 1000 times with batch-size = 8, cost = 31.13356590270996

# For comparison, synchronous write (code omitted)
# sync insert 1000 times with batch-size = 8, cost = 408.63447427749634
```

Important

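Because many requests can be in flight at the same time, asynchronous operations can trigger the limits described in Constraints and Limitations. Check the result of every request, and throttle or retry when a limit is hit.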
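One simple way to throttle is to cap the number of requests in flight: submit a window of asynchronous requests, wait for all of them, then submit the next window. The helper below is a hypothetical sketch. It assumes `submit` starts one asynchronous request and returns a handle with a blocking `get()`, like the object returned by `collection.insert(..., async_req=True)` in the example above; the right window size depends on your cluster's limits.

```python
from typing import Any, Callable, Iterable, List, Sequence


def submit_in_windows(
    submit: Callable[[Sequence[Any]], Any],
    batches: Iterable[Sequence[Any]],
    window: int,
) -> List[Any]:
    """Keep at most `window` asynchronous requests in flight at a time.

    Returns the results of get() in submission order.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    results: List[Any] = []
    pending: List[Any] = []
    for batch in batches:
        pending.append(submit(batch))
        if len(pending) == window:
            # Drain the full window before starting more requests.
            results.extend(handle.get() for handle in pending)
            pending.clear()
    # Drain whatever is left in the last, partial window.
    results.extend(handle.get() for handle in pending)
    return results
```

A possible call, with an arbitrary window of 50: `submit_in_windows(lambda docs: collection.insert(docs, async_req=True), batches, window=50)`. Each returned result can then be checked for errors before moving on.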

5. Are Doc IDs unique at the collection level or the partition level?

Doc IDs are unique at the partition level. Different partitions in the same collection can have Docs with the same ID.
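Partition-level uniqueness behaves as if each Doc were keyed by the pair (partition, Doc ID) rather than by the ID alone. The dictionary below is only an illustrative model of that rule, not a description of DashVector's internal storage.

```python
from typing import Dict, List, Tuple

# Hypothetical model: storage keyed by (partition name, Doc ID).
store: Dict[Tuple[str, str], List[float]] = {}

store[("default", "42")] = [0.1, 0.2]
store[("shoes", "42")] = [0.3, 0.4]  # same ID, different partition: no conflict

print(len(store))              # 2
print(store[("shoes", "42")])  # [0.3, 0.4]
```

A consequence worth remembering: when results from several partitions are combined, the Doc ID alone does not identify a Doc; keep the partition name alongside it.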

6. Why is there a loss of precision in inserted vector data?

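DashVector stores vectors as single-precision floating-point (FP32/float32) values. An FP32 value has a 24-bit significand, which gives about 7 significant decimal digits, and its magnitude ranges from about 1.18 × 10^-38 (the smallest normal value) to about 3.40 × 10^38.

Input values that carry more precision than FP32 provides, such as the double-precision (FP64) floats that Python and NumPy use by default, are rounded to the nearest FP32 value, which causes a loss of precision.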
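The rounding can be reproduced locally without DashVector. The standard-library `struct` module's `f` format packs a number as IEEE 754 single precision, so packing and unpacking a Python float shows the value that survives FP32 storage. The helper name `to_fp32` is illustrative.

```python
import struct


def to_fp32(x: float) -> float:
    """Round a Python float (FP64) to the nearest FP32 value and widen it back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


print(to_fp32(0.1))          # 0.10000000149011612
print(to_fp32(0.5))          # 0.5 (exactly representable, no loss)
print(to_fp32(123456789.0))  # 123456792.0 (beyond 2**24, FP32 cannot hold every integer)
```

If exact values matter, compare retrieved vectors against the FP32-rounded input rather than the original FP64 input.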

7. Can you specify multiple Partitions when retrieving a Doc?

No. Each Retrieve Doc call accepts only one partition. To query multiple partitions, call Retrieve Doc once per partition and merge the results in your application.
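A sketch of the per-partition loop plus merge. `query_one(partition)` is a hypothetical callable that returns (Doc ID, score) pairs for one partition, for example a thin wrapper around `collection.query(...)` with the `partition` argument set. Whether a smaller or a larger score means "more similar" depends on the metric, so the direction is an explicit parameter that you should set from the retrieval documentation for your metric. Asking each partition for `topk` results guarantees the merged list can still contain the overall top k.

```python
import heapq
from typing import Callable, Iterable, List, Tuple

# (partition, doc_id, score). The partition is kept because the same
# Doc ID can exist in several partitions.
Hit = Tuple[str, str, float]


def query_partitions(
    query_one: Callable[[str], Iterable[Tuple[str, float]]],
    partitions: Iterable[str],
    topk: int,
    smaller_is_better: bool = True,
) -> List[Hit]:
    """Run one retrieval per partition and merge the hits into a single top-k list."""
    hits: List[Hit] = [
        (partition, doc_id, score)
        for partition in partitions
        for doc_id, score in query_one(partition)
    ]
    # Best hits first: ascending scores for distances, descending for similarities.
    pick = heapq.nsmallest if smaller_is_better else heapq.nlargest
    return pick(topk, hits, key=lambda hit: hit[2])
```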

8. The `pip install dashvector` command is very slow. How can I speed it up?

Slow downloads are typically caused by high latency to the default package index. Use a mirror to speed up the installation.

For example, to use the Alibaba Cloud mirror:

Open a terminal (for Linux or macOS) or Command Prompt/PowerShell (for Windows).

Use the -i or --index-url parameter to specify the mirror.

```shell
pip3 install dashvector -i https://mirrors.aliyun.com/pypi/simple/
```

9. Can I create a cluster using an SDK?

No. Clusters can only be created in the Management Console. For instructions, see Create a cluster.

10. Is there a limit on the number of collections I can create in a paid cluster?

Yes. A paid cluster supports up to 32 collections. The partition limit depends on the cluster specifications and is independent of the collection count. For details, see Constraints and Limits.

11. Do filter conditions support fuzzy text search when retrieving a Doc?

No. In AISearch, filters support only prefix matching on text fields. For details, see Filtered retrieval.

12. DashVector returns the error: "Query qps exceeds limit 0 for collection ××××"

This error occurs when an HTTP retrieval request to a free cluster specifies an incorrect collectionName, or when the free cluster's QPS limit is exceeded. Paid clusters have no hard QPS or capacity limits. For details, see Constraints and Limits.

13. When I use an SDK to call DashVector, it returns the error: DashVectorSDK RPCHandler endpoint({××××}) is invalid and cannot contain protocol header

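The endpoint passed to the client is invalid. As the error message states, the endpoint must not contain a protocol header: remove any `http://` or `https://` prefix. Also verify that the endpoint parameter does not contain `{}`, for example braces left over from a placeholder.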
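If the endpoint comes from configuration or user input, it can be normalized before it reaches the client, which the SDK creates with `dashvector.Client(api_key=..., endpoint=...)`. The helper `clean_endpoint` below is hypothetical, and the host names in the examples are placeholders. It strips whitespace, stray `{}` braces, an `http://` or `https://` prefix, and a trailing slash. If you would rather fail loudly than fix input silently, raise an error in those cases instead.

```python
def clean_endpoint(endpoint: str) -> str:
    """Hypothetical helper: reduce an endpoint string to a bare host name."""
    # Remove surrounding whitespace and braces left over from a "{endpoint}" placeholder.
    value = endpoint.strip().strip("{}").strip()
    # Drop a protocol header; the SDK rejects endpoints that contain one.
    for scheme in ("https://", "http://"):
        if value.lower().startswith(scheme):
            value = value[len(scheme):]
            break
    return value.rstrip("/")


print(clean_endpoint("{https://example-cluster.example.com}"))  # example-cluster.example.com
print(clean_endpoint(" http://host.example.com/ "))            # host.example.com
```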