DashVector:Partition

Last Updated:Apr 11, 2024

Concepts

DashVector lets you divide the documents in a collection into multiple partitions, either physically or logically. When you specify a partition for a document operation, such as an insert or a search, the operation runs only in that partition. A well-chosen partitioning scheme can improve the efficiency of document operations.

  • Several partitions can be created in a collection. For more information, see Limits.

  • Each partition is identified by a unique name. Partitions in the same collection cannot have the same name.

  • All partitions in a collection have the same schema, including the number of vector dimensions, vector data types, distance metrics, and field definitions.

  • By default, each collection has a partition that cannot be deleted. If you do not specify a partition when you perform a document-related operation, such as inserting or searching for a document, the operation is performed in the default partition.

  • You can call API operations to create and delete partitions.

Common application scenarios of partitions

Partitions can greatly improve query performance, but they are not suitable for every scenario. If the amount of data is small, partitions are not cost-effective. If the amount of data is large but no field is a suitable basis for partitioning, we also do not recommend that you create partitions. For example, if a collection is partitioned on a poorly chosen field, a single search request may have to query multiple partitions, and search performance is then lower than it would be in a single partition.
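
The cost of a poorly chosen partition field can be illustrated with a minimal sketch. It assumes that a search spanning several partitions is issued as one request per partition and merged on the client side. The `query_partition` callback and the `(id, distance)` hit format are hypothetical stand-ins, not part of the DashVector SDK, and the sketch assumes that a smaller distance means a closer match.

```python
import heapq
from typing import Callable, Iterable, List, Tuple

Hit = Tuple[str, float]  # (document id, distance); assumed: smaller is closer


def search_across_partitions(
    query_partition: Callable[[str], List[Hit]],
    partitions: Iterable[str],
    topk: int,
) -> List[Hit]:
    """Query every partition separately, then keep the global top-k hits."""
    all_hits: List[Hit] = []
    for name in partitions:
        # One request per partition: this fan-out is the extra cost.
        all_hits.extend(query_partition(name))
    return heapq.nsmallest(topk, all_hits, key=lambda hit: hit[1])


# Hypothetical in-memory results standing in for real per-partition queries.
fake_results = {
    "shoes": [("s1", 0.10), ("s2", 0.40)],
    "skirts": [("k1", 0.05), ("k2", 0.30)],
    "pants": [("p1", 0.20)],
}
calls = []


def fake_query(partition: str) -> List[Hit]:
    calls.append(partition)
    return fake_results[partition]


merged = search_across_partitions(fake_query, fake_results, topk=3)
print(merged)      # the three closest hits across all partitions
print(len(calls))  # number of per-partition requests that were needed
```

In this sketch a single logical search costs three requests plus a merge step, whereas a search that can name its partition up front costs one.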

The following content describes the typical scenarios in which we recommend that you use partitions.

Search by images on e-commerce platforms

For example, a cross-border e-commerce platform holds 20 million images of clothing products and must let consumers search for products by image. Products are sorted into predefined categories, such as shoes, skirts, and pants, and each category corresponds to a partition. When a user initiates a search request, the user can specify a category, or a category model can analyze the request and choose one.

Video surveillance

For example, a video surveillance manufacturer needs to extract frames from videos collected by 1,000 cameras in an industrial park, identify and extract vehicle features, and then import the data into a DashVector vector database for downstream use such as search and vehicle trajectory generation. The data needs to be retained for only 30 days. In this case, a new partition is created every day, and each partition is deleted when it expires.
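
The daily rotation described above can be sketched as follows. The `day_YYYYMMDD` naming scheme and both helper functions are assumptions made for illustration. DashVector does not prescribe them, and you should check the partition naming rules in Limits before adopting a scheme like this.

```python
from datetime import date, timedelta
from typing import Iterable, List

PREFIX = "day_"  # hypothetical naming scheme, not required by DashVector


def partition_for(day: date) -> str:
    """Name of the partition that holds the data captured on `day`."""
    return f"{PREFIX}{day:%Y%m%d}"


def expired_partitions(
    names: Iterable[str], today: date, retention_days: int = 30
) -> List[str]:
    """Return the daily partitions that fall outside the retention window."""
    # Today plus the previous (retention_days - 1) days are kept.
    oldest_kept = partition_for(today - timedelta(days=retention_days - 1))
    # YYYYMMDD names sort chronologically, so a string comparison is enough.
    return sorted(n for n in names if n.startswith(PREFIX) and n < oldest_kept)


existing = ["default", "day_20240411", "day_20240313", "day_20240312", "day_20240101"]
today = date(2024, 4, 11)
print(partition_for(today))
print(expired_partitions(existing, today))
```

A scheduled job could pass `partition_for(today)` to `collection.create_partition` and each name returned by `expired_partitions` to `collection.delete_partition`, using the names returned by `collection.list_partitions()` as input.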

Trademark infringement detection

For example, a trademark agent has a database of 50 million trademarks and needs to find similar trademarks to determine whether a trademark constitutes infringement. Trademarks can be divided by structure into nine categories, including text, graphic, number, and letter trademarks. Each category corresponds to a partition in a DashVector collection, and trademark data is imported into the partition that matches its category. If you specify a partition in a query request, only trademarks of that category are searched.

Multilingual question-and-answer system

The international knowledge base team of an e-commerce enterprise needs to query similar questions based on the language used by users. For example, the team needs to respond to users who use Chinese, English, or French. Content in the knowledge base is embedded and imported into three partitions that correspond to Chinese, English, and French. Then, a query is performed in the partition that corresponds to the language of the query.
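
Routing a query to the partition for its language can be sketched as below. The partition names and the `partition_for_language` helper are hypothetical, and the sketch assumes the language of the query is already known, for example from a user profile or a language detector.

```python
from typing import Optional

# Hypothetical partition names; one partition per supported language.
LANGUAGE_PARTITIONS = {"zh": "lang_zh", "en": "lang_en", "fr": "lang_fr"}


def partition_for_language(language_code: str) -> Optional[str]:
    """Map a language tag such as 'en' or 'fr-CA' to a partition name.

    Returns None when the language has no partition, so the caller can
    decide whether to reject the request or choose a fallback.
    """
    # Normalize the tag and keep only the primary language subtag.
    primary = language_code.strip().lower().replace("_", "-").split("-")[0]
    return LANGUAGE_PARTITIONS.get(primary)


print(partition_for_language("fr-CA"))
print(partition_for_language("ZH_cn"))
print(partition_for_language("de"))
```

The returned name would be passed as the `partition` argument of `collection.query`. Returning `None` for unsupported languages keeps the fallback decision explicit instead of silently searching the wrong language.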

Multi-tenant scenarios

Partitions can also be used in multi-tenant scenarios. For example, an e-commerce service provider needs to offer image-based image search to small and micro e-commerce companies. In this case, the service provider can create one partition per customer in a single collection. This physically isolates each customer's data, which ensures security, and it saves costs.
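
One way to keep tenants apart in application code is to bind each tenant to its partition once, so that individual calls cannot name a different one. The sketch below is an assumption about how a service provider might structure this, not a DashVector feature. `TenantCollection` is hypothetical, and `RecordingCollection` is a stand-in that only mimics the `insert` and `query` calls used in the sample code on this page.

```python
class TenantCollection:
    """Pins every insert and query to one tenant's partition."""

    def __init__(self, collection, tenant_partition: str):
        self._collection = collection
        self._partition = tenant_partition

    def insert(self, docs):
        return self._collection.insert(docs, partition=self._partition)

    def query(self, vector, **kwargs):
        # Drop any caller-supplied partition so that one tenant
        # cannot reach another tenant's data.
        kwargs.pop("partition", None)
        return self._collection.query(
            vector=vector, partition=self._partition, **kwargs
        )


class RecordingCollection:
    """Stand-in for a real collection; records which partition was used."""

    def __init__(self):
        self.calls = []

    def insert(self, docs, partition=None):
        self.calls.append(("insert", partition))

    def query(self, vector=None, partition=None, **kwargs):
        self.calls.append(("query", partition))
        return []


backend = RecordingCollection()
tenant_a = TenantCollection(backend, "tenant_a")
tenant_a.insert(("1", [0.1, 0.1, 0.1, 0.1]))
tenant_a.query([0.1, 0.1, 0.2, 0.1], partition="tenant_b")  # override is ignored
print(backend.calls)
```

With a real collection in place of the stand-in, request handlers would receive only the tenant-bound wrapper and never the collection itself.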

Example

Prerequisites

Sample code

Note

You need to replace YOUR_API_KEY with your API key and YOUR_CLUSTER_ENDPOINT with your cluster endpoint for the sample code to run properly.

import dashvector

# Create a client.
client = dashvector.Client(
    api_key='YOUR_API_KEY',
    endpoint='YOUR_CLUSTER_ENDPOINT'
)
assert client

# Create a collection.
client.create(name='understand_partition', dimension=4)
collection = client.get('understand_partition')
assert collection


# Create a partition named shoes.
collection.create_partition(name='shoes')

# Describe the partition.
ret = collection.describe_partition('shoes')
print(ret)

# View partitions.
partitions = collection.list_partitions()
print(partitions)

# Insert a document into the partition.
collection.insert(
    ('1', [0.1, 0.1, 0.1, 0.1]), partition='shoes'
)

# Specify the partition when you perform a vector-based similarity search.
docs = collection.query(
    vector=[0.1, 0.1, 0.2, 0.1],
    partition='shoes'
)
print(docs)

# Remove a document from the specified partition.
collection.delete(ids=['1'], partition='shoes')

# View partition statistics.
ret = collection.stats_partition('shoes')
print(ret)

# Delete the partition.
collection.delete_partition('shoes')