A vector bucket is a bucket type in Alibaba Cloud Object Storage Service (OSS) designed for storing, querying, and managing vector data. It is a low-cost, large-scale, and easy-to-use solution that provides vector storage and query capabilities for AI scenarios, such as multi-modal retrieval, knowledge bases, retrieval-augmented generation (RAG), and AI agents. You can write vector data from any third-party service to a vector bucket. Vector buckets also support unified administration of large amounts of raw data and vector data. For example, you can configure the same bucket policy for both raw-data buckets and vector buckets, or export logs in a unified format for auditing.
Core concepts
Vector bucket: A bucket type that serves as a cloud resource for managing large-scale vector data.
Vector index: An index table within a vector bucket that stores vector data. You can create multiple vector indexes in a single vector bucket to store vector data for different business scenarios. When you initiate a retrieval query, results are returned based on their similarity to the query vector among the vector data in the vector index.
Vector data: High-dimensional numerical arrays created by converting unstructured data, such as images, videos, and documents, using a vector model. These arrays represent the content features of the data. Vector retrieval returns results based on the similarity of this vector data. You can use any vectorization service, such as ECS, PAI, or Alibaba Cloud Model Studio, to generate vectors. Then, you can write them to a specified vector index using the OSS API, a software development kit (SDK), or the ossutil tool. When you write data, you can also attach metadata to use for subsequent scalar filtering queries.
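To make similarity-based retrieval concrete, here is a minimal sketch in plain Python. It assumes cosine similarity as the measure; the source does not state which distance metrics a vector index uses, so treat the metric, the function names, and the toy data as illustrative only, not as the OSS implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k(query, index, k):
    """Return the k (key, score) pairs most similar to the query."""
    scored = [(key, cosine_similarity(query, vec)) for key, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical in-memory stand-in for a vector index.
index = {
    "doc-cat": [0.9, 0.1, 0.0],
    "doc-dog": [0.8, 0.3, 0.1],
    "doc-car": [0.0, 0.2, 0.9],
}
results = top_k([1.0, 0.0, 0.0], index, k=2)
```

In this toy index, the two vectors that point in nearly the same direction as the query rank first, which is the behavior a retrieval query relies on.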
Benefits
Low cost: Vector data is essential infrastructure for various AI applications and is growing exponentially. Vector buckets use a simple and transparent billing model. You are charged only for vector data storage capacity and the amount of data scanned during retrieval. This model can reduce costs by more than 90% compared to traditional methods.
Large scale: OSS vector buckets are designed for large-scale vector data storage. OSS uses a serverless architecture that scales elastically, so you do not need to manage scaling when you use vector buckets.
Easy to use: OSS vector buckets provide a complete set of APIs, SDKs, and the ossutil command line interface. You can also manage, read, and write vector data in the OSS console, for example by retrieving, adding, or bulk-inserting vector data.
Unified management: You can manage vector buckets and buckets that store large amounts of raw data in the same way. For example, you can configure the same bucket policy for permission management or set the same log export path for operation audits.
Semantic retrieval: You can use the QueryVectors operation provided by vector buckets to query vector data in an index table and retrieve results sorted by similarity. OSS vector buckets also support filtering queries based on scalar metadata. You can include scalar metadata when you write vector data to an OSS vector bucket to enable post-filtering. When you create a vector index, you can also set non-filterable metadata. Non-filterable metadata cannot be used as a filter condition but is returned with the retrieval results as descriptive information for the vector results.
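The difference between filterable and non-filterable metadata can be sketched with a small in-memory model. Everything here is hypothetical: the record layout, the `sketch_query` function, the equality-only filter, and the dot-product score are assumptions made for illustration and do not reflect the QueryVectors request format.

```python
def dot(a, b):
    """Dot product used here as a simple similarity score."""
    return sum(x * y for x, y in zip(a, b))


def sketch_query(records, query, top_k, metadata_filter=None):
    """Rank records by similarity, then keep those whose filterable
    metadata matches every key/value pair in metadata_filter."""
    ranked = sorted(records, key=lambda r: dot(query, r["vector"]), reverse=True)
    hits = []
    for rec in ranked:
        filterable = rec["filterable"]
        if metadata_filter and any(
            filterable.get(field) != value
            for field, value in metadata_filter.items()
        ):
            continue  # only filterable metadata can exclude a result
        hits.append({
            "key": rec["key"],
            "score": dot(query, rec["vector"]),
            # Non-filterable metadata is returned as descriptive information.
            "metadata": {**filterable, **rec["non_filterable"]},
        })
        if len(hits) == top_k:
            break
    return hits


records = [
    {"key": "a", "vector": [1.0, 0.0],
     "filterable": {"lang": "en"}, "non_filterable": {"title": "Intro"}},
    {"key": "b", "vector": [0.9, 0.1],
     "filterable": {"lang": "zh"}, "non_filterable": {"title": "Overview"}},
    {"key": "c", "vector": [0.2, 0.8],
     "filterable": {"lang": "en"}, "non_filterable": {"title": "Guide"}},
]
hits = sketch_query(records, [1.0, 0.0], top_k=2, metadata_filter={"lang": "en"})
```

Record "b" scores higher than "c" but is dropped by the filter on `lang`, while each returned hit still carries its non-filterable `title`.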
Scenarios
Scenario 1: Build low-cost RAG applications
As AI businesses grow, the scale of vector data increases exponentially, which in turn increases storage and retrieval costs. For multi-modal retrieval scenarios, such as knowledge bases, AI assistants, and medical image retrieval, users are becoming more tolerant of retrieval latency, which can range from tens to hundreds of milliseconds. In these cases, you can use a vector bucket as the storage foundation for RAG applications to meet business requirements at an extremely low cost.
Scenario 2: Build AI agents with tiered retrieval
Different AI agents have different requirements for retrieval performance. You can store all vector data centrally in a low-cost OSS vector bucket. For business scenarios that require high performance and low latency, you can synchronize hot data to other products, such as Tablestore, for high-performance retrieval. This approach lets you build an AI agent application architecture with tiered retrieval.
Scenario 3: Build an AI content management platform with unified data management
AI applications generate massive amounts of unstructured content, such as user-generated content (UGC), internal documents, and AI-generated content, along with their corresponding vectorized results. This process can lead to fragmented storage and retrieval systems. You can store raw data in standard OSS buckets and vector data in OSS vector buckets to build an efficient AI data management platform for use cases such as AI-generated content (AIGC) data management. You need only one set of APIs or SDKs to manage and access both raw files and vector indexes, which makes it easy to build an efficient and unified AI content management platform.
Enterprise features
Domain name access
Provides dedicated public and internal network endpoints and is isolated from general-purpose OSS buckets.
Public endpoint:
$bucketname-$uid.regionID.oss-vectors.aliyuncs.com
Internal endpoint:
$bucketname-$uid.regionID-internal.oss-vectors.aliyuncs.com
Note: You must use a third-level domain for all operations except ListVectorBuckets.
Secure transfer
Use HTTPS to encrypt data in transit.
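As a small illustration, the endpoint patterns listed under Domain name access can be combined with HTTPS as follows. The helper name and the sample bucket name, UID, and region ID are placeholders chosen for this sketch.

```python
def vector_bucket_endpoint(bucket_name, uid, region_id, internal=False):
    """Build the HTTPS URL for a vector bucket from the documented pattern
    $bucketname-$uid.regionID[-internal].oss-vectors.aliyuncs.com."""
    suffix = "-internal" if internal else ""
    return f"https://{bucket_name}-{uid}.{region_id}{suffix}.oss-vectors.aliyuncs.com"


# Placeholder values, not real resources.
public_url = vector_bucket_endpoint("my-vectors", "1234567890", "cn-hangzhou")
internal_url = vector_bucket_endpoint(
    "my-vectors", "1234567890", "cn-hangzhou", internal=True
)
```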
Access control
Bucket policy: Supports resource-based authorization policies that allow you to control permissions at the vector bucket level or for one or more vector indexes.
RAM policy: Supports identity-based RAM authorization policies for fine-grained permission control over vector buckets, vector indexes, and data operations. These policies also support cross-account access authorization.
Logs
Access log export: Supports exporting access logs to a specified bucket in real time or near-real time.
Unified log format: The log format is fully compatible with standard OSS logs. It includes an additional BucketARN field to uniquely identify the vector bucket resource, which simplifies unified log analysis.
Quotas and limits
Vector buckets have certain quotas and limits. When you design and implement your vector storage and retrieval solution, plan your bucket quantity, index scale, metadata structure, and API call strategy according to the following limits.
A single Alibaba Cloud account can create a maximum of 10 vector buckets in a single region. To increase this quota, contact Technical Support.
A single vector bucket can contain a maximum of 100 vector indexes. To increase this quota, contact Technical Support.
A single vector index table can store up to 2 billion rows of vector data.
Vector dimensions: 1 to 4096.
TopK range for vector retrieval requests: 1 to 100 by default.
Total size of a single vector array: 1 KB to 500 KB.
Total size of metadata (filterable and non-filterable) for a single vector: up to 40 KB.
Size of filterable metadata for a single vector: up to 2 KB.
Number of non-filterable metadata fields for a single vector: up to 10.
When filtering vectors using metadata:
The cumulative length of filterable metadata in a single filter instruction cannot exceed 64 KB.
The number of filterable metadata items in a single filter instruction cannot exceed 1024.
The filter condition supports a maximum of 8 nested levels.
The request frequency for the PutVectorIndex API is limited to a maximum of 5 calls per second.
You can write up to 500 entries in a single batch using the PutVectors API, which supports up to 5 requests per second.
A maximum of 500 indexes are returned per page in the response of the ListVectorIndexes API. You can use paging to retrieve the next batch of indexes.
The maximum concurrency for the ListVectorIndexes API is 16.
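Several of the limits above can be checked on the client side before a request is sent. The following is a minimal sketch under stated assumptions: metadata size is measured on its UTF-8 JSON encoding (the source does not say how size is computed), and `put_batch` is a hypothetical callable standing in for whatever client call performs the actual PutVectors write.

```python
import json
import time

# Limits copied from the list above.
MAX_DIMENSION = 4096
MAX_TOP_K = 100
MAX_BATCH_ENTRIES = 500
MAX_PUT_VECTORS_QPS = 5
MAX_TOTAL_METADATA_BYTES = 40 * 1024
MAX_FILTERABLE_METADATA_BYTES = 2 * 1024
MAX_NON_FILTERABLE_FIELDS = 10


def encoded_size(metadata):
    # Assumption: size is measured on the UTF-8 JSON encoding.
    return len(json.dumps(metadata, ensure_ascii=False).encode("utf-8"))


def check_vector(values, filterable, non_filterable):
    """Return a list of limit violations (empty if the vector passes)."""
    problems = []
    if not 1 <= len(values) <= MAX_DIMENSION:
        problems.append("dimension out of range")
    filterable_bytes = encoded_size(filterable)
    if filterable_bytes > MAX_FILTERABLE_METADATA_BYTES:
        problems.append("filterable metadata too large")
    if filterable_bytes + encoded_size(non_filterable) > MAX_TOTAL_METADATA_BYTES:
        problems.append("total metadata too large")
    if len(non_filterable) > MAX_NON_FILTERABLE_FIELDS:
        problems.append("too many non-filterable fields")
    return problems


def check_top_k(k):
    """True if k is within the default TopK range."""
    return 1 <= k <= MAX_TOP_K


def chunked(entries, size=MAX_BATCH_ENTRIES):
    """Yield consecutive slices of at most `size` entries."""
    for start in range(0, len(entries), size):
        yield entries[start:start + size]


def write_in_batches(entries, put_batch, sleep=time.sleep):
    """Send entries in batches of at most 500, pausing between calls so
    that no more than 5 calls are issued per second."""
    interval = 1.0 / MAX_PUT_VECTORS_QPS
    for i, batch in enumerate(chunked(entries)):
        if i > 0:
            sleep(interval)
        put_batch(batch)  # hypothetical write call supplied by the caller
```

Passing `sleep` as a parameter keeps the pacing logic testable without real delays. A fixed pause between calls is the simplest way to stay under the rate limit; a production client would more likely combine it with retries on throttling errors.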