Overview of vector buckets

A vector bucket is a bucket type offered by Alibaba Cloud Object Storage Service (OSS) specifically for storing, querying, and managing vector data. Vector buckets are low-cost, scale to large datasets, and are easy to use. They provide vector storage and query capabilities for AI scenarios such as multi-modal retrieval, knowledge bases, retrieval-augmented generation (RAG), and AI agents. You can write vector data generated by any third-party service to a vector bucket. OSS also lets you administer large amounts of raw data and vector data in a unified way. For example, you can configure the same bucket policy for both raw data buckets and vector buckets, or export logs in a unified format for auditing.

Core concepts

Vector bucket: A new bucket type that serves as a cloud resource for managing large-scale vector data.
Vector index: An index table that stores vector data. You can create multiple vector indexes in the same vector bucket to store vector data for different business types. When you initiate a retrieval query, results are returned based on how similar the vector data in the vector index is to the query vector.
Vector data: High-dimensional numerical arrays created by converting unstructured data, such as images, videos, and documents, using a vector model. These arrays represent the content features of the data. Vector retrieval returns results based on the similarity of this vector data. You can use any vectorization service, such as ECS, PAI, or Alibaba Cloud Model Studio, to generate vectors. Then, you can write them to a specified vector index using the OSS API, a software development kit (SDK), or the ossutil tool. When you write data, you can also attach metadata for subsequent scalar filtering queries.
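To make the relationship between raw content, vector data, and metadata concrete, the following is a minimal sketch of assembling one record before it is written to a vector index. It does not call OSS. The `VectorRecord` class and the `toy_embed` function are hypothetical names introduced for illustration, and `toy_embed` is a hashed bag-of-words stand-in for a real vector model, not a usable embedding.

```python
import hashlib
import math
from dataclasses import dataclass, field


@dataclass
class VectorRecord:
    """Hypothetical in-memory shape of one entry: a key, a vector, and metadata."""
    key: str
    data: list
    metadata: dict = field(default_factory=dict)


def toy_embed(text, dim=8):
    """Stand-in for a real vector model: hashed bag-of-words, L2-normalised."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0  # bucket each token by its first hash byte
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


# Assumed example content and metadata; a real pipeline would use a vector model
# and then write the record with the OSS API, an SDK, or ossutil.
record = VectorRecord(
    key="doc-001",
    data=toy_embed("quarterly report for the storage team"),
    metadata={"category": "report", "year": "2024"},
)
```

The metadata attached here is what a later retrieval request could filter on, provided the fields are filterable in the target index.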

Benefits

Low cost: Vector data has become essential infrastructure for various AI applications and is growing exponentially. Vector buckets use a simple and transparent billing model. You are charged only for vector data storage capacity and the amount of data scanned during retrieval. This can reduce costs by more than 90% compared to traditional methods.
Large scale: OSS vector buckets are built for large-scale vector data storage. OSS uses a serverless architecture that scales elastically, so you do not need to plan or perform scale-out operations yourself.
Easy to use: OSS vector buckets provide a complete set of APIs, SDKs, and the ossutil command-line tool. You can also read and write vector data in the OSS console, for example to retrieve vectors, add individual vectors, or bulk-insert vector data.
Unified management: You can manage vector buckets and buckets that store large amounts of raw data in the same way. For example, you can configure the same bucket policy for permission management or set the same log export path for operation audits.
Semantic retrieval: You can use the QueryVectors operation provided by vector buckets to query vector data in an index table and retrieve results sorted by similarity. OSS vector buckets also support filtering queries based on scalar metadata. You can include scalar metadata when you write vector data to an OSS vector bucket to enable post-filtering. When you create a vector index, you can also set non-filterable metadata. Non-filterable metadata cannot be used as a post-filtering condition but is returned with the retrieval results as descriptive information for the vector results.
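The semantic retrieval behavior described above can be illustrated locally. The sketch below is not the QueryVectors operation and does not call OSS. It ranks in-memory entries by cosine similarity (an assumed metric), keeps only entries whose filterable metadata matches, and returns non-filterable metadata with each hit. The function name `rank_and_filter` and the field names `filterable` and `non_filterable` are hypothetical. Whether the service applies the filter before or after truncating to TopK is not stated here; this sketch filters the ranked list before truncating.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_and_filter(index, query, top_k, metadata_filter=None):
    """Rank entries by similarity, drop those failing the filter, keep top_k."""
    ranked = sorted(
        index,
        key=lambda entry: cosine_similarity(entry["data"], query),
        reverse=True,
    )
    if metadata_filter:
        # Only filterable metadata may be used as a filter condition.
        ranked = [
            entry for entry in ranked
            if all(entry["filterable"].get(k) == v for k, v in metadata_filter.items())
        ]
    return ranked[:top_k]


# Assumed toy index: two-dimensional vectors with one filterable field.
index = [
    {"key": "a", "data": [1.0, 0.0], "filterable": {"lang": "en"}, "non_filterable": {"title": "Intro"}},
    {"key": "b", "data": [0.9, 0.1], "filterable": {"lang": "zh"}, "non_filterable": {"title": "Overview"}},
    {"key": "c", "data": [0.0, 1.0], "filterable": {"lang": "en"}, "non_filterable": {"title": "Appendix"}},
]

hits = rank_and_filter(index, query=[1.0, 0.0], top_k=2, metadata_filter={"lang": "en"})
for hit in hits:
    # Non-filterable metadata is returned as descriptive information only.
    print(hit["key"], hit["non_filterable"]["title"])
```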

Scenarios

Scenario 1: Build low-cost RAG applications

As AI businesses grow, the volume of vector data increases exponentially, and storage and retrieval costs rise with it. In multi-modal retrieval scenarios such as knowledge bases, AI assistants, and medical image retrieval, users are increasingly tolerant of retrieval latency in the range of tens to hundreds of milliseconds. In these cases, using a vector bucket as the vector storage foundation for RAG applications can meet business requirements at an extremely low cost.

Scenario 2: Build AI agents with tiered retrieval

Different AI agents have different requirements for retrieval performance. You can store all vector data centrally in a low-cost OSS vector bucket. For business scenarios that require high performance and low latency, you can synchronize hot data to other products, such as Tablestore, for high-performance retrieval. This lets you build an AI agent application architecture with tiered retrieval.
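One way to picture the tiered retrieval architecture is a router that holds all vectors in a cold tier and a synchronized subset in a hot tier. The sketch below uses plain dictionaries as stand-ins for the OSS vector bucket and for the high-performance store. The `TieredRetriever` class, its methods, and the dot-product scoring are assumptions for illustration, not part of any OSS or Tablestore SDK.

```python
def dot(a, b):
    """Dot product used here as a simple similarity score."""
    return sum(x * y for x, y in zip(a, b))


class TieredRetriever:
    """Hypothetical router: cold tier holds everything, hot tier holds a subset."""

    def __init__(self, cold_store):
        self.cold = dict(cold_store)  # stand-in for the low-cost vector bucket
        self.hot = {}                 # stand-in for the low-latency store

    def promote(self, keys):
        """Synchronize selected hot entries from the cold tier to the hot tier."""
        for key in keys:
            self.hot[key] = self.cold[key]

    def search(self, query, top_k, low_latency=False):
        """Search the hot tier for latency-sensitive requests, else the cold tier."""
        store = self.hot if low_latency and self.hot else self.cold
        ranked = sorted(store, key=lambda k: dot(store[k], query), reverse=True)
        return ranked[:top_k]


# Assumed toy data.
retriever = TieredRetriever({"a": [1.0, 0.0], "b": [0.8, 0.2], "c": [0.0, 1.0]})
retriever.promote(["b", "c"])

fast_hits = retriever.search([1.0, 0.0], top_k=1, low_latency=True)
full_hits = retriever.search([1.0, 0.0], top_k=1)
```

The trade-off this makes visible is that the hot tier can only return what has been synchronized to it, so a latency-sensitive query may miss the best match that lives only in the cold tier.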

Scenario 3: Build an AI content management platform with unified data management

AI applications generate massive amounts of unstructured content, such as user-generated content (UGC), internal documents, and AI-generated content, along with their corresponding vectorized results. This can lead to fragmented storage and retrieval systems. By storing raw data in standard OSS buckets and vector data in OSS vector buckets, you can build an efficient AI data management platform for use cases such as AIGC data management. You need only one set of APIs or SDKs to manage and access both raw files and vector indexes, making it easy to build an efficient and unified AI content management platform.

Enterprise features

Endpoint access

Vector buckets provide their own public and internal endpoints, which are separate from the endpoints of standard OSS buckets.

Public endpoint: $bucketname-$uid.regionID.oss-vectors.aliyuncs.com
Internal endpoint: $bucketname-$uid.regionID-internal.oss-vectors.aliyuncs.com

Note: All operations except ListVectorBuckets must be sent to the bucket-specific third-level domain shown above.
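The endpoint patterns above can be filled in with a small helper. This is a minimal sketch: the function name `vector_bucket_endpoint` is hypothetical, and the bucket name, uid, and region ID in the example are made-up placeholder values rather than a real resource.

```python
def vector_bucket_endpoint(bucket_name, uid, region_id, internal=False):
    """Fill in the documented pattern: $bucketname-$uid.regionID[-internal].oss-vectors.aliyuncs.com"""
    suffix = "-internal" if internal else ""
    return f"{bucket_name}-{uid}.{region_id}{suffix}.oss-vectors.aliyuncs.com"


# Placeholder values for illustration only.
public_host = vector_bucket_endpoint("my-vectors", "1234567890", "cn-hangzhou")
internal_host = vector_bucket_endpoint("my-vectors", "1234567890", "cn-hangzhou", internal=True)

print(public_host)    # my-vectors-1234567890.cn-hangzhou.oss-vectors.aliyuncs.com
print(internal_host)  # my-vectors-1234567890.cn-hangzhou-internal.oss-vectors.aliyuncs.com
```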

Secure transfer

Vector buckets use HTTPS to encrypt data in transit.

Access control

Bucket policy: Supports resource-based authorization policies that allow you to control permissions at the vector bucket level or for one or more vector indexes.
RAM policy: Supports identity-based RAM authorization policies for fine-grained permission control over vector buckets, vector indexes, and data operations. These policies also support cross-account access authorization.

Logs

Access log export: Supports exporting access logs to a specified bucket in real time or near-real time.
Unified log format: The log format is fully compatible with standard OSS logs. It includes an additional BucketARN field to uniquely identify the vector bucket resource, which simplifies unified log analysis.

Quotas and limits

Vector buckets have certain quotas and limits. When you design and implement your vector storage and retrieval solution, plan your bucket quantity, index scale, metadata structure, and API call strategy based on the following limits.

A single Alibaba Cloud account can create a maximum of 10 vector buckets in a single region. To increase this quota, contact Technical Support.
A single vector bucket can contain a maximum of 100 vector indexes. To increase this quota, contact Technical Support.
A single vector index table can store a maximum of 50 million rows of vector data. To increase this quota to 2 billion rows per table, contact Technical Support.
Vector dimensions: 1 to 4096.
TopK range for vector retrieval requests: 1 to 30 by default. To increase the upper limit of TopK to 100, contact Technical Support.
Total size of a single vector array: 1 KB to 500 KB.
Maximum total size of metadata (filterable and non-filterable) for a single vector: 40 KB.
Maximum size of filterable metadata for a single vector: 2 KB.
Maximum number of non-filterable metadata fields for a single vector: 10.
When filtering vectors using metadata:
- The cumulative length of filterable metadata in a single filter instruction cannot exceed 64 KB.
- The number of filterable metadata items in a single filter instruction cannot exceed 1024.
- The filter condition supports a maximum of 8 nested levels.
The request frequency for the PutVectorIndex API is limited to a maximum of 5 calls per second.
A maximum of 500 entries can be written in a single batch using the PutVectors API.
A maximum of 500 indexes are returned per page in the response of the ListVectorIndexes API. You can use paging to retrieve the next batch of indexes.
The maximum concurrency for the ListVectorIndexes API is 16.
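Some of these limits are easy to check on the client before a request is sent. The sketch below covers only the count-based default limits listed above (vector dimension, TopK, number of non-filterable metadata fields, and PutVectors batch size). Size-based limits are left out because how the service measures metadata size is not specified here. The names `check_request` and `batches` are hypothetical helpers, not part of an OSS SDK.

```python
# Default limits taken from the list above; raised quotas would change these.
MAX_DIMENSION = 4096
DEFAULT_MAX_TOP_K = 30
MAX_NON_FILTERABLE_FIELDS = 10
MAX_PUT_BATCH = 500


def check_request(dimension, top_k, non_filterable_fields):
    """Return a list of human-readable problems; an empty list means no violation found."""
    problems = []
    if not 1 <= dimension <= MAX_DIMENSION:
        problems.append(f"dimension {dimension} is outside 1..{MAX_DIMENSION}")
    if not 1 <= top_k <= DEFAULT_MAX_TOP_K:
        problems.append(f"TopK {top_k} is outside 1..{DEFAULT_MAX_TOP_K}")
    if non_filterable_fields > MAX_NON_FILTERABLE_FIELDS:
        problems.append(
            f"{non_filterable_fields} non-filterable fields exceeds {MAX_NON_FILTERABLE_FIELDS}"
        )
    return problems


def batches(entries, size=MAX_PUT_BATCH):
    """Split entries into chunks no larger than the PutVectors batch limit."""
    for start in range(0, len(entries), size):
        yield entries[start:start + size]


# 1,200 placeholder entries are split into chunks of 500, 500, and 200.
batch_sizes = [len(chunk) for chunk in batches(list(range(1200)))]
```

Checking locally avoids spending API calls on requests that would be rejected, which matters for rate-limited operations such as PutVectorIndex.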

Billing information

This feature is currently in invitational preview and is free of charge. To apply for a trial, go to the Vector Bucket page.