Object Storage Service: Overview of vector buckets

Last Updated: Mar 23, 2026

Vector buckets store and query massive-scale vector data at a fraction of traditional costs. Built on OSS's serverless architecture, vector buckets deliver scalable vector storage for AI applications like retrieval-augmented generation (RAG), multi-modal search, and AI agents.

Reduce vector storage costs by over 90% compared to traditional vector database solutions while handling billions of vectors with elastic scaling.

Capabilities

Vector buckets provide cost-efficient vector storage and semantic retrieval for AI-driven applications:

  • Low-cost RAG applications: Store embeddings from knowledge bases, documents, and multi-modal content with query latency of tens to hundreds of milliseconds—suitable for scenarios where moderate response times are acceptable.

  • Tiered retrieval architectures: Store all vector data in a low-cost vector bucket as your primary storage layer. Sync frequently accessed data to high-performance services like Tablestore for latency-sensitive queries.

  • AI content at scale: Store raw files (documents, images, videos) in standard OSS buckets alongside their vector embeddings in vector buckets. Use a single set of APIs and SDKs to manage both.

Benefits

Low cost

Pay only for vector data storage capacity and the amount of data scanned during retrieval—reducing costs by more than 90% compared to traditional vector database deployments.

Large scale

Serverless architecture scales elastically to accommodate growing data volumes. No capacity provisioning or scaling operations required.

Easy to use

  • Full API and SDK support for programmatic access

  • ossutil for batch operations

  • OSS console for visual management, including vector retrieval, data insertion, and bulk imports

Unified management

Manage vector buckets using the same workflows as standard OSS buckets. Apply consistent bucket policies for permission management, configure identical log export paths for operation audits, and use familiar OSS tools across both raw data and vector data storage.

Semantic retrieval

Query vector data using the QueryVectors operation, which returns results ranked by similarity. Vector buckets support scalar filtering through filterable metadata—attach metadata when writing vector data, then use it to narrow query results. Non-filterable metadata returns with query results as descriptive information but cannot be used as filter conditions.
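The interplay of similarity ranking and metadata filtering described above can be illustrated with a small in-memory sketch. This is not the QueryVectors API: the function name query_vectors_local, the row layout, and the use of cosine similarity are assumptions made for illustration only, since the text above says only that results are ranked by similarity.

```python
import math
from typing import Any, Optional


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def query_vectors_local(
    rows: list[dict[str, Any]],
    query: list[float],
    top_k: int,
    metadata_filter: Optional[dict[str, Any]] = None,
) -> list[dict[str, Any]]:
    """Hypothetical local stand-in: filter on filterable metadata, then rank."""
    candidates = [
        row for row in rows
        if metadata_filter is None
        or all(row["filterable"].get(k) == v for k, v in metadata_filter.items())
    ]
    ranked = sorted(
        candidates,
        key=lambda row: cosine_similarity(row["vector"], query),
        reverse=True,
    )
    return ranked[:top_k]


# "filterable" narrows results; "note" plays the role of non-filterable
# metadata: it comes back with each hit but is never used as a condition.
rows = [
    {"key": "doc-1", "vector": [1.0, 0.0], "filterable": {"lang": "en"}, "note": "intro"},
    {"key": "doc-2", "vector": [0.9, 0.1], "filterable": {"lang": "zh"}, "note": "faq"},
    {"key": "doc-3", "vector": [0.0, 1.0], "filterable": {"lang": "en"}, "note": "pricing"},
]

hits = query_vectors_local(rows, [1.0, 0.0], top_k=2, metadata_filter={"lang": "en"})
print([hit["key"] for hit in hits])  # ['doc-1', 'doc-3']
```

With the filter applied, doc-2 is excluded even though it is the second most similar vector, which is the effect scalar filtering has on a ranked result set.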

Core concepts

  • Vector bucket: A bucket type designed for managing large-scale vector data as a cloud resource. Like standard OSS buckets, vector buckets provide storage and access control, but they are optimized for vector data operations.

  • Vector index: An index table that stores vector data within a vector bucket. Create multiple vector indexes in a single bucket to organize vectors by business type or use case. Query results are ranked by similarity based on the data in the target index.

  • Vector data: High-dimensional numerical arrays created by converting unstructured data (images, videos, documents) using vectorization services. Generate vectors using any service—ECS, PAI, Alibaba Cloud Model Studio, or third-party platforms—then write them to a vector index via the OSS API, SDK, or ossutil. Attach metadata when writing to enable scalar filtering queries.

Use cases

Low-cost RAG applications

As AI businesses scale, vector data grows exponentially, driving up storage and retrieval costs. Multi-modal retrieval scenarios like knowledge bases, AI assistants, and medical image search increasingly tolerate retrieval latency in the tens to hundreds of milliseconds range.

Store embeddings from documents, images, and other content sources at scale, then query them using semantic similarity. Storage costs are optimized for large data volumes while maintaining retrieval performance suitable for user-facing applications.

AI agents with tiered retrieval

Different AI agents have varying retrieval performance needs. Store all vector data centrally in a low-cost vector bucket as your primary storage layer. For scenarios requiring high performance and low latency, synchronize hot data to high-performance products like Tablestore.

This tiered approach balances cost and performance: cold data remains in affordable storage while hot data is cached in fast-retrieval systems. The architecture scales as your application grows, with clear separation between storage and performance layers.
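The tiered pattern can be sketched with two in-memory dictionaries. Everything in this example is hypothetical: the class TieredVectorStore, the promote_after threshold, and the access-count promotion rule are illustrative assumptions, and the dictionaries merely stand in for a vector bucket (cold) and a low-latency service such as Tablestore (hot).

```python
from collections import Counter


class TieredVectorStore:
    """Illustrative two-tier store: all data in cold, hot data copied to hot."""

    def __init__(self, promote_after: int = 3) -> None:
        self.cold: dict[str, list[float]] = {}  # stands in for the vector bucket
        self.hot: dict[str, list[float]] = {}   # stands in for the fast tier
        self.hits: Counter = Counter()          # per-key access counts
        self.promote_after = promote_after

    def put(self, key: str, vector: list[float]) -> None:
        # Every vector is written to the primary (cold) layer.
        self.cold[key] = vector

    def get(self, key: str) -> tuple[list[float], str]:
        """Return the vector and the tier that served it."""
        self.hits[key] += 1
        if key in self.hot:
            return self.hot[key], "hot"
        vector = self.cold[key]
        # Copy the vector to the hot tier once it has been read often enough.
        if self.hits[key] >= self.promote_after:
            self.hot[key] = vector
        return vector, "cold"


store = TieredVectorStore(promote_after=2)
store.put("a", [0.1, 0.2])
tiers = [store.get("a")[1] for _ in range(3)]
print(tiers)  # ['cold', 'cold', 'hot']
```

The cold tier always holds the full data set, so the hot tier can be rebuilt or resized at any time without data loss.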

AI content management platform with unified data management

AI applications generate massive amounts of unstructured content—user-generated content (UGC), internal documents, AI-generated content—along with their vectorized representations. Managing these assets often leads to fragmented storage and retrieval systems.

Store raw data in standard OSS buckets and vector data in OSS vector buckets to build an efficient AI data management platform. Use a single set of APIs and SDKs to manage and access both raw files and vector indexes, simplifying your infrastructure for AIGC data management and similar use cases.

Enterprise features

Endpoint access

Vector buckets provide separate public and internal endpoints that are isolated from standard OSS buckets.

Endpoint format:

  • Public: $bucketname-$uid.$regionID.oss-vectors.aliyuncs.com

  • Internal: $bucketname-$uid.$regionID-internal.oss-vectors.aliyuncs.com

Where:

  • $bucketname: The name of your vector bucket

  • $uid: Your Alibaba Cloud account ID

  • $regionID: The region identifier where your vector bucket is located (for example, cn-hangzhou, us-west-1)

Example:

  • Public: my-vectors-123456789.cn-hangzhou.oss-vectors.aliyuncs.com

  • Internal: my-vectors-123456789.cn-hangzhou-internal.oss-vectors.aliyuncs.com

Note

Use a third-level domain (the bucket-specific endpoint format shown above) for all operations except ListVectorBuckets.
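The endpoint format above can be assembled with a small helper. The function name vector_bucket_endpoint and its parameters are hypothetical and not part of any OSS SDK; only the hostname pattern comes from the format listed in this section.

```python
def vector_bucket_endpoint(
    bucket_name: str,
    uid: str,
    region_id: str,
    internal: bool = False,
) -> str:
    """Build a vector bucket hostname from the documented pattern."""
    # Internal endpoints append "-internal" to the region identifier.
    region_part = f"{region_id}-internal" if internal else region_id
    return f"{bucket_name}-{uid}.{region_part}.oss-vectors.aliyuncs.com"


public = vector_bucket_endpoint("my-vectors", "123456789", "cn-hangzhou")
private = vector_bucket_endpoint("my-vectors", "123456789", "cn-hangzhou", internal=True)
print(public)   # my-vectors-123456789.cn-hangzhou.oss-vectors.aliyuncs.com
print(private)  # my-vectors-123456789.cn-hangzhou-internal.oss-vectors.aliyuncs.com
```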

Secure transfer

Vector buckets use HTTPS to encrypt data in transit, protecting your vector data during transmission between clients and OSS.

Access control

Vector buckets support granular access control through two mechanisms:

  • Bucket policy: Resource-based authorization policies that control permissions at the vector bucket level or for one or more vector indexes within a bucket. Use bucket policies to grant cross-account access or manage permissions based on resources.

  • RAM policy: Identity-based authorization policies for fine-grained permission control over vector buckets, vector indexes, and data operations. RAM policies support cross-account access authorization and integrate with your existing identity management workflows.

Logs

Vector buckets provide comprehensive logging:

  • Access log export: Export access logs to a specified bucket in real time or near-real time for auditing and analysis.

  • Unified log format: Log format is fully compatible with standard OSS logs, with an additional BucketARN field to uniquely identify the vector bucket resource. This compatibility simplifies unified log analysis across standard and vector buckets.

Quotas and limits

The following quotas apply to all regions unless otherwise specified. To request a quota increase, submit a ticket to Technical Support.

| Resource | Quota | Notes |
| --- | --- | --- |
| Vector buckets per account per region | 10 | Increase available on request |
| Vector indexes per vector bucket | 100 | Increase available on request |
| Vector data rows per vector index | 50 million | Increase to 2 billion rows available on request |
| Vector dimensions | 1 to 4096 | - |
| TopK range for retrieval requests | 1 to 30 | Increase to 100 available on request |
| Single vector array size | 1 KB to 500 KB | - |
| Total metadata size per vector | 40 KB | Includes both filterable and non-filterable metadata |
| Filterable metadata size per vector | 2 KB | - |
| Non-filterable metadata fields per vector | 10 | - |
| Filterable metadata cumulative length per filter instruction | 64 KB | - |
| Filterable metadata items per filter instruction | 1024 | - |
| Filter condition nested levels | 8 | Maximum nesting depth |
| PutVectorIndex API request frequency | 5 calls per second | - |
| PutVectors API batch write entries | 500 per request | - |
| ListVectorIndexes API page size | 500 indexes | Use paging to retrieve additional results |
| ListVectorIndexes API concurrency | 16 | Maximum concurrent requests |
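Some of these limits can be checked on the client before a request is sent. The sketch below is a minimal illustration, assuming the default quotas listed above; the helper names validate_query and batches are hypothetical and do not come from the OSS SDK.

```python
from typing import Iterator, Sequence

# Default quotas taken from the list above; raise them if your account
# has been granted an increase.
MAX_DIMENSION = 4096
DEFAULT_MAX_TOP_K = 30
MAX_PUT_BATCH = 500


def validate_query(dimension: int, top_k: int, max_top_k: int = DEFAULT_MAX_TOP_K) -> None:
    """Raise ValueError if a query would exceed the dimension or TopK quota."""
    if not 1 <= dimension <= MAX_DIMENSION:
        raise ValueError(f"dimension must be 1 to {MAX_DIMENSION}, got {dimension}")
    if not 1 <= top_k <= max_top_k:
        raise ValueError(f"top_k must be 1 to {max_top_k}, got {top_k}")


def batches(items: Sequence, size: int = MAX_PUT_BATCH) -> Iterator[Sequence]:
    """Split items into consecutive chunks no larger than the batch write quota."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


validate_query(dimension=1024, top_k=10)  # within quota, returns None
sizes = [len(chunk) for chunk in batches(list(range(1200)))]
print(sizes)  # [500, 500, 200]
```

Each chunk produced by batches would map to one write request, so 1,200 vectors need three requests under the 500-entry limit.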