Vector buckets store and query massive-scale vector data at a fraction of traditional costs. Built on OSS's serverless architecture, vector buckets deliver scalable vector storage for AI applications like retrieval-augmented generation (RAG), multi-modal search, and AI agents.
Compared with traditional vector database solutions, vector buckets can reduce vector storage costs by more than 90% while scaling elastically to billions of vectors.
Capabilities
Vector buckets provide cost-efficient vector storage and semantic retrieval for AI-driven applications:
Low-cost RAG applications: Store embeddings from knowledge bases, documents, and multi-modal content with query latency of tens to hundreds of milliseconds—suitable for scenarios where moderate response times are acceptable.
Tiered retrieval architectures: Store all vector data in a low-cost vector bucket as your primary storage layer. Sync frequently accessed data to high-performance services like Tablestore for latency-sensitive queries.
AI content at scale: Store raw files (documents, images, videos) in standard OSS buckets alongside their vector embeddings in vector buckets. Use a single API to manage both.
Benefits
Low cost
Pay only for vector data storage capacity and the amount of data scanned during retrieval—reducing costs by more than 90% compared to traditional vector database deployments.
Large scale
Serverless architecture scales elastically to accommodate growing data volumes. No capacity provisioning or scaling operations required.
Easy to use
Full API and SDK support for programmatic access
ossutil for batch operations
OSS console for visual management, including vector retrieval, data insertion, and bulk imports
Unified management
Manage vector buckets using the same workflows as standard OSS buckets. Apply consistent bucket policies for permission management, configure identical log export paths for operation audits, and use familiar OSS tools across both raw data and vector data storage.
Semantic retrieval
Query vector data using the QueryVectors operation, which returns results ranked by similarity. Vector buckets support scalar filtering through filterable metadata—attach metadata when writing vector data, then use it to narrow query results. Non-filterable metadata returns with query results as descriptive information but cannot be used as filter conditions.
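The interaction between filterable metadata and similarity ranking can be illustrated with a small local sketch. This is not the QueryVectors API: the `VectorRecord` class, the `query_vectors` function, exact-match filtering, and the use of cosine similarity are all assumptions made for illustration, and the service may use a different distance metric and filter syntax.

```python
import math
from dataclasses import dataclass, field


@dataclass
class VectorRecord:
    """Hypothetical in-memory stand-in for one row of vector data."""
    key: str
    values: list
    filterable: dict = field(default_factory=dict)      # may be used in filters
    non_filterable: dict = field(default_factory=dict)  # returned with results only


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def query_vectors(records, query, top_k, metadata_filter=None):
    """Filter on filterable metadata first, then rank the survivors by similarity."""
    metadata_filter = metadata_filter or {}
    candidates = [
        r for r in records
        if all(r.filterable.get(k) == v for k, v in metadata_filter.items())
    ]
    scored = [(cosine_similarity(r.values, query), r) for r in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


records = [
    VectorRecord("a", [1.0, 0.0], {"lang": "en"}, {"title": "Doc A"}),
    VectorRecord("b", [0.0, 1.0], {"lang": "en"}, {"title": "Doc B"}),
    VectorRecord("c", [0.9, 0.1], {"lang": "zh"}, {"title": "Doc C"}),
]
english_hits = query_vectors(records, [1.0, 0.0], top_k=2, metadata_filter={"lang": "en"})
```

In this sketch, record `c` is excluded by the filter even though it is close to the query, and the `title` field comes back with each hit but plays no part in filtering.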
Core concepts
Vector bucket: A bucket type designed for managing large-scale vector data as a cloud resource. Like standard OSS buckets, vector buckets provide storage and access control, but they are optimized for vector data operations.
Vector index: An index table that stores vector data within a vector bucket. Create multiple vector indexes in a single bucket to organize vectors by business type or use case. Query results are ranked by similarity based on the data in the target index.
Vector data: High-dimensional numerical arrays created by converting unstructured data (images, videos, documents) using vectorization services. Generate vectors using any service—ECS, PAI, Alibaba Cloud Model Studio, or third-party platforms—then write them to a vector index via the OSS API, SDK, or ossutil. Attach metadata when writing to enable scalar filtering queries.
Use cases
Low-cost RAG applications
As AI businesses scale, vector data grows exponentially, driving up storage and retrieval costs. Multi-modal retrieval scenarios like knowledge bases, AI assistants, and medical image search increasingly tolerate retrieval latency in the tens to hundreds of milliseconds range.
Store embeddings from documents, images, and other content sources at scale, then query them using semantic similarity. Storage costs are optimized for large data volumes while maintaining retrieval performance suitable for user-facing applications.
AI agents with tiered retrieval
Different AI agents have varying retrieval performance needs. Store all vector data centrally in a low-cost vector bucket as your primary storage layer. For scenarios requiring high performance and low latency, synchronize hot data to high-performance products like Tablestore.
This tiered approach balances cost and performance: cold data remains in affordable storage while hot data is cached in fast-retrieval systems. The architecture scales as your application grows, with clear separation between storage and performance layers.
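The tiered pattern can be sketched with two in-memory stand-ins. The `TieredVectorStore` class below is hypothetical: a plain dictionary plays the role of the vector bucket, a bounded `OrderedDict` plays the role of a fast service such as Tablestore, and promoting data to the hot tier on read with least-recently-used eviction is one assumed synchronization policy among many.

```python
from collections import OrderedDict


class TieredVectorStore:
    """Cold tier holds every vector; hot tier is a bounded LRU copy of recent reads."""

    def __init__(self, hot_capacity):
        self.cold = {}             # stand-in for the low-cost vector bucket
        self.hot = OrderedDict()   # stand-in for a low-latency service
        self.hot_capacity = hot_capacity

    def put(self, key, vector):
        """Always write to the cold tier; refresh the hot copy if one exists."""
        self.cold[key] = vector
        if key in self.hot:
            self.hot[key] = vector

    def get(self, key):
        """Return (vector, tier) and promote cold reads into the hot tier."""
        if key in self.hot:
            self.hot.move_to_end(key)       # mark as most recently used
            return self.hot[key], "hot"
        vector = self.cold[key]             # raises KeyError for unknown keys
        self.hot[key] = vector
        if len(self.hot) > self.hot_capacity:
            self.hot.popitem(last=False)    # evict the least recently used entry
        return vector, "cold"


store = TieredVectorStore(hot_capacity=1)
store.put("a", [1.0])
store.put("b", [2.0])
first_read = store.get("a")    # served from the cold tier, then promoted
second_read = store.get("a")   # served from the hot tier
```

A production setup would more likely synchronize hot data with a scheduled or event-driven job than on each read; the sketch only shows the separation between the complete storage layer and the smaller performance layer.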
AI content management platform with unified data management
AI applications generate massive amounts of unstructured content—user-generated content (UGC), internal documents, AI-generated content—along with their vectorized representations. Managing these assets often leads to fragmented storage and retrieval systems.
Store raw data in standard OSS buckets and vector data in OSS vector buckets to build an efficient AI data management platform. Use a single set of APIs and SDKs to manage and access both raw files and vector indexes, simplifying your infrastructure for AIGC data management and similar use cases.
Enterprise features
Endpoint access
Vector buckets provide separate public and internal endpoints that are isolated from standard OSS buckets.
Endpoint format:
Public:
$bucketname-$uid.$regionID.oss-vectors.aliyuncs.com
Internal:
$bucketname-$uid.$regionID-internal.oss-vectors.aliyuncs.com
Where:
$bucketname: The name of your vector bucket
$uid: Your Alibaba Cloud account ID
$regionID: The ID of the region where your vector bucket is located (for example, cn-hangzhou or us-west-1)
Example:
Public:
my-vectors-123456789.cn-hangzhou.oss-vectors.aliyuncs.com
Internal:
my-vectors-123456789.cn-hangzhou-internal.oss-vectors.aliyuncs.com
Use a third-level domain for all operations except ListVectorBuckets.
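The endpoint format described in this section can be expressed as a small helper. The function name `vector_bucket_endpoints` and its return shape are hypothetical; only the hostname patterns come from the format above.

```python
def vector_bucket_endpoints(bucket_name, uid, region_id):
    """Build the public and internal hostnames for a vector bucket."""
    public = f"{bucket_name}-{uid}.{region_id}.oss-vectors.aliyuncs.com"
    internal = f"{bucket_name}-{uid}.{region_id}-internal.oss-vectors.aliyuncs.com"
    return {"public": public, "internal": internal}


endpoints = vector_bucket_endpoints("my-vectors", "123456789", "cn-hangzhou")
```

Called with the values from the example, the helper reproduces both example hostnames.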
Secure transfer
Vector buckets use HTTPS to encrypt data in transit, protecting your vector data during transmission between clients and OSS.
Access control
Vector buckets support granular access control through two mechanisms:
Bucket policy: Resource-based authorization policies that control permissions at the vector bucket level or for one or more vector indexes within a bucket. Use bucket policies to grant cross-account access or manage permissions based on resources.
RAM policy: Identity-based authorization policies for fine-grained permission control over vector buckets, vector indexes, and data operations. RAM policies support cross-account access authorization and integrate with your existing identity management workflows.
Logs
Vector buckets provide comprehensive logging:
Access log export: Export access logs to a specified bucket in real time or near-real time for auditing and analysis.
Unified log format: Log format is fully compatible with standard OSS logs, with an additional BucketARN field to uniquely identify the vector bucket resource. This compatibility simplifies unified log analysis across standard and vector buckets.
Quotas and limits
The following quotas apply to all regions unless otherwise specified. To request a quota increase, submit a ticket to Technical Support.
Resource | Quota | Notes |
Vector buckets per account per region | 10 | Increase available on request |
Vector indexes per vector bucket | 100 | Increase available on request |
Vector data rows per vector index | 50 million | Increase to 2 billion rows available on request |
Vector dimensions | 1 to 4096 | - |
TopK range for retrieval requests | 1 to 30 | Increase to 100 available on request |
Single vector array size | 1 KB to 500 KB | - |
Total metadata size per vector | 40 KB | Includes both filterable and non-filterable metadata |
Filterable metadata size per vector | 2 KB | - |
Non-filterable metadata fields per vector | 10 | - |
Filterable metadata cumulative length per filter instruction | 64 KB | - |
Filterable metadata items per filter instruction | 1024 | - |
Filter condition nested levels | 8 | Maximum nesting depth |
PutVectorIndex API request frequency | 5 calls per second | - |
PutVectors API batch write entries | 500 per request | - |
ListVectorIndexes API page size | 500 indexes | Use paging to retrieve additional results |
ListVectorIndexes API concurrency | 16 | Maximum concurrent requests |
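Client code can check several of these quotas before sending a request. The following is a minimal sketch that assumes the default quotas in the table. The helper names `validate_query` and `chunk_for_put_vectors` are hypothetical and are not part of any OSS SDK.

```python
MAX_DIMENSIONS = 4096          # vector dimensions: 1 to 4096
DEFAULT_MAX_TOP_K = 30         # TopK: 1 to 30 unless an increase was granted
MAX_PUT_VECTORS_BATCH = 500    # PutVectors: at most 500 entries per request


def validate_query(dimension, top_k, max_top_k=DEFAULT_MAX_TOP_K):
    """Raise ValueError if a query would exceed the dimension or TopK quota."""
    if not 1 <= dimension <= MAX_DIMENSIONS:
        raise ValueError(f"dimension must be 1 to {MAX_DIMENSIONS}, got {dimension}")
    if not 1 <= top_k <= max_top_k:
        raise ValueError(f"top_k must be 1 to {max_top_k}, got {top_k}")


def chunk_for_put_vectors(vectors, batch_size=MAX_PUT_VECTORS_BATCH):
    """Yield consecutive slices of at most batch_size entries, one per write request."""
    if not 1 <= batch_size <= MAX_PUT_VECTORS_BATCH:
        raise ValueError(f"batch_size must be 1 to {MAX_PUT_VECTORS_BATCH}")
    for start in range(0, len(vectors), batch_size):
        yield vectors[start:start + batch_size]


# 1,201 placeholder entries are split into batches of 500, 500, and 201.
pending = list(range(1201))
batch_sizes = [len(batch) for batch in chunk_for_put_vectors(pending)]
```

If a higher TopK quota has been granted for the account, pass it as `max_top_k` instead of relying on the default.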