
Object Storage Service:OSS accelerator overview

Last Updated:Oct 10, 2025

Businesses that use AI, data warehouses, and big data analytics require lower data access latency, higher queries per second (QPS), and greater throughput from services that run on Object Storage Service (OSS). To meet these demands, OSS provides the accelerator feature. The accelerator caches hot objects from OSS on high-performance NVMe SSD storage media in the same zone as your compute services. This provides data access with millisecond-level latency and high QPS.

Benefits

  • Low latency

    The OSS accelerator is a zone-level service that you can deploy in the same zone as your computing resources to reduce network latency. The accelerator uses NVMe SSD media to lower storage access latency, providing an end-to-end download process with millisecond-level latency. This is highly effective for downloading model files for inference and querying hot data in data warehouses.

  • High IOPS

    High-performance NVMe SSD media delivers hundreds of thousands of read input/output operations per second (IOPS). This is ideal for frequent reads of small files or data blocks that are hundreds of kilobytes in size.

  • High throughput density

    The OSS accelerator provides high throughput per unit of cached capacity, even for small amounts of data. This meets the burst read demands for small sets of hot data.

  • High throughput

    The accelerator's bandwidth scales linearly with its capacity. It provides burst throughput of up to hundreds of GB/s.

  • Elastic scaling

    Compute tasks are often periodic, and resource requirements vary by task. You can scale the accelerator out or in online as needed. This prevents resource waste and lowers your costs. The accelerator supports cache sizes from 50 GB to hundreds of terabytes. It also inherits the massive data storage benefits of OSS and can directly cache multiple tables or partitions from a data warehouse.

  • Separation of storage and compute

    Unlike cache space that resides on a compute server, the OSS accelerator's capacity and performance can be adjusted online, independently of your compute nodes.

  • Data consistency

    The accelerator provides strong consistency for OSS data, a feature not available in traditional caching solutions. When a file in OSS is updated, the accelerator automatically detects and caches the updated file. This ensures that the compute engine always reads the latest data.

  • Multiple prefetch policies

    The OSS accelerator provides the following prefetch policies.

    • Read-through prefetch: If a read request results in a cache miss in the accelerator, the accelerator automatically performs an origin fetch from OSS. The data is then retrieved and stored in the accelerator.

    • Synchronous prefetch: When data is written to OSS, it is also synchronously cached in the accelerator.

    • Asynchronous prefetch: You can configure the accelerator to batch-cache data from OSS to the accelerator.

      Note
      • Read-through prefetch is enabled by default and cannot be configured.

      • Synchronous and asynchronous prefetch must be manually enabled. You can use both features at the same time.

How it works

After you create an accelerator, it is assigned a zone-specific internal accelerated domain name. This domain name can only be accessed from within the internal network and does not support public network access. For example, the accelerated domain name for Zone H in the China (Beijing) region is cn-beijing-h-internal.oss-data-acc.aliyuncs.com. If you are in the same virtual private cloud (VPC) as the accelerator, you can access resources in the accelerator through the accelerated domain name. The process is as follows.
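The naming pattern above can be expressed in a few lines. This is a minimal sketch; `build_acc_endpoint` is a hypothetical helper name, not part of any SDK:

```python
def build_acc_endpoint(region: str, zone: str) -> str:
    """Build the zone-specific internal accelerated domain name.

    Follows the pattern described above, for example Zone H in the
    China (Beijing) region:
    cn-beijing-h-internal.oss-data-acc.aliyuncs.com
    """
    return f"{region}-{zone}-internal.oss-data-acc.aliyuncs.com"

# Example: Zone H in the China (Beijing) region
endpoint = build_acc_endpoint("cn-beijing", "h")
print(endpoint)  # cn-beijing-h-internal.oss-data-acc.aliyuncs.com
```

Clients in the same VPC use this endpoint in place of the default OSS endpoint; no other changes to the request path are required.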

  • Write requests

    • Read-through prefetch: Write requests sent from a client to the accelerated domain name are forwarded directly to the OSS bucket. The process is the same as using the default OSS domain name.

    • Synchronous prefetch: Write requests sent from a client to the accelerated domain name are forwarded directly to both the OSS bucket and the OSS accelerator.

    • Asynchronous prefetch: Data that needs to be prefetched is written to the OSS accelerator before access requests begin.

    • Synchronous + asynchronous prefetch: Requests are forwarded directly to both the OSS bucket and the OSS accelerator. In addition, hot data can be written to the OSS accelerator before access requests begin.

  • Read requests

    Note

    The read request process is the same for all prefetch policies.

    1. Read requests sent from a client to the accelerated domain name are forwarded to the OSS accelerator.

    2. After the accelerator receives a read request, it searches for the object file in its cache:

      • If the object file exists in the cache, the file is returned directly to the client.

      • If the object file does not exist in the cache, the accelerator requests the object file from the attached OSS bucket. After the file is retrieved, the accelerator caches the object file and returns it to the client.

      • When the accelerator cache is full, the least recently used files are evicted to make room for newly cached files, so frequently accessed data is retained.
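The read path above can be sketched as a small cache model. This is illustrative only: the real accelerator is a distributed service, and `fetch_from_oss` stands in for the origin fetch to the attached bucket:

```python
from collections import OrderedDict

class AcceleratorCache:
    """Toy model of the read path: cache hit, origin fetch on a miss,
    and least-recently-used (LRU) eviction when the cache is full."""

    def __init__(self, capacity: int, fetch_from_oss):
        self.capacity = capacity              # max number of cached objects
        self.fetch_from_oss = fetch_from_oss  # origin-fetch callback
        self.cache = OrderedDict()            # key -> data, in LRU order

    def read(self, key: str) -> bytes:
        if key in self.cache:                 # cache hit: return directly
            self.cache.move_to_end(key)       # mark as most recently used
            return self.cache[key]
        data = self.fetch_from_oss(key)       # cache miss: origin fetch
        self.cache[key] = data                # cache the retrieved object
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)    # evict least recently used
        return data

cache = AcceleratorCache(capacity=2, fetch_from_oss=lambda k: f"data:{k}".encode())
cache.read("a"); cache.read("b"); cache.read("a")
cache.read("c")             # cache full: evicts "b", the least recently used
print("b" in cache.cache)   # False
```

The real accelerator applies the same logic per cached block rather than per whole object, but the hit/miss/evict behavior is the same.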

Scenarios

The OSS accelerator is suitable for scenarios that require high bandwidth and involve repeated data reads. Specific scenarios are as follows:

Low-latency data sharing

  • Background

    A customer buys an item from a smart vending machine. They use a mobile app to scan and photograph the item, and the photo is uploaded. The application backend receives the image and stores it in the OSS accelerator. Backend subsystems then perform Content Moderation analysis and bar code detection on the image. The bar code detection result is sent back to the application backend for payment processing and other operations. The image must be downloaded in milliseconds.

  • Solution

    Use the OSS accelerator with synchronous prefetch enabled. The OSS accelerator can effectively reduce the image loading latency for the analysis system, which shortens the transaction process. The OSS accelerator is ideal for latency-sensitive services that involve repeated reads.


Model inference

  • Background

    AI model inference requires pulling and loading model files. During inference debugging, you may also need to constantly switch to new model files for testing. As model files become larger, the time required for the inference server to pull the files increases.

  • Solution

    Use the OSS accelerator with asynchronous prefetch or read-through prefetch. Asynchronous prefetch suits scenarios where you know which model files are hot. Read-through prefetch suits scenarios where you cannot predict which model files will be hot. If you have a list of hot model files, configure the accelerator accordingly and use the accelerator software development kit (SDK) to prefetch the specified OSS files into the accelerator. Otherwise, provision an accelerator sized based on experience: the accelerator automatically caches files during the first read for faster access on subsequent reads, and you can scale it at any time based on performance requirements. If your inference program needs to access OSS through a local directory, you must deploy ossfs.
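If you maintain a list of hot model files, cache warming can be sketched as reading each object once through the accelerated endpoint, so that the origin fetch leaves it cached for later inference reads. The sketch below assumes a generic `bucket` object exposing a `get_object(key)` method, such as an `oss2.Bucket` created against the accelerated domain name; the function and object names are illustrative, not an official API:

```python
def prefetch_hot_models(bucket, hot_keys):
    """Warm the accelerator cache by reading each hot model file once.

    `bucket` is any client exposing get_object(key) against the
    accelerated endpoint. A read miss triggers an origin fetch from
    OSS and leaves the object cached in the accelerator.
    """
    warmed = []
    for key in hot_keys:
        bucket.get_object(key)   # read through the accelerated domain name
        warmed.append(key)
    return warmed

# Usage sketch with the official Python SDK (oss2), assuming valid
# credentials and an accelerator attached to the bucket (placeholder
# values shown):
#
#   import oss2
#   auth = oss2.Auth("<access_key_id>", "<access_key_secret>")
#   bucket = oss2.Bucket(
#       auth,
#       "https://cn-beijing-h-internal.oss-data-acc.aliyuncs.com",
#       "<bucket_name>")
#   prefetch_hot_models(bucket, ["models/model-v1.bin", "models/model-v2.bin"])
```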


Big data analytics

  • Background

    A company's business data is partitioned by day and archived in OSS for long-term storage. Analysts use compute engines such as Hive or Spark to analyze the data, but the query scope is uncertain. The analysts want to minimize the query analysis time.

  • Solution

    Use the OSS accelerator with read-through prefetch. This mode is suitable for offline query scenarios with large amounts of data where the query scope is uncertain and accurate prefetching is not possible. For example, data queried by Analyst A is cached in the accelerator cluster. If a query from Analyst B includes data that was previously queried by Analyst A, the data analytics process is accelerated.


Multi-level acceleration

  • Background

    Client-side caching and server-side acceleration are not mutually exclusive. You can combine them to achieve multi-level acceleration based on your business needs.

  • Solution

    Use the OSS accelerator in conjunction with client-side caching. Deploy the client-side cache on the same host as the compute cluster. If a read request results in a client-side cache miss, the data is read from the backend storage, which is the OSS accelerator. Use read-through prefetch for the OSS accelerator, which caches data on the first access. Because the cache space on the client host is limited, a TTL is set for each file and directory in the client-side cache. When the TTL expires, the cached data is evicted to save space. The data in the OSS accelerator is not immediately evicted. Its cache can store hundreds of terabytes of data. If data that missed the client-side cache is read again, it can be loaded directly from the OSS accelerator. This achieves two-level acceleration.
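The two-level lookup described above can be sketched as a toy model. The client-side cache is a TTL-bounded local dictionary, and the accelerator tier is represented by a callback; all names are illustrative:

```python
import time

class TwoLevelReader:
    """Toy two-level read path: a TTL-bounded client-side cache in front
    of the accelerator tier (which itself falls back to OSS on a miss)."""

    def __init__(self, read_from_accelerator, ttl_seconds: float):
        self.read_from_accelerator = read_from_accelerator
        self.ttl = ttl_seconds
        self.client_cache = {}   # key -> (data, expiry timestamp)

    def read(self, key: str):
        entry = self.client_cache.get(key)
        now = time.monotonic()
        if entry and entry[1] > now:             # client-side hit, TTL valid
            return entry[0]
        data = self.read_from_accelerator(key)   # miss or expired: tier two
        self.client_cache[key] = (data, now + self.ttl)
        return data

calls = []
reader = TwoLevelReader(lambda k: calls.append(k) or f"data:{k}", ttl_seconds=60)
reader.read("part-0001")   # client-side miss -> accelerator read
reader.read("part-0001")   # client-side hit within TTL -> no second call
print(len(calls))          # 1
```

After the client-side TTL expires, the next read falls through to the accelerator, which still holds the data because its much larger cache evicts by LRU rather than by TTL.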


Metrics

Metric

Description

Capacity

  • During public preview: up to 500 GB

  • After public preview: up to 100 TB

If your business scenario requires a higher capacity, submit a ticket to request a capacity increase.

Accelerator bandwidth

The accelerator provides throughput bandwidth for cached data based on the configured space. Each terabyte of accelerator space provides a maximum bandwidth of 2.4 Gbps. The throughput bandwidth provided by the accelerator is in addition to the standard OSS bandwidth and is not limited by the standard OSS bandwidth capabilities. For more information about standard OSS bandwidth limits, see Limits and performance metrics.

For example, in the China (Shenzhen) region, OSS provides a standard bandwidth of 100 Gbps. If you enable an accelerator and configure 10 TB of accelerator space, you can get an additional 24 Gbps of low-latency bandwidth through the accelerated domain name. For batch offline computing applications, use the internal same-region endpoint of OSS to take advantage of the 100 Gbps standard bandwidth with large-scale concurrent reads of large blocks. For hot data query services, you can access data cached on NVMe SSD media through the OSS accelerated domain name to get an additional 24 Gbps of low-latency throughput.

Read bandwidth (peak)

Formula: MAX[600, 300 × Capacity (TB)] MB/s

  • MAX[] indicates the greater of the two values in the brackets. 600 MB/s is the guaranteed base bandwidth. This means a minimum bandwidth of 600 MB/s is provided regardless of the capacity.

  • 300 × Capacity (TB) is the part of the bandwidth that scales linearly with the storage capacity, where capacity is measured in TB.

For example, an accelerator with a 2048 GB (2 TB) capacity has a read bandwidth of 600 MB/s.

Maximum read bandwidth

40 GB/s (320 Gbps)

If your business scenario requires higher read bandwidth, submit a ticket to request an increase.

Minimum read latency for a single 128 KB request

<10 ms

Scaling interval

Can be modified once per hour

Scaling method

Manual scaling through the console

Cache eviction policy

The Least Recently Used (LRU) cache eviction policy is used. The LRU policy ensures that frequently accessed data is retained, while data that has not been accessed for a long time is prioritized for removal. This allows for efficient use of the cache space.
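The peak read bandwidth formula in the table above can be checked with a few lines of arithmetic (the function name is illustrative):

```python
def peak_read_bandwidth_mbps(capacity_tb: float) -> float:
    """MAX[600, 300 x capacity (TB)] MB/s, per the formula above.

    Note: 300 MB/s per TB is the same ratio as the 2.4 Gbps per TB
    figure for accelerator bandwidth (300 MB/s x 8 = 2400 Mbps).
    """
    return max(600.0, 300.0 * capacity_tb)

print(peak_read_bandwidth_mbps(2048 / 1024))  # 2 TB -> 600.0 MB/s (base floor)
print(peak_read_bandwidth_mbps(4))            # 4 TB -> 1200.0 MB/s (linear part)
```

Below 2 TB the 600 MB/s guaranteed base dominates; above 2 TB, bandwidth scales linearly at 300 MB/s per TB up to the 40 GB/s maximum.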

Billing

  • When you use an OSS accelerator, you are charged based on the configured capacity of the accelerator and its usage duration. For more information, see OSS accelerator pricing.

  • When you read or write data in OSS through an accelerated domain name, OSS request fees are incurred even if no origin fetch occurs.

Next steps

  • For more information about how to create an OSS accelerator and modify its capacity, see Create, modify, and delete an accelerator.

  • For more information about how to configure and use the OSS accelerator with common OSS tools and the OSS SDK, see Use an accelerator.

  • For more information about the performance differences between using an internal same-region endpoint of OSS and using an OSS accelerator in specific business scenarios, see Performance metrics.