Businesses that use AI, data warehouses, and big data analytics require lower data access latency, higher queries per second (QPS), and greater throughput from services that run on Object Storage Service (OSS). To meet these demands, OSS provides the accelerator feature. The accelerator caches hot objects from OSS on high-performance NVMe SSD storage media in the same zone as your compute services. This provides data access with millisecond-level latency and high QPS.
Benefits
Low latency
The OSS accelerator is a zone-level service that you can deploy in the same zone as your computing resources to reduce network latency. The accelerator uses NVMe SSD media to lower storage access latency, providing an end-to-end download process with millisecond-level latency. This is highly effective for downloading model files for inference and querying hot data in data warehouses.
High IOPS
High-performance NVMe SSD media delivers hundreds of thousands of read input/output operations per second (IOPS). This is ideal for frequent reads of small files or data blocks that are hundreds of kilobytes in size.
High throughput density
The OSS accelerator delivers high throughput relative to the amount of cached data. This meets burst read demands for small sets of hot data.
High throughput
The accelerator's bandwidth scales linearly with its capacity. It provides burst throughput of up to hundreds of GB/s.
Elastic scaling
Compute tasks are often periodic, and resource requirements vary by task. You can scale the accelerator out or in online as needed. This prevents resource waste and lowers your costs. The accelerator supports cache sizes from 50 GB to hundreds of terabytes. It also inherits the massive data storage benefits of OSS and can directly cache multiple tables or partitions from a data warehouse.
Separation of storage and compute
Unlike cache space on a compute server, the OSS accelerator's capacity and performance can be adjusted online, independently of your compute resources.
Data consistency
The accelerator provides strong consistency for OSS data, a feature not available in traditional caching solutions. When a file in OSS is updated, the accelerator automatically detects and caches the updated file. This ensures that the compute engine always reads the latest data.
Multiple prefetch policies
The OSS accelerator provides the following prefetch policies:
Read-through prefetch: If a read request results in a cache miss in the accelerator, the accelerator automatically performs an origin fetch from OSS. The data is then retrieved and stored in the accelerator.
Synchronous prefetch: When data is written to OSS, it is also synchronously cached in the accelerator.
Asynchronous prefetch: You can configure the accelerator to batch-cache specified data from OSS in advance (see the warm-up sketch after the note below).
Note: Read-through prefetch is enabled by default and cannot be configured. Synchronous and asynchronous prefetch must be manually enabled, and you can use both at the same time.
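Because read-through prefetch is always on, one simple way to emulate a batch warm-up is to issue reads for known hot objects through the accelerated domain name: the first GET of each object populates the cache. The following minimal Python sketch uses the OSS Python SDK (oss2); the endpoint, bucket name, credentials, and object keys are placeholders, and asynchronous prefetch proper is configured separately through the console or the accelerator SDK.

```python
import os

import oss2  # Alibaba Cloud OSS Python SDK

# Example accelerated domain name for Zone H in China (Beijing); substitute
# the domain assigned to your own accelerator's region and zone.
ACCELERATED_ENDPOINT = "https://cn-beijing-h-internal.oss-data-acc.aliyuncs.com"

auth = oss2.Auth(os.environ["OSS_ACCESS_KEY_ID"], os.environ["OSS_ACCESS_KEY_SECRET"])
bucket = oss2.Bucket(auth, ACCELERATED_ENDPOINT, "example-bucket")  # placeholder bucket

# Hypothetical list of hot objects to warm into the accelerator cache.
hot_keys = ["warehouse/part-0000.parquet", "warehouse/part-0001.parquet"]

for key in hot_keys:
    # A cache miss triggers an origin fetch from OSS and caches the object;
    # later reads of the same key are served from the NVMe SSD cache.
    bucket.get_object(key).read()  # drain the body so the full object is fetched
```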
How it works
After you create an accelerator, it is assigned a zone-specific internal accelerated domain name. This domain name can only be accessed from within the internal network and does not support public network access. For example, the accelerated domain name for Zone H in the China (Beijing) region is cn-beijing-h-internal.oss-data-acc.aliyuncs.com. If you are in the same virtual private cloud (VPC) as the accelerator, you can access resources in the accelerator through the accelerated domain name. The process is as follows.
Write requests
Read-through prefetch: Write requests sent from a client to the accelerated domain name are forwarded directly to the OSS bucket. The process is the same as using the default OSS domain name.
Synchronous prefetch: Write requests sent from a client to the accelerated domain name are forwarded directly to both the OSS bucket and the OSS accelerator.
Asynchronous prefetch: Data that needs to be prefetched is written to the OSS accelerator before access requests begin.
Synchronous + asynchronous prefetch: Requests are forwarded directly to both the OSS bucket and the OSS accelerator. In addition, hot data can be written to the OSS accelerator before access requests begin (the sketch after this list shows the client-side write path).
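The client-side write path is the same in every mode: writes use the standard OSS API, and only the endpoint differs. A minimal sketch with the OSS Python SDK (oss2), assuming placeholder credentials, endpoint, bucket, and key:

```python
import os

import oss2  # Alibaba Cloud OSS Python SDK

auth = oss2.Auth(os.environ["OSS_ACCESS_KEY_ID"], os.environ["OSS_ACCESS_KEY_SECRET"])

# Write through the accelerated domain name (placeholder region and zone).
# The request is forwarded to the OSS bucket; with synchronous prefetch
# enabled, the object is also cached in the accelerator at write time.
accel_bucket = oss2.Bucket(
    auth,
    "https://cn-beijing-h-internal.oss-data-acc.aliyuncs.com",
    "example-bucket",  # placeholder bucket name
)
accel_bucket.put_object("images/item-0001.jpg", b"<image bytes>")
```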
Read requests
Note: The read request process is the same for all prefetch policies.
Read requests sent from a client to the accelerated domain name are forwarded to the OSS accelerator.
After the accelerator receives a read request, it searches for the object file in its cache:
If the object file exists in the cache, the file is returned directly to the client.
If the object file does not exist in the cache, the accelerator requests the object file from the attached OSS bucket. After the file is retrieved, the accelerator caches the object file and returns it to the client.
When the accelerator cache is full, the accelerator evicts less recently accessed files to make room for newly accessed files (see the cache eviction policy under Metrics). An illustrative sketch of this read and eviction flow follows.
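The following Python sketch is a conceptual model of this behavior, not the accelerator's implementation: a fixed-capacity read-through cache that fetches from an origin on a miss and evicts the least recently used entries when full.

```python
from collections import OrderedDict

class ReadThroughLRUCache:
    """Illustrative model of read-through caching with LRU eviction."""

    def __init__(self, capacity_bytes, fetch_from_origin):
        self.capacity = capacity_bytes
        self.used = 0
        self.fetch = fetch_from_origin  # callable: key -> bytes (origin fetch from OSS)
        self.entries = OrderedDict()    # key -> bytes, ordered from least to most recent

    def get(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)  # cache hit: mark as most recently used
            return self.entries[key]
        data = self.fetch(key)             # cache miss: origin fetch
        # Evict least recently used entries until the new object fits.
        while self.entries and self.used + len(data) > self.capacity:
            _, evicted = self.entries.popitem(last=False)
            self.used -= len(evicted)
        if self.used + len(data) <= self.capacity:
            self.entries[key] = data       # cache the object for later reads
            self.used += len(data)
        return data
```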
Scenarios
The OSS accelerator is suitable for scenarios that require high bandwidth and involve repeated data reads. Specific scenarios are as follows:
Low-latency data sharing
Background
A customer buys an item from a smart vending machine. They use a mobile app to scan and photograph the item, and the photo is uploaded. The application backend receives the image and stores it in the OSS accelerator. Backend subsystems then perform Content Moderation analysis and bar code detection on the image. The bar code detection result is sent back to the application backend for payment processing and other operations. The image must be downloaded in milliseconds.
Solution
Use the OSS accelerator with synchronous prefetch enabled. The OSS accelerator can effectively reduce the image loading latency for the analysis system, which shortens the transaction process. The OSS accelerator is ideal for latency-sensitive services that involve repeated reads.
Model inference
Background
AI model inference requires pulling and loading model files. During inference debugging, you may also need to constantly switch to new model files for testing. As model files become larger, the time required for the inference server to pull the files increases.
Solution
Use the OSS accelerator with asynchronous prefetch or read-through prefetch. Asynchronous prefetch is suitable when you know which model files are hot data: configure the accelerator accordingly and use the accelerator software development kit (SDK) to prefetch the specified OSS files into the accelerator. Read-through prefetch is suitable when the hot model files are uncertain: configure an accelerator of an appropriate size based on experience, and the accelerator automatically caches files during the first read so that subsequent reads are faster. The accelerator can be scaled at any time based on performance requirements. If your inference program needs to access OSS through a local directory, you must deploy ossfs. A download sketch follows.
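As a sketch of the read-through variant, an inference server can pull model files through the accelerated domain name so that the first pull populates the cache and subsequent pulls (for example, on other replicas in the same zone or after a restart) are served from NVMe SSD. The endpoint, bucket, keys, and local path below are placeholders; this uses the OSS Python SDK (oss2), not the accelerator SDK.

```python
import os

import oss2  # Alibaba Cloud OSS Python SDK

auth = oss2.Auth(os.environ["OSS_ACCESS_KEY_ID"], os.environ["OSS_ACCESS_KEY_SECRET"])
bucket = oss2.Bucket(
    auth,
    "https://cn-beijing-h-internal.oss-data-acc.aliyuncs.com",  # placeholder accelerated domain
    "example-model-bucket",  # placeholder bucket name
)

# Hypothetical list of hot model files. With asynchronous prefetch they can be
# pushed into the cache ahead of time; with read-through, the first pull caches them.
model_keys = [
    "models/resnet50/weights-00001.bin",
    "models/resnet50/weights-00002.bin",
]

for key in model_keys:
    local_path = os.path.join("/mnt/models", os.path.basename(key))
    bucket.get_object_to_file(key, local_path)  # served from cache after the first read
```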
Big data analytics
Background
A company's business data is partitioned by day and archived in OSS for long-term storage. Analysts use compute engines such as Hive or Spark to analyze the data, but the query scope is uncertain. The analysts want to minimize the query analysis time.
Solution
Use the OSS accelerator with read-through prefetch. This mode is suitable for offline query scenarios with large amounts of data where the query scope is uncertain and accurate prefetching is not possible. For example, data queried by Analyst A is cached in the accelerator cluster. If a query from Analyst B includes data that was previously queried by Analyst A, the data analytics process is accelerated.
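One way to wire this up is to point the compute engine's OSS connector at the accelerated domain name. The following PySpark sketch assumes a Hadoop OSS connector that takes its endpoint from the fs.oss.endpoint property; the exact property names depend on the connector your cluster uses, and the endpoint, bucket, and path are placeholders.

```python
from pyspark.sql import SparkSession

# Assumes a Hadoop OSS connector configured through fs.oss.* properties;
# property names may differ depending on the connector version.
spark = (
    SparkSession.builder
    .appName("oss-accelerator-query")
    .config("spark.hadoop.fs.oss.endpoint",
            "cn-beijing-h-internal.oss-data-acc.aliyuncs.com")  # placeholder accelerated domain
    .getOrCreate()
)

# The first query over a partition triggers origin fetches and caches the data;
# repeated queries over the same partitions are then served from the accelerator.
df = spark.read.parquet("oss://example-warehouse-bucket/sales/dt=2024-06-01/")
df.groupBy("region").count().show()
```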
Multi-level acceleration
Background
Client-side caching and server-side acceleration are not mutually exclusive. You can combine them to achieve multi-level acceleration based on your business needs.
Solution
Use the OSS accelerator in conjunction with client-side caching. Deploy the client-side cache on the same host as the compute cluster. If a read request misses the client-side cache, the data is read from the backend storage, which is the OSS accelerator. Use read-through prefetch for the OSS accelerator, which caches data on the first access. Because cache space on the client host is limited, a time-to-live (TTL) is set for each file and directory in the client-side cache; when the TTL expires, the cached data is evicted to save space. Data in the OSS accelerator is not immediately evicted, and its cache can store hundreds of terabytes of data. If data that missed the client-side cache is read again, it can be loaded directly from the OSS accelerator, which achieves two-level acceleration.
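A minimal Python sketch of the client-side layer: a local in-memory cache with a time-to-live in front of an oss2 bucket handle bound to the accelerated domain name. The class name, TTL value, and fallback logic are illustrative.

```python
import time

class TwoLevelReader:
    """Illustrative client-side TTL cache in front of the OSS accelerator."""

    def __init__(self, accel_bucket, ttl_seconds=300):
        self.bucket = accel_bucket  # oss2.Bucket bound to the accelerated domain name
        self.ttl = ttl_seconds
        self.local = {}             # key -> (expires_at, data)

    def read(self, key):
        entry = self.local.get(key)
        if entry and entry[0] > time.time():
            return entry[1]  # level 1: client-side cache hit
        # Level 2: read from the accelerator, which caches the object on first access.
        data = self.bucket.get_object(key).read()
        self.local[key] = (time.time() + self.ttl, data)
        return data
```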
Metrics
Metric | Description
Capacity | Cache sizes from 50 GB to hundreds of terabytes. If your business scenario requires a higher capacity, submit a ticket to request a capacity increase.
Accelerator bandwidth | The accelerator provides throughput bandwidth for cached data based on the configured space. Each terabyte of accelerator space provides a maximum bandwidth of 2.4 Gbps. This bandwidth is in addition to the standard OSS bandwidth and is not limited by the standard OSS bandwidth capabilities. For more information about standard OSS bandwidth limits, see Limits and performance metrics. For example, in the China (Shenzhen) region, OSS provides a standard bandwidth of 100 Gbps. If you enable an accelerator and configure 10 TB of accelerator space, you get an additional 24 Gbps of low-latency bandwidth through the accelerated domain name. For batch offline computing, use the internal same-region endpoint of OSS to take advantage of the 100 Gbps standard bandwidth with large-scale concurrent reads of large blocks. For hot data query services, access data cached on NVMe SSD media through the OSS accelerated domain name to get the additional 24 Gbps of low-latency throughput.
Read bandwidth (peak) | Formula: MAX[600, 300 × Capacity (TB)] MB/s. For example, an accelerator with a capacity of 2048 GB (2 TB) has a peak read bandwidth of MAX[600, 300 × 2] = 600 MB/s.
Maximum read bandwidth | 40 GB/s (320 Gbps). If your business scenario requires higher read bandwidth, submit a ticket to request an increase.
Minimum read latency for a single 128 KB request | < 10 ms
Scaling interval | Capacity can be modified once per hour.
Scaling method | Manual scaling through the console.
Cache eviction policy | Least Recently Used (LRU). The LRU policy retains frequently accessed data and prioritizes removal of data that has not been accessed for a long time, which allows efficient use of the cache space.
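The peak read bandwidth row can be sanity-checked with a small helper; the capacities below are illustrative.

```python
def peak_read_bandwidth_mbs(capacity_tb: float) -> float:
    """Peak read bandwidth in MB/s: MAX[600, 300 x Capacity (TB)]."""
    return max(600.0, 300.0 * capacity_tb)

# 2 TB  -> 600 MB/s (the 600 MB/s floor applies up to 2 TB)
# 10 TB -> 3,000 MB/s, or about 24 Gbps (2.4 Gbps per terabyte)
print(peak_read_bandwidth_mbs(2), peak_read_bandwidth_mbs(10))
```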
Billing
When you use an OSS accelerator, you are charged based on the configured capacity of the accelerator and its usage duration. For more information, see OSS accelerator pricing.
When you read or write data in OSS through an accelerated domain name, OSS request fees are incurred even if no origin fetch occurs.
Next steps
For more information about how to create an OSS accelerator and modify its capacity, see Create, modify, and delete an accelerator.
For more information about how to configure and use the OSS accelerator with common OSS tools and the OSS SDK, see Use an accelerator.
For more information about the performance differences between using an internal same-region endpoint of OSS and using an OSS accelerator in specific business scenarios, see Performance metrics.