ossfs data cache is suitable for scenarios in which a small amount of data is frequently read. After ossfs data cache is enabled, it can effectively reduce the read latency and improve the efficiency of operations on objects. This topic describes how to configure and effectively use ossfs data cache.
Cached data may lag behind the data in OSS, so you must pay attention to the consistency and timeliness of the data when you use it. We recommend that you do not enable ossfs data cache in scenarios that require real-time data.
Background information
When ossfs is used to access objects, by default it creates a temporary file named tmpfile in the /tmp directory as the disk cache. This file is invisible to users and is automatically deleted when all open file handles are closed. However, in business scenarios in which the same objects are frequently accessed, this mechanism can cause the objects to be downloaded from OSS again each time you open, read, and close them, which consumes a large amount of OSS throughput.
By enabling ossfs data cache, you can effectively reduce the throughput pressure on remote storage caused by frequent reading of hot data, reduce the read latency, and reduce the number of accesses to remote storage.
Scenarios
ossfs data cache is suitable for scenarios in which a small amount of data is frequently read. You must make sure that the local disk capacity is sufficient to store hot data.
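The capacity requirement above can be checked before you choose a cache path. The following is a minimal sketch, not part of ossfs: the function names, the hot-data estimate, and the optional reserve (modeled on the ensure_diskfree option, in MB) are assumptions made for illustration.

```python
import shutil

MB = 1024 * 1024


def cache_fits(free_bytes: int, hot_data_bytes: int, reserve_mb: int = 0) -> bool:
    """Return True if the hot data fits in the free space left after the reserve."""
    usable_bytes = free_bytes - reserve_mb * MB
    return hot_data_bytes <= usable_bytes


def check_cache_dir(path: str, hot_data_bytes: int, reserve_mb: int = 0) -> bool:
    """Apply cache_fits to the file system that contains `path`."""
    free_bytes = shutil.disk_usage(path).free
    return cache_fits(free_bytes, hot_data_bytes, reserve_mb)
```

For example, check_cache_dir("/data/cache", 20 * 1024 * MB, reserve_mb=1024) reports whether an estimated 20 GB of hot data would fit on the disk that holds the hypothetical /data/cache directory while 1024 MB stays free.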
Comparison between modes
The following table describes the differences between the ossfs default mode (no data cache) and ossfs data cache mode. In the ossfs default mode, we recommend that you use the tmpdir option to move the temporary file from the /tmp directory to the data disk directory to prevent the system disk capacity from being occupied.
Mode | Read operation | Write operation | Performance |
ossfs default mode (no data cache) | Each time you read data, data is downloaded from the bucket to a temporary file and then read from the temporary file. The temporary file is deleted when all file handles are closed. | A temporary file is generated and is deleted only when all file handles are closed. When many files are written, the temporary files consume a large amount of disk capacity. If the local disk capacity is insufficient, the system uploads the parts of the temporary file in advance and deletes the cache. In this case, the write speed is limited by the upload speed of OSS. | The performance of this mode varies based on the disk performance and the latency of OSS. |
ossfs data cache | The first time you read objects from the bucket, a file is created in the cache path and objects are stored in the file. You can repeatedly read the objects from the file. | Data is written to the local cache first. When a file is refreshed or closed, the file is uploaded to the bucket. If the local cache capacity is insufficient, ossfs uploads the parts of the file in advance and deletes the cache. In this case, the write speed is limited by the upload speed of OSS. | If a local cache exists, the performance of this mode varies based on the disk performance. If a local cache does not exist, the performance of this mode is the same as that of the default mode. In this case, the performance of this mode varies based on the disk performance and the latency of OSS. |
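The difference in read behavior between the two modes can be illustrated with a toy model. The following sketch is not ossfs code: the SimulatedMount class and its methods are hypothetical, and it only counts how often an object would be downloaded from the bucket under the behavior described in the table.

```python
class SimulatedMount:
    """Toy model that counts downloads from the bucket; not ossfs code."""

    def __init__(self, data_cache: bool):
        self.data_cache = data_cache
        self.downloads = 0
        self._local = set()  # objects that currently have a local copy

    def open_read_close(self, key: str) -> None:
        if key not in self._local:
            self.downloads += 1  # no local copy, so fetch from the bucket
            self._local.add(key)
        # The read itself is served from the local file in both modes.
        if not self.data_cache:
            # Default mode: the temporary file is deleted once all
            # file handles are closed.
            self._local.discard(key)


def count_downloads(data_cache: bool, keys) -> int:
    """Replay a sequence of open/read/close calls and return the download count."""
    mount = SimulatedMount(data_cache)
    for key in keys:
        mount.open_read_close(key)
    return mount.downloads
```

In this model, reading object "a" five times and object "b" three times results in eight downloads in the default mode and two downloads in data cache mode.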
Configuration options
The following table describes the parameters that you can configure for ossfs data cache.
Parameter | Description | Default value |
use_cache | Enables ossfs data cache and specifies the path of data cache. By default, ossfs data cache is disabled. For example, if you specify | None |
ensure_diskfree | The amount of disk capacity to reserve so that the cache does not fill the disk and prevent other applications from writing data. By default, no disk capacity is reserved. Unit: MB. For example, if you want to set the ossfs reserved disk capacity to 1024 MB, you can specify | None |
del_cache | Specifies whether to delete the local cache. By default, the local cache is not deleted. For example, if you specify | None |
max_dirty_data | Specifies the size threshold to upload a file. If the size of the file is larger than the value, the file is uploaded even when the file is not closed. Unit: MB. For example, if you want to specify that a file greater than 2,000 MB in size is uploaded even when the file is not closed, you can specify | 5120 |
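The parameters in the table can be combined into a single mount command. The following is a minimal sketch that assembles the argument list. It assumes that ossfs accepts the common form ossfs bucket mountpoint -ourl=endpoint followed by -o options, and the bucket name, mount point, endpoint, and cache path used with it are placeholders. Verify the exact syntax against the ossfs version that you use.

```python
from typing import List, Optional


def build_ossfs_command(
    bucket: str,
    mount_point: str,
    endpoint: str,
    cache_dir: str,
    ensure_diskfree_mb: Optional[int] = None,
    del_cache: bool = False,
    max_dirty_data_mb: Optional[int] = None,
) -> List[str]:
    """Build an ossfs argument list with data cache enabled (assumed -o syntax)."""
    cmd = [
        "ossfs",
        bucket,
        mount_point,
        f"-ourl={endpoint}",
        f"-ouse_cache={cache_dir}",  # enables data cache at this path
    ]
    if ensure_diskfree_mb is not None:
        cmd.append(f"-oensure_diskfree={ensure_diskfree_mb}")  # reserve, in MB
    if del_cache:
        cmd.append("-odel_cache")  # delete the local cache
    if max_dirty_data_mb is not None:
        cmd.append(f"-omax_dirty_data={max_dirty_data_mb}")  # threshold, in MB
    return cmd
```

For example, build_ossfs_command("examplebucket", "/mnt/ossfs", "http://oss-cn-hangzhou.aliyuncs.com", "/data/cache", ensure_diskfree_mb=1024, max_dirty_data_mb=2000) returns a list that can be passed to subprocess.run. The function returns a list rather than a shell string so that paths containing spaces need no quoting.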
Data cache management mechanism
In ossfs data cache mode, the cached file on the disk is not deleted when the file is closed. This way, the cached file can be directly used when the file is read again in the future.
If the cache capacity is insufficient, the cached data is deleted.
During read and write operations, the system checks the usage of the disk capacity. If the disk capacity is insufficient, the data reclaim operation is triggered. The data reclaim operation is performed on a single cached file as the smallest unit.
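The reclaim behavior can be sketched as a small simulation. The following code is not the ossfs implementation. It assumes that reclaim starts when free space falls below the reserved capacity and that the oldest cached file is evicted first. The source states only that a single cached file is the smallest unit of reclaim, so the eviction order and the function name are assumptions.

```python
from collections import OrderedDict
from typing import List, Tuple


def reclaim(
    cached_files: "OrderedDict[str, int]",
    free_mb: int,
    ensure_diskfree_mb: int,
) -> Tuple[List[str], int]:
    """Evict whole cached files until free space reaches the reserve.

    cached_files maps file name to size in MB, oldest entry first.
    Returns the evicted file names and the resulting free space in MB.
    """
    evicted = []
    while free_mb < ensure_diskfree_mb and cached_files:
        # Assumption: evict the oldest entry first. A file is always
        # removed as a whole, never partially.
        name, size_mb = cached_files.popitem(last=False)
        free_mb += size_mb
        evicted.append(name)
    return evicted, free_mb
```

Because whole files are removed, the free space after reclaim can exceed the reserve by up to the size of the last evicted file.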
Suggestions
If you frequently access hot objects, we recommend that you enable ossfs data cache and configure the path of the cache. This way, the objects can be directly obtained from the local cache during repeated reading. This reduces access latency and quickly loads frequently accessed objects. This also reduces the number of requests to OSS.
If the disk in which your cache is stored is used by other applications, we recommend that you specify the ensure_diskfree option when you use the disk. This way, a specific amount of disk capacity can be reserved to ensure the normal operation of the system.
We recommend that you do not enable ossfs data cache in scenarios in which multiple clients write data to or read data from ossfs or a client writes data to ossfs while multiple clients read data from ossfs, and data consistency across multiple clients is required.
If ossfs data cache is enabled, data that a client writes to ossfs always passes through the local cache and therefore occupies disk capacity. If you do not want to use disk capacity for writes, we recommend that you use ossutil instead.
In scenarios in which large objects are read at a time, the performance of the cache is limited. Therefore, we recommend that you enable the direct read mode to reduce the usage of the disk capacity.
The performance of ossfs data cache varies based on the performance of the disk in which the cached data is stored. We recommend that you select disks that have high performance, such as ESSD AutoPL disks with performance burst enabled, elastic ephemeral disks, or local disks. For more information, see ESSD AutoPL disks.