ossfs metadata cache is suitable for scenarios in which a server has high read/write IOPS on Object Storage Service (OSS) data. After you enable ossfs metadata cache, the overall efficiency of object operations and the response time of requests are improved. This topic describes how to configure and effectively use ossfs metadata cache.
Cached metadata can lag behind changes made in OSS, so pay attention to data consistency and timeliness when you use ossfs metadata cache. We recommend that you do not enable ossfs metadata cache in scenarios that have high requirements on the timeliness of data.
Background information
Metadata refers to information that describes data, such as object size, creation time, modification time, and user and group IDs. User and group IDs are attributes that are not supported by OSS, but file systems rely on these attributes for permission checks. ossfs allows you to obtain additional information from the custom headers of objects in OSS. This way, you can perform operations on objects based on their attributes in Linux.
After you enable ossfs metadata cache, performance, resource usage, and user experience are improved.
Performance: ossfs metadata cache reduces the latency of metadata reading, especially in scenarios that involve high I/O operations, which can improve the overall efficiency of object operations.
Resource usage: ossfs metadata cache reduces the number of calls to OSS and the queries per second (QPS) when you frequently access hot data.
User experience: The response time of the requests is improved.
Scenarios
ossfs metadata cache is suitable for scenarios in which you use a single server to access OSS data.
In a distributed environment, ossfs metadata cache is suitable for scenarios in which data that does not frequently change is read from OSS, for example, reading AI training datasets and AI model files or running big data queries.
How it works
ossfs uses the client memory to cache OSS metadata to reduce the latency of remote storage operations.
Metadata cache for first data access:
The first time you access an object or a directory under the ossfs mount point, the ossfs client obtains the metadata of the object from OSS and stores it in the local cache.
Subsequent access acceleration:
If the cache deletion policy is disabled or the cached entry has not expired, subsequent requests for the metadata of the object are served directly from the local cache without sending requests to OSS, which greatly reduces the latency.
Cache update policies:
ossfs updates the local cache based on specific policies, such as cache expiration and cache upper limit.
Multi-client cache synchronization:
ossfs metadata cache is a single-server local cache that uses the client memory. You cannot use a server to mount multiple buckets to multiple local file systems at a time or synchronize metadata changes between multiple servers.
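The behavior described above can be illustrated with a small simulation. This is a minimal sketch, not ossfs code: FakeOSS, Client, and the injected clock are hypothetical names invented for illustration, and the 900-second expiry simply mirrors the documented default. It shows the first access populating the cache, later accesses being served locally, and a second server seeing a change that the first server does not see until its entry expires.

```python
import time


class FakeOSS:
    """Hypothetical stand-in for a bucket: maps object key -> metadata."""

    def __init__(self):
        self.meta = {}
        self.head_calls = 0  # counts simulated HeadObject requests

    def head_object(self, key):
        self.head_calls += 1
        return dict(self.meta[key])


class Client:
    """One server with its own in-memory metadata cache (never shared)."""

    def __init__(self, oss, expire_seconds, clock=time.monotonic):
        self.oss = oss
        self.expire = expire_seconds
        self.clock = clock
        self.cache = {}  # key -> (metadata, time the entry was cached)

    def stat(self, key):
        entry = self.cache.get(key)
        if entry is not None and self.clock() - entry[1] < self.expire:
            return entry[0]  # served from local memory, no remote request
        meta = self.oss.head_object(key)  # first access or expired entry
        self.cache[key] = (meta, self.clock())
        return meta


now = [0.0]  # fake clock so the example does not need to sleep
oss = FakeOSS()
oss.meta["model.bin"] = {"size": 100}
server_a = Client(oss, 900, clock=lambda: now[0])
server_b = Client(oss, 900, clock=lambda: now[0])

first = server_a.stat("model.bin")["size"]        # fetched and cached
oss.meta["model.bin"] = {"size": 250}             # object changes in OSS
fresh_b = server_b.stat("model.bin")["size"]      # B has no entry yet
stale_a = server_a.stat("model.bin")["size"]      # A still uses its cache
now[0] = 901.0
refreshed_a = server_a.stat("model.bin")["size"]  # entry expired, refetched
```

In this run server A keeps returning the old size until its entry expires, which is why the cache is a poor fit when several servers must see updates immediately.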
Mode comparison
The following table describes the differences before and after you enable ossfs metadata cache.
ossfs metadata cache | Command | Request method | Operation | Performance |
ossfs metadata cache disabled | stat | ossfs sends HeadObject requests to a bucket to obtain object metadata. | ossfs sends a HeadObject request to obtain only one object from the bucket. | The metadata of objects is obtained from the bucket, which is slower than reading from the memory. |
ossfs metadata cache disabled | ls | ossfs sends a ListObject request to a bucket to obtain objects in directories and sends HeadObject requests to obtain object metadata. | After ossfs sends a ListObject request, ossfs sends a HeadObject request to obtain only one object in a directory from the bucket. | The metadata of objects is obtained from the bucket, which is slower than reading from the memory. |
ossfs metadata cache enabled (metadata cache does not expire) | stat | ossfs obtains object metadata from the local memory. | ossfs obtains objects from the memory. | Object metadata is read from the local memory, which is faster. |
ossfs metadata cache enabled (metadata cache does not expire) | ls | ossfs sends a ListObject request to a bucket to obtain objects in directories and obtains object metadata from the local memory. | After ossfs sends a ListObject request, ossfs obtains objects in directories. If you want to access a specific object in a directory, ossfs obtains the object from the local memory. | Object metadata is read from the local memory, which is faster. |
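To make the comparison concrete, the request pattern for ls can be modeled with simple arithmetic. The function below is a rough sketch under stated assumptions: ls_requests is a hypothetical helper, the listing is treated as a single ListObject request (pagination is ignored), and one HeadObject request is counted for every object whose metadata is not already cached.

```python
def ls_requests(num_objects, cache_enabled, cached_entries=0):
    """Return (list_requests, head_requests) for listing one directory.

    Simplified model: one ListObject request for the listing, plus one
    HeadObject request per object whose metadata is not in the cache.
    """
    list_requests = 1
    if cache_enabled:
        hits = min(cached_entries, num_objects)
    else:
        hits = 0  # without the cache, every object needs a HeadObject
    return list_requests, num_objects - hits


without_cache = ls_requests(500, cache_enabled=False)
warm_cache = ls_requests(500, cache_enabled=True, cached_entries=500)
partly_warm = ls_requests(500, cache_enabled=True, cached_entries=400)
```

For a directory of 500 objects, this model gives 500 HeadObject requests without the cache, none with a fully warm cache, and 100 when 400 entries are already cached.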
Parameters
The following table describes the parameters that you can configure for ossfs metadata cache.
Parameter | Description | Value |
max_stat_cache_size | Specifies whether to enable metadata cache and the maximum size of the metadata cache. Specify the parameter based on the number of frequently accessed objects in OSS. If the memory is sufficient, we recommend that you set the parameter to a larger value to improve the operation performance. | Default value: 100,000. Unit: object or directory. Size: approximately 40 MB. |
stat_cache_expire | Specifies whether to enable the cache deletion policy for the metadata cache and change the validity period of the metadata cache. We recommend that you specify the validity period of the metadata cache based on your business requirements. Note: By default, ossfs enables the cache deletion policy and sets the upper limit for the metadata cache to 100,000, which consumes approximately 40 MB of memory. | Default value: 900. Unit: seconds. |
readdir_optimize | Specifies whether to use cache optimization. Default value: false. After you specify the parameter, ossfs does not send a HeadObject request to obtain the object metadata, such as To use cache optimization, specify | Default value: false. |
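When you size the cache, a quick estimate of memory use can help. The sketch below assumes that memory scales linearly from the documented default (100,000 entries at approximately 40 MB), which is an extrapolation and not a documented guarantee. The helper names are hypothetical, and the -ostat_cache_expire=... form is assumed by analogy with the -omax_stat_cache_size=0 form used in this topic.

```python
DEFAULT_ENTRIES = 100_000  # documented default for max_stat_cache_size
DEFAULT_MEMORY_MB = 40     # documented approximate memory at the default


def estimate_cache_memory_mb(max_stat_cache_size):
    """Linearly extrapolate memory use from the documented default."""
    return max_stat_cache_size * DEFAULT_MEMORY_MB / DEFAULT_ENTRIES


def cache_mount_options(max_stat_cache_size=100_000, stat_cache_expire=900):
    """Build mount option strings for the two cache parameters.

    The option syntax for stat_cache_expire is an assumption.
    """
    return [
        f"-omax_stat_cache_size={max_stat_cache_size}",
        f"-ostat_cache_expire={stat_cache_expire}",
    ]


default_mb = estimate_cache_memory_mb(100_000)
large_mb = estimate_cache_memory_mb(500_000)
options = cache_mount_options(500_000, 1800)
```

Under this assumption, raising the limit to 500,000 entries would need roughly 200 MB of client memory, so check available memory before you increase the value.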
Metadata cache management mechanism
The following table describes the metadata cache management mechanism of ossfs.
Cache status | Operation |
Object metadata is cached and the cache deletion policy is disabled or the cache does not expire. | Read object metadata directly from the cache. |
Object metadata is cached and the cache expires. | Update the cache. |
Object metadata is not cached and the cache capacity is available. | Cache objects. |
Object metadata is not cached, the cache capacity is fully consumed, and the cache deletion policy is enabled. | Traverse the cache and delete expired objects. |
Object metadata is not cached, the cache capacity is fully consumed, and the cache deletion policy is disabled. | Delete cached objects that have not been accessed for an extended period of time based on the least recently used (LRU) policy. |
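The rows of the preceding table can be read as a small decision procedure. The following is a minimal sketch of that procedure, not the ossfs implementation: StatCache and its parameters are hypothetical, expire_seconds=None models a disabled cache deletion policy, and the fallback to LRU eviction when no entry has expired is an assumption that the table does not state.

```python
from collections import OrderedDict


class StatCache:
    def __init__(self, fetch, max_size, expire_seconds, clock):
        self.fetch = fetch            # callable(key), stands in for HeadObject
        self.max_size = max_size      # 0 disables caching
        self.expire = expire_seconds  # None = cache deletion policy disabled
        self.clock = clock
        self.entries = OrderedDict()  # key -> (metadata, cached_at), LRU first

    def _expired(self, cached_at):
        return self.expire is not None and self.clock() - cached_at >= self.expire

    def _make_room(self):
        if self.expire is not None:
            # Policy enabled: traverse the cache and delete expired entries.
            stale = [k for k, (_, t) in self.entries.items() if self._expired(t)]
            for key in stale:
                del self.entries[key]
        # Policy disabled, or (assumption) nothing had expired: evict LRU.
        while len(self.entries) >= self.max_size:
            self.entries.popitem(last=False)

    def stat(self, key):
        if self.max_size == 0:
            return self.fetch(key)
        if key in self.entries:
            meta, cached_at = self.entries[key]
            if not self._expired(cached_at):
                self.entries.move_to_end(key)  # mark as recently used
                return meta                    # read directly from the cache
            del self.entries[key]              # expired: update the cache
        elif len(self.entries) >= self.max_size:
            self._make_room()
        meta = self.fetch(key)
        self.entries[key] = (meta, self.clock())
        return meta


now = [0.0]
calls = []


def fetch(key):
    calls.append(key)
    return {"key": key}


# Deletion policy disabled: a full cache evicts the least recently used entry.
lru = StatCache(fetch, max_size=2, expire_seconds=None, clock=lambda: now[0])
for name in ["a", "b", "a", "c"]:
    lru.stat(name)
lru_keys = list(lru.entries)

# Deletion policy enabled: a full cache drops expired entries first.
ttl = StatCache(fetch, max_size=2, expire_seconds=900, clock=lambda: now[0])
ttl.stat("x")
now[0] = 500.0
ttl.stat("y")
now[0] = 1000.0
ttl.stat("z")
ttl_keys = list(ttl.entries)
```

In the first run, the second access to a is a cache hit, so b becomes the least recently used entry and is evicted when c arrives. In the second run, x has expired by the time z arrives and is removed, while y stays cached.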
Suggestions
If you do not need the metadata of objects, we recommend that you specify the readdir_optimize parameter to improve the performance of the list and find operations. After you enable ossfs metadata cache, symbolic links are no longer supported.
In multi-client scenarios that have high real-time requirements for data updates, proceed with caution if you enable ossfs metadata cache. You can disable ossfs metadata cache by specifying the -omax_stat_cache_size=0 parameter to maintain data consistency. However, performance degradation may occur and you may be charged additional fees.
If your application requires strong data consistency across multiple clients, we recommend that you use Cloud Storage Gateway (CSG) or Cloud Parallel File Storage (CPFS). We recommend that you do not use ossfs.
If the number of frequently accessed objects in OSS is large, increase the value of the max_stat_cache_size parameter appropriately to prevent cached entries from being evicted frequently.
If you want to perform operations on a large number of objects and the server memory is insufficient, we recommend that you use ossutil or OSS SDKs to perform the operations. If you want to mount a bucket to a local file system, we recommend that you use CSG or CPFS.
If the number of frequently accessed objects in OSS is large, you can create multiple ossfs mount points, each dedicated to a separate subdirectory. This setup distributes the workload across the mount points and improves performance.