This topic describes how to optimize the read performance of ossfs in read-only scenarios by configuring mount parameters.
The following section applies to ossfs 1.91.3 and later. For more information about how to download and install the latest version of ossfs, see Install ossfs.
Modes
ossfs offers three read modes, each designed for different scenarios.
Default mode
The default mode is suitable for random read-only access to small files (files that can be entirely cached in the page cache) and to large files. For example, if your AI training project reads images slowly in direct read mode, and performance stays poor even after you tune the direct read mode for random reads, we recommend that you switch to the default read mode.
When ossfs reads a file in default mode, the kernel caches one copy of the file data in memory for the mount point and another copy when ossfs writes the data to a file on the local disk. As a result, a read operation consumes page cache equal to about twice the size of the file.
For example, if the page cache on your operating system can hold up to 6 GB of dirty pages, the default read mode is, in theory, suitable for reading files smaller than 3 GB.
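The sizing rule above can be expressed as a short calculation. The following Python sketch is illustrative only: the function names are hypothetical and not part of ossfs, and it assumes the cache cost of a read is exactly twice the file size.

```python
# A minimal sketch of the default-mode sizing rule, assuming the cache cost of a
# read is exactly twice the file size (one page-cache copy for the mount point
# and one for the local disk file). Real workloads share the page cache with
# other processes, so treat the result as an upper bound.

GIB = 1024 ** 3


def default_mode_size_limit(page_cache_bytes: int) -> int:
    """Return the file size at which two cached copies fill the page cache."""
    return page_cache_bytes // 2


def fits_default_mode(file_size_bytes: int, page_cache_bytes: int) -> bool:
    """Return True if the file is smaller than the default-mode size limit."""
    return file_size_bytes < default_mode_size_limit(page_cache_bytes)


if __name__ == "__main__":
    cache = 6 * GIB
    print(default_mode_size_limit(cache) / GIB)  # 3.0
    print(fits_default_mode(2 * GIB, cache))     # True
    print(fits_default_mode(4 * GIB, cache))     # False
```

With a 6 GB page cache, the limit works out to 3 GB, which matches the example in the text.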
You can use the parallel_count parameter to adjust the number of concurrent download tasks and the multipart_size parameter to set the amount of data downloaded by a single task.
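As an illustration of how these two parameters combine, the following Python sketch builds the corresponding -o options and estimates how much data is requested at once. The helper names are hypothetical, the example values are arbitrary, and the MB unit for multipart_size is an assumption; check the documentation for your ossfs version.

```python
# Hypothetical helper that assembles the two ossfs -o options for the default
# read mode. The option names come from this topic; the example values, and the
# assumption that multipart_size is expressed in MB, are illustrative only.


def default_mode_options(parallel_count: int, multipart_size_mb: int) -> list:
    """Return -o options that tune concurrency and per-task download size."""
    if parallel_count < 1 or multipart_size_mb < 1:
        raise ValueError("parallel_count and multipart_size must be positive")
    return [
        f"-oparallel_count={parallel_count}",
        f"-omultipart_size={multipart_size_mb}",
    ]


def data_in_flight_mb(parallel_count: int, multipart_size_mb: int) -> int:
    """Rough upper bound on data requested at once: tasks x data per task."""
    return parallel_count * multipart_size_mb


if __name__ == "__main__":
    print(" ".join(default_mode_options(8, 16)))  # -oparallel_count=8 -omultipart_size=16
    print(data_in_flight_mb(8, 16))               # 128
```

In this simplified model, raising either value increases the amount of data requested concurrently.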
Direct read mode
The direct read mode is suitable for sequential reads of large files and allows a limited degree of random read access, such as reads that skip a few chunks. For example, in AI inference scenarios, you can use the direct read mode to load a large Safetensors file.
To enable the direct read mode, specify the -odirect_read option.
In direct read mode, ossfs retains in memory the data within the range of [-direct_read_backward_chunks * direct_read_chunk_size, +direct_read_prefetch_chunks * direct_read_chunk_size] relative to the current read offset, where direct_read_chunk_size defaults to 4 MB, direct_read_prefetch_chunks defaults to 32, and direct_read_backward_chunks defaults to 1. By default, ossfs therefore retains the data within the range of [-4 MB, +128 MB].
The direct read mode also supports random reads that stay within a small range. For example, if two consecutive reads from a Safetensors file fall within the range of [-32 MB, +32 MB], the default +128 MB prefetch window already covers the forward direction, so you only need to set -odirect_read_backward_chunks=8 to retain 32 MB (8 chunks of 4 MB) of data before the current offset.
You can adjust the direct_read_prefetch_chunks and direct_read_chunk_size parameters to increase the amount of data prefetched in parallel and make full use of the available bandwidth.
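The window arithmetic above can be checked with a small model. The following Python sketch is hypothetical: the helper functions are not part of ossfs, sizes are in MB, it assumes direct_read_chunk_size is given in MB, and it treats the window as a simple offset range around the previous read.

```python
# A simplified model of the direct read retention window. Option names come
# from this topic; the helper functions are hypothetical, sizes are in MB, and
# offsets are measured relative to the current read offset.
import math


def retention_window(chunk_size_mb=4, prefetch_chunks=32, backward_chunks=1):
    """Return (MB kept behind, MB kept ahead of) the current read offset."""
    return backward_chunks * chunk_size_mb, prefetch_chunks * chunk_size_mb


def backward_chunks_needed(max_backward_jump_mb, chunk_size_mb=4):
    """Smallest direct_read_backward_chunks whose window covers the jump."""
    return math.ceil(max_backward_jump_mb / chunk_size_mb)


def in_window(prev_offset_mb, next_offset_mb, window):
    """True if the next read lands inside the retained window of the previous one."""
    behind, ahead = window
    delta = next_offset_mb - prev_offset_mb
    return -behind <= delta <= ahead


def direct_read_options(chunk_size_mb=4, prefetch_chunks=32, backward_chunks=1):
    """Return the -o options that enable direct read mode with these settings."""
    return [
        "-odirect_read",
        f"-odirect_read_chunk_size={chunk_size_mb}",
        f"-odirect_read_prefetch_chunks={prefetch_chunks}",
        f"-odirect_read_backward_chunks={backward_chunks}",
    ]


if __name__ == "__main__":
    print(retention_window())                  # (4, 128)
    n = backward_chunks_needed(32)
    print(n)                                   # 8
    print(in_window(100, 70, retention_window()))                   # False
    print(in_window(100, 70, retention_window(backward_chunks=n)))  # True
```

A backward jump of 30 MB falls outside the default 4 MB backward window but inside the 32 MB window that direct_read_backward_chunks=8 provides.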
Hybrid read mode
The hybrid read mode is suitable for read-only workloads that mix small files (files that can be entirely cached in the page cache) and large files, and it also allows a limited degree of random read access, such as reads that skip a few chunks. For example, in AI inference scenarios, you can use the hybrid read mode to load a large Safetensors file. If random reads span a wide offset range, the hybrid read mode delivers lower read performance than the default read mode.
To enable the hybrid read mode, specify the -odirect_read option and also configure the direct_read_local_file_cache_size_mb parameter, which sets the amount of downloaded data beyond which ossfs switches to the direct read mode. For example, if your machine offers up to 6 GB of page cache, you can set -odirect_read_local_file_cache_size_mb=3072 so that ossfs switches to the direct read mode when the downloaded data reaches 3 GB. This mirrors the default-mode sizing rule above, in which a read consumes about twice the file size in cache.
In hybrid read mode, ossfs retains in memory the data within the range of [-direct_read_backward_chunks * direct_read_chunk_size, +direct_read_prefetch_chunks * direct_read_chunk_size] relative to the current read offset, where direct_read_chunk_size defaults to 4 MB, direct_read_prefetch_chunks defaults to 32, and direct_read_backward_chunks defaults to 1. By default, ossfs therefore retains the data within the range of [-4 MB, +128 MB].
Random reads that stay within a small range are also supported. For example, if two consecutive reads from a Safetensors file fall within the range of [-32 MB, +32 MB], you can set -odirect_read_backward_chunks=8 to retain 32 MB of data before the current offset.
You can adjust the direct_read_prefetch_chunks and direct_read_chunk_size parameters to increase the amount of data prefetched in parallel and make full use of the available bandwidth.
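The threshold behavior of the hybrid read mode can be sketched as follows. This Python model is hypothetical: the helper names are not part of ossfs, the half-the-page-cache rule is derived from the 6 GB to 3072 MB example above, and whether ossfs counts downloaded data per file or in total is an assumption left to the caller.

```python
# A simplified model of hybrid read mode routing. The threshold rule (half the
# page cache) follows the 6 GB -> 3072 MB example in this topic; whether ossfs
# counts downloaded data per file or in total is not specified here, so the
# downloaded_mb argument is an assumption.


def hybrid_threshold_mb(page_cache_mb: int) -> int:
    """Suggested direct_read_local_file_cache_size_mb: half the page cache."""
    return page_cache_mb // 2


def read_path(downloaded_mb: float, threshold_mb: int) -> str:
    """Return which mode serves reads once downloaded_mb of data has been fetched."""
    return "default" if downloaded_mb < threshold_mb else "direct"


def hybrid_options(threshold_mb: int) -> list:
    """Return the -o options that enable hybrid read mode."""
    return ["-odirect_read", f"-odirect_read_local_file_cache_size_mb={threshold_mb}"]


if __name__ == "__main__":
    t = hybrid_threshold_mb(6 * 1024)
    print(t)                    # 3072
    print(read_path(1024, t))   # default
    print(read_path(3072, t))   # direct
    print(" ".join(hybrid_options(t)))
```

Below the threshold, reads behave as in the default read mode; once the downloaded data reaches the threshold, the model routes reads to the direct read path.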
Recommendations
If ossfs reads a file and writes the file at the same time, use the default read mode.
If ossfs only reads a file, or reads a file and writes a different file:
Only small files: Use the default read mode.
Only large files: To sequentially read a large file, or to randomly read Safetensors files whose read offsets cover a narrow range, use the direct read mode. To perform random reads that cover a wide offset range, use the default read mode. If you are unsure which read mode suits your workload, or if performance remains unsatisfactory after you adjust the direct_read_backward_chunks parameter in direct read mode, use the default read mode.
Small files and large files: To sequentially read a large file, or to randomly read Safetensors files whose read offsets cover a narrow range, use the hybrid read mode. If random reads cover a wide offset range, if you are unsure which read mode suits your workload, or if performance remains unsatisfactory after you adjust the direct_read_backward_chunks parameter in hybrid read mode, use the default read mode.
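The recommendations above form a small decision procedure. The following Python sketch encodes them; the function and parameter names are hypothetical, and callers who are unsure about their access pattern are expected to pass narrow_or_sequential=False, which falls back to the default read mode as the recommendations suggest.

```python
# A sketch of the read-mode recommendations as a decision function. All names
# are hypothetical; the logic follows the recommendations in this topic.
from enum import Enum


class Mode(Enum):
    DEFAULT = "default"
    DIRECT = "direct"
    HYBRID = "hybrid"


def recommend_mode(small_files: bool, large_files: bool,
                   writes_same_file: bool = False,
                   narrow_or_sequential: bool = False,
                   tuned_but_slow: bool = False) -> Mode:
    # Reading and writing the same file: always use the default read mode.
    if writes_same_file:
        return Mode.DEFAULT
    # Only small files (or no large files): default read mode.
    if not large_files:
        return Mode.DEFAULT
    # Wide random reads, uncertainty, or tuning that did not help: default mode.
    if not narrow_or_sequential or tuned_but_slow:
        return Mode.DEFAULT
    # Sequential or narrow-range reads: hybrid for mixed sizes, else direct.
    return Mode.HYBRID if small_files else Mode.DIRECT


if __name__ == "__main__":
    print(recommend_mode(False, True, narrow_or_sequential=True))  # Mode.DIRECT
    print(recommend_mode(True, True, narrow_or_sequential=True))   # Mode.HYBRID
```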
Note: If the direct read mode or the hybrid read mode delivers unsatisfactory performance, switch to the default read mode, which caches data on local disks. In default read mode, the performance of the local disk limits ossfs read performance, so we recommend that you use a higher-performance disk. For example, you can use ESSD AutoPL disks with appropriately configured provisioned performance and burst performance.