Elastic File Client (EFC) is a compute-side data access acceleration solution. It mounts an OSS bucket as a local file system and uses compute node memory and disk to build a high-speed, distributed, read-only cache. Instead of each node fetching the same data from OSS, EFC aggregates those requests into a single origin fetch and distributes the cached result over a peer-to-peer (P2P) private network—reducing latency and increasing throughput for AI training and inference workloads.
The EFC caching feature is currently in invitational preview and is free to use during this period.
## When to use EFC caching
EFC caching improves performance when the same data is read repeatedly across one or more nodes. Caching provides no benefit for workloads that access each object only once (for example, single-pass ETL jobs); in that case, mount EFC without caching enabled and use it purely as a POSIX file system interface to OSS.
| Workload pattern | Recommended configuration |
|---|---|
| AI model training — large datasets read repeatedly | Single-node or distributed cache |
| AI model inference — many nodes loading the same model files | Distributed cache cluster |
| Single-pass data processing (ETL, one-time reads) | Basic mount (no cache) |
| Lightweight nodes that should access the cluster cache without storing cache data locally | Agent mode |
## How it works
EFC intercepts file reads at the mount point and follows this flow:
1. **Cache lookup**: EFC checks local memory or disk cache first, then queries peer nodes over the P2P network.
2. **Cache hit**: Data found in cache is returned immediately over the private network; no OSS request is needed.
3. **Cache miss (origin fetch)**: EFC retrieves the data from OSS, stores it in local cache, then returns it to the application. Subsequent reads hit the cache.
This approach lets aggregate throughput scale linearly with node count. A single node can achieve up to 15 GB/s throughput and 200,000 IOPS; performance scales with the number of nodes, subject to actual network bandwidth.
## Key advantages
- **Compute-side cache**: Memory and disk on compute nodes act as a multi-tiered read cache, with data prefetch and LRU (Least Recently Used) eviction.
- **P2P distributed acceleration**: Nodes share cached data over a P2P network, supporting clusters of hundreds to thousands of nodes. Aggregate throughput scales linearly with node count.
- **POSIX-compatible interface**: Applications access OSS through standard POSIX calls (`open`, `read`, `readdir`) without modification.
## Limits

### Runtime environment
| Item | Requirement |
|---|---|
| Platform | Platform for AI (Lingjun resources), Lingjun bare metal |
| Operating system | Alibaba Cloud Linux 3 (kernel 5.10.134-13+), Ubuntu 24.04 (kernel 6.8.0-79+) |
### Hardware resource requirements
Plan memory and disk before deployment based on whether caching is enabled.
**Caching disabled**:

- Memory: Resident usage is typically under 1 GiB. Reserve at least 5 GiB to handle burst loads.
- Disk: No special requirement beyond space for operational logs.

**Caching enabled**:

Resource usage includes the cache media plus index overhead.

- Memory:
  - Base usage: the same as caching-disabled mode.
  - Memory cache: consumes memory equal to the configured cache size.
  - Index overhead: approximately 0.1% of total cache capacity, regardless of cache type.

  Formula: `Total memory ≈ Base runtime memory + Memory cache size + (Memory cache size + Disk cache size) × 0.001`

- Disk: Disk cache consumes space equal to the configured size (1 TiB configured = 1 TiB used).
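As a concrete check of the memory formula, here is a quick calculation with assumed sizing values (5 GiB base reserve, 1 GiB memory cache, 1 TiB = 1024 GiB disk cache):

```shell
# Hypothetical sizing, all values in GiB: base reserve, memory cache, disk cache.
# Index overhead = (memory cache + disk cache) x 0.001.
echo "5 1 1024" | awk '{printf "%.3f GiB\n", $1 + $2 + ($2 + $3) * 0.001}'
# Prints: 7.025 GiB
```

So a node with a 1 GiB memory cache and a 1 TiB disk cache should budget roughly 7 GiB of memory in total.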
### Feature limits
- **Access mode**: Read-only. After mounting, all files are owned by `root`; `chmod` and `chown` are not supported.
- **POSIX operations**: Supported: `open`, `close`, `read`, `readdir`, `readdirplus`, `lookup`, `getattr`. Not supported: `readlink`, `write`, `rename`, `setattr`, `link`, `symlink`.
- **Storage class**: Supports Standard and Infrequent Access only. Archive, Cold Archive, and Deep Cold Archive objects are not accessible.
- **Path restrictions**:
  - If both `a/b` (file) and `a/b/` (directory) exist in OSS, only the directory `a/b/` is accessible after mounting.
  - Objects whose keys start with `/`, contain consecutive `//`, or include `.` or `..` path segments are not accessible.
- **Required OSS permissions**: `oss:GetBucketStat`, `oss:ListObjects`, and `oss:GetObject`.
- **Cache availability**: EFC is a read-only cache and does not provide high availability for cached data. Cached data can be lost on hardware failure, node replacement, or decommissioning, causing a cache miss and an origin fetch on the next access.
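The path restrictions can be screened for before mounting. A hypothetical pre-flight check (not an EFC tool) that flags object keys EFC cannot expose:

```shell
# Flag object keys that violate EFC's path restrictions: a leading "/",
# consecutive "//", or "." / ".." path segments.
check_key() {
    case "$1" in
        /*|*//*|.|..|./*|../*|*/.|*/..|*/./*|*/../*) echo "INVALID: $1" ;;
        *) echo "ok: $1" ;;
    esac
}
check_key "model/weights.bin"    # ok
check_key "/leading-slash.txt"   # INVALID
check_key "a//b.txt"             # INVALID
check_key "a/../b.txt"           # INVALID
```

In practice you would feed the output of an object listing (for example from `ossutil`) into such a check before relying on EFC to expose every key.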
## Deploy and mount EFC
The deployment process is progressive: start with a basic mount to verify connectivity, then enable caching and distributed clustering as needed.
### Step 1: Install the EFC client
Download and install two packages on each compute node.
Download the packages:
```shell
# alinas-utils package
wget https://aliyun-alinas-eac-ap-southeast-1.oss-ap-southeast-1.aliyuncs.com/cache/aliyun-alinas-utils.amd64.rpm
# EFC client
wget https://aliyun-alinas-eac-ap-southeast-1.oss-ap-southeast-1.aliyuncs.com/cache/alinas-efc-latest.amd64.rpm
```

Install the packages:

```shell
sudo rpm -ivh aliyun-alinas-utils.amd64.rpm
sudo rpm -ivh alinas-efc-latest.amd64.rpm
```
After installation, the EFC service registers automatically but does not start. Complete credential configuration before mounting.
### Step 2: Configure access credentials
EFC supports two authentication methods. Use Security Token Service (STS) temporary credentials to avoid exposing long-term AccessKeys.
Set credential file permissions to 600 so only root can read them. Store them securely to prevent credential leaks.
#### STS temporary credentials (recommended)
```shell
# Create the STS credential file
cat > /etc/passwd-sts << EOF
{
    "SecurityToken": "YourSecurityToken",
    "AccessKeyId": "YourSTSAccessKeyId",
    "AccessKeySecret": "YourSTSAccessKeySecret",
    "Expiration": "YourExpiration"
}
EOF
# Restrict access to root only
chmod 600 /etc/passwd-sts
```

- EFC automatically reloads `/etc/passwd-sts` when credentials are updated; no remount is required.
- Manage credential validity yourself and update the file before expiration. `Expiration` format: `2025-12-11T08:37:51Z` (UTC).
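Since EFC reloads the credential file automatically, a refresh script should never leave it half-written. One common approach (a sketch; the credential values below are placeholders for whatever your token source returns, and `CRED_FILE` is a hypothetical variable) is to write to a temporary file and move it into place atomically:

```shell
# Hypothetical rotation sketch: write new STS credentials to a temp file,
# then atomically replace the live file so EFC never reads a partial JSON.
CRED_FILE=${CRED_FILE:-/etc/passwd-sts}
tmp=$(mktemp "${CRED_FILE}.XXXXXX")
cat > "$tmp" << EOF
{
    "SecurityToken": "NewSecurityToken",
    "AccessKeyId": "NewSTSAccessKeyId",
    "AccessKeySecret": "NewSTSAccessKeySecret",
    "Expiration": "2025-12-11T08:37:51Z"
}
EOF
chmod 600 "$tmp"
# rename() is atomic on the same filesystem; EFC picks the file up automatically
mv "$tmp" "$CRED_FILE"
```

Run such a script from a timer or cron job comfortably before the `Expiration` timestamp.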
#### AccessKey
```shell
# Create the password file
echo "YourAccessKeyId:YourAccessKeySecret" > /etc/passwd-oss
# Restrict access to root only
chmod 600 /etc/passwd-oss
```

### Step 3: Mount the OSS bucket
Choose the configuration that matches your workload. Start with the basic mount to verify access, then add caching.
#### Configuration 1: Basic mount (no cache)
Use this when you want to verify EFC can access your OSS bucket, or when each object is read only once and caching provides no benefit.
```shell
# Create the mount point
mkdir -p /mnt/oss_data

# Mount using STS credentials (recommended)
mount -t alinas \
    -o efc,protocol=oss,g_oss_STSFile=/etc/passwd-sts \
    your-bucket.oss-cn-hangzhou.aliyuncs.com:/ /mnt/oss_data

# Mount using AccessKey
mount -t alinas \
    -o efc,protocol=oss,passwd_file=/etc/passwd-oss \
    your-bucket.oss-cn-hangzhou.aliyuncs.com:/ /mnt/oss_data
```

| Parameter | Description |
|---|---|
| `efc,protocol=oss` | Required. Identifies EFC mounting with OSS as the backend. |
| `g_oss_STSFile=/etc/passwd-sts` | Path to the STS credential file. |
| `passwd_file=/etc/passwd-oss` | Path to the AccessKey password file. |
| `your-bucket.oss-cn-hangzhou.aliyuncs.com:/` | Your bucket domain and the path to mount. |
Verify the mount:
```shell
# Check that the mount succeeded
df -h | grep /mnt/oss_data
# List files in the bucket
ls /mnt/oss_data
```

If your bucket files appear, the basic mount is working. Proceed to enable caching as needed.
#### Configuration 2: Single-node cache
Use this for single-node AI training jobs or quick performance testing on one machine.
If you completed a basic mount, unmount first: `umount /mnt/oss_data`

```shell
# Create the disk cache directory
mkdir -p /mnt/cache/

# Mount with single-node cache (using STS credentials)
mount -t alinas \
    -o efc,protocol=oss,g_oss_STSFile=/etc/passwd-sts \
    -o g_tier_EnableClusterCache=true \
    -o g_tier_DadiIsDistributed=false \
    -o g_tier_DadiMemCacheCapacityMB=1024 \
    -o g_tier_DadiDiskCacheCapacityMB=10240 \
    -o g_tier_DadiDiskCachePath=/mnt/cache/ \
    -o g_server_Port=17871 \
    your-bucket.oss-cn-hangzhou.aliyuncs.com:/ /mnt/oss_data
```

| Parameter | Description |
|---|---|
| `g_tier_EnableClusterCache=true` | Enables EFC cache acceleration. |
| `g_tier_DadiIsDistributed=false` | Single-node mode; no communication with other nodes. |
| `g_tier_DadiMemCacheCapacityMB=1024` | Allocates 1 GiB of memory for cache. |
| `g_tier_DadiDiskCacheCapacityMB=10240` | Allocates 10 GiB of disk for cache. |
| `g_tier_DadiDiskCachePath=/mnt/cache/` | Directory for disk cache files. |
| `g_server_Port=17871` | Enables the HTTP management interface for prefetch and monitoring. |
#### Configuration 3: Distributed cache cluster
Use this for large-scale AI training or inference across multiple nodes. The P2P network distributes cached data across the cluster, so no single node or OSS bucket becomes a bottleneck.
Only open P2P port 17980 between cluster nodes. Strictly restrict access to management port 17871.
**1. Configure the cluster node list**
Create `/etc/efc/rootlist` on all nodes with the same content:

```shell
# Format: <unique-id>:<node-private-IP>:<P2P-port>
1:192.168.1.1:17980
2:192.168.1.2:17980
3:192.168.1.3:17980
```

The rootlist file must be identical across all nodes. Inconsistencies prevent some nodes from joining the P2P network and reduce cache hit rates.
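Maintaining an identical file on every node by hand is error-prone. A small generator script (hypothetical, not part of EFC) can produce the file from a list of private IPs, then you distribute one canonical copy to all nodes:

```shell
# Hypothetical helper: build a rootlist from node IPs, assigning sequential
# unique IDs and the default P2P port 17980.
i=0
for ip in 192.168.1.1 192.168.1.2 192.168.1.3; do
    i=$((i + 1))
    echo "${i}:${ip}:17980"
done
# Prints:
# 1:192.168.1.1:17980
# 2:192.168.1.2:17980
# 3:192.168.1.3:17980
```

Redirect the output to `/etc/efc/rootlist` and copy the same file to every node (for example with your configuration management tool).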
**2. Mount on all nodes**
```shell
# Create the disk cache directory
mkdir -p /mnt/cache/

# Mount with distributed cache (using STS credentials)
mount -t alinas \
    -o efc,protocol=oss,g_oss_STSFile=/etc/passwd-sts \
    -o g_tier_EnableClusterCache=true \
    -o g_tier_DadiIsDistributed=true \
    -o g_tier_DadiAddr=/etc/efc/rootlist \
    -o g_tier_DadiMemCacheCapacityMB=1024 \
    -o g_tier_DadiDiskCacheCapacityMB=10240 \
    -o g_tier_DadiDiskCachePath=/mnt/cache/ \
    -o g_server_Port=17871 \
    your-bucket.oss-cn-hangzhou.aliyuncs.com:/ /mnt/oss_data
```

Parameters beyond single-node mode:
| Parameter | Description |
|---|---|
| `g_tier_DadiIsDistributed=true` | Enables distributed P2P cache across the cluster. |
| `g_tier_DadiAddr=/etc/efc/rootlist` | Path to the cluster node list file. |
#### Configuration 4: Agent mode
Use this for lightweight nodes that should benefit from the cluster cache without storing any data locally (for example, nodes with limited disk).
```shell
mount -t alinas \
    -o efc,protocol=oss,g_oss_STSFile=/etc/passwd-sts \
    -o g_tier_EnableClusterCache=true \
    -o g_tier_DadiIsDistributed=true \
    -o g_tier_DadiRootClientType=2 \
    -o g_tier_DadiAddr=/etc/efc/rootlist \
    -o g_server_Port=17871 \
    your-bucket.oss-cn-hangzhou.aliyuncs.com:/ /mnt/oss_data
```

`g_tier_DadiRootClientType` controls the node's role in the cluster:
| Value | Mode | Behavior |
|---|---|---|
| `0` (default) | Root/Agent | Provides cache to others and accesses the cluster cache |
| `1` | Root | Provides cache only |
| `2` | Agent | Accesses the cluster cache; stores no data locally |
## Cache operations
After mounting, manage the cache through the HTTP interface or CLI. Both require the management port to be enabled during mounting (`-o g_server_Port=17871`).
### Prefetch data
Prefetching loads data from OSS into cache before it is read, eliminating first-access latency. This is useful for model inference deployments, where model files should be warm before traffic arrives.
- Prefetch paths must be relative paths under the mount point. For example, use `model/weights.bin`, not `/mnt/oss_data/model/weights.bin`.
- Task statuses: `running`, `completed`, `canceled`, `failed`.
- Completed, canceled, and failed tasks are auto-evicted when total historical tasks exceed 10,000 or task age exceeds 7 days. Running tasks can only be stopped with the `cancel` command.
#### HTTP interface
Initiate prefetch:

```shell
curl -s "http://localhost:17871/v1/warmup/load?target_path=file100G"
```

Sample response:

```json
{
    "ErrorCode": 0,
    "ErrorMessage": "Request processed",
    "Results": [{
        "ErrorCode": 0,
        "ErrorMessage": "The warm up (file100G) is processing in the background. Use the 'stat' command to get status.",
        "Location": "127.0.0.1:17871",
        "Path": "file100G"
    }]
}
```

Check prefetch status:

```shell
curl -s "http://localhost:17871/v1/warmup/stat?target_path=file100G"
```

Sample response:

```json
{
    "ErrorCode": 0,
    "ErrorMessage": "Request processed",
    "Results": [{
        "ErrorCode": 0,
        "ErrorMessage": "Successfully stat the warm up",
        "Location": "127.0.0.1:17871",
        "Path": "file100G",
        "TaskInfos": [{
            "CompletedSize": 13898874880,
            "ErrorCode": 0,
            "ErrorMessage": "",
            "IsDir": false,
            "Path": "file100G",
            "Pattern": "",
            "Status": "running",
            "SubmitTime": 1765274023424073,
            "TotalSize": 107374182400
        }]
    }]
}
```

Cancel a running prefetch task:

```shell
curl -s "http://localhost:17871/v1/warmup/cancel?target_path=file100G"
```

HTTP interface parameters:
| Parameter | Required | Description |
|---|---|---|
| `target_path` | Yes | Relative path(s) to prefetch. Separate multiple paths with commas. |
| `pattern` | No | Regular expression to filter matching filenames. |
| `preceding_time` | No | Filter tasks created in the last N seconds (query only). Default: `86400` (1 day). |
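The `stat` response can be post-processed for progress reporting. A sketch using only `grep` and `cut` against the sample fields shown above; the hard-coded string stands in for the `curl` output, and a JSON tool such as `jq` would be more robust in production:

```shell
# Extract Status, CompletedSize, and TotalSize from a (simplified) stat
# response and print a progress line. Values match the sample response.
response='{"CompletedSize":13898874880,"Status":"running","TotalSize":107374182400}'
status=$(printf '%s' "$response" | grep -o '"Status":"[a-z]*"' | cut -d'"' -f4)
completed=$(printf '%s' "$response" | grep -o '"CompletedSize":[0-9]*' | cut -d: -f2)
total=$(printf '%s' "$response" | grep -o '"TotalSize":[0-9]*' | cut -d: -f2)
echo "$status: $((completed * 100 / total))% complete"
# Prints: running: 12% complete
```

Polling this in a loop until the status leaves `running` is a simple way to block deployment until warmup finishes.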
#### CLI
Initiate prefetch:

```shell
/usr/bin/aliyun-alinas-efc-cli -m /mnt/oss_data -r warmup/load --path "testDir" --pattern ".*_2$"
```

Check prefetch status:

```shell
/usr/bin/aliyun-alinas-efc-cli -m /mnt/oss_data -r warmup/stat --path "testDir"
```

Cancel a running prefetch task:

```shell
/usr/bin/aliyun-alinas-efc-cli -m /mnt/oss_data -r warmup/cancel --path "testDir"
```

Completed and failed tasks cannot be canceled. Canceled tasks cannot be resumed; submit a new prefetch task.
CLI parameters:
| Parameter | Required | Description |
|---|---|---|
| `-m` | Yes | Mount point path. |
| `-r` | Yes | Operation: `warmup/load` (prefetch), `warmup/stat` (query), `warmup/cancel` (cancel). |
| `--path` | Yes | Relative path under the mount point. Pass `""` for the root directory. Separate multiple paths with commas. |
| `--pattern` | No | Regular expression to filter matching filenames. |
| `--preceding_time` | No | Filter tasks from the last N seconds (query only). Default: `86400` (1 day). |
### Monitor cache performance
Track cache hit rate and throughput to tune performance. Enable the management port during mounting to access these metrics.
View metrics:

```shell
# HTTP (returns JSON)
curl -s "http://localhost:17871/v1/tier"
# CLI (returns plain text)
aliyun-alinas-efc-cli -m <mount_point> -r tier
```

Reset metrics counters:

```shell
# HTTP
curl -s "http://localhost:17871/v1/tier/clear"
# CLI
aliyun-alinas-efc-cli -m <mount_point> -r tier -c
```

The following metrics describe possible read paths when client A accesses the cache provided by client B:
| Metric | Description | Path |
|---|---|---|
| `tier_read` | Total read requests through the cache path | i |
| `tier_read_bytes` | Total data volume through the cache path | i |
| `tier_read_hit` | Requests that hit the distributed cache | ii |
| `tier_read_hit_bytes` | Data volume read from the distributed cache (may include read amplification) | ii |
| `tier_read_miss` | Requests that missed the cache and triggered an origin fetch | iii |
| `tier_read_miss_bytes` | Data volume that missed the cache | iii |
| `tier_direct_read` | Requests that failed to read from a distributed node and fell back to OSS directly | v |
| `tier_direct_read_bytes` | Data volume read directly from OSS as fallback | v |
| `tier_root_read_source` | Requests where this node (acting as distributed node) missed the cache and fetched from OSS | iv |
| `tier_root_read_source_bytes` | Data volume fetched from OSS by this node as distributed node | iv |
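A cache hit rate can be derived from these counters. For example, with made-up sample values for `tier_read` and `tier_read_hit` (in practice, read them from the `/v1/tier` output):

```shell
# Hypothetical sample counters; substitute values read from /v1/tier.
tier_read=10000
tier_read_hit=9200
awk -v hit="$tier_read_hit" -v total="$tier_read" \
    'BEGIN { printf "cache hit rate: %.1f%%\n", 100 * hit / total }'
# Prints: cache hit rate: 92.0%
```

A persistently low hit rate usually points at a cache sized below the working set, or at nodes missing from the rootlist.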
## Scale the cluster

### Add or remove nodes online
Adjust the cluster without service interruption:
- Scale out: Add the new node's IP to the `rootlist` file on all existing nodes.
- Scale in: Remove the target node's IP from the `rootlist` file on all nodes.
EFC reloads `rootlist` every 5 seconds and applies changes automatically.
### Adjust single-node cache capacity
EFC does not support dynamic cache capacity changes. To resize, unmount and remount during off-peak hours.
Use these parameters during remounting:
| Parameter | Unit | Default | Description |
|---|---|---|---|
| `g_tier_DadiMemCacheCapacityMB` | MB | 0 | Memory cache capacity per node |
| `g_tier_DadiDiskCacheCapacityMB` | MB | 0 | Disk cache capacity per node |
### Update mount parameters without remounting
Use the Python runtime tool to update mount parameters without manually unmounting. The tool updates EFC's state file and restarts the EFC process with the new parameters.
Download the tool:
```shell
wget https://aliyun-alinas-eac-ap-southeast-1.oss-ap-southeast-1.aliyuncs.com/cache/efc_runtime_ops.latest.py
```

Update parameters:

```shell
python3 efc_runtime_ops.py update /mnt/oss_data \
    -o g_tier_EnableClusterCache=true,g_tier_DadiP2PPort=23456 \
    -o g_tier_BlockSize=1048576
```

Roll back if the update causes issues. The tool automatically backs up the previous state file; restore it with:

```shell
python3 efc_runtime_ops.py rollback /mnt/oss_data
```

### Manage the directory cache
EFC caches directory listings by default to accelerate `ls` operations. If OSS directory structure changes are not reflected immediately, clear the directory cache manually.
Directory cache parameters (set at mount time):
| Parameter | Default | Description |
|---|---|---|
| `g_readdircache_Enable` | true | Enable readdir cache. |
| `g_readdircache_MaxCount` | 20000 | Maximum number of entries. |
| `g_readdircache_MaxSize` | 100 MiB | Memory allocated to the readdir cache (bytes). |
| `g_readdircache_RemainTime` | 3600 | Cache validity period in seconds (default: 1 hour). A longer TTL improves `ls` performance but delays visibility of new objects. |
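If new objects must show up in `ls` sooner, a shorter TTL can be set at mount time. A configuration sketch (other options as in the earlier mount examples; 60 seconds is an arbitrary illustrative value, chosen as a trade-off between listing freshness and OSS request volume):

```shell
mount -t alinas \
    -o efc,protocol=oss,g_oss_STSFile=/etc/passwd-sts \
    -o g_readdircache_RemainTime=60 \
    your-bucket.oss-cn-hangzhou.aliyuncs.com:/ /mnt/oss_data
```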
Clear the directory cache:
```shell
# HTTP
curl -s "http://localhost:17871/v1/action/clear_readdir_cache"
# CLI
aliyun-alinas-efc-cli -m <mount_point> -r action -k clear_readdir_cache
```

## Data consistency
EFC is a read-only cache. Changes made to OSS are not synchronized to the cache in real time—applications must tolerate some data lag (eventual consistency).
EFC uses close-to-open semantics by default: reopening a file reflects the latest OSS content. However, directory listings (`ls`) may lag due to the readdir cache.
| Configuration | File update visibility | New file visibility |
|---|---|---|
| Metadata cache disabled (default) | Reopen the file to see changes | Clear readdir cache |
| Metadata cache enabled | Clear metadata cache, then reopen | Clear both readdir and metadata cache |
To immediately see OSS changes, clear the relevant caches:
```shell
# 1. Clear directory cache (required when new files are not visible)
curl -s "http://localhost:17871/v1/action/clear_readdir_cache"
# 2. Clear metadata cache (only needed when metadata cache is enabled)
curl -s "http://localhost:17871/v1/action/clear_metadata_cache"
```

For workloads that update OSS frequently and require immediate reads, build cache-clearing calls into your application's update logic.
## Mount parameter reference

### Basic parameters
| Parameter | Default | Required | Description |
|---|---|---|---|
| `efc` | — | Yes | Required to identify EFC mounting. |
| `protocol` | — | Yes | Backend storage protocol. Fixed as `oss`. |
| `passwd_file` | — | Required if not using `g_oss_STSFile` | Absolute path to the AccessKey password file (`AccessKeyId:AccessKeySecret`). |
| `g_oss_STSFile` | — | Required if not using `passwd_file` | Absolute path to the STS credential file. Supports live updates. |
| `trybind` | yes | No | Attempt bind mount. Set to `no` when using mirroring-based back-to-origin. |
### Cache configuration parameters
| Parameter | Default | Description |
|---|---|---|
| `g_tier_EnableClusterCache` | true | Enable EFC cache acceleration. Set to `false` for protocol-translation-only mounting. |
| `g_tier_DadiIsDistributed` | true | `true` = distributed mode; `false` = single-node mode. |
| `g_tier_DadiAddr` | — | Absolute path to the rootlist file. Required in distributed mode. |
| `g_tier_DadiRootClientType` | 0 | Node role. `0` = Root/Agent; `1` = Root (cache provider only); `2` = Agent (no local storage). |
| `g_tier_DadiMemCacheCapacityMB` | 0 | Memory cache capacity (MB). `0` disables the memory cache. |
| `g_tier_DadiDiskCacheCapacityMB` | 0 | Disk cache capacity per directory (MB). `0` disables the disk cache. |
| `g_tier_DadiDiskCachePath` | — | Absolute path to the disk cache directory (must end with `/`). For multiple directories, separate with colons (e.g., `/mnt/cache1/:/mnt/cache2/`). Total capacity = per-directory capacity × number of directories. |
| `g_tier_DadiP2PPort` | 17980 | P2P communication port. Keep the default and allow traffic in security groups. |
| `g_tier_DadiSdkGetTrafficLimit` | 3 GB/s | Read cache throughput limit (bytes/s). |
| `g_tier_DadiCachePageSizeKB` | 16 | Cache page size (KB). |
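Note that `g_tier_DadiDiskCacheCapacityMB` applies per directory in `g_tier_DadiDiskCachePath`, so total disk usage multiplies with the number of directories. A quick check with assumed values:

```shell
# Assumed: g_tier_DadiDiskCacheCapacityMB=10240 with two cache directories
# (/mnt/cache1/:/mnt/cache2/). Capacity is allocated per directory.
per_dir_mb=10240
dirs=2
echo "total disk cache: $((per_dir_mb * dirs)) MB"
# Prints: total disk cache: 20480 MB
```

Size each directory's backing disk accordingly before mounting.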
### Metadata and directory cache parameters
| Parameter | Default | Description |
|---|---|---|
| `g_metadata_CacheEnable` | false | Enable metadata cache. Reduces HEAD requests to OSS. Enable for workloads where object metadata rarely changes; OSS updates will not be visible until the cache is cleared. |
| `g_readdircache_Enable` | true | Enable readdir cache. |
| `g_readdircache_MaxCount` | 20000 | Maximum number of entries in the readdir cache. |
| `g_readdircache_MaxSize` | 100 MiB | Memory size for the readdir cache (bytes). |
| `g_readdircache_RemainTime` | 3600 | Readdir cache validity period in seconds (default: 1 hour). A longer TTL improves performance but delays visibility of new objects in `ls`. |
### Operations parameters
| Parameter | Default | Description |
|---|---|---|
| `g_server_Port` | 0 | Cache management port. Set to a non-zero value (for example, `17871`) to enable the HTTP management interface for prefetch and monitoring. |
## Best practices

### AI model training
AI training jobs typically do repeated, sequential reads of large datasets, often with many small files.
- Configure a disk cache large enough to hold the entire hot dataset; this prevents repeated origin fetches across epochs.
- Run data prefetch on your training data before starting the job to front-load the initial read latency.
### AI model inference
Inference workloads have many compute nodes simultaneously loading the same model files.
- Always use distributed cluster mode so the P2P network handles model file distribution. This avoids concentrating origin fetch traffic on OSS or a single node.
- Prefetch model files when deploying or updating a service, before traffic arrives.
### Security
- Set credential file permissions to `600`.
- Use STS temporary credentials rather than long-term AccessKeys.
- Open P2P port `17980` only between cluster nodes. Restrict management port `17871` to trusted hosts.
### Data consistency
- Enable metadata cache (`g_metadata_CacheEnable=true`) only for workloads where object metadata rarely changes, such as read-only model repositories. OSS updates will not be visible until the cache is cleared.
- For workloads that update OSS frequently and require immediate visibility, integrate cache-clearing calls into your update pipeline.
## Billing
- EFC client: Free during the invitational preview period.
- OSS charges: Standard OSS billing applies to requests and outbound Internet traffic generated by EFC. Effective caching reduces origin requests and traffic, lowering overall OSS costs. See OSS billing overview.
## FAQ

### Can EFC detect OSS data updates immediately?
EFC's ability to reflect OSS updates depends on your metadata cache configuration. With metadata cache disabled (the default), reopening a file reflects the latest content immediately, but new files may not appear in `ls` right away because of the readdir cache. To see new files, clear the readdir cache:

```shell
curl -s "http://localhost:17871/v1/action/clear_readdir_cache"
```

With metadata cache enabled, neither file updates nor new files are visible until you clear both caches:

```shell
curl -s "http://localhost:17871/v1/action/clear_readdir_cache"
curl -s "http://localhost:17871/v1/action/clear_metadata_cache"
```