Object Storage Service:Accelerate OSS access by using EFC

Last Updated: Mar 20, 2026

Elastic File Client (EFC) is a compute-side data access acceleration solution. It mounts an OSS bucket as a local file system and uses compute node memory and disk to build a high-speed, distributed, read-only cache. Instead of each node fetching the same data from OSS, EFC aggregates those requests into a single origin fetch and distributes the cached result over a peer-to-peer (P2P) private network—reducing latency and increasing throughput for AI training and inference workloads.

The EFC caching feature is currently in invitational preview and is free to use during this period.

When to use EFC caching

EFC caching improves performance when the same data is read repeatedly across one or more nodes. Caching provides no benefit for workloads that access each object only once (for example, single-pass ETL jobs). In that case, mount EFC without caching enabled to use it purely as a POSIX file system interface to OSS.

| Workload pattern | Recommended configuration |
| --- | --- |
| AI model training (large datasets read repeatedly) | Single-node or distributed cache |
| AI model inference (many nodes loading the same model files) | Distributed cache cluster |
| Single-pass data processing (ETL, one-time reads) | Basic mount (no cache) |
| Lightweight nodes that should access the cluster cache but store no data locally | Agent mode |

How it works

EFC intercepts file reads at the mount point and follows this flow:

  1. Cache lookup: EFC checks local memory or disk cache first, then queries peer nodes over the P2P network.

  2. Cache hit: Data found in cache is returned immediately over the private network—no OSS request needed.

  3. Cache miss (origin fetch): EFC retrieves the data from OSS, stores it in local cache, then returns it to the application. Subsequent reads hit the cache.

This approach lets aggregate throughput scale linearly with node count. A single node can achieve up to 15 GB/s throughput and 200,000 IOPS; performance scales with the number of nodes, subject to actual network bandwidth.
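The read path above can be sketched in Python. This is an illustrative model only: read_block, local_cache, peer_cache, and fetch_from_oss are hypothetical stand-ins, not part of the EFC client.

```python
# Illustrative sketch of the EFC read path described above.
# local_cache and peer_cache are dict-like stand-ins for the local
# memory/disk cache and the P2P distributed cache; fetch_from_oss
# models an origin fetch. None of these are real EFC interfaces.

def read_block(key, local_cache, peer_cache, fetch_from_oss):
    # 1. Cache lookup: check local memory/disk first.
    data = local_cache.get(key)
    if data is not None:
        return data                  # 2. Local cache hit: no OSS request
    # ...then query peer nodes over the P2P network.
    data = peer_cache.get(key)
    if data is not None:
        local_cache[key] = data      # Keep a local copy for future reads
        return data                  # 2. Distributed cache hit
    # 3. Cache miss: origin fetch from OSS, then populate the local cache
    #    so subsequent reads hit the cache.
    data = fetch_from_oss(key)
    local_cache[key] = data
    return data
```

The key property is that each object triggers at most one origin fetch per cluster; every later read is served from a local or peer cache.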

Key advantages

  • Compute-side cache: Memory and disk on compute nodes act as a multi-tiered read cache, with data prefetch and LRU (Least Recently Used) eviction.

  • P2P distributed acceleration: Nodes share cached data over a P2P network, supporting clusters of hundreds to thousands of nodes. Aggregate throughput scales linearly with node count.

  • POSIX-compatible interface: Applications access OSS through standard POSIX calls (open, read, readdir) without modification.

Limits

Runtime environment

| Item | Requirement |
| --- | --- |
| Platform | Platform for AI (Lingjun resources), Lingjun bare metal |
| Operating system | Alibaba Cloud Linux 3 (kernel 5.10.134-13+), Ubuntu 24.04 (kernel 6.8.0-79+) |

Hardware resource requirements

Plan memory and disk before deployment based on whether caching is enabled.

Caching disabled:

  • Memory: Resident usage is typically under 1 GiB. Reserve at least 5 GiB to handle burst loads.

  • Disk: No special requirement beyond space for operational logs.

Caching enabled:

Resource usage includes cache media plus index overhead.

  • Memory:

    • Base usage: same as caching-disabled mode.

    • Memory cache: consumes memory equal to the configured cache size.

    • Index overhead: approximately 0.1% of total cache capacity, regardless of cache type.

    • Formula: Total memory ≈ Base runtime memory + Memory cache size + (Memory cache size + Disk cache size) × 0.001

  • Disk: Disk cache consumes space equal to the configured size (1 TiB configured = 1 TiB used).
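As a worked example of the memory formula above, assume roughly 1 GiB of base runtime memory, a 4 GiB memory cache, and a 1 TiB disk cache (the specific sizes here are illustrative):

```python
# Worked example of the memory sizing formula above (all sizes in GiB).
base = 1            # Base runtime memory (same as caching-disabled mode)
mem_cache = 4       # Configured memory cache size
disk_cache = 1024   # Configured disk cache size (1 TiB)

# Index overhead is ~0.1% of total cache capacity, regardless of cache type.
index = (mem_cache + disk_cache) * 0.001
total = base + mem_cache + index
print(f"index overhead = {index:.2f} GiB, total memory = {total:.2f} GiB")
```

So a 1 TiB disk cache adds only about 1 GiB of memory for its index, while the memory cache consumes its full configured size.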

Feature limits

  • Access mode: Read-only. After mounting, all files are owned by root. chmod and chown are not supported.

  • POSIX operations: Supported: open, close, read, readdir, readdirplus, lookup, getattr. Not supported: readlink, write, rename, setattr, link, symlink.

  • Storage class: Supports Standard and Infrequent Access only. Archive, Cold Archive, and Deep Cold Archive objects are not accessible.

  • Path restrictions:

    • If both a/b (file) and a/b/ (directory) exist in OSS, only the directory a/b/ is accessible after mounting.

    • Objects whose keys start with /, contain consecutive //, or include . or .. are not accessible.

  • Required OSS permissions: oss:GetBucketStat, oss:ListObjects, and oss:GetObject.

  • Cache availability: EFC is a read-only cache and does not guarantee data high availability. Cached data can be lost on hardware failure, node replacement, or decommissioning, causing a cache miss and an origin fetch on the next access.

Deploy and mount EFC

The deployment process is progressive: start with a basic mount to verify connectivity, then enable caching and distributed clustering as needed.

Step 1: Install the EFC client

Download and install two packages on each compute node.

  1. Download the packages:

       # alinas-utils package
       wget https://aliyun-alinas-eac-ap-southeast-1.oss-ap-southeast-1.aliyuncs.com/cache/aliyun-alinas-utils.amd64.rpm
    
       # EFC client
       wget https://aliyun-alinas-eac-ap-southeast-1.oss-ap-southeast-1.aliyuncs.com/cache/alinas-efc-latest.amd64.rpm
  2. Install the packages:

       sudo rpm -ivh aliyun-alinas-utils.amd64.rpm
       sudo rpm -ivh alinas-efc-latest.amd64.rpm

After installation, the EFC service registers automatically but does not start. Complete credential configuration before mounting.

Step 2: Configure access credentials

EFC supports two authentication methods. Use Security Token Service (STS) temporary credentials to avoid exposing long-term AccessKeys.

Important

Set credential file permissions to 600 so only root can read them. Store them securely to prevent credential leaks.

STS temporary credentials (recommended)

# Create the STS credential file
cat > /etc/passwd-sts << EOF
{
    "SecurityToken": "YourSecurityToken",
    "AccessKeyId": "YourSTSAccessKeyId",
    "AccessKeySecret": "YourSTSAccessKeySecret",
    "Expiration": "YourExpiration"
}
EOF

# Restrict access to root only
chmod 600 /etc/passwd-sts

  • EFC automatically reloads /etc/passwd-sts when credentials are updated; no remount is required.

  • Manage credential validity yourself. Update the file before expiration.

  • Expiration format: 2025-12-11T08:37:51Z (UTC).
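Because you manage credential rotation yourself, it can help to script the file update. The sketch below writes the credential file atomically with mode 600; how you obtain the STS credentials (for example, from an instance RAM role) is outside its scope, and the creds argument is a plain dict you supply.

```python
import json
import os

def write_sts_file(creds, path="/etc/passwd-sts"):
    """Write the EFC STS credential file atomically with mode 600.

    `creds` must supply the four fields EFC expects. Expiration is a
    UTC timestamp, e.g. "2025-12-11T08:37:51Z". EFC reloads the file
    automatically, so rewriting it before expiration is sufficient.
    """
    tmp = path + ".tmp"
    # Create the temp file with 600 permissions from the start so the
    # secrets are never world-readable, even briefly.
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        json.dump({
            "SecurityToken": creds["SecurityToken"],
            "AccessKeyId": creds["AccessKeyId"],
            "AccessKeySecret": creds["AccessKeySecret"],
            "Expiration": creds["Expiration"],
        }, f, indent=4)
    os.replace(tmp, path)  # Atomic swap; readers never see a partial file
```

Run this from a cron job or credential-refresh hook shortly before each expiration.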

AccessKey

# Create the password file
echo "YourAccessKeyId:YourAccessKeySecret" > /etc/passwd-oss

# Restrict access to root only
chmod 600 /etc/passwd-oss

Step 3: Mount the OSS bucket

Choose the configuration that matches your workload. Start with the basic mount to verify access, then add caching.

Configuration 1: Basic mount (no cache)

Use this when you want to verify EFC can access your OSS bucket, or when each object is read only once and caching provides no benefit.

# Create the mount point
mkdir -p /mnt/oss_data

# Mount using STS credentials (recommended)
mount -t alinas \
  -o efc,protocol=oss,g_oss_STSFile=/etc/passwd-sts \
  your-bucket.oss-cn-hangzhou.aliyuncs.com:/ /mnt/oss_data

# Mount using AccessKey
mount -t alinas \
  -o efc,protocol=oss,passwd_file=/etc/passwd-oss \
  your-bucket.oss-cn-hangzhou.aliyuncs.com:/ /mnt/oss_data
| Parameter | Description |
| --- | --- |
| efc,protocol=oss | Required. Identifies EFC mounting with OSS as the backend. |
| g_oss_STSFile=/etc/passwd-sts | Path to the STS credential file. |
| passwd_file=/etc/passwd-oss | Path to the AccessKey password file. |
| your-bucket.oss-cn-hangzhou.aliyuncs.com:/ | Your bucket domain and the path to mount. |

Verify the mount:

# Check that the mount succeeded
df -h | grep /mnt/oss_data

# List files in the bucket
ls /mnt/oss_data

If your bucket files appear, the basic mount is working. Proceed to enable caching as needed.

Configuration 2: Single-node cache

Use this for single-node AI training jobs or quick performance testing on one machine.

If you completed a basic mount, unmount it first: umount /mnt/oss_data

# Create the disk cache directory
mkdir -p /mnt/cache/

# Mount with single-node cache (using STS credentials)
mount -t alinas \
  -o efc,protocol=oss,g_oss_STSFile=/etc/passwd-sts \
  -o g_tier_EnableClusterCache=true \
  -o g_tier_DadiIsDistributed=false \
  -o g_tier_DadiMemCacheCapacityMB=1024 \
  -o g_tier_DadiDiskCacheCapacityMB=10240 \
  -o g_tier_DadiDiskCachePath=/mnt/cache/ \
  -o g_server_Port=17871 \
  your-bucket.oss-cn-hangzhou.aliyuncs.com:/ /mnt/oss_data
| Parameter | Description |
| --- | --- |
| g_tier_EnableClusterCache=true | Enables EFC cache acceleration. |
| g_tier_DadiIsDistributed=false | Single-node mode; no communication with other nodes. |
| g_tier_DadiMemCacheCapacityMB=1024 | Allocates 1 GiB of memory for cache. |
| g_tier_DadiDiskCacheCapacityMB=10240 | Allocates 10 GiB of disk for cache. |
| g_tier_DadiDiskCachePath=/mnt/cache/ | Directory for disk cache files. |
| g_server_Port=17871 | Enables the HTTP management interface for prefetch and monitoring. |

Configuration 3: Distributed cache cluster

Use this for large-scale AI training or inference across multiple nodes. The P2P network distributes cached data across the cluster, so no single node or OSS bucket becomes a bottleneck.

Important

Only open P2P port 17980 between cluster nodes. Strictly restrict access to management port 17871.

1. Configure the cluster node list

Create /etc/efc/rootlist on all nodes with the same content:

# Format: <unique-id>:<node-private-IP>:<P2P-port>
1:192.168.1.1:17980
2:192.168.1.2:17980
3:192.168.1.3:17980
Warning

The rootlist file must be identical across all nodes. Inconsistencies prevent some nodes from joining the P2P network and reduce cache hit rates.
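One way to avoid inconsistencies is to generate the rootlist content from a single node list and distribute the identical result to every node. The helper below is an illustrative sketch (build_rootlist is not an EFC tool); distributing the file, for example with scp or configuration management, is up to you.

```python
def build_rootlist(node_ips, p2p_port=17980):
    """Build /etc/efc/rootlist content: one <unique-id>:<IP>:<port> per line.

    The IP list is sorted (lexicographically) first so that every node
    that runs this with the same input produces byte-identical content,
    which is what the rootlist consistency requirement demands.
    """
    lines = [f"{i}:{ip}:{p2p_port}"
             for i, ip in enumerate(sorted(node_ips), start=1)]
    return "\n".join(lines) + "\n"
```

Write the returned string to /etc/efc/rootlist on every node from the same source list rather than editing each node's file by hand.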

2. Mount on all nodes

# Create the disk cache directory
mkdir -p /mnt/cache/

# Mount with distributed cache (using STS credentials)
mount -t alinas \
  -o efc,protocol=oss,g_oss_STSFile=/etc/passwd-sts \
  -o g_tier_EnableClusterCache=true \
  -o g_tier_DadiIsDistributed=true \
  -o g_tier_DadiAddr=/etc/efc/rootlist \
  -o g_tier_DadiMemCacheCapacityMB=1024 \
  -o g_tier_DadiDiskCacheCapacityMB=10240 \
  -o g_tier_DadiDiskCachePath=/mnt/cache/ \
  -o g_server_Port=17871 \
  your-bucket.oss-cn-hangzhou.aliyuncs.com:/ /mnt/oss_data

Parameters beyond single-node mode:

| Parameter | Description |
| --- | --- |
| g_tier_DadiIsDistributed=true | Enables distributed P2P cache across the cluster. |
| g_tier_DadiAddr=/etc/efc/rootlist | Path to the cluster node list file. |

Configuration 4: Agent mode

Use this for lightweight nodes that should benefit from the cluster cache without storing any data locally (for example, nodes with limited disk).

mount -t alinas \
  -o efc,protocol=oss,g_oss_STSFile=/etc/passwd-sts \
  -o g_tier_EnableClusterCache=true \
  -o g_tier_DadiIsDistributed=true \
  -o g_tier_DadiRootClientType=2 \
  -o g_tier_DadiAddr=/etc/efc/rootlist \
  -o g_server_Port=17871 \
  your-bucket.oss-cn-hangzhou.aliyuncs.com:/ /mnt/oss_data

g_tier_DadiRootClientType controls the node's role in the cluster:

| Value | Mode | Behavior |
| --- | --- | --- |
| 0 (default) | Root/Agent | Provides cache to others and accesses the cluster cache |
| 1 | Root | Provides cache only |
| 2 | Agent | Accesses the cluster cache; stores no data locally |

Cache operations

After mounting, manage the cache through the HTTP interface or CLI. Both require the management port to be enabled during mounting (-o g_server_Port=17871).

Prefetch data

Prefetching loads data from OSS into cache before it is read, eliminating first-access latency. This is useful for model inference deployments, where model files should be warm before traffic arrives.

  • Prefetch paths must be relative paths under the mount point. For example, use model/weights.bin, not /mnt/oss_data/model/weights.bin.

  • Task statuses: running, completed, canceled, failed.

  • Completed, canceled, and failed tasks are auto-evicted when total historical tasks exceed 10,000 or task age exceeds 7 days. Running tasks can only be stopped with the cancel command.

HTTP interface

Initiate prefetch:

curl -s "http://localhost:17871/v1/warmup/load?target_path=file100G"

Sample response:

{
    "ErrorCode": 0,
    "ErrorMessage": "Request processed",
    "Results": [{
        "ErrorCode": 0,
        "ErrorMessage": "The warm up (file100G) is processing in the background. Use the 'stat' command to get status.",
        "Location": "127.0.0.1:17871",
        "Path": "file100G"
    }]
}

Check prefetch status:

curl -s "http://localhost:17871/v1/warmup/stat?target_path=file100G"

Sample response:

{
    "ErrorCode": 0,
    "ErrorMessage": "Request processed",
    "Results": [{
        "ErrorCode": 0,
        "ErrorMessage": "Successfully stat the warm up",
        "Location": "127.0.0.1:17871",
        "Path": "file100G",
        "TaskInfos": [{
            "CompletedSize": 13898874880,
            "ErrorCode": 0,
            "ErrorMessage": "",
            "IsDir": false,
            "Path": "file100G",
            "Pattern": "",
            "Status": "running",
            "SubmitTime": 1765274023424073,
            "TotalSize": 107374182400
        }]
    }]
}

Cancel a running prefetch task:

curl -s "http://localhost:17871/v1/warmup/cancel?target_path=file100G"

HTTP interface parameters:

| Parameter | Required | Description |
| --- | --- | --- |
| target_path | Yes | Relative path(s) to prefetch. Separate multiple paths with commas. |
| pattern | No | Regular expression to filter matching filenames. |
| preceding_time | No | Filter tasks created in the last N seconds (query only). Default: 86400 (1 day). |
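For automation, for example warming a model before routing traffic to it, you can poll the stat endpoint until no task is still running. The sketch below uses only the endpoints shown above; the parse_warmup_stat and wait_for_warmup helpers are illustrative names, and the response parsing mirrors the sample JSON in this section (adapt it if your client version returns a different shape).

```python
import json
import time
import urllib.request

def parse_warmup_stat(body):
    """Extract (status, completed_size, total_size) for each task in a
    /v1/warmup/stat response body (bytes or str)."""
    doc = json.loads(body)
    tasks = []
    for result in doc.get("Results", []):
        for t in result.get("TaskInfos", []):
            tasks.append((t["Status"], t["CompletedSize"], t["TotalSize"]))
    return tasks

def wait_for_warmup(target_path, port=17871, interval=5):
    """Poll warmup/stat until no task for target_path is still running."""
    url = f"http://localhost:{port}/v1/warmup/stat?target_path={target_path}"
    while True:
        with urllib.request.urlopen(url) as resp:
            tasks = parse_warmup_stat(resp.read())
        if all(status != "running" for status, _, _ in tasks):
            return tasks  # All tasks completed, canceled, or failed
        time.sleep(interval)
```

A deployment script would call the load endpoint first, then wait_for_warmup, and only then mark the node ready.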

CLI

Initiate prefetch:

/usr/bin/aliyun-alinas-efc-cli -m /mnt/oss_data -r warmup/load --path "testDir" --pattern ".*_2$"

Check prefetch status:

/usr/bin/aliyun-alinas-efc-cli -m /mnt/oss_data -r warmup/stat --path "testDir"

Cancel a running prefetch task:

/usr/bin/aliyun-alinas-efc-cli -m /mnt/oss_data -r warmup/cancel --path "testDir"

Completed and failed tasks cannot be canceled. Canceled tasks cannot be resumed; submit a new prefetch task instead.

CLI parameters:

| Parameter | Required | Description |
| --- | --- | --- |
| -m | Yes | Mount point path. |
| -r | Yes | Operation: warmup/load (prefetch), warmup/stat (query), warmup/cancel (cancel). |
| --path | Yes | Relative path under the mount point. Pass "" for the root directory. Separate multiple paths with commas. |
| --pattern | No | Regular expression to filter matching filenames. |
| --preceding_time | No | Filter tasks from the last N seconds (query only). Default: 86400 (1 day). |

Monitor cache performance

Track cache hit rate and throughput to tune performance. Enable the management port during mounting to access these metrics.

View metrics:

# HTTP (returns JSON)
curl -s "http://localhost:17871/v1/tier"

# CLI (returns plain text)
aliyun-alinas-efc-cli -m <mount_point> -r tier

Reset metrics counters:

# HTTP
curl -s "http://localhost:17871/v1/tier/clear"

# CLI
aliyun-alinas-efc-cli -m <mount_point> -r tier -c

The following metrics describe possible read paths when client A accesses the cache provided by client B:

(Figure: read paths i-v among client A, distributed node B, and OSS.)
| Metric | Description | Path |
| --- | --- | --- |
| tier_read | Total read requests through the cache path | i |
| tier_read_bytes | Total data volume through the cache path | i |
| tier_read_hit | Requests that hit the distributed cache | ii |
| tier_read_hit_bytes | Data volume read from the distributed cache (may include read amplification) | ii |
| tier_read_miss | Requests that missed the cache and triggered an origin fetch | iii |
| tier_read_miss_bytes | Data volume that missed the cache | iii |
| tier_direct_read | Requests that failed to read from a distributed node and fell back to OSS directly | v |
| tier_direct_read_bytes | Data volume read directly from OSS as fallback | v |
| tier_root_read_source | Requests where this node (acting as a distributed node) missed the cache and fetched from OSS | iv |
| tier_root_read_source_bytes | Data volume fetched from OSS by this node as a distributed node | iv |
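The most useful derived figure is the cache hit ratio, by request count and by bytes. The helper below computes both from the counters listed above; it is a sketch that assumes you have already extracted the tier_* counters into a flat dict, since the exact JSON layout of /v1/tier may differ by client version.

```python
def cache_hit_ratio(metrics):
    """Compute (request_hit_ratio, byte_hit_ratio) from tier_* counters.

    `metrics` is a flat dict of the counters described above, e.g. as
    extracted from the /v1/tier JSON. Missing counters are treated as 0.
    """
    reads = metrics.get("tier_read", 0)
    hits = metrics.get("tier_read_hit", 0)
    read_bytes = metrics.get("tier_read_bytes", 0)
    hit_bytes = metrics.get("tier_read_hit_bytes", 0)
    req_ratio = hits / reads if reads else 0.0
    byte_ratio = hit_bytes / read_bytes if read_bytes else 0.0
    return req_ratio, byte_ratio
```

A persistently low hit ratio suggests the cache is undersized for the hot dataset or that the workload is single-pass and should use a basic mount instead.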

Scale the cluster

Add or remove nodes online

Adjust the cluster without service interruption:

  • Scale out: Add an entry for the new node (unique ID, private IP, and P2P port) to the rootlist file on all existing nodes.

  • Scale in: Remove the target node's entry from the rootlist file on all nodes.

EFC reloads rootlist every 5 seconds and applies changes automatically.

Adjust single-node cache capacity

Important

EFC does not support dynamic cache capacity changes. To resize, unmount and remount during off-peak hours.

Use these parameters during remounting:

| Parameter | Unit | Default | Description |
| --- | --- | --- | --- |
| g_tier_DadiMemCacheCapacityMB | MB | 0 | Memory cache capacity per node |
| g_tier_DadiDiskCacheCapacityMB | MB | 0 | Disk cache capacity per node |

Update mount parameters without remounting

Use the Python runtime tool to update mount parameters without manually unmounting. The tool updates EFC's state file and restarts the EFC process with the new parameters.

Download the tool:

wget https://aliyun-alinas-eac-ap-southeast-1.oss-ap-southeast-1.aliyuncs.com/cache/efc_runtime_ops.latest.py

Update parameters:

python3 efc_runtime_ops.py update /mnt/oss_data \
  -o g_tier_EnableClusterCache=true,g_tier_DadiP2PPort=23456 \
  -o g_tier_BlockSize=1048576

Roll back if the update causes issues:

The tool automatically backs up the previous state file. Restore it with:

python3 efc_runtime_ops.py rollback /mnt/oss_data

Manage the directory cache

EFC caches directory listings by default to accelerate ls operations. If OSS directory structure changes are not reflected immediately, clear the directory cache manually.

Directory cache parameters (set at mount time):

| Parameter | Default | Description |
| --- | --- | --- |
| g_readdircache_Enable | true | Enable the readdir cache |
| g_readdircache_MaxCount | 20000 | Maximum number of entries |
| g_readdircache_MaxSize | 100 MiB | Memory allocated to the readdir cache (bytes) |
| g_readdircache_RemainTime | 3600 | Cache validity period (seconds). Default: 1 hour. A longer TTL improves ls performance but delays visibility of new objects. |

Clear the directory cache:

# HTTP
curl -s "http://localhost:17871/v1/action/clear_readdir_cache"

# CLI
aliyun-alinas-efc-cli -m <mount_point> -r action -k clear_readdir_cache

Data consistency

EFC is a read-only cache. Changes made to OSS are not synchronized to the cache in real time—applications must tolerate some data lag (eventual consistency).

EFC uses close-to-open semantics by default: reopening a file reflects the latest OSS content. However, directory listings (ls) may lag due to the readdir cache.

| Configuration | File update visibility | New file visibility |
| --- | --- | --- |
| Metadata cache disabled (default) | Reopen the file to see changes | Clear the readdir cache |
| Metadata cache enabled | Clear the metadata cache, then reopen | Clear both the readdir and metadata caches |

To immediately see OSS changes, clear the relevant caches:

# 1. Clear directory cache (required when new files are not visible)
curl -s "http://localhost:17871/v1/action/clear_readdir_cache"

# 2. Clear metadata cache (only needed when metadata cache is enabled)
curl -s "http://localhost:17871/v1/action/clear_metadata_cache"

For workloads that update OSS frequently and require immediate reads, build cache-clearing calls into your application's update logic.
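A minimal sketch of such integration follows; the function names are illustrative, and it assumes the management port is enabled at the default 17871. Only the readdir-cache call is needed unless you mounted with g_metadata_CacheEnable=true.

```python
import urllib.request

def cache_clear_urls(port=17871, metadata_cache_enabled=False):
    """Build the list of cache-clear endpoints to call after an OSS update.

    clear_metadata_cache is only needed when the metadata cache is
    enabled at mount time (g_metadata_CacheEnable=true).
    """
    actions = ["clear_readdir_cache"]
    if metadata_cache_enabled:
        actions.append("clear_metadata_cache")
    return [f"http://localhost:{port}/v1/action/{a}" for a in actions]

def clear_efc_caches(port=17871, metadata_cache_enabled=False):
    """Call from your update pipeline right after writing objects to OSS."""
    for url in cache_clear_urls(port, metadata_cache_enabled):
        urllib.request.urlopen(url).read()
```

In a multi-node cluster, run this against every node that mounts the bucket, since each node maintains its own directory and metadata caches.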

Mount parameter reference

Basic parameters

| Parameter | Default | Required | Description |
| --- | --- | --- | --- |
| efc | - | Yes | Required to identify EFC mounting. |
| protocol | - | Yes | Backend storage protocol. Fixed as oss. |
| passwd_file | - | Required if not using g_oss_STSFile | Absolute path to the AccessKey password file (AccessKeyId:AccessKeySecret). |
| g_oss_STSFile | - | Required if not using passwd_file | Absolute path to the STS credential file. Supports live updates. |
| trybind | yes | No | Attempt bind mount. Set to no when using mirroring-based back-to-origin. |

Cache configuration parameters

| Parameter | Default | Description |
| --- | --- | --- |
| g_tier_EnableClusterCache | true | Enable EFC cache acceleration. Set to false for protocol-translation-only mounting. |
| g_tier_DadiIsDistributed | true | true = distributed mode; false = single-node mode. |
| g_tier_DadiAddr | - | Absolute path to the rootlist file. Required in distributed mode. |
| g_tier_DadiRootClientType | 0 | Node role. 0 = Root/Agent; 1 = Root (cache provider only); 2 = Agent (no local storage). |
| g_tier_DadiMemCacheCapacityMB | 0 | Memory cache capacity (MB). 0 disables the memory cache. |
| g_tier_DadiDiskCacheCapacityMB | 0 | Disk cache capacity per directory (MB). 0 disables the disk cache. |
| g_tier_DadiDiskCachePath | - | Absolute path to the disk cache directory (must end with /). For multiple directories, separate with colons (e.g., /mnt/cache1/:/mnt/cache2/). Total capacity = per-directory capacity × number of directories. |
| g_tier_DadiP2PPort | 17980 | P2P communication port. Keep the default and allow the traffic in security groups. |
| g_tier_DadiSdkGetTrafficLimit | 3 GB/s | Read cache throughput limit (bytes/s). |
| g_tier_DadiCachePageSizeKB | 16 | Cache page size (KB). |

Metadata and directory cache parameters

| Parameter | Default | Description |
| --- | --- | --- |
| g_metadata_CacheEnable | false | Enable the metadata cache. Reduces HEAD requests to OSS. Enable for workloads where object metadata rarely changes; OSS updates will not be visible until the cache is cleared. |
| g_readdircache_Enable | true | Enable the readdir cache. |
| g_readdircache_MaxCount | 20000 | Maximum number of entries in the readdir cache. |
| g_readdircache_MaxSize | 100 MiB | Memory size for the readdir cache (bytes). |
| g_readdircache_RemainTime | 3600 | Readdir cache validity period (seconds). Default: 1 hour. A longer TTL improves performance but delays visibility of new objects in ls. |

Operations parameters

| Parameter | Default | Description |
| --- | --- | --- |
| g_server_Port | 0 | Cache management port. Set to a non-zero value (for example, 17871) to enable the HTTP management interface for prefetch and monitoring. |

Best practices

AI model training

AI training jobs typically do repeated, sequential reads of large datasets, often with many small files.

  • Configure a disk cache large enough to hold the entire hot dataset—this prevents repeated origin fetches across epochs.

  • Run data prefetch on your training data before starting the job to front-load the initial read latency.

AI model inference

Inference workloads have many compute nodes simultaneously loading the same model files.

  • Always use distributed cluster mode so the P2P network handles model file distribution. This avoids concentrating origin fetch traffic on OSS or a single node.

  • Prefetch model files when deploying or updating a service, before traffic arrives.

Security

  • Set credential file permissions to 600.

  • Use STS temporary credentials rather than long-term AccessKeys.

  • Open P2P port 17980 only between cluster nodes. Restrict management port 17871 to trusted hosts.

Data consistency

  • Enable metadata cache (g_metadata_CacheEnable=true) only for workloads where object metadata rarely changes, such as read-only model repositories. Understand that OSS updates will not be visible until the cache is cleared.

  • For workloads that update OSS frequently and require immediate visibility, integrate cache-clearing calls into your update pipeline.

Billing

  • EFC client: Free during the invitational preview period.

  • OSS charges: Standard OSS billing applies to requests and outbound Internet traffic generated by EFC. Effective caching reduces origin requests and traffic, lowering overall OSS costs. See OSS billing overview.

FAQ

Can EFC detect OSS data updates immediately?

EFC's ability to reflect OSS updates depends on your metadata cache configuration. With metadata cache disabled (the default), reopening a file reflects the latest content immediately—but new files may not appear in ls right away because of the readdir cache. To see new files, clear the readdir cache:

curl -s "http://localhost:17871/v1/action/clear_readdir_cache"

With metadata cache enabled, neither file updates nor new files are visible until you clear both caches:

curl -s "http://localhost:17871/v1/action/clear_readdir_cache"
curl -s "http://localhost:17871/v1/action/clear_metadata_cache"