
Platform For AI: Enable memory caching for a local directory

Last Updated: Mar 11, 2026

Cache model files from OSS or NAS in memory to reduce inference latency and accelerate model switching.

How it works

EAS mounts model files to a local directory from OSS, NAS, or a Docker image. For more information, see Mount storage. Reading models, switching models, and scaling containers all consume network bandwidth. In Stable Diffusion scenarios, inference requests frequently switch between base models and LoRA models; each switch reads files from OSS or NAS, which adds network latency.

Memory caching solves this problem. The following diagram shows the architecture.

[Architecture diagram]
  • Model files in a local directory are cached in memory.

  • The cache uses an LRU eviction policy and supports file sharing among instances. Cached files appear as a standard file system directory.

  • The service reads cached files directly from memory. No changes to business code are required.

  • Instances in a service share a P2P network. When the service scales out, new instances read cached files from nearby instances over P2P, accelerating cluster scale-out.
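The capacity-bounded LRU behavior described above can be sketched in Python. This is an illustration of the eviction policy only, not EAS code; the file names and sizes are made up:

```python
from collections import OrderedDict

class LRUFileCache:
    """Illustrative capacity-bounded LRU cache: the least recently
    read files are evicted once total size exceeds the capacity."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.files = OrderedDict()  # name -> size, least recently used first

    def read(self, name, size):
        if name in self.files:
            self.files.move_to_end(name)  # cache hit: mark as most recently used
            return "hit"
        while self.used + size > self.capacity and self.files:
            _, evicted_size = self.files.popitem(last=False)  # evict LRU file
            self.used -= evicted_size
        self.files[name] = size
        self.used += size
        return "miss"  # read from the source directory

cache = LRUFileCache(capacity_bytes=8 * 2**30)              # hypothetical 8 GB budget
print(cache.read("anything-v4.5.safetensors", 7 * 2**30))   # miss: first read
print(cache.read("anything-v4.5.safetensors", 7 * 2**30))   # hit: served from memory
print(cache.read("deliberate_v2.safetensors", 2 * 2**30))   # miss: evicts the 7 GB file
```

A real deployment sets this budget with Maximum Memory Usage (or `cache.capacity` in the JSON configuration).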

Limitations

  • The cache directory is read-only to ensure data consistency.

  • To add a model file, place it in the source directory. The file becomes available in the cache directory automatically.

  • Do not modify or delete model files in the source directory. Doing so may cause dirty data in the cache.

Prerequisites

Ensure the following requirements are met:

  • PAI workspace with EAS access

  • Model files in OSS or NAS

  • Instance type with sufficient memory for caching (for example, ml.gu7i.c16m60.1-gu30)

Procedure

This section uses Stable Diffusion as an example with the following configuration:

  • Image startup command: ./webui.sh --listen --port 8000 --skip-version-check --no-hashing --no-download-sd-model --skip-prepare-environment --api --filebrowser

  • OSS directory for model files: oss://path/to/models/

  • Container directory for model files (source directory): /code/stable-diffusion-webui/data-slow

Source directory /code/stable-diffusion-webui/data-slow stores model files, which are then cached to /code/stable-diffusion-webui/data-fast. The service reads model files from the cache directory instead of the source directory.
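Because the cache directory mirrors the source directory, a model's path translates mechanically between the two. A minimal sketch of that mapping (the model file name is hypothetical):

```python
from pathlib import PurePosixPath

# Directories from the example configuration above
SOURCE = PurePosixPath("/code/stable-diffusion-webui/data-slow")  # source directory (OSS mount)
CACHE = PurePosixPath("/code/stable-diffusion-webui/data-fast")   # cache directory the service reads

def cached_path(source_file: str) -> str:
    """Map a file under the source directory to its cached counterpart."""
    return str(CACHE / PurePosixPath(source_file).relative_to(SOURCE))

print(cached_path("/code/stable-diffusion-webui/data-slow/v1-5-pruned.safetensors"))
# → /code/stable-diffusion-webui/data-fast/v1-5-pruned.safetensors
```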

PAI console

  1. Log on to the PAI console. Select a region in the top navigation bar. Then, select the desired workspace and click Enter Elastic Algorithm Service (EAS).

  2. Click Deploy Service and select Custom Deployment in the Custom Model Deployment section.

  3. On the Custom Deployment page, configure the following parameters. For other parameters, see Custom deployment.

    • Environment Information > Model Settings: Select the OSS mount mode.
      Example: Uri: oss://path/to/models/; Mount Path: /code/stable-diffusion-webui/data-slow

    • Command: Startup parameter based on your image or code. For Stable Diffusion, add --ckpt-dir and set it to the cache directory.
      Example: ./webui.sh --listen --port 8000 --skip-version-check --no-hashing --no-download-sd-model --skip-prepare-environment --api --filebrowser --ckpt-dir /code/stable-diffusion-webui/data-fast

    • Features > Distributed cache acceleration: Enable Memory Caching and configure the following parameters:
      - Maximum Memory Usage: Maximum memory for cached files, in GB. When exceeded, the LRU policy evicts cached files. Example: 20 GB
      - Source Path: Source directory of cached files. Can be an OSS or NAS mount directory, subdirectory, or regular file directory. Example: /code/stable-diffusion-webui/data-slow
      - Mount Path: Cache directory from which the service reads files. Example: /code/stable-diffusion-webui/data-fast
  4. Click Deploy.

JSON configuration file

Step 1: Create configuration file

Create a JSON configuration file with the following sample configuration:

{
    "cloud": {
        "computing": {
            "instances": [
                {
                    "type": "ml.gu7i.c16m60.1-gu30"
                }
            ]
        }
    },
    "containers": [
        {
            "image": "eas-registry-vpc.cn-hangzhou.cr.aliyuncs.com/pai-eas/stable-diffusion-webui:4.2",
            "port": 8000,
            "script": "./webui.sh --listen --port 8000 --skip-version-check --no-hashing --no-download-sd-model --skip-prepare-environment --api --filebrowser --ckpt-dir /code/stable-diffusion-webui/data-fast"
        }
    ],
    "metadata": {
        "cpu": 16,
        "enable_webservice": true,
        "gpu": 1,
        "instance": 1,
        "memory": 60000,
        "name": "sdwebui_test"
    },
    "options": {
        "enable_cache": true
    },
    "storage": [
        {
            "cache": {
                "capacity": "20G",
                "path": "/code/stable-diffusion-webui/data-slow"
            },
            "mount_path": "/code/stable-diffusion-webui/data-fast"
        },
        {
            "mount_path": "/code/stable-diffusion-webui/data-slow",
            "oss": {
                "path": "oss://path/to/models/",
                "readOnly": false
            },
            "properties": {
                "resource_type": "model"
            }
        }
    ]
}

Cache-related parameters are described below. For other parameters, see Parameters of model services.

  • script: Startup command. Configure based on your image or code. For Stable Diffusion, add --ckpt-dir and set it to the cache directory.

  • cache.capacity: Maximum memory for cached files. Unit: GB. When exceeded, the LRU policy evicts cached files.

  • cache.path: Source directory of cached files. Can be an OSS or NAS mount directory, subdirectory, or regular file directory.

  • mount_path (in the cache entry): Cache directory to which cached files are mounted. Files in this directory mirror the source directory. The service reads files from this directory.
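The `cache.capacity` value is a size string such as "20G". For capacity planning it can help to convert such a value to bytes; the helper below is purely illustrative (not part of EAS) and assumes binary units (G = 2^30 bytes):

```python
def capacity_to_bytes(capacity: str) -> int:
    """Convert a size string such as '20G' or '512M' to bytes (illustrative helper)."""
    units = {"K": 2**10, "M": 2**20, "G": 2**30, "T": 2**40}
    suffix = capacity[-1].upper()
    if suffix in units:
        return int(capacity[:-1]) * units[suffix]
    return int(capacity)  # no suffix: already in bytes

print(capacity_to_bytes("20G"))  # 21474836480
```

With the example instance type (60 GB of memory), a 20G cache leaves the remaining memory for the model runtime itself.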

Step 2: Deploy service

Deploy the service using one of the following methods.

PAI console

  1. Log on to the PAI console. Select a region in the top navigation bar. Then, select the desired workspace and click Enter Elastic Algorithm Service (EAS).

  2. On the Elastic Algorithm Service (EAS) page, click Deploy Service. On the Deploy Service page, click JSON Deployment in the Custom Model Deployment section. In the editor, paste the JSON configuration from Step 1.

  3. Click Deploy.

EASCMD client

  1. Download the EASCMD client and complete identity authentication. For more information, see Download the EASCMD client and complete identity authentication.

  2. Save the JSON configuration from Step 1 as test.json in the directory where the EASCMD client is installed.

  3. Run the following command. This example uses Windows 64-bit:

    eascmdwin64.exe create test.json

Performance

Model switching performance in a Stable Diffusion scenario (in seconds). Actual results vary depending on your environment.

Model                                    Size    OSS mount (s)  Local memory hit (s)  Remote memory hit (s)
anything-v4.5.safetensors                7.2 GB  89.88          3.845                 15.18
Anything-v5.0-PRT-RE.safetensors         2.0 GB  16.73          2.967                 5.46
cetusMix_Coda2.safetensors               3.6 GB  24.76          3.249                 7.13
chilloutmix_NiPrunedFp32Fix.safetensors  4.0 GB  48.79          3.556                 8.47
CounterfeitV30_v30.safetensors           4.0 GB  64.99          3.014                 7.94
deliberate_v2.safetensors                2.0 GB  16.33          2.985                 5.55
DreamShaper_6_NoVae.safetensors          5.6 GB  71.78          3.416                 10.17
pastelmix-fp32.ckpt                      4.0 GB  43.88          4.959                 9.23
revAnimated_v122.safetensors             4.0 GB  69.38          3.165                 3.20

  • If no model files exist in memory cache, CacheFS reads model files from the source directory. For example, if files are mounted from an OSS bucket, CacheFS reads from that OSS bucket.

  • When a service has multiple instances, the instances share memory across the cluster. An instance can read model files directly from the memory of other instances. Read time varies based on file size.

  • When scaling out a service cluster, new instances read model files from the memory of existing instances during initialization, speeding up scale-out operations.
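As a rough reading of the table above, the speedup of a local memory hit over a direct OSS mount can be computed per model. The snippet below uses two rows from the table:

```python
# (model, OSS mount seconds, local memory hit seconds) taken from the table above
rows = [
    ("anything-v4.5.safetensors", 89.88, 3.845),
    ("deliberate_v2.safetensors", 16.33, 2.985),
]
for name, oss_s, local_s in rows:
    print(f"{name}: {oss_s / local_s:.1f}x faster on a local memory hit")
```

The larger the model, the bigger the gap tends to be, since the OSS read time grows with file size while a memory hit stays within a few seconds.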