Object Storage Service: Accelerate model deployment with OSS Connector for AI/ML

Last Updated: Apr 02, 2026

OSS Connector for AI/ML provides a non-intrusive way to load models without requiring code changes. It uses LD_PRELOAD for high-performance, direct reads from OSS and supports prefetch and caching to significantly improve model loading speed. The connector is compatible with containers and mainstream inference frameworks.

High performance

The connector can achieve over 10 GB/s of throughput when sufficient bandwidth is available. For more information, see Performance testing.

The model broadcast feature lets you batch-start inference services for the same model. A single node loads the model from OSS, and the remaining nodes use local storage and network resources to distribute the model through a chain topology structure. This significantly reduces origin-pull pressure and improves startup efficiency for large-scale node deployments.

How it works

OSS Connector for AI/ML addresses the performance bottlenecks of loading large models from OSS in the cloud.

  • Traditional FUSE-based mount solutions often fail to fully utilize the high bandwidth of OSS, resulting in slow model loading. OSS Connector for AI/ML improves data access efficiency by intercepting I/O requests from the inference framework and converting them directly into HTTP(S) requests to OSS.

  • The connector is loaded through the LD_PRELOAD mechanism and prefetches and caches model data in memory. This requires no changes to your application code and significantly speeds up model loading.

Deployment environment

  • Operating system: Linux x86-64

  • glibc: >=2.17

Installation

  1. Download the complete installation package.

    • oss-connector-lib-1.2.0.x86_64.rpm: For Red Hat-based Linux distributions

      https://gosspublic.alicdn.com/oss-connector/oss-connector-lib-1.2.0.x86_64.rpm
    • oss-connector-lib-1.2.0.x86_64.deb: For Debian-based Linux distributions

      https://gosspublic.alicdn.com/oss-connector/oss-connector-lib-1.2.0.x86_64.deb
  2. Install the OSS connector.

    When you use an .rpm or .deb package to install OSS Connector, the dynamic library file libossc_preload.so is automatically installed to the /usr/local/lib/ directory.

    • Install oss-connector-lib-1.2.0.x86_64.rpm

      yum install -y oss-connector-lib-1.2.0.x86_64.rpm
    • Install oss-connector-lib-1.2.0.x86_64.deb

      dpkg -i oss-connector-lib-1.2.0.x86_64.deb
  3. After installation, verify that /usr/local/lib/libossc_preload.so exists and that the version is correct.

    nm -D /usr/local/lib/libossc_preload.so | grep version
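The check in step 3 can be wrapped in a small function for use in image builds or container entrypoints. The following is a minimal sketch, not part of the connector: the function name check_connector_lib is hypothetical, and it only confirms that the library file exists and is readable before running the same nm command as above (when nm is available).

```shell
#!/usr/bin/env bash
# Hypothetical helper: confirm the preload library is present before relying on LD_PRELOAD.
check_connector_lib() {
    local lib="${1:-/usr/local/lib/libossc_preload.so}"
    if [ ! -r "$lib" ]; then
        echo "missing or unreadable: $lib" >&2
        return 1
    fi
    # Same symbol lookup as in step 3; skipped if nm (binutils) is not installed.
    if command -v nm >/dev/null 2>&1; then
        nm -D "$lib" 2>/dev/null | grep version || true
    fi
    echo "found: $lib"
}

check_connector_lib || echo "install the .rpm or .deb package first"
```

The function returns a non-zero status when the file is missing, so it can gate a startup script before LD_PRELOAD is set.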

Configuration

  • Configuration file

    The configuration file controls log output, cache policies, and prefetch concurrency. Properly configuring these parameters can improve system performance and maintainability.

    The path to the configuration file is /etc/oss-connector/config.json. The installation package includes a default configuration file. The configuration is as follows:

    {
        "logLevel": 1,
        "logPath": "/var/log/oss-connector/connector.log",
        "auditPath": "/var/log/oss-connector/audit.log",
        "expireTimeSec": 120,
        "prefetch": {
            "vcpus": 16,
            "workers": 16,
            "maxCacheAdviseGB": -1
        }
    }
    

    Parameter                    Description
    logLevel                     The log level. Controls the verbosity of the log output.
    logPath                      The log file path. Specifies the output location for runtime logs.
    auditPath                    The audit log file path. Records audit information for security and compliance tracking.
    expireTimeSec                The delay in seconds before releasing cached files that are no longer referenced. Default: 120.
    prefetch.vcpus               The number of vCPUs (virtual CPUs) for prefetching. Default: 16.
    prefetch.workers             The number of coroutine workers per vCPU to increase concurrency. Default: 16.
    prefetch.maxCacheAdviseGB    The size of the memory cache in GB that can be used for prefetching. Default: -1 (unlimited).
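As an illustration of the parameters above, the following sketch writes a custom configuration file that caps the prefetch memory cache, and points the connector at it through the CONNECTOR_CONFIG_PATH environment variable. The temporary directory and the 64 GB cap are arbitrary example values, not recommendations. All other values are the defaults shown above.

```shell
#!/usr/bin/env bash
# Example values only: the directory and the 64 GB cap are assumptions for illustration.
CONFIG_DIR="${CONFIG_DIR:-$(mktemp -d)}"
CONFIG_FILE="$CONFIG_DIR/config.json"

cat > "$CONFIG_FILE" <<'EOF'
{
    "logLevel": 1,
    "logPath": "/var/log/oss-connector/connector.log",
    "auditPath": "/var/log/oss-connector/audit.log",
    "expireTimeSec": 120,
    "prefetch": {
        "vcpus": 16,
        "workers": 16,
        "maxCacheAdviseGB": 64
    }
}
EOF

# Use the custom file instead of /etc/oss-connector/config.json.
export CONNECTOR_CONFIG_PATH="$CONFIG_FILE"

# workers is defined per vCPU, so the defaults imply 16 * 16 coroutine workers in total.
echo "prefetch workers in total: $((16 * 16))"
```

Because prefetch.workers is counted per vCPU, lowering either value reduces the total prefetch concurrency multiplicatively.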

  • Configure environment variables

    Environment variable

    Description

    OSS_ACCESS_KEY_ID, OSS_ACCESS_KEY_SECRET

    The AccessKey ID and AccessKey Secret of an Alibaba Cloud account or a RAM user.

    When you configure permissions with a temporary access credential, set these variables to the AccessKey ID and AccessKey Secret of that credential.

    OSS Connector for AI/ML requires the oss:ListObjects permission for the target bucket and directory. If the bucket and files you are accessing support anonymous access, you can leave the OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET environment variables unset or set them to empty strings.

    OSS_SESSION_TOKEN

    The temporary access token. This is required when you use a temporary access credential from STS to access OSS.

    If you use the AccessKey ID and AccessKey Secret of an Alibaba Cloud account or RAM user for permission configuration, leave this variable unset or empty.

    OSS_ENDPOINT

    The OSS service endpoint, for example http://oss-cn-beijing-internal.aliyuncs.com. If you do not specify a protocol, HTTPS is used by default. For better performance, we recommend that you use HTTP in secure environments, such as an internal network.

    OSS_REGION

    The OSS region ID, such as cn-beijing. If not specified, authentication might fail.

    OSS_PATH

    The OSS model directory, in the format oss://bucketname/path/. For example, oss://examplebucket/qwen/Qwen3-8B/.

    MODEL_DIR

    The local model directory for vllm or other inference frameworks. We recommend starting with an empty directory. Temporary data downloaded during use can be safely deleted afterward.

    Note
    • The MODEL_DIR path must match the model path used by the inference framework, such as the --model parameter for vllm or the --model-path parameter for sglang.

    • MODEL_DIR requires read and write permissions. The directory structure of MODEL_DIR must correspond to that of OSS_PATH.

    • During model loading, model files are prefetched and cached in memory. After loading, the cache is released with a delay, which defaults to 120 seconds. You can adjust this with the expireTimeSec parameter in the configuration file.

    • The local model directory should only be used for loading models with the connector; it is not valid for other purposes.

    • Do not create the local model directory on another OSS mount point, such as an ossfs mount point.

    LD_PRELOAD

    The path of the dynamic library to preload, typically /usr/local/lib/libossc_preload.so. We recommend that you set it as a temporary environment variable for a single command, for example: LD_PRELOAD=/usr/local/lib/libossc_preload.so ENABLE_CONNECTOR=1 ./myapp

    ENABLE_CONNECTOR

    Sets the OSS Connector process role. Use a temporary environment variable to apply this setting.

    • ENABLE_CONNECTOR=1: Primary connector role.

    • ENABLE_CONNECTOR=2: Secondary connector role.

    Within a single running instance, only one process can have the primary connector role. We recommend assigning this role to the main process (for example, the entrypoint). Other processes that use the connector must be assigned the secondary connector role. For a usage example, see the ray+vllm example for multi-node startup.

    OSS_AUTHORIZATION_FILE_PATH

    The path to a JSON-formatted credential file.

    AccessKey ID and AccessKey Secret of an Alibaba Cloud account or RAM user:

    {
      "AccessKeyId": "LTAI************************",
      "AccessKeySecret": "At32************************"
    }

    Temporary access credential:

    {
      "AccessKeyId": "STS.L4aB******************",
      "AccessKeySecret": "wyLTSm*************************",
      "SecurityToken": "************",
      "Expiration": "2024-08-15T15:04:05Z"
    }
    Note

    This setting has a higher priority than the OSS_ACCESS_KEY_ID, OSS_ACCESS_KEY_SECRET, and OSS_SESSION_TOKEN environment variables.

    CONNECTOR_CONFIG_PATH

    The path to the configuration file. Set this variable to use a file other than the default. Default value: /etc/oss-connector/config.json

    CONNECTOR_UDS_PATH

    The path to the Unix Domain Socket (UDS) file. Default value: /run/modelconnector.sock

    Note

    The primary and secondary connector processes communicate through UDS.

    CONNECTOR_MAX_CACHE_ADVISE_GB

    Sets the size of the memory cache in GB that can be used for prefetching.

    Note

    This has the same function as prefetch.maxCacheAdviseGB in the configuration file but has a higher priority.
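Before starting an inference framework, it can help to confirm that the variables in the table above are set consistently. The following preflight check is a minimal sketch: the function name check_connector_env is hypothetical, and treating OSS_ENDPOINT, OSS_REGION, OSS_PATH, and MODEL_DIR as required is an assumption based on the descriptions above. Credentials are not checked, because anonymous access and OSS_AUTHORIZATION_FILE_PATH are both valid alternatives.

```shell
#!/usr/bin/env bash
# Hypothetical preflight check for the environment variables described above.
check_connector_env() {
    local name value missing=0
    for name in OSS_ENDPOINT OSS_REGION OSS_PATH MODEL_DIR; do
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "unset or empty: $name" >&2
            missing=1
        fi
    done
    [ "$missing" -eq 0 ] || return 1

    # OSS_PATH is documented in the form oss://bucketname/path/.
    case "$OSS_PATH" in
        oss://*/) ;;
        *) echo "OSS_PATH should look like oss://bucketname/path/" >&2; return 1 ;;
    esac

    # MODEL_DIR requires read and write permissions.
    mkdir -p "$MODEL_DIR" && [ -r "$MODEL_DIR" ] && [ -w "$MODEL_DIR" ]
}

check_connector_env || echo "fix the variables reported above before starting the inference framework"
```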

Start model service

Single-node startup

vLLM API server

LD_PRELOAD=/usr/local/lib/libossc_preload.so \
ENABLE_CONNECTOR=1 \
OSS_ACCESS_KEY_ID=${OSS_ACCESS_KEY_ID} \
OSS_ACCESS_KEY_SECRET=${OSS_ACCESS_KEY_SECRET} \
OSS_ENDPOINT=${OSS_ENDPOINT} \
OSS_REGION=${OSS_REGION} \
OSS_PATH=${OSS_PATH} \
MODEL_DIR=/tmp/model \
python3 -m vllm.entrypoints.openai.api_server --model /tmp/model --trust-remote-code --tensor-parallel-size 1 --disable-custom-all-reduce

SGLang API server

LD_PRELOAD=/usr/local/lib/libossc_preload.so \
ENABLE_CONNECTOR=1 \
OSS_ACCESS_KEY_ID=${OSS_ACCESS_KEY_ID} \
OSS_ACCESS_KEY_SECRET=${OSS_ACCESS_KEY_SECRET} \
OSS_ENDPOINT=${OSS_ENDPOINT} \
OSS_REGION=${OSS_REGION} \
OSS_PATH=${OSS_PATH} \
MODEL_DIR=/tmp/model \
python3 -m sglang.launch_server --model-path /tmp/model --port 8000 

Multi-model loading

When an inference task involves multiple models, as in speculative decoding, OSS Connector for AI/ML can load multiple models from OSS simultaneously. Set OSS_PATH to the common parent path of all models and MODEL_DIR to the corresponding local parent directory.

The following example shows how to use Speculative Decoding with vllm to load the target model Qwen3-32B and the draft model Qwen3-0.6B simultaneously:

export OSS_ACCESS_KEY_ID=${OSS_ACCESS_KEY_ID}
export OSS_ACCESS_KEY_SECRET=${OSS_ACCESS_KEY_SECRET}
export OSS_ENDPOINT=${OSS_ENDPOINT}
export OSS_REGION=${OSS_REGION}
export OSS_PATH=oss://examplebucket/
export MODEL_DIR=/tmp/models

LD_PRELOAD=/usr/local/lib/libossc_preload.so ENABLE_CONNECTOR=1 \
python3 -m vllm.entrypoints.openai.api_server \
    --model ${MODEL_DIR}/qwen/Qwen3-32B/ --trust-remote-code \
    --tensor-parallel-size 1 --disable-custom-all-reduce \
    --speculative_config '{"model": "'"${MODEL_DIR}/qwen/Qwen3-0___6B/"'", "num_speculative_tokens": 5}'
Note

There is a correspondence between OSS_PATH and MODEL_DIR. For example, if the target model path on OSS is oss://examplebucket/qwen/Qwen3-32B/ and the draft model path is oss://examplebucket/qwen/Qwen3-0___6B/, set OSS_PATH to their common parent path, oss://examplebucket/, and set MODEL_DIR to /tmp/models. The corresponding local paths for the target and draft models are /tmp/models/qwen/Qwen3-32B/ and /tmp/models/qwen/Qwen3-0___6B/, respectively.
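The path correspondence described in the note can be sketched as a prefix substitution: strip OSS_PATH from the model's OSS path and prepend MODEL_DIR. The helper name oss_to_local is hypothetical and only illustrates the mapping; the connector performs this mapping itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map an OSS model path to its local path under MODEL_DIR.
oss_to_local() {
    local oss_path="$1"
    local oss_root="${OSS_PATH%/}/"      # normalize to exactly one trailing slash
    local local_root="${MODEL_DIR%/}"    # drop a trailing slash, if any
    case "$oss_path" in
        "$oss_root"*) echo "$local_root/${oss_path#"$oss_root"}" ;;
        *) echo "not under OSS_PATH: $oss_path" >&2; return 1 ;;
    esac
}

OSS_PATH=oss://examplebucket/
MODEL_DIR=/tmp/models
oss_to_local oss://examplebucket/qwen/Qwen3-32B/      # target model
oss_to_local oss://examplebucket/qwen/Qwen3-0___6B/   # draft model
```

With the values from the note, the two calls print /tmp/models/qwen/Qwen3-32B/ and /tmp/models/qwen/Qwen3-0___6B/, which are the paths passed to --model and --speculative_config.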

Multi-node startup

In multi-node deployment scenarios, OSS Connector for AI/ML supports model broadcast. When model broadcast is enabled, only a single node loads model data from OSS. The remaining nodes distribute the model data through a chain topology structure, which avoids the high bandwidth consumption that occurs when multiple nodes pull from the origin simultaneously. For more information about model broadcast, see Model broadcast.
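To see why this matters, compare the volume of data read from OSS with and without model broadcast. The sketch below assumes that, without broadcast, every node reads the full model from OSS. The node count of 8 is an arbitrary example, and 135.437 GB is the Qwen2.5-72B model size listed under Performance testing.

```shell
#!/usr/bin/env bash
# Rough estimate of origin traffic: model size (GB) times the number of nodes pulling from OSS.
origin_traffic_gb() {
    awk -v size="$1" -v nodes="$2" 'BEGIN { printf "%.3f\n", size * nodes }'
}

echo "without broadcast: $(origin_traffic_gb 135.437 8) GB read from OSS"
echo "with broadcast:    $(origin_traffic_gb 135.437 1) GB read from OSS"
```

Under these assumptions, origin traffic grows linearly with the node count without broadcast and stays at one model size with it.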

Ray and vLLM

Common environment variables:

export OSS_ACCESS_KEY_ID=${OSS_ACCESS_KEY_ID}
export OSS_ACCESS_KEY_SECRET=${OSS_ACCESS_KEY_SECRET}
export OSS_ENDPOINT=${OSS_ENDPOINT}
export OSS_REGION=${OSS_REGION}
export OSS_PATH=oss://examplebucket/
export MODEL_DIR=/tmp/models
Important

The OSS_PATH and MODEL_DIR variables must correspond. For example, if the model path on OSS is oss://examplebucket/qwen/Qwen2___5-72B/, the local model directory is /tmp/models/qwen/Qwen2___5-72B/.

Start the ray head on Pod A:

LD_PRELOAD=/usr/local/lib/libossc_preload.so ENABLE_CONNECTOR=1 ray start --head --dashboard-host 0.0.0.0 --block

Start ray on Pod B and join the cluster:

# Use the head pod's IP address, which is shown in the 'ray start' output on Pod A.
LD_PRELOAD=/usr/local/lib/libossc_preload.so ENABLE_CONNECTOR=1 ray start --address='172.24.176.137:6379' --block

Start the vllm API Server:

LD_PRELOAD=/usr/local/lib/libossc_preload.so ENABLE_CONNECTOR=2 python3 -m vllm.entrypoints.openai.api_server --model ${MODEL_DIR}/qwen/Qwen2___5-72B/ --trust-remote-code --served-model-name ds --max-model-len 2048 --gpu-memory-utilization 0.98 --tensor-parallel-size 32

SGLang

Configure environment variables for the sglang process on each node.

Primary node startup:

LD_PRELOAD=/usr/local/lib/libossc_preload.so \
ENABLE_CONNECTOR=1 \
OSS_ACCESS_KEY_ID=${OSS_ACCESS_KEY_ID} \
OSS_ACCESS_KEY_SECRET=${OSS_ACCESS_KEY_SECRET} \
OSS_ENDPOINT=${OSS_ENDPOINT} \
OSS_REGION=${OSS_REGION} \
OSS_PATH=${OSS_PATH} \
MODEL_DIR=/tmp/model \
python3 -m sglang.launch_server --model-path /tmp/model --port 8000 --dist-init-addr 192.168.1.1:20000 --nnodes 2 --node-rank 0 

Secondary node startup:

LD_PRELOAD=/usr/local/lib/libossc_preload.so \
ENABLE_CONNECTOR=1 \
OSS_ACCESS_KEY_ID=${OSS_ACCESS_KEY_ID} \
OSS_ACCESS_KEY_SECRET=${OSS_ACCESS_KEY_SECRET} \
OSS_ENDPOINT=${OSS_ENDPOINT} \
OSS_REGION=${OSS_REGION} \
OSS_PATH=${OSS_PATH} \
MODEL_DIR=/tmp/model \
python3 -m sglang.launch_server --model-path /tmp/model --port 8000 --dist-init-addr 192.168.1.1:20000 --nnodes 2 --node-rank 1 

Kubernetes deployment

When deploying OSS Connector for AI/ML in a Kubernetes environment, you can install it by using an Init Container, performing a dynamic installation at startup, or creating a custom image. For more information and a complete YAML example of a Kubernetes deployment, see Enable Connector in Kubernetes.

Performance testing

Single-node model loading test

Test environment

Item         Specification
OSS          Beijing, internal network download bandwidth 250 Gbps
Test node    ecs.g7nex.32xlarge, network bandwidth 160 Gbps (80 Gbps × 2)

Metrics

Metric            Description
Model download    The time taken to download the model files by using the connector.
End-to-end        The total time from starting the CPU version of the vllm API server until the service is ready.
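An end-to-end measurement of this kind can be approximated by polling a readiness probe and recording the elapsed time. The following is a minimal sketch and not the harness used for these tests: the function name wait_until_ready is hypothetical, the probe command is supplied by the caller, and the result has one-second resolution.

```shell
#!/usr/bin/env bash
# Hypothetical helper: retry a probe command once per second until it succeeds
# or the timeout (in seconds) expires, then report the elapsed time.
wait_until_ready() {
    local timeout="$1"; shift
    local start now
    start="$(date +%s)"
    while ! "$@" >/dev/null 2>&1; do
        now="$(date +%s)"
        if [ $((now - start)) -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
    done
    now="$(date +%s)"
    echo "ready after $((now - start)) s"
}
```

For example, after launching the API server in the background, a call such as wait_until_ready 600 curl -sf http://localhost:8000/v1/models would poll until the server answers. The port and the /v1/models path are assumptions about the server configuration, so adjust them to match your deployment.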

Test results

Model name     Model size (GB)    Download time (s)    End-to-end time (s)
Qwen2.5-14B    27.522             1.7721               20.48
Qwen2.5-72B    135.437            10.57                30.09
Qwen3-8B       15.271             0.97                 18.88
Qwen3-32B      61.039             3.99                 22.97
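The effective download throughput for each model follows from the results above as model size divided by download time. The sketch below performs that division with awk. The function name throughput_gb_per_s is hypothetical, and the figures are copied from the test results.

```shell
#!/usr/bin/env bash
# Effective throughput in GB/s: model size (GB) divided by download time (s).
throughput_gb_per_s() {
    awk -v size="$1" -v secs="$2" 'BEGIN { printf "%.2f\n", size / secs }'
}

# Columns: model name, model size (GB), download time (s), as in the test results.
while read -r model size secs; do
    echo "$model: $(throughput_gb_per_s "$size" "$secs") GB/s"
done <<'EOF'
Qwen2.5-14B 27.522 1.7721
Qwen2.5-72B 135.437 10.57
Qwen3-8B 15.271 0.97
Qwen3-32B 61.039 3.99
EOF
```

Every row works out to more than 10 GB/s, which is consistent with the throughput figure quoted under High performance.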