
Object Storage Service: Improve model deployment efficiency with OSS Connector for AI/ML

Last Updated: Sep 25, 2025

OSS Connector for AI/ML offers a non-intrusive model loading solution that requires no code changes. It uses `LD_PRELOAD` for high-performance direct reads from OSS. The connector supports prefetching and caching to significantly improve model loading speed. It works with containers and mainstream inference frameworks.

High performance

OSS Connector for AI/ML significantly improves performance when loading large models from OSS. With sufficient bandwidth, throughput can exceed 10 GB/s. For more information, see Performance testing.

How it works

OSS Connector for AI/ML addresses performance bottlenecks that occur when you load large models from OSS in a cloud environment.

  • Traditional mount solutions based on Filesystem in Userspace (FUSE) often cannot fully utilize the high bandwidth of OSS, which results in slow model loading. OSS Connector improves data access efficiency by intercepting I/O requests from the inference framework and converting them directly into HTTP(S) requests to OSS.

  • It uses the `LD_PRELOAD` mechanism to prefetch and cache model data in memory. This requires no code changes to your inference application and significantly speeds up model loading; a minimal invocation sketch follows this list.
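To see what the non-intrusive approach looks like in practice, the pattern below preloads the connector library into an otherwise unmodified process. It is a minimal sketch: `read_model.py` stands in for any application that reads model files from `MODEL_DIR`, and the endpoint, bucket, and paths are placeholder values that the configuration section below explains.

# Preload the connector so that file reads under MODEL_DIR are served
# directly from OSS (placeholder values; read_model.py is hypothetical).
LD_PRELOAD=/usr/local/lib/libossc_preload.so \
ENABLE_CONNECTOR=1 \
OSS_ACCESS_KEY_ID=${OSS_ACCESS_KEY_ID} \
OSS_ACCESS_KEY_SECRET=${OSS_ACCESS_KEY_SECRET} \
OSS_ENDPOINT=oss-cn-beijing-internal.aliyuncs.com \
OSS_REGION=cn-beijing \
OSS_PATH=oss://examplebucket/qwen/Qwen3-8B/ \
MODEL_DIR=/tmp/model \
python3 read_model.py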

Deployment environment

  • Operating system: Linux x86-64

  • glibc: >=2.17

Install OSS Connector

  1. Download the complete installation package.

    • oss-connector-lib-1.1.0rc7.x86_64.rpm: For Red Hat-based Linux distributions

      https://gosspublic.alicdn.com/oss-connector/oss-connector-lib-1.1.0rc7.x86_64.rpm
    • oss-connector-lib-1.1.0rc7.x86_64.deb: For Debian-based Linux distributions

      https://gosspublic.alicdn.com/oss-connector/oss-connector-lib-1.1.0rc7.x86_64.deb
  2. Install OSS Connector.

    Use the downloaded .rpm or .deb package for the installation. The dynamic library file `libossc_preload.so` is automatically installed to the /usr/local/lib/ directory.

    • Install oss-connector-lib-1.1.0rc7.x86_64.rpm

      yum install -y oss-connector-lib-1.1.0rc7.x86_64.rpm
    • Install oss-connector-lib-1.1.0rc7.x86_64.deb

      dpkg -i oss-connector-lib-1.1.0rc7.x86_64.deb
  3. After installation, verify that `/usr/local/lib/libossc_preload.so` exists and that the version is correct.

    nm -D /usr/local/lib/libossc_preload.so | grep version
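Putting the steps together, a typical installation on a Debian-based system might look like the following sketch. It assumes wget is available and uses the package version above; on a Red Hat-based system, substitute the .rpm package and yum.

# Download, install, and verify the connector library (Debian-based systems).
wget https://gosspublic.alicdn.com/oss-connector/oss-connector-lib-1.1.0rc7.x86_64.deb
dpkg -i oss-connector-lib-1.1.0rc7.x86_64.deb
nm -D /usr/local/lib/libossc_preload.so | grep version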

Configure OSS Connector

  • Configuration file

    You can use the configuration file to control log output, the cache policy, and prefetch concurrency. Setting these parameters correctly can improve performance and simplify operations and maintenance.

    The configuration file is located at /etc/oss-connector/config.json. The installation package includes a default configuration file, as shown below:

    {
        "logLevel": 1,
        "logPath": "/var/log/oss-connector/connector.log",
        "auditPath": "/var/log/oss-connector/audit.log",
        "expireTimeSec": 120,
        "prefetch": {
            "vcpus": 16,
            "workers": 16
        }
    }
    

    | Parameter | Description |
    | --- | --- |
    | logLevel | The log level. Controls the verbosity of log output. |
    | logPath | The log file path. Specifies the output location for runtime logs. |
    | auditPath | The audit log file path. Records audit information for security and compliance tracking. |
    | expireTimeSec | The delayed release time for cached files, in seconds. Cached files are released after this delay once they are no longer referenced. Default: 120. |
    | prefetch.vcpus | The number of virtual CPUs (concurrent CPU cores) used for prefetching. Default: 16. |
    | prefetch.workers | The number of coroutines (workers) per vCPU, used to increase concurrency. Default: 16. |
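    For example, if your service reloads the model shortly after startup, you can lengthen the cache release delay so that the reload is still served from memory. The configuration below is a hypothetical tuning, not a recommended default; only `expireTimeSec` differs from the shipped file (raised to 300 seconds):

    {
        "logLevel": 1,
        "logPath": "/var/log/oss-connector/connector.log",
        "auditPath": "/var/log/oss-connector/audit.log",
        "expireTimeSec": 300,
        "prefetch": {
            "vcpus": 16,
            "workers": 16
        }
    }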

  • Configure environment variables

    • OSS_ACCESS_KEY_ID, OSS_ACCESS_KEY_SECRET

      The AccessKey ID and AccessKey secret of an Alibaba Cloud account or a Resource Access Management (RAM) user. When you configure permissions with a temporary access token, set these variables to the AccessKey ID and AccessKey secret of the temporary access credential.

      OSS Connector requires the `oss:ListObjects` permission on the target bucket directory. If the bucket and files you access allow anonymous access, you can leave the `OSS_ACCESS_KEY_ID` and `OSS_ACCESS_KEY_SECRET` environment variables unset or set them to empty strings.

    • OSS_SESSION_TOKEN

      The temporary access token. You must set this variable when you use a temporary access credential from Security Token Service (STS) to access OSS. When you use the AccessKey ID and AccessKey secret of an Alibaba Cloud account or a RAM user, set this variable to an empty string.

    • OSS_ENDPOINT

      The OSS service endpoint. Example: http://oss-cn-beijing-internal.aliyuncs.com. If you do not specify a protocol, HTTPS is used by default. We recommend the HTTP protocol in secure environments, such as an internal network, for better performance.

    • OSS_REGION

      The OSS region ID. Example: cn-beijing. If not specified, authentication may fail.

    • OSS_PATH

      The OSS model directory, in the format `oss://bucketname/path/`. Example: oss://examplebucket/qwen/Qwen3-8B/.

    • MODEL_DIR

      The local model directory passed to vllm or another inference framework. We recommend emptying the directory first. Temporary data is downloaded to this directory during use and can be deleted afterward.

      Note
      • The `MODEL_DIR` path must match the model path passed to the inference framework, such as the `--model` parameter for vllm or the `--model-path` parameter for sglang.

      • `MODEL_DIR` requires read and write permissions. The directory structure of `MODEL_DIR` corresponds to `OSS_PATH`.

      • During model loading, model files are prefetched and cached in memory. The cache is released after a delay once the model is loaded. The default delay is 120 seconds. You can adjust it with the `expireTimeSec` parameter in the configuration file.

      • Use the local model directory only for loading models with the connector. It cannot be used for other purposes.

      • Do not create the local model directory on another OSS mount target, such as an ossfs mount target.

    • LD_PRELOAD

      The path to the dynamic library to preload, usually /usr/local/lib/libossc_preload.so. We recommend setting it as a temporary environment variable for a single command. For example: LD_PRELOAD=/usr/local/lib/libossc_preload.so ENABLE_CONNECTOR=1 ./myapp

    • ENABLE_CONNECTOR

      Sets the OSS Connector process role. Set it as a temporary environment variable so that it applies only to the target process.

      • `ENABLE_CONNECTOR=1`: the primary connector role.

      • `ENABLE_CONNECTOR=2`: the secondary connector role.

      A single running instance can have only one primary connector process. We recommend assigning the primary role to the main process, such as an entrypoint. All other processes that use the connector must take the secondary role. For more information, see the ray+vllm example for multi-node startup. A minimal illustration of the two roles follows this list.
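The sketch below makes the role assignment concrete: one primary process, typically the entrypoint, and one secondary process. `./launcher` and `./worker` are hypothetical program names, and the OSS-related variables are assumed to be exported already.

# Exactly one primary connector process per running instance (hypothetical ./launcher).
LD_PRELOAD=/usr/local/lib/libossc_preload.so ENABLE_CONNECTOR=1 ./launcher &
# Every other connector-enabled process takes the secondary role (hypothetical ./worker).
LD_PRELOAD=/usr/local/lib/libossc_preload.so ENABLE_CONNECTOR=2 ./worker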

Start the model service

Single-node startup

vllm API Server

LD_PRELOAD=/usr/local/lib/libossc_preload.so \
ENABLE_CONNECTOR=1 \
OSS_ACCESS_KEY_ID=${OSS_ACCESS_KEY_ID} \
OSS_ACCESS_KEY_SECRET=${OSS_ACCESS_KEY_SECRET} \
OSS_ENDPOINT=${OSS_ENDPOINT} \
OSS_REGION=${OSS_REGION} \
OSS_PATH=${OSS_PATH} \
MODEL_DIR=/tmp/model \
python3 -m vllm.entrypoints.openai.api_server --model /tmp/model --trust-remote-code --tensor-parallel-size 1 --disable-custom-all-reduce
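Once the server is up, you can sanity-check it with a request to the OpenAI-compatible endpoint. This assumes vllm's default port of 8000 and that you run the check on the same host:

# List the models served by the API server; /tmp/model should appear in the response.
curl http://localhost:8000/v1/models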

sglang API Server

LD_PRELOAD=/usr/local/lib/libossc_preload.so \
ENABLE_CONNECTOR=1 \
OSS_ACCESS_KEY_ID=${OSS_ACCESS_KEY_ID} \
OSS_ACCESS_KEY_SECRET=${OSS_ACCESS_KEY_SECRET} \
OSS_ENDPOINT=${OSS_ENDPOINT} \
OSS_REGION=${OSS_REGION} \
OSS_PATH=${OSS_PATH} \
MODEL_DIR=/tmp/model \
python3 -m sglang.launch_server --model-path /tmp/model --port 8000

Multi-node startup

ray+vllm

Common environment variables:

export OSS_ACCESS_KEY_ID=${OSS_ACCESS_KEY_ID}
export OSS_ACCESS_KEY_SECRET=${OSS_ACCESS_KEY_SECRET}
export OSS_ENDPOINT=${OSS_ENDPOINT}
export OSS_REGION=${OSS_REGION}
export OSS_PATH=oss://examplebucket/
export MODEL_DIR=/tmp/models
Important

The `OSS_PATH` and `MODEL_DIR` variables must correspond. For example, if the model path on OSS is `oss://examplebucket/qwen/Qwen2___5-72B/`, the local model directory is `/tmp/models/qwen/Qwen2___5-72B/`.

Pod A starts the ray head:

LD_PRELOAD=/usr/local/lib/libossc_preload.so ENABLE_CONNECTOR=1 ray start --head --dashboard-host 0.0.0.0 --block

Pod B starts ray and joins the cluster:

# 172.24.176.137 is the IP address of the head pod. Replace it with your own head pod IP.
# The exact join command is printed in the output of `ray start` on Pod A.
LD_PRELOAD=/usr/local/lib/libossc_preload.so ENABLE_CONNECTOR=1 ray start --address='172.24.176.137:6379' --block

Start the vllm API Server:

LD_PRELOAD=/usr/local/lib/libossc_preload.so ENABLE_CONNECTOR=2 python3 -m vllm.entrypoints.openai.api_server --model ${MODEL_DIR}/qwen/Qwen2___5-72B/ --trust-remote-code --served-model-name ds --max-model-len 2048 --gpu-memory-utilization 0.98 --tensor-parallel-size 32

sglang

Configure environment variables for the sglang process on each node.

Primary node startup:

LD_PRELOAD=/usr/local/lib/libossc_preload.so \
ENABLE_CONNECTOR=1 \
OSS_ACCESS_KEY_ID=${OSS_ACCESS_KEY_ID} \
OSS_ACCESS_KEY_SECRET=${OSS_ACCESS_KEY_SECRET} \
OSS_ENDPOINT=${OSS_ENDPOINT} \
OSS_REGION=${OSS_REGION} \
OSS_PATH=${OSS_PATH} \
MODEL_DIR=/tmp/model \
python3 -m sglang.launch_server --model-path /tmp/model --port 8000 --dist-init-addr 192.168.1.1:20000 --nnodes 2 --node-rank 0

Secondary node startup:

LD_PRELOAD=/usr/local/lib/libossc_preload.so \
ENABLE_CONNECTOR=1 \
OSS_ACCESS_KEY_ID=${OSS_ACCESS_KEY_ID} \
OSS_ACCESS_KEY_SECRET=${OSS_ACCESS_KEY_SECRET} \
OSS_ENDPOINT=${OSS_ENDPOINT} \
OSS_REGION=${OSS_REGION} \
OSS_PATH=${OSS_PATH} \
MODEL_DIR=/tmp/model \
python3 -m sglang.launch_server --model-path /tmp/model --port 8000 --dist-init-addr 192.168.1.1:20000 --nnodes 2 --node-rank 1

Kubernetes deployment

To deploy a pod in a Kubernetes environment, first build an image with the connector installed and push the image to a repository; a minimal build sketch is shown below. The YAML file after the sketch is an example of a Kubernetes pod deployment:
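The image build might look like the following. This is a sketch under assumptions: the `vllm/vllm-openai` base image and the registry address are placeholders for your own inference image and repository.

# Dockerfile (sketch): install the connector into an existing inference image.
# The base image here is a placeholder; use your own Debian-based inference image.
FROM vllm/vllm-openai:latest
ADD https://gosspublic.alicdn.com/oss-connector/oss-connector-lib-1.1.0rc7.x86_64.deb /tmp/
RUN dpkg -i /tmp/oss-connector-lib-1.1.0rc7.x86_64.deb && rm /tmp/oss-connector-lib-1.1.0rc7.x86_64.deb

Then build and push, replacing the registry path with your repository:

docker build -t registry.example.com/ai/vllm-oss-connector:v1 .
docker push registry.example.com/ai/vllm-oss-connector:v1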

apiVersion: v1
kind: ConfigMap
metadata:
  name: connector-config
data:
  config.json: |
    {
        "logLevel": 1,
        "logPath": "/var/log/oss-connector/connector.log",
        "auditPath": "/var/log/oss-connector/audit.log",
        "expireTimeSec": 120,
        "prefetch": {
            "vcpus": 16,
            "workers": 16
        }
    }
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: model-connector-deployment
spec:
  selector:
    matchLabels:
      app: model-connector
  template:
    metadata:
      labels:
        app: model-connector
    spec:
      imagePullSecrets:
        - name: acr-credential-beijing
      hostNetwork: true
      containers:
      - name: container-name
        image: {IMAGE_ADDRESS}
        imagePullPolicy: Always
        resources:
          requests:
            cpu: "24"
            memory: "700Gi"
          limits:
            cpu: "128"
            memory: "900Gi"
        command: 
          - bash
          - -c
          - ENABLE_CONNECTOR=1 python3 -m vllm.entrypoints.openai.api_server --model /var/model --trust-remote-code --tensor-parallel-size 1 --disable-custom-all-reduce
        env:
        - name: LD_PRELOAD
          value: "/usr/local/lib/libossc_preload.so"
        - name: OSS_ENDPOINT
          value: "oss-cn-beijing-internal.aliyuncs.com"
        - name: OSS_REGION
          value: "cn-beijing"
        - name: OSS_PATH
          value: "oss://examplebucket/qwen/Qwen1.5-7B-Chat/"
        - name: MODEL_DIR
          value: "/var/model/"
        - name: OSS_ACCESS_KEY_ID
          valueFrom:
            secretKeyRef:
              name: oss-access-key-connector
              key: key
        - name: OSS_ACCESS_KEY_SECRET
          valueFrom:
            secretKeyRef:
              name: oss-access-key-connector
              key: secret
        volumeMounts:
          - name: connector-config
            mountPath: /etc/oss-connector/
      terminationGracePeriodSeconds: 10
      volumes:
      - name: connector-config
        configMap:
          name: connector-config

Performance testing

Single-node model loading test

Test environment

| Item | Description |
| --- | --- |
| OSS | Beijing region; internal network download bandwidth of 250 Gbps |
| Test node | ecs.g7nex.32xlarge; network bandwidth of 160 Gbps (80 Gbps × 2) |

Statistical metrics

| Metric | Description |
| --- | --- |
| Model download | The time from when the connector starts downloading the model files to when the download completes. |
| End-to-end | The time for the CPU version of the vllm API server to start and become ready. |

Test results

| Model name | Model size (GB) | Model download time (seconds) | End-to-end time (seconds) |
| --- | --- | --- | --- |
| Qwen2.5-14B | 27.522 | 1.7721 | 20.48 |
| Qwen2.5-72B | 135.437 | 10.57 | 30.09 |
| Qwen3-8B | 15.271 | 0.97 | 18.88 |
| Qwen3-32B | 61.039 | 3.99 | 22.97 |