OSS Connector for AI/ML offers a non-intrusive model loading solution that requires no code changes. It uses `LD_PRELOAD` for high-performance direct reads from OSS. The connector supports prefetching and caching to significantly improve model loading speed. It works with containers and mainstream inference frameworks.
High performance
OSS Connector for AI/ML significantly improves performance when loading large models from OSS. With sufficient bandwidth, throughput can exceed 10 GB/s. For more information, see Performance testing.
How it works
OSS Connector for AI/ML addresses performance bottlenecks that occur when you load large models from OSS in a cloud environment.
Traditional mount solutions based on Filesystem in Userspace (FUSE) often cannot fully utilize the high bandwidth of OSS. This results in slow model loading. OSS Connector improves data access efficiency by intercepting I/O requests from the inference framework and converting them directly into HTTP(S) requests to OSS.
It uses the `LD_PRELOAD` mechanism to prefetch and cache model data in memory. This requires no code changes to your inference application and significantly speeds up model loading.
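For example, running a model loader with and without the connector differs only in the environment variables. The following is a minimal sketch in which `load_model.py`, the bucket path, and the local directory are placeholders, and the credential and endpoint variables are omitted for brevity:
# Plain run: model files are read through the regular filesystem path.
python3 load_model.py

# Run with the connector preloaded: file I/O under MODEL_DIR is intercepted by
# libossc_preload.so and served directly from OSS over HTTP(S).
LD_PRELOAD=/usr/local/lib/libossc_preload.so \
ENABLE_CONNECTOR=1 \
OSS_PATH=oss://examplebucket/qwen/Qwen3-8B/ \
MODEL_DIR=/tmp/model \
python3 load_model.py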
Deployment environment
Operating system: Linux x86-64
glibc: >=2.17
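You can check both requirements with standard commands:
uname -m         # should print x86_64
ldd --version    # the first line shows the glibc version, which must be 2.17 or later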
Install OSS Connector
Download the installation package that matches your Linux distribution.
oss-connector-lib-1.1.0rc7.x86_64.rpm: For Red Hat-based Linux distributions
https://gosspublic.alicdn.com/oss-connector/oss-connector-lib-1.1.0rc7.x86_64.rpm
oss-connector-lib-1.1.0rc7.x86_64.deb: For Debian-based Linux distributions
https://gosspublic.alicdn.com/oss-connector/oss-connector-lib-1.1.0rc7.x86_64.deb
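For example, to download a package from the command line, pick the one that matches your distribution:
# Red Hat-based distributions (RPM)
wget https://gosspublic.alicdn.com/oss-connector/oss-connector-lib-1.1.0rc7.x86_64.rpm

# Debian-based distributions (DEB)
wget https://gosspublic.alicdn.com/oss-connector/oss-connector-lib-1.1.0rc7.x86_64.deb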
Install OSS Connector.
Use the downloaded .rpm or .deb package for the installation. The dynamic library file `libossc_preload.so` is automatically installed to the `/usr/local/lib/` directory.
Install oss-connector-lib-1.1.0rc7.x86_64.rpm:
yum install -y oss-connector-lib-1.1.0rc7.x86_64.rpm
Install oss-connector-lib-1.1.0rc7.x86_64.deb:
dpkg -i oss-connector-lib-1.1.0rc7.x86_64.deb
After installation, verify that `/usr/local/lib/libossc_preload.so` exists and that the version is correct.
nm -D /usr/local/lib/libossc_preload.so | grep version
Configure OSS Connector
Configuration file
You can use the configuration file to control log output, the cache policy, and prefetch concurrency. Setting these parameters correctly can improve performance and simplify maintenance.
The configuration file is located at `/etc/oss-connector/config.json`. The installation package includes a default configuration file, as shown below:
{
    "logLevel": 1,
    "logPath": "/var/log/oss-connector/connector.log",
    "auditPath": "/var/log/oss-connector/audit.log",
    "expireTimeSec": 120,
    "prefetch": {
        "vcpus": 16,
        "workers": 16
    }
}
Parameter | Description |
logLevel | The log level. Controls the detail level of log output. |
logPath | The log file path. Specifies the output location for runtime logs. |
auditPath | The audit log file path. Records audit information for security and compliance tracking. |
expireTimeSec | The delayed release time for cached files, in seconds. Cached files are released after this delay once they are no longer referenced. Default: 120. |
prefetch.vcpus | The number of virtual CPUs (concurrent CPU cores) used for prefetching. Default: 16. |
prefetch.workers | The number of coroutines (workers) per vCPU, used to increase concurrency. Default: 16. |
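For example, after a model load you can inspect the runtime and audit logs at the paths configured above:
tail -n 20 /var/log/oss-connector/connector.log   # runtime log
tail -n 20 /var/log/oss-connector/audit.log       # audit log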
Configure environment variables
Set the following environment variables for the process that uses the connector.
OSS_ACCESS_KEY_ID
OSS_ACCESS_KEY_SECRET
The AccessKey ID and AccessKey secret of an Alibaba Cloud account or a Resource Access Management (RAM) user.
If you use a temporary access credential, set these variables to the AccessKey ID and AccessKey secret of that credential.
OSS Connector requires the `oss:ListObjects` permission on the target bucket directory. If the bucket and files you access allow anonymous access, you can leave `OSS_ACCESS_KEY_ID` and `OSS_ACCESS_KEY_SECRET` unset or set them to empty strings.
OSS_SESSION_TOKEN
The temporary access token. You must set this parameter when you use a temporary access credential from Security Token Service (STS) to access OSS.
If you use the long-term AccessKey ID and AccessKey secret of an Alibaba Cloud account or RAM user instead, set this variable to an empty string.
OSS_ENDPOINT
Specifies the OSS endpoint. Example: http://oss-cn-beijing-internal.aliyuncs.com. If you do not specify a protocol, HTTPS is used by default. In secure environments, such as an internal network, we recommend HTTP for better performance.
OSS_REGION
Specifies the OSS region ID. Example: cn-beijing. If this variable is not set, authentication may fail.
OSS_PATH
The OSS model directory. The format is `oss://bucketname/path/`. Example: oss://examplebucket/qwen/Qwen3-8B/.
MODEL_DIR
The local model directory that is passed to vllm or another inference framework. We recommend emptying the directory before use. Temporary data written to it during loading can be deleted afterward.
Note
The `MODEL_DIR` path must match the model path passed to the inference framework, such as the `--model` parameter for vllm or the `--model-path` parameter for sglang.
`MODEL_DIR` requires read and write permissions. The directory structure of `MODEL_DIR` corresponds to `OSS_PATH`.
During model loading, model files are prefetched and cached in memory. The cache is released after a delay when the model is loaded. The default delay is 120 seconds. You can adjust this with the `expireTimeSec` parameter in the configuration file.
Use the local model directory only for loading models with the connector. It cannot be used for other purposes.
Do not create the local model directory on another OSS mount target, such as an ossfs mount target.
LD_PRELOAD
The path to the dynamic library to preload, usually `/usr/local/lib/libossc_preload.so`. We recommend setting it as a temporary (per-command) environment variable. For example:
LD_PRELOAD=/usr/local/lib/libossc_preload.so ENABLE_CONNECTOR=1 ./myapp
ENABLE_CONNECTOR
Sets the OSS Connector process role. Set it as a temporary (per-command) environment variable so that it takes effect for the target process.
`ENABLE_CONNECTOR=1`: Primary connector role.
`ENABLE_CONNECTOR=2`: Secondary connector role.
A single running instance can have only one primary connector process. We recommend assigning the primary role to the main process, such as an entrypoint. All other processes that use the connector must be assigned the secondary connector role. For more information, see the ray+vllm example for multi-node startup.
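For example, a process that uses an STS temporary credential might export the variables as follows. All values are placeholders; if you use a long-term AccessKey pair instead, omit `OSS_SESSION_TOKEN` or set it to an empty string:
export OSS_ACCESS_KEY_ID=STS.****            # AccessKey ID of the temporary credential (placeholder)
export OSS_ACCESS_KEY_SECRET=****            # AccessKey secret of the temporary credential (placeholder)
export OSS_SESSION_TOKEN=****                # security token issued by STS (placeholder)
export OSS_ENDPOINT=http://oss-cn-beijing-internal.aliyuncs.com
export OSS_REGION=cn-beijing
export OSS_PATH=oss://examplebucket/qwen/Qwen3-8B/
export MODEL_DIR=/tmp/model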
Start the model service
Single-node startup
vllm API Server
LD_PRELOAD=/usr/local/lib/libossc_preload.so \
ENABLE_CONNECTOR=1 OSS_ACCESS_KEY_ID=${OSS_ACCESS_KEY_ID} \
OSS_ACCESS_KEY_SECRET=${OSS_ACCESS_KEY_SECRET} \
OSS_ENDPOINT=${OSS_ENDPOINT} \
OSS_REGION=${OSS_REGION} \
OSS_PATH=${OSS_PATH} \
MODEL_DIR=/tmp/model \
python3 -m vllm.entrypoints.openai.api_server --model /tmp/model --trust-remote-code --tensor-parallel-size 1 --disable-custom-all-reduce
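After the server reports that it is ready, you can send a test request to its OpenAI-compatible API. This sketch assumes the default port 8000 and that no `--served-model-name` is set, so the model name equals the model path:
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "/tmp/model", "prompt": "Hello", "max_tokens": 16}'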
sglang API Server
LD_PRELOAD=/usr/local/lib/libossc_preload.so \
ENABLE_CONNECTOR=1 OSS_ACCESS_KEY_ID=${OSS_ACCESS_KEY_ID} \
OSS_ACCESS_KEY_SECRET=${OSS_ACCESS_KEY_SECRET} \
OSS_ENDPOINT=${OSS_ENDPOINT} \
OSS_REGION=${OSS_REGION} \
OSS_PATH=${OSS_PATH} \
MODEL_DIR=/tmp/model \
python3 -m sglang.launch_server --model-path /tmp/model --port 8000
Multi-node startup
ray+vllm
Common environment variables:
export OSS_ACCESS_KEY_ID=${OSS_ACCESS_KEY_ID}
export OSS_ACCESS_KEY_SECRET=${OSS_ACCESS_KEY_SECRET}
export OSS_ENDPOINT=${OSS_ENDPOINT}
export OSS_REGION=${OSS_REGION}
export OSS_PATH=oss://examplebucket/
export MODEL_DIR=/tmp/models
The `OSS_PATH` and `MODEL_DIR` variables must correspond. For example, if the model path on OSS is `oss://examplebucket/qwen/Qwen2___5-72B/`, the local model directory is `/tmp/models/qwen/Qwen2___5-72B/`.
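A sketch of preparing the local directory so that it mirrors the OSS layout, assuming the example paths above; as recommended for `MODEL_DIR`, empty any existing contents first:
mkdir -p /tmp/models/qwen/Qwen2___5-72B/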
Pod A starts the ray head:
LD_PRELOAD=/usr/local/lib/libossc_preload.so ENABLE_CONNECTOR=1 ray start --head --dashboard-host 0.0.0.0 --block
Pod B starts ray and joins the cluster:
LD_PRELOAD=/usr/local/lib/libossc_preload.so ENABLE_CONNECTOR=1 ray start --address='172.24.176.137:6379' --block
# 172.24.176.137 is the IP address of the head pod. Replace it with your own head pod IP.
# The exact command to join the cluster is printed in the output after you run `ray start` on Pod A.
Start the vllm API Server:
LD_PRELOAD=/usr/local/lib/libossc_preload.so ENABLE_CONNECTOR=2 python3 -m vllm.entrypoints.openai.api_server --model ${MODEL_DIR}/qwen/Qwen2___5-72B/ --trust-remote-code --served-model-name ds --max-model-len 2048 --gpu-memory-utilization 0.98 --tensor-parallel-size 32
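After the API server starts, you can verify that the model is being served. This sketch assumes the default port 8000 on the node where the server runs:
curl http://localhost:8000/v1/models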
sglang
Configure the environment variables for the sglang process on each node.
Primary node startup:
LD_PRELOAD=/usr/local/lib/libossc_preload.so \
ENABLE_CONNECTOR=1 OSS_ACCESS_KEY_ID=${OSS_ACCESS_KEY_ID} \
OSS_ACCESS_KEY_SECRET=${OSS_ACCESS_KEY_SECRET} \
OSS_ENDPOINT=${OSS_ENDPOINT} \
OSS_REGION=${OSS_REGION} \
OSS_PATH=${OSS_PATH} \
MODEL_DIR=/tmp/model \
python3 -m sglang.launch_server --model-path /tmp/model --port 8000 --dist-init-addr 192.168.1.1:20000 --nnodes 2 --node-rank 0
Secondary node startup:
LD_PRELOAD=/usr/local/lib/libossc_preload.so \
ENABLE_CONNECTOR=1 OSS_ACCESS_KEY_ID=${OSS_ACCESS_KEY_ID} \
OSS_ACCESS_KEY_SECRET=${OSS_ACCESS_KEY_SECRET} \
OSS_ENDPOINT=${OSS_ENDPOINT} \
OSS_REGION=${OSS_REGION} \
OSS_PATH=${OSS_PATH} \
MODEL_DIR=/tmp/model \
python3 -m sglang.launch_server --model-path /tmp/model --port 8000 --dist-init-addr 192.168.1.1:20000 --nnodes 2 --node-rank 1
Kubernetes deployment
To deploy a pod in a Kubernetes environment, first build an image with the connector installed and push the image to a repository. The following YAML file is an example of a Kubernetes pod deployment:
apiVersion: v1
kind: ConfigMap
metadata:
  name: connector-config
data:
  config.json: |
    {
        "logLevel": 1,
        "logPath": "/var/log/oss-connector/connector.log",
        "auditPath": "/var/log/oss-connector/audit.log",
        "expireTimeSec": 120,
        "prefetch": {
            "vcpus": 16,
            "workers": 16
        }
    }
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: model-connector-deployment
spec:
  selector:
    matchLabels:
      app: model-connector
  template:
    metadata:
      labels:
        app: model-connector
    spec:
      imagePullSecrets:
        - name: acr-credential-beijing
      hostNetwork: true
      containers:
        - name: container-name
          image: {IMAGE_ADDRESS}
          imagePullPolicy: Always
          resources:
            requests:
              cpu: "24"
              memory: "700Gi"
            limits:
              cpu: "128"
              memory: "900Gi"
          command:
            - bash
            - -c
            - ENABLE_CONNECTOR=1 python3 -m vllm.entrypoints.openai.api_server --model /var/model --trust-remote-code --tensor-parallel-size 1 --disable-custom-all-reduce
          env:
            - name: LD_PRELOAD
              value: "/usr/local/lib/libossc_preload.so"
            - name: OSS_ENDPOINT
              value: "oss-cn-beijing-internal.aliyuncs.com"
            - name: OSS_REGION
              value: "cn-beijing"
            - name: OSS_PATH
              value: "oss://examplebucket/qwen/Qwen1.5-7B-Chat/"
            - name: MODEL_DIR
              value: "/var/model/"
            - name: OSS_ACCESS_KEY_ID
              valueFrom:
                secretKeyRef:
                  name: oss-access-key-connector
                  key: key
            - name: OSS_ACCESS_KEY_SECRET
              valueFrom:
                secretKeyRef:
                  name: oss-access-key-connector
                  key: secret
          volumeMounts:
            - name: connector-config
              mountPath: /etc/oss-connector/
      terminationGracePeriodSeconds: 10
      volumes:
        - name: connector-config
          configMap:
            name: connector-config
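A sketch of applying and checking the workload, assuming the manifest above is saved as model-connector.yaml (the file name is arbitrary):
kubectl apply -f model-connector.yaml
kubectl get pods -l app=model-connector
kubectl logs -l app=model-connector --tail=50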
Performance testing
Single-node model loading test
Test environment
Item | Description |
OSS | Beijing, internal network download bandwidth 250 Gbps |
Test node | ecs.g7nex.32xlarge, network bandwidth 160 Gbps (80 Gbps × 2) |
Metrics
Metric | Description |
Model download | The time from the start of the model file download through the connector to its completion. |
End-to-end | The time it takes for the CPU version of the vllm API server to start and become ready. |
Test results
Model name | Model size (GB) | Model download time (seconds) | End-to-end time (seconds) |
Qwen2.5-14B | 27.522 | 1.7721 | 20.48 |
Qwen2.5-72B | 135.437 | 10.57 | 30.09 |
Qwen3-8B | 15.271 | 0.97 | 18.88 |
Qwen3-32B | 61.039 | 3.99 | 22.97 |