PAI services — Deep Learning Containers (DLC) and Data Science Workshop (DSW) — support four methods for reading training data from Object Storage Service (OSS): JindoFuse, ossfs 2.0, OSS Connector for AI/ML, and the OSS Python SDK. Each method suits different workloads, frameworks, and data access patterns.
Choose a method
Downloading the full dataset before training causes GPU idle time, forces re-downloads for every run, and wastes bandwidth when only a fraction of the data is needed. The four methods below let you access OSS data directly from your training container, without a full local copy.
| Method | Best for | Notes |
|---|---|---|
| JindoFuse | Non-PyTorch frameworks; small datasets that fit in local cache; workloads that write back to OSS | Mounts OSS as a local path; supports read and write |
| ossfs 2.0 | High-throughput AI training, inference, big data, and autonomous driving workloads with sequential reads or append-only writes | FUSE-based; does not require full POSIX semantics |
| OSS Connector for AI/ML | PyTorch training on millions of small files with high-throughput requirements | Streams data without mounting; uses the native PyTorch Dataset interface |
| OSS SDK (oss2) | Temporary or programmatic access when mounting is not needed | Maximum flexibility; no mount required |
OSS Connector for AI/ML is optimized for PyTorch and provides the best throughput when reading millions of small files. If your framework is not PyTorch or you need to write data back to OSS, use JindoFuse or ossfs 2.0 instead.
ossfs 2.0
ossfs 2.0 mounts OSS as a local file system using FUSE. It delivers high sequential read/write performance and is designed for AI training, inference, big data, and autonomous driving workloads that do not require full POSIX semantics.
To activate ossfs 2.0, set `{"mountType":"ossfs"}` in Advanced Configuration.
Mount OSS in DLC
When creating a DLC job, add an OSS data source. For configuration details, see Create a training job.

| Mount type | Description |
|---|---|
| Dataset | Select a dataset of the Object Storage Service (OSS) type and configure the mount path. Public datasets support read-only mode only. |
| Direct Mount | Directly mount an OSS bucket storage path. When using a Lingjun resource quota with local cache enabled, turn on the Use Cache switch. |
Mount OSS in DSW
When creating a DSW instance, add an OSS data source. For configuration details, see Create a DSW instance.

| Mount type | Description |
|---|---|
| Dataset mount | Select a dataset of the Object Storage Service (OSS) type and configure the mount path. Public datasets support read-only mode only. |
| Storage path mount | Directly mount an OSS bucket storage path. |
Configure ossfs 2.0
Set advanced parameters using fs.ossfs.args in Advanced Configuration. Separate multiple parameters with a comma. For the full parameter reference, see ossfs 2.0.
Files are static during the run — reading a fixed set of files with no modifications. A longer metadata cache reduces storage API calls:

```json
{
  "mountType": "ossfs",
  "fs.ossfs.args": "-oattr_timeout=7200"
}
```

Fast read/write — balances cache efficiency and data freshness for typical training workloads:
```json
{
  "mountType": "ossfs",
  "fs.ossfs.args": "-oattr_timeout=3, -onegative_timeout=0"
}
```

Consistent view across distributed nodes — ensures that all nodes see the same file state after each write:
```json
{
  "mountType": "ossfs",
  "fs.ossfs.args": "-onegative_timeout=0, -oclose_to_open"
}
```

Out-of-memory (OOM) from many open files — high concurrency in DLC or DSW can open many files simultaneously, causing memory pressure. The following configuration reduces memory usage:
```json
{
  "mountType": "ossfs",
  "fs.ossfs.args": "-oreaddirplus=false, -oinode_cache_eviction_threshold=300000"
}
```

Writing large files — by default, ossfs 2.0 uses an 8 MiB part size for multipart uploads, which caps individual file writes at 78.125 GiB (10,000 parts × 8 MiB). Files larger than this fail to write. To raise the limit, increase the part size with -oupload_buffer_size. For example, a 32 MiB part size (33,554,432 bytes) raises the per-file limit to 312.5 GiB. A larger part size uses more memory; control total memory with -total_mem_limit. For all mount options, see Mount options.
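The arithmetic behind these limits can be checked in a few lines. This is a minimal sketch, assuming the standard OSS cap of 10,000 parts per multipart upload; the helper function is illustrative, not part of ossfs:

```python
# Per-file write limit = part size × maximum number of multipart parts.
# Assumes OSS's 10,000-part cap on multipart uploads.
MAX_PARTS = 10_000

def max_file_size_gib(part_size_bytes: int) -> float:
    """Largest single file writable with the given multipart part size."""
    return part_size_bytes * MAX_PARTS / 1024**3

print(max_file_size_gib(8 * 1024**2))   # Default 8 MiB part size -> 78.125 GiB
print(max_file_size_gib(32 * 1024**2))  # 32 MiB part size -> 312.5 GiB
```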
```json
{
  "mountType": "ossfs",
  "fs.ossfs.args": "-oupload_buffer_size=33554432"
}
```

OSS Connector for AI/ML
OSS Connector for AI/ML is a client library for PyTorch training. It streams OSS data directly into your training loop without mounting, using the native PyTorch Dataset and IterableDataset interfaces.
Limitations
Official images: Only DLC jobs and DSW instances running PyTorch 2.0 or later are supported.
Custom images: PyTorch 2.0 or later and Python 3.8–3.12 are required. Install the connector with:

```shell
pip install -i http://yum.tbsite.net/aliyun-pypi/simple/ --extra-index-url http://yum.tbsite.net/pypi/simple/ --trusted-host=yum.tbsite.net osstorchconnector
```

OssCheckpoint: Available only in general computing resource environments.
Prerequisites
Before you begin, make sure you have:
A DLC job or DSW instance running PyTorch 2.0 or later
Configured credentials (see below)
A config.json file with concurrency and prefetch settings (see below)
Step 1: Configure credentials
Use one of the following methods:
RAM role (recommended): Configure a DLC RAM role so the job automatically gets a Security Token Service (STS) temporary credential. No explicit authentication configuration is needed in your code. See Configure a DLC RAM role.
Credential file: Create a credential file in your code project with the following format. After configuring a RAM role, the default credential path is /mnt/.alibabacloud/credentials.

Note: Storing AccessKey information in plaintext poses a security risk. Use a RAM role whenever possible.

| Field | Required | Description | Example |
|---|---|---|---|
| AccessKeyId | Yes | AccessKey ID of an Alibaba Cloud account or RAM user. For STS credentials, use the temporary AccessKey ID. | NTS**** |
| AccessKeySecret | Yes | AccessKey secret of an Alibaba Cloud account or RAM user. For STS credentials, use the temporary AccessKey secret. | 7NR2**** |
| SecurityToken | No | Required when using STS temporary credentials. | STS.6MC2**** |
| Expiration | No | Expiration time of the credential. If blank, the credential never expires. The connector re-reads the file after expiration. | 2024-08-20T00:00:00Z |

```json
{
  "AccessKeyId": "<access-key-id>",
  "AccessKeySecret": "<access-key-secret>",
  "SecurityToken": "<security-token>",
  "Expiration": "2024-08-20T00:00:00Z"
}
```
Step 2: Create a config.json file
The config.json file sets concurrency, prefetch, and logging parameters for OSS data requests.
```json
{
  "logLevel": 1,
  "logPath": "/var/log/oss-connector/connector.log",
  "auditPath": "/var/log/oss-connector/audit.log",
  "datasetConfig": {
    "prefetchConcurrency": 24,
    "prefetchWorker": 2
  },
  "checkpointConfig": {
    "prefetchConcurrency": 24,
    "prefetchWorker": 4,
    "uploadConcurrency": 64
  }
}
```

| Field | Required | Description | Default |
|---|---|---|---|
| logLevel | Yes | Log level. 0: DEBUG, 1: INFO, 2: WARN, 3: ERROR. | 1 (INFO) |
| logPath | Yes | Connector log file path. | /var/log/oss-connector/connector.log |
| auditPath | Yes | Audit log path. Records I/O requests with latency greater than 100 ms. | /var/log/oss-connector/audit.log |
| datasetConfig.prefetchConcurrency | Yes | Concurrent prefetch tasks for dataset reads. | 24 |
| datasetConfig.prefetchWorker | Yes | vCPUs allocated for dataset prefetch. | 4 |
| checkpointConfig.prefetchConcurrency | Yes | Concurrent prefetch tasks for checkpoint reads. | 24 |
| checkpointConfig.prefetchWorker | Yes | vCPUs allocated for checkpoint prefetch. | 4 |
| checkpointConfig.uploadConcurrency | Yes | Concurrent tasks for checkpoint writes. | 64 |
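Because every field in the table is required, generating the file programmatically avoids omissions. A minimal sketch using only the standard library; the tuning values are illustrative, not recommendations:

```python
import json

# Generate a config.json for OSS Connector for AI/ML.
# Field names follow the table above; the concrete values here are
# example settings, not tuned recommendations.
config = {
    "logLevel": 1,  # INFO
    "logPath": "/var/log/oss-connector/connector.log",
    "auditPath": "/var/log/oss-connector/audit.log",
    "datasetConfig": {"prefetchConcurrency": 24, "prefetchWorker": 4},
    "checkpointConfig": {
        "prefetchConcurrency": 24,
        "prefetchWorker": 4,
        "uploadConcurrency": 64,
    },
}

with open("config.json", "w") as f:
    json.dump(config, f, indent=2)
```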
Choose a dataset interface
OSS Connector for AI/ML provides two dataset interfaces that extend standard PyTorch classes:
| Interface | Extends | Reading order | Best for |
|---|---|---|---|
| OssIterableDataset | IterableDataset | Sequential | Large datasets, limited memory, no shuffle needed |
| OssMapDataset | Dataset | Determined by DataLoader; supports shuffle | Small datasets, sufficient memory, random access or parallel processing |
Both interfaces support three access methods:
- `from_prefix()` — access all files under an OSS path prefix
- `from_manifest_file()` — access files listed in a manifest file; supports multiple buckets
- `from_objects()` — access a specific list of OSS URIs
Use OssMapDataset
Access by OSS path prefix:
Use this when your dataset files are organized under a single OSS folder and you do not need a separate index file.
```python
from osstorchconnector import OssMapDataset
import torchvision.transforms as transforms
import accimage

def read_and_transform(data):
    normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                     std=[0.229, 0.224, 0.225])
    transform = transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        normalize,
    ])
    try:
        img = accimage.Image(data.read())
        val = transform(img)
        label = data.label  # File name
    except Exception as e:
        print("read failed", e)
        return None, 0
    return val, label

# cred_path and config_path are defined in the steps above.
dataset = OssMapDataset.from_prefix(
    "{oss_data_folder_uri}",
    endpoint="{oss_endpoint}",
    transform=read_and_transform,
    cred_path=cred_path,
    config_path=config_path
)
```

Access by manifest file:
Use this when your dataset spans multiple OSS buckets or you need a separate index mapping files to labels.
Expected manifest format (one JSON object per line, parsable by json.loads):

```json
{"data": {"source": "oss://examplebucket.oss-cn-wulanchabu.aliyuncs.com/dataset_folder/class1/image1.JPEG"}}
{"data": {"source": ""}}
```

```python
import io
import json
import re
from typing import Iterable, Tuple

from osstorchconnector import OssMapDataset

def transform_oss_path(input_path):
    # Convert "oss://<bucket>.<endpoint>/<key>" to "oss://<bucket>/<key>".
    pattern = r'oss://(.*?)\.(.*?)/(.*)'
    match = re.match(pattern, input_path)
    if match:
        return f'oss://{match.group(1)}/{match.group(3)}'
    else:
        return input_path

def manifest_parser(reader: io.IOBase) -> Iterable[Tuple[str, str]]:
    lines = reader.read().decode("utf-8").strip().split("\n")
    for line in lines:
        data = json.loads(line)
        yield transform_oss_path(data["data"]["source"]), ""

dataset = OssMapDataset.from_manifest_file(
    "{manifest_file_path}",
    manifest_parser,
    "",
    endpoint=endpoint,
    transform=read_and_transform,
    cred_path=cred_path,
    config_path=config_path
)
```

Access by list of OSS URIs:
```python
uris = [
    "oss://examplebucket.oss-cn-wulanchabu.aliyuncs.com/dataset_folder/class1/image1.JPEG",
    "oss://examplebucket.oss-cn-wulanchabu.aliyuncs.com/dataset_folder/class2/image2.JPEG"
]

dataset = OssMapDataset.from_objects(
    uris,
    endpoint=endpoint,
    transform=read_and_transform,
    cred_path=cred_path,
    config_path=config_path
)
```

Use OssIterableDataset
OssIterableDataset supports the same three access methods. Replace OssMapDataset with OssIterableDataset in each call:
```python
# Access by prefix
dataset = OssIterableDataset.from_prefix("{oss_data_folder_uri}", endpoint="{oss_endpoint}", transform=read_and_transform, cred_path=cred_path, config_path=config_path)

# Access by manifest file
dataset = OssIterableDataset.from_manifest_file("{manifest_file_path}", manifest_parser, "", endpoint=endpoint, transform=read_and_transform, cred_path=cred_path, config_path=config_path)

# Access by URI list
dataset = OssIterableDataset.from_objects(uris, endpoint=endpoint, transform=read_and_transform, cred_path=cred_path, config_path=config_path)
```

Use OssCheckpoint
OssCheckpoint saves and loads model checkpoints directly to and from OSS, without mounting.
OssCheckpoint is available only in general computing resource environments.
```python
from osstorchconnector import OssCheckpoint
import torch

checkpoint = OssCheckpoint(endpoint="{oss_endpoint}", cred_path=cred_path, config_path=config_path)

checkpoint_read_uri = "{checkpoint_path}"
checkpoint_write_uri = "{checkpoint_path}"

# Load a checkpoint from OSS.
with checkpoint.reader(checkpoint_read_uri) as reader:
    state_dict = torch.load(reader)
    model.load_state_dict(state_dict)

# Save a checkpoint to OSS.
with checkpoint.writer(checkpoint_write_uri) as writer:
    torch.save(model.state_dict(), writer)
```

Full example
The following example combines dataset loading and checkpoint management in a complete training loop.
```python
from osstorchconnector import OssMapDataset, OssCheckpoint
import torchvision.transforms as transforms
import accimage
import torchvision.models as models
import torch

cred_path = "/mnt/.alibabacloud/credentials"  # Default path after configuring a DLC RAM role.
config_path = "config.json"

checkpoint = OssCheckpoint(endpoint="{oss_endpoint}", cred_path=cred_path, config_path=config_path)

model = models.__dict__["resnet18"]()
epochs = 100
checkpoint_read_uri = "{checkpoint_path}"
checkpoint_write_uri = "{checkpoint_path}"

# Load an existing checkpoint.
with checkpoint.reader(checkpoint_read_uri) as reader:
    state_dict = torch.load(reader)
    model.load_state_dict(state_dict)

def read_and_transform(data):
    normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                     std=[0.229, 0.224, 0.225])
    transform = transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        normalize,
    ])
    try:
        img = accimage.Image(data.read())
        value = transform(img)
    except Exception as e:
        print("read failed", e)
        return None, 0
    return value, 0

# Build a dataset from OSS — no local download or mount required.
dataset = OssMapDataset.from_prefix(
    "{oss_data_folder_uri}",
    endpoint="{oss_endpoint}",
    transform=read_and_transform,
    cred_path=cred_path,
    config_path=config_path
)
data_loader = torch.utils.data.DataLoader(
    dataset, batch_size="{batch_size}", num_workers="{num_workers}", pin_memory=True
)

for epoch in range(epochs):
    for step, (images, target) in enumerate(data_loader):
        # Batch processing and model training
        pass
    # Save a checkpoint after each epoch.
    with checkpoint.writer(checkpoint_write_uri) as writer:
        torch.save(model.state_dict(), writer)
```

OSS SDK
Use the OSS Python API (oss2) when you need temporary or programmatic access to OSS data without mounting — for example, to apply custom business logic when selecting which data to load.
Prerequisites
Before you begin, make sure you have:
Installed the OSS Python SDK. See Installation (Python SDK V1).
Configured access credentials. See Configure access credentials (Python SDK V1).
Read and write OSS data
```python
# -*- coding: utf-8 -*-
import oss2
from oss2.credentials import EnvironmentVariableCredentialsProvider

# Load credentials from the OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET environment variables.
auth = oss2.ProviderAuth(EnvironmentVariableCredentialsProvider())
bucket = oss2.Bucket(auth, '<endpoint>', '<your_bucket_name>')

# Read a complete object.
result = bucket.get_object('<your_file_path/your_file>')
print(result.read())

# Read a byte range (the first 100 bytes).
result = bucket.get_object('<your_file_path/your_file>', byte_range=(0, 99))

# Write data.
bucket.put_object('<your_file_path/your_file>', '<your_object_content>')

# Append to an appendable object; pass the position returned by the previous call.
result = bucket.append_object('<your_file_path/your_file>', 0, '<your_object_content>')
result = bucket.append_object('<your_file_path/your_file>', result.next_position, '<your_object_content>')
```

Replace the following placeholders:
| Placeholder | Description | Example |
|---|---|---|
<endpoint> | Endpoint for the region where the bucket resides. | https://oss-cn-hangzhou.aliyuncs.com. For the full list, see Regions and endpoints. |
<your_bucket_name> | Name of the OSS bucket. | my-training-data |
<your_file_path/your_file> | Full object path, excluding the bucket name. | testfolder/exampleobject.txt |
<your_object_content> | Content to write or append. | — |
Load training data with a custom dataset
Store data in an OSS bucket with an index file that maps file paths to labels. Use the DataLoader API to read data in parallel across multiple processes.
```python
import io

import oss2
from oss2.credentials import EnvironmentVariableCredentialsProvider
from PIL import Image
import torch

class OSSDataset(torch.utils.data.dataset.Dataset):
    def __init__(self, endpoint, bucket, auth, index_file):
        self._bucket = oss2.Bucket(auth, endpoint, bucket)
        # Index file format: each sample is "path:label", samples separated by commas.
        self._indices = self._bucket.get_object(index_file).read().decode('utf-8').split(',')

    def __len__(self):
        return len(self._indices)

    def __getitem__(self, index):
        img_path, label = self._indices[index].strip().split(':')
        img_str = self._bucket.get_object(img_path)
        img_buf = io.BytesIO()
        img_buf.write(img_str.read())
        img_buf.seek(0)
        img = Image.open(img_buf).convert('RGB')
        img_buf.close()
        return img, label

# Load credentials from environment variables.
auth = oss2.ProviderAuth(EnvironmentVariableCredentialsProvider())
dataset = OSSDataset(endpoint, bucket, auth, index_file)
data_loader = torch.utils.data.DataLoader(
    dataset,
    batch_size=batch_size,
    num_workers=num_loaders,
    pin_memory=True
)
```

| Parameter | Description |
|---|---|
| endpoint | Endpoint for the region where the bucket resides. For example, https://oss-cn-hangzhou.aliyuncs.com. See Regions and endpoints. |
| bucket | Bucket name. |
| index_file | Path to the index file. Each sample uses the format path:label, separated by commas. |
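To sanity-check the index format this loader expects, a minimal sketch that serializes and parses the "path:label" format locally. The paths and labels are made-up examples, not part of any real dataset:

```python
# Build and parse an index file in the "path:label,path:label" format
# described above. The sample paths and labels are hypothetical.
samples = [
    ("dataset_folder/class1/image1.JPEG", "0"),
    ("dataset_folder/class2/image2.JPEG", "1"),
]

# Serialize: each sample is "path:label", samples separated by commas.
index_content = ",".join(f"{path}:{label}" for path, label in samples)

# Parse the same way OSSDataset.__getitem__ does.
indices = index_content.split(",")
img_path, label = indices[1].strip().split(":")
print(img_path, label)
```

Note that this simple format breaks if object paths contain commas or colons; keep object keys free of both characters when generating the index.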
Save and load PyTorch models
For more details on PyTorch model serialization, see PyTorch — saving and loading models.
Save a model:
```python
from io import BytesIO

import torch
import oss2
from oss2.credentials import EnvironmentVariableCredentialsProvider

auth = oss2.ProviderAuth(EnvironmentVariableCredentialsProvider())
bucket_name = "<your_bucket_name>"  # Do not prefix with oss://
bucket = oss2.Bucket(auth, endpoint, bucket_name)

# Serialize the state dict to an in-memory buffer, then upload it.
buffer = BytesIO()
torch.save(model.state_dict(), buffer)
bucket.put_object("<your_model_path>", buffer.getvalue())
```

Load a model:
```python
from io import BytesIO

import torch
import oss2
from oss2.credentials import EnvironmentVariableCredentialsProvider

auth = oss2.ProviderAuth(EnvironmentVariableCredentialsProvider())
bucket_name = "<your_bucket_name>"  # Do not prefix with oss://
bucket = oss2.Bucket(auth, endpoint, bucket_name)

# Download the object into memory and deserialize the state dict.
buffer = BytesIO(bucket.get_object("<your_model_path>").read())
model.load_state_dict(torch.load(buffer))
```

Replace <your_bucket_name> with the bucket name (without the oss:// prefix) and <your_model_path> with the model object path.

