OssCheckpoint reads checkpoints from and writes checkpoints to OSS buckets directly. A checkpoint is a snapshot of model state saved at a specific point during training. OssCheckpoint suits training workflows that need to save intermediate model state and load it back later.
Prerequisites
Before you begin, ensure that you have:
OSS Connector for AI/ML installed. See Install OSS Connector for AI/ML.
OSS Connector for AI/ML configured. See Configure OSS Connector for AI/ML.
Read and write checkpoints
Initialize OssCheckpoint with your endpoint, credentials file path, and configuration file path, then use the reader and writer context managers to load and save model state.
Read a checkpoint:
import torch
from osstorchconnector import OssCheckpoint
# Placeholder: replace with the OSS endpoint used to access your bucket.
ENDPOINT = "endpoint"
CRED_PATH = "/root/.alibabacloud/credentials"
CONFIG_PATH = "/etc/oss-connector/config.json"
checkpoint = OssCheckpoint(endpoint=ENDPOINT, cred_path=CRED_PATH, config_path=CONFIG_PATH)
CHECKPOINT_READ_URI = "oss://checkpoint/epoch.0"
with checkpoint.reader(CHECKPOINT_READ_URI) as reader:
    state_dict = torch.load(reader)
Write a checkpoint:
CHECKPOINT_WRITE_URI = "oss://checkpoint/epoch.1"
with checkpoint.writer(CHECKPOINT_WRITE_URI) as writer:
    torch.save(state_dict, writer)
Data types
OssCheckpoint objects support common I/O operations. For the full list of supported data types, see Data types in OSS Connector for AI/ML.
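Because the reader and writer are passed straight to torch.load and torch.save, code written against file-like objects should generally work with them. The sketch below illustrates that pattern without any external dependencies. It rests on two assumptions: `InMemoryCheckpoint` is a hypothetical stand-in that mimics only the `reader(uri)` and `writer(uri)` context managers and is not part of OSS Connector for AI/ML, and `pickle` stands in for torch.save and torch.load. The helper names `save_state` and `load_state` are also illustrative.

```python
import io
import pickle
from contextlib import contextmanager


class InMemoryCheckpoint:
    """Hypothetical stand-in for OssCheckpoint; keeps objects in a dict."""

    def __init__(self):
        self._objects = {}

    @contextmanager
    def writer(self, uri):
        buffer = io.BytesIO()
        yield buffer
        # Reached only if the with block finished without raising.
        self._objects[uri] = buffer.getvalue()

    @contextmanager
    def reader(self, uri):
        yield io.BytesIO(self._objects[uri])


def save_state(checkpoint, uri, state):
    """Serialize state through the writer's file-like interface."""
    with checkpoint.writer(uri) as writer:
        pickle.dump(state, writer)


def load_state(checkpoint, uri):
    """Deserialize state through the reader's file-like interface."""
    with checkpoint.reader(uri) as reader:
        return pickle.load(reader)


checkpoint = InMemoryCheckpoint()
save_state(checkpoint, "oss://checkpoint/epoch.1", {"epoch": 1, "loss": 0.25})
restored = load_state(checkpoint, "oss://checkpoint/epoch.1")
```

Keeping serialization behind two small helpers means the same training code can be exercised against an in-memory fake in tests and against a real OssCheckpoint instance in training, provided the real object is substituted and torch.save and torch.load replace the pickle calls.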
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| endpoint | string | Yes | The endpoint used to access OSS. See Regions and endpoints. |
| cred_path | string | Yes | Path to the credentials file. Default: /root/.alibabacloud/credentials. See Configure access credentials. |
| config_path | string | Yes | Path to the OSS Connector configuration file. Default: /etc/oss-connector/config.json. See Configure OSS Connector. |
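A missing credentials or configuration file is easier to diagnose before the connector is constructed. The following is a minimal sketch of such a pre-flight check. `build_checkpoint_kwargs` is a hypothetical helper, not part of OSS Connector for AI/ML, and the default paths are the ones listed in the table above.

```python
from pathlib import Path

# Default paths as listed in the parameter table.
DEFAULT_CRED_PATH = "/root/.alibabacloud/credentials"
DEFAULT_CONFIG_PATH = "/etc/oss-connector/config.json"


def build_checkpoint_kwargs(endpoint,
                            cred_path=DEFAULT_CRED_PATH,
                            config_path=DEFAULT_CONFIG_PATH):
    """Hypothetical helper: validate the parameters and return them as kwargs."""
    if not endpoint:
        raise ValueError("endpoint must be a non-empty string")
    for name, path in (("cred_path", cred_path), ("config_path", config_path)):
        if not Path(path).is_file():
            raise FileNotFoundError(f"{name} does not point to a file: {path}")
    return {
        "endpoint": endpoint,
        "cred_path": cred_path,
        "config_path": config_path,
    }


# Intended use (requires the connector to be installed and configured):
# checkpoint = OssCheckpoint(**build_checkpoint_kwargs(ENDPOINT))
```

The helper only checks that the files exist. It does not validate their contents or confirm that the endpoint is reachable.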