This topic describes how to use the OSS Python SDK and the OSS Python API to read and write data in OSS.
Recommendations
If you frequently access and process large datasets, register OSS as a dataset and mount it. For temporary access, or access that depends on your business logic, use the SDK and API methods described in this topic.
Use the OSS Python SDK
DSW includes the oss2 Python package. Follow these steps to read and write data in OSS.
Authentication and initialization.
    import oss2

    auth = oss2.Auth('<your_AccessKey_ID>', '<your_AccessKey_Secret>')
    bucket = oss2.Bucket(auth, '<your_oss_endpoint>', '<your_bucket_name>')

Replace the following placeholders with your actual values.
Parameter: Description
<your_AccessKey_ID> and <your_AccessKey_Secret>: The AccessKey ID and AccessKey secret for your Alibaba Cloud account. For more information, see Create an AccessKey pair.
<your_oss_endpoint>: The OSS endpoint. Choose the endpoint that matches the region and type of your instance:
  Pay-as-you-go instances in the China (Beijing) region: oss-cn-beijing.aliyuncs.com
  Subscription instances in the China (Beijing) region: oss-cn-beijing-internal.aliyuncs.com
  GPU P100 instances or CPU instances in the China (Shanghai) region: oss-cn-shanghai.aliyuncs.com
  GPU M40 instances in the China (Shanghai) region: oss-cn-shanghai-internal.aliyuncs.com
  For more information, see Regions and endpoints of OSS.
<your_bucket_name>: The name of the bucket. Do not include the oss:// prefix.
Read data from and write data to OSS.
    # Read a complete file.
    result = bucket.get_object('<your_file_path/your_file>')
    print(result.read())

    # Read data by range. The range is inclusive, so (0, 99) returns the first 100 bytes.
    result = bucket.get_object('<your_file_path/your_file>', byte_range=(0, 99))

    # Write data to OSS.
    bucket.put_object('<your_file_path/your_file>', '<your_object_content>')

    # Append data to a file. The first append starts at position 0; each later
    # append starts at the next_position returned by the previous call.
    result = bucket.append_object('<your_file_path/your_file>', 0, '<your_object_content>')
    result = bucket.append_object('<your_file_path/your_file>', result.next_position, '<your_object_content>')

Replace the following placeholders with your actual values:
<your_file_path/your_file>: The path to the file you want to read or write.
<your_object_content>: The content you want to write or append.
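When a file is too large to read in one call, the range read shown above can be repeated over consecutive chunks. The following is a minimal sketch of how the inclusive (start, end) pairs might be computed. Note that byte_ranges is a hypothetical helper, not part of oss2, and the total size of the file is assumed to be known already.

```python
def byte_ranges(total_size, chunk_size):
    """Yield inclusive (start, end) pairs covering total_size bytes.

    Hypothetical helper, not part of oss2. Each pair has the same shape as
    the byte_range=(0, 99) argument used above, where both ends are included.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, total_size, chunk_size):
        yield start, min(start + chunk_size, total_size) - 1


# A 250-byte file read in 100-byte chunks needs three range reads.
ranges = list(byte_ranges(250, 100))
```

Each pair could then be passed as the byte_range argument of bucket.get_object, one call per chunk.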
Use the OSS Python API
For PyTorch users, DSW provides the OSS Python API to read and write data directly in OSS.
You can store training data or models in OSS:
Load training data
You can store your data in an OSS bucket and save the data paths and corresponding labels in an index file within the same bucket. By creating a custom Dataset, you can use the DataLoader API in PyTorch to read data in parallel across multiple processes. The following code provides an example.

    import io

    import oss2
    import torch
    from PIL import Image


    class OSSDataset(torch.utils.data.Dataset):
        def __init__(self, endpoint, bucket, auth, index_file):
            self._bucket = oss2.Bucket(auth, endpoint, bucket)
            # get_object().read() returns bytes, so decode before splitting.
            self._indices = self._bucket.get_object(index_file).read().decode('utf-8').split(',')

        def __len__(self):
            return len(self._indices)

        def __getitem__(self, index):
            img_path, label = self._indices[index].strip().split(':')
            img_str = self._bucket.get_object(img_path)
            img_buf = io.BytesIO()
            img_buf.write(img_str.read())
            img_buf.seek(0)
            img = Image.open(img_buf).convert('RGB')
            img_buf.close()
            # Convert img to a tensor here (for example, with a torchvision
            # transform) so that the default collate function can batch it.
            return img, label


    dataset = OSSDataset(endpoint, bucket, auth, index_file)
    data_loader = torch.utils.data.DataLoader(
        dataset,
        batch_size=batch_size,
        num_workers=num_loaders,
        pin_memory=True)

Replace the following placeholders with your actual values:
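endpoint: The OSS endpoint.
bucket: The bucket name.
auth: The authentication object (oss2.Auth).
index_file: The path to the index file.
batch_size and num_loaders: The batch size and the number of worker processes passed to DataLoader.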
Note: In this example, the index file uses this format: commas (,) separate samples, and colons (:) separate the sample path from its label.
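The index format described in the note can be checked locally before any data is uploaded. The following is a minimal sketch, assuming the index content is UTF-8 text; parse_index is a hypothetical helper and the sample paths are made up for illustration.

```python
def parse_index(raw):
    """Split index-file content into (path, label) pairs.

    Hypothetical helper that assumes the format described in the note:
    commas separate samples and a colon separates each path from its label.
    """
    if isinstance(raw, bytes):
        # Object content read from OSS arrives as bytes.
        raw = raw.decode("utf-8")
    samples = []
    for entry in raw.split(","):
        entry = entry.strip()
        if not entry:
            continue  # Skip empty entries, such as after a trailing comma.
        # Split on the last colon so that the label is always the final field.
        path, label = entry.rsplit(":", 1)
        samples.append((path, label))
    return samples


# Sample content is made up for illustration.
pairs = parse_index(b"train/cat_001.jpg:0, train/dog_001.jpg:1,\n")
```

Labels come back as strings in this sketch; a training loop would typically convert them to integers before use.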
Save or load a model
You can use the oss2 Python API to save or load a PyTorch model. For more information on saving and loading models in PyTorch, see PyTorch.
Save a model
    from io import BytesIO

    import oss2
    import torch

    # Specify the bucket name.
    bucket_name = "<your_bucket_name>"
    bucket = oss2.Bucket(auth, endpoint, bucket_name)

    # Serialize the model weights into memory, then upload them as one object.
    buffer = BytesIO()
    torch.save(model.state_dict(), buffer)
    bucket.put_object("<your_model_path>", buffer.getvalue())

Replace the following placeholders with your actual values:
auth: The authentication object.
endpoint: The OSS endpoint.
<your_bucket_name>: The OSS bucket name, without the oss:// prefix.
<your_model_path>: The destination path for the model within the bucket.
Load a model
    from io import BytesIO

    import oss2
    import torch

    bucket_name = "<your_bucket_name>"
    bucket = oss2.Bucket(auth, endpoint, bucket_name)

    # Download the object into memory, then restore the model weights from it.
    buffer = BytesIO(bucket.get_object("<your_model_path>").read())
    model.load_state_dict(torch.load(buffer))

Replace the following placeholders with your actual values:
auth: The authentication object.
endpoint: The OSS endpoint.
<your_bucket_name>: The OSS bucket name, without the oss:// prefix.
<your_model_path>: The path of the model to load from the bucket.
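The bucket name must be passed without the oss:// prefix. If your own configuration stores locations as full oss:// URIs, a small helper along the following lines could separate the bucket name from the object path. This is a sketch only: split_oss_uri is a hypothetical function, not part of oss2, and the bucket and path shown are made-up values.

```python
def split_oss_uri(uri):
    """Split 'oss://bucket/key' into (bucket_name, object_key).

    Hypothetical helper, not part of oss2. A value without the oss://
    prefix is treated as if it were already in 'bucket/key' form.
    """
    prefix = "oss://"
    if uri.startswith(prefix):
        uri = uri[len(prefix):]
    bucket_name, _, object_key = uri.partition("/")
    if not bucket_name:
        raise ValueError("missing bucket name")
    return bucket_name, object_key


# The bucket and path below are made-up values for illustration.
bucket_name, model_path = split_oss_uri("oss://examplebucket/models/net.pt")
```

The two results correspond to the <your_bucket_name> and <your_model_path> placeholders used in the examples above.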