This topic describes how to use Object Storage Service (OSS) SDK for Python, OSS API for Python to read data from and write data to OSS.
Background information
OSS is a secure, cost-effective, and highly reliable cloud storage service provided by Alibaba Cloud. It enables you to store a large amount of data in the cloud. By default, Data Science Workshop (DSW) instances are attached with Network Attached Storage (NAS) file systems. You can also use DSW instances with OSS if you require larger storage.
OSS SDK for Python
In most cases, you can use OSS SDK for Python to read data from and write data to OSS. For more information, see . DSW has preinstalled OSS2 for Python packages. The following code block describes how to read data from or write data to OSS:
Authentication and initialization.
import oss2 auth = oss2.Auth('<your_AccessKey_ID>', '<your_AccessKey_Secret>') bucket = oss2.Bucket(auth, 'http://oss-cn-beijing-internal.aliyuncs.com', '<your_bucket_name>')
Set the following parameters based on your requirements.
Parameter
Description
<your_AccessKey_ID>
The AccessKey ID of your Alibaba Cloud account.
<your_AccessKey_Secret>
The AccessKey secret of your Alibaba Cloud account.
http://oss-cn-beijing-internal.aliyuncs.com
The endpoint of OSS. Select an endpoint based on the region where your DSW instances are deployed.
Pay-as-you-go instances deployed in China (Beijing): oss-cn-beijing.aliyuncs.com
Subscription instances deployed in China (Beijing): oss-cn-beijing-internal.aliyuncs.com
GPU P100 instances and CPU instances deployed in China (Shanghai): oss-cn-shanghai.aliyuncs.com
GPU M40 instances deployed in China (Shanghai): oss-cn-shanghai-internal.aliyuncs.com
<your_bucket_name>
The name of the OSS bucket. It cannot start with oss://.
Read data from and write data to OSS.
#Read a file from OSS. result = bucket.get_object('<your_file_path/your_file>') print(result.read()) #Read data by range. result = bucket.get_object('<your_file_path/your_file>', byte_range=(0, 99)) #Write data to OSS. bucket.put_object('<your_file_path/your_file>', '<your_object_content>') #Append a file. result = bucket.append_object('<your_file_path/your_file>', 0, '<your_object_content>') result = bucket.append_object('<your_file_path/your_file>', result.next_position, '<your_object_content>')
<your_file_path/your_file>
indicates the OSS path from which the data is to be read and to which the data is to be written.<your_object_content>
indicates the content that you want to append to a file. Set the parameters based on your requirements.
OSS API for Python
For PyTorch users, DSW provides OSS API for Python to read data from and write data to OSS.
You can store training data and model files in OSS.
Load training data
You can store training data in an OSS bucket. The path and labels of the data must be stored in an index file in the same OSS bucket. You can customize DataSet and call the
DataLoader
API in PyTorch to read data through multiple threads in parallel. The following code block shows an example:import io import oss2 import PIL import torch class OSSDataset(torch.utils.data.dataset.Dataset): def __init__(self, endpoint, bucket, auth, index_file): self._bucket = oss2.Bucket(auth, endpoint, bucket) self._indices = self._bucket.get_object(index_file).read().split(',') def __len__(self): return len(self._indices) def __getitem__(self, index): img_path, label = self._indices(index).strip().split(':') img_str = self._bucket.get_object(img_path) img_buf = io.BytesIO() img_buf.write(img_str.read()) img_buf.seek(0) img = Image.open(img_buf).convert('RGB') img_buf.close() return img, label dataset = OSSDataset(endpoint, bucket,auth, index_file) data_loader = torch.utils.data.DataLoader( dataset, batch_size=batch_size, num_workers=num_loaders, pin_memory=True)
endpoint
indicates the endpoint of OSS.bucket
indicates the name of the OSS bucket.auth
indicates the objects that are authenticated.index_file
indicates the path of the index file. Set the parameters based on your requirements.NoteIn this topic, samples in the index file are separated with commas (,). The sample path and labels are separated with colons (:).
Save or load models
You can use OSS2 API for Python to save or load PyTorch models. For more information about how to save or load models by using PyTorch, see PyTorch.
Save a model
from io import BytesIO import torch import oss2 # bucket_name bucket_name = "<your_bucket_name>" bucket = oss2.Bucket(auth, endpoint, bucket_name) buffer = BytesIO() torch.save(model.state_dict(), buffer) bucket.put_object("<your_model_path>", buffer.getvalue())
endpoint
indicates the endpoint of OSS.bucket
indicates the name of the OSS bucket. It cannot start with oss://.auth
indicates the objects that are authenticated.<your_model_path>
indicates the path where you want to store the model. Set the parameters based on your requirements.Load a model
from io import BytesIO import torch import oss2 bucket_name = "<your_bucket_name>" bucket = oss2.Bucket(auth, endpoint, bucket_name) buffer = BytesIO(bucket.get_object("<your_model_path>").read()) model.load_state_dict(torch.load(buffer))
endpoint
indicates the endpoint of OSS.bucket
indicates the name of the OSS bucket. It cannot start with oss://.auth
indicates the objects that are authenticated.log_file
indicates the path where the model is stored. Set the parameters based on your requirements.