Platform for AI: Read data from and write data to Object Storage Service (OSS)

Last Updated: Nov 07, 2025

This topic describes how to use the OSS Python SDK and the OSS Python API to read and write data in OSS.

Recommendations

If you frequently access and process large datasets, register OSS as a dataset and mount it. For temporary access, or for access that depends on your business logic, use the SDK and API methods described in this topic.

Use the OSS Python SDK

DSW includes the oss2 Python package. Follow these steps to read and write data in OSS.

  1. Authentication and initialization.

    import oss2
    # Create a credential object from your AccessKey pair.
    auth = oss2.Auth('<your_AccessKey_ID>', '<your_AccessKey_Secret>')
    # Bind the credentials to a bucket at the specified endpoint.
    bucket = oss2.Bucket(auth, '<your_oss_endpoint>', '<your_bucket_name>')

    Replace the following placeholders with your actual values.

    • <your_AccessKey_ID> and <your_AccessKey_Secret>: The AccessKey ID and AccessKey secret for your Alibaba Cloud account. For more information, see Create an AccessKey pair.

    • <your_oss_endpoint>: The OSS endpoint. Choose the endpoint that matches your instance's region:

      • Pay-as-you-go instances in the China (Beijing) region: oss-cn-beijing.aliyuncs.com

      • Subscription instances in the China (Beijing) region: oss-cn-beijing-internal.aliyuncs.com

      • GPU P100 instances or CPU instances in the China (Shanghai) region: oss-cn-shanghai.aliyuncs.com

      • GPU M40 instances in the China (Shanghai) region: oss-cn-shanghai-internal.aliyuncs.com

      For more information, see Regions and endpoints of OSS.

    • <your_bucket_name>: The name of the bucket. Do not include the oss:// prefix.
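
    If you prefer not to hard-code the AccessKey pair, the following is a minimal sketch that reads it from environment variables before initializing the bucket. The variable names OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET are illustrative assumptions, not names required by the SDK.

    import os
    import oss2
    # Read the AccessKey pair from environment variables (hypothetical variable names).
    auth = oss2.Auth(os.environ['OSS_ACCESS_KEY_ID'], os.environ['OSS_ACCESS_KEY_SECRET'])
    bucket = oss2.Bucket(auth, '<your_oss_endpoint>', '<your_bucket_name>')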

  2. Read data from and write data to OSS.

    # Read a complete file.
    result = bucket.get_object('<your_file_path/your_file>')
    print(result.read())
    # Read data by range.
    result = bucket.get_object('<your_file_path/your_file>', byte_range=(0, 99))
    # Write data to OSS.
    bucket.put_object('<your_file_path/your_file>', '<your_object_content>')
    # Append data to a file.
    result = bucket.append_object('<your_file_path/your_file>', 0, '<your_object_content>')
    result = bucket.append_object('<your_file_path/your_file>', result.next_position, '<your_object_content>')

    Replace the following placeholders with your actual values:

    • <your_file_path/your_file>: The path to the file you want to read or write.

    • <your_object_content>: The content you want to write or append.
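
    Beyond single-object reads and writes, the following is a minimal sketch of two other common oss2 calls: listing objects under a prefix with oss2.ObjectIterator and checking whether an object exists with bucket.object_exists. The prefix and object paths are placeholders, as above.

    # List the objects under a prefix.
    for obj in oss2.ObjectIterator(bucket, prefix='<your_file_path/>'):
        print(obj.key)
    # Check whether an object exists before reading it.
    if bucket.object_exists('<your_file_path/your_file>'):
        result = bucket.get_object('<your_file_path/your_file>')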

Use the OSS Python API

For PyTorch users, DSW provides the OSS Python API to read and write data directly in OSS.

You can store training data or models in OSS:

  • Load training data

    You can store your data in an OSS Bucket and save the data paths and corresponding labels in an index file within the same Bucket. By creating a custom Dataset, you can use the DataLoader API in PyTorch to read data in parallel across multiple processes. The following code provides an example.

    import io
    import oss2
    from PIL import Image
    import torch
    class OSSDataset(torch.utils.data.dataset.Dataset):
        def __init__(self, endpoint, bucket, auth, index_file):
            self._bucket = oss2.Bucket(auth, endpoint, bucket)
            # The index file is a comma-separated list of "path:label" entries.
            self._indices = self._bucket.get_object(index_file).read().decode('utf-8').split(',')
        def __len__(self):
            return len(self._indices)
        def __getitem__(self, index):
            img_path, label = self._indices[index].strip().split(':')
            # Download the image object and decode it in memory.
            img_str = self._bucket.get_object(img_path)
            img_buf = io.BytesIO()
            img_buf.write(img_str.read())
            img_buf.seek(0)
            img = Image.open(img_buf).convert('RGB')
            img_buf.close()
            return img, label
    dataset = OSSDataset(endpoint, bucket, auth, index_file)
    data_loader = torch.utils.data.DataLoader(
        dataset,
        batch_size=batch_size,
        num_workers=num_loaders,
        pin_memory=True)

    Replace the following placeholders with your actual values:

    • endpoint: The OSS endpoint.

    • bucket: The Bucket name.

    • auth: The Authentication object.

    • index_file: The path to the index file.

    Note

    In this example, the index file uses the following format: commas (,) separate samples, and colons (:) separate each sample path from its label, as shown in the sketch below.
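
    The following is a minimal sketch that builds such an index file and uploads it to the bucket, assuming bucket is an oss2.Bucket object initialized as in the previous section. The object paths images/img_0.jpg and images/img_1.jpg, the labels, and the index file name train_index.txt are hypothetical examples, not values required by the product.

    # Build a comma-separated "path:label" index and upload it as an object.
    index_content = 'images/img_0.jpg:0,images/img_1.jpg:1'
    bucket.put_object('train_index.txt', index_content)
    # OSSDataset can then be constructed with index_file='train_index.txt'.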

  • Save or load a model

    You can use the oss2 Python API to save or load a PyTorch model. For more information about saving and loading models in PyTorch, see PyTorch. A self-contained round-trip sketch follows the two examples below.

    • Save a model

      from io import BytesIO
      import torch
      import oss2
      # Specify the Bucket name.
      bucket_name = "<your_bucket_name>"
      bucket = oss2.Bucket(auth, endpoint, bucket_name)
      # Serialize the model state dict to an in-memory buffer, then upload it to OSS.
      buffer = BytesIO()
      torch.save(model.state_dict(), buffer)
      bucket.put_object("<your_model_path>", buffer.getvalue())

      Replace the following placeholders with your actual values:

      • auth: The Authentication object.

      • endpoint: The OSS endpoint.

      • <your_bucket_name>: The OSS Bucket name, without the oss:// prefix.

      • <your_model_path>: The destination path for the model within the bucket.

    • Load a model

      from io import BytesIO
      import torch
      import oss2
      bucket_name = "<your_bucket_name>"
      bucket = oss2.Bucket(auth, endpoint, bucket_name)
      # Download the model object and deserialize it into the existing model instance.
      buffer = BytesIO(bucket.get_object("<your_model_path>").read())
      model.load_state_dict(torch.load(buffer))

      Replace the following placeholders with your actual values:

      • auth: The Authentication object.

      • endpoint: The OSS endpoint.

      • <your_bucket_name>: The OSS Bucket name, without the oss:// prefix.

      • <your_model_path>: The path of the model to load from the bucket.
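
    The following is a self-contained round-trip sketch that combines the save and load examples above, assuming a small torch.nn.Linear model and the hypothetical object path models/linear.pt. Replace the credentials, endpoint, and bucket name with your own values.

    from io import BytesIO
    import oss2
    import torch
    auth = oss2.Auth('<your_AccessKey_ID>', '<your_AccessKey_Secret>')
    bucket = oss2.Bucket(auth, '<your_oss_endpoint>', '<your_bucket_name>')
    # Save: serialize the state dict to an in-memory buffer and upload it.
    model = torch.nn.Linear(4, 2)
    buffer = BytesIO()
    torch.save(model.state_dict(), buffer)
    bucket.put_object('models/linear.pt', buffer.getvalue())
    # Load: download the object and restore the weights into a new model instance.
    restored = torch.nn.Linear(4, 2)
    restored.load_state_dict(torch.load(BytesIO(bucket.get_object('models/linear.pt').read())))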