Object Storage Service: Achieve 100 Gbps internal bandwidth with OSS

Last Updated: Mar 27, 2026

In select regions, OSS supports an internal network bandwidth of up to 100 Gbps for a single account. This topic describes how to maximize the 100 Gbps bandwidth of OSS. It covers key considerations, a practical test using Go, and techniques to improve download performance with common tools.

Scenarios

Business scenarios involving large amounts of data, high concurrent access, or extremely high real-time requirements typically require up to 100 Gbit/s of internal network bandwidth. Examples:

  • Big data analysis and computing: A single task may read terabytes or even petabytes of data and has strict real-time requirements. Data must be read fast enough that storage does not become the performance bottleneck of the computation.

  • AI training dataset loading: The training dataset may contain petabytes of data, which requires high throughput. When multiple training nodes load data at the same time, the total bandwidth may reach 100 Gbit/s.

  • Data backup and restoration: Large backup and restoration operations, such as full data backups, can involve petabytes of data. If these operations must complete within a short period of time, the required bandwidth may reach 100 Gbit/s.

  • High-performance scientific computing: A scientific computing task may process petabytes of data or even larger datasets. Multiple research teams may access the same data at the same time, resulting in high concurrent access and high bandwidth requirements. To support real-time analysis and team collaboration, data transmission must be accelerated.

Key points

To use up to 100 Gbit/s of bandwidth, first select an Elastic Compute Service (ECS) instance type whose network bandwidth limit is high enough. Second, if you store downloaded data on disks, select high-performance disks so that disk I/O does not limit the download. In addition, we recommend that you access data over a Virtual Private Cloud (VPC) to improve access speed. Finally, use concurrent downloads with appropriate optimization techniques to make full use of the download bandwidth.

Network receiving capability

If the OSS client runs on an ECS instance, the download speed is limited by the network bandwidth of that instance. The ECS instance type with the highest network capability provides 160 Gbit/s of bandwidth. If you deploy multiple instances in a cluster, the total bandwidth can reach 100 Gbit/s when the clients download concurrently. If you deploy only a single instance, we recommend that you use network-enhanced instance families or instance families with high clock speeds. Instance families with high clock speeds perform better when they receive a large number of data packets.

Note

Depending on the instance specifications, a single elastic network interface (ENI) supports up to 100 Gbit/s of bandwidth. To use more than 100 Gbit/s of bandwidth on a single instance, you must bind multiple ENIs to the instance. For more information, see Elastic Network Interface.

Disk I/O

If you write downloaded data to disks, the download speed is limited by the disk performance. By default, tools such as ossutil and ossfs write data to disks. In this case, you can use high-performance disks or memory disks to improve the download speed, as shown in the following figure.

Note

Although an Enterprise SSD (ESSD) can achieve a throughput of 32 Gbit/s, a memory disk performs significantly better. To reach the maximum download bandwidth, do not store data on disks. For more information about how to use ossfs to prevent data from being stored on disks and improve data read performance, see Performance optimization in read-only scenarios.

image
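The bottleneck reasoning above can be made concrete with a few lines of arithmetic. The following is a minimal sketch, not a measurement: it assumes that the sustained throughput is capped by the slower of the network and the disk, that 1 GB is 10^9 bytes, and that the function names are illustrative only.

```python
def effective_gbit_s(network_gbit_s, disk_gbit_s=None):
    """Return the throughput ceiling: the slower of the network and the disk.

    Pass disk_gbit_s=None when data stays in memory and is never written to disk.
    """
    if disk_gbit_s is None:
        return network_gbit_s
    return min(network_gbit_s, disk_gbit_s)


def transfer_seconds(size_gb, gbit_s):
    """Best-case time to move size_gb gigabytes (assumed 10**9 bytes each) at gbit_s."""
    return size_gb * 8 / gbit_s


# Figures taken from this topic: a 100 Gbit/s network and a 32 Gbit/s ESSD.
to_disk = effective_gbit_s(100, 32)
in_memory = effective_gbit_s(100)
print(f"to ESSD:   {to_disk} Gbit/s, {transfer_seconds(100, to_disk):.1f} s for 100 GB")
print(f"in memory: {in_memory} Gbit/s, {transfer_seconds(100, in_memory):.1f} s for 100 GB")
```

Under these assumptions, writing to a single ESSD caps the download at roughly one third of the network bandwidth, which is why the topic recommends keeping data in memory.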

Use a VPC

The Alibaba Cloud internal network is optimized for network data requests. When you use a VPC endpoint, the network is more stable than the Internet. To reach the maximum download bandwidth, you must use a VPC.

Your ECS instances run in your VPC. OSS provides a unified internal domain name that can be accessed by any customer over a VPC. Example: oss-cn-beijing-internal.aliyuncs.com. The data flow between ECS and OSS passes through Server Load Balancer (SLB), and requests are sent to the backend distributed cluster. This evenly distributes data requests across the cluster and grants OSS powerful high-concurrency processing capabilities.

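As a small illustration of the internal domain name, the sketch below builds the endpoint from a region ID and checks whether a configured endpoint is an internal one. The naming pattern is inferred from the two examples that appear in this topic (oss-cn-beijing-internal.aliyuncs.com and oss-cn-hangzhou-internal.aliyuncs.com), and both helper names are hypothetical.

```python
def internal_endpoint(region):
    """Build the internal OSS endpoint host for a region ID such as "cn-beijing".

    Assumption: the host follows the pattern of the examples in this topic.
    """
    if not region:
        raise ValueError("region is required")
    return f"oss-{region}-internal.aliyuncs.com"


def is_internal_endpoint(endpoint):
    """Return True if the endpoint host looks like an internal OSS endpoint."""
    host = endpoint.split("://", 1)[-1].split("/", 1)[0]
    return host.startswith("oss-") and host.endswith("-internal.aliyuncs.com")


print(internal_endpoint("cn-beijing"))
print(is_internal_endpoint("http://oss-cn-hangzhou-internal.aliyuncs.com"))
print(is_internal_endpoint("https://oss-cn-hangzhou.aliyuncs.com"))
```

A check like this can run at startup so that a bandwidth test does not accidentally go through a public endpoint.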

Concurrent download

OSS transmits data over HTTP. Because the throughput of a single HTTP request is limited, concurrent download is used to accelerate data download. For example, you can split an object into multiple ranges and send one request per range so that the requests together reach the maximum download bandwidth. You are charged based on the number of API operations, and a greater number of ranges means a greater number of API operations. In addition, if the ranges are too small, an individual stream may not reach its peak download speed. For more information about how to use common tools for concurrent download, see Optimize concurrent downloads by using common tools.

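The range splitting described above can be sketched as follows. This is a minimal illustration, not part of any SDK: the function name is hypothetical, and the ranges are inclusive on both ends, as in an HTTP Range header.

```python
def split_ranges(object_size, part_size):
    """Split an object of object_size bytes into inclusive (start, end) byte ranges.

    Every range covers at most part_size bytes. The last range may be shorter.
    """
    if object_size < 0 or part_size <= 0:
        raise ValueError("object_size must be >= 0 and part_size must be > 0")
    return [
        (start, min(start + part_size, object_size) - 1)
        for start in range(0, object_size, part_size)
    ]


# A 10-byte object split into 4-byte parts yields three requests.
ranges = split_ranges(10, 4)
for start, end in ranges:
    print(f"Range: bytes={start}-{end}")
print("API operations:", len(ranges))
```

The number of ranges equals the number of GET requests, so a smaller part size increases both the parallelism and the number of billed API operations.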

Use cases

To test how close a client can get to the maximum bandwidth, we wrote a test program in Go that downloads a 100 GB binary object from OSS. To avoid storing data on disks, the program reads the data once and then discards it. The object is split into multiple ranges, and the range size and the concurrency are exposed as adjustable parameters. You can adjust these parameters to reach the maximum download bandwidth.

Test environment

| Item | Value |
| --- | --- |
| Instance type | ecs.hfg8i.32xlarge |
| vCPU | 128 |
| Memory (GiB) | 512 |
| Network baseline/burst bandwidth (Gbit/s) | 100/none |
| Packet forwarding rate (PPS) | 30 million |
| Number of connections | 4 million |
| NIC queues | 64 |
| ENI | 15 |
| Private IPv4/IPv6 addresses per ENI | 50/50 |
| Number of disks that can be attached | 64 |
| Disk baseline/burst IOPS | 900,000/none |
| Disk baseline/burst bandwidth (Gbit/s) | 64/none |

Test procedure

  1. Configure environment variables.

    export OSS_ACCESS_KEY_ID=<ALIBABA_CLOUD_ACCESS_KEY_ID>
    export OSS_ACCESS_KEY_SECRET=<ALIBABA_CLOUD_ACCESS_KEY_SECRET>
  2. Sample code:

    package main
    
    import (
        "context"
        "flag"
        "fmt"
        "io"
        "log"
        "time"
        "github.com/aliyun/alibabacloud-oss-go-sdk-v2/oss"
        "github.com/aliyun/alibabacloud-oss-go-sdk-v2/oss/credentials"
    )
    
    // Define global variables to store command-line arguments.
    var (
        region      string // The region where the bucket is located.
        endpoint    string // The endpoint for accessing OSS.
        bucketName  string // The bucket name.
        objectName  string // The object name.
        chunkSize   int64  // The chunk size in bytes.
        prefetchNum int    // The number of chunks to prefetch.
    )
    
    // Register the command-line flags before main runs.
    func init() {
        flag.StringVar(&region, "region", "", "The region where the bucket is located.")
        flag.StringVar(&endpoint, "endpoint", "", "The endpoint for accessing OSS.")
        flag.StringVar(&bucketName, "bucket", "", "The bucket name.")
        flag.StringVar(&objectName, "object", "", "The object name.")
        flag.Int64Var(&chunkSize, "chunk-size", 0, "The chunk size in bytes.")
        flag.IntVar(&prefetchNum, "prefetch-num", 0, "The number of chunks to prefetch.")
    }
    
    func main() {
        // Parse command-line arguments.
        flag.Parse()
    
        // Check that the required parameters are provided.
        if len(bucketName) == 0 {
            flag.PrintDefaults()
            log.Fatalf("invalid parameters, bucket name required")
        }
    
        if len(objectName) == 0 {
            flag.PrintDefaults()
            log.Fatalf("invalid parameters, object name required")
        }
    
        if len(region) == 0 {
            flag.PrintDefaults()
            log.Fatalf("invalid parameters, region required")
        }
    
        // Configure the OSS client.
        cfg := oss.LoadDefaultConfig().
            WithCredentialsProvider(credentials.NewEnvironmentVariableCredentialsProvider()). // Use credentials from environment variables.
            WithRegion(region) // Set the region.
    
        // If a custom endpoint is provided, set it.
        if len(endpoint) > 0 {
            cfg.WithEndpoint(endpoint)
        }
    
        // Create the OSS client.
        client := oss.NewClient(cfg)
    
        // Open the OSS object.
        f, err := client.OpenFile(context.TODO(), bucketName, objectName, func(oo *oss.OpenOptions) {
            oo.EnablePrefetch = true      // Enable prefetching.
            oo.ChunkSize = chunkSize      // Set the chunk size.
            oo.PrefetchNum = prefetchNum  // Set the number of chunks to prefetch.
            oo.PrefetchThreshold = int64(0) // Set the prefetch threshold.
        })
    
        if err != nil {
            log.Fatalf("open fail, err:%v", err)
        }
        // Release the prefetch resources when the test finishes.
        defer f.Close()
    
        // Record the start time.
        start := time.Now()
    
        // Read the object content and discard it so that disk I/O does not affect the measurement.
        written, err := io.Copy(io.Discard, f)
    
        // Measure the elapsed time.
        elapsed := time.Since(start)
    
        if err != nil {
            log.Fatalf("copy fail, err:%v", err)
        }
    
        // Calculate the read speed in MiB/s. Convert to float64 first to avoid integer truncation.
        speed := float64(written) / 1024 / 1024 / elapsed.Seconds()
    
        // Print the average read speed.
        fmt.Printf("average speed:%.2f(MiB/s)\n", speed)
    }
  3. Start the test program.

    go run down_object.go -bucket yourbucket -endpoint oss-cn-hangzhou-internal.aliyuncs.com  -object 100GB.file -region cn-hangzhou -chunk-size 419430400 -prefetch-num 256
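The test program reports its result in MiB/s, whereas the bandwidth figures in this topic are given in Gbit/s. The following helper is a minimal sketch for converting between the two. It assumes that 1 MiB is 2^20 bytes and that 1 Gbit is 10^9 bits. The function names are illustrative.

```python
def mib_per_s_to_gbit_per_s(mib_per_s):
    """Convert MiB/s (2**20 bytes per second) to Gbit/s (10**9 bits per second)."""
    return mib_per_s * 2**20 * 8 / 1e9


def gbit_per_s_to_mib_per_s(gbit_per_s):
    """Convert Gbit/s (10**9 bits per second) to MiB/s (2**20 bytes per second)."""
    return gbit_per_s * 1e9 / 8 / 2**20


# The average speed that the test program would have to print
# for a sustained rate of 100 Gbit/s.
print(f"{gbit_per_s_to_mib_per_s(100):.2f} MiB/s")
```

Keep in mind that the program prints the average speed over the whole download, which is lower than the peak bandwidth observed during the transfer.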

Test conclusions

In the test, we varied the concurrency and the chunk size and observed the changes in the download duration and the peak download bandwidth. In general, we recommend that you set the concurrency to 1 to 4 times the number of cores and set the chunk size to FileSize/Concurrency. The chunk size cannot be less than 2 MB. With these settings, the peak download bandwidth reaches 100 Gbit/s, and the shortest download duration in the following results is achieved at a concurrency of 256.

| No. | Concurrency | Chunk size (MB) | Peak bandwidth (Gbit/s) | E2E (s) |
| --- | --- | --- | --- | --- |
| 1 | 128 | 800 | 100 | 16.321 |
| 2 | 256 | 400 | 100 | 14.881 |
| 3 | 512 | 200 | 100 | 15.349 |
| 4 | 1024 | 100 | 100 | 19.129 |
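The sizing guideline above (concurrency of 1 to 4 times the number of cores, chunk size of FileSize/Concurrency, and a lower bound of 2 MB) can be sketched as follows. The sketch assumes that the 100 GB object is 100 × 2^30 bytes and that MB means 2^20 bytes, which is consistent with the -chunk-size 419430400 value used with -prefetch-num 256 in the test command. The function names are illustrative.

```python
MIN_CHUNK_SIZE = 2 * 1024 * 1024  # The 2 MB lower bound stated in this topic.


def concurrency_range(cores):
    """Return the recommended (min, max) concurrency: 1 to 4 times the core count."""
    return cores, 4 * cores


def chunk_size_for(file_size, concurrency):
    """Return FileSize/Concurrency in bytes, rounded up, but never below 2 MB."""
    if concurrency <= 0:
        raise ValueError("concurrency must be positive")
    return max(-(-file_size // concurrency), MIN_CHUNK_SIZE)


file_size = 100 * 1024**3  # Assumed size of the 100 GB test object in bytes.
print("recommended concurrency for 128 vCPUs:", concurrency_range(128))
for concurrency in (128, 256, 512, 1024):
    size = chunk_size_for(file_size, concurrency)
    print(f"concurrency={concurrency:5d} chunk size={size} bytes ({size // 1024**2} MB)")
```

The chunk sizes computed this way match the values in the results table. A concurrency of 1024 is 8 times the number of vCPUs of the test instance, which is outside the recommended range.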

Optimize concurrent downloads by using common tools

The following section describes the methods for optimizing the download performance of OSS objects by using common tools.

ossutil

  • Parameter description

    Parameter

    Description

    --bigfile-threshold

    The object size threshold for using resumable download. Default value: 104857600 (100 MB). Valid values: 0 to 9223372036854775807. Unit: bytes.

    --range

    The byte range of the object to download. Bytes are numbered starting from 0.

    • You can specify a range. For example, 3-9 indicates a range from byte 3 to byte 9, which includes byte 3 and byte 9.

    • You can specify the range from which the download starts. For example, 3- indicates a range from byte 3 to the end of the object, which includes byte 3.

    • You can specify the range from which the download ends. For example, -9 indicates a range from byte 0 to byte 9, which includes byte 9.

    --parallel

    The number of concurrent operations to perform on a single object. Valid values: 1 to 10000. By default, ossutil automatically sets a value for this option based on the operation type and object size.

    --part-size

    The part size. Unit: bytes. Valid values: 2097152 to 16777216 (2 to 16 MB). In most cases, if the number of CPU cores is large, you can set a small part size. If the number of CPU cores is small, you can increase the part size appropriately.

  • Example

    The following command downloads the 100GB.file object to /dev/shm with a concurrency of 256 and a part size of 468,435,456 bytes.

    time ossutil --parallel=256 --part-size=468435456 --endpoint=oss-cn-hangzhou-internal.aliyuncs.com cp oss://cache-test/EcsTest/100GB.file /dev/shm/100GB.file
  • Performance description

    The average end-to-end speed is 2.94 GB/s (about 24 Gbit/s).

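To make the three forms of the --range value easier to compare, the following sketch resolves a range string to inclusive byte offsets exactly as the parameter description above words it. It is an illustration of that description, not the ossutil implementation. The function name is hypothetical, and clamping the end offset to the object size is an assumption.

```python
def parse_range(spec, object_size):
    """Resolve "3-9", "3-", or "-9" to inclusive (start, end) offsets.

    Follows the --range description in this topic:
      "3-9" -> byte 3 to byte 9
      "3-"  -> byte 3 to the end of the object
      "-9"  -> byte 0 to byte 9
    """
    first, sep, last = spec.partition("-")
    if not sep or (not first and not last):
        raise ValueError(f"invalid range: {spec!r}")
    start = int(first) if first else 0
    end = int(last) if last else object_size - 1
    end = min(end, object_size - 1)  # Assumption: clamp to the last byte of the object.
    if start > end:
        raise ValueError(f"empty range: {spec!r}")
    return start, end


for spec in ("3-9", "3-", "-9"):
    print(spec, "->", parse_range(spec, object_size=100))
```

Note that this mirrors the wording of the table. In an HTTP Range header, by contrast, bytes=-9 requests the last 9 bytes of the object.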

ossfs

  • Parameter description

    Parameter

    Description

    parallel_count

    The number of parts that can be transferred concurrently when a large object is uploaded or downloaded in parts. Default value: 5.

    multipart_size

    The part size in MB when multipart upload is used to upload data. Default value: 10. This parameter limits the maximum size of the object that you can upload. When multipart upload is used, the maximum number of parts that an object can be split into is 10,000. By default, the maximum size of the object that can be uploaded is 100 GB. You can change the value of this option to upload a larger object.

    direct_read

    Enables the direct read mode. By default, ossfs uses the disk storage capacity to store temporary data uploaded or downloaded. You can specify this option to directly read data from OSS instead of the local disk. This option is disabled by default. You can use -odirect_read to enable the direct read mode.

    Note

    Direct reads are interrupted when a write, rename, or truncate operation is performed on the object being read. In this case, the object exits the direct read mode and must be reopened.

    direct_read_prefetch_chunks

    The number of chunks that can be prefetched to the memory. This option can be used to optimize sequential read performance. Default value: 32.

    This option takes effect only if the -odirect_read option is specified.

    direct_read_chunk_size

    The amount of data that can be directly read from OSS in a single read request. Unit: MB. Default value: 4. Valid values: 1 to 32.

    This option takes effect only if the -odirect_read option is specified.

    ensure_diskfree

    The amount of disk space to reserve. Reserving space prevents ossfs from filling the disk and blocking writes from other applications. By default, no disk space is reserved. Unit: MB.

    For example, to have ossfs reserve 1,024 MB of available disk space, add -oensure_diskfree=1024 when mounting.

    free_space_ratio

    The minimum remaining disk space ratio that you want to reserve. For example, if the disk space is 50 GB and you set -o free_space_ratio to 20, 10 GB (50 GB x 20% = 10 GB) is reserved.

    max_stat_cache_size

    The maximum number of files whose metadata can be stored in metadata caches. By default, the metadata of up to 100,000 objects can be cached. If a directory contains a large number of objects, you can modify this option to improve the object listing performance of the ls command. To disable metadata caching, you can set this option to 0.

    stat_cache_expire

    The validity period of the object metadata cache. Unit: seconds. Default value: 900.

    readdir_optimize

    Specifies whether to use cache optimization. Default value: false.

    After adding this mount option, ossfs does not send a HeadObject request to get file item metadata, such as gid and uid, during ls operations. A HeadObject request is sent only when a file with a size of 0 is accessed. However, a certain number of HeadObject requests may still be generated due to permission checks and other reasons. Enable this option based on your application's characteristics. To enable it, add -oreaddir_optimize when mounting.

  • Examples

    Note

    We recommend that you modify the parameters based on the CPU processing capability and network bandwidth.

    • Default read mode: Mount a bucket named cache-test to the /mnt/cache-test folder on your computer, set the number of parts that can be concurrently downloaded to 128, and set each part to 32 MB in size.

      ossfs cache-test /mnt/cache-test -ourl=http://oss-cn-hangzhou-internal.aliyuncs.com  -oparallel_count=128 -omultipart_size=32 
    • Direct read mode: Mount a bucket named cache-test to the /mnt/cache-test folder on your computer, enable direct read mode, set the number of prefetched chunks to 128, and set each chunk to 32 MB in size.

      ossfs cache-test /mnt/cache-test -ourl=http://oss-cn-hangzhou-internal.aliyuncs.com  -odirect_read -odirect_read_prefetch_chunks=128 -odirect_read_chunk_size=32
  • Performance description

    In default read mode, ossfs downloads data to the local disk. In direct read mode, data is stored in the memory, which accelerates access, but consumes more memory.

    Note

    In direct read mode, ossfs manages downloaded data in chunks. The size of each chunk is 4 MB by default and can be changed by using the direct_read_chunk_size parameter. In the memory, ossfs retains the data within the following range: [The current chunk - 1, The current chunk + direct_read_prefetch_chunks]. Determine whether to use the direct read mode based on the memory size, especially the page cache. In most cases, the direct read mode is suitable for scenarios in which the page cache capacity is insufficient. For example, if the total memory of your computer is 16 GB and the page cache can consume 6 GB, you can use the direct read mode when the object size exceeds 6 GB. For more information, see Direct read mode.

    | Mode | Concurrency | Chunk size (MB) | Peak bandwidth (Gbit/s) | E2E bandwidth (Gbit/s) | E2E duration (s) |
    | --- | --- | --- | --- | --- | --- |
    | Default read mode | 128 | 32 | 24 | 11.3 | 72.01 |
    | Direct read mode | 128 | 32 | 24 | 16.1 | 50.9 |

    Optimization for model object reading:

    | Model object size (GB) | Default read mode (Duration: s; maximum memory: 6 GB) | Hybrid read mode (Duration: s) | Hybrid read mode (Duration: s; data retention: [-32, +32]) |
    | --- | --- | --- | --- |
    | 1 | 8.19 | 8.20 | 8.56 |
    | 2.4 | 24.5 | 20.43 | 20.02 |
    | 5 | 26.5 | 22.3 | 19.89 |
    | 5.5 | 22.8 | 23.1 | 22.98 |
    | 8.5 | 106.0 | 36.6 | 36.00 |
    | 12.6 | 154.6 | 42.1 | 41.9 |
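Two of the ossfs sizing rules described above lend themselves to quick arithmetic: the maximum object size implied by multipart_size (at most 10,000 parts per object) and the memory that direct read mode may hold per open file (the chunks in [the current chunk - 1, the current chunk + direct_read_prefetch_chunks]). The following is a rough sketch with illustrative function names. It treats the retention range as inclusive, so the memory figure is an upper-bound estimate rather than a measured value.

```python
MAX_PARTS = 10_000  # Maximum number of parts per object, as stated above.


def max_object_size_mb(multipart_size_mb=10):
    """Largest object that can be uploaded with the given multipart_size (MB)."""
    return multipart_size_mb * MAX_PARTS


def min_multipart_size_mb(object_size_mb):
    """Smallest multipart_size (MB) that fits the object into 10,000 parts."""
    return max(1, -(-object_size_mb // MAX_PARTS))


def direct_read_memory_mb(chunk_size_mb=4, prefetch_chunks=32):
    """Estimated upper bound of memory held per open file in direct read mode.

    Counts the chunks in [current - 1, current + prefetch_chunks], inclusive.
    """
    return (prefetch_chunks + 2) * chunk_size_mb


print("max object size with defaults (MB):", max_object_size_mb())
print("multipart_size for a 500,000 MB object:", min_multipart_size_mb(500_000))
print("direct read memory with defaults (MB):", direct_read_memory_mb())
print("direct read memory, 32 MB x 128 chunks (MB):", direct_read_memory_mb(32, 128))
```

With the mount options from the direct read example (128 prefetched chunks of 32 MB), each sequentially read file may hold about 4 GB of memory under this estimate, which is worth checking against the memory of the instance before you raise either option.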

Python SDK

By default, the Python SDK accesses storage serially. You can significantly improve bandwidth by using Python's concurrency libraries for multithreaded downloads.

  • Prerequisites

    The model object is about 5.6 GB in size. The test machine is an ECS instance with 48 vCPUs, 16 Gbit/s of bandwidth, and 180 GB of memory.

  • Example

    import oss2
    import time
    import os
    import threading
    from io import BytesIO
    
    # Configure parameters.
    OSS_CONFIG = {
        "bucket_endpoint": os.environ.get('OSS_BUCKET_ENDPOINT', 'oss-cn-hangzhou-internal.aliyuncs.com'), # The default endpoint of the region in which the bucket is located.
        "bucket_name": os.environ.get('OSS_BUCKET_NAME', 'bucket_name'), # The name of the bucket.
        "access_key_id": os.environ['ACCESS_KEY_ID'], # The AccessKey ID of the RAM user.
        "access_key_secret": os.environ['ACCESS_KEY_SECRET'] # The AccessKey secret of the RAM user.
    }
    
    # Initialize the bucket.
    def __bucket__():
        auth = oss2.Auth(OSS_CONFIG["access_key_id"], OSS_CONFIG["access_key_secret"])
        return oss2.Bucket(
            auth, 
            OSS_CONFIG["bucket_endpoint"], 
            OSS_CONFIG["bucket_name"], 
            enable_crc=False
        )
    
    # Query the size of the model objects.
    def __get_object_size(object_name):
        simplifiedmeta = __bucket__().get_object_meta(object_name)
        return int(simplifiedmeta.headers['Content-Length'])
    
    # Query the last modified time of the model objects.
    def get_remote_model_mmtime(model_name):
        return __bucket__().head_object(model_name).last_modified
    
    # List the model objects.
    def list_remote_models(ext_filter=('.ckpt',)): # Specify the default extension filter.
        dir_prefix = ""
        output = []
        
        for obj in oss2.ObjectIteratorV2(
            __bucket__(),
            prefix=dir_prefix,
            delimiter='/',
            start_after=dir_prefix,
            fetch_owner=False
        ):
            if not obj.is_prefix():
                _, ext = os.path.splitext(obj.key)
                if ext.lower() in ext_filter:
                    output.append(obj.key)
        return output
    
    # Thread function: download one byte range and write it into the shared buffer.
    def __range_get(object_name, buffer, offset, start, end, read_chunk_size, progress_callback, total_bytes, lock, progress):
        chunk_size = int(read_chunk_size)
        with __bucket__().get_object(object_name, byte_range=(start, end)) as object_stream:
            s = start
            while True:
                chunk = object_stream.read(chunk_size)
                if not chunk:
                    break
                # All threads share the buffer, so seek() and write() must run as one atomic step.
                with lock:
                    buffer.seek(s - offset)
                    buffer.write(chunk)
                    progress[0] += len(chunk)
                    downloaded = progress[0]
                s += len(chunk)
                # Report the total number of bytes downloaded by all threads so far.
                if progress_callback:
                    progress_callback(downloaded, total_bytes)
    
    # Read a model object into memory. The progress callback is optional.
    def read_remote_model(
        checkpoint_file, 
        start=0, 
        size=-1, 
        read_chunk_size=2*1024*1024,  # 2 MB
        part_size=256*1024*1024,      # 256 MB
        progress_callback=None # The download progress callback.
    ):
        time_start = time.time()
        buffer = BytesIO()
        obj_size = __get_object_size(checkpoint_file)
        
        end = (obj_size if size == -1 else start + size) - 1
        s = start
        tasks = []
    
        # State shared by all download threads.
        total_bytes = end - start + 1
        lock = threading.Lock()
        progress = [0]  # Total number of downloaded bytes, guarded by lock.
    
        # Start one thread per part. Each thread downloads the range [s, current_end].
        while s <= end:
            current_end = min(s + part_size - 1, end)
            task = threading.Thread(
                target=__range_get,
                args=(checkpoint_file, buffer, start, s, current_end, read_chunk_size, progress_callback, total_bytes, lock, progress)
            )
            tasks.append(task)
            task.start()
            s += part_size
    
        # Wait for all parts to finish.
        for task in tasks:
            task.join()
    
        time_end = time.time()
        # Display the total download duration.
        print(f"Downloaded {checkpoint_file} in {time_end - time_start:.2f} seconds.")
    
        # Calculate and display the size of the downloaded data. Unit: GB.
        file_size_gb = total_bytes / (1024 * 1024 * 1024)
        print(f"Total downloaded file size: {file_size_gb:.2f} GB")
    
        buffer.seek(0)
        return buffer
    
    # Specify the download progress callback function.
    def show_progress(downloaded, total):
        progress = (downloaded / total) * 100
        print(f"Progress: {progress:.2f}%", end="\r")
    
    # Entry point: list the model objects and download the first one.
    if __name__ == "__main__":
        # Use the list_remote_models method to list the model objects.
        models = list_remote_models()
        print("Remote models:", models)
    
        if models:
            # Download the first model object.
            first_model = models[0]
            buffer = read_remote_model(first_model, progress_callback=show_progress)
            print(f"\nDownloaded {first_model} to buffer.")
  • Conclusion

    | Version | OSS duration (s) | OSS average bandwidth (MB/s) | OSS peak bandwidth (MB/s) |
    | --- | --- | --- | --- |
    | OSS SDK for Python (serial mode) | 109 | 53 | 100 |
    | OSS SDK for Python (concurrent download mode) | 11.1 | 516 | 600 |

    The test results clearly show that compared to the serial mode of the OSS Python SDK, the concurrent download mode takes only about 10.2% of the time, the average bandwidth is about 9.7 times higher, and the peak bandwidth is 6 times higher. Therefore, using concurrent downloads with the Python SDK for AI model training significantly improves efficiency and bandwidth.

  • Other solutions

    In addition to OSS SDK for Python, Alibaba Cloud provides a Python library named osstorchconnector, which is mainly used to efficiently access and store OSS data in PyTorch training tasks. The library wraps concurrent downloads so that you do not need to implement them yourself. The following table describes the test results on AI model loading by using osstorchconnector. For more information, see Performance testing.

    | Item | Description |
    | --- | --- |
    | Test scenario | Model loading and chat Q&A |
    | Model name | gpt3-finnish-3B |
    | Model size | 11 GB |
    | Scenario | Chat Q&A |
    | Hardware configurations | High-specification ECS instance: 96 vCPUs, 384 GiB of memory, and 30 Gbit/s of internal bandwidth |
    | Conclusion | The average bandwidth is approximately 10 Gbit/s. OSS can support 10 tasks to load models at the same time. |
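The Python SDK example earlier in this topic starts one thread per part, so the number of simultaneous requests grows with the object size. If you want to cap the concurrency, a thread pool is one option. The following is a minimal sketch under stated assumptions: fetch_range is a hypothetical callable that returns the bytes of an inclusive byte range (for example, a thin wrapper around get_object with byte_range), and the in-memory source at the end only stands in for OSS so that the sketch runs without credentials.

```python
from concurrent.futures import ThreadPoolExecutor


def download_concurrently(fetch_range, object_size, part_size, max_workers=16):
    """Download an object in parts with at most max_workers requests in flight.

    fetch_range(start, end) is a hypothetical callable that returns the bytes
    of the inclusive range [start, end].
    """
    data = bytearray(object_size)  # Preallocate the whole result in memory.
    ranges = [
        (start, min(start + part_size, object_size) - 1)
        for start in range(0, object_size, part_size)
    ]

    def worker(byte_range):
        start, end = byte_range
        chunk = fetch_range(start, end)
        if len(chunk) != end - start + 1:
            raise IOError(f"short read for range {start}-{end}")
        # Each worker fills its own disjoint slice, so no lock is needed.
        data[start:end + 1] = chunk

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() waits for all parts and re-raises the first worker exception.
        list(pool.map(worker, ranges))
    return data


# Stand-in for OSS: serve byte ranges from an in-memory object.
source = bytes(range(256)) * 40
result = download_concurrently(
    lambda start, end: source[start:end + 1],
    object_size=len(source),
    part_size=1000,
    max_workers=4,
)
print("download matches source:", result == source)
```

Writing into disjoint slices of a preallocated bytearray avoids the shared file position of a BytesIO object, and max_workers can be tuned in the same way as the concurrency parameter in the Go test.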