
Container Service for Kubernetes: Best practices for OSS persistent volume performance optimization

Last Updated: Oct 28, 2025

If the read/write performance of your Object Storage Service (OSS) persistent volume, such as latency or throughput, does not meet expectations, you can use the troubleshooting steps and optimization practices in this topic to systematically identify and resolve the issues.

OSS persistent volumes are best suited for scenarios that involve sequential read/write operations and require high bandwidth. For scenarios that require high-concurrency random writes or depend heavily on extended file attributes, such as file system owner and mode, consider using other persistent volume types, such as NAS or CPFS. For more information, see Overview of supported persistent volumes.

Client implementation principle

The flat structure of OSS is fundamentally different from the tree structure of a file system. To allow applications in containers to access OSS through standard file system interfaces (POSIX-based APIs), such as read, write, and stat, the CSI plug-in runs a client on the application node. The supported client types are ossfs 1.0, ossfs 2.0, and strmvol. The client acts as a bidirectional translator: it converts file operations from the application, such as write, into HTTP requests to OSS objects, such as PutObject, and converts the OSS responses back into file system responses. This process simulates a file system.

This conversion process involves network transmission and protocol overhead. Therefore, the performance of an OSS persistent volume is closely related to the client type, network conditions, node resources, and your application's data access mode.

Troubleshooting guide

This topic focuses on troubleshooting performance issues when you access OSS through POSIX-based APIs by using CSI. If your application code uses the OSS software development kit (SDK) directly, see Best Practices for OSS Performance for optimization guidance.

When you troubleshoot performance issues, follow these steps:

  1. Confirm that the client meets expectations: Verify that the selected client and its performance baseline are suitable for your scenario.

  2. Check for configuration errors or external bottlenecks: These issues are usually unrelated to application logic and are part of a basic configuration and environment health check. An operations and maintenance (O&M) engineer can perform this check.

  3. Optimize application read/write logic and client parameters: This step requires a deep understanding of the application's data access mode. A developer should adjust the code or configuration.

Before you begin, make sure that your cluster's CSI component version is v1.33.1 or later. To upgrade the component, see Manage CSI components.
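If you are not sure which version is installed, a quick check is to inspect the image tags of the CSI plug-in pods. The DaemonSet name and namespace in the following command are assumptions based on a typical ACK installation; adjust them if your cluster differs.

    # List the image versions used by the CSI plug-in DaemonSet (assumed to be named
    # csi-plugin in the kube-system namespace).
    kubectl -n kube-system get ds csi-plugin \
      -o jsonpath='{range .spec.template.spec.containers[*]}{.image}{"\n"}{end}'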

Confirm that the client meets expectations

Client selection

Compared to ossfs 1.0, ossfs 2.0 and strmvol persistent volumes offer significant performance improvements for sequential read/write operations and high-concurrency reads of small files. If your application does not perform random writes, we recommend that you use ossfs 2.0.

If you are unsure whether your application performs random writes, you can try to mount an ossfs 2.0 persistent volume in a test environment. If the application performs a random write operation, an EINVAL error is returned.

To maximize performance, the ossfs 2.0 client does not apply chmod and chown operations: they do not report errors but have no effect. To set the permissions or ownership of files and folders under the mount target, add mount parameters to the otherOpts field of the persistent volume (PV). For chmod, use -o file_mode=<permission_code> or -o dir_mode=<permission_code>. For chown, use -o gid=<gid> or -o uid=<uid>.
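The following is a minimal sketch of a statically provisioned OSS PV that sets these options in the otherOpts field. The PV name, bucket name, region, and numeric IDs are placeholders, and attributes that are unrelated to permissions (such as the client type and access credentials) are omitted; complete them according to your environment.

    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: oss-pv-example                 # placeholder name
    spec:
      capacity:
        storage: 20Gi
      accessModes:
        - ReadOnlyMany
      persistentVolumeReclaimPolicy: Retain
      csi:
        driver: ossplugin.csi.alibabacloud.com
        volumeHandle: oss-pv-example
        volumeAttributes:
          bucket: "example-bucket"         # placeholder bucket name
          url: "http://oss-<region-id>-internal.aliyuncs.com"
          path: "/"
          # Simulate chmod and chown results for all objects under the mount target.
          otherOpts: "-o file_mode=0644 -o dir_mode=0755 -o uid=1000 -o gid=1000"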

For detailed selection recommendations, see Client selection reference.

Performance expectations

As a network file system, an OSS persistent volume is affected by the network and protocol layers. A performance issue means a significant deviation from the provided benchmark results for the client that you use, not a difference compared with a local file system. Compare your measured performance against the baseline of the corresponding client.
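As a quick sanity check against the baseline, you can measure sequential read throughput from inside the application pod. The mount path and file name below are placeholders; note that repeated runs may hit the node's page cache and report inflated numbers.

    # Sequentially read one large object through the OSS mount target.
    # Replace /mnt/oss/largefile with an existing file under your mount path.
    dd if=/mnt/oss/largefile of=/dev/null bs=1M status=progress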

Check for configuration errors or external bottlenecks

An Internet endpoint is configured on the PV

When your cluster and OSS bucket are in the same region, you can use an internal endpoint to significantly reduce network latency.

  • Troubleshooting method: You can run the following command to check the OSS endpoint that is configured for the PV.

    kubectl get pv <pv-name> -o jsonpath='{.spec.csi.volumeAttributes.url}'
  • Solution: If the output is an Internet endpoint, such as http://oss-<region-id>.aliyuncs.com, you must recreate the PV and use an internal endpoint, such as http://oss-<region-id>-internal.aliyuncs.com.
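After you recreate the PV, you can optionally confirm from a pod or node in the same VPC that the internal endpoint is reachable. The bucket name and region ID below are placeholders, and the check assumes that curl is available in the image.

    # A returned HTTP status line confirms reachability; a timeout suggests that the
    # internal endpoint cannot be reached from this network.
    curl -sI http://example-bucket.oss-<region-id>-internal.aliyuncs.com | head -n 1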

Throttling on the OSS server

OSS has limits on the total bandwidth and queries per second (QPS) for a single bucket. For services in some regions and in scenarios that involve high-concurrency access to OSS, such as workflows, bandwidth and QPS throttling may occur.

  • Troubleshooting method: In the OSS console, you can view the monitoring metrics for the OSS bucket to check whether metrics such as bandwidth usage and number of requests are approaching or exceeding the limits.

  • Solutions:

    • Change the region: The bandwidth and QPS limits for a single OSS bucket vary by region. If your application is still in the testing phase, you can consider moving it to another region.

    • Bandwidth bottleneck:

    • QPS (or IOPS) bottleneck: For most applications, high concurrency causes the bandwidth limit to be reached before the QPS limit. If only the QPS bottleneck is triggered, the cause might be frequent metadata retrieval, such as ls and stat, or frequent file opening and closing. In this case, you must check the client configuration and the application's read/write method. For optimization information, see Optimize application read/write logic and client parameters.

Internal bandwidth of the application node reaches the bottleneck

The client and the application pod run on the same node and share its network resources. In serverless compute scenarios, they share the resources of a single ACS instance. Therefore, the node's network bandwidth limit caps the maximum performance of the OSS persistent volume.

ossfs 1.0 persistent volume data caching is limited by the maximum disk throughput

To support full POSIX write operations and ensure data consistency for a single client, the ossfs 1.0 client caches some data to the disk by default when it accesses files. This caching operation occurs in the /tmp directory of the ossfs pod. This space corresponds to the disk on the node that is used for container runtime data. Therefore, the performance of ossfs 1.0 is directly limited by the input/output operations per second (IOPS) and throughput caps of this disk.

  • Troubleshooting method:

    The following method does not apply to ContainerOS nodes.
    1. Run the following command to find the ID of the container runtime data disk on the node where the application pod is running.

      kubectl get node <node-name> -o yaml | grep alibabacloud.com/data-disk-serial-id

      alibabacloud.com/data-disk-serial-id: uf69tilfftjoa3qb****

      Based on the data-disk-serial-id in this sample output, the temporary space is located on the disk d-uf69tilfftjoa3qb****.

      If the ID in the output is empty, go to the ECS console > Elastic Block Storage > Disks, locate the disk based on the attached instance, and query its monitoring information.
    2. Based on the disk ID, view the disk monitoring information to determine whether the disk has reached its IOPS or throughput limit.

  • Solutions:

    • For read-only or sequential write scenarios: The ossfs 1.0 client cannot predict whether an open file will be randomly written to. Therefore, even a read-only operation can trigger data caching to the disk. We recommend that you switch these applications to a more suitable client, such as ossfs 2.0, or implement read/write splitting for the application.

    • For random write scenarios: The performance of random writes through a Filesystem in Userspace (FUSE) client, such as ossfs 1.0, is usually poor. We recommend that you modify the application to access OSS directly through the OSS SDK, or switch to a NAS persistent volume, which is better suited for this scenario.

High usage of the disk storing temporary data for ossfs 1.0 persistent volumes

If disk monitoring shows that the usage of the disk used for ossfs 1.0 data caching is consistently high, especially in serverless compute scenarios, the frequent eviction and rotation of temporary data can further degrade access performance.

  • Troubleshooting method:

    1. Follow the troubleshooting method in ossfs 1.0 persistent volume data caching is limited by the maximum disk throughput to identify the disk that is used for container runtime data.

    2. You can check the resource usage of the disk through disk monitoring. If the usage remains consistently high but the application itself writes only a small amount of data to the disk, the performance issue may be caused by the rotation of temporary data by the ossfs 1.0 client.

  • Solutions:

    1. Check the application logic: Confirm whether the application code holds file descriptors for a long time (that is, keeps files open for extended periods). While a file descriptor is held, its related temporary data cannot be released, which can consume a large amount of local disk space. A quick way to check for held descriptors is shown after this list.

    2. Scale out the local disk: If the application logic is correct and the current disk's maximum throughput meets requirements, you can consider increasing the disk capacity.
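To check for long-held file descriptors, you can list the open files that reference the OSS mount path from inside the application container. The process ID and mount path below are placeholders.

    # List the file descriptors of process <pid> that point at files under the OSS mount path.
    ls -l /proc/<pid>/fd | grep /path/to/oss-mount

    # Alternatively, if lsof is available in the image:
    lsof +D /path/to/oss-mount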

Short authentication TTL causes many RAM requests

When you use a Resource Access Management (RAM) role for authentication, the ossfs client (versions 1.0 and 2.0) obtains a token and records its expiration time on the first access to OSS. To ensure session continuity, the client obtains a new token and refreshes the expiration time 20 minutes before the current token expires.

Therefore, the maximum session duration (max_session_duration) for the RAM role must be longer than 20 minutes.

For example, if the session duration is 30 minutes, the client refreshes the token after 10 minutes of use (30 - 20 = 10), which does not affect normal throughput. If the session duration is less than 20 minutes, the client considers the token to be always about to expire. It then tries to obtain a new token before every request to OSS. This triggers many RAM requests and affects performance.

  • Troubleshooting method: The default client log level may not record frequent token refresh behavior. You must confirm that the RAM role used by the application is configured correctly.

  • Solution: See Set the maximum session duration for a RAM role to query and modify the maximum session duration for the RAM role. Make sure that the configuration is correct.

Optimize application read/write logic and client parameters

Optimize metadata caching (and avoid disabling it)

Metadata caching is a core mechanism for improving the performance of high-frequency operations, such as ls and stat. It can significantly reduce time-consuming network requests. If you disable or misconfigure it, the client may frequently request data from OSS to ensure data consistency, which creates a performance bottleneck.

  • Troubleshooting method:
    You can run the following command to check whether the client parameters configured for the PV include options to disable the cache.

    kubectl get pv <pv-name> -o jsonpath='{.spec.csi.volumeAttributes.otherOpts}'
  • Solutions:

    • Avoid disabling the cache: If the output of the command includes max_stat_cache_size=0 (for ossfs 1.0) or close_to_open=true (for ossfs 2.0), and your scenario does not require strong consistency, you must recreate the persistent volume and remove these parameters.

    • Optimize cache configuration: If the data that your application reads is updated infrequently, you can increase the size and expiration time of the metadata cache to further improve performance.
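As a reference for the ossfs 1.0 client, the following fragment of a PV spec enlarges the metadata cache and extends its expiration time. The values are illustrative, and stat_cache_expire is assumed to behave as in upstream ossfs; tune both to your workload and consistency requirements.

    csi:
      volumeAttributes:
        # Cache metadata for up to 100,000 objects and keep each entry for 10 minutes.
        otherOpts: "-o max_stat_cache_size=100000 -o stat_cache_expire=600"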

Avoid maintaining extended object information (for ossfs 1.0 only)

File system metadata, such as mode, gid, and uid, is considered extended information for objects in OSS. The ossfs 1.0 client must make extra HTTP requests to retrieve this information. Although the client caches this metadata (including basic and extended properties) by default to improve data access performance, frequent metadata retrieval, especially of extended properties, remains a performance bottleneck.

Therefore, performance optimization for these scenarios focuses on two points:

  1. Minimize the number of metadata retrieval operations. For more information, see Optimize metadata caching (and avoid disabling it).

  2. Avoid requesting extended properties that have high overhead to reduce the time cost of each retrieval.

Solutions:

  • The application does not depend on file system metadata: We recommend that you set the readdir_optimize parameter to disable maintenance of extended information. The behavior of chmod, chown, stat, and other operations will then be the same as in ossfs 2.0.

  • The application only requires global permission configuration: For scenarios where non-root containers need permissions, we recommend that you set the readdir_optimize parameter. At the same time, you can use the gid, uid, and file_mode/dir_mode parameters to globally modify the default user or permissions of the file system.

  • The application requires fine-grained access policies: We recommend that you do not use file system permissions for access control. Instead, you can use the OSS server-side authentication system to implement path isolation. For example, you can create different RAM users or roles for different applications and use a bucket policy or RAM policy to restrict the subpaths that they can access. You should still set the readdir_optimize parameter to optimize performance.
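For example, the following fragment of an ossfs 1.0 PV spec is a minimal sketch that disables maintenance of extended information and sets a global owner and mode (all values are placeholders):

    csi:
      volumeAttributes:
        # readdir_optimize skips per-object extended attributes; uid/gid and
        # file_mode/dir_mode apply a uniform owner and mode to the whole mount target.
        otherOpts: "-o readdir_optimize -o uid=1000 -o gid=1000 -o file_mode=0644 -o dir_mode=0755"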

Optimize the file write mode

The default write operation for OSS objects is overwrite. When the client handles file modifications, it must follow a read-then-write pattern. It first reads the complete object from OSS to the local machine. Then, after the file is closed, it re-uploads the entire file to the server. Therefore, in extreme cases of frequent file opening, writing, and closing, the application repeatedly downloads and uploads the full data.
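For illustration, the following loop is a sketch of this worst case: each iteration opens, appends to, and closes the file on the OSS mount target, so the client downloads and re-uploads the entire object every time. The paths are placeholders.

    # Anti-pattern: every redirection opens and closes the file, which forces the
    # client to download and re-upload the entire object on each iteration.
    for i in $(seq 1 100); do
        echo "record-$i" >> /oss-mount/data/log.txt
    done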

Solutions:

  • Use batch writes instead of multiple writes: Avoid unnecessary multiple writes whenever possible. You can prepare the content in memory and then call a single write operation to complete it. Note that some encapsulated write functions, such as Java's FileUtils.write, already include a complete open, write, and close flow. If you call such functions repeatedly in a loop, performance issues will occur.

  • Use temporary files for frequent modifications: If a file needs to be opened and modified multiple times during a data processing task, first copy it from the OSS mount target to a temporary path in the container, such as /tmp. Perform all modifications on the local temporary file. Finally, copy the final version back to the OSS mount target in a single operation to upload it. A sketch of this pattern is provided after this list.

  • Enable the appendable feature for append-write scenarios: If your application scenario involves only frequent, small append writes, such as writing logs, you can consider using ossfs 2.0 and enabling the enable_appendable_object=true parameter. After you enable this parameter, the client uses OSS Appendable objects. When appending data, the client does not need to download and re-upload the entire file each time. However, take note of the following:

    • If the object file already exists but is not an Appendable object, you cannot use this solution to append data to it.

    • enable_appendable_object is a global parameter for the entire persistent volume. After it is enabled, all write operations on this volume are performed through the AppendObject API. This affects the performance of overwriting large files but not reading them. We recommend that you create a dedicated persistent volume for this type of append-write application.
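The following is a minimal sketch of the temporary-file pattern described above: perform all intermediate modifications on a local copy and upload the result once. All paths and file names are placeholders.

    #!/bin/bash
    # Download the object once by copying it from the OSS mount target to local storage.
    cp /oss-mount/data/report.csv /tmp/report.csv

    # Perform all intermediate modifications on the local copy (placeholder commands).
    for i in 1 2 3; do
        echo "batch-$i" >> /tmp/report.csv
    done

    # Upload the final version back to OSS with a single write.
    cp /tmp/report.csv /oss-mount/data/report.csv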

Optimize the concurrent read mode (especially for model loading scenarios)

When multiple processes concurrently read data from different locations within a single large file, the access mode is similar to a random read. This can cause the client to redundantly read data, which leads to abnormally high bandwidth usage. For example, in AI model loading scenarios, mainstream model frameworks often have multiple concurrent processes that read the same model parameter file. This can trigger a performance bottleneck.

If the performance of ossfs 2.0 in such scenarios is worse than that of ossfs 1.0, or differs significantly from the ossfs 2.0 client benchmark results, this access pattern is likely the cause.

Solutions:

  • Data prefetching (recommended): Before the application pod starts, you can use a script to prefetch the data. The core principle is to use high-bandwidth concurrent sequential reads to load the entire model file into the node's memory (Page Cache) in advance. After prefetching is complete, the application's random reads directly hit the memory, which bypasses the performance bottleneck.

    You can use the following standalone prefetching script, which concurrently reads the file content to /dev/null:

    #!/bin/bash
    # Set the maximum concurrency.
    MAX_JOBS=4
    
    # Check if a folder is provided as an argument.
    if [ -z "$1" ]; then
        echo "Usage: $0 <folder_path>"
        exit 1
    fi
    
    DIR="$1"
    
    # Check if the folder exists.
    if [ ! -d "$DIR" ]; then
        echo "Error: '$DIR' is not a valid folder."
        exit 1
    fi
    
    # Use find to locate all regular files and use xargs to concurrently run cat to /dev/null.
    find "$DIR" -type f -print0 | xargs -0 -I {} -P "$MAX_JOBS" sh -c 'cat "{}" > /dev/null'
    
    echo "Finished reading all files."

    You can simplify the core logic of this script and run it in the pod's lifecycle.postStart hook.

    # ...
    spec:
      containers:
      - image: your-image
        env:
        # Define the folder path where the model files are located. If not configured or the folder does not exist, the script skips data prefetching.
        - name: MODEL_DIR
          value: /where/is/your/model/
        # Define the concurrency for the prefetching script. The default is 4.
        - name: PRELOADER_CONC
          value: "8"
        lifecycle:
          postStart:
            exec:
              command: ["/bin/sh", "-c", "CONC=${PRELOADER_CONC:-4}; if [ -d \"$MODEL_DIR\" ]; then find \"$MODEL_DIR\" -type f -print0 | xargs -0 -I {} -P \"$CONC\" sh -c 'cat \"{}\" > /dev/null'; fi"]
    # ...
  • Refactor the application logic: You can evaluate and refactor the application logic to switch from multi-process concurrent reads within a single file to concurrent reads of multiple different files. This change can leverage the high throughput advantage of OSS.

Recommendations for production environments

  • Client selection: Prioritize using ossfs 2.0 persistent volumes unless your application has a random write requirement that cannot be modified.

  • Prioritize monitoring: Routinely monitor the bandwidth and QPS of the OSS bucket, the network bandwidth of the node, and the local disk I/O (for ossfs 1.0). This helps you quickly locate performance issues when they occur.

  • Avoid disabling the cache: Do not disable metadata caching unless your application requires strong consistency.

  • Data prefetching: For scenarios that involve reading large amounts of data, such as AI inference, you can consider using a data prefetching solution if you have high requirements for startup latency. Note that this pre-occupies node resources and bandwidth.

  • Cost considerations: The OSS persistent volume feature and its related components are free of charge. However, you are charged for the underlying OSS resources that you use, such as storage capacity, API requests, and outbound traffic. Improper configurations mentioned in this topic, such as disabling the cache or using an Internet endpoint, and inefficient application access patterns can cause a surge in API requests and traffic, which increases your OSS usage costs.

FAQ

Why is an OSS persistent volume much slower than a node's local disk?

This is normal. An OSS persistent volume is a network file system. All operations must go through network and protocol conversion. Its latency and throughput are different from those of directly attached local storage. Performance evaluation should be based on the corresponding benchmarks.

Why is my application's read performance acceptable, but its write performance is extremely poor?

This is usually related to the overwrite-only nature of OSS objects. When the client handles file modifications, it must first download, then modify, and finally upload the entire file. This process has high overhead. For optimization information, see Optimize the file write mode.

How can I determine if my application is performing random writes?

In a test environment, you can switch the persistent volume type that is mounted by the application from ossfs 1.0 to ossfs 2.0. If the application encounters an EINVAL (Invalid argument) error related to file writing while running, you can determine that it is performing random write operations that are not supported by ossfs 2.0.
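Alternatively, if strace is available in a test environment, you can trace the write-related system calls of the application process. Non-sequential offsets in pwrite64 calls, or lseek calls followed by write, indicate random writes. The process ID is a placeholder.

    # Trace write-related system calls of the application process for 30 seconds.
    timeout 30 strace -f -tt -e trace=write,pwrite64,lseek -p <pid>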