Cloud Parallel File Storage: CPFS for Lingjun dataflow (invitational preview)

Last Updated: Mar 28, 2025

The dataflow feature allows Cloud Parallel File Storage (CPFS) for Lingjun file systems to exchange data with Object Storage Service (OSS) buckets. You can create dataflows and dataflow tasks to transfer data between CPFS for Lingjun file systems and OSS buckets at high speed.

Feature overview

CPFS for Lingjun allows you to use the dataflow feature to perform the following operations:

  • Transmit data at the account level

    You can create a dataflow to transmit data between CPFS for Lingjun file systems and OSS buckets within the same account or across accounts.

  • Transmit data at the directory level

    You can create a dataflow to map a subdirectory of a CPFS for Lingjun file system to a prefix in an OSS bucket. This allows you to implement fine-grained permission management and flexible data transmission.

  • Import and export data

    You can create batch or streaming tasks to import and export data between CPFS for Lingjun and OSS. Batch tasks are suitable for preloading datasets before computing tasks start. Streaming tasks are suitable for continuously reading and writing multiple checkpoint files during computing tasks for model training. If a dataflow task fails, you can identify the cause of failure based on the task report.

    Warning

    CPFS for Lingjun exports the modification time (mtime) of each file to the custom metadata of the corresponding OSS object. The metadata field is named x-oss-meta-alihbr-sync-mtime. Do not delete or modify this field. Otherwise, an error occurs when you access the modification time of the file in the file system.
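
    One way to respect the warning above is to guard any script that rewrites object metadata so that it never touches the reserved field. The following is a minimal sketch, not part of any CPFS or OSS SDK. The helper name check_metadata_update is hypothetical, and it only inspects plain dictionaries of header names before you hand them to whatever client you use.

```python
RESERVED_META_KEY = "x-oss-meta-alihbr-sync-mtime"  # field named in the warning above


def check_metadata_update(new_headers, existing_headers):
    """Return the headers to write, keeping the reserved field unchanged.

    Hypothetical helper: raises ValueError if the caller tries to set the
    reserved field to a different value, and carries the existing value
    over so that a full metadata replacement does not drop it.
    """
    # HTTP header names are case-insensitive, so compare them in lowercase.
    new = {k.lower(): v for k, v in new_headers.items()}
    old = {k.lower(): v for k, v in existing_headers.items()}

    if RESERVED_META_KEY in new and new[RESERVED_META_KEY] != old.get(RESERVED_META_KEY):
        raise ValueError(f"{RESERVED_META_KEY} must not be modified")

    if RESERVED_META_KEY in old:
        new[RESERVED_META_KEY] = old[RESERVED_META_KEY]
    return new
```

    Calling the helper before every metadata write keeps the check in one place instead of relying on each caller to remember the reserved name.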

Limits

  • Limits on dataflows

    • CPFS for Lingjun V2.4.0 and later support dataflows within the same account. CPFS for Lingjun V2.6.0 and later support dataflows across accounts.

    • A maximum of 10 dataflows can be created for a CPFS for Lingjun file system.

    • A file path in a CPFS for Lingjun file system can be associated with only one OSS bucket.

    • You cannot create dataflows between a CPFS for Lingjun file system and an OSS bucket that resides in another region.

  • Limits on paths, file names, and directory names

    • You cannot rename a non-empty directory in a path that is associated with a dataflow. If you attempt to do so, a Permission Denied error or an error indicating that the directory is not empty is returned.

    • Proceed with caution when you use special characters in the names of directories and files.

      • The following characters are supported: letters, digits, exclamation points (!), hyphens (-), underscores (_), periods (.), asterisks (*), and parentheses (()).

      • The following characters are not supported: double periods (..), backslash (\), consecutive slashes (\\), and slash (/).

    • A path can be up to 1,023 characters in length.

  • Limits on dataflow tasks

    • Only CPFS for Lingjun V2.6.0 and later support streaming tasks. In addition, you can use streaming tasks only by calling API operations.

    • A maximum of 4 batch tasks can run at the same time for a dataflow. The number of concurrent streaming tasks is not limited.

    • Limits on data import

      • After a symbolic link is imported to CPFS for Lingjun, the symbolic link is converted into a regular file that contains no symbolic link information.

      • If an OSS bucket contains data of multiple versions, only data of the latest version is imported.

      • The name of a file or a subdirectory can be up to 255 bytes in length.

    • Limits on data export

      • When a symbolic link is exported to OSS, the file that the symbolic link points to is not exported. The symbolic link itself is converted into a regular object that contains no data.

      • Hard links can be exported to OSS only as regular objects that contain no link information.

      • After a file of the Socket, Device, or Pipe type is exported to an OSS bucket, the file is converted into a regular object that contains no data.

      • A directory path can be up to 1,023 characters in length.
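
The naming and length limits above can be checked before a task is submitted. The following is a minimal sketch under stated assumptions, not an official validator: the function name check_path is hypothetical, "letters" and "digits" are read as ASCII only, name length is measured in UTF-8 bytes, and double periods are flagged anywhere in a name. Characters that appear in neither list above are reported as outside the supported list rather than as definite errors.

```python
import re

MAX_PATH_CHARS = 1023  # path length limit from the list above
MAX_NAME_BYTES = 255   # import limit on a file or subdirectory name

# Supported characters: letters, digits, and ! - _ . * ( )
# Assumption: letters and digits are ASCII only.
_SUPPORTED_NAME = re.compile(r"[A-Za-z0-9!\-_.*()]+")


def check_path(path):
    """Return a list of problems found in a '/'-separated path (empty if none)."""
    problems = []
    if len(path) > MAX_PATH_CHARS:
        problems.append("path longer than 1,023 characters")
    if "\\" in path:
        problems.append("backslash in path")
    if "//" in path:
        problems.append("consecutive slashes in path")

    for name in path.strip("/").split("/"):
        if not name:
            continue  # empty component, already reported as consecutive slashes
        if ".." in name:
            problems.append(f"double periods in {name!r}")
        # Assumption: the 255-byte limit is measured on the UTF-8 encoding.
        if len(name.encode("utf-8")) > MAX_NAME_BYTES:
            problems.append(f"name longer than 255 bytes: {name[:20]!r}")
        if not _SUPPORTED_NAME.fullmatch(name):
            problems.append(f"characters outside the supported list in {name!r}")
    return problems
```

For example, check_path("/data/train_set-01/shard(1).bin") returns an empty list, while a name that contains a space is reported as using characters outside the supported list.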

Performance metrics

| Operation | Metric | Description |
| --- | --- | --- |
| Data import | Throughput for files larger than 1 GB | The maximum throughput for the import of a single file is 5 GB/s. The maximum throughput for the import of multiple files is 100 GB/s. |
| Data import | IOPS for files smaller than 1 GB | The IOPS for the import of one or more directories is 1,000. |
| Data export | Throughput for files larger than 1 GB | The maximum throughput for the export of a single file is 5 GB/s. The maximum throughput for the export of multiple files is 100 GB/s. |
| Data export | IOPS for files smaller than 1 GB | The IOPS for the export of one or more directories is 1,200. |

Note

The actual import and export throughput is limited by the OSS bandwidth and the throughput of CPFS for Lingjun. The throughput is also affected by the file size, number of files, and amount of data. For more information about the bandwidth limits of OSS, see the Bandwidth section of the "Limits and performance metrics" topic. For more information about the throughput of CPFS for Lingjun, see Storage types.
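
The maximums above give a rough lower bound on how long a transfer can take. The following is a minimal sketch of that arithmetic, with hypothetical function names. It assumes that the 5 GB/s single-file maximum also applies to each file in a multi-file transfer, and that each small file costs one operation against the IOPS figure. Neither assumption is stated in the metrics, and actual transfers are slower when the OSS bandwidth or file system throughput is the bottleneck.

```python
SINGLE_FILE_GBPS = 5.0    # maximum throughput for one file larger than 1 GB
MULTI_FILE_GBPS = 100.0   # maximum throughput across multiple files
SMALL_FILE_IOPS = {"import": 1000, "export": 1200}


def min_large_file_seconds(total_gb, file_count=1):
    """Best-case seconds to move total_gb spread over file_count large files."""
    # Assumption: each file is individually capped at the single-file maximum.
    cap = min(MULTI_FILE_GBPS, SINGLE_FILE_GBPS * file_count)
    return total_gb / cap


def min_small_file_seconds(file_count, direction="import"):
    """Best-case seconds to move file_count small files."""
    # Assumption: one operation per file.
    return file_count / SMALL_FILE_IOPS[direction]
```

Under these assumptions, a single 50 GB file needs at least 10 seconds, 2,000 GB spread over 40 large files needs at least 20 seconds, and exporting 1,200,000 small files needs at least 1,000 seconds.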

Billing

The dataflow feature of CPFS for Lingjun is in public preview and is free of charge.

Procedure

  1. Create a dataflow.

  2. Create a batch task or a streaming task.
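
When step 2 involves many batch tasks on one dataflow, the limit of 4 concurrent batch tasks means the tasks have to be submitted in groups. The following is a minimal sketch of that grouping only. The function name plan_batch_waves and the idea of one task per directory are assumptions for illustration, and the sketch does not call any CPFS API.

```python
MAX_CONCURRENT_BATCH_TASKS = 4  # per dataflow, from the limits on dataflow tasks


def plan_batch_waves(directories, limit=MAX_CONCURRENT_BATCH_TASKS):
    """Group directories into waves of at most `limit` batch tasks each.

    Each wave would be submitted only after the previous wave has finished.
    """
    if limit < 1:
        raise ValueError("limit must be at least 1")
    dirs = list(directories)
    return [dirs[i:i + limit] for i in range(0, len(dirs), limit)]
```

For six directories, the plan contains one wave of four tasks followed by one wave of two.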