
Cloud Parallel File Storage: Manage dataflow tasks

Last Updated: Nov 28, 2025

This topic describes how to create and manage Cloud Parallel File Storage (CPFS) dataflow tasks and view task reports in the NAS console.

Prerequisites

A dataflow is created for the CPFS file system.

Task description

  • Task types

    • Based on the data operations they perform, tasks are classified into three types: Import, Export, and Evict.

      Import

      Imports data from a source storage, such as an OSS bucket, to a CPFS file system.

      • Import type: You can import two types of data: Metadata and Data (MetaAndData).

        • Metadata: Imports only the metadata of files.

        • Data: Imports both the metadata and data of files.

      • Import path: The path of a file in the source OSS bucket. A dataflow task imports a file to the fileset based on its path in the OSS bucket.

      • If an imported file or directory does not have POSIX metadata attributes, the default owner is root and the default permission is 0770.

      Export

      Exports a specified directory or file from a dataflow fileset to an OSS bucket.

      • Export path: The path of a file or directory in the CPFS file system. A dataflow task exports a file to the bucket based on its path in the fileset.

      • Empty directories, hard links, and symbolic links cannot be exported to OSS.

      • Metadata export: You can export the CreateTime, ModifyTime, Ownership, and Permission attributes of a file to an OSS bucket. However, the ChangeTime attribute is not exported.

        Warning

        CPFS exports metadata to the custom metadata of the OSS bucket. The metadata is named x-oss-meta-afm-xxx. Do not delete or modify this metadata. Otherwise, file system metadata errors may occur.

      Evict

      Releases the data of a file on a CPFS file system. After eviction, only the file's metadata is retained: you can still see the file, but its data blocks are cleared and no longer occupy storage space on the CPFS file system. When you access the file data, it is loaded from the source storage, such as OSS, on demand. To verify that a file was evicted, see the sketch at the end of this section.

      Note

      Before you evict a file, make sure that the latest version of the file exists in the OSS bucket.

    • Based on the initiator, tasks are classified as user tasks or system tasks.


      User task

      A dataflow task created in the console or by calling the CreateDataFlowTask API operation.

      • You can query user tasks in the Dataflow > Task Management panel of the console.

      • When a user task is complete, a task report is generated and saved to the .dataflow_report directory of the CPFS file system.

      System task

      A task that CPFS automatically generates after you enable Automatic Metadata Update. System tasks run at the specified Metadata Refresh Interval (minutes) to synchronize updated file metadata from the OSS bucket to CPFS.

      • You can query system tasks in the Dataflow > Task Management panel of the console.

      • System tasks do not generate task reports.

  • Task execution scope

    The scope of a task can be a directory or a specified file list (EntryList). If the scope is a directory, the task traverses all files in the directory tree.
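  You can verify the effect of an eviction from a client that has mounted the file system by comparing a file's logical size with the space its data blocks still occupy. The following Python sketch illustrates this, assuming CPFS reports released data blocks through st_blocks the way local file systems do; the mount path and file name are placeholders:

    import os

    # Placeholder: a file on a mounted CPFS file system.
    path = "/mnt/cpfs/test1/object1"

    st = os.stat(path)
    logical = st.st_size            # File length, kept in the metadata.
    physical = st.st_blocks * 512   # Bytes occupied by data blocks on CPFS.

    if logical > 0 and physical == 0:
        print(f"{path}: evicted; data will be loaded from OSS on access")
    else:
        print(f"{path}: {physical} of {logical} bytes resident on CPFS")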

Create a dataflow task

  1. Log on to the NAS console.

  2. In the left-side navigation pane, choose File System > File System List.

  3. In the top navigation bar, select a region.

  4. On the File System List page, click the name of the file system.

  5. On the details page of the file system, click Dataflow.

  6. On the Dataflow tab, find the target dataflow and click Task Management in the Actions column.

  7. In the Task Management panel, click Create Job.

  8. In the Create Job panel, configure the parameters for the task.

    Import data

    Data Type: Select the type of data to import.

    • Data: Imports both the data and metadata of files.

    • Metadata: Imports only the metadata of files.

      If you import only file metadata, you can list the files, but their data is not stored on the CPFS file system. When you access a file, its data is loaded from the source on demand.

    Specify OSS Object Prefix Subdirectory: Select the directory or file list for the dataflow task.

    • Import Objects from OSS: The specified OSS directory must start and end with a forward slash (/).

    • Import Listed Objects: Each line in the file represents the path of a file in the OSS bucket. Directories are not supported.
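    For Import Listed Objects, you can generate the file list programmatically. The following sketch uses the oss2 Python SDK to write one object path per line for a given prefix; the credentials, endpoint, bucket name, prefix, and output file name are placeholder assumptions:

      import oss2

      # Placeholders: replace with your credentials, endpoint, and bucket.
      auth = oss2.Auth("<access_key_id>", "<access_key_secret>")
      bucket = oss2.Bucket(auth, "https://oss-cn-hangzhou.aliyuncs.com",
                           "examplebucket")

      # Write one OSS object path per line. Skip "directory" placeholder
      # objects (keys that end with a slash): directories are not supported
      # in the list.
      with open("import_list.txt", "w") as f:
          for obj in oss2.ObjectIterator(bucket, prefix="test1/"):
              if not obj.key.endswith("/"):
                  f.write(obj.key + "\n")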

    Export data

    • Empty directories, hard links, and symbolic links cannot be exported to an OSS bucket.

    • You can export the CreateTime, ModifyTime, Ownership, and Permission attributes of a file to an OSS bucket. However, the ChangeTime attribute is not exported.

    • CPFS exports metadata to the custom metadata of the OSS bucket. The metadata is named x-oss-meta-afm-xxx. Do not delete or modify this metadata. Otherwise, file system metadata errors may occur.

      Specify CPFS Subdirectory: Select the directory or file list for the dataflow task.

      • Export Files from CPFS: The directory must start and end with a forward slash (/) and must be the path of the directory in the CPFS file system.

      • Export Listed Files: Each line in the file represents the path of a file in the CPFS file system. Directories are not supported.
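    After an export task completes, you can check the custom metadata that CPFS attached to an exported object, without modifying it. The following is a minimal sketch with the oss2 Python SDK; the credentials, endpoint, bucket name, and object key are placeholder assumptions:

      import oss2

      auth = oss2.Auth("<access_key_id>", "<access_key_secret>")
      bucket = oss2.Bucket(auth, "https://oss-cn-hangzhou.aliyuncs.com",
                           "examplebucket")

      # head_object returns the object's headers, including the custom
      # x-oss-meta-afm-xxx entries written by the export task. Print them
      # for inspection only: do not modify or delete these keys.
      result = bucket.head_object("test1/object1")
      for name, value in result.headers.items():
          if name.lower().startswith("x-oss-meta-afm"):
              print(name, "=", value)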

    Delete data

    Delete File: Select the directory or file list for the dataflow task.

    • Delete Files from CPFS: The directory must start and end with a forward slash (/).

    • Delete Listed Files: Each line in the file represents the path of a file in the CPFS file system. Directories are not supported.

  9. Review the configuration and click OK.

    Note

    When a specified dataflow task is running, the automatic data update task for the dataflow is suspended.
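You can also create the same task programmatically by calling the CreateDataFlowTask API operation. The following Python sketch uses the auto-generated alibabacloud_nas20170626 SDK; the endpoint, credentials, IDs, and the exact request field names are assumptions to verify against the SDK version you install:

    from alibabacloud_tea_openapi import models as open_api_models
    from alibabacloud_nas20170626.client import Client
    from alibabacloud_nas20170626 import models as nas_models

    # Placeholders: your credentials and the NAS endpoint of the region
    # where the CPFS file system resides.
    config = open_api_models.Config(
        access_key_id="<access_key_id>",
        access_key_secret="<access_key_secret>",
        endpoint="nas.cn-hangzhou.aliyuncs.com",
    )
    client = Client(config)

    # Import both metadata and data for one directory of the fileset.
    # task_action can be Import, Export, or Evict; data_type can be
    # Metadata or MetaAndData (assumed to mirror the console options).
    request = nas_models.CreateDataFlowTaskRequest(
        file_system_id="cpfs-1234",
        data_flow_id="df-0001",
        task_action="Import",
        data_type="MetaAndData",
        directory="/test1/",
    )
    response = client.create_data_flow_task(request)
    print(response.body.task_id)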

View a task report

  1. Log on to the NAS console.

  2. In the left-side navigation pane, choose File System > File System List.

  3. In the top navigation bar, select a region.

  4. On the File System List page, click the name of the file system.

  5. On the details page of the file system, click Dataflow.

  6. On the Dataflow tab, find the target dataflow and click Task Management in the Actions column.

  7. In the Task Management panel, find the task whose report you want to view and choose More > Report in the Actions column.

  8. Obtain the full path of the target task report and download it.

    Note
    • Task reports are generated only for user tasks. System tasks do not generate task reports.

    • You can view the task report after the user task is complete. The report is saved to the .dataflow_report directory of the CPFS file system.

    The following code provides a sample task report:

    SUMMARY,dataflowId,taskId,userId,fsId,startDate,endDate,total,succ,skip,failed,throughput_MBps
    FILE,path,status,size
    
    SUMMARY,df-0001,task-0001,1001,cpfs-1234,1632477577,1632477677,18,10,1,7,0.01
    FILE,test1/object1,cached,131072
    FILE,test1/object2,cached,131072

    Task statistics (SUMMARY)

    • dataflowId: The dataflow ID.

    • taskId: The task ID.

    • userId: The user ID.

    • fsId: The file system ID.

    • startDate: The task start time, in seconds since the epoch.

    • endDate: The task end time, in seconds since the epoch.

    • total: The total number of files processed by the task.

    • succ: The number of files that were successfully processed.

    • skip: The number of files that were skipped. For example, files that were already imported in an import task.

    • failed: The number of files that failed to be processed.

    • throughput_MBps: The average throughput during task execution, in MB/s.

    File information (FILE)

    • path: The path of the file in the fileset.

    • status: The file status. Valid values:

      • cached: The file is imported or exported.

      • uncached: The file is not imported.

      • dirty: The file was modified on the CPFS file system and has not been exported.

      • NA: The file does not exist.

    • size: The file size, in bytes.
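    Because SUMMARY and FILE rows have different column counts, a report is easiest to consume by dispatching on the first field. The following sketch parses a downloaded report, prints the task summary with a human-readable start time, and lists files that may need attention; the report file name is a placeholder assumption:

      import csv
      from datetime import datetime, timezone

      SUMMARY_FIELDS = ["dataflowId", "taskId", "userId", "fsId",
                        "startDate", "endDate", "total", "succ", "skip",
                        "failed", "throughput_MBps"]

      with open("report.csv") as f:  # placeholder: a downloaded task report
          for row in csv.reader(f):
              if not row or len(row) < 2:
                  continue
              # Skip the header lines shown at the top of the report.
              if row[0] == "SUMMARY" and row[1] != "dataflowId":
                  summary = dict(zip(SUMMARY_FIELDS, row[1:]))
                  start = datetime.fromtimestamp(int(summary["startDate"]),
                                                 tz=timezone.utc)
                  print("task", summary["taskId"], "started", start,
                        "- failed files:", summary["failed"])
              elif row[0] == "FILE" and row[1] != "path":
                  path, status, size = row[1], row[2], row[3]
                  if status in ("uncached", "NA"):
                      print("check:", path, status, size, "bytes")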

Related operations


View a task

You can view the configuration and running status of a dataflow task in the console.

  1. On the Dataflow tab, find the target dataflow and click Task Management.

  2. In the Task Management panel, view the details of the target task.

Cancel a task

You can cancel a running dataflow task in the console.

  1. On the Dataflow tab, find the target dataflow and click Task Management.

  2. In the Task Management panel, find the target task and click Cancel.

  3. Confirm the task that you want to cancel and click OK.

Copy a task

You can copy a completed task to run it again.

  1. On the Dataflow tab, find the target dataflow and click Task Management.

  2. In the Task Management panel, find the target task and choose More > Copy in the Actions column.

  3. Confirm the task that you want to copy and click OK.