This topic describes how to create and manage Cloud Parallel File Storage (CPFS) dataflow tasks and view task reports in the NAS console.
Prerequisites
A CPFS fileset is created. For more information, see Create a fileset.
A dataflow is created. For more information, see Create a dataflow.
Task description
Task types
Based on the data operations they perform, tasks are classified into three types: Import, Export, and Evict.

Import
Imports data from a source storage to a CPFS file system.
- Import type: You can import two types of data: Metadata and Data (MetaAndData).
  - Metadata: Imports only the metadata of files.
  - Data: Imports both the metadata and data of files.
- Import path: The path of a file in the source OSS bucket. A dataflow task imports a file to the fileset based on its path in the OSS bucket.
- If an imported file or directory does not have POSIX metadata attributes, the default owner is root and the default permission is 0770.

Export
Exports a specified directory or file from a dataflow fileset to an OSS bucket.
- Export path: The path of a file or directory in the CPFS file system. A dataflow task exports a file to the bucket based on its path in the fileset.
- Empty directories, hard links, and symbolic links cannot be exported to OSS.
- Metadata export: You can export the CreateTime, ModifyTime, Ownership, and Permission attributes of a file to an OSS bucket. However, the ChangeTime attribute is not exported.

Warning: CPFS exports metadata to the custom metadata of the OSS bucket. The metadata is named x-oss-meta-afm-xxx. Do not delete or modify this metadata. Otherwise, errors may occur with the file system metadata.

Evict
Releases the data of a file on a CPFS file system. After eviction, only the file's metadata is retained on the CPFS file system. You can still see the file, but its data blocks are cleared and no longer occupy storage space on the CPFS file system. When you access the file data, it is loaded from the source storage, such as OSS, on demand. The sketch after this section illustrates the effect.

Note: Before you evict a file, make sure that the latest version of the file exists in the OSS bucket.
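The effect of eviction can be observed from a client on which the file system is mounted. A minimal sketch, assuming a hypothetical mount point /mnt/cpfs and a hypothetical evicted file test1/object1:

```python
import os

# Hypothetical mount point and file; adjust to your environment.
st = os.stat("/mnt/cpfs/test1/object1")

# The logical size is kept in the retained metadata.
print("logical size:", st.st_size)

# st_blocks counts 512-byte units of allocated space. Because eviction
# clears the data blocks, this is expected to drop to (near) zero.
print("allocated bytes:", st.st_blocks * 512)
```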
Based on the initiator, tasks are classified as user tasks or system tasks.

User task
A dataflow task created in the console or by calling the CreateDataFlowTask API operation (see the sketch after this section).
- You can query user tasks in the Task Management panel of the console.
- When a user task is complete, a task report is generated and saved to the .dataflow_report directory of the CPFS file system.

System task
A task that is automatically generated by CPFS after you enable Automatic Metadata Update. System tasks are generated at the specified Metadata Refresh Interval (minutes) to synchronize updated file metadata from the OSS bucket to CPFS.
- You can query system tasks in the Task Management panel of the console.
- System tasks do not generate task reports.
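The following is a minimal sketch of creating a user task through the API, using the generic CommonRequest mechanism of the Alibaba Cloud Python SDK (aliyun-python-sdk-core). The request parameters shown are illustrative assumptions based on the task types described above; consult the CreateDataFlowTask API reference for the authoritative parameter list.

```python
from aliyunsdkcore.client import AcsClient
from aliyunsdkcore.request import CommonRequest

# Hypothetical credentials and region; adjust to your environment.
client = AcsClient("<access_key_id>", "<access_key_secret>", "cn-hangzhou")

request = CommonRequest()
request.set_domain("nas.cn-hangzhou.aliyuncs.com")  # NAS API endpoint
request.set_version("2017-06-26")                   # NAS API version
request.set_action_name("CreateDataFlowTask")

# Assumed parameter names and values -- verify against the API reference.
request.add_query_param("FileSystemId", "cpfs-1234")  # hypothetical IDs
request.add_query_param("DataFlowId", "df-0001")
request.add_query_param("TaskAction", "Import")       # Import, Export, or Evict
request.add_query_param("DataType", "MetaAndData")    # Metadata or MetaAndData

response = client.do_action_with_exception(request)
print(response)
```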
Task execution scope
The scope of a task can be a directory or a specified file list (EntryList). If the scope is a directory, the task traverses all files in the directory tree. A sample entry list is shown below.
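A hypothetical entry list, in which each line is the path of a single file (directories are not supported):

```
test1/object1
test1/object2
test2/object3
```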
Create a dataflow task
1. Log on to the NAS console.
2. In the left-side navigation pane, choose File System > File System List.
3. In the top navigation bar, select a region.
4. On the File System List page, click the name of the file system.
5. On the details page of the file system, click Dataflow.
6. On the Dataflow tab, find the target dataflow and click Task Management in the Actions column.
7. In the Task Management panel, click Create Job.
8. In the Create Job panel, configure the parameters for the task.
Import data

- Data Type: Select the type of data to import.
  - Data: Imports both the data and metadata of files.
  - Metadata: Imports only the metadata of files. If you import only file metadata, you can query only the filename. When you access the data, it is loaded from the source on demand.
- Specify OSS Object Prefix Subdirectory: Select the directory or file list for the dataflow task.
  - Import Objects from OSS: The specified OSS directory must start and end with a forward slash (/), as in the examples after this list.
  - Import Listed Objects: Each line in the file represents the path of a file in the OSS bucket. Directories are not supported.
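Hypothetical examples of directory values for Import Objects from OSS:

```
/training-data/        valid: starts and ends with a forward slash
/training-data/raw/    valid: nested subdirectory
training-data/         invalid: missing the leading forward slash
/training-data         invalid: missing the trailing forward slash
```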
Export data

Note the following limits:
- Empty directories, hard links, and symbolic links cannot be exported to an OSS bucket.
- You can export the CreateTime, ModifyTime, Ownership, and Permission attributes of a file to an OSS bucket. However, the ChangeTime attribute is not exported.

Warning: CPFS exports metadata to the custom metadata of the OSS bucket. The metadata is named x-oss-meta-afm-xxx. Do not delete or modify this metadata. Otherwise, file system metadata errors may occur.

- Specify CPFS Subdirectory: Select the directory or file list for the dataflow task.
  - Export Files from CPFS: The directory must start and end with a forward slash (/) and must be the path of the directory in the CPFS file system, as in the examples after this list.
  - Export Listed Files: Each line in the file represents the path of a file in the CPFS file system. Directories are not supported.
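Hypothetical examples of directory values for Export Files from CPFS:

```
/results/           valid: a directory path in the CPFS file system
/results/run1/      valid: nested subdirectory
results/run1        invalid: missing the leading and trailing forward slashes
```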
Delete data

- Delete File: Select the directory or file list for the dataflow task.
  - Delete Files from CPFS: The directory must start and end with a forward slash (/).
  - Delete Listed Files: Each line in the file represents the path of a file in the CPFS file system. Directories are not supported.
9. Review the configuration and click OK.

Note: When a specified dataflow task is running, the automatic data update task for the dataflow is suspended.
View a task report
1. Log on to the NAS console.
2. In the left-side navigation pane, choose File System > File System List.
3. In the top navigation bar, select a region.
4. On the File System List page, click the name of the file system.
5. On the details page of the file system, click Dataflow.
6. On the Dataflow tab, find the target dataflow and click Task Management in the Actions column.
7. In the Task Management panel, find the task whose report you want to view and choose > Report in the Actions column.
8. Obtain the full path of the target task report and download it.
Note: Task reports are generated only for user tasks; system tasks do not generate task reports. You can view a task report after the user task is complete. The report is saved to the .dataflow_report directory of the CPFS file system, as in the example below.
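A minimal sketch of listing generated reports from a client on which the file system is mounted, assuming a hypothetical mount point /mnt/cpfs:

```python
import os

# Hypothetical mount point; adjust to your environment.
report_dir = "/mnt/cpfs/.dataflow_report"

# Print the full path of each generated task report.
for name in sorted(os.listdir(report_dir)):
    print(os.path.join(report_dir, name))
```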
The following code provides a sample task report. A parsing sketch follows the field descriptions below.

```
SUMMARY,dataflowId,taskId,userId,fsId,startDate,endDate,total,succ,skip,failed,throughput_MBps
FILE,path,status,size
SUMMARY,df-0001,task-0001,1001,cpfs-1234,1632477577,1632477677,18,10,1,7,0.01
FILE,test1/object1,cached,131072
FILE,test1/object2,cached,131072
```

The report contains the following fields.
Task statistics (SUMMARY)
- dataflowId: The dataflow ID.
- taskId: The task ID.
- userId: The user ID.
- fsId: The file system ID.
- startDate: The task start time, in seconds since the epoch.
- endDate: The task end time, in seconds since the epoch.
- total: The total number of files processed by the task.
- succ: The number of files that were successfully processed.
- skip: The number of files that were skipped, for example, files that were already imported in an import task.
- failed: The number of files that failed to be processed.
- throughput_MBps: The average throughput during task execution, in MB/s.

File information (FILE)
- path: The path of the file in the fileset.
- status: The file status. Valid values:
  - cached: The file is imported or exported.
  - uncached: The file is not imported.
  - dirty: The file was modified on the CPFS file system and has not been exported.
  - NA: The file does not exist.
- size: The file size, in bytes.
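A minimal sketch of parsing this report format, based only on the fields documented above (the report path is a hypothetical example):

```python
import csv

# Hypothetical report path under the .dataflow_report directory.
REPORT_PATH = "/mnt/cpfs/.dataflow_report/task-0001.report"

SUMMARY_FIELDS = ["dataflowId", "taskId", "userId", "fsId", "startDate",
                  "endDate", "total", "succ", "skip", "failed", "throughput_MBps"]
FILE_FIELDS = ["path", "status", "size"]

with open(REPORT_PATH, newline="") as f:
    for row in csv.reader(f):
        tag, values = row[0], row[1:]
        # Skip the two header rows that repeat the field names.
        if tag == "SUMMARY" and values[0] != "dataflowId":
            print(dict(zip(SUMMARY_FIELDS, values)))
        elif tag == "FILE" and values[0] != "path":
            print(dict(zip(FILE_FIELDS, values)))
```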
Related operations
| Operation | Description |
| --- | --- |
| View a task | You can view the configuration and running status of a dataflow task in the console. |
| Cancel a task | You can cancel a running dataflow task in the console. |
| Copy a task | You can copy a completed task to run it again. |