Creates a data flow between a Cloud Parallel File Storage (CPFS) file system and a source storage.
Operation description
**Basic operations**
- Only Cloud Parallel File Storage (CPFS) V2.2.0 and later and CPFS for Lingjun V2.4.0 and later support data flows.
- You can create a data flow only when a CPFS or CPFS for Lingjun file system is in the Running state.
- A maximum of 10 data flows can be created for a CPFS or CPFS for Lingjun file system.
- It generally takes 2 to 5 minutes to create a data flow. You can call the DescribeDataFlows operation to check whether the data flow has been created, as shown in the sketch after this list.
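The following is a minimal sketch of this workflow in Python, assuming the aliyun-python-sdk-core package, the NAS API version 2017-06-26, and the nas.cn-hangzhou.aliyuncs.com endpoint; the credentials, file system ID, and fileset ID are placeholders, and the response shape used for the status check is an assumption:

```python
import json
import time

from aliyunsdkcore.client import AcsClient
from aliyunsdkcore.request import CommonRequest

# Placeholder credentials and region; this is a sketch, not production code.
client = AcsClient('<access-key-id>', '<access-key-secret>', 'cn-hangzhou')


def nas_request(action, params):
    """Send an RPC-style request to the NAS API (assumed version 2017-06-26)."""
    request = CommonRequest()
    request.set_domain('nas.cn-hangzhou.aliyuncs.com')  # assumed endpoint
    request.set_version('2017-06-26')
    request.set_action_name(action)
    request.set_method('POST')
    for key, value in params.items():
        request.add_query_param(key, str(value))
    return json.loads(client.do_action_with_exception(request))


fs_id = 'cpfs-099394bd928c****'  # hypothetical file system ID

# Create the data flow; FsetId is required for CPFS file systems.
created = nas_request('CreateDataFlow', {
    'FileSystemId': fs_id,
    'FsetId': 'fset-1902718ea0ae****',  # hypothetical fileset ID
    'SourceStorage': 'oss://bucket1',
    'Throughput': 600,
})
print('DataFlowId:', created['DataFlowId'])

# Creation usually takes 2 to 5 minutes; poll DescribeDataFlows until the
# new data flow reports the Running status. The exact response shape is an
# assumption; inspect the payload in practice.
for _ in range(20):
    flows = nas_request('DescribeDataFlows', {'FileSystemId': fs_id})
    if '"Status": "Running"' in json.dumps(flows):  # crude check for a sketch
        break
    time.sleep(30)
```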
**Permissions**
When you create a data flow, CPFS obtains the following two service-linked roles: AliyunServiceRoleForNasOssDataflow and AliyunServiceRoleForNasEventNotification. For more information, see CPFS service-linked roles.
**CPFS usage notes**
**Billing**
- If you create a data flow, you are charged for using the data flow throughput. For more information, see Billing of CPFS.
- When you configure the AutoRefresh feature for a data flow, CPFS must use EventBridge to collect object modification events from the source Object Storage Service (OSS) bucket. Event fees are incurred. For more information, see Billing of EventBridge.
**Data flow specifications**
- The data flow throughput supports the following specifications: 600 MB/s, 1,200 MB/s, and 1,500 MB/s. The data flow throughput is the maximum transmission bandwidth that can be reached when data is imported or exported for a data flow.
- When you create a data flow, IP addresses in the vSwitch that is used by the CPFS mount target are consumed. Make sure that the vSwitch can provide sufficient IP addresses.
- Inventory query: if you set the DryRun parameter to true, you can check whether sufficient resources are available for a data flow with the specified throughput without actually creating it (a dry-run sketch follows this list).
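For example, a dry-run request (reusing the hypothetical nas_request helper from the sketch above) validates the parameters and resource inventory without creating anything:

```python
# Dry run: the system only validates the request and checks resource
# inventory; no data flow is created and no fee is incurred.
# nas_request is the hypothetical helper from the earlier sketch.
nas_request('CreateDataFlow', {
    'FileSystemId': 'cpfs-099394bd928c****',
    'FsetId': 'fset-1902718ea0ae****',
    'SourceStorage': 'oss://bucket1',
    'Throughput': 1500,
    'DryRun': 'true',
})
```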
**Fileset**
- The destination for a data flow is a fileset in the CPFS file system. A fileset is a directory subtree (a small directory tree) in a CPFS file system. Each fileset independently manages an inode space.
- When you create a data flow for a CPFS file system, the related fileset must already exist and cannot be nested within another fileset. Only one data flow can be created per fileset, and each data flow corresponds to one source storage.
- A fileset supports a maximum of one million files. If the number of files imported from an OSS bucket into the fileset exceeds the upper limit, a "no space" error message is returned when you add new files.
**Note** If data already exists in the fileset, after you create a data flow, the existing data in the fileset is cleared and replaced with the data synchronized from the OSS bucket.
**AutoRefresh**
- After AutoRefresh is configured, if the data in the source OSS bucket is updated, the updated metadata is automatically synchronized to the CPFS file system. You can load the updated data when you access files, or run a data flow task to load the updated data.
- AutoRefresh depends on the object modification events collected by EventBridge from the source OSS bucket. You must first activate EventBridge.
- The AutoRefresh configuration applies only to the prefix specified by the RefreshPath parameter. You can configure a maximum of five AutoRefresh directories for a data flow. A request sketch follows the note at the end of this section.
- AutoRefreshInterval specifies the interval at which CPFS checks whether data is updated in the prefix of the source OSS bucket. If data is updated, CPFS runs an AutoRefresh task. If the frequency of object modification events in the source OSS bucket exceeds the processing capability of the CPFS data flow, AutoRefresh tasks accumulate, metadata updates are delayed, and the data flow status changes to Misconfigured. To resolve these issues, increase the data flow specifications or reduce the frequency of object modification events.
- When you add an AutoRefresh configuration to a prefix for a CPFS data flow, an event bus is created on the user side and an event rule is created for the prefix of the source OSS bucket. When an object is modified in the prefix of the source OSS bucket, an OSS event is generated in the EventBridge console and processed by the CPFS data flow.
**Note** The event buses and event rules created for CPFS in the EventBridge console contain the "Create for cpfs auto refresh" description. Do not modify or delete these event buses or event rules. Otherwise, AutoRefresh cannot work properly.
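As a sketch of the request shape: AutoRefreshs is an array of objects, and this example assumes the usual RPC flattening of array members into indexed query keys (AutoRefreshs.N.RefreshPath); it reuses the hypothetical nas_request helper from the earlier sketch:

```python
# Array members of AutoRefreshs are flattened into indexed query keys
# (assumed AutoRefreshs.N.RefreshPath naming, the usual RPC convention).
nas_request('CreateDataFlow', {
    'FileSystemId': 'cpfs-099394bd928c****',
    'FsetId': 'fset-1902718ea0ae****',
    'SourceStorage': 'oss://bucket1',
    'Throughput': 600,
    'AutoRefreshs.1.RefreshPath': '/prefix1/prefix2/',
    'AutoRefreshPolicy': 'None',
    'AutoRefreshInterval': 10,  # minutes; valid range: 10 to 525600
})
```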
**Source storage**
- The source storage specified by the SourceStorage parameter must be an OSS bucket.
- CPFS data flows support both encrypted and unencrypted access to OSS. If you select SSL-encrypted access to OSS, make sure that the OSS bucket supports encryption in transit.
- If data flows for multiple CPFS file systems, or multiple data flows for the same CPFS file system, use the same OSS bucket, you must enable versioning for the OSS bucket to prevent data conflicts caused by multiple CPFS file systems exporting to one bucket.
- Data flows are not supported for OSS buckets across regions. The OSS bucket must reside in the same region as the CPFS file system.
**Note** Before you create a data flow, you must configure a tag (key: cpfs-dataflow, value: true) for the source OSS bucket so that the data flow can access the data in the OSS bucket. While a data flow is in use, do not delete or modify the tag. Otherwise, the CPFS data flow cannot access the data in the OSS bucket.
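A minimal sketch of adding that tag with the OSS Python SDK (oss2); the credentials, endpoint, and bucket name are placeholders:

```python
import oss2
from oss2.models import Tagging, TaggingRule

# Placeholder credentials, endpoint, and bucket name.
auth = oss2.Auth('<access-key-id>', '<access-key-secret>')
bucket = oss2.Bucket(auth, 'https://oss-cn-hangzhou.aliyuncs.com', 'bucket1')

# The data flow requires this exact tag on the source bucket.
rule = TaggingRule()
rule.add('cpfs-dataflow', 'true')
bucket.put_bucket_tagging(Tagging(rule))
```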
**CPFS for Lingjun usage notes**
**Source storage**
- The source storage specified by the SourceStorage parameter must be an OSS bucket.
- CPFS for Lingjun data flows support both encrypted and unencrypted access to OSS. If you select SSL-encrypted access to OSS, make sure that the OSS bucket supports encryption in transit.
- If data flows for multiple CPFS for Lingjun file systems, or multiple data flows for the same CPFS for Lingjun file system, use the same OSS bucket, you must enable versioning for the OSS bucket to prevent data conflicts caused by multiple CPFS for Lingjun file systems exporting to one bucket.
- Data flows are not supported for OSS buckets across regions. The OSS bucket must reside in the same region as the CPFS for Lingjun file system.
- CPFS for Lingjun V2.6.0 and later allow you to create data flows for OSS buckets across accounts.
- The account ID parameter is required only when you use OSS buckets across accounts.
- To use OSS buckets across accounts, you must first grant permissions to the related accounts. For more information, see Cross-account authorization on data flows.
**Note** Before you create a data flow, you must configure a tag (key: cpfs-dataflow, value: true) for the source OSS bucket so that the data flow can access the data in the OSS bucket. While a data flow is in use, do not delete or modify the tag. Otherwise, the CPFS for Lingjun data flow cannot access the data in the OSS bucket.
**Limits of data flows on file systems**
- You cannot rename a non-empty directory in a path that is associated with a data flow. Otherwise, the Permission Denied error message or an error message indicating that the directory is not empty is returned.
- Proceed with caution when you use special characters in the names of directories and files. The following characters are supported: letters, digits, exclamation points (!), hyphens (-), underscores (_), periods (.), asterisks (*), and parentheses (()).
- The path can be up to 1,023 characters in length.
**Limits of data flows on import**
- After a symbolic link is imported to CPFS for Lingjun, the symbolic link is converted into a common data file that contains no symbolic link information.
- If an OSS bucket has multiple versions, only data of the latest version is used.
- The name of a file or a subdirectory can be up to 255 bytes in length.
**Limits of data flows on export**
- After a symbolic link is synchronized to OSS, the file that the symbolic link points to is not synchronized to OSS. In this case, the symbolic link is converted into a common object that contains no data.
- Hard links can be synchronized to OSS only as common files that contain no link information.
- After a file of the Socket, Device, or Pipe type is exported to an OSS bucket, the file is converted into a common object that contains no data.
- The directory path can be up to 1,023 characters in length.
Authorization information
The following table shows the authorization information for this operation. You can use the information in the Action element of a RAM policy to grant a RAM user or RAM role the permissions to call this operation. Description:
- Operation: the value that you can use in the Action element to specify the operation on a resource.
- Access level: the access level of each operation. The levels are read, write, and list.
- Resource type: the type of the resource on which you can authorize the RAM user or the RAM role to perform the operation. Take note of the following items:
- Required resource types are marked with an asterisk (*).
- If the permissions cannot be granted at the resource level, All Resources is used in the Resource type column of the operation.
- Condition Key: the condition key that is defined by the cloud service.
- Associated operation: other operations that the RAM user or RAM role must be authorized to perform in order to complete this operation.
| Operation | Access level | Resource type | Condition key | Associated operation |
|---|---|---|---|---|
| nas:CreateDataFlow | create | *DataFlow: acs:nas:{#regionId}:{#accountId}:filesystem/{#filesystemId} | none | none |
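For example, a RAM policy statement that grants this operation on a single file system might look as follows; the region, account ID, and file system ID are hypothetical:

```json
{
  "Version": "1",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "nas:CreateDataFlow",
      "Resource": "acs:nas:cn-hangzhou:123456789012****:filesystem/cpfs-099394bd928c****"
    }
  ]
}
```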
Request parameters
| Parameter | Type | Required | Description | Example |
|---|---|---|---|---|
| FileSystemId | string | Yes | The ID of the file system. | cpfs-099394bd928c**** |
| FsetId | string | No | The fileset ID. Note: This parameter is required for CPFS file systems. | fset-1902718ea0ae**** |
| SourceStorage | string | Yes | The access path of the source storage. Format: oss://&lt;bucket name&gt;. The source storage must be an OSS bucket that resides in the same region as the file system. | oss://bucket1 |
| SourceSecurityType | string | No | The type of security mechanism for the source storage. This parameter must be specified if the source storage is accessed with a security mechanism. Valid values include SSL. | SSL |
| Throughput | long | No | The maximum data flow throughput. Unit: MB/s. Valid values: 600, 1200, and 1500. Note: The data flow throughput must be less than the I/O throughput of the file system. This parameter is required for CPFS file systems. | 600 |
| Description | string | No | The description of the data flow. | Bucket01 DataFlow |
| AutoRefreshs | array&lt;object&gt; | No | The automatic update configurations. Note: This parameter takes effect only for CPFS file systems. | |
| object | object | No | The automatic update configuration. | |
| RefreshPath | string | No | The automatic update directory. CPFS registers the data update event in the source storage, automatically checks whether the source data in the directory is updated, and imports the updated data. By default, this parameter is empty: updated data in the source storage is not automatically imported into the CPFS file system, and you must import the updated data by running a manual task. | /prefix1/prefix2/ |
| AutoRefreshPolicy | string | No | The automatic update policy. The updated data in the source storage is imported into the CPFS file system based on this policy. Note: This parameter takes effect only for CPFS file systems. | None |
| AutoRefreshInterval | long | No | The automatic update interval. CPFS checks whether data is updated in the directory at the specified interval. If data is updated, CPFS starts an automatic update task. Unit: minutes. Valid values: 10 to 525600. Default value: 10. Note: This parameter takes effect only for CPFS file systems. | 10 |
| DryRun | boolean | No | Specifies whether to perform a dry run. During the dry run, the system checks whether the request parameters are valid and whether the requested resources are available. No data flow is created and no fee is incurred. Valid values: true and false. | false |
| ClientToken | string | No | The client token that is used to ensure the idempotence of the request. You can use the client to generate the token, but you must make sure that it is unique among different requests. The token can contain only ASCII characters and cannot exceed 64 characters in length. For more information, see How do I ensure idempotence? Note: If you do not specify this parameter, the system automatically uses the request ID as the client token. The value of RequestId may be different for each API request. | 123e4567-e89b-12d3-a456-42665544**** |
| SourceStoragePath | string | No | The access path in the bucket of the source storage. Note: This parameter is required for CPFS for Lingjun file systems. | /prefix/ |
| FileSystemPath | string | No | The directory in the CPFS for Lingjun file system. Note: This parameter is required for CPFS for Lingjun file systems. | /path/ |
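A sketch of the CPFS for Lingjun variant, reusing the hypothetical nas_request helper from the first sketch; the file system ID and paths are placeholders, and the ClientToken guards idempotence across retries:

```python
import uuid

# CPFS for Lingjun variant: SourceStoragePath and FileSystemPath map an OSS
# prefix to a directory in the file system (FsetId applies to CPFS file
# systems instead). The file system ID below is hypothetical.
created = nas_request('CreateDataFlow', {
    'FileSystemId': 'bmcpfs-290w65p03ok64ya****',
    'SourceStorage': 'oss://bucket1',
    'SourceStoragePath': '/prefix/',
    'FileSystemPath': '/path/',
    'ClientToken': str(uuid.uuid4()),  # unique token for idempotent retries
})
print(created['DataFlowId'])
```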
Response parameters
Examples
Sample success responses
JSON format

```json
{
  "RequestId": "473469C7-AA6F-4DC5-B3DB-A3DC0D****3E",
  "DataFlowId": "df-194433a5be31****"
}
```

Error codes
| HTTP status code | Error code | Error message | Description |
|---|---|---|---|
| 400 | IllegalCharacters | The parameter contains illegal characters. | The parameter contains illegal characters. |
| 400 | MissingFsetId | FsetId is mandatory for this action. | - |
| 400 | MissingSourceStorage | SourceStorage is mandatory for this action. | - |
| 400 | MissingThroughput | Throughput is mandatory for this action. | - |
| 400 | MissingFileSystemId | FileSystemId is mandatory for this action. | - |
| 400 | InvalidFilesystemVersion.NotSupport | This Api does not support this fileSystem version. | This Api does not support this fileSystem version. |
| 400 | DataFlow.Bucket.RegionUnmatched | The bucket and file system are not in the same region. | The bucket is inconsistent with the filesystem region. |
| 403 | OperationDenied.DependencyViolation | The operation is denied due to dependancy violation. | - |
| 403 | OperationDenied.NoStock | The operation is denied due to no stock. | - |
| 403 | OperationDenied.DependFset | The operation is denied due to invalid fileset state. | - |
| 403 | OperationDenied.ConflictOperation | The operation is denied due to a conflict with an ongoing operation. | - |
| 403 | OperationDenied.DependMountpoint | The operation is denied because no mount point is found. | - |
| 403 | OperationDenied.FsetAlreadyInUse | The Fset is already bound to another data flow. | - |
| 403 | OperationDenied.AutoRefreshNotSupport | The operation is denied. Auto refresh is not supported. | - |
| 403 | OperationDenied.DependBucketTag | The operation is denied. The OSS Bucket tag cpfs-dataflow is missing. | - |
| 403 | OperationDenied.DataFlowNotSupported | The operation is not supported. | - |
| 403 | InvalidOperation.DeletionProtection | The operation is not allowed due to resource is protected by deletion protection. | - |
| 403 | DataFlow.Bucket.AccessDenied | Bucket access denied. | Data flow authentication error. |
| 404 | InvalidFileSystem.NotFound | The specified file system does not exist. | The specified file system does not exist. |
| 404 | InvalidThroughput.OutOfBounds | Throughput is out of bounds. | - |
| 404 | InvalidDescription.InvalidFormat | Description format is invalid. | - |
| 404 | InvalidRefreshPath.InvalidParameter | Refresh path is invalid. | - |
| 404 | InvalidRefreshPath.Duplicated | Refresh path is duplicated. | - |
| 404 | InvalidRefreshPath.NotFound | Refresh path does not exist. | - |
| 404 | InvalidRefreshPolicy.InvalidParameter | Refresh policy is invalid. | - |
| 404 | InvalidRefreshInterval.OutOfBounds | Refresh interval is out of bounds. | - |
| 404 | InvalidSourceStorage.Unreachable | Source storage cannot be accessed. | - |
| 404 | InvalidSourceStorage.NotFound | Source storage is not found. | - |
| 404 | InvalidSourceStorage.NotSupport | Source storage type is not supported. | - |
| 404 | InvalidSourceStorage.PermissionDenied | The source storage access permission is denied. | - |
| 404 | InvalidSourceStorage.InvalidRegion | Source storage region is invalid. | - |
| 404 | InvalidSourceStorage.InvalidParameter | Source storage has invalid parameters. | - |
| 404 | InvalidSourceSecurityType.NotSupport | The source security type is not supported. | - |
| 404 | InvalidAutoRefresh.TooManyAutoRefreshes | The number of auto refreshes exceeds the limit. | - |
| 404 | InvalidSourceStorage.NeedVersioning | Source storage must enable versioning. | - |
| 404 | InvalidFsetId.NotFound | The specified Fileset ID does not exist. | - |
| 404 | DataFlow.Bucket.NotExist | Bucket does not exist. | The bucket does not exist. |
For a list of error codes, visit the Service error codes.
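A sketch of branching on these error codes, assuming the aliyun-python-sdk-core exception classes and the hypothetical nas_request helper from the first sketch:

```python
from aliyunsdkcore.acs_exception.exceptions import ClientException, ServerException

try:
    created = nas_request('CreateDataFlow', {
        'FileSystemId': 'cpfs-099394bd928c****',
        'FsetId': 'fset-1902718ea0ae****',
        'SourceStorage': 'oss://bucket1',
        'Throughput': 600,
    })
except ServerException as e:
    # Error codes come from the table above.
    if e.get_error_code() == 'OperationDenied.DependBucketTag':
        print('Add the cpfs-dataflow=true tag to the source bucket first.')
    elif e.get_error_code() == 'InvalidSourceStorage.NeedVersioning':
        print('Enable versioning on the source OSS bucket.')
    else:
        print('Request failed:', e.get_error_code(), e.get_error_msg())
except ClientException as e:
    print('Client-side error:', e.get_error_msg())
```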
Change history
| Change time | Summary of changes | Operation |
|---|---|---|
| 2024-09-09 | The Error code has changed | View Change Details |
| 2024-05-31 | The Error code has changed | View Change Details |
| 2024-02-29 | The Error code has changed. The request parameters of the API have changed | View Change Details |