Creates a data flow between a Cloud Parallel File Storage (CPFS) file system and a source storage.
Operation description
**Basic operations**
- Only Cloud Parallel File Storage (CPFS) V2.2.0 and later and CPFS for Lingjun V2.4.0 and later support data flows.
- You can create a data flow only when a CPFS or CPFS for Lingjun file system is in the Running state.
- A maximum of 10 data flows can be created for a CPFS or CPFS for Lingjun file system.
- It generally takes 2 to 5 minutes to create a data flow. You can call the DescribeDataFlows operation to check whether the data flow has been created, as shown in the sketch after this list.
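The following is a minimal sketch of this workflow in Python, assuming the aliyun-python-sdk-core package, the NAS API version 2017-06-26, and the nas.cn-hangzhou.aliyuncs.com endpoint; the credentials, file system ID, and fileset ID are placeholders, and the response shape used for the status check is an assumption:

```python
import json
import time

from aliyunsdkcore.client import AcsClient
from aliyunsdkcore.request import CommonRequest

# Placeholder credentials and region; this is a sketch, not production code.
client = AcsClient('<access-key-id>', '<access-key-secret>', 'cn-hangzhou')


def nas_request(action, params):
    """Send an RPC-style request to the NAS API (assumed version 2017-06-26)."""
    request = CommonRequest()
    request.set_domain('nas.cn-hangzhou.aliyuncs.com')  # assumed endpoint
    request.set_version('2017-06-26')
    request.set_action_name(action)
    request.set_method('POST')
    for key, value in params.items():
        request.add_query_param(key, str(value))
    return json.loads(client.do_action_with_exception(request))


fs_id = 'cpfs-099394bd928c****'  # hypothetical file system ID

# Create the data flow; FsetId is required for CPFS file systems.
created = nas_request('CreateDataFlow', {
    'FileSystemId': fs_id,
    'FsetId': 'fset-1902718ea0ae****',  # hypothetical fileset ID
    'SourceStorage': 'oss://bucket1',
    'Throughput': 600,
})
print('DataFlowId:', created['DataFlowId'])

# Creation usually takes 2 to 5 minutes; poll DescribeDataFlows until the
# new data flow reports the Running status. The exact response shape is an
# assumption; inspect the payload in practice.
for _ in range(20):
    flows = nas_request('DescribeDataFlows', {'FileSystemId': fs_id})
    if '"Status": "Running"' in json.dumps(flows):  # crude check for a sketch
        break
    time.sleep(30)
```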
**Permissions**
When you create a data flow, CPFS obtains the following two service-linked roles: AliyunServiceRoleForNasOssDataflow and AliyunServiceRoleForNasEventNotification. For more information, see CPFS service-linked roles.
**CPFS usage notes**
**Billing**
- If you create a data flow, you are charged for using the data flow throughput. For more information, see Billing of CPFS.
- When you configure the AutoRefresh feature for a data flow, CPFS must use EventBridge to collect object modification events from the source Object Storage Service (OSS) bucket. Event fees are incurred. For more information, see Billing of EventBridge.
**Data flow specifications**
- The data flow throughput supports the following specifications: 600 MB/s, 1,200 MB/s, and 1,500 MB/s. The data flow throughput is the maximum transmission bandwidth that can be reached when data is imported or exported for a data flow.
- When you create a data flow, IP addresses in the vSwitch that is used by the CPFS mount target are consumed. Make sure that the vSwitch can provide sufficient IP addresses.
- Inventory query: if you set the DryRun parameter to true, you can check whether sufficient resources are available for a data flow with the specified throughput without actually creating it (a dry-run sketch follows this list).
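For example, a dry-run request (reusing the hypothetical nas_request helper from the sketch above) validates the parameters and resource inventory without creating anything:

```python
# Dry run: the system only validates the request and checks resource
# inventory; no data flow is created and no fee is incurred.
# nas_request is the hypothetical helper from the earlier sketch.
nas_request('CreateDataFlow', {
    'FileSystemId': 'cpfs-099394bd928c****',
    'FsetId': 'fset-1902718ea0ae****',
    'SourceStorage': 'oss://bucket1',
    'Throughput': 1500,
    'DryRun': 'true',
})
```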
**Fileset**
- The destination for a data flow is a fileset in the CPFS file system. A fileset is a directory subtree (a small directory tree) in a CPFS file system. Each fileset independently manages an inode space.
- When you create a data flow for a CPFS file system, the related fileset must already exist and cannot be nested within another fileset. Only one data flow can be created per fileset, and each data flow corresponds to one source storage.
- A fileset supports a maximum of one million files. If the number of files imported from an OSS bucket into the fileset exceeds the upper limit, a "no space" error message is returned when you add new files.
**Note** If data already exists in the fileset, after you create a data flow, the existing data in the fileset is cleared and replaced with the data synchronized from the OSS bucket.
**AutoRefresh**
- After AutoRefresh is configured, if the data in the source OSS bucket is updated, the updated metadata is automatically synchronized to the CPFS file system. You can load the updated data when you access files, or run a data flow task to load the updated data.
- AutoRefresh depends on the object modification events collected by EventBridge from the source OSS bucket. You must first activate EventBridge.
- The AutoRefresh configuration applies only to the prefix specified by the RefreshPath parameter. You can configure a maximum of five AutoRefresh directories for a data flow. A request sketch follows the note at the end of this section.
- AutoRefreshInterval specifies the interval at which CPFS checks whether data is updated in the prefix of the source OSS bucket. If data is updated, CPFS runs an AutoRefresh task. If the frequency of object modification events in the source OSS bucket exceeds the processing capability of the CPFS data flow, AutoRefresh tasks accumulate, metadata updates are delayed, and the data flow status changes to Misconfigured. To resolve these issues, increase the data flow specifications or reduce the frequency of object modification events.
- When you add an AutoRefresh configuration to a prefix for a CPFS data flow, an event bus is created on the user side and an event rule is created for the prefix of the source OSS bucket. When an object is modified in the prefix of the source OSS bucket, an OSS event is generated in the EventBridge console and processed by the CPFS data flow.
**Note** The event buses and event rules created for CPFS in the EventBridge console contain the "Create for cpfs auto refresh" description. Do not modify or delete these event buses or event rules. Otherwise, AutoRefresh cannot work properly.
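As a sketch of the request shape: AutoRefreshs is an array of objects, and this example assumes the usual RPC flattening of array members into indexed query keys (AutoRefreshs.N.RefreshPath); it reuses the hypothetical nas_request helper from the earlier sketch:

```python
# Array members of AutoRefreshs are flattened into indexed query keys
# (assumed AutoRefreshs.N.RefreshPath naming, the usual RPC convention).
nas_request('CreateDataFlow', {
    'FileSystemId': 'cpfs-099394bd928c****',
    'FsetId': 'fset-1902718ea0ae****',
    'SourceStorage': 'oss://bucket1',
    'Throughput': 600,
    'AutoRefreshs.1.RefreshPath': '/prefix1/prefix2/',
    'AutoRefreshPolicy': 'None',
    'AutoRefreshInterval': 10,  # minutes; valid range: 10 to 525600
})
```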
**Source storage**
- The source storage specified by the SourceStorage parameter must be an OSS bucket.
- CPFS data flows support both encrypted and unencrypted access to OSS. If you select SSL-encrypted access to OSS, make sure that the OSS bucket supports encryption in transit.
- If data flows for multiple CPFS file systems, or multiple data flows for the same CPFS file system, use the same OSS bucket, you must enable versioning for the OSS bucket to prevent data conflicts caused by multiple CPFS file systems exporting to one bucket.
- Data flows are not supported for OSS buckets across regions. The OSS bucket must reside in the same region as the CPFS file system.
**Note** Before you create a data flow, you must configure a tag (key: cpfs-dataflow, value: true) for the source OSS bucket so that the data flow can access the data in the OSS bucket. While a data flow is in use, do not delete or modify the tag. Otherwise, the CPFS data flow cannot access the data in the OSS bucket.
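A minimal sketch of adding that tag with the OSS Python SDK (oss2); the credentials, endpoint, and bucket name are placeholders:

```python
import oss2
from oss2.models import Tagging, TaggingRule

# Placeholder credentials, endpoint, and bucket name.
auth = oss2.Auth('<access-key-id>', '<access-key-secret>')
bucket = oss2.Bucket(auth, 'https://oss-cn-hangzhou.aliyuncs.com', 'bucket1')

# The data flow requires this exact tag on the source bucket.
rule = TaggingRule()
rule.add('cpfs-dataflow', 'true')
bucket.put_bucket_tagging(Tagging(rule))
```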
**CPFS for Lingjun usage notes**
**Source storage**
- The source storage specified by the SourceStorage parameter must be an OSS bucket.
- CPFS for Lingjun data flows support both encrypted and unencrypted access to OSS. If you select SSL-encrypted access to OSS, make sure that the OSS bucket supports encryption in transit.
- If data flows for multiple CPFS for Lingjun file systems, or multiple data flows for the same CPFS for Lingjun file system, use the same OSS bucket, you must enable versioning for the OSS bucket to prevent data conflicts caused by multiple CPFS for Lingjun file systems exporting to one bucket.
- Data flows are not supported for OSS buckets across regions. The OSS bucket must reside in the same region as the CPFS for Lingjun file system.
- CPFS for Lingjun V2.6.0 and later allow you to create data flows for OSS buckets across accounts.
- The account ID parameter is required only when you use OSS buckets across accounts.
- To use OSS buckets across accounts, you must first grant permissions to the related accounts. For more information, see Cross-account authorization on data flows.
**Note** Before you create a data flow, you must configure a tag (key: cpfs-dataflow, value: true) for the source OSS bucket so that the data flow can access the data in the OSS bucket. While a data flow is in use, do not delete or modify the tag. Otherwise, the CPFS for Lingjun data flow cannot access the data in the OSS bucket.
**Limits of data flows on file systems**
- You cannot rename a non-empty directory in a path that is associated with a data flow. Otherwise, the Permission Denied error message or an error message indicating that the directory is not empty is returned.
- Proceed with caution when you use special characters in the names of directories and files. The following characters are supported: letters, digits, exclamation points (!), hyphens (-), underscores (_), periods (.), asterisks (*), and parentheses (()).
- The path can be up to 1,023 characters in length.
**Limits of data flows on import**
- After a symbolic link is imported to CPFS for Lingjun, the symbolic link is converted into a common data file that contains no symbolic link information.
- If an OSS bucket has multiple versions, only data of the latest version is used.
- The name of a file or a subdirectory can be up to 255 bytes in length.
**Limits of data flows on export**
- After a symbolic link is synchronized to OSS, the file that the symbolic link points to is not synchronized to OSS. In this case, the symbolic link is converted into a common object that contains no data.
- Hard links can be synchronized to OSS only as common files that contain no link information.
- After a file of the Socket, Device, or Pipe type is exported to an OSS bucket, the file is converted into a common object that contains no data.
- The directory path can be up to 1,023 characters in length.
Authorization information
The following table shows the authorization information for this operation. You can use the information in the Action element of a RAM policy to grant a RAM user or RAM role the permissions to call this operation. Description:
- Operation: the value that you can use in the Action element to specify the operation on a resource.
- Access level: the access level of each operation. The levels are read, write, and list.
- Resource type: the type of the resource on which you can authorize the RAM user or the RAM role to perform the operation. Take note of the following items:
- Required resource types are marked with an asterisk (*).
- If the permissions cannot be granted at the resource level, All Resources is used in the Resource type column of the operation.
- Condition Key: the condition key that is defined by the cloud service.
- Associated operation: other operations that the RAM user or RAM role must be authorized to perform in order to complete this operation.
| Operation | Access level | Resource type | Condition key | Associated operation |
|---|---|---|---|---|
| nas:CreateDataFlow | create | *DataFlow: acs:nas:{#regionId}:{#accountId}:filesystem/{#filesystemId} | none | none |
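For example, a RAM policy statement that grants this operation on a single file system might look as follows; the region, account ID, and file system ID are hypothetical:

```json
{
  "Version": "1",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "nas:CreateDataFlow",
      "Resource": "acs:nas:cn-hangzhou:123456789012****:filesystem/cpfs-099394bd928c****"
    }
  ]
}
```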
Request parameters
| Parameter | Type | Required | Description | Example |
|---|---|---|---|---|
| FileSystemId | string | Yes | The ID of the file system. | cpfs-099394bd928c**** |
| FsetId | string | No | The fileset ID. Note: This parameter is required for CPFS file systems. | fset-1902718ea0ae**** |
| SourceStorage | string | Yes | The access path of the source storage. Format: oss://&lt;bucket name&gt;. The source storage must be an OSS bucket that resides in the same region as the file system. | oss://bucket1 |
| SourceSecurityType | string | No | The type of security mechanism for the source storage. This parameter must be specified if the source storage is accessed with a security mechanism. Valid values include SSL. | SSL |
| Throughput | long | No | The maximum data flow throughput. Unit: MB/s. Valid values: 600, 1200, and 1500. Note: The data flow throughput must be less than the I/O throughput of the file system. This parameter is required for CPFS file systems. | 600 |
| Description | string | No | The description of the data flow. | Bucket01 DataFlow |
| AutoRefreshs | array&lt;object&gt; | No | The automatic update configurations. Note: This parameter takes effect only for CPFS file systems. | |
| object | object | No | The automatic update configuration. | |
| RefreshPath | string | No | The automatic update directory. CPFS registers the data update event in the source storage, automatically checks whether the source data in the directory is updated, and imports the updated data. By default, this parameter is empty: updated data in the source storage is not automatically imported into the CPFS file system, and you must import the updated data by running a manual task. | /prefix1/prefix2/ |
| AutoRefreshPolicy | string | No | The automatic update policy. The updated data in the source storage is imported into the CPFS file system based on this policy. Note: This parameter takes effect only for CPFS file systems. | None |
| AutoRefreshInterval | long | No | The automatic update interval. CPFS checks whether data is updated in the directory at the specified interval. If data is updated, CPFS starts an automatic update task. Unit: minutes. Valid values: 10 to 525600. Default value: 10. Note: This parameter takes effect only for CPFS file systems. | 10 |
| DryRun | boolean | No | Specifies whether to perform a dry run. During the dry run, the system checks whether the request parameters are valid and whether the requested resources are available. No data flow is created and no fee is incurred. Valid values: true and false. | false |
| ClientToken | string | No | The client token that is used to ensure the idempotence of the request. You can use the client to generate the token, but you must make sure that it is unique among different requests. The token can contain only ASCII characters and cannot exceed 64 characters in length. For more information, see How do I ensure idempotence? Note: If you do not specify this parameter, the system automatically uses the request ID as the client token. The value of RequestId may be different for each API request. | 123e4567-e89b-12d3-a456-42665544**** |
| SourceStoragePath | string | No | The access path in the bucket of the source storage. Note: This parameter is required for CPFS for Lingjun file systems. | /prefix/ |
| FileSystemPath | string | No | The directory in the CPFS for Lingjun file system. Note: This parameter is required for CPFS for Lingjun file systems. | /path/ |
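A sketch of the CPFS for Lingjun variant, reusing the hypothetical nas_request helper from the first sketch; the file system ID and paths are placeholders, and the ClientToken guards idempotence across retries:

```python
import uuid

# CPFS for Lingjun variant: SourceStoragePath and FileSystemPath map an OSS
# prefix to a directory in the file system (FsetId applies to CPFS file
# systems instead). The file system ID below is hypothetical.
created = nas_request('CreateDataFlow', {
    'FileSystemId': 'bmcpfs-290w65p03ok64ya****',
    'SourceStorage': 'oss://bucket1',
    'SourceStoragePath': '/prefix/',
    'FileSystemPath': '/path/',
    'ClientToken': str(uuid.uuid4()),  # unique token for idempotent retries
})
print(created['DataFlowId'])
```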
Response parameters
Examples
Sample success responses
JSON format

```json
{
  "RequestId": "473469C7-AA6F-4DC5-B3DB-A3DC0D****3E",
  "DataFlowId": "df-194433a5be31****"
}
```

Error codes
| HTTP status code | Error code | Error message | Description |
|---|---|---|---|
| 400 | IllegalCharacters | The parameter contains illegal characters. | The parameter contains illegal characters. |
| 400 | MissingFsetId | FsetId is mandatory for this action. | - |
| 400 | MissingSourceStorage | SourceStorage is mandatory for this action. | - |
| 400 | MissingThroughput | Throughput is mandatory for this action. | - |
| 400 | MissingFileSystemId | FileSystemId is mandatory for this action. | - |
| 400 | InvalidFilesystemVersion.NotSupport | This Api does not support this fileSystem version. | This Api does not support this fileSystem version. |
| 400 | DataFlow.Bucket.RegionUnmatched | The bucket and file system are not in the same region. | The bucket is inconsistent with the filesystem region. |
| 403 | OperationDenied.DependencyViolation | The operation is denied due to dependancy violation. | - |
| 403 | OperationDenied.NoStock | The operation is denied due to no stock. | - |
| 403 | OperationDenied.DependFset | The operation is denied due to invalid fileset state. | - |
| 403 | OperationDenied.ConflictOperation | The operation is denied due to a conflict with an ongoing operation. | - |
| 403 | OperationDenied.DependMountpoint | The operation is denied because no mount point is found. | - |
| 403 | OperationDenied.FsetAlreadyInUse | The Fset is already bound to another data flow. | - |
| 403 | OperationDenied.AutoRefreshNotSupport | The operation is denied. Auto refresh is not supported. | - |
| 403 | OperationDenied.DependBucketTag | The operation is denied. The OSS Bucket tag cpfs-dataflow is missing. | - |
| 403 | OperationDenied.DataFlowNotSupported | The operation is not supported. | - |
| 403 | InvalidOperation.DeletionProtection | The operation is not allowed due to resource is protected by deletion protection. | - |
| 403 | DataFlow.Bucket.AccessDenied | Bucket access denied. | Data flow authentication error. |
| 404 | InvalidFileSystem.NotFound | The specified file system does not exist. | The specified file system does not exist. |
| 404 | InvalidThroughput.OutOfBounds | Throughput is out of bounds. | - |
| 404 | InvalidDescription.InvalidFormat | Description format is invalid. | - |
| 404 | InvalidRefreshPath.InvalidParameter | Refresh path is invalid. | - |
| 404 | InvalidRefreshPath.Duplicated | Refresh path is duplicated. | - |
| 404 | InvalidRefreshPath.NotFound | Refresh path does not exist. | - |
| 404 | InvalidRefreshPolicy.InvalidParameter | Refresh policy is invalid. | - |
| 404 | InvalidRefreshInterval.OutOfBounds | Refresh interval is out of bounds. | - |
| 404 | InvalidSourceStorage.Unreachable | Source storage cannot be accessed. | - |
| 404 | InvalidSourceStorage.NotFound | Source storage is not found. | - |
| 404 | InvalidSourceStorage.NotSupport | Source storage type is not supported. | - |
| 404 | InvalidSourceStorage.PermissionDenied | The source storage access permission is denied. | - |
| 404 | InvalidSourceStorage.InvalidRegion | Source storage region is invalid. | - |
| 404 | InvalidSourceStorage.InvalidParameter | Source storage has invalid parameters. | - |
| 404 | InvalidSourceSecurityType.NotSupport | The source security type is not supported. | - |
| 404 | InvalidAutoRefresh.TooManyAutoRefreshes | The number of auto refreshes exceeds the limit. | - |
| 404 | InvalidSourceStorage.NeedVersioning | Source storage must enable versioning. | - |
| 404 | InvalidFsetId.NotFound | The specified Fileset ID does not exist. | - |
| 404 | DataFlow.Bucket.NotExist | Bucket does not exist. | The bucket does not exist. |
For a list of error codes, visit the Service error codes.
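A sketch of branching on these error codes, assuming the aliyun-python-sdk-core exception classes and the hypothetical nas_request helper from the first sketch:

```python
from aliyunsdkcore.acs_exception.exceptions import ClientException, ServerException

try:
    created = nas_request('CreateDataFlow', {
        'FileSystemId': 'cpfs-099394bd928c****',
        'FsetId': 'fset-1902718ea0ae****',
        'SourceStorage': 'oss://bucket1',
        'Throughput': 600,
    })
except ServerException as e:
    # Error codes come from the table above.
    if e.get_error_code() == 'OperationDenied.DependBucketTag':
        print('Add the cpfs-dataflow=true tag to the source bucket first.')
    elif e.get_error_code() == 'InvalidSourceStorage.NeedVersioning':
        print('Enable versioning on the source OSS bucket.')
    else:
        print('Request failed:', e.get_error_code(), e.get_error_msg())
except ClientException as e:
    print('Client-side error:', e.get_error_msg())
```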
Change history
| Change time | Summary of changes | Operation |
|---|---|---|
| 2024-09-09 | The Error code has changed | View Change Details |
| 2024-05-31 | The Error code has changed | View Change Details |
| 2024-02-29 | The Error code has changed. The request parameters of the API have changed | View Change Details |