
Platform For AI:CreateDataset

Last Updated: May 23, 2025

Creates a dataset.

Debugging

You can call this operation directly in OpenAPI Explorer, which saves you the trouble of calculating signatures. After the call succeeds, OpenAPI Explorer can automatically generate SDK code samples.

Authorization information

The following table shows the authorization information corresponding to the API. The authorization information can be used in the Action policy element to grant a RAM user or RAM role the permissions to call this API operation. Description:

  • Operation: the value that you can use in the Action element to specify the operation on a resource.
  • Access level: the access level of each operation. The levels are read, write, and list.
  • Resource type: the type of the resource on which you can authorize the RAM user or the RAM role to perform the operation. Take note of the following items:
    • Required resource types are marked with an asterisk (*).
    • If the permissions cannot be granted at the resource level, All Resources is used in the Resource type column of the operation.
  • Condition Key: the condition key that is defined by the cloud service.
  • Associated operation: other operations that the RAM user or the RAM role must have permissions to perform to complete the operation. To complete the operation, the RAM user or the RAM role must have the permissions to perform the associated operations.
  • Operation: paidataset:CreateDataset
  • Access level: create
  • Resource type: *All Resources
  • Condition key: none
  • Associated operation: none

Request syntax

POST /api/v1/datasets HTTP/1.1
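If you assemble the request yourself instead of using an SDK, the request body is a JSON object whose fields match the request parameters below. A minimal sketch in Python; the values are the placeholder examples from this topic, and request signing is omitted because it is normally handled by the SDK or OpenAPI Explorer:

```python
import json

# Minimal request body containing only the required parameters.
# Values are the placeholder examples used in this topic.
body = {
    "Name": "myName",                       # dataset name
    "Property": "DIRECTORY",                # FILE or DIRECTORY
    "DataSourceType": "NAS",                # OSS or NAS
    "Uri": "nas://09f****f2.cn-hangzhou/",  # data source URI
}

# Serialized payload for: POST /api/v1/datasets HTTP/1.1
payload = json.dumps(body)
print(payload)
```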

Request parameters

body (object, optional)

The request data.

Name (string, required)

The dataset name. The name must meet the following requirements:

  • The name must start with a letter, digit, or Chinese character.
  • The name can contain underscores (_) and hyphens (-).
  • The name must be 1 to 127 characters in length.
Example: myName
Property (string, required)

The property of the dataset. Valid values:

  • FILE
  • DIRECTORY
Example: DIRECTORY
DataSourceType (string, required)

The data source type. Valid values:

  • OSS: Object Storage Service (OSS).
  • NAS: File Storage NAS (NAS).
Example: NAS
Uri (string, required)

The URI of the data source.

  • If DataSourceType is set to OSS, the format is oss://bucket.endpoint/object.
  • If DataSourceType is set to NAS:
    • General-purpose NAS: nas://<nasfisid>.region/subpath/to/dir/.
    • CPFS 1.0: nas://<cpfs-fsid>.region/subpath/to/dir/.
    • CPFS 2.0: nas://<cpfs-fsid>.region/<protocolserviceid>/.
    You can distinguish CPFS 1.0 from CPFS 2.0 file systems by the format of the file system ID: a CPFS 1.0 ID is in the cpfs-<8 ASCII characters> format, and a CPFS 2.0 ID is in the cpfs-<16 ASCII characters> format.
Example: nas://09f****f2.cn-hangzhou/
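The CPFS 1.0 versus CPFS 2.0 distinction above can be checked programmatically. A sketch in Python, assuming the ID suffix is alphanumeric; the helper name is illustrative, not part of the API:

```python
import re

def cpfs_version(file_system_id: str) -> str:
    """Tell CPFS 1.0 from CPFS 2.0 by the file system ID format:
    cpfs-<8 ASCII characters> is 1.0, cpfs-<16 ASCII characters> is 2.0."""
    match = re.fullmatch(r"cpfs-(\w{8}|\w{16})", file_system_id)
    if match is None:
        raise ValueError(f"not a recognized CPFS file system ID: {file_system_id!r}")
    return "1.0" if len(match.group(1)) == 8 else "2.0"

print(cpfs_version("cpfs-a1b2c3d4"))          # a 1.0-style ID
print(cpfs_version("cpfs-a1b2c3d4e5f6a7b8"))  # a 2.0-style ID
```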
DataType (string, optional)

The type of the dataset. Default value: COMMON. Valid values:

  • COMMON: common
  • PIC: picture
  • TEXT: text
  • VIDEO: video
  • AUDIO: audio
Example: COMMON
Labels (array, optional)

The tags.

Label (object, optional)

The tag to be added to the dataset.

SourceType (string, optional)

The type of the data source. Default value: USER.

Valid values:

  • PAI_PUBLIC_DATASET: a public dataset of PAI.
  • ITAG: a dataset generated from a labeling job of iTAG.
  • USER: a dataset registered by a user.
Example: USER
SourceId (string, optional)

The data source ID.

  • If SourceType is set to USER, the value of SourceId is a custom string.
  • If SourceType is set to ITAG, the value of SourceId is the ID of the labeling job of iTAG.
  • If SourceType is set to PAI_PUBLIC_DATASET, SourceId is empty by default.
Example: jdnhf***fnrimv
Description (string, optional)

The description of the dataset. Descriptions are used to differentiate datasets.

WorkspaceId (string, optional)

The ID of the workspace to which the dataset belongs. You can call ListWorkspaces to obtain the workspace ID. If you do not specify this parameter, the default workspace is used. If the default workspace does not exist, an error is reported.

Example: 478**
Options (string, optional)

The extended field, which is a JSON string. When you use the dataset in Deep Learning Containers (DLC), you can configure the mountPath field to specify the default mount path of the dataset.

Example: { "mountPath": "/mnt/data/" }
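Because Options is a JSON string rather than a nested object, serialize it before adding it to the request body. A sketch in Python; the other required fields are omitted for brevity:

```python
import json

# Options is a JSON *string*; mountPath sets the default mount path in DLC.
options = json.dumps({"mountPath": "/mnt/data/"})

# Attach the serialized string to the request body (other fields omitted).
body = {"Name": "myName", "Options": options}
```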
Accessibility (string, optional)

The accessibility of the dataset in the workspace. Valid values:

  • PRIVATE: The dataset is accessible only to you and the administrator of the workspace. This is the default value.
  • PUBLIC: The dataset is accessible to all users in the workspace.
Example: PRIVATE
ProviderType (string, optional)

The source type of the dataset. Valid values:

  • Ecs (default)
  • Lingjun
Example: Ecs
Provider (string, optional)

The dataset provider. The value cannot be set to pai.

Example: Github
UserId (string, optional)

The ID of the Alibaba Cloud account to which the dataset belongs. The workspace owner and administrator have permissions to create datasets for specified members in the workspace.

Example: 2485765****023475
SourceDatasetId (string, optional)

The ID of the source dataset for the labeled dataset.

Example: d-bvfasdfxxxxj8o411
SourceDatasetVersion (string, optional)

The version of the source dataset for the labeled dataset.

Example: v2
VersionDescription (string, optional)

The description of the initial version of the dataset.

Example: The initial version
VersionLabels (array, optional)

The tags to be added to the initial version of the dataset.

Label (object, optional)

The tag to be added to the initial version of the dataset.

DataSize (long, optional)

The size of the dataset file. Unit: bytes.

Example: 10000
DataCount (long, optional)

The number of dataset files.

Example: 500
MountAccessReadWriteRoleIdList (array, optional)

The list of role names in the workspace that have read and write permissions on the mounted dataset. Names that start with PAI are basic role names, and names that start with role- are custom role names. If the list contains an asterisk (*), all roles have read and write permissions.

  • If you set the value to ["PAI.AlgoOperator", "role-hiuwpd01ncrokkgp21"], the account of the specified role is granted the read and write permissions.
  • If you set the value to ["*"], all accounts are granted the read and write permissions.
  • If you set the value to [], only the creator of the dataset has the read and write permissions.
string (optional)

The ID of the workspace role.

Example: PAI.AlgoOperator
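The three forms of MountAccessReadWriteRoleIdList described above, written as Python literals; the role IDs are the illustrative values from this topic:

```python
# Grant read/write access to specific roles only.
specific_roles = ["PAI.AlgoOperator", "role-hiuwpd01ncrokkgp21"]

# Grant read/write access to all roles in the workspace.
all_roles = ["*"]

# Grant read/write access to the dataset creator only.
creator_only = []
```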
ImportInfo (string, optional)

The configurations for importing the dataset from a storage system, such as OSS, NAS, or Cloud Parallel File Storage (CPFS).

OSS

{
  "region": "${region}",  // The region ID.
  "bucket": "${bucket}",  // The bucket name.
  "path": "${path}"       // The file path.
}

NAS

{
  "region": "${region}",                // The region ID.
  "fileSystemId": "${file_system_id}",  // The file system ID.
  "path": "${path}",                    // The file system path.
  "mountTarget": "${mount_target}"      // The mount point of the file system.
}

CPFS

{
  "region": "${region}",                          // The region ID.
  "fileSystemId": "${file_system_id}",            // The file system ID.
  "protocolServiceId": "${protocol_service_id}",  // The file system protocol service.
  "exportId": "${export_id}",                     // The file system export directory.
  "path": "${path}"                               // The file system path.
}

CPFS for Lingjun

{
  "region": "${region}",                // The region ID.
  "fileSystemId": "${file_system_id}",  // The file system ID.
  "path": "${path}",                    // The file system path.
  "mountTarget": "${mount_target}",     // The mount point of the file system. CPFS for Lingjun only.
  "isVpcMount": boolean                 // Whether the mount point is a virtual private cloud (VPC) mount point. CPFS for Lingjun only.
}

Example: { "region": "cn-wulanchabu", "fileSystemId": "bmcpfs-xxxxxxxxxxx", "path": "/mnt", "mountTarget": "cpfs-xxxxxxxxxxxx-vpc-gacs9f.cn-wulanchabu.cpfs.aliyuncs.com", "isVpcMount": true }
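Like Options, ImportInfo is passed as a JSON string. A sketch in Python that serializes the CPFS for Lingjun example above before it is placed in the request body:

```python
import json

# ImportInfo is a JSON *string*; serialize the structure before sending.
# Values are the placeholder examples from this topic.
import_info = json.dumps({
    "region": "cn-wulanchabu",
    "fileSystemId": "bmcpfs-xxxxxxxxxxx",
    "path": "/mnt",
    "mountTarget": "cpfs-xxxxxxxxxxxx-vpc-gacs9f.cn-wulanchabu.cpfs.aliyuncs.com",
    "isVpcMount": True,  # serialized as JSON true
})
```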

Response parameters

object

The returned data.

RequestId (string)

The request ID.

Example: B2C51F93-1C07-5477-9705-5FDB****F19F
DatasetId (string)

The dataset ID.

Example: d-rbvg5*****jhc9ks92

Examples

Sample success responses

JSON format

{
  "RequestId": "B2C51F93-1C07-5477-9705-5FDB****F19F",
  "DatasetId": "d-rbvg5*****jhc9ks92"
}

Error codes

For a list of error codes, see Service error codes.

Change history

  • 2025-02-06: The internal configuration of the API is changed, but the call is not affected.
  • 2024-10-18: The internal configuration of the API is changed, but the call is not affected.
  • 2024-07-09: The internal configuration of the API is changed, but the call is not affected.
  • 2024-06-20: The internal configuration of the API is changed, but the call is not affected.
  • 2024-02-27: The internal configuration of the API is changed, but the call is not affected.
  • 2023-04-26: The internal configuration of the API is changed, but the call is not affected.