
Platform For AI:CreateDataset

Last Updated:Mar 30, 2026

Creates a dataset.

RAM authorization

The table below describes the authorization required to call this API. You can define it in a Resource Access Management (RAM) policy. The table's columns are detailed below:

  • Action: The actions can be used in the Action element of RAM permission policy statements to grant permissions to perform the operation.

  • API: The API that you can call to perform the action.

  • Access level: The predefined level of access granted for each API. Valid values: create, list, get, update, and delete.

  • Resource type: The type of the resource that supports authorization to perform the action. It indicates if the action supports resource-level permission. The specified resource must be compatible with the action. Otherwise, the policy will be ineffective.

    • For APIs with resource-level permissions, required resource types are marked with an asterisk (*). Specify the corresponding Alibaba Cloud Resource Name (ARN) in the Resource element of the policy.

    • For APIs without resource-level permissions, it is shown as All Resources. Use an asterisk (*) in the Resource element of the policy.

  • Condition key: The condition keys defined by the service. The key allows for granular control, applying to either actions alone or actions associated with specific resources. In addition to service-specific condition keys, Alibaba Cloud provides a set of common condition keys applicable across all RAM-supported services.

  • Dependent action: The dependent actions required to run the action. To complete the action, the RAM user or the RAM role must have the permissions to perform all dependent actions.

The authorization information for CreateDataset is as follows:

  • Action: paidataset:CreateDataset

  • Access level: create

  • Resource type: All Resources (*)

  • Condition key: None

  • Dependent action: None
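
The authorization entry above can be turned into a policy document. The following is a minimal sketch, assuming the standard RAM policy layout (Version "1" with a Statement list); the helper name build_create_dataset_policy is hypothetical and used for illustration only.

```python
import json


def build_create_dataset_policy() -> dict:
    """Return a RAM policy document that allows paidataset:CreateDataset.

    The operation has no resource-level permissions, so the Resource
    element is an asterisk (*), as described in the list above.
    """
    return {
        "Version": "1",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["paidataset:CreateDataset"],
                "Resource": ["*"],
            }
        ],
    }


if __name__ == "__main__":
    # Print the policy as JSON so that it can be pasted into a policy editor.
    print(json.dumps(build_create_dataset_policy(), indent=2))
```

Attach the resulting policy to the RAM user or RAM role that calls the operation.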

Request syntax

POST /api/v1/datasets HTTP/1.1

Request parameters

Parameter

Type

Required

Description

Example

body

object

No

The request body.

Name

string

Yes

The name of the dataset. The naming convention is as follows:

  • Starts with a lowercase letter, an uppercase letter, a digit, or a Chinese character.

  • Can contain underscores (_) or hyphens (-).

  • Must be 1 to 127 characters in length.

myName

Property

string

Yes

The property of the dataset. Valid values:

  • FILE: a file.

  • DIRECTORY: a folder.

DIRECTORY

DataSourceType

string

Yes

The type of the data source. Valid values:

  • OSS: Alibaba Cloud Object Storage Service (OSS).

  • NAS: General-purpose Alibaba Cloud File Storage (NAS).

  • EXTREMENAS: Extreme Alibaba Cloud NAS.

  • CPFS: General-purpose Edition of Cloud Parallel File System (CPFS).

  • BMCPFS: AI Computing Edition of CPFS.

  • MAXCOMPUTE: Alibaba Cloud cloud-native big data computing service (MaxCompute).

  • URL: a public HTTP or HTTPS URL.

NAS

Uri

string

Yes

The URI of the data source. The following examples show the URI format:

  • For an OSS data source: oss://bucket.endpoint/object

  • For a NAS data source:

    • General-purpose NAS: nas://<nasfisid>.region/subpath/to/dir/

    • CPFS 1.0: nas://<cpfs-fsid>.region/subpath/to/dir/

    • CPFS 2.0: nas://<cpfs-fsid>.region/<protocolserviceid>/

    CPFS 1.0 and CPFS 2.0 are distinguished by the format of the fsid. The fsid for CPFS 1.0 is in the cpfs-<8-character ASCII string> format. The fsid for CPFS 2.0 is in the cpfs-<16-character ASCII string> format.

nas://09f****f2.cn-hangzhou/

DataType

string

No

The data type of the dataset. The default value is COMMON. Valid values:

  • COMMON: common data.

  • PIC: images.

  • TEXT: text.

  • VIDEO: videos.

  • AUDIO: audio.

COMMON

Labels

array

No

The list of labels.

Label

No

The labels to add to the dataset.

SourceType

string

No

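The source type of the dataset, which indicates where the dataset comes from. Valid values:

  • USER (default)

  • ITAG: The dataset is generated from the annotation results of the iTAG module.

  • PAI_PUBLIC_DATASET: The dataset is created from a PAI public dataset.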

USER

SourceId

string

No

The ID of the data source.

  • If SourceType is USER, you can customize SourceId.

  • If SourceType is ITAG, which indicates that the dataset is generated from the annotation results of the iTAG module, SourceId is the task ID in iTAG.

  • If SourceType is PAI_PUBLIC_DATASET, which indicates that the dataset is created from a PAI public dataset, SourceId is empty by default.

jdnhf***fnrimv

Description

string

No

A custom description of the dataset. This helps you distinguish it from other datasets.

This is a description of the dataset.

WorkspaceId

string

No

The ID of the workspace where the dataset resides. For more information about how to obtain a workspace ID, see ListWorkspaces. If you do not specify this parameter, the default workspace is used. If the default workspace does not exist, an error is reported.

478**

Options

string

No

The extended field, which is a JSON string. When a Deep Learning Containers (DLC) job uses the dataset, you can configure the mountPath field to specify the default mount path for the dataset.

{ "mountPath": "/mnt/data/" }

Accessibility

string

No

The visibility of the dataset in the workspace. Valid values:

  • PRIVATE (default): The dataset is visible only to its owner and administrators in the workspace.

  • PUBLIC: The dataset is visible to all users in the workspace.

PRIVATE

ProviderType

string

No

The type of the data source provider. Valid values:

  • Ecs (default)

  • Lingjun

Ecs

Provider

string

No

The provider of the dataset. You cannot set this parameter to pai.

Github

UserId

string

No

The Alibaba Cloud account ID of the dataset owner. Workspace owners and administrators can create datasets for specified workspace members.

2485765****023475

SourceDatasetId

string

No

The ID of the source dataset for a labeled dataset.

d-bvfasdfxxxxj8o411

SourceDatasetVersion

string

No

The version of the source dataset for a labeled dataset.

v2

VersionDescription

string

No

The description of the initial version of the dataset.

This is a description of the first dataset version.

VersionLabels

array

No

The list of labels for the initial version.

Label

No

The labels to add to the initial version of the dataset.

DataSize

integer

No

The size of the dataset files. Unit: byte.

10000

DataCount

integer

No

The number of files in the dataset.

500

MountAccessReadWriteRoleIdList

array

No

A list of workspace role IDs that have read and write permissions when the dataset is mounted. Role IDs that start with PAI. are basic role IDs, and role IDs that start with role- are custom role IDs. If the list contains an asterisk (*), all roles have read and write permissions. Examples:

  • Accounts with specified roles: ["PAI.AlgoOperator", "role-hiuwpd01ncrokkgp21"]

  • All accounts: ["*"]

  • Dataset creator only: []

string

No

The workspace role ID.

PAI.AlgoOperator

ImportInfo

string

No

The storage import configuration of the dataset. OSS, NAS, and CPFS are supported.

OSS

{
  "region": "${region}", // The region ID.
  "bucket": "${bucket}", // The bucket name.
  "path": "${path}" // The file path.
}

NAS

{
  "region": "${region}", // The region ID.
  "fileSystemId": "${file_system_id}", // The file system ID.
  "path": "${path}", // The file system path.
  "mountTarget": "${mount_target}" // The mount target of the file system.
}

CPFS

{
  "region": "${region}", // The region ID.
  "fileSystemId": "${file_system_id}", // The file system ID.
  "protocolServiceId": "${protocol_service_id}", // The protocol service of the file system.
  "exportId": "${export_id}", // The exported directory of the file system.
  "path": "${path}" // The file system path.
}

AI Computing CPFS

{
  "region": "${region}", // The region ID.
  "fileSystemId": "${file_system_id}", // The file system ID.
  "path": "${path}", // The file system path.
  "mountTarget": "${mount_target}", // The mount target of the file system. This parameter is specific to the AI Computing Edition.
  "isVpcMount": boolean // Specifies whether the mount target is in a VPC. This parameter is specific to the AI Computing Edition.
}

{ "region": "cn-wulanchabu", "fileSystemId": "bmcpfs-xxxxxxxxxxx", "path": "/mnt", "mountTarget": "cpfs-xxxxxxxxxxxx-vpc-gacs9f.cn-wulanchabu.cpfs.aliyuncs.com", "isVpcMount": true }

Edition

string

No

The edition of the dataset. The default value is BASIC. Valid values:

  • BASIC: Basic Edition. Does not support file metadata management for the dataset.

  • ADVANCED: Advanced Edition. Supported only for OSS datasets. Each version can manage metadata for up to 1 million files.

  • LOGICAL: Logical Edition. Supported only for OSS datasets. Each version can manage metadata for up to 3 million files.

ADVANCED

AccessibleRoleIdList

array

No

string

No
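
The parameter rules above can be checked on the client before a request is sent. The following is a minimal sketch, not an official SDK: the helper names build_create_dataset_body and cpfs_generation are hypothetical, and the name pattern assumes that characters after the first are limited to letters, digits, Chinese characters (approximated as U+4E00 to U+9FFF), underscores, and hyphens. Request signing and sending are out of scope.

```python
import json
import re

# Assumption: only the first-character rule and the 1-127 length limit are
# documented; the set of allowed later characters is inferred from them.
_NAME_RE = re.compile(r"[A-Za-z0-9\u4e00-\u9fff][A-Za-z0-9\u4e00-\u9fff_-]{0,126}")

PROPERTIES = {"FILE", "DIRECTORY"}
DATA_SOURCE_TYPES = {"OSS", "NAS", "EXTREMENAS", "CPFS", "BMCPFS", "MAXCOMPUTE", "URL"}


def cpfs_generation(fsid: str) -> str:
    """Return "1.0" or "2.0" based on the length of the string after "cpfs-"."""
    prefix = "cpfs-"
    if not fsid.startswith(prefix):
        raise ValueError(f"not a CPFS file system ID: {fsid!r}")
    suffix_length = len(fsid) - len(prefix)
    if suffix_length == 8:
        return "1.0"
    if suffix_length == 16:
        return "2.0"
    raise ValueError(f"unexpected CPFS file system ID length: {fsid!r}")


def build_create_dataset_body(name, property_, data_source_type, uri,
                              mount_path=None, **optional):
    """Build the JSON-serializable body for POST /api/v1/datasets."""
    if not _NAME_RE.fullmatch(name):
        raise ValueError(f"invalid dataset name: {name!r}")
    if property_ not in PROPERTIES:
        raise ValueError(f"invalid Property: {property_!r}")
    if data_source_type not in DATA_SOURCE_TYPES:
        raise ValueError(f"invalid DataSourceType: {data_source_type!r}")
    body = {
        "Name": name,
        "Property": property_,
        "DataSourceType": data_source_type,
        "Uri": uri,
    }
    if mount_path is not None:
        # Options is a JSON string, not a nested object.
        body["Options"] = json.dumps({"mountPath": mount_path})
    # Optional parameters are passed through unchanged, for example
    # DataType="PIC" or Accessibility="PUBLIC".
    body.update(optional)
    return body


if __name__ == "__main__":
    body = build_create_dataset_body(
        "myName", "DIRECTORY", "NAS", "nas://09f****f2.cn-hangzhou/",
        mount_path="/mnt/data/", Accessibility="PRIVATE",
    )
    print(json.dumps(body, indent=2))
```

Validating locally only catches obvious mistakes early; the service remains the authority on which values it accepts.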

Response elements

Element

Type

Description

Example

object

The response body.

RequestId

string

The request ID.

B2C51F93-1C07-5477-9705-5FDB****F19F

DatasetId

string

The dataset ID.

d-rbvg5*****jhc9ks92

Examples

Success response

JSON format

{
  "RequestId": "B2C51F93-1C07-5477-9705-5FDB****F19F",
  "DatasetId": "d-rbvg5*****jhc9ks92"
}
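
A caller usually needs only the DatasetId from this response. The following is a minimal sketch of reading the two documented fields from the JSON body; the CreateDatasetResult class and the parse_create_dataset_response function are hypothetical names used for illustration.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class CreateDatasetResult:
    request_id: str
    dataset_id: str


def parse_create_dataset_response(payload: str) -> CreateDatasetResult:
    """Parse a success response body and return its two documented fields."""
    data = json.loads(payload)
    missing = [key for key in ("RequestId", "DatasetId") if key not in data]
    if missing:
        raise ValueError(f"response is missing fields: {missing}")
    return CreateDatasetResult(
        request_id=data["RequestId"],
        dataset_id=data["DatasetId"],
    )


if __name__ == "__main__":
    sample = (
        '{"RequestId": "B2C51F93-1C07-5477-9705-5FDB****F19F", '
        '"DatasetId": "d-rbvg5*****jhc9ks92"}'
    )
    print(parse_create_dataset_response(sample).dataset_id)
```

Keep the RequestId in logs, because it identifies the call when you troubleshoot a failed or unexpected result.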

Error codes

See Error Codes for a complete list.

Release notes

See Release Notes for a complete list.