Platform For AI: CreateDatasetJob

Last Updated: Jan 13, 2026

Creates a dataset job.

RAM authorization

The table below describes the authorization required to call this API. You can define it in a Resource Access Management (RAM) policy. The table's columns are detailed below:

  • Action: The actions can be used in the Action element of RAM permission policy statements to grant permissions to perform the operation.

  • API: The API that you can call to perform the action.

  • Access level: The predefined level of access granted for each API. Valid values: create, list, get, update, and delete.

  • Resource type: The type of the resource that supports authorization to perform the action. It indicates if the action supports resource-level permission. The specified resource must be compatible with the action. Otherwise, the policy will be ineffective.

    • For APIs with resource-level permissions, required resource types are marked with an asterisk (*). Specify the corresponding Alibaba Cloud Resource Name (ARN) in the Resource element of the policy.

    • For APIs without resource-level permissions, it is shown as All Resources. Use an asterisk (*) in the Resource element of the policy.

  • Condition key: The condition keys defined by the service. The key allows for granular control, applying to either actions alone or actions associated with specific resources. In addition to service-specific condition keys, Alibaba Cloud provides a set of common condition keys applicable across all RAM-supported services.

  • Dependent action: The dependent actions required to run the action. To complete the action, the RAM user or the RAM role must have the permissions to perform all dependent actions.

| Action | Access level | Resource type | Condition key | Dependent action |
| --- | --- | --- | --- | --- |
| paidataset:CreateDatasetJob | create | All Resources (*) | None | None |
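
The authorization row above can be turned into a policy document. The following is a minimal sketch, assuming the standard RAM policy layout (`Version`, `Statement`, `Effect`, `Action`, `Resource`); the helper name `build_ram_policy` is hypothetical and is not part of any SDK.

```python
import json


def build_ram_policy(action="paidataset:CreateDatasetJob", resource="*"):
    """Return a RAM policy document that allows a single action.

    The action name and the "*" resource come from the authorization table;
    the surrounding structure is the usual RAM policy layout (an assumption
    here, since this page does not show a policy example).
    """
    return {
        "Version": "1",
        "Statement": [
            {"Effect": "Allow", "Action": [action], "Resource": resource}
        ],
    }


if __name__ == "__main__":
    # Print the policy so it can be pasted into a custom policy editor.
    print(json.dumps(build_ram_policy(), indent=2))
```

Because this API is listed with All Resources, the sketch keeps `Resource` as an asterisk rather than an ARN.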

Request syntax

POST /api/v1/datasets/{DatasetId}/datasetjobs HTTP/1.1

Path Parameters

| Parameter | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| DatasetId | string | Yes | The dataset ID. For more information about how to obtain the dataset ID, see ListDatasets. | d-rbvg5wz****c9ks92 |

Request parameters

| Parameter | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| body | object | No | The request body. | |
| DatasetVersion | string | No | The name of the dataset version. | v1 |
| WorkspaceId | string | Yes | The workspace ID. For more information about how to obtain the workspace ID, see ListWorkspaces. | 478** |
| JobAction | string | Yes | The task operation. The values are described after this table. | SemanticIndex |
| JobMode | string | No | The task type. The values are described after this table. | Full |
| Description | string | No | The description. | This is a job description. |
| JobSpec | string | Yes | The task details. The structure depends on JobAction and is described in the sections that follow. | {\"modelId\":\"xxx\"} |

JobAction values:

  • SemanticIndex: semantic index

  • IntelligentTag: intelligent tagging

  • FileMetaExport: metadata export

  • FileMetaBuild: build and update metadata

  • IntelligentTagRevert: revoke intelligent tagging

  • FileMetaImport: metadata import

Valid values: SemanticIndex, IntelligentTag, FileMetaExport

JobMode values:

  • Full (default): forces the processing of all metadata. This task takes a long time to execute.

  • Increment: processes only changed or unsuccessfully processed metadata. The SemanticIndex and IntelligentTag tasks support Increment and Full. Other tasks support only Full.

Valid values: Full
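
The request syntax and parameters above can be assembled as follows. This is a minimal sketch using only the standard library: `build_create_dataset_job_request` is a hypothetical helper, and the endpoint host, request signing, and the HTTP call itself are deliberately left out because this page does not describe them. It assumes JobSpec is sent as a JSON-encoded string, as the escaped example value suggests.

```python
import json
from urllib.parse import quote

# Values taken from the JobAction and JobMode descriptions above.
JOB_ACTIONS = {
    "SemanticIndex", "IntelligentTag", "FileMetaExport",
    "FileMetaBuild", "IntelligentTagRevert", "FileMetaImport",
}
INCREMENT_CAPABLE = {"SemanticIndex", "IntelligentTag"}


def build_create_dataset_job_request(dataset_id, workspace_id, job_action,
                                     job_spec, job_mode="Full",
                                     dataset_version=None, description=None):
    """Return (method, path, body) for a CreateDatasetJob call."""
    if job_action not in JOB_ACTIONS:
        raise ValueError(f"unknown JobAction: {job_action}")
    if job_mode not in ("Full", "Increment"):
        raise ValueError(f"unknown JobMode: {job_mode}")
    if job_mode == "Increment" and job_action not in INCREMENT_CAPABLE:
        raise ValueError(f"{job_action} supports only the Full mode")

    body = {
        "WorkspaceId": workspace_id,
        "JobAction": job_action,
        "JobMode": job_mode,
        # JobSpec has type string, so the dict is serialized first.
        "JobSpec": json.dumps(job_spec),
    }
    if dataset_version is not None:
        body["DatasetVersion"] = dataset_version
    if description is not None:
        body["Description"] = description

    # DatasetId is a path parameter, so it is percent-encoded.
    path = f"/api/v1/datasets/{quote(dataset_id, safe='')}/datasetjobs"
    return "POST", path, body
```

The mode check runs locally so that an unsupported combination, such as FileMetaExport with Increment, fails before any request is sent.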

Description of the JobSpec parameter in CreateDatasetJob:

Semantic index task

Example:

{
  "modelId": "xxx",
  "modelVersion": "1.0.0",
  "contentList": ["file"],
  "embeddingConnectionId": "conn-xxx",
  "embeddingModel": "default",
  "databaseConnectionId": "conn-xxx",
  "databaseTableName": "table_xxx",
  "vectorIndexConfig":"{\"shards\":1,\"similarity\":\"cosine\",\"indexType\":\"hnsw\",\"indexOptions\":{\"m\":16,\"efConstruction\":200}}",
  "concurrency": 2
}

Field description:

| Field name | Type | Example | Required | Description |
| --- | --- | --- | --- | --- |
| modelId | String | model-xxx | No | The ID of the official model. |
| modelVersion | String | 1.0.0 | No | The version of the official model. |
| embeddingConnectionId | String | conn-xxx | No | The ID of the Elastic Algorithm Service (EAS) model service connection. |
| embeddingModel | String | default | No | The name of the model that corresponds to the EAS model service. |
| databaseConnectionId | String | conn-xxx | No | The ID of the vector database service connection. |
| databaseTableName | String | table_xxx | No | The name of the vector database table. |
| concurrency | Integer | 2 | No | The number of concurrent tasks. |
| contentList | Array | | Yes | The list of content to index. |
| +- (array item) | String | file | | The content to index. Currently, only file is supported. |
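
A semantic index JobSpec can be assembled from these fields as sketched below. The helper `build_semantic_index_spec` is hypothetical. Following the example above, it assumes `vectorIndexConfig` is passed as a JSON string nested inside the spec; that field appears in the example but not in the field table, so its exact contract is not confirmed here.

```python
import json

# Optional fields from the field table above.
OPTIONAL_FIELDS = {
    "modelId", "modelVersion", "embeddingConnectionId", "embeddingModel",
    "databaseConnectionId", "databaseTableName", "concurrency",
}


def build_semantic_index_spec(content_list=("file",), vector_index_config=None,
                              **fields):
    """Build the JobSpec dict for a SemanticIndex task."""
    unknown = set(fields) - OPTIONAL_FIELDS
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    # contentList is required, and only "file" is currently supported.
    if not content_list or any(item != "file" for item in content_list):
        raise ValueError('contentList must be non-empty and contain only "file"')

    spec = {"contentList": list(content_list), **fields}
    if vector_index_config is not None:
        # The example shows this value as an escaped JSON string,
        # so the dict is serialized instead of being embedded directly.
        spec["vectorIndexConfig"] = json.dumps(
            vector_index_config, separators=(",", ":"))
    return spec
```

The resulting dict still has to be serialized once more with `json.dumps` when it is placed in the JobSpec request parameter.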

Intelligent tagging task

Example:

{
  "intelligentTagConnectionId": "conn-keltvufiud3quopq11",
  "promptId": "pmt-gh6qaj1kvkf6yk7qx2",
  "modelId":"qwen-vl-max"
}

Field description:

| Field name | Type | Example | Required | Description |
| --- | --- | --- | --- | --- |
| modelId | String | qwen-vl-max | Yes | The model name. |
| intelligentTagConnectionId | String | conn-keltvufiud3quopq11 | Yes | The connection for tagging management. |
| promptId | String | pmt-gh6qaj1kvkf6yk7qx2 | Yes | The prompt ID. |

Metadata export task

Example:

{
  "query":{
    "QueryType": "TAG",
    "QueryText": "",
    "TopK": 100,
    "ScoreThreshold":0.6
  },
  "filteredAttributes":"FileName,Uri",
  "exportDirUri": "oss://bucket/path/" 
}

Field description:

| Field name | Type | Example | Required | Description |
| --- | --- | --- | --- | --- |
| query | JSON | | No | The query conditions for the export. The fields are the same as those in the ListDatasetMetas operation. See QueryParams. |
| filteredAttributes | String | Comma-separated | No | If specified, the exported results contain only the specified attribute fields. The available fields are listed after this table. |
| exportDirUri | String | oss://bucket/path/ or pvfs://cata_log/DB/lanceTable | Yes | The OSS storage path for the exported content. This must be a folder path. A folder named {datasetId}-{datasetversion}-{time:yyyy-MM-dd-HH-mm-ss} is created in this path to store the YAML and JSONL files. |

Fields available for filteredAttributes:

  • Uri (required)

  • DatasetFileMetaId

  • FileName

  • DataSize

  • FileType

  • ContentType

  • Comment

  • MetaAttributes

  • FileFingerPrint

  • FileCreateTime

  • FileUpdateTime

  • Tags.user: custom tags

  • Tags.user-delete-ai-tags: algorithm tags deleted by the user

  • Tags.ai: aggregated algorithm tags from all tagging tasks

  • Tags.all: algorithm tags and custom tags, excluding algorithm tags deleted by the user
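
The export fields above can be combined as in the following sketch. Both helpers are hypothetical. The scheme check accepts only the two URI schemes shown in the examples, and the folder-name helper applies the documented pattern to a timestamp supplied by the caller; which clock or time zone the service uses for that timestamp is not stated on this page.

```python
from datetime import datetime


def build_export_spec(export_dir_uri, query=None, filtered_attributes=None):
    """Build the JobSpec dict for a FileMetaExport task."""
    # Only the schemes that appear in the documented examples are accepted.
    if not export_dir_uri.startswith(("oss://", "pvfs://")):
        raise ValueError("exportDirUri must start with oss:// or pvfs://")

    spec = {"exportDirUri": export_dir_uri}
    if query is not None:
        spec["query"] = query
    if filtered_attributes is not None:
        attributes = list(filtered_attributes)
        # Uri is marked as required in the attribute list.
        if "Uri" not in attributes:
            raise ValueError("filteredAttributes must include Uri")
        spec["filteredAttributes"] = ",".join(attributes)
    return spec


def export_folder_name(dataset_id, dataset_version, when):
    """Format {datasetId}-{datasetversion}-{time:yyyy-MM-dd-HH-mm-ss}."""
    return f"{dataset_id}-{dataset_version}-{when.strftime('%Y-%m-%d-%H-%M-%S')}"
```

For example, `export_folder_name("d-mpdxv0lm9sndij7gpb", "v1", datetime(2025, 6, 18, 12, 23, 30))` yields the folder name that appears in the manifest path of the metadata import example.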

QueryParams:

| Field name | Type | Example | Required | Description |
| --- | --- | --- | --- | --- |
| QueryType | String | MIX | No | The query type. Valid values: MIX, VECTOR, and TAG. |
| QueryText | String | "fallen water-filled barrier" | No | The text to search for. |
| QueryImage | String | oss://bucket.cn-hangzhou.aliyuncs.com/image.jpg | No | The image to use when you search by image. You can use an OSS URL of an image that is accessible over the public network. |
| QueryTagsIncludeAll | String | blue cone,lane line | No | Includes all of the following tags. You can specify multiple tags. The query results must contain all of these tags. If this parameter is empty, this condition is not applied. This parameter is valid when QueryType is set to TAG or MIX. |
| QueryTagsIncludeAny | String | blue sky | No | Includes any of the following tags. You can specify multiple tags. The query results must contain at least one of these tags. If this parameter is empty, this condition is not applied. This parameter is valid when QueryType is set to TAG or MIX. |
| QueryTagsExclude | String | overcast | No | Excludes the following tags. You can specify multiple tags. The query results cannot contain these tags. If this parameter is empty, this condition is not applied. This parameter is valid when QueryType is set to TAG or MIX. |
| QueryFileName | String | water_barrier | No | Performs a fuzzy search for the file name based on a 2-gram fuzzy match. |
| QueryFileDir | String | oss://cars/20250221/ | No | Performs a fuzzy search for the file folder based on a 2-gram fuzzy match. |
| QueryFileTypeIncludeAny | String | image,video | No | Includes any of the following file types. You can specify multiple file types. The query results must match at least one of these file types. If this parameter is empty, this condition is not applied. |
| QueryContentTypeIncludeAny | String | image/jpeg,application/pdf | No | Includes any of the following MIME types. You can specify multiple MIME types. The query results must match at least one of these MIME types. If this parameter is empty, this condition is not applied. |
| StartFileUpdateTime | String | 2021-01-12T14:36:01.000Z | No | The start of the file update time range used to query file metadata. The time is a UTC timestamp in the ISO 8601 format: 2021-01-12T14:36:01.000Z. |
| EndFileUpdateTime | String | 2021-01-12T14:36:01.000Z | No | The end of the file update time range used to query file metadata. The time is a UTC timestamp in the ISO 8601 format: 2021-01-12T14:36:01.000Z. |
| StartTagUpdateTime | String | 2021-01-12T14:36:01.000Z | No | The start of the last tag update time range used to query file metadata. The time is a UTC timestamp in the ISO 8601 format: 2021-01-12T14:36:01.000Z. This parameter is valid when QueryType is set to TAG or MIX. |
| EndTagUpdateTime | String | 2021-01-12T14:36:01.000Z | No | The end of the last tag update time range used to query file metadata. The time is a UTC timestamp in the ISO 8601 format: 2021-01-12T14:36:01.000Z. This parameter is valid when QueryType is set to TAG or MIX. |
| TopK | Integer | 100 | No | The maximum number of exported items. By default, there is no limit. |
| ScoreThreshold | Float | 0.6 | | The similarity score threshold. Only results with a score greater than the ScoreThreshold value are returned. This parameter is valid when QueryType is set to VECTOR or MIX. |
| DatasetFileMetaIds | String | | No | A list of file metadata IDs. The maximum number of IDs is 20. |
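
A query object using a few of these fields might be built as in the sketch below. `to_utc_iso` and `build_query` are hypothetical helpers. Joining tags with commas follows the example values in the table, and rejecting tag filters for VECTOR queries is a local choice based on the "valid when QueryType is set to TAG or MIX" notes, not documented service behavior.

```python
from datetime import datetime, timezone


def to_utc_iso(moment):
    """Format an aware datetime like 2021-01-12T14:36:01.000Z."""
    if moment.tzinfo is None:
        raise ValueError("a timezone-aware datetime is required")
    moment = moment.astimezone(timezone.utc)
    millis = moment.microsecond // 1000
    return moment.strftime("%Y-%m-%dT%H:%M:%S.") + f"{millis:03d}Z"


def build_query(query_type, tags_include_all=(), tags_include_any=(),
                tags_exclude=(), start_file_update=None,
                end_file_update=None, top_k=None):
    """Build a QueryParams dict from a subset of the documented fields."""
    if query_type not in ("MIX", "VECTOR", "TAG"):
        raise ValueError(f"unknown QueryType: {query_type}")
    tag_filters = {
        "QueryTagsIncludeAll": tags_include_all,
        "QueryTagsIncludeAny": tags_include_any,
        "QueryTagsExclude": tags_exclude,
    }
    if query_type == "VECTOR" and any(tag_filters.values()):
        raise ValueError("tag filters are valid only for TAG or MIX queries")

    query = {"QueryType": query_type}
    for key, tags in tag_filters.items():
        if tags:  # an empty filter is simply left out
            query[key] = ",".join(tags)
    if start_file_update is not None:
        query["StartFileUpdateTime"] = to_utc_iso(start_file_update)
    if end_file_update is not None:
        query["EndFileUpdateTime"] = to_utc_iso(end_file_update)
    if top_k is not None:
        query["TopK"] = top_k
    return query
```

The returned dict can be used as the `query` field of a metadata export or metadata import JobSpec.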

Build and update metadata task

Example:

{}

Revoke intelligent tagging task

Example:

{
  "intelligentTagJobId": "dsjob-gh6qaj1kvkf6yk7qx2"
}

Field description:

| Field name | Type | Example | Required | Description |
| --- | --- | --- | --- | --- |
| intelligentTagJobId | String | dsjob-gh6qaj1kvkf6yk7qx2 | Yes | The ID of the intelligent tagging task to revoke. |

Metadata import task

Example using query conditions:

{
    "srcDatasetId": "d-1234",
    "srcDatasetVersion": "v1",
    "srcWorkspaceId": "12729",
    "query":
    {
        "QueryType": "TAG",
        "QueryText": "",
        "TopK": 100,
        "ScoreThreshold": 0.6
    }
}

Example using a pai_dataset_filemeta_manifest file:

{
    "manifestUri":"oss://bucket/export_path/d-mpdxv0lm9sndij7gpb-v1-2025-06-18-12-23-30/pai_dataset_filemeta_manifest.yaml"
}

Field description:

| Field name | Type | Example | Required | Description |
| --- | --- | --- | --- | --- |
| srcDatasetId | String | dsjob-gh6qaj1kvkf6yk7qx2 | Yes (query-based) | The ID of the source dataset for the import. |
| srcDatasetVersion | String | v1 | Yes (query-based) | The version of the source dataset for the import. |
| srcWorkspaceId | String | 12729 | Yes (query-based) | The workspace of the source dataset for the import. |
| query | JSON | | No (query-based) | The query conditions applied to the source dataset version. See QueryParams. |
| manifestUri | String | oss://bucket/export_path/d-mpdxv0lm9sndij7gpb-v1-2025-06-18-12-23-30/pai_dataset_filemeta_manifest.yaml | Yes (file-based) | The path of the source manifest file for the import. Only OSS URIs without an endpoint are supported. |
| filteredAttributes | String | FileName,Uri,FileFingerPrint,DataSize,DataSize,FileUpdateTime,Tags.ai | No | By default, all attribute fields are imported. If specified, the imported content contains only the specified attribute fields. The available fields are listed after this table. |
| importMode | String | append | No | The import mode. append (default): files with the same URI are overwritten with the imported content. replace: the original file metadata in the dataset version is deleted. |

Fields available for filteredAttributes:

  • FileName (required)

  • Uri (required)

  • FileFingerPrint (required)

  • DataSize (required)

  • FileType (required)

  • ContentType (required)

  • Comment

  • MetaAttributes

  • FileCreateTime

  • FileUpdateTime

  • Tags.user: custom tags

  • Tags.user-delete-ai-tags: algorithm tags deleted by the user

  • Tags.ai: aggregated algorithm tags from all tagging tasks
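
The two import styles can be captured in one helper, sketched below. `build_import_spec` is hypothetical, and treating the query-based and file-based forms as mutually exclusive is an assumption drawn from the "Yes (query-based)" and "Yes (file-based)" markers; the page does not state what happens if both are supplied.

```python
def build_import_spec(manifest_uri=None, src_dataset_id=None,
                      src_dataset_version=None, src_workspace_id=None,
                      query=None, import_mode="append"):
    """Build the JobSpec dict for a FileMetaImport task."""
    if import_mode not in ("append", "replace"):
        raise ValueError(f"unknown importMode: {import_mode}")

    source_fields = {
        "srcDatasetId": src_dataset_id,
        "srcDatasetVersion": src_dataset_version,
        "srcWorkspaceId": src_workspace_id,
    }
    if manifest_uri is not None:
        # File-based import: the manifest alone identifies the source.
        if query is not None or any(v is not None for v in source_fields.values()):
            raise ValueError("pass either manifest_uri or the src* fields, not both")
        if not manifest_uri.startswith("oss://"):
            raise ValueError("manifestUri must be an OSS URI")
        spec = {"manifestUri": manifest_uri}
    else:
        # Query-based import: all three source identifiers are required.
        missing = [name for name, value in source_fields.items() if value is None]
        if missing:
            raise ValueError(f"missing required fields: {', '.join(missing)}")
        spec = dict(source_fields)
        if query is not None:
            spec["query"] = query

    spec["importMode"] = import_mode
    return spec
```

Because replace mode deletes the original file metadata in the dataset version, the sketch keeps append as the default, matching the documented default.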

Response elements

| Element | Type | Description | Example |
| --- | --- | --- | --- |
| | object | The response structure. | |
| RequestId | string | The request ID. | 99341606-****-0757724D97EE |
| DatasetJobId | string | The dataset task ID. | dsjob-9jx1******uj9e |

Examples

Success response

JSON format

{
  "RequestId": "99341606-****-0757724D97EE",
  "DatasetJobId": "dsjob-9jx1******uj9e"
}
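
A success response of this shape could be read as in the following sketch. `parse_create_dataset_job_response` is a hypothetical helper that only checks for the two documented elements; error responses are not handled because their format is not shown on this page.

```python
import json


def parse_create_dataset_job_response(text):
    """Return (DatasetJobId, RequestId) from a success response body."""
    data = json.loads(text)
    missing = [key for key in ("RequestId", "DatasetJobId") if key not in data]
    if missing:
        raise ValueError(f"response is missing: {', '.join(missing)}")
    return data["DatasetJobId"], data["RequestId"]
```

Keeping the RequestId alongside the job ID is useful when a call needs to be traced later.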

Error codes

See Error Codes for a complete list.

Release notes

See Release Notes for a complete list.