Trigger Dataset Jobs via CreateDatasetJob API - Platform For AI


RAM authorization

The table below describes the authorization required to call this API. You can define it in a Resource Access Management (RAM) policy. The table's columns are detailed below:

Action: The actions can be used in the Action element of RAM permission policy statements to grant permissions to perform the operation.
API: The API that you can call to perform the action.
Access level: The predefined level of access granted for each API. Valid values: create, list, get, update, and delete.
Resource type: The type of the resource that supports authorization to perform the action. It indicates whether the action supports resource-level permissions. The specified resource must be compatible with the action. Otherwise, the policy does not take effect.
- For APIs with resource-level permissions, required resource types are marked with an asterisk (*). Specify the corresponding Alibaba Cloud Resource Name (ARN) in the Resource element of the policy.
- For APIs without resource-level permissions, it is shown as All Resources. Use an asterisk (*) in the Resource element of the policy.
Condition key: The condition keys defined by the service. The key allows for granular control, applying to either actions alone or actions associated with specific resources. In addition to service-specific condition keys, Alibaba Cloud provides a set of common condition keys applicable across all RAM-supported services.
Dependent action: The dependent actions required to run the action. To complete the action, the RAM user or the RAM role must have the permissions to perform all dependent actions.

Action	Access level	Resource type	Condition key	Dependent action
paidataset:CreateDatasetJob	create	*All Resource (ARN: *)	None	None
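As an illustration, a custom RAM policy that grants this action could look like the following. It uses the general RAM policy layout (Version, Statement, Effect, Action, Resource) and sets Resource to "*" because the action has no resource-level permissions. The Python wrapper and the function name create_dataset_job_policy are only a hypothetical convenience for generating the JSON; attach the resulting policy to the RAM user or RAM role that calls the API.

```python
import json


def create_dataset_job_policy() -> dict:
    """Return a RAM policy document that allows paidataset:CreateDatasetJob."""
    return {
        "Version": "1",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "paidataset:CreateDatasetJob",
                # No resource-level permissions, so the Resource element is "*".
                "Resource": "*",
            }
        ],
    }


print(json.dumps(create_dataset_job_policy(), indent=2))
```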

Request syntax

POST /api/v1/datasets/{DatasetId}/datasetjobs HTTP/1.1

Path Parameters

Parameter	Type	Required	Description	Example
DatasetId	string	Yes	The dataset ID. For more information about how to obtain the dataset ID, see ListDatasets.	d-rbvg5wz****c9ks92

Request parameters

Parameter	Type	Required	Description	Example
body	object	No	The request body.
DatasetVersion	string	No	The name of the dataset version.	v1
WorkspaceId	string	Yes	The workspace ID. For more information about how to obtain the workspace ID, see ListWorkspaces.	478**
JobAction	string	Yes	The task operation. Valid values: SemanticIndex: semantic index. IntelligentTag: intelligent tagging. FileMetaExport: metadata export. FileMetaBuild: build and update metadata. IntelligentTagRevert: revoke intelligent tagging. FileMetaImport: metadata import.	SemanticIndex
JobMode	string	No	The processing mode. Valid values: Full (default): processes all metadata; this can take a long time. Increment: processes only metadata that has changed or that failed to process previously. SemanticIndex and IntelligentTag tasks support both Full and Increment; all other tasks support only Full.	Full
Description	string	No	The description.	This is a job description.
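JobSpec	string	Yes	The task configuration, as a JSON string. The fields depend on JobAction and are described in the JobSpec section below.	{\"modelId\":\"xxx\"}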
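The following Python sketch assembles the method, path, and body for a CreateDatasetJob call and enforces the JobAction and JobMode rules from the table above. It assumes the body is a JSON object keyed by the parameter names. The helper name build_create_dataset_job_request is hypothetical, and the sketch neither signs nor sends the request; use an Alibaba Cloud SDK for that.

```python
import json

# JobAction values documented for this API.
JOB_ACTIONS = {
    "SemanticIndex", "IntelligentTag", "FileMetaExport",
    "FileMetaBuild", "IntelligentTagRevert", "FileMetaImport",
}
# Only these actions accept JobMode=Increment; all others support Full only.
INCREMENT_ACTIONS = {"SemanticIndex", "IntelligentTag"}


def build_create_dataset_job_request(dataset_id, workspace_id, job_action,
                                     job_spec, job_mode=None,
                                     dataset_version=None, description=None):
    """Return (method, path, body) for CreateDatasetJob (hypothetical helper)."""
    if job_action not in JOB_ACTIONS:
        raise ValueError(f"unsupported JobAction: {job_action}")
    if job_mode not in (None, "Full", "Increment"):
        raise ValueError(f"unsupported JobMode: {job_mode}")
    if job_mode == "Increment" and job_action not in INCREMENT_ACTIONS:
        raise ValueError(f"{job_action} supports only Full mode")

    body = {
        "WorkspaceId": workspace_id,
        "JobAction": job_action,
        # JobSpec is a string parameter, so the dict is serialized to JSON text.
        "JobSpec": json.dumps(job_spec),
    }
    # Optional parameters are included only when a value is provided.
    optional = {"JobMode": job_mode, "DatasetVersion": dataset_version,
                "Description": description}
    body.update({k: v for k, v in optional.items() if v is not None})

    path = f"/api/v1/datasets/{dataset_id}/datasetjobs"
    return "POST", path, body
```

Serializing JobSpec inside the helper keeps callers working with plain dicts and avoids hand-escaping quotes as in the `{\"modelId\":\"xxx\"}` example.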

Description of the JobSpec parameter in CreateDatasetJob:

Semantic index task

Example:

{
  "modelId": "xxx",
  "modelVersion": "1.0.0",
  "contentList": ["file"],
  "embeddingConnectionId": "conn-xxx",
  "embeddingModel": "default",
  "databaseConnectionId": "conn-xxx",
  "databaseTableName": "table_xxx",
  "vectorIndexConfig":"{\"shards\":1,\"similarity\":\"cosine\",\"indexType\":\"hnsw\",\"indexOptions\":{\"m\":16,\"efConstruction\":200}}",
  "concurrency": 2
}

Field description:

Field Name	Type	Example	Required	Description
modelId	String	model-xxx	No	The ID of the official model.
modelVersion	String	1.0.0	No	The version of the official model.
embeddingConnectionId	String	conn-xxx	No	The ID of the Elastic Algorithm Service (EAS) model service connection.
embeddingModel	String	default	No	The name of the model that corresponds to the EAS model service.
databaseConnectionId	String	conn-xxx	No	The ID of the vector database service connection.
databaseTableName	String	table_xxx	No	The name of the vector database table.
concurrency	Integer	2	No	The number of concurrent tasks.
contentList	Array		Yes	The list of content to index.
  (item)	String	file		The content to index. Currently, only file is supported.
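Note that in the semantic index example, vectorIndexConfig is itself a JSON-encoded string nested inside the JobSpec string. The Python sketch below builds that value with a separate json.dumps call. It assumes you use an EAS embedding connection and a vector database connection rather than an official model (modelId/modelVersion); this section does not state which fields must be combined. The helper name semantic_index_spec is hypothetical.

```python
import json


def semantic_index_spec(embedding_connection_id, database_connection_id,
                        database_table_name, embedding_model="default",
                        concurrency=2, vector_index_config=None):
    """Build a SemanticIndex JobSpec dict (hypothetical helper)."""
    spec = {
        # contentList is required; only "file" is currently supported.
        "contentList": ["file"],
        "embeddingConnectionId": embedding_connection_id,
        "embeddingModel": embedding_model,
        "databaseConnectionId": database_connection_id,
        "databaseTableName": database_table_name,
        "concurrency": concurrency,
    }
    if vector_index_config is not None:
        # vectorIndexConfig is a JSON string, so encode it on its own,
        # compactly, before the enclosing JobSpec is serialized.
        spec["vectorIndexConfig"] = json.dumps(vector_index_config,
                                               separators=(",", ":"))
    return spec


spec = semantic_index_spec(
    "conn-xxx", "conn-xxx", "table_xxx",
    vector_index_config={"shards": 1, "similarity": "cosine",
                         "indexType": "hnsw",
                         "indexOptions": {"m": 16, "efConstruction": 200}},
)
job_spec_string = json.dumps(spec)  # value for the JobSpec request parameter
```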

Intelligent tagging task

Example:

{
  "intelligentTagConnectionId": "conn-keltvufiud3quopq11",
  "promptId": "pmt-gh6qaj1kvkf6yk7qx2",
  "modelId":"qwen-vl-max"
}

Field description:

Field Name	Type	Example	Required	Description
modelId	String	qwen-vl-max	Yes	The model name.
intelligentTagConnectionId	String	conn-keltvufiud3quopq11	Yes	The ID of the connection used for intelligent tagging.
promptId	String	pmt-gh6qaj1kvkf6yk7qx2	Yes	The prompt ID.
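All three intelligent tagging fields are required. A minimal Python sketch that builds the JobSpec and rejects empty values might look like this; intelligent_tag_spec is a hypothetical helper name.

```python
# All three fields are marked as required in the table above.
REQUIRED_TAG_FIELDS = ("modelId", "intelligentTagConnectionId", "promptId")


def intelligent_tag_spec(model_id, connection_id, prompt_id):
    """Build an IntelligentTag JobSpec dict (hypothetical helper)."""
    spec = {
        "intelligentTagConnectionId": connection_id,
        "promptId": prompt_id,
        "modelId": model_id,
    }
    # Reject empty or missing values before the request is sent.
    missing = [k for k in REQUIRED_TAG_FIELDS if not spec[k]]
    if missing:
        raise ValueError(f"missing required JobSpec fields: {missing}")
    return spec
```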

Metadata export task

Example:

{
  "query":{
    "QueryType": "TAG",
    "QueryText": "",
    "TopK": 100,
    "ScoreThreshold":0.6
  },
  "filteredAttributes":"FileName,Uri",
  "exportDirUri": "oss://bucket/path/" 
}

Field description:

Field Name	Type	Example	Required	Description
query	JSON		No	The query conditions that select which metadata to export. The fields are the same as the query parameters of the `ListDatasetMetas` operation. See `QueryParams`.
filteredAttributes	String	FileName,Uri	No	A comma-separated list of attribute fields. If specified, the exported results contain only these fields. Valid fields: Uri (required), DatasetFileMetaId, FileName, DataSize, FileType, ContentType, Comment, MetaAttributes, FileFingerPrint, FileCreateTime, FileUpdateTime, Tags.user (custom tags), Tags.user-delete-ai-tags (algorithm tags deleted by the user), Tags.ai (aggregated algorithm tags from all tagging tasks), Tags.all (algorithm tags and custom tags, excluding algorithm tags deleted by the user).
exportDirUri	String	oss://bucket/path/ or pvfs://cata_log/DB/lanceTable	Yes	The OSS storage path for the exported content. This must be a folder path. A folder named `{datasetId}-{datasetversion}-{time:yyyy-MM-dd-HH-mm-ss}` is created in this path to store the YAML and JSONL files.
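The Python sketch below builds a FileMetaExport JobSpec, checks filteredAttributes against the documented field names, and predicts the name of the folder the job creates under exportDirUri. Two choices are assumptions: adding Uri automatically when it is missing, and treating the folder timestamp as a caller-supplied datetime (this section does not say which time or time zone the service uses). The manifest path in the metadata import example, d-mpdxv0lm9sndij7gpb-v1-2025-06-18-12-23-30, follows the same pattern. The helper names are hypothetical.

```python
from datetime import datetime

# Attribute names accepted by filteredAttributes for FileMetaExport.
EXPORT_ATTRIBUTES = {
    "Uri", "DatasetFileMetaId", "FileName", "DataSize", "FileType",
    "ContentType", "Comment", "MetaAttributes", "FileFingerPrint",
    "FileCreateTime", "FileUpdateTime", "Tags.user",
    "Tags.user-delete-ai-tags", "Tags.ai", "Tags.all",
}


def export_spec(export_dir_uri, attributes=None, query=None):
    """Build a FileMetaExport JobSpec dict (hypothetical helper)."""
    spec = {"exportDirUri": export_dir_uri}
    if query is not None:
        spec["query"] = query
    if attributes:
        unknown = set(attributes) - EXPORT_ATTRIBUTES
        if unknown:
            raise ValueError(f"unknown attributes: {sorted(unknown)}")
        # Uri is required whenever filteredAttributes is specified.
        if "Uri" not in attributes:
            attributes = ["Uri", *attributes]
        spec["filteredAttributes"] = ",".join(attributes)
    return spec


def export_folder_name(dataset_id, dataset_version, created_at: datetime):
    """Folder name {datasetId}-{datasetversion}-{yyyy-MM-dd-HH-mm-ss}."""
    return f"{dataset_id}-{dataset_version}-{created_at:%Y-%m-%d-%H-%M-%S}"
```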

QueryParams:

Field Name	Type	Example	Required	Description
QueryType	String	MIX	No	The query type. Valid values: MIX, VECTOR, TAG.
QueryText	String	"fallen water-filled barrier"	No	The text to search for.
QueryImage	String	oss://bucket.cn-hangzhou.aliyuncs.com/image.jpg	No	The image to search by. You can specify the OSS URL of an image that is accessible over the public network.
QueryTagsIncludeAll	String	blue cone,lane line	No	Returns only results that contain all of the specified tags. You can specify multiple tags. If this parameter is empty, this condition is not applied. This parameter is valid only when QueryType is set to TAG or MIX.
QueryTagsIncludeAny	String	blue sky	No	Returns only results that contain at least one of the specified tags. You can specify multiple tags. If this parameter is empty, this condition is not applied. This parameter is valid only when QueryType is set to TAG or MIX.
QueryTagsExclude	String	overcast	No	Excludes results that contain any of the specified tags. You can specify multiple tags. If this parameter is empty, this condition is not applied. This parameter is valid only when QueryType is set to TAG or MIX.
QueryFileName	String	water_barrier	No	Performs a fuzzy search for the file name based on a 2-gram fuzzy match.
QueryFileDir	String	oss://cars/20250221/	No	Performs a fuzzy search for the file folder based on a 2-gram fuzzy match.
QueryFileTypeIncludeAny	String	image,video	No	Returns only results that match at least one of the specified file types. You can specify multiple file types. If this parameter is empty, this condition is not applied.
QueryContentTypeIncludeAny	String	image/jpeg,application/pdf	No	Returns only results that match at least one of the specified MIME types. You can specify multiple MIME types. If this parameter is empty, this condition is not applied.
StartFileUpdateTime	String	2021-01-12T14:36:01.000Z	No	The start of the file update time range used to filter file metadata. Specify a UTC time in ISO 8601 format, for example 2021-01-12T14:36:01.000Z.
EndFileUpdateTime	String	2021-01-12T14:36:01.000Z	No	The end of the file update time range used to filter file metadata. Specify a UTC time in ISO 8601 format, for example 2021-01-12T14:36:01.000Z.
StartTagUpdateTime	String	2021-01-12T14:36:01.000Z	No	The start of the last tag update time range used to filter file metadata. Specify a UTC time in ISO 8601 format, for example 2021-01-12T14:36:01.000Z. This parameter is valid only when QueryType is set to TAG or MIX.
EndTagUpdateTime	String	2021-01-12T14:36:01.000Z	No	The end of the last tag update time range used to filter file metadata. Specify a UTC time in ISO 8601 format, for example 2021-01-12T14:36:01.000Z. This parameter is valid only when QueryType is set to TAG or MIX.
TopK	Integer	100	No	The maximum number of exported items. By default, there is no limit.
ScoreThreshold	Float	0.6		The similarity score threshold. Only results with a score greater than the ScoreThreshold value are returned. This parameter is valid when QueryType is set to VECTOR or MIX.
DatasetFileMetaIds	String		No	A list of file metadata IDs. The maximum number of IDs is 20.
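Several QueryParams fields take effect only for certain QueryType values, and the time-range fields expect UTC timestamps such as 2021-01-12T14:36:01.000Z. The Python sketch below formats datetimes in that shape and lists parameters that will not apply to the chosen QueryType. It reports them instead of rejecting them, because the table says such parameters are "valid" only for some types rather than that they cause errors. Treating naive datetimes as UTC is an assumption, and all helper names are hypothetical.

```python
from datetime import datetime, timezone

# Parameters that apply only to some QueryType values, per the table above.
APPLIES_TO = {
    "QueryTagsIncludeAll": {"TAG", "MIX"},
    "QueryTagsIncludeAny": {"TAG", "MIX"},
    "QueryTagsExclude": {"TAG", "MIX"},
    "StartTagUpdateTime": {"TAG", "MIX"},
    "EndTagUpdateTime": {"TAG", "MIX"},
    "ScoreThreshold": {"VECTOR", "MIX"},
}


def iso8601_utc(dt):
    """Format a datetime like 2021-01-12T14:36:01.000Z (naive = UTC)."""
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    dt = dt.astimezone(timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%S.") + f"{dt.microsecond // 1000:03d}Z"


def build_query(query_type, **params):
    """Build a QueryParams dict; datetime values become ISO 8601 strings."""
    if query_type not in ("MIX", "VECTOR", "TAG"):
        raise ValueError(f"unsupported QueryType: {query_type}")
    query = {"QueryType": query_type}
    for key, value in params.items():
        query[key] = iso8601_utc(value) if isinstance(value, datetime) else value
    return query


def ineffective_params(query):
    """List parameters in query that do not apply to its QueryType."""
    qtype = query.get("QueryType")
    return [k for k in query if k in APPLIES_TO and qtype not in APPLIES_TO[k]]
```

For example, running ineffective_params on the query in the metadata export example flags ScoreThreshold, because that query uses QueryType TAG.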

Build and update metadata task

Example:

{}

Revoke intelligent tagging task

Example:

{
  "intelligentTagJobId": "dsjob-gh6qaj1kvkf6yk7qx2"
}

Field description:

Field Name	Type	Example	Required	Description
intelligentTagJobId	String	dsjob-gh6qaj1kvkf6yk7qx2	Yes	The ID of the intelligent tagging task to revoke.

Metadata import task

Example using query conditions:

{
    "srcDatasetId": "d-1234",
    "srcDatasetVersion": "v1",
    "srcWorkspaceId": "12729",
    "query":
    {
        "QueryType": "TAG",
        "QueryText": "",
        "TopK": 100,
        "ScoreThreshold": 0.6
    }
}

Example using a pai_dataset_filemeta_manifest file:

{
    "manifestUri":"oss://bucket/export_path/d-mpdxv0lm9sndij7gpb-v1-2025-06-18-12-23-30/pai_dataset_filemeta_manifest.yaml"
}

Field description:

Field Name	Type	Example	Required	Description
srcDatasetId	String	d-1234	Yes (query-based)	The ID of the source dataset for the import.
srcDatasetVersion	String	v1	Yes (query-based)	The version of the source dataset for the import.
srcWorkspaceId	String	12729	Yes (query-based)	The workspace of the source dataset for the import.
query	JSON		No (query-based)	The query conditions applied to the source dataset version. See: `QueryParams`
manifestUri	String	oss://bucket/export_path/d-mpdxv0lm9sndij7gpb-v1-2025-06-18-12-23-30/pai_dataset_filemeta_manifest.yaml	Yes (file-based)	The path of the source manifest file for the import. Only OSS URIs without an endpoint are supported.
filteredAttributes	String	FileName,Uri,FileFingerPrint,DataSize,FileType,ContentType,FileUpdateTime,Tags.ai	No	A comma-separated list of attribute fields. By default, all attribute fields are imported. If specified, the imported content contains only these fields. Valid fields: FileName (required), Uri (required), FileFingerPrint (required), DataSize (required), FileType (required), ContentType (required), Comment, MetaAttributes, FileCreateTime, FileUpdateTime, Tags.user (custom tags), Tags.user-delete-ai-tags (algorithm tags deleted by the user), Tags.ai (aggregated algorithm tags from all tagging tasks).
importMode	String	append	No	The import mode. Valid values: append (default): metadata for files with the same URI is overwritten by the imported content. replace: the existing file metadata in the dataset version is deleted.
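The import task accepts either a source dataset version (optionally filtered by a query) or a manifest file. The Python sketch below builds either form, checks importMode, and enforces the required filteredAttributes fields. Several points are assumptions: that the two sources are mutually exclusive, and that an OSS URI "with an endpoint" can be recognized by dots in the host part (for example oss://bucket.oss-cn-hangzhou.aliyuncs.com/...), since bucket names themselves contain none. The helper name import_spec is hypothetical.

```python
from urllib.parse import urlparse

# Attributes that must be present whenever filteredAttributes is specified.
REQUIRED_IMPORT_ATTRIBUTES = ("FileName", "Uri", "FileFingerPrint",
                              "DataSize", "FileType", "ContentType")


def import_spec(*, src_dataset_id=None, src_dataset_version=None,
                src_workspace_id=None, query=None, manifest_uri=None,
                attributes=None, import_mode="append"):
    """Build a FileMetaImport JobSpec (hypothetical helper)."""
    if import_mode not in ("append", "replace"):
        raise ValueError(f"unsupported importMode: {import_mode}")
    source = (src_dataset_id, src_dataset_version, src_workspace_id)
    if manifest_uri is not None:
        # File-based import: assumed not to mix with query-based fields.
        if any(v is not None for v in source) or query is not None:
            raise ValueError("use a manifest file or a source dataset, not both")
        parsed = urlparse(manifest_uri)
        # Heuristic: an endpoint in the host shows up as dots.
        if parsed.scheme != "oss" or "." in parsed.netloc:
            raise ValueError("manifestUri must be an oss:// URI without an endpoint")
        spec = {"manifestUri": manifest_uri}
    else:
        # Query-based import: all three source fields are required.
        if any(v is None for v in source):
            raise ValueError("srcDatasetId, srcDatasetVersion and "
                             "srcWorkspaceId are required")
        spec = {"srcDatasetId": src_dataset_id,
                "srcDatasetVersion": src_dataset_version,
                "srcWorkspaceId": src_workspace_id}
        if query is not None:
            spec["query"] = query
    if attributes:
        missing = [a for a in REQUIRED_IMPORT_ATTRIBUTES if a not in attributes]
        if missing:
            raise ValueError(f"filteredAttributes must include {missing}")
        # dict.fromkeys drops duplicates while keeping the caller's order.
        spec["filteredAttributes"] = ",".join(dict.fromkeys(attributes))
    spec["importMode"] = import_mode
    return spec
```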

Response elements

Element	Type	Description	Example
	object	The response structure.
RequestId	string	The request ID.	99341606-****-0757724D97EE
DatasetJobId	string	The dataset task ID.	dsjob-9jx1******uj9e

Examples

Success response

JSON format

{
  "RequestId": "99341606-****-0757724D97EE",
  "DatasetJobId": "dsjob-9jx1******uj9e"
}
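When the job you created is an IntelligentTag job, the returned DatasetJobId is the value that a later IntelligentTagRevert job takes as intelligentTagJobId. The Python sketch below parses the success response and builds that revert JobSpec. The dsjob- prefix check is an assumption based on the example IDs in this document, and the helper names are hypothetical.

```python
import json


def parse_create_job_response(raw):
    """Return (request_id, dataset_job_id) from a CreateDatasetJob response."""
    data = json.loads(raw)
    return data["RequestId"], data["DatasetJobId"]


def revert_tag_spec(intelligent_tag_job_id):
    """JobSpec for IntelligentTagRevert, which revokes an earlier tagging job."""
    # Assumption: dataset job IDs start with "dsjob-", as in the examples.
    if not intelligent_tag_job_id.startswith("dsjob-"):
        raise ValueError("expected a dataset job ID such as dsjob-...")
    return {"intelligentTagJobId": intelligent_tag_job_id}


raw = '{"RequestId": "99341606-****-0757724D97EE", "DatasetJobId": "dsjob-9jx1******uj9e"}'
request_id, job_id = parse_create_job_response(raw)
revert = revert_tag_spec(job_id)
```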

Error codes

See Error Codes for a complete list.

Release notes

See Release Notes for a complete list.