Creates a dataset job.
RAM authorization
| Action | Access level | Resource type | Condition key | Dependent action |
| paidataset:CreateDatasetJob | create | All resources (*) | None | None |
Request syntax
POST /api/v1/datasets/{DatasetId}/datasetjobs HTTP/1.1
Path parameters
| Parameter | Type | Required | Description | Example |
| DatasetId | string | Yes | The dataset ID. For more information about how to obtain the dataset ID, see ListDatasets. | d-rbvg5wz****c9ks92 |
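The request path is formed by substituting DatasetId into the template shown under Request syntax. The following Python sketch covers only that substitution. The helper name `create_dataset_job_path` and the sample ID `d-example123` are hypothetical, and the endpoint host and request signing are assumed to be handled by an official SDK or your own signing code.

```python
from urllib.parse import quote

# Path template copied from the "Request syntax" line above.
PATH_TEMPLATE = "/api/v1/datasets/{DatasetId}/datasetjobs"


def create_dataset_job_path(dataset_id: str) -> str:
    """Return the CreateDatasetJob request path for a dataset ID.

    The ID is percent-encoded with safe="" so that characters such as "/"
    cannot change the structure of the path.
    """
    if not dataset_id:
        raise ValueError("DatasetId is required")
    return PATH_TEMPLATE.format(DatasetId=quote(dataset_id, safe=""))


# Hypothetical dataset ID used for illustration.
print(create_dataset_job_path("d-example123"))  # /api/v1/datasets/d-example123/datasetjobs
```

Real dataset IDs contain only letters, digits, and hyphens, so the encoding step is normally a no-op; it is kept as a defensive choice.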
Request parameters
| Parameter | Type | Required | Description | Example |
| body | object | No | The request body. | |
| DatasetVersion | string | No | The name of the dataset version. | v1 |
| WorkspaceId | string | Yes | The workspace ID. For more information about how to obtain the workspace ID, see ListWorkspaces. | 478** |
| JobAction | string | Yes | The task operation. | SemanticIndex |
| JobMode | string | No | The task mode. | Full |
| Description | string | No | The description of the task. | This is a job description. |
| JobSpec | string | Yes | The task details, as a JSON string. The fields depend on the task type and are described below. | {\"modelId\":\"xxx\"} |
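JobSpec is typed as a string and its example is escaped JSON, which indicates that the task details are sent as a serialized JSON object rather than as a nested object. A minimal sketch, assuming that reading: `build_create_dataset_job_body` is a hypothetical helper that only assembles the body and does not sign or send the request.

```python
import json
from typing import Any, Dict, Optional


def build_create_dataset_job_body(
    workspace_id: str,
    job_action: str,
    job_spec: Dict[str, Any],
    dataset_version: Optional[str] = None,
    job_mode: Optional[str] = None,
    description: Optional[str] = None,
) -> Dict[str, str]:
    """Assemble the CreateDatasetJob request body.

    JobSpec is typed as a string, so the task details are serialized
    to compact JSON here instead of being nested as an object.
    """
    body = {
        "WorkspaceId": workspace_id,
        "JobAction": job_action,
        "JobSpec": json.dumps(job_spec, separators=(",", ":")),
    }
    # Optional parameters are included only when they are set.
    optional = {
        "DatasetVersion": dataset_version,
        "JobMode": job_mode,
        "Description": description,
    }
    body.update({key: value for key, value in optional.items() if value is not None})
    return body


body = build_create_dataset_job_body(
    workspace_id="478**",
    job_action="SemanticIndex",
    job_spec={"modelId": "xxx"},
    dataset_version="v1",
    job_mode="Full",
)
print(json.dumps(body))
```

When the body itself is serialized, JobSpec appears as `"{\"modelId\":\"xxx\"}"`, the same form as the example in the table.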
Description of the JobSpec parameter in CreateDatasetJob:
Semantic index task
Example:
{
"modelId": "xxx",
"modelVersion": "1.0.0",
"contentList": ["file"],
"embeddingConnectionId": "conn-xxx",
"embeddingModel": "default",
"databaseConnectionId": "conn-xxx",
"databaseTableName": "table_xxx",
"vectorIndexConfig":"{\"shards\":1,\"similarity\":\"cosine\",\"indexType\":\"hnsw\",\"indexOptions\":{\"m\":16,\"efConstruction\":200}}",
"concurrency": 2
}
Field description:
| Field Name | Type | Example | Required | Description |
| modelId | String | model-xxx | No | The ID of the official model. |
| modelVersion | String | 1.0.0 | No | The version of the official model. |
| embeddingConnectionId | String | conn-xxx | No | The ID of the Elastic Algorithm Service (EAS) model service connection. |
| embeddingModel | String | default | No | The name of the model that corresponds to the EAS model service. |
| databaseConnectionId | String | conn-xxx | No | The ID of the vector database service connection. |
| databaseTableName | String | table_xxx | No | The name of the vector database table. |
| concurrency | Integer | 2 | No | The number of concurrent tasks. |
| contentList | Array | ["file"] | Yes | The list of content to index. |
| +- | String | file | - | The content to index. Currently, only file is supported. |
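A sketch of a semantic index JobSpec built against the field table above. It assumes, based on the example, that vectorIndexConfig is itself a JSON-encoded string (the field is not described in the table), and it enforces the documented rule that contentList currently accepts only file. `build_semantic_index_spec` is a hypothetical helper.

```python
import json
from typing import Any, Dict, List, Optional


def build_semantic_index_spec(
    content_list: List[str],
    vector_index_config: Optional[Dict[str, Any]] = None,
    **optional_fields: Any,
) -> Dict[str, Any]:
    """Build a JobSpec object for a semantic index task.

    contentList is required, and only "file" is currently supported.
    vectorIndexConfig is serialized to a JSON string, as in the example.
    """
    if not content_list:
        raise ValueError("contentList is required")
    unsupported = [item for item in content_list if item != "file"]
    if unsupported:
        raise ValueError(f"unsupported contentList entries: {unsupported}")
    spec: Dict[str, Any] = {"contentList": list(content_list)}
    # Optional fields such as modelId, embeddingConnectionId, or concurrency.
    spec.update(optional_fields)
    if vector_index_config is not None:
        spec["vectorIndexConfig"] = json.dumps(vector_index_config, separators=(",", ":"))
    return spec


spec = build_semantic_index_spec(
    ["file"],
    vector_index_config={
        "shards": 1,
        "similarity": "cosine",
        "indexType": "hnsw",
        "indexOptions": {"m": 16, "efConstruction": 200},
    },
    databaseConnectionId="conn-xxx",
    databaseTableName="table_xxx",
    concurrency=2,
)
```

Because the resulting object is then serialized into the JobSpec string, vectorIndexConfig ends up double-encoded, which matches the escaped form in the example.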
Intelligent tagging task
Example:
{
"intelligentTagConnectionId": "conn-keltvufiud3quopq11",
"promptId": "pmt-gh6qaj1kvkf6yk7qx2",
"modelId":"qwen-vl-max"
}
Field description:
| Field Name | Type | Example | Required | Description |
| modelId | String | qwen-vl-max | Yes | The model name. |
| intelligentTagConnectionId | String | conn-keltvufiud3quopq11 | Yes | The ID of the connection used for intelligent tagging. |
| promptId | String | pmt-gh6qaj1kvkf6yk7qx2 | Yes | The prompt ID. |
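All three intelligent tagging fields are marked as required. A small sketch of a hypothetical helper that rejects empty values before the request body is built:

```python
from typing import Dict


def build_intelligent_tag_spec(model_id: str, connection_id: str, prompt_id: str) -> Dict[str, str]:
    """Build a JobSpec object for an intelligent tagging task.

    Every field is required by the field table, so empty values are rejected.
    """
    spec = {
        "modelId": model_id,
        "intelligentTagConnectionId": connection_id,
        "promptId": prompt_id,
    }
    missing = [name for name, value in spec.items() if not value]
    if missing:
        raise ValueError(f"missing required JobSpec fields: {missing}")
    return spec
```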
Metadata export task
Example:
{
"query":{
"QueryType": "TAG",
"QueryText": "",
"TopK": 100,
"ScoreThreshold":0.6
},
"filteredAttributes":"FileName,Uri",
"exportDirUri": "oss://bucket/path/"
}
Field description:
| Field Name | Type | Example | Required | Description |
| query | JSON | - | No | The query conditions for the export. The fields are the same as those of the ListDatasetMetas operation. See QueryParams. |
| filteredAttributes | String | FileName,Uri | No | A comma-separated list of attribute fields. If specified, the exported results contain only these fields. Valid fields: Uri (required), DatasetFileMetaId, FileName, DataSize, FileType, ContentType, Comment, MetaAttributes, FileFingerPrint, FileCreateTime, FileUpdateTime, Tags.user (custom tags), Tags.user-delete-ai-tags (algorithm tags deleted by the user), Tags.ai (aggregated algorithm tags from all tagging tasks), and Tags.all (algorithm tags and custom tags, excluding algorithm tags deleted by the user). |
| exportDirUri | String | oss://bucket/path/ or pvfs://cata_log/DB/lanceTable | Yes | The storage path for the exported content, for example, an OSS path. This must be a folder path. A folder named {datasetId}-{datasetversion}-{time:yyyy-MM-dd-HH-mm-ss} is created in this path to store the exported YAML and JSONL files. |
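The export folder name follows the pattern {datasetId}-{datasetversion}-{time:yyyy-MM-dd-HH-mm-ss}. The sketch below formats that name and, as an assumption based on the manifestUri example in the metadata import section, the location of pai_dataset_filemeta_manifest.yaml inside it. It also builds a filteredAttributes value that always includes the required Uri field. The time zone used for {time} is not documented, so the caller supplies the timestamp; all helper names are hypothetical.

```python
from datetime import datetime
from typing import Iterable, List


def export_folder_name(dataset_id: str, dataset_version: str, when: datetime) -> str:
    """Format {datasetId}-{datasetversion}-{time:yyyy-MM-dd-HH-mm-ss}."""
    return f"{dataset_id}-{dataset_version}-{when.strftime('%Y-%m-%d-%H-%M-%S')}"


def manifest_uri(export_dir_uri: str, folder: str) -> str:
    """Assumed manifest location inside the export folder (inferred from the manifestUri example)."""
    return f"{export_dir_uri.rstrip('/')}/{folder}/pai_dataset_filemeta_manifest.yaml"


def export_filtered_attributes(fields: Iterable[str]) -> str:
    """Join attribute names with commas, dropping duplicates and adding the required Uri field."""
    ordered: List[str] = []
    for name in fields:
        if name not in ordered:
            ordered.append(name)
    if "Uri" not in ordered:
        ordered.append("Uri")
    return ",".join(ordered)
```

For example, dataset d-mpdxv0lm9sndij7gpb, version v1, exported at 2025-06-18 12:23:30 yields the folder d-mpdxv0lm9sndij7gpb-v1-2025-06-18-12-23-30, the same name that appears in the manifestUri example.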
QueryParams:
| Field Name | Type | Example | Required | Description |
| QueryType | String | MIX | No | The query type. Valid values: MIX, VECTOR, and TAG. |
| QueryText | String | "fallen water-filled barrier" | No | The text to search for. |
| QueryImage | String | oss://bucket.cn-hangzhou.aliyuncs.com/image.jpg | No | When you search by image, this parameter specifies the image information. You can use an OSS URL of an image that is accessible over the public network. |
| QueryTagsIncludeAll | String | blue cone,lane line | No | Indicates "includes all of the following tags". You can select multiple tags. The query results must contain all of these tags. If this parameter is empty, this condition is not applied. This parameter is valid when QueryType is set to TAG or MIX. |
| QueryTagsIncludeAny | String | blue sky | No | Indicates "includes any of the following tags". You can select multiple tags. The query results must contain at least one of these tags. If this parameter is empty, this condition is not applied. This parameter is valid when QueryType is set to TAG or MIX. |
| QueryTagsExclude | String | overcast | No | Indicates "excludes the following tags". You can select multiple tags. The query results cannot contain these tags. If this parameter is empty, this condition is not applied. This parameter is valid when QueryType is set to TAG or MIX. |
| QueryFileName | String | water_barrier | No | Performs a fuzzy search for the file name based on a 2-gram fuzzy match. |
| QueryFileDir | String | oss://cars/20250221/ | No | Performs a fuzzy search for the file folder based on a 2-gram fuzzy match. |
| QueryFileTypeIncludeAny | String | image,video | No | Indicates "includes any of the following file types". You can select multiple file types. The query results must match at least one of these file types. If this parameter is empty, this condition is not applied. |
| QueryContentTypeIncludeAny | String | image/jpeg,application/pdf | No | Indicates "includes any of the following MIME types". You can select multiple MIME types. The query results must match at least one of these MIME types. If this parameter is empty, this condition is not applied. |
| StartFileUpdateTime | String | 2021-01-12T14:36:01.000Z | No | The start of the file update time range to query. Specify a UTC time in ISO 8601 format, such as 2021-01-12T14:36:01.000Z. |
| EndFileUpdateTime | String | 2021-01-12T14:36:01.000Z | No | The end of the file update time range to query. Specify a UTC time in ISO 8601 format, such as 2021-01-12T14:36:01.000Z. |
| StartTagUpdateTime | String | 2021-01-12T14:36:01.000Z | No | The start of the last tag update time range to query. Specify a UTC time in ISO 8601 format, such as 2021-01-12T14:36:01.000Z. This parameter is valid when QueryType is set to TAG or MIX. |
| EndTagUpdateTime | String | 2021-01-12T14:36:01.000Z | No | The end of the last tag update time range to query. Specify a UTC time in ISO 8601 format, such as 2021-01-12T14:36:01.000Z. This parameter is valid when QueryType is set to TAG or MIX. |
| TopK | Integer | 100 | No | The maximum number of exported items. By default, there is no limit. |
| ScoreThreshold | Float | 0.6 | No | The similarity score threshold. Only results with a score greater than the ScoreThreshold value are returned. This parameter is valid when QueryType is set to VECTOR or MIX. |
| DatasetFileMetaIds | String | - | No | A list of file metadata IDs. You can specify up to 20 IDs. |
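The QueryParams table restricts some fields to certain query types and expects timestamps as UTC ISO 8601 strings with millisecond precision. This hypothetical sketch assembles a query object and reports the fields that, per the table, have no effect for the chosen QueryType. It reports rather than rejects them because the documentation does not say whether the service returns an error; the export example, for instance, combines QueryType TAG with ScoreThreshold.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Set

QUERY_TYPES = {"MIX", "VECTOR", "TAG"}
# Fields documented as valid only when QueryType is TAG or MIX.
TAG_OR_MIX_FIELDS = {
    "QueryTagsIncludeAll", "QueryTagsIncludeAny", "QueryTagsExclude",
    "StartTagUpdateTime", "EndTagUpdateTime",
}
# Fields documented as valid only when QueryType is VECTOR or MIX.
VECTOR_OR_MIX_FIELDS = {"ScoreThreshold"}


def to_iso8601_utc(dt: datetime) -> str:
    """Format an aware datetime like 2021-01-12T14:36:01.000Z (UTC, milliseconds)."""
    if dt.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime")
    utc = dt.astimezone(timezone.utc)
    return utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{utc.microsecond // 1000:03d}Z"


def build_query(query_type: str, **fields: Any) -> Dict[str, Any]:
    """Assemble a QueryParams object; datetime values become ISO 8601 strings."""
    if query_type not in QUERY_TYPES:
        raise ValueError(f"QueryType must be one of {sorted(QUERY_TYPES)}")
    query: Dict[str, Any] = {"QueryType": query_type}
    for name, value in fields.items():
        query[name] = to_iso8601_utc(value) if isinstance(value, datetime) else value
    return query


def inapplicable_fields(query: Dict[str, Any]) -> Set[str]:
    """Return the fields that the table says have no effect for this QueryType."""
    query_type = query.get("QueryType")
    if query_type == "VECTOR":
        return set(query) & TAG_OR_MIX_FIELDS
    if query_type == "TAG":
        return set(query) & VECTOR_OR_MIX_FIELDS
    return set()  # MIX (or no QueryType): nothing is flagged
```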
Build and update metadata task
Example:
{}
Revoke intelligent tagging task
Example:
{
"intelligentTagJobId": "dsjob-gh6qaj1kvkf6yk7qx2"
}
Field description:
| Field Name | Type | Example | Required | Description |
| intelligentTagJobId | String | dsjob-gh6qaj1kvkf6yk7qx2 | Yes | The ID of the intelligent tagging task to revoke. |
Metadata import task
Example using query conditions:
{
"srcDatasetId": "d-1234",
"srcDatasetVersion": "v1",
"srcWorkspaceId": "12729",
"query":
{
"QueryType": "TAG",
"QueryText": "",
"TopK": 100,
"ScoreThreshold": 0.6
}
}
Example using a pai_dataset_filemeta_manifest file:
{
"manifestUri":"oss://bucket/export_path/d-mpdxv0lm9sndij7gpb-v1-2025-06-18-12-23-30/pai_dataset_filemeta_manifest.yaml"
}
Field description:
| Field Name | Type | Example | Required | Description |
| srcDatasetId | String | d-1234 | Yes (query-based) | The ID of the source dataset for the import. |
| srcDatasetVersion | String | v1 | Yes (query-based) | The version of the source dataset for the import. |
| srcWorkspaceId | String | 12729 | Yes (query-based) | The workspace of the source dataset for the import. |
| query | JSON | - | No (query-based) | The query conditions applied to the source dataset version. See QueryParams. |
| manifestUri | String | oss://bucket/export_path/d-mpdxv0lm9sndij7gpb-v1-2025-06-18-12-23-30/pai_dataset_filemeta_manifest.yaml | Yes (file-based) | The path of the source manifest file for the import. Only OSS URIs without an endpoint are supported. |
| filteredAttributes | String | FileName,Uri,FileFingerPrint,DataSize,FileUpdateTime,Tags.ai | No | A comma-separated list of attribute fields. By default, all attribute fields are imported. If specified, the imported content contains only these fields. Valid fields: FileName (required), Uri (required), FileFingerPrint (required), DataSize (required), FileType (required), ContentType (required), Comment, MetaAttributes, FileCreateTime, FileUpdateTime, Tags.user (custom tags), Tags.user-delete-ai-tags (algorithm tags deleted by the user), and Tags.ai (aggregated algorithm tags from all tagging tasks). |
| importMode | String | append | No | The import mode. Valid values: append (default): files with the same URI are overwritten with the imported content. replace: the original file metadata in the dataset version is deleted and replaced with the imported content. |
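The two import modes can be modeled locally on lists of file metadata records keyed by Uri. This is a sketch of the documented semantics only, not of how the service performs the import; it assumes that in append mode records whose URIs are not in the import are kept unchanged. `apply_import` is a hypothetical helper.

```python
from typing import Any, Dict, List

FileMeta = Dict[str, Any]


def apply_import(existing: List[FileMeta], imported: List[FileMeta],
                 import_mode: str = "append") -> List[FileMeta]:
    """Local model of importMode, treating Uri as the record key."""
    if import_mode == "append":
        # append: existing records stay; a record with the same Uri is overwritten.
        merged: Dict[str, FileMeta] = {meta["Uri"]: meta for meta in existing}
    elif import_mode == "replace":
        # replace: the original file metadata in the dataset version is deleted.
        merged = {}
    else:
        raise ValueError("importMode must be 'append' or 'replace'")
    for meta in imported:
        merged[meta["Uri"]] = meta
    return list(merged.values())
```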
Response elements
| Element | Type | Description | Example |
| | object | The response structure. | |
| RequestId | string | The request ID. | 99341606-****-0757724D97EE |
| DatasetJobId | string | The dataset task ID. | dsjob-9jx1******uj9e |
Examples
Success response
JSON format
{
"RequestId": "99341606-****-0757724D97EE",
"DatasetJobId": "dsjob-9jx1******uj9e"
}
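A client typically needs the DatasetJobId from the response to track the task afterward. A minimal parsing sketch with a hypothetical helper, keeping RequestId for troubleshooting:

```python
import json
from typing import Tuple


def parse_create_dataset_job_response(raw: str) -> Tuple[str, str]:
    """Return (DatasetJobId, RequestId) from a CreateDatasetJob response body."""
    data = json.loads(raw)
    job_id = data.get("DatasetJobId")
    request_id = data.get("RequestId", "")
    if not job_id:
        # Keep the request ID in the error so the call can be traced.
        raise ValueError(f"response has no DatasetJobId (RequestId={request_id!r})")
    return job_id, request_id


sample = '{"RequestId": "99341606-****-0757724D97EE", "DatasetJobId": "dsjob-9jx1******uj9e"}'
job_id, request_id = parse_create_dataset_job_response(sample)
```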
Error codes
See Error Codes for a complete list.
Release notes
See Release Notes for a complete list.