QuerySmarttagJob

Last Updated: Mar 30, 2026

Queries the results of a smart tagging job.

Try it now

You can try this API in OpenAPI Explorer without manually signing requests. A successful call automatically generates SDK sample code that matches your parameters, which you can download with built-in credential security for local use.

RAM authorization

The table below describes the authorization required to call this API. You can define it in a Resource Access Management (RAM) policy. The table's columns are detailed below:

  • Action: The action that can be used in the Action element of RAM policy statements to grant permission to perform the operation.

  • API: The API that you can call to perform the action.

  • Access level: The predefined level of access granted for each API. Valid values: create, list, get, update, and delete.

  • Resource type: The type of resource that supports authorization for the action. It indicates whether the action supports resource-level permissions. The specified resource must be compatible with the action; otherwise, the policy is ineffective.

    • For APIs with resource-level permissions, required resource types are marked with an asterisk (*). Specify the corresponding Alibaba Cloud Resource Name (ARN) in the Resource element of the policy.

    • For APIs without resource-level permissions, it is shown as All Resources. Use an asterisk (*) in the Resource element of the policy.

  • Condition key: The condition keys defined by the service. The key allows for granular control, applying to either actions alone or actions associated with specific resources. In addition to service-specific condition keys, Alibaba Cloud provides a set of common condition keys applicable across all RAM-supported services.

  • Dependent action: The dependent actions required to run the action. To complete the action, the RAM user or the RAM role must have the permissions to perform all dependent actions.

| Action | Access level | Resource type | Condition key | Dependent action |
|---|---|---|---|---|
| ice:QuerySmarttagJob | get | All Resources `*` | None | None |

Request parameters

| Parameter | Type | Required | Description | Example |
|---|---|---|---|---|
| JobId | string | Yes | The ID of the smart tagging job to query. You can obtain this ID from the response to the SubmitSmarttagJob operation. | `88c6ca184c0e47098a5b665e2****` |
| Params | string | No | Additional request parameters, specified as a JSON string, for example, `{"labelResultType":"auto"}`. The labelResultType field specifies the type of results to return. Valid values: auto (machine-generated results) and hmi (human-machine co-annotation results). | `{"labelResultType":"auto"}` |
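Because Params is a JSON string rather than a nested object, it must be serialized before the request is sent. A minimal sketch in plain Python (no SDK assumed; `build_params` is an illustrative helper, not part of the API):

```python
import json

def build_params(label_result_type="auto"):
    """Serialize the optional Params request field as a JSON string.

    Valid labelResultType values per the table above: "auto" and "hmi".
    """
    if label_result_type not in ("auto", "hmi"):
        raise ValueError("labelResultType must be 'auto' or 'hmi'")
    return json.dumps({"labelResultType": label_result_type})

# Assemble the documented request parameters.
request = {
    "JobId": "88c6ca184c0e47098a5b665e2****",  # from SubmitSmarttagJob
    "Params": build_params("auto"),
}
```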

Response elements

| Element | Type | Description | Example |
|---|---|---|---|
| (root) | object | | |
| JobStatus | string | The status of the job. Valid values: Success (the job was successful), Fail (the job failed), Processing (the job is in progress), and Submitted (the job is awaiting processing). | Success |
| RequestId | string | The ID of the request. | `******11-DB8D-4A9A-875B-275798******` |
| UserData | string | The custom data that you specified in the request. | `{"userId":"123432412831"}` |
| Results | object | | |
| Results.Result | array&lt;object&gt; | The array of analysis results. | |
| Results.Result[].Type | string | The type of the analysis result. Analysis result types for Label 1.0: TextLabel (text label), VideoLabel (video label), ASR (automatic speech recognition results, not returned by default), OCR (optical character recognition results, not returned by default), and NLP (natural language processing results, not returned by default). Analysis result types for Label 2.0 and Label 2.0-custom: CPVLabel, and Meta (metadata such as the video title, not returned by default). | Meta |
| Results.Result[].Data | string | The detailed analysis results in a JSON-formatted string. The data structure varies based on the value of Type. For more information, see the Result parameters section. | `{"title":"example-title-****"}` |
| Usages | object | | |
| Usages.Usage | array&lt;object&gt; | | |
| Usages.Usage[].Type | string | | |
| Usages.Usage[].Quota | integer | | |
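Because Processing and Submitted are non-terminal values of JobStatus, callers typically poll until the job reaches Success or Fail. A hedged sketch; `query_job` is a placeholder for however you actually invoke QuerySmarttagJob (for example, via an SDK client), not a real API:

```python
import time

# Terminal JobStatus values per the response table above.
TERMINAL_STATES = {"Success", "Fail"}

def is_terminal(job_status):
    """Return True when JobStatus indicates the job has finished."""
    return job_status in TERMINAL_STATES

def wait_for_job(query_job, job_id, interval_s=5, timeout_s=600):
    """Poll query_job(job_id) until a terminal state or timeout.

    query_job is a stand-in callable that must return the parsed
    QuerySmarttagJob response as a dict.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        response = query_job(job_id)
        if is_terminal(response["JobStatus"]):
            return response
        time.sleep(interval_s)
    raise TimeoutError(f"job {job_id} did not finish within {timeout_s}s")
```

Consider the callback mechanism described below as an alternative to polling for long-running jobs.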

Callback message format

When the status of a smart tagging job changes, ApsaraVideo for Media Processing (MPS) sends a message to the user-specified queue. (To specify the callback queue, see the UpdatePipeline API.) The message body is a JSON string with the following fields:

| Parameter | Type | Description |
|---|---|---|
| Type | String | The fixed string smarttag, which indicates a smart tagging job. |
| JobId | String | The unique ID of the job. |
| State | String | The current status of the job. This value corresponds to the JobStatus parameter returned by the QuerySmarttagJob operation. |
| UserData | String | The user-defined data passed in the SubmitSmarttagJob request. |
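A consumer of the callback queue can validate and unpack the message body as follows. This is a plain-Python sketch over the documented fields; `parse_callback` is an illustrative helper, not part of the API:

```python
import json

def parse_callback(message_body):
    """Parse the smart tagging callback message body (a JSON string)."""
    msg = json.loads(message_body)
    # Type is the fixed string "smarttag" for these callbacks.
    if msg.get("Type") != "smarttag":
        raise ValueError("not a smart tagging callback")
    return msg["JobId"], msg["State"], msg.get("UserData")

# Example message body matching the documented fields.
body = json.dumps({
    "Type": "smarttag",
    "JobId": "88c6ca184c0e47098a5b665e2****",
    "State": "Success",
    "UserData": "{\"userId\":\"123432412831\"}",
})
job_id, state, user_data = parse_callback(body)
```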

Result parameters

VideoLabel data structure

| Parameter | Type | Description |
|---|---|---|
| persons | Array of objects | An array of objects, each containing the results for a detected person. |
| persons.name | String | The name of the detected person. |
| persons.category | String | The person category. Valid values: celebrity, politician, sensitive, and unknown. For a custom person, this field returns the ID of the custom person library. |
| persons.ratio | Double | The proportion of the video duration in which the person appears. Value range: 0 to 1. |
| persons.occurrences | Array of objects | An array of objects, each detailing an occurrence of the person. |
| persons.occurrences.score | Double | The confidence score for the detection. |
| persons.occurrences.from | Double | The start time, in seconds, of the segment in which the person appears. |
| persons.occurrences.to | Double | The end time, in seconds, of the segment in which the person appears. |
| persons.occurrences.position | Object | The coordinates of the face bounding box. |
| persons.occurrences.position.leftTop | Array of integers | The x and y coordinates of the top-left corner of the face bounding box. |
| persons.occurrences.position.rightBottom | Array of integers | The x and y coordinates of the bottom-right corner of the face bounding box. |
| persons.occurrences.timestamp | Double | The timestamp, in seconds, of the frame that contains the face bounding box. |
| persons.occurrences.scene | String | The shot type. Valid values: closeUp (close-up), medium-closeUp (medium close-up), medium (medium shot), and medium-long (long shot). |
| tags | Array of objects | An array of objects containing detected labels for entities such as objects and scenes. See the examples table below. |
| tags.mainTagName | String | The main label. |
| tags.subTagName | String | The sub-label. |
| tags.ratio | Double | The proportion of the video duration in which the label appears. Value range: 0 to 1. |
| tags.occurrences | Array of objects | An array of objects, each detailing an occurrence of the label. |
| tags.occurrences.score | Double | The confidence score for the detected label. |
| tags.occurrences.from | Double | The start time, in seconds, of the segment in which the label appears. |
| tags.occurrences.to | Double | The end time, in seconds, of the segment in which the label appears. |
| classifications | Array of objects | An array of objects containing video classification information. |
| classifications.score | Double | The confidence score for the classification. |
| classifications.category1 | String | The level-1 category, for example, Lifestyle, Animation, or Automotive. |
| classifications.category2 | String | The level-2 category. For example, under the Lifestyle category, subcategories include Health and Home & Garden. |
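Because each person carries a list of occurrences with `from`/`to` times in seconds, total screen time per person can be derived directly from a VideoLabel result. A sketch over the documented structure (the sample data below is illustrative):

```python
def person_screen_time(video_label):
    """Sum occurrence durations (to - from), in seconds, per detected person.

    video_label follows the VideoLabel data structure above.
    """
    totals = {}
    for person in video_label.get("persons", []):
        seconds = sum(o["to"] - o["from"] for o in person.get("occurrences", []))
        totals[person["name"]] = totals.get(person["name"], 0.0) + seconds
    return totals

# Illustrative sample shaped like a VideoLabel result.
sample = {"persons": [{"name": "host", "occurrences": [
    {"score": 0.9, "from": 1.0, "to": 4.0},
    {"score": 0.8, "from": 10.0, "to": 12.0},
]}]}
```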

Video Tag Examples

| Main tag name | Sub tag name |
|---|---|
| Program | Talk shows, reality TV, and comedy specials |
| Character | Doctor, nurse, and teacher |
| Object | Piano, cup, table, car, cosmetics, and food |
| Logo | Channel and brand logos |
| Action | Dancing, kissing, hugging, singing, fighting, and making a phone call |
| Landmark | Statue of Liberty, Eiffel Tower, and Tiananmen Square |
| Scene | Bedroom, subway station, beach, desert, and terraced fields |

ImageLabel data structure

| Parameter | Type | Description |
|---|---|---|
| persons | Array of objects | An array of objects, each containing the result for a detected person. |
| persons.name | String | The name of the detected person. |
| persons.category | String | The category of the person. Valid values: celebrity, politician, and sensitive (sensitive person). |
| persons.score | Double | The confidence score for the detected person. |
| persons.position | Object | The coordinates of the face bounding box. |
| persons.position.leftTop | Array of integers | The x and y coordinates of the upper-left corner of the bounding box. |
| persons.position.rightBottom | Array of integers | The x and y coordinates of the lower-right corner of the bounding box. |
| persons.scene | String | The shot type for the person. Valid values: closeUp (close-up), medium-closeUp (medium close-up), medium (medium shot), and medium-long (long shot). |
| tags | Array of objects | An array of objects, each containing a detected tag for an entity (such as an object or scene). |
| tags.mainTagName | String | The primary tag. |
| tags.subTagName | String | The subtag. |
| tags.score | Double | The confidence score for the detected tag. |

Image tag examples

| Tag name | Examples |
|---|---|
| Character | Doctor, nurse, and teacher |
| Location | Tiananmen Square, the Statue of Liberty, Leshan Giant Buddha, China, and the United States |
| Action | Giving a speech |
| Logo | CCTV1, CCTV2, Youku, and Dragon TV |
| Action | Dancing, kissing, hugging, meeting, singing, making a phone call, horseback riding, and fighting |
| Object | Piano, cup, table, stir-fried tomato and egg, car, and cosmetics |
| Scene | Bedroom, subway station, terraced fields, beach, and desert |

TextLabel Data Structure from ASR and OCR

| Parameter | Type | Description |
|---|---|---|
| tags | Array of objects | An array of tag objects. |
| tags.name | String | The tag key. |
| tags.value | String | The tag value. Multiple values are separated by commas (,). |
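Since multiple values in `tags.value` are packed into one comma-separated string, consumers usually split them back out into individual pairs. A sketch over the documented structure (the sample data is illustrative):

```python
def text_label_values(text_label):
    """Expand TextLabel tags into (name, value) pairs.

    Multiple values in tags.value are comma-separated per the table above.
    """
    pairs = []
    for tag in text_label.get("tags", []):
        for value in tag["value"].split(","):
            pairs.append((tag["name"], value.strip()))
    return pairs

# Illustrative sample shaped like a TextLabel result.
sample = {"tags": [{"name": "Brand", "value": "Nike,Apple"}]}
```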

Text tag examples

| Parameter | Value |
|---|---|
| Region | China (Hangzhou) and US (Silicon Valley). |
| Organization | Google, the United Nations, and the World Health Organization. |
| Brand | Nike and Apple. |
| Keyword | Sustainable development. |

CPVLabel Data Structure

  • cates: Categories organized into a three-level hierarchy: Level 1, Level 2, and Level 3.

  • entities: Category attributes enriched with information from a knowledge graph.

  • hotwords: Keywords that are currently trending among users.

  • freeTags: Free-form tags or keywords that describe content.

| Parameter | Type | Example value | Description |
|---|---|---|---|
| type | String | hmi | Specifies the result type: hmi for human-machine collaboration results and auto for automated tagging results. |
| cates | Array | - | An array of category objects. |
| cates.labelLevel1 | String | Travel | The level-1 label. |
| cates.labelLevel2 | String | Scenery | The level-2 label. |
| cates.label | String | "" | The label name. May be an empty string. |
| cates.appearanceProbability | Double | 0.96 | The appearance probability of the category. |
| cates.detailInfo | Array | - | An array of objects, each detailing an occurrence of the category. |
| cates.detailInfo.score | Double | 0.9 | The confidence score. |
| cates.detailInfo.startTime | Double | 0.021 | The start time of the segment, in seconds. |
| cates.detailInfo.endTime | Double | 29.021 | The end time of the segment, in seconds. |
| entities | Array | - | An array of detected entity objects. |
| entities.labelLevel1 | String | Location | The level-1 label. |
| entities.labelLevel2 | String | Landmark | The level-2 label. |
| entities.label | String | Huangguoshu Waterfall | The label name. |
| entities.appearanceProbability | Double | 0.067 | The appearance probability of the entity. |
| entities.knowledgeInfo | String | {"name": "Huangguoshu Waterfall", "nameEn": "Huangguoshu Waterfall", "description": "One of the largest waterfalls in Asia."} | A JSON string with information from the knowledge graph. For a complete list of fields, see the appendices for the film/TV/variety IP, music, person, landmark, and object graphs. |
| entities.detailInfo | Array | - | An array of objects, each detailing an occurrence of the entity. |
| entities.detailInfo.score | Double | 0.33292606472969055 | The confidence score. |
| entities.detailInfo.startTime | Double | 6.021 | The start time of the segment, in seconds. |
| entities.detailInfo.endTime | Double | 8.021 | The end time of the segment, in seconds. |
| entities.detailInfo.trackData | Array | - | An array of objects with structured information for the entity label. |
| entities.detailInfo.trackData.score | Double | 0.32 | The confidence score. |
| entities.detailInfo.trackData.bbox | Array of integers | 23, 43, 45, 67 | The coordinates of the bounding box. |
| entities.detailInfo.trackData.timestamp | Double | 7.9 | The timestamp of the frame containing the bounding box, in seconds. |
| hotwords | Array | - | An array of detected hotword objects. |
| hotwords.labelLevel1 | String | Keyword | The level-1 label. |
| hotwords.labelLevel2 | String | "" | The level-2 label. |
| hotwords.label | String | China Meteorological Administration | The hotword content. |
| hotwords.appearanceProbability | Double | 0.96 | The appearance probability of the hotword. |
| hotwords.detailInfo | Array | - | An array of objects, each detailing an occurrence of the hotword. |
| hotwords.detailInfo.score | Double | 1.0 | The confidence score. |
| hotwords.detailInfo.startTime | Double | 0.021 | The start time of the segment, in seconds. |
| hotwords.detailInfo.endTime | Double | 29.021 | The end time of the segment, in seconds. |
| freeTags | Array | - | An array of free-form label objects. |
| freeTags.labelLevel1 | String | Keyword | The level-1 label. |
| freeTags.labelLevel2 | String | "" | The level-2 label. |
| freeTags.label | String | National Meteorological Center | The content of the free-form label. |
| freeTags.appearanceProbability | Double | 0.96 | The appearance probability of the free-form label. |
| freeTags.detailInfo | Array | - | An array of objects, each detailing an occurrence of the free-form label. |
| freeTags.detailInfo.score | Double | 0.9 | The confidence score. |
| freeTags.detailInfo.startTime | Double | 0.021 | The start time of the segment, in seconds. |
| freeTags.detailInfo.endTime | Double | 29.021 | The end time of the segment, in seconds. |
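A common post-processing step over a CPVLabel result is to keep only entities above an appearance-probability cutoff and decode their nested knowledgeInfo JSON string. A sketch over the documented structure (helper name and sample data are illustrative):

```python
import json

def prominent_entities(cpv_label, min_probability=0.05):
    """Return (label, knowledge) pairs for entities at or above a cutoff.

    cpv_label follows the CPVLabel data structure above; knowledgeInfo
    is itself a JSON string and is decoded here when present.
    """
    results = []
    for entity in cpv_label.get("entities", []):
        if entity.get("appearanceProbability", 0.0) >= min_probability:
            info = entity.get("knowledgeInfo")
            knowledge = json.loads(info) if info else None
            results.append((entity["label"], knowledge))
    return results

# Illustrative sample shaped like a CPVLabel result.
sample = {"entities": [{
    "label": "Huangguoshu Waterfall",
    "appearanceProbability": 0.067,
    "knowledgeInfo": '{"name": "Huangguoshu Waterfall"}',
}]}
```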

ASR results

| Parameter | Type | Description |
|---|---|---|
| details | Array of objects | An array of objects containing the detailed recognition results. |
| details.from | Double | The start timestamp of the recognized text segment, in seconds. |
| details.to | Double | The end timestamp of the recognized text segment, in seconds. |
| details.content | String | The recognized text. |

Optical Character Recognition (OCR) Result

| Parameter | Type | Description |
|---|---|---|
| details | Array of objects | An array of objects containing the detailed results of the task. |
| details.timestamp | Double | The timestamp, in seconds, of the frame in which text is detected. |
| details.info | Array of objects | An array of objects with details about the text recognized at this timestamp. |
| details.info.score | Double | The confidence score. |
| details.info.position | Object | The bounding box coordinates of the detected text. |
| details.info.position.leftTop | Array of integers | The x and y coordinates of the top-left corner of the bounding box. |
| details.info.position.rightBottom | Array of integers | The x and y coordinates of the bottom-right corner of the bounding box. |
| details.info.content | String | The recognized text. |
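Since OCR details are keyed by frame timestamp with a list of recognized texts each, a score-filtered timestamp-to-text mapping is a natural way to consume them. A sketch over the documented structure (sample data is illustrative):

```python
def ocr_text_by_timestamp(ocr_result, min_score=0.0):
    """Group recognized OCR text by frame timestamp, score-filtered.

    ocr_result follows the OCR result structure above.
    """
    grouped = {}
    for detail in ocr_result.get("details", []):
        texts = [i["content"] for i in detail.get("info", [])
                 if i["score"] >= min_score]
        if texts:
            grouped[detail["timestamp"]] = texts
    return grouped

# Illustrative sample shaped like an OCR result.
sample = {"details": [{"timestamp": 3.2, "info": [
    {"score": 0.95, "content": "EXIT"},
    {"score": 0.40, "content": "noise"},
]}]}
```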

Meta annotation results

Note

If you set the needMetaData parameter when you call SubmitSmarttagJob and do not use human-assisted annotation, QuerySmarttagJob currently returns the title that was provided in the request.

| Parameter | Type | Description |
|---|---|---|
| title | String | The title. |

Extracted Subtitles

| Parameter | Type | Description |
|---|---|---|
| details | Array of objects | Detailed results for the task. |
| details.allResultUrl | String | The URL for all subtitle results. This URL is valid for six months after the task completes. |
| details.chResultUrl | String | The URL for the Chinese results. This URL is valid for six months after the task completes. |
| details.engResultUrl | String | The URL for the English results. This URL is valid for six months after the task completes. |

Note

Each line of the subtitle result file has the following format: sequence number + time range + subtitle content (one subtitle per line).

NLP results

| Parameter | Type | Description |
|---|---|---|
| transcription | object | The speech transcription output. |
| autoChapters | object | The automatic chapters output. |
| summarization | object | The large language model summarization output. |
| meetingAssistance | object | The intelligent meeting minutes output. |
| translation | object | The text translation output. |

Transcription

| Parameter | Type | Description |
|---|---|---|
| transcription | object | An object containing the speech-to-text transcription result. |
| transcription.paragraphs | array | An array of paragraph objects that structure the transcription. |
| transcription.paragraphs[i].paragraphId | string | The unique identifier of the paragraph. |
| transcription.paragraphs[i].speakerId | string | The unique identifier of the speaker. |
| transcription.paragraphs[i].words | array | An array of word objects in the paragraph. |
| transcription.paragraphs[i].words[i].id | integer | The sequence number of the word. You can typically ignore this value. |
| transcription.paragraphs[i].words[i].sentenceId | integer | The identifier of the sentence. Assemble words with the same sentenceId to form a complete sentence. |
| transcription.paragraphs[i].words[i].start | long | The start time of the word relative to the beginning of the audio, in milliseconds. |
| transcription.paragraphs[i].words[i].end | long | The end time of the word relative to the beginning of the audio, in milliseconds. |
| transcription.paragraphs[i].words[i].text | string | The text content of the word. |
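The table above says to assemble words that share a sentenceId into complete sentences. A sketch of that reconstruction over the documented structure (sample data is illustrative; words are joined with spaces, which you may want to change for languages written without spaces):

```python
def assemble_sentences(transcription):
    """Join word texts that share a sentenceId, per paragraph.

    Returns (paragraphId, sentenceId, text) tuples in input order.
    transcription follows the transcription structure above.
    """
    sentences = []
    for paragraph in transcription.get("paragraphs", []):
        current_id, parts = None, []
        for word in paragraph.get("words", []):
            if word["sentenceId"] != current_id:
                if parts:
                    sentences.append(
                        (paragraph["paragraphId"], current_id, " ".join(parts)))
                current_id, parts = word["sentenceId"], []
            parts.append(word["text"])
        if parts:
            sentences.append(
                (paragraph["paragraphId"], current_id, " ".join(parts)))
    return sentences

# Illustrative sample shaped like a transcription result.
sample = {"paragraphs": [{"paragraphId": "p1", "speakerId": "s1", "words": [
    {"id": 0, "sentenceId": 1, "start": 0, "end": 300, "text": "hello"},
    {"id": 1, "sentenceId": 1, "start": 300, "end": 600, "text": "world"},
    {"id": 2, "sentenceId": 2, "start": 700, "end": 900, "text": "bye"},
]}]}
```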

Summarization (Full-text summary, Speaker summary, Question summary)

| Parameter | Type | Description |
|---|---|---|
| summarization | object | The summarization result object, which can contain results for zero or more summary types. |
| summarization.paragraphSummary | string | The full-text summary. |
| summarization.conversationalSummary | array | An array of speaker summary results. |
| summarization.conversationalSummary[i].speakerId | string | The speaker ID. |
| summarization.conversationalSummary[i].speakerName | string | The name of the speaker. |
| summarization.conversationalSummary[i].summary | string | The summary for this speaker. |
| summarization.questionsAnsweringSummary | array | An array of question-answering summary results. |
| summarization.questionsAnsweringSummary[i].question | string | The question. |
| summarization.questionsAnsweringSummary[i].sentenceIdsOfQuestion | array | An array of sentenceId values from the original transcription that correspond to the question. |
| summarization.questionsAnsweringSummary[i].answer | string | The answer. |
| summarization.questionsAnsweringSummary[i].sentenceIdsOfAnswer | array | An array of sentenceId values from the original transcription that correspond to the answer. |
| summarization.mindMapSummary | array of objects | An array of mind map summary results, which can contain topic summaries and their relationships. |
| summarization.mindMapSummary[i].title | string | The title of the topic. |
| summarization.mindMapSummary[i].topic | array of objects | An array of topic objects. |
| summarization.mindMapSummary[i].topic[i].title | string | The title of the subtopic. |
| summarization.mindMapSummary[i].topic[i].topic | array of objects | An array of subtopics for the parent topic, which can be empty. |

Full-Text Translation

| Parameter | Type | Description |
|---|---|---|
| translation | object | Contains the translation results. |
| translation.paragraphs | array | An array of paragraph objects containing the translation results, corresponding to the paragraphs in the speech recognition result. |
| translation.paragraphs[i].paragraphId | string | The unique identifier of the paragraph. This ID corresponds to the paragraphId in the speech recognition result. |
| translation.paragraphs[i].sentences | array | An array of sentence objects that form the translated paragraph. |
| translation.paragraphs[i].sentences[i].sentenceId | long | The unique identifier of the sentence. |
| translation.paragraphs[i].sentences[i].start | long | The start time of the sentence, as a relative timestamp in milliseconds from the start of the audio. |
| translation.paragraphs[i].sentences[i].end | long | The end time of the sentence, as a relative timestamp in milliseconds from the start of the audio. |
| translation.paragraphs[i].sentences[i].text | string | The translated text of the corresponding sentence in the speech recognition result. |

autoChapters (automatic chapter detection)

| Parameter | Type | Description |
|---|---|---|
| autoChapters | array | An array of auto-chapter objects. |
| autoChapters[i].id | integer | The chapter ID. |
| autoChapters[i].start | long | The start time of the chapter, in milliseconds, relative to the start of the audio. |
| autoChapters[i].end | long | The end time of the chapter, in milliseconds, relative to the start of the audio. |
| autoChapters[i].headline | string | The headline of the chapter. |
| autoChapters[i].summary | string | The summary of the chapter. |
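Because chapter boundaries are given in milliseconds relative to the start of the audio, a readable chapter list needs a millisecond-to-clock-time conversion. A sketch over the documented structure (sample data is illustrative):

```python
def format_chapters(auto_chapters):
    """Render autoChapters entries as "[mm:ss-mm:ss] headline" lines."""
    def mmss(ms):
        # Convert a millisecond offset to a zero-padded mm:ss string.
        seconds = ms // 1000
        return f"{seconds // 60:02d}:{seconds % 60:02d}"
    return [f"[{mmss(c['start'])}-{mmss(c['end'])}] {c['headline']}"
            for c in sorted(auto_chapters, key=lambda c: c["start"])]

# Illustrative sample shaped like an autoChapters result.
sample = [{"id": 1, "start": 0, "end": 95000,
           "headline": "Opening", "summary": "..."}]
```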

meetingAssistance: Intelligent Minute Extraction (Keywords, Key Sentences, and Action Items)

| Parameter | Type | Description |
|---|---|---|
| meetingAssistance | object | Contains the intelligent meeting minutes results. This can include results for zero or more result types. |
| meetingAssistance.keywords | array | An array of keyword results. |
| meetingAssistance.keySentences | array | An array of key sentence results, also known as highlights. |
| meetingAssistance.keySentences[i].id | long | The sequence number of the key sentence. |
| meetingAssistance.keySentences[i].sentenceId | long | The ID of the corresponding sentence in the original transcription. |
| meetingAssistance.keySentences[i].start | long | The start time, in milliseconds, relative to the beginning of the audio. |
| meetingAssistance.keySentences[i].end | long | The end time, in milliseconds, relative to the beginning of the audio. |
| meetingAssistance.keySentences[i].text | string | The text content of the key sentence. |
| meetingAssistance.actions | array | An array of to-do items. |
| meetingAssistance.actions[i].id | long | The sequence number of the to-do item. |
| meetingAssistance.actions[i].sentenceId | long | The ID of the corresponding sentence in the original transcription. |
| meetingAssistance.actions[i].start | long | The start time, in milliseconds, relative to the beginning of the audio. |
| meetingAssistance.actions[i].end | long | The end time, in milliseconds, relative to the beginning of the audio. |
| meetingAssistance.actions[i].text | string | The text content of the to-do item. |
| meetingAssistance.classifications | object | Contains the scene classification results. Three scene types are currently supported. |
| meetingAssistance.classifications.interview | float | The confidence score for the interview scene. |
| meetingAssistance.classifications.lecture | float | The confidence score for the lecture scene. |
| meetingAssistance.classifications.meeting | float | The confidence score for the meeting scene. |
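Since classifications maps each supported scene type to a confidence score, the most likely scene is simply the key with the highest value. A sketch over the documented structure (sample data is illustrative):

```python
def likely_scene(classifications):
    """Return the scene type with the highest confidence score.

    classifications follows meetingAssistance.classifications above,
    mapping "interview", "lecture", and "meeting" to float scores.
    """
    return max(classifications, key=classifications.get)

# Illustrative sample shaped like a classifications result.
sample = {"interview": 0.1, "lecture": 0.2, "meeting": 0.7}
```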

Examples

Success response

JSON format

{
  "JobStatus": "Success",
  "RequestId": "******11-DB8D-4A9A-875B-275798******",
  "UserData": "{\"userId\":\"123432412831\"}",
  "Results": {
    "Result": [
      {
        "Type": "Meta",
        "Data": "{\"title\":\"example-title-****\"}"
      }
    ]
  },
  "Usages": {
    "Usage": [
      {
        "Type": "",
        "Quota": 0
      }
    ]
  }
}
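Note that each Data field in the response is itself a JSON-encoded string, so it must be decoded a second time after the response body is parsed. A sketch over the documented response shape (helper name is illustrative):

```python
import json

def results_by_type(response):
    """Decode each Result's JSON-encoded Data field, keyed by Type.

    response is the parsed QuerySmarttagJob response dict.
    """
    decoded = {}
    for result in response.get("Results", {}).get("Result", []):
        decoded[result["Type"]] = json.loads(result["Data"])
    return decoded

# Example matching the success response above.
response = {"Results": {"Result": [
    {"Type": "Meta", "Data": "{\"title\":\"example-title-****\"}"}
]}}
```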

Error codes

See Error Codes for a complete list.

Release notes

See Release Notes for a complete list.