
Intelligent Media Services:QueryVideoCognitionJob

Last Updated:Jan 14, 2026

Retrieves the results of an AI analysis and processing task.

Debugging

You can call this operation directly in OpenAPI Explorer, which calculates signatures for you. After a successful call, OpenAPI Explorer can automatically generate SDK code samples.

Authorization information

The following table describes the authorization information for this API operation. You can use this information in the Action element of a RAM policy statement to grant a RAM user or RAM role the permissions to call this operation. The table columns are described as follows:

  • Operation: the value to use in the Action element to specify the operation on a resource.
  • Access level: the access level of the operation. Valid levels: read, write, and list.
  • Resource type: the type of resource on which you can authorize a RAM user or RAM role to perform the operation. Take note of the following items:
    • Required resource types are marked with an asterisk (*).
    • If the operation does not support resource-level authorization, All Resources appears in the Resource type column.
  • Condition key: the condition key defined by the cloud service.
  • Associated operation: other operations that the RAM user or RAM role must be authorized to perform before it can call this operation.
| Operation | Access level | Resource type | Condition key | Associated operation |
| --- | --- | --- | --- | --- |
| ice:QueryVideoCognitionJob | get | *All Resources | none | none |

Request parameters

| Parameter | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| JobId | string | Yes | The ID of the task to query. It is returned when you call the SubmitSmarttagJob operation. | ****20b48fb04483915d4f2cd8ac**** |
| Params | string | No | Additional request parameters, provided as a JSON string. | {} |
| IncludeResults | object | No | Specifies whether to include the full algorithm results in the response. | |
| IncludeResults.NeedAsr | boolean | No | Specifies whether to include Automatic Speech Recognition (ASR) results. | true |
| IncludeResults.NeedOcr | boolean | No | Specifies whether to include Optical Character Recognition (OCR) results. | true |
| IncludeResults.NeedProcess | boolean | No | Specifies whether to include the URL to the raw output of the algorithm. | true |
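For orientation, the request body can be sketched as a plain dictionary. This is a minimal illustration only: the job ID is the placeholder from the example above, and the endpoint, credentials, and request signing (or an SDK that handles them) are omitted.

```python
# Hypothetical QueryVideoCognitionJob request payload, built from the
# parameter table above. Values here are illustrative placeholders.
request = {
    "JobId": "****20b48fb04483915d4f2cd8ac****",
    # IncludeResults controls which raw algorithm outputs come back.
    "IncludeResults": {
        "NeedAsr": True,       # include raw speech-recognition results
        "NeedOcr": True,       # include raw text-recognition results
        "NeedProcess": False,  # skip the URL to the raw algorithm output
    },
}
```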

Response parameters

| Parameter | Type | Description | Example |
| --- | --- | --- | --- |
| JobStatus | string | The status of the task. Valid values: Success, Fail, Processing, Submitted. | Success |
| RequestId | string | The request ID. | ******11-DB8D-4A9A-875B-275798****** |
| UserData | string | The user-defined data. | {"userId":"123432412831"} |
| Results | array<object> | An array of analysis result objects. | |
| Results[].Type | string | The type of the analysis result. Valid values: TextLabel (tags from text content), VideoLabel (tags from video content), ASR (raw speech recognition results; not returned by default), OCR (raw text recognition results; not returned by default), NLP (natural language processing results; not returned by default), Process (URL to the raw algorithm output; not returned by default). | ASR |
| Results[].Data | string | A JSON string that contains the detailed analysis data. The structure depends on the Type field. For details, see the Result parameters section below. | {"title":"example-title-****"} |
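Because Data arrives as a JSON string rather than a nested object, a client has to decode it per result. A minimal Python sketch, assuming the response shape described in the table above (the sample dict is illustrative, not real API output):

```python
import json

def parse_results(response: dict) -> dict:
    """Group result payloads by Type, decoding each Data JSON string."""
    out = {}
    for item in response.get("Results", []):
        out[item["Type"]] = json.loads(item["Data"]) if item.get("Data") else {}
    return out

# Illustrative response fragment, shaped like the parameter table above.
sample = {
    "JobStatus": "Success",
    "Results": [{"Type": "ASR", "Data": "{\"details\": []}"}],
}
```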

Result parameters

VideoLabel data structure

| Parameter | Type | Description |
| --- | --- | --- |
| persons | JSONArray | An array of detected person results. |
| persons.name | String | The name of the recognized person. |
| persons.category | String | The category of the person. Valid values: celebrity, politician, sensitive, unknown, and the ID of a custom figure library. |
| persons.ratio | double | The appearance rate of the person. Valid values: 0 to 1. |
| persons.occurrences | JSONArray | An array of detailed appearance information for the person. |
| persons.occurrences.score | double | The confidence score of the recognition. |
| persons.occurrences.from | double | Start time of the appearance, in seconds. |
| persons.occurrences.to | double | End time of the appearance, in seconds. |
| persons.occurrences.position | JSONObject | Face coordinates. |
| persons.occurrences.position.leftTop | int[] | The x and y coordinates of the top-left corner. |
| persons.occurrences.position.rightBottom | int[] | The x and y coordinates of the bottom-right corner. |
| persons.occurrences.timestamp | double | The timestamp of this specific coordinate capture, in seconds. |
| persons.occurrences.scene | String | The shot type. Valid values: closeUp, medium-closeUp, medium, and medium-long (full shot). |
| tags | JSONArray | An array of detected objects, scenes, and other tags. See examples below. |
| tags.mainTagName | String | The main tag. |
| tags.subTagName | String | The subtag. |
| tags.ratio | double | The appearance rate of the tag. Valid values: 0 to 1. |
| tags.occurrences | JSONArray | An array of detailed appearance information for the tag. |
| tags.occurrences.score | double | The confidence score. |
| tags.occurrences.from | double | Start time, in seconds. |
| tags.occurrences.to | double | End time, in seconds. |
| classifications | JSONArray | An array of video classification information. |
| classifications.score | double | The confidence score of the classification. |
| classifications.category1 | String | The level-1 category, such as Lifestyle, Anime, or Automotive. |
| classifications.category2 | String | The level-2 category, such as health or home under the level-1 category Lifestyle. |
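As an illustration of how the occurrences data can be used, the following sketch totals each person's on-screen time from the from/to fields. The label dict is a made-up example shaped like the table above.

```python
def person_screen_time(video_label: dict) -> dict:
    """Total on-screen seconds per recognized person, summed over occurrences."""
    return {
        p["name"]: round(sum(o["to"] - o["from"] for o in p.get("occurrences", [])), 3)
        for p in video_label.get("persons", [])
    }

# Illustrative VideoLabel fragment (not real API output).
label = {
    "persons": [
        {
            "name": "Jane", "category": "celebrity", "ratio": 0.4,
            "occurrences": [
                {"score": 0.9, "from": 1.0, "to": 3.5},
                {"score": 0.8, "from": 10.0, "to": 12.0},
            ],
        }
    ]
}
```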

Tags examples

| mainTagName | subTagName |
| --- | --- |
| Program | Dad, Where Are We Going?, Top Funny Comedian |
| Role | Doctor, Nurse, Teacher |
| Object | Piano, Cup, Table, Scrambled eggs with tomato, car, cosmetics |
| Logo | CCTV-1, CCTV-2, CNN, BBC |
| Action | Dancing, Kissing, Hugging, Meeting, Singing, Calling, Horseback riding, Fighting |
| Location | Tiananmen Square, Statue of Liberty, Leshan Giant Buddha, China, America |
| Scene | Bedroom, Subway Station, Terraced Field, Beach, Desert |

ImageLabel data structure

| Parameter | Type | Description |
| --- | --- | --- |
| persons | JSONArray | The information about the recognized people. |
| persons.name | String | The name of the recognized person. |
| persons.category | String | The type of the recognized person. Valid values: celebrity, politician, and sensitive. |
| persons.score | double | The confidence score of the recognition. |
| persons.position | JSONObject | Face coordinates. |
| persons.position.leftTop | int[] | The x and y coordinates of the top-left corner. |
| persons.position.rightBottom | int[] | The x and y coordinates of the bottom-right corner. |
| persons.scene | String | The shot type. Valid values: closeUp, medium-closeUp, medium, medium-long. |
| tags | JSONArray | An array of detected objects, scenes, and other tags. See examples below. |
| tags.mainTagName | String | The main tag. |
| tags.subTagName | String | The subtag. |
| tags.score | double | The confidence score. |

Tags examples

| mainTagName | subTagName |
| --- | --- |
| Role | Doctor, Nurse, Teacher |
| Location | Tiananmen Square, Statue of Liberty, Leshan Giant Buddha, China, America |
| Action | Talking, Dancing, Kissing, Hugging, Meeting, Singing, Calling, Horseback riding, Fighting |
| Logo | CCTV-1, CCTV-2, CNN, BBC |
| Object | Piano, Cup, Table, Scrambled eggs with tomato, car, cosmetics |
| Scene | Bedroom, Subway Station, Terraced Field, Beach, Desert |

TextLabel data structure from ASR & OCR

| Parameter | Type | Description |
| --- | --- | --- |
| tags | JSONArray | The text tags. For more information, see the following table. |
| tags.name | String | The category of the tag. |
| tags.value | String | The detected values. May contain multiple values separated by a comma. |

Tags examples

| name | value |
| --- | --- |
| Location | Tiananmen Square, Statue of Liberty, Leshan Giant Buddha, China, America |
| Organization | World Wildlife Fund, China Media Group |
| Logo | Nike, Li-Ning |
| Keyword | Backbone force |

CPVLabel data structure

  • cates: An array of hierarchical category results.
  • entities: An array of entity results with knowledge graph data.
  • hotwords: An array of hotwords detected from a watchlist.
  • freeTags: An array of free tags (keywords).
| Parameter | Type | Example | Description |
| --- | --- | --- | --- |
| type | String | hmi | The type of the result. Valid values: hmi (human-in-the-loop) and auto (automated tagging). |
| cates | JSONArray | - | An array of hierarchical category results. |
| cates.labelLevel1 | String | Travel | The level-1 tag. |
| cates.labelLevel2 | String | Scenery | The level-2 tag. |
| cates.label | String | "" | The name of the tag. The algorithm may return an empty value. |
| cates.appearanceProbability | double | 0.96 | The appearance rate. |
| cates.detailInfo | JSONArray | - | An array of detailed appearance information. |
| cates.detailInfo.score | double | 0.9 | The confidence score. |
| cates.detailInfo.startTime | double | 0.021 | Start time, in seconds. |
| cates.detailInfo.endTime | double | 29.021 | End time, in seconds. |
| entities | JSONArray | - | An array of entity results with knowledge graph data. |
| entities.labelLevel1 | String | Location | The level-1 tag. |
| entities.labelLevel2 | String | Landmark | The level-2 tag. |
| entities.label | String | Huangguoshu Waterfall | The name of the tag. |
| entities.appearanceProbability | double | 0.067 | The appearance rate. |
| entities.knowledgeInfo | String | {"name": "Huangguoshu Waterfall", "nameEn": "Huangguoshu Waterfall", "description": "One of the four largest waterfalls in Asia"} | The knowledge graph information. The fields are described in the Appendix. |
| entities.detailInfo | JSONArray | - | An array of detailed appearance information. |
| entities.detailInfo.score | double | 0.33292606472969055 | The confidence score. |
| entities.detailInfo.startTime | double | 6.021 | Start time, in seconds. |
| entities.detailInfo.endTime | double | 8.021 | End time, in seconds. |
| entities.detailInfo.trackData | JSONArray | - | Structured tracking data for the entity. |
| entities.detailInfo.trackData.score | double | 0.32 | The confidence score. |
| entities.detailInfo.trackData.bbox | integer[] | 23, 43, 45, 67 | The bounding box. |
| entities.detailInfo.trackData.timestamp | double | 7.9 | The timestamp. |
| hotwords | JSONArray | - | An array of hotword results. |
| hotwords.labelLevel1 | String | Hotword | The level-1 tag. |
| hotwords.labelLevel2 | String | "" | The level-2 tag. |
| hotwords.label | String | China Meteorological Administration | The content of the hotword. |
| hotwords.appearanceProbability | double | 0.96 | The appearance rate. |
| hotwords.detailInfo | JSONArray | - | An array of detailed appearance information. |
| hotwords.detailInfo.score | double | 1.0 | The confidence score. |
| hotwords.detailInfo.startTime | double | 0.021 | Start time, in seconds. |
| hotwords.detailInfo.endTime | double | 29.021 | End time, in seconds. |
| freeTags | JSONArray | - | An array of free tags (keywords). |
| freeTags.labelLevel1 | String | Hotword | The level-1 tag. |
| freeTags.labelLevel2 | String | "" | The level-2 tag. |
| freeTags.label | String | Central Meteorological Observatory | The content of the keyword. |
| freeTags.appearanceProbability | double | 0.96 | The appearance rate. |
| freeTags.detailInfo | JSONArray | - | An array of detailed appearance information. |
| freeTags.detailInfo.score | double | 0.9 | The confidence score. |
| freeTags.detailInfo.startTime | double | 0.021 | Start time, in seconds. |
| freeTags.detailInfo.endTime | double | 29.021 | End time, in seconds. |
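Note that knowledgeInfo is itself a JSON string and must be decoded separately from the surrounding result. A small sketch, with a made-up cpv payload shaped like the table above:

```python
import json

def entity_knowledge(cpv_label: dict) -> list:
    """Decode the knowledgeInfo JSON string carried by each entity, if any."""
    decoded = []
    for e in cpv_label.get("entities", []):
        info = json.loads(e["knowledgeInfo"]) if e.get("knowledgeInfo") else {}
        decoded.append({"label": e.get("label"), "knowledge": info})
    return decoded

# Illustrative CPVLabel fragment (not real API output).
cpv = {
    "entities": [
        {
            "label": "Huangguoshu Waterfall",
            "knowledgeInfo": '{"name": "Huangguoshu Waterfall"}',
        }
    ]
}
```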

ASR result

| Parameter | Type | Description |
| --- | --- | --- |
| details | JSONArray | The details of the result. |
| details.from | double | The start timestamp. Unit: seconds. |
| details.to | double | The end timestamp. Unit: seconds. |
| details.content | String | The recognized text. |
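Since details carries from/to offsets in seconds, the ASR result maps naturally onto subtitle formats. A sketch that renders SRT-style blocks; the helper names are illustrative, not part of any SDK.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a second offset as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def asr_to_srt(details: list) -> str:
    """Render ASR detail entries as numbered SRT-style subtitle blocks."""
    blocks = []
    for i, d in enumerate(details, 1):
        blocks.append(
            f"{i}\n{srt_timestamp(d['from'])} --> {srt_timestamp(d['to'])}\n{d['content']}"
        )
    return "\n\n".join(blocks)
```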

OCR result

| Parameter | Type | Description |
| --- | --- | --- |
| details | JSONArray | The details of the result. |
| details.timestamp | double | The timestamp. Unit: seconds. |
| details.info | JSONArray | The details of the recognized text at the specified timestamp. |
| details.info.score | double | The confidence score. |
| details.info.position | JSONObject | The coordinates of the text. |
| details.info.position.leftTop | int[] | The x and y coordinates of the top-left corner. |
| details.info.position.rightBottom | int[] | The x and y coordinates of the bottom-right corner. |
| details.info.content | String | The recognized text. |

Metadata

Note If you do not use the human-in-the-loop mode and you specify the needMetaData parameter when you call the SubmitSmarttagJob operation, the original title of the video is returned in the result.

| Parameter | Type | Description |
| --- | --- | --- |
| title | String | The title of the video. |

Subtitle extraction result

| Parameter | Type | Description |
| --- | --- | --- |
| details | JSONArray | The details of the result. |
| details.allResultUrl | String | The URL of the file that contains all subtitles. The URL is valid for half a year after the task is complete. |
| details.chResultUrl | String | The URL of the file that contains only Chinese subtitles. The URL is valid for half a year after the task is complete. |
| details.engResultUrl | String | The URL of the file that contains only English subtitles. The URL is valid for half a year after the task is complete. |

Note The content of the file is in the following format: Serial number + Time range + Subtitle content. Each line in the file contains a sentence.

NLP-based result

| Parameter | Type | Description |
| --- | --- | --- |
| transcription | object | The transcription results. |
| autoChapters | object | A list of automatically generated chapters. |
| summarization | object | The AI-generated summaries. |
| meetingAssistance | object | The AI-generated minutes. |
| translation | object | The text translation result. |

Parameters of transcription

| Parameter | Type | Description |
| --- | --- | --- |
| transcription | object | The transcription results. |
| transcription.paragraphs | list[] | The results, organized by paragraph. |
| transcription.paragraphs[i].paragraphId | string | The paragraph ID. |
| transcription.paragraphs[i].speakerId | string | The speaker ID. |
| transcription.paragraphs[i].words | list[] | The words contained in the paragraph. |
| transcription.paragraphs[i].words[i].id | int | The word ID. This field can be ignored. |
| transcription.paragraphs[i].words[i].sentenceId | int | The sentence ID. Words that share the same sentence ID can be assembled into a sentence. |
| transcription.paragraphs[i].words[i].start | long | The start time of the word, in milliseconds relative to the start of the audio. |
| transcription.paragraphs[i].words[i].end | long | The end time of the word, in milliseconds relative to the start of the audio. |
| transcription.paragraphs[i].words[i].text | string | The word content. |
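Words that share a sentenceId can be reassembled into sentences. A minimal sketch, assuming the words arrive in order (it joins without separators, which suits Chinese text; insert spaces for English). The sample paragraph is made up.

```python
from itertools import groupby

def sentences_from_words(paragraph: dict) -> list:
    """Join consecutive words that share a sentenceId into full sentences."""
    words = paragraph.get("words", [])
    return [
        "".join(w["text"] for w in group)
        for _, group in groupby(words, key=lambda w: w["sentenceId"])
    ]

# Illustrative transcription paragraph (not real API output).
para = {
    "paragraphId": "p1",
    "speakerId": "s1",
    "words": [
        {"id": 1, "sentenceId": 1, "start": 0, "end": 300, "text": "Hello "},
        {"id": 2, "sentenceId": 1, "start": 300, "end": 600, "text": "world."},
        {"id": 3, "sentenceId": 2, "start": 700, "end": 900, "text": "Bye."},
    ],
}
```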

Parameters of summarization

| Parameter | Type | Description |
| --- | --- | --- |
| summarization | object | The summary results. The results may be empty or of different summary types. |
| summarization.paragraphSummary | string | The summary of the full text. |
| summarization.conversationalSummary | list[] | A list of per-speaker summary results. |
| summarization.conversationalSummary[i].speakerId | string | The ID of the speaker. |
| summarization.conversationalSummary[i].speakerName | string | The name of the speaker. |
| summarization.conversationalSummary[i].summary | string | The summary corresponding to the speaker. |
| summarization.questionsAnsweringSummary | list[] | A list of Q&A pairs. |
| summarization.questionsAnsweringSummary[i].question | string | The question. |
| summarization.questionsAnsweringSummary[i].sentenceIdsOfQuestion | list[] | The IDs of the transcribed sentences that correspond to this question. |
| summarization.questionsAnsweringSummary[i].answer | string | The answer to the question. |
| summarization.questionsAnsweringSummary[i].sentenceIdsOfAnswer | list[] | The IDs of the transcribed sentences that correspond to this answer. |
| summarization.mindMapSummary | list[object] | The mind map. It may contain the summary of each topic and the relationships between topics. |
| summarization.mindMapSummary[i].title | string | The title of the topic. |
| summarization.mindMapSummary[i].topic | list[object] | An array that contains each topic and its subtopics. |
| summarization.mindMapSummary[i].topic[i].title | string | The title of the topic. |
| summarization.mindMapSummary[i].topic[i].topic | list[object] | An array that contains the subtopics of the topic. The array can be empty. |
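Because topic nests recursively, rendering mindMapSummary takes a small recursive walk. A sketch that turns the structure into an indented outline; the sample mind_map is made up.

```python
def flatten_mind_map(topics: list, depth: int = 0) -> list:
    """Render mind-map topics as indented outline lines, depth-first."""
    lines = []
    for t in topics:
        lines.append("  " * depth + t["title"])
        lines.extend(flatten_mind_map(t.get("topic", []), depth + 1))
    return lines

# Illustrative mindMapSummary fragment (not real API output).
mind_map = [{"title": "Roadmap", "topic": [{"title": "Q1", "topic": []}]}]
```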

Parameters of translation

| Parameter | Type | Description |
| --- | --- | --- |
| translation | object | The translation result. |
| translation.paragraphs | list[] | The translation of the transcribed text, organized by paragraph. |
| translation.paragraphs[i].paragraphId | string | The paragraph ID, which corresponds to the paragraph ID in the ASR result. |
| translation.paragraphs[i].sentences | list[] | A list of translated sentences. |
| translation.paragraphs[i].sentences[i].sentenctId | long | The sentence ID. |
| translation.paragraphs[i].sentences[i].start | long | The start time of the sentence, in milliseconds relative to the start of the audio. |
| translation.paragraphs[i].sentences[i].end | long | The end time of the sentence, in milliseconds relative to the start of the audio. |
| translation.paragraphs[i].sentences[i].text | string | The translated text, which corresponds to the ASR result. |

Parameters of autoChapters

| Parameter | Type | Description |
| --- | --- | --- |
| autoChapters | list[] | A list of automatically generated chapters. |
| autoChapters[i].id | int | The serial number of the chapter. |
| autoChapters[i].start | long | The start time of the chapter, in milliseconds relative to the start of the audio. |
| autoChapters[i].end | long | The end time of the chapter, in milliseconds relative to the start of the audio. |
| autoChapters[i].headline | string | The headline of the chapter. |
| autoChapters[i].summary | string | The chapter overview. |
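Since start and end are millisecond offsets, a chapter list converts easily into human-readable markers. A sketch (the sample chapters are illustrative):

```python
def format_chapters(chapters: list) -> list:
    """Render each chapter as 'HH:MM:SS headline' (start times are in ms)."""
    out = []
    for ch in chapters:
        total_s = ch["start"] // 1000
        h, rem = divmod(total_s, 3600)
        m, s = divmod(rem, 60)
        out.append(f"{h:02d}:{m:02d}:{s:02d} {ch['headline']}")
    return out

# Illustrative autoChapters fragment (not real API output).
chapters = [
    {"id": 1, "start": 0, "end": 95000, "headline": "Opening remarks"},
    {"id": 2, "start": 95000, "end": 600000, "headline": "Main topic"},
]
```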

Parameters of meetingAssistance

| Parameter | Type | Description |
| --- | --- | --- |
| meetingAssistance | object | The result of the AI minutes, which may be empty or of different types. |
| meetingAssistance.keywords | list[] | A list of extracted keywords. |
| meetingAssistance.keySentences | list[] | A list of key sentences. |
| meetingAssistance.keySentences[i].id | long | The serial number of the key sentence. |
| meetingAssistance.keySentences[i].sentenceId | long | The ID of the key sentence, which corresponds to the sentence ID in the original ASR result. |
| meetingAssistance.keySentences[i].start | long | The start time of the key sentence, in milliseconds relative to the start of the audio. |
| meetingAssistance.keySentences[i].end | long | The end time of the key sentence, in milliseconds relative to the start of the audio. |
| meetingAssistance.keySentences[i].text | string | The key sentence content. |
| meetingAssistance.actions | list[] | A list of to-do items. |
| meetingAssistance.actions[i].id | long | The serial number of the to-do item. |
| meetingAssistance.actions[i].sentenceId | long | The ID of the key sentence, which corresponds to the sentence ID in the original ASR result. |
| meetingAssistance.actions[i].start | long | The start time, in milliseconds relative to the start of the audio. |
| meetingAssistance.actions[i].end | long | The end time, in milliseconds relative to the start of the audio. |
| meetingAssistance.actions[i].text | string | The content of the to-do item. |
| meetingAssistance.classifications | object | The scene type. Only three types of scenes are supported. |
| meetingAssistance.classifications.interview | float | The confidence score for the interview scene. |
| meetingAssistance.classifications.lecture | float | The confidence score for the lecture scene. |
| meetingAssistance.classifications.meeting | float | The confidence score for the meeting scene. |
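Given the three per-scene confidence scores, picking the dominant scene is a one-liner (the scores dict below is a made-up example):

```python
def dominant_scene(classifications: dict) -> str:
    """Return the scene type with the highest confidence score."""
    return max(classifications, key=classifications.get)

# Illustrative classifications fragment (not real API output).
scores = {"interview": 0.1, "lecture": 0.2, "meeting": 0.7}
```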

Examples

Sample success responses

JSON format

{
  "JobStatus": "Success",
  "RequestId": "******11-DB8D-4A9A-875B-275798******",
  "UserData": "{\"userId\":\"123432412831\"}",
  "Results": [
    {
      "Type": "ASR",
      "Data": "{\"title\":\"example-title-****\"}"
    }
  ]
}
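Because a job passes through the Submitted and Processing states before reaching a terminal state, clients typically poll until JobStatus becomes Success or Fail. A hedged sketch, where query_fn stands in for whatever wrapper you use to call QueryVideoCognitionJob; the interval and timeout values are arbitrary choices, not API requirements.

```python
import time

def wait_for_job(query_fn, job_id: str, interval: float = 5.0, timeout: float = 600.0) -> dict:
    """Poll query_fn(job_id) until JobStatus leaves Submitted/Processing.

    query_fn is a caller-supplied function that performs one
    QueryVideoCognitionJob call and returns the decoded response dict.
    """
    deadline = time.monotonic() + timeout
    while True:
        resp = query_fn(job_id)
        if resp["JobStatus"] in ("Success", "Fail"):
            return resp
        if time.monotonic() > deadline:
            raise TimeoutError(f"job {job_id} still {resp['JobStatus']} after {timeout}s")
        time.sleep(interval)
```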

Error codes

For a list of error codes, see Service error codes.