Retrieves the results of an AI analysis and processing task.
Authorization information
The following table shows the authorization information corresponding to this API operation. You can use the authorization information in the Action element of a RAM policy to grant a RAM user or RAM role the permissions to call this operation. Field descriptions:
- Operation: the value that you can use in the Action element to specify the operation on a resource.
- Access level: the access level of each operation. The levels are read, write, and list.
- Resource type: the type of the resource on which you can authorize the RAM user or the RAM role to perform the operation. Take note of the following items:
- Required resource types are marked with an asterisk (*).
- If the permissions cannot be granted at the resource level, All Resources is used in the Resource type column of the operation.
- Condition Key: the condition key that is defined by the cloud service.
- Associated operation: other operations that the RAM user or RAM role must be authorized to perform before this operation can be completed.
| Operation | Access level | Resource type | Condition key | Associated operation |
|---|---|---|---|---|
| ice:QueryVideoCognitionJob | get | *All Resources | none | none |
Request parameters
| Parameter | Type | Required | Description | Example |
|---|---|---|---|---|
| JobId | string | Yes | The ID of the task to query. It is returned when you call the SubmitSmarttagJob operation. | ****20b48fb04483915d4f2cd8ac**** |
| Params | string | No | Additional request parameters, provided as a JSON string. | {} |
| IncludeResults | object | No | Specifies which algorithm results to include in the response. | |
| NeedAsr | boolean | No | Specifies whether to include Automatic Speech Recognition (ASR) results. | true |
| NeedOcr | boolean | No | Specifies whether to include Optical Character Recognition (OCR) results. | true |
| NeedProcess | boolean | No | Specifies whether to include the URL to the raw output of the algorithm. | true |
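As a sketch, the request parameters can be assembled as a plain dictionary before being handed to whatever SDK or signed-HTTP client you use. This assumes, based on the table layout, that the Need* switches nest under the IncludeResults object; verify against your SDK's generated model.

```python
import json

def build_query_params(job_id, need_asr=True, need_ocr=True, need_process=True):
    """Assemble the request parameters for QueryVideoCognitionJob.

    Assumes the Need* switches are carried inside the IncludeResults object,
    as suggested by the parameter table."""
    return {
        "JobId": job_id,
        "Params": json.dumps({}),  # optional extra parameters as a JSON string
        "IncludeResults": {
            "NeedAsr": need_asr,
            "NeedOcr": need_ocr,
            "NeedProcess": need_process,
        },
    }

params = build_query_params("****20b48fb04483915d4f2cd8ac****", need_ocr=False)
```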
Response parameters
Result parameters
VideoLabel data structure
| Parameter | Type | Description |
|---|---|---|
| persons | JSONArray | An array of detected person results. |
| persons.name | String | The name of the recognized person. |
| persons.category | String | The category of the person. Valid values: celebrity, politician, sensitive, unknown, and the ID of a custom figure library. |
| persons.ratio | double | The appearance rate of the person. Valid values: 0 to 1. |
| persons.occurrences | JSONArray | An array of detailed appearance information for the person. |
| persons.occurrences.score | double | Confidence score of the recognition. |
| persons.occurrences.from | double | Start time of the appearance, in seconds. |
| persons.occurrences.to | double | End time of the appearance, in seconds. |
| persons.occurrences.position | JSONObject | Face coordinates. |
| persons.occurrences.position.leftTop | int[] | The x and y coordinates of the top-left corner. |
| persons.occurrences.position.rightBottom | int[] | The x and y coordinates of the bottom-right corner. |
| persons.occurrences.timestamp | double | The timestamp of this specific coordinate capture, in seconds. |
| persons.occurrences.scene | String | The shot type. Valid values: closeUp, medium-closeUp, medium, and medium-long (full shot). |
| tags | JSONArray | An array of detected objects, scenes, and other tags. See examples below. |
| tags.mainTagName | String | The main tag. |
| tags.subTagName | String | The subtag. |
| tags.ratio | double | The appearance rate of the tag. Valid values: 0 to 1. |
| tags.occurrences | JSONArray | An array of detailed appearance information for the tag. |
| tags.occurrences.score | double | The confidence score. |
| tags.occurrences.from | double | Start time, in seconds. |
| tags.occurrences.to | double | End time, in seconds. |
| classifications | JSONArray | An array of video classification information. |
| classifications.score | double | The confidence score of the classification. |
| classifications.category1 | String | The level-1 category, such as Lifestyle, Anime, or Automotive. |
| classifications.category2 | String | The level-2 category, such as health or home under the level-1 category Lifestyle. |
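A minimal sketch of how a client might walk this structure, flattening person appearances and tag names. Field names are taken from the table above; the sample dict is illustrative only.

```python
def summarize_video_label(video_label):
    """Flatten a VideoLabel result into (name, from, to) person appearances
    and a list of main tag names."""
    appearances = []
    for person in video_label.get("persons", []):
        for occ in person.get("occurrences", []):
            appearances.append((person["name"], occ["from"], occ["to"]))
    tag_names = [t["mainTagName"] for t in video_label.get("tags", [])]
    return appearances, tag_names

# Illustrative sample shaped like the table above.
sample = {
    "persons": [{"name": "Jane Doe", "category": "celebrity", "ratio": 0.4,
                 "occurrences": [{"score": 0.98, "from": 1.0, "to": 5.0}]}],
    "tags": [{"mainTagName": "Scene", "subTagName": "Beach", "ratio": 0.2}],
}
appearances, tags = summarize_video_label(sample)
# appearances == [("Jane Doe", 1.0, 5.0)], tags == ["Scene"]
```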
Tags examples
| mainTagName | subTagName |
|---|---|
| Program | Dad, Where Are We Going?, Top Funny Comedian |
| Role | Doctor, Nurse, Teacher |
| Object | Piano, Cup, Table, Scrambled eggs with tomato, car, cosmetics |
| Logo | CCTV-1, CCTV-2, CNN, BBC |
| Action | Dancing, Kissing, Hugging, Meeting, Singing, Calling, Horseback riding, Fighting |
| Location | Tiananmen Square, Statue of Liberty, Leshan Giant Buddha, China, America |
| Scene | Bedroom, Subway Station, Terraced Field, Beach, Desert |
ImageLabel data structure
| Parameter | Type | Description |
|---|---|---|
| persons | JSONArray | The information about the recognized people. |
| persons.name | String | The name of the recognized person. |
| persons.category | String | The type of the recognized person. Valid values: celebrity, politician, and sensitive. |
| persons.score | double | Confidence score of the recognition. |
| persons.position | JSONObject | Face coordinates. |
| persons.position.leftTop | int[] | The x and y coordinates of the top-left corner. |
| persons.position.rightBottom | int[] | The x and y coordinates of the bottom-right corner. |
| persons.scene | String | The shot type. Valid values: closeUp, medium-closeUp, medium, medium-long. |
| tags | JSONArray | An array of detected objects, scenes, and other tags. See examples below. |
| tags.mainTagName | String | The main tag. |
| tags.subTagName | String | The subtag. |
| tags.score | double | The confidence score. |
Tags examples
| mainTagName | subTagName |
|---|---|
| Role | Doctor, Nurse, Teacher |
| Location | Tiananmen Square, Statue of Liberty, Leshan Giant Buddha, China, America |
| Action | Talking |
| Logo | CCTV-1, CCTV-2, CNN, BBC |
| Action | Dancing, Kissing, Hugging, Meeting, Singing, Calling, Horseback riding, Fighting |
| Object | Piano, Cup, Table, Scrambled eggs with tomato, car, cosmetics |
| Scene | Bedroom, Subway Station, Terraced Field, Beach, Desert |
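The leftTop/rightBottom corner arrays can be converted into a conventional (x, y, width, height) box for cropping or drawing. A small sketch, assuming the corners are [x, y] pixel pairs as the tables describe:

```python
def face_box(person):
    """Convert leftTop/rightBottom corner arrays into (x, y, width, height)."""
    left, top = person["position"]["leftTop"]
    right, bottom = person["position"]["rightBottom"]
    return left, top, right - left, bottom - top

box = face_box({"position": {"leftTop": [23, 43], "rightBottom": [45, 67]}})
# box == (23, 43, 22, 24)
```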
TextLabel data structure from ASR & OCR
| Parameter | Type | Description |
|---|---|---|
| tags | JSONArray | The text tags. For more information, see the following table. |
| tags.name | String | The category of the tag. |
| tags.value | String | The detected values. May contain multiple values separated by commas. |
Tags examples
| name | value |
|---|---|
| Location | Tiananmen Square, Statue of Liberty, Leshan Giant Buddha, China, America |
| Organization | World Wildlife Fund, China Media Group |
| Logo | Nike, Li-Ning |
| Keyword | Backbone force |
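Because tags.value can carry several comma-separated values, a client typically splits it before use. A minimal sketch:

```python
def split_tag_values(tag):
    """Split a TextLabel tag's comma-separated value string into a list."""
    return [v.strip() for v in tag["value"].split(",")]

values = split_tag_values({"name": "Logo", "value": "Nike, Li-Ning"})
# values == ["Nike", "Li-Ning"]
```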
CPVLabel data structure
- cates: An array of hierarchical category results.
- entities: An array of entity results, with knowledge graph data.
- hotwords: An array of detected hotwords from a watchlist.
- freeTags: An array of free tags (keywords).
| Parameter | Type | Example | Description |
|---|---|---|---|
| type | String | hmi | The type of the result. Valid values: hmi (human-in-the-loop), auto (automated tagging). |
| cates | JSONArray | - | An array of hierarchical category results. |
| cates.labelLevel1 | String | Travel | The level-1 tag. |
| cates.labelLevel2 | String | Scenery | The level-2 tag. |
| cates.label | String | "" | The name of the tag. An empty value may be returned by the algorithm. |
| cates.appearanceProbability | double | 0.96 | The appearance rate. |
| cates.detailInfo | JSONArray | - | - |
| cates.detailInfo.score | double | 0.9 | The confidence score. |
| cates.detailInfo.startTime | double | 0.021 | Start time, in seconds. |
| cates.detailInfo.endTime | double | 29.021 | End time, in seconds. |
| entities | JSONArray | - | - |
| entities.labelLevel1 | String | Location | The level-1 tag. |
| entities.labelLevel2 | String | Landmark | The level-2 tag. |
| entities.label | String | Huangguoshu Waterfall | The name of the tag. |
| entities.appearanceProbability | double | 0.067 | The appearance rate. |
| entities.knowledgeInfo | String | {"name": "Huangguoshu Waterfall", "nameEn": "Huangguoshu Waterfall", "description": "One of the four largest waterfalls in Asia"} | The knowledge graph information. The fields are provided in the Appendix. |
| entities.detailInfo | JSONArray | - | - |
| entities.detailInfo.score | double | 0.33292606472969055 | The confidence score. |
| entities.detailInfo.startTime | double | 6.021 | Start time, in seconds. |
| entities.detailInfo.endTime | double | 8.021 | End time, in seconds. |
| entities.detailInfo.trackData | JSONArray | - | Structured tracking data for the entity. |
| entities.detailInfo.trackData.score | double | 0.32 | The confidence score. |
| entities.detailInfo.trackData.bbox | integer[] | 23, 43, 45, 67 | The bounding box. |
| entities.detailInfo.trackData.timestamp | double | 7.9 | The timestamp. |
| hotwords | JSONArray | - | - |
| hotwords.labelLevel1 | String | Hotword | The level-1 tag. |
| hotwords.labelLevel2 | String | "" | The level-2 tag. |
| hotwords.label | String | China Meteorological Administration | The content of the hotword. |
| hotwords.appearanceProbability | double | 0.96 | The appearance rate. |
| hotwords.detailInfo | JSONArray | - | - |
| hotwords.detailInfo.score | double | 1.0 | The confidence score. |
| hotwords.detailInfo.startTime | double | 0.021 | Start time, in seconds. |
| hotwords.detailInfo.endTime | double | 29.021 | End time, in seconds. |
| freeTags | JSONArray | - | - |
| freeTags.labelLevel1 | String | Hotword | The level-1 tag. |
| freeTags.labelLevel2 | String | "" | The level-2 tag. |
| freeTags.label | String | Central Meteorological Observatory | The content of the keyword. |
| freeTags.appearanceProbability | double | 0.96 | The appearance rate. |
| freeTags.detailInfo | JSONArray | - | - |
| freeTags.detailInfo.score | double | 0.9 | The confidence score. |
| freeTags.detailInfo.startTime | double | 0.021 | Start time, in seconds. |
| freeTags.detailInfo.endTime | double | 29.021 | End time, in seconds. |
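Note that entities.knowledgeInfo arrives as a JSON string, not a parsed object, so it needs an explicit decode. A hedged sketch:

```python
import json

def entity_knowledge(entity):
    """Decode the knowledgeInfo JSON string attached to a CPVLabel entity.

    Returns None when no knowledge graph data is present."""
    raw = entity.get("knowledgeInfo")
    return json.loads(raw) if raw else None

info = entity_knowledge({
    "label": "Huangguoshu Waterfall",
    "knowledgeInfo": '{"name": "Huangguoshu Waterfall", "nameEn": "Huangguoshu Waterfall"}',
})
# info["nameEn"] == "Huangguoshu Waterfall"
```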
ASR result
| Parameter | Type | Description |
|---|---|---|
| details | JSONArray | The details of the result. |
| details.from | double | The start timestamp. Unit: seconds. |
| details.to | double | The end timestamp. Unit: seconds. |
| details.content | String | The recognized text. |
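The ASR details can be joined into a timestamped transcript. A minimal sketch; the sample segments are illustrative:

```python
def asr_to_transcript(asr_result):
    """Join ASR detail segments into a '[from-to] content' transcript."""
    lines = []
    for seg in asr_result.get("details", []):
        lines.append(f'[{seg["from"]:.1f}-{seg["to"]:.1f}] {seg["content"]}')
    return "\n".join(lines)

text = asr_to_transcript({"details": [
    {"from": 0.0, "to": 2.5, "content": "Hello everyone."},
    {"from": 2.5, "to": 5.0, "content": "Welcome to the show."},
]})
```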
OCR result
| Parameter | Type | Description |
|---|---|---|
| details | JSONArray | The details of the result. |
| details.timestamp | double | The timestamp. Unit: seconds. |
| details.info | JSONArray | The details of the recognized text at the specified timestamp. |
| details.info.score | double | The confidence score. |
| details.info.position | JSONObject | The coordinates of the text. |
| details.info.position.leftTop | int[] | The x and y coordinates of the top-left corner. |
| details.info.position.rightBottom | int[] | The x and y coordinates of the bottom-right corner. |
| details.info.content | String | The recognized text. |
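Since each timestamp carries several detections with individual confidence scores, clients often filter OCR output by a threshold. A sketch under the field names above; the threshold value is an arbitrary example:

```python
def ocr_text_above(ocr_result, min_score=0.8):
    """Collect (timestamp, text) pairs whose confidence meets a threshold."""
    hits = []
    for frame in ocr_result.get("details", []):
        for item in frame.get("info", []):
            if item["score"] >= min_score:
                hits.append((frame["timestamp"], item["content"]))
    return hits

hits = ocr_text_above({"details": [
    {"timestamp": 1.0, "info": [
        {"score": 0.95, "content": "BREAKING NEWS"},
        {"score": 0.40, "content": "noise"},
    ]},
]})
# hits == [(1.0, "BREAKING NEWS")]
```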
Metadata
Note If you do not use the human-in-the-loop mode and you specify the needMetaData parameter when you call the SubmitSmarttagJob operation, the original title of the video is returned in the result.
| Parameter | Type | Description |
|---|---|---|
| title | String | The title of the video. |
Subtitle extraction result
| Parameter | Type | Description |
|---|---|---|
| details | JSONArray | The details of the result. |
| details.allResultUrl | String | The URL of the file that contains all subtitles. The URL is valid for half a year after the task is complete. |
| details.chResultUrl | String | The URL of the file that contains only Chinese subtitles. The URL is valid for half a year after the task is complete. |
| details.engResultUrl | String | The URL of the file that contains only English subtitles. The URL is valid for half a year after the task is complete. |
Note The content of the file is in the following format: Serial number + Time range + Subtitle content. Each line in the file contains a sentence.
NLP-based result
| Parameter | Type | Description |
|---|---|---|
| transcription | object | The transcription results. |
| autoChapters | object | A list of automatically generated chapters. |
| summarization | object | The AI-generated summaries. |
| meetingAssistance | object | The AI-generated minutes. |
| translation | object | The text translation result. |
Parameters of transcription
| Parameter | Type | Description |
|---|---|---|
| transcription | object | The transcription results. |
| transcription.paragraphs | list[] | The results are organized by paragraph. |
| transcription.paragraphs[i].paragraphId | string | The paragraph ID. |
| transcription.paragraphs[i].speakerId | string | The speaker ID. |
| transcription.paragraphs[i].words | list[] | The words contained in the paragraph. |
| transcription.paragraphs[i].words[i].id | int | The word ID. You do not need to pay attention to it. |
| transcription.paragraphs[i].words[i].sentenceId | int | The sentence ID. The words that have the same sentence ID can be assembled into a sentence. |
| transcription.paragraphs[i].words[i].start | long | The start time of the word. The value is a timestamp representing the number of milliseconds that have elapsed since the audio starts. |
| transcription.paragraphs[i].words[i].end | long | The end time of the word. The value is a timestamp representing the number of milliseconds that have elapsed since the audio starts. |
| transcription.paragraphs[i].words[i].text | string | The word content. |
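Since words sharing a sentenceId form one sentence, a client can reassemble sentences from the word stream. A sketch; words are joined without separators here, which suits Chinese output, so insert spaces for space-delimited languages:

```python
from itertools import groupby

def paragraph_sentences(paragraph):
    """Group a transcription paragraph's words into sentences by sentenceId.

    start/end are taken from the first and last word (milliseconds from
    the start of the audio)."""
    sentences = []
    for sid, group in groupby(paragraph["words"], key=lambda w: w["sentenceId"]):
        words = list(group)
        sentences.append({
            "sentenceId": sid,
            "start": words[0]["start"],
            "end": words[-1]["end"],
            "text": "".join(w["text"] for w in words),
        })
    return sentences

# Illustrative sample shaped like the table above.
para = {"paragraphId": "p1", "speakerId": "s1", "words": [
    {"id": 1, "sentenceId": 1, "start": 0, "end": 400, "text": "Hello"},
    {"id": 2, "sentenceId": 1, "start": 400, "end": 800, "text": " world."},
    {"id": 3, "sentenceId": 2, "start": 900, "end": 1300, "text": "Bye."},
]}
sentences = paragraph_sentences(para)
# sentences[0]["text"] == "Hello world."
```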
Parameters of summarization
| Parameter | Type | Description |
|---|---|---|
| summarization | object | The summary results. The results may be empty or of different summary types. |
| summarization.paragraphSummary | string | The summary of the full text. |
| summarization.conversationalSummary | list[] | A list of speaker summary results. |
| summarization.conversationalSummary[i].speakerId | string | The ID of the speaker. |
| summarization.conversationalSummary[i].speakerName | string | The name of the speaker. |
| summarization.conversationalSummary[i].summary | string | The summary corresponding to the speaker. |
| summarization.questionsAnsweringSummary | list[] | A list of Q&A. |
| summarization.questionsAnsweringSummary[i].question | string | The question. |
| summarization.questionsAnsweringSummary[i].sentenceIdsOfQuestion | list[] | The IDs of transcribed sentences corresponding to this question. |
| summarization.questionsAnsweringSummary[i].answer | string | The answer to the question. |
| summarization.questionsAnsweringSummary[i].sentenceIdsOfAnswer | list[] | The IDs of transcribed sentences corresponding to this answer. |
| summarization.mindMapSummary | list[object] | The mind map. It may contain the summary of each topic and the relationship between topics. |
| summarization.mindMapSummary[i].title | string | The title of the topic. |
| summarization.mindMapSummary[i].topic | list[object] | An array that contains each topic and its subtopics. |
| summarization.mindMapSummary[i].topic[i].title | string | The title of the topic. |
| summarization.mindMapSummary[i].topic[i].topic | list[object] | An array that contains the subtopics of the topic. The array can be empty. |
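The nested title/topic shape of mindMapSummary lends itself to a recursive render. A sketch that turns it into an indented outline; the sample topics are illustrative:

```python
def mindmap_outline(nodes, depth=0):
    """Render mindMapSummary's nested title/topic nodes as an indented outline."""
    lines = []
    for node in nodes:
        lines.append("  " * depth + "- " + node["title"])
        lines.extend(mindmap_outline(node.get("topic", []), depth + 1))
    return lines

outline = mindmap_outline([{"title": "Weather", "topic": [
    {"title": "Forecast", "topic": []},
]}])
# outline == ["- Weather", "  - Forecast"]
```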
Parameters of translation
| Parameter | Type | Description |
|---|---|---|
| translation | object | The translation result. |
| translation.paragraphs | list[] | The translation of the transcribed text, organized by paragraph. |
| translation.paragraphs.paragraphId | string | The paragraph ID, which corresponds to the paragraph ID in the ASR result. |
| translation.paragraphs.sentences | list[] | A list of translated sentences. |
| translation.paragraphs.sentences[i].sentenctId | long | The sentence ID. |
| translation.paragraphs.sentences[i].start | long | The start time of the sentence. The value is a timestamp representing the number of milliseconds that have elapsed since the audio starts. |
| translation.paragraphs.sentences[i].end | long | The end time of the sentence. The value is a timestamp representing the number of milliseconds that have elapsed since the audio starts. |
| translation.paragraphs.sentences[i].text | string | The translated text, which corresponds to the ASR result. |
Parameters of autoChapters
| Parameter | Type | Description |
|---|---|---|
| autoChapters | list[] | A list of automatically generated chapters. |
| autoChapters[i].id | int | The serial number of the chapter. |
| autoChapters[i].start | long | The start time of the chapter. The value is a timestamp representing the number of milliseconds that have elapsed since the audio starts. |
| autoChapters[i].end | long | The end time of the chapter. The value is a timestamp representing the number of milliseconds that have elapsed since the audio starts. |
| autoChapters[i].headline | string | The headline of the chapter. |
| autoChapters[i].summary | string | The chapter overview. |
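Because start/end are milliseconds from the start of the audio, rendering chapters usually means converting to a clock format first. A sketch; the sample chapters are illustrative:

```python
def format_chapters(auto_chapters):
    """Render autoChapters as 'mm:ss headline' lines (start is in milliseconds)."""
    out = []
    for ch in auto_chapters:
        secs = ch["start"] // 1000
        out.append(f'{secs // 60:02d}:{secs % 60:02d} {ch["headline"]}')
    return out

lines = format_chapters([{"id": 1, "start": 0, "end": 95000, "headline": "Intro"},
                         {"id": 2, "start": 95000, "end": 300000, "headline": "Q&A"}])
# lines == ["00:00 Intro", "01:35 Q&A"]
```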
Parameters of meetingAssistance
| Parameter | Type | Description |
|---|---|---|
| meetingAssistance | object | The result of the AI minutes, which may be empty or of different types. |
| meetingAssistance.keywords | list[] | A list of extracted keywords. |
| meetingAssistance.keySentences | list[] | A list of key sentences. |
| meetingAssistance.keySentences[i].id | long | The serial number of the key sentence. |
| meetingAssistance.keySentences[i].sentenceId | long | The ID of the key sentence, which corresponds to the sentence ID in the original ASR result. |
| meetingAssistance.keySentences[i].start | long | The start time of the key sentence. The value is a timestamp representing the number of milliseconds that have elapsed since the audio starts. |
| meetingAssistance.keySentences[i].end | long | The end time of the key sentence. The value is a timestamp representing the number of milliseconds that have elapsed since the audio starts. |
| meetingAssistance.keySentences[i].text | string | The key sentence information. |
| meetingAssistance.actions | list[] | A list of to-do items. |
| meetingAssistance.actions[i].id | long | The serial number of the to-do item. |
| meetingAssistance.actions[i].sentenceId | long | The ID of the key sentence, which corresponds to the sentence ID in the original ASR result. |
| meetingAssistance.actions[i].start | long | The start time. The value is a timestamp representing the number of milliseconds that have elapsed since the audio starts. |
| meetingAssistance.actions[i].end | long | The end time. The value is a timestamp representing the number of milliseconds that have elapsed since the audio starts. |
| meetingAssistance.actions[i].text | string | The content of the to-do item. |
| meetingAssistance.classifications | object | The scene type. Only three types of scenes are supported. |
| meetingAssistance.classifications.interview | float | The confidence score for the interview scene. |
| meetingAssistance.classifications.lecture | float | The confidence score for the lecture scene. |
| meetingAssistance.classifications.meeting | float | The confidence score for the meeting scene. |
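Since classifications holds one confidence score per supported scene, picking the dominant scene is a one-liner. A sketch with illustrative scores:

```python
def dominant_scene(classifications):
    """Return the scene type with the highest confidence score."""
    return max(classifications, key=classifications.get)

scene = dominant_scene({"interview": 0.1, "lecture": 0.2, "meeting": 0.7})
# scene == "meeting"
```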
Examples
Sample success responses
JSON format
{
"JobStatus": "Success",
  "RequestId": "******11-DB8D-4A9A-875B-275798******",
"UserData": {
"userId": 123432412831
},
"Results": {
"result": [
{
"Type": "ASR",
"Data": {
"title": "example-title-****"
}
}
]
}
}
Error codes
For a list of error codes, see Service error codes.
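To round this out, a sketch of consuming the sample response above: check JobStatus, then index the result list by algorithm Type. Note that in practice the Data payload may be delivered as a JSON string rather than a parsed object, so decode it if needed.

```python
def collect_results(response):
    """Index the result list of a successful response by algorithm Type."""
    if response.get("JobStatus") != "Success":
        raise RuntimeError(f'job not finished: {response.get("JobStatus")}')
    return {item["Type"]: item["Data"] for item in response["Results"]["result"]}

# The sample success response from above, trimmed to the relevant fields.
response = {
    "JobStatus": "Success",
    "Results": {"result": [{"Type": "ASR", "Data": {"title": "example-title-****"}}]},
}
results = collect_results(response)
# results["ASR"]["title"] == "example-title-****"
```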
