Queries the results of a smart tagging job.
RAM authorization
| Action | Access level | Resource type | Condition key | Dependent action |
| --- | --- | --- | --- | --- |
| ice:QuerySmarttagJob | - | *All Resources | None | None |
Request parameters
| Parameter | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| JobId | string | Yes | The ID of the smart tagging job to query. You can obtain this ID from the response to the SubmitSmarttagJob operation. | 88c6ca184c0e47098a5b665e2**** |
| Params | string | No | Additional request parameters, specified as a JSON string. | {"labelResultType":"auto"} |
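Note that Params must be a serialized JSON string, not a nested object. A minimal Python sketch of assembling the request parameters before the API call (`build_query_request` is a hypothetical helper; the SDK call itself is omitted):

```python
import json

def build_query_request(job_id, label_result_type=None):
    """Assemble QuerySmarttagJob request parameters.

    Params must be a serialized JSON string, not a nested object,
    so it is passed through json.dumps before being attached.
    """
    request = {"JobId": job_id}
    if label_result_type is not None:
        request["Params"] = json.dumps({"labelResultType": label_result_type})
    return request

params = build_query_request("88c6ca184c0e47098a5b665e2****", "auto")
```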
Response elements
| Element | Type | Description | Example |
| --- | --- | --- | --- |
| JobStatus | string | The status of the job. | Success |
| RequestId | string | The ID of the request. | ******11-DB8D-4A9A-875B-275798****** |
| UserData | string | The custom data that you specified in the request. | {"userId":"123432412831"} |
| Results | object | | |
| Results.Result | array&lt;object&gt; | The array of analysis results. | |
| Results.Result[i].Type | string | The type of the analysis result. | Meta |
| Results.Result[i].Data | string | The detailed analysis results in a JSON-formatted string. The data structure varies based on the value of Type. For more information, see the Result parameters section below. | {"title":"example-title-****"} |
| Usages | object | | |
| Usages.Usage | array&lt;object&gt; | | |
| Usages.Usage[i].Type | string | | |
| Usages.Usage[i].Quota | integer | | |
Callback message format
When the status of a Smart Tagging job changes, ApsaraVideo for Media Processing (MPS) sends a message to the user-specified queue. (To specify the callback queue, see the UpdatePipeline API.) The message body is a JSON string with the following fields:
| Parameter | Type | Description |
| --- | --- | --- |
| Type | String | The fixed string smarttag, which indicates a smart tagging job. |
| JobId | String | The unique ID of the job. |
| State | String | The current status of the job. This value corresponds to the JobStatus parameter returned by the QuerySmarttagJob operation. |
| UserData | String | The user-defined data passed in the SubmitSmarttagJob request. |
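A callback consumer only needs to decode the message body and, when present, the nested UserData string (which is itself JSON). A minimal Python sketch of such a handler, run against a hypothetical sample message:

```python
import json

# A hypothetical callback message body; field names follow the table above.
raw_message = json.dumps({
    "Type": "smarttag",
    "JobId": "88c6ca184c0e47098a5b665e2****",
    "State": "Success",
    "UserData": "{\"userId\":\"123432412831\"}",
})

def handle_callback(body):
    """Decode a smart tagging callback message."""
    msg = json.loads(body)
    if msg.get("Type") != "smarttag":
        raise ValueError("not a smart tagging callback")
    # UserData is itself a JSON string and needs a second decode.
    user_data = json.loads(msg["UserData"]) if msg.get("UserData") else {}
    return {"job_id": msg["JobId"], "state": msg["State"], "user_data": user_data}

result = handle_callback(raw_message)
```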
Result parameters
VideoLabel data structure
| Parameter | Type | Description |
| --- | --- | --- |
| persons | Array of objects | An array of objects, each containing the results for a detected person. |
| persons.name | String | The name of the detected person. |
| persons.category | String | The category of the person. Valid values: celebrity, politician, sensitive, and unknown. For a custom person, this field returns the ID of the custom person library. |
| persons.ratio | Double | The proportion of the video duration in which the person appears. Value range: 0 to 1. |
| persons.occurrences | Array of objects | An array of objects, each detailing an occurrence of the person. |
| persons.occurrences.score | Double | The confidence score for the detection. |
| persons.occurrences.from | Double | The start time, in seconds, of the segment in which the person appears. |
| persons.occurrences.to | Double | The end time, in seconds, of the segment in which the person appears. |
| persons.occurrences.position | Object | The coordinates of the face bounding box. |
| persons.occurrences.position.leftTop | Array of integers | The x and y coordinates of the top-left corner of the face bounding box. |
| persons.occurrences.position.rightBottom | Array of integers | The x and y coordinates of the bottom-right corner of the face bounding box. |
| persons.occurrences.timestamp | Double | The timestamp, in seconds, of the frame that contains the face bounding box. |
| persons.occurrences.scene | String | The shot type. Valid values: closeUp (close-up), medium-closeUp (medium close-up), medium (medium shot), and medium-long (long shot). |
| tags | Array of objects | An array of objects containing detected tags for entities such as objects and scenes. See the example table below. |
| tags.mainTagName | String | The main tag. |
| tags.subTagName | String | The subtag. |
| tags.ratio | Double | The proportion of the video duration in which the tag appears. Value range: 0 to 1. |
| tags.occurrences | Array of objects | An array of objects, each detailing an occurrence of the tag. |
| tags.occurrences.score | Double | The confidence score for the detected tag. |
| tags.occurrences.from | Double | The start time, in seconds, of the segment in which the tag appears. |
| tags.occurrences.to | Double | The end time, in seconds, of the segment in which the tag appears. |
| classifications | Array of objects | An array of objects containing video classification information. |
| classifications.score | Double | The confidence score for the classification. |
| classifications.category1 | String | The level-1 category, for example, Lifestyle, Animation, or Automotive. |
| classifications.category2 | String | The level-2 category. For example, under the Lifestyle category, subcategories include Health and Home & Garden. |
Video tag examples
| Main tag name | Subtag examples |
| --- | --- |
| Program | Talk shows, reality TV, and comedy specials |
| Character | Doctor, nurse, and teacher |
| Object | Piano, cup, table, car, cosmetics, and food |
| Logo | Channel and brand logos |
| Action | Dancing, kissing, hugging, singing, fighting, and making a phone call |
| Landmark | Statue of Liberty, Eiffel Tower, and Tiananmen Square |
| Scene | Bedroom, subway station, beach, desert, and terraced fields |
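Because Data is delivered as a JSON string, a consumer first decodes it and then walks the persons and tags arrays. A minimal Python sketch against a hypothetical, heavily trimmed VideoLabel payload (only a subset of the documented fields is shown):

```python
import json

# A hypothetical VideoLabel Data string with one person and one tag.
data = json.dumps({
    "persons": [{
        "name": "example-person",
        "category": "celebrity",
        "ratio": 0.25,
        "occurrences": [
            {"score": 0.98, "from": 1.0, "to": 4.5, "scene": "closeUp"},
            {"score": 0.91, "from": 10.0, "to": 12.0, "scene": "medium"},
        ],
    }],
    "tags": [{"mainTagName": "Scene", "subTagName": "beach", "ratio": 0.4,
              "occurrences": [{"score": 0.9, "from": 0.0, "to": 20.0}]}],
})

label = json.loads(data)
# Collect every (from, to) segment in which a person appears.
segments = [(o["from"], o["to"]) for p in label["persons"] for o in p["occurrences"]]
```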
ImageLabel data structure
| Parameter | Type | Description |
| --- | --- | --- |
| persons | Array of objects | An array of objects, each containing the result for a detected person. |
| persons.name | String | The name of the detected person. |
| persons.category | String | The category of the person. Valid values: celebrity, politician, and sensitive. |
| persons.score | Double | The confidence score for the detected person. |
| persons.position | Object | The coordinates of the face bounding box. |
| persons.position.leftTop | Array of integers | The x and y coordinates of the upper-left corner of the bounding box. |
| persons.position.rightBottom | Array of integers | The x and y coordinates of the lower-right corner of the bounding box. |
| persons.scene | String | The shot type for the person. Valid values: closeUp (close-up), medium-closeUp (medium close-up), medium (medium shot), and medium-long (long shot). |
| tags | Array of objects | An array of objects, each containing a detected tag for an entity, such as an object or scene. |
| tags.mainTagName | String | The primary tag. |
| tags.subTagName | String | The subtag. |
| tags.score | Double | The confidence score for the detected tag. |
Image tag examples
| Tag name | Examples |
| --- | --- |
| Character | Doctor, nurse, and teacher |
| Location | Tiananmen Square, the Statue of Liberty, Leshan Giant Buddha, China, and the United States |
| Action | Giving a speech, dancing, kissing, hugging, meeting, singing, making a phone call, horseback riding, and fighting |
| Logo | CCTV1, CCTV2, Youku, and Dragon TV |
| Object | Piano, cup, table, stir-fried tomato and egg, car, and cosmetics |
| Scene | Bedroom, subway station, terraced fields, beach, and desert |
TextLabel data structure (ASR and OCR text)
| Parameter | Type | Description |
| --- | --- | --- |
| tags | Array of objects | An array of tag objects. |
| tags.name | String | The tag key. |
| tags.value | String | The tag value. Use a comma (,) to separate multiple values. |
Text tag examples
| Tag name | Examples |
| --- | --- |
| Region | China (Hangzhou) and US (Silicon Valley) |
| Organization | Google, the United Nations, and the World Health Organization |
| Brand | Nike and Apple |
| Keyword | Sustainable development |
CPVLabel data structure
The CPV result contains four groups of labels:
- cates: categories organized into a three-level hierarchy (level 1, level 2, and level 3).
- entities: category attributes enriched with information from a knowledge graph.
- hotwords: keywords that are currently trending among users.
- freeTags: free-form tags or keywords that describe the content.
| Parameter | Type | Example value | Description |
| --- | --- | --- | --- |
| type | String | hmi | The result type. Valid values: hmi for human-machine collaboration results and auto for automated tagging results. |
| cates | Array | - | An array of category objects. |
| cates.labelLevel1 | String | Travel | The level-1 label. |
| cates.labelLevel2 | String | Scenery | The level-2 label. |
| cates.label | String | "" | The label name. This value can be an empty string. |
| cates.appearanceProbability | Double | 0.96 | The appearance probability of the category. |
| cates.detailInfo | Array | - | An array of objects, each detailing an occurrence of the category. |
| cates.detailInfo.score | Double | 0.9 | The confidence score. |
| cates.detailInfo.startTime | Double | 0.021 | The start time of the segment, in seconds. |
| cates.detailInfo.endTime | Double | 29.021 | The end time of the segment, in seconds. |
| entities | Array | - | An array of detected entity objects. |
| entities.labelLevel1 | String | Location | The level-1 label. |
| entities.labelLevel2 | String | Landmark | The level-2 label. |
| entities.label | String | Huangguoshu Waterfall | The label name. |
| entities.appearanceProbability | Double | 0.067 | The appearance probability of the entity. |
| entities.knowledgeInfo | String | {"name": "Huangguoshu Waterfall", "nameEn": "Huangguoshu Waterfall", "description": "One of the largest waterfalls in Asia."} | A JSON string with information from the knowledge graph. For a complete list of fields, see the appendices for the film/TV/variety IP, music, person, landmark, and object graphs. |
| entities.detailInfo | Array | - | An array of objects, each detailing an occurrence of the entity. |
| entities.detailInfo.score | Double | 0.33292606472969055 | The confidence score. |
| entities.detailInfo.startTime | Double | 6.021 | The start time of the segment, in seconds. |
| entities.detailInfo.endTime | Double | 8.021 | The end time of the segment, in seconds. |
| entities.detailInfo.trackData | Array | - | An array of objects with structured information for the entity label. |
| entities.detailInfo.trackData.score | Double | 0.32 | The confidence score. |
| entities.detailInfo.trackData.bbox | Array of integers | 23, 43, 45, 67 | The coordinates of the bounding box. |
| entities.detailInfo.trackData.timestamp | Double | 7.9 | The timestamp, in seconds, of the frame that contains the bounding box. |
| hotwords | Array | - | An array of detected hotword objects. |
| hotwords.labelLevel1 | String | Keyword | The level-1 label. |
| hotwords.labelLevel2 | String | "" | The level-2 label. |
| hotwords.label | String | China Meteorological Administration | The hotword content. |
| hotwords.appearanceProbability | Double | 0.96 | The appearance probability of the hotword. |
| hotwords.detailInfo | Array | - | An array of objects, each detailing an occurrence of the hotword. |
| hotwords.detailInfo.score | Double | 1.0 | The confidence score. |
| hotwords.detailInfo.startTime | Double | 0.021 | The start time of the segment, in seconds. |
| hotwords.detailInfo.endTime | Double | 29.021 | The end time of the segment, in seconds. |
| freeTags | Array | - | An array of free-form tag objects. |
| freeTags.labelLevel1 | String | Keyword | The level-1 label. |
| freeTags.labelLevel2 | String | "" | The level-2 label. |
| freeTags.label | String | National Meteorological Center | The content of the free-form tag. |
| freeTags.appearanceProbability | Double | 0.96 | The appearance probability of the free-form tag. |
| freeTags.detailInfo | Array | - | An array of objects, each detailing an occurrence of the free-form tag. |
| freeTags.detailInfo.score | Double | 0.9 | The confidence score. |
| freeTags.detailInfo.startTime | Double | 0.021 | The start time of the segment, in seconds. |
| freeTags.detailInfo.endTime | Double | 29.021 | The end time of the segment, in seconds. |
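All four groups share the labelLevel1/labelLevel2/label/appearanceProbability shape, so they can be processed uniformly. A Python sketch that flattens a hypothetical CPV payload into (group, label, probability) rows, falling back through the label hierarchy when label is empty:

```python
# Hypothetical CPVLabel payload with one entry per populated group.
cpv = {
    "type": "auto",
    "cates": [{"labelLevel1": "Travel", "labelLevel2": "Scenery", "label": "",
               "appearanceProbability": 0.96,
               "detailInfo": [{"score": 0.9, "startTime": 0.021, "endTime": 29.021}]}],
    "entities": [{"labelLevel1": "Location", "labelLevel2": "Landmark",
                  "label": "Huangguoshu Waterfall", "appearanceProbability": 0.067,
                  "detailInfo": []}],
    "hotwords": [],
    "freeTags": [],
}

def flatten_labels(result):
    """Collect every label across the four CPV groups into one flat list."""
    rows = []
    for group in ("cates", "entities", "hotwords", "freeTags"):
        for item in result.get(group, []):
            # Fall back to a coarser level when `label` is an empty string.
            name = item["label"] or item["labelLevel2"] or item["labelLevel1"]
            rows.append((group, name, item["appearanceProbability"]))
    return rows

rows = flatten_labels(cpv)
```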
ASR results
| Parameter | Type | Description |
| --- | --- | --- |
| details | Array of objects | An array of objects containing the detailed recognition results. |
| details.from | Double | The start timestamp of the recognized text segment, in seconds. |
| details.to | Double | The end timestamp of the recognized text segment, in seconds. |
| details.content | String | The recognized text. |
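The details array maps directly to timestamped transcript lines. A short Python sketch over a hypothetical ASR payload:

```python
# Hypothetical ASR result; field names follow the table above.
asr = {"details": [
    {"from": 0.5, "to": 2.4, "content": "hello everyone"},
    {"from": 2.6, "to": 5.0, "content": "welcome to the demo"},
]}

# Render each segment as "[start-end] text".
lines = [f'[{d["from"]:.1f}-{d["to"]:.1f}] {d["content"]}' for d in asr["details"]]
```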
OCR results
| Parameter | Type | Description |
| --- | --- | --- |
| details | Array of objects | An array of objects containing the detailed results of the task. |
| details.timestamp | Double | The timestamp, in seconds, of the frame in which text is detected. |
| details.info | Array of objects | An array of objects with details about the text recognized at this timestamp. |
| details.info.score | Double | The confidence score. |
| details.info.position | Object | The bounding box coordinates of the detected text. |
| details.info.position.leftTop | Array of integers | The x and y coordinates of the top-left corner of the bounding box. |
| details.info.position.rightBottom | Array of integers | The x and y coordinates of the bottom-right corner of the bounding box. |
| details.info.content | String | The recognized text. |
Meta annotation results
If you set the needMetaData parameter in the SubmitSmarttagJob request and do not use human-assisted annotation, QuerySmarttagJob currently returns only the title that you provided in the request.
| Parameter | Type | Description |
| --- | --- | --- |
| title | String | The title. |
Extracted subtitles
| Parameter | Type | Description |
| --- | --- | --- |
| details | Array of objects | Detailed results for the task. |
| details.allResultUrl | String | The URL for all subtitle results. This URL is valid for six months after the task completes. |
| details.chResultUrl | String | The URL for the Chinese results. This URL is valid for six months after the task completes. |
| details.engResultUrl | String | The URL for the English results. This URL is valid for six months after the task completes. |
The content of the subtitle result URL has the following format: sequence number + time range + subtitle content (one subtitle entry per line).
NLP results
| Parameter | Type | Description |
| --- | --- | --- |
| transcription | object | The speech transcription result. |
| autoChapters | object | The automatic chapter result. |
| summarization | object | The large model summarization result. |
| meetingAssistance | object | The intelligent meeting minutes result. |
| translation | object | The text translation result. |
Transcription
| Parameter | Type | Description |
| --- | --- | --- |
| transcription | object | An object containing the speech-to-text transcription result. |
| transcription.paragraphs | array | An array of paragraph objects that structure the transcription. |
| transcription.paragraphs[i].paragraphId | string | The unique identifier of the paragraph. |
| transcription.paragraphs[i].speakerId | string | The unique identifier of the speaker. |
| transcription.paragraphs[i].words | array | An array of word objects in the paragraph. |
| transcription.paragraphs[i].words[i].id | integer | The sequence number of the word. You can typically ignore this value. |
| transcription.paragraphs[i].words[i].sentenceId | integer | The identifier of the sentence. Assemble words that share a sentenceId to form a complete sentence. |
| transcription.paragraphs[i].words[i].start | long | The start time of the word relative to the beginning of the audio, in milliseconds. |
| transcription.paragraphs[i].words[i].end | long | The end time of the word relative to the beginning of the audio, in milliseconds. |
| transcription.paragraphs[i].words[i].text | string | The text content of the word. |
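As the table notes, sentences are reassembled by concatenating words that share a sentenceId. Because words arrive in order, consecutive grouping is sufficient. A Python sketch over a hypothetical paragraph:

```python
from itertools import groupby

# Hypothetical transcription paragraph; field names follow the table above.
paragraph = {"paragraphId": "p-1", "speakerId": "spk-1", "words": [
    {"id": 0, "sentenceId": 1, "start": 0, "end": 300, "text": "Hello"},
    {"id": 1, "sentenceId": 1, "start": 300, "end": 600, "text": "world."},
    {"id": 2, "sentenceId": 2, "start": 700, "end": 1100, "text": "Next"},
    {"id": 3, "sentenceId": 2, "start": 1100, "end": 1500, "text": "sentence."},
]}

# Words are ordered, so groupby on sentenceId yields one group per sentence.
sentences = [
    " ".join(w["text"] for w in ws)
    for _, ws in groupby(paragraph["words"], key=lambda w: w["sentenceId"])
]
```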
Summarization (full-text, speaker, question, and mind map summaries)
| Parameter | Type | Description |
| --- | --- | --- |
| summarization | object | The summarization result object, which can contain results for zero or more summary types. |
| summarization.paragraphSummary | string | The full-text summary. |
| summarization.conversationalSummary | array | An array of speaker summary results. |
| summarization.conversationalSummary[i].speakerId | string | The speaker ID. |
| summarization.conversationalSummary[i].speakerName | string | The speaker's name. |
| summarization.conversationalSummary[i].summary | string | The summary for this speaker. |
| summarization.questionsAnsweringSummary | array | An array of question-answering summary results. |
| summarization.questionsAnsweringSummary[i].question | string | The question. |
| summarization.questionsAnsweringSummary[i].sentenceIdsOfQuestion | array | An array of sentenceId values from the original transcription that correspond to the question. |
| summarization.questionsAnsweringSummary[i].answer | string | The answer. |
| summarization.questionsAnsweringSummary[i].sentenceIdsOfAnswer | array | An array of sentenceId values from the original transcription that correspond to the answer. |
| summarization.mindMapSummary | array of objects | An array of mind map summary results, which can contain topic summaries and their relationships. |
| summarization.mindMapSummary[i].title | string | The title of the topic. |
| summarization.mindMapSummary[i].topic | array of objects | An array of topic objects. |
| summarization.mindMapSummary[i].topic[i].title | string | The title of the topic. |
| summarization.mindMapSummary[i].topic[i].topic | array of objects | An array of subtopics of the parent topic. This array can be empty. |
Full-text translation
| Parameter | Type | Description |
| --- | --- | --- |
| translation | object | An object containing the translation results. |
| translation.paragraphs | array | An array of paragraph objects containing the translation results. This array corresponds to the paragraphs in the transcription result. |
| translation.paragraphs[i].paragraphId | string | The unique identifier of the paragraph. This ID corresponds to the paragraphId in the transcription result. |
| translation.paragraphs[i].sentences | array | An array of sentence objects that form the translated paragraph. |
| translation.paragraphs[i].sentences[i].sentenceId | long | The unique identifier of the sentence. |
| translation.paragraphs[i].sentences[i].start | long | The start time of the sentence, a relative timestamp in milliseconds from the start of the audio. |
| translation.paragraphs[i].sentences[i].end | long | The end time of the sentence, a relative timestamp in milliseconds from the start of the audio. |
| translation.paragraphs[i].sentences[i].text | string | The translated text of the sentence. |
autoChapters (automatic chapter detection)
| Parameter | Type | Description |
| --- | --- | --- |
| autoChapters | array | An array of auto-chapter objects. |
| autoChapters[i].id | integer | The chapter ID. |
| autoChapters[i].start | long | The chapter's start time in milliseconds, relative to the start of the audio. |
| autoChapters[i].end | long | The chapter's end time in milliseconds, relative to the start of the audio. |
| autoChapters[i].headline | string | The chapter's headline. |
| autoChapters[i].summary | string | The chapter summary. |
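Chapter start times are millisecond offsets, so building a table of contents needs a small conversion step. A Python sketch over hypothetical chapter data (`fmt_ms` is an illustrative helper, not part of the API):

```python
# Hypothetical auto-chapter results; field names follow the table above.
chapters = [
    {"id": 1, "start": 0, "end": 185000, "headline": "Opening", "summary": "..."},
    {"id": 2, "start": 185000, "end": 620000, "headline": "Roadmap", "summary": "..."},
]

def fmt_ms(ms):
    """Convert a millisecond offset to an HH:MM:SS string."""
    s = ms // 1000
    return f"{s // 3600:02d}:{s % 3600 // 60:02d}:{s % 60:02d}"

toc = [f'{fmt_ms(c["start"])} {c["headline"]}' for c in chapters]
```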
meetingAssistance (keywords, key sentences, and to-do items)
| Parameter | Type | Description |
| --- | --- | --- |
| meetingAssistance | object | An object containing the intelligent meeting minutes results. It can include results for zero or more result types. |
| meetingAssistance.keywords | array | An array of keyword results. |
| meetingAssistance.keySentences | array | An array of key sentence results, also known as highlights. |
| meetingAssistance.keySentences[i].id | long | The sequence number of the key sentence. |
| meetingAssistance.keySentences[i].sentenceId | long | The ID of the corresponding sentence in the original transcription. |
| meetingAssistance.keySentences[i].start | long | The start time, a relative timestamp in milliseconds from the beginning of the audio. |
| meetingAssistance.keySentences[i].end | long | The end time, a relative timestamp in milliseconds from the beginning of the audio. |
| meetingAssistance.keySentences[i].text | string | The text content of the key sentence. |
| meetingAssistance.actions | array | An array of to-do items. |
| meetingAssistance.actions[i].id | long | The sequence number of the to-do item. |
| meetingAssistance.actions[i].sentenceId | long | The ID of the corresponding sentence in the original transcription. |
| meetingAssistance.actions[i].start | long | The start time, a relative timestamp in milliseconds from the beginning of the audio. |
| meetingAssistance.actions[i].end | long | The end time, a relative timestamp in milliseconds from the beginning of the audio. |
| meetingAssistance.actions[i].text | string | The text content of the to-do item. |
| meetingAssistance.classifications | object | An object containing the scene classification results. Three scene types are currently supported. |
| meetingAssistance.classifications.interview | float | The confidence score for the interview scene. |
| meetingAssistance.classifications.lecture | float | The confidence score for the lecture scene. |
| meetingAssistance.classifications.meeting | float | The confidence score for the meeting scene. |
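The three scene scores can be compared directly to pick the most likely scene. A short Python sketch over hypothetical classification scores:

```python
# Hypothetical scene classification scores; keys follow the table above.
classifications = {"interview": 0.12, "lecture": 0.81, "meeting": 0.07}

# The scene with the highest confidence score wins.
best_scene = max(classifications, key=classifications.get)
```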
Examples
Success response
JSON format
{
"JobStatus": "Success",
"RequestId": "******11-DB8D-4A9A-875B-275798******",
"UserData": "{\"userId\":\"123432412831\"}",
"Results": {
"Result": [
{
"Type": "Meta",
"Data": "{\"title\":\"example-title-****\"}\t\n"
}
]
},
"Usages": {
"Usage": [
{
"Type": "",
"Quota": 0
}
]
}
}
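To consume such a response, decode the body once, then decode each result's Data string a second time. A Python sketch based on a trimmed version of the sample above:

```python
import json

# A trimmed version of the sample success response above.
response_body = '''{
  "JobStatus": "Success",
  "RequestId": "******11-DB8D-4A9A-875B-275798******",
  "Results": {"Result": [{"Type": "Meta", "Data": "{\\"title\\":\\"example-title-****\\"}"}]}
}'''

resp = json.loads(response_body)
# Data is a nested JSON string, so each result needs a second decode.
by_type = {r["Type"]: json.loads(r["Data"]) for r in resp["Results"]["Result"]}
```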
Error codes
See Error Codes for a complete list.
Release notes
See Release Notes for a complete list.