Queries the results of a smart tagging job.
RAM authorization
| Action | Access level | Resource type | Condition key | Dependent action |
| --- | --- | --- | --- | --- |
| ice:QuerySmarttagJob | - | *All Resources | None | None |
Request parameters
| Parameter | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| JobId | string | Yes | The ID of the smart tagging job to query. You can obtain this ID from the response to the SubmitSmarttagJob operation. | 88c6ca184c0e47098a5b665e2**** |
| Params | string | No | Additional request parameters, specified as a JSON string. | {"labelResultType":"auto"} |
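Note that Params must be a serialized JSON string, not a nested object. A minimal Python sketch of assembling the request parameters before the API call (`build_query_request` is a hypothetical helper; the SDK call itself is omitted):

```python
import json

def build_query_request(job_id, label_result_type=None):
    """Assemble QuerySmarttagJob request parameters.

    Params must be a serialized JSON string, not a nested object,
    so it is passed through json.dumps before being attached.
    """
    request = {"JobId": job_id}
    if label_result_type is not None:
        request["Params"] = json.dumps({"labelResultType": label_result_type})
    return request

params = build_query_request("88c6ca184c0e47098a5b665e2****", "auto")
```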
Response elements
| Element | Type | Description | Example |
| --- | --- | --- | --- |
| JobStatus | string | The status of the job. | Success |
| RequestId | string | The ID of the request. | ******11-DB8D-4A9A-875B-275798****** |
| UserData | string | The custom data that you specified in the request. | {"userId":"123432412831"} |
| Results | object | | |
| Results.Result | array&lt;object&gt; | The array of analysis results. | |
| Results.Result[i].Type | string | The type of the analysis result. | Meta |
| Results.Result[i].Data | string | The detailed analysis results in a JSON-formatted string. The data structure varies based on the value of Type. For more information, see the Result parameters section below. | {"title":"example-title-****"} |
| Usages | object | | |
| Usages.Usage | array&lt;object&gt; | | |
| Usages.Usage[i].Type | string | | |
| Usages.Usage[i].Quota | integer | | |
Callback message format
When the status of a Smart Tagging job changes, ApsaraVideo for Media Processing (MPS) sends a message to the user-specified queue. (To specify the callback queue, see the UpdatePipeline API.) The message body is a JSON string with the following fields:
| Parameter | Type | Description |
| --- | --- | --- |
| Type | String | The fixed string smarttag, which indicates a smart tagging job. |
| JobId | String | The unique ID of the job. |
| State | String | The current status of the job. This value corresponds to the JobStatus parameter returned by the QuerySmarttagJob operation. |
| UserData | String | The user-defined data passed in the SubmitSmarttagJob request. |
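A callback consumer only needs to decode the message body and, when present, the nested UserData string (which is itself JSON). A minimal Python sketch of such a handler, run against a hypothetical sample message:

```python
import json

# A hypothetical callback message body; field names follow the table above.
raw_message = json.dumps({
    "Type": "smarttag",
    "JobId": "88c6ca184c0e47098a5b665e2****",
    "State": "Success",
    "UserData": "{\"userId\":\"123432412831\"}",
})

def handle_callback(body):
    """Decode a smart tagging callback message."""
    msg = json.loads(body)
    if msg.get("Type") != "smarttag":
        raise ValueError("not a smart tagging callback")
    # UserData is itself a JSON string and needs a second decode.
    user_data = json.loads(msg["UserData"]) if msg.get("UserData") else {}
    return {"job_id": msg["JobId"], "state": msg["State"], "user_data": user_data}

result = handle_callback(raw_message)
```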
Result parameters
VideoLabel data structure
| Parameter | Type | Description |
| --- | --- | --- |
| persons | Array of objects | An array of objects, each containing the results for a detected person. |
| persons.name | String | The name of the detected person. |
| persons.category | String | The category of the person. Valid values: celebrity, politician, sensitive, and unknown. For a custom person, this field returns the ID of the custom person library. |
| persons.ratio | Double | The proportion of the video duration in which the person appears. Value range: 0 to 1. |
| persons.occurrences | Array of objects | An array of objects, each detailing an occurrence of the person. |
| persons.occurrences.score | Double | The confidence score for the detection. |
| persons.occurrences.from | Double | The start time, in seconds, of the segment in which the person appears. |
| persons.occurrences.to | Double | The end time, in seconds, of the segment in which the person appears. |
| persons.occurrences.position | Object | The coordinates of the face bounding box. |
| persons.occurrences.position.leftTop | Array of integers | The x and y coordinates of the top-left corner of the face bounding box. |
| persons.occurrences.position.rightBottom | Array of integers | The x and y coordinates of the bottom-right corner of the face bounding box. |
| persons.occurrences.timestamp | Double | The timestamp, in seconds, of the frame that contains the face bounding box. |
| persons.occurrences.scene | String | The shot type. Valid values: closeUp (close-up), medium-closeUp (medium close-up), medium (medium shot), and medium-long (long shot). |
| tags | Array of objects | An array of objects containing detected tags for entities such as objects and scenes. See the example table below. |
| tags.mainTagName | String | The main tag. |
| tags.subTagName | String | The subtag. |
| tags.ratio | Double | The proportion of the video duration in which the tag appears. Value range: 0 to 1. |
| tags.occurrences | Array of objects | An array of objects, each detailing an occurrence of the tag. |
| tags.occurrences.score | Double | The confidence score for the detected tag. |
| tags.occurrences.from | Double | The start time, in seconds, of the segment in which the tag appears. |
| tags.occurrences.to | Double | The end time, in seconds, of the segment in which the tag appears. |
| classifications | Array of objects | An array of objects containing video classification information. |
| classifications.score | Double | The confidence score for the classification. |
| classifications.category1 | String | The level-1 category, for example, Lifestyle, Animation, or Automotive. |
| classifications.category2 | String | The level-2 category. For example, under the Lifestyle category, subcategories include Health and Home & Garden. |
Video tag examples
| Main tag name | Subtag examples |
| --- | --- |
| Program | Talk shows, reality TV, and comedy specials |
| Character | Doctor, nurse, and teacher |
| Object | Piano, cup, table, car, cosmetics, and food |
| Logo | Channel and brand logos |
| Action | Dancing, kissing, hugging, singing, fighting, and making a phone call |
| Landmark | Statue of Liberty, Eiffel Tower, and Tiananmen Square |
| Scene | Bedroom, subway station, beach, desert, and terraced fields |
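Because Data is delivered as a JSON string, a consumer first decodes it and then walks the persons and tags arrays. A minimal Python sketch against a hypothetical, heavily trimmed VideoLabel payload (only a subset of the documented fields is shown):

```python
import json

# A hypothetical VideoLabel Data string with one person and one tag.
data = json.dumps({
    "persons": [{
        "name": "example-person",
        "category": "celebrity",
        "ratio": 0.25,
        "occurrences": [
            {"score": 0.98, "from": 1.0, "to": 4.5, "scene": "closeUp"},
            {"score": 0.91, "from": 10.0, "to": 12.0, "scene": "medium"},
        ],
    }],
    "tags": [{"mainTagName": "Scene", "subTagName": "beach", "ratio": 0.4,
              "occurrences": [{"score": 0.9, "from": 0.0, "to": 20.0}]}],
})

label = json.loads(data)
# Collect every (from, to) segment in which a person appears.
segments = [(o["from"], o["to"]) for p in label["persons"] for o in p["occurrences"]]
```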
ImageLabel data structure
| Parameter | Type | Description |
| --- | --- | --- |
| persons | Array of objects | An array of objects, each containing the result for a detected person. |
| persons.name | String | The name of the detected person. |
| persons.category | String | The category of the person. Valid values: celebrity, politician, and sensitive. |
| persons.score | Double | The confidence score for the detected person. |
| persons.position | Object | The coordinates of the face bounding box. |
| persons.position.leftTop | Array of integers | The x and y coordinates of the upper-left corner of the bounding box. |
| persons.position.rightBottom | Array of integers | The x and y coordinates of the lower-right corner of the bounding box. |
| persons.scene | String | The shot type for the person. Valid values: closeUp (close-up), medium-closeUp (medium close-up), medium (medium shot), and medium-long (long shot). |
| tags | Array of objects | An array of objects, each containing a detected tag for an entity, such as an object or scene. |
| tags.mainTagName | String | The primary tag. |
| tags.subTagName | String | The subtag. |
| tags.score | Double | The confidence score for the detected tag. |
Image tag examples
| Tag name | Examples |
| --- | --- |
| Character | Doctor, nurse, and teacher |
| Location | Tiananmen Square, the Statue of Liberty, Leshan Giant Buddha, China, and the United States |
| Action | Giving a speech, dancing, kissing, hugging, meeting, singing, making a phone call, horseback riding, and fighting |
| Logo | CCTV1, CCTV2, Youku, and Dragon TV |
| Object | Piano, cup, table, stir-fried tomato and egg, car, and cosmetics |
| Scene | Bedroom, subway station, terraced fields, beach, and desert |
TextLabel data structure (ASR and OCR text)
| Parameter | Type | Description |
| --- | --- | --- |
| tags | Array of objects | An array of tag objects. |
| tags.name | String | The tag key. |
| tags.value | String | The tag value. Use a comma (,) to separate multiple values. |
Text tag examples
| Tag name | Examples |
| --- | --- |
| Region | China (Hangzhou) and US (Silicon Valley) |
| Organization | Google, the United Nations, and the World Health Organization |
| Brand | Nike and Apple |
| Keyword | Sustainable development |
CPVLabel data structure
The CPV result contains four groups of labels:
- cates: categories organized into a three-level hierarchy (level 1, level 2, and level 3).
- entities: category attributes enriched with information from a knowledge graph.
- hotwords: keywords that are currently trending among users.
- freeTags: free-form tags or keywords that describe the content.
| Parameter | Type | Example value | Description |
| --- | --- | --- | --- |
| type | String | hmi | The result type. Valid values: hmi for human-machine collaboration results and auto for automated tagging results. |
| cates | Array | - | An array of category objects. |
| cates.labelLevel1 | String | Travel | The level-1 label. |
| cates.labelLevel2 | String | Scenery | The level-2 label. |
| cates.label | String | "" | The label name. This value can be an empty string. |
| cates.appearanceProbability | Double | 0.96 | The appearance probability of the category. |
| cates.detailInfo | Array | - | An array of objects, each detailing an occurrence of the category. |
| cates.detailInfo.score | Double | 0.9 | The confidence score. |
| cates.detailInfo.startTime | Double | 0.021 | The start time of the segment, in seconds. |
| cates.detailInfo.endTime | Double | 29.021 | The end time of the segment, in seconds. |
| entities | Array | - | An array of detected entity objects. |
| entities.labelLevel1 | String | Location | The level-1 label. |
| entities.labelLevel2 | String | Landmark | The level-2 label. |
| entities.label | String | Huangguoshu Waterfall | The label name. |
| entities.appearanceProbability | Double | 0.067 | The appearance probability of the entity. |
| entities.knowledgeInfo | String | {"name": "Huangguoshu Waterfall", "nameEn": "Huangguoshu Waterfall", "description": "One of the largest waterfalls in Asia."} | A JSON string with information from the knowledge graph. For a complete list of fields, see the appendices for the film/TV/variety IP, music, person, landmark, and object graphs. |
| entities.detailInfo | Array | - | An array of objects, each detailing an occurrence of the entity. |
| entities.detailInfo.score | Double | 0.33292606472969055 | The confidence score. |
| entities.detailInfo.startTime | Double | 6.021 | The start time of the segment, in seconds. |
| entities.detailInfo.endTime | Double | 8.021 | The end time of the segment, in seconds. |
| entities.detailInfo.trackData | Array | - | An array of objects with structured information for the entity label. |
| entities.detailInfo.trackData.score | Double | 0.32 | The confidence score. |
| entities.detailInfo.trackData.bbox | Array of integers | 23, 43, 45, 67 | The coordinates of the bounding box. |
| entities.detailInfo.trackData.timestamp | Double | 7.9 | The timestamp, in seconds, of the frame that contains the bounding box. |
| hotwords | Array | - | An array of detected hotword objects. |
| hotwords.labelLevel1 | String | Keyword | The level-1 label. |
| hotwords.labelLevel2 | String | "" | The level-2 label. |
| hotwords.label | String | China Meteorological Administration | The hotword content. |
| hotwords.appearanceProbability | Double | 0.96 | The appearance probability of the hotword. |
| hotwords.detailInfo | Array | - | An array of objects, each detailing an occurrence of the hotword. |
| hotwords.detailInfo.score | Double | 1.0 | The confidence score. |
| hotwords.detailInfo.startTime | Double | 0.021 | The start time of the segment, in seconds. |
| hotwords.detailInfo.endTime | Double | 29.021 | The end time of the segment, in seconds. |
| freeTags | Array | - | An array of free-form tag objects. |
| freeTags.labelLevel1 | String | Keyword | The level-1 label. |
| freeTags.labelLevel2 | String | "" | The level-2 label. |
| freeTags.label | String | National Meteorological Center | The content of the free-form tag. |
| freeTags.appearanceProbability | Double | 0.96 | The appearance probability of the free-form tag. |
| freeTags.detailInfo | Array | - | An array of objects, each detailing an occurrence of the free-form tag. |
| freeTags.detailInfo.score | Double | 0.9 | The confidence score. |
| freeTags.detailInfo.startTime | Double | 0.021 | The start time of the segment, in seconds. |
| freeTags.detailInfo.endTime | Double | 29.021 | The end time of the segment, in seconds. |
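All four groups share the labelLevel1/labelLevel2/label/appearanceProbability shape, so they can be processed uniformly. A Python sketch that flattens a hypothetical CPV payload into (group, label, probability) rows, falling back through the label hierarchy when label is empty:

```python
# Hypothetical CPVLabel payload with one entry per populated group.
cpv = {
    "type": "auto",
    "cates": [{"labelLevel1": "Travel", "labelLevel2": "Scenery", "label": "",
               "appearanceProbability": 0.96,
               "detailInfo": [{"score": 0.9, "startTime": 0.021, "endTime": 29.021}]}],
    "entities": [{"labelLevel1": "Location", "labelLevel2": "Landmark",
                  "label": "Huangguoshu Waterfall", "appearanceProbability": 0.067,
                  "detailInfo": []}],
    "hotwords": [],
    "freeTags": [],
}

def flatten_labels(result):
    """Collect every label across the four CPV groups into one flat list."""
    rows = []
    for group in ("cates", "entities", "hotwords", "freeTags"):
        for item in result.get(group, []):
            # Fall back to a coarser level when `label` is an empty string.
            name = item["label"] or item["labelLevel2"] or item["labelLevel1"]
            rows.append((group, name, item["appearanceProbability"]))
    return rows

rows = flatten_labels(cpv)
```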
ASR results
| Parameter | Type | Description |
| --- | --- | --- |
| details | Array of objects | An array of objects containing the detailed recognition results. |
| details.from | Double | The start timestamp of the recognized text segment, in seconds. |
| details.to | Double | The end timestamp of the recognized text segment, in seconds. |
| details.content | String | The recognized text. |
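The details array maps directly to timestamped transcript lines. A short Python sketch over a hypothetical ASR payload:

```python
# Hypothetical ASR result; field names follow the table above.
asr = {"details": [
    {"from": 0.5, "to": 2.4, "content": "hello everyone"},
    {"from": 2.6, "to": 5.0, "content": "welcome to the demo"},
]}

# Render each segment as "[start-end] text".
lines = [f'[{d["from"]:.1f}-{d["to"]:.1f}] {d["content"]}' for d in asr["details"]]
```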
OCR results
| Parameter | Type | Description |
| --- | --- | --- |
| details | Array of objects | An array of objects containing the detailed results of the task. |
| details.timestamp | Double | The timestamp, in seconds, of the frame in which text is detected. |
| details.info | Array of objects | An array of objects with details about the text recognized at this timestamp. |
| details.info.score | Double | The confidence score. |
| details.info.position | Object | The bounding box coordinates of the detected text. |
| details.info.position.leftTop | Array of integers | The x and y coordinates of the top-left corner of the bounding box. |
| details.info.position.rightBottom | Array of integers | The x and y coordinates of the bottom-right corner of the bounding box. |
| details.info.content | String | The recognized text. |
Meta annotation results
If you set the needMetaData parameter in the SubmitSmarttagJob request and do not use human-assisted annotation, QuerySmarttagJob currently returns only the title that you provided in the request.
| Parameter | Type | Description |
| --- | --- | --- |
| title | String | The title. |
Extracted subtitles
| Parameter | Type | Description |
| --- | --- | --- |
| details | Array of objects | Detailed results for the task. |
| details.allResultUrl | String | The URL for all subtitle results. This URL is valid for six months after the task completes. |
| details.chResultUrl | String | The URL for the Chinese results. This URL is valid for six months after the task completes. |
| details.engResultUrl | String | The URL for the English results. This URL is valid for six months after the task completes. |
The content of the subtitle result URL has the following format: sequence number + time range + subtitle content (one subtitle entry per line).
NLP results
| Parameter | Type | Description |
| --- | --- | --- |
| transcription | object | The speech transcription result. |
| autoChapters | object | The automatic chapter result. |
| summarization | object | The large model summarization result. |
| meetingAssistance | object | The intelligent meeting minutes result. |
| translation | object | The text translation result. |
Transcription
| Parameter | Type | Description |
| --- | --- | --- |
| transcription | object | An object containing the speech-to-text transcription result. |
| transcription.paragraphs | array | An array of paragraph objects that structure the transcription. |
| transcription.paragraphs[i].paragraphId | string | The unique identifier of the paragraph. |
| transcription.paragraphs[i].speakerId | string | The unique identifier of the speaker. |
| transcription.paragraphs[i].words | array | An array of word objects in the paragraph. |
| transcription.paragraphs[i].words[i].id | integer | The sequence number of the word. You can typically ignore this value. |
| transcription.paragraphs[i].words[i].sentenceId | integer | The identifier of the sentence. Assemble words that share a sentenceId to form a complete sentence. |
| transcription.paragraphs[i].words[i].start | long | The start time of the word relative to the beginning of the audio, in milliseconds. |
| transcription.paragraphs[i].words[i].end | long | The end time of the word relative to the beginning of the audio, in milliseconds. |
| transcription.paragraphs[i].words[i].text | string | The text content of the word. |
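As the table notes, sentences are reassembled by concatenating words that share a sentenceId. Because words arrive in order, consecutive grouping is sufficient. A Python sketch over a hypothetical paragraph:

```python
from itertools import groupby

# Hypothetical transcription paragraph; field names follow the table above.
paragraph = {"paragraphId": "p-1", "speakerId": "spk-1", "words": [
    {"id": 0, "sentenceId": 1, "start": 0, "end": 300, "text": "Hello"},
    {"id": 1, "sentenceId": 1, "start": 300, "end": 600, "text": "world."},
    {"id": 2, "sentenceId": 2, "start": 700, "end": 1100, "text": "Next"},
    {"id": 3, "sentenceId": 2, "start": 1100, "end": 1500, "text": "sentence."},
]}

# Words are ordered, so groupby on sentenceId yields one group per sentence.
sentences = [
    " ".join(w["text"] for w in ws)
    for _, ws in groupby(paragraph["words"], key=lambda w: w["sentenceId"])
]
```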
Summarization (full-text, speaker, question, and mind map summaries)
| Parameter | Type | Description |
| --- | --- | --- |
| summarization | object | The summarization result object, which can contain results for zero or more summary types. |
| summarization.paragraphSummary | string | The full-text summary. |
| summarization.conversationalSummary | array | An array of speaker summary results. |
| summarization.conversationalSummary[i].speakerId | string | The speaker ID. |
| summarization.conversationalSummary[i].speakerName | string | The speaker's name. |
| summarization.conversationalSummary[i].summary | string | The summary for this speaker. |
| summarization.questionsAnsweringSummary | array | An array of question-answering summary results. |
| summarization.questionsAnsweringSummary[i].question | string | The question. |
| summarization.questionsAnsweringSummary[i].sentenceIdsOfQuestion | array | An array of sentenceId values from the original transcription that correspond to the question. |
| summarization.questionsAnsweringSummary[i].answer | string | The answer. |
| summarization.questionsAnsweringSummary[i].sentenceIdsOfAnswer | array | An array of sentenceId values from the original transcription that correspond to the answer. |
| summarization.mindMapSummary | array of objects | An array of mind map summary results, which can contain topic summaries and their relationships. |
| summarization.mindMapSummary[i].title | string | The title of the topic. |
| summarization.mindMapSummary[i].topic | array of objects | An array of topic objects. |
| summarization.mindMapSummary[i].topic[i].title | string | The title of the topic. |
| summarization.mindMapSummary[i].topic[i].topic | array of objects | An array of subtopics of the parent topic. This array can be empty. |
Full-text translation
| Parameter | Type | Description |
| --- | --- | --- |
| translation | object | An object containing the translation results. |
| translation.paragraphs | array | An array of paragraph objects containing the translation results. This array corresponds to the paragraphs in the transcription result. |
| translation.paragraphs[i].paragraphId | string | The unique identifier of the paragraph. This ID corresponds to the paragraphId in the transcription result. |
| translation.paragraphs[i].sentences | array | An array of sentence objects that form the translated paragraph. |
| translation.paragraphs[i].sentences[i].sentenceId | long | The unique identifier of the sentence. |
| translation.paragraphs[i].sentences[i].start | long | The start time of the sentence, a relative timestamp in milliseconds from the start of the audio. |
| translation.paragraphs[i].sentences[i].end | long | The end time of the sentence, a relative timestamp in milliseconds from the start of the audio. |
| translation.paragraphs[i].sentences[i].text | string | The translated text of the sentence. |
autoChapters (automatic chapter detection)
| Parameter | Type | Description |
| --- | --- | --- |
| autoChapters | array | An array of auto-chapter objects. |
| autoChapters[i].id | integer | The chapter ID. |
| autoChapters[i].start | long | The chapter's start time in milliseconds, relative to the start of the audio. |
| autoChapters[i].end | long | The chapter's end time in milliseconds, relative to the start of the audio. |
| autoChapters[i].headline | string | The chapter's headline. |
| autoChapters[i].summary | string | The chapter summary. |
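Chapter start times are millisecond offsets, so building a table of contents needs a small conversion step. A Python sketch over hypothetical chapter data (`fmt_ms` is an illustrative helper, not part of the API):

```python
# Hypothetical auto-chapter results; field names follow the table above.
chapters = [
    {"id": 1, "start": 0, "end": 185000, "headline": "Opening", "summary": "..."},
    {"id": 2, "start": 185000, "end": 620000, "headline": "Roadmap", "summary": "..."},
]

def fmt_ms(ms):
    """Convert a millisecond offset to an HH:MM:SS string."""
    s = ms // 1000
    return f"{s // 3600:02d}:{s % 3600 // 60:02d}:{s % 60:02d}"

toc = [f'{fmt_ms(c["start"])} {c["headline"]}' for c in chapters]
```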
meetingAssistance (keywords, key sentences, and to-do items)
| Parameter | Type | Description |
| --- | --- | --- |
| meetingAssistance | object | An object containing the intelligent meeting minutes results. It can include results for zero or more result types. |
| meetingAssistance.keywords | array | An array of keyword results. |
| meetingAssistance.keySentences | array | An array of key sentence results, also known as highlights. |
| meetingAssistance.keySentences[i].id | long | The sequence number of the key sentence. |
| meetingAssistance.keySentences[i].sentenceId | long | The ID of the corresponding sentence in the original transcription. |
| meetingAssistance.keySentences[i].start | long | The start time, a relative timestamp in milliseconds from the beginning of the audio. |
| meetingAssistance.keySentences[i].end | long | The end time, a relative timestamp in milliseconds from the beginning of the audio. |
| meetingAssistance.keySentences[i].text | string | The text content of the key sentence. |
| meetingAssistance.actions | array | An array of to-do items. |
| meetingAssistance.actions[i].id | long | The sequence number of the to-do item. |
| meetingAssistance.actions[i].sentenceId | long | The ID of the corresponding sentence in the original transcription. |
| meetingAssistance.actions[i].start | long | The start time, a relative timestamp in milliseconds from the beginning of the audio. |
| meetingAssistance.actions[i].end | long | The end time, a relative timestamp in milliseconds from the beginning of the audio. |
| meetingAssistance.actions[i].text | string | The text content of the to-do item. |
| meetingAssistance.classifications | object | An object containing the scene classification results. Three scene types are currently supported. |
| meetingAssistance.classifications.interview | float | The confidence score for the interview scene. |
| meetingAssistance.classifications.lecture | float | The confidence score for the lecture scene. |
| meetingAssistance.classifications.meeting | float | The confidence score for the meeting scene. |
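The three scene scores can be compared directly to pick the most likely scene. A short Python sketch over hypothetical classification scores:

```python
# Hypothetical scene classification scores; keys follow the table above.
classifications = {"interview": 0.12, "lecture": 0.81, "meeting": 0.07}

# The scene with the highest confidence score wins.
best_scene = max(classifications, key=classifications.get)
```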
Examples
Success response
JSON format
{
"JobStatus": "Success",
"RequestId": "******11-DB8D-4A9A-875B-275798******",
"UserData": "{\"userId\":\"123432412831\"}",
"Results": {
"Result": [
{
"Type": "Meta",
"Data": "{\"title\":\"example-title-****\"}\t\n"
}
]
},
"Usages": {
"Usage": [
{
"Type": "",
"Quota": 0
}
]
}
}
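To consume such a response, decode the body once, then decode each result's Data string a second time. A Python sketch based on a trimmed version of the sample above:

```python
import json

# A trimmed version of the sample success response above.
response_body = '''{
  "JobStatus": "Success",
  "RequestId": "******11-DB8D-4A9A-875B-275798******",
  "Results": {"Result": [{"Type": "Meta", "Data": "{\\"title\\":\\"example-title-****\\"}"}]}
}'''

resp = json.loads(response_body)
# Data is a nested JSON string, so each result needs a second decode.
by_type = {r["Type"]: json.loads(r["Data"]) for r in resp["Results"]["Result"]}
```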
Error codes
See Error Codes for a complete list.
Release notes
See Release Notes for a complete list.