Workflow Node Capabilities

The following table lists the workflow nodes supported by the Media AI solution and their capabilities.

Node Name	Description
Face Recognition	Takes multiple actor images, names, and roles as input. Detects faces in video and identifies when each character appears.
Speech Extraction + ASR	Extracts speech from audio or video and runs high-accuracy automatic speech recognition (ASR) to produce speaker-attributed, timestamped transcripts.
Text Content Extraction	Lets you select AI models, configure a prompt, and define the output format. Combines ASR transcripts with visual analysis of sampled frames for cross-modal text extraction and comprehensive semantic understanding.
Video Summary (OTT)	Generates summaries and recaps for various video formats, such as movies, TV series, and short videos. Samples visual frames at configurable intervals and similarity thresholds, then combines the frame analysis with subtitle and ASR content.
Frame Extraction & Analysis (Custom)	Samples frames at configurable intervals and similarity thresholds. Lets you select a model and define a custom prompt and output format to analyze frames for content, quality, or other visual attributes.
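
The Video Summary (OTT) and Frame Extraction & Analysis (Custom) nodes both sample frames at a configurable interval and apply a similarity threshold. The documentation does not state how the nodes compute similarity, so the following is only a minimal sketch of the general idea, under stated assumptions: frames are flattened grayscale pixel lists in the range 0 to 1, similarity is one minus the mean absolute pixel difference, and a sampled frame is dropped when it is too similar to the last kept frame. The names `similarity`, `sample_frames`, `interval_s`, and `max_similarity` are hypothetical and are not part of the Media AI API.

```python
from typing import List, Sequence

# Assumption: a frame is a flattened list of grayscale pixels in [0, 1].
Frame = Sequence[float]


def similarity(a: Frame, b: Frame) -> float:
    """Return 1 minus the mean absolute pixel difference (1.0 = identical).

    This metric is an illustrative assumption, not the node's actual metric.
    """
    if len(a) != len(b):
        raise ValueError("frames must have the same size")
    if not a:
        return 1.0
    return 1.0 - sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def sample_frames(
    frames: Sequence[Frame],
    fps: float,
    interval_s: float,
    max_similarity: float,
) -> List[int]:
    """Return the indices of frames kept after interval sampling and deduplication."""
    # Convert the sampling interval in seconds to a step in frames (at least 1).
    step = max(1, round(fps * interval_s))
    kept: List[int] = []
    for i in range(0, len(frames), step):
        # Skip a sampled frame that is too similar to the last kept frame.
        if kept and similarity(frames[kept[-1]], frames[i]) >= max_similarity:
            continue
        kept.append(i)
    return kept


if __name__ == "__main__":
    # Three dark frames followed by three bright frames, at 1 frame per second.
    dark, bright = [0.0] * 4, [1.0] * 4
    video = [dark, dark, dark, bright, bright, bright]
    print(sample_frames(video, fps=1.0, interval_s=1.0, max_similarity=0.9))  # [0, 3]
```

In this sketch, a larger `interval_s` reduces how many frames are considered at all, while a lower `max_similarity` discards more near-duplicate frames. Both settings reduce the number of frames sent to the downstream model.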