This article introduces the functional nodes currently supported by the Media AI solution and describes what each node does.
| Node Name | Description |
| --- | --- |
| Face Recognition | Accepts multiple actor images, names, and role information as input. Automatically detects faces in the video stream and identifies the timestamps at which each corresponding character appears. |
| Speech Extraction + ASR | Separates human voices from audio/video and performs high-accuracy Automatic Speech Recognition (ASR), generating speaker-attributed transcripts with precise timestamps. |
| Text Content Extraction | Allows selection of the AI model, custom prompt configuration, and output formatting. Extracts on-screen text from video frames and combines it with ASR transcripts for comprehensive semantic understanding. |
| Video Summary (OTT) | Generates intelligent summaries of videos (e.g., movies, TV series, short videos) by analyzing visual frames (with a configurable sampling interval and similarity threshold) and integrating subtitle/ASR content. |
| Frame Extraction & Analysis (Custom) | Supports frame sampling (with a configurable sampling interval and similarity threshold), model selection, and custom prompt and output definitions to analyze video frames for content, quality, or other visual attributes. |
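The Video Summary (OTT) and Frame Extraction & Analysis (Custom) nodes both sample frames using a configurable interval and similarity threshold. This article does not specify how the solution measures similarity. The following is therefore only a minimal sketch of interval-plus-similarity sampling under two assumptions: frames arrive as flattened grayscale pixel lists, and similarity is one minus the mean absolute pixel difference. `Frame`, `similarity`, and `sample_frames` are hypothetical names, not part of the product.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Frame:
    timestamp: float       # seconds from the start of the video
    pixels: Sequence[int]  # flattened grayscale intensities 0-255 (assumed representation)


def similarity(a: Sequence[int], b: Sequence[int]) -> float:
    """Return 1.0 for identical frames and 0.0 for maximally different ones.

    Assumed metric: 1 - (mean absolute pixel difference / 255).
    """
    if len(a) != len(b):
        raise ValueError("frames must have the same number of pixels")
    mean_diff = sum(abs(x - y) for x, y in zip(a, b)) / len(a)
    return 1.0 - mean_diff / 255.0


def sample_frames(frames: List[Frame], interval: float, threshold: float) -> List[Frame]:
    """Keep a frame only if it is at least `interval` seconds after the last
    kept frame AND its similarity to that frame is below `threshold`."""
    kept: List[Frame] = []
    for frame in frames:
        if kept:
            last = kept[-1]
            if frame.timestamp - last.timestamp < interval:
                continue  # too close in time to the last kept frame
            if similarity(frame.pixels, last.pixels) >= threshold:
                continue  # visually a near-duplicate of the last kept frame
        kept.append(frame)
    return kept
```

Each candidate is compared with the last *kept* frame rather than with its immediate predecessor. As a result, a slow pan or gradual fade still produces a new sample once the accumulated change pushes similarity below the threshold.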
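The Face Recognition node reports when each character appears on screen. This article does not document the node's internal logic. The sketch below only illustrates one common way to turn per-frame face matches into appearance time ranges. The `(timestamp, name)` input format, the function name `appearance_intervals`, and the `max_gap` parameter are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def appearance_intervals(
    detections: Iterable[Tuple[float, str]], max_gap: float
) -> Dict[str, List[Tuple[float, float]]]:
    """Merge per-frame identifications into (start, end) intervals per character.

    detections: (timestamp_seconds, character_name) pairs, one for each frame
        in which that character's face was matched (hypothetical format).
    max_gap: consecutive detections of the same character at most this many
        seconds apart are treated as one continuous appearance.
    """
    by_name: Dict[str, List[float]] = defaultdict(list)
    for ts, name in detections:
        by_name[name].append(ts)

    result: Dict[str, List[Tuple[float, float]]] = {}
    for name, stamps in by_name.items():
        stamps.sort()  # detections may arrive out of order
        intervals: List[Tuple[float, float]] = []
        start = end = stamps[0]
        for ts in stamps[1:]:
            if ts - end <= max_gap:
                end = ts  # extends the current appearance
            else:
                intervals.append((start, end))  # gap too large: close it
                start = end = ts
        intervals.append((start, end))
        result[name] = intervals
    return result
```

The `max_gap` tolerance absorbs brief misses, such as a frame where the face is turned away, so that one scene is not split into many short fragments.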