SuperApp: Node Capabilities Description

Last Updated: Apr 21, 2026

This article describes the functional nodes that the Media AI solution currently supports and what each node does.

| Node Name | Description |
| --- | --- |
| Face Recognition | Accepts multiple actor images, names, and role information as input. Automatically detects faces in the video stream and identifies the timestamps at which the corresponding characters appear. |
| Speech Extraction + ASR | Separates human voices from audio/video and performs high-accuracy Automatic Speech Recognition (ASR), generating speaker-attributed transcripts with precise timestamps. |
| Text Content Extraction | Allows selection of AI models, custom prompt configuration, and output formatting. Extracts on-screen text from video frames and combines it with ASR transcripts to enable comprehensive semantic understanding. |
| Video Summary (OTT) | Generates intelligent summaries for videos (e.g., movies, TV series, short videos) by analyzing visual frames (configurable interval/similarity threshold) and integrating subtitle/ASR content. |
| Frame Extraction & Analysis (Custom) | Enables frame sampling (configurable interval/similarity), model selection, and custom prompt/output definition to analyze video frames for content, quality, or other visual attributes. |
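
The Video Summary (OTT) and Frame Extraction & Analysis (Custom) nodes both sample frames using a configurable interval and similarity threshold. This article does not say how the nodes compute similarity or apply the threshold, so the following is only a minimal sketch of one plausible interpretation: take every `interval`-th frame as a candidate and skip it if it is too similar to the last kept frame. The names `similarity`, `sample_frames`, and `similarity_threshold`, the grayscale frame representation, and the mean-absolute-difference metric are all assumptions made for illustration, not the product's API.

```python
from typing import List, Sequence

# Hypothetical frame representation: a flat sequence of 8-bit grayscale pixels.
Frame = Sequence[int]


def similarity(a: Frame, b: Frame) -> float:
    """Return a score in [0, 1]; 1.0 means the two frames are identical.

    Assumed metric: 1 minus the normalized mean absolute pixel difference.
    """
    if len(a) != len(b) or not a:
        raise ValueError("frames must be non-empty and the same size")
    mean_abs_diff = sum(abs(x - y) for x, y in zip(a, b)) / len(a)
    return 1.0 - mean_abs_diff / 255.0


def sample_frames(
    frames: Sequence[Frame], interval: int, similarity_threshold: float
) -> List[int]:
    """Return the indices of frames that would be passed on for analysis.

    Every `interval`-th frame is a candidate. A candidate is dropped when its
    similarity to the most recently kept frame is at or above the threshold.
    """
    if interval < 1:
        raise ValueError("interval must be >= 1")
    kept: List[int] = []
    for index in range(0, len(frames), interval):
        if kept and similarity(frames[kept[-1]], frames[index]) >= similarity_threshold:
            continue  # near-duplicate of the last kept frame, skip it
        kept.append(index)
    return kept


if __name__ == "__main__":
    dark = [10] * 16
    bright = [200] * 16
    video = [dark, dark, dark, dark, bright, bright, dark]
    # Candidates are indices 0, 2, 4, 6; index 2 duplicates index 0.
    print(sample_frames(video, interval=2, similarity_threshold=0.9))  # [0, 4, 6]
```

Under these assumptions, a larger interval or a lower similarity threshold reduces the number of frames sent to the model, which trades coverage for cost.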