This document describes the features, core capabilities, and use cases of the MaxFrame skill for intelligent processing of autonomous driving video data.
Overview
The MaxFrame skill for intelligent processing of autonomous driving video data is a scaffold generator for autonomous driving video processing jobs. Describe your input tables and processing goals, and the skill generates complete pipeline code, suggested table structures, and a walkthrough. The generated job is ready to run directly on MaxCompute through MaxFrame.
Use cases
Quickly build pipelines to transform video data into images, labels, and embeddings for autonomous driving, intelligent driving, or computer vision teams.
Refactor existing pipelines that call DashScope directly from UDFs so that they use managed model calls through MaxFrame AI Function.
Generate production-grade code with built-in row-level fault tolerance and observability.
| Scenario | Input | Expected output |
|---|---|---|
| Video frame extraction | Video table (containing OSS paths) | Frame image table |
| Frame extraction, labeling, and embedding generation | Video table | Image table with labels and embeddings (automatically split into a two-stage job) |
| Keyframe labeling | Clip directory table | Keyframe table with labels |
| Direct image labeling and embedding generation | Image table | Table with labels and embeddings |
| Append embeddings to an image table | Labeled image table | Image table with an embedding column |
Procedure
User describes requirements
↓
The skill automatically determines the pipeline shape
↓
(If ambiguous) The skill prompts for the minimum required inputs
↓
The skill generates the code, table structure, and walkthrough
↓
User submits and runs the job directly on MaxCompute or MaxFrame
Deliverables
Each time you call the skill, you get the following:
Main job code (`*.py`): A runnable MaxFrame program.
Suggested table structure (`*_schema.sql`, optional): The DDL for input, intermediate, and output tables.
Walkthrough (`*_walkthrough.md`): Includes the scenario type, execution order, required environment variables, and specifications for upstream and downstream tables.
Core capabilities
Data teams in fields such as autonomous driving, intelligent driving, and in-cabin computer vision often need to convert video into searchable, trainable data with labels and embeddings. This process involves several key steps: video frame extraction, keyframe selection and labeling, image and text embedding generation, multi-source data flow between MaxCompute and OSS, and management of distributed concurrency and fault tolerance.
Traditionally, this requires teams to write UDFs, maintain clients for DashScope or other HTTP services, handle OSS authentication, control concurrency, and manage retries. Deploying a single pipeline can take several days to a week.
With the MaxFrame skill for intelligent processing of autonomous driving video data, you describe your input data and desired output and get complete job code that follows MaxFrame best practices in minutes.
The key advantages are as follows:
Generate complete jobs from one-line descriptions
Provide the following four inputs:
Scenario name (`scenario_name`)
Input data shape (`input_shape`)
Processing targets (`targets`)
Output table name (`output_table` or `output_tables`)
The skill automatically selects the appropriate pipeline shape, generates the main program code, and creates the table structure and walkthrough.
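To make the shape selection concrete, here is a minimal sketch of how the four inputs could map to the scenarios in the use-case table. It is not the skill's actual logic: the `SkillRequest` class, the `select_pipeline` function, the allowed `input_shape` and `targets` values, and the stage names are all hypothetical. Raising an error on an unmatched request stands in for the step where the skill prompts for the minimum required inputs.

```python
from dataclasses import dataclass, field

# Hypothetical request object; field names mirror the four skill inputs,
# but the allowed values below are assumptions made for this sketch.
@dataclass
class SkillRequest:
    scenario_name: str
    input_shape: str   # assumed: "video", "clip_dir", "image", "labeled_image"
    targets: set[str]  # assumed: "frames", "labels", "embeddings"
    output_tables: list[str] = field(default_factory=list)


def select_pipeline(req: SkillRequest) -> list[str]:
    """Return an ordered list of stage names (illustrative mapping only)."""
    model_targets = req.targets & {"labels", "embeddings"}
    if req.input_shape == "video":
        if not model_targets:
            return ["frame_extraction"]
        # Video plus model work is split so that frames are materialized once.
        return ["frame_extraction", "image_processing"]
    if req.input_shape == "clip_dir" and "labels" in req.targets:
        return ["keyframe_labeling"]
    if req.input_shape == "image" and model_targets:
        return ["image_processing"]
    if req.input_shape == "labeled_image" and req.targets == {"embeddings"}:
        return ["append_embeddings"]
    # Ambiguous request: this is where the skill would ask for more input.
    raise ValueError(
        f"ambiguous or unsupported request: {req.input_shape} -> {sorted(req.targets)}"
    )
```

For example, `select_pipeline(SkillRequest("front_cam", "video", {"labels", "embeddings"}))` yields the two-stage shape `["frame_extraction", "image_processing"]`, matching the second row of the use-case table.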
Built-in MaxFrame best practices
Uses MaxFrame AI Function and a managed Model Studio large model (`read_odps_model`) for labeling and embedding generation. This eliminates the need to maintain DashScope keys or wrap code in UDFs.
Video tasks are automatically split into a two-stage job, "frame extraction → image processing", which simplifies reruns and allows intermediate results to be reused.
OSS paths are mounted with `with_fs_mount`, and concurrency is controlled with `rebalance`.
All write operations are consolidated into a single exit point with `to_odps_table().execute()`.
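The benefit of the two-stage split can be illustrated without MaxFrame. The sketch below uses plain Python lists as stand-ins for MaxCompute tables and a dictionary as a stand-in for the intermediate frame table; the names `extract_frames`, `run_two_stage`, and `frame_key` are hypothetical, and frame extraction is a placeholder rather than real video decoding. It shows that once the frames are materialized, the image-processing stage can be rerun on its own.

```python
from typing import Callable

Row = dict
Table = list[Row]  # in-memory stand-in for a MaxCompute table


def extract_frames(videos: Table, frames_per_video: int = 2) -> Table:
    """Stage 1 (placeholder): emit frame rows. A real job would decode video from OSS."""
    return [
        {"video_id": v["video_id"], "frame_idx": i,
         "frame_key": f"{v['video_id']}/frame_{i:05d}.jpg"}
        for v in videos
        for i in range(frames_per_video)
    ]


def run_two_stage(videos: Table, process_frame: Callable[[Row], Row],
                  intermediate: dict) -> Table:
    # Stage 1 runs only when its output is not already materialized.
    if "frames" not in intermediate:
        intermediate["frames"] = extract_frames(videos)
    # Stage 2 reads only the intermediate frames, so it can be rerun
    # (for example, with a revised labeling prompt) without re-extracting video.
    return [process_frame(row) for row in intermediate["frames"]]


cache: dict = {}
videos = [{"video_id": "v1"}, {"video_id": "v2"}]
first = run_two_stage(videos, lambda r: {**r, "label": "car"}, cache)
# Rerun stage 2 only: no videos are passed, yet the cached frames are reused.
second = run_two_stage([], lambda r: {**r, "label": "truck"}, cache)
```

In a generated job the intermediate result would be a MaxCompute table rather than a dictionary, but the rerun boundary sits in the same place.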
Row-level fault tolerance and rerunnability
All model stage outputs include the status, error_stage, and error_msg fields:
A single row failure does not stop the entire batch job.
Failures are pinpointed to a specific stage, such as frame extraction, labeling, or embedding parsing.
Supports precise reruns of only the failed rows.
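The row-level pattern can be sketched in plain Python as follows. Only the `status`, `error_stage`, and `error_msg` column names come from this document; the stage names, the `run_row` and `rows_to_rerun` helpers, and the 512-character message limit are assumptions for illustration. Each row runs its stages inside its own `try` block, so a failure is recorded on that row and the batch continues.

```python
import json
from typing import Callable, Optional

# A stage is (stage_name, function returning new columns for the row).
Stage = tuple[str, Callable[[dict], dict]]


def run_row(row: dict, stages: list[Stage]) -> dict:
    """Run stages in order for one row and record the first failure instead of raising."""
    out = dict(row, status="ok", error_stage=None, error_msg=None)
    for name, fn in stages:
        try:
            out.update(fn(out))
        except Exception as exc:  # one bad row must not abort the whole batch
            out.update(status="failed", error_stage=name,
                       # 512-character cap is an arbitrary choice for this sketch
                       error_msg=f"{type(exc).__name__}: {exc}"[:512])
            break  # later stages depend on this one, so skip them
    return out


def rows_to_rerun(results: list[dict], stage: Optional[str] = None) -> list[dict]:
    """Select failed rows, optionally only those that failed at one stage."""
    return [r for r in results
            if r["status"] == "failed" and (stage is None or r["error_stage"] == stage)]


# Usage: the second row carries a malformed embedding payload.
stages = [
    ("labeling", lambda r: {"label": "car"}),
    ("embedding_parse", lambda r: {"embedding": json.loads(r["raw_embedding"])}),
]
rows = [{"id": 1, "raw_embedding": "[0.1, 0.2]"},
        {"id": 2, "raw_embedding": "not json"}]
results = [run_row(r, stages) for r in rows]
```

Here the first row succeeds, while the second keeps its label and is marked `failed` at `embedding_parse`, so `rows_to_rerun(results, "embedding_parse")` returns only that row.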
Secure and compliant by default
No hardcoded sensitive information: Model names, OSS buckets, the MaxCompute project, and keys are all configured through environment variables.
Path safety checks: Prevents directory traversal attacks that use `..` and ensures that all paths stay within the declared OSS prefix.
Customer neutrality: The generated code does not contain any customer names, private prompts, or business rules.
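A minimal sketch of the environment-variable configuration and path safety checks described above might look like the following. The variable name `VIDEO_OSS_PREFIX` and the helper names are hypothetical (the generated walkthrough lists the real environment variables), and the sketch only manipulates strings; it does not call OSS.

```python
import os
import posixpath


def oss_prefix_from_env(var: str = "VIDEO_OSS_PREFIX") -> str:
    """Read the allowed OSS prefix from the environment (variable name is hypothetical)."""
    value = os.environ.get(var)
    if not value:
        raise RuntimeError(f"environment variable {var} is not set")
    return value


def safe_oss_path(prefix: str, relative: str) -> str:
    """Join a relative object key onto an OSS prefix, rejecting traversal and escapes."""
    if not prefix.startswith("oss://"):
        raise ValueError("prefix must be an oss:// URI")
    key = relative.replace("\\", "/")
    # Reject absolute keys and any ".." segment before normalizing.
    if key.startswith("/") or ".." in key.split("/"):
        raise ValueError(f"unsafe path: {relative!r}")
    norm = posixpath.normpath(key)  # collapses "a/./b" and "a//b"
    if norm == ".":
        raise ValueError("empty path")
    base = prefix.rstrip("/") + "/"
    full = base + norm
    # Second guard: the result must still sit under the declared prefix.
    if not full.startswith(base):
        raise ValueError(f"{full!r} escapes {base!r}")
    return full
```

With a prefix of `oss://bucket/videos`, the key `clip_01/frame_0001.jpg` resolves to `oss://bucket/videos/clip_01/frame_0001.jpg`, while keys such as `../secret.jpg` or `/etc/passwd` raise `ValueError`.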
Controllable cost and performance
For label generation, the thinking process is disabled by default to reduce token consumption.
The embedding generation stage is independent and can be enabled or disabled as needed.
Token usage is reported by stage (`label_input_token`, `label_output_token`, `*_total_token`) to simplify cost attribution.
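Per-stage attribution amounts to summing the per-row token columns. The sketch below assumes that every stage follows the `<stage>_input_token` / `<stage>_output_token` naming used by `label_input_token` and `label_output_token`, and that `<stage>_total_token` is their sum. The `embed` stage name and the `token_report` helper are hypothetical, and missing values are counted as zero.

```python
from collections import defaultdict


def token_report(rows: list[dict]) -> dict[str, int]:
    """Sum per-row token columns into per-stage totals (column naming is assumed)."""
    totals: dict[str, int] = defaultdict(int)
    for row in rows:
        for col, value in row.items():
            if col.endswith(("_input_token", "_output_token")) and value is not None:
                totals[col] += int(value)
    # "label_input_token" -> "label"; derive one *_total_token per stage.
    stages = {col.rsplit("_", 2)[0] for col in totals}
    for stage in stages:
        totals[f"{stage}_total_token"] = (
            totals[f"{stage}_input_token"] + totals[f"{stage}_output_token"]
        )
    return dict(totals)


rows = [
    {"label_input_token": 100, "label_output_token": 20, "embed_input_token": 50},
    {"label_input_token": 80, "label_output_token": None, "embed_input_token": 40},
]
report = token_report(rows)
```

For these two rows, the label stage totals 200 tokens (180 input, 20 output) and the embedding stage totals 90, so each stage's cost can be reported separately.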