MaxFrame skill for autonomous driving video data processing - MaxCompute

Overview

This Skill is a job scaffold generator for autonomous driving video data processing. Describe your input tables and processing goals, and it produces complete pipeline code that runs directly on MaxCompute and MaxFrame, together with table schema recommendations and a walkthrough.

Use cases

Quickly build a pipeline to process video into images, labels, and vectors for autonomous driving, intelligent driving, and computer vision.
Migrate existing UDFs and direct DashScope connections to managed AI Function calls.
Generate production-grade code with built-in row-level fault tolerance and observability.

Scenario	Input	Expected output
Video frame extraction	Video table (containing OSS paths)	Image frame table
Frame extraction, labeling, and embedding generation	Video table	Image table with labels and vectors (automatically split into a two-stage job)
Keyframe labeling	Clip directory table	Keyframe table with labels
Direct image labeling and vectorization	Image table	Label and embedding table
Append embeddings to an image table	Labeled image table	Image table with a vector column

Workflow

User describes requirements
    ↓
Skill automatically determines the pipeline structure
    ↓
(If the request is ambiguous) Prompts for the minimum required input
    ↓
Generates code, table schemas, and a walkthrough
    ↓
User submits the job directly to MaxCompute or MaxFrame
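The following Python sketch shows how the pipeline-structure step could map an input shape and a set of targets to one or two jobs, following the scenario table above. The input_shape values (video, clip_dir, image, labeled_image), the target names, and the stage names are hypothetical labels chosen for this example. The Skill's actual decision logic is not documented here and may differ.

```python
# Illustrative only: input_shape values and stage names are hypothetical and
# mirror the scenario table; they are not identifiers defined by the Skill.
MODEL_STAGES = (("labels", "labeling"), ("embeddings", "embedding"))


def plan_jobs(input_shape: str, targets: set[str]) -> list[list[str]]:
    """Return the jobs to generate, each as an ordered list of stage names."""
    if not targets:
        raise ValueError("at least one target is required")
    model_stages = [stage for target, stage in MODEL_STAGES if target in targets]
    if input_shape == "video":
        if not model_stages:
            return [["frame_extraction"]]
        # Video plus labels/embeddings becomes two jobs, so the frame table
        # produced by the first job can be reused when the second is rerun.
        return [["frame_extraction"], model_stages]
    if input_shape == "clip_dir" and model_stages:
        return [["keyframe_selection", *model_stages]]
    if input_shape in ("image", "labeled_image") and model_stages:
        return [model_stages]
    # Anything else is ambiguous: the workflow would ask the user for more input.
    raise ValueError(
        f"cannot infer a pipeline for input_shape={input_shape!r}, targets={sorted(targets)}"
    )
```

For example, plan_jobs("video", {"frames", "labels", "embeddings"}) returns [["frame_extraction"], ["labeling", "embedding"]], which corresponds to the two-stage split described for video input.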

Installation

Download the package

Skill package: https://skills.aliyun.com/skills/alibabacloud-maxframe-video-frame-pipeline
Extract the package to the skills directory of your AI coding assistant. The following command uses Claude Code as an example:

unzip alibabacloud-maxframe-video-frame-pipeline-0.0.1.zip -d your-project/.claude/skills/

Deliverables

Each Skill call generates the following:

Main script (*.py) — A MaxFrame program that you can run directly.
Recommended table structure (*_schema.sql, optional) — DDL for input, intermediate, and output tables.
Walkthrough (*_walkthrough.md) — Covers the scenario type, execution order, required environment variables, and upstream and downstream table expectations.

Core capabilities

Data teams in autonomous driving, intelligent driving, and in-cabin computer vision often need to transform video into searchable, trainable data with labels and embeddings. This involves video frame extraction, keyframe selection and labeling, image and text embedding generation, multi-source data flow between MaxCompute and OSS, and distributed concurrency and fault tolerance.

Traditionally, teams must handwrite UDFs, maintain DashScope or HTTP clients, handle OSS authentication, control concurrency, and manage retry logic. A single pipeline typically takes several days to a week to go live.
With this Skill, you describe your input data and expected output, and get complete job code that follows MaxFrame best practices within minutes.

Key advantages include:

Generate jobs from a single prompt

Provide four inputs:

Scenario name (scenario_name)
Input data shape (input_shape)
Targets (targets)
Output table name (output_table / output_tables)

The Skill automatically selects the pipeline structure, generates the main program code, and creates table schemas and a walkthrough.
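As a minimal sketch, assuming the four inputs arrive as a plain dictionary, the hypothetical normalize_request function below checks that each input is present and always returns the output tables as a list. The field names come from the list above. The rule that output_table holds a single table and output_tables holds several is an assumption made for this example.

```python
from typing import Any

# Hypothetical validation of the four inputs; not the Skill's own implementation.
REQUIRED_FIELDS = ("scenario_name", "input_shape", "targets")


def normalize_request(request: dict[str, Any]) -> dict[str, Any]:
    """Check the four required inputs and always return output tables as a list."""
    missing = [name for name in REQUIRED_FIELDS if not request.get(name)]
    if request.get("output_table") and request.get("output_tables"):
        raise ValueError("specify either output_table or output_tables, not both")
    if request.get("output_tables"):
        tables = list(request["output_tables"])
    elif request.get("output_table"):
        tables = [request["output_table"]]
    else:
        tables = []
        missing.append("output_table / output_tables")
    if missing:
        # Corresponds to the workflow step that prompts for the minimum required input.
        raise ValueError("missing required inputs: " + ", ".join(missing))
    return {
        "scenario_name": request["scenario_name"],
        "input_shape": request["input_shape"],
        "targets": list(request["targets"]),
        "output_tables": tables,
    }
```

A two-stage video request might then look like normalize_request({"scenario_name": "front_camera_labeling", "input_shape": "video", "targets": ["frames", "labels", "embeddings"], "output_tables": ["frames_tmp", "frame_labels"]}), where all values are illustrative.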

Built-in MaxFrame best practices

Uses MaxFrame AI Function with a managed Model Studio large model (read_odps_model) for labeling and vectorization, so you do not need to maintain a DashScope API key or package a UDF.
Automatically splits video tasks into two stages (frame extraction → image processing), which simplifies reruns and lets later runs reuse the intermediate results.
Mounts OSS paths with with_fs_mount and controls concurrency with rebalance.
Centralizes write operations in to_odps_table().execute().
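The benefit of the two-stage split can be pictured with the following plain-Python sketch. It does not use the MaxFrame API: a dictionary stands in for MaxCompute tables, and run_two_stage, extract_frames, process_frame, and the frames_intermediate table name are hypothetical. Once the intermediate frame table exists, rerunning the image-processing stage does not extract frames again.

```python
from typing import Callable

Row = dict
# Hypothetical in-memory stand-in for MaxCompute tables; a generated job would
# read and write real tables instead.
Tables = dict[str, list[Row]]


def run_two_stage(
    videos: list[Row],
    extract_frames: Callable[[Row], list[Row]],
    process_frame: Callable[[Row], Row],
    tables: Tables,
    frame_table: str = "frames_intermediate",
) -> list[Row]:
    # Stage 1 (frame extraction) runs only if its intermediate table is absent,
    # so rerunning the model stage does not decode any video again.
    if frame_table not in tables:
        tables[frame_table] = [frame for video in videos for frame in extract_frames(video)]
    # Stage 2 (labeling/embedding) depends only on the intermediate table.
    return [process_frame(frame) for frame in tables[frame_table]]
```

Keeping stage 1 output in its own table is what makes it reusable: stage 2 can change (for example, a new prompt or an added embedding step) without repeating the most expensive decoding work.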

Row-level fault tolerance and rerunnable jobs

The output of every model stage includes three fields: status, error_stage, and error_msg.

A single row failure does not affect the entire batch job.
Failure reasons are pinpointed to a specific stage, such as frame extraction, labeling, or embedding parsing.
Supports rerunning only the failed rows.
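A minimal sketch of this row-level pattern in plain Python, assuming the status values ok and failed (the values written by generated code may differ): each stage catches exceptions per row and records where and why the row failed, and rows_to_rerun selects only the failed rows. run_stage and rows_to_rerun are hypothetical helpers, not part of MaxFrame, and the 512-character cap on error_msg is an arbitrary choice for the example.

```python
from typing import Any, Callable

Row = dict[str, Any]


def run_stage(rows: list[Row], stage: str, fn: Callable[[Row], Row]) -> list[Row]:
    """Apply fn to each row; a failure is recorded on that row instead of aborting the batch."""
    results = []
    for row in rows:
        if row.get("status") == "failed":
            # Pass earlier failures through so error_stage keeps pointing at the first failure.
            results.append(row)
            continue
        try:
            results.append({**row, **fn(row), "status": "ok", "error_stage": None, "error_msg": None})
        except Exception as exc:  # row-level isolation: never re-raise
            results.append({**row, "status": "failed", "error_stage": stage,
                            "error_msg": f"{type(exc).__name__}: {exc}"[:512]})
    return results


def rows_to_rerun(rows: list[Row]) -> list[Row]:
    """Select only failed rows, stripped of their error fields, for a targeted rerun."""
    return [
        {k: v for k, v in row.items() if k not in ("status", "error_stage", "error_msg")}
        for row in rows
        if row.get("status") == "failed"
    ]
```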

Secure and compliant by design

No hardcoded sensitive information: Model names, OSS buckets, MaxCompute projects, and access keys are configured through environment variables.
Path security validation: Rejects paths that contain .. (path traversal) and enforces OSS prefix boundaries.
Customer isolation: Generated code contains no customer names, private prompts, or business rules.
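The first two practices can be sketched as follows. require_env and validate_oss_path are hypothetical helpers, and the VIDEO_OSS_PREFIX variable name in the usage line is an assumption; generated code may use different names and checks. The prefix check appends a trailing slash so that a sibling directory such as oss://bucket/videos-other/ does not match oss://bucket/videos.

```python
import os


def require_env(name: str) -> str:
    """Read a required setting from the environment instead of hardcoding it."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"required environment variable {name} is not set")
    return value


def validate_oss_path(path: str, allowed_prefix: str) -> str:
    """Reject '..' traversal and any path outside allowed_prefix."""
    if not path.startswith("oss://"):
        raise ValueError(f"not an OSS path: {path!r}")
    if ".." in path.split("/"):
        raise ValueError(f"path traversal is not allowed: {path!r}")
    # Compare with a trailing slash so a sibling prefix cannot slip through.
    boundary = allowed_prefix.rstrip("/") + "/"
    if not path.startswith(boundary):
        raise ValueError(f"path {path!r} is outside {boundary!r}")
    return path
```

Usage: prefix = require_env("VIDEO_OSS_PREFIX"), then call validate_oss_path(video_path, prefix) on every path read from the input table before any file is opened.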

Controllable cost and performance

Label generation disables the model's thinking mode by default to reduce token usage.
The embedding stage is independent and can be enabled or disabled as needed.
Token usage is returned per stage (label_input_token, label_output_token, and *_total_token) for cost attribution.
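For cost attribution, the per-stage token columns can be rolled up with a few lines of Python. This sketch assumes each output row carries integer columns named in the form stage_kind_token, such as label_input_token or a hypothetical embedding_total_token, and that failed rows may hold None. tokens_by_stage is a hypothetical helper.

```python
from collections import defaultdict
from typing import Any


def tokens_by_stage(rows: list[dict[str, Any]]) -> dict[str, dict[str, int]]:
    """Sum *_token columns per stage, e.g. label_input_token -> totals['label']['input']."""
    totals: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for row in rows:
        for column, value in row.items():
            # Failed rows may carry no usage (None); skip anything non-numeric.
            if not column.endswith("_token") or not isinstance(value, int):
                continue
            prefix = column[: -len("_token")]
            if "_" not in prefix:
                continue  # not in the stage_kind_token form
            stage, kind = prefix.rsplit("_", 1)
            totals[stage][kind] += value
    return {stage: dict(kinds) for stage, kinds in totals.items()}
```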