Introduction and Billing for Video Moderation 2.0 - AI Guardrails

Video Moderation Version 2.0 scans ApsaraVideo VOD files and live streams for policy-violating content across both video frames and audio. The service returns risk labels with confidence scores so you can take moderation actions based on your platform's rules.

The service integrates Image Moderation Version 2.0 for frame analysis and Voice Moderation Version 2.0 for audio analysis, letting you reuse configurations already set up for those services.

Services

Video Moderation Version 2.0 provides four services. Which one you use depends on whether you moderate video files or live streams, and on whether you need the standard or the large model edition.

Service	Service ID	Description	Availability
Video file detection	`videoDetection_global`	Scans video files for violations in frames and audio	All regions
Video file detection (Large model edition)	`videoDetectionByVL_global`	Uses large model image moderation for frame analysis; 10 ingest endpoints by default	Singapore only
Live video stream moderation	`liveStreamDetection_global`	Scans live video streams for violations in frames and audio	All regions
Live video stream moderation (Large model edition)	`liveStreamDetectionByVL_global`	Uses large model image moderation for frame analysis; 10 ingest endpoints by default	Singapore only
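If your application handles both files and live streams, it can pick the service ID from the table in one place. The sketch below is a minimal illustration under stated assumptions: the service IDs are copied from the table, but the `pick_service` helper and the plain region names such as `"Singapore"` are hypothetical and are not official region identifiers.

```python
# Service IDs copied from the services table; keys are (content type, large model edition?).
SERVICES = {
    ("file", False): "videoDetection_global",
    ("file", True): "videoDetectionByVL_global",
    ("live", False): "liveStreamDetection_global",
    ("live", True): "liveStreamDetectionByVL_global",
}

# Hypothetical region labels: the table only states that large model editions are Singapore only.
LARGE_MODEL_REGIONS = {"singapore"}


def pick_service(content: str, large_model: bool, region: str) -> str:
    """Return the service ID for "file" or "live" content (hypothetical helper)."""
    if content not in ("file", "live"):
        raise ValueError(f"unknown content type: {content!r}")
    if large_model and region.lower() not in LARGE_MODEL_REGIONS:
        raise ValueError("large model editions are available in Singapore only")
    return SERVICES[(content, large_model)]
```

For example, `pick_service("live", True, "Singapore")` returns `"liveStreamDetectionByVL_global"`, while requesting a large model edition in any other region raises an error before a call is made.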

Large model edition services default to 10 ingest endpoints, compared with 50 for the standard Version 2.0 services. Keep the number of concurrent calls within this limit.
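One way to stay within the 10-endpoint default on the client side is to guard every call with a bounded semaphore. This is a minimal sketch, assuming a thread-based client. `submit_video` is a hypothetical placeholder for the real SDK or API call, and the example URLs are made up.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT_CALLS = 10  # default ingest endpoints for large model editions

_slots = threading.BoundedSemaphore(MAX_CONCURRENT_CALLS)
_counter_lock = threading.Lock()
_in_flight = 0
peak_in_flight = 0  # highest number of simultaneous calls observed


def submit_video(url: str) -> str:
    """Hypothetical stand-in for the real SDK or API call."""
    time.sleep(0.01)  # simulate network latency
    return f"submitted {url}"


def moderate(url: str) -> str:
    """Submit one video, waiting while MAX_CONCURRENT_CALLS calls are already in flight."""
    global _in_flight, peak_in_flight
    with _slots:
        with _counter_lock:
            _in_flight += 1
            peak_in_flight = max(peak_in_flight, _in_flight)
        try:
            return submit_video(url)
        finally:
            with _counter_lock:
                _in_flight -= 1


# Usage: 32 worker threads, but never more than 10 calls in flight at once.
urls = [f"https://example.com/video-{i}.mp4" for i in range(50)]
with ThreadPoolExecutor(max_workers=32) as pool:
    results = list(pool.map(moderate, urls))
```

A semaphore is used rather than simply setting `max_workers=10` so that the limit still holds when several thread pools or code paths share the same moderation quota.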

Upgrade from Video Moderation 1.0

If you are evaluating whether to upgrade, the table below summarizes the differences between versions.

	Video Moderation Version 2.0	Video Moderation 1.0
Default ingest endpoints	50	20
Default QPS	100 calls/second	50 calls/second
Max video size	500 MB	200 MB
Frame detection scope	General-purpose baseline check (via Image Moderation Version 2.0)	Pornography, terrorism, undesirable scenes, logos, text and image violations
Audio detection scope	Multilingual audio and video media detection; multilingual social and entertainment live stream detection (via Voice Moderation Version 2.0)	—
Console features	Frame detection service settings, audio detection service settings, snapshot settings, result return settings	Check item settings only
Frame billing	Consistent with Image Moderation Version 2.0 pricing	1.8× Image Moderation 1.0 pricing
Audio billing	10% discount vs. Voice Moderation Version 2.0	Consistent with Voice Moderation 1.0

Version 2.0 increases the default ingest endpoints from 20 to 50, raises the QPS limit from 50 to 100 calls/second, and supports video files up to 500 MB. Frame detection scope is consolidated into a single general-purpose baseline check that covers the content categories handled by Image Moderation Version 2.0.
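Before submitting work, a client can check files against the 500 MB limit and pace calls below the 100 calls/second default. The sketch below assumes that 1 MB means 1024 × 1024 bytes; if the service counts decimal megabytes, lower `MAX_VIDEO_BYTES` accordingly. The token bucket is a generic client-side rate limiter, not part of any Alibaba Cloud SDK.

```python
import threading
import time
from typing import Optional

MAX_VIDEO_BYTES = 500 * 1024 * 1024  # 500 MB, assuming 1 MB = 1024 * 1024 bytes
DEFAULT_QPS = 100  # Version 2.0 default QPS limit


def check_video_size(num_bytes: int) -> None:
    """Reject files above the 500 MB limit before submitting them."""
    if num_bytes > MAX_VIDEO_BYTES:
        raise ValueError(f"video is {num_bytes} bytes; the limit is {MAX_VIDEO_BYTES}")


class TokenBucket:
    """Thread-safe token bucket: bursts up to `capacity` calls, then `rate` calls per second."""

    def __init__(self, rate: float, capacity: Optional[float] = None) -> None:
        self.rate = rate
        self.capacity = capacity if capacity is not None else rate
        self.tokens = self.capacity
        self.updated = time.monotonic()
        self.lock = threading.Lock()

    def acquire(self) -> None:
        """Block until one call is allowed, then consume a token."""
        while True:
            with self.lock:
                now = time.monotonic()
                # Refill in proportion to elapsed time, capped at capacity.
                self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
                self.updated = now
                if self.tokens >= 1:
                    self.tokens -= 1
                    return
                wait = (1 - self.tokens) / self.rate
            time.sleep(wait)  # sleep outside the lock so other threads can proceed


limiter = TokenBucket(rate=DEFAULT_QPS)
```

In practice you would call `check_video_size(os.path.getsize(path))` once per file and `limiter.acquire()` immediately before each request.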

Billing

Video Moderation Version 2.0 uses pay-as-you-go billing. You are not charged when the service is not called. Usage is metered and billed once every 24 hours.

Pricing

Moderation type	Business scenarios	Unit price
Video frame detection (General-purpose edition): `image_standard`, `video_image_standard`	General-purpose baseline check (`baselineCheck_global`)	USD 0.60 per 1,000 calls
Video frame detection (Premium edition): `image_advanced`, `video_image_advanced`	Large and small model fusion image moderation service (`postImageCheckByVL_global`)	USD 1.20 per 1,000 calls
Video audio moderation (General-purpose edition): `video_standard`	Multilingual audio and video media detection (`audio_multilingual_global`); multilingual live video stream detection (`stream_multilingual_global`)	USD 8.10 per 1,000 minutes (USD 0.486/hour)

Note: Each call to a frame detection business scenario counts as one transaction, and you are billed for actual usage. For example, 100 calls to the general-purpose baseline check cost USD 0.06, and 100 calls to the large and small model fusion image moderation service cost USD 0.12.

Frame detection and audio moderation are billed independently. If you enable audio moderation, the total cost is:

Total cost = (Number of snapshots × Frame unit price per call) + (Video duration in minutes × Audio unit price per minute)
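The formula can be turned into a quick estimator. This is a sketch that uses the unit prices from the pricing table. The snapshot interval of one frame per second in the example is a hypothetical setting, and the code applies no rounding, whereas the actual bill may round usage or amounts differently.

```python
from decimal import Decimal
from typing import Union

# Unit prices from the pricing table (USD).
FRAME_PRICE_PER_1000_CALLS = {
    "standard": Decimal("0.60"),  # baselineCheck_global
    "premium": Decimal("1.20"),   # postImageCheckByVL_global
}
AUDIO_PRICE_PER_1000_MINUTES = Decimal("8.10")  # video_standard


def estimate_cost(
    snapshots: int,
    minutes: Union[int, Decimal],
    frame_tier: str = "standard",
    audio_enabled: bool = True,
) -> Decimal:
    """Snapshots x frame price per call, plus (if enabled) minutes x audio price per minute."""
    frame_cost = snapshots * FRAME_PRICE_PER_1000_CALLS[frame_tier] / 1000
    audio_cost = (
        Decimal(minutes) * AUDIO_PRICE_PER_1000_MINUTES / 1000 if audio_enabled else Decimal(0)
    )
    return frame_cost + audio_cost


# 10-minute video with a hypothetical snapshot interval of 1 second -> 600 snapshots.
total = estimate_cost(snapshots=600, minutes=10)  # USD 0.36 frames + USD 0.081 audio = USD 0.441
```

`Decimal` is used instead of `float` so that small per-call prices such as USD 0.0006 add up without binary rounding error.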

In the billing details, the moderationType field corresponds to the moderation types in the pricing table. To view your bill, go to Bill details.
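To reconcile a bill against the pricing table, you can group line items by moderationType. The sketch below is illustrative only. The moderationType field and its values come from this page, but the CSV layout, the `amount` column, and the sample figures are hypothetical. Adapt the column names to the format that Bill details actually exports.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal

# moderationType values from the pricing table, mapped to their edition.
TIER_BY_MODERATION_TYPE = {
    "image_standard": "Video frame detection (General-purpose edition)",
    "video_image_standard": "Video frame detection (General-purpose edition)",
    "image_advanced": "Video frame detection (Premium edition)",
    "video_image_advanced": "Video frame detection (Premium edition)",
    "video_standard": "Video audio moderation (General-purpose edition)",
}


def totals_by_tier(rows):
    """Sum the (hypothetical) amount column per pricing tier."""
    totals = defaultdict(Decimal)  # Decimal() starts each tier at 0
    for row in rows:
        tier = TIER_BY_MODERATION_TYPE.get(row["moderationType"], "Other")
        totals[tier] += Decimal(row["amount"])
    return dict(totals)


# Hypothetical sample export; only the moderationType column name is documented.
SAMPLE = """moderationType,amount
video_image_standard,0.36
video_standard,0.081
image_advanced,0.12
"""
totals = totals_by_tier(csv.DictReader(io.StringIO(SAMPLE)))
```

Unknown moderationType values are collected under "Other" rather than dropped, so new billing items stay visible in the summary.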

Integrate Video Moderation Version 2.0

SDK and API

Integrate Video Moderation Version 2.0 using the method that fits your workflow:

SDK — See Video Moderation Version 2.0 SDK and integration guide.
API — See Video file moderation Version 2.0 API.

Console setup

Before making your first API call, configure your video moderation settings in the Content Moderation console. The console lets you:

Configure video frame and audio detection services
Set snapshot intervals and result return preferences
Define different moderation policies for different business scenarios
Review call results and monitor usage

Related resources

Video Moderation Version 2.0 SDK and integration guide
Video file moderation Version 2.0 API
Image Moderation Version 2.0 service description
Voice Moderation Version 2.0 service description