Intelligent Media Services: Batch video production

Last Updated: Jun 22, 2026

Learn about the billing rules and pricing for batch video production features, including text generation, script-to-video, image-text matching, and highlight extraction.

Billing

  • Rules:

    • Video production is billed by duration, usually the total duration of the input and output videos; the pricing table below lists the exact basis for each feature. Durations are rounded up to the next whole minute, so any duration under 1 minute is billed as 1 minute. No charges are incurred for failed production tasks.

    • Intelligent text generation is billed based on the number of tokens consumed. Token counts are rounded up to the nearest thousand. Any usage under 1,000 tokens is billed as 1,000 tokens. No charges are incurred for failed generation tasks.

  • Billing cycle: Bills are generated hourly. Alibaba Cloud measures your usage in each one-hour billing cycle and issues the bill in the following cycle. The exact billing time depends on system processing.
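The rounding and billing-cycle rules above can be sketched in Python. This is a minimal sketch, not Alibaba Cloud's billing code: the `VideoTask` record, its field names, and the feature label are hypothetical; it assumes each task is charged in the hourly cycle in which it finishes; and it rounds the summed duration per feature and cycle, which the rules do not state explicitly (the billing example on this page only covers a single task).

```python
from __future__ import annotations

import math
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class VideoTask:
    # Hypothetical usage record; the field names are illustrative, not an IMS API.
    finished_at: datetime
    feature: str
    billable_seconds: float  # the duration basis that the feature bills on
    succeeded: bool


def billable_minutes(seconds: float) -> int:
    """Round a duration up to whole minutes; anything under 1 minute counts as 1."""
    return math.ceil(seconds / 60) if seconds > 0 else 0


def billable_token_units(tokens: int) -> int:
    """Round tokens up to whole thousands; anything under 1,000 counts as 1 unit."""
    return (tokens + 999) // 1000 if tokens > 0 else 0


def hourly_minutes(tasks: list[VideoTask]) -> dict[tuple[datetime, str], int]:
    """Sum successful usage per hourly cycle and feature, then round up.

    Assumption: rounding is applied to the summed duration of a cycle. If it is
    applied per task instead, round each task before adding it.
    """
    totals: dict[tuple[datetime, str], float] = defaultdict(float)
    for task in tasks:
        if not task.succeeded:
            continue  # failed tasks are not charged
        cycle_start = task.finished_at.replace(minute=0, second=0, microsecond=0)
        totals[(cycle_start, task.feature)] += task.billable_seconds
    return {key: billable_minutes(total) for key, total in totals.items()}


tasks = [
    VideoTask(datetime(2026, 6, 22, 8, 15), "image_text_common", 113, True),
    VideoTask(datetime(2026, 6, 22, 8, 40), "image_text_common", 50, False),
    VideoTask(datetime(2026, 6, 22, 9, 5), "image_text_common", 30, True),
]
# 8:00 cycle: 113 s -> 2 minutes (the failed task is skipped); 9:00 cycle: 30 s -> 1 minute.
print(hourly_minutes(tasks))
```

Integer arithmetic is used for tokens so that the round-up to whole thousands is exact; durations may be fractional seconds, so they go through `math.ceil`.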

The following list gives the billing method and unit price of each feature in the Chinese Mainland, Singapore, and US (Silicon Valley) regions.

Intelligent Text Generation

  • Billing method: Billed by the number of tokens.

  • Price (USD per 1,000 tokens): Chinese Mainland 0.017; Singapore 0.025; US (Silicon Valley) 0.025.

Script-to-Video

  • Billing method:

    • Same as video editing. Billed by output duration and resolution.

    • No fee is charged if you only generate a timeline (GeneratePreviewOnly is true). If you later render this timeline into a video, standard video editing fees apply.

  • Price: Same as video editing in all regions.

Image-Text Matching (Common Scenarios)

  • Billing method:

    • Billed by the total duration of the input and output videos. If assets are sourced through a theme description search, the actual input duration may exceed the duration of the search results, to allow for creative randomness.

    • If you only generate a timeline (GeneratePreviewOnly is true), the output duration is the total duration of the timeline.

  • Price (USD per minute): Chinese Mainland 0.04; Singapore 0.06; US (Silicon Valley) 0.06.

Image-Text Matching (Movie Collections)

  • Billing method:

    • Billed by the total duration of the input and output videos.

    • If you only generate a timeline (GeneratePreviewOnly is true), the output duration is the total duration of the timeline.

  • Price (USD per minute): Chinese Mainland 0.14; Singapore 0.21; US (Silicon Valley) 0.21.

Highlight Mashup

  • Billing method:

    • Billed by the total duration of the input and output videos.

    • If you only generate a timeline (GeneratePreviewOnly is true), the output duration is the total duration of the timeline.

  • Price (USD per minute): Chinese Mainland 0.28; Singapore 0.42; US (Silicon Valley) 0.42.

Highlight Extraction

  • Billing method: Billed by the input video duration.

  • Price (USD per minute): Chinese Mainland 0.28; Singapore 0.42; US (Silicon Valley) 0.42.
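To estimate costs across features and regions, the prices above can be encoded in a small lookup. This is a sketch under stated assumptions: the feature keys (`image_text_common`, `highlight_extraction`, and so on) and the functions `video_cost` and `text_cost` are hypothetical names, Script-to-Video is left out because its video editing rates are not listed on this page, and `Decimal` is used so that prices such as 0.017 are not distorted by binary floating point.

```python
import math
from decimal import Decimal

REGIONS = ("Chinese Mainland", "Singapore", "US (Silicon Valley)")

# USD per minute, in REGIONS order, copied from the pricing list.
# Feature keys are illustrative labels, not API identifiers.
PER_MINUTE = {
    "image_text_common": ("0.04", "0.06", "0.06"),
    "image_text_movie": ("0.14", "0.21", "0.21"),
    "highlight_mashup": ("0.28", "0.42", "0.42"),
    "highlight_extraction": ("0.28", "0.42", "0.42"),
}
# USD per 1,000 tokens for Intelligent Text Generation, in REGIONS order.
PER_1K_TOKENS = ("0.017", "0.025", "0.025")

# Highlight Extraction bills the input duration only; the others bill input + output.
INPUT_ONLY = {"highlight_extraction"}


def video_cost(feature: str, region: str, input_seconds: float, output_seconds: float) -> Decimal:
    """Cost of a per-minute feature for the given durations.

    For timeline-only runs (GeneratePreviewOnly is true), pass the timeline
    duration as output_seconds.
    """
    seconds = input_seconds if feature in INPUT_ONLY else input_seconds + output_seconds
    minutes = math.ceil(seconds / 60) if seconds > 0 else 0  # round up to whole minutes
    price = Decimal(PER_MINUTE[feature][REGIONS.index(region)])
    return price * minutes


def text_cost(region: str, tokens: int) -> Decimal:
    """Cost of Intelligent Text Generation, rounding tokens up to whole thousands."""
    units = (tokens + 999) // 1000 if tokens > 0 else 0
    return Decimal(PER_1K_TOKENS[REGIONS.index(region)]) * units


print(video_cost("highlight_extraction", "Singapore", 150, 600))  # 3 minutes x 0.42
print(text_cost("Chinese Mainland", 2500))  # 3 units x 0.017
```

Keeping the duration basis (`INPUT_ONLY`) separate from the price table mirrors the list above, where the billing method and the unit price are stated independently for each feature.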

Billing examples

Assume that between 8:00 and 9:00, you use the Image-Text Matching (Common Scenarios) feature in the US (Silicon Valley) region. You provide a 90-second input video and produce a 23-second output video. You also use the Intelligent Text Generation feature, consuming 900 tokens.

The total cost is calculated as follows:

  • Video production cost:

    • Total billable duration = Input duration + Output duration = 90 s + 23 s = 113 s ≈ 1.88 minutes.

    • Rounded up to the nearest minute, the total duration is 2 minutes.

    • Cost = 0.06 USD per minute × 2 minutes = 0.12 USD.

  • Text generation cost:

    • Total tokens consumed = 900 tokens.

    • Rounded up to the nearest thousand, the total is 1,000 tokens.

    • Cost = 0.025 USD per 1,000 tokens × 1 = 0.025 USD.

  • Total cost:

    • The total cost for the batch video production between 8:00 and 9:00 is 0.12 + 0.025 = 0.145 USD.
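The arithmetic of this example can be checked with a few lines of Python. This is only an illustration of the calculation above; the variable names are arbitrary, and `Decimal` keeps the USD amounts exact.

```python
import math
from decimal import Decimal

# Inputs from the example: US (Silicon Valley), 8:00 to 9:00 billing cycle.
input_seconds, output_seconds = 90, 23
tokens = 900

# Video: 90 s + 23 s = 113 s, rounded up to 2 minutes at 0.06 USD per minute.
video_minutes = math.ceil((input_seconds + output_seconds) / 60)
video_cost = Decimal("0.06") * video_minutes

# Text: 900 tokens, rounded up to 1 unit of 1,000 tokens at 0.025 USD per unit.
token_units = math.ceil(tokens / 1000)
text_cost = Decimal("0.025") * token_units

total = video_cost + text_cost
print(video_cost, text_cost, total)  # 0.12 0.025 0.145
```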