Intelligent Media Services: Batch video production

Last Updated: Jan 16, 2026

This topic describes the billing for batch video production.

Billing

  • Rules:

    • Charges are calculated based on the total duration of the input videos and the output videos. Durations are rounded up to the nearest minute. Any duration less than 1 minute is billed as 1 minute. No charges are incurred for failed production tasks.

    • Intelligent text generation is billed based on the number of tokens consumed. Token counts are rounded up to the nearest thousand. Any usage under 1,000 tokens is billed as 1,000 tokens. No charges are incurred for failed generation tasks.

  • Billing cycle: Bills are generated hourly. Alibaba Cloud measures your usage for each hourly cycle after that cycle ends and issues the bill during the following cycle. The exact billing time is subject to system processing.
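
The rounding rules above can be sketched in a few lines of Python. This is a minimal illustration, not the service's actual metering code: the function names and the `succeeded` flag are hypothetical, and it assumes durations arrive as whole seconds and that rounding is applied to the summed duration of a single task.

```python
def billable_minutes(total_seconds: int, succeeded: bool = True) -> int:
    """Round a task's total duration up to whole minutes.

    Assumption: failed tasks and zero-length durations bill as 0 minutes.
    """
    if not succeeded or total_seconds <= 0:
        return 0
    # Integer ceiling division: 1..60 s -> 1 minute, 61..120 s -> 2 minutes.
    return (total_seconds + 59) // 60


def billable_thousand_tokens(tokens: int, succeeded: bool = True) -> int:
    """Round token usage up to whole units of 1,000 tokens."""
    if not succeeded or tokens <= 0:
        return 0
    # 1..1000 tokens -> 1 unit, 1001..2000 tokens -> 2 units.
    return (tokens + 999) // 1000
```

Integer arithmetic is used here so that exact multiples (60 seconds, 1,000 tokens) are never pushed up by floating-point error.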

| Feature | Billing method | Chinese Mainland | Singapore | US (Silicon Valley) | Unit |
| --- | --- | --- | --- | --- | --- |
| Intelligent Text Generation | Billed by the number of tokens. | 0.017 | 0.025 | 0.025 | USD per 1,000 tokens |
| Script-to-Video | Same as video editing: billed by output duration and resolution. No fee is charged if you only generate a timeline (GeneratePreviewOnly is true). If you later render this timeline into a video, standard video editing fees apply. | Same as video editing | Same as video editing | Same as video editing | Same as video editing |
| Image-Text Matching (Common Scenarios) | Billed by the total duration of the input and output videos. If assets are sourced through a theme description search, the actual input duration may be longer than the search result duration to allow for creative randomness. If you only generate a timeline (GeneratePreviewOnly is true), the output duration is the total duration of the timeline. | 0.04 | 0.06 | 0.06 | USD per minute |
| Image-Text Matching (Movie Collections) | Billed by the total duration of the input and output videos. If you only generate a timeline (GeneratePreviewOnly is true), the output duration is the total duration of the timeline. | 0.14 | 0.21 | 0.21 | USD per minute |
| Highlight Mashup | Billed by the total duration of the input and output videos. If you only generate a timeline (GeneratePreviewOnly is true), the output duration is the total duration of the timeline. | 0.28 | 0.42 | 0.42 | USD per minute |
| Highlight Extraction | Billed by the input video duration. | 0.28 | 0.42 | 0.42 | USD per minute |
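
As a rough sketch of how the per-minute rows above might be encoded, the snippet below stores the listed prices and picks which durations count toward the bill. The feature and region keys, `PRICE_PER_MINUTE`, `INPUT_ONLY`, and `billable_seconds` are hypothetical names chosen for illustration. Script-to-Video is left out because it is priced under video editing, and Intelligent Text Generation because it is billed per token.

```python
from decimal import Decimal

# Prices in USD per minute, copied from the pricing table (keys are illustrative).
PRICE_PER_MINUTE = {
    "image_text_common": {
        "cn_mainland": Decimal("0.04"),
        "singapore": Decimal("0.06"),
        "us_silicon_valley": Decimal("0.06"),
    },
    "image_text_movie": {
        "cn_mainland": Decimal("0.14"),
        "singapore": Decimal("0.21"),
        "us_silicon_valley": Decimal("0.21"),
    },
    "highlight_mashup": {
        "cn_mainland": Decimal("0.28"),
        "singapore": Decimal("0.42"),
        "us_silicon_valley": Decimal("0.42"),
    },
    "highlight_extraction": {
        "cn_mainland": Decimal("0.28"),
        "singapore": Decimal("0.42"),
        "us_silicon_valley": Decimal("0.42"),
    },
}

# Features billed on input duration only; the others count input plus output.
INPUT_ONLY = {"highlight_extraction"}


def billable_seconds(feature: str, input_seconds: int, output_seconds: int) -> int:
    """Return the duration (in seconds) that counts toward the bill, before rounding."""
    if feature not in PRICE_PER_MINUTE:
        raise KeyError(f"unknown feature: {feature}")
    if feature in INPUT_ONLY:
        return input_seconds
    return input_seconds + output_seconds
```

When only a timeline is generated (GeneratePreviewOnly is true), the timeline's total duration would be passed as `output_seconds`.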

Billing examples

Assume that between 8:00 and 9:00, you use the Image-Text Matching (Common Scenarios) feature in the US (Silicon Valley) region. You provide a 90-second input video and successfully produce a 23-second output video. During this process, you also use the Intelligent Text Generation feature, consuming 900 tokens.

Here is how the total cost is calculated:

  • Video production cost:

    • Total billable duration = Input duration + Output duration = (90 s + 23 s) / 60 ≈ 1.88 minutes.

    • Rounded up to the nearest minute, the total duration is 2 minutes.

    • Cost = 0.06 USD per minute × 2 minutes = 0.12 USD.

  • Text generation cost:

    • Total tokens consumed = 900 tokens.

    • Rounded up to the nearest thousand, the total is 1,000 tokens.

    • Cost = 0.025 USD per 1,000 tokens × 1 = 0.025 USD.

  • Total cost:

    • The total cost for the batch video production between 8:00 and 9:00 is 0.12 + 0.025 = 0.145 USD.
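
The same calculation can be checked with a short script. This is only a sketch of the arithmetic in this example: the function name `example_total` is hypothetical, and the prices are the US (Silicon Valley) rates quoted in the example. `Decimal` is used so that the currency amounts are not affected by binary floating-point rounding.

```python
from decimal import Decimal


def example_total(input_seconds: int, output_seconds: int, tokens: int) -> Decimal:
    """Reproduce the worked example: Image-Text Matching (Common Scenarios)
    plus Intelligent Text Generation in US (Silicon Valley)."""
    price_per_minute = Decimal("0.06")        # USD per minute
    price_per_1k_tokens = Decimal("0.025")    # USD per 1,000 tokens

    # Sum input and output, then round up to whole minutes.
    minutes = (input_seconds + output_seconds + 59) // 60
    # Round token usage up to whole units of 1,000 tokens.
    token_units = (tokens + 999) // 1000

    video_cost = price_per_minute * minutes
    text_cost = price_per_1k_tokens * token_units
    return video_cost + text_cost


print(example_total(90, 23, 900))  # 0.145
```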