Extract Video Subtitles via OSS with CreateMediaConvertTask - Intelligent Media Management

This topic describes how to extract subtitles from a video file using the CreateMediaConvertTask operation in Intelligent Media Management (IMM).

Overview

Video subtitle extraction separates subtitle information from a video file into a standalone text file that you can access, edit, and reuse independently. This capability supports multi-language content production, accessibility, and subtitle creation workflows.


Scenarios

Multi-language support: Extract subtitles from a video so you can produce multi-language versions, making your content accessible to audiences in different languages.
Translation and localization: Once subtitles are extracted, translators can work directly with the text to create localized versions that match the cultural and linguistic conventions of a target region.
Dubbing and speech recognition: Extracted subtitle text provides a script for voice actors to record dubbed audio. It can also serve as training data for speech recognition models.
Video editing and production: Video editors can extract, review, and refine subtitle text to improve accuracy and readability before publishing the final product.

Supported subtitle types

Subtitle type	Description	Supported
Text subtitles	Stored as text data with timestamps in formats such as SRT, ASS, or WebVTT. These are the most common type of subtitle and are easy to edit.	Yes
Image-based subtitles	Stored as bitmap images with timestamps (for example, DVB-Sub or PGS). These subtitles are rendered as images rather than editable text.	Yes
Burned-in subtitles	Rendered directly into the video frames and cannot be separated from the video. Also known as hardcoded subtitles.	No

Note

Burned-in subtitles are part of the video frames themselves, so this operation cannot extract them. If you need help with this type of subtitle, contact us.

Prerequisites

Before you begin, make sure you have completed the following:

An AccessKey pair is created and obtained. For more information, see Create an AccessKey pair.
Object Storage Service (OSS) is activated and a bucket is created. For more information, see Create a bucket.
IMM is activated. For more information, see Activate IMM.
A project is created in the IMM console. For more information, see Create a project.

Note

You can also call the CreateProject operation to create a project programmatically. For more information, see CreateProject.
You can call the ListProjects operation to list all projects in a region.

Procedure

Step 1: Upload a file

Upload a media file to an OSS bucket that resides in the same region as your IMM project. You can use the OSS console to upload the file.
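If you prefer to upload the file programmatically instead of through the console, the following is a minimal sketch that assumes the OSS Python SDK (oss2) is installed. The helper names parse_oss_uri and upload_video, the environment variable names, and the endpoint argument are assumptions for illustration; substitute the values for your own account and region.

```python
import os
from urllib.parse import urlparse


def parse_oss_uri(uri: str) -> tuple[str, str]:
    """Split an oss://bucket/key URI into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "oss" or not parsed.netloc or not parsed.path.strip("/"):
        raise ValueError(f"not a valid OSS object URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def upload_video(local_path: str, target_uri: str, endpoint: str) -> None:
    """Upload a local file to the OSS object named by target_uri."""
    import oss2  # imported lazily; requires `pip install oss2`

    bucket_name, key = parse_oss_uri(target_uri)
    # Credentials are read from hypothetical environment variables.
    auth = oss2.Auth(os.environ["OSS_ACCESS_KEY_ID"],
                     os.environ["OSS_ACCESS_KEY_SECRET"])
    bucket = oss2.Bucket(auth, endpoint, bucket_name)
    bucket.put_object_from_file(key, local_path)


print(parse_oss_uri("oss://test-bucket/video-demo/test.mp4"))
```

Pass the endpoint of the region in which both the bucket and the IMM project reside, because the two must be in the same region.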

Step 2: Extract subtitles from the video

Call the CreateMediaConvertTask operation to create a subtitle extraction task.

Note

You can use OpenAPI Explorer to call the CreateMediaConvertTask operation and view the generated SDK sample code.
To extract video subtitles, do not configure the Target.URI and Target.Container parameters.
We recommend that you include the {streamindex} variable in the subtitle output URI, for example: oss://test-bucket/objectPrefix-{streamindex}.{autoext}. If you omit {streamindex}, multiple subtitle streams may overwrite each other.
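To see why omitting {streamindex} can cause overwrites, the following sketch simulates the variable substitution locally. IMM performs the real substitution on the server side; the function names, the stream index values 2 and 3, and the vtt extension are assumptions chosen only for illustration.

```python
def expand_output_uri(template: str, stream_index: int, extension: str) -> str:
    """Illustrative stand-in for the substitution that IMM performs."""
    return (template
            .replace("{streamindex}", str(stream_index))
            .replace("{autoext}", extension))


def output_uris(template: str, stream_indexes: list[int], extension: str) -> list[str]:
    """Expand the template once per subtitle stream."""
    return [expand_output_uri(template, i, extension) for i in stream_indexes]


def has_collisions(uris: list[str]) -> bool:
    """Return True if two or more streams map to the same object."""
    return len(set(uris)) < len(uris)


with_index = output_uris(
    "oss://test-bucket/objectPrefix-{streamindex}.{autoext}", [2, 3], "vtt")
without_index = output_uris(
    "oss://test-bucket/objectPrefix.{autoext}", [2, 3], "vtt")

print(with_index, has_collisions(with_index))
print(without_index, has_collisions(without_index))
```

With {streamindex} in the template, each stream gets a distinct object name. Without it, both streams resolve to the same object, so the later write would replace the earlier one.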

The extraction process works in four phases:

Subtitle format identification: The system identifies the subtitle formats present in the video file, including SubRip Subtitle (SRT), Advanced SubStation Alpha (ASS), Web Video Text Tracks (WebVTT), and embedded subtitle stream formats.
Subtitle data extraction: The system extracts subtitle text along with its associated timestamps from the video file. This phase captures all subtitle content, including speaker names, timestamps, and time format information.
Text processing: The system performs any necessary processing on the extracted subtitle text. During this phase, redundant information may be removed, formats adjusted, languages translated, and spelling and grammar checked.
Output and saving: The processed subtitles are saved in the specified output format (such as SRT or ASS) at the designated OSS location, ready for downstream use, editing, or upload to video platforms.

Parameter examples

The following examples use an IMM project named test-project and extract subtitles from a video at oss://test-bucket/video-demo/test.mp4.

For more information about media processing features, see Media transcoding.

Extract all subtitles and convert to WebVTT format

Subtitle format: WebVTT
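Output path: oss://test-bucket/video-demo/subtitle-{streamindex}.{autoext}, which produces one .vtt file per subtitle stream (subtitle-%d.vtt, where %d is the stream index)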
Completion notification: A Simple Message Queue (formerly MNS) message sent to a topic named test-mns-topic

For the sample SDK code, go to OpenAPI Explorer. The parameters in the example are pre-configured in OpenAPI Explorer. Modify them as needed before running the code.

Request parameters

{
  "ProjectName": "test-project",
  "Notification": {
    "MNS": {
      "TopicName": "test-mns-topic"
    }
  },
  "Sources": [
    {
      "URI": "oss://test-bucket/video-demo/test.mp4"
    }
  ],
  "Targets": [
    {
      "Subtitle": {
        "ExtractSubtitle": {
          "Format": "webvtt",
          "URI": "oss://test-bucket/video-demo/subtitle-{streamindex}.{autoext}"
        }
      }
    }
  ]
}

Extract all subtitles and convert to SRT format

Subtitle format: SRT
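Output path: oss://test-bucket/video-demo/subtitle-{streamindex}.{autoext}, which produces one .srt file per subtitle stream (subtitle-%d.srt, where %d is the stream index)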
Completion notification: A Simple Message Queue (formerly MNS) message sent to a topic named test-mns-topic

For the sample SDK code, go to OpenAPI Explorer. The parameters in the example are pre-configured in OpenAPI Explorer. Modify them as needed before running the code.

Request parameters

{
  "ProjectName": "test-project",
  "Notification": {
    "MNS": {
      "TopicName": "test-mns-topic"
    }
  },
  "Sources": [
    {
      "URI": "oss://test-bucket/video-demo/test.mp4"
    }
  ],
  "Targets": [
    {
      "Subtitle": {
        "ExtractSubtitle": {
          "Format": "srt",
          "URI": "oss://test-bucket/video-demo/subtitle-{streamindex}.{autoext}"
        }
      }
    }
  ]
}
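The two request bodies above differ only in the Format value. As a minimal sketch, a helper such as the following could build either one. The function name build_extract_subtitle_request and its argument names are hypothetical; only the JSON keys come from the examples above. Pass the resulting parameters to whichever SDK or HTTP client you use to call CreateMediaConvertTask.

```python
import json


def build_extract_subtitle_request(project_name, source_uri, output_uri,
                                   subtitle_format, topic_name=None):
    """Build CreateMediaConvertTask parameters for subtitle extraction."""
    request = {
        "ProjectName": project_name,
        "Sources": [{"URI": source_uri}],
        # Target.URI and Target.Container are deliberately not set:
        # only the subtitle output is configured.
        "Targets": [
            {
                "Subtitle": {
                    "ExtractSubtitle": {
                        "Format": subtitle_format,
                        "URI": output_uri,
                    }
                }
            }
        ],
    }
    if topic_name is not None:
        # Optional completion notification.
        request["Notification"] = {"MNS": {"TopicName": topic_name}}
    return request


srt_request = build_extract_subtitle_request(
    project_name="test-project",
    source_uri="oss://test-bucket/video-demo/test.mp4",
    output_uri="oss://test-bucket/video-demo/subtitle-{streamindex}.{autoext}",
    subtitle_format="srt",
    topic_name="test-mns-topic",
)
print(json.dumps(srt_request, indent=2))
```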

Note

SRT and WebVTT are both widely used subtitle formats. The key difference is the timestamp delimiter: SRT uses a comma (hh:mm:ss,fff) while WebVTT uses a period (hh:mm:ss.fff). WebVTT also supports CSS styling for caption appearance.
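The delimiter difference can be illustrated with a small sketch that rewrites SRT timing lines in WebVTT form. It handles only the timestamps; a complete WebVTT file also begins with a WEBVTT header line. The function name is chosen for illustration.

```python
import re

# Matches an SRT timestamp such as 00:00:01,500 (hh:mm:ss,fff).
_SRT_TIMESTAMP = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_timestamps_to_webvtt(line: str) -> str:
    """Replace the comma before the milliseconds with a period."""
    return _SRT_TIMESTAMP.sub(r"\1.\2", line)


print(srt_timestamps_to_webvtt("00:00:01,500 --> 00:00:04,000"))
# prints: 00:00:01.500 --> 00:00:04.000
```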

Billing

Video subtitle extraction generates billable items from both OSS and IMM. The following sections describe each.

Important

Starting at 11:00 (UTC+8) on July 28, 2025, IMM video subtitle extraction changes from a free service to a paid service. For more information, see IMM billing adjustment announcement.

IMM billable items

For detailed pricing, see IMM billable items.

API	Billable item	Description
CreateMediaConvertTask	ExtractSubtitleText	You are charged for text subtitle extraction based on the number of successfully extracted subtitle streams.
CreateMediaConvertTask	ExtractSubtitleImage	You are charged for image-based subtitle extraction based on the total duration of successfully extracted subtitle streams.
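As a rough illustration of how the two billable items are metered, the following sketch tallies billable quantities for a set of successfully extracted streams. It computes quantities only, not fees. The ExtractedStream type and the use of seconds as the duration unit are assumptions; see IMM billable items for the actual metering units and prices.

```python
from dataclasses import dataclass


@dataclass
class ExtractedStream:
    kind: str                # "text" or "image" (hypothetical labels)
    duration_seconds: float  # duration of the extracted subtitle stream


def billable_quantities(streams):
    """Count text streams and sum the duration of image-based streams."""
    text_streams = sum(1 for s in streams if s.kind == "text")
    image_seconds = sum(s.duration_seconds for s in streams if s.kind == "image")
    return {
        "ExtractSubtitleText": text_streams,     # metered per stream
        "ExtractSubtitleImage": image_seconds,   # metered by total duration
    }


streams = [
    ExtractedStream("text", 600.0),
    ExtractedStream("text", 600.0),
    ExtractedStream("image", 600.0),
]
print(billable_quantities(streams))
```

In this example, the two text streams count as two units regardless of their length, while the image-based stream contributes its full duration.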

OSS billable items

For detailed pricing, see OSS Pricing.

API	Billable item	Description
GetObject	GET requests	You are charged request fees based on the number of successful requests.
GetObject	Infrequent Access Data Retrieval Capacity	If IA objects are retrieved, you are charged IA data retrieval fees based on the size of retrieved IA data.
GetObject	Archive Data Direct Read Retrieval Capacity	If Archive objects in a bucket for which real-time access is enabled are retrieved, you are charged Archive data retrieval fees based on the size of retrieved Archive objects.
GetObject	Transfer acceleration	If you enable transfer acceleration and use an acceleration endpoint to access your bucket, you are charged transfer acceleration fees based on the data size.
PutObject	PUT requests	You are charged request fees based on the number of successful requests.
PutObject	Storage fees	You are charged storage fees based on the storage class, size, and storage duration of the object.
HeadObject	GET requests	You are charged request fees based on the number of successful requests.