This topic describes how to extract subtitles from a video file using the CreateMediaConvertTask operation in Intelligent Media Management (IMM).
Overview
Video subtitle extraction separates subtitle information from a video file into a standalone text file that you can access, edit, and reuse independently. This capability supports multi-language content production, accessibility, and subtitle creation workflows.

Scenarios
- Multi-language support: Extract subtitles from a video so you can produce multi-language versions, making your content accessible to audiences in different languages.
- Translation and localization: Once subtitles are extracted, translators can work directly with the text to create localized versions that match the cultural and linguistic conventions of a target region.
- Dubbing and speech recognition: Extracted subtitle text provides a script for voice actors to record dubbed audio. It can also serve as training data for speech recognition models.
- Video editing and production: Video editors can extract, review, and refine subtitle text to improve accuracy and readability before publishing the final product.
Supported subtitle types
Subtitle type | Description | Supported |
Text subtitles | Stored as text data with timestamps in formats such as SRT, ASS, or WebVTT. These are the most common type of subtitle and are easy to edit. | Yes |
Image-based subtitles | Stored as bitmap images with timestamps (for example, DVB-Sub or PGS). These subtitles are rendered as images rather than editable text. | Yes |
Burned-in subtitles | Rendered directly into the video frames and cannot be separated from the video. Also known as hardcoded subtitles. | No |
Burned-in subtitles that are embedded in video frames are not supported. If you need assistance with this type of subtitle, contact us.
Prerequisites
Before you begin, make sure that the following requirements are met:
- An AccessKey pair is created. For more information, see Create an AccessKey pair.
- Object Storage Service (OSS) is activated and a bucket is created. For more information, see Create a bucket.
- IMM is activated. For more information, see Activate IMM.
- A project is created in the IMM console. For more information, see Create a project.
  - You can also call the CreateProject operation to create a project programmatically. For more information, see CreateProject.
  - You can call the ListProjects operation to list all projects in a region.
Procedure
Step 1: Upload a file
Upload a media file to an OSS bucket that resides in the same region as your IMM project. You can use the OSS console to upload the file.

Step 2: Extract subtitles from the video
Call the CreateMediaConvertTask operation to create a subtitle extraction task.
- You can call this operation in OpenAPI Explorer, which also provides sample SDK code for reference.
- To extract video subtitles, do not configure the Target.URI and Target.Container parameters.
- We recommend that you include the {streamindex} variable in the subtitle output URI, for example: oss://test-bucket/objectPrefix-{streamindex}.{autoext}. If you omit {streamindex}, multiple subtitle streams may overwrite each other.
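The following minimal sketch illustrates why omitting {streamindex} can cause overwrites. It only imitates the placeholder substitution locally. The helper names, the zero-based stream index, and the srt extension are assumptions made for illustration, not a description of how IMM implements the substitution.

```python
def expand_output_uri(template: str, stream_index: int, ext: str) -> str:
    """Fill the {streamindex} and {autoext} placeholders in an output URI template.

    Hypothetical helper: the real substitution is performed by the service.
    """
    return (
        template
        .replace("{streamindex}", str(stream_index))
        .replace("{autoext}", ext)
    )


def output_uris(template: str, stream_count: int, ext: str) -> list[str]:
    """Return the output URI for each subtitle stream (index assumed to start at 0)."""
    return [expand_output_uri(template, i, ext) for i in range(stream_count)]


# A video with two subtitle streams, written with and without {streamindex}.
with_index = output_uris("oss://test-bucket/objectPrefix-{streamindex}.{autoext}", 2, "srt")
without_index = output_uris("oss://test-bucket/objectPrefix.{autoext}", 2, "srt")

print(with_index)               # two distinct object names
print(len(set(without_index)))  # 1: both streams map to the same object name
```

With the index in the template, each stream gets its own object name. Without it, every stream resolves to the same name, so a later stream can replace an earlier one.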
The extraction process works in four phases:
- Subtitle format identification: The system identifies the subtitle formats present in the video file, including SubRip Subtitle (SRT), Advanced SubStation Alpha (ASS), Web Video Text Tracks (WebVTT), and embedded subtitle stream formats.
- Subtitle data extraction: The system extracts subtitle text along with its associated timestamps from the video file. This phase captures all subtitle content, including speaker names, timestamps, and time format information.
- Text processing: The system performs any necessary processing on the extracted subtitle text. During this phase, redundant information may be removed, formats adjusted, languages translated, and spelling and grammar checked.
- Output and saving: The processed subtitles are saved in the specified output format (such as SRT or ASS) at the designated OSS location, ready for downstream use, editing, or upload to video platforms.
Parameter examples
The following examples use an IMM project named test-project and extract subtitles from a video at oss://test-bucket/video-demo/test.mp4.
For more information about media processing features, see Media transcoding.
Extract all subtitles and convert to WebVTT format
- Subtitle format: WebVTT
- Output path: oss://test-bucket/video-demo/subtitle-%d.vtt, where %d stands for the subtitle stream index
- Completion notification: A Simple Message Queue (formerly MNS) message sent to a topic named test-mns-topic
For the sample SDK code, go to OpenAPI Explorer. The parameters in the example are pre-configured in OpenAPI Explorer. Modify them as needed before running the code.
Request parameters
{
  "ProjectName": "test-project",
  "Notification": {
    "MNS": {
      "TopicName": "test-mns-topic"
    }
  },
  "Sources": [
    {
      "URI": "oss://test-bucket/video-demo/test.mp4"
    }
  ],
  "Targets": [
    {
      "Subtitle": {
        "ExtractSubtitle": {
          "Format": "webvtt",
          "URI": "oss://test-bucket/video-demo/subtitle-{streamindex}.{autoext}"
        }
      }
    }
  ]
}
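If you assemble the request parameters in code rather than editing JSON by hand, a small builder can keep the structure consistent across formats. The sketch below only constructs and prints the parameter structure shown above. The function name build_extract_subtitle_request is hypothetical, and how the parameters are passed to CreateMediaConvertTask depends on the SDK you use.

```python
import json


def build_extract_subtitle_request(project_name, source_uri, output_uri,
                                   subtitle_format, topic_name=None):
    """Build CreateMediaConvertTask parameters for subtitle extraction.

    Hypothetical helper: it mirrors the JSON structure in this topic and
    does not call IMM.
    """
    request = {
        "ProjectName": project_name,
        "Sources": [{"URI": source_uri}],
        # For subtitle extraction, the target carries no URI or Container
        # of its own; the output location sits under ExtractSubtitle.
        "Targets": [
            {
                "Subtitle": {
                    "ExtractSubtitle": {
                        "Format": subtitle_format,
                        "URI": output_uri,
                    }
                }
            }
        ],
    }
    if topic_name:
        # Optional completion notification through a message topic.
        request["Notification"] = {"MNS": {"TopicName": topic_name}}
    return request


params = build_extract_subtitle_request(
    project_name="test-project",
    source_uri="oss://test-bucket/video-demo/test.mp4",
    output_uri="oss://test-bucket/video-demo/subtitle-{streamindex}.{autoext}",
    subtitle_format="webvtt",
    topic_name="test-mns-topic",
)
print(json.dumps(params, indent=2))
```

Changing subtitle_format to "srt" yields the parameters for SRT output with no other changes.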
Extract all subtitles and convert to SRT format
- Subtitle format: SRT
- Output path: oss://test-bucket/video-demo/subtitle-%d.srt, where %d stands for the subtitle stream index
- Completion notification: A Simple Message Queue (formerly MNS) message sent to a topic named test-mns-topic
For the sample SDK code, go to OpenAPI Explorer. The parameters in the example are pre-configured in OpenAPI Explorer. Modify them as needed before running the code.
Request parameters
{
  "ProjectName": "test-project",
  "Notification": {
    "MNS": {
      "TopicName": "test-mns-topic"
    }
  },
  "Sources": [
    {
      "URI": "oss://test-bucket/video-demo/test.mp4"
    }
  ],
  "Targets": [
    {
      "Subtitle": {
        "ExtractSubtitle": {
          "Format": "srt",
          "URI": "oss://test-bucket/video-demo/subtitle-{streamindex}.{autoext}"
        }
      }
    }
  ]
}
SRT and WebVTT are both widely used subtitle formats. One key difference is the millisecond separator in timestamps: SRT uses a comma (hh:mm:ss,fff) while WebVTT uses a period (hh:mm:ss.fff). WebVTT also supports CSS styling for caption appearance.
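To make the timestamp difference concrete, the following sketch rewrites an SRT timing line into WebVTT notation with a regular expression. The function name is hypothetical, and the sketch covers only the separator. A complete conversion would also need the WEBVTT header line that WebVTT files start with.

```python
import re

# Matches an SRT timestamp such as 00:00:01,500 and captures the
# hh:mm:ss part and the millisecond part separately.
_SRT_TIMESTAMP = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_timestamps_to_webvtt(line: str) -> str:
    """Replace the comma before the milliseconds with a period."""
    return _SRT_TIMESTAMP.sub(r"\1.\2", line)


cue_timing = "00:00:01,500 --> 00:00:04,000"
print(srt_timestamps_to_webvtt(cue_timing))  # 00:00:01.500 --> 00:00:04.000
```

Lines without timestamps pass through unchanged, so commas inside the subtitle text itself are not affected.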
Billing
Video subtitle extraction generates billable items from both OSS and IMM. The following sections describe each.
Starting from 11:00 UTC+8 on July 28, 2025, IMM video subtitle extraction will change from a free service to a paid service. For more information, see IMM billing adjustment announcement.
IMM billable items
For detailed pricing, see IMM billable items.
API | Billable item | Description |
CreateMediaConvertTask | ExtractSubtitleText | You are charged for text subtitle extraction based on the number of successfully extracted subtitle streams. |
CreateMediaConvertTask | ExtractSubtitleImage | You are charged for image-based subtitle extraction based on the total duration of successfully extracted subtitle streams. |
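The two billable items are measured differently: one by stream count, the other by duration. The sketch below tallies both quantities for a set of extraction results. The SubtitleStreamResult type, its field names, and the use of seconds as the duration unit are assumptions for illustration. The sketch does not apply any prices, so see IMM billable items for actual pricing and units.

```python
from dataclasses import dataclass


@dataclass
class SubtitleStreamResult:
    """Hypothetical record of one subtitle stream's extraction outcome."""
    kind: str                # "text" or "image" (illustrative labels)
    duration_seconds: float  # duration of the subtitle stream
    succeeded: bool          # whether extraction succeeded


def billable_quantities(results):
    """Return (text stream count, image stream seconds) for successful streams."""
    ok = [r for r in results if r.succeeded]
    # ExtractSubtitleText: measured by the number of successful text streams.
    text_streams = sum(1 for r in ok if r.kind == "text")
    # ExtractSubtitleImage: measured by the total duration of successful image streams.
    image_seconds = sum(r.duration_seconds for r in ok if r.kind == "image")
    return text_streams, image_seconds


results = [
    SubtitleStreamResult("text", 600.0, True),
    SubtitleStreamResult("text", 600.0, False),  # failed: not counted
    SubtitleStreamResult("image", 600.0, True),
]
text_streams, image_seconds = billable_quantities(results)
print(text_streams, image_seconds)
```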
OSS billable items
For detailed pricing, see OSS Pricing.
API | Billable item | Description |
GetObject | GET requests | You are charged request fees based on the number of successful requests. |
GetObject | Infrequent Access Data Retrieval Capacity | If IA objects are retrieved, you are charged IA data retrieval fees based on the size of retrieved IA data. |
GetObject | Archive Data Direct Read Retrieval Capacity | If Archive objects in a bucket for which real-time access is enabled are retrieved, you are charged Archive data retrieval fees based on the size of retrieved Archive objects. |
GetObject | Transfer acceleration | If you enable transfer acceleration and use an acceleration endpoint to access your bucket, you are charged transfer acceleration fees based on the data size. |
PutObject | PUT requests | You are charged request fees based on the number of successful requests. |
PutObject | Storage fees | You are charged storage fees based on the storage class, size, and storage duration of the object. |
HeadObject | GET requests | You are charged request fees based on the number of successful requests. |