
Intelligent Media Services: Video translation

Last Updated: Jul 09, 2025

Video translation uses AI and machine learning algorithms to convert video content into one or more languages. It supports subtitle, speech, and lip-sync translation to deliver a natural, cohesive visual and auditory experience. By breaking down language barriers, video translation makes your content accessible to global audiences.

Introduction

Subtitle translation

Subtitle translation is ideal for scenarios that require quick multi-language support. It involves the following steps:

  • Extract subtitles from a source video.

  • Erase the original subtitles (optional).

  • Translate subtitles into the target language. You can select multiple target languages in a single task to output corresponding videos.

  • Add translated subtitles to the output video.

Speech translation

Speech translation builds on subtitle translation and adds the following steps:

  • Clone the speakers' vocal characteristics.

  • Generate audio tracks that read out the translated subtitles in a way that mimics the speakers' tone and inflection. You can select multiple target languages in a single task.

  • Replace the original audio tracks with the translated ones.

This helps preserve the speakers' identities and emotions while conveying the translated content.

Lip-sync translation

Lip-sync translation is built upon subtitle and speech translation.

It adjusts the lip movements of speakers to match the translated audio, ensuring that the visual presentation aligns with the spoken content.

Lip-sync translation is ideal for generating highly realistic interaction or promotional content.
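To make the layering concrete, the following Python sketch models each translation type as an ordered list of stages, where speech translation extends subtitle translation and lip-sync translation extends speech translation. The stage names and their exact order are illustrative assumptions; they are not IMS parameter values or a description of the service's internal pipeline.

```python
from typing import Dict, List

# Hypothetical stage labels; they are not IMS API values.
SUBTITLE_STAGES: List[str] = [
    "extract_subtitles",
    "erase_original_subtitles",  # optional in practice
    "translate_subtitles",
    "add_translated_subtitles",
]
# Speech translation builds on subtitle translation.
SPEECH_STAGES: List[str] = SUBTITLE_STAGES + [
    "clone_voice",
    "synthesize_translated_speech",
    "replace_audio_tracks",
]
# Lip-sync translation builds on speech translation.
LIP_SYNC_STAGES: List[str] = SPEECH_STAGES + ["adjust_lip_movements"]

STAGES_BY_TYPE: Dict[str, List[str]] = {
    "subtitle": SUBTITLE_STAGES,
    "speech": SPEECH_STAGES,
    "lip_sync": LIP_SYNC_STAGES,
}


def stages_for(translation_type: str) -> List[str]:
    """Return a copy of the ordered stages for a translation type."""
    try:
        return list(STAGES_BY_TYPE[translation_type])
    except KeyError:
        raise ValueError(f"unknown translation type: {translation_type}") from None


if __name__ == "__main__":
    for name in STAGES_BY_TYPE:
        print(name, "->", ", ".join(stages_for(name)))
```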

Post-editing

Intelligent Media Services (IMS) supports post-editing, which lets you correct speech translation results and edit output videos to meet personalized needs.

You can perform post-editing through Online Editing or OpenAPI.

Availability

  • Subtitle translation: China (Shanghai), China (Beijing), China (Shenzhen), China (Hangzhou), Singapore, US (Silicon Valley)

  • Speech translation: China (Shanghai), China (Beijing), China (Shenzhen), China (Hangzhou), Singapore, US (Silicon Valley)

  • Lip-sync translation: China (Shanghai), Singapore
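If you automate region selection, you can encode the availability list above as a lookup table. The sketch below assumes the region display names shown on this page and the hypothetical type keys `subtitle`, `speech`, and `lip_sync`. Availability may change, so verify it in the console before relying on a hard-coded table.

```python
from typing import Dict, Iterable, Optional, Set

# Regions as listed on this page (last updated Jul 09, 2025); may change.
_SIX_REGIONS = {
    "China (Shanghai)", "China (Beijing)", "China (Shenzhen)",
    "China (Hangzhou)", "Singapore", "US (Silicon Valley)",
}

# Hypothetical type keys mapped to the regions that offer them.
AVAILABILITY: Dict[str, Set[str]] = {
    "subtitle": set(_SIX_REGIONS),
    "speech": set(_SIX_REGIONS),
    "lip_sync": {"China (Shanghai)", "Singapore"},
}


def is_available(translation_type: str, region: str) -> bool:
    """Return True if the translation type is listed for the region."""
    return region in AVAILABILITY.get(translation_type, set())


def common_regions(types: Iterable[str]) -> Set[str]:
    """Return the regions that support every translation type in `types`."""
    result: Optional[Set[str]] = None
    for t in types:
        regions = AVAILABILITY.get(t, set())
        result = set(regions) if result is None else result & regions
    return result or set()
```

For example, `common_regions(["subtitle", "speech", "lip_sync"])` returns the regions where all three types are listed, which on this page are China (Shanghai) and Singapore.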

Advantages

  • Support for multiple languages and dialects

    Translation is available for more than 40 languages and over 10 Chinese dialects. A single translation task can generate outputs in over 40 languages at a time.

  • Compatible with multiple video formats

    Supported formats include MP4, WebM, MOV, and M3U8.

  • Rich custom options

    Supported audio formats include MP3 and WAV.

    You can personalize configurations to meet specific requirements.
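A client-side pre-check can reject unsupported files before you submit a task. This minimal sketch checks file extensions only, which is an assumption: the service may validate the actual media content. Because the console upload step lists only MP4, WebM, and MOV, the sketch keeps a separate, narrower set for console uploads.

```python
from pathlib import PurePath
from typing import Optional

# Formats listed on this page.
VIDEO_FORMATS = {".mp4", ".webm", ".mov", ".m3u8"}
AUDIO_FORMATS = {".mp3", ".wav"}
# The console upload step lists only these three video formats.
CONSOLE_UPLOAD_FORMATS = {".mp4", ".webm", ".mov"}


def classify_media(filename: str) -> Optional[str]:
    """Return "video", "audio", or None based on the file extension only."""
    suffix = PurePath(filename).suffix.lower()
    if suffix in VIDEO_FORMATS:
        return "video"
    if suffix in AUDIO_FORMATS:
        return "audio"
    return None


def can_upload_in_console(filename: str) -> bool:
    """Check the extension against the formats listed for console uploads."""
    return PurePath(filename).suffix.lower() in CONSOLE_UPLOAD_FORMATS
```

The sketch assumes plain file names; URLs with query strings would need to be parsed first.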

Billing

For billing information, see Video translation.

Use video translation

IMS provides three methods to create and manage video translation tasks:

  • IMS console: provides intuitive user interfaces.

  • Online Editing (Web): gives users who are familiar with video editing a more refined workspace with flexible control over translation outputs. You can add materials directly to an editing project, translate them with video translation, and perform post-editing.

  • OpenAPI: provides APIs that developers can use to integrate video translation into third-party systems and automate the processing of large volumes of translation tasks.

The supported methods vary by translation type:

  • Subtitle and speech translation support all three methods.

  • Lip-sync translation supports only the IMS console and OpenAPI.
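The method support described above can be expressed as a small validation table, for example to fail fast in a tool that routes requests to different creation methods. The method keys (`console`, `online_editing`, `openapi`) and type keys are hypothetical labels, not IMS values.

```python
from typing import Dict, FrozenSet

# Which creation methods support which translation type, per this page.
METHODS_BY_TYPE: Dict[str, FrozenSet[str]] = {
    "subtitle": frozenset({"console", "online_editing", "openapi"}),
    "speech": frozenset({"console", "online_editing", "openapi"}),
    "lip_sync": frozenset({"console", "openapi"}),
}


def check_method(translation_type: str, method: str) -> None:
    """Raise ValueError if the method cannot create this translation type."""
    allowed = METHODS_BY_TYPE.get(translation_type)
    if allowed is None:
        raise ValueError(f"unknown translation type: {translation_type}")
    if method not in allowed:
        raise ValueError(
            f"{translation_type} translation is not available via {method}; "
            f"use one of {sorted(allowed)}"
        )
```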

Create a translation task

The procedure depends on the method you use. Follow the steps for one of the following methods:

IMS console

  1. Log on to the IMS console.

  2. In the left-side navigation pane, select Intelligent Production > Video Translation.

  3. In the upper-left corner, select a region as needed.

  4. Click Create Translation Task.

  5. Configure parameters:

    • Translation Method: Select Subtitle Translation, Speech Translation, or Lip-sync Translation.

    • Select Source File: Upload the video file to be translated. MP4, WebM, and MOV formats are supported.

    • Target Language: Select one or more target languages. The system generates corresponding output videos.

    • Storage Directory and File Name: Specify the storage location and name for the output file.

    • Subtitle settings: Specify whether to erase the original subtitles, and select the subtitle source:

      • OCR: If you don't have a subtitle file but the video contains on-screen subtitles, use OCR to extract them. To improve efficiency and accuracy, select the area for OCR to recognize.

      • Specified SRT file: If you already have a subtitle file, specify it as an SRT file so that the subtitles play in sync with the video.

      • ASR: If your video doesn't have subtitles, use ASR to convert speech into text.

      • OCR/ASR: The system first tries OCR to extract subtitles and falls back to ASR if OCR fails.

  6. Click Submit Translation Task to create a task.

  7. View the task status, configurations, and translation results in the task list. When the status changes to Processed, click View Details to view its basic information, advanced settings, and outputs.

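The OCR/ASR subtitle source option in step 5 behaves like a fallback chain. The sketch below illustrates that logic with caller-supplied `ocr` and `asr` functions, which are hypothetical stand-ins rather than IMS APIs. Treating an exception, `None`, or an empty OCR result as a failure is also an assumption.

```python
from typing import Callable, List, Optional, Tuple

# A subtitle cue as (start seconds, end seconds, text).
Cue = Tuple[float, float, str]


def extract_subtitles(
    video: str,
    ocr: Callable[[str], Optional[List[Cue]]],
    asr: Callable[[str], List[Cue]],
) -> Tuple[str, List[Cue]]:
    """Try OCR first and fall back to ASR when OCR fails.

    Assumption: an exception, None, or an empty list from OCR counts as failure.
    Returns the source that succeeded ("ocr" or "asr") and the cues.
    """
    try:
        cues = ocr(video)
    except Exception:
        cues = None
    if cues:
        return "ocr", cues
    return "asr", asr(video)
```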

Online Editing

Preparation

If you are not familiar with Online Editing, learn its basic operations.

Procedure

  1. Log on to the IMS console.

  2. In the left-side navigation pane, select Intelligent Production > Online Editing.

  3. In the upper-left corner, select a region as needed.

  4. Click Create Editing Project to create a project. Then, click Edit in the Actions column to open Online Editing.

  5. On the Materials tab, click Import and select the materials that you want to translate in the right-side pane. After they are added to Materials, click the plus sign (+) on the file or drag the file to the editing area below.

  6. In the editing track, select the audio or video that you want to translate, and then click AI Translation. Configure the translation type, subtitle extraction, language, and subtitle erasure settings, and click Submit. The following image uses speech translation as an example.

  7. Wait a few minutes. The translated result appears in the track.

  8. In the upper-right corner, click Generate As > Generate. Configure the video production parameters as prompted and click OK to generate and export the translated video.

OpenAPI

  1. Create a task

    Call SubmitVideoTranslationJob to create a video translation task. For related parameters, see Video translation parameters.

  2. Query the result of a translation task

    To get the status and result of a specific video translation task, call GetSmartHandleJob. The API returns processing progress, completion time, and URLs to final outputs.

  3. Query translation task list

    To view all ongoing or completed video translation tasks, call ListSmartJobs.

  4. Delete a translation task

    Call DeleteSmartJob to delete a completed task that you no longer need and release its resources.
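The create-then-query steps above typically run as a submit-and-poll loop. The following sketch assumes a hypothetical client wrapper whose methods `submit_video_translation_job` and `get_smart_handle_job` stand in for SubmitVideoTranslationJob and GetSmartHandleJob. The parameter dictionary, the returned `state` field, and the state values `Finished` and `Failed` are assumptions; map them to the actual fields in the API reference. ListSmartJobs and DeleteSmartJob could be wrapped the same way.

```python
import time
from typing import Any, Callable, Dict, Protocol


class VideoTranslationClient(Protocol):
    """Hypothetical wrapper; method names mirror the IMS operations on this page."""

    def submit_video_translation_job(self, params: Dict[str, Any]) -> str:
        """Stand-in for SubmitVideoTranslationJob; returns a job ID."""
        ...

    def get_smart_handle_job(self, job_id: str) -> Dict[str, Any]:
        """Stand-in for GetSmartHandleJob; returns the job status and outputs."""
        ...


# Hypothetical terminal state values; check the API reference for the real ones.
TERMINAL_STATES = {"Finished", "Failed"}


def run_translation(
    client: VideoTranslationClient,
    params: Dict[str, Any],
    poll_interval: float = 10.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, Any]:
    """Submit a translation job and poll until it reaches a terminal state."""
    job_id = client.submit_video_translation_job(params)
    deadline = clock() + timeout
    while True:
        result = client.get_smart_handle_job(job_id)
        if result.get("state") in TERMINAL_STATES:
            return result
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} did not finish within {timeout} s")
        sleep(poll_interval)
```

Injecting `sleep` and `clock` keeps the loop testable with a fake client and avoids real waiting in tests.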

Perform post-editing

To modify speech translation results, you must enable secondary editing before you submit the initial translation task. You can perform post-editing by using either of the following methods.

Note

For lip-sync translation, post-editing can modify only the voice output. Lip movements cannot be modified.

OpenAPI

For detailed API operations, see Speech translation with post-editing.

Online Editing

Preparation

If you are not familiar with Online Editing, learn its basic operations.

Procedure

  1. Log on to the IMS console and go to the Video Translation page.

  2. In the task list, find the task that needs correction.

  3. Click Edit in the Actions column to open the corresponding online editing project.

FAQ

How do I set the start and end times of a single line based on the audio waveform?

Example:

To split the translated subtitle "Great where are you" into two segments, "Great" and "where are you", align the start and end times of each segment with troughs in the waveform. This improves the result of post-editing for speech translation.

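Choosing a split point at a waveform trough can also be approximated programmatically. This sketch assumes you already have an amplitude envelope sampled at `frame_rate` frames per second (computing it is outside the scope of this page) and searches a small window around a rough split time for the quietest frame. The function names and the 0.3-second default window are illustrative assumptions, not IMS behavior.

```python
from typing import Sequence, Tuple


def nearest_trough(
    envelope: Sequence[float],
    frame_rate: float,
    approx_time: float,
    window: float = 0.3,
) -> float:
    """Return the time in seconds of the quietest frame within +/- window of approx_time."""
    if not envelope:
        raise ValueError("empty envelope")
    center = round(approx_time * frame_rate)
    half = max(1, round(window * frame_rate))
    lo = max(0, center - half)
    hi = min(len(envelope) - 1, center + half)
    if lo > hi:
        raise ValueError("approx_time lies outside the envelope")
    quietest = min(range(lo, hi + 1), key=lambda i: envelope[i])
    return quietest / frame_rate


def split_cue(
    start: float, end: float, split_at: float
) -> Tuple[Tuple[float, float], Tuple[float, float]]:
    """Split one subtitle cue [start, end] into two cues at split_at."""
    if not start < split_at < end:
        raise ValueError("split point must fall strictly inside the cue")
    return (start, split_at), (split_at, end)


if __name__ == "__main__":
    # Toy envelope at 10 frames/s with a dip at t = 1.0 s between two words.
    env = [0.9] * 10 + [0.05] + [0.8] * 9
    t = nearest_trough(env, frame_rate=10, approx_time=0.9)
    great, where_are_you = split_cue(0.0, 2.0, t)
    print(great, where_are_you)  # (0.0, 1.0) (1.0, 2.0)
```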

Word count limit during post-editing

Make sure that the word count of the adjusted content does not exceed 1.5 times the original word count. Otherwise, the speech rate may become too fast after post-editing.

Example

Initial translation: Let's talk about this later. We need to go home now.

Inappropriate adjustment: Let's discuss this matter in more detail at a later time. Right now, we should focus on heading back home as it's important to ensure we get there safely and in good time. We can revisit this conversation when we are both more relaxed and have ample opportunity to explore all the aspects thoroughly.

Appropriate adjustment: Let's pick this up another time. We should be going home now.
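The 1.5x rule can be checked before you save an edit. This sketch counts whitespace-separated words, which is an assumption that fits English text; the service may count differently for other languages, such as Chinese. Applied to the example above, the initial translation has 11 words, so an adjustment may contain at most 16 words.

```python
def word_count(text: str) -> int:
    """Count whitespace-separated words (assumption: English-style text)."""
    return len(text.split())


def within_post_edit_limit(original: str, adjusted: str, ratio: float = 1.5) -> bool:
    """Return True if the adjusted text stays within ratio x the original word count."""
    return word_count(adjusted) <= ratio * word_count(original)


initial = "Let's talk about this later. We need to go home now."
appropriate = "Let's pick this up another time. We should be going home now."
inappropriate = (
    "Let's discuss this matter in more detail at a later time. Right now, we "
    "should focus on heading back home as it's important to ensure we get there "
    "safely and in good time. We can revisit this conversation when we are both "
    "more relaxed and have ample opportunity to explore all the aspects thoroughly."
)

print(word_count(initial))                             # 11
print(within_post_edit_limit(initial, appropriate))    # True
print(within_post_edit_limit(initial, inappropriate))  # False
```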