Intelligent Media Services: Transcode multiple audio tracks into an MP4 file

Last Updated: Sep 03, 2025

Intelligent Media Services (IMS) enables you to transcode and package multiple audio tracks into a single MP4 file and set the language for each track.

Workflow

[Figure: workflow diagram]

Example of an output file structure:

Duration: 00:00:31.40, start: 0.000000, bitrate: 816 kb/s
Stream #0:0[0x1](und): Video: h264 (High) (avc1 / 0x31637661), yuv420p(tv, bt709, progressive), 960x540 [SAR 1:1 DAR 16:9], 663 kb/s, 25 fps, 25 tbr, 12800 tbn (default)
Stream #0:1[0x2](zho): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 46 kb/s (default)
Stream #0:2[0x3](eng): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 46 kb/s (default)
Stream #0:3[0x4](jpn): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 46 kb/s (default)
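
To check the language tags of an output file programmatically, a listing like the one above can be parsed with a regular expression. The following is a minimal Python sketch that assumes the text has the same layout as the example. The function name audio_languages is illustrative, and the sample lines are shortened copies of the listing.

```python
import re

# Shortened copy of the stream listing shown above.
SAMPLE = """\
Stream #0:0[0x1](und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 960x540
Stream #0:1[0x2](zho): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo
Stream #0:2[0x3](eng): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo
Stream #0:3[0x4](jpn): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo
"""

# Captures the language tag in parentheses and the stream type that follows it.
STREAM_RE = re.compile(r"Stream #\d+:\d+\[0x[0-9a-fA-F]+\]\((\w+)\): (Audio|Video)")


def audio_languages(listing):
    """Return the language tags of all audio streams, in stream order."""
    return [lang for lang, kind in STREAM_RE.findall(listing) if kind == "Audio"]


print(audio_languages(SAMPLE))  # ['zho', 'eng', 'jpn']
```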

Prerequisites

IMS is activated. For details, see Activate IMS.

Configuration

Basic configuration

  • Storage: Associate an Object Storage Service (OSS) bucket with IMS. For more information, see Configure storage address.

  • Callback: Configure an HTTP or MNS callback to receive task status notifications. For callback methods and events, see Overview.

Transcoding template configuration

Procedure

[Figure: transcoding template configuration procedure]

Example requirements

Codec: H.264, H.265

Resolution: 360P, 540P, 720P, 1080P

Audio: HE-AAC, 64 Kbps (default)

Example configuration

This example shows how to configure transcoding templates for the four required video resolutions. To learn how to create a template, see Create a transcoding template.

Note

To use Narrowband HD™ transcoding, create a basic template based on the following tables. Then submit a ticket to request the backend upgrade.

H.264

| Template | Codec | Container format | Other parameters |
| --- | --- | --- | --- |
| Video-360P | H.264 | .mp4 | Resolution (long edge fixed): 640px. Configure other parameters as needed. |
| Video-540P | H.264 | .mp4 | Resolution (long edge fixed): 960px. Configure other parameters as needed. |
| Video-720P | H.264 | .mp4 | Resolution (long edge fixed): 1280px. Configure other parameters as needed. |
| Video-1080P | H.264 | .mp4 | Resolution (long edge fixed): 1920px. Configure other parameters as needed. |

H.265

| Template | Codec | Container format | Other parameters |
| --- | --- | --- | --- |
| Video-360P | H.265 | .mp4 | Resolution (long edge fixed): 640px. Configure other parameters as needed. |
| Video-540P | H.265 | .mp4 | Resolution (long edge fixed): 960px. Configure other parameters as needed. |
| Video-720P | H.265 | .mp4 | Resolution (long edge fixed): 1280px. Configure other parameters as needed. |
| Video-1080P | H.265 | .mp4 | Resolution (long edge fixed): 1920px. Configure other parameters as needed. |
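
The templates fix only the long edge of the output. As a rough illustration of what that implies for the final dimensions, the Python sketch below assumes that the short edge is scaled proportionally and rounded to an even number. The rounding rule and the function name scale_long_edge are assumptions for illustration, not documented IMS behavior.

```python
def scale_long_edge(width, height, long_edge):
    """Scale (width, height) so the longer side equals long_edge.

    Assumption: the shorter side keeps the aspect ratio and is rounded
    to the nearest even number, as most video encoders require.
    """
    if width >= height:
        # Landscape or square source: width is the long edge.
        return long_edge, round(height * long_edge / width / 2) * 2
    # Portrait source: height is the long edge.
    return round(width * long_edge / height / 2) * 2, long_edge


# Long-edge values taken from the template tables.
for name, edge in [("Video-360P", 640), ("Video-540P", 960),
                   ("Video-720P", 1280), ("Video-1080P", 1920)]:
    print(name, scale_long_edge(1920, 1080, edge))
```

For a 1920x1080 source, the 960px template yields 960x540, which matches the resolution in the sample output listing.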

Submit a transcoding task

Call SubmitMediaConvertJob to submit a transcoding task.

Audios parameter

| Field | Type | Description |
| --- | --- | --- |
| InputRef | String | The name of the input stream to use for this audio track. It must match a Name defined in the Inputs or AudioSelector array. |
| LanguageControl | String | Specifies how the language tag for the output audio track is determined. Valid values: InputFirst (inherits the language tag from the input stream; if the input has no language tag, the tag specified in the Language parameter is used), Configured (uses the language tag specified in the Language parameter), None (default; does not add a language tag to the output track). |
| Language | String | The language code to apply to the audio track. It must be a valid ISO 639-2 code. |
| Remove | String | Specifies whether to remove the audio track from the output file. |
| Codec | String | The audio codec. |
| Profile | String | The audio codec profile. |
| Bitrate | String | The audio bitrate of the output file. |
| Samplerate | String | The audio sample rate. |
| Channels | String | The number of audio channels. |
| Volume | Object | The volume control settings. |
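
The rules for InputRef, LanguageControl, and Language can be checked on the client before a task is submitted. The following Python sketch is one possible helper. The name audio_entry is hypothetical, the ISO 639-2 check only tests the three-letter format rather than the official code list, and requiring Language for Configured is an assumption inferred from the descriptions above.

```python
import re

# Values listed for LanguageControl in the parameter table.
VALID_LANGUAGE_CONTROL = {"InputFirst", "Configured", "None"}


def audio_entry(input_ref, language_control="None", language=None):
    """Build one element of the Audios array with basic validation."""
    if language_control not in VALID_LANGUAGE_CONTROL:
        raise ValueError(f"unknown LanguageControl: {language_control!r}")
    if language_control == "Configured" and language is None:
        # Assumption: Configured is only meaningful when Language is set.
        raise ValueError("Configured requires a Language value")
    if language is not None and not re.fullmatch(r"[a-z]{3}", language):
        # Format check only; it does not consult the ISO 639-2 registry.
        raise ValueError(f"not a three-letter language code: {language!r}")

    entry = {"InputRef": input_ref, "LanguageControl": language_control}
    if language is not None:
        entry["Language"] = language
    return entry


print(audio_entry("EnglishAudio", "Configured", "eng"))
```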

Scenario 1: Retain original audio track

This example shows how to combine a video with its original audio and two additional language tracks into a single output file.

  • The Inputs array defines three sources: the video file with its default audio track (video), the English audio file (EnglishAudio), and the Japanese audio file (JapaneseAudio).

  • In OutputGroups.GroupConfig, Type is set to File, indicating that the output group is muxed into a single container file.

  • In OutputGroups.Outputs.OverrideParams, the Audios array defines the audio tracks for the output. For each track in the array:

    • InputRef references the source file from the Inputs array.

    • LanguageControl determines how the language tag is set for that track.

{
  "Inputs": [
    {
      "Name": "video",
      "InputFile": {"Type": "OSS", "Media": "https://<Bucket>.<Public Endpoint>/<Video1 Chinese>"}
    },
    {
      "Name": "EnglishAudio",
      "InputFile": {"Type": "OSS", "Media": "https://<Bucket>.<Public Endpoint>/<Video1 English>"}
    },
    {
      "Name": "JapaneseAudio",
      "InputFile": {"Type": "OSS", "Media": "https://<Bucket>.<Public Endpoint>/<Video1 Japanese>"}
    }
  ],
  "OutputGroups": [
    {
      "GroupConfig": {
        "Type": "File",
        "OutputFileBase": {
          "Type": "OSS",
          "Media": "https://<Bucket>.<Public Endpoint>/<URI>/"
        }
      },
      "Outputs": [
        {
          "Name": "360P",
          "OutputFileName": "video/360p/360p",
          "TemplateId": "Video-360P",
          "OverrideParams": {
            "Audios": [
              {
                "InputRef": "video",
                "LanguageControl": "InputFirst"
              }, {
                "InputRef": "EnglishAudio",
                "LanguageControl": "Configured",
                "Language": "eng"
              }, {
                "InputRef": "JapaneseAudio",
                "LanguageControl": "Configured",
                "Language": "jpn"
              }
            ]
          }
        }
      ]
    }
  ]
}
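
The request above defines a single 360P output, while the example requirements list four resolutions. The Python sketch below generates one Outputs entry per resolution, all sharing the same Audios array. The OutputFileName pattern and the TemplateId values simply extend the placeholders used in the example and are assumptions. In a real request, TemplateId would be the ID of a template you created.

```python
import copy
import json

# Audios array from Scenario 1.
AUDIOS = [
    {"InputRef": "video", "LanguageControl": "InputFirst"},
    {"InputRef": "EnglishAudio", "LanguageControl": "Configured", "Language": "eng"},
    {"InputRef": "JapaneseAudio", "LanguageControl": "Configured", "Language": "jpn"},
]


def build_outputs(resolutions, audios):
    """Return one Outputs entry per resolution, each with its own Audios copy."""
    outputs = []
    for res in resolutions:
        outputs.append({
            "Name": f"{res}P",
            "OutputFileName": f"video/{res}p/{res}p",  # assumed naming pattern
            "TemplateId": f"Video-{res}P",             # placeholder template ID
            "OverrideParams": {"Audios": copy.deepcopy(audios)},
        })
    return outputs


outputs = build_outputs([360, 540, 720, 1080], AUDIOS)
print(json.dumps(outputs[0], indent=2))
```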

Scenario 2: Remove original audio track

Unlike Scenario 1, the Audios array in this configuration contains no entry that references the original video input.

As a result, the source video's audio is excluded from the output, which contains only the specified English and Japanese audio tracks.

{
  "Inputs": [
    {
      "Name": "video",
      "InputFile": {"Type": "OSS", "Media": "https://<Bucket>.<Public Endpoint>/<Video1 Chinese>"}
    },
    {
      "Name": "EnglishAudio",
      "InputFile": {"Type": "OSS", "Media": "https://<Bucket>.<Public Endpoint>/<Video1 English>"}
    },
    {
      "Name": "JapaneseAudio",
      "InputFile": {"Type": "OSS", "Media": "https://<Bucket>.<Public Endpoint>/<Video1 Japanese>"}
    }
  ],
  "OutputGroups": [
    {
      "GroupConfig": {
        "Type": "File",
        "OutputFileBase": {
          "Type": "OSS",
          "Media": "https://<Bucket>.<Public Endpoint>/<URI>/"
        }
      },
      "Outputs": [
        {
          "Name": "360P",
          "OutputFileName": "video/360p/360p",
          "TemplateId": "Video-360P",
          "OverrideParams": {
            "Audios": [
              {
                "InputRef": "EnglishAudio",
                "LanguageControl": "Configured",
                "Language": "eng"
              }, {
                "InputRef": "JapaneseAudio",
                "LanguageControl": "Configured",
                "Language": "jpn"
              }
            ]
          }
        }
      ]
    }
  ]
}
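
If the Audios array is built in code, the Scenario 2 variant can be derived from the Scenario 1 array by filtering out the entry whose InputRef points at the video input. The following is a small Python sketch, and the helper name drop_track is hypothetical.

```python
def drop_track(audios, input_ref):
    """Return a new Audios list without entries that reference input_ref."""
    return [audio for audio in audios if audio["InputRef"] != input_ref]


scenario_1 = [
    {"InputRef": "video", "LanguageControl": "InputFirst"},
    {"InputRef": "EnglishAudio", "LanguageControl": "Configured", "Language": "eng"},
    {"InputRef": "JapaneseAudio", "LanguageControl": "Configured", "Language": "jpn"},
]

scenario_2 = drop_track(scenario_1, "video")
print([audio["InputRef"] for audio in scenario_2])  # ['EnglishAudio', 'JapaneseAudio']
```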

Scenario 3: Select an audio track by language tag

This example shows how to use the AudioSelector parameter to select a specific audio track from an input file. Here, it selects the track tagged as Japanese (jpn) from the JapaneseFile input.

{
  "Inputs": [
    {
      "Name": "video",
      "InputFile": {"Type": "OSS", "Media": "https://<Bucket>.<Public Endpoint>/<Video1 Chinese>"}
    },
    {
      "Name": "EnglishAudio",
      "InputFile": {"Type": "OSS", "Media": "https://<Bucket>.<Public Endpoint>/<Video1 English>"}
    },
    {
      "Name": "JapaneseFile",
      "InputFile": {"Type": "OSS", "Media": "https://<Bucket>.<Public Endpoint>/<Video1 Japanese>"},
      "AudioSelector": [{
        "Name": "JapaneseFile",
        "Rule": "tag",
        "TagConfig": {"language": "jpn"}
      }]
    }
  ],
  "OutputGroups": [
    {
      "GroupConfig": {
        "Type": "File",
        "OutputFileBase": {
          "Type": "OSS",
          "Media": "https://<Bucket>.<Public Endpoint>/<URI>/"
        }
      },
      "Outputs": [
        {
          "Name": "360P",
          "OutputFileName": "video/360p/360p",
          "TemplateId": "Video-360P",
          "OverrideParams": {
            "Audios": [
              {
                "InputRef": "video",
                "LanguageControl": "InputFirst"
              }, {
                "InputRef": "EnglishAudio",
                "LanguageControl": "Configured",
                "Language": "eng"
              }, {
                "InputRef": "JapaneseFile",
                "LanguageControl": "InputFirst"
              }
            ]
          }
        }
      ]
    }
  ]
}
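
Because each InputRef must match a Name defined in the Inputs or AudioSelector array, a quick local check before submission can catch mismatched names. The Python sketch below walks a request dictionary shaped like the examples in this topic. The function name unresolved_input_refs is hypothetical, and the sample request is trimmed to the relevant fields.

```python
def unresolved_input_refs(job):
    """Return InputRef values that match no Name in Inputs or AudioSelector."""
    names = set()
    for item in job.get("Inputs", []):
        names.add(item["Name"])
        for selector in item.get("AudioSelector", []):
            names.add(selector["Name"])

    missing = []
    for group in job.get("OutputGroups", []):
        for output in group.get("Outputs", []):
            audios = output.get("OverrideParams", {}).get("Audios", [])
            for audio in audios:
                if audio["InputRef"] not in names:
                    missing.append(audio["InputRef"])
    return missing


job = {
    "Inputs": [
        {"Name": "video"},
        {"Name": "JapaneseFile",
         "AudioSelector": [{"Name": "JapaneseFile", "Rule": "tag",
                            "TagConfig": {"language": "jpn"}}]},
    ],
    "OutputGroups": [{"Outputs": [{"OverrideParams": {"Audios": [
        {"InputRef": "video", "LanguageControl": "InputFirst"},
        {"InputRef": "JapaneseFile", "LanguageControl": "InputFirst"},
    ]}}]}],
}

print(unresolved_input_refs(job))  # []
```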

Query transcoding results

Call GetMediaConvertJob to retrieve the details of a transcoding task.

Callback events

Event type: MediaConvertComplete

Configuration method: This event cannot be configured in the console. Configure it by calling SetEventCallback.

Key callback parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| Name | String | Yes | The name of the main task. |
| JobId | String | Yes | The ID of the task. |
| Status | String | Yes | The task status. Success indicates that at least one output (subtask) succeeded. |
| TriggerSource | String | No | The source that triggered the task. API indicates that the task was submitted via an API call. |
| FinishTime | String | No | The time when the task was completed, in UTC. |
| UserData | String | No | A custom string specified when submitting the task. It is passed through and returned in the callback. |

Example

{
  "FinishTime": "2025-05-09T08:03:21Z",
  "JobId": "5d37357cb3a44d10ba33c52760c896cd",
  "Status": "Success",
  "TriggerSource": "IceWorkflow",
  "UserData": "{\"ImsSrc\":\"Workflow\",\"TaskId\":\"e89a955d88ca47f0b9b79c562e5c622f\"}"
}
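
In the example above, UserData is itself a JSON document encoded as a string, so a callback receiver may need to decode it a second time. The Python sketch below shows one way to do that. The function name parse_callback is hypothetical, and the sample body is rebuilt from the example payload. Because UserData is described as a free-form custom string, the code keeps the original value when it is not a JSON object.

```python
import json


def parse_callback(body):
    """Decode a callback body and, where possible, its nested UserData."""
    event = json.loads(body)
    user_data = event.get("UserData")
    if isinstance(user_data, str):
        try:
            decoded = json.loads(user_data)
        except json.JSONDecodeError:
            decoded = None  # plain custom string, leave it untouched
        if isinstance(decoded, dict):
            event["UserData"] = decoded
    return event


# Sample body mirroring the example payload above.
SAMPLE_BODY = json.dumps({
    "FinishTime": "2025-05-09T08:03:21Z",
    "JobId": "5d37357cb3a44d10ba33c52760c896cd",
    "Status": "Success",
    "TriggerSource": "IceWorkflow",
    "UserData": json.dumps({"ImsSrc": "Workflow",
                            "TaskId": "e89a955d88ca47f0b9b79c562e5c622f"}),
})

event = parse_callback(SAMPLE_BODY)
print(event["Status"], event["UserData"]["TaskId"])
```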