Transcode multiple audio tracks into an MP4 file - Intelligent Media Services

Intelligent Media Services (IMS) enables you to transcode and package multiple audio tracks into a single MP4 file and set the language for each track.

Workflow

Example of an output file structure:

Duration: 00:00:31.40, start: 0.000000, bitrate: 816 kb/s
Stream #0:0[0x1](und): Video: h264 (High) (avc1 / 0x31637661), yuv420p(tv, bt709, progressive), 960x540 [SAR 1:1 DAR 16:9], 663 kb/s, 25 fps, 25 tbr, 12800 tbn (default)
Stream #0:1[0x2](zho): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 46 kb/s (default)
Stream #0:2[0x3](eng): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 46 kb/s (default)
Stream #0:3[0x4](jpn): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 46 kb/s (default)
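
To confirm that an output file carries the expected language tags, the stream listing above can be checked with a short script. The following is a minimal sketch, not part of IMS: it assumes the listing was saved as text in the ffprobe-style format shown above, and the names STREAM_RE, list_streams, and audio_languages are illustrative.

```python
import re

# Matches lines such as: Stream #0:1[0x2](zho): Audio: aac (LC) ...
STREAM_RE = re.compile(
    r"Stream #\d+:(\d+)(?:\[0x[0-9a-fA-F]+\])?\((\w+)\): (Video|Audio): (\w+)"
)

def list_streams(probe_text):
    """Return (index, language, kind, codec) for every stream line found."""
    streams = []
    for line in probe_text.splitlines():
        match = STREAM_RE.search(line)
        if match:
            index, language, kind, codec = match.groups()
            streams.append((int(index), language, kind, codec))
    return streams

def audio_languages(probe_text):
    """Return the language tags of the audio streams, in stream order."""
    return [lang for _, lang, kind, _ in list_streams(probe_text) if kind == "Audio"]

SAMPLE = """\
Stream #0:0[0x1](und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 960x540, 663 kb/s, 25 fps
Stream #0:1[0x2](zho): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 46 kb/s
Stream #0:2[0x3](eng): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 46 kb/s
Stream #0:3[0x4](jpn): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 46 kb/s
"""

print(audio_languages(SAMPLE))
```

For the sample listing, the audio tracks are tagged zho, eng, and jpn, in that order.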

Prerequisites

IMS is activated. For details, see Activate IMS.

Configuration

Basic configuration

Storage: Associate an Object Storage Service (OSS) bucket with IMS. For more information, see Configure storage address.
Callback: Configure an HTTP or MNS callback to receive task status notifications. For callback methods and events, see Overview.

Transcoding template configuration

Procedure

Example requirements

Codec: H.264, H.265

Resolution: 360P, 540P, 720P, 1080P

Audio: HE-AAC, 64 Kbps (default)

Example configuration

This example shows how to configure transcoding templates for the four required video resolutions. To learn how to create a template, see Create a transcoding template.

Note

To perform Narrowband HD™ transcoding, create a basic template based on the following tables, and then submit a ticket to request the backend upgrade.

H.264

Template	Codec	Container format	Other parameters
Video-360P	H.264	.mp4	Resolution (long edge fixed): 640 px. Configure other parameters as needed.
Video-540P	H.264	.mp4	Resolution (long edge fixed): 960 px. Configure other parameters as needed.
Video-720P	H.264	.mp4	Resolution (long edge fixed): 1280 px. Configure other parameters as needed.
Video-1080P	H.264	.mp4	Resolution (long edge fixed): 1920 px. Configure other parameters as needed.

H.265

Template	Codec	Container format	Other parameters
Video-360P	H.265	.mp4	Resolution (long edge fixed): 640 px. Configure other parameters as needed.
Video-540P	H.265	.mp4	Resolution (long edge fixed): 960 px. Configure other parameters as needed.
Video-720P	H.265	.mp4	Resolution (long edge fixed): 1280 px. Configure other parameters as needed.
Video-1080P	H.265	.mp4	Resolution (long edge fixed): 1920 px. Configure other parameters as needed.
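
The "long edge fixed" setting in the tables above means that the longer side of the output is set to the given value and the shorter side follows from the source aspect ratio. The following is a minimal sketch of that arithmetic. The function names are illustrative, and rounding the short side to an even number is an assumption (common for 4:2:0 video), not documented IMS behavior.

```python
# Long-edge values taken from the template tables above.
LONG_EDGE = {"Video-360P": 640, "Video-540P": 960, "Video-720P": 1280, "Video-1080P": 1920}

def round_even(value):
    """Round to the nearest even integer, with a minimum of 2 (assumed rounding rule)."""
    return max(2, int(round(value / 2)) * 2)

def scale_to_long_edge(width, height, long_edge):
    """Return (width, height) with the longer side equal to long_edge."""
    if width <= 0 or height <= 0 or long_edge <= 0:
        raise ValueError("width, height, and long_edge must be positive")
    if width >= height:
        # Landscape or square source: width is the long edge.
        return long_edge, round_even(height * long_edge / width)
    # Portrait source: height is the long edge.
    return round_even(width * long_edge / height), long_edge

for template, edge in LONG_EDGE.items():
    print(template, scale_to_long_edge(1920, 1080, edge))
```

With a 1920x1080 source, the Video-540P template yields 960x540, which matches the video stream in the output file structure shown earlier. A portrait 1080x1920 source with the same template would yield 540x960.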

Submit a transcoding task

Call SubmitMediaConvertJob to submit a transcoding task.

Audios parameter

Field	Type	Description
InputRef	String	The name of the input stream to use for this audio track. It must match a Name defined in the Inputs or AudioSelector array.
LanguageControl	String	Specifies how the language tag of the output audio track is determined. Valid values: InputFirst (inherits the language tag from the input stream; if the input has no language tag, the tag specified in the Language parameter is used), Configured (uses the language tag specified in the Language parameter), None (default; does not add a language tag to the output track).
Language	String	The language code to apply to the audio track. It must be a valid ISO 639-2 code.
Remove	String	Specifies whether to remove the audio track from the output file.
Codec	String	The audio codec.
Profile	String	The audio codec profile.
Bitrate	String	The audio bitrate of the output file.
Samplerate	String	The audio sample rate.
Channels	String	The number of audio channels.
Volume	Object	The volume control settings.
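
The rules in the table above can be checked on the client side before a task is submitted. The following is a minimal sketch with illustrative function names. It assumes that InputRef is mandatory and that Language must be present when LanguageControl is Configured; both are inferred from the descriptions rather than stated as API constraints. The language check only verifies the three-letter shape of an ISO 639-2 code and does not look the code up in the standard's code list.

```python
LANGUAGE_CONTROL_VALUES = {"InputFirst", "Configured", "None"}

def looks_like_iso639_2(code):
    """Shape check only: three lowercase ASCII letters, for example 'eng'."""
    return (
        isinstance(code, str)
        and len(code) == 3
        and code.isascii()
        and code.isalpha()
        and code.islower()
    )

def check_audio_entry(entry):
    """Return a list of problems found in one Audios entry (empty if none)."""
    problems = []
    if not entry.get("InputRef"):
        problems.append("InputRef is missing")  # assumed to be mandatory
    control = entry.get("LanguageControl", "None")  # None is the documented default
    if control not in LANGUAGE_CONTROL_VALUES:
        problems.append(f"unknown LanguageControl: {control!r}")
    language = entry.get("Language")
    if control == "Configured" and not language:
        problems.append("Language is needed when LanguageControl is Configured")
    if language is not None and not looks_like_iso639_2(language):
        problems.append(f"Language does not look like an ISO 639-2 code: {language!r}")
    return problems

print(check_audio_entry({"InputRef": "EnglishAudio", "LanguageControl": "Configured", "Language": "eng"}))
print(check_audio_entry({"InputRef": "EnglishAudio", "LanguageControl": "Configured"}))
```

The first call reports no problems. The second reports the missing Language value.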

Scenario 1: Retain original audio track

This example shows how to combine a video with its original audio and two additional language tracks into a single output file.

In the Inputs array, three sources are defined: the video file with its default audio track (video), the English audio file (EnglishAudio), and the Japanese audio file (JapaneseAudio).
In OutputGroups.GroupConfig, Type is set to File, which indicates that the output group is muxed into a single container file.
In OutputGroups.Outputs.OverrideParams, the Audios array defines the audio tracks of the output. For each track in the array:
- InputRef references the source file from the Inputs array.
- LanguageControl determines how the language tag is set for that track.

{
  "Inputs": [
    {
      "Name": "video",
      "InputFile": {"Type": "OSS", "Media": "https://<Bucket>.<Public Endpoint>/<Video1 Chinese>"}
    },
    {
      "Name": "EnglishAudio",
      "InputFile": {"Type": "OSS", "Media": "https://<Bucket>.<Public Endpoint>/<Video1 English>"}
    },
    {
      "Name": "JapaneseAudio",
      "InputFile": {"Type": "OSS", "Media": "https://<Bucket>.<Public Endpoint>/<Video1 Japanese>"}
    }
  ],
  "OutputGroups": [
    {
      "GroupConfig": {
        "Type": "File",
        "OutputFileBase": {
          "Type": "OSS",
          "Media": "https://<Bucket>.<Public Endpoint>/<URI>/"
        }
      },
      "Outputs": [
        {
          "Name": "360P",
          "OutputFileName": "video/360p/360p",
          "TemplateId": "Video-360P",
          "OverrideParams": {
            "Audios": [
              {
                "InputRef": "video",
                "LanguageControl": "InputFirst"
              }, {
                "InputRef": "EnglishAudio",
                "LanguageControl": "Configured",
                "Language": "eng"
              }, {
                "InputRef": "JapaneseAudio",
                "LanguageControl": "Configured",
                "Language": "jpn"
              }
            ]
          }
        }
      ]
    }
  ]
}
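
When the same audio tracks are needed for every resolution, the Outputs array can be generated instead of written by hand. The following is a minimal sketch with illustrative helper names. It assumes that the template names from the tables above stand in for real template IDs, and that the OutputFileName pattern of the 360P example (video/360p/360p) extends to the other resolutions.

```python
import json

RESOLUTIONS = ["360P", "540P", "720P", "1080P"]

def audio_track(input_ref, language=None):
    """Build one Audios entry.

    With a language, the tag is set explicitly (Configured);
    without one, the tag is inherited from the input (InputFirst).
    """
    if language is None:
        return {"InputRef": input_ref, "LanguageControl": "InputFirst"}
    return {"InputRef": input_ref, "LanguageControl": "Configured", "Language": language}

def build_outputs(tracks):
    """Return one output per resolution, each with the same audio tracks."""
    outputs = []
    for label in RESOLUTIONS:
        folder = label.lower()
        outputs.append({
            "Name": label,
            "OutputFileName": f"video/{folder}/{folder}",  # assumed naming pattern
            "TemplateId": f"Video-{label}",  # placeholder; use the real template ID
            "OverrideParams": {
                "Audios": [audio_track(ref, lang) for ref, lang in tracks]
            },
        })
    return outputs

# Tracks of Scenario 1: original audio, then English, then Japanese.
TRACKS = [("video", None), ("EnglishAudio", "eng"), ("JapaneseAudio", "jpn")]
outputs = build_outputs(TRACKS)
print(json.dumps(outputs[0], indent=2))
```

The first generated element matches the 360P output of the example above. The remaining three differ only in Name, OutputFileName, and TemplateId.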

Scenario 2: Remove original audio track

Compared with Scenario 1, the Audios array in this configuration omits the entry that references the original video input.

As a result, the source video's audio will be excluded from the final output. The output will be muxed with only the specified English and Japanese audio tracks.

{
  "Inputs": [
    {
      "Name": "video",
      "InputFile": {"Type": "OSS", "Media": "https://<Bucket>.<Public Endpoint>/<Video1 Chinese>"}
    },
    {
      "Name": "EnglishAudio",
      "InputFile": {"Type": "OSS", "Media": "https://<Bucket>.<Public Endpoint>/<Video1 English>"}
    },
    {
      "Name": "JapaneseAudio",
      "InputFile": {"Type": "OSS", "Media": "https://<Bucket>.<Public Endpoint>/<Video1 Japanese>"}
    }
  ],
  "OutputGroups": [
    {
      "GroupConfig": {
        "Type": "File",
        "OutputFileBase": {
          "Type": "OSS",
          "Media": "https://<Bucket>.<Public Endpoint>/<URI>/"
        }
      },
      "Outputs": [
        {
          "Name": "360P",
          "OutputFileName": "video/360p/360p",
          "TemplateId": "Video-360P",
          "OverrideParams": {
            "Audios": [
              {
                "InputRef": "EnglishAudio",
                "LanguageControl": "Configured",
                "Language": "eng"
              }, {
                "InputRef": "JapaneseAudio",
                "LanguageControl": "Configured",
                "Language": "jpn"
              }
            ]
          }
        }
      ]
    }
  ]
}

Scenario 3: Select an audio track by language tag

This example shows how to use the AudioSelector parameter to select a specific audio track from an input file. Here, it selects the track tagged as Japanese (jpn) from the JapaneseFile input.

{
  "Inputs": [
    {
      "Name": "video",
      "InputFile": {"Type": "OSS", "Media": "https://<Bucket>.<Public Endpoint>/<Video1 Chinese>"}
    },
    {
      "Name": "EnglishAudio",
      "InputFile": {"Type": "OSS", "Media": "https://<Bucket>.<Public Endpoint>/<Video1 English>"}
    },
    {
      "Name": "JapaneseFile",
      "InputFile": {"Type": "OSS", "Media": "https://<Bucket>.<Public Endpoint>/<Video1 Japanese>"},
      "AudioSelector": [{
        "Name": "JapaneseFile",
        "Rule": "tag",
        "TagConfig": {"language": "jpn"}
      }]
    }
  ],
  "OutputGroups": [
    {
      "GroupConfig": {
        "Type": "File",
        "OutputFileBase": {
          "Type": "OSS",
          "Media": "https://<Bucket>.<Public Endpoint>/<URI>/"
        }
      },
      "Outputs": [
        {
          "Name": "360P",
          "OutputFileName": "video/360p/360p",
          "TemplateId": "Video-360P",
          "OverrideParams": {
            "Audios": [
              {
                "InputRef": "video",
                "LanguageControl": "InputFirst"
              }, {
                "InputRef": "EnglishAudio",
                "LanguageControl": "Configured",
                "Language": "eng"
              }, {
                "InputRef": "JapaneseFile",
                "LanguageControl": "InputFirst"
              }
            ]
          }
        }
      ]
    }
  ]
}
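
Because each InputRef must match a Name defined in the Inputs or AudioSelector array, a reference check before submission can catch a mistyped name early. The following is a minimal sketch with illustrative function names. It assumes the request structure used in the scenarios above, and the sample job contains a deliberately undefined reference named Missing.

```python
def defined_names(inputs):
    """Collect the names of all inputs and of their audio selectors."""
    names = set()
    for item in inputs:
        names.add(item["Name"])
        for selector in item.get("AudioSelector", []):
            names.add(selector["Name"])
    return names

def unresolved_input_refs(job):
    """Return (output name, InputRef) pairs that match no defined name."""
    names = defined_names(job.get("Inputs", []))
    missing = []
    for group in job.get("OutputGroups", []):
        for output in group.get("Outputs", []):
            audios = output.get("OverrideParams", {}).get("Audios", [])
            for audio in audios:
                ref = audio.get("InputRef")
                if ref not in names:
                    missing.append((output.get("Name"), ref))
    return missing

JOB = {
    "Inputs": [
        {"Name": "video"},
        {
            "Name": "JapaneseFile",
            "AudioSelector": [
                {"Name": "JapaneseFile", "Rule": "tag", "TagConfig": {"language": "jpn"}}
            ],
        },
    ],
    "OutputGroups": [{
        "Outputs": [{
            "Name": "360P",
            "OverrideParams": {"Audios": [
                {"InputRef": "video", "LanguageControl": "InputFirst"},
                {"InputRef": "JapaneseFile", "LanguageControl": "InputFirst"},
                {"InputRef": "Missing", "LanguageControl": "InputFirst"},  # undefined on purpose
            ]},
        }],
    }],
}

print(unresolved_input_refs(JOB))
```

Only the entry that references Missing is reported. The other two entries resolve to names defined in the Inputs array.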

Query transcoding results

Call GetMediaConvertJob to retrieve the details of a transcoding task.

Callback events

Event type: MediaConvertComplete

Configuration method: This event cannot be configured in the console. Configure it by calling SetEventCallback.

Key callback parameters

Parameter	Type	Required	Description
Name	String	Yes	The name of the main task.
JobId	String	Yes	The ID of the task.
Status	String	Yes	The task status. `Success` indicates that at least one output (subtask) succeeded.
TriggerSource	String	No	The source that triggered the task. `API` indicates the task was submitted via an API call.
FinishTime	String	No	The time when the task was completed, in UTC. Example: 2025-05-09T08:03:21Z.
UserData	String	No	A custom string specified when submitting the task. It is passed through and returned in the callback.

Example

{
	"FinishTime": "2025-05-09T08:03:21Z",
	"JobId": "5d37357cb3a44d10ba33c52760c896cd",
	"Status": "Success",
	"TriggerSource": "IceWorkflow",
	"UserData": "{\"ImsSrc\":\"Workflow\",\"TaskId\":\"e89a955d88ca47f0b9b79c562e5c622f\"}"
}
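
A callback receiver typically needs the job ID, the status, and the pass-through UserData. The following is a minimal sketch of parsing the message. It assumes that the body received by your HTTP or MNS endpoint is the JSON object shown above; if the message is wrapped in an outer envelope, unwrap it first. The function name parse_callback and the keys of the returned dictionary are illustrative.

```python
import json
from datetime import datetime, timezone

def parse_callback(body):
    """Extract the key fields from a MediaConvertComplete callback body."""
    event = json.loads(body)

    # FinishTime is optional; the example uses the form 2025-05-09T08:03:21Z.
    finish_time = None
    if event.get("FinishTime"):
        finish_time = datetime.strptime(
            event["FinishTime"], "%Y-%m-%dT%H:%M:%SZ"
        ).replace(tzinfo=timezone.utc)

    # UserData is a free-form string; decode it only if it happens to be JSON.
    raw_user_data = event.get("UserData")
    try:
        user_data = json.loads(raw_user_data) if raw_user_data else None
    except ValueError:
        user_data = raw_user_data

    return {
        "job_id": event["JobId"],
        "succeeded": event.get("Status") == "Success",
        "trigger_source": event.get("TriggerSource"),
        "finish_time": finish_time,
        "user_data": user_data,
    }

SAMPLE_BODY = json.dumps({
    "FinishTime": "2025-05-09T08:03:21Z",
    "JobId": "5d37357cb3a44d10ba33c52760c896cd",
    "Status": "Success",
    "TriggerSource": "IceWorkflow",
    "UserData": json.dumps({"ImsSrc": "Workflow", "TaskId": "e89a955d88ca47f0b9b79c562e5c622f"}),
})

result = parse_callback(SAMPLE_BODY)
print(result["job_id"], result["succeeded"], result["user_data"])
```

Because Success only indicates that at least one output succeeded, a receiver that needs per-output results can follow up with GetMediaConvertJob by using the returned job ID.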