Audio merging

Audio merging concatenates multiple audio objects stored in OSS into a single output file. Each source clip is placed end-to-end in the order you define, then transcoded to the target format in one asynchronous operation.

Use audio merging to automate post-production tasks, such as assembling narrated chapters or stitching separately recorded segments together, without running local audio processing tools.

Use cases

Music production: Join individual recordings, such as separately recorded sections of a song, into a complete track.
Audiobook production: Stitch narrated chapters into a seamless, full-length audio file.
Film and TV post-production: Assemble dialogue, voice-over, and music segments in sequence into a single audio file.
Social media content: Join sound effects, voice-overs, and background audio clips into one soundtrack for a video.

Prerequisites

Before you begin, ensure that you have:

An OSS bucket bound to an Intelligent Media Management (IMM) project. See Quick start (OSS console) or AttachOSSBucket (IMM API) for setup instructions.
The required permissions to use audio merging. See Permissions.
Authenticated access — anonymous requests are denied.

How it works

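Audio merging uses the audio/concat action, submitted through the x-oss-async-process parameter. All requests are asynchronous: the call returns an event ID, request ID, and task ID immediately, and the merge runs in the background.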

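Clip ordering is controlled by the /pre and /sur markers in the request string, relative to the main audio object (the object named in the request path):

/pre: places the clip before the main audio object.
/sur: places the clip after the main audio object.

Within each group, clips keep the order in which they appear in the request string. In the five-clip example later in this document, two /pre clips and two /sur clips submitted against example.oga produce the order pre1, pre2, example.oga, sur1, sur2. Each clip can also be trimmed using ss (start offset) and t (duration) before it is concatenated.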
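
The ordering can be reasoned about with a few lines of Python. This is a minimal sketch, assuming the model implied by the five-clip REST example in this document: /pre clips in listed order, then the object in the request path, then /sur clips in listed order. The resolve_order helper is hypothetical and not part of any OSS SDK.

```python
def resolve_order(main_object, markers):
    """Return clip names in the assumed playback order.

    markers is a list of (kind, name) tuples in request-string order,
    where kind is "pre" or "sur".
    """
    unknown = [kind for kind, _ in markers if kind not in ("pre", "sur")]
    if unknown:
        raise ValueError(f"unsupported marker(s): {unknown}")
    pre = [name for kind, name in markers if kind == "pre"]
    sur = [name for kind, name in markers if kind == "sur"]
    return pre + [main_object] + sur


order = resolve_order(
    "example.oga",
    [("pre", "pre1.mp3"), ("pre", "pre2.wav"),
     ("sur", "sur1.aac"), ("sur", "sur2.wma")],
)
print(order)                       # ['pre1.mp3', 'pre2.wav', 'example.oga', 'sur1.aac', 'sur2.wma']
print(order.index("example.oga"))  # 2, the 0-based index that align_2 points at
```

Working out the order this way also gives the index to pass to align when the defaults should come from a particular clip.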

After concatenation, the merged audio is transcoded to the output format you specify. Use sys/saveas to write the result to a target OSS object, and notify to receive a completion notification.

You can merge up to 11 audio objects per request.

If you leave the sampling rate (ar) or number of sound channels (ac) at their defaults, the output inherits these values from the source audio identified by align. If the inherited values are not supported by the target format, the merge may fail. For example, AMR output accepts only one channel at 8 or 16 kHz, so a stereo 44.1 kHz source cannot supply the defaults. Specify ar and ac explicitly to avoid this.

Merge audio objects

Quick start

The following Python example merges two MP3 files and saves the result as an AAC file. Use it as a starting point, then adjust parameters to fit your use case.

OSS SDK for Python V2.18.4 or later is required. Set the OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET environment variables before running this code.

# -*- coding: utf-8 -*-
import base64
import oss2
from oss2.credentials import EnvironmentVariableCredentialsProvider


def main():
    # Load access credentials from environment variables.
    auth = oss2.ProviderAuthV4(EnvironmentVariableCredentialsProvider())

    # Specify the endpoint of the region where the bucket is located.
    endpoint = 'https://oss-cn-hangzhou.aliyuncs.com'
    # Specify the region ID. Example: cn-hangzhou.
    region = 'cn-hangzhou'

    bucket_name = 'examplebucket'
    bucket = oss2.Bucket(auth, endpoint, bucket_name, region=region)

    # Specify the output audio object name.
    target_audio = 'dest.aac'
    # Specify the source audio objects to merge.
    audio1 = 'src1.mp3'
    audio2 = 'src2.mp3'

    # Encode source audio object names in URL-safe Base64.
    audio1_encoded = base64.urlsafe_b64encode(audio1.encode()).decode().rstrip('=')
    audio2_encoded = base64.urlsafe_b64encode(audio2.encode()).decode().rstrip('=')

    # Build the audio/concat process string.
    style = f"audio/concat,f_aac,ac_1,ar_44100,ab_96000,align_2/pre,o_{audio1_encoded}/pre,o_{audio2_encoded},t_0"

    # Append sys/saveas and notify parameters.
    bucket_encoded = base64.urlsafe_b64encode(bucket_name.encode()).decode().rstrip('=')
    target_encoded = base64.urlsafe_b64encode(target_audio.encode()).decode().rstrip('=')
    process = f"{style}|sys/saveas,b_{bucket_encoded},o_{target_encoded}/notify,topic_QXVkaW9Db252ZXJ0"

    # Submit the asynchronous merge task.
    try:
        result = bucket.async_process_object(audio1, process)
        print(f"EventId: {result.event_id}")
        print(f"RequestId: {result.request_id}")
        print(f"TaskId: {result.task_id}")
    except Exception as e:
        print(f"Error: {e}")


if __name__ == "__main__":
    main()

For Java and Go examples, see SDK examples.

Action

audio/concat

Parameters for merging

The /pre and /sur markers each represent one audio clip. Each marker is followed by one or more parameters that describe that clip.

Parameter	Type	Required	Description
`o`	string	Yes	The name of the audio object in the OSS bucket. Must be URL-safe Base64 encoded.
`ss`	int	No	Start offset in milliseconds. `0` = from the beginning (default). A value greater than `0` = start from that point in the source audio.
`t`	int	No	Duration in milliseconds. `0` = to the end of the source audio (default). A value greater than `0` = use only that many milliseconds.
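
The clip parameters above can be assembled into one /pre or /sur segment as follows. This is a minimal sketch: clip_segment and b64_name are hypothetical helper names, and stripping the Base64 padding follows the Python and Java SDK examples in this document rather than a stated service requirement. Because 0 is the documented default for ss and t, the sketch omits those parameters when they are 0.

```python
import base64


def b64_name(name):
    """URL-safe Base64 of a name, with '=' padding stripped."""
    return base64.urlsafe_b64encode(name.encode("utf-8")).decode("ascii").rstrip("=")


def clip_segment(marker, name, ss=0, t=0):
    """Build one clip segment; ss and t are in milliseconds."""
    if marker not in ("pre", "sur"):
        raise ValueError("marker must be 'pre' or 'sur'")
    if ss < 0 or t < 0:
        raise ValueError("ss and t must be non-negative")
    parts = [marker, f"o_{b64_name(name)}"]
    if ss:
        parts.append(f"ss_{ss}")
    if t:
        parts.append(f"t_{t}")
    return "/" + ",".join(parts)


print(clip_segment("pre", "pre2.wav", t=2000))           # /pre,o_cHJlMi53YXY,t_2000
print(clip_segment("sur", "sur1.aac", ss=4000, t=6000))  # /sur,o_c3VyMS5hYWM,ss_4000,t_6000
```

Note that t is a duration, not an end position: ss=4000 with t=6000 selects the range from second 4 to second 10.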

Parameters for transcoding

These parameters apply to the merged output as a whole.

Parameter	Type	Required	Description
`f`	string	Yes	Container format of the output audio. Valid values: `mp3`, `aac`, `flac`, `oga`, `ac3`, `opus`, `amr`.
`ss`	int	No	Start offset for transcoding in milliseconds. `0` = from the beginning (default).
`t`	int	No	Duration to transcode in milliseconds. `0` = to the end of the merged audio (default).
`ar`	int	No	Sampling rate of the output audio in Hz. Defaults to the sampling rate of the source audio identified by `align`. Valid values: `8000`, `11025`, `12000`, `16000`, `22050`, `24000`, `32000`, `44100`, `48000`, `64000`, `88200`, `96000`. See format-specific limits below.
`ac`	int	No	Number of sound channels. Defaults to the channel count of the source audio identified by `align`. Valid values: `1`–`8`. See format-specific limits below.
`ab`	int	No	Target audio bitrate in bit/s. Valid values: `1000`–`10000000`. Mutually exclusive with `aq`. If both are omitted, the codec's default bitrate is used.
`aq`	int	No	Audio compression quality. Valid values: `0`–`100`. Mutually exclusive with `ab`. If both are omitted, the codec's default bitrate is used.
`abopt`	string	No	Bitrate handling when a source clip's bitrate is lower than `ab`. Requires `ab`. Valid values: `0` = always use `ab` (default), `1` = use the lower bitrate, `2` = return a failure.
`align`	int	No	Index (0-based) of the source audio from which default transcoding parameters are obtained. Default: `0` (the first clip in the list).
`adepth`	int	No	Sampling bit depth of the output audio. Valid values: `16`, `24`. Applies only when `f` is `flac`.
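
Several of the constraints in this table (ab and aq are mutually exclusive, abopt requires ab, adepth applies only to flac) can be checked on the client before a task is submitted. The following is a minimal sketch; transcode_options is a hypothetical helper, and it checks only the rules stated in the table above.

```python
VALID_FORMATS = {"mp3", "aac", "flac", "oga", "ac3", "opus", "amr"}


def transcode_options(f, ac=None, ar=None, ab=None, aq=None,
                      abopt=None, align=None, adepth=None):
    """Validate the options and return the leading part of the process string."""
    if f not in VALID_FORMATS:
        raise ValueError(f"unsupported format: {f}")
    if ab is not None and aq is not None:
        raise ValueError("ab and aq are mutually exclusive")
    if abopt is not None and ab is None:
        raise ValueError("abopt requires ab")
    if ab is not None and not 1000 <= ab <= 10000000:
        raise ValueError("ab must be between 1000 and 10000000 bit/s")
    if aq is not None and not 0 <= aq <= 100:
        raise ValueError("aq must be between 0 and 100")
    if adepth is not None and (f != "flac" or adepth not in (16, 24)):
        raise ValueError("adepth must be 16 or 24 and applies only to flac")

    parts = ["audio/concat", f"f_{f}"]
    options = (("ac", ac), ("ar", ar), ("ab", ab), ("aq", aq),
               ("abopt", abopt), ("align", align), ("adepth", adepth))
    # Only options that were set explicitly are written to the string.
    parts += [f"{key}_{value}" for key, value in options if value is not None]
    return ",".join(parts)


print(transcode_options("aac", ac=1, ar=44100, ab=96000, align=2))
# audio/concat,f_aac,ac_1,ar_44100,ab_96000,align_2
```

Failing early on the client avoids waiting for an asynchronous task to report a parameter error.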

Format-specific limits

Format	Supported sampling rates	Max channels
MP3	Up to 48 kHz	2
Opus	8, 12, 16, 24, or 48 kHz	—
AC3	32, 44.1, or 48 kHz	6 (5.1)
AMR	8 or 16 kHz	1
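
The limits in this table lend themselves to a simple lookup. This is a minimal sketch that transcribes the table into a dictionary and reports any conflicts for a given f, ar, and ac. FORMAT_LIMITS and check_format_limits are hypothetical names, and formats not listed in the table are treated as having no format-specific limit.

```python
# Limits transcribed from the table above; sampling rates are in Hz.
FORMAT_LIMITS = {
    "mp3": {"max_rate": 48000, "max_channels": 2},
    "opus": {"rates": {8000, 12000, 16000, 24000, 48000}},
    "ac3": {"rates": {32000, 44100, 48000}, "max_channels": 6},
    "amr": {"rates": {8000, 16000}, "max_channels": 1},
}


def check_format_limits(f, ar, ac):
    """Return a list of problems; the list is empty if none were found."""
    limits = FORMAT_LIMITS.get(f, {})
    problems = []
    if "max_rate" in limits and ar > limits["max_rate"]:
        problems.append(f"{f}: ar_{ar} exceeds {limits['max_rate']} Hz")
    if "rates" in limits and ar not in limits["rates"]:
        problems.append(f"{f}: ar_{ar} is not one of {sorted(limits['rates'])}")
    if "max_channels" in limits and ac > limits["max_channels"]:
        problems.append(f"{f}: ac_{ac} exceeds {limits['max_channels']} channel(s)")
    return problems


print(check_format_limits("aac", 44100, 1))  # []
print(check_format_limits("amr", 44100, 2))
# ['amr: ar_44100 is not one of [8000, 16000]', 'amr: ac_2 exceeds 1 channel(s)']
```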

Additional parameters

Use the following parameters alongside audio/concat:

sys/saveas — writes the output to a specified OSS object. Parameters: b_<bucket> and o_<object> (both URL-safe Base64 encoded). See sys/saveas.
notify — sends a completion notification. Parameter: topic_<topic> (URL-safe Base64 encoded). See Message notification.
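
The two additional parameters are appended to the audio/concat string after a pipe character, as in the SDK examples in this document. The following minimal sketch builds that suffix. saveas_notify_suffix is a hypothetical helper, and the bucket, object, and topic names are sample values.

```python
import base64


def b64_name(name):
    """URL-safe Base64 of a name, with '=' padding stripped."""
    return base64.urlsafe_b64encode(name.encode("utf-8")).decode("ascii").rstrip("=")


def saveas_notify_suffix(bucket, obj, topic):
    """Return the '|sys/saveas,.../notify,...' part of a process string."""
    return (
        f"|sys/saveas,b_{b64_name(bucket)},o_{b64_name(obj)}"
        f"/notify,topic_{b64_name(topic)}"
    )


print(saveas_notify_suffix("outbucket", "outobj.aac", "AudioConvert"))
# |sys/saveas,b_b3V0YnVja2V0,o_b3V0b2JqLmFhYw/notify,topic_QXVkaW9Db252ZXJ0
```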

Examples

Merge five clips into an AAC file (REST API)

This example merges five source audio objects in different formats into a single AAC output with a mono channel, 44.1 kHz sampling rate, and 96 Kbit/s bitrate. A Simple Message Queue (SMQ) notification is sent when the task completes.

Source clips and merge order

Audio object	Order	Segment used
pre1.mp3	1	Full duration
pre2.wav	2	First 2 seconds
example.oga	3	Full duration
sur1.aac	4	Seconds 4–14 (10 seconds starting at second 4)
sur2.wma	5	From second 10 to end

Output

Format: AAC
Sampling rate: 44.1 kHz, mono
Bitrate: 96 Kbit/s
Destination: oss://outbucket/outobj.aac

All object names below are URL-safe Base64 encoded. The align_2 parameter sets the default transcoding properties from example.oga (index 2).

POST /example.oga?x-oss-async-process HTTP/1.1
Host: video-demo.oss-cn-hangzhou.aliyuncs.com
Date: Fri, 28 Oct 2022 06:40:10 GMT
Authorization: OSS4-HMAC-SHA256 Credential=LTAI********************/20250417/cn-hangzhou/oss/aliyun_v4_request,Signature=a7c3554c729d71929e0b84489addee6b2e8d5cb48595adfc51868c299c0c218e

x-oss-async-process=audio/concat,f_aac,ac_1,ar_44100,ab_96000,align_2/pre,o_cHJlMS5tcDM/pre,o_cHJlMi53YXY,t_2000/sur,o_c3VyMS5hYWM,ss_4000,t_10000/sur,o_c3VyMi53bWE,ss_10000|sys/saveas,b_b3V0YnVja2V0,o_b3V0b2JqLnthdXRvZXh0fQ/notify,topic_QXVkaW9Db252ZXJ0
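
When debugging a request like this, it can help to decode the Base64 values back into readable names. The following minimal sketch splits a process string into its actions and decodes the o, b, and topic values. describe and b64_decode are hypothetical helpers, the sample string is made up for illustration, and the parser assumes a well-formed string in which every stage starts with a two-part action name such as audio/concat or sys/saveas.

```python
import base64

ENCODED_KEYS = {"o", "b", "topic"}  # parameters whose values are Base64 encoded


def b64_decode(value):
    """Decode URL-safe Base64, restoring any stripped '=' padding first."""
    padded = value + "=" * (-len(value) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8")


def describe(process):
    """Split a process string into (name, params) pairs with names decoded."""
    groups = []
    for stage in process.split("|"):
        segments = stage.split("/")
        if len(segments) < 2:
            raise ValueError(f"unexpected stage: {stage!r}")
        # The action name itself contains a slash, so rejoin the first two parts.
        segments = [segments[0] + "/" + segments[1]] + segments[2:]
        for segment in segments:
            name, *raw_params = segment.split(",")
            params = {}
            for raw in raw_params:
                key, _, value = raw.partition("_")
                params[key] = b64_decode(value) if key in ENCODED_KEYS else value
            groups.append((name, params))
    return groups


sample = (
    "audio/concat,f_aac,align_1/pre,o_cHJlMi53YXY,t_2000"
    "|sys/saveas,b_b3V0YnVja2V0,o_b3V0b2JqLmFhYw/notify,topic_QXVkaW9Db252ZXJ0"
)
for name, params in describe(sample):
    print(name, params)
```

Printing the dictionaries shows each decoded name in its quoted form, so stray whitespace or newline characters in an encoded name become visible.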

SDK examples

Only the OSS SDKs for Java, Python, and Go support audio merging.

Java

OSS SDK for Java V3.17.4 or later is required.

import com.aliyun.oss.ClientBuilderConfiguration;
import com.aliyun.oss.OSS;
import com.aliyun.oss.OSSClientBuilder;
import com.aliyun.oss.common.auth.CredentialsProviderFactory;
import com.aliyun.oss.common.auth.EnvironmentVariableCredentialsProvider;
import com.aliyun.oss.common.comm.SignVersion;
import com.aliyun.oss.model.AsyncProcessObjectRequest;
import com.aliyun.oss.model.AsyncProcessObjectResult;
import com.aliyuncs.exceptions.ClientException;

import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Demo {

    public static void main(String[] args) throws ClientException {
        // Specify the endpoint of the region where the bucket is located.
        String endpoint = "https://oss-cn-hangzhou.aliyuncs.com";
        // Specify the region ID. Example: cn-hangzhou.
        String region = "cn-hangzhou";
        // Load access credentials from environment variables.
        // Set OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET before running this code.
        EnvironmentVariableCredentialsProvider credentialsProvider =
                CredentialsProviderFactory.newEnvironmentVariableCredentialsProvider();
        // Specify the bucket name.
        String bucketName = "examplebucket";
        // Specify the output audio object name.
        String targetAudio = "dest.aac";
        // Specify the source audio objects to merge.
        String audio1 = "src1.mp3";
        String audio2 = "src2.mp3";

        ClientBuilderConfiguration clientBuilderConfiguration = new ClientBuilderConfiguration();
        clientBuilderConfiguration.setSignatureVersion(SignVersion.V4);
        OSS ossClient = OSSClientBuilder.create()
                .endpoint(endpoint)
                .credentialsProvider(credentialsProvider)
                .clientConfiguration(clientBuilderConfiguration)
                .region(region)
                .build();

        try {
            // Encode source audio object names in URL-safe Base64.
            String audio1Encoded = Base64.getUrlEncoder()
                    .encodeToString(audio1.getBytes(StandardCharsets.UTF_8)).replace("=", "");
            String audio2Encoded = Base64.getUrlEncoder()
                    .encodeToString(audio2.getBytes(StandardCharsets.UTF_8)).replace("=", "");

            // Build the audio/concat process string.
            String style = String.format(
                    "audio/concat,f_aac,ac_1,ar_44100,ab_96000,align_2/pre,o_%s/pre,o_%s,t_0",
                    audio1Encoded, audio2Encoded);

            // Append sys/saveas and notify parameters.
            String bucketEncoded = Base64.getUrlEncoder()
                    .encodeToString(bucketName.getBytes(StandardCharsets.UTF_8)).replace("=", "");
            String targetEncoded = Base64.getUrlEncoder()
                    .encodeToString(targetAudio.getBytes(StandardCharsets.UTF_8)).replace("=", "");
            String process = String.format(
                    "%s|sys/saveas,b_%s,o_%s/notify,topic_QXVkaW9Db252ZXJ0",
                    style, bucketEncoded, targetEncoded);

            // Submit the asynchronous merge task.
            AsyncProcessObjectRequest request = new AsyncProcessObjectRequest(bucketName, audio1, process);
            AsyncProcessObjectResult response = ossClient.asyncProcessObject(request);
            System.out.println("EventId: " + response.getEventId());
            System.out.println("RequestId: " + response.getRequestId());
            System.out.println("TaskId: " + response.getTaskId());

        } finally {
            ossClient.shutdown();
        }
    }
}

Python

OSS SDK for Python V2.18.4 or later is required.

See the Python example in Quick start; the code is identical.

Go

OSS SDK for Go V3.0.2 or later is required.

package main

import (
	"encoding/base64"
	"fmt"
	"log"
	"os"

	"github.com/aliyun/aliyun-oss-go-sdk/oss"
)

func main() {
	// Load access credentials from environment variables.
	// Set OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET before running this code.
	// OSS_SESSION_TOKEN is also read if set; it is needed only for temporary (STS) credentials.
	provider, err := oss.NewEnvironmentVariableCredentialsProvider()
	if err != nil {
		fmt.Println("Error:", err)
		os.Exit(-1)
	}

	// Create an OSS client.
	// Specify the endpoint of the region where the bucket is located.
	client, err := oss.New(
		"https://oss-cn-hangzhou.aliyuncs.com", "", "",
		oss.SetCredentialsProvider(&provider),
		oss.AuthVersion(oss.AuthV4),
		oss.Region("cn-hangzhou"),
	)
	if err != nil {
		fmt.Println("Error:", err)
		os.Exit(-1)
	}

	// Specify the bucket name.
	bucketName := "examplebucket"
	bucket, err := client.Bucket(bucketName)
	if err != nil {
		fmt.Println("Error:", err)
		os.Exit(-1)
	}

	// Specify the source audio objects to merge and the output object name.
	audio1 := "src1.mp3"
	audio2 := "src2.mp3"
	targetAudio := "dest.aac"

	// Encode source audio object names in URL-safe Base64 without padding,
	// matching the Java and Python examples.
	audio1Encoded := base64.RawURLEncoding.EncodeToString([]byte(audio1))
	audio2Encoded := base64.RawURLEncoding.EncodeToString([]byte(audio2))

	// Build the audio/concat process string.
	style := fmt.Sprintf(
		"audio/concat,f_aac,ac_1,ar_44100,ab_96000,align_2/pre,o_%s/pre,o_%s,t_0",
		audio1Encoded, audio2Encoded,
	)

	// Append sys/saveas and notify parameters.
	bucketEncoded := base64.RawURLEncoding.EncodeToString([]byte(bucketName))
	targetEncoded := base64.RawURLEncoding.EncodeToString([]byte(targetAudio))
	process := fmt.Sprintf(
		"%s|sys/saveas,b_%s,o_%s/notify,topic_QXVkaW9Db252ZXJ0",
		style, bucketEncoded, targetEncoded,
	)

	// Submit the asynchronous merge task.
	result, err := bucket.AsyncProcessObject(audio1, process)
	if err != nil {
		log.Fatalf("Failed to submit async process task: %s", err)
	}

	fmt.Printf("EventId: %s\n", result.EventId)
	fmt.Printf("RequestId: %s\n", result.RequestId)
	fmt.Printf("TaskId: %s\n", result.TaskId)
}

What's next

sys/saveas — save the output to a specific OSS object path.
Message notification — receive a callback when an async task completes.
Permissions — configure access control for audio processing operations.