Object Storage Service: Audio merging

Last Updated: May 07, 2026

You can merge multiple audio files into a single audio file and convert it to your desired format. This topic describes the parameters and provides examples for audio merging.

Use cases

  • Music creation and production: Musicians and producers often merge separately recorded instrument and vocal tracks to create a complete song.

  • Audiobook and voice content creation: In audiobook production, narrated audio is often merged by chapter to ensure a continuous story.

  • Film and television post-production: In post-production, audio editors merge dialogue, narration, ambient sound effects, and background music to match the visuals.

  • Social media content creation: Users on short-form video platforms often merge sound effects, voice-overs, and background music to make their content more expressive.

Usage notes

  • Audio merging supports only asynchronous processing (using the x-oss-async-process method).

  • Before you merge audio, you must bind an Intelligent Media Management (IMM) project to the bucket. For instructions on binding a project in the console or by using an API, see Quick start and Bind an OSS bucket.

  • Anonymous access is denied.

  • You must have the required permissions to use this feature. For more information, see Permissions.

  • If you omit ar or ac, the output inherits the sampling rate or channel count of the source file specified by align. If the target audio container format does not support that value, the merge may fail, so set ar and ac explicitly when in doubt.

  • You can merge up to 11 audio files at a time.

Parameters

Action: audio/concat

The following tables describe the parameters.

Merging parameters

The sequence of the pre and sur parameters in the request string determines the merge order for audio/concat:

  • /pre: The audio file to be prepended.

  • /sur: The audio file to be appended.
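As a rough illustration of this ordering rule, the following Python sketch reads the audio/concat part of a request string and lists the objects in the order they would be merged. It assumes that prepended clips play in their listed order before the main object and that appended clips follow it, which matches the REST example later in this topic. The names merge_order and decode_name are hypothetical helpers, not part of any SDK.

```python
import base64


def decode_name(encoded):
    """Decode a URL-safe Base64 object name, restoring stripped '=' padding."""
    padded = encoded + "=" * (-len(encoded) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8")


def merge_order(style, main_object):
    """List object names in the assumed merge order for an audio/concat string."""
    pre, sur = [], []
    # Segments are separated by "/"; the first two are "audio" and "concat,...".
    for segment in style.split("/")[2:]:
        kind, *params = segment.split(",")
        # Each parameter looks like "<key>_<value>"; split on the first "_" only,
        # because URL-safe Base64 values may themselves contain "_".
        values = dict(p.split("_", 1) for p in params)
        if kind == "pre":
            pre.append(decode_name(values["o"]))
        elif kind == "sur":
            sur.append(decode_name(values["o"]))
    return pre + [main_object] + sur


style = "audio/concat,f_aac/pre,o_cHJlMS5tcDM/sur,o_c3VyMS5hYWM,ss_4000"
print(merge_order(style, "example.oga"))
```

The main object is passed separately because it is named in the request path rather than in the audio/concat string.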

Parameter | Type   | Required | Description
----------|--------|----------|------------
ss        | int    | No       | The start time of the audio clip to merge, in milliseconds. Valid values: 0 (default), merging starts from the beginning of the clip; a value greater than 0, merging starts at the ss-th millisecond.
t         | int    | No       | The duration of the prepended or appended audio clip to merge, in milliseconds. Valid values: 0 (default), merging continues to the end of the clip; a value greater than 0, merging lasts for t milliseconds.
o         | string | Yes      | An OSS object in the current bucket. The object name must be URL-safe Base64 encoded.
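The following Python sketch shows one way to build a single /pre or /sur segment from these parameters. It strips the trailing "=" padding from the encoded object name, mirroring the Java and Python SDK samples in this topic. The helper names encode_name and clip are hypothetical, and omitting ss or t when they are 0 relies on the documented defaults.

```python
import base64


def encode_name(name):
    """URL-safe Base64 encode an object name and drop the '=' padding."""
    return base64.urlsafe_b64encode(name.encode("utf-8")).decode("ascii").rstrip("=")


def clip(kind, name, ss=0, t=0):
    """Build one /pre or /sur segment; ss and t are in milliseconds."""
    if kind not in ("pre", "sur"):
        raise ValueError("kind must be 'pre' or 'sur'")
    if ss < 0 or t < 0:
        raise ValueError("ss and t must be non-negative")
    parts = [kind, f"o_{encode_name(name)}"]
    if ss:
        parts.append(f"ss_{ss}")
    if t:
        parts.append(f"t_{t}")
    return "/" + ",".join(parts)


print(clip("pre", "pre2.wav", t=2000))           # first 2 seconds
print(clip("sur", "sur1.aac", ss=4000, t=6000))  # from 4 s, lasting 6 s
print(encode_name("pre1.mp3"))                   # cHJlMS5tcDM
print(encode_name("pre1.mp3\n"))                 # cHJlMS5tcDMK
```

The last two lines show a common pitfall: a stray trailing newline (for example, from piping shell echo output into a Base64 tool) changes the encoded value, so the encoded name no longer refers to the intended object.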

Transcoding parameters

Parameter | Type   | Required | Description
----------|--------|----------|------------
ss        | int    | No       | The start time for transcoding the main audio file, in milliseconds. Valid values: 0 (default), transcoding starts from the beginning of the audio; a value greater than 0, transcoding starts at the ss-th millisecond.
t         | int    | No       | The duration for transcoding the main audio file, in milliseconds. Valid values: 0 (default), transcoding continues to the end of the audio; a value greater than 0, transcoding lasts for t milliseconds.
f         | string | Yes      | Audio container format. Valid values: mp3, aac, flac, oga, ac3, opus, amr.
ar        | int    | No       | The audio sampling rate. By default, this matches the rate of the source audio file specified by align. Valid values: 8000, 11025, 12000, 16000, 22050, 24000, 32000, 44100, 48000, 64000, 88200, 96000. Note: Supported sampling rates vary by format. mp3 supports 48 kHz and lower; opus supports 8 kHz, 12 kHz, 16 kHz, 24 kHz, and 48 kHz; ac3 supports 32 kHz, 44.1 kHz, and 48 kHz; amr supports only 8 kHz and 16 kHz.
ac        | int    | No       | The number of audio channels. By default, this matches the channel count of the source audio file specified by the align parameter. Valid values: 1 to 8. Note: The supported number of channels varies by format. mp3 supports mono and stereo; ac3 supports up to 6 channels (5.1); amr supports only mono.
aq        | int    | No       | Audio compression quality. Valid values: 0 to 100. Note: This parameter is mutually exclusive with ab. If neither is set, the encoder uses its default bitrate.
ab        | int    | No       | Audio bitrate, in bit/s (bps). Valid values: 1000 to 10000000.
abopt     | string | No       | Audio bitrate option. Valid values: 0 (default), always use the target audio bitrate; 1, if a source file's bitrate is lower than the ab value, the lowest bitrate among the source files is used; 2, if a source file's bitrate is lower than the ab value, the operation fails. Note: This parameter must be used with the ab parameter.
align     | int    | No       | The index of the main audio file in the merge list. Default transcoding parameters are sourced from this file. The default value is 0, which indicates the first audio file in the merge list.
adepth    | int    | No       | Audio sampling bit depth. Valid values: 16 and 24. Note: This parameter takes effect only when f is set to flac.

Note

Audio merging also uses the sys/saveas and notify parameters. For more information, see Save as and Message notification.
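Because the format-specific limits are easy to trip over, it can help to check a parameter set locally before submitting a task. The Python sketch below is a client-side pre-check only: the limits are copied from the notes in this topic, check_transcoding is a hypothetical helper, and the service remains the authority on what it accepts.

```python
ALL_RATES = {8000, 11025, 12000, 16000, 22050, 24000, 32000,
             44100, 48000, 64000, 88200, 96000}
FORMATS = {"mp3", "aac", "flac", "oga", "ac3", "opus", "amr"}
# Per-format limits transcribed from the notes on ar and ac.
FORMAT_RATES = {
    "mp3": {r for r in ALL_RATES if r <= 48000},
    "opus": {8000, 12000, 16000, 24000, 48000},
    "ac3": {32000, 44100, 48000},
    "amr": {8000, 16000},
}
FORMAT_MAX_CHANNELS = {"mp3": 2, "ac3": 6, "amr": 1}


def check_transcoding(f, ar=None, ac=None, aq=None, ab=None, abopt=None, adepth=None):
    """Return a list of problems found in a set of transcoding parameters."""
    problems = []
    if f not in FORMATS:
        problems.append(f"unsupported format: {f}")
    if ar is not None and ar not in FORMAT_RATES.get(f, ALL_RATES):
        problems.append(f"ar={ar} is not supported for {f}")
    if ac is not None and not 1 <= ac <= FORMAT_MAX_CHANNELS.get(f, 8):
        problems.append(f"ac={ac} is not supported for {f}")
    if aq is not None and ab is not None:
        problems.append("aq and ab are mutually exclusive")
    if aq is not None and not 0 <= aq <= 100:
        problems.append("aq must be between 0 and 100")
    if ab is not None and not 1000 <= ab <= 10000000:
        problems.append("ab must be between 1000 and 10000000")
    if abopt is not None and ab is None:
        problems.append("abopt must be used with ab")
    if adepth is not None and (f != "flac" or adepth not in (16, 24)):
        problems.append("adepth takes effect only for flac and must be 16 or 24")
    return problems


print(check_transcoding("aac", ar=44100, ac=1, ab=96000))
print(check_transcoding("amr", ar=44100, ac=2))
```

The first call returns an empty list; the second reports both the sampling rate and the channel count, since amr supports only 8 kHz or 16 kHz mono.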

REST API

Merge audio to AAC

  • Audio files: pre1.mp3, pre2.wav, example.oga, sur1.aac, sur2.wma

  • Merge duration and order:

    Audio name  | Order | Duration
    ------------|-------|----------------
    pre1.mp3    | 1     | Entire audio
    pre2.wav    | 2     | First 2 seconds
    example.oga | 3     | Entire audio
    sur1.aac    | 4     | 4s to 10s
    sur2.wma    | 5     | 10s to end

  • Transcoding completion notification: Send a Message Service (MNS) message.

  • Output audio specifications

    • Audio format: aac

    • Audio profile: 44.1 kHz sampling rate, mono

    • Audio bitrate: 96 kbps

    • Object storage path

      • AAC file: oss://outbucket/outobj.aac

Sample request

// Perform audio merging on the example.oga object.
POST /example.oga?x-oss-async-process HTTP/1.1
Host: video-demo.oss-cn-hangzhou.aliyuncs.com
Date: Thu, 17 Apr 2025 06:40:10 GMT
Authorization: OSS4-HMAC-SHA256 Credential=LTAI********************/20250417/cn-hangzhou/oss/aliyun_v4_request,Signature=a7c3554c729d71929e0b84489addee6b2e8d5cb48595adfc51868c299c0c218e

x-oss-async-process=audio/concat,f_aac,ac_1,ar_44100,ab_96000,align_2/pre,o_cHJlMS5tcDM/pre,o_cHJlMi53YXY,t_2000/sur,o_c3VyMS5hYWM,ss_4000,t_6000/sur,o_c3VyMi53bWE,ss_10000|sys/saveas,b_b3V0YnVja2V0,o_b3V0b2JqLnthdXRvZXh0fQ/notify,topic_QXVkaW9Db252ZXJ0
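To show how such a request body is put together, the following Python sketch assembles one for the five-file scenario described above. It only builds the string and does not sign or send the request. The helper names b64 and build_process are hypothetical, and the save-as object name outobj.{autoext} and topic name AudioConvert are taken from decoding the Base64 values in the sample. Because t is a duration, the "4s to 10s" clip is expressed as ss=4000 with t=6000.

```python
import base64


def b64(text):
    """URL-safe Base64 without '=' padding."""
    return base64.urlsafe_b64encode(text.encode("utf-8")).decode("ascii").rstrip("=")


def build_process(transcode, clips, bucket, target, topic):
    """Assemble the x-oss-async-process value for an audio/concat task."""
    parts = ["audio/concat," + ",".join(f"{k}_{v}" for k, v in transcode.items())]
    for kind, name, ss, t in clips:
        seg = f"/{kind},o_{b64(name)}"
        if ss:
            seg += f",ss_{ss}"
        if t:
            seg += f",t_{t}"
        parts.append(seg)
    parts.append(f"|sys/saveas,b_{b64(bucket)},o_{b64(target)}")
    parts.append(f"/notify,topic_{b64(topic)}")
    return "".join(parts)


process = build_process(
    transcode={"f": "aac", "ac": 1, "ar": 44100, "ab": 96000, "align": 2},
    clips=[
        ("pre", "pre1.mp3", 0, 0),        # entire audio
        ("pre", "pre2.wav", 0, 2000),     # first 2 seconds
        ("sur", "sur1.aac", 4000, 6000),  # 4s to 10s: start 4000 ms, duration 6000 ms
        ("sur", "sur2.wma", 10000, 0),    # 10s to end
    ],
    bucket="outbucket",
    target="outobj.{autoext}",
    topic="AudioConvert",
)
print("x-oss-async-process=" + process)
```

The main object, example.oga, does not appear in the string because it is identified by the request path.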

OSS SDKs

You can perform audio merging by using asynchronous processing with the OSS SDK for Java, Python, or Go.

Prerequisites

  1. Ensure that the OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET environment variables are set.

  2. Specify the bucket name, for example, examplebucket.

  3. Specify the name of the output audio file, for example, dest.aac.

  4. Specify the names of the source audio files to merge, for example, src1.mp3 and src2.mp3.

Java

OSS SDK for Java 3.17.4 or later is required.

import com.aliyun.oss.ClientBuilderConfiguration;
import com.aliyun.oss.OSS;
import com.aliyun.oss.OSSClientBuilder;
import com.aliyun.oss.common.auth.CredentialsProviderFactory;
import com.aliyun.oss.common.auth.EnvironmentVariableCredentialsProvider;
import com.aliyun.oss.common.comm.SignVersion;
import com.aliyun.oss.model.AsyncProcessObjectRequest;
import com.aliyun.oss.model.AsyncProcessObjectResult;
import com.aliyuncs.exceptions.ClientException;

import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Demo {

    public static void main(String[] args) throws ClientException {
        // Set endpoint to the endpoint of the region where the bucket is located.
        String endpoint = "https://oss-cn-hangzhou.aliyuncs.com";
        // Specify the Alibaba Cloud region ID, for example, cn-hangzhou.
        String region = "cn-hangzhou";
        // Obtain access credentials from environment variables. Before running this sample code, make sure the OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET environment variables are set.
        EnvironmentVariableCredentialsProvider credentialsProvider = CredentialsProviderFactory.newEnvironmentVariableCredentialsProvider();
        // Specify the bucket name.
        String bucketName = "examplebucket";
        // Specify the name of the output audio file.
        String targetAudio = "dest.aac";
        // Specify the names of the audio files to be merged.
        String audio1 = "src1.mp3";
        String audio2 = "src2.mp3";

        // Create an OSSClient instance.
        // When you are finished, shut down the OSSClient to release resources.
        ClientBuilderConfiguration clientBuilderConfiguration = new ClientBuilderConfiguration();
        clientBuilderConfiguration.setSignatureVersion(SignVersion.V4);
        OSS ossClient = OSSClientBuilder.create()
                .endpoint(endpoint)
                .credentialsProvider(credentialsProvider)
                .clientConfiguration(clientBuilderConfiguration)
                .region(region)
                .build();

        try {
            // Build the style string for audio processing and the audio merging parameters.
            String audio1Encoded = Base64.getUrlEncoder().encodeToString(audio1.getBytes(StandardCharsets.UTF_8)).replace("=", "");
            String audio2Encoded = Base64.getUrlEncoder().encodeToString(audio2.getBytes(StandardCharsets.UTF_8)).replace("=", "");
            String style = String.format("audio/concat,f_aac,ac_1,ar_44100,ab_96000,align_2/pre,o_%s/pre,o_%s,t_0", audio1Encoded, audio2Encoded);

            // Build the asynchronous processing instruction.
            String bucketEncoded = Base64.getUrlEncoder().encodeToString(bucketName.getBytes(StandardCharsets.UTF_8)).replace("=", "");
            String targetEncoded = Base64.getUrlEncoder().encodeToString(targetAudio.getBytes(StandardCharsets.UTF_8)).replace("=", "");
            String process = String.format("%s|sys/saveas,b_%s,o_%s/notify,topic_QXVkaW9Db252ZXJ0", style, bucketEncoded, targetEncoded);

            // Create an AsyncProcessObjectRequest object.
            AsyncProcessObjectRequest request = new AsyncProcessObjectRequest(bucketName, audio1, process);
            // Execute the asynchronous processing task.
            AsyncProcessObjectResult response = ossClient.asyncProcessObject(request);
            System.out.println("EventId: " + response.getEventId());
            System.out.println("RequestId: " + response.getRequestId());
            System.out.println("TaskId: " + response.getTaskId());

        } finally {
            // Shut down the OSSClient.
            ossClient.shutdown();
        }
    }
}

Python

OSS SDK for Python 2.18.4 or later is required.

# -*- coding: utf-8 -*-
import base64
import oss2
from oss2.credentials import EnvironmentVariableCredentialsProvider


def main():
    # Obtain access credentials from environment variables. Before running this sample code, make sure the OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET environment variables are set.
    auth = oss2.ProviderAuthV4(EnvironmentVariableCredentialsProvider())

    # Set endpoint to the endpoint of the region where the bucket is located. For example, for China (Hangzhou), set the endpoint to https://oss-cn-hangzhou.aliyuncs.com.
    endpoint = 'https://oss-cn-hangzhou.aliyuncs.com'

    # Specify the Alibaba Cloud region ID, for example, cn-hangzhou.
    region = 'cn-hangzhou'

    # Specify the bucket name, for example, examplebucket.
    bucket = oss2.Bucket(auth, endpoint, 'examplebucket', region=region)

    # Specify the name of the output audio file.
    target_audio = 'dest.aac'

    # Specify the names of the audio files to merge.
    audio1 = 'src1.mp3'
    audio2 = 'src2.mp3'

    # Build the style string for audio processing and the audio merging parameters.
    audio1_encoded = base64.urlsafe_b64encode(audio1.encode()).decode().rstrip('=')
    audio2_encoded = base64.urlsafe_b64encode(audio2.encode()).decode().rstrip('=')
    style = f"audio/concat,f_aac,ac_1,ar_44100,ab_96000,align_2/pre,o_{audio1_encoded}/pre,o_{audio2_encoded},t_0"

    # Build the asynchronous processing instruction.
    bucket_encoded = base64.urlsafe_b64encode(bucket.bucket_name.encode()).decode().rstrip('=')
    target_encoded = base64.urlsafe_b64encode(target_audio.encode()).decode().rstrip('=')
    process = f"{style}|sys/saveas,b_{bucket_encoded},o_{target_encoded}/notify,topic_QXVkaW9Db252ZXJ0"

    print(process)

    # Execute the asynchronous processing task.
    try:
        result = bucket.async_process_object(audio1, process)
        print(f"EventId: {result.event_id}")
        print(f"RequestId: {result.request_id}")
        print(f"TaskId: {result.task_id}")
    except Exception as e:
        print(f"Error: {e}")


if __name__ == "__main__":
    main()

Go

OSS SDK for Go 3.0.2 or later is required.

package main

import (
	"encoding/base64"
	"fmt"
	"log"
	"os"

	"github.com/aliyun/aliyun-oss-go-sdk/oss"
)

func main() {
	// Obtain access credentials from environment variables. Before running this sample code, make sure the OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET environment variables are set. OSS_SESSION_TOKEN is needed only if you use temporary credentials.
	provider, err := oss.NewEnvironmentVariableCredentialsProvider()
	if err != nil {
		fmt.Println("Error:", err)
		os.Exit(-1)
	}
	// Create an OSSClient instance.
	// Set endpoint to the endpoint of the region where your bucket is located. For example, for China (Hangzhou), set it to https://oss-cn-hangzhou.aliyuncs.com.
	// Set region to the Alibaba Cloud region ID, for example, cn-hangzhou.
	client, err := oss.New("https://oss-cn-hangzhou.aliyuncs.com", "", "", oss.SetCredentialsProvider(&provider), oss.AuthVersion(oss.AuthV4), oss.Region("cn-hangzhou"))
	if err != nil {
		fmt.Println("Error:", err)
		os.Exit(-1)
	}
	// Specify the bucket name, for example, examplebucket.
	bucketName := "examplebucket"

	bucket, err := client.Bucket(bucketName)
	if err != nil {
		fmt.Println("Error:", err)
		os.Exit(-1)
	}

	// Specify the names of the audio files to be merged.
	audio1 := "src1.mp3"
	audio2 := "src2.mp3"
	// Specify the name of the output audio file.
	targetAudio := "dest.aac"

	// Build the style string for audio processing and the audio merging parameters.
	// RawURLEncoding omits the "=" padding, matching the Java and Python samples.
	audio1Encoded := base64.RawURLEncoding.EncodeToString([]byte(audio1))
	audio2Encoded := base64.RawURLEncoding.EncodeToString([]byte(audio2))
	style := fmt.Sprintf("audio/concat,f_aac,ac_1,ar_44100,ab_96000,align_2/pre,o_%s/pre,o_%s,t_0", audio1Encoded, audio2Encoded)

	// Build the asynchronous processing instruction.
	bucketEncoded := base64.RawURLEncoding.EncodeToString([]byte(bucketName))
	targetEncoded := base64.RawURLEncoding.EncodeToString([]byte(targetAudio))
	process := fmt.Sprintf("%s|sys/saveas,b_%s,o_%s/notify,topic_QXVkaW9Db252ZXJ0", style, bucketEncoded, targetEncoded)

	// Execute the asynchronous processing task.
	result, err := bucket.AsyncProcessObject(audio1, process)
	if err != nil {
		log.Fatalf("Failed to async process object: %s", err)
	}

	fmt.Printf("EventId: %s\n", result.EventId)
	fmt.Printf("RequestId: %s\n", result.RequestId)
	fmt.Printf("TaskId: %s\n", result.TaskId)
}