Audio merging concatenates multiple audio objects stored in OSS into a single output file. Each source clip is placed end-to-end in the order you define, then transcoded to the target format in one asynchronous operation.
Use audio merging to automate post-production tasks — assembling narrated chapters, combining multi-track recordings, or stitching dialogue and ambient sound — without running local audio processing tools.
Use cases
Music production: Combine individual instrumental and vocal recordings into a complete track.
Audiobook production: Stitch narrated chapters into a seamless, full-length audio file.
Film and TV post-production: Assemble dialogue, voice-over, ambient sound, and music into a unified audio mix.
Social media content: Layer sound effects, voice-overs, and background audio to enhance video content.
Prerequisites
Before you begin, ensure that you have:
An OSS bucket bound to an Intelligent Media Management (IMM) project. See Quick start (OSS console) or AttachOSSBucket (IMM API) for setup instructions.
The required permissions to use audio merging. See Permissions.
Authenticated access — anonymous requests are denied.
How it works
Audio merging uses the audio/concat action with the x-oss-async-process header. All requests are asynchronous.
Clip ordering is controlled by the /pre and /sur markers in the request string:
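/pre: places the clip before the main audio object, which is the object named in the request. Multiple /pre clips keep the order in which they are listed.
/sur: places the clip after the main audio object. Multiple /sur clips keep the order in which they are listed.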
The sequence of /pre and /sur entries in the request string determines the final output order. Each clip can also be trimmed using ss (start offset) and t (duration) before it is concatenated.
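As a rough illustration of how the clip entries are written, the following Python sketch builds the /pre and /sur portion of a request string. The helper names b64_name and clip_segment and the sample object names are hypothetical, and the sketch assumes object names are encoded in URL-safe Base64 with the trailing = padding removed, as the SDK examples in this topic do.

```python
import base64


def b64_name(name: str) -> str:
    """Encode a name in URL-safe Base64 and drop the '=' padding."""
    return base64.urlsafe_b64encode(name.encode("utf-8")).decode("ascii").rstrip("=")


def clip_segment(marker: str, object_name: str, ss: int = 0, t: int = 0) -> str:
    """Return one '/pre,...' or '/sur,...' entry (hypothetical helper).

    ss and t are passed through as given; 0 means "use the default",
    so the parameter is omitted from the entry.
    """
    if marker not in ("pre", "sur"):
        raise ValueError("marker must be 'pre' or 'sur'")
    if ss < 0 or t < 0:
        raise ValueError("ss and t must not be negative")
    parts = [f"/{marker}", f"o_{b64_name(object_name)}"]
    if ss > 0:
        parts.append(f"ss_{ss}")
    if t > 0:
        parts.append(f"t_{t}")
    return ",".join(parts)


# Entries are emitted in the order they appear in this list.
clips = [
    ("pre", "intro.mp3", 0, 0),
    ("pre", "chapter1.wav", 0, 2000),
    ("sur", "outro.aac", 4000, 10000),
]
print("".join(clip_segment(*clip) for clip in clips))
```

Keeping the clip list as data makes the merge order easy to review before the request string is assembled.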
After concatenation, the merged audio is transcoded to the output format you specify. Use sys/saveas to write the result to a target OSS object, and notify to receive a completion notification.
You can merge up to 11 audio objects per request.
If you leave the sampling rate (ar) or number of sound channels (ac) at their defaults, the output inherits these values from the source audio identified by align. If an inherited value is not supported by the target container format, the merge may fail. Specify ar and ac explicitly to avoid this.
Merge audio objects
Quick start
The following Python example merges two MP3 files and saves the result as an AAC file. Use it as a starting point, then adjust parameters to fit your use case.
OSS SDK for Python V2.18.4 or later is required. Set the OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET environment variables before running this code.
# -*- coding: utf-8 -*-
import base64

import oss2
from oss2.credentials import EnvironmentVariableCredentialsProvider


def main():
    # Load access credentials from environment variables.
    auth = oss2.ProviderAuthV4(EnvironmentVariableCredentialsProvider())
    # Specify the endpoint of the region where the bucket is located.
    endpoint = 'https://oss-cn-hangzhou.aliyuncs.com'
    # Specify the region ID. Example: cn-hangzhou.
    region = 'cn-hangzhou'
    bucket_name = 'examplebucket'
    bucket = oss2.Bucket(auth, endpoint, bucket_name, region=region)

    # Specify the output audio object name.
    target_audio = 'dest.aac'
    # Specify the source audio objects to merge.
    audio1 = 'src1.mp3'
    audio2 = 'src2.mp3'

    # Encode source audio object names in URL-safe Base64.
    audio1_encoded = base64.urlsafe_b64encode(audio1.encode()).decode().rstrip('=')
    audio2_encoded = base64.urlsafe_b64encode(audio2.encode()).decode().rstrip('=')

    # Build the audio/concat process string.
    style = f"audio/concat,f_aac,ac_1,ar_44100,ab_96000,align_2/pre,o_{audio1_encoded}/pre,o_{audio2_encoded},t_0"

    # Append sys/saveas and notify parameters.
    bucket_encoded = base64.urlsafe_b64encode(bucket_name.encode()).decode().rstrip('=')
    target_encoded = base64.urlsafe_b64encode(target_audio.encode()).decode().rstrip('=')
    process = f"{style}|sys/saveas,b_{bucket_encoded},o_{target_encoded}/notify,topic_QXVkaW9Db252ZXJ0"

    # Submit the asynchronous merge task.
    try:
        result = bucket.async_process_object(audio1, process)
        print(f"EventId: {result.event_id}")
        print(f"RequestId: {result.request_id}")
        print(f"TaskId: {result.task_id}")
    except Exception as e:
        print(f"Error: {e}")


if __name__ == "__main__":
    main()

For Java and Go examples, see SDK examples.
Action
audio/concat
Parameters for merging
The /pre and /sur markers each represent one audio clip. Each marker is followed by one or more parameters that describe that clip.
| Parameter | Type | Required | Description |
|---|---|---|---|
| o | string | Yes | The name of the audio object in the OSS bucket. Must be URL-safe Base64 encoded. |
| ss | int | No | Start offset in milliseconds. 0 = from the beginning (default). A value greater than 0 = start from that point in the source audio. |
| t | int | No | Duration in milliseconds. 0 = to the end of the source audio (default). A value greater than 0 = use only that many milliseconds. |
Parameters for transcoding
These parameters apply to the merged output as a whole.
| Parameter | Type | Required | Description |
|---|---|---|---|
| f | string | Yes | Container format of the output audio. Valid values: mp3, aac, flac, oga, ac3, opus, amr. |
| ss | int | No | Start offset for transcoding in milliseconds. 0 = from the beginning (default). |
| t | int | No | Duration to transcode in milliseconds. 0 = to the end of the merged audio (default). |
| ar | int | No | Sampling rate of the output audio in Hz. Defaults to the sampling rate of the source audio identified by align. Valid values: 8000, 11025, 12000, 16000, 22050, 24000, 32000, 44100, 48000, 64000, 88200, 96000. See format-specific limits below. |
| ac | int | No | Number of sound channels. Defaults to the channel count of the source audio identified by align. Valid values: 1–8. See format-specific limits below. |
| ab | int | No | Target audio bitrate in bit/s. Valid values: 1000–10000000. Mutually exclusive with aq. If both are omitted, the codec's default bitrate is used. |
| aq | int | No | Audio compression quality. Valid values: 0–100. Mutually exclusive with ab. If both are omitted, the codec's default bitrate is used. |
| abopt | string | No | Bitrate handling when a source clip's bitrate is lower than ab. Requires ab. Valid values: 0 = always use ab (default), 1 = use the lower bitrate, 2 = return a failure. |
| align | int | No | Index (0-based) of the source audio from which default transcoding parameters are obtained. Default: 0 (the first clip in the list). |
| adepth | int | No | Sampling bit depth of the output audio. Valid values: 16, 24. Applies only when f is flac. |
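The constraints in this table can be checked on the client before a task is submitted. The sketch below is one possible way to do that in Python; the function name transcode_options and its keyword arguments are hypothetical, and the checks only mirror the rules listed in the table (ab and aq are mutually exclusive, abopt requires ab, adepth applies only to flac). The service remains the authority on what it accepts.

```python
VALID_FORMATS = {"mp3", "aac", "flac", "oga", "ac3", "opus", "amr"}
VALID_RATES = {8000, 11025, 12000, 16000, 22050, 24000,
               32000, 44100, 48000, 64000, 88200, 96000}


def transcode_options(f, ar=None, ac=None, ab=None, aq=None,
                      abopt=None, align=None, adepth=None):
    """Validate options against the table and return the leading
    'audio/concat,...' part of the process string (hypothetical helper)."""
    if f not in VALID_FORMATS:
        raise ValueError(f"unsupported format: {f}")
    if ar is not None and ar not in VALID_RATES:
        raise ValueError(f"unsupported sampling rate: {ar}")
    if ac is not None and not 1 <= ac <= 8:
        raise ValueError("ac must be between 1 and 8")
    if ab is not None and aq is not None:
        raise ValueError("ab and aq are mutually exclusive")
    if ab is not None and not 1000 <= ab <= 10_000_000:
        raise ValueError("ab must be between 1000 and 10000000")
    if aq is not None and not 0 <= aq <= 100:
        raise ValueError("aq must be between 0 and 100")
    if abopt is not None:
        if ab is None:
            raise ValueError("abopt requires ab")
        if str(abopt) not in {"0", "1", "2"}:
            raise ValueError("abopt must be 0, 1, or 2")
    if adepth is not None and (f != "flac" or adepth not in (16, 24)):
        raise ValueError("adepth must be 16 or 24 and applies only to flac")

    # Only options that were set explicitly are written to the string.
    options = {"f": f, "ac": ac, "ar": ar, "ab": ab, "aq": aq,
               "abopt": abopt, "align": align, "adepth": adepth}
    parts = [f"{key}_{value}" for key, value in options.items() if value is not None]
    return "audio/concat," + ",".join(parts)


print(transcode_options("aac", ac=1, ar=44100, ab=96000, align=2))
# prints: audio/concat,f_aac,ac_1,ar_44100,ab_96000,align_2
```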
Format-specific limits
| Format | Supported sampling rates | Max channels |
|---|---|---|
| MP3 | 48 kHz or lower | 2 |
| Opus | 8, 12, 16, 24, or 48 kHz | — |
| AC3 | 32, 44.1, or 48 kHz | 6 (5.1) |
| AMR | 8 or 16 kHz | 1 |
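Because a mismatch between ar or ac and the container format can make the merge fail, it may help to check the values against this table first. The following Python sketch encodes the table as data; it reads the MP3 row as an upper bound and the other rows as lists of allowed rates, which is an interpretation of the table rather than a documented rule. The names FORMAT_LIMITS and check_format_limits are hypothetical.

```python
# Limits transcribed from the table; formats not listed have no entry.
FORMAT_LIMITS = {
    "mp3": {"max_rate": 48000, "max_channels": 2},
    "opus": {"rates": {8000, 12000, 16000, 24000, 48000}},
    "ac3": {"rates": {32000, 44100, 48000}, "max_channels": 6},
    "amr": {"rates": {8000, 16000}, "max_channels": 1},
}


def check_format_limits(f, ar, ac):
    """Return a list of problems for format f with sampling rate ar (Hz)
    and channel count ac. An empty list means no limit was violated."""
    limits = FORMAT_LIMITS.get(f, {})
    problems = []
    if "max_rate" in limits and ar > limits["max_rate"]:
        problems.append(f"{f}: sampling rate {ar} exceeds {limits['max_rate']}")
    if "rates" in limits and ar not in limits["rates"]:
        problems.append(f"{f}: sampling rate {ar} is not in {sorted(limits['rates'])}")
    if "max_channels" in limits and ac > limits["max_channels"]:
        problems.append(f"{f}: {ac} channels exceeds {limits['max_channels']}")
    return problems


for f, ar, ac in [("aac", 44100, 1), ("amr", 44100, 2)]:
    print(f, check_format_limits(f, ar, ac) or "ok")
```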
Additional parameters
Use the following parameters alongside audio/concat:
sys/saveas: writes the output to a specified OSS object. Parameters: b_<bucket> and o_<object> (both URL-safe Base64 encoded). See sys/saveas.
notify: sends a completion notification. Parameter: topic_<topic> (URL-safe Base64 encoded). See Message notification.
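A minimal Python sketch of how these two parameters might be appended to an audio/concat string is shown below. The helper names b64_name and with_saveas_and_notify are hypothetical, and the sketch assumes the same encoding convention as the SDK examples in this topic: URL-safe Base64 with the trailing = padding removed.

```python
import base64


def b64_name(value: str) -> str:
    """Encode a value in URL-safe Base64 and drop the '=' padding."""
    return base64.urlsafe_b64encode(value.encode("utf-8")).decode("ascii").rstrip("=")


def with_saveas_and_notify(style, bucket, obj, topic=None):
    """Append sys/saveas, and notify if a topic is given, to a process
    string (hypothetical helper)."""
    process = f"{style}|sys/saveas,b_{b64_name(bucket)},o_{b64_name(obj)}"
    if topic:
        process += f"/notify,topic_{b64_name(topic)}"
    return process


print(with_saveas_and_notify("audio/concat,f_aac", "outbucket", "dest.aac", "AudioConvert"))
# prints: audio/concat,f_aac|sys/saveas,b_b3V0YnVja2V0,o_ZGVzdC5hYWM/notify,topic_QXVkaW9Db252ZXJ0
```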
Examples
Merge five clips into an AAC file (REST API)
This example merges five source audio objects in different formats into a single AAC output with a mono channel, 44.1 kHz sampling rate, and 96 Kbit/s bitrate. A Simple Message Queue (SMQ) notification is sent when the task completes.
Source clips and merge order
| Audio object | Order | Segment used |
|---|---|---|
| pre1.mp3 | 1 | Full duration |
| pre2.wav | 2 | First 2 seconds |
| example.oga | 3 | Full duration |
| sur1.aac | 4 | 10 seconds starting at second 4 (seconds 4–14) |
| sur2.wma | 5 | From second 10 to end |
Output
Format: AAC
Sampling rate: 44.1 kHz, mono
Bitrate: 96 Kbit/s
Destination: oss://outbucket/outobj.aac
All object names below are URL-safe Base64 encoded. The align_2 parameter sets the default transcoding properties from example.oga (index 2).
POST /example.oga?x-oss-async-process HTTP/1.1
Host: video-demo.oss-cn-hangzhou.aliyuncs.com
Date: Fri, 28 Oct 2022 06:40:10 GMT
Authorization: OSS4-HMAC-SHA256 Credential=LTAI********************/20250417/cn-hangzhou/oss/aliyun_v4_request,Signature=a7c3554c729d71929e0b84489addee6b2e8d5cb48595adfc51868c299c0c218e
x-oss-async-process=audio/concat,f_aac,ac_1,ar_44100,ab_96000,align_2/pre,o_cHJlMS5tcDM/pre,o_cHJlMi53YXY,t_2000/sur,o_c3VyMS5hYWM,ss_4000,t_10000/sur,o_c3VyMi53bWE,ss_10000|sys/saveas,b_b3V0YnVja2V0,o_b3V0b2JqLnthdXRvZXh0fQ/notify,topic_QXVkaW9Db252ZXJ0
SDK examples
OSS SDKs support audio merging for Java, Python, and Go only.
Java
OSS SDK for Java V3.17.4 or later is required.
import com.aliyun.oss.ClientBuilderConfiguration;
import com.aliyun.oss.OSS;
import com.aliyun.oss.OSSClientBuilder;
import com.aliyun.oss.common.auth.CredentialsProviderFactory;
import com.aliyun.oss.common.auth.EnvironmentVariableCredentialsProvider;
import com.aliyun.oss.common.comm.SignVersion;
import com.aliyun.oss.model.AsyncProcessObjectRequest;
import com.aliyun.oss.model.AsyncProcessObjectResult;
import com.aliyuncs.exceptions.ClientException;

import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Demo {
    public static void main(String[] args) throws ClientException {
        // Specify the endpoint of the region where the bucket is located.
        String endpoint = "https://oss-cn-hangzhou.aliyuncs.com";
        // Specify the region ID. Example: cn-hangzhou.
        String region = "cn-hangzhou";
        // Load access credentials from environment variables.
        // Set OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET before running this code.
        EnvironmentVariableCredentialsProvider credentialsProvider =
                CredentialsProviderFactory.newEnvironmentVariableCredentialsProvider();
        // Specify the bucket name.
        String bucketName = "examplebucket";
        // Specify the output audio object name.
        String targetAudio = "dest.aac";
        // Specify the source audio objects to merge.
        String audio1 = "src1.mp3";
        String audio2 = "src2.mp3";

        ClientBuilderConfiguration clientBuilderConfiguration = new ClientBuilderConfiguration();
        clientBuilderConfiguration.setSignatureVersion(SignVersion.V4);
        OSS ossClient = OSSClientBuilder.create()
                .endpoint(endpoint)
                .credentialsProvider(credentialsProvider)
                .clientConfiguration(clientBuilderConfiguration)
                .region(region)
                .build();

        try {
            // Encode source audio object names in URL-safe Base64.
            String audio1Encoded = Base64.getUrlEncoder()
                    .encodeToString(audio1.getBytes(StandardCharsets.UTF_8)).replace("=", "");
            String audio2Encoded = Base64.getUrlEncoder()
                    .encodeToString(audio2.getBytes(StandardCharsets.UTF_8)).replace("=", "");

            // Build the audio/concat process string.
            String style = String.format(
                    "audio/concat,f_aac,ac_1,ar_44100,ab_96000,align_2/pre,o_%s/pre,o_%s,t_0",
                    audio1Encoded, audio2Encoded);

            // Append sys/saveas and notify parameters.
            String bucketEncoded = Base64.getUrlEncoder()
                    .encodeToString(bucketName.getBytes(StandardCharsets.UTF_8)).replace("=", "");
            String targetEncoded = Base64.getUrlEncoder()
                    .encodeToString(targetAudio.getBytes(StandardCharsets.UTF_8)).replace("=", "");
            String process = String.format(
                    "%s|sys/saveas,b_%s,o_%s/notify,topic_QXVkaW9Db252ZXJ0",
                    style, bucketEncoded, targetEncoded);

            // Submit the asynchronous merge task.
            AsyncProcessObjectRequest request = new AsyncProcessObjectRequest(bucketName, audio1, process);
            AsyncProcessObjectResult response = ossClient.asyncProcessObject(request);
            System.out.println("EventId: " + response.getEventId());
            System.out.println("RequestId: " + response.getRequestId());
            System.out.println("TaskId: " + response.getTaskId());
        } finally {
            ossClient.shutdown();
        }
    }
}

Python
OSS SDK for Python V2.18.4 or later is required.
# -*- coding: utf-8 -*-
import base64

import oss2
from oss2.credentials import EnvironmentVariableCredentialsProvider


def main():
    # Load access credentials from environment variables.
    # Set OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET before running this code.
    auth = oss2.ProviderAuthV4(EnvironmentVariableCredentialsProvider())
    # Specify the endpoint of the region where the bucket is located.
    endpoint = 'https://oss-cn-hangzhou.aliyuncs.com'
    # Specify the region ID. Example: cn-hangzhou.
    region = 'cn-hangzhou'
    bucket_name = 'examplebucket'
    bucket = oss2.Bucket(auth, endpoint, bucket_name, region=region)

    # Specify the output audio object name.
    target_audio = 'dest.aac'
    # Specify the source audio objects to merge.
    audio1 = 'src1.mp3'
    audio2 = 'src2.mp3'

    # Encode source audio object names in URL-safe Base64.
    audio1_encoded = base64.urlsafe_b64encode(audio1.encode()).decode().rstrip('=')
    audio2_encoded = base64.urlsafe_b64encode(audio2.encode()).decode().rstrip('=')

    # Build the audio/concat process string.
    style = f"audio/concat,f_aac,ac_1,ar_44100,ab_96000,align_2/pre,o_{audio1_encoded}/pre,o_{audio2_encoded},t_0"

    # Append sys/saveas and notify parameters.
    bucket_encoded = base64.urlsafe_b64encode(bucket_name.encode()).decode().rstrip('=')
    target_encoded = base64.urlsafe_b64encode(target_audio.encode()).decode().rstrip('=')
    process = f"{style}|sys/saveas,b_{bucket_encoded},o_{target_encoded}/notify,topic_QXVkaW9Db252ZXJ0"

    # Submit the asynchronous merge task.
    try:
        result = bucket.async_process_object(audio1, process)
        print(f"EventId: {result.event_id}")
        print(f"RequestId: {result.request_id}")
        print(f"TaskId: {result.task_id}")
    except Exception as e:
        print(f"Error: {e}")


if __name__ == "__main__":
    main()

Go
OSS SDK for Go V3.0.2 or later is required.
package main

import (
	"encoding/base64"
	"fmt"
	"log"
	"os"

	"github.com/aliyun/aliyun-oss-go-sdk/oss"
)

func main() {
	// Load access credentials from environment variables.
	// Set OSS_ACCESS_KEY_ID, OSS_ACCESS_KEY_SECRET, and OSS_SESSION_TOKEN before running this code.
	provider, err := oss.NewEnvironmentVariableCredentialsProvider()
	if err != nil {
		fmt.Println("Error:", err)
		os.Exit(-1)
	}

	// Create an OSS client.
	// Specify the endpoint of the region where the bucket is located.
	client, err := oss.New(
		"https://oss-cn-hangzhou.aliyuncs.com", "", "",
		oss.SetCredentialsProvider(&provider),
		oss.AuthVersion(oss.AuthV4),
		oss.Region("cn-hangzhou"),
	)
	if err != nil {
		fmt.Println("Error:", err)
		os.Exit(-1)
	}

	// Specify the bucket name.
	bucketName := "examplebucket"
	bucket, err := client.Bucket(bucketName)
	if err != nil {
		fmt.Println("Error:", err)
		os.Exit(-1)
	}

	// Specify the source audio objects to merge and the output object name.
	audio1 := "src1.mp3"
	audio2 := "src2.mp3"
	targetAudio := "dest.aac"

	// Encode source audio object names in URL-safe Base64.
	audio1Encoded := base64.URLEncoding.EncodeToString([]byte(audio1))
	audio2Encoded := base64.URLEncoding.EncodeToString([]byte(audio2))

	// Build the audio/concat process string.
	style := fmt.Sprintf(
		"audio/concat,f_aac,ac_1,ar_44100,ab_96000,align_2/pre,o_%s/pre,o_%s,t_0",
		audio1Encoded, audio2Encoded,
	)

	// Append sys/saveas and notify parameters.
	bucketEncoded := base64.URLEncoding.EncodeToString([]byte(bucketName))
	targetEncoded := base64.URLEncoding.EncodeToString([]byte(targetAudio))
	process := fmt.Sprintf(
		"%s|sys/saveas,b_%s,o_%s/notify,topic_QXVkaW9Db252ZXJ0",
		style, bucketEncoded, targetEncoded,
	)

	// Submit the asynchronous merge task.
	result, err := bucket.AsyncProcessObject(audio1, process)
	if err != nil {
		log.Fatalf("Failed to submit async process task: %s", err)
	}
	fmt.Printf("EventId: %s\n", result.EventId)
	fmt.Printf("RequestId: %s\n", result.RequestId)
	fmt.Printf("TaskId: %s\n", result.TaskId)
}

What's next
sys/saveas — save the output to a specific OSS object path.
Message notification — receive a callback when an async task completes.
Permissions — configure access control for audio processing operations.