You can merge multiple audio files into a single audio file and convert it to your desired format. This topic describes the parameters and provides examples for audio merging.
Use cases
- Music creation and production: Musicians and producers often merge separately recorded instrument and vocal tracks to create a complete song.
- Audiobook and voice content creation: In audiobook production, narrated audio is often merged by chapter to ensure a continuous story.
- Film and television post-production: In post-production, audio editors merge dialogue, narration, ambient sound effects, and background music to match the visuals.
- Social media content creation: Users on short-form video platforms often merge sound effects, voice-overs, and background music to make their content more expressive.
Usage notes
- Audio merging supports only asynchronous processing (using the x-oss-async-process method).
- Before performing audio merging, you must bind an IMM project. For instructions on how to bind a project in the console or by using an API, see quick start and Bind an OSS bucket.
- Anonymous access will be denied.
- You must have the required permissions to use the feature. For more information, see permissions.
- Using the default sampling rate or number of audio channels may cause the merge to fail due to compatibility issues with the target audio container format.
- You can merge up to 11 audio files at a time.
Parameters
Action: audio/concat
The following tables describe the parameters.
Merging parameters
The sequence of the pre and sur parameters in the request string determines the merge order for audio/concat:
- /pre: The audio file to be prepended.
- /sur: The audio file to be appended.
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| ss | int | No | The start time of the audio clip to merge, in milliseconds. Valid values: |
| t | int | No | The duration of the prepended or appended audio clip to merge, in milliseconds. Valid values: |
| o | string | Yes | An OSS object in the current bucket. The object name must be URL-safe Base64 encoded. |
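As an illustration of how the pre and sur segments described above fit together, the following minimal Python sketch builds one segment string at a time. The helper names `encode_name` and `segment` are hypothetical and are not part of any OSS SDK. The sketch strips the `=` padding from the encoded name, which mirrors what the Java and Python SDK samples in this topic do.

```python
import base64
from typing import Optional


def encode_name(name: str) -> str:
    """URL-safe Base64 encode an object name and drop '=' padding.

    Dropping the padding follows the SDK samples in this topic; it is an
    assumption that the service expects exactly this form.
    """
    encoded = base64.urlsafe_b64encode(name.encode("utf-8")).decode("ascii")
    return encoded.rstrip("=")


def segment(kind: str, name: str, ss: Optional[int] = None, t: Optional[int] = None) -> str:
    """Build one '/pre,...' or '/sur,...' segment (hypothetical helper).

    ss and t are in milliseconds and are omitted when not given.
    """
    if kind not in ("pre", "sur"):
        raise ValueError("kind must be 'pre' or 'sur'")
    parts = [f"o_{encode_name(name)}"]
    if ss is not None:
        parts.append(f"ss_{ss}")
    if t is not None:
        parts.append(f"t_{t}")
    return f"/{kind}," + ",".join(parts)


# Segments are concatenated in the order in which the files should be merged.
segments = segment("pre", "pre1.mp3") + segment("pre", "pre2.wav", t=2000)
print(segments)
```

Because the merge order follows the order of the segments in the string, the caller controls the result simply by choosing the order in which the `segment` calls are concatenated.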
Transcoding parameters
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| ss | int | No | The start time for transcoding the main audio file, in milliseconds. Valid values: |
| t | int | No | The duration for transcoding the main audio file, in milliseconds. Valid values: |
| f | string | Yes | Audio container format: |
| ar | int | No | The audio sampling rate. By default, this matches the rate of the source audio file specified by align. Valid values: Note: Supported sampling rates vary by format. mp3 supports 48 kHz and lower; opus supports 8 kHz, 12 kHz, 16 kHz, 24 kHz, and 48 kHz; ac3 supports 32 kHz, 44.1 kHz, and 48 kHz; amr supports only 8 kHz and 16 kHz. |
| ac | int | No | The number of audio channels. By default, this matches the channel count of the source audio file specified by the align parameter. Valid values: 1 to 8. Note: The supported number of channels varies by format. mp3 supports mono and stereo; ac3 supports up to 6 channels (5.1); amr supports only mono. |
| aq | int | No | Audio compression quality. Valid values: 0 to 100. Note: This parameter is mutually exclusive with ab. If neither is set, the encoder uses its default bitrate. |
| ab | int | No | Audio bitrate, in bit/s (bps). Valid values: 1000 to 10000000. |
| abopt | string | No | Audio bitrate option. Valid values: Note: This parameter must be used with the ab parameter. |
| align | int | No | The index of the main audio file in the merge list. Default transcoding parameters are sourced from this file. The default value is 0, which indicates the first audio file in the merge list. |
| adepth | int | No | Audio sampling bit depth. Valid values: 16 and 24. Note: This parameter takes effect only when f is set to flac. |
Audio merging also uses the sys/saveas and notify parameters. For more information, see save as and Message notification.
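The format-specific limits in the transcoding parameter notes can be checked on the client before a task is submitted. The following Python sketch is one possible way to do that; the function `check_transcoding` and the lookup tables are hypothetical, and they cover only the formats and ranges that the notes above mention. The service remains the authority on what it accepts.

```python
from typing import List, Optional

# Limits transcribed from the notes in the transcoding parameter table.
# Formats that the notes do not mention are left unchecked here.
ALLOWED_SAMPLE_RATES = {
    "opus": {8000, 12000, 16000, 24000, 48000},
    "ac3": {32000, 44100, 48000},
    "amr": {8000, 16000},
}
MAX_SAMPLE_RATE = {"mp3": 48000}
MAX_CHANNELS = {"mp3": 2, "ac3": 6, "amr": 1}


def check_transcoding(
    f: str,
    ar: Optional[int] = None,
    ac: Optional[int] = None,
    aq: Optional[int] = None,
    ab: Optional[int] = None,
    abopt: Optional[str] = None,
) -> List[str]:
    """Return the problems found; an empty list means none were detected."""
    problems = []
    if ar is not None:
        if f in ALLOWED_SAMPLE_RATES and ar not in ALLOWED_SAMPLE_RATES[f]:
            problems.append(f"ar_{ar} is not a supported sampling rate for {f}")
        if f in MAX_SAMPLE_RATE and ar > MAX_SAMPLE_RATE[f]:
            problems.append(f"ar_{ar} exceeds the maximum sampling rate for {f}")
    if ac is not None:
        if not 1 <= ac <= 8:
            problems.append("ac must be between 1 and 8")
        elif ac > MAX_CHANNELS.get(f, 8):
            problems.append(f"ac_{ac} exceeds the channel limit for {f}")
    if aq is not None and ab is not None:
        problems.append("aq and ab are mutually exclusive")
    if aq is not None and not 0 <= aq <= 100:
        problems.append("aq must be between 0 and 100")
    if ab is not None and not 1000 <= ab <= 10000000:
        problems.append("ab must be between 1000 and 10000000")
    if abopt is not None and ab is None:
        problems.append("abopt must be used together with ab")
    return problems


# The AAC settings used in the examples in this topic pass these checks.
print(check_transcoding("aac", ar=44100, ac=1, ab=96000))
# An amr target with a 44.1 kHz stereo request is flagged twice.
print(check_transcoding("amr", ar=44100, ac=2))
```

Setting ar and ac explicitly and validating them in this way is one way to avoid the failures that the usage notes attribute to default sampling rates and channel counts.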
REST API
Merge audio to AAC
- Audio files: pre1.mp3, pre2.wav, example.oga, sur1.aac, sur2.wma
- Merge duration and order:

| Audio name | Order | Duration |
| --- | --- | --- |
| pre1.mp3 | 1 | Entire audio |
| pre2.wav | 2 | First 2 seconds |
| example.oga | 3 | Entire audio |
| sur1.aac | 4 | 4s to 10s |
| sur2.wma | 5 | 10s to end |

- Transcoding completion notification: Send a Message Service (MNS) message.
- Output audio specifications
  - Audio format: aac
  - Audio profile: 44.1 kHz sampling rate, mono
  - Audio bitrate: 96 kbps
- Object storage path
  - AAC file: oss://outbucket/outobj.aac
Sample request
// Perform audio merging on the example.oga object.
POST /example.oga?x-oss-async-process HTTP/1.1
Host: video-demo.oss-cn-hangzhou.aliyuncs.com
Date: Fri, 28 Oct 2022 06:40:10 GMT
Authorization: OSS4-HMAC-SHA256 Credential=LTAI********************/20250417/cn-hangzhou/oss/aliyun_v4_request,Signature=a7c3554c729d71929e0b84489addee6b2e8d5cb48595adfc51868c299c0c218e

x-oss-async-process=audio/concat,f_aac,ac_1,ar_44100,ab_96000,align_2/pre,o_cHJlMS5tcDMK/pre,o_cHJlMi53YXYK,t_2000/sur,o_c3VyMS5hYWMK,ss_4000,t_10000/sur,o_c3VyMi53bWEK,ss_10000|sys/saveas,b_b3V0YnVja2V0,o_b3V0b2JqLnthdXRvZXh0fQo/notify,topic_QXVkaW9Db252ZXJ0
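To see which objects, bucket, and topic a request string such as the one above refers to, the Base64 fields can be decoded. The following Python sketch is a hypothetical inspection helper, not part of any OSS SDK. It assumes that only the o, b, and topic values are Base64 encoded. Some of the sample values decode with a trailing newline character, so the sketch strips surrounding whitespace before returning each value.

```python
import base64
from typing import List, Tuple


def decode_value(value: str) -> str:
    """Decode URL-safe Base64, restoring any '=' padding that was stripped."""
    padded = value + "=" * (-len(value) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8").strip()


def decode_fields(process: str) -> List[Tuple[str, str]]:
    """Return (key, decoded value) pairs for the o_, b_, and topic_ fields."""
    decoded = []
    for action in process.split("|"):        # audio/concat..., sys/saveas...
        for part in action.split("/"):       # pre, sur, saveas, notify segments
            for param in part.split(","):
                key, sep, value = param.partition("_")
                if sep and key in ("o", "b", "topic"):
                    decoded.append((key, decode_value(value)))
    return decoded


sample = (
    "audio/concat,f_aac,ac_1,ar_44100,ab_96000,align_2"
    "/pre,o_cHJlMS5tcDMK/pre,o_cHJlMi53YXYK,t_2000"
    "/sur,o_c3VyMS5hYWMK,ss_4000,t_10000/sur,o_c3VyMi53bWEK,ss_10000"
    "|sys/saveas,b_b3V0YnVja2V0,o_b3V0b2JqLnthdXRvZXh0fQo"
    "/notify,topic_QXVkaW9Db252ZXJ0"
)
for key, value in decode_fields(sample):
    print(key, value)
```

Splitting on `|`, `/`, and `,` is safe for this purpose because none of those characters occur in the URL-safe Base64 alphabet.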
OSS SDKs
You can perform audio merging by using asynchronous processing with the OSS SDK for Java, Python, or Go.
Prerequisites
- Ensure that the OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET environment variables are set.
- Specify the bucket name, for example, examplebucket.
- Specify the name of the output audio file, for example, dest.aac.
- Specify the names of the source audio files to merge, for example, src1.mp3 and src2.mp3.
Java
OSS SDK for Java 3.17.4 or later is required.
import com.aliyun.oss.ClientBuilderConfiguration;
import com.aliyun.oss.OSS;
import com.aliyun.oss.OSSClientBuilder;
import com.aliyun.oss.common.auth.CredentialsProviderFactory;
import com.aliyun.oss.common.auth.EnvironmentVariableCredentialsProvider;
import com.aliyun.oss.common.comm.SignVersion;
import com.aliyun.oss.model.AsyncProcessObjectRequest;
import com.aliyun.oss.model.AsyncProcessObjectResult;
import com.aliyuncs.exceptions.ClientException;

import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Demo {
    public static void main(String[] args) throws ClientException {
        // Set endpoint to the endpoint of the region where the bucket is located.
        String endpoint = "https://oss-cn-hangzhou.aliyuncs.com";
        // Specify the Alibaba Cloud region ID, for example, cn-hangzhou.
        String region = "cn-hangzhou";
        // Obtain access credentials from environment variables. Before running this sample code, make sure the OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET environment variables are set.
        EnvironmentVariableCredentialsProvider credentialsProvider = CredentialsProviderFactory.newEnvironmentVariableCredentialsProvider();
        // Specify the bucket name.
        String bucketName = "examplebucket";
        // Specify the name of the output audio file.
        String targetAudio = "dest.aac";
        // Specify the names of the audio files to be merged.
        String audio1 = "src1.mp3";
        String audio2 = "src2.mp3";

        // Create an OSSClient instance.
        // When you are finished, shut down the OSSClient to release resources.
        ClientBuilderConfiguration clientBuilderConfiguration = new ClientBuilderConfiguration();
        clientBuilderConfiguration.setSignatureVersion(SignVersion.V4);
        OSS ossClient = OSSClientBuilder.create()
                .endpoint(endpoint)
                .credentialsProvider(credentialsProvider)
                .clientConfiguration(clientBuilderConfiguration)
                .region(region)
                .build();

        try {
            // Build the style string for audio processing and the audio merging parameters.
            String audio1Encoded = Base64.getUrlEncoder().encodeToString(audio1.getBytes(StandardCharsets.UTF_8)).replace("=", "");
            String audio2Encoded = Base64.getUrlEncoder().encodeToString(audio2.getBytes(StandardCharsets.UTF_8)).replace("=", "");
            String style = String.format("audio/concat,f_aac,ac_1,ar_44100,ab_96000,align_2/pre,o_%s/pre,o_%s,t_0", audio1Encoded, audio2Encoded);
            // Build the asynchronous processing instruction.
            String bucketEncoded = Base64.getUrlEncoder().encodeToString(bucketName.getBytes(StandardCharsets.UTF_8)).replace("=", "");
            String targetEncoded = Base64.getUrlEncoder().encodeToString(targetAudio.getBytes(StandardCharsets.UTF_8)).replace("=", "");
            String process = String.format("%s|sys/saveas,b_%s,o_%s/notify,topic_QXVkaW9Db252ZXJ0", style, bucketEncoded, targetEncoded);
            // Create an AsyncProcessObjectRequest object.
            AsyncProcessObjectRequest request = new AsyncProcessObjectRequest(bucketName, audio1, process);
            // Execute the asynchronous processing task.
            AsyncProcessObjectResult response = ossClient.asyncProcessObject(request);
            System.out.println("EventId: " + response.getEventId());
            System.out.println("RequestId: " + response.getRequestId());
            System.out.println("TaskId: " + response.getTaskId());
        } finally {
            // Shut down the OSSClient.
            ossClient.shutdown();
        }
    }
}
Python
OSS SDK for Python 2.18.4 or later is required.
# -*- coding: utf-8 -*-
import base64

import oss2
from oss2.credentials import EnvironmentVariableCredentialsProvider


def main():
    # Obtain access credentials from environment variables. Before running this sample code, make sure the OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET environment variables are set.
    auth = oss2.ProviderAuthV4(EnvironmentVariableCredentialsProvider())
    # Set endpoint to the endpoint of the region where the bucket is located. For example, for China (Hangzhou), set the endpoint to https://oss-cn-hangzhou.aliyuncs.com.
    endpoint = 'https://oss-cn-hangzhou.aliyuncs.com'
    # Specify the Alibaba Cloud region ID, for example, cn-hangzhou.
    region = 'cn-hangzhou'
    # Specify the bucket name, for example, examplebucket.
    bucket = oss2.Bucket(auth, endpoint, 'examplebucket', region=region)
    # Specify the name of the output audio file.
    target_audio = 'dest.aac'
    # Specify the names of the audio files to merge.
    audio1 = 'src1.mp3'
    audio2 = 'src2.mp3'

    # Build the style string for audio processing and the audio merging parameters.
    audio1_encoded = base64.urlsafe_b64encode(audio1.encode()).decode().rstrip('=')
    audio2_encoded = base64.urlsafe_b64encode(audio2.encode()).decode().rstrip('=')
    style = f"audio/concat,f_aac,ac_1,ar_44100,ab_96000,align_2/pre,o_{audio1_encoded}/pre,o_{audio2_encoded},t_0"
    # Build the asynchronous processing instruction.
    bucket_encoded = base64.urlsafe_b64encode(bucket.bucket_name.encode()).decode().rstrip('=')
    target_encoded = base64.urlsafe_b64encode(target_audio.encode()).decode().rstrip('=')
    process = f"{style}|sys/saveas,b_{bucket_encoded},o_{target_encoded}/notify,topic_QXVkaW9Db252ZXJ0"
    print(process)

    # Execute the asynchronous processing task.
    try:
        result = bucket.async_process_object(audio1, process)
        print(f"EventId: {result.event_id}")
        print(f"RequestId: {result.request_id}")
        print(f"TaskId: {result.task_id}")
    except Exception as e:
        print(f"Error: {e}")


if __name__ == "__main__":
    main()
Go
OSS SDK for Go 3.0.2 or later is required.
package main

import (
	"encoding/base64"
	"fmt"
	"log"
	"os"

	"github.com/aliyun/aliyun-oss-go-sdk/oss"
)

func main() {
	// Obtain access credentials from environment variables. Before running this sample code, make sure the OSS_ACCESS_KEY_ID, OSS_ACCESS_KEY_SECRET, and OSS_SESSION_TOKEN environment variables are set.
	provider, err := oss.NewEnvironmentVariableCredentialsProvider()
	if err != nil {
		fmt.Println("Error:", err)
		os.Exit(-1)
	}
	// Create an OSSClient instance.
	// Set endpoint to the endpoint of the region where your bucket is located. For example, for China (Hangzhou), set it to https://oss-cn-hangzhou.aliyuncs.com.
	// Set region to the Alibaba Cloud region ID, for example, cn-hangzhou.
	client, err := oss.New("https://oss-cn-hangzhou.aliyuncs.com", "", "", oss.SetCredentialsProvider(&provider), oss.AuthVersion(oss.AuthV4), oss.Region("cn-hangzhou"))
	if err != nil {
		fmt.Println("Error:", err)
		os.Exit(-1)
	}
	// Specify the bucket name, for example, examplebucket.
	bucketName := "examplebucket"
	bucket, err := client.Bucket(bucketName)
	if err != nil {
		fmt.Println("Error:", err)
		os.Exit(-1)
	}
	// Specify the names of the audio files to be merged.
	audio1 := "src1.mp3"
	audio2 := "src2.mp3"
	// Specify the name of the output audio file.
	targetAudio := "dest.aac"

	// Build the style string for audio processing and the audio merging parameters.
	// RawURLEncoding omits the "=" padding, which matches the Java and Python samples.
	audio1Encoded := base64.RawURLEncoding.EncodeToString([]byte(audio1))
	audio2Encoded := base64.RawURLEncoding.EncodeToString([]byte(audio2))
	style := fmt.Sprintf("audio/concat,f_aac,ac_1,ar_44100,ab_96000,align_2/pre,o_%s/pre,o_%s,t_0", audio1Encoded, audio2Encoded)
	// Build the asynchronous processing instruction.
	bucketEncoded := base64.RawURLEncoding.EncodeToString([]byte(bucketName))
	targetEncoded := base64.RawURLEncoding.EncodeToString([]byte(targetAudio))
	process := fmt.Sprintf("%s|sys/saveas,b_%s,o_%s/notify,topic_QXVkaW9Db252ZXJ0", style, bucketEncoded, targetEncoded)

	// Execute the asynchronous processing task.
	result, err := bucket.AsyncProcessObject(audio1, process)
	if err != nil {
		log.Fatalf("Failed to async process object: %s", err)
	}
	fmt.Printf("EventId: %s\n", result.EventId)
	fmt.Printf("RequestId: %s\n", result.RequestId)
	fmt.Printf("TaskId: %s\n", result.TaskId)
}