Combine multiple videos stored in an OSS bucket into a single output file with transcoding control. The operation is submitted as an asynchronous task via the OSS SDK or REST API.

Use cases
Film and video production: Stitch shots and scenes from different clips into a complete narrative sequence.
Short-form content: Combine clips into vlogs, tutorials, or themed videos for social media platforms.
Education and training: Merge theory and demonstration clips into a single instructional video.
Sports highlights: Assemble highlight reels from multiple footage segments.
How it works
The video/concat action concatenates videos in the order that /pre and /sur segments appear in the processing string, then transcodes the result to the specified format.
All examples in this topic use the x-oss-async-process header to submit the task as an asynchronous job.
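
The ordering rule can be sketched in Python. This is a minimal illustration, not SDK code: the helper names `encode_name`, `decode_name`, and `playback_order` are hypothetical, and the sketch assumes that every /pre clip is placed before the source video and every /sur clip after it, each group in the order it appears in the processing string.

```python
import base64


def encode_name(name: str) -> str:
    """Base64 URL-safe encode an object name and drop the '=' padding."""
    return base64.urlsafe_b64encode(name.encode("utf-8")).decode("ascii").rstrip("=")


def decode_name(encoded: str) -> str:
    """Reverse encode_name by restoring the padding before decoding."""
    padded = encoded + "=" * (-len(encoded) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8")


def playback_order(source_object: str, process: str) -> list:
    """List the clips in the order they are assumed to appear in the output."""
    # Drop everything after the style string (sys/saveas, notify).
    style = process.split("|", 1)[0]
    parts = style.split("/")
    if len(parts) < 2 or parts[0] != "video" or not parts[1].startswith("concat"):
        raise ValueError("not a video/concat processing string")
    pre, sur = [], []
    for segment in parts[2:]:
        kind, *params = segment.split(",")
        # Split each parameter on the first "_" only, because encoded
        # object names may themselves contain "_".
        fields = dict(p.split("_", 1) for p in params)
        if kind == "pre":
            pre.append(decode_name(fields["o"]))
        elif kind == "sur":
            sur.append(decode_name(fields["o"]))
    return pre + [source_object] + sur


style = (
    "video/concat,ss_0,f_mp4,vcodec_h264,acodec_aac"
    f"/pre,o_{encode_name('concat1.mp4')}/sur,o_{encode_name('concat2.mp4')},t_0"
)
print(playback_order("src.mp4", style))
```

The source video does not appear in the processing string. It is the object that the request targets, which is why the sketch takes it as a separate argument.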
Prerequisites
Before you begin, ensure that you have:
Activated Intelligent Media Management (IMM). For details, see Activate a product.
Attached an IMM project to the bucket. To attach via the OSS console, see Step 1: Attach an IMM project. To attach via API, see AttachOSSBucket.
Limitations
Review these constraints before submitting a task:
Asynchronous only: Video concatenation supports only asynchronous processing (x-oss-async-process). Synchronous requests are not supported.
Maximum input videos: A single task can concatenate up to 11 videos.
Anonymous access: Requests from anonymous users are denied.
Container compatibility: When using default audio sample rates or audio channel counts, concatenation may fail if those defaults are incompatible with the target container. Specify explicit values for ar and ac to avoid this.
Codec and container restrictions: See Transcoding parameters for codec-to-container compatibility rules.
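
Two of these constraints can be checked on the client before a task is submitted. The following Python sketch is illustrative only: `preflight_problems` is a hypothetical helper, and it assumes that the source video counts toward the limit of 11 videos, which the limit itself does not spell out.

```python
MAX_VIDEOS_PER_TASK = 11


def preflight_problems(pre_clips, sur_clips, params):
    """Return a list of problems found before submitting a concatenation task."""
    problems = []
    # Assumption: the source video counts toward the limit of 11.
    total = len(pre_clips) + len(sur_clips) + 1
    if total > MAX_VIDEOS_PER_TASK:
        problems.append(f"{total} videos exceed the limit of {MAX_VIDEOS_PER_TASK}")
    # Explicit ar and ac values avoid defaults that may not suit the container.
    for key in ("ar", "ac"):
        if key not in params:
            problems.append(f"{key} is not set; the default may not suit the target container")
    return problems


print(preflight_problems(["intro.mp4"], ["outro.mp4"], {"f": "mp4", "ar": 48000, "ac": 2}))
```

An empty list means that neither check found a problem. It does not guarantee that the service accepts the task.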
Concatenate videos
Use Java SDK 3.17.4 or later, Python SDK 2.18.4 or later, or Go SDK 3.0.2 or later.
Java
import com.aliyun.oss.ClientBuilderConfiguration;
import com.aliyun.oss.OSS;
import com.aliyun.oss.OSSClientBuilder;
import com.aliyun.oss.common.auth.CredentialsProviderFactory;
import com.aliyun.oss.common.auth.EnvironmentVariableCredentialsProvider;
import com.aliyun.oss.common.comm.SignVersion;
import com.aliyun.oss.model.AsyncProcessObjectRequest;
import com.aliyun.oss.model.AsyncProcessObjectResult;
import com.aliyuncs.exceptions.ClientException;
import java.util.Base64;

public class Demo {
    public static void main(String[] args) throws ClientException {
        // Replace with the endpoint for the region where your bucket is located.
        String endpoint = "https://oss-cn-hangzhou.aliyuncs.com";
        // Specify the region ID, such as cn-hangzhou.
        String region = "cn-hangzhou";
        // Read credentials from environment variables OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET.
        EnvironmentVariableCredentialsProvider credentialsProvider = CredentialsProviderFactory.newEnvironmentVariableCredentialsProvider();
        // Specify the bucket name.
        String bucketName = "examplebucket";
        // Specify the output video name.
        String targetObject = "dest.mp4";
        // Specify the source video (the main file used as the transcoding base).
        String sourceVideo = "src.mp4";
        // Specify the videos to prepend and append.
        String video1 = "concat1.mp4";
        String video2 = "concat2.mp4";

        ClientBuilderConfiguration clientBuilderConfiguration = new ClientBuilderConfiguration();
        clientBuilderConfiguration.setSignatureVersion(SignVersion.V4);
        OSS ossClient = OSSClientBuilder.create()
                .endpoint(endpoint)
                .credentialsProvider(credentialsProvider)
                .clientConfiguration(clientBuilderConfiguration)
                .region(region)
                .build();

        try {
            // Base64 URL-safe encode the video object names (required by the /pre and /sur parameters).
            String video1Encoded = Base64.getUrlEncoder().withoutPadding().encodeToString(video1.getBytes());
            String video2Encoded = Base64.getUrlEncoder().withoutPadding().encodeToString(video2.getBytes());
            // Build the processing style string:
            // - /pre,o_<encoded>: prepend video1 to the beginning
            // - /sur,o_<encoded>,t_0: append video2 to the end (t_0 means use the full clip)
            // - Other parameters control the output encoding.
            String style = String.format(
                    "video/concat,ss_0,f_mp4,vcodec_h264,fps_25,vb_1000000,acodec_aac,ab_96000,ar_48000,ac_2,align_1" +
                            "/pre,o_%s/sur,o_%s,t_0",
                    video1Encoded, video2Encoded
            );
            // Append save-as and notification instructions.
            String bucketEncoded = Base64.getUrlEncoder().withoutPadding().encodeToString(bucketName.getBytes());
            String targetEncoded = Base64.getUrlEncoder().withoutPadding().encodeToString(targetObject.getBytes());
            String process = String.format(
                    "%s|sys/saveas,b_%s,o_%s/notify,topic_QXVkaW9Db252ZXJ0",
                    style, bucketEncoded, targetEncoded
            );
            // Submit the asynchronous task.
            AsyncProcessObjectRequest request = new AsyncProcessObjectRequest(bucketName, sourceVideo, process);
            AsyncProcessObjectResult response = ossClient.asyncProcessObject(request);
            System.out.println("EventId: " + response.getEventId());
            System.out.println("RequestId: " + response.getRequestId());
            System.out.println("TaskId: " + response.getTaskId());
        } finally {
            ossClient.shutdown();
        }
    }
}
Python
# -*- coding: utf-8 -*-
import base64
import oss2
from oss2.credentials import EnvironmentVariableCredentialsProvider


def main():
    # Read credentials from environment variables OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET.
    auth = oss2.ProviderAuthV4(EnvironmentVariableCredentialsProvider())
    # Replace with the endpoint for the region where your bucket is located.
    endpoint = 'https://oss-cn-hangzhou.aliyuncs.com'
    region = 'cn-hangzhou'
    bucket = oss2.Bucket(auth, endpoint, 'examplebucket', region=region)
    # Specify the output video name.
    target_object = 'out.mp4'
    # Specify the source video (the main file used as the transcoding base).
    source_video = 'emrfinal.mp4'
    # Specify the videos to prepend and append.
    video1 = 'osshdfs.mp4'
    video2 = 'product.mp4'
    # Base64 URL-safe encode the video object names (required by the /pre and /sur parameters).
    video1_encoded = base64.urlsafe_b64encode(video1.encode()).decode().rstrip('=')
    video2_encoded = base64.urlsafe_b64encode(video2.encode()).decode().rstrip('=')
    # Build the processing style string:
    # - /pre,o_<encoded>: prepend video1 to the beginning
    # - /sur,o_<encoded>,t_0: append video2 to the end (t_0 means use the full clip)
    # - Other parameters control the output encoding.
    style = (
        "video/concat,ss_0,f_mp4,vcodec_h264,fps_25,vb_1000000,acodec_aac,ab_96000,ar_48000,ac_2,align_1"
        f"/pre,o_{video1_encoded}/sur,o_{video2_encoded},t_0"
    )
    # Append save-as and notification instructions.
    bucket_encoded = base64.urlsafe_b64encode('examplebucket'.encode()).decode().rstrip('=')
    target_encoded = base64.urlsafe_b64encode(target_object.encode()).decode().rstrip('=')
    process = f"{style}|sys/saveas,b_{bucket_encoded},o_{target_encoded}/notify,topic_QXVkaW9Db252ZXJ0"
    # Submit the asynchronous task.
    try:
        result = bucket.async_process_object(source_video, process)
        print(f"EventId: {result.event_id}")
        print(f"RequestId: {result.request_id}")
        print(f"TaskId: {result.task_id}")
    except Exception as e:
        print(f"Error: {e}")


if __name__ == "__main__":
    main()
Go
package main

import (
	"encoding/base64"
	"fmt"
	"os"
	"strings"

	"github.com/aliyun/aliyun-oss-go-sdk/oss"
)

func main() {
	// Read credentials from environment variables OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET.
	provider, err := oss.NewEnvironmentVariableCredentialsProvider()
	if err != nil {
		fmt.Println("Error:", err)
		os.Exit(-1)
	}
	// Replace yourEndpoint with the endpoint for the region where your bucket is located,
	// and yourRegion with the corresponding region ID (e.g., cn-hangzhou).
	client, err := oss.New(
		"yourEndpoint", "", "",
		oss.SetCredentialsProvider(&provider),
		oss.AuthVersion(oss.AuthV4),
		oss.Region("yourRegion"),
	)
	if err != nil {
		fmt.Println("Error:", err)
		os.Exit(-1)
	}
	bucketName := "examplebucket"
	bucket, err := client.Bucket(bucketName)
	if err != nil {
		fmt.Println("Error:", err)
		os.Exit(-1)
	}
	// Specify the output video name.
	targetObject := "dest.mp4"
	// Specify the source video (the main file used as the transcoding base).
	sourcevideo := "src.mp4"
	// Specify the videos to prepend and append.
	video1 := "concat1.mp4"
	video2 := "concat2.mp4"
	// Base64 URL-safe encode the video object names (required by the /pre and /sur parameters).
	encode := func(s string) string {
		return strings.TrimRight(base64.URLEncoding.EncodeToString([]byte(s)), "=")
	}
	// Build the processing style string:
	// - /pre,o_<encoded>: prepend video1 to the beginning
	// - /sur,o_<encoded>,t_0: append video2 to the end (t_0 means use the full clip)
	// - Other parameters control the output encoding.
	style := fmt.Sprintf(
		"video/concat,ss_0,f_mp4,vcodec_h264,fps_25,vb_1000000,acodec_aac,ab_96000,ar_48000,ac_2,align_1/pre,o_%s/sur,o_%s,t_0",
		encode(video1), encode(video2),
	)
	// Append save-as and notification instructions.
	process := fmt.Sprintf(
		"%s|sys/saveas,b_%s,o_%s/notify,topic_QXVkaW9Db252ZXJ0",
		style, encode(bucketName), encode(targetObject),
	)
	// Submit the asynchronous task.
	rs, err := bucket.AsyncProcessObject(sourcevideo, process)
	if err != nil {
		fmt.Println("Error:", err)
		os.Exit(-1)
	}
	fmt.Printf("EventId: %s\n", rs.EventId)
	fmt.Printf("RequestId: %s\n", rs.RequestId)
	fmt.Printf("TaskId: %s\n", rs.TaskId)
}
The response contains a TaskId. The API does not return the processing result directly. To receive a completion notification, configure Simple Message Queue (SMQ), formerly known as MNS. For details, see Message notifications.
For details on the sys/saveas and notify parameters, see Save as and Message notifications.
Parameter reference
Clip parameters
Clip parameters are set per /pre or /sur segment. They control which portion of each clip is used before concatenation.
| Parameter | Type | Required | Description |
|---|---|---|---|
| o | string | Yes | The OSS object name of the clip. Must be Base64 URL-safe encoded. |
| ss | int | No | Start offset within the clip, in milliseconds. 0 (default) starts from the beginning. |
| t | int | No | Duration to use from the clip, in milliseconds. 0 (default) uses the clip to the end. |
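
A clip segment can be assembled from these three parameters. The Python sketch below is a minimal illustration under stated assumptions: `encode_object_name` and `clip_segment` are hypothetical helper names, padding is stripped in the same way as in the SDK samples, and ss and t are omitted when they equal the default of 0.

```python
import base64


def encode_object_name(name: str) -> str:
    """Base64 URL-safe encode an object name and strip the '=' padding."""
    return base64.urlsafe_b64encode(name.encode("utf-8")).decode("ascii").rstrip("=")


def clip_segment(kind: str, object_name: str, ss: int = 0, t: int = 0) -> str:
    """Build one /pre or /sur segment; ss and t are in milliseconds."""
    if kind not in ("pre", "sur"):
        raise ValueError("kind must be 'pre' or 'sur'")
    if ss < 0 or t < 0:
        raise ValueError("ss and t must not be negative")
    parts = [kind, "o_" + encode_object_name(object_name)]
    if ss:
        parts.append(f"ss_{ss}")  # skip the first ss milliseconds of the clip
    if t:
        parts.append(f"t_{t}")    # use only t milliseconds of the clip
    return "/" + ",".join(parts)


print(clip_segment("pre", "pre.mov"))
print(clip_segment("sur", "sur.mov", t=10000))
```

Encode the exact object name. A trailing newline, which command-line tools often add, changes the encoded value and therefore points to a different object name.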
Transcoding parameters
Transcoding parameters are set at the action level (video/concat,...) and control the output file encoding.
Video
| Parameter | Type | Required | Description |
|---|---|---|---|
| ss | int | No | Transcoding start time for the source video, in milliseconds. 0 (default) starts from the beginning. |
| t | int | No | Transcoding duration for the source video, in milliseconds. 0 (default) continues to the end. |
| f | string | Yes | Output container format. Valid values: mp4, mkv, mov, asf, avi, mxf, ts, flv, webm. |
| vcodec | string | Yes | Video codec. Valid values: h264, h265, vp9. mxf and flv do not support h265. |
| vn | int | No | Disable the video stream. 0 (default) keeps the video stream. 1 disables it. |
| fps | float | No | Output frame rate. Range: 0–240. Defaults to the frame rate of the source video selected by align. |
| fpsopt | int | No | Frame rate option when a source clip's frame rate is less than fps. 0: always use target fps. 1: use the lowest source fps. 2: fail the task. Requires fps. |
| pixfmt | string | No | Pixel format. Defaults to the format of the source video selected by align. Valid values: yuv420p, yuva420p, yuv420p10le, yuv422p, yuv422p10le, yuv444p, yuv444p10le. |
| s | string | No | Output resolution in w x h format. Width and height must be multiples of 2, in the range 64–4096. Example: 1920x1080. |
| sopt | int | No | Resolution option when a source clip's resolution is smaller than s. 0: always use target resolution. 1: use the smallest source resolution. 2: fail the task. Requires s. |
| scaletype | string | No | Scaling method. stretch (default): stretch to fill. crop: scale and crop. fill: scale and pad with black bars. fit: scale to fit, no padding. |
| arotate | int | No | Auto-rotate resolution based on video orientation metadata. 0 (default): off. 1: on. |
| g | int | No | Group of Pictures (GOP) size. Default: 150. Range: 1–100000. |
| vb | int | No | Video bitrate, in bps. Range: 10000–100000000. Mutually exclusive with crf. |
| vbopt | int | No | Video bitrate option when a source clip's bitrate is less than vb. 0: always use target bitrate. 1: use the lowest source bitrate. 2: fail the task. Requires vb. |
| crf | float | No | Constant rate factor (CRF). Range: 0–51. Lower values produce higher quality. Recommended range: 18–38. Mutually exclusive with vb. |
| maxrate | int | No | Maximum bitrate, in bps. Default: 0. Range: 10000–100000000. Requires crf. |
| bufsize | int | No | Buffer size, in bits. Default: 0. Range: 10000–200000000. Requires crf. |
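
Several of these parameters depend on or exclude one another. The Python sketch below shows how a client might check those rules before building the processing string. It is illustrative only: `check_video_params` is a hypothetical helper that covers a subset of the table, and the service remains the authority on what it accepts.

```python
def check_video_params(p: dict) -> list:
    """Check a dict of video parameters against the rules in the table above."""
    errors = []
    if "vb" in p and "crf" in p:
        errors.append("vb and crf are mutually exclusive")
    # Options that are only meaningful when another parameter is set.
    requires = {"maxrate": "crf", "bufsize": "crf", "fpsopt": "fps", "sopt": "s", "vbopt": "vb"}
    for option, needed in requires.items():
        if option in p and needed not in p:
            errors.append(f"{option} requires {needed}")
    # Numeric ranges taken from the table.
    ranges = {"fps": (0, 240), "g": (1, 100000), "vb": (10000, 100000000), "crf": (0, 51)}
    for name, (low, high) in ranges.items():
        if name in p and not low <= p[name] <= high:
            errors.append(f"{name} must be in the range {low}-{high}")
    if "s" in p:
        try:
            width, height = (int(v) for v in p["s"].split("x"))
        except ValueError:
            errors.append("s must look like 1920x1080")
        else:
            if any(side % 2 or not 64 <= side <= 4096 for side in (width, height)):
                errors.append("s width and height must be multiples of 2 in the range 64-4096")
    return errors


print(check_video_params({"vb": 1000000, "fps": 25, "s": "1920x1080"}))
print(check_video_params({"vb": 1000000, "crf": 23, "maxrate": 2000000}))
```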
Audio
| Parameter | Type | Required | Description |
|---|---|---|---|
| acodec | string | Yes | Audio codec. Valid values: mp3, aac, flac, vorbis, ac3, opus, pcm. See codec compatibility notes below. |
| an | int | No | Disable the audio stream. 0 (default) keeps the audio stream. 1 disables it. |
| ar | int | No | Audio sample rate, in Hz. Defaults to the sample rate of the source video selected by align. Valid values: 8000, 11025, 12000, 16000, 22050, 24000, 32000, 44100, 48000, 64000, 88200, 96000. |
| ac | int | No | Number of audio channels. Range: 1–8. Defaults to the channel count of the source video selected by align. |
| ab | int | No | Audio bitrate, in bps. Range: 1000–10000000. Mutually exclusive with aq. |
| abopt | string | No | Audio bitrate option when a source clip's audio bitrate is less than ab. 0 (default): always use target bitrate. 1: use the lowest source bitrate. 2: fail the task. Requires ab. |
| aq | int | No | Audio quality. Range: 0–100. Mutually exclusive with ab. |
| adepth | int | No | Audio bit depth. Valid values: 16, 24. Valid only when acodec is flac. |
| align | int | No | Index of the video in the concatenation list to use as the default source for transcoding parameters (frame rate, resolution, sample rate, channel count). Default: 0 (the first clip in the list). |
Codec and container compatibility:
| Container | Unsupported codecs |
|---|---|
| mp4 | pcm |
| mov | flac, opus |
| asf | opus |
| avi | opus |
| mxf | All audio codecs except pcm (audio); h265 (video) |
| ts | flac, vorbis, amr, pcm |
| flv | flac, vorbis, amr, opus, pcm |
| mxf, flv | h265 (video codec) |
Audio codec sample rate and channel constraints:
| Codec | Sample rate constraints | Channel constraints |
|---|---|---|
| mp3 | 48000 Hz and below only | Mono or stereo only |
| opus | 8000, 12000, 16000, 24000, 48000 Hz | No constraint |
| ac3 | 32000, 44100, 48000 Hz | Up to 6 channels (5.1) |
| amr | 8000, 16000 Hz only | Mono only |
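
The two tables above can be turned into a lookup that flags incompatible combinations before a task is submitted. The Python sketch below is a minimal illustration: `compatibility_errors` and the constant names are hypothetical, and the data is copied from the tables, so containers that the tables do not list (such as mkv and webm) are treated as unrestricted.

```python
# Audio codecs that each container does not support (from the first table).
UNSUPPORTED_AUDIO = {
    "mp4": {"pcm"},
    "mov": {"flac", "opus"},
    "asf": {"opus"},
    "avi": {"opus"},
    "ts": {"flac", "vorbis", "amr", "pcm"},
    "flv": {"flac", "vorbis", "amr", "opus", "pcm"},
}
H265_UNSUPPORTED = {"mxf", "flv"}

# Per-codec sample rate and channel limits (from the second table).
ALLOWED_SAMPLE_RATES = {
    "opus": {8000, 12000, 16000, 24000, 48000},
    "ac3": {32000, 44100, 48000},
    "amr": {8000, 16000},
}
MAX_SAMPLE_RATE = {"mp3": 48000}
MAX_CHANNELS = {"mp3": 2, "ac3": 6, "amr": 1}


def compatibility_errors(container, vcodec, acodec, ar=None, ac=None):
    """Return the compatibility rules that a combination violates."""
    errors = []
    if vcodec == "h265" and container in H265_UNSUPPORTED:
        errors.append(f"{container} does not support h265")
    if container == "mxf":
        if acodec != "pcm":
            errors.append("mxf supports only pcm audio")
    elif acodec in UNSUPPORTED_AUDIO.get(container, set()):
        errors.append(f"{container} does not support {acodec}")
    if ar is not None:
        allowed = ALLOWED_SAMPLE_RATES.get(acodec)
        if allowed is not None and ar not in allowed:
            errors.append(f"{acodec} does not support a sample rate of {ar} Hz")
        if ar > MAX_SAMPLE_RATE.get(acodec, ar):
            errors.append(f"{acodec} supports at most {MAX_SAMPLE_RATE[acodec]} Hz")
    if ac is not None and ac > MAX_CHANNELS.get(acodec, ac):
        errors.append(f"{acodec} supports at most {MAX_CHANNELS[acodec]} channel(s)")
    return errors


print(compatibility_errors("mp4", "h264", "aac", ar=48000, ac=2))
print(compatibility_errors("flv", "h265", "opus", ar=44100, ac=2))
```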
Segmenting parameters
Use the /segment action to split the concatenated output into HLS (HTTP Live Streaming) or DASH (Dynamic Adaptive Streaming over HTTP) segments. Segmenting supports only mp4 and ts containers.
| Parameter | Type | Required | Description |
|---|---|---|---|
| f | string | Yes | Segment format. Valid values: hls, dash. |
| t | int | Yes | Segment length, in milliseconds. Range: 0–3600000. |
Example: Concatenate three clips into an MP4 file
This example concatenates three videos — pre.mov (full clip), example.mkv (from the 10-second mark to the end), and sur.mov (first 10 seconds) — and saves the result as oss://outbucket/outobj.mp4.
Output settings: H.264 video at 25 fps with a 1 Mbps bitrate; AAC audio at 48 kHz, stereo (2 channels), with a 96 kbps bitrate.
POST /example.mkv?x-oss-async-process HTTP/1.1
Host: video-demo.oss-cn-hangzhou.aliyuncs.com
Date: Fri, 28 Oct 2022 06:40:10 GMT
Authorization: OSS4-HMAC-SHA256 Credential=LTAI********************/20250417/cn-hangzhou/oss/aliyun_v4_request,Signature=a7c3554c729d71929e0b84489addee6b2e8d5cb48595adfc51868c299c0c218e

x-oss-async-process=video/concat,ss_10000,f_mp4,vcodec_h264,fps_25,vb_1000000,acodec_aac,ab_96000,ar_48000,ac_2,align_1/pre,o_cHJlLm1vdg/sur,o_c3VyLm1vdg,t_10000|sys/saveas,b_b3V0YnVja2V0,o_b3V0b2JqLnthdXRvZXh0fQ/notify,topic_QXVkaW9Db252ZXJ0
How the style string maps to the example:
| Segment | Parameter | Value | Meaning |
|---|---|---|---|
| Action | ss_10000 | 10000 ms | Start the source video (example.mkv) from the 10-second mark |
| Action | align_1 | 1 | Use example.mkv (index 1 in the list) as the transcoding parameter source |
| /pre | o_cHJlLm1vdg | pre.mov (encoded) | Prepend the full pre.mov |
| /sur | o_c3VyLm1vdg | sur.mov (encoded) | Append sur.mov |
| /sur | t_10000 | 10000 ms | Limit sur.mov to the first 10 seconds |
Permissions
An Alibaba Cloud account has all permissions by default. A Resource Access Management (RAM) user or RAM role requires explicit grants via a RAM policy or bucket policy.
| API | Action | When required |
|---|---|---|
| GetObject | oss:GetObject | Always required to read source videos |
| GetObject | oss:GetObjectVersion | Required when accessing a specific object version |
| GetObject | kms:Decrypt | Required when the object uses KMS server-side encryption |
| HeadObject | oss:GetObject | Required to query object metadata |
| PutObject | oss:PutObject | Required to write the concatenated output |
| PutObject | oss:PutObjectTagging | Required when using x-oss-tagging |
| PutObject | kms:GenerateDataKey, kms:Decrypt | Required when the output uses KMS encryption |
| CreateMediaConvertTask | imm:CreateMediaConvertTask | Required to invoke IMM for transcoding |
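
As a starting point, the actions that are always required can be collected into a RAM policy document. The Python sketch below generates one. It is a hedged illustration: `concat_policy` is a hypothetical helper, the OSS resource string follows the usual acs:oss:*:*:bucket/* form, and the wildcard resource for the IMM action is a placeholder to narrow down in production. Add the versioning, tagging, and KMS actions from the table only when they apply.

```python
import json


def concat_policy(bucket: str) -> dict:
    """Build a minimal RAM policy for video concatenation on one bucket."""
    return {
        "Version": "1",
        "Statement": [
            {
                # Read the source videos and write the concatenated output.
                "Effect": "Allow",
                "Action": ["oss:GetObject", "oss:PutObject"],
                "Resource": [f"acs:oss:*:*:{bucket}/*"],
            },
            {
                # Invoke IMM for transcoding. "*" is a placeholder resource.
                "Effect": "Allow",
                "Action": ["imm:CreateMediaConvertTask"],
                "Resource": ["*"],
            },
        ],
    }


print(json.dumps(concat_policy("examplebucket"), indent=2))
```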
Billing
Video concatenation calls both OSS and IMM, generating billable items on both sides.
OSS charges (for pricing, see OSS Pricing):
| API | Billable item | Notes |
|---|---|---|
| GetObject | GET requests | Charged per successful request |
| GetObject | Outbound Internet traffic | Charged when using a public endpoint (e.g., oss-cn-hangzhou.aliyuncs.com) or acceleration endpoint |
| GetObject | Infrequent Access (IA) data retrieval | Charged when retrieving IA objects |
| GetObject | Archive data retrieval | Charged when retrieving Archive objects in a bucket with real-time access enabled |
| GetObject | Transfer acceleration | Charged when transfer acceleration is enabled and an acceleration endpoint is used |
| PutObject | PUT requests | Charged per successful request |
| PutObject | Storage | Charged based on storage class, size, and duration of the output object |
| HeadObject | GET requests | Charged per successful request |
IMM charges (for pricing, see IMM billable items):
| API | Billable item | Calculation |
|---|---|---|
| CreateMediaConvertTask | ApsaraVideo Media Processing | Based on the definition and actual duration (in seconds) of the concatenated output |
What's next
Message notifications: Configure SMQ to receive task completion events.
Save as: Learn how sys/saveas stores the output object.