Wan - Video editing API (2.1) - Alibaba Cloud Model Studio

The Wan 2.1 unified video editing model supports multiple input modalities, including text, images, and videos, for a wide range of video generation and editing tasks.

Related Documentation: user guide

Scope

To ensure successful calls, the model, endpoint URL, and API key must be in the same region. Cross-region calls will fail.

Select a model: Confirm the model's region.
Select a URL: Select the endpoint URL for the model's region. HTTP URLs are supported.
Configure an API key: Select a region, get an API key, and configure it in your environment variables.

Note

The sample code in this topic is for the Singapore region.

Important

Model Studio has released a workspace-specific domain for the Singapore region: https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com. The new dedicated domain delivers superior performance and higher stability for inference requests. We recommend migrating from https://dashscope-intl.aliyuncs.com to the new domain.

{WorkspaceId} is your workspace ID, which can be found on the Workspace Details page in the Model Studio console. The existing domain remains fully functional.

HTTP call

The unified video editing model takes 5 to 10 minutes to process a task, so the API is asynchronous and works in two steps: create a task, then poll for the result.

Step 1: Create a task

Singapore

POST https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis

Replace {WorkspaceId} with your workspace ID.

China (Beijing)

POST https://dashscope.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis
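Because the model, endpoint, and API key must share a region, some clients centralize the endpoint choice in one helper. The following is a minimal sketch, not part of the API: the function name `create_task_url` and the region keys `"singapore"` and `"beijing"` are hypothetical, and the legacy Singapore URL assumes that https://dashscope-intl.aliyuncs.com serves the same path as the workspace-specific domain.

```python
from typing import Optional

# Path of the task-creation endpoint, identical in the regions listed above.
SERVICE_PATH = "/api/v1/services/aigc/video-generation/video-synthesis"


def create_task_url(region: str, workspace_id: Optional[str] = None) -> str:
    """Return the Step 1 endpoint for a region (hypothetical helper).

    region: "singapore" or "beijing" (illustrative keys, not API values).
    """
    if region == "singapore":
        if workspace_id:
            # Recommended workspace-specific domain.
            return f"https://{workspace_id}.ap-southeast-1.maas.aliyuncs.com{SERVICE_PATH}"
        # Assumption: the legacy shared domain exposes the same path.
        return f"https://dashscope-intl.aliyuncs.com{SERVICE_PATH}"
    if region == "beijing":
        return f"https://dashscope.aliyuncs.com{SERVICE_PATH}"
    raise ValueError(f"unsupported region: {region!r}")
```

Pair the returned URL with an API key from the same region; a Singapore key does not work against the China (Beijing) URL.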

Request parameters

Multi-image reference

API keys for the Singapore and China (Beijing) regions are different. Use an API key obtained for the region you call.

The following examples use the Singapore endpoint. For the China (Beijing) region, use this URL instead: https://dashscope.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis

curl --location 'https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis' \
--header 'X-DashScope-Async: enable' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "wan2.1-vace-plus",
    "input": {
        "function": "image_reference",
        "prompt": "In the video, a girl gracefully emerges from a misty, ancient forest. Her steps are light, and the camera captures her every nimble moment. When she stops to look at the lush woods around her, a smile of surprise and joy blossoms on her face. This scene, frozen in an interplay of light and shadow, records her wonderful encounter with nature.",
        "ref_images_url": [
            "http://wanx.alicdn.com/material/20250318/image_reference_2_5_16.png",
            "http://wanx.alicdn.com/material/20250318/image_reference_1_5_16.png"
        ]
    },
    "parameters": {
        "prompt_extend": true,
        "obj_or_bg": ["obj","bg"],
        "size": "1280*720"
    }
}'

Video repainting

curl --location 'https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis' \
--header 'X-DashScope-Async: enable' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "wan2.1-vace-plus",
    "input": {
        "function": "video_repainting",
        "prompt": "The video shows a black steampunk-style car driven by a gentleman, adorned with gears and copper pipes. The background is a steam-powered candy factory with retro elements, creating a vintage and fun scene.",
        "video_url": "http://wanx.alicdn.com/material/20250318/video_repainting_1.mp4"
    },
    "parameters": {
        "prompt_extend": false,
        "control_condition": "depth"
    }
}'

Local editing

curl --location 'https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis' \
--header 'X-DashScope-Async: enable' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "wan2.1-vace-plus",
    "input": {
        "function": "video_edit",
        "prompt": "The video shows a Parisian-style French cafe where a lion in a suit elegantly sips coffee. It holds a coffee cup in one hand, taking a gentle sip with a relaxed expression. The cafe is tastefully decorated, with soft hues and warm lighting illuminating the lion's area.",
        "mask_image_url": "http://wanx.alicdn.com/material/20250318/video_edit_1_mask.png",
        "video_url": "http://wanx.alicdn.com/material/20250318/video_edit_2.mp4",
        "mask_frame_id": 1
    },
    "parameters": {
        "prompt_extend": false,
        "mask_type": "tracking",
        "expand_ratio": 0.05
    }
}'

Video extension

curl --location 'https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis' \
--header 'X-DashScope-Async: enable' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "wan2.1-vace-plus",
    "input": {
        "function": "video_extension",
        "prompt": "A dog wearing sunglasses skateboarding on the street, 3D cartoon.",
        "first_clip_url": "http://wanx.alicdn.com/material/20250318/video_extension_1.mp4"
    },
    "parameters": {
        "prompt_extend": false
    }
}'

Video outpainting

curl --location 'https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis' \
--header 'X-DashScope-Async: enable' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "wan2.1-vace-plus",
    "input": {
        "function": "video_outpainting",
        "prompt": "An elegant woman passionately plays the violin, with a full symphony orchestra behind her.",
        "video_url": "http://wanx.alicdn.com/material/20250318/video_outpainting_1.mp4"
    },
    "parameters": {
        "prompt_extend": false,
        "top_scale": 1.5,
        "bottom_scale": 1.5,
        "left_scale": 1.5,
        "right_scale": 1.5
    }
}'

Request headers

Content-Type string (Required)

The content type of the request. Must be application/json.

Authorization string (Required)

Authenticates the request with a Model Studio API key. Example: Bearer sk-xxxx.

X-DashScope-Async string (Required)

Enables asynchronous processing. HTTP requests support only asynchronous calls, so this header must be set to enable.

Important

If this request header is missing, the error "current user api does not support synchronous calls" is returned.
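Callers that do not use curl can attach the same three headers with Python's standard library. This minimal sketch builds, but does not send, the Step 1 request; `build_create_task_request` is a hypothetical name. To send it, pass the result to `urllib.request.urlopen`, and read the key from the `DASHSCOPE_API_KEY` environment variable as the curl examples do.

```python
import json
import urllib.request


def build_create_task_request(url: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Build the POST request for Step 1 without sending it (hypothetical helper)."""
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
        # HTTP calls are asynchronous only; without this header the service returns
        # "current user api does not support synchronous calls".
        "X-DashScope-Async": "enable",
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")
```

For error status codes, `urlopen` raises `urllib.error.HTTPError`; the error JSON can be read from the exception with `e.read()`.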

Request body

Multi-image reference

model string (Required)

The model name. Example: wan2.1-vace-plus.

input object (Required)

The basic input, such as the prompt.

Properties

prompt string (Required)

Describes the elements and visual features to include in the generated video.

Supports both Chinese and English. The maximum length is 800 characters, where each Chinese character or letter counts as a single character. Text exceeding this limit is automatically truncated.

For prompt techniques, see Text-to-Video/Image-to-Video Prompt Guide.
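Because over-length prompts are truncated silently, a client may prefer to truncate them itself so it knows exactly which text reaches the model. A minimal sketch; `fit_prompt` is a hypothetical helper, and it assumes the service's character count equals Python's count of Unicode code points.

```python
MAX_PROMPT_CHARS = 800  # limit stated above


def fit_prompt(prompt: str) -> str:
    """Truncate a prompt client-side to the 800-character limit (hypothetical helper).

    len() counts Unicode code points, so a Chinese character and a Latin letter
    each count as 1. Assumption: this matches how the service counts characters.
    """
    if not prompt.strip():
        raise ValueError("prompt is required")
    return prompt[:MAX_PROMPT_CHARS]
```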

function string (Required)

The feature name. For multi-image reference, set this to image_reference.

Multi-image reference supports up to 3 reference images. The images can contain entities and backgrounds, such as people, animals, clothing, and scenes. Use a prompt to describe the desired video content, and the model can combine the multiple images to generate coherent video content.

ref_images_url array[string] (Required)

An array of reference image URLs.

Public URL:
- Supports the HTTP and HTTPS protocols.
- Example: https://xxx/xxx.png.

You can provide 1 to 3 reference images. If you provide more than 3, only the first 3 are used.

Image requirements:

Format: JPG, JPEG, PNG, BMP, TIFF, or WEBP.
Resolution: The width and height must be within the range of [360, 2000] pixels.
Size: Up to 10 MB.
The URL must not contain Chinese characters.

Recommendations:

When using a reference image for an entity, we recommend that each image contain only one entity. The background should be a solid color (for example, white) to better highlight the entity.
If using a background from a reference image, you can provide at most one background image, which must not contain any entity objects.

parameters object (Optional)

Parameters for video processing, such as watermark settings.

Properties

obj_or_bg array[string] (Optional)

Identifies the role of each reference image. Its elements correspond one-to-one with ref_images_url, and each element specifies whether the image at the same position is an entity or a background:

obj: Indicates that the image is the reference entity.
bg: Indicates that the image is the background reference. At most one background image is allowed.

Usage notes:

We recommend that you always pass this parameter. Its length must equal the length of ref_images_url; otherwise, an error is returned.
This parameter can be omitted and defaults to ["obj"] only if ref_images_url is a single-element array.

Example: ["obj", "obj", "bg"].
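The ref_images_url and obj_or_bg rules above can be checked before a request is sent. This is a minimal sketch, not an official SDK: `build_image_reference_payload` is a hypothetical helper, and it rejects more than 3 images instead of relying on the service to use only the first 3.

```python
from typing import List, Optional


def build_image_reference_payload(
    prompt: str,
    ref_images_url: List[str],
    obj_or_bg: Optional[List[str]] = None,
    size: str = "1280*720",
    prompt_extend: bool = True,
) -> dict:
    """Build a request body for function=image_reference (hypothetical helper)."""
    # The service uses only the first 3 images; this sketch rejects extras instead.
    if not 1 <= len(ref_images_url) <= 3:
        raise ValueError("provide 1 to 3 reference images")
    if obj_or_bg is None:
        # obj_or_bg may be omitted only for a single image; it then defaults to ["obj"].
        if len(ref_images_url) != 1:
            raise ValueError("obj_or_bg is required when more than one image is given")
        obj_or_bg = ["obj"]
    if len(obj_or_bg) != len(ref_images_url):
        raise ValueError("obj_or_bg must have the same length as ref_images_url")
    if any(tag not in ("obj", "bg") for tag in obj_or_bg):
        raise ValueError("each obj_or_bg element must be 'obj' or 'bg'")
    if obj_or_bg.count("bg") > 1:
        raise ValueError("at most one background image is allowed")
    return {
        "model": "wan2.1-vace-plus",
        "input": {
            "function": "image_reference",
            "prompt": prompt,
            "ref_images_url": list(ref_images_url),
        },
        "parameters": {
            "prompt_extend": prompt_extend,
            "obj_or_bg": list(obj_or_bg),
            "size": size,
        },
    }
```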

size string (Optional)

The resolution of the generated video (width*height). The model supports generating 720p videos. Valid values:

1280*720 (Default): The video aspect ratio is 16:9, where 1280 is the width and 720 is the height.
720*1280: The video aspect ratio is 9:16.
960*960: The video aspect ratio is 1:1.
832*1088: The video aspect ratio is approximately 3:4.
1088*832: The video aspect ratio is approximately 4:3.
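The size value is a string with an asterisk separator and the width first, so "1280x720" or "720*1280" for a landscape video are common mistakes. A minimal validation sketch; `parse_size` and `VALID_SIZES` are hypothetical names, and the ratio labels are the nominal ones listed above (832*1088 reduces exactly to 13:17, which is close to 3:4).

```python
from typing import Tuple

# Valid values of parameters.size listed above, mapped to their nominal aspect ratio.
VALID_SIZES = {
    "1280*720": "16:9",
    "720*1280": "9:16",
    "960*960": "1:1",
    "832*1088": "3:4",
    "1088*832": "4:3",
}


def parse_size(size: str) -> Tuple[int, int]:
    """Validate a size string and return (width, height); width comes first."""
    if size not in VALID_SIZES:
        raise ValueError(f"size must be one of {sorted(VALID_SIZES)}, got {size!r}")
    width, height = (int(part) for part in size.split("*"))
    return width, height
```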

duration integer (Optional)

The duration of the generated video in seconds. This value is fixed at 5.

prompt_extend bool (Optional)

Specifies whether to enable prompt rewriting. If enabled, a large language model (LLM) rewrites the input prompt. This can significantly improve results for short prompts but increases processing time.

true (default): Enables prompt rewriting.
false: Disables prompt rewriting.

seed integer (Optional)

The random number seed controls the randomness of the content generated by the model. The value range for the seed parameter is [0, 2147483647].

If you do not specify a seed, one is generated automatically. For reproducible results, use the same seed value across multiple requests.

watermark bool (Optional)

Specifies whether to add an 'AI-generated' watermark to the bottom-right corner of the video.

false (default): Does not add a watermark.
true: Adds a watermark.

Video repainting

model string (Required)

The model name. Example: wan2.1-vace-plus.

input object (Required)

The basic input, such as the prompt.

Properties

prompt string (Required)

Describes the elements and visual features to include in the generated video.

Supports both Chinese and English. The maximum length is 800 characters, where each Chinese character or letter counts as a single character. Text exceeding this limit is automatically truncated.

For prompt techniques, see Text-to-Video/Image-to-Video Prompt Guide.

function string (Required)

The feature name. For video repainting, set this to video_repainting.

The video repainting feature extracts an entity's pose and actions, composition, motion contours, and line art structure from an input video. It then combines these with a text prompt to generate a new video with the same dynamic characteristics. This feature also supports replacing the entity in the original video by using a reference image, for example, to change a character's appearance while retaining the original actions.

video_url string (Required)

The URL of the input video.

Public URL:
- Supports the HTTP and HTTPS protocols.
- Example: https://xxx/xxx.mp4.

Video requirements:

Format: MP4.
Frame rate: 16 FPS or higher.
Size: Up to 50 MB.
Duration: Up to 5 seconds. Longer videos are truncated to the first 5 seconds.
The URL must not contain Chinese characters.

Output video resolution:

If the input video resolution is 720p or lower, the output resolution is the same as the input.
If the input video resolution is higher than 720p, it is downscaled to fit within a 720p resolution while preserving the original aspect ratio.

Output video duration:

The output video duration matches the input video, up to a maximum of 5 seconds.
Example: If the input video is 3 seconds long, the output is also 3 seconds. If the input is 6 seconds, the output is the first 5 seconds of the input.
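The same video_url requirements apply to video repainting, local editing, and video outpainting, and the output length follows directly from them. This minimal sketch checks what a client can know in advance and returns the expected output duration. `InputVideo` and `expected_output_duration` are hypothetical names; the MP4 check looks only at the URL suffix, the Chinese-character check covers only the common CJK Unified Ideographs block, and 50 MB is assumed to mean 50 × 1024 × 1024 bytes.

```python
from dataclasses import dataclass

MIN_FPS = 16
MAX_BYTES = 50 * 1024 * 1024  # "Up to 50 MB"; assumes 1 MB = 1024 * 1024 bytes
MAX_SECONDS = 5.0


@dataclass
class InputVideo:
    url: str
    fps: float
    size_bytes: int
    duration_s: float


def expected_output_duration(video: InputVideo) -> float:
    """Check the video_url requirements and return the expected output length in seconds."""
    problems = []
    # Heuristic: judge the format by the URL suffix only.
    if not video.url.split("?", 1)[0].lower().endswith(".mp4"):
        problems.append("format must be MP4")
    if video.fps < MIN_FPS:
        problems.append(f"frame rate must be at least {MIN_FPS} FPS")
    if video.size_bytes > MAX_BYTES:
        problems.append("file must be 50 MB or smaller")
    # Heuristic: checks only the CJK Unified Ideographs block U+4E00..U+9FFF.
    if any("\u4e00" <= ch <= "\u9fff" for ch in video.url):
        problems.append("URL must not contain Chinese characters")
    if problems:
        raise ValueError("; ".join(problems))
    # Longer inputs are truncated to the first 5 seconds.
    return min(video.duration_s, MAX_SECONDS)
```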

ref_images_url array[string] (Optional)

An array of reference image URLs.

Public URL:
- Supports the HTTP and HTTPS protocols.
- Example: https://xxx/xxx.png.

Only 1 reference image is supported. We recommend that this image be an entity image for replacing the entity content in the input video.

Image requirements:

Format: JPG, JPEG, PNG, BMP, TIFF, or WEBP.
Resolution: The width and height must be within the range of [360, 2000] pixels.
Size: Up to 10 MB.
The URL must not contain Chinese characters.

Recommendations:

When using a reference image for an entity, we recommend that the image contain only one entity. The background should be a solid color (for example, white) to better highlight the entity.

parameters object (Required)

Parameters for video processing, such as watermark settings.

Properties

control_condition string (Required)

The method for video feature extraction.

posebodyface: Extracts the facial expressions and body movements of the entity from the input video. This is suitable for scenarios where you need to preserve the details of the entity's facial expressions.
posebody: Extracts the body movements of the entity from the input video, excluding facial expressions. This is suitable for scenarios where you need to control only the body movements of the entity.
depth: Extracts the composition and motion contours of the input video.
scribble: Extracts the line art structure from the input video.

strength float (Optional)

Adjusts the control strength of the video feature extraction method specified by control_condition on the generated video.

The value must be in the range [0.0, 1.0]. The default value is 1.0.

A larger value makes the generated video adhere more closely to the original video's actions and composition. A smaller value allows for more creative freedom.

prompt_extend bool (Optional)

Specifies whether to enable prompt rewriting.

true (default): Enables prompt rewriting.
false: Disables prompt rewriting. (Recommended)

If the text description is inconsistent with the video content, the model may misinterpret the input. We recommend that you disable prompt rewriting and provide a clear, specific scene description in the prompt to improve consistency and accuracy.

seed integer (Optional)

The random number seed controls the randomness of the content generated by the model. The value range for the seed parameter is [0, 2147483647].

If you do not specify a seed, one is generated automatically. For reproducible results, use the same seed value across multiple requests.

watermark bool (Optional)

Specifies whether to add an 'AI-generated' watermark to the bottom-right corner of the video.

false (default): Does not add a watermark.
true: Adds a watermark.

Local editing

model string (Required)

The model name. Example: wan2.1-vace-plus.

input object (Required)

The basic input, such as the prompt.

Properties

prompt string (Required)

Describes the elements and visual features to include in the generated video.

Supports both Chinese and English. The maximum length is 800 characters, where each Chinese character or letter counts as a single character. Text exceeding this limit is automatically truncated.

For prompt techniques, see Text-to-Video/Image-to-Video Prompt Guide.

function string (Required)

The feature name. For local editing, set this to video_edit.

The local editing feature allows you to add, modify, or delete elements in a specified area of an input video. You can also replace the entity or background in the editing area for fine-grained video editing.

video_url string (Required)

The URL of the input video.

Public URL:
- Supports the HTTP and HTTPS protocols.
- Example: https://xxx/xxx.mp4.

Video requirements:

Format: MP4.
Frame rate: 16 FPS or higher.
Size: Up to 50 MB.
Duration: Up to 5 seconds. Longer videos are truncated to the first 5 seconds.
The URL must not contain Chinese characters.

Output video resolution:

If the input video resolution is 720p or lower, the output resolution is the same as the input.
If the input video resolution is higher than 720p, it is downscaled to fit within a 720p resolution while preserving the original aspect ratio.

Output video duration:

The output video duration matches the input video, up to a maximum of 5 seconds.
Example: If the input video is 3 seconds long, the output is also 3 seconds. If the input is 6 seconds, the output is the first 5 seconds of the input.

ref_images_url array[string] (Optional)

An array of reference image URLs.

Public URL:
- Supports the HTTP and HTTPS protocols.
- Example: https://xxx/xxx.png.

Currently, only 1 reference image is supported. This image can be used as an entity or background to replace the corresponding content in the input video.

Image requirements:

Format: JPG, JPEG, PNG, BMP, TIFF, or WEBP.
Resolution: The width and height must be within the range of [360, 2000] pixels.
Size: Up to 10 MB.
The URL must not contain Chinese characters.

Recommendations:

When using a reference image for an entity, we recommend that the image contain only one entity. The background should be a solid color (for example, white) to better highlight the entity.
If using a background from a reference image, the background image must not contain any entity objects.

mask_image_url string (Optional)

The URL of the mask image.

Public URL:
- Supports the HTTP and HTTPS protocols.
- Example: https://xxx/xxx.png.

Specifies the editing area of the video. Specify either this parameter or mask_video_url. We recommend that you use this parameter.

In the mask image, white areas (pixel value [255, 255, 255]) define the region to be edited, while black areas (pixel value [0, 0, 0]) define the region to be preserved.

Image requirements:

Format: JPG, JPEG, PNG, BMP, TIFF, or WEBP.
Image resolution: Must be the same as the resolution of the input video (video_url).
Size: Up to 10 MB.
The URL must not contain Chinese characters.
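A rectangular mask image can be generated without third-party libraries by writing a minimal PNG. This sketch is one possible approach, not a service tool: `rect_mask_png` is a hypothetical helper that writes an 8-bit RGB PNG with the edit region in white ([255, 255, 255]) and everything else in black ([0, 0, 0]). The width and height you pass must equal the resolution of video_url.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _png_chunk(tag: bytes, data: bytes) -> bytes:
    # Each PNG chunk: 4-byte length, 4-byte type, data, CRC-32 of type + data.
    crc = zlib.crc32(tag + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + tag + data + struct.pack(">I", crc)


def rect_mask_png(width: int, height: int, box: tuple) -> bytes:
    """Return an RGB PNG: white inside box (edit), black elsewhere (preserve).

    box is (left, top, right, bottom) in pixels; right and bottom are exclusive.
    width and height must equal the resolution of the input video.
    """
    left, top, right, bottom = box
    if not (0 <= left < right <= width and 0 <= top < bottom <= height):
        raise ValueError("box must lie inside the frame")
    black, white = b"\x00\x00\x00", b"\xff\xff\xff"
    # Every scanline starts with filter byte 0 (no filtering).
    edit_row = b"\x00" + black * left + white * (right - left) + black * (width - right)
    keep_row = b"\x00" + black * width
    raw = b"".join(edit_row if top <= y < bottom else keep_row for y in range(height))
    # IHDR: width, height, bit depth 8, color type 2 (RGB), compression, filter, interlace.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    return (
        PNG_SIGNATURE
        + _png_chunk(b"IHDR", ihdr)
        + _png_chunk(b"IDAT", zlib.compress(raw))
        + _png_chunk(b"IEND", b"")
    )
```

For a 1280*720 input video, write `rect_mask_png(1280, 720, (400, 200, 880, 620))` to a .png file and host it at a public HTTP or HTTPS URL to use it as mask_image_url.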

mask_frame_id integer (Optional)

This parameter takes effect when mask_image_url is not empty. It specifies the ID of the frame in the video where the mask target appears.

The default value is 1, which indicates the first frame of the video.

The value range is [1, max_frame_id], where max_frame_id = input video frame rate * input video duration + 1.

For example, if an input video (video_url) has a frame rate of 16 FPS (frames per second) and a duration of 5 seconds, the total number of frames is 16 × 5 + 1 = 81. Therefore, max_frame_id = 81.
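The bound above is simple arithmetic, but an off-by-one error here yields an invalid request, so a client can check mask_frame_id before sending. A minimal sketch with hypothetical helper names, using the formula frame rate × duration + 1 with integer inputs:

```python
def max_frame_id(fps: int, duration_s: int) -> int:
    """Upper bound for mask_frame_id: frame rate * duration + 1 (formula above)."""
    return fps * duration_s + 1


def check_mask_frame_id(frame_id: int, fps: int, duration_s: int) -> int:
    """Raise if frame_id is outside [1, max_frame_id]; frame 1 is the first frame."""
    upper = max_frame_id(fps, duration_s)
    if not 1 <= frame_id <= upper:
        raise ValueError(f"mask_frame_id must be in [1, {upper}], got {frame_id}")
    return frame_id
```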

mask_video_url string (Optional)

The URL of the mask video.

Public URL:
- Supports the HTTP and HTTPS protocols.
- Example: https://xxx/xxx.mp4.

Specifies the editing area of the video. You must specify either this parameter or mask_image_url.

The video format, frame rate, resolution, and length of the mask video must be identical to those of the input video (video_url).

In the mask video, white areas (pixel value [255, 255, 255]) define the region to be edited, while black areas (pixel value [0, 0, 0]) define the region to be preserved.

parameters object (Optional)

Parameters for video processing, such as watermark settings.

Properties

control_condition string (Optional)

The method for video feature extraction. The default value is "", which means no features are extracted.

posebodyface: Extracts the facial expressions and body movements of the entity from the input video. This is suitable for scenarios where the entity's face occupies a large portion of the frame and its features are clearly visible.
depth: Extracts the composition and motion contours from the input video.

mask_type string (Optional)

Takes effect only when mask_image_url is specified. Specifies how the editing area behaves over the course of the video:

tracking (Default): The editing area dynamically follows the motion trajectory of the target object. This is suitable for scenarios where the subject is moving.
fixed: The editing area stays in the same position and does not follow moving content.

expand_ratio float (Optional)

Takes effect only when mask_type is tracking. Specifies the ratio by which the mask area is expanded outward.

The value must be in the range [0.0, 1.0]. The default value is 0.05, which is recommended.

A smaller value makes the mask area fit the target object more closely, while a larger value expands the mask area more widely.

expand_mode string (Optional)

Takes effect only when mask_type is tracking. Specifies the shape of the mask area.

The algorithm generates a mask video of a corresponding shape from the input mask image, based on the selected expand_mode. The supported values are as follows:

hull (Default): The polygon mode. This mode uses a polygon to enclose the masked object.
bbox: Bounding box mode. This mode uses a rectangle to enclose the masked object.
original: The original mode, which attempts to preserve the shape of the original mask target.

size string (Optional)

The resolution of the generated video (width*height). The model supports generating 720p videos. Valid values:

1280*720 (Default): The video aspect ratio is 16:9, where 1280 is the width and 720 is the height.
720*1280: The video aspect ratio is 9:16.
960*960: The video aspect ratio is 1:1.
832*1088: The video aspect ratio is approximately 3:4.
1088*832: The video aspect ratio is approximately 4:3.

duration integer (Optional)

The duration of the generated video in seconds. This value is fixed at 5.

prompt_extend bool (Optional)

Specifies whether to enable prompt rewriting.

true (default): Enables prompt rewriting.
false: Disables prompt rewriting. (Recommended)

If the text description is inconsistent with the video content, the model may misinterpret the input. We recommend that you disable prompt rewriting and provide a clear, specific scene description in the prompt to improve consistency and accuracy.

seed integer (Optional)

The random number seed controls the randomness of the content generated by the model. The value range for the seed parameter is [0, 2147483647].

If you do not specify a seed, one is generated automatically. For reproducible results, use the same seed value across multiple requests.

watermark bool (Optional)

Specifies whether to add an 'AI-generated' watermark to the bottom-right corner of the video.

false (default): Does not add a watermark.
true: Adds a watermark.

Video extension

model string (Required)

The model name. Example: wan2.1-vace-plus.

input object (Required)

The basic input, such as the prompt.

Properties

prompt string (Required)

Describes the elements and visual features to include in the generated video.

Supports both Chinese and English. The maximum length is 800 characters, where each Chinese character or letter counts as a single character. Text exceeding this limit is automatically truncated.

For prompt techniques, see Text-to-Video/Image-to-Video Prompt Guide.

function string (Required)

The feature name. For video extension, set this to video_extension.

The video extension feature generates continuous content from an image or video. It can also extract dynamic features, such as actions and composition, from a reference video to guide the generation of a video with similar motion.

The total duration of the generated video is 5 seconds. This is the final output duration, not a 5-second extension added to the original content.

first_frame_url string (Optional)

The URL of the first frame image.

Public URL:
- Supports the HTTP and HTTPS protocols.
- Example: https://xxx/xxx.png.

Image requirements:

Format: JPG, JPEG, PNG, BMP, TIFF, or WEBP.
Resolution: The width and height must be within the range of [360, 2000] pixels.
Size: Up to 10 MB.
The URL must not contain Chinese characters.

last_frame_url string (Optional)

The URL of the last frame image.

Public URL:
- Supports the HTTP and HTTPS protocols.
- Example: https://xxx/xxx.png.

Image requirements:

Format: JPG, JPEG, PNG, BMP, TIFF, or WEBP.
Resolution: The width and height must be within the range of [360, 2000] pixels.
Size: Up to 10 MB.
The URL must not contain Chinese characters.

first_clip_url string (Optional)

The URL of the first video clip.

Public URL:
- Supports the HTTP and HTTPS protocols.
- Example: https://xxx/xxx.mp4.

Video requirements:

Format: MP4.
Frame rate: 16 FPS or higher. When first_clip_url and last_clip_url are used together, we recommend that the two clips have the same frame rate.
Size: Up to 50 MB.
Duration: Up to 3 seconds. Longer videos are truncated to the first 3 seconds. If you specify both first_clip_url and last_clip_url, their combined duration cannot exceed 3 seconds.
The URL must not contain Chinese characters.

Output video resolution:

If the input video resolution is 720p or lower, the output resolution is the same as the input.
If the input video resolution is higher than 720p, it is downscaled to fit within a 720p resolution while preserving the original aspect ratio.

last_clip_url string (Optional)

The URL of the last video clip.

Public URL:
- Supports the HTTP and HTTPS protocols.
- Example: https://help-static-aliyun-doc.aliyuncs.com/xxx.mp4.

Video requirements:

Format: MP4.
Frame rate: 16 FPS or higher. When first_clip_url and last_clip_url are used together, we recommend that the two clips have the same frame rate.
Size: Up to 50 MB.
Duration: Up to 3 seconds. Longer videos are truncated to the first 3 seconds. If you specify both first_clip_url and last_clip_url, their combined duration cannot exceed 3 seconds.
The URL must not contain Chinese characters.

Output video resolution:

If the input video resolution is 720p or lower, the output resolution is the same as the input.
If the input video resolution is higher than 720p, it is downscaled to fit within a 720p resolution while preserving the original aspect ratio.
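The clip duration rules above differ depending on whether one or both clips are supplied. This minimal sketch, with the hypothetical helper `clip_durations_used`, returns the durations the service is expected to use. It treats a combined duration over 3 seconds as an error, because the text states that the limit cannot be exceeded and does not describe truncation for that case.

```python
from typing import Optional, Tuple

MAX_CLIP_SECONDS = 3.0


def _truncate(duration: Optional[float]) -> Optional[float]:
    # A single clip longer than 3 seconds is cut to its first 3 seconds.
    return None if duration is None else min(duration, MAX_CLIP_SECONDS)


def clip_durations_used(
    first_clip_s: Optional[float] = None, last_clip_s: Optional[float] = None
) -> Tuple[Optional[float], Optional[float]]:
    """Return the clip durations the service is expected to use (hypothetical helper)."""
    if first_clip_s is not None and last_clip_s is not None:
        # With both clips, the combined duration is treated as a hard limit.
        if first_clip_s + last_clip_s > MAX_CLIP_SECONDS:
            raise ValueError("first_clip_url + last_clip_url must total 3 seconds or less")
        return first_clip_s, last_clip_s
    return _truncate(first_clip_s), _truncate(last_clip_s)
```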

video_url string (Optional)

The URL of the input video.

Public URL:
- Supports the HTTP and HTTPS protocols.
- Example: https://help-static-aliyun-doc.aliyuncs.com/xxx.mp4.

This video is primarily used to extract motion features that work in conjunction with the first_frame_url, last_frame_url, first_clip_url, and last_clip_url parameters to guide the generation of an extended video with similar motion performance.

Video requirements:

Format: MP4.
Frame rate: 16 FPS or higher, consistent with the preceding and succeeding clips.
Resolution: Consistent with the preceding and succeeding frames and clips.
Size: Up to 50 MB.
Duration: Up to 5 seconds. Longer videos are truncated to the first 5 seconds.
The URL must not contain Chinese characters.

parameters object (Optional)

Parameters for video processing, such as watermark settings.

Properties

control_condition string (Optional)

The method for video feature extraction. This parameter is required when video_url is specified. The default value is "", which means no features are extracted.

posebodyface: Extracts the facial expressions and body movements of the entity in the input video.
depth: Extracts the composition and motion contours of the input video.
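The accepted control_condition values differ by function: video repainting requires one of four methods, while local editing and video extension accept "" (no extraction) or one of two methods. This minimal sketch collects the values documented in this section into one lookup table; `ALLOWED_CONTROL_CONDITIONS` and `check_control_condition` are hypothetical names.

```python
# Values of control_condition documented for each function; "" means no extraction.
ALLOWED_CONTROL_CONDITIONS = {
    "video_repainting": {"posebodyface", "posebody", "depth", "scribble"},  # required
    "video_edit": {"", "posebodyface", "depth"},
    "video_extension": {"", "posebodyface", "depth"},
}


def check_control_condition(function: str, control_condition: str = "", has_video_url: bool = False) -> str:
    """Validate control_condition for a function (hypothetical helper)."""
    allowed = ALLOWED_CONTROL_CONDITIONS.get(function)
    if allowed is None:
        raise ValueError(f"{function!r} has no control_condition parameter")
    if control_condition not in allowed:
        raise ValueError(f"{control_condition!r} is not valid for {function!r}; use one of {sorted(allowed)}")
    # Video extension needs a method whenever a reference video_url is supplied.
    if function == "video_extension" and has_video_url and not control_condition:
        raise ValueError("control_condition is required when video_url is specified")
    return control_condition
```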

duration integer (Optional)

The duration of the generated video in seconds. This value is fixed at 5.

prompt_extend bool (Optional)

Specifies whether to enable prompt rewriting.

true (default): Enables prompt rewriting.
false: Disables prompt rewriting. (Recommended)

If the text description is inconsistent with the video content, the model may misinterpret the input. We recommend that you disable prompt rewriting and provide a clear, specific scene description in the prompt to improve consistency and accuracy.

seed integer (Optional)

The random number seed controls the randomness of the content generated by the model. The value range for the seed parameter is [0, 2147483647].

If you do not specify a seed, one is generated automatically. For reproducible results, use the same seed value across multiple requests.

watermark bool (Optional)

Specifies whether to add an 'AI-generated' watermark to the bottom-right corner of the video.

false (default): Does not add a watermark.
true: Adds a watermark.

Video outpainting

model string (Required)

The model name. Example: wan2.1-vace-plus.

input object (Required)

The basic input, such as the prompt.

Properties

prompt string (Required)

Describes the elements and visual features to include in the generated video.

Supports both Chinese and English. The maximum length is 800 characters, where each Chinese character or letter counts as a single character. Text exceeding this limit is automatically truncated.

For prompt techniques, see Text-to-Video/Image-to-Video Prompt Guide.

function string (Required)

The feature name. For video outpainting, set this to video_outpainting.

The video outpainting feature extends the video frame proportionally in the up, down, left, and right directions.

video_url string (Required)

The URL of the input video.

Public URL:
- Supports the HTTP and HTTPS protocols.
- Example: https://xxx/xxx.mp4.

Video requirements:

Format: MP4.
Frame rate: 16 FPS or higher.
Size: Up to 50 MB.
Duration: Up to 5 seconds. Longer videos are truncated to the first 5 seconds.
The URL must not contain Chinese characters.

Output video resolution:

If the input video resolution is 720p or lower, the output resolution is the same as the input.
If the input video resolution is higher than 720p, it is downscaled to fit within a 720p resolution while preserving the original aspect ratio.

Output video duration:

The output video duration matches the input video, up to a maximum of 5 seconds.
Example: If the input video is 3 seconds long, the output is also 3 seconds. If the input is 6 seconds, the output is the first 5 seconds of the input.

parameters object (Optional)

Parameters for video processing, such as setting extension ratios.

Properties

top_scale float (Optional)

Centers the video frame and extends it upward by the specified ratio.

The value must be in the range [1.0, 2.0]. The default value is 1.0, which indicates no extension.

bottom_scale float (Optional)

Centers the video frame and extends it downward by the specified ratio.

The value must be in the range [1.0, 2.0]. The default value is 1.0, which indicates no extension.

left_scale float (Optional)

Centers the video frame and extends it to the left by the specified ratio.

The value must be in the range [1.0, 2.0]. The default value is 1.0, which indicates no extension.

right_scale float (Optional)

Centers the video frame and extends it to the right by the specified ratio.

The value must be in the range [1.0, 2.0]. The default value is 1.0, which indicates no extension.
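Because out-of-range scales cause the request to fail, it can help to validate them before building the parameters object. The sketch below is hypothetical (the function name and argument order are illustrative); it uses awk for the floating-point range check and prints the four scale fields as JSON.

```shell
# Hypothetical helper: build_outpainting_params TOP BOTTOM LEFT RIGHT
# Prints a parameters object after checking that each scale is a number in [1.0, 2.0].
build_outpainting_params() {
  local v
  if [ "$#" -ne 4 ]; then
    echo "usage: build_outpainting_params TOP BOTTOM LEFT RIGHT" >&2
    return 2
  fi
  for v in "$@"; do
    # awk exits 0 only if v is a plain decimal number within the allowed range.
    if ! awk -v x="$v" 'BEGIN { exit !(x ~ /^[0-9]+(\.[0-9]+)?$/ && x >= 1.0 && x <= 2.0) }'; then
      echo "scale out of range [1.0, 2.0]: $v" >&2
      return 1
    fi
  done
  printf '{"top_scale":%s,"bottom_scale":%s,"left_scale":%s,"right_scale":%s}\n' "$1" "$2" "$3" "$4"
}
```

Passing `1.0` for a direction leaves that side unextended, which is the same as omitting the field.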

duration integer (Optional)

The duration of the generated video in seconds. This value is fixed at 5.

prompt_extend bool (Optional)

true (default): Enables prompt rewriting.
false (recommended): Disables prompt rewriting.

If the text description is inconsistent with the video content, the model may misinterpret the input. We recommend that you set prompt_extend to false and provide a clear, specific scene description in the prompt to improve consistency and accuracy.

seed integer (Optional)

The random number seed, which controls the randomness of the generated content. Valid range: [0, 2147483647].

If you do not specify a seed, one is generated automatically. To keep results relatively consistent across requests, use the same seed value.
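When a seed is stored and reused across requests, a quick range check avoids sending an invalid value. A minimal sketch with a hypothetical function name; it rejects inputs longer than 10 digits before the numeric comparison so that very long strings cannot overflow the shell's integer test.

```shell
# Hypothetical helper: returns 0 if $1 is an integer in [0, 2147483647].
check_seed() {
  case "$1" in
    ''|*[!0-9]*) echo "seed must be a non-negative integer" >&2; return 1 ;;
  esac
  # Length check first, then the numeric upper bound.
  if [ "${#1}" -gt 10 ] || [ "$1" -gt 2147483647 ]; then
    echo "seed must be in [0, 2147483647]" >&2
    return 1
  fi
  return 0
}
```

A script can then keep one value, for example `SEED=12345`, call `check_seed "$SEED"`, and send `"seed": 12345` in every request it wants to keep consistent.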

watermark bool (Optional)

Specifies whether to add an "AI-generated" watermark to the bottom-right corner of the video.

false (default): Does not add a watermark.
true: Adds a watermark.
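The request body described above can be assembled and submitted from a shell script. This is a minimal sketch, not an official client: the model name `wan2.1-vace-plus`, the `X-DashScope-Async: enable` header, and the `WORKSPACE_ID` variable are assumptions to verify against the model list and request headers for your region. Scales default to 1.0 (no extension), and `prompt_extend` is set to `false` as recommended.

```shell
# Escape backslashes and double quotes so a value can be embedded in a JSON string.
# Newlines and other control characters are not handled in this sketch.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# build_outpainting_body PROMPT VIDEO_URL [TOP BOTTOM LEFT RIGHT]
# Assumption: the model name below; confirm it in the model list.
build_outpainting_body() {
  printf '{"model":"wan2.1-vace-plus","input":{"function":"video_outpainting","prompt":"%s","video_url":"%s"},"parameters":{"top_scale":%s,"bottom_scale":%s,"left_scale":%s,"right_scale":%s,"prompt_extend":false}}\n' \
    "$(json_escape "$1")" "$(json_escape "$2")" \
    "${3:-1.0}" "${4:-1.0}" "${5:-1.0}" "${6:-1.0}"
}

# Create the task on the Singapore endpoint. Requires WORKSPACE_ID and DASHSCOPE_API_KEY.
submit_outpainting_task() {
  curl --silent --show-error --location \
    "https://${WORKSPACE_ID}.ap-southeast-1.maas.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis" \
    --header 'X-DashScope-Async: enable' \
    --header "Authorization: Bearer ${DASHSCOPE_API_KEY}" \
    --header 'Content-Type: application/json' \
    --data "$(build_outpainting_body "$@")"
}
```

For example, `submit_outpainting_task "An elegant lady walks along a misty lakeshore" "https://example.com/input.mp4" 1.5 1.5 1.5 1.5` would print the JSON response containing the `task_id`. For the China (Beijing) region, the URL and API key differ.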

Response parameters

Successful response

Save the `task_id` to query the task status and result.

`{ "output": { "task_status": "PENDING", "task_id": "0385dc79-5ff8-4d82-bcb6-xxxxxx" }, "request_id": "4909100c-7b5a-9f92-bfe5-xxxxxx" }`

Error response

If task creation fails, the response contains an error code and message. See Error codes.

`{ "code": "InvalidApiKey", "message": "No API-key provided.", "request_id": "7438d53d-6eb8-4596-8835-xxxxxx" }`

output object

The output of the asynchronous task.

Properties

task_id string

The task ID. It can be used in queries for 24 hours.

task_status string

The status of the task. Valid values:

- PENDING
- RUNNING
- SUCCEEDED
- FAILED
- CANCELED
- UNKNOWN: The task does not exist or its status is unknown.

request_id string

Unique request identifier for tracing and troubleshooting.

code string

Error code. Returned only for failed requests. See Error codes.

message string

Detailed error message. Returned only for failed requests. See Error codes.
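A script usually needs the `task_id` from a successful response, or the `code` from an error response. If jq is installed, `jq -r .output.task_id` is the robust choice; the sketch below avoids that dependency with sed. It is a regex-based approximation, not a JSON parser: it assumes the field appears once and its value contains no escaped quotes. The function name is hypothetical.

```shell
# Hypothetical helper: json_string_field FIELD JSON
# Prints the string value of FIELD, or nothing if the field is absent.
json_string_field() {
  # Join lines first so pretty-printed responses are handled too.
  printf '%s' "$2" | tr '\r\n' '  ' |
    sed -n "s/.*\"$1\"[[:space:]]*:[[:space:]]*\"\([^\"]*\)\".*/\1/p"
}
```

For example, `TASK_ID=$(json_string_field task_id "$RESPONSE")` leaves `TASK_ID` empty when task creation failed, in which case `json_string_field code "$RESPONSE"` returns the error code.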

Step 2: Query result by task ID

Singapore

GET https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/api/v1/tasks/{task_id}

Replace WorkspaceId with your actual Workspace ID.

China (Beijing)

GET https://dashscope.aliyuncs.com/api/v1/tasks/{task_id}

Request parameters

Query task result

Replace `{task_id}` with the `task_id` value returned by the previous API call. The `task_id` can be used in queries for 24 hours.

`curl -X GET "https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/api/v1/tasks/{task_id}" --header "Authorization: Bearer $DASHSCOPE_API_KEY"`

Request headers

Authorization string (Required)

Authenticates the request with a Model Studio API key. Example: Bearer sk-xxxx.

URL path parameters

task_id string (Required)

The ID of the task.
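Because processing takes 5 to 10 minutes, the query is normally repeated until the task reaches a final state. The following is a minimal polling sketch for the Singapore endpoint. `POLL_INTERVAL`, `MAX_POLLS`, `WORKSPACE_ID`, and the function names are illustrative assumptions, not documented limits; the defaults poll every 15 seconds for up to 15 minutes.

```shell
POLL_INTERVAL=${POLL_INTERVAL:-15}   # seconds between queries (illustrative)
MAX_POLLS=${MAX_POLLS:-60}           # give up after this many queries (illustrative)

# Query the task once. Requires WORKSPACE_ID and DASHSCOPE_API_KEY.
fetch_task() {
  curl --silent --show-error \
    "https://${WORKSPACE_ID}.ap-southeast-1.maas.aliyuncs.com/api/v1/tasks/$1" \
    --header "Authorization: Bearer ${DASHSCOPE_API_KEY}"
}

# Print the final response. Return 0 on SUCCEEDED, 1 on any other final state, error, or timeout.
poll_task() {
  local task_id=$1 i=0 resp status
  while [ "$i" -lt "$MAX_POLLS" ]; do
    resp=$(fetch_task "$task_id") || return 1
    status=$(printf '%s' "$resp" | tr '\r\n' '  ' |
      sed -n 's/.*"task_status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p')
    case "$status" in
      SUCCEEDED) printf '%s\n' "$resp"; return 0 ;;
      FAILED|CANCELED|UNKNOWN) printf '%s\n' "$resp"; return 1 ;;
      '')
        # No task_status: treat a response that carries an error code as final.
        case "$resp" in *'"code"'*) printf '%s\n' "$resp"; return 1 ;; esac ;;
    esac
    # PENDING or RUNNING: wait before the next query.
    i=$((i + 1))
    sleep "$POLL_INTERVAL"
  done
  echo "timed out waiting for task $task_id" >&2
  return 1
}
```

For example, `poll_task "$TASK_ID" > result.json` blocks until the task finishes and saves the final response for the next step.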

Response parameters

Task succeeded

Task data, including the task status and video URL, is available for 24 hours and is then automatically deleted. Save the generated video promptly.

`{ "request_id": "851985d0-fbba-9d8d-a17a-xxxxxx", "output": { "task_id": "208e2fd1-fcb4-4adf-9fcc-xxxxxx", "task_status": "SUCCEEDED", "submit_time": "2025-05-15 16:14:44.723", "scheduled_time": "2025-05-15 16:14:44.750", "end_time": "2025-05-15 16:20:09.389", "video_url": "https://dashscope-result-wlcb.oss-cn-wulanchabu.aliyuncs.com/xxx.mp4?xxxxxx", "orig_prompt": "In the video, a girl gracefully walks out from a misty, ancient forest. Her steps are light, and the camera captures her every nimble moment. When the girl stops and looks around at the lush woods, a smile of surprise and joy blossoms on her face. This scene, frozen in a moment of interplay between light and shadow, records her wonderful encounter with nature.", "actual_prompt": "A girl in a light-colored long dress slowly walks out from a misty, ancient forest, her steps as light as a dance. She has slightly curly long hair, a delicate face, and bright eyes. The camera follows her movements, capturing every nimble moment. When she stops, turns, and looks around at the lush woods, a smile of surprise and joy blossoms on her face. Sunlight filters through the leaves, casting mottled shadows and freezing this beautiful moment of harmony between human and nature. The style is a fresh and natural portrait, combining medium and full shots with a level perspective and slight camera movement." }, "usage": { "video_duration": 5, "video_ratio": "standard", "video_count": 1 } }`

Task failed

When a task fails, `task_status` is FAILED and the response includes an error code and message. See Error codes.

`{ "request_id": "e5d70b02-ebd3-98ce-9fe8-759d7d7b107d", "output": { "task_id": "86ecf553-d340-4e21-af6e-a0c6a421c010", "task_status": "FAILED", "code": "InvalidParameter", "message": "The size is not match xxxxxx" } }`

output object

Information about the task output.

Properties

task_id string

The task ID. It can be used in queries for 24 hours.

task_status string

The status of the task. Valid values:

- PENDING
- RUNNING
- SUCCEEDED
- FAILED
- CANCELED
- UNKNOWN: The task does not exist or its status is unknown.

submit_time string

The time when the task was submitted, in UTC+8, in the format YYYY-MM-DD HH:mm:ss.SSS.

scheduled_time string

The time when the task started executing, in UTC+8, in the format YYYY-MM-DD HH:mm:ss.SSS.

end_time string

The time when the task was completed, in UTC+8, in the format YYYY-MM-DD HH:mm:ss.SSS.

video_url string

The URL of the generated MP4 (H.264) video. The link is valid for 24 hours.

orig_prompt string

The original input prompt.

actual_prompt string

The prompt used for generation after prompt rewriting. Returned only if prompt rewriting is enabled.

code string

Error code. Returned only for failed requests. See Error codes.

message string

Detailed error message. Returned only for failed requests. See Error codes.

usage object

Statistics for the task output. Returned only for successful tasks.

Properties

video_duration integer

The duration of the generated video in seconds.

video_ratio string

The aspect ratio of the generated video. The value is always `standard`.

video_count integer

The number of generated videos.

request_id string

Unique request identifier for tracing and troubleshooting.
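Since the video URL expires after 24 hours, a script can download the file as soon as the task succeeds. This is a minimal sketch with hypothetical function names; it extracts `video_url` with sed rather than a JSON parser (`jq -r .output.video_url` is more robust if jq is available) and uses `curl --fail` so that an expired or invalid link produces a non-zero exit status instead of saving an error page.

```shell
# Hypothetical helper: print output.video_url from a task query response, if present.
extract_video_url() {
  printf '%s' "$1" | tr '\r\n' '  ' |
    sed -n 's/.*"video_url"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# download_video RESPONSE_JSON OUTPUT_FILE
download_video() {
  local url
  url=$(extract_video_url "$1")
  if [ -z "$url" ]; then
    echo "no video_url in response" >&2
    return 1
  fi
  curl --silent --show-error --fail --location --output "$2" "$url"
}
```

For example, `download_video "$(cat result.json)" output.mp4` saves the generated video locally.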

Limitations

Data retention period: The task ID (task_id) and video URL (video_url) are retained for only 24 hours. After they expire, you can no longer query the task or download the video.
Audio support: This feature currently generates only silent videos. To generate audio, use Speech Synthesis.

Error codes

If a model call fails with an error message, see Error codes for troubleshooting.

FAQ

Q: How do I add the video storage domain to a whitelist?

A: Generated videos are stored in OSS, and the API returns a temporary public URL. If you need to allow this download URL through a firewall, note that the underlying storage domain may change dynamically. To avoid access failures caused by outdated information, this topic does not provide a fixed list of OSS domain names. If you have security control requirements, contact your account manager to obtain the latest list of OSS domain names.
