Alibaba Cloud Model Studio: Wan - general video editing API reference

Last Updated: Oct 20, 2025

This topic describes the input and output parameters for the Wan-VACE model. This model supports multiple input modalities, such as text, images, and videos, and can perform various video generation and editing tasks.

Quick links: Try it online on the Wan official website

Note

The features on the official website may differ from the capabilities supported by the API. The API capabilities are detailed in this topic. This topic is updated promptly as new features are released.

Model overview

Model: wanx2.1-vace-plus

Introduction: Wan 2.1 Professional Edition. Supports multi-modal input, multi-image reference, and video editing.

Output video format:

  • Duration: Up to 5 seconds

  • Frame rate: 30 fps

  • Format: MP4 (H.264 encoding)

Singapore region

Model: wan2.1-vace-plus

  • Unit price: $0.1/second

  • Rate limits (shared by Alibaba Cloud accounts and RAM users): 2 RPS for the task submission API, 2 concurrent tasks

  • Free quota: 50 seconds

China (Beijing) region

Important

The China (Beijing) region does not offer a free quota. All calls in this region incur fees. Please confirm before you proceed.

Model: wanx2.1-vace-plus

  • Unit price: $0.100347/second

  • Rate limits (shared by Alibaba Cloud accounts and RAM users): 2 RPS for the task submission API, 2 concurrent tasks

Billing example

Billing begins after the free quota is exhausted. The billing formula is: Total cost = Unit price × Video duration (seconds).

  • Assume you generate one video using the wan2.1-vace-plus model in the Singapore region.

  • Cost calculation: $0.10/second × 5 seconds = $0.50.
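The billing formula can be sketched in Python as follows. The function name estimate_cost and the free_quota_seconds argument are illustrative and not part of the API. The sketch assumes that any remaining free quota is deducted in seconds before the unit price is applied.

```python
def estimate_cost(unit_price_per_second: float,
                  video_seconds: float,
                  free_quota_seconds: float = 0.0) -> float:
    """Return the estimated charge in US dollars for one generated video."""
    # Assumption: remaining free quota is consumed before billing begins.
    billable_seconds = max(0.0, video_seconds - free_quota_seconds)
    return round(unit_price_per_second * billable_seconds, 6)


# Singapore example from above: $0.10/second x 5 seconds, free quota exhausted.
print(estimate_cost(0.10, 5))
```

Passing free_quota_seconds=50 for the same 5-second video yields a charge of zero, because the whole video falls inside the free quota.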

Performance showcase

Multi-image reference

  • Inputs: Reference image 1 (reference entity) and reference image 2 (reference background).

  • Prompt: In the video, a girl gracefully walks out from a misty, ancient forest. Her steps are light, and the camera captures her every nimble moment. When the girl stops and looks around at the lush woods, a smile of surprise and joy blossoms on her face. This scene, frozen in a moment of interplay between light and shadow, records the girl's wonderful encounter with nature.

Video repainting

  • Prompt: The video shows a black steampunk-style car driven by a gentleman. The car is decorated with gears and copper pipes. The background features a steam-powered candy factory and retro elements, creating a vintage and playful scene.

Local editing

  • Inputs: An input video and an input mask image. The white area indicates the editing area.

  • Prompt: The video shows a Parisian-style French cafe where a lion in a suit is elegantly sipping coffee. It holds a coffee cup in one hand, taking a gentle sip with a relaxed expression. The cafe is tastefully decorated, with soft hues and warm lighting illuminating the area where the lion is.

  • Output: The content in the editing area is modified based on the prompt.

Video extension

  • Input: First clip (1 second).

  • Prompt: A dog wearing sunglasses is skateboarding on the street, 3D cartoon.

  • Output: Extended video (5 seconds).

Video outpainting

  • Prompt: An elegant lady is passionately playing the violin, with a full symphony orchestra behind her.

Prerequisites

You must obtain an API key and set the API key as an environment variable.

Important

The Beijing and Singapore regions have separate API keys and request endpoints. Do not use them interchangeably. Cross-region calls cause authentication failures or service errors.

HTTP

Video generation takes a relatively long time. To avoid request timeouts, the HTTP API returns model results asynchronously only, so you need to make two requests:

  1. Create task: Send a request to create a task. The response returns a task ID.

  2. Query results using the ID: Use the task ID to query the task status and results. If the task succeeds, the response contains a video URL that is valid for 24 hours.

Note

After you create a task, it will be added to a queue. Later, call the query interface to get the task status and results based on the task ID.

The general video editing model requires a long time to process tasks, typically 5 to 10 minutes. The actual processing time depends on the number of tasks in the queue and the service execution status.
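Because a task typically takes several minutes, the client has to poll for the result. The following Python sketch shows one possible polling loop. The name wait_for_task, the injected fetch_status callable (a hypothetical wrapper around the query request), the status strings SUCCEEDED and FAILED, and the interval and timeout values are all assumptions made for illustration. Check the query interface description for the actual status values.

```python
import time
from typing import Callable, Iterable


def wait_for_task(fetch_status: Callable[[], str],
                  terminal_statuses: Iterable[str] = ("SUCCEEDED", "FAILED"),
                  interval_seconds: float = 15.0,
                  timeout_seconds: float = 900.0,
                  sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until the task reaches a terminal status, then return that status."""
    terminal = set(terminal_statuses)  # assumed status names, see the lead-in
    waited = 0.0
    while True:
        status = fetch_status()  # hypothetical wrapper around the query request
        if status in terminal:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"task still {status!r} after {waited:.0f} seconds")
        sleep(interval_seconds)
        waited += interval_seconds
```

Injecting fetch_status and sleep keeps the loop independent of any HTTP library and makes it easy to exercise without network access.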

Step 1: Create a task and obtain a task ID

POST https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis

If your model is in the China (Beijing) region, you must replace the URL with https://dashscope.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis.
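To keep the two regional endpoints from being mixed up, a client can select the URL from a single table. This is a minimal sketch; the region keys "singapore" and "beijing" and the function name task_creation_url are illustrative choices, and only the two URLs come from this topic.

```python
TASK_CREATION_URLS = {
    "singapore": "https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis",
    "beijing": "https://dashscope.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis",
}


def task_creation_url(region: str) -> str:
    """Return the task creation endpoint for the given region key."""
    try:
        return TASK_CREATION_URLS[region.lower()]
    except KeyError:
        raise ValueError(f"unknown region: {region!r}") from None
```

Use the API key that belongs to the same region as the selected URL.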

Request parameters

Multi-image reference

The API keys for the Singapore and China (Beijing) regions are different. For more information, see Obtain an API key.
The following URL is for the Singapore region. If you are using a model in the China (Beijing) region, you must replace the URL with https://dashscope.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis.
curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis' \
--header 'X-DashScope-Async: enable' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "wan2.1-vace-plus",
    "input": {
        "function": "image_reference",
        "prompt": "In the video, a girl gracefully walks out from a misty, ancient forest. Her steps are light, and the camera captures her every nimble moment. When the girl stops and looks around at the lush woods, a smile of surprise and joy blossoms on her face. This scene, frozen in a moment of interplay between light and shadow, records the girl\u0027s wonderful encounter with nature.",
        "ref_images_url": [
            "http://wanx.alicdn.com/material/20250318/image_reference_2_5_16.png",
            "http://wanx.alicdn.com/material/20250318/image_reference_1_5_16.png"
        ]
    },
    "parameters": {
        "obj_or_bg": ["obj","bg"],
        "size": "1280*720"
    }
}'

Video repainting

curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis' \
--header 'X-DashScope-Async: enable' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "wan2.1-vace-plus",
    "input": {
        "function": "video_repainting",
        "prompt": "The video shows a black steampunk-style car driven by a gentleman. The car is decorated with gears and copper pipes. The background features a steam-powered candy factory and retro elements, creating a vintage and playful scene.",
        "video_url": "http://wanx.alicdn.com/material/20250318/video_repainting_1.mp4"
    },
    "parameters": {
        "control_condition": "depth"
    }
}'

Local editing

curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis' \
--header 'X-DashScope-Async: enable' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "wan2.1-vace-plus",
    "input": {
        "function": "video_edit",
        "prompt": "The video shows a Parisian-style French cafe where a lion in a suit is elegantly sipping coffee. It holds a coffee cup in one hand, taking a gentle sip with a relaxed expression. The cafe is tastefully decorated, with soft hues and warm lighting illuminating the area where the lion is.",
        "mask_image_url": "http://wanx.alicdn.com/material/20250318/video_edit_1_mask.png",
        "video_url": "http://wanx.alicdn.com/material/20250318/video_edit_2.mp4",
        "mask_frame_id": 1
    },
    "parameters": {
        "mask_type": "tracking",
        "expand_ratio": 0.05
    }
}'

Video extension

curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis' \
--header 'X-DashScope-Async: enable' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "wan2.1-vace-plus",
    "input": {
        "function": "video_extension",
        "prompt": "A dog wearing sunglasses is skateboarding on the street, 3D cartoon.",
        "first_clip_url": "http://wanx.alicdn.com/material/20250318/video_extension_1.mp4"
    },
    "parameters": {}
}'

Video outpainting

curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis' \
--header 'X-DashScope-Async: enable' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "wan2.1-vace-plus",
    "input": {
        "function": "video_outpainting",
        "prompt": "An elegant lady is passionately playing the violin, with a full symphony orchestra behind her.",
        "video_url": "http://wanx.alicdn.com/material/20250318/video_outpainting_1.mp4"
    },
    "parameters": {
        "top_scale": 1.5,
        "bottom_scale": 1.5,
        "left_scale": 1.5,
        "right_scale": 1.5
    }
}'

Request headers

Content-Type string (Required)

The content type of the request. Set this parameter to application/json.

Authorization string (Required)

The identity authentication credentials for the request. This API uses a Model Studio API key for identity authentication. Example: Bearer sk-xxxx.

X-DashScope-Async string (Required)

The asynchronous processing configuration parameter. HTTP requests support only asynchronous processing. You must set this parameter to enable.

Important

If this request header is missing, the error message "current user api does not support synchronous calls" is returned.
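The three required headers can be assembled in one place, as in the following Python sketch. The function name build_headers is illustrative. The environment variable name DASHSCOPE_API_KEY is the one used by the curl examples in this topic.

```python
import os
from typing import Dict, Optional


def build_headers(api_key: Optional[str] = None) -> Dict[str, str]:
    """Return the request headers required by the task creation request."""
    key = api_key or os.environ.get("DASHSCOPE_API_KEY")
    if not key:
        raise ValueError("set DASHSCOPE_API_KEY or pass api_key explicitly")
    return {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {key}",
        # HTTP requests support only asynchronous processing.
        "X-DashScope-Async": "enable",
    }
```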

Request body

Multi-image reference

model string (Required)

The model name. For example: wan2.1-vace-plus.

input object (Required)

The basic input information, such as the prompt.

Properties

prompt string (Required)

The prompt that describes the desired elements and visual characteristics in the generated video.

The prompt can be in Chinese or English and must not exceed 800 characters. Each Chinese character or letter is counted as one character. Excess characters are automatically truncated.

For more information about prompt techniques, see Video generation prompt guide.

function string (Required)

The name of the feature. For multi-image reference, set this to image_reference.

The multi-image reference feature supports up to three reference images, which can contain entities and backgrounds, such as people, animals, clothing, and scenes. You can use the prompt parameter to describe the desired video content. The model then merges these images to generate a coherent video.

ref_images_url array[string] (Required)

An array of URLs for the input reference images.

The URL must be accessible over the Internet and support the HTTP or HTTPS protocol.

You can provide 1 to 3 reference images. If you provide more than 3 images, only the first 3 are used as input.

Image requirements:

  • Image format: JPG, JPEG, PNG, BMP, TIFF, or WEBP.

  • Image resolution: The width and height must be between 360 and 2,000 pixels.

  • Image size: Cannot exceed 10 MB.

  • The URL cannot contain Chinese characters.

Suggestions:

  • If you use an entity from a reference image, we recommend that each image contain only one entity. The background should be a single solid color, such as white, to better highlight the entity.

  • If you use the background from a reference image, you can use at most one background image, and it should not contain any entity objects.

parameters object (Optional)

The video processing parameters, such as watermark settings.

Properties

obj_or_bg array[string] (Optional)

This parameter is an array that corresponds one-to-one with the ref_images_url parameter. Each element in this array specifies the purpose of the corresponding image, indicating whether it is an "entity" or a "background":

  • obj: The image that serves as an entity reference.

  • bg: The image used as a background reference. A maximum of one background reference is allowed.

Instructions:

  • We recommend that you provide this parameter. If you provide it, it must have the same length as ref_images_url. Otherwise, an error is returned.

  • You can omit this parameter only when ref_images_url is a single-element array, in which case the default value is ["obj"].

Example: ["obj", "obj", "bg"].
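The rules for obj_or_bg can be checked on the client before a request is sent. The following Python sketch is one way to do this. The function name resolve_obj_or_bg is hypothetical, and the checks only mirror the rules stated above; the service may apply additional validation.

```python
from typing import List, Optional

ALLOWED_ROLES = ("obj", "bg")


def resolve_obj_or_bg(ref_images_url: List[str],
                      obj_or_bg: Optional[List[str]] = None) -> List[str]:
    """Return the obj_or_bg array to send, applying the documented rules."""
    if not ref_images_url:
        raise ValueError("ref_images_url must contain at least one URL")
    if obj_or_bg is None:
        # The parameter may be omitted only for a single reference image.
        if len(ref_images_url) == 1:
            return ["obj"]
        raise ValueError("obj_or_bg is required for more than one image")
    if len(obj_or_bg) != len(ref_images_url):
        raise ValueError("obj_or_bg must have the same length as ref_images_url")
    if any(role not in ALLOWED_ROLES for role in obj_or_bg):
        raise ValueError("each element must be 'obj' or 'bg'")
    if obj_or_bg.count("bg") > 1:
        raise ValueError("at most one background reference is allowed")
    return list(obj_or_bg)
```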

size string (Optional)

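The resolution of the generated video (width × height). The model currently supports generating 720p videos. In the request body, write the value with an asterisk between the width and height, for example 1280*720. Valid values: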

  • 1280 × 720 (default): The aspect ratio is 16:9, with a width of 1280 pixels and a height of 720 pixels.

  • 720 × 1280: The aspect ratio is 9:16.

  • 960 × 960: The aspect ratio is 1:1.

  • 832 × 1088: The aspect ratio is 3:4.

  • 1088 × 832: The aspect ratio is 4:3.
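A lookup table keeps the size string consistent with the list above. In this Python sketch, the table name and the function size_for are illustrative. Only the 1280*720 form appears in the request examples of this topic, so writing the other values with the same asterisk form is an assumption.

```python
SIZE_BY_ASPECT_RATIO = {
    "16:9": "1280*720",  # default
    "9:16": "720*1280",
    "1:1": "960*960",
    "3:4": "832*1088",
    "4:3": "1088*832",
}


def size_for(aspect_ratio: str = "16:9") -> str:
    """Return the size value for a supported aspect ratio."""
    if aspect_ratio not in SIZE_BY_ASPECT_RATIO:
        raise ValueError(f"unsupported aspect ratio: {aspect_ratio!r}")
    return SIZE_BY_ASPECT_RATIO[aspect_ratio]
```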

duration integer (Optional)

The duration of the generated video in seconds. This parameter is fixed at 5 and cannot be modified. The model always generates a 5-second video.

prompt_extend bool (Optional)

Specifies whether to enable prompt rewriting. If enabled, a large language model (LLM) rewrites the input prompt. This can significantly improve the quality of the generated video for short prompts but increases the processing time.

  • true (default)

  • false

seed integer (Optional)

The random seed controls the randomness of the generated content. The value must be in the range [0, 2147483647].

If you do not provide a seed, the algorithm automatically generates a random number as the seed. To ensure that the generated content is relatively stable, use the same seed value for each request.

watermark bool (Optional)

Specifies whether to add a watermark. The watermark is located in the lower-right corner of the video and says "AI-generated".

  • false (default)

  • true

Video repainting

model string (Required)

The model name. For example: wan2.1-vace-plus.

input object (Required)

The basic input information, such as the prompt.

Properties

prompt string (Required)

The prompt that describes the desired elements and visual characteristics in the generated video.

The prompt can be in Chinese or English and must not exceed 800 characters. Each Chinese character or letter is counted as one character. Excess characters are automatically truncated.

For more information about prompt techniques, see Video generation prompt guide.

function string (Required)

The name of the feature. For video repainting, set the value to video_repainting.

Video repainting extracts the entity's pose and actions, composition and motion contours, and line art structure from an input video. The model then combines this information with a text prompt to generate a new video with the same dynamic features. You can also replace the entity in the original video with a reference image, for example, to change a character's appearance while retaining the original actions.

video_url string (Required)

The URL of the input video.

The URL must be publicly accessible and support the HTTP or HTTPS protocol.

Video requirements:

  • Video format: MP4.

  • Video frame rate: 16 FPS or higher.

  • Video size: Cannot exceed 50 MB.

  • Video length: Cannot exceed 5 seconds. If it does, only the first 5 seconds are used.

  • The URL cannot contain Chinese characters.

Output video resolution:

  • If the input video resolution is 720p or lower, the output video retains the original resolution.

  • If the input video resolution is higher than 720p, it is scaled down to 720p or lower while maintaining the original aspect ratio.

Output video duration:

  • The output video has the same duration as the input video, up to a maximum of 5 seconds.

  • For example: If the input video is 3 seconds long, the output is also 3 seconds long. If the input is 6 seconds long, the output is the first 5 seconds.

ref_images_url array[string] (Optional)

An array of URLs for the input reference images. The URLs must be publicly accessible endpoints and support the HTTP or HTTPS protocol.

Only 1 reference image is supported. We recommend that this image be an entity image used to replace the entity in the input video.

Image requirements:

  • Image format: JPG, JPEG, PNG, BMP, TIFF, or WEBP.

  • Image resolution: The width and height must be between 360 and 2,000 pixels.

  • Image size: Cannot exceed 10 MB.

  • The URL cannot contain Chinese characters.

Suggestions:

  • If you use an entity from a reference image, we recommend that each image contain only one entity. The background should be a single solid color, such as white, to better highlight the entity.

parameters object (Required)

The video processing parameters, such as watermark settings.

Properties

control_condition string (Required)

Specifies the method for video feature extraction.

  • posebodyface: Extracts the facial expressions and body movements of the entity in the input video. This is suitable for scenarios that require the preservation of facial details.

  • posebody: Extracts an entity's body movements from the input video, excluding facial expressions. Use this for scenarios where you need to control only body movements.

  • depth: Extracts the composition and motion contours from the input video.

  • scribble: Extracts the line art structure from the input video.

strength float (Optional)

Adjusts the control strength that the control_condition feature extraction method applies to the generated video.

The default value is 1.0. The value range is [0.0, 1.0].

A larger value makes the generated video closer to the original video's actions and composition. A smaller value allows for more creative freedom.
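The constraints on control_condition and strength for video repainting can be sketched as a small Python helper. The function name repainting_parameters is hypothetical. It only builds the parameters object and sends nothing.

```python
REPAINTING_CONDITIONS = ("posebodyface", "posebody", "depth", "scribble")


def repainting_parameters(control_condition: str, strength: float = 1.0) -> dict:
    """Build the parameters object for a video_repainting request."""
    if control_condition not in REPAINTING_CONDITIONS:
        raise ValueError(f"unsupported control_condition: {control_condition!r}")
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in the range [0.0, 1.0]")
    return {"control_condition": control_condition, "strength": strength}
```

For example, repainting_parameters("depth", 0.6) keeps the composition of the input video but leaves the model more creative freedom than the default strength of 1.0.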

prompt_extend bool (Optional)

Specifies whether to enable prompt rewriting. If enabled, an LLM rewrites the input prompt. This can significantly improve the quality of the generated video for short prompts but increases the processing time.

  • false (default, recommended)

  • true

If the text description is inconsistent with the video content, the model may misinterpret the prompt. To improve generation consistency and accuracy, we recommend that you disable prompt rewriting and provide a clear, specific description in the prompt.

seed integer (Optional)

The random seed controls the randomness of the generated content. The value must be in the range [0, 2147483647].

If you do not provide a seed, the algorithm automatically generates a random number as the seed. To ensure that the generated content is relatively stable, use the same seed value for each request.

watermark bool (Optional)

Specifies whether to add a watermark. The watermark is located in the lower-right corner of the video and says "AI-generated".

  • false (default)

  • true

Local editing

model string (Required)

The model name. For example: wan2.1-vace-plus.

input object (Required)

The basic input information, such as the prompt.

Properties

prompt string (Required)

The prompt that describes the desired elements and visual characteristics in the generated video.

The prompt can be in Chinese or English and must not exceed 800 characters. Each Chinese character or letter is counted as one character. Excess characters are automatically truncated.

For more information about prompt techniques, see Video generation prompt guide.

function string (Required)

The name of the feature. For local editing, set the value to video_edit.

Local editing lets you add, modify, or delete elements in a specified area of an input video. You can also replace the entity or background in the editing area to achieve fine-grained video editing.

video_url string (Required)

The URL of the input video.

The URL must be publicly accessible and support the HTTP or HTTPS protocol.

Video requirements:

  • Video format: MP4.

  • Video frame rate: 16 FPS or higher.

  • Video size: Cannot exceed 50 MB.

  • Video length: Cannot exceed 5 seconds. If it does, only the first 5 seconds are used.

  • The URL cannot contain Chinese characters.

Output video resolution:

  • If the input video resolution is 720p or lower, the output video retains the original resolution.

  • If the input video resolution is higher than 720p, it is scaled down to 720p or lower while maintaining the original aspect ratio.

Output video duration:

  • The output video has the same duration as the input video, up to a maximum of 5 seconds.

  • For example: If the input video is 3 seconds long, the output is also 3 seconds long. If the input is 6 seconds long, the output is the first 5 seconds.

ref_images_url array[string] (Optional)

An array of URLs for the input reference images.

The URL must be accessible over the Internet and support the HTTP or HTTPS protocol.

Currently, only 1 reference image is supported. This image can be used as an entity or background to replace the corresponding content in the input video.

Image requirements:

  • Image format: JPG, JPEG, PNG, BMP, TIFF, or WEBP.

  • Image resolution: The width and height must be between 360 and 2,000 pixels.

  • Image size: Cannot exceed 10 MB.

  • The URL cannot contain Chinese characters.

Suggestions:

  • If you use an entity from a reference image, we recommend that each image contain only one entity. The background should be a single solid color, such as white, to better highlight the entity.

  • If you use the background from a reference image, the background image should not contain any entity objects.

mask_image_url string (Optional)

The URL of the mask image. The URL must be a public URL and support the HTTP or HTTPS protocol.

This parameter specifies the video editing area. You must specify either this parameter or the mask_video_url parameter. We recommend that you use this parameter.

The white area of the mask image (with a pixel value of [255, 255, 255]) indicates the part to be edited. The black area (with a pixel value of [0, 0, 0]) indicates the area to be preserved.

Image requirements:

  • Image format: JPG, JPEG, PNG, BMP, TIFF, or WEBP.

  • The image resolution must be the same as the input video resolution (video_url).

  • Image size: Cannot exceed 10 MB.

  • The URL cannot contain Chinese characters.

mask_frame_id integer (Optional)

This parameter is used only when mask_image_url is not empty. It specifies the frame ID of the video frame where the masked object appears.

The default value is 1, which indicates the first frame of the video.

The value must be in the range [1, max_frame_id], where max_frame_id = input video frame rate × input video duration + 1.

For example, for an input video (video_url) with a frame rate of 16 FPS and a duration of 5 seconds, the total number of frames is 81 (16 × 5 + 1). Therefore, the value of max_frame_id is 81.
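The range check for mask_frame_id follows directly from the formula above. In this Python sketch the function names are illustrative, and the frame rate and duration are taken as whole numbers for simplicity.

```python
def max_frame_id(frame_rate: int, duration_seconds: int) -> int:
    """max_frame_id = input video frame rate x input video duration + 1."""
    return frame_rate * duration_seconds + 1


def validate_mask_frame_id(mask_frame_id: int,
                           frame_rate: int,
                           duration_seconds: int) -> int:
    """Return mask_frame_id if it lies in [1, max_frame_id], else raise."""
    upper = max_frame_id(frame_rate, duration_seconds)
    if not 1 <= mask_frame_id <= upper:
        raise ValueError(f"mask_frame_id must be in the range [1, {upper}]")
    return mask_frame_id
```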

mask_video_url string (Optional)

The URL of the mask video. This URL must be an endpoint that is accessible over the Internet and supports the HTTP or HTTPS protocol.

This parameter specifies the area of the video to edit. You must specify either this parameter or the mask_image_url parameter.

The mask video must have the same format, frame rate, resolution, and length as the input video (video_url).

The white area of the mask video (with a pixel value of [255, 255, 255]) indicates the part to be edited. The black area (with a pixel value of [0, 0, 0]) indicates the area to be preserved.
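The choice between the two mask inputs can be enforced before the request is built. The following Python sketch assumes that "either" means exactly one of mask_image_url and mask_video_url; this topic does not state what happens when both are supplied. The function name mask_input is hypothetical.

```python
from typing import Dict, Optional, Union


def mask_input(mask_image_url: Optional[str] = None,
               mask_video_url: Optional[str] = None,
               mask_frame_id: int = 1) -> Dict[str, Union[str, int]]:
    """Return the mask-related fields of the input object."""
    # Assumption: exactly one of the two mask URLs must be given.
    if bool(mask_image_url) == bool(mask_video_url):
        raise ValueError("specify exactly one of mask_image_url or mask_video_url")
    if mask_image_url:
        # mask_frame_id is used only together with mask_image_url.
        return {"mask_image_url": mask_image_url, "mask_frame_id": mask_frame_id}
    return {"mask_video_url": mask_video_url}
```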

parameters object (Optional)

The video processing parameters, such as watermark settings.

Properties

control_condition string (Optional)

Specifies the method for video feature extraction. The default value is "", which means no extraction is performed.

  • posebodyface: Extracts the facial expressions and body movements from an entity in the input video. This feature is suitable for scenarios where the entity's face is prominent in the frame and has clear features.

  • depth: Extracts the structural and motion contours from the input video.

mask_type string (Optional)

This parameter is effective only when mask_image_url is not empty. It specifies the behavior of the editing area.

  • tracking (default): The editing area dynamically follows the trajectory of the target object. This mode is suitable for scenes with moving objects.

  • fixed: The editing area remains fixed, regardless of the video content.

expand_ratio float (Optional)

When mask_type is set to tracking, this parameter specifies the expansion ratio of the mask area.

The value range is [0.0, 1.0]. The default value is 0.05. We recommend using the default value.

A smaller value makes the mask area fit the target object more closely. A larger value expands the mask area more widely.

expand_mode string (Optional)

This parameter specifies the shape of the mask area when mask_type is set to tracking.

The algorithm generates a mask video whose shape is based on the input mask image and the selected expand_mode. The following values are supported:

  • hull (default): Polygon mode. A polygon wraps the masked object.

  • bbox: Bounding box mode, which uses a rectangle to enclose the masked object.

  • original: Raw mode. Preserves the shape of the original masked object as much as possible.
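The relationship between mask_type, expand_ratio, and expand_mode can be sketched as follows. The function name mask_parameters is hypothetical. Leaving out the expansion settings when mask_type is fixed is a choice made in this sketch, based on the statement that they apply when mask_type is set to tracking.

```python
MASK_TYPES = ("tracking", "fixed")
EXPAND_MODES = ("hull", "bbox", "original")


def mask_parameters(mask_type: str = "tracking",
                    expand_ratio: float = 0.05,
                    expand_mode: str = "hull") -> dict:
    """Build the mask-related parameters for a request with mask_image_url."""
    if mask_type not in MASK_TYPES:
        raise ValueError(f"unsupported mask_type: {mask_type!r}")
    params = {"mask_type": mask_type}
    if mask_type == "tracking":
        if not 0.0 <= expand_ratio <= 1.0:
            raise ValueError("expand_ratio must be in the range [0.0, 1.0]")
        if expand_mode not in EXPAND_MODES:
            raise ValueError(f"unsupported expand_mode: {expand_mode!r}")
        params["expand_ratio"] = expand_ratio
        params["expand_mode"] = expand_mode
    return params
```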

size string (Optional)

The resolution of the generated video (width × height). The model currently supports generating 720p videos. Valid values:

  • 1280 × 720 (default): The aspect ratio is 16:9, with a width of 1280 pixels and a height of 720 pixels.

  • 720 × 1280: The aspect ratio is 9:16.

  • 960 × 960: The aspect ratio is 1:1.

  • 832 × 1088: The aspect ratio is 3:4.

  • 1088 × 832: The aspect ratio is 4:3.

duration integer (Optional)

The duration of the generated video in seconds. This parameter is fixed at 5 and cannot be modified. The model always generates a 5-second video.

prompt_extend bool (Optional)

Specifies whether to enable prompt rewriting. If enabled, an LLM rewrites the input prompt. This can significantly improve the quality of the generated video for short prompts but increases the processing time.

  • false (default, recommended)

  • true

If the text description is inconsistent with the video content, the model may misinterpret the prompt. To improve generation consistency and accuracy, we recommend that you disable prompt rewriting and provide a clear, specific description in the prompt.

seed integer (Optional)

The random seed controls the randomness of the generated content. The value must be in the range [0, 2147483647].

If you do not provide a seed, the algorithm automatically generates a random number as the seed. To ensure that the generated content is relatively stable, use the same seed value for each request.

watermark bool (Optional)

Specifies whether to add a watermark. The watermark is located in the lower-right corner of the video and says "AI-generated".

  • false (default)

  • true

Video extension

model string (Required)

The model name. For example: wan2.1-vace-plus.

input object (Required)

The basic input information, such as the prompt.

Properties

prompt string (Required)

The prompt that describes the desired elements and visual characteristics in the generated video.

The prompt can be in Chinese or English and must not exceed 800 characters. Each Chinese character or letter is counted as one character. Excess characters are automatically truncated.

For more information about prompt techniques, see Video generation prompt guide.

function string (Required)

The name of the feature. For video extension, set the value to video_extension.

Video extension supports generating continuous content based on an image or video. It also supports extracting dynamic features, such as actions and composition, from a reference video to guide the generation of a video with similar motion performance.

The total duration of the extended video is 5 seconds. Note: This is the total duration of the final output video, not an additional 5-second extension to the original video.

first_frame_url string (Optional)

The URL of the first frame image.

The URL must be a public endpoint that supports the HTTP or HTTPS protocol.

Image requirements:

  • Image format: JPG, JPEG, PNG, BMP, TIFF, or WEBP.

  • Image resolution: The width and height must be between 360 and 2,000 pixels.

  • Image size: Cannot exceed 10 MB.

  • The URL cannot contain Chinese characters.

last_frame_url string (Optional)

The URL of the last frame image. The URL must be accessible over the Internet and support the HTTP or HTTPS protocol.

Image requirements:

  • Image format: JPG, JPEG, PNG, BMP, TIFF, or WEBP.

  • Image resolution: The width and height must be between 360 and 2,000 pixels.

  • Image size: Cannot exceed 10 MB.

  • The URL cannot contain Chinese characters.

first_clip_url string (Optional)

The URL of the first video segment. The URL must be a publicly accessible endpoint that supports the HTTP or HTTPS protocol.

Video requirements:

  • Video format: MP4.

  • The video frame rate must be 16 FPS or higher. If you use both first_clip_url and last_clip_url, we recommend that the two clips have the same frame rate.

  • Video size: Cannot exceed 50 MB.

  • The video cannot be longer than 3 seconds. If a video exceeds this limit, only the first 3 seconds are used. If both first_clip_url and last_clip_url are specified, their combined duration cannot exceed 3 seconds.

  • The URL cannot contain Chinese characters.

Output video resolution:

  • If the input video resolution is 720p or lower, the output video retains the original resolution.

  • If the input video resolution is higher than 720p, it is scaled down to 720p or lower while maintaining the original aspect ratio.

last_clip_url string (Optional)

The URL of the last video segment. The URL must be accessible over the Internet and support the HTTP or HTTPS protocol.

Video requirements:

  • Video format: MP4.

  • Video frame rate: 16 FPS or higher. When using first_clip_url and last_clip_url together, we recommend that the two clips have the same frame rate.

  • Video size: Cannot exceed 50 MB.

  • Video length: The maximum video length is 3 seconds. If a video is longer than 3 seconds, only the first 3 seconds are used. If both first_clip_url and last_clip_url are specified, their combined duration cannot exceed 3 seconds.

  • The URL cannot contain Chinese characters.

Output video resolution:

  • If the input video resolution is 720p or lower, the output video retains the original resolution.

  • If the input video resolution is higher than 720p, it is scaled down to 720p or lower while maintaining the original aspect ratio.
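The duration limits for first_clip_url and last_clip_url can be checked locally before the clips are uploaded. In this Python sketch the function name effective_clip_durations is hypothetical, the durations are assumed to be known in advance, and treating a combined duration above 3 seconds as an error is one reading of "cannot exceed".

```python
from typing import Optional, Tuple

MAX_CLIP_SECONDS = 3.0


def effective_clip_durations(first_clip_seconds: Optional[float] = None,
                             last_clip_seconds: Optional[float] = None
                             ) -> Tuple[float, float]:
    """Return the (first, last) clip durations that are actually used."""
    both_given = first_clip_seconds is not None and last_clip_seconds is not None
    first = first_clip_seconds or 0.0
    last = last_clip_seconds or 0.0
    if both_given and first + last > MAX_CLIP_SECONDS:
        raise ValueError("combined clip duration cannot exceed 3 seconds")
    # A single clip longer than 3 seconds is cut to its first 3 seconds.
    return (min(first, MAX_CLIP_SECONDS), min(last, MAX_CLIP_SECONDS))
```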

video_url string (Optional)

The URL of the video. The URL must be publicly accessible and support the HTTP or HTTPS protocol.

Motion features are extracted from this video and used together with the first_frame_url, last_frame_url, first_clip_url, and last_clip_url parameters to generate an extended video with similar motion.

Video requirements:

  • Video format: MP4.

  • Video frame rate: 16 FPS or higher, consistent with the preceding and succeeding clips.

  • Video resolution: Consistent with the preceding and succeeding frames and clips.

  • Video size: Cannot exceed 50 MB.

  • Video length: Cannot exceed 5 seconds. If it does, only the first 5 seconds are used.

  • The URL cannot contain Chinese characters.

parameters object (Optional)

The video processing parameters, such as the output video resolution.

Properties

control_condition string (Optional)

Specifies the method for video feature extraction. This parameter is required when video_url is provided. The default value is "" (an empty string), which means no extraction is performed.

  • posebodyface: Extracts an entity's facial expressions and body movements from the input video.

  • depth: Extracts the composition and motion contours from the input video.

duration integer (Optional)

The duration of the generated video in seconds. This parameter is fixed at 5 and cannot be modified. The model always generates a 5-second video.

prompt_extend bool (Optional)

Specifies whether to enable prompt rewriting. If enabled, an LLM rewrites the input prompt. This can significantly improve the quality of the generated video for short prompts but increases the processing time.

  • false (default, recommended)

  • true

If the text description is inconsistent with the video content, the model may misinterpret the prompt. To improve generation consistency and accuracy, we recommend that you disable prompt rewriting and provide a clear, specific description in the prompt.

seed integer (Optional)

The random seed, which controls the randomness of the generated content. The value range is [0, 2147483647].

If you do not provide a seed, the algorithm automatically generates a random number as the seed. To ensure that the generated content is relatively stable, use the same seed value for each request.

watermark bool (Optional)

Specifies whether to add a watermark. The watermark is located in the lower-right corner of the video and says "AI-generated".

  • false (default)

  • true

Video outpainting

model string (Required)

The model name. For example: wan2.1-vace-plus.

input object (Required)

The basic input information, such as the prompt.

Properties

prompt string (Required)

The prompt that describes the desired elements and visual characteristics in the generated video.

The prompt can be in Chinese or English and must not exceed 800 characters. Each Chinese character or letter is counted as one character. Excess characters are automatically truncated.

For more information about prompt techniques, see Video generation prompt guide.

function string (Required)

Specifies the feature to use. For video outpainting, set this parameter to video_outpainting.

Video outpainting supports proportionally scaling a video upward, downward, to the left, and to the right.

video_url string (Required)

The URL of the input video.

The URL must be publicly accessible and support the HTTP or HTTPS protocol.

Video requirements:

  • Video format: MP4.

  • Video frame rate: 16 FPS or higher.

  • Video size: Cannot exceed 50 MB.

  • Video length: Cannot exceed 5 seconds. If it does, only the first 5 seconds are used.

  • The URL cannot contain Chinese characters.

Output video resolution:

  • If the input video resolution is 720p or lower, the output video retains the original resolution.

  • If the input video resolution is higher than 720p, it is scaled down to 720p or lower while maintaining the original aspect ratio.

Output video duration:

  • The output video has the same duration as the input video, up to a maximum of 5 seconds.

  • For example: If the input video is 3 seconds long, the output is also 3 seconds long. If the input is 6 seconds long, the output is the first 5 seconds.
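
The duration rule above amounts to capping the input length at 5 seconds. A minimal sketch, where expected_output_duration is a hypothetical helper name:

```python
MAX_OUTPUT_SECONDS = 5.0  # maximum output duration stated above


def expected_output_duration(input_seconds: float) -> float:
    """Output keeps the input duration, capped at 5 seconds."""
    if input_seconds <= 0:
        raise ValueError("input duration must be positive")
    return min(float(input_seconds), MAX_OUTPUT_SECONDS)
```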

parameters object (Optional)

The video processing parameters, such as the scaling ratio.

Properties

top_scale float (Optional)

Centers the video frame and proportionally scales the video upward.

The value range is [1.0, 2.0]. The default value is 1.0, which means no scaling.

bottom_scale float (Optional)

Centers the video frame and proportionally scales the video downward.

The value range is [1.0, 2.0]. The default value is 1.0, which means no scaling.

left_scale float (Optional)

Centers the video frame and proportionally scales the video to the left.

The value range is [1.0, 2.0]. The default value is 1.0, which means no scaling.

right_scale float (Optional)

Centers the video frame and proportionally scales the video to the right.

The value range is [1.0, 2.0]. The default value is 1.0, which means no scaling.

duration integer (Optional)

The duration of the generated video in seconds. This parameter is fixed at 5 and cannot be modified. The model always generates a 5-second video.

prompt_extend bool (Optional)

Specifies whether to enable prompt rewriting. If enabled, an LLM rewrites the input prompt. This can significantly improve the quality of the generated video for short prompts but increases the processing time.

  • false (default, recommended)

  • true

If the text description is inconsistent with the video content, the model may misinterpret the prompt. To improve generation consistency and accuracy, we recommend that you disable prompt rewriting and provide a clear, specific description in the prompt.

seed integer (Optional)

The random seed, which controls the randomness of the generated content. The value range is [0, 2147483647].

If you do not provide a seed, the algorithm automatically generates a random number as the seed. To ensure that the generated content is relatively stable, use the same seed value for each request.

watermark bool (Optional)

Specifies whether to add a watermark. The watermark is located in the lower-right corner of the video and says "AI-generated".

  • false (default)

  • true
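
The video outpainting parameters above can be assembled into a request body as in the following sketch. It assumes that the JSON body nests model, input, and parameters in the same way as the parameter hierarchy in this section. build_outpainting_request is a hypothetical helper, and the example.com URL used with it is a placeholder.

```python
from typing import Any, Dict, Optional

SCALE_KEYS = ("top_scale", "bottom_scale", "left_scale", "right_scale")
MAX_PROMPT_CHARS = 800   # longer prompts are truncated by the service
MAX_SEED = 2147483647


def build_outpainting_request(
    prompt: str,
    video_url: str,
    *,
    model: str = "wan2.1-vace-plus",
    seed: Optional[int] = None,
    prompt_extend: bool = False,
    watermark: bool = False,
    **scales: float,
) -> Dict[str, Any]:
    """Validate the documented ranges and return the request body as a dict."""
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError("prompt exceeds 800 characters and would be truncated")
    parameters: Dict[str, Any] = {
        "prompt_extend": prompt_extend,
        "watermark": watermark,
    }
    for key, value in scales.items():
        if key not in SCALE_KEYS:
            raise ValueError(f"unknown scale parameter: {key}")
        if not 1.0 <= value <= 2.0:
            raise ValueError(f"{key} must be in [1.0, 2.0]")
        parameters[key] = float(value)
    if seed is not None:
        if not 0 <= seed <= MAX_SEED:
            raise ValueError("seed must be in [0, 2147483647]")
        parameters["seed"] = seed
    return {
        "model": model,
        "input": {
            "prompt": prompt,
            "function": "video_outpainting",
            "video_url": video_url,
        },
        "parameters": parameters,
    }
```

Scale values that are omitted are left out of the body so that the service default of 1.0 (no scaling) applies.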

Response parameters

Successful response

Save the task_id to query the task status and result.

{
    "output": {
        "task_status": "PENDING",
        "task_id": "0385dc79-5ff8-4d82-bcb6-xxxxxx"
    },
    "request_id": "4909100c-7b5a-9f92-bfe5-xxxxxx"
}

Error response

Returned when task creation fails. See Error messages to resolve the issue.

{
    "code":"InvalidApiKey",
    "message":"Invalid API-key provided.",
    "request_id":"fb53c4ec-1c12-4fc4-a580-xxxxxx"
}

output object

The task output information.

Properties

task_id string

The task ID. You can use it to query the task for 24 hours.

task_status string

The task status.

Enumeration

  • PENDING

  • RUNNING

  • SUCCEEDED

  • FAILED

  • CANCELED

  • UNKNOWN

request_id string

The unique request ID. You can use this ID to trace and troubleshoot issues.

code string

The error code for a failed request. This parameter is not returned if the request is successful. For more information, see Error messages.

message string

The detailed information about a failed request. This parameter is not returned if the request is successful. For more information, see Error messages.
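
The two response shapes above can be told apart by the presence of the code field. The following sketch extracts the task_id from a parsed response or raises an error. TaskCreationError and extract_task_id are hypothetical names, not part of any SDK.

```python
from typing import Any, Dict


class TaskCreationError(Exception):
    """Raised when the submit response carries an error code."""

    def __init__(self, code: str, message: str, request_id: str) -> None:
        super().__init__(f"{code}: {message} (request_id={request_id})")
        self.code = code
        self.message = message
        self.request_id = request_id


def extract_task_id(response: Dict[str, Any]) -> str:
    """Return output.task_id, or raise if the response is an error response."""
    if "code" in response:
        raise TaskCreationError(
            response["code"],
            response.get("message", ""),
            response.get("request_id", ""),
        )
    return response["output"]["task_id"]
```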

Step 2: Query the result by task ID

Singapore region: GET https://dashscope-intl.aliyuncs.com/api/v1/tasks/{task_id}

Beijing region: GET https://dashscope.aliyuncs.com/api/v1/tasks/{task_id}

Request parameters

Query task result

Replace 86ecf553-d340-4e21-xxxxxxxxx with the actual task ID.

The API keys for the Singapore and Beijing regions are different. Obtain an API key.

The following code uses the base_url for the Singapore region. If you use a model in the Beijing region, replace the base_url with https://dashscope.aliyuncs.com/api/v1/tasks/{task_id}.

curl -X GET https://dashscope-intl.aliyuncs.com/api/v1/tasks/86ecf553-d340-4e21-xxxxxxxxx \
--header "Authorization: Bearer $DASHSCOPE_API_KEY"

Request headers

Authorization string (Required)

The identity authentication credentials for the request. This API uses a Model Studio API key for identity authentication. Example: Bearer sk-xxxx.

URL path parameters

task_id string (Required)

The task ID.
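
Because the task runs asynchronously, the query is typically repeated until the task reaches a final status. The following is a minimal polling sketch that uses only the Python standard library and the Singapore endpoint. The 15-second interval, the limit of 80 polls, and the treatment of UNKNOWN as a final status are assumptions, not documented values. fetch_task and wait_for_task are hypothetical helper names.

```python
import json
import os
import time
import urllib.request
from typing import Any, Callable, Dict

# Singapore region; for Beijing, use https://dashscope.aliyuncs.com/api/v1/tasks/
BASE_URL = "https://dashscope-intl.aliyuncs.com/api/v1/tasks/"
# Assumption: every status other than PENDING and RUNNING ends the wait.
FINAL_STATUSES = {"SUCCEEDED", "FAILED", "CANCELED", "UNKNOWN"}


def fetch_task(task_id: str) -> Dict[str, Any]:
    """Send one GET request for the task and return the parsed JSON body."""
    request = urllib.request.Request(
        BASE_URL + task_id,
        headers={"Authorization": f"Bearer {os.environ['DASHSCOPE_API_KEY']}"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def wait_for_task(
    task_id: str,
    fetch: Callable[[str], Dict[str, Any]] = fetch_task,
    interval_s: float = 15.0,
    max_polls: int = 80,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until task_status is final, then return the last response."""
    for _ in range(max_polls):
        result = fetch(task_id)
        if result["output"]["task_status"] in FINAL_STATUSES:
            return result
        sleep(interval_s)
    raise TimeoutError(f"task {task_id} did not finish after {max_polls} polls")
```

The fetch and sleep arguments are injectable so that the loop can be exercised without network access.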

Response parameters

Task successful

Task data, such as the task status and video URL, is retained for only 24 hours and is then automatically purged. Make sure to save the generated video promptly.

{
    "request_id": "851985d0-fbba-9d8d-a17a-xxxxxx",
    "output": {
        "task_id": "208e2fd1-fcb4-4adf-9fcc-xxxxxx",
        "task_status": "SUCCEEDED",
        "submit_time": "2025-05-15 16:14:44.723",
        "scheduled_time": "2025-05-15 16:14:44.750",
        "end_time": "2025-05-15 16:20:09.389",
        "video_url": "https://dashscope-result-wlcb.oss-cn-wulanchabu.aliyuncs.com/xxx.mp4?xxxxxx",
        "orig_prompt": "In the video, a girl gracefully walks out from a misty, ancient forest. Her steps are light, and the camera captures her every nimble moment. When the girl stops and looks around at the lush woods, a smile of surprise and joy blossoms on her face. This scene, frozen in a moment of interplay between light and shadow, records the girl's wonderful encounter with nature.",
        "actual_prompt": "A girl in a light-colored long dress slowly walks out from a misty, ancient forest, her steps as light as a dance. She has slightly curly long hair, a delicate face, and bright eyes. The camera follows her movements, capturing every nimble moment. When she stops, turns, and looks around at the lush woods, a smile of surprise and joy blossoms on her face. Sunlight filters through the leaves, casting mottled shadows and freezing this beautiful moment of harmony between human and nature. The style is a fresh and natural portrait, combining medium and full shots with a level perspective and slight camera movement."
    },
    "usage": {
        "video_duration": 5,
        "video_ratio": "standard",
        "video_count": 1
    }
}

Task failed

If a task fails, task_status is set to FAILED, and an error code and message are provided. For more information, see Error messages to resolve the issue.

{
    "request_id": "e5d70b02-ebd3-98ce-9fe8-759d7d7b107d",
    "output": {
        "task_id": "86ecf553-d340-4e21-af6e-a0c6a421c010",
        "task_status": "FAILED",
        "code": "InvalidParameter",
        "message": "The size is not match xxxxxx"
    }
}

output object

The task output information.

Properties

task_id string

The task ID. You can use it to query the task for 24 hours.

task_status string

The task status.

Enumeration

  • PENDING

  • RUNNING

  • SUCCEEDED

  • FAILED

  • CANCELED

  • UNKNOWN

submit_time string

The time when the task was submitted. The format is YYYY-MM-DD HH:mm:ss.SSS.

scheduled_time string

The time when the task started running. The format is YYYY-MM-DD HH:mm:ss.SSS.

end_time string

The time when the task was completed. The format is YYYY-MM-DD HH:mm:ss.SSS.

video_url string

The video URL. The link is valid for 24 hours. You can use this URL to download the video. The output video format is MP4 (H.264 encoding).

orig_prompt string

The original input prompt.

actual_prompt string

The prompt that is actually used when prompt rewriting is enabled. If prompt rewriting is disabled, this field is not returned.

code string

The error code for a failed request. This parameter is not returned if the request is successful. For more information, see Error messages.

message string

The detailed information about a failed request. This parameter is not returned if the request is successful. For more information, see Error messages.

usage object

Usage statistics for the task. Only successful tasks are counted.

Properties

video_duration integer

The duration of the generated video in seconds.

video_ratio string

The aspect ratio of the generated video is fixed at standard.

video_count integer

The number of generated videos.

request_id string

The unique request ID. You can use this ID to trace and troubleshoot issues.
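
The fields of a successful response can be combined into a short summary, as in the following sketch. It parses the documented time format, computes the time from submission to completion, and estimates the cost with the billing formula (unit price × video duration). The default unit price of $0.10/second is the Singapore list price and ignores any free quota. summarize_result is a hypothetical helper name.

```python
from datetime import datetime
from typing import Any, Dict

TIME_FORMAT = "%Y-%m-%d %H:%M:%S.%f"  # matches YYYY-MM-DD HH:mm:ss.SSS


def summarize_result(
    result: Dict[str, Any], unit_price_per_second: float = 0.10
) -> Dict[str, Any]:
    """Summarize a SUCCEEDED task response: URL, elapsed time, estimated cost."""
    output = result["output"]
    submitted = datetime.strptime(output["submit_time"], TIME_FORMAT)
    ended = datetime.strptime(output["end_time"], TIME_FORMAT)
    seconds = result["usage"]["video_duration"]
    return {
        "video_url": output["video_url"],  # download within 24 hours
        "elapsed_s": (ended - submitted).total_seconds(),
        "estimated_cost": round(unit_price_per_second * seconds, 4),
    }
```

For the sample response above, the task took about 324.7 seconds from submission to completion, and the estimated cost is $0.50.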

Billing and rate limits

Error codes

If a call fails, see Error messages for troubleshooting.

This API also returns the specific error codes listed in the following table.

| HTTP status code | Error code | Error message | Description |
| --- | --- | --- | --- |
| 200 | InvalidParameter | ref_images_url and obj_or_bg must be the same length | When using the multi-image reference feature, ensure that the ref_images_url and obj_or_bg arrays have the same length. |
| 400 | InvalidParameter | InvalidParameter | The request parameters are invalid. |
| 400 | IPInfringementSuspect | Input data is suspected of being involved in IP infringement. | The input data, such as the prompt or image, is suspected of IP infringement. Check the input to ensure that it does not contain content that poses an infringement risk. |
| 400 | DataInspectionFailed | Input data may contain inappropriate content. | The input data, such as the prompt or image, may contain inappropriate content. Modify the input and retry. |
| 500 | InternalError | InternalError | A service error occurred. Try the request again to rule out a transient issue. |
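
The descriptions above suggest two kinds of handling: retry the request unchanged, or change the input first. The following sketch maps the listed error codes to a next action. The grouping is an interpretation of the descriptions, and next_action is a hypothetical helper name.

```python
# Codes whose description suggests retrying the same request.
RETRYABLE_CODES = {"InternalError"}
# Codes whose description asks for the input to be checked or changed.
FIX_INPUT_CODES = {"InvalidParameter", "IPInfringementSuspect", "DataInspectionFailed"}


def next_action(code: str) -> str:
    """Return 'retry', 'fix_input', or 'see_error_messages' for an error code."""
    if code in RETRYABLE_CODES:
        return "retry"
    if code in FIX_INPUT_CODES:
        return "fix_input"
    # Any other code is covered by the general Error messages topic.
    return "see_error_messages"
```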

Video access configuration

Configure a domain name whitelist to ensure that your business system can access video links

Generated videos are stored in Alibaba Cloud OSS. Each video is assigned an OSS link, such as https://dashscope-result-xx.oss-cn-xxxx.aliyuncs.com/xxx.mp4. OSS links allow public access, and you can use them to download the video. The link is valid for only 24 hours.

If your business has high security requirements and cannot access Alibaba Cloud OSS links, you must configure a whitelist for public access. Add the following domain names to your whitelist to access the video links.

# OSS domain name list
dashscope-result-bj.oss-cn-beijing.aliyuncs.com
dashscope-result-hz.oss-cn-hangzhou.aliyuncs.com
dashscope-result-sh.oss-cn-shanghai.aliyuncs.com
dashscope-result-wlcb.oss-cn-wulanchabu.aliyuncs.com
dashscope-result-zjk.oss-cn-zhangjiakou.aliyuncs.com
dashscope-result-sz.oss-cn-shenzhen.aliyuncs.com
dashscope-result-hy.oss-cn-heyuan.aliyuncs.com
dashscope-result-cd.oss-cn-chengdu.aliyuncs.com
dashscope-result-gz.oss-cn-guangzhou.aliyuncs.com
dashscope-result-wlcb-acdr-1.oss-cn-wulanchabu-acdr-1.aliyuncs.com
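
To confirm that a returned video URL points to one of the domains listed above, its host name can be compared against the list. A minimal sketch, where is_whitelisted_host is a hypothetical helper name:

```python
from urllib.parse import urlparse

# OSS domain names copied from the list above.
OSS_HOSTS = frozenset({
    "dashscope-result-bj.oss-cn-beijing.aliyuncs.com",
    "dashscope-result-hz.oss-cn-hangzhou.aliyuncs.com",
    "dashscope-result-sh.oss-cn-shanghai.aliyuncs.com",
    "dashscope-result-wlcb.oss-cn-wulanchabu.aliyuncs.com",
    "dashscope-result-zjk.oss-cn-zhangjiakou.aliyuncs.com",
    "dashscope-result-sz.oss-cn-shenzhen.aliyuncs.com",
    "dashscope-result-hy.oss-cn-heyuan.aliyuncs.com",
    "dashscope-result-cd.oss-cn-chengdu.aliyuncs.com",
    "dashscope-result-gz.oss-cn-guangzhou.aliyuncs.com",
    "dashscope-result-wlcb-acdr-1.oss-cn-wulanchabu-acdr-1.aliyuncs.com",
})


def is_whitelisted_host(video_url: str) -> bool:
    """Return True if the URL's host name is one of the listed OSS domains."""
    host = urlparse(video_url).hostname  # lowercased by urlparse, None if absent
    return host is not None and host in OSS_HOSTS
```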