
Alibaba Cloud Model Studio: Wan general video editing API reference

Last Updated: Feb 11, 2026

The Wan general video editing model supports multiple input modalities, such as text, images, and videos, and can perform various video generation and editing tasks.

References: User guide

Availability

To ensure successful API calls, the model, endpoint URL, and API key must belong to the same region. Cross-region calls will fail.

Note

The example code applies to the Singapore region.

HTTP

Video editing tasks are time-consuming (typically 5 to 10 minutes), so the API uses asynchronous invocation in two steps: create a task, then poll for the result.

Step 1: Create a task

Singapore

POST https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis

Beijing

POST https://dashscope.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis

Request parameters

Multi-image reference

The API keys for the Singapore and China (Beijing) regions are different. For more information, see Obtain an API key and API host.
The following URL is for the Singapore region. If you use a model in the China (Beijing) region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis
curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis' \
--header 'X-DashScope-Async: enable' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "wan2.1-vace-plus",
    "input": {
        "function": "image_reference",
        "prompt": "In the video, a girl gracefully walks out from a misty, ancient forest. Her steps are light, and the camera captures her every nimble moment. When she stops and looks around at the lush woods, a smile of surprise and joy blossoms on her face. This scene, frozen in a moment of interplay between light and shadow, records her wonderful encounter with nature.",
        "ref_images_url": [
            "http://wanx.alicdn.com/material/20250318/image_reference_2_5_16.png",
            "http://wanx.alicdn.com/material/20250318/image_reference_1_5_16.png"
        ]
    },
    "parameters": {
        "prompt_extend": true,
        "obj_or_bg": ["obj","bg"],
        "size": "1280*720"
    }
}'

Video repainting

curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis' \
--header 'X-DashScope-Async: enable' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "wan2.1-vace-plus",
    "input": {
        "function": "video_repainting",
        "prompt": "The video shows a black steampunk-style car driven by a gentleman. The car is decorated with gears and copper pipes. The background features a steam-powered candy factory and retro elements, creating a vintage and playful scene.",
        "video_url": "http://wanx.alicdn.com/material/20250318/video_repainting_1.mp4"
    },
    "parameters": {
        "prompt_extend": false,
        "control_condition": "depth"
    }
}'

Local editing

curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis' \
--header 'X-DashScope-Async: enable' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "wan2.1-vace-plus",
    "input": {
        "function": "video_edit",
        "prompt": "The video shows a Parisian-style French cafe where a lion in a suit is elegantly sipping coffee. It holds a coffee cup in one hand, taking a gentle sip with a relaxed expression. The cafe is tastefully decorated, with soft hues and warm lighting illuminating the area where the lion is.",
        "mask_image_url": "http://wanx.alicdn.com/material/20250318/video_edit_1_mask.png",
        "video_url": "http://wanx.alicdn.com/material/20250318/video_edit_2.mp4",
        "mask_frame_id": 1
    },
    "parameters": {
        "prompt_extend": false,
        "mask_type": "tracking",
        "expand_ratio": 0.05
    }
}'

Video extension

curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis' \
--header 'X-DashScope-Async: enable' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "wan2.1-vace-plus",
    "input": {
        "function": "video_extension",
        "prompt": "A dog wearing sunglasses is skateboarding on the street, 3D cartoon.",
        "first_clip_url": "http://wanx.alicdn.com/material/20250318/video_extension_1.mp4"
    },
    "parameters": {
        "prompt_extend": false
    }
}'

Video outpainting

curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis' \
--header 'X-DashScope-Async: enable' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "wan2.1-vace-plus",
    "input": {
        "function": "video_outpainting",
        "prompt": "An elegant lady is passionately playing the violin, with a full symphony orchestra behind her.",
        "video_url": "http://wanx.alicdn.com/material/20250318/video_outpainting_1.mp4"
    },
    "parameters": {
        "prompt_extend": false,
        "top_scale": 1.5,
        "bottom_scale": 1.5,
        "left_scale": 1.5,
        "right_scale": 1.5
    }
}'
Request headers

Content-Type string (Required)

The content type of the request. Must be application/json.

Authorization string (Required)

The authentication credentials using a Model Studio API key.

Example: Bearer sk-xxxx

X-DashScope-Async string (Required)

Enables asynchronous processing. Must be set to enable because HTTP calls support only asynchronous processing.

Important

If this header is not included, the request fails with the error "current user api does not support synchronous calls".

Request body

Multi-image reference

model string (Required)

The model name. Example: wan2.1-vace-plus.

input object (Required)

The basic input information, such as the prompt.

Properties

prompt string (Required)

Text prompt describing the elements and visual features for the generated video.

Supports Chinese and English. Each character, letter, and punctuation mark counts as one character. Text exceeding the limit is automatically truncated.

For more information about prompt techniques, see Video generation prompt guide.

function string (Required)

Feature name. For multi-image reference, set this to image_reference.

The multi-image reference feature supports up to three reference images. The image content can include entities and backgrounds, such as people, animals, clothing, and scenes. Use the prompt parameter to describe the desired video content. The model can then merge multiple images to generate a coherent video.

ref_images_url array[string] (Required)

URLs for input reference images.

  Public URL:

    • Supports the HTTP and HTTPS protocols.

    • Example: https://xxx/xxx.png.

You can provide 1 to 3 reference images. If you provide more than 3 images, only the first 3 are used as input.

Requirements for reference images:

  • Format: JPG, JPEG, PNG, BMP, TIFF, or WEBP.

  • Resolution: Both width and height must be between 360 and 2,000 pixels.

  • Size: Maximum 10 MB.

  • The URL must not contain Chinese characters.

Suggestions:

  • If you use an entity from a reference image, we recommend that each image contain only one entity. The background should be a solid color, such as white or a single color, to better highlight the entity.

  • If you use the background from a reference image, you can use at most one background image, and it should not contain any entity objects.
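The URL and format requirements above can be checked on the client before a task is submitted, which avoids waiting on an asynchronous task that is bound to fail. A minimal sketch; the helper name and the example URLs are illustrative, and the service still performs its own validation (this sketch does not check resolution or file size, which require fetching the image):

```python
import os
from urllib.parse import urlparse

# Formats accepted for reference images, per the requirements above.
ALLOWED_EXTS = {".jpg", ".jpeg", ".png", ".bmp", ".tiff", ".webp"}

def check_ref_image_url(url: str) -> list:
    """Return a list of problems found with a reference-image URL (empty list = OK)."""
    problems = []
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        problems.append("must use the HTTP or HTTPS protocol")
    ext = os.path.splitext(parsed.path)[1].lower()
    if ext not in ALLOWED_EXTS:
        problems.append("format must be JPG, JPEG, PNG, BMP, TIFF, or WEBP")
    if any("\u4e00" <= ch <= "\u9fff" for ch in url):
        problems.append("URL must not contain Chinese characters")
    return problems
```

For example, `check_ref_image_url("https://example.com/cat.png")` returns an empty list, while a `.gif` URL or a URL containing Chinese characters returns the corresponding problem descriptions.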

parameters object (Optional)

The video processing parameters, such as watermark settings.

Properties

obj_or_bg array[string] (Optional)

Identifies the purpose of each reference image and corresponds one-to-one with the ref_images_url parameter. Each element indicates whether the corresponding image is an "entity" or "background":

  • obj: Entity reference.

  • bg: Background reference. Maximum of one background reference is allowed.

Instructions:

  • We recommend that you provide this parameter. Its length must be the same as ref_images_url. Otherwise, an error is reported.

  • You can omit this parameter only when ref_images_url is a single-element array. In this case, the default value is ["obj"].

Example: ["obj", "obj", "bg"].
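The pairing rules above (same length as ref_images_url, at most one bg, default ["obj"] for a single image) can be sketched as a client-side check. The function name is illustrative, not part of the API:

```python
def check_obj_or_bg(ref_images_url, obj_or_bg=None):
    """Validate obj_or_bg against ref_images_url and return the effective list.

    Raises ValueError when the combination would be rejected per the rules above.
    """
    if obj_or_bg is None:
        # Omitting the parameter is allowed only for a single reference image.
        if len(ref_images_url) != 1:
            raise ValueError("obj_or_bg may be omitted only for a single reference image")
        return ["obj"]  # documented default
    if len(obj_or_bg) != len(ref_images_url):
        raise ValueError("obj_or_bg must have the same length as ref_images_url")
    if any(v not in ("obj", "bg") for v in obj_or_bg):
        raise ValueError("each element must be 'obj' or 'bg'")
    if obj_or_bg.count("bg") > 1:
        raise ValueError("at most one background reference ('bg') is allowed")
    return obj_or_bg
```

For example, two entity images plus one background image pass as ["obj", "obj", "bg"], while ["bg", "bg"] is rejected.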

size string (Optional)

Video resolution in width*height format. Available values:

  • 1280*720 (default): 16:9.

  • 720*1280: 9:16.

  • 960*960: 1:1.

  • 832*1088: 3:4.

  • 1088*832: 4:3.

duration integer (Optional)

Video duration in seconds. Fixed at 5 and cannot be modified. The model always generates a 5-second video.

prompt_extend bool (Optional)

Whether to enable prompt rewriting. If enabled, an LLM rewrites the input prompt. This can significantly improve the generation quality for short prompts but increases processing time.

  • true (default)

  • false

seed integer (Optional)

Random seed controlling the randomness of generated content. Value range: [0, 2147483647].

If you do not provide a seed, the algorithm automatically generates a random number as the seed. If you want to generate relatively stable content, use the same seed parameter value.

watermark bool (Optional)

Whether to add a watermark ("AI Generated" in the lower-right corner).

  • false (default)

  • true

Video repainting

model string (Required)

The model name. Example: wan2.1-vace-plus.

input object (Required)

The basic input information, such as the prompt.

Properties

prompt string (Required)

Text prompt describing the elements and visual features for the generated video.

Supports Chinese and English. Each character, letter, and punctuation mark counts as one character. Text exceeding the limit is automatically truncated.

For more information about prompt techniques, see Video generation prompt guide.

function string (Required)

Feature name. For video repainting, set this to video_repainting.

Video repainting extracts entity pose and actions, composition and motion contours, and sketch structure from an input video, then combines this with a text prompt to generate a new video with the same dynamic features. You can also replace the entity in the original video with a reference image, for example, to change a character's appearance while retaining the original actions.

video_url string (Required)

URL of the input video.

  Public URL:

    • Supports the HTTP and HTTPS protocols.

    • Example: https://xxx/xxx.mp4.

Requirements for input videos:

  • Format: MP4.

  • Frame rate: 16 FPS or higher.

  • Size: Maximum 50 MB.

  • Duration: Maximum 5 seconds. If longer, only the first 5 seconds are used.

  • The URL must not contain Chinese characters.

About the output video resolution:

  • If the input video resolution is 720P or lower, the output retains the original resolution.

  • If the input video resolution is higher than 720P, it is scaled down to 720P or lower while maintaining the original aspect ratio.

About the output video duration:

  • The output video has the same duration as the input video, up to a maximum of 5 seconds.

  • Example: If the input video is 3 seconds long, the output is also 3 seconds long. If the input is 6 seconds long, the output is the first 5 seconds.

ref_images_url array[string] (Optional)

An array of URLs for the input reference images.

  Public URL:

    • Supports the HTTP and HTTPS protocols.

    • Example: https://xxx/xxx.png.

Only 1 reference image is supported. We recommend that this image be an entity image used to replace the entity in the input video.

Image requirements:

  • Image format: JPG, JPEG, PNG, BMP, TIFF, or WEBP.

  • Image resolution: The width and height must be between 360 and 2,000 pixels.

  • Image size: Must not exceed 10 MB.

  • The URL must not contain Chinese characters.

Suggestions:

  • If you use an entity from a reference image, we recommend that each image contain only one entity. The background should be a solid color, such as white or a single color, to better highlight the entity.

parameters object (Required)

The video processing parameters, such as watermark settings.

Properties

control_condition string (Required)

Sets the method for video feature extraction.

  • posebodyface: Extracts the facial expressions and body movements of the entity in the input video. This is suitable for scenarios that require the preservation of facial details.

  • posebody: Extracts an entity's body movements from the input video, excluding facial expressions. Use this for scenarios where you need to control only body movements.

  • depth: Extracts the composition and motion contours from the input video.

  • scribble: Extracts the sketch structure from the input video.

strength float (Optional)

Adjusts the control strength that the control_condition feature extraction method applies to the generated video.

The default value is 1.0. The value range is [0.0, 1.0].

A larger value makes the generated video closer to the original video's actions and composition. A smaller value allows for more creative freedom.

prompt_extend bool (Optional)

Specifies whether to enable prompt rewriting. If enabled, an LLM rewrites the input prompt. This can significantly improve the generation quality for short prompts but increases the processing time.

  • true (default)

  • false (Recommended)

If the text description is inconsistent with the input video content, the model may misinterpret the prompt. To improve generation consistency and accuracy, disable prompt rewriting and provide a clear, specific description in the prompt.

seed integer (Optional)

Random seed controlling the randomness of generated content. Value range: [0, 2147483647].

If you do not provide a seed, the algorithm automatically generates a random number as the seed. If you want to generate relatively stable content, use the same seed parameter value.

watermark bool (Optional)

Whether to add a watermark ("AI Generated" in the lower-right corner).

  • false (default)

  • true

Local editing

model string (Required)

The model name. Example: wan2.1-vace-plus.

input object (Required)

The basic input information, such as the prompt.

Properties

prompt string (Required)

Text prompt describing the elements and visual features for the generated video.

Supports Chinese and English. Each character, letter, and punctuation mark counts as one character. Text exceeding the limit is automatically truncated.

For more information about prompt techniques, see Video generation prompt guide.

function string (Required)

The name of the feature. For local editing, set the value to video_edit.

Local editing lets you add, modify, or delete elements in a specified area of an input video. You can also replace the entity or background in the editing area to achieve fine-grained video editing.

video_url string (Required)

URL of the input video.

  Public URL:

    • Supports the HTTP and HTTPS protocols.

    • Example: https://xxx/xxx.mp4.

Requirements for input videos:

  • Format: MP4.

  • Frame rate: 16 FPS or higher.

  • Size: Maximum 50 MB.

  • Duration: Maximum 5 seconds. If longer, only the first 5 seconds are used.

  • The URL must not contain Chinese characters.

About the output video resolution:

  • If the input video resolution is 720P or lower, the output retains the original resolution.

  • If the input video resolution is higher than 720P, it is scaled down to 720P or lower while maintaining the original aspect ratio.

About the output video duration:

  • The output video has the same duration as the input video, up to a maximum of 5 seconds.

  • Example: If the input video is 3 seconds long, the output is also 3 seconds long. If the input is 6 seconds long, the output is the first 5 seconds.

ref_images_url array[string] (Optional)

An array of URLs for the input reference images.

  Public URL:

    • Supports the HTTP and HTTPS protocols.

    • Example: https://xxx/xxx.png.

Currently, only 1 reference image is supported. This image can be used as an entity or background to replace the corresponding content in the input video.

Image requirements:

  • Image format: JPG, JPEG, PNG, BMP, TIFF, or WEBP.

  • Image resolution: The width and height must be between 360 and 2,000 pixels.

  • Image size: Must not exceed 10 MB.

  • The URL must not contain Chinese characters.

Suggestions:

  • If you use an entity from a reference image, we recommend that each image contain only one entity. The background should be a solid color, such as white or a single color, to better highlight the entity.

  • If you use the background from a reference image, the background image should not contain any entity objects.

mask_image_url string (Optional)

The URL of the mask image.

  Public URL:

    • Supports the HTTP and HTTPS protocols.

    • Example: https://xxx/xxx.png.

This parameter specifies the video editing area. You must specify either this parameter or the mask_video_url parameter. We recommend using this parameter.

The white area of the mask image (with a pixel value of exactly [255, 255, 255]) indicates the area to edit. The black area (with a pixel value of exactly [0, 0, 0]) indicates the area to preserve.

Image requirements:

  • Image format: JPG, JPEG, PNG, BMP, TIFF, or WEBP.

  • Image resolution: Must be exactly the same as the input video (video_url) resolution.

  • Image size: Must not exceed 10 MB.

  • The URL must not contain Chinese characters.

mask_frame_id integer (Optional)

This parameter takes effect only when mask_image_url is not empty. It specifies the frame, identified by its frame ID, in which the masked object appears.

The default value is 1, which indicates the first frame of the video.

The value must be in the range [1, max_frame_id], where max_frame_id = input video frame rate × input video duration + 1.

For example, for an input video (video_url) with a frame rate of 16 FPS and a duration of 5 seconds, the total number of frames is 81 (16*5 + 1). Therefore, the value of max_frame_id is 81.
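The frame-ID arithmetic above can be sketched as follows; the function names are illustrative:

```python
def max_frame_id(fps: int, duration_s: int) -> int:
    """max_frame_id = input video frame rate x input video duration + 1."""
    return fps * duration_s + 1

def check_mask_frame_id(frame_id: int, fps: int, duration_s: int) -> bool:
    """mask_frame_id must lie in the range [1, max_frame_id]."""
    return 1 <= frame_id <= max_frame_id(fps, duration_s)
```

For the documented example of a 16 FPS, 5-second video, `max_frame_id(16, 5)` returns 81, so `mask_frame_id` may be any integer from 1 through 81.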

mask_video_url string (Optional)

The URL of the mask video.

  Public URL:

    • Supports the HTTP and HTTPS protocols.

    • Example: https://xxx/xxx.mp4.

This parameter specifies the area of the video to edit. You must specify either this parameter or the mask_image_url parameter.

The mask video must have the same video format, frame rate, resolution, and length as the input video (video_url).

The white area of the mask video (with a pixel value of exactly [255, 255, 255]) indicates the area to edit. The black area (with a pixel value of exactly [0, 0, 0]) indicates the area to preserve.

parameters object (Optional)

The video processing parameters, such as watermark settings.

Properties

control_condition string (Optional)

Sets the method for video feature extraction. The default value is "", which means no extraction is performed.

  • posebodyface: Extracts the facial expressions and body movements of the entity in the input video. This is suitable for scenarios where the entity's face is large in the frame and has clearly visible features.

  • depth: Extracts the composition and motion contours from the input video.

mask_type string (Optional)

This parameter is effective only when mask_image_url is not empty. It specifies the behavior of the editing area.

  • tracking (default): The editing area dynamically follows the trajectory of the target object. This mode is suitable for scenes with moving objects.

  • fixed: The editing area remains fixed and does not change with the video content.

expand_ratio float (Optional)

This parameter applies only when mask_type is set to tracking. It specifies the outward expansion ratio of the mask area.

The value range is [0.0, 1.0]. The default value is 0.05. We recommend using the default value.

A smaller value makes the mask area fit the target object more closely. A larger value expands the mask area more widely.

expand_mode string (Optional)

This parameter applies only when mask_type is set to tracking. It specifies the shape of the mask area.

The algorithm generates a mask video with a corresponding shape based on the input mask image and the selected expand_mode. The following values are supported:

  • hull (default): Polygon mode. A polygon wraps the masked object.

  • bbox: Bounding box mode. A rectangle wraps the masked object.

  • original: Raw mode. Preserves the shape of the original masked object as much as possible.

size string (Optional)

Video resolution in width*height format. Available values:

  • 1280*720 (default): 16:9.

  • 720*1280: 9:16.

  • 960*960: 1:1.

  • 832*1088: 3:4.

  • 1088*832: 4:3.

duration integer (Optional)

Video duration in seconds. Fixed at 5 and cannot be modified. The model always generates a 5-second video.

prompt_extend bool (Optional)

Specifies whether to enable prompt rewriting. If enabled, an LLM rewrites the input prompt. This can significantly improve the generation quality for short prompts but increases the processing time.

  • true (default)

  • false (Recommended)

If the text description is inconsistent with the input video content, the model may misinterpret the prompt. To improve generation consistency and accuracy, disable prompt rewriting and provide a clear, specific description in the prompt.

seed integer (Optional)

Random seed controlling the randomness of generated content. Value range: [0, 2147483647].

If you do not provide a seed, the algorithm automatically generates a random number as the seed. If you want to generate relatively stable content, use the same seed parameter value.

watermark bool (Optional)

Whether to add a watermark ("AI Generated" in the lower-right corner).

  • false (default)

  • true

Video extension

model string (Required)

The model name. Example: wan2.1-vace-plus.

input object (Required)

The basic input information, such as the prompt.

Properties

prompt string (Required)

Text prompt describing the elements and visual features for the generated video.

Supports Chinese and English. Each character, letter, and punctuation mark counts as one character. Text exceeding the limit is automatically truncated.

For more information about prompt techniques, see Video generation prompt guide.

function string (Required)

The name of the feature. For video extension, set the value to video_extension.

Video extension generates continuous content based on an image or video. It also extracts dynamic features, such as actions and composition, from a reference video to guide the generation of a video with similar motion.

The total duration of the extended video is 5 seconds. Note: This is the total duration of the final output video, not an additional 5-second extension to the original video.

first_frame_url string (Optional)

The URL of the first frame image.

  Public URL:

    • Supports the HTTP and HTTPS protocols.

    • Example: https://xxx/xxx.png.

Image requirements:

  • Image format: JPG, JPEG, PNG, BMP, TIFF, or WEBP.

  • Image resolution: The width and height must be between 360 and 2,000 pixels.

  • Image size: Must not exceed 10 MB.

  • The URL must not contain Chinese characters.

last_frame_url string(Optional)

The URL of the last frame image.

  Public URL:

    • Supports the HTTP and HTTPS protocols.

    • Example: https://xxx/xxx.png.

Image requirements:

  • Image format: JPG, JPEG, PNG, BMP, TIFF, or WEBP.

  • Image resolution: The width and height must be between 360 and 2,000 pixels.

  • Image size: Must not exceed 10 MB.

  • The URL must not contain Chinese characters.

first_clip_url string (Optional)

The URL of the first video segment.

  Public URL:

    • Supports the HTTP and HTTPS protocols.

    • Example: https://xxx/xxx.mp4.

Video requirements:

  • Video format: MP4.

  • Video frame rate: 16 FPS or higher. If you use both first_clip_url and last_clip_url, we recommend that the two clips have the same frame rate.

  • Video size: Must not exceed 50 MB.

  • Video duration: Must not exceed 3 seconds. If the duration is longer, only the first 3 seconds are used. If both first_clip_url and last_clip_url are specified, their combined duration must not exceed 3 seconds.

  • The URL must not contain Chinese characters.

About the output video resolution:

  • If the input video resolution is 720P or lower, the output retains the original resolution.

  • If the input video resolution is higher than 720P, it is scaled down to 720P or lower while maintaining the original aspect ratio.

last_clip_url string(Optional)

The URL of the last video segment.

  Public URL:

    • Supports the HTTP and HTTPS protocols.

    • Example: https://help-static-aliyun-doc.aliyuncs.com/xxx.mp4.

Video requirements:

  • Video format: MP4.

  • Video frame rate: 16 FPS or higher. When using first_clip_url and last_clip_url together, we recommend that the two clips have the same frame rate.

  • Video size: Must not exceed 50 MB.

  • Video duration: Must not exceed 3 seconds. If the duration is longer, only the first 3 seconds are used. If both first_clip_url and last_clip_url are specified, their combined duration must not exceed 3 seconds.

  • The URL must not contain Chinese characters.

About the output video resolution:

  • If the input video resolution is 720P or lower, the output retains the original resolution.

  • If the input video resolution is higher than 720P, it is scaled down to 720P or lower while maintaining the original aspect ratio.

video_url string (Optional)

The URL of the input video.

  Public URL:

    • Supports the HTTP and HTTPS protocols.

    • Example: https://help-static-aliyun-doc.aliyuncs.com/xxx.mp4.

This video is mainly used to extract motion features. It is used with the first_frame_url, last_frame_url, first_clip_url, and last_clip_url parameters to guide the generation of an extended video with similar motion.

Video requirements:

  • Video format: MP4.

  • Video frame rate: 16 FPS or higher, consistent with the preceding and succeeding clips.

  • Video resolution: Consistent with the preceding and succeeding frames and clips.

  • Video size: Must not exceed 50 MB.

  • Video duration: Must not exceed 5 seconds. If the duration is longer, only the first 5 seconds are used.

  • The URL must not contain Chinese characters.

parameters object (Optional)

The video processing parameters, such as the output video resolution.

Properties

control_condition string (Optional)

Sets the method for video feature extraction. This parameter is required when video_url is provided. The default value is "", which means no extraction is performed.

  • posebodyface: Extracts an entity's facial expressions and body movements from the input video.

  • depth: Extracts the composition and motion contours from the input video.

duration integer (Optional)

Video duration in seconds. Fixed at 5 and cannot be modified. The model always generates a 5-second video.

prompt_extend bool (Optional)

Specifies whether to enable prompt rewriting. If enabled, an LLM rewrites the input prompt. This can significantly improve the generation quality for short prompts but increases the processing time.

  • true (default)

  • false (Recommended)

If the text description is inconsistent with the input video content, the model may misinterpret the prompt. To improve generation consistency and accuracy, disable prompt rewriting and provide a clear, specific description in the prompt.

seed integer (Optional)

Random seed controlling the randomness of generated content. Value range: [0, 2147483647].

If you do not provide a seed, the algorithm automatically generates a random number as the seed. If you want to generate relatively stable content, use the same seed parameter value.

watermark bool (Optional)

Whether to add a watermark ("AI Generated" in the lower-right corner).

  • false (default)

  • true

Video outpainting

model string (Required)

The model name. Example: wan2.1-vace-plus.

input object (Required)

The basic input information, such as the prompt.

Properties

prompt string (Required)

Text prompt describing the elements and visual features for the generated video.

Supports Chinese and English. Each character, letter, and punctuation mark counts as one character. Text exceeding the limit is automatically truncated.

For more information about prompt techniques, see Video generation prompt guide.

function string (Required)

Specifies the feature to use. For video outpainting, set this parameter to video_outpainting.

Video outpainting proportionally extends the video frame upward, downward, to the left, and to the right.

video_url string (Required)

URL of the input video.

  Public URL:

    • Supports the HTTP and HTTPS protocols.

    • Example: https://xxx/xxx.mp4.

Requirements for input videos:

  • Format: MP4.

  • Frame rate: 16 FPS or higher.

  • Size: Maximum 50 MB.

  • Duration: Maximum 5 seconds. If longer, only the first 5 seconds are used.

  • The URL must not contain Chinese characters.

About the output video resolution:

  • If the input video resolution is 720P or lower, the output retains the original resolution.

  • If the input video resolution is higher than 720P, it is scaled down to 720P or lower while maintaining the original aspect ratio.

About the output video duration:

  • The output video has the same duration as the input video, up to a maximum of 5 seconds.

  • Example: If the input video is 3 seconds long, the output is also 3 seconds long. If the input is 6 seconds long, the output is the first 5 seconds.

parameters object (Optional)

The video processing parameters, such as the scaling ratio.

Properties

top_scale float (Optional)

Proportionally extends the frame upward, relative to the original video frame.

The value range is [1.0, 2.0]. The default value is 1.0, which means no extension.

bottom_scale float (Optional)

Proportionally extends the frame downward, relative to the original video frame.

The value range is [1.0, 2.0]. The default value is 1.0, which means no extension.

left_scale float (Optional)

Proportionally extends the frame to the left, relative to the original video frame.

The value range is [1.0, 2.0]. The default value is 1.0, which means no extension.

right_scale float (Optional)

Proportionally extends the frame to the right, relative to the original video frame.

The value range is [1.0, 2.0]. The default value is 1.0, which means no extension.

duration integer (Optional)

Video duration in seconds. Fixed at 5 and cannot be modified. The model always generates a 5-second video.

prompt_extend bool (Optional)

Specifies whether to enable prompt rewriting. If enabled, an LLM rewrites the input prompt. This can significantly improve the generation quality for short prompts but increases the processing time.

  • true (default)

  • false (Recommended)

If the rewritten prompt drifts from the input video content, the model may misinterpret your intent. For more consistent and accurate results, disable prompt rewriting and provide a clear, specific description in the prompt.

seed integer (Optional)

Random seed controlling the randomness of generated content. Value range: [0, 2147483647].

If you do not provide a seed, the algorithm automatically generates a random one. To reproduce similar output across calls, pass the same seed value.

watermark bool (Optional)

Whether to add a watermark ("AI Generated" in the lower-right corner).

  • false (default)

  • true

Response parameters

Successful response

Save the task_id to query the task status and result.

{
    "output": {
        "task_status": "PENDING",
        "task_id": "0385dc79-5ff8-4d82-bcb6-xxxxxx"
    },
    "request_id": "4909100c-7b5a-9f92-bfe5-xxxxxx"
}

Error response

Task creation failed. See error codes to resolve the issue.

{
    "code": "InvalidApiKey",
    "message": "No API-key provided.",
    "request_id": "7438d53d-6eb8-4596-8835-xxxxxx"
}

output object

The task output information.

Properties

task_id string

The ID of the task. Can be used to query the task for up to 24 hours.

task_status string

The status of the task.

Enumeration

  • PENDING

  • RUNNING

  • SUCCEEDED

  • FAILED

  • CANCELED

  • UNKNOWN: Task does not exist or status is unknown

request_id string

Unique identifier for the request. Use for tracing and troubleshooting issues.

code string

The error code. Returned only when the request fails. See error codes for details.

message string

Detailed error message. Returned only when the request fails. See error codes for details.

Step 2: Query the result

Singapore

GET https://dashscope-intl.aliyuncs.com/api/v1/tasks/{task_id}

Beijing

GET https://dashscope.aliyuncs.com/api/v1/tasks/{task_id}

Request parameters

Query task result

Replace {task_id} with the task_id value returned by the previous API call.

curl -X GET https://dashscope-intl.aliyuncs.com/api/v1/tasks/{task_id} \
--header "Authorization: Bearer $DASHSCOPE_API_KEY"

Request headers

Authorization string (Required)

The authentication credentials using a Model Studio API key.

Example: Bearer sk-xxxx

URL path parameters

task_id string (Required)

The ID of the task to query.
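Because tasks typically take 5 to 10 minutes, the query endpoint is usually called in a polling loop that stops at a terminal status. The sketch below follows the Singapore endpoint and the status enumeration from this reference; the function name, the 15-second interval, and the injectable `fetch`/`sleep` hooks (which allow the loop to be exercised without network access) are our choices.

```python
import json
import time
import urllib.request

INTL_BASE = "https://dashscope-intl.aliyuncs.com/api/v1/tasks/"  # Singapore region
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "CANCELED", "UNKNOWN"}

def poll_task(task_id: str, api_key: str, interval: float = 15.0,
              fetch=None, sleep=time.sleep) -> dict:
    """Poll GET /api/v1/tasks/{task_id} until the task reaches a terminal
    status, then return the full response body. By default each poll issues
    a real HTTP request with the Bearer API key header."""
    def default_fetch(tid: str) -> dict:
        req = urllib.request.Request(
            INTL_BASE + tid,
            headers={"Authorization": f"Bearer {api_key}"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)

    fetch = fetch or default_fetch
    while True:
        result = fetch(task_id)
        if result["output"]["task_status"] in TERMINAL_STATUSES:
            return result
        sleep(interval)
```

Remember that the task is queryable for only 24 hours after creation, so polling a long-expired `task_id` returns UNKNOWN.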

Response parameters

Task succeeded

Task data (task status, video URL, etc.) is retained for only 24 hours and is then automatically deleted. Save generated videos promptly.

{
    "request_id": "851985d0-fbba-9d8d-a17a-xxxxxx",
    "output": {
        "task_id": "208e2fd1-fcb4-4adf-9fcc-xxxxxx",
        "task_status": "SUCCEEDED",
        "submit_time": "2025-05-15 16:14:44.723",
        "scheduled_time": "2025-05-15 16:14:44.750",
        "end_time": "2025-05-15 16:20:09.389",
        "video_url": "https://dashscope-result-wlcb.oss-cn-wulanchabu.aliyuncs.com/xxx.mp4?xxxxxx",
        "orig_prompt": "In the video, a girl gracefully walks out from a misty, ancient forest. Her steps are light, and the camera captures her every nimble moment. When the girl stops and looks around at the lush woods, a smile of surprise and joy blossoms on her face. This scene, frozen in a moment of interplay between light and shadow, records her wonderful encounter with nature.",
        "actual_prompt": "A girl in a light-colored long dress slowly walks out from a misty, ancient forest, her steps as light as a dance. She has slightly curly long hair, a delicate face, and bright eyes. The camera follows her movements, capturing every nimble moment. When she stops, turns, and looks around at the lush woods, a smile of surprise and joy blossoms on her face. Sunlight filters through the leaves, casting mottled shadows and freezing this beautiful moment of harmony between human and nature. The style is a fresh and natural portrait, combining medium and full shots with a level perspective and slight camera movement."
    },
    "usage": {
        "video_duration": 5,
        "video_ratio": "standard",
        "video_count": 1
    }
}

Task failed

When a task fails, task_status is set to FAILED with an error code and message. See error codes to resolve the issue.

{
    "request_id": "e5d70b02-ebd3-98ce-9fe8-759d7d7b107d",
    "output": {
        "task_id": "86ecf553-d340-4e21-af6e-a0c6a421c010",
        "task_status": "FAILED",
        "code": "InvalidParameter",
        "message": "The size is not match xxxxxx"
    }
}

output object

The task output information.

Properties

task_id string

The ID of the task. Can be used to query the task for up to 24 hours.

task_status string

The status of the task.

Enumeration

  • PENDING

  • RUNNING

  • SUCCEEDED

  • FAILED

  • CANCELED

  • UNKNOWN: Task does not exist or status is unknown

submit_time string

The time when the task was submitted. Time is in UTC+8. Format: YYYY-MM-DD HH:mm:ss.SSS.

scheduled_time string

The time when the task started running. Time is in UTC+8. Format: YYYY-MM-DD HH:mm:ss.SSS.

end_time string

The time when the task was completed. Time is in UTC+8. Format: YYYY-MM-DD HH:mm:ss.SSS.

video_url string

Video URL. Valid for 24 hours. Use this URL to download the video. Output video format: MP4 (H.264 encoding).

orig_prompt string

Original input prompt.

actual_prompt string

Actual prompt used after prompt rewriting is enabled. If prompt rewriting is disabled, this field is not returned.

code string

The error code. Returned only when the request fails. See error codes for details.

message string

Detailed error message. Returned only when the request fails. See error codes for details.

usage object

Output statistics, counted only for successful tasks.

Properties

video_duration integer

Duration of the generated video in seconds.

video_ratio string

Aspect ratio of the generated video. Fixed at standard.

video_count integer

Number of generated videos.

request_id string

Unique identifier for the request. Use for tracing and troubleshooting issues.

Limitations

  • Data validity: The task_id and video_url are retained for only 24 hours. After expiration, they cannot be queried or downloaded.

  • Audio support: The model currently generates silent videos and does not support audio output. If needed, you can generate audio using speech synthesis.

  • Network access configuration: Video links are stored in Object Storage Service (OSS). If your system cannot access external OSS links due to security policies, add the following OSS domains to your network access whitelist.

    # List of OSS domain names
    dashscope-result-bj.oss-cn-beijing.aliyuncs.com
    dashscope-result-hz.oss-cn-hangzhou.aliyuncs.com
    dashscope-result-sh.oss-cn-shanghai.aliyuncs.com
    dashscope-result-wlcb.oss-cn-wulanchabu.aliyuncs.com
    dashscope-result-zjk.oss-cn-zhangjiakou.aliyuncs.com
    dashscope-result-sz.oss-cn-shenzhen.aliyuncs.com
    dashscope-result-hy.oss-cn-heyuan.aliyuncs.com
    dashscope-result-cd.oss-cn-chengdu.aliyuncs.com
    dashscope-result-gz.oss-cn-guangzhou.aliyuncs.com
    dashscope-result-wlcb-acdr-1.oss-cn-wulanchabu-acdr-1.aliyuncs.com
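Since the `video_url` expires after 24 hours, generated videos should be downloaded promptly. A minimal download sketch is shown below; the function name is ours, and the injectable `opener` hook (defaulting to a real HTTP request) is our choice to keep the logic testable offline.

```python
import urllib.request

def save_video(video_url: str, path: str, opener=urllib.request.urlopen) -> int:
    """Download the generated video from its signed OSS URL before the
    24-hour expiry and write it to `path`. Returns the number of bytes
    written."""
    with opener(video_url) as resp, open(path, "wb") as f:
        data = resp.read()
        f.write(data)
    return len(data)
```

If your environment blocks outbound traffic, the OSS domains listed above must be whitelisted before this download can succeed.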

Error codes

If a model call fails and returns an error message, see Error messages for troubleshooting.