Alibaba Cloud Model Studio: Wan general video editing API reference

Last Updated: Jun 23, 2025

This topic describes the input and output parameters of the Wan All-in-one Video Creation and Editing (Wan VACE) model. The model accepts text, image, and video inputs and can perform a range of video generation and editing tasks.

Model overview

  • Name: wan2.1-vace-plus

  • Unit price: $0.1/second

  • Rate limits (shared by Alibaba Cloud account and RAM users): 2 requests per second (RPS) for task submission; 2 concurrent tasks

  • Free quota: 50 seconds, valid for 180 days after activation

Performance showcase

Each feature below lists its inputs, the prompt, and the output.

Multi-image reference

  • Reference images: Image 1 (subject) and Image 2 (background).

  • Prompt: In the video, a girl gracefully emerges from the depths of an ancient, misty forest. Her steps are light, and the camera captures her every fluid moment. When the girl stands still and looks around at the lush trees, her face lights up with a smile that blends surprise and joy. This moment, frozen in the interplay of light and shadow, captures the girl's wonderful encounter with nature.

  • Output: Output video.

Video repainting

  • Prompt: The video shows a black steampunk-style car driven by a gentleman, decorated with gears and copper pipes. The background is a steam-powered candy factory with vintage elements, creating a retro and playful scene.

Masked editing

  • Input: Input video and input mask image (the white area is edited).

  • Prompt: The video shows a Parisian-style French café where a lion wearing a suit is elegantly enjoying coffee. It holds a coffee cup in one hand, sipping gently with a contented expression. The café is decorated elegantly, with soft tones and warm lighting illuminating the area where the lion is seated.

  • Output: The content in the editing area is modified according to the prompt.

Video extension

  • Input: First segment (1 second).

  • Prompt: A dog wearing sunglasses skateboarding on the street, 3D cartoon.

  • Output: Extended video (5 seconds).

Video outpainting

  • Prompt: An elegant lady is passionately playing the violin, behind her is a complete symphony orchestra.

Prerequisites

The Wan VACE API currently only supports HTTP calls.

You must first obtain an API key and set it as an environment variable. The examples in this topic read it from DASHSCOPE_API_KEY.

HTTP calls

Video generation takes a relatively long time. To avoid request timeouts, HTTP calls return model results asynchronously only, so you need to make two requests:

  1. Create task: Send a request to create a task, which will return a task ID.

  2. Query results using the ID: Use the task ID to query the task status and results. If successful, a video URL will be returned, which is valid for 24 hours.

Note

After you create a task, it will be added to a queue. Later, call the query interface to get the task status and results based on the task ID.

Video processing takes a relatively long time (about 5-10 minutes). The actual time depends on the number of queued tasks and service execution conditions.
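
The two-step flow described above can be sketched as a polling loop. This is a minimal sketch, not an official client: `fetch_status` is a hypothetical callable that performs the query request and returns the `output` object, the 15-second interval and 20-minute timeout are arbitrary choices, and every status name other than the `PENDING` value shown in this topic is an assumption.

```python
import time
from typing import Callable, Dict

# Assumption: these are the statuses at which polling should stop.
# Only "PENDING" appears in this topic.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "CANCELED", "UNKNOWN"}


def wait_for_task(
    fetch_status: Callable[[str], Dict],
    task_id: str,
    interval_s: float = 15.0,
    timeout_s: float = 1200.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Call fetch_status(task_id) until the task finishes or the timeout passes."""
    waited = 0.0
    while True:
        output = fetch_status(task_id)
        if output.get("task_status") in TERMINAL_STATUSES:
            return output
        if waited >= timeout_s:
            raise TimeoutError(f"task {task_id} did not finish within {timeout_s:.0f}s")
        sleep(interval_s)
        waited += interval_s
```

Passing `sleep` as a parameter keeps the loop easy to exercise without real waiting. Because processing typically takes several minutes, a long polling interval avoids spending the rate limit on status checks.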

Step 1: Create task

POST https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis
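
For callers who prefer Python to curl, the create-task request might be prepared as follows. This is a sketch using only the standard library. The helper name `build_create_task_request` is hypothetical, and the function only builds the request object without sending it.

```python
import json
import urllib.request
from typing import Dict

CREATE_TASK_URL = (
    "https://dashscope-intl.aliyuncs.com"
    "/api/v1/services/aigc/video-generation/video-synthesis"
)


def build_create_task_request(payload: Dict, headers: Dict[str, str]) -> urllib.request.Request:
    """Prepare (but do not send) the POST request that creates a task."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(CREATE_TASK_URL, data=body, headers=headers, method="POST")
```

The request can then be sent with `urllib.request.urlopen(request)`. The JSON response carries the task ID used in the second step.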

Request parameters

Multi-image reference

curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis' \
--header 'X-DashScope-Async: enable' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "wan2.1-vace-plus",
    "input": {
        "function": "image_reference",
        "prompt": "In the video, a girl gracefully emerges from the depths of an ancient, misty forest. Her steps are light, and the camera captures her every fluid moment. When the girl stands still and looks around at the lush trees, her face lights up with a smile that blends surprise and joy. This moment, frozen in the interplay of light and shadow, captures the girl'\''s wonderful encounter with nature.",
        "ref_images_url": [
            "http://wanx.alicdn.com/material/20250318/image_reference_2_5_16.png",
            "http://wanx.alicdn.com/material/20250318/image_reference_1_5_16.png"
        ]
    },
    "parameters": {
        "obj_or_bg": ["obj","bg"],
        "size": "1280*720"
    }
}'

Video repainting

curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis' \
--header 'X-DashScope-Async: enable' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "wan2.1-vace-plus",
    "input": {
        "function": "video_repainting",
        "prompt": "The video shows a black steampunk-style car driven by a gentleman, decorated with gears and copper pipes. The background is a steam-powered candy factory with vintage elements, creating a retro and playful scene.",
        "video_url": "http://wanx.alicdn.com/material/20250318/video_repainting_1.mp4"
    },
    "parameters": {
        "control_condition": "depth"
    }
}'

Masked editing

curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis' \
--header 'X-DashScope-Async: enable' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "wan2.1-vace-plus",
    "input": {
        "function": "video_edit",
        "prompt": "The video shows a Parisian-style French café where a lion wearing a suit is elegantly enjoying coffee. It holds a coffee cup in one hand, sipping gently with a contented expression. The café is decorated elegantly, with soft tones and warm lighting illuminating the area where the lion is seated.",
        "mask_image_url": "http://wanx.alicdn.com/material/20250318/video_edit_1_mask.png",
        "video_url": "http://wanx.alicdn.com/material/20250318/video_edit_2.mp4",
        "mask_frame_id": 1
    },
    "parameters": {
        "mask_type": "tracking",
        "expand_ratio": 0.05
    }
}'

Video extension

curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis' \
--header 'X-DashScope-Async: enable' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "wan2.1-vace-plus",
    "input": {
        "function": "video_extension",
        "prompt": "A dog wearing sunglasses skateboarding on the street, 3D cartoon.",
        "first_clip_url": "http://wanx.alicdn.com/material/20250318/video_extension_1.mp4"
    },
    "parameters": {}
}'

Video outpainting

curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis' \
--header 'X-DashScope-Async: enable' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "wan2.1-vace-plus",
    "input": {
        "function": "video_outpainting",
        "prompt": "An elegant lady is passionately playing the violin, behind her is a complete symphony orchestra.",
        "video_url": "http://wanx.alicdn.com/material/20250318/video_outpainting_1.mp4"
    },
    "parameters": {
        "top_scale": 1.5,
        "bottom_scale": 1.5,
        "left_scale": 1.5,
        "right_scale": 1.5
    }
}'

Headers

Content-Type string (Required)

The type of the request content. This parameter must be set to application/json.

Authorization string (Required)

The credential used to authenticate the request: the Model Studio API key, prefixed with Bearer. Example: Bearer d1xxx2a.

X-DashScope-Async string (Required)

The asynchronous processing parameter. HTTP requests only support asynchronous mode, so this parameter must be set to enable.
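
The three required headers can be assembled in one place. The sketch below assumes the API key is stored in the `DASHSCOPE_API_KEY` environment variable, as in the curl examples. The function name `build_headers` is hypothetical.

```python
import os
from typing import Dict, Optional


def build_headers(api_key: Optional[str] = None) -> Dict[str, str]:
    """Return the three required headers, reading the key from the environment if omitted."""
    key = api_key or os.environ.get("DASHSCOPE_API_KEY")
    if not key:
        raise ValueError("no API key given and DASHSCOPE_API_KEY is not set")
    return {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {key}",
        "X-DashScope-Async": "enable",  # HTTP calls support asynchronous mode only
    }
```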

Request body

Multi-image reference

model string (Required)

Model name. Example value: wan2.1-vace-plus.

input object (Required)

Basic input information, such as prompts.

Properties

prompt string (Required)

The prompt that describes the elements and visual characteristics expected in the generated video.

Supports Chinese and English, with a maximum length of 800 characters. Each Chinese character or letter counts as one character. Content exceeding this limit will be truncated.

For prompt tips, see Video generation prompt guide.
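
Because over-long prompts are truncated rather than rejected, a client-side check can reveal what would be lost. This sketch assumes the service counts characters the way Python counts code points, which matches the rule above for Chinese characters and letters but is not confirmed for other symbols. The name `prompt_overflow` is hypothetical.

```python
MAX_PROMPT_CHARS = 800


def prompt_overflow(prompt: str) -> str:
    """Return the part of the prompt beyond the 800-character limit ('' if none)."""
    return prompt[MAX_PROMPT_CHARS:]
```

An empty return value means the whole prompt fits. Anything else is the tail that would be dropped.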

function string (Required)

The feature name, set to image_reference.

Supports up to 3 reference images, including subjects and backgrounds, such as people, animals, clothing, and scenes. Use prompt to describe the desired video content, and the model can blend multiple images to generate coherent video content.

ref_images_url array[string] (Required)

An array of URLs for input reference images.

The URLs must be publicly accessible and support HTTP or HTTPS protocols.

Supports 1 to 3 reference images. If more than 3 images are provided, only the first 3 will be used as input.

Image restrictions:

  • Image formats: JPG, JPEG, PNG, BMP, TIFF, WEBP.

  • Image resolution: The width and height must be within [360, 2000] pixels.

  • File size: Not exceeding 10 MB.

  • URL addresses cannot contain Chinese characters.

Recommendations:

  • When using the subject from a reference image, each image should contain only one subject. The background should be a solid color (such as white or a single color) to better highlight the subject.

  • When using the background from a reference image, there can be at most one background image, and the background image should not contain any subject objects.
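
The image restrictions above can be checked before a task is submitted. The sketch below is illustrative only. It assumes the caller already knows the image dimensions and file size, that the format can be inferred from the file extension in the URL path, that 1 MB means 1024 * 1024 bytes, and that rejecting every non-ASCII character is an acceptable (stricter) stand-in for "no Chinese characters". The function name is hypothetical.

```python
import posixpath
from typing import List
from urllib.parse import urlparse

ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".tiff", ".webp"}
MAX_IMAGE_BYTES = 10 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes


def reference_image_problems(url: str, width: int, height: int, size_bytes: int) -> List[str]:
    """Return a list of violated restrictions (empty if the image looks acceptable)."""
    problems = []
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        problems.append("URL must use HTTP or HTTPS")
    if not url.isascii():  # stricter than the documented rule
        problems.append("URL contains non-ASCII characters")
    if posixpath.splitext(parsed.path)[1].lower() not in ALLOWED_EXTENSIONS:
        problems.append("unsupported image format")
    if not (360 <= width <= 2000 and 360 <= height <= 2000):
        problems.append("width and height must be within [360, 2000]")
    if size_bytes > MAX_IMAGE_BYTES:
        problems.append("file larger than 10 MB")
    return problems
```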

parameters object (Optional)

Video processing parameters, such as watermark settings.

Properties

obj_or_bg array[string] (Optional)

The purpose of each reference image, corresponding one-to-one with ref_images_url. Each element in the array indicates whether the image at the corresponding position is a "subject" or "background":

  • obj: A subject reference.

  • bg: A background reference (only one allowed).

Usage notes:

  • The length of this parameter should match ref_images_url.

  • If this parameter is not provided, or its length does not match ref_images_url, the default obj (subject) is used.

Example value: ["obj", "obj", "bg"].
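
Since a length mismatch silently falls back to obj, it can be worth catching on the client. This is a sketch under that reading of the notes above. The helper name `check_obj_or_bg` is hypothetical, and raising on a mismatch is a design choice of the sketch, not service behavior.

```python
from typing import List, Optional


def check_obj_or_bg(ref_images_url: List[str], obj_or_bg: Optional[List[str]] = None) -> List[str]:
    """Return the roles to send, one per reference image."""
    if obj_or_bg is None:
        return ["obj"] * len(ref_images_url)  # documented default
    if len(obj_or_bg) != len(ref_images_url):
        raise ValueError("obj_or_bg must have one entry per reference image")
    if any(role not in ("obj", "bg") for role in obj_or_bg):
        raise ValueError("each entry must be 'obj' or 'bg'")
    if obj_or_bg.count("bg") > 1:
        raise ValueError("only one background reference is allowed")
    return list(obj_or_bg)
```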

size string (Optional)

The resolution of the generated video (width*height). Currently supports generating 720P videos, with the following resolution values:

  • 1280*720 (default): Video aspect ratio is 16:9, where 1280 is the width and 720 is the height.

  • 720*1280: Video aspect ratio is 9:16.

  • 960*960: Video aspect ratio is 1:1.

  • 832*1088: Video aspect ratio is 3:4.

  • 1088*832: Video aspect ratio is 4:3.
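
The size string can be parsed and checked against the supported values as sketched below. The helper name `parse_size` is hypothetical. The exact aspect ratio is returned as a reduced fraction.

```python
from fractions import Fraction
from typing import Tuple

SUPPORTED_SIZES = ("1280*720", "720*1280", "960*960", "832*1088", "1088*832")


def parse_size(size: str = "1280*720") -> Tuple[int, int, Fraction]:
    """Return (width, height, width/height) for a supported size string."""
    if size not in SUPPORTED_SIZES:
        raise ValueError(f"unsupported size: {size}")
    width, height = (int(part) for part in size.split("*"))
    return width, height, Fraction(width, height)
```

Note that 832*1088 reduces to 13:17 (about 0.765), so the 3:4 and 4:3 labels are close approximations rather than exact ratios.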

duration integer (Optional)

Video generation duration in seconds. Fixed at 5. The model will always generate a 5-second video.

prompt_extend bool (Optional)

Whether to enable prompt rewriting. When enabled, an LLM will rewrite the prompt. This significantly improves generation results for shorter prompts but increases processing time.

  • true (default)

  • false

seed integer (Optional)

Random seed used to control the randomness of model-generated content. The seed parameter value range is [0, 2147483647].

If not provided, the algorithm automatically generates a random number as the seed. If you want the generated content to remain relatively stable, use the same seed.

watermark bool (Optional)

Whether to add a watermark, which appears in the bottom right corner with the text "AI Generated".

  • false (default)

  • true

Video repainting

model string (Required)

Model name. Example value: wan2.1-vace-plus.

input object (Required)

Basic input information, such as prompts.

Properties

prompt string (Required)

The prompt that describes the elements and visual characteristics expected in the generated video.

Supports Chinese and English, with a maximum length of 800 characters. Each Chinese character or letter counts as one character. Content exceeding this limit will be truncated.

For prompt tips, see Video generation prompt guide.

function string (Required)

The feature name, set to video_repainting.

Video repainting supports extracting subject poses and actions, composition, motion outlines, and line drawing structures from the input video. Combined with text prompts, it generates new videos with the same dynamic characteristics. It also supports replacing the subject in the original video with a reference image, such as changing the character appearance while retaining the original actions.

video_url string (Required)

The URL address of the input video.

The URL must be publicly accessible and support HTTP or HTTPS protocols.

Video restrictions:

  • Video format: MP4.

  • Frame rate: Greater than or equal to 16 FPS.

  • File size: Not exceeding 50 MB.

  • Video length: Not exceeding 5 seconds, otherwise only the first 5 seconds will be used.

  • URL addresses cannot contain Chinese characters.

About the output video resolution:

  • If the input video resolution is ≤ 720P, the output will maintain the original resolution.

  • If the input video resolution is > 720P, the video will be proportionally scaled to not exceed 720P while maintaining the original aspect ratio.

About the output video duration:

  • The output video duration matches the input video, but not exceeding 5 seconds.

  • Example: If the input video is 3 seconds, the output will also be 3 seconds. If the input is 6 seconds, the output will be the first 5 seconds.

ref_images_url array[string] (Optional)

An array of URLs for input reference images. Each URL must be publicly accessible and support HTTP or HTTPS protocols.

Supports only 1 reference image, which should be a subject image used to replace the subject content in the input video.

Image restrictions:

  • Image formats: JPG, JPEG, PNG, BMP, TIFF, WEBP.

  • Image resolution: The width and height must be within [360, 2000] pixels.

  • File size: Not exceeding 10 MB.

  • URL addresses cannot contain Chinese characters.

Recommendations:

  • When using the subject from a reference image, each image should contain only one subject. The background should be a solid color (such as white or a single color) to better highlight the subject.

parameters object (Required)

Video processing parameters, such as watermark settings.

Properties

control_condition string (Required)

Sets the method for video feature extraction.

  • posebodyface: Extracts facial expressions and body movements of the subject in the input video, suitable for scenarios where subject expression details need to be preserved.

  • posebody: Extracts the body movements of the subject in the input video (excluding facial expressions), suitable for scenarios where only body movements need to be controlled.

  • depth: Extracts the composition and motion outline of the input video.

  • scribble: Extracts the line drawing structure of the input video.

strength float (Optional)

Adjusts the control strength of the video feature extraction method specified by control_condition on the generated video.

The default value is 1.0, with a range of [0.0, 1.0].

The higher the value, the closer the generated video will be to the original video's actions and composition. The lower the value, the more freedom in the generated content.
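
A small helper can validate the two video repainting controls before submission. This is a sketch. The function name `repainting_parameters` is hypothetical, and the allowed values are copied from the list above.

```python
from typing import Dict, Union

REPAINTING_CONDITIONS = {"posebodyface", "posebody", "depth", "scribble"}


def repainting_parameters(control_condition: str, strength: float = 1.0) -> Dict[str, Union[str, float]]:
    """Build the parameters object for video_repainting after range checks."""
    if control_condition not in REPAINTING_CONDITIONS:
        raise ValueError(f"unsupported control_condition: {control_condition!r}")
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be within [0.0, 1.0]")
    return {"control_condition": control_condition, "strength": strength}
```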

prompt_extend bool (Optional)

Whether to enable prompt rewriting. When enabled, an LLM will rewrite the prompt. This significantly improves generation results for shorter prompts but increases processing time.

  • false (default) (Recommended)

  • true

When the text description does not match the input video content, the model may misinterpret the prompt. We recommend disabling prompt rewriting and providing clear, specific scene descriptions in the prompt to improve generation consistency and accuracy.

seed integer (Optional)

Random seed used to control the randomness of model-generated content. The seed parameter value range is [0, 2147483647].

If not provided, the algorithm automatically generates a random number as the seed. If you want the generated content to remain relatively stable, use the same seed.

watermark bool (Optional)

Whether to add a watermark, which appears in the bottom right corner with the text "AI Generated".

  • false (default)

  • true

Masked editing

model string (Required)

Model name. Example value: wan2.1-vace-plus.

input object (Required)

Basic input information, such as prompts.

Properties

prompt string (Required)

The prompt that describes the elements and visual characteristics expected in the generated video.

Supports Chinese and English, with a maximum length of 800 characters. Each Chinese character or letter counts as one character. Content exceeding this limit will be truncated.

For prompt tips, see Video generation prompt guide.

function string (Required)

The feature name, set to video_edit.

Masked editing supports adding, modifying, or removing elements in specified areas of the input video, along with replacing subjects or backgrounds in the editing area, enabling fine-grained video editing.

video_url string (Required)

The URL address of the input video.

The URL must be publicly accessible and support HTTP or HTTPS protocols.

Video restrictions:

  • Video format: MP4.

  • Frame rate: Greater than or equal to 16 FPS.

  • File size: Not exceeding 50 MB.

  • Video length: Not exceeding 5 seconds, otherwise only the first 5 seconds will be used.

  • URL addresses cannot contain Chinese characters.

About the output video resolution:

  • If the input video resolution is ≤ 720P, the output will maintain the original resolution.

  • If the input video resolution is > 720P, the video will be proportionally scaled to not exceed 720P while maintaining the original aspect ratio.

About the output video duration:

  • The output video duration matches the input video, but not exceeding 5 seconds.

  • Example: If the input video is 3 seconds, the output will also be 3 seconds. If the input is 6 seconds, the output will be the first 5 seconds.

ref_images_url array[string] (Optional)

An array of URLs for input reference images.

The URL must be publicly accessible and support HTTP or HTTPS protocols.

Currently only supports passing 1 reference image, which can be used as a subject or background to replace the corresponding content in the input video.

Image restrictions:

  • Image formats: JPG, JPEG, PNG, BMP, TIFF, WEBP.

  • Image resolution: The width and height must be within [360, 2000] pixels.

  • File size: Not exceeding 10 MB.

  • URL addresses cannot contain Chinese characters.

Recommendations:

  • When using the subject from a reference image, each image should contain only one subject. The background should be a solid color (such as white or a single color) to better highlight the subject.

  • When using the background from a reference image, the background image should not contain any subject objects.

mask_image_url string (Optional)

The URL of the mask image. The URL must be publicly accessible and support HTTP or HTTPS protocols.

This parameter specifies the editing area of the video. Provide either this parameter or mask_video_url. This parameter is recommended.

White areas in the mask image (pixel values strictly [255, 255, 255]) indicate parts that need to be edited. Black areas (pixel values strictly [0, 0, 0]) indicate parts that remain unchanged.

Image restrictions:

  • Image formats: JPG, JPEG, PNG, BMP, TIFF, WEBP.

  • Image resolution: Must exactly match the input video (video_url) resolution.

  • File size: Not exceeding 10 MB.

  • URL addresses cannot contain Chinese characters.
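
Because mask pixels must be strictly white or strictly black, a hand-drawn mask may need to be binarized first. The sketch below works on a plain nested list of RGB tuples so that it needs no imaging library. The threshold of 128 and both function names are assumptions made for illustration.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
WHITE: Pixel = (255, 255, 255)  # area to be edited
BLACK: Pixel = (0, 0, 0)        # area left unchanged


def is_strict_mask(pixels: List[List[Pixel]]) -> bool:
    """True if every pixel is exactly white or exactly black."""
    return all(px in (WHITE, BLACK) for row in pixels for px in row)


def binarize_mask(pixels: List[List[Pixel]], threshold: int = 128) -> List[List[Pixel]]:
    """Snap each pixel to pure white or pure black by its average brightness."""
    return [
        [WHITE if sum(px) / 3 >= threshold else BLACK for px in row]
        for row in pixels
    ]
```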

mask_frame_id integer (Optional)

When mask_image_url is not empty, this parameter takes effect and identifies which frame in the video the mask target appears in, represented as a "frame ID".

The default value is 1, in frames, representing the first frame of the video.

The value range is [1, max_frame_id], where max_frame_id = input video frame rate × input video duration + 1.

For example, if the input video (video_url) has a frame rate of 16 FPS (frames per second), and the video duration is 5 seconds, then the total number of frames in the input video is 16*5+1 = 81, so max_frame_id = 81.
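
The frame ID range can be computed directly from the formula above. In this sketch the function names are hypothetical, and truncating a fractional frame count with `int()` is an assumption, since the source only gives an example with whole numbers.

```python
def max_frame_id(frame_rate: float, duration_s: float) -> int:
    """max_frame_id = input video frame rate x input video duration + 1."""
    return int(frame_rate * duration_s) + 1


def check_mask_frame_id(mask_frame_id: int, frame_rate: float, duration_s: float) -> int:
    """Return mask_frame_id if it lies within [1, max_frame_id], else raise."""
    upper = max_frame_id(frame_rate, duration_s)
    if not 1 <= mask_frame_id <= upper:
        raise ValueError(f"mask_frame_id must be within [1, {upper}]")
    return mask_frame_id
```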

mask_video_url string (Optional)

The URL of the mask video. The URL must be publicly accessible and support HTTP or HTTPS protocols.

This parameter is used to specify the editing area of the video. Choose either this parameter or mask_image_url.

The video format, frame rate, resolution, and length of the mask video must exactly match the input video (video_url).

White areas in the mask video (pixel values strictly [255, 255, 255]) indicate parts that need to be edited. Black areas (pixel values strictly [0, 0, 0]) indicate parts that remain unchanged.

parameters object (Optional)

Video processing parameters, such as watermark settings.

Properties

control_condition string (Optional)

Sets the method for video feature extraction. The default is "", indicating no extraction.

  • posebodyface: Extracts facial expressions and body movements of the subject in the input video, suitable for scenarios where the subject's face occupies a large portion of the frame and features are clearly visible.

  • depth: Extracts the composition and motion outline of the input video.

mask_type string (Optional)

When mask_image_url is not empty, this parameter takes effect and specifies how the editing area behaves.

  • tracking (default): The editing area dynamically follows the target object's motion trajectory, suitable for scenes with a moving subject.

  • fixed: The editing area remains fixed and does not change with the content of the frame.

expand_ratio float (Optional)

When mask_type is set to tracking, this parameter takes effect and represents the ratio for expanding the mask area outward.

The value range is [0.0, 1.0], with a default value of 0.05. The default value is recommended.

The smaller the value, the more the mask area fits the target object. The larger the value, the wider the expansion range of the mask area.

expand_mode string (Optional)

When mask_type is set to tracking, this parameter takes effect and represents the shape of the mask area.

The algorithm will generate a mask video with the corresponding shape based on the input mask image according to the selected expand_mode. Supported values include the following:

  • hull (default): Polygon mode, indicating that a polygon is used to wrap the mask target.

  • bbox: Bounding box mode, indicating that a rectangle is used to wrap the mask target.

  • original: Original mode, indicating that the shape is kept as close as possible to the original mask target.
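
The dependency between mask_type, expand_ratio, and expand_mode can be captured in a small builder. This is a sketch with a hypothetical function name. It omits the two expand settings for a fixed mask because, per the text above, they only take effect when mask_type is tracking.

```python
from typing import Dict, Union


def masked_edit_parameters(
    mask_type: str = "tracking",
    expand_ratio: float = 0.05,
    expand_mode: str = "hull",
) -> Dict[str, Union[str, float]]:
    """Build the mask-related parameters for video_edit after basic checks."""
    if mask_type not in ("tracking", "fixed"):
        raise ValueError(f"unsupported mask_type: {mask_type!r}")
    params: Dict[str, Union[str, float]] = {"mask_type": mask_type}
    if mask_type == "tracking":
        if not 0.0 <= expand_ratio <= 1.0:
            raise ValueError("expand_ratio must be within [0.0, 1.0]")
        if expand_mode not in ("hull", "bbox", "original"):
            raise ValueError(f"unsupported expand_mode: {expand_mode!r}")
        params.update(expand_ratio=expand_ratio, expand_mode=expand_mode)
    return params
```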

size string (Optional)

The resolution of the generated video (width*height). Currently supports generating 720P videos, with the following resolution values:

  • 1280*720 (default): Video aspect ratio is 16:9, where 1280 is the width and 720 is the height.

  • 720*1280: Video aspect ratio is 9:16.

  • 960*960: Video aspect ratio is 1:1.

  • 832*1088: Video aspect ratio is 3:4.

  • 1088*832: Video aspect ratio is 4:3.

duration integer (Optional)

Video generation duration in seconds. Fixed at 5. The model will always generate a 5-second video.

prompt_extend bool (Optional)

Whether to enable prompt rewriting. When enabled, an LLM will rewrite the prompt. This significantly improves generation results for shorter prompts but increases processing time.

  • false (default) (Recommended)

  • true

When the text description does not match the input video content, the model may misinterpret the prompt. We recommend disabling prompt rewriting and providing clear, specific scene descriptions in the prompt to improve generation consistency and accuracy.

seed integer (Optional)

Random seed used to control the randomness of model-generated content. The seed parameter value range is [0, 2147483647].

If not provided, the algorithm automatically generates a random number as the seed. If you want the generated content to remain relatively stable, use the same seed.

watermark bool (Optional)

Whether to add a watermark, which appears in the bottom right corner with the text "AI Generated".

  • false (default)

  • true

Video extension

model string (Required)

Model name. Example value: wan2.1-vace-plus.

input object (Required)

Basic input information, such as prompts.

Properties

prompt string (Required)

The prompt that describes the elements and visual characteristics expected in the generated video.

Supports Chinese and English, with a maximum length of 800 characters. Each Chinese character or letter counts as one character. Content exceeding this limit will be truncated.

For prompt tips, see Video generation prompt guide.

function string (Required)

The feature name, set to video_extension.

Video extension supports generating continuous content based on images or videos. It also supports extracting dynamic features (such as actions and composition) from a reference video to guide the generation of videos with similar motion.

The total duration of the extended video is 5 seconds. Note: this refers to the complete duration of the final output video being 5 seconds, not extending the original video by 5 seconds.

first_frame_url string (Optional)

The URL of the first frame image.

The URL must be publicly accessible and support HTTP or HTTPS protocols.

Image restrictions:

  • Image formats: JPG, JPEG, PNG, BMP, TIFF, WEBP.

  • Image resolution: The width and height must be within [360, 2000] pixels.

  • File size: Not exceeding 10 MB.

  • URL addresses cannot contain Chinese characters.

last_frame_url string (Optional)

The URL of the last frame image. The URL must be publicly accessible and support HTTP or HTTPS protocols.

Image restrictions:

  • Image formats: JPG, JPEG, PNG, BMP, TIFF, WEBP.

  • Image resolution: The width and height must be within [360, 2000] pixels.

  • File size: Not exceeding 10 MB.

  • URL addresses cannot contain Chinese characters.

first_clip_url string (Optional)

The URL of the first segment video. The URL must be publicly accessible and support HTTP or HTTPS protocols.

Video restrictions:

  • Video format: MP4.

  • Frame rate: Greater than or equal to 16 FPS. When first_clip_url and last_clip_url are used together, the frame rates of the two segments should be consistent.

  • File size: Not exceeding 50 MB.

  • Video length: Not exceeding 3 seconds, otherwise only the first 3 seconds will be used. When both first_clip_url and last_clip_url are provided, the total duration of the two video segments should not exceed 3 seconds.

  • URL addresses cannot contain Chinese characters.

About the output video resolution:

  • If the input video resolution is ≤ 720P, the output will maintain the original resolution.

  • If the input video resolution is > 720P, the video will be proportionally scaled to not exceed 720P while maintaining the original aspect ratio.

last_clip_url string (Optional)

The URL of the last segment video. The URL must be publicly accessible and support HTTP or HTTPS protocols.

Video restrictions:

  • Video format: MP4.

  • Frame rate: Greater than or equal to 16 FPS. When first_clip_url and last_clip_url are used together, the frame rates of the two segments should be consistent.

  • File size: Not exceeding 50 MB.

  • Video length: Not exceeding 3 seconds, otherwise only the first 3 seconds will be used. When both first_clip_url and last_clip_url are provided, the total duration of the two video segments should not exceed 3 seconds.

  • URL addresses cannot contain Chinese characters.

About the output video resolution:

  • If the input video resolution is ≤ 720P, the output will maintain the original resolution.

  • If the input video resolution is > 720P, the video will be proportionally scaled to not exceed 720P while maintaining the original aspect ratio.
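
The clip restrictions for first_clip_url and last_clip_url interact with each other, so a combined check can help. In this sketch each clip is described by a hypothetical `(frame_rate, duration_seconds)` tuple that the caller has measured beforehand. The function name is also hypothetical.

```python
from typing import List, Optional, Tuple

Clip = Tuple[float, float]  # (frame rate in FPS, duration in seconds)
MIN_FPS = 16
MAX_CLIP_SECONDS = 3.0


def extension_clip_problems(first_clip: Optional[Clip] = None,
                            last_clip: Optional[Clip] = None) -> List[str]:
    """Return the documented clip restrictions that the given clips violate."""
    problems = []
    clips = [clip for clip in (first_clip, last_clip) if clip is not None]
    for fps, seconds in clips:
        if fps < MIN_FPS:
            problems.append("frame rate below 16 FPS")
        if seconds > MAX_CLIP_SECONDS:
            problems.append("clip longer than 3 seconds")
    if len(clips) == 2:
        if clips[0][0] != clips[1][0]:
            problems.append("first and last clips have different frame rates")
        if clips[0][1] + clips[1][1] > MAX_CLIP_SECONDS:
            problems.append("combined clip duration exceeds 3 seconds")
    return problems
```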

video_url string (Optional)

The URL of the input video. The URL must be publicly accessible and support HTTP or HTTPS protocols.

This video is mainly used to extract motion features, and is used together with first_frame_url, last_frame_url, first_clip_url, and last_clip_url parameters to guide the generation of extended videos with similar motion performance.

Video restrictions:

  • Video format: MP4.

  • Frame rate: Greater than or equal to 16 FPS, consistent with the first and last segments.

  • Video resolution: Must match the first and last frames, and the first and last segments.

  • File size: Not exceeding 50 MB.

  • Video length: Not exceeding 5 seconds, otherwise only the first 5 seconds will be used.

  • URL addresses cannot contain Chinese characters.

parameters object (Optional)

Video processing parameters, such as the output video resolution.

Properties

control_condition string (Optional)

Sets the method for video feature extraction. Required when video_url is provided. The default is "", indicating no extraction.

  • posebodyface: Extracts facial expressions and body movements of the subject in the input video.

  • depth: Extracts the composition and motion outline of the input video.
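
The rule that control_condition becomes required once video_url is provided can be enforced as sketched below. The function name is hypothetical, and the allowed values are taken from the list above.

```python
from typing import Optional

EXTENSION_CONDITIONS = {"posebodyface", "depth"}


def extension_control_condition(video_url: Optional[str], control_condition: str = "") -> str:
    """Return control_condition if it is valid for the given video_url, else raise."""
    if video_url and control_condition not in EXTENSION_CONDITIONS:
        raise ValueError("control_condition is required when video_url is provided")
    if control_condition and control_condition not in EXTENSION_CONDITIONS:
        raise ValueError(f"unsupported control_condition: {control_condition!r}")
    return control_condition
```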

duration integer (Optional)

Video generation duration in seconds. Fixed at 5. The model will always generate a 5-second video.

prompt_extend bool (Optional)

Whether to enable prompt rewriting. When enabled, an LLM will rewrite the prompt. This significantly improves generation results for shorter prompts but increases processing time.

  • false (default) (Recommended)

  • true

When the text description does not match the input video content, the model may misinterpret the prompt. We recommend disabling prompt rewriting and providing clear, specific scene descriptions in the prompt to improve generation consistency and accuracy.

seed integer (Optional)

Random seed used to control the randomness of model-generated content. The seed parameter value range is [0, 2147483647].

If not provided, the algorithm automatically generates a random number as the seed. If you want the generated content to remain relatively stable, use the same seed.

watermark bool (Optional)

Whether to add a watermark, which appears in the bottom right corner with the text "AI Generated".

  • false (default)

  • true

Video outpainting

model string (Required)

Model name. Example value: wan2.1-vace-plus.

input object (Required)

Basic input information, such as prompts.

Properties

prompt string (Required)

The prompt that describes the elements and visual characteristics expected in the generated video.

Supports Chinese and English, with a maximum length of 800 characters. Each Chinese character or letter counts as one character. Content exceeding this limit will be truncated.

For prompt tips, see Video generation prompt guide.

function string (Required)

The feature name, set to video_outpainting.

Video outpainting supports proportionally extending the video in the top, bottom, left, and right directions.

video_url string (Required)

The URL address of the input video.

The URL must be publicly accessible and support HTTP or HTTPS protocols.

Video restrictions:

  • Video format: MP4.

  • Frame rate: Greater than or equal to 16 FPS.

  • File size: Not exceeding 50 MB.

  • Video length: Not exceeding 5 seconds, otherwise only the first 5 seconds will be used.

  • URL addresses cannot contain Chinese characters.

About the output video resolution:

  • If the input video resolution is ≤ 720P, the output will maintain the original resolution.

  • If the input video resolution is > 720P, the video will be proportionally scaled to not exceed 720P while maintaining the original aspect ratio.

About the output video duration:

  • The output video duration matches the input video, but not exceeding 5 seconds.

  • Example: If the input video is 3 seconds, the output will also be 3 seconds. If the input is 6 seconds, the output will be the first 5 seconds.

parameters object (Optional)

Video processing parameters, such as setting the expansion ratio.

Properties

top_scale float (Optional)

Centers the video frame and extends it upward by the specified ratio.

The value range is [1.0, 2.0], with a default value of 1.0, indicating no extension.

bottom_scale float (Optional)

Centers the video frame and extends it downward by the specified ratio.

The value range is [1.0, 2.0], with a default value of 1.0, indicating no extension.

left_scale float (Optional)

Centers the video frame and extends it to the left by the specified ratio.

The value range is [1.0, 2.0], with a default value of 1.0, indicating no extension.

right_scale float (Optional)

Centers the video frame and extends it to the right by the specified ratio.

The value range is [1.0, 2.0], with a default value of 1.0, indicating no extension.

duration integer (Optional)

Video generation duration in seconds. Fixed at 5. The model will always generate a 5-second video.

prompt_extend bool (Optional)

Whether to enable prompt rewriting. When enabled, an LLM will rewrite the prompt. This significantly improves generation results for shorter prompts but increases processing time.

  • false (default) (Recommended)

  • true

When the text description does not match the input video content, the model may misinterpret the prompt. We recommend disabling prompt rewriting and providing clear, specific scene descriptions in the prompt to improve generation consistency and accuracy.

seed integer (Optional)

Random seed used to control the randomness of model-generated content. The seed parameter value range is [0, 2147483647].

If not provided, the algorithm automatically generates a random number as the seed. If you want the generated content to remain relatively stable, use the same seed.

watermark bool (Optional)

Whether to add a watermark, which appears in the bottom right corner with the text "AI Generated".

  • false (default)

  • true
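As an illustration of how the video outpainting parameters fit together, the sketch below assembles a request body and checks the documented value ranges locally. The helper name `build_outpainting_payload` is hypothetical; the field names and ranges come from the parameter list above. The resulting dictionary would be sent as the JSON body of the task submission request.

```python
def build_outpainting_payload(prompt, video_url, *, top_scale=1.0, bottom_scale=1.0,
                              left_scale=1.0, right_scale=1.0,
                              prompt_extend=False, seed=None, watermark=False):
    """Build a video_outpainting request body (hypothetical helper)."""
    scales = {
        "top_scale": top_scale,
        "bottom_scale": bottom_scale,
        "left_scale": left_scale,
        "right_scale": right_scale,
    }
    # Each scale must lie in [1.0, 2.0]; 1.0 means no extension.
    for name, value in scales.items():
        if not 1.0 <= value <= 2.0:
            raise ValueError(f"{name} must be in [1.0, 2.0], got {value}")
    if seed is not None and not 0 <= seed <= 2147483647:
        raise ValueError(f"seed must be in [0, 2147483647], got {seed}")

    parameters = {**scales, "prompt_extend": prompt_extend, "watermark": watermark}
    if seed is not None:
        # Omitting seed lets the service pick a random one.
        parameters["seed"] = seed
    return {
        "model": "wan2.1-vace-plus",
        "input": {
            "function": "video_outpainting",
            "prompt": prompt,
            "video_url": video_url,
        },
        "parameters": parameters,
    }
```

For example, `build_outpainting_payload("a dog on a street", "https://example.com/in.mp4", left_scale=1.5, right_scale=1.5)` describes a horizontal extension with prompt rewriting left at its recommended default of false.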

Response parameters

Successful response

{
    "output": {
        "task_status": "PENDING",
        "task_id": "0385dc79-5ff8-4d82-bcb6-xxxxxx"
    },
    "request_id": "4909100c-7b5a-9f92-bfe5-xxxxxx"
}

Error response

{
    "code":"InvalidApiKey",
    "message":"Invalid API-key provided.",
    "request_id":"fb53c4ec-1c12-4fc4-a580-xxxxxx"
}

output object

Task output information.

Properties

task_id string

The task ID.

task_status string

The task status.

Valid values

  • PENDING

  • RUNNING

  • SUCCEEDED

  • FAILED

  • CANCELED

  • UNKNOWN

request_id string

The request ID. It can be used for tracing and troubleshooting.

code string

The error code for a failed request. This parameter is not returned when the request is successful.

message string

The error message for a failed request. This parameter is not returned when the request is successful.
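The two response shapes above can be told apart by the presence of the code field. The following sketch assumes the response body has already been decoded from JSON into a dictionary; `SubmitError` and `parse_submit_response` are hypothetical names used only for illustration.

```python
class SubmitError(Exception):
    """Raised when the submission response carries an error code."""

    def __init__(self, code, message, request_id):
        super().__init__(f"{code}: {message} (request_id={request_id})")
        self.code = code
        self.request_id = request_id


def parse_submit_response(body):
    """Return (task_id, task_status), or raise SubmitError on an error body."""
    # code and message are only returned when the request failed.
    if "code" in body:
        raise SubmitError(body["code"], body.get("message", ""), body.get("request_id"))
    output = body["output"]
    return output["task_id"], output["task_status"]
```

Keeping request_id on the exception makes it available for tracing and troubleshooting.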

Step 2: Query results by task ID

GET https://dashscope-intl.aliyuncs.com/api/v1/tasks/{task_id}

Request parameters

Query task results

Replace 86ecf553-d340-4e21-xxxxxxxxx with the actual task_id.

curl -X GET \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
https://dashscope-intl.aliyuncs.com/api/v1/tasks/86ecf553-d340-4e21-xxxxxxxxx

Headers

Authorization string (Required)

The identity authentication for the request, which is the Model Studio API key. Example: Bearer d1xxx2a.

Path parameters

task_id string (Required)

The task ID.

Response parameters

Task succeeded

Task data (such as task status and video URL) will be automatically deleted after 24 hours. Save the generated videos promptly.

{
    "request_id": "851985d0-fbba-9d8d-a17a-xxxxxx",
    "output": {
        "task_id": "208e2fd1-fcb4-4adf-9fcc-xxxxxx",
        "task_status": "SUCCEEDED",
        "submit_time": "2025-05-15 16:14:44.723",
        "scheduled_time": "2025-05-15 16:14:44.750",
        "end_time": "2025-05-15 16:20:09.389",
        "video_url": "https://dashscope-result-wlcb.oss-cn-wulanchabu.aliyuncs.com/xxx.mp4?xxxxxx",
        "orig_prompt": "In the video, a girl gracefully emerges from the depths of an ancient, misty forest. Her steps are light, and the camera captures her every fluid moment. When the girl stands still and looks around at the lush trees, her face lights up with a smile that blends surprise and joy. This moment, frozen in the interplay of light and shadow, captures the girl's wonderful encounter with nature.",
        "actual_prompt": "A girl in a light-colored long dress slowly emerges from the depths of an ancient forest shrouded in morning mist, her steps as graceful as a dance. Her hair is slightly curly, her features delicate, and her eyes bright. The camera follows her movements, capturing every spirited moment. As she comes to a stop and turns to look around at the lush trees, a smile blending surprise and joy spreads across her face. Sunlight filters through the leaves, casting dappled shadows, freezing this beautiful moment of harmony between humanity and nature. The visual style is fresh and naturalistic photography, combining mid-range and wide shots with a slight moving camera from an eye-level perspective."
    },
    "usage": {
        "video_duration": 5,
        "video_ratio": "standard",
        "video_count": 1
    }
}

Task failed

If the task fails, task_status is set to FAILED, and the code and message fields indicate the reason.

{
    "request_id": "e5d70b02-ebd3-98ce-9fe8-759d7d7b107d",
    "output": {
        "task_id": "86ecf553-d340-4e21-af6e-a0c6a421c010",
        "task_status": "FAILED",
        "code": "InvalidParameter",
        "message": "The size is not match xxxxxx"
    }
}

output object

Task output information.

Properties

task_id string

The task ID.

task_status string

The task status.

Valid values

  • PENDING

  • RUNNING

  • SUCCEEDED

  • FAILED

  • CANCELED

  • UNKNOWN

submit_time string

The time when the task was submitted.

scheduled_time string

The time when the task started running.

end_time string

The time when the task was completed.

video_url string

The video URL. The link is valid for 24 hours and can be used to download the video.

orig_prompt string

The original prompt.

actual_prompt string

The prompt actually used after rewriting. This field is returned only if prompt rewriting is enabled.

code string

The error code for a failed request. This parameter is not returned when the request is successful.

message string

The error message for a failed request. This parameter is not returned when the request is successful.

usage object

Output statistics. Only counts successful results.

Properties

video_duration integer

The duration of the generated video, in seconds.

video_ratio string

The aspect ratio of the generated video. Fixed as standard.

video_count integer

The number of generated videos.

request_id string

The request ID. It can be used for tracing and troubleshooting.
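Because video generation is asynchronous, a client typically polls the query endpoint until the task leaves the PENDING and RUNNING states. The sketch below uses only the Python standard library and reads the API key from the DASHSCOPE_API_KEY environment variable, as in the curl example. The function names, the 15-second interval, and the poll limit are illustrative assumptions, as is treating CANCELED and UNKNOWN as final states.

```python
import json
import os
import time
import urllib.request

TASK_URL = "https://dashscope-intl.aliyuncs.com/api/v1/tasks/{task_id}"
# Assumption: these statuses are final; the reference lists them without saying so.
TERMINAL = {"SUCCEEDED", "FAILED", "CANCELED", "UNKNOWN"}


def fetch_task(task_id):
    """GET the task record and return the decoded JSON body."""
    request = urllib.request.Request(
        TASK_URL.format(task_id=task_id),
        headers={"Authorization": f"Bearer {os.environ['DASHSCOPE_API_KEY']}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_video(task_id, fetch=fetch_task, interval_s=15, max_polls=40,
                   sleep=time.sleep):
    """Poll until the task finishes; return video_url on success."""
    for _ in range(max_polls):
        output = fetch(task_id)["output"]
        status = output["task_status"]
        if status == "SUCCEEDED":
            return output["video_url"]
        if status in TERMINAL:
            raise RuntimeError(
                f"task {task_id} ended as {status}: "
                f"{output.get('code')} {output.get('message')}"
            )
        sleep(interval_s)  # still PENDING or RUNNING
    raise TimeoutError(f"task {task_id} did not finish after {max_polls} polls")
```

The `fetch` and `sleep` parameters are injectable so that the loop can be exercised without network access. Since the returned link is valid for only 24 hours, the caller should download the video soon after this function returns.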

Error codes

If the call failed and an error message is returned, see Error messages.

Specific status codes for this API:

| HTTP status code | code | message | Description |
| --- | --- | --- | --- |
| 400 | InvalidParameter | InvalidParameter | The request parameter is invalid. |
| 400 | IPInfringementSuspect | Input data is suspected of being involved in IP infringement. | The input data (such as prompts or images) may involve intellectual property infringement. Check your input and ensure it does not contain such content. |
| 400 | DataInspectionFailed | Input data may contain inappropriate content. | The input data (such as prompts or images) may contain sensitive content. Check your input and try again. |
| 500 | InternalError | InternalError | Service exception. Try again first to rule out occasional issues. |
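Of the codes listed here, only InternalError is described as worth retrying; the 400-level codes require a change to the input. A minimal sketch of that decision follows. The names `should_retry` and `submit_with_retry` are hypothetical, and `submit` stands for any caller-supplied function that performs the request and returns the HTTP status code together with the decoded JSON body.

```python
def should_retry(http_status, code):
    """Only the 500 InternalError case is documented as retryable."""
    return http_status == 500 and code == "InternalError"


def submit_with_retry(submit, max_attempts=3):
    """Call submit() until it returns a result that should not be retried.

    submit is a caller-supplied callable returning (http_status, body).
    """
    for _ in range(max_attempts):
        http_status, body = submit()
        if not should_retry(http_status, body.get("code")):
            break
    # Either a success, a non-retryable error, or the last retryable error.
    return http_status, body
```

This sketch retries immediately; a real client would usually wait between attempts.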

Video access configuration

Configure domain whitelist: Ensure your business system can access video links

Generated videos are stored in OSS, and each video is assigned an OSS link, such as https://dashscope-result-xx.oss-cn-xxxx.aliyuncs.com/xxx.mp4. OSS links allow public access. You can use this link to download the video. The link is valid for only 24 hours.

If your business has high security requirements and cannot access OSS links by default, configure a separate whitelist for external network access. Add the following domains to your whitelist to access video links.

# OSS domain list
dashscope-result-bj.oss-cn-beijing.aliyuncs.com
dashscope-result-hz.oss-cn-hangzhou.aliyuncs.com
dashscope-result-sh.oss-cn-shanghai.aliyuncs.com
dashscope-result-wlcb.oss-cn-wulanchabu.aliyuncs.com
dashscope-result-zjk.oss-cn-zhangjiakou.aliyuncs.com
dashscope-result-sz.oss-cn-shenzhen.aliyuncs.com
dashscope-result-hy.oss-cn-heyuan.aliyuncs.com
dashscope-result-cd.oss-cn-chengdu.aliyuncs.com
dashscope-result-gz.oss-cn-guangzhou.aliyuncs.com
dashscope-result-wlcb-acdr-1.oss-cn-wulanchabu-acdr-1.aliyuncs.com
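A business system that enforces this whitelist in application code can compare the host of a returned video_url with the domain list. The sketch below is illustrative only: the function name `is_whitelisted_video_url` is hypothetical, and the host set is copied from the list above.

```python
from urllib.parse import urlparse

# Hosts copied from the OSS domain list in this section.
OSS_RESULT_HOSTS = {
    "dashscope-result-bj.oss-cn-beijing.aliyuncs.com",
    "dashscope-result-hz.oss-cn-hangzhou.aliyuncs.com",
    "dashscope-result-sh.oss-cn-shanghai.aliyuncs.com",
    "dashscope-result-wlcb.oss-cn-wulanchabu.aliyuncs.com",
    "dashscope-result-zjk.oss-cn-zhangjiakou.aliyuncs.com",
    "dashscope-result-sz.oss-cn-shenzhen.aliyuncs.com",
    "dashscope-result-hy.oss-cn-heyuan.aliyuncs.com",
    "dashscope-result-cd.oss-cn-chengdu.aliyuncs.com",
    "dashscope-result-gz.oss-cn-guangzhou.aliyuncs.com",
    "dashscope-result-wlcb-acdr-1.oss-cn-wulanchabu-acdr-1.aliyuncs.com",
}


def is_whitelisted_video_url(url):
    """Return True if the URL's host is one of the listed OSS domains."""
    return urlparse(url).hostname in OSS_RESULT_HOSTS
```

The comparison is an exact host match, so subdomains or look-alike hosts are not accepted.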