
Alibaba Cloud Model Studio: Wan general video editing API reference

Last Updated: Mar 15, 2026

The Wan general video editing model supports multiple input modalities, such as text, images, and videos, and can perform various video generation and editing tasks.

References: User guide

Availability

The model, the endpoint URL, and the API key must all belong to the same region.

Note

The sample code in this topic applies to the Singapore region.

HTTP

Video generation tasks typically take 5-10 minutes. The API uses asynchronous invocation: create a task, then poll for the result.

Step 1: Create a task

Singapore

POST https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis

Beijing

POST https://dashscope.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis

Request parameters

Multi-image reference

The API keys for the Singapore and China (Beijing) regions are different. For more information, see Obtain an API key and API host.
The following URL is for the Singapore region. If you use a model in the China (Beijing) region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis
curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis' \
--header 'X-DashScope-Async: enable' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "wan2.1-vace-plus",
    "input": {
        "function": "image_reference",
        "prompt": "In the video, a girl gracefully walks out from a misty, ancient forest. Her steps are light, and the camera captures her every nimble moment. When she stops and looks around at the lush woods, a smile of surprise and joy blossoms on her face. This scene, frozen in a moment of interplay between light and shadow, records her wonderful encounter with nature.",
        "ref_images_url": [
            "http://wanx.alicdn.com/material/20250318/image_reference_2_5_16.png",
            "http://wanx.alicdn.com/material/20250318/image_reference_1_5_16.png"
        ]
    },
    "parameters": {
        "prompt_extend": true,
        "obj_or_bg": ["obj","bg"],
        "size": "1280*720"
    }
}'
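
For reference, the same create-task call can be issued from Python using only the standard library. This is a sketch, not official SDK usage: the endpoint, headers, and body mirror the curl example above, while the `build_request` and `submit` helper names are illustrative.

```python
import json
import os
import urllib.request

# Singapore endpoint; use dashscope.aliyuncs.com for the China (Beijing) region.
CREATE_URL = "https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis"

def build_request(function: str, prompt: str, **inputs) -> tuple[dict, dict]:
    """Assemble the headers and JSON body for a task-creation POST."""
    headers = {
        "X-DashScope-Async": "enable",  # required: HTTP supports asynchronous invocation only
        "Authorization": f"Bearer {os.environ.get('DASHSCOPE_API_KEY', '')}",
        "Content-Type": "application/json",
    }
    body = {
        "model": "wan2.1-vace-plus",
        "input": {"function": function, "prompt": prompt, **inputs},
    }
    return headers, body

def submit(function: str, prompt: str, **inputs) -> str:
    """POST the task and return its task_id (network call; needs a valid API key)."""
    headers, body = build_request(function, prompt, **inputs)
    req = urllib.request.Request(
        CREATE_URL, data=json.dumps(body).encode("utf-8"), headers=headers, method="POST"
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["output"]["task_id"]
```

For example, `submit("image_reference", "A girl walks out from a misty forest.", ref_images_url=[...])` would return the task_id to use in Step 2.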

Video repainting

The API keys for the Singapore and China (Beijing) regions are different. For more information, see Obtain an API key and API host.
The following URL is for the Singapore region. If you use a model in the China (Beijing) region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis
curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis' \
--header 'X-DashScope-Async: enable' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "wan2.1-vace-plus",
    "input": {
        "function": "video_repainting",
        "prompt": "The video shows a black steampunk-style car driven by a gentleman. The car is decorated with gears and copper pipes. The background features a steam-powered candy factory and retro elements, creating a vintage and playful scene.",
        "video_url": "http://wanx.alicdn.com/material/20250318/video_repainting_1.mp4"
    },
    "parameters": {
        "prompt_extend": false,
        "control_condition": "depth"
    }
}'

Local editing

The API keys for the Singapore and China (Beijing) regions are different. For more information, see Obtain an API key and API host.
The following URL is for the Singapore region. If you use a model in the China (Beijing) region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis
curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis' \
--header 'X-DashScope-Async: enable' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "wan2.1-vace-plus",
    "input": {
        "function": "video_edit",
        "prompt": "The video shows a Parisian-style French cafe where a lion in a suit is elegantly sipping coffee. It holds a coffee cup in one hand, taking a gentle sip with a relaxed expression. The cafe is tastefully decorated, with soft hues and warm lighting illuminating the area where the lion is.",
        "mask_image_url": "http://wanx.alicdn.com/material/20250318/video_edit_1_mask.png",
        "video_url": "http://wanx.alicdn.com/material/20250318/video_edit_2.mp4",
        "mask_frame_id": 1
    },
    "parameters": {
        "prompt_extend": false,
        "mask_type": "tracking",
        "expand_ratio": 0.05
    }
}'

Video extension

The API keys for the Singapore and China (Beijing) regions are different. For more information, see Obtain an API key and API host.
The following URL is for the Singapore region. If you use a model in the China (Beijing) region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis
curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis' \
--header 'X-DashScope-Async: enable' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "wan2.1-vace-plus",
    "input": {
        "function": "video_extension",
        "prompt": "A dog wearing sunglasses is skateboarding on the street, 3D cartoon.",
        "first_clip_url": "http://wanx.alicdn.com/material/20250318/video_extension_1.mp4"
    },
    "parameters": {
        "prompt_extend": false
    }
}'

Video outpainting

The API keys for the Singapore and China (Beijing) regions are different. For more information, see Obtain an API key and API host.
The following URL is for the Singapore region. If you use a model in the China (Beijing) region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis
curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis' \
--header 'X-DashScope-Async: enable' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "wan2.1-vace-plus",
    "input": {
        "function": "video_outpainting",
        "prompt": "An elegant lady is passionately playing the violin, with a full symphony orchestra behind her.",
        "video_url": "http://wanx.alicdn.com/material/20250318/video_outpainting_1.mp4"
    },
    "parameters": {
        "prompt_extend": false,
        "top_scale": 1.5,
        "bottom_scale": 1.5,
        "left_scale": 1.5,
        "right_scale": 1.5
    }
}'
Request headers

Content-Type string (Required)

The content type of the request. Must be application/json.

Authorization string (Required)

The authentication credentials using a Model Studio API key.

Example: Bearer sk-xxxx

X-DashScope-Async string (Required)

Enables asynchronous processing. Must be set to enable, because HTTP requests support only asynchronous processing.

Important

If this header is omitted, the request fails with the error "current user api does not support synchronous calls".

Request body

Multi-image reference

model string (Required)

The model name. Example: wan2.1-vace-plus.

input object (Required)

The basic input information, such as the prompt.

Properties

prompt string (Required)

Text describing the elements and visual features of the video. Supports Chinese and English. Each Chinese character, letter, and punctuation mark counts as one character; text exceeding the length limit is truncated.

For more information about prompt techniques, see Video generation prompt guide.

function string (Required)

Feature name. For multi-image reference, set to image_reference.

Supports up to 3 reference images. Image content can include entities (people, animals, clothing) and backgrounds (scenes). Use the prompt to describe the desired video content; the model merges the images into a coherent video.

ref_images_url array[string] (Required)

Reference image URLs.

  1. Public URL:

    • Supports the HTTP and HTTPS protocols.

    • Example: https://xxx/xxx.png.

Provide 1-3 reference images. If more than 3 are provided, only the first 3 are used.

Requirements for reference images:

  • Format: JPG, JPEG, PNG, BMP, TIFF, or WEBP.

  • Resolution: Both width and height must be between 360 and 2,000 pixels.

  • Size: Maximum 10 MB.

  • The URL must not contain Chinese characters.

Suggestions:

  • For entity images, use one entity per image with a solid-color background (white or single color).

  • For background images, use at most one, and it should not contain entity objects.

parameters object (Optional)

The video processing parameters, such as watermark settings.

Properties

obj_or_bg array[string] (Optional)

Identifies each reference image's purpose (corresponds one-to-one with ref_images_url). Each element indicates whether the corresponding image is an "entity" or "background":

  • obj: Entity reference.

  • bg: Background reference. Maximum of one background reference is allowed.

Instructions:

  • Recommended. The array length must match that of ref_images_url; otherwise, an error is reported.

  • It can be omitted only when ref_images_url contains a single element; in that case, it defaults to ["obj"].

Example: ["obj", "obj", "bg"].
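
These constraints can be checked client-side before a task is submitted. A minimal sketch; the `check_obj_or_bg` helper name is illustrative and not part of the API:

```python
def check_obj_or_bg(ref_images_url, obj_or_bg=None) -> bool:
    """Validate obj_or_bg against ref_images_url per the rules above."""
    if obj_or_bg is None:
        # May be omitted only for a single reference image (defaults to ["obj"]).
        return len(ref_images_url) == 1
    if len(obj_or_bg) != len(ref_images_url):
        return False  # lengths must match, or the API reports an error
    if any(tag not in ("obj", "bg") for tag in obj_or_bg):
        return False  # only "obj" and "bg" are valid values
    return obj_or_bg.count("bg") <= 1  # at most one background reference
```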

size string (Optional)

Video resolution in width*height format. Available values:

  • 1280*720 (default): 16:9.

  • 720*1280: 9:16.

  • 960*960: 1:1.

  • 832*1088: 3:4.

  • 1088*832: 4:3.

duration integer (Optional)

Video duration in seconds. Fixed at 5 and cannot be modified. The model always generates a 5-second video.

prompt_extend bool (Optional)

Whether to enable prompt rewriting. An LLM rewrites the input prompt, improving quality for short prompts but increasing processing time.

  • true (default)

  • false

seed integer (Optional)

Random seed controlling generation randomness. Range: [0, 2147483647].

Auto-generated if omitted. Use the same seed for consistent results.

watermark bool (Optional)

Whether to add a watermark ("AI Generated" in lower-right corner).

  • false (default)

  • true

Video repainting

model string (Required)

The model name. Example: wan2.1-vace-plus.

input object (Required)

The basic input information, such as the prompt.

Properties

prompt string (Required)

Text describing the elements and visual features of the video. Supports Chinese and English. Each Chinese character, letter, and punctuation mark counts as one character; text exceeding the length limit is truncated.

For more information about prompt techniques, see Video generation prompt guide.

function string (Required)

Feature name. For video repainting, set this to video_repainting.

Extracts entity pose, actions, composition, motion contours, and sketch structure from an input video, then combines with a text prompt to generate a new video with the same dynamic features. You can also replace entities with reference images to change appearance while retaining actions.

video_url string (Required)

URL of the input video.

  1. Public URL:

    • Supports the HTTP and HTTPS protocols.

    • Example: https://xxx/xxx.mp4.

Requirements for input videos:

  • Format: MP4.

  • Frame rate: 16 FPS or higher.

  • Size: Maximum 50 MB.

  • Duration: Maximum 5 seconds. If longer, only the first 5 seconds are used.

  • The URL must not contain Chinese characters.

About the output video resolution:

  • If the input video resolution is 720P or lower, the output retains the original resolution.

  • If the input video resolution is higher than 720P, it is scaled down to 720P or lower while maintaining the original aspect ratio.

About the output video duration:

  • The output video has the same duration as the input video, up to a maximum of 5 seconds.

  • Example: If the input video is 3 seconds long, the output is also 3 seconds long. If the input is 6 seconds long, the output is the first 5 seconds.

ref_images_url array[string] (Optional)

An array of URLs for the input reference images.

  1. Public URL:

    • Supports the HTTP and HTTPS protocols.

    • Example: https://xxx/xxx.png.

Only 1 reference image is supported. We recommend that this image be an entity image used to replace the entity in the input video.

Image requirements:

  • Image format: JPG, JPEG, PNG, BMP, TIFF, or WEBP.

  • Image resolution: The width and height must be between 360 and 2,000 pixels.

  • Image size: Must not exceed 10 MB.

  • The URL must not contain Chinese characters.

Suggestions:

  • If you use an entity from a reference image, we recommend that each image contain only one entity. The background should be a solid color, such as white or a single color, to better highlight the entity.

parameters object (Required)

The video processing parameters, such as watermark settings.

Properties

control_condition string (Required)

Sets the method for video feature extraction.

  • posebodyface: Extracts the facial expressions and body movements of the entity in the input video. This is suitable for scenarios that require the preservation of facial details.

  • posebody: Extracts an entity's body movements from the input video, excluding facial expressions. Use this for scenarios where you need to control only body movements.

  • depth: Extracts the composition and motion contours from the input video.

  • scribble: Extracts the sketch structure from the input video.

strength float (Optional)

Adjusts the control strength that the control_condition feature extraction method applies to the generated video.

The default value is 1.0. The value range is [0.0, 1.0].

A larger value makes the generated video closer to the original video's actions and composition. A smaller value allows for more creative freedom.

prompt_extend bool (Optional)

Specifies whether to enable prompt rewriting. If enabled, an LLM rewrites the input prompt. This can significantly improve the generation quality for short prompts but increases the processing time.

  • true (default)

  • false (Recommended)

If the text description is inconsistent with the input video content, the model may misinterpret the prompt. To improve generation consistency and accuracy, disable prompt rewriting and provide a clear, specific description in the prompt.

seed integer (Optional)

Random seed controlling generation randomness. Range: [0, 2147483647].

Auto-generated if omitted. Use the same seed for consistent results.

watermark bool (Optional)

Whether to add a watermark ("AI Generated" in lower-right corner).

  • false (default)

  • true

Local editing

model string (Required)

The model name. Example: wan2.1-vace-plus.

input object (Required)

The basic input information, such as the prompt.

Properties

prompt string (Required)

Text describing the elements and visual features of the video. Supports Chinese and English. Each Chinese character, letter, and punctuation mark counts as one character; text exceeding the length limit is truncated.

For more information about prompt techniques, see Video generation prompt guide.

function string (Required)

The name of the feature. For local editing, set the value to video_edit.

Local editing lets you add, modify, or delete elements in a specified area of an input video. You can also replace the entity or background in the editing area to achieve fine-grained video editing.

video_url string (Required)

URL of the input video.

  1. Public URL:

    • Supports the HTTP and HTTPS protocols.

    • Example: https://xxx/xxx.mp4.

Requirements for input videos:

  • Format: MP4.

  • Frame rate: 16 FPS or higher.

  • Size: Maximum 50 MB.

  • Duration: Maximum 5 seconds. If longer, only the first 5 seconds are used.

  • The URL must not contain Chinese characters.

About the output video resolution:

  • If the input video resolution is 720P or lower, the output retains the original resolution.

  • If the input video resolution is higher than 720P, it is scaled down to 720P or lower while maintaining the original aspect ratio.

About the output video duration:

  • The output video has the same duration as the input video, up to a maximum of 5 seconds.

  • Example: If the input video is 3 seconds long, the output is also 3 seconds long. If the input is 6 seconds long, the output is the first 5 seconds.

ref_images_url array[string] (Optional)

An array of URLs for the input reference images.

  1. Public URL:

    • Supports the HTTP and HTTPS protocols.

    • Example: https://xxx/xxx.png.

Currently, only 1 reference image is supported. This image can be used as an entity or background to replace the corresponding content in the input video.

Image requirements:

  • Image format: JPG, JPEG, PNG, BMP, TIFF, or WEBP.

  • Image resolution: The width and height must be between 360 and 2,000 pixels.

  • Image size: Must not exceed 10 MB.

  • The URL must not contain Chinese characters.

Suggestions:

  • If you use an entity from a reference image, we recommend that each image contain only one entity. The background should be a solid color, such as white or a single color, to better highlight the entity.

  • If you use the background from a reference image, the background image should not contain any entity objects.

mask_image_url string (Optional)

The URL of the mask image.

  1. Public URL:

    • Supports the HTTP and HTTPS protocols.

    • Example: https://xxx/xxx.png.

This parameter specifies the video editing area. You must specify either this parameter or the mask_video_url parameter. We recommend using this parameter.

The white area of the mask image (with a pixel value of exactly [255, 255, 255]) indicates the area to edit. The black area (with a pixel value of exactly [0, 0, 0]) indicates the area to preserve.

Image requirements:

  • Image format: JPG, JPEG, PNG, BMP, TIFF, or WEBP.

  • Image resolution: Must be exactly the same as the input video (video_url) resolution.

  • Image size: Must not exceed 10 MB.

  • The URL must not contain Chinese characters.

mask_frame_id integer (Optional)

This parameter takes effect only when mask_image_url is not empty. It specifies the frame (identified by a frame ID) in which the masked object appears.

The default value is 1, which indicates the first frame of the video.

The value must be in the range [1, max_frame_id], where max_frame_id = input video frame rate × input video duration + 1.

For example, for an input video (video_url) with a frame rate of 16 FPS and a duration of 5 seconds, the total number of frames is 81 (16*5 + 1). Therefore, the value of max_frame_id is 81.
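
The arithmetic above can be sketched directly; the helper names are illustrative:

```python
def max_frame_id(fps: int, duration_s: int) -> int:
    """Largest valid mask_frame_id: frame rate x duration + 1."""
    return fps * duration_s + 1

def is_valid_mask_frame_id(frame_id: int, fps: int, duration_s: int) -> bool:
    """mask_frame_id must fall in [1, max_frame_id]."""
    return 1 <= frame_id <= max_frame_id(fps, duration_s)
```

For a 16 FPS, 5-second video, `max_frame_id(16, 5)` returns 81, matching the example above.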

mask_video_url string (Optional)

The URL of the mask video.

  1. Public URL:

    • Supports the HTTP and HTTPS protocols.

    • Example: https://xxx/xxx.mp4.

This parameter specifies the area of the video to edit. You must specify either this parameter or the mask_image_url parameter.

The mask video must have the same video format, frame rate, resolution, and length as the input video (video_url).

The white area of the mask video (with a pixel value of exactly [255, 255, 255]) indicates the area to edit. The black area (with a pixel value of exactly [0, 0, 0]) indicates the area to preserve.

parameters object (Optional)

The video processing parameters, such as watermark settings.

Properties

control_condition string (Optional)

Sets the method for video feature extraction. The default value is "", which means no extraction is performed.

  • posebodyface: Extracts the facial expressions and body movements of the entity in the input video. This is suitable for scenarios where the entity's face is large in the frame and has clearly visible features.

  • depth: Extracts the composition and motion contours from the input video.

mask_type string (Optional)

This parameter is effective only when mask_image_url is not empty. It specifies the behavior of the editing area.

  • tracking (default): The editing area dynamically follows the trajectory of the target object. This mode is suitable for scenes with moving objects.

  • fixed: The editing area remains fixed and does not change with the video content.

expand_ratio float (Optional)

When mask_type is set to tracking, this parameter applies and specifies the outward expansion ratio of the mask area.

The value range is [0.0, 1.0]. The default value is 0.05. We recommend using the default value.

A smaller value makes the mask area fit the target object more closely. A larger value expands the mask area more widely.

expand_mode string (Optional)

When mask_type is set to tracking, this parameter applies and specifies the shape of the mask area.

The algorithm generates a mask video with a corresponding shape based on the input mask image and the selected expand_mode. The following values are supported:

  • hull (default): Polygon mode. A polygon wraps the masked object.

  • bbox: Bounding box mode. A rectangle wraps the masked object.

  • original: Raw mode. Preserves the shape of the original masked object as much as possible.

size string (Optional)

Video resolution in width*height format. Available values:

  • 1280*720 (default): 16:9.

  • 720*1280: 9:16.

  • 960*960: 1:1.

  • 832*1088: 3:4.

  • 1088*832: 4:3.

duration integer (Optional)

Video duration in seconds. Fixed at 5 and cannot be modified. The model always generates a 5-second video.

prompt_extend bool (Optional)

Specifies whether to enable prompt rewriting. If enabled, an LLM rewrites the input prompt. This can significantly improve the generation quality for short prompts but increases the processing time.

  • true (default)

  • false (Recommended)

If the text description is inconsistent with the input video content, the model may misinterpret the prompt. To improve generation consistency and accuracy, disable prompt rewriting and provide a clear, specific description in the prompt.

seed integer (Optional)

Random seed controlling generation randomness. Range: [0, 2147483647].

Auto-generated if omitted. Use the same seed for consistent results.

watermark bool (Optional)

Whether to add a watermark ("AI Generated" in lower-right corner).

  • false (default)

  • true

Video extension

model string (Required)

The model name. Example: wan2.1-vace-plus.

input object (Required)

The basic input information, such as the prompt.

Properties

prompt string (Required)

Text describing the elements and visual features of the video. Supports Chinese and English. Each Chinese character, letter, and punctuation mark counts as one character; text exceeding the length limit is truncated.

For more information about prompt techniques, see Video generation prompt guide.

function string (Required)

The name of the feature. For video extension, set this to video_extension.

Video extension generates continuous content based on an image or video. It also extracts dynamic features, such as actions and composition, from a reference video to guide the generation of a video with similar motion.

The total duration of the extended video is 5 seconds. Note: This is the total duration of the final output video, not an additional 5-second extension to the original video.

first_frame_url string (Optional)

The URL of the first frame image.

  1. Public URL:

    • Supports the HTTP and HTTPS protocols.

    • Example: https://xxx/xxx.png.

Image requirements:

  • Image format: JPG, JPEG, PNG, BMP, TIFF, or WEBP.

  • Image resolution: The width and height must be between 360 and 2,000 pixels.

  • Image size: Must not exceed 10 MB.

  • The URL must not contain Chinese characters.

last_frame_url string (Optional)

The URL of the last frame image.

  1. Public URL:

    • Supports the HTTP and HTTPS protocols.

    • Example: https://xxx/xxx.png.

Image requirements:

  • Image format: JPG, JPEG, PNG, BMP, TIFF, or WEBP.

  • Image resolution: The width and height must be between 360 and 2,000 pixels.

  • Image size: Must not exceed 10 MB.

  • The URL must not contain Chinese characters.

first_clip_url string (Optional)

The URL of the first video segment.

  1. Public URL:

    • Supports the HTTP and HTTPS protocols.

    • Example: https://xxx/xxx.mp4.

Video requirements:

  • Video format: MP4.

  • Video frame rate: 16 FPS or higher. If you use both first_clip_url and last_clip_url, we recommend that the two clips have the same frame rate.

  • Video size: Must not exceed 50 MB.

  • Video duration: Must not exceed 3 seconds. If the duration is longer, only the first 3 seconds are used. If both first_clip_url and last_clip_url are specified, their combined duration must not exceed 3 seconds.

  • The URL must not contain Chinese characters.

About the output video resolution:

  • If the input video resolution is 720P or lower, the output retains the original resolution.

  • If the input video resolution is higher than 720P, it is scaled down to 720P or lower while maintaining the original aspect ratio.

last_clip_url string (Optional)

The URL of the last video segment.

  1. Public URL:

    • Supports the HTTP and HTTPS protocols.

    • Example: https://help-static-aliyun-doc.aliyuncs.com/xxx.mp4.

Video requirements:

  • Video format: MP4.

  • Video frame rate: 16 FPS or higher. When using first_clip_url and last_clip_url together, we recommend that the two clips have the same frame rate.

  • Video size: Must not exceed 50 MB.

  • Video duration: Must not exceed 3 seconds. If the duration is longer, only the first 3 seconds are used. If both first_clip_url and last_clip_url are specified, their combined duration must not exceed 3 seconds.

  • The URL must not contain Chinese characters.

About the output video resolution:

  • If the input video resolution is 720P or lower, the output retains the original resolution.

  • If the input video resolution is higher than 720P, it is scaled down to 720P or lower while maintaining the original aspect ratio.

video_url string (Optional)

The URL of the input video.

  1. Public URL:

    • Supports the HTTP and HTTPS protocols.

    • Example: https://help-static-aliyun-doc.aliyuncs.com/xxx.mp4.

Used to extract motion features. Combine with first_frame_url, last_frame_url, first_clip_url, and last_clip_url to guide extended video generation with similar motion.

Video requirements:

  • Video format: MP4.

  • Video frame rate: 16 FPS or higher, consistent with the preceding and succeeding clips.

  • Video resolution: Consistent with the preceding and succeeding frames and clips.

  • Video size: Must not exceed 50 MB.

  • Video duration: Must not exceed 5 seconds. If the duration is longer, only the first 5 seconds are used.

  • The URL must not contain Chinese characters.

parameters object (Optional)

The video processing parameters, such as the output video resolution.

Properties

control_condition string (Optional)

Sets the method for video feature extraction. This parameter is required when video_url is provided. The default value is "", which means no extraction is performed.

  • posebodyface: Extracts an entity's facial expressions and body movements from the input video.

  • depth: Extracts the composition and motion contours from the input video.

duration integer (Optional)

Video duration in seconds. Fixed at 5 and cannot be modified. The model always generates a 5-second video.

prompt_extend bool (Optional)

Specifies whether to enable prompt rewriting. If enabled, an LLM rewrites the input prompt. This can significantly improve the generation quality for short prompts but increases the processing time.

  • true (default)

  • false (Recommended)

If the text description is inconsistent with the input video content, the model may misinterpret the prompt. To improve generation consistency and accuracy, disable prompt rewriting and provide a clear, specific description in the prompt.

seed integer (Optional)

Random seed controlling generation randomness. Range: [0, 2147483647].

Auto-generated if omitted. Use the same seed for consistent results.

watermark bool (Optional)

Whether to add a watermark ("AI Generated" in lower-right corner).

  • false (default)

  • true

Video outpainting

model string (Required)

The model name. Example: wan2.1-vace-plus.

input object (Required)

The basic input information, such as the prompt.

Properties

prompt string (Required)

Text describing elements and visual features for the video. Supports Chinese and English. Each character, letter, and punctuation counts as one. Text exceeding the limit is truncated.

For more information about prompt techniques, see Video generation prompt guide.

function string (Required)

Specifies the feature to use. For video outpainting, set this parameter to video_outpainting.

Extends the video frame proportionally in the up, down, left, and right directions.

video_url string (Required)

URL of the input video.

  1. Public URL:

    • Supports the HTTP and HTTPS protocols.

    • Example: https://xxx/xxx.mp4.

Requirements for input videos:

  • Format: MP4.

  • Frame rate: 16 FPS or higher.

  • Size: Maximum 50 MB.

  • Duration: Maximum 5 seconds. If longer, only the first 5 seconds are used.

  • The URL must not contain Chinese characters.

About the output video resolution:

  • If the input video resolution is 720P or lower, the output retains the original resolution.

  • If the input video resolution is higher than 720P, it is scaled down to 720P or lower while maintaining the original aspect ratio.

About the output video duration:

  • The output video has the same duration as the input video, up to a maximum of 5 seconds.

  • Example: If the input video is 3 seconds long, the output is also 3 seconds long. If the input is 6 seconds long, the output is the first 5 seconds.

parameters object (Optional)

The video processing parameters, such as the scaling ratio.

Properties

top_scale float (Optional)

Centers the video frame and scales the video upward proportionally. Range: [1.0, 2.0]. Default: 1.0 (no scaling).

bottom_scale float (Optional)

Centers the video frame and scales the video downward proportionally. Range: [1.0, 2.0]. Default: 1.0 (no scaling).

left_scale float (Optional)

Centers the video frame and scales the video to the left proportionally. Range: [1.0, 2.0]. Default: 1.0 (no scaling).

right_scale float (Optional)

Centers the video frame and scales the video to the right proportionally. Range: [1.0, 2.0]. Default: 1.0 (no scaling).

duration integer (Optional)

Video duration in seconds. Fixed at 5 and cannot be modified. The model always generates a 5-second video.

prompt_extend bool (Optional)

Specifies whether to enable prompt rewriting. If enabled, an LLM rewrites the input prompt. This can significantly improve the generation quality for short prompts but increases the processing time.

  • true (default)

  • false (Recommended)

If the text description is inconsistent with the input video content, the model may misinterpret the prompt. To improve generation consistency and accuracy, disable prompt rewriting and provide a clear, specific description in the prompt.

seed integer (Optional)

Random seed controlling generation randomness. Range: [0, 2147483647].

Auto-generated if omitted. Use the same seed for consistent results.

watermark bool (Optional)

Whether to add a watermark ("AI Generated" in lower-right corner).

  • false (default)

  • true

Response parameters

Successful response

Save the task_id to query the task status and result.

{
    "output": {
        "task_status": "PENDING",
        "task_id": "0385dc79-5ff8-4d82-bcb6-xxxxxx"
    },
    "request_id": "4909100c-7b5a-9f92-bfe5-xxxxxx"
}
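
In code, persist the task_id from the creation response for the Step 2 query. A minimal sketch; the `extract_task_id` helper name is illustrative:

```python
import json

def extract_task_id(response_body: str) -> str:
    """Pull task_id from a successful task-creation response."""
    return json.loads(response_body)["output"]["task_id"]

# The successful-response example shown above:
example = (
    '{"output": {"task_status": "PENDING", "task_id": "0385dc79-5ff8-4d82-bcb6-xxxxxx"},'
    ' "request_id": "4909100c-7b5a-9f92-bfe5-xxxxxx"}'
)
```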

Error response

Task creation failed. See error codes to resolve the issue.

{
    "code": "InvalidApiKey",
    "message": "No API-key provided.",
    "request_id": "7438d53d-6eb8-4596-8835-xxxxxx"
}

output object

The task output information.

Properties

task_id string

The ID of the task. Can be used to query the task for up to 24 hours.

task_status string

The status of the task.

Enumeration

  • PENDING

  • RUNNING

  • SUCCEEDED

  • FAILED

  • CANCELED

  • UNKNOWN: Task does not exist or status is unknown

request_id string

Unique identifier for the request. Use for tracing and troubleshooting issues.

code string

The error code. Returned only when the request fails. See error codes for details.

message string

Detailed error message. Returned only when the request fails. See error codes for details.

Step 2: Query the result

Singapore

GET https://dashscope-intl.aliyuncs.com/api/v1/tasks/{task_id}

Beijing

GET https://dashscope.aliyuncs.com/api/v1/tasks/{task_id}

Request parameters

Query task result

Replace {task_id} with the task_id value returned by the previous API call. task_id is valid for queries within 24 hours.

curl -X GET https://dashscope-intl.aliyuncs.com/api/v1/tasks/{task_id} \
--header "Authorization: Bearer $DASHSCOPE_API_KEY"
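Because generation takes 5 to 10 minutes, the query endpoint is normally called in a loop until the task reaches a terminal status. The sketch below is a hypothetical helper, wait_for_task, where fetch stands in for the GET request above (for example, a call that performs it and parses the JSON); the interval and timeout values are illustrative:

```python
import time

# Statuses after which the task will not change again
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "CANCELED", "UNKNOWN"}


def wait_for_task(fetch, interval=15, timeout=900):
    """Poll fetch() until the task reaches a terminal status.

    fetch: a callable that performs GET /api/v1/tasks/{task_id}
    and returns the parsed JSON response. interval and timeout
    are in seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        status = result["output"]["task_status"]
        if status in TERMINAL_STATUSES:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task still {status} after {timeout}s")
        time.sleep(interval)
```

A polling interval on the order of 10 to 30 seconds is reasonable here: tighter loops add no value for a task that runs for minutes, and task data remains queryable for 24 hours.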
Request headers

Authorization string (Required)

The authentication credentials. Set this header to your Model Studio API key in the Bearer format.

Example: Bearer sk-xxxx

URL path parameters

task_id string (Required)

The ID of the task to query.

Response parameters

Task succeeded

Task data is retained for 24 hours, then auto-deleted. Save videos promptly.

{
    "request_id": "851985d0-fbba-9d8d-a17a-xxxxxx",
    "output": {
        "task_id": "208e2fd1-fcb4-4adf-9fcc-xxxxxx",
        "task_status": "SUCCEEDED",
        "submit_time": "2025-05-15 16:14:44.723",
        "scheduled_time": "2025-05-15 16:14:44.750",
        "end_time": "2025-05-15 16:20:09.389",
        "video_url": "https://dashscope-result-wlcb.oss-cn-wulanchabu.aliyuncs.com/xxx.mp4?xxxxxx",
        "orig_prompt": "In the video, a girl gracefully walks out from a misty, ancient forest. Her steps are light, and the camera captures her every nimble moment. When the girl stops and looks around at the lush woods, a smile of surprise and joy blossoms on her face. This scene, frozen in a moment of interplay between light and shadow, records her wonderful encounter with nature.",
        "actual_prompt": "A girl in a light-colored long dress slowly walks out from a misty, ancient forest, her steps as light as a dance. She has slightly curly long hair, a delicate face, and bright eyes. The camera follows her movements, capturing every nimble moment. When she stops, turns, and looks around at the lush woods, a smile of surprise and joy blossoms on her face. Sunlight filters through the leaves, casting mottled shadows and freezing this beautiful moment of harmony between human and nature. The style is a fresh and natural portrait, combining medium and full shots with a level perspective and slight camera movement."
    },
    "usage": {
        "video_duration": 5,
        "video_ratio": "standard",
        "video_count": 1
    }
}
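Since video_url expires after 24 hours, the URL should be extracted and the file downloaded as soon as the task succeeds. A minimal sketch, using a hypothetical helper extract_video_url; the actual download (commented out) would use any HTTP client, for example urllib.request.urlretrieve from the standard library:

```python
def extract_video_url(result: dict) -> str:
    """Return video_url from a query response, or raise if the task
    has not succeeded (video_url is only present on success)."""
    output = result["output"]
    if output["task_status"] != "SUCCEEDED":
        raise RuntimeError(f"task not successful: {output['task_status']}")
    return output["video_url"]


# Download before the 24-hour expiry (network call, shown for illustration):
# import urllib.request
# urllib.request.urlretrieve(extract_video_url(result), "output.mp4")
```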

Task failed

When a task fails, task_status is set to FAILED with an error code and message. See error codes to resolve the issue.

{
    "request_id": "e5d70b02-ebd3-98ce-9fe8-759d7d7b107d",
    "output": {
        "task_id": "86ecf553-d340-4e21-af6e-a0c6a421c010",
        "task_status": "FAILED",
        "code": "InvalidParameter",
        "message": "The size is not match xxxxxx"
    }
}

output object

The task output information.

Properties

task_id string

The ID of the task. Can be used to query the task for up to 24 hours.

task_status string

The status of the task.

Enumeration

  • PENDING

  • RUNNING

  • SUCCEEDED

  • FAILED

  • CANCELED

  • UNKNOWN: Task does not exist or status is unknown

submit_time string

The time when the task was submitted. Time is in UTC+8. Format: YYYY-MM-DD HH:mm:ss.SSS.

scheduled_time string

The time when the task started running. Time is in UTC+8. Format: YYYY-MM-DD HH:mm:ss.SSS.

end_time string

The time when the task was completed. Time is in UTC+8. Format: YYYY-MM-DD HH:mm:ss.SSS.

video_url string

Video URL (valid for 24 hours). Format: MP4 (H.264 encoding).

orig_prompt string

Original input prompt.

actual_prompt string

Actual prompt used after prompt rewriting is enabled. If prompt rewriting is disabled, this field is not returned.

code string

The error code. Returned only when the request fails. See error codes for details.

message string

Detailed error message. Returned only when the request fails. See error codes for details.

usage object

Output statistics. Returned only for successful tasks.

Properties

video_duration integer

Duration of the generated video in seconds.

video_ratio string

Aspect ratio of the generated video. Fixed at standard.

video_count integer

Number of generated videos.

request_id string

Unique identifier for the request. Use for tracing and troubleshooting issues.

Limitations

  • Data validity: task_id and video_url are retained for 24 hours. After expiration, they cannot be queried or downloaded.

  • Audio support: The model generates silent videos. To add audio, use speech synthesis.

Error codes

If a model call fails and returns an error message, see Error messages for troubleshooting.

FAQ

Q: How do I get a whitelist for video storage access domain names?

A: Generated videos are stored in Object Storage Service (OSS), and the API returns a temporary public URL. Note the following before configuring a firewall whitelist for this download URL: the underlying storage may change dynamically, so this topic does not provide a fixed OSS domain name whitelist, because outdated information could cause access issues. If you have security control requirements, contact your account manager to obtain the latest OSS domain name list.