Request parameters

Multi-image reference

The API keys for the Singapore and China (Beijing) regions are different. For more information, see Obtain an API key and API host. The following URL is for the Singapore region. If you use a model in the China (Beijing) region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis

curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis' \
--header 'X-DashScope-Async: enable' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
"model": "wan2.1-vace-plus",
"input": {
"function": "image_reference",
"prompt": "In the video, a girl gracefully walks out from a misty, ancient forest. Her steps are light, and the camera captures her every nimble moment. When she stops and looks around at the lush woods, a smile of surprise and joy blossoms on her face. This scene, frozen in a moment of interplay between light and shadow, records her wonderful encounter with nature.",
"ref_images_url": [
"http://wanx.alicdn.com/material/20250318/image_reference_2_5_16.png",
"http://wanx.alicdn.com/material/20250318/image_reference_1_5_16.png"
]
},
"parameters": {
"prompt_extend": true,
"obj_or_bg": ["obj","bg"],
"size": "1280*720"
}
}'
Video repainting

The API keys for the Singapore and China (Beijing) regions are different. For more information, see Obtain an API key and API host. The following URL is for the Singapore region. If you use a model in the China (Beijing) region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis

curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis' \
--header 'X-DashScope-Async: enable' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
"model": "wan2.1-vace-plus",
"input": {
"function": "video_repainting",
"prompt": "The video shows a black steampunk-style car driven by a gentleman. The car is decorated with gears and copper pipes. The background features a steam-powered candy factory and retro elements, creating a vintage and playful scene.",
"video_url": "http://wanx.alicdn.com/material/20250318/video_repainting_1.mp4"
},
"parameters": {
"prompt_extend": false,
"control_condition": "depth"
}
}'
Local editing

The API keys for the Singapore and China (Beijing) regions are different. For more information, see Obtain an API key and API host. The following URL is for the Singapore region. If you use a model in the China (Beijing) region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis

curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis' \
--header 'X-DashScope-Async: enable' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
"model": "wan2.1-vace-plus",
"input": {
"function": "video_edit",
"prompt": "The video shows a Parisian-style French cafe where a lion in a suit is elegantly sipping coffee. It holds a coffee cup in one hand, taking a gentle sip with a relaxed expression. The cafe is tastefully decorated, with soft hues and warm lighting illuminating the area where the lion is.",
"mask_image_url": "http://wanx.alicdn.com/material/20250318/video_edit_1_mask.png",
"video_url": "http://wanx.alicdn.com/material/20250318/video_edit_2.mp4",
"mask_frame_id": 1
},
"parameters": {
"prompt_extend": false,
"mask_type": "tracking",
"expand_ratio": 0.05
}
}'
Video extension

The API keys for the Singapore and China (Beijing) regions are different. For more information, see Obtain an API key and API host. The following URL is for the Singapore region. If you use a model in the China (Beijing) region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis

curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis' \
--header 'X-DashScope-Async: enable' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
"model": "wan2.1-vace-plus",
"input": {
"function": "video_extension",
"prompt": "A dog wearing sunglasses is skateboarding on the street, 3D cartoon.",
"first_clip_url": "http://wanx.alicdn.com/material/20250318/video_extension_1.mp4"
},
"parameters": {
"prompt_extend": false
}
}'
Video outpainting

The API keys for the Singapore and China (Beijing) regions are different. For more information, see Obtain an API key and API host. The following URL is for the Singapore region. If you use a model in the China (Beijing) region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis

curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis' \
--header 'X-DashScope-Async: enable' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
"model": "wan2.1-vace-plus",
"input": {
"function": "video_outpainting",
"prompt": "An elegant lady is passionately playing the violin, with a full symphony orchestra behind her.",
"video_url": "http://wanx.alicdn.com/material/20250318/video_outpainting_1.mp4"
},
"parameters": {
"prompt_extend": false,
"top_scale": 1.5,
"bottom_scale": 1.5,
"left_scale": 1.5,
"right_scale": 1.5
}
}'
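All of the requests above are asynchronous (note the X-DashScope-Async: enable header), so the immediate response contains a task ID rather than a finished video. The following sketch shows one way to query the result; it assumes the standard DashScope asynchronous task-query endpoint and uses a placeholder task ID.

# Placeholder: the task_id returned by one of the video-synthesis requests above.
TASK_ID="your-task-id"

# Query the task result. For the China (Beijing) region, replace the host with dashscope.aliyuncs.com.
curl --location "https://dashscope-intl.aliyuncs.com/api/v1/tasks/$TASK_ID" \
--header "Authorization: Bearer $DASHSCOPE_API_KEY"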
Multi-image reference

model string (Required)
The model name. Example: wan2.1-vace-plus.

input object (Required)
The basic input information, such as the prompt. Properties:

  prompt string (Required)
  Text prompt describing the elements and visual features for the generated video. Supports Chinese and English. Each character, letter, and punctuation mark counts as one character. Text exceeding the limit is automatically truncated. For more information about prompt techniques, see Video generation prompt guide.

  function string (Required)
  Feature name. For multi-image reference, set this to image_reference. The multi-image reference feature supports up to three reference images. The image content can include entities and backgrounds, such as people, animals, clothing, and scenes. Use the prompt parameter to describe the desired video content. The model can then merge multiple images to generate a coherent video.

  ref_images_url array[string] (Required)
  URLs for the input reference images. The URLs must be publicly accessible. You can provide 1 to 3 reference images. If you provide more than 3 images, only the first 3 are used as input.
  Requirements for reference images: Format: JPG, JPEG, PNG, BMP, TIFF, or WEBP. Resolution: Both width and height must be between 360 and 2,000 pixels. Size: Maximum 10 MB. The URL must not contain Chinese characters.
  Suggestions: If you use an entity from a reference image, we recommend that each image contain only one entity and that the background be a solid color, such as white, to better highlight the entity. If you use the background from a reference image, you can use at most one background image, and it should not contain any entity objects.

parameters object (Optional)
The video processing parameters, such as watermark settings. Properties:

  obj_or_bg array[string] (Optional)
  Identifies the purpose of each reference image and corresponds one-to-one with the ref_images_url parameter. Each element indicates whether the corresponding image is an entity ("obj") or a background ("bg"). We recommend that you provide this parameter. Its length must match ref_images_url; otherwise, an error is reported. You can omit this parameter only when ref_images_url is a single-element array, in which case the default value is ["obj"]. Example: ["obj", "obj", "bg"].

  size string (Optional)
  Video resolution in width*height format. Available values:

  duration integer (Optional)
  Video duration in seconds. Fixed at 5 and cannot be modified. The model always generates a 5-second video.

  prompt_extend bool (Optional)
  Whether to enable prompt rewriting. If enabled, an LLM rewrites the input prompt. This can significantly improve the generation quality for short prompts but increases processing time.

  seed integer (Optional)
  Random seed that controls the randomness of the generated content. Value range: [0, 2147483647]. If you do not provide a seed, the algorithm automatically generates one. To generate relatively stable content, use the same seed value.

  watermark bool (Optional)
  Whether to add a watermark ("AI Generated" in the lower-right corner).
Video repainting

model string (Required)
The model name. Example: wan2.1-vace-plus.

input object (Required)
The basic input information, such as the prompt. Properties:

  prompt string (Required)
  Text prompt describing the elements and visual features for the generated video. Supports Chinese and English. Each character, letter, and punctuation mark counts as one character. Text exceeding the limit is automatically truncated. For more information about prompt techniques, see Video generation prompt guide.

  function string (Required)
  Feature name. For video repainting, set this to video_repainting. Video repainting extracts entity pose and actions, composition and motion contours, and sketch structure from an input video, then combines these with a text prompt to generate a new video with the same dynamic features. You can also replace the entity in the original video with a reference image, for example, to change a character's appearance while retaining the original actions.

  video_url string (Required)
  URL of the input video. The URL must be publicly accessible.
  Requirements for input videos: Format: MP4. Frame rate: 16 FPS or higher. Size: Maximum 50 MB. Duration: Maximum 5 seconds. If longer, only the first 5 seconds are used. The URL must not contain Chinese characters.
  About the output video resolution: If the input video resolution is 720P or lower, the output retains the original resolution. If the input video resolution is higher than 720P, it is scaled down to 720P or lower while maintaining the original aspect ratio.
  About the output video duration: The output video has the same duration as the input video, up to a maximum of 5 seconds. For example, if the input video is 3 seconds long, the output is also 3 seconds long. If the input is 6 seconds long, the output is the first 5 seconds.

  ref_images_url array[string] (Optional)
  An array of URLs for the input reference images. The URLs must be publicly accessible. Only 1 reference image is supported. We recommend that this image be an entity image used to replace the entity in the input video.
  Image requirements: Format: JPG, JPEG, PNG, BMP, TIFF, or WEBP. Resolution: The width and height must be between 360 and 2,000 pixels. Size: Must not exceed 10 MB. The URL must not contain Chinese characters.
  Suggestions: If you use an entity from a reference image, we recommend that the image contain only one entity and that the background be a solid color, such as white, to better highlight the entity.

parameters object (Required)
The video processing parameters, such as watermark settings. Properties:

  control_condition string (Required)
  Sets the method for video feature extraction. Valid values:
  posebodyface: Extracts the facial expressions and body movements of the entity in the input video. Suitable for scenarios that require the preservation of facial details.
  posebody: Extracts the entity's body movements from the input video, excluding facial expressions. Use this when you need to control only body movements.
  depth: Extracts the composition and motion contours from the input video.
  scribble: Extracts the sketch structure from the input video.

  strength float (Optional)
  Adjusts how strongly the control_condition feature extraction method constrains the generated video. Value range: [0.0, 1.0]. Default value: 1.0. A larger value makes the generated video closer to the original video's actions and composition. A smaller value allows more creative freedom.

  prompt_extend bool (Optional)
  Specifies whether to enable prompt rewriting. If enabled, an LLM rewrites the input prompt. This can significantly improve the generation quality for short prompts but increases the processing time. Valid values:
  true (default)
  false (recommended): If the text description is inconsistent with the input video content, the model may misinterpret the prompt. To improve generation consistency and accuracy, disable prompt rewriting and provide a clear, specific description in the prompt.

  seed integer (Optional)
  Random seed that controls the randomness of the generated content. Value range: [0, 2147483647]. If you do not provide a seed, the algorithm automatically generates one. To generate relatively stable content, use the same seed value.

  watermark bool (Optional)
  Whether to add a watermark ("AI Generated" in the lower-right corner).
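The request example at the top of this page uses only depth extraction. The hedged sketch below combines the optional inputs described here: a single reference image that replaces the entity in the input video, posebody extraction, and a reduced control strength. The video and image URLs are placeholders.

# Placeholder URLs; replace them with your own publicly accessible video and entity image.
curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis' \
--header 'X-DashScope-Async: enable' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "wan2.1-vace-plus",
    "input": {
        "function": "video_repainting",
        "prompt": "A silver robot dances energetically on a neon-lit stage.",
        "video_url": "https://example.com/input/dance.mp4",
        "ref_images_url": [
            "https://example.com/reference/robot.png"
        ]
    },
    "parameters": {
        "prompt_extend": false,
        "control_condition": "posebody",
        "strength": 0.8
    }
}'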
Local editing

model string (Required)
The model name. Example: wan2.1-vace-plus.

input object (Required)
The basic input information, such as the prompt. Properties:

  prompt string (Required)
  Text prompt describing the elements and visual features for the generated video. Supports Chinese and English. Each character, letter, and punctuation mark counts as one character. Text exceeding the limit is automatically truncated. For more information about prompt techniques, see Video generation prompt guide.

  function string (Required)
  Feature name. For local editing, set this to video_edit. Local editing lets you add, modify, or delete elements in a specified area of an input video. You can also replace the entity or background in the editing area to achieve fine-grained video editing.

  video_url string (Required)
  URL of the input video. The URL must be publicly accessible.
  Requirements for input videos: Format: MP4. Frame rate: 16 FPS or higher. Size: Maximum 50 MB. Duration: Maximum 5 seconds. If longer, only the first 5 seconds are used. The URL must not contain Chinese characters.
  About the output video resolution: If the input video resolution is 720P or lower, the output retains the original resolution. If the input video resolution is higher than 720P, it is scaled down to 720P or lower while maintaining the original aspect ratio.
  About the output video duration: The output video has the same duration as the input video, up to a maximum of 5 seconds. For example, if the input video is 3 seconds long, the output is also 3 seconds long. If the input is 6 seconds long, the output is the first 5 seconds.

  ref_images_url array[string] (Optional)
  An array of URLs for the input reference images. The URLs must be publicly accessible. Currently, only 1 reference image is supported. This image can be used as an entity or background to replace the corresponding content in the input video.
  Image requirements: Format: JPG, JPEG, PNG, BMP, TIFF, or WEBP. Resolution: The width and height must be between 360 and 2,000 pixels. Size: Must not exceed 10 MB. The URL must not contain Chinese characters.
  Suggestions: If you use an entity from a reference image, we recommend that the image contain only one entity and that the background be a solid color, such as white, to better highlight the entity. If you use the background from a reference image, the background image should not contain any entity objects.

  mask_image_url string (Optional)
  The URL of the mask image. The URL must be publicly accessible. This parameter specifies the video editing area. You must specify either this parameter or the mask_video_url parameter; we recommend this one. The white area of the mask image (with a pixel value of exactly [255, 255, 255]) indicates the area to edit. The black area (with a pixel value of exactly [0, 0, 0]) indicates the area to preserve.
  Image requirements: Format: JPG, JPEG, PNG, BMP, TIFF, or WEBP. Resolution: Must be exactly the same as the input video (video_url) resolution. Size: Must not exceed 10 MB. The URL must not contain Chinese characters.

  mask_frame_id integer (Optional)
  Used only when mask_image_url is not empty. Identifies the frame in which the masked object appears, represented by a frame ID. The default value is 1, which indicates the first frame of the video. The value must be in the range [1, max_frame_id], where max_frame_id = input video frame rate × input video duration + 1. For example, for an input video (video_url) with a frame rate of 16 FPS and a duration of 5 seconds, the total number of frames is 81 (16 × 5 + 1), so max_frame_id is 81.

  mask_video_url string (Optional)
  The URL of the mask video. The URL must be publicly accessible. This parameter specifies the area of the video to edit. You must specify either this parameter or the mask_image_url parameter. The mask video must have the same format, frame rate, resolution, and length as the input video (video_url). The white area of the mask video (with a pixel value of exactly [255, 255, 255]) indicates the area to edit. The black area (with a pixel value of exactly [0, 0, 0]) indicates the area to preserve.

parameters object (Optional)
The video processing parameters, such as watermark settings. Properties:

  control_condition string (Optional)
  Sets the method for video feature extraction. The default value is "", which means no extraction is performed. Valid values:
  posebodyface: Extracts the facial expressions and body movements of the entity in the input video. Suitable for scenarios where the entity's face is large in the frame and has clearly visible features.
  depth: Extracts the composition and motion contours from the input video.

  mask_type string (Optional)
  Effective only when mask_image_url is not empty. Specifies the behavior of the editing area. Valid values:
  tracking (default): The editing area dynamically follows the trajectory of the target object. Suitable for scenes with moving objects.
  fixed: The editing area remains fixed and does not change with the video content.

  expand_ratio float (Optional)
  Applies only when mask_type is set to tracking. Specifies the outward expansion ratio of the mask area. Value range: [0.0, 1.0]. Default value: 0.05 (recommended). A smaller value makes the mask area fit the target object more closely. A larger value expands the mask area more widely.

  expand_mode string (Optional)
  Applies only when mask_type is set to tracking. Specifies the shape of the mask area. The algorithm generates a mask video with a corresponding shape based on the input mask image and the selected expand_mode. Valid values:
  hull (default): Polygon mode. A polygon wraps the masked object.
  bbox: Bounding box mode. A rectangle wraps the masked object.
  original: Raw mode. Preserves the shape of the original masked object as much as possible.

  size string (Optional)
  Video resolution in width*height format. Available values:

  duration integer (Optional)
  Video duration in seconds. Fixed at 5 and cannot be modified. The model always generates a 5-second video.

  prompt_extend bool (Optional)
  Specifies whether to enable prompt rewriting. If enabled, an LLM rewrites the input prompt. This can significantly improve the generation quality for short prompts but increases the processing time. Valid values:
  true (default)
  false (recommended): If the text description is inconsistent with the input video content, the model may misinterpret the prompt. To improve generation consistency and accuracy, disable prompt rewriting and provide a clear, specific description in the prompt.

  seed integer (Optional)
  Random seed that controls the randomness of the generated content. Value range: [0, 2147483647]. If you do not provide a seed, the algorithm automatically generates one. To generate relatively stable content, use the same seed value.

  watermark bool (Optional)
  Whether to add a watermark ("AI Generated" in the lower-right corner).
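As described above, the editing area can also be supplied as a mask video instead of a mask image. The hedged sketch below uses mask_video_url, so mask_frame_id, mask_type, expand_ratio, and expand_mode do not apply, and pins the seed. The URLs are placeholders.

# Placeholder URLs; the mask video must match the input video's format, frame rate, resolution, and length.
curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis' \
--header 'X-DashScope-Async: enable' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "wan2.1-vace-plus",
    "input": {
        "function": "video_edit",
        "prompt": "A golden retriever wearing a chef hat stirs a pot of soup in a cozy kitchen.",
        "video_url": "https://example.com/input/kitchen.mp4",
        "mask_video_url": "https://example.com/input/kitchen_mask.mp4"
    },
    "parameters": {
        "prompt_extend": false,
        "seed": 2024
    }
}'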
Video extension

model string (Required)
The model name. Example: wan2.1-vace-plus.

input object (Required)
The basic input information, such as the prompt. Properties:

  prompt string (Required)
  Text prompt describing the elements and visual features for the generated video. Supports Chinese and English. Each character, letter, and punctuation mark counts as one character. Text exceeding the limit is automatically truncated. For more information about prompt techniques, see Video generation prompt guide.

  function string (Required)
  Feature name. For video extension, set this to video_extension. Video extension generates continuous content based on an image or video. It can also extract dynamic features, such as actions and composition, from a reference video to guide the generation of a video with similar motion. The total duration of the extended video is 5 seconds. Note that this is the total duration of the final output video, not an additional 5-second extension of the original video.

  first_frame_url string (Optional)
  The URL of the first-frame image. The URL must be publicly accessible.
  Image requirements: Format: JPG, JPEG, PNG, BMP, TIFF, or WEBP. Resolution: The width and height must be between 360 and 2,000 pixels. Size: Must not exceed 10 MB. The URL must not contain Chinese characters.

  last_frame_url string (Optional)
  The URL of the last-frame image. The URL must be publicly accessible.
  Image requirements: Format: JPG, JPEG, PNG, BMP, TIFF, or WEBP. Resolution: The width and height must be between 360 and 2,000 pixels. Size: Must not exceed 10 MB. The URL must not contain Chinese characters.

  first_clip_url string (Optional)
  The URL of the first video segment. The URL must be publicly accessible.
  Video requirements: Format: MP4. Frame rate: 16 FPS or higher. If you use both first_clip_url and last_clip_url, we recommend that the two clips have the same frame rate. Size: Must not exceed 50 MB. Duration: Must not exceed 3 seconds. If the duration is longer, only the first 3 seconds are used. If both first_clip_url and last_clip_url are specified, their combined duration must not exceed 3 seconds. The URL must not contain Chinese characters.
  About the output video resolution: If the input video resolution is 720P or lower, the output retains the original resolution. If the input video resolution is higher than 720P, it is scaled down to 720P or lower while maintaining the original aspect ratio.

  last_clip_url string (Optional)
  The URL of the last video segment. The URL must be publicly accessible.
  Video requirements: Format: MP4. Frame rate: 16 FPS or higher. If you use both first_clip_url and last_clip_url, we recommend that the two clips have the same frame rate. Size: Must not exceed 50 MB. Duration: Must not exceed 3 seconds. If the duration is longer, only the first 3 seconds are used. If both first_clip_url and last_clip_url are specified, their combined duration must not exceed 3 seconds. The URL must not contain Chinese characters.
  About the output video resolution: If the input video resolution is 720P or lower, the output retains the original resolution. If the input video resolution is higher than 720P, it is scaled down to 720P or lower while maintaining the original aspect ratio.

  video_url string (Optional)
  The URL of the input video. The URL must be publicly accessible. This video is mainly used to extract motion features. It is used with the first_frame_url, last_frame_url, first_clip_url, and last_clip_url parameters to guide the generation of an extended video with similar motion.
  Video requirements: Format: MP4. Frame rate: 16 FPS or higher, consistent with the preceding and succeeding clips. Resolution: Consistent with the preceding and succeeding frames and clips. Size: Must not exceed 50 MB. Duration: Must not exceed 5 seconds. If the duration is longer, only the first 5 seconds are used. The URL must not contain Chinese characters.

parameters object (Optional)
The video processing parameters, such as the output video resolution. Properties:

  control_condition string (Optional)
  Sets the method for video feature extraction. Required when video_url is provided. The default value is "", which means no extraction is performed.

  duration integer (Optional)
  Video duration in seconds. Fixed at 5 and cannot be modified. The model always generates a 5-second video.

  prompt_extend bool (Optional)
  Specifies whether to enable prompt rewriting. If enabled, an LLM rewrites the input prompt. This can significantly improve the generation quality for short prompts but increases the processing time. Valid values:
  true (default)
  false (recommended): If the text description is inconsistent with the input video content, the model may misinterpret the prompt. To improve generation consistency and accuracy, disable prompt rewriting and provide a clear, specific description in the prompt.

  seed integer (Optional)
  Random seed that controls the randomness of the generated content. Value range: [0, 2147483647]. If you do not provide a seed, the algorithm automatically generates one. To generate relatively stable content, use the same seed value.

  watermark bool (Optional)
  Whether to add a watermark ("AI Generated" in the lower-right corner).
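The request example at the top of this page extends a first video clip. As another illustration of the inputs described here, the hedged sketch below instead generates the 5-second output from a first-frame and a last-frame image; both image URLs are placeholders.

# Placeholder image URLs; both images must meet the requirements listed above.
curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis' \
--header 'X-DashScope-Async: enable' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "wan2.1-vace-plus",
    "input": {
        "function": "video_extension",
        "prompt": "A paper airplane glides across a sunny classroom and lands gently on a desk, 3D cartoon.",
        "first_frame_url": "https://example.com/frames/first.png",
        "last_frame_url": "https://example.com/frames/last.png"
    },
    "parameters": {
        "prompt_extend": false,
        "seed": 42
    }
}'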
Video outpainting

model string (Required)
The model name. Example: wan2.1-vace-plus.

input object (Required)
The basic input information, such as the prompt. Properties:

  prompt string (Required)
  Text prompt describing the elements and visual features for the generated video. Supports Chinese and English. Each character, letter, and punctuation mark counts as one character. Text exceeding the limit is automatically truncated. For more information about prompt techniques, see Video generation prompt guide.

  function string (Required)
  Feature name. For video outpainting, set this to video_outpainting. Video outpainting proportionally extends the video frame in the up, down, left, and right directions.

  video_url string (Required)
  URL of the input video. The URL must be publicly accessible.
  Requirements for input videos: Format: MP4. Frame rate: 16 FPS or higher. Size: Maximum 50 MB. Duration: Maximum 5 seconds. If longer, only the first 5 seconds are used. The URL must not contain Chinese characters.
  About the output video resolution: If the input video resolution is 720P or lower, the output retains the original resolution. If the input video resolution is higher than 720P, it is scaled down to 720P or lower while maintaining the original aspect ratio.
  About the output video duration: The output video has the same duration as the input video, up to a maximum of 5 seconds. For example, if the input video is 3 seconds long, the output is also 3 seconds long. If the input is 6 seconds long, the output is the first 5 seconds.

parameters object (Optional)
The video processing parameters, such as the scaling ratio. Properties:

  top_scale float (Optional)
  Proportionally extends the video frame upward from the original frame. Value range: [1.0, 2.0]. Default value: 1.0, which means no extension.

  bottom_scale float (Optional)
  Proportionally extends the video frame downward from the original frame. Value range: [1.0, 2.0]. Default value: 1.0, which means no extension.

  left_scale float (Optional)
  Proportionally extends the video frame to the left from the original frame. Value range: [1.0, 2.0]. Default value: 1.0, which means no extension.

  right_scale float (Optional)
  Proportionally extends the video frame to the right from the original frame. Value range: [1.0, 2.0]. Default value: 1.0, which means no extension.

  duration integer (Optional)
  Video duration in seconds. Fixed at 5 and cannot be modified. The model always generates a 5-second video.

  prompt_extend bool (Optional)
  Specifies whether to enable prompt rewriting. If enabled, an LLM rewrites the input prompt. This can significantly improve the generation quality for short prompts but increases the processing time. Valid values:
  true (default)
  false (recommended): If the text description is inconsistent with the input video content, the model may misinterpret the prompt. To improve generation consistency and accuracy, disable prompt rewriting and provide a clear, specific description in the prompt.

  seed integer (Optional)
  Random seed that controls the randomness of the generated content. Value range: [0, 2147483647]. If you do not provide a seed, the algorithm automatically generates one. To generate relatively stable content, use the same seed value.

  watermark bool (Optional)
  Whether to add a watermark ("AI Generated" in the lower-right corner).
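The scale parameters can also be applied asymmetrically. The hedged sketch below widens the frame only to the left and right, for example to push a clip toward a wider aspect ratio, and disables the watermark; the video URL is a placeholder.

# Placeholder video URL; only left_scale and right_scale are set, so the frame is extended horizontally while the top and bottom stay unchanged.
curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis' \
--header 'X-DashScope-Async: enable' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "wan2.1-vace-plus",
    "input": {
        "function": "video_outpainting",
        "prompt": "A street musician plays guitar on a busy boulevard lined with cafes.",
        "video_url": "https://example.com/input/musician.mp4"
    },
    "parameters": {
        "prompt_extend": false,
        "left_scale": 1.5,
        "right_scale": 1.5,
        "watermark": false
    }
}'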