Request parameters | Multi-image reference
curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis' \
--header 'X-DashScope-Async: enable' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
"model": "wan2.1-vace-plus",
"input": {
"function": "image_reference",
"prompt": "In the video, a girl gracefully emerges from the depths of an ancient, misty forest. Her steps are light, and the camera captures her every fluid moment. When the girl stands still and looks around at the lush trees, her face lights up with a smile that blends surprise and joy. This moment, frozen in the interplay of light and shadow, captures the girl's wonderful encounter with nature.",
"ref_images_url": [
"http://wanx.alicdn.com/material/20250318/image_reference_2_5_16.png",
"http://wanx.alicdn.com/material/20250318/image_reference_1_5_16.png"
]
},
"parameters": {
"obj_or_bg": ["obj","bg"],
"size": "1280*720"
}
}'
Video repainting
curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis' \
--header 'X-DashScope-Async: enable' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
"model": "wan2.1-vace-plus",
"input": {
"function": "video_repainting",
"prompt": "The video shows a black steampunk-style car driven by a gentleman, decorated with gears and copper pipes. The background is a steam-powered candy factory with vintage elements, creating a retro and playful scene.",
"video_url": "http://wanx.alicdn.com/material/20250318/video_repainting_1.mp4"
},
"parameters": {
"control_condition": "depth"
}
}'
Masked editing
curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis' \
--header 'X-DashScope-Async: enable' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
"model": "wan2.1-vace-plus",
"input": {
"function": "video_edit",
"prompt": "The video shows a Parisian-style French café where a lion wearing a suit is elegantly enjoying coffee. It holds a coffee cup in one hand, sipping gently with a contented expression. The café is decorated elegantly, with soft tones and warm lighting illuminating the area where the lion is seated.",
"mask_image_url": "http://wanx.alicdn.com/material/20250318/video_edit_1_mask.png",
"video_url": "http://wanx.alicdn.com/material/20250318/video_edit_2.mp4",
"mask_frame_id": 1
},
"parameters": {
"mask_type": "tracking",
"expand_ratio": 0.05
}
}'
Video extension
curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis' \
--header 'X-DashScope-Async: enable' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
"model": "wan2.1-vace-plus",
"input": {
"function": "video_extension",
"prompt": "A dog wearing sunglasses skateboarding on the street, 3D cartoon.",
"first_clip_url": "http://wanx.alicdn.com/material/20250318/video_extension_1.mp4"
},
"parameters": {}
}'
Video outpainting
curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis' \
--header 'X-DashScope-Async: enable' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
"model": "wan2.1-vace-plus",
"input": {
"function": "video_outpainting",
"prompt": "An elegant lady is passionately playing the violin, behind her is a complete symphony orchestra.",
"video_url": "http://wanx.alicdn.com/material/20250318/video_outpainting_1.mp4"
},
"parameters": {
"top_scale": 1.5,
"bottom_scale": 1.5,
"left_scale": 1.5,
"right_scale": 1.5
}
}'
|
Multi-image reference
model string (Required) Model name. Example value: wan2.1-vace-plus.
input object (Required) Basic input information, such as prompts. Properties:
prompt string (Required) The prompt that describes the elements and visual characteristics expected in the generated video. Supports Chinese and English, with a maximum length of 800 characters. Each Chinese character or letter counts as one character. Content exceeding this limit is truncated. For prompt tips, see Video generation prompt guide.
function string (Required) The feature name. Set to image_reference. Supports up to 3 reference images, including subjects and backgrounds, such as people, animals, clothing, and scenes. Use prompt to describe the desired video content; the model can blend multiple images to generate coherent video content.
ref_images_url array[string] (Required) An array of URLs for the input reference images. The URLs must be publicly accessible and use the HTTP or HTTPS protocol. Supports 1 to 3 reference images. If more than 3 images are provided, only the first 3 are used as input. Image restrictions: Image formats: JPG, JPEG, PNG, BMP, TIFF, WEBP. Image resolution: The width and height must be within [360, 2000] pixels. File size: Not exceeding 10 MB. URL addresses cannot contain Chinese characters.
Recommendations: When using the subject from a reference image, each image should contain only one subject. The background should be a solid color (such as white or a single color) to better highlight the subject. When using the background from a reference image, there can be at most one background image, and the background image should not contain any subject objects.
| parameters object (Optional) Video processing parameters, such as watermark settings. Properties:
obj_or_bg array[string] (Optional) The purpose of each reference image, corresponding one-to-one with ref_images_url. Each element indicates whether the image at the corresponding position is a subject (obj) or a background (bg). Usage notes: The length of this parameter should match ref_images_url. If this parameter is not provided or the lengths do not match, the default is obj (subject).
Example value: ["obj", "obj", "bg"].
size string (Optional) The resolution of the generated video (width*height). Currently supports generating 720P videos, with the following resolution values:
1280*720 (default): Video aspect ratio is 16:9, where 1280 is the width and 720 is the height.
720*1280 : Video aspect ratio is 9:16.
960*960 : Video aspect ratio is 1:1.
832*1088 : Video aspect ratio is 3:4.
1088*832 : Video aspect ratio is 4:3.
duration integer (Optional) Video generation duration in seconds. Fixed at 5. The model always generates a 5-second video.
prompt_extend bool (Optional) Whether to enable prompt rewriting. When enabled, an LLM rewrites the prompt. This significantly improves generation results for shorter prompts but increases processing time.
seed integer (Optional) Random seed used to control the randomness of model-generated content. The value range is [0, 2147483647]. If not provided, the algorithm automatically generates a random number as the seed. To keep the generated content relatively stable, use the same seed.
watermark bool (Optional) Whether to add a watermark, which appears in the bottom right corner with the text "AI Generated". |
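The multi-image reference constraints above can be checked on the client before a request is sent. The following is a minimal Python sketch, not an official SDK: the function name `build_image_reference_payload` is hypothetical, and rejecting a mismatched `obj_or_bg` list (rather than letting it fall back to `obj`) is a design choice made here so that mistakes are not silently ignored. It only builds the request body and performs no network call.

```python
from typing import Dict, List, Optional

ALLOWED_SIZES = {"1280*720", "720*1280", "960*960", "832*1088", "1088*832"}
MAX_PROMPT_CHARS = 800
MAX_REF_IMAGES = 3


def build_image_reference_payload(
    prompt: str,
    ref_images_url: List[str],
    obj_or_bg: Optional[List[str]] = None,
    size: str = "1280*720",
) -> Dict:
    """Build (but do not send) an image_reference request body."""
    # len() counts Unicode code points; this approximates the documented
    # "each Chinese character or letter counts as one character" rule.
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError("prompt exceeds 800 characters and would be truncated")
    if not 1 <= len(ref_images_url) <= MAX_REF_IMAGES:
        raise ValueError("ref_images_url must contain 1 to 3 URLs")
    if size not in ALLOWED_SIZES:
        raise ValueError(f"unsupported size: {size}")

    parameters: Dict = {"size": size}
    if obj_or_bg is not None:
        # Assumption: fail loudly instead of relying on the "obj" fallback.
        if len(obj_or_bg) != len(ref_images_url):
            raise ValueError("obj_or_bg must match ref_images_url in length")
        if any(role not in ("obj", "bg") for role in obj_or_bg):
            raise ValueError('obj_or_bg entries must be "obj" or "bg"')
        parameters["obj_or_bg"] = list(obj_or_bg)

    return {
        "model": "wan2.1-vace-plus",
        "input": {
            "function": "image_reference",
            "prompt": prompt,
            "ref_images_url": list(ref_images_url),
        },
        "parameters": parameters,
    }
```

The returned dictionary can be serialized with `json.dumps` and used as the `--data` body of the curl request shown for this feature.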
Video repainting
model string (Required) Model name. Example value: wan2.1-vace-plus.
input object (Required) Basic input information, such as prompts. Properties:
prompt string (Required) The prompt that describes the elements and visual characteristics expected in the generated video. Supports Chinese and English, with a maximum length of 800 characters. Each Chinese character or letter counts as one character. Content exceeding this limit is truncated. For prompt tips, see Video generation prompt guide.
function string (Required) The feature name. Set to video_repainting. Video repainting extracts subject poses and actions, composition, motion outlines, and line drawing structures from the input video. Combined with text prompts, it generates new videos with the same dynamic characteristics. It also supports replacing the subject in the original video with a reference image, such as changing a character's appearance while retaining the original actions.
video_url string (Required) The URL of the input video. The URL must be publicly accessible and use the HTTP or HTTPS protocol. Video restrictions: Video format: MP4. Frame rate: Greater than or equal to 16 FPS. File size: Not exceeding 50 MB. Video length: Not exceeding 5 seconds; otherwise, only the first 5 seconds are used. URL addresses cannot contain Chinese characters.
About the output video resolution: If the input video resolution is ≤ 720P, the output will maintain the original resolution; If the input video resolution is > 720P, the video will be proportionally scaled to not exceed 720P while maintaining the original aspect ratio.
About the output video duration: The output video duration matches the input video, but not exceeding 5 seconds. Example: If the input video is 3 seconds, the output will also be 3 seconds. If the input is 6 seconds, the output will be the first 5 seconds.
ref_images_url array[string] (Optional) An array containing the URL of the input reference image. The URL must be publicly accessible and use the HTTP or HTTPS protocol. Only 1 reference image is supported; it should be a subject image used to replace the subject content in the input video. Image restrictions: Image formats: JPG, JPEG, PNG, BMP, TIFF, WEBP. Image resolution: The width and height must be within [360, 2000] pixels. File size: Not exceeding 10 MB. URL addresses cannot contain Chinese characters.
Recommendations:
| parameters object (Required) Video processing parameters, such as watermark settings. Properties:
control_condition string (Required) Sets the method for video feature extraction:
posebodyface : Extracts facial expressions and body movements of the subject in the input video, suitable for scenarios where subject expression details need to be preserved.
posebody : Extracts the body movements of the subject in the input video (excluding facial expressions), suitable for scenarios where only body movements need to be controlled.
depth : Extracts the composition and motion outline of the input video.
scribble : Extracts the line drawing structure of the input video.
strength float (Optional) Adjusts how strongly the feature extraction method specified by control_condition controls the generated video. The default value is 1.0, with a range of [0.0, 1.0]. The higher the value, the closer the generated video is to the original video's actions and composition. The lower the value, the more freedom in the generated content.
prompt_extend bool (Optional) Whether to enable prompt rewriting. When enabled, an LLM rewrites the prompt. This significantly improves generation results for shorter prompts but increases processing time. When the text description does not match the input video content, the model may misinterpret it. In that case, we recommend disabling prompt rewriting and providing clear, specific scene descriptions in the prompt to improve generation consistency and accuracy.
seed integer (Optional) Random seed used to control the randomness of model-generated content. The value range is [0, 2147483647]. If not provided, the algorithm automatically generates a random number as the seed. To keep the generated content relatively stable, use the same seed.
watermark bool (Optional) Whether to add a watermark, which appears in the bottom right corner with the text "AI Generated". |
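As an illustration of the video repainting parameters above, the sketch below builds a request body and checks `control_condition` and `strength` against the documented values. The helper name `build_video_repainting_payload` and its argument names are hypothetical; the code sends nothing and does not inspect the video itself (format, frame rate, and size limits still have to be verified separately).

```python
from typing import Dict, Optional

# The four extraction methods listed for video repainting.
CONTROL_CONDITIONS = ("posebodyface", "posebody", "depth", "scribble")


def build_video_repainting_payload(
    prompt: str,
    video_url: str,
    control_condition: str,
    strength: float = 1.0,
    ref_image_url: Optional[str] = None,
) -> Dict:
    """Build (but do not send) a video_repainting request body."""
    if control_condition not in CONTROL_CONDITIONS:
        raise ValueError(f"unsupported control_condition: {control_condition!r}")
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be within [0.0, 1.0]")

    input_block: Dict = {
        "function": "video_repainting",
        "prompt": prompt,
        "video_url": video_url,
    }
    if ref_image_url is not None:
        # Only one reference image is supported, so wrap it in a 1-element list.
        input_block["ref_images_url"] = [ref_image_url]

    return {
        "model": "wan2.1-vace-plus",
        "input": input_block,
        "parameters": {
            "control_condition": control_condition,
            "strength": strength,
        },
    }
```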
Masked editing
model string (Required) Model name. Example value: wan2.1-vace-plus.
input object (Required) Basic input information, such as prompts. Properties:
prompt string (Required) The prompt that describes the elements and visual characteristics expected in the generated video. Supports Chinese and English, with a maximum length of 800 characters. Each Chinese character or letter counts as one character. Content exceeding this limit is truncated. For prompt tips, see Video generation prompt guide.
function string (Required) The feature name. Set to video_edit. Masked editing supports adding, modifying, or removing elements in specified areas of the input video, as well as replacing subjects or backgrounds in the editing area, enabling fine-grained video editing.
video_url string (Required) The URL of the input video. The URL must be publicly accessible and use the HTTP or HTTPS protocol. Video restrictions: Video format: MP4. Frame rate: Greater than or equal to 16 FPS. File size: Not exceeding 50 MB. Video length: Not exceeding 5 seconds; otherwise, only the first 5 seconds are used. URL addresses cannot contain Chinese characters.
About the output video resolution: If the input video resolution is ≤ 720P, the output will maintain the original resolution; If the input video resolution is > 720P, the video will be proportionally scaled to not exceed 720P while maintaining the original aspect ratio.
About the output video duration: The output video duration matches the input video, but not exceeding 5 seconds. Example: If the input video is 3 seconds, the output will also be 3 seconds. If the input is 6 seconds, the output will be the first 5 seconds.
ref_images_url array[string] (Optional) An array containing the URL of the input reference image. The URL must be publicly accessible and use the HTTP or HTTPS protocol. Currently only 1 reference image is supported; it can be used as a subject or background to replace the corresponding content in the input video. Image restrictions: Image formats: JPG, JPEG, PNG, BMP, TIFF, WEBP. Image resolution: The width and height must be within [360, 2000] pixels. File size: Not exceeding 10 MB. URL addresses cannot contain Chinese characters.
Recommendations: When using the subject from a reference image, each image should contain only one subject. The background should be a solid color (such as white or a single color) to better highlight the subject. When using the background from a reference image, the background image should not contain any subject objects.
mask_image_url string (Optional) The URL of the mask image. The URL must be publicly accessible and support HTTP or HTTPS protocols. This parameter is used to specify the editing area of the video. Choose either this parameter or mask_video_url , but we recommend this parameter. White areas in the mask image (pixel values strictly [255, 255, 255]) indicate parts that need to be edited. Black areas (pixel values strictly [0, 0, 0]) indicate parts that remain unchanged. Image restrictions: Image formats: JPG, JPEG, PNG, BMP, TIFF, WEBP. Image resolution: Must exactly match the input video (video_url ) resolution. File size: Not exceeding 10 MB. URL addresses cannot contain Chinese characters.
mask_frame_id integer (Optional) Takes effect when mask_image_url is not empty. Identifies the frame of the video in which the mask target appears, expressed as a frame ID. The default value is 1, which represents the first frame of the video. The value range is [1, max_frame_id], where max_frame_id = input video frame rate × input video duration + 1. For example, if the input video (video_url) has a frame rate of 16 FPS (frames per second) and a duration of 5 seconds, the total number of frames in the input video is 16*5+1 = 81, so max_frame_id = 81.
mask_video_url string (Optional) The URL of the mask video. The URL must be publicly accessible and use the HTTP or HTTPS protocol. This parameter specifies the editing area of the video. Provide either this parameter or mask_image_url. The format, frame rate, resolution, and length of the mask video must exactly match the input video (video_url). White areas in the mask video (pixel values strictly [255, 255, 255]) indicate parts that need to be edited. Black areas (pixel values strictly [0, 0, 0]) indicate parts that remain unchanged.
| parameters object (Optional) Video processing parameters, such as watermark settings. Properties:
control_condition string (Optional) Sets the method for video feature extraction. The default is "", indicating no extraction.
posebodyface : Extracts facial expressions and body movements of the subject in the input video, suitable for scenarios where the subject's face occupies a large portion of the frame and features are clearly visible.
depth : Extracts the composition and motion outline of the input video.
mask_type string (Optional) Takes effect when mask_image_url is not empty. Specifies how the editing area behaves:
tracking (default): The editing area dynamically follows the target object's motion trajectory, suitable for scenes with a moving subject.
fixed : The editing area remains fixed and does not change with the content of the frame.
expand_ratio float (Optional) Takes effect when mask_type is set to tracking. The ratio by which the mask area is expanded outward. The value range is [0.0, 1.0], with a default value of 0.05. The default value is recommended. The smaller the value, the more closely the mask area fits the target object. The larger the value, the wider the expansion range of the mask area.
expand_mode string (Optional) Takes effect when mask_type is set to tracking. The shape of the mask area. The algorithm generates a mask video with the corresponding shape from the input mask image according to the selected expand_mode. Supported values:
hull (default): Polygon mode, indicating that a polygon is used to wrap the mask target.
bbox : Bounding box mode, indicating that a rectangle is used to wrap the mask target.
original : Original mode, indicating that the shape is kept as close as possible to the original mask target.
size string (Optional) The resolution of the generated video (width*height). Currently supports generating 720P videos, with the following resolution values:
1280*720 (default): Video aspect ratio is 16:9, where 1280 is the width and 720 is the height.
720*1280 : Video aspect ratio is 9:16.
960*960 : Video aspect ratio is 1:1.
832*1088 : Video aspect ratio is 3:4.
1088*832 : Video aspect ratio is 4:3.
duration integer (Optional) Video generation duration in seconds. Fixed at 5. The model always generates a 5-second video.
prompt_extend bool (Optional) Whether to enable prompt rewriting. When enabled, an LLM rewrites the prompt. This significantly improves generation results for shorter prompts but increases processing time. When the text description does not match the input video content, the model may misinterpret it. In that case, we recommend disabling prompt rewriting and providing clear, specific scene descriptions in the prompt to improve generation consistency and accuracy.
seed integer (Optional) Random seed used to control the randomness of model-generated content. The value range is [0, 2147483647]. If not provided, the algorithm automatically generates a random number as the seed. To keep the generated content relatively stable, use the same seed.
watermark bool (Optional) Whether to add a watermark, which appears in the bottom right corner with the text "AI Generated". |
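The mask_frame_id range for masked editing follows directly from the formula max_frame_id = frame rate × duration + 1. A small Python sketch of that arithmetic and of the expand_ratio range check is shown below. The function names are hypothetical, and integer frame rate and duration values are assumed for simplicity; how the service handles fractional values is not stated in this reference.

```python
def max_frame_id(fps: int, duration_s: int) -> int:
    """Largest valid mask_frame_id: frame rate x duration + 1."""
    return fps * duration_s + 1


def check_mask_frame_id(mask_frame_id: int, fps: int, duration_s: int) -> int:
    """Return mask_frame_id unchanged if it lies in [1, max_frame_id]."""
    upper = max_frame_id(fps, duration_s)
    if not 1 <= mask_frame_id <= upper:
        raise ValueError(f"mask_frame_id must be within [1, {upper}]")
    return mask_frame_id


def check_expand_ratio(expand_ratio: float = 0.05) -> float:
    """Return expand_ratio unchanged if it lies in [0.0, 1.0]."""
    if not 0.0 <= expand_ratio <= 1.0:
        raise ValueError("expand_ratio must be within [0.0, 1.0]")
    return expand_ratio


# The documented example: a 16 FPS, 5-second input video.
print(max_frame_id(16, 5))  # 81
```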
Video extension
model string (Required) Model name. Example value: wan2.1-vace-plus.
input object (Required) Basic input information, such as prompts. Properties:
prompt string (Required) The prompt that describes the elements and visual characteristics expected in the generated video. Supports Chinese and English, with a maximum length of 800 characters. Each Chinese character or letter counts as one character. Content exceeding this limit is truncated. For prompt tips, see Video generation prompt guide.
function string (Required) The feature name. Set to video_extension. Video extension generates continuous content based on images or videos. It also supports extracting dynamic features (such as actions and composition) from reference videos to guide the generation of videos with similar motion. The total duration of the extended video is 5 seconds. Note that this is the complete duration of the final output video; the original video is not extended by 5 seconds.
first_frame_url string (Optional) The URL of the first frame image. The URL must be publicly accessible and use the HTTP or HTTPS protocol. Image restrictions: Image formats: JPG, JPEG, PNG, BMP, TIFF, WEBP. Image resolution: The width and height must be within [360, 2000] pixels. File size: Not exceeding 10 MB. URL addresses cannot contain Chinese characters.
last_frame_url string (Optional) The URL of the last frame image. The URL must be publicly accessible and support HTTP or HTTPS protocols. Image restrictions: Image formats: JPG, JPEG, PNG, BMP, TIFF, WEBP. Image resolution: The width and height must be within [360, 2000] pixels. File size: Not exceeding 10 MB. URL addresses cannot contain Chinese characters.
first_clip_url string (Optional) The URL of the first segment video. The URL must be publicly accessible and support HTTP or HTTPS protocols. Video restrictions: Video format: MP4. Frame rate: Greater than or equal to 16 FPS. When first_clip_url and last_clip_url are used together, the frame rates of the two segments should be consistent. File size: Not exceeding 50 MB. Video length: Not exceeding 3 seconds, otherwise only the first 3 seconds will be used. When both first_clip_url and last_clip_url are provided, the total duration of the two video segments should not exceed 3 seconds. URL addresses cannot contain Chinese characters.
About the output video resolution: If the input video resolution is ≤ 720P, the output will maintain the original resolution; If the input video resolution is > 720P, the video will be proportionally scaled to not exceed 720P while maintaining the original aspect ratio.
last_clip_url string (Optional) The URL of the last segment video. The URL must be publicly accessible and support HTTP or HTTPS protocols. Video restrictions: Video format: MP4. Frame rate: Greater than or equal to 16 FPS. When first_clip_url and last_clip_url are used together, the frame rates of the two segments should be consistent. File size: Not exceeding 50 MB. Video length: Not exceeding 3 seconds, otherwise only the first 3 seconds will be used. When both first_clip_url and last_clip_url are provided, the total duration of the two video segments should not exceed 3 seconds. URL addresses cannot contain Chinese characters.
About the output video resolution: If the input video resolution is ≤ 720P, the output will maintain the original resolution; If the input video resolution is > 720P, the video will be proportionally scaled to not exceed 720P while maintaining the original aspect ratio.
video_url string (Optional) The URL of the input video. The URL must be publicly accessible and support HTTP or HTTPS protocols. This video is mainly used to extract motion features, and is used together with first_frame_url , last_frame_url , first_clip_url , and last_clip_url parameters to guide the generation of extended videos with similar motion performance. Video restrictions: Video format: MP4. Frame rate: Greater than or equal to 16 FPS, consistent with the first and last segments. Video resolution: Must match the first and last frames, and the first and last segments. File size: Not exceeding 50 MB. Video length: Not exceeding 5 seconds, otherwise only the first 5 seconds will be used. URL addresses cannot contain Chinese characters.
| parameters object (Optional) Video processing parameters, such as the output video resolution. Properties:
control_condition string (Optional) Sets the method for video feature extraction. Required when video_url is provided. The default is "", indicating no extraction.
duration integer (Optional) Video generation duration in seconds. Fixed at 5. The model always generates a 5-second video.
prompt_extend bool (Optional) Whether to enable prompt rewriting. When enabled, an LLM rewrites the prompt. This significantly improves generation results for shorter prompts but increases processing time. When the text description does not match the input video content, the model may misinterpret it. In that case, we recommend disabling prompt rewriting and providing clear, specific scene descriptions in the prompt to improve generation consistency and accuracy.
seed integer (Optional) Random seed used to control the randomness of model-generated content. The value range is [0, 2147483647]. If not provided, the algorithm automatically generates a random number as the seed. To keep the generated content relatively stable, use the same seed.
watermark bool (Optional) Whether to add a watermark, which appears in the bottom right corner with the text "AI Generated". |
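The clip-length and control_condition rules for video extension interact, so a pre-flight check can be useful. The sketch below is a hypothetical helper (`video_extension_warnings` is not part of any SDK) that assumes the caller has already measured the clip durations in seconds. It returns human-readable warnings instead of raising, because over-long single clips are truncated by the service rather than rejected.

```python
from typing import List, Optional

MAX_CLIP_SECONDS = 3.0


def video_extension_warnings(
    first_clip_s: Optional[float] = None,
    last_clip_s: Optional[float] = None,
    has_video_url: bool = False,
    control_condition: str = "",
) -> List[str]:
    """Collect warnings for a planned video_extension request."""
    warnings: List[str] = []
    clips = (("first_clip_url", first_clip_s), ("last_clip_url", last_clip_s))
    for name, seconds in clips:
        # A single clip longer than 3 s is cut to its first 3 s.
        if seconds is not None and seconds > MAX_CLIP_SECONDS:
            warnings.append(f"{name} is longer than 3 s; only the first 3 s will be used")
    # When both clips are given, their combined length should not exceed 3 s.
    if first_clip_s is not None and last_clip_s is not None:
        if first_clip_s + last_clip_s > MAX_CLIP_SECONDS:
            warnings.append("first_clip_url and last_clip_url together exceed 3 s")
    # control_condition is required once a motion reference video is supplied.
    if has_video_url and not control_condition:
        warnings.append("control_condition is required when video_url is provided")
    return warnings
```

For example, two 2-second clips each pass the per-clip limit but together trigger the combined-duration warning.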
Video outpainting
model string (Required) Model name. Example value: wan2.1-vace-plus.
input object (Required) Basic input information, such as prompts. Properties:
prompt string (Required) The prompt that describes the elements and visual characteristics expected in the generated video. Supports Chinese and English, with a maximum length of 800 characters. Each Chinese character or letter counts as one character. Content exceeding this limit is truncated. For prompt tips, see Video generation prompt guide.
function string (Required) The feature name. Set to video_outpainting. Video outpainting supports proportionally extending the video in the top, bottom, left, and right directions.
video_url string (Required) The URL of the input video. The URL must be publicly accessible and use the HTTP or HTTPS protocol. Video restrictions: Video format: MP4. Frame rate: Greater than or equal to 16 FPS. File size: Not exceeding 50 MB. Video length: Not exceeding 5 seconds; otherwise, only the first 5 seconds are used. URL addresses cannot contain Chinese characters.
About the output video resolution: If the input video resolution is ≤ 720P, the output will maintain the original resolution; If the input video resolution is > 720P, the video will be proportionally scaled to not exceed 720P while maintaining the original aspect ratio.
About the output video duration: The output video duration matches the input video, but not exceeding 5 seconds. Example: If the input video is 3 seconds, the output will also be 3 seconds. If the input is 6 seconds, the output will be the first 5 seconds.
| parameters object (Optional) Video processing parameters, such as the expansion ratio. Properties:
top_scale float (Optional) Centers the video frame and extends it upward by the specified ratio. The value range is [1.0, 2.0], with a default value of 1.0, indicating no extension.
bottom_scale float (Optional) Centers the video frame and extends it downward by the specified ratio. The value range is [1.0, 2.0], with a default value of 1.0, indicating no extension.
left_scale float (Optional) Centers the video frame and extends it to the left by the specified ratio. The value range is [1.0, 2.0], with a default value of 1.0, indicating no extension.
right_scale float (Optional) Centers the video frame and extends it to the right by the specified ratio. The value range is [1.0, 2.0], with a default value of 1.0, indicating no extension.
duration integer (Optional) Video generation duration in seconds. Fixed at 5. The model always generates a 5-second video.
prompt_extend bool (Optional) Whether to enable prompt rewriting. When enabled, an LLM rewrites the prompt. This significantly improves generation results for shorter prompts but increases processing time. When the text description does not match the input video content, the model may misinterpret it. In that case, we recommend disabling prompt rewriting and providing clear, specific scene descriptions in the prompt to improve generation consistency and accuracy.
seed integer (Optional) Random seed used to control the randomness of model-generated content. The value range is [0, 2147483647]. If not provided, the algorithm automatically generates a random number as the seed. To keep the generated content relatively stable, use the same seed.
watermark bool (Optional) Whether to add a watermark, which appears in the bottom right corner with the text "AI Generated". |
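The four outpainting scales share the same range and default, so they can be validated together. The following Python sketch uses a hypothetical helper name, `outpainting_parameters`, and only checks the documented [1.0, 2.0] range. It does not attempt to predict the output resolution, which this reference does not specify.

```python
from typing import Dict


def outpainting_parameters(
    top: float = 1.0,
    bottom: float = 1.0,
    left: float = 1.0,
    right: float = 1.0,
) -> Dict[str, float]:
    """Return the four *_scale parameters after a range check."""
    scales = {
        "top_scale": top,
        "bottom_scale": bottom,
        "left_scale": left,
        "right_scale": right,
    }
    for name, value in scales.items():
        # 1.0 means no extension in that direction; 2.0 is the upper bound.
        if not 1.0 <= value <= 2.0:
            raise ValueError(f"{name} must be within [1.0, 2.0], got {value}")
    return scales
```

Calling `outpainting_parameters(1.5, 1.5, 1.5, 1.5)` yields the same `parameters` object as the video outpainting curl example.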
|