Alibaba Cloud Model Studio: Wan - image-to-video API Reference

Last Updated: Dec 19, 2025

The Wan image-to-video model generates a smooth video from a first frame image and a text prompt. The supported features include the following:

  • Basic features: You can select a video duration of 3, 4, 5, 10, or 15 seconds, specify a video resolution of 480P, 720P, or 1080P, use intelligent prompt rewriting, and add watermarks.

  • Audio capabilities: You can use automatic dubbing or provide a custom audio file for audio-video synchronization. (Supported by wan2.5 and wan2.6)

  • Multi-shot narrative: You can maintain subject consistency across shots to create a coherent multi-shot narrative. (Supported only by wan2.6)

Quick links: Try it online | Wan official website | Video effect list

Note

The features available on the Wan official website may differ from those supported by the API. This document describes the API's capabilities and is updated promptly to reflect new features.

Model overview

The following example shows wan2.6 generating a video from a first frame image, an input audio file, and a text prompt.

Input prompt: A scene of urban fantasy art. A dynamic graffiti art character. A boy painted with spray paint comes to life from a concrete wall. He sings an English rap song at a very fast pace while striking a classic, energetic rapper pose. The scene is set under an urban railway bridge at night. The lighting comes from a single streetlight, creating a cinematic atmosphere full of high energy and amazing detail. The audio of the video consists entirely of his rap, with no other dialogue or noise.

Note

Before you make a call, check the Models supported in each region.

Prerequisites

Before you make a call, create and configure an API key and then export the API key as an environment variable. To use the SDK, install the DashScope SDK.

Important

The Beijing and Singapore regions have separate API keys and request endpoints. Do not use them interchangeably. Cross-region calls cause authentication failures or service errors.

HTTP

Image-to-video tasks can take a long time to complete, typically 1 to 5 minutes. Therefore, the API uses asynchronous invocation. The process involves two core steps: Create a task -> Poll for results. The steps are as follows:

The actual time required depends on the number of tasks in the queue and the service execution status. Please be patient while you wait for the result.

Step 1: Create a task to get a task ID

Singapore region: POST https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis

Beijing region: POST https://dashscope.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis

Note
  • After the task is created, use the returned task_id to query the result. The task_id is valid for 24 hours. Do not create duplicate tasks. Use polling to retrieve the result.

Request parameters

Multi-shot narrative

This feature is supported only by wan2.6-i2v.

The API keys for the Singapore and China (Beijing) regions are different. For more information, see Obtain and configure API keys.
The following example uses the base URL for the Singapore region. If you use a model in the China (Beijing) region, replace the base URL with https://dashscope.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis.
curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis' \
    -H 'X-DashScope-Async: enable' \
    -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
    -H 'Content-Type: application/json' \
    -d '{
    "model": "wan2.6-i2v",
    "input": {
        "prompt": "A scene of urban fantasy art. A dynamic graffiti art character. A boy painted with spray paint comes to life from a concrete wall. He sings an English rap song at a very fast pace while striking a classic, energetic rapper pose. The scene is set under an urban railway bridge at night. The lighting comes from a single streetlight, creating a cinematic atmosphere full of high energy and amazing detail. The audio of the video consists entirely of his rap, with no other dialogue or noise.",
        "img_url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250925/wpimhv/rap.png"
    },
    "parameters": {
        "resolution": "720P",
        "prompt_extend": true,
        "duration": 10,
        "audio": true,
        "shot_type":"multi"
    }
}'

Automatic dubbing

This feature is supported only by wan2.5 and later models.

The auto-dubbing feature is enabled by default for the model and requires no configuration. To explicitly declare this setting, you can set the parameters.audio parameter to true.

The API keys for the Singapore and China (Beijing) regions are different. For more information, see Obtain and configure an API key.
The following example uses the base URL for the Singapore region. If you use a model in the China (Beijing) region, replace the base URL with https://dashscope.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis.
curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis' \
    -H 'X-DashScope-Async: enable' \
    -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
    -H 'Content-Type: application/json' \
    -d '{
    "model": "wan2.5-i2v-preview",
    "input": {
        "prompt": "A scene of urban fantasy art. A dynamic graffiti art character. A boy painted with spray paint comes to life from a concrete wall. He sings an English rap song at a very fast pace while striking a classic, energetic rapper pose. The scene is set under an urban railway bridge at night. The lighting comes from a single streetlight, creating a cinematic atmosphere full of high energy and amazing detail. The audio of the video consists entirely of his rap, with no other dialogue or noise.",
        "img_url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250925/wpimhv/rap.png"
    },
    "parameters": {
        "resolution": "480P",
        "prompt_extend": true,
        "duration": 10,
        "audio": true
    }
}'

Use audio file

This feature is supported only by wan2.5 and later models.

You can use the input.audio_url parameter to provide the URL of a custom audio file.

The API keys for the Singapore and China (Beijing) regions are different. For more information, see Obtain an API key.
The following example uses the base URL for the Singapore region. If you use a model in the China (Beijing) region, replace the base URL with https://dashscope.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis.
curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis' \
    -H 'X-DashScope-Async: enable' \
    -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
    -H 'Content-Type: application/json' \
    -d '{
    "model": "wan2.5-i2v-preview",
    "input": {
        "prompt": "A scene of urban fantasy art. A dynamic graffiti art character. A boy painted with spray paint comes to life from a concrete wall. He sings an English rap song at a very fast pace while striking a classic, energetic rapper pose. The scene is set under an urban railway bridge at night. The lighting comes from a single streetlight, creating a cinematic atmosphere full of high energy and amazing detail. The audio of the video consists entirely of his rap, with no other dialogue or noise.",
        "img_url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250925/wpimhv/rap.png",
        "audio_url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250925/ozwpvi/rap.mp3"
    },
    "parameters": {
        "resolution": "480P",
        "prompt_extend": true,
        "duration": 10
    }
}'

Generate a silent video

Parameter settings vary by model version:

  • For wan2.5 and later models, you must explicitly set the parameters.audio parameter to false.

  • For wan2.2 and earlier versions, the model generates silent videos by default and requires no parameters. See the following code.

The API keys for the Singapore and Beijing regions are different. For more information, see Obtain an API key.
The following example uses the base URL for the Singapore region. If you use a model in the China (Beijing) region, replace the base URL with https://dashscope.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis.
curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis' \
    -H 'X-DashScope-Async: enable' \
    -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
    -H 'Content-Type: application/json' \
    -d '{
    "model": "wan2.2-i2v-plus",
    "input": {
        "prompt": "A cat running on the grass",
        "img_url": "https://cdn.translate.alibaba.com/r/wanx-demo-1.png"
    },
    "parameters": {
        "resolution": "480P",
        "prompt_extend": true
    }
}'

Use a negative prompt

Use negative_prompt to prevent the generated video from including "flowers".

The API keys for the Singapore and Beijing regions are different. For more information, see Obtain an API key.
The following example uses the base URL for the Singapore region. If you use a model in the Beijing region, replace the base URL with https://dashscope.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis.
curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis' \
    -H 'X-DashScope-Async: enable' \
    -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
    -H 'Content-Type: application/json' \
    -d '{
    "model": "wan2.2-i2v-plus",
    "input": {
        "prompt": "A cat running on the grass",
        "negative_prompt": "flowers",
        "img_url": "https://cdn.translate.alibaba.com/r/wanx-demo-1.png"
    },
    "parameters": {
        "resolution": "480P",
        "prompt_extend": true
    }
}'
Request headers

Content-Type string (Required)

The content type of the request. Set this parameter to application/json.

Authorization string (Required)

The authentication credentials for the request. This API uses a Model Studio API key for authentication. Example: Bearer sk-xxxx.

X-DashScope-Async string (Required)

The asynchronous processing configuration parameter. HTTP requests support only asynchronous processing. You must set this parameter to enable.

Important

If this request header is missing, the error message "current user api does not support synchronous calls" is returned.

Request body

model string (Required)

The model name. Example: wan2.5-i2v-preview.

For a list of models and their prices, see Model prices.

input object (Required)

Basic input information, such as the prompt.

Properties

prompt string (Optional)

The text prompt describes the elements and visual features that you want to include in the generated video.

This parameter supports both Chinese and English. Each Chinese character or letter is counted as one character. Any excess characters are automatically truncated. The length limit varies by model version:

  • wan2.6-i2v: Up to 1,500 characters.

  • wan2.5-i2v-preview: Up to 1,500 characters.

  • wan2.2 and earlier models: Up to 800 characters.

Example: A kitten running on the grass.

For prompt usage tips, see Text-to-video/image-to-video prompt guide.

negative_prompt string (Optional)

The negative prompt, which describes content that you do not want to appear in the video. This can be used to constrain the video content.

This parameter supports both Chinese and English. The length is limited to 500 characters. Any excess characters are automatically truncated.

Example: low resolution, error, worst quality, low quality, deformed, extra fingers, bad proportions.

img_url string (Required)

The URL or Base64-encoded data of the first frame image.

Image limits:

  • Image format: JPEG, JPG, PNG (alpha channels are not supported), BMP, or WEBP.

  • Image resolution: The width and height must be between 360 and 2,000 pixels.

  • File size: No more than 10 MB.

Input image instructions:

  1. Publicly accessible URL

    • Supports HTTP or HTTPS protocols.

    • Example: https://cdn.translate.alibaba.com/r/wanx-demo-1.png.

  2. Base64-encoded image string

    • The data format is data:{MIME_type};base64,{base64_data}.

    • Example: data:image/png;base64,GDU7MtCZzEbTbmRZ....... (The encoded string is too long and only a snippet is shown.)

    • For more information, see Input image.

audio_url string (Optional)

Supported models: wan2.6-i2v, wan2.5-i2v-preview.

The URL of the audio file. The model uses this audio to generate the video. For more information, see Audio settings.

This parameter supports HTTP or HTTPS protocols.

Audio limits:

  • Format: WAV, MP3.

  • Duration: 3 to 30 s.

  • File size: No more than 15 MB.

  • Handling of exceeded limits: If the audio duration exceeds the duration value (5 or 10 seconds), the audio is automatically truncated to the first 5 or 10 seconds, and the rest is discarded. If the audio duration is shorter than the video duration, the remaining part of the video is silent. For example, if the audio is 3 s long and the video is 5 s long, the first 3 s of the output video have sound, and the last 2 s are silent.

Example: https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250925/ozwpvi/rap.mp3.

parameters object (Optional)

Video processing parameters, such as the video resolution, video duration, intelligent prompt rewriting, and watermark.

Properties

resolution string (Optional)

Important

The resolution parameter directly affects the cost. For the same model, the cost is as follows: 1080P > 720P > 480P. Before you make a call, confirm the model pricing.

Specifies the resolution tier for the generated video. This setting adjusts the video's definition (total pixels). The model automatically scales the video to a similar total pixel count based on the selected resolution tier. The aspect ratio of the video is kept as consistent as possible with the aspect ratio of the input img_url image. For more information, see the FAQ.

The default value and valid values for this parameter depend on the model parameter, as described in the following list:

  • wan2.6-i2v: Valid values: 720P, 1080P. Default value: 1080P.

  • wan2.5-i2v-preview: Valid values: 480P, 720P, 1080P. Default value: 1080P.

  • wan2.2-i2v-flash: Valid values: 480P, 720P. Default value: 720P.

  • wan2.2-i2v-plus: Valid values: 480P, 1080P. Default value: 1080P.

  • wan2.1-i2v-turbo: Valid values: 480P, 720P. Default value: 720P.

  • wan2.1-i2v-plus: Valid values: 720P. Default value: 720P.

Example: 1080P.

duration integer (Optional)

Important

The duration directly affects the cost. Billing is per second, so a longer duration results in a higher cost. Before you make a call, confirm the model pricing.

The duration of the generated video in seconds. The valid values for this parameter depend on the model parameter:

  • wan2.6-i2v: Valid values: 5, 10, 15. Default value: 5.

  • wan2.5-i2v-preview: Valid values: 5, 10. Default value: 5.

  • wan2.2-i2v-plus: Fixed at 5 seconds and cannot be modified.

  • wan2.2-i2v-flash: Fixed at 5 seconds and cannot be modified.

  • wan2.1-i2v-plus: Fixed at 5 seconds and cannot be modified.

  • wan2.1-i2v-turbo: Valid values: 3, 4, 5. Default value: 5.

Example: 5.

prompt_extend boolean (Optional)

Specifies whether to enable prompt rewriting. If enabled, a large language model (LLM) rewrites the input prompt. This can significantly improve the generation quality for shorter prompts but increases the time required.

  • true (default)

  • false

Example: true.

shot_type string (Optional)

Supported model: wan2.6-i2v.

Specifies the shot type for the generated video, that is, whether the video consists of a single continuous shot or multiple switched shots.

Condition: This parameter is effective only when "prompt_extend": true.

Parameter priority: shot_type > prompt. For example, if shot_type is set to "single", even if the prompt contains "generate a multi-shot video", the model will still output a single-shot video.

Valid values:

  • single: (default) Outputs a single-shot video.

  • multi: Outputs a multi-shot video.

Example: single.

Note

Use this parameter to strictly control the narrative structure of the video, for example, using a single shot for a product display or multiple shots for a short story.

audio boolean (Optional)

Supported models: wan2.6-i2v, wan2.5-i2v-preview.

Specifies whether to automatically add audio to the generated video.

Effective condition: This parameter takes effect only when audio_url is not provided.

The parameter priority is audio_url > audio. For more information, see Audio settings.

Valid values:

  • true: (default) Automatically adds audio to the video.

  • false: Does not add audio. Outputs a silent video.

Example: true.

Note

If you want to generate visual-only content, such as effects demos or silent animations, explicitly set "audio": false. To use a custom voiceover, use audio_url instead of this parameter.

watermark boolean (Optional)

Specifies whether to add a watermark. The watermark, which says AI Generated, is placed in the lower-right corner of the video.

  • false (default)

  • true

Example: false.

seed integer (Optional)

The random number seed. The value must be in the range [0, 2147483647].

If this parameter is not specified, the system automatically generates a random seed. To improve the reproducibility of the generated results, set a fixed seed value.

Note that because model generation is probabilistic, using the same seed value does not guarantee that the generated results are identical for every call.

Example: 12345.

Response parameters

Successful response

Save the task_id to query the task status and result.

{
    "output": {
        "task_status": "PENDING",
        "task_id": "0385dc79-5ff8-4d82-bcb6-xxxxxx"
    },
    "request_id": "4909100c-7b5a-9f92-bfe5-xxxxxx"
}

Error response

The task creation failed. For more information, see Error messages to resolve the issue.

{
    "code":"InvalidApiKey",
    "message":"Invalid API-key provided.",
    "request_id":"fb53c4ec-1c12-4fc4-a580-xxxxxx"
}

output object

The task output information.

Properties

task_id string

The task ID. You can use it to query the task result within 24 hours.

task_status string

The task status.

Enumeration

  • PENDING

  • RUNNING

  • SUCCEEDED

  • FAILED

  • CANCELED

  • UNKNOWN: The task does not exist or its status cannot be determined.

request_id string

The unique request ID. You can use this ID to trace and troubleshoot issues.

code string

The error code for a failed request. This parameter is not returned if the request is successful. For more information, see Error messages.

message string

The detailed information about a failed request. This parameter is not returned if the request is successful. For more information, see Error messages.

Step 2: Query the result by task ID

Singapore region: GET https://dashscope-intl.aliyuncs.com/api/v1/tasks/{task_id}

Beijing region: GET https://dashscope.aliyuncs.com/api/v1/tasks/{task_id}

Note
  • Polling suggestion: Video generation takes several minutes. Use a polling mechanism and set a reasonable query interval, such as 15 seconds, to retrieve the result.

  • Task status transition: PENDING → RUNNING → SUCCEEDED or FAILED.

  • Result link: After the task is successful, a video link is returned. The link is valid for 24 hours. After you retrieve the link, immediately download and save the video to a permanent storage service, such as Object Storage Service.

  • task_id validity: 24 hours. After this period, you cannot query the result, and the API returns a task status of UNKNOWN.

Request parameters

Query task results

Replace 86ecf553-d340-4e21-xxxxxxxxx with the actual task ID.

The API keys for the Singapore and Beijing regions are different. For more information, see Obtain an API key.
The following example uses the base URL for the Singapore region. If you use a model in the China (Beijing) region, replace the base URL with https://dashscope.aliyuncs.com/api/v1/tasks/86ecf553-d340-4e21-xxxxxxxxx.
curl -X GET https://dashscope-intl.aliyuncs.com/api/v1/tasks/86ecf553-d340-4e21-xxxxxxxxx \
--header "Authorization: Bearer $DASHSCOPE_API_KEY"
Request headers

Authorization string (Required)

The authentication credentials for the request. This API uses a Model Studio API key for authentication. Example: Bearer sk-xxxx.

URL path parameters

task_id string (Required)

The task ID.

Response parameters

Task succeeded

Video URLs are retained for only 24 hours and are automatically purged after this period. You must save the generated videos promptly.

{
    "request_id": "2ca1c497-f9e0-449d-9a3f-xxxxxx",
    "output": {
        "task_id": "af6efbc0-4bef-4194-8246-xxxxxx",
        "task_status": "SUCCEEDED",
        "submit_time": "2025-09-25 11:07:28.590",
        "scheduled_time": "2025-09-25 11:07:35.349",
        "end_time": "2025-09-25 11:17:11.650",
        "orig_prompt": "A scene of urban fantasy art. A dynamic graffiti art character. A boy painted with spray paint comes to life from a concrete wall. He sings an English rap song at a very fast pace while striking a classic, energetic rapper pose. The scene is set under an urban railway bridge at night. The lighting comes from a single streetlight, creating a cinematic atmosphere full of high energy and amazing detail. The audio of the video consists entirely of his rap, with no other dialogue or noise.",
        "video_url": "https://dashscope-result-sh.oss-cn-shanghai.aliyuncs.com/xxx.mp4?Expires=xxx"
    },
    "usage": {
        "duration": 10,
        "input_video_duration": 0,
        "output_video_duration": 10,
        "video_count": 1,
        "SR": 720
    }
}

Task failed

If a task fails, task_status is set to FAILED, and an error code and message are provided. For more information, see Error messages to resolve the issue.

{
    "request_id": "e5d70b02-ebd3-98ce-9fe8-759d7d7b107d",
    "output": {
        "task_id": "86ecf553-d340-4e21-af6e-a0c6a421c010",
        "task_status": "FAILED",
        "code": "InvalidParameter",
        "message": "The size is not match xxxxxx"
    }
}

Task query expired

The task_id is valid for 24 hours. After this period, the query fails and the following error message is returned.

{
    "request_id": "a4de7c32-7057-9f82-8581-xxxxxx",
    "output": {
        "task_id": "502a00b1-19d9-4839-a82f-xxxxxx",
        "task_status": "UNKNOWN"
    }
}

output object

The task output information.

Properties

task_id string

The task ID. You can use it to query the task result within 24 hours.

task_status string

The task status.

Enumeration

  • PENDING

  • RUNNING

  • SUCCEEDED

  • FAILED

  • CANCELED

  • UNKNOWN: The task does not exist or its status cannot be determined.

Status transitions during polling:

  • PENDING → RUNNING → SUCCEEDED or FAILED.

  • The status of the first query is usually PENDING or RUNNING.

  • If the status changes to SUCCEEDED, the response contains the generated video URL.

  • If the status is FAILED, check the error message and retry.

submit_time string

The time when the task was submitted. The time is in the UTC+8 time zone. The format is YYYY-MM-DD HH:mm:ss.SSS.

scheduled_time string

The time when the task started running. The time is in the UTC+8 time zone. The format is YYYY-MM-DD HH:mm:ss.SSS.

end_time string

The time when the task was completed. The time is in the UTC+8 time zone. The format is YYYY-MM-DD HH:mm:ss.SSS.

video_url string

The video URL. This parameter is returned only if task_status is SUCCEEDED.

The link is valid for 24 hours. You can use this URL to download the video. The video is in MP4 format with H.264 encoding.

orig_prompt string

The original input prompt. This corresponds to the prompt request parameter.

actual_prompt string

When prompt_extend=true, the system intelligently rewrites the input prompt. This field returns the optimized prompt that is actually used for generation.

  • If prompt_extend=false, this field is not returned.

  • Note: The wan2.6 model does not return this field, regardless of the value of prompt_extend.

code string

The error code for a failed request. This parameter is not returned if the request is successful. For more information, see Error messages.

message string

The detailed information about a failed request. This parameter is not returned if the request is successful. For more information, see Error messages.

usage object

Usage statistics for the task. Only successful tasks are counted.

Properties

Parameters returned by the wan2.6 model

input_video_duration integer

The duration of the input video in seconds. This is currently fixed at 0 because video input is not supported.

output_video_duration integer

Returned only when you use the wan2.6 model.

The duration of the output video, in seconds. Its value is equal to the value of parameters.duration.

duration integer

The total video duration, used for billing.

Billing formula: duration=input_video_duration+output_video_duration.

SR integer

Returned only when you use the wan2.6 model. The resolution tier of the generated video. Example: 720.

video_ratio string

The resolution of the generated video.

Format: widthxheight, for example, 1920x1080.

video_count integer

The number of generated videos. The value is fixed at 1.

Parameters returned by the wan2.2 and wan2.5 models

duration integer

The duration of the generated video in seconds. Enumeration values: 5, 10.

Billing formula: Cost = Video duration in seconds × Unit price.

SR integer

The resolution of the generated video. Enumeration values: 480, 720, 1080.

video_count integer

The number of generated videos. The value is fixed at 1.

Parameters returned by the wan2.1 model

video_duration integer

The duration of the generated video in seconds. Enumeration values: 3, 4, 5.

Billing formula: Cost = Video duration in seconds × Unit price.

video_ratio string

The aspect ratio of the generated video. The value is fixed at "standard".

video_count integer

The number of generated videos. The value is fixed at 1.

request_id string

The unique request ID. You can use this ID to trace and troubleshoot issues.

DashScope SDK

The parameter names in the SDK are largely consistent with those of the HTTP API, and the parameter structure is adapted to the conventions of each programming language.

Because image-to-video tasks can take a long time to complete, typically 1 to 5 minutes, the SDK encapsulates the asynchronous HTTP call process at the underlying layer and supports both synchronous and asynchronous call methods.

The actual time required depends on the number of tasks in the queue and the service execution status. Please be patient while you wait for the result.

Python SDK

The Python SDK supports three image input methods: public URL, Base64-encoded string, and local file path (absolute or relative). You can choose one of these methods. For more information, see Input image.

Important
  • wan2.6-i2v does not currently support SDK calls.

  • Ensure that the DashScope Python SDK version is at least 1.25.2 before you run the following code.

    If the version is too low, you may encounter errors such as "url error, please check url!". For more information, see Install the SDK to update your version.

Sample code

Synchronous invocation

A synchronous call blocks and waits until the video generation is complete and the result is returned. This example shows three image input methods: public URL, Base64 encoding, and local file path.

Request example
import base64
import os
from http import HTTPStatus
from dashscope import VideoSynthesis
import mimetypes
import dashscope

# The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'


# If you have not configured an environment variable, replace the following line with your Model Studio API key: api_key="sk-xxx"
# The API keys for the Singapore and Beijing regions are different. Get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
api_key = os.getenv("DASHSCOPE_API_KEY")

# --- Helper function for Base64 encoding ---
# Format: data:{MIME_type};base64,{base64_data}
def encode_file(file_path):
    mime_type, _ = mimetypes.guess_type(file_path)
    if not mime_type or not mime_type.startswith("image/"):
        raise ValueError("Unsupported or unknown image format")
    with open(file_path, "rb") as image_file:
        encoded_string = base64.b64encode(image_file.read()).decode('utf-8')
    return f"data:{mime_type};base64,{encoded_string}"

"""
Image input methods:
The following are three image input methods.

1. Use a public URL - Suitable for publicly accessible images.
2. Use a local file - Suitable for local development and testing.
3. Use Base64 encoding - Suitable for private images or scenarios requiring encrypted transmission.
"""

# [Method 1] Use a publicly accessible image URL
# Example: Use a public image URL
img_url = "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250925/wpimhv/rap.png"

# [Method 2] Use a local file (supports absolute and relative paths)
# Format: file:// + file path
# Example (absolute path):
# img_url = "file://" + "/path/to/your/img.png"    # Linux/macOS
# img_url = "file://" + "C:/path/to/your/img.png"  # Windows
# Example (relative path):
# img_url = "file://" + "./img.png"                # Path relative to the current execution file

# [Method 3] Use a Base64-encoded image
# img_url = encode_file("./img.png")

# Set the audio URL
audio_url = "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250925/ozwpvi/rap.mp3"

def sample_call_i2v():
    # Synchronous call, returns the result directly
    print('please wait...')
    rsp = VideoSynthesis.call(api_key=api_key,
                              model='wan2.5-i2v-preview',
                              prompt='A scene of urban fantasy art. A dynamic graffiti art character. A boy painted with spray paint comes to life from a concrete wall. He sings an English rap song at a very fast pace while striking a classic, energetic rapper pose. The scene is set under an urban railway bridge at night. The lighting comes from a single streetlight, creating a cinematic atmosphere full of high energy and amazing detail. The audio of the video consists entirely of his rap, with no other dialogue or noise.',
                              img_url=img_url,
                              audio_url=audio_url,
                              resolution="480P",
                              duration=10,
                              # audio=True,
                              prompt_extend=True,
                              watermark=False,
                              negative_prompt="",
                              seed=12345)
    print(rsp)
    if rsp.status_code == HTTPStatus.OK:
        print("video_url:", rsp.output.video_url)
    else:
        print('Failed, status_code: %s, code: %s, message: %s' %
              (rsp.status_code, rsp.code, rsp.message))


if __name__ == '__main__':
    sample_call_i2v()
Response example
The video_url is valid for 24 hours. Download the video promptly.
{
    "status_code": 200,
    "request_id": "55194b9a-d281-4565-8ef6-xxxxxx",
    "code": null,
    "message": "",
    "output": {
        "task_id": "e2bb35a2-0218-4969-8c0d-xxxxxx",
        "task_status": "SUCCEEDED",
        "video_url": "https://dashscope-result-sh.oss-cn-shanghai.aliyuncs.com/xxx.mp4?Expires=xxx",
        "submit_time": "2025-10-28 13:45:48.620",
        "scheduled_time": "2025-10-28 13:45:57.378",
        "end_time": "2025-10-28 13:48:05.361",
        "orig_prompt": "A scene of urban fantasy art. A dynamic graffiti art character. A boy painted with spray paint comes to life from a concrete wall. He sings an English rap song at a very fast pace while striking a classic, energetic rapper pose. The scene is set under an urban railway bridge at night. The lighting comes from a single streetlight, creating a cinematic atmosphere full of high energy and amazing detail. The audio of the video consists entirely of his rap, with no other dialogue or noise.",
        "actual_prompt": "A boy made of spray paint emerges from a concrete wall, stands still, and begins to sing an English rap, his mouth opening and closing, his head nodding to the rhythm, and his eyes focused. He gives a thumbs-up with his right hand, puts his left hand on his hip, and moves his body rhythmically in place. The background is a night scene under a railway bridge, lit by a single streetlight. The audio is his rap performance, with the lyrics: 'Skyscrapers loom, shadows kiss the pavement. Dreams stack high, but the soul's in the basement. Pocket full of lint, chasing gold like it's sacred. Every breath a gamble, the odds never patient.'"
    },
    "usage": {
        "video_count": 1,
        "video_duration": 0,
        "video_ratio": "",
        "duration": 10,
        "SR": 480
    }
}

Asynchronous invocation

This example shows an asynchronous call. This method immediately returns a task ID, and you must poll for or wait for the task to complete.

Request example
import os
from http import HTTPStatus
from dashscope import VideoSynthesis
import dashscope

# The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'


# If you have not configured an environment variable, replace the following line with your Model Studio API key: api_key="sk-xxx"
# The API keys for the Singapore and Beijing regions are different. Get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
api_key = os.getenv("DASHSCOPE_API_KEY")

# Use a publicly accessible image URL
img_url = "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250925/wpimhv/rap.png"

# Set the audio URL
audio_url = "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250925/ozwpvi/rap.mp3"


def sample_async_call_i2v():
    # Asynchronous call, returns a task_id
    rsp = VideoSynthesis.async_call(api_key=api_key,
                                    model='wan2.5-i2v-preview',
                                    prompt='A scene of urban fantasy art. A dynamic graffiti art character. A boy painted with spray paint comes to life from a concrete wall. He sings an English rap song at a very fast pace while striking a classic, energetic rapper pose. The scene is set under an urban railway bridge at night. The lighting comes from a single streetlight, creating a cinematic atmosphere full of high energy and amazing detail. The audio of the video consists entirely of his rap, with no other dialogue or noise.',
                                    img_url=img_url,
                                    audio_url=audio_url,
                                    resolution="480P",
                                    duration=10,
                                    # audio=True,
                                    prompt_extend=True,
                                    watermark=False,
                                    negative_prompt="",
                                    seed=12345)
    print(rsp)
    if rsp.status_code == HTTPStatus.OK:
        print("task_id: %s" % rsp.output.task_id)
    else:
        print('Failed, status_code: %s, code: %s, message: %s' %
              (rsp.status_code, rsp.code, rsp.message))

    # Get asynchronous task information
    status = VideoSynthesis.fetch(task=rsp, api_key=api_key)
    if status.status_code == HTTPStatus.OK:
        print(status.output.task_status)
    else:
        print('Failed, status_code: %s, code: %s, message: %s' %
              (status.status_code, status.code, status.message))

    # Wait for the asynchronous task to finish
    rsp = VideoSynthesis.wait(task=rsp, api_key=api_key)
    print(rsp)
    if rsp.status_code == HTTPStatus.OK:
        print(rsp.output.video_url)
    else:
        print('Failed, status_code: %s, code: %s, message: %s' %
              (rsp.status_code, rsp.code, rsp.message))


if __name__ == '__main__':
    sample_async_call_i2v()
Response example

1. Response example for creating a task

{
    "status_code": 200,
    "request_id": "6dc3bf6c-be18-9268-9c27-xxxxxx",
    "code": "",
    "message": "",
    "output": {
        "task_id": "686391d9-7ecf-4290-a8e9-xxxxxx",
        "task_status": "PENDING",
        "video_url": ""
    },
    "usage": null
}

2. Response example for querying a task result

The video_url is valid for 24 hours. Download the video promptly.
{
    "status_code": 200,
    "request_id": "55194b9a-d281-4565-8ef6-xxxxxx",
    "code": null,
    "message": "",
    "output": {
        "task_id": "e2bb35a2-0218-4969-8c0d-xxxxxx",
        "task_status": "SUCCEEDED",
        "video_url": "https://dashscope-result-sh.oss-cn-shanghai.aliyuncs.com/xxx.mp4?Expires=xxx",
        "submit_time": "2025-10-28 13:45:48.620",
        "scheduled_time": "2025-10-28 13:45:57.378",
        "end_time": "2025-10-28 13:48:05.361",
        "orig_prompt": "A scene of urban fantasy art. A dynamic graffiti art character. A boy painted with spray paint comes to life from a concrete wall. He sings an English rap song at a very fast pace while striking a classic, energetic rapper pose. The scene is set under an urban railway bridge at night. The lighting comes from a single streetlight, creating a cinematic atmosphere full of high energy and amazing detail. The audio of the video consists entirely of his rap, with no other dialogue or noise.",
        "actual_prompt": "A boy made of spray paint emerges from a concrete wall, stands still, and begins to sing an English rap, his mouth opening and closing, his head nodding to the rhythm, and his eyes focused. He gives a thumbs-up with his right hand, puts his left hand on his hip, and moves his body rhythmically in place. The background is a night scene under a railway bridge, lit by a single streetlight. The audio is his rap performance, with the lyrics: 'Skyscrapers loom, shadows kiss the pavement. Dreams stack high, but the soul's in the basement. Pocket full of lint, chasing gold like it's sacred. Every breath a gamble, the odds never patient.'"
    },
    "usage": {
        "video_count": 1,
        "video_duration": 0,
        "video_ratio": "",
        "duration": 10,
        "SR": 480
    }
}

Java SDK

The Java SDK supports three image input methods: public URL, Base64-encoded string, and local file path (absolute path only). You can choose one of these methods. For more information, see Input image.

Important
  • wan2.6-i2v does not currently support SDK calls.

  • Ensure that the DashScope Java SDK version is at least 2.22.2 before you run the following code.

    An outdated version may trigger errors such as "url error, please check url!". For more information, see Install the SDK to update it.

Sample code

Synchronous invocation

A synchronous call blocks and waits until the video generation is complete and the result is returned. This example shows three image input methods: public URL, Base64 encoding, and local file path.

Request example
// Copyright (c) Alibaba, Inc. and its affiliates.

import com.alibaba.dashscope.aigc.videosynthesis.VideoSynthesis;
import com.alibaba.dashscope.aigc.videosynthesis.VideoSynthesisParam;
import com.alibaba.dashscope.aigc.videosynthesis.VideoSynthesisResult;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.InputRequiredException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.utils.JsonUtils;
import com.alibaba.dashscope.utils.Constants;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

 
public class Image2Video {

    static {
        Constants.baseHttpApiUrl = "https://dashscope-intl.aliyuncs.com/api/v1";
        // The preceding is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
    }

    // If you have not configured an environment variable, replace the following line with your Model Studio API key: apiKey="sk-xxx"
    // The API keys for the Singapore and Beijing regions are different. Get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
    static String apiKey = System.getenv("DASHSCOPE_API_KEY");
    
    /**
     * Image input methods: Choose one of the following three.
     *
     * 1. Use a public URL - Suitable for publicly accessible images.
     * 2. Use a local file - Suitable for local development and testing.
     * 3. Use Base64 encoding - Suitable for private images or scenarios requiring encrypted transmission.
     */

    // [Method 1] Public URL
    static String imgUrl = "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250925/wpimhv/rap.png";

    // [Method 2] Local file path (file:// + absolute path)
    // static String imgUrl = "file://" + "/your/path/to/img.png";    // Linux/macOS
    // static String imgUrl = "file://" + "C:/your/path/to/img.png";  // Windows

    // [Method 3] Base64 encoding
    // static String imgUrl = Image2Video.encodeFile("/your/path/to/img.png");
    
    // Set the audio URL
    static String audioUrl = "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250925/ozwpvi/rap.mp3";

    public static void image2video() throws ApiException, NoApiKeyException, InputRequiredException {
        // Set the parameters
        Map<String, Object> parameters = new HashMap<>();
        parameters.put("prompt_extend", true);
        parameters.put("watermark", false);
        parameters.put("seed", 12345);

        VideoSynthesis vs = new VideoSynthesis();
        VideoSynthesisParam param =
                VideoSynthesisParam.builder()
                        .apiKey(apiKey)
                        .model("wan2.5-i2v-preview")
                        .prompt("A scene of urban fantasy art. A dynamic graffiti art character. A boy painted with spray paint comes to life from a concrete wall. He sings an English rap song at a very fast pace while striking a classic, energetic rapper pose. The scene is set under an urban railway bridge at night. The lighting comes from a single streetlight, creating a cinematic atmosphere full of high energy and amazing detail. The audio of the video consists entirely of his rap, with no other dialogue or noise.")
                        .imgUrl(imgUrl)
                        .audioUrl(audioUrl)
                        //.audio(true)
                        .duration(10)
                        .parameters(parameters)
                        .resolution("480P")
                        .negativePrompt("")
                        .build();
        System.out.println("please wait...");
        VideoSynthesisResult result = vs.call(param);
        System.out.println(JsonUtils.toJson(result));
    }
    
     /**
     * Encodes a file into a Base64 string.
     * @param filePath The file path.
     * @return A Base64 string in the format: data:{MIME_type};base64,{base64_data}
     */
    public static String encodeFile(String filePath) {
        Path path = Paths.get(filePath);
        if (!Files.exists(path)) {
            throw new IllegalArgumentException("File does not exist: " + filePath);
        }
        // Detect MIME type
        String mimeType = null;
        try {
            mimeType = Files.probeContentType(path);
        } catch (IOException e) {
            throw new IllegalArgumentException("Cannot detect file type: " + filePath);
        }
        if (mimeType == null || !mimeType.startsWith("image/")) {
            throw new IllegalArgumentException("Unsupported or unknown image format");
        }
        // Read file content and encode
        byte[] fileBytes = null;
        try{
            fileBytes = Files.readAllBytes(path);
        } catch (IOException e) {
            throw new IllegalArgumentException("Cannot read file content: " + filePath);
        }
    
        String encodedString = Base64.getEncoder().encodeToString(fileBytes);
        return "data:" + mimeType + ";base64," + encodedString;
    }
    

    public static void main(String[] args) {
        try {
            image2video();
        } catch (ApiException | NoApiKeyException | InputRequiredException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}
Response example
The video_url is valid for 24 hours. Download the video promptly.
{
    "request_id": "f1bfb531-6e13-4e17-8e93-xxxxxx",
    "output": {
        "task_id": "9ddebba6-f784-4f55-b845-xxxxxx",
        "task_status": "SUCCEEDED",
        "video_url": "https://dashscope-result-sh.oss-cn-shanghai.aliyuncs.com/xxx.mp4?Expires=xxx"
    },
    "usage": {
        "video_count": 1
    }
}

Asynchronous invocation

This example shows an asynchronous call. This method immediately returns a task ID, and you must poll for or wait for the task to complete.

Request example
// Copyright (c) Alibaba, Inc. and its affiliates.

import com.alibaba.dashscope.aigc.videosynthesis.VideoSynthesis;
import com.alibaba.dashscope.aigc.videosynthesis.VideoSynthesisListResult;
import com.alibaba.dashscope.aigc.videosynthesis.VideoSynthesisParam;
import com.alibaba.dashscope.aigc.videosynthesis.VideoSynthesisResult;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.InputRequiredException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.task.AsyncTaskListParam;
import com.alibaba.dashscope.utils.JsonUtils;
import com.alibaba.dashscope.utils.Constants;

import java.util.HashMap;
import java.util.Map;

public class Image2Video {

    static {
        // The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
        Constants.baseHttpApiUrl = "https://dashscope-intl.aliyuncs.com/api/v1";
    }

    // If you have not configured an environment variable, replace the following line with your Model Studio API key: api_key="sk-xxx"
    // The API keys for the Singapore and Beijing regions are different. Get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
    static String apiKey = System.getenv("DASHSCOPE_API_KEY");
    // Set the input image URL
    static String imgUrl = "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250925/wpimhv/rap.png";

    // Set the audio URL
    static String audioUrl = "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250925/ozwpvi/rap.mp3";

    public static void image2video() throws ApiException, NoApiKeyException, InputRequiredException {
        // Set the parameters
        Map<String, Object> parameters = new HashMap<>();
        parameters.put("prompt_extend", true);
        parameters.put("watermark", false);
        parameters.put("seed", 12345);

        VideoSynthesis vs = new VideoSynthesis();
        VideoSynthesisParam param =
                VideoSynthesisParam.builder()
                        .apiKey(apiKey)
                        .model("wan2.5-i2v-preview")
                        .prompt("A scene of urban fantasy art. A dynamic graffiti art character. A boy painted with spray paint comes to life from a concrete wall. He sings an English rap song at a very fast pace while striking a classic, energetic rapper pose. The scene is set under an urban railway bridge at night. The lighting comes from a single streetlight, creating a cinematic atmosphere full of high energy and amazing detail. The audio of the video consists entirely of his rap, with no other dialogue or noise.")
                        .imgUrl(imgUrl)
                        .audioUrl(audioUrl)
                        //.audio(true)
                        .duration(10)
                        .parameters(parameters)
                        .resolution("480P")
                        .negativePrompt("")
                        .build();
        // Asynchronous call
        VideoSynthesisResult task = vs.asyncCall(param);
        System.out.println(JsonUtils.toJson(task));
        System.out.println("please wait...");

        // Get the result
        VideoSynthesisResult result = vs.wait(task, apiKey);
        System.out.println(JsonUtils.toJson(result));
    }

    // Get the task list
    public static void listTask() throws ApiException, NoApiKeyException {
        VideoSynthesis is = new VideoSynthesis();
        AsyncTaskListParam param = AsyncTaskListParam.builder().build();
        param.setApiKey(apiKey);
        VideoSynthesisListResult result = is.list(param);
        System.out.println(result);
    }

    // Get a single task result
    public static void fetchTask(String taskId) throws ApiException, NoApiKeyException {
        VideoSynthesis is = new VideoSynthesis();
        // If DASHSCOPE_API_KEY is set as an environment variable, apiKey can be null.
        VideoSynthesisResult result = is.fetch(taskId, apiKey);
        System.out.println(result.getOutput());
        System.out.println(result.getUsage());
    }

    public static void main(String[] args) {
        try {
            image2video();
        } catch (ApiException | NoApiKeyException | InputRequiredException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}
Response example

1. Response example for creating a task

{
    "request_id": "5dbf9dc5-4f4c-9605-85ea-xxxxxxxx",
    "output": {
        "task_id": "7277e20e-aa01-4709-xxxxxxxx",
        "task_status": "PENDING"
    }
}

2. Response example for querying a task result

The video_url is valid for 24 hours. Download the video promptly.
{
    "request_id": "f1bfb531-6e13-4e17-8e93-xxxxxx",
    "output": {
        "task_id": "9ddebba6-f784-4f55-b845-xxxxxx",
        "task_status": "SUCCEEDED",
        "video_url": "https://dashscope-result-sh.oss-cn-shanghai.aliyuncs.com/xxx.mp4?Expires=xxx"
    },
    "usage": {
        "video_count": 1
    }
}

Limitations

  • Data retention: The task_id and video URL are retained for only 24 hours. After this period, they cannot be queried or downloaded.

  • Audio support: wan2.5 and later models support videos with audio, including automatic dubbing and custom audio files. wan2.2 and earlier versions output only silent videos. If needed, you can use speech synthesis to generate audio.

  • Content moderation: The input prompt and image, and the output video, are subject to content moderation. Non-compliant content results in an "IPInfringementSuspect" or "DataInspectionFailed" error. For more information, see Error codes.

  • Network access configuration: Video links are stored in Alibaba Cloud OSS. If your business system cannot access external OSS links due to security policies, you must add the following OSS domain names to your network access whitelist.

    # OSS domain name list
    dashscope-result-bj.oss-cn-beijing.aliyuncs.com
    dashscope-result-hz.oss-cn-hangzhou.aliyuncs.com
    dashscope-result-sh.oss-cn-shanghai.aliyuncs.com
    dashscope-result-wlcb.oss-cn-wulanchabu.aliyuncs.com
    dashscope-result-zjk.oss-cn-zhangjiakou.aliyuncs.com
    dashscope-result-sz.oss-cn-shenzhen.aliyuncs.com
    dashscope-result-hy.oss-cn-heyuan.aliyuncs.com
    dashscope-result-cd.oss-cn-chengdu.aliyuncs.com
    dashscope-result-gz.oss-cn-guangzhou.aliyuncs.com
    dashscope-result-wlcb-acdr-1.oss-cn-wulanchabu-acdr-1.aliyuncs.com

Key parameter descriptions

Input image

The input image img_url parameter supports the following three input methods:

Method 1: Public URL

  • A publicly accessible address that supports HTTP/HTTPS.

  • Example: https://example.com/images/cat.png.

Method 2: Base64 encoding

Sample code

import base64
import mimetypes


# --- For Base64 encoding ---
# Format: data:{MIME_type};base64,{base64_data}
def encode_file(file_path):
    mime_type, _ = mimetypes.guess_type(file_path)
    if not mime_type or not mime_type.startswith("image/"):
        raise ValueError("Unsupported or unknown image format")
    with open(file_path, "rb") as image_file:
        encoded_string = base64.b64encode(image_file.read()).decode('utf-8')
    return f"data:{mime_type};base64,{encoded_string}"


if __name__ == "__main__":
    print(encode_file("./image_demo_input.png"))
  • Example value: data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAABDg......(Snippet shown due to length limit). When you make a call, pass the complete string.

  • Encoding format: Use the data:{MIME_type};base64,{base64_data} format, where:

    • {base64_data}: The Base64-encoded string of the image file.

    • {MIME_type}: The media type of the image, which must correspond to the file format:

      • JPEG or JPG: image/jpeg

      • PNG: image/png

      • BMP: image/bmp

      • WEBP: image/webp

Method 3: Local file path (SDK only)

  • Python SDK: Supports both absolute and relative file paths. The file path rules are as follows:

    • Linux or macOS: file://{absolute or relative path of the file}, for example, file:///home/images/test.png (absolute) or file://./images/test.png (relative).

    • Windows: file://{absolute or relative path of the file}, for example, file://D:/images/test.png (absolute) or file://./images/test.png (relative).

  • Java SDK: Supports only the absolute path of the file. The file path rules are as follows:

    • Linux or macOS: file://{absolute path of the file}, for example, file:///home/images/test.png.

    • Windows: file:///{absolute path of the file}, for example, file:///D:/images/test.png.

Audio settings

Supported models: wan2.6-i2v, wan2.5-i2v-preview.

You can control the audio behavior by using the input.audio_url and parameters.audio parameters. Parameter priority: audio_url > audio. Three modes are supported, as shown in the sketch after the following list:

  1. Generate a silent video

    1. Parameter settings: Do not pass an audio_url, and set the audio parameter to false.

    2. Scenario: This is useful for creating purely visual content when you plan to add your own audio or music later.

  2. Generate audio automatically

    1. Parameter settings: Do not pass an audio_url, and set the audio parameter to true.

    2. Effect description: The model automatically generates background audio or music that matches the prompt and visual content.

  3. Use custom audio

    1. Parameter settings: Pass an audio_url. The audio parameter is ignored.

    2. Effect description: The video content is synchronized with the audio, including elements such as lip movements and rhythm.
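
The following minimal Python sketch summarizes the request-body fragments for the three modes. Only the audio-related fields are shown; the model, prompt, img_url, and other parameters are the same as in the request examples earlier in this topic, and the audio URL is the sample file used there.
# Mode 1: silent video - omit input.audio_url and set parameters.audio to false.
silent_video = {"parameters": {"audio": False}}

# Mode 2: automatic dubbing - omit input.audio_url; parameters.audio is true by default.
auto_dubbing = {"parameters": {"audio": True}}

# Mode 3: custom audio - provide input.audio_url; parameters.audio is ignored.
custom_audio = {
    "input": {
        "audio_url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250925/ozwpvi/rap.mp3"
    }
}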

Billing and rate limiting

  • For information about the model's free quota and pricing, see Models.

  • For more information about model rate limiting, see Wan series.

  • Billing description:

    • You are charged based on the duration in seconds of successfully generated videos. A charge is incurred only when the query result API returns a task_status of SUCCEEDED and the video is successfully generated.

    • Failed model calls or processing errors do not incur any fees or consume the free quota.

Error codes

If a model call fails and returns an error message, see Error messages for a solution.

FAQ

Video FAQ quick link: FAQ.

Q: How do I generate a video with a specific aspect ratio, such as 3:4?

A: The aspect ratio of the output video is determined by the input first frame image (img_url), but an exact ratio, such as a strict 3:4, cannot be guaranteed.

How it works: The model uses the aspect ratio of the input image as a baseline and then adapts it to a supported resolution based on the resolution parameter, such as 480P, 720P, or 1080P. Because the output resolution must meet technical requirements where the width and height must be divisible by 16, the final aspect ratio may have a slight deviation, for example, an adjustment from 0.75 to 0.739. This is normal behavior.

  • Example: An input image is 750 × 1000 (aspect ratio 3:4 = 0.75), and resolution is set to "720P" (target total pixels approx. 920,000). The actual output is 816 × 1104 (aspect ratio ≈ 0.739, total pixels approx. 900,000).

  • Note that the resolution parameter mainly controls the video's definition (total pixel count). The final video aspect ratio is still based on the input image, with only necessary minor adjustments.

Best practice: To strictly match a target aspect ratio, use an input image with that ratio and then post-process the output video by cropping or padding it. For example, you can use a video editing tool to crop the output video to the target ratio, or add black bars or a blurred background for padding.
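
As a rough illustration of this behavior, the following Python sketch keeps the input aspect ratio, targets the approximate total pixel count of the selected tier, and rounds the width and height down to multiples of 16. It reproduces the 750 × 1000 → 816 × 1104 example above, but the pixel targets for 480P and 1080P are assumptions, and the service's exact scaling algorithm may differ.
import math

# Approximate pixel targets per tier. The 720P value matches the example above;
# the 480P and 1080P values are assumptions based on the usual meaning of those tiers.
TARGET_PIXELS = {"480P": 640 * 480, "720P": 1280 * 720, "1080P": 1920 * 1080}

def estimate_output_size(in_width, in_height, resolution="720P"):
    ratio = in_width / in_height                        # 750 / 1000 = 0.75
    pixels = TARGET_PIXELS[resolution]                  # ~921,600 for 720P
    width = int(math.sqrt(pixels * ratio) // 16) * 16   # round down to a multiple of 16
    height = int(math.sqrt(pixels / ratio) // 16) * 16
    return width, height

print(estimate_output_size(750, 1000, "720P"))  # (816, 1104), aspect ratio ~0.739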

Appendix

Examples of basic image-to-video features

Feature: Silent video

Input first frame image: (image)

Input prompt: A cat running on the grass

Output video: (video)