Alibaba Cloud Model Studio: Wan - image-to-video API reference

Last Updated: Dec 29, 2025

The Wan image-to-video model generates a smooth video based on a first frame image and a text prompt. The supported features include the following:

  • Basic features: Select a video duration of 3, 4, 5, 10, or 15 seconds. You can also specify a video resolution of 480P, 720P, or 1080P, use prompt rewriting, and add watermarks.

  • Audio capabilities: Use automatic dubbing or provide a custom audio file for audio-video synchronization. (Supported by wan2.5 and wan2.6)

  • Multi-shot narrative: Maintain subject consistency across shots to create a coherent multi-shot narrative. (Supported only by wan2.6)

Quick links: Try it online (Singapore | Beijing) | Wan official website | Video effect list

Note

The features available on the Wan official website may differ from those supported by the API. This document describes the API's capabilities and is updated promptly to reflect new features.

Model overview

Input: first frame image and audio file.

Output: video (wan2.6, multi-shot video).

Input prompt: A scene of urban fantasy art. A dynamic graffiti art character. A boy painted with spray paint comes to life from a concrete wall. He sings an English rap song at a very fast pace while striking a classic, energetic rapper pose. The scene is set under an urban railway bridge at night. The lighting comes from a single streetlight, creating a cinematic atmosphere full of high energy and amazing detail. The audio of the video consists entirely of his rap, with no other dialogue or noise.

Note

Before you make a call, check the Models and pricing in each region.

Prerequisites

Before you make a call, create and configure an API key and then export the API key as an environment variable. To use the SDK, install the DashScope SDK.

Important

The Beijing and Singapore regions have separate API keys and request endpoints. Do not use them interchangeably. Cross-region calls cause authentication failures or service errors.

HTTP

Image-to-video tasks can take a long time to complete, typically 1 to 5 minutes, so the API uses asynchronous invocation. The process involves two core steps: create a task, then poll for the result.

The actual time required depends on the number of tasks in the queue and the service execution status. Please be patient while you wait for the result.

Step 1: Create a task to get a task ID

Singapore region: POST https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis

Beijing region: POST https://dashscope.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis

Note
  • After the task is created, use the returned task_id to query the result. The task_id is valid for 24 hours. Do not create duplicate tasks. Use polling to retrieve the result.
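Because each region has its own endpoints, it can help to keep the region choice in one place. The following is a minimal sketch, not part of the official SDK: the region keys `"singapore"` and `"beijing"` and the helper names are illustrative assumptions, while the URLs are the ones listed in this document.

```python
# Base URLs copied from this document. The dictionary keys are illustrative names.
BASE_URLS = {
    "singapore": "https://dashscope-intl.aliyuncs.com/api/v1",
    "beijing": "https://dashscope.aliyuncs.com/api/v1",
}


def create_task_url(region: str) -> str:
    """Return the task-creation endpoint (Step 1) for the given region."""
    return BASE_URLS[region] + "/services/aigc/video-generation/video-synthesis"


def query_task_url(region: str, task_id: str) -> str:
    """Return the task-query endpoint (Step 2) for the given region and task ID."""
    return f"{BASE_URLS[region]}/tasks/{task_id}"
```

Use the API key that belongs to the same region as the URL you select.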

Request parameters

Multi-shot narrative

Only the wan2.6-i2v model supports generating multi-shot videos.

You can enable this feature by setting "prompt_extend": true and "shot_type":"multi".

The API keys for the Singapore and China (Beijing) regions are different. For more information, see Obtain and configure API keys.
The following example uses the base URL for the Singapore region. If you use a model in the China (Beijing) region, replace the base URL with https://dashscope.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis.
curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis' \
    -H 'X-DashScope-Async: enable' \
    -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
    -H 'Content-Type: application/json' \
    -d '{
    "model": "wan2.6-i2v",
    "input": {
        "prompt": "A scene of urban fantasy art. A dynamic graffiti art character. A boy painted with spray paint comes to life from a concrete wall. He sings an English rap song at a very fast pace while striking a classic, energetic rapper pose. The scene is set under an urban railway bridge at night. The lighting comes from a single streetlight, creating a cinematic atmosphere full of high energy and amazing detail. The audio of the video consists entirely of his rap, with no other dialogue or noise.",
        "img_url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250925/wpimhv/rap.png",
        "audio_url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250925/ozwpvi/rap.mp3"
    },
    "parameters": {
        "resolution": "720P",
        "prompt_extend": true,
        "duration": 10,
        "shot_type":"multi"
    }
}'

Automatic dubbing

This feature is supported only by wan2.5 and later models.

If you do not provide input.audio_url, the model automatically generates matching background music or sound effects based on the video content.

The API keys for the Singapore and China (Beijing) regions are different. For more information, see Obtain and configure an API key.
The following example uses the base URL for the Singapore region. If you use a model in the China (Beijing) region, replace the base URL with https://dashscope.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis.
curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis' \
    -H 'X-DashScope-Async: enable' \
    -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
    -H 'Content-Type: application/json' \
    -d '{
    "model": "wan2.5-i2v-preview",
    "input": {
        "prompt": "A scene of urban fantasy art. A dynamic graffiti art character. A boy painted with spray paint comes to life from a concrete wall. He sings an English rap song at a very fast pace while striking a classic, energetic rapper pose. The scene is set under an urban railway bridge at night. The lighting comes from a single streetlight, creating a cinematic atmosphere full of high energy and amazing detail. The audio of the video consists entirely of his rap, with no other dialogue or noise.",
        "img_url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250925/wpimhv/rap.png"
    },
    "parameters": {
        "resolution": "480P",
        "prompt_extend": true,
        "duration": 10
    }
}'

Input audio file

This feature is supported only by wan2.5 and later models.

To specify background music or a voiceover for the video, you can pass the URL of a custom audio file in the input.audio_url parameter.

The API keys for the Singapore and China (Beijing) regions are different. For more information, see Obtain an API key.
The following example uses the base URL for the Singapore region. If you use a model in the China (Beijing) region, replace the base URL with https://dashscope.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis.
curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis' \
    -H 'X-DashScope-Async: enable' \
    -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
    -H 'Content-Type: application/json' \
    -d '{
    "model": "wan2.5-i2v-preview",
    "input": {
        "prompt": "A scene of urban fantasy art. A dynamic graffiti art character. A boy painted with spray paint comes to life from a concrete wall. He sings an English rap song at a very fast pace while striking a classic, energetic rapper pose. The scene is set under an urban railway bridge at night. The lighting comes from a single streetlight, creating a cinematic atmosphere full of high energy and amazing detail. The audio of the video consists entirely of his rap, with no other dialogue or noise.",
        "img_url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250925/wpimhv/rap.png",
        "audio_url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250925/ozwpvi/rap.mp3"
    },
    "parameters": {
        "resolution": "480P",
        "prompt_extend": true,
        "duration": 10
    }
}'

Generate a silent video

wan2.2 and earlier models generate silent videos by default and require no parameter settings.

wan2.5 and later models generate videos with audio by default.
The API keys for the Singapore and China (Beijing) regions are different. For more information, see Obtain an API key.
The following example uses the base URL for the Singapore region. If you use a model in the China (Beijing) region, replace the base URL with https://dashscope.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis.
curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis' \
    -H 'X-DashScope-Async: enable' \
    -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
    -H 'Content-Type: application/json' \
    -d '{
    "model": "wan2.2-i2v-plus",
    "input": {
        "prompt": "A cat running on the grass",
        "img_url": "https://cdn.translate.alibaba.com/r/wanx-demo-1.png"
    },
    "parameters": {
        "resolution": "480P",
        "prompt_extend": true
    }
}'

Use a negative prompt

You can use negative_prompt to prevent the generated video from including "flowers".

The API keys for the Singapore and Beijing regions are different. For more information, see Obtain an API key.
The following example uses the base URL for the Singapore region. If you use a model in the Beijing region, replace the base URL with https://dashscope.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis.
curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis' \
    -H 'X-DashScope-Async: enable' \
    -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
    -H 'Content-Type: application/json' \
    -d '{
    "model": "wan2.2-i2v-plus",
    "input": {
        "prompt": "A cat running on the grass",
        "negative_prompt": "flowers",
        "img_url": "https://cdn.translate.alibaba.com/r/wanx-demo-1.png"
    },
    "parameters": {
        "resolution": "480P",
        "prompt_extend": true
    }
}'
Request headers

Content-Type string (Required)

The content type of the request. Set this parameter to application/json.

Authorization string (Required)

The identity authentication credentials for the request. This API uses a Model Studio API key for identity authentication. Example: Bearer sk-xxxx.

X-DashScope-Async string (Required)

The asynchronous processing configuration parameter. HTTP requests support only asynchronous processing. You must set this parameter to enable.

Important

If this request header is missing, the error message "current user api does not support synchronous calls" is returned.
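The three request headers above can be assembled in Python as follows. This is a sketch that uses only the standard library; the function names `build_headers` and `create_task` are illustrative and are not part of the DashScope SDK. Note that `urllib.request.urlopen` raises `urllib.error.HTTPError` for non-2xx responses, so error bodies must be read from that exception.

```python
import json
import urllib.request

# Singapore endpoint from this document; swap in the Beijing URL if needed.
CREATE_URL = (
    "https://dashscope-intl.aliyuncs.com/api/v1"
    "/services/aigc/video-generation/video-synthesis"
)


def build_headers(api_key: str) -> dict:
    """Return the three required headers for task creation."""
    return {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
        "X-DashScope-Async": "enable",  # HTTP calls support only async mode
    }


def create_task(body: dict, api_key: str, url: str = CREATE_URL) -> dict:
    """POST the JSON request body and return the parsed JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers=build_headers(api_key),
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A typical call reads the key from the environment, for example `create_task(body, os.environ["DASHSCOPE_API_KEY"])`.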

Request body

model string (Required)

The model name. Example: wan2.5-i2v-preview.

For a list of models and their prices, see Model prices.

input object (Required)

Basic input information, such as the prompt.

Properties

prompt string (Optional)

The text prompt. It describes the desired elements and visual features of the generated video.

This parameter supports both Chinese and English. Each Chinese character or letter is counted as one character. Any excess characters are automatically truncated. The length limit varies by model version:

  • wan2.6-i2v: Up to 1,500 characters.

  • wan2.5-i2v-preview: Up to 1,500 characters.

  • wan2.2 and earlier models: Up to 800 characters.

Example: A kitten running on the grass.

For prompt usage tips, see Text-to-video/image-to-video prompt guide.
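To avoid silent truncation on the server, you can check the prompt length before you send it. The sketch below assumes that the service counts characters the same way Python counts code points, and it treats every model other than the two listed as "wan2.2 and earlier"; both are assumptions, and the helper names are illustrative.

```python
def prompt_limit(model: str) -> int:
    """Return the documented prompt length limit for a model."""
    if model in ("wan2.6-i2v", "wan2.5-i2v-preview"):
        return 1500
    return 800  # assumption: any other model is wan2.2 or earlier


def fit_prompt(prompt: str, model: str) -> str:
    """Trim the prompt locally so that nothing is cut off unexpectedly."""
    return prompt[: prompt_limit(model)]
```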

negative_prompt string (Optional)

The negative prompt, which describes content that you do not want to appear in the video. This can be used to constrain the video content.

This parameter supports both Chinese and English. The length is limited to 500 characters. Any excess characters are automatically truncated.

Example: low resolution, error, worst quality, low quality, deformed, extra fingers, bad proportions.

img_url string (Required)

The URL or Base64-encoded data of the first frame image.

Image limits:

  • Image format: JPEG, JPG, PNG (alpha channels are not supported), BMP, or WEBP.

  • Image resolution: The width and height must be between 360 and 2,000 pixels.

  • File size: No more than 10 MB.

Input image instructions:

  1. Use a publicly accessible URL

    • Supports HTTP or HTTPS protocols.

    • Example: https://cdn.translate.alibaba.com/r/wanx-demo-1.png.

  2. Pass a Base64-encoded image string

    • Data format: data:{MIME_type};base64,{base64_data}.

    • Example: ....... (The encoded string is too long and only a snippet is shown.)

    • For more information, see Input image.
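The format and size limits above can be checked locally before a Base64 string is built. The following sketch is illustrative only: the helper name is hypothetical, "10 MB" is assumed to mean 10 × 1024 × 1024 bytes, the format is inferred from the file extension, and the pixel dimensions are not checked because that would require an image library.

```python
import base64
import os

MAX_IMAGE_BYTES = 10 * 1024 * 1024  # assumption: "10 MB" means 10 MiB

# Formats listed in this document, keyed by file extension.
EXT_TO_MIME = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".bmp": "image/bmp",
    ".webp": "image/webp",
}


def to_data_uri(filename: str, data: bytes) -> str:
    """Build data:{MIME_type};base64,{base64_data} after basic limit checks."""
    extension = os.path.splitext(filename)[1].lower()
    if extension not in EXT_TO_MIME:
        raise ValueError(f"unsupported image format: {extension or filename}")
    if len(data) > MAX_IMAGE_BYTES:
        raise ValueError("image is larger than 10 MB")
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{EXT_TO_MIME[extension]};base64,{encoded}"
```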

audio_url string (Optional)

Supported models: wan2.6-i2v, wan2.5-i2v-preview.

The URL of the audio file. The model uses this audio to generate the video. For more information, see Audio settings.

This parameter supports HTTP or HTTPS protocols.

Audio limits:

  • Format: WAV or MP3.

  • Duration: 3 to 30 s.

  • File size: No more than 15 MB.

  • Handling of mismatched durations: If the audio is longer than the video duration (the duration value, for example 5 or 10 seconds), the audio is automatically truncated to the first 5 or 10 seconds, and the rest is discarded. If the audio is shorter than the video duration, the remaining part of the video is silent. For example, if the audio is 3 seconds long and the video is 5 seconds long, the first 3 seconds of the output video have sound, and the last 2 seconds are silent.

Example: https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250925/ozwpvi/rap.mp3.
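The truncation and silence rule for audio can be expressed as a small calculation. This is a sketch of the arithmetic described above, with a hypothetical function name; it does not call the API.

```python
def audio_layout(audio_seconds: float, video_seconds: float) -> dict:
    """Split the timeline into audible, silent, and discarded audio seconds."""
    used = min(audio_seconds, video_seconds)
    return {
        "audible": used,                      # part of the video that has sound
        "silent": video_seconds - used,       # tail of the video without sound
        "discarded": audio_seconds - used,    # audio cut off by truncation
    }
```

For the example in the text, `audio_layout(3, 5)` reports 3 audible seconds and 2 silent seconds.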

parameters object (Optional)

Video processing parameters, such as the video resolution, video duration, prompt rewriting, and watermark.

Properties

resolution string (Optional)

Important

The resolution parameter directly affects the cost. For the same model, the cost is as follows: 1080P > 720P > 480P. Before you make a call, confirm the model pricing.

Specifies the resolution tier for the generated video. This setting adjusts the video's definition (total pixels). The model automatically scales the video to a similar total pixel count based on the selected resolution tier. The aspect ratio of the video is kept as consistent as possible with the aspect ratio of the input img_url image. For more information, see the FAQ.

The default value and valid values for this parameter depend on the model parameter, as described in the following list:

  • wan2.6-i2v: Valid values: 720P, 1080P. Default value: 1080P.

  • wan2.5-i2v-preview: Valid values: 480P, 720P, 1080P. Default value: 1080P.

  • wan2.2-i2v-flash: Valid values: 480P, 720P. Default value: 720P.

  • wan2.2-i2v-plus: Valid values: 480P, 1080P. Default value: 1080P.

  • wan2.1-i2v-turbo: Valid values: 480P, 720P. Default value: 720P.

  • wan2.1-i2v-plus: Valid values: 720P. Default value: 720P.

Example: 1080P.

duration integer (Optional)

Important

The duration directly affects the cost. Billing is per second, so a longer duration results in a higher cost. Before you make a call, confirm the model pricing.

The duration of the generated video in seconds. The valid values for this parameter depend on the model parameter:

  • wan2.6-i2v: Valid values: 5, 10, 15. Default value: 5.

  • wan2.5-i2v-preview: Valid values: 5, 10. Default value: 5.

  • wan2.2-i2v-plus: Fixed at 5 seconds and cannot be modified.

  • wan2.2-i2v-flash: Fixed at 5 seconds and cannot be modified.

  • wan2.1-i2v-plus: Fixed at 5 seconds and cannot be modified.

  • wan2.1-i2v-turbo: Valid values: 3, 4, or 5. Default value: 5.

Example: 5.
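Because the valid resolution and duration values differ by model, a client-side check can catch mistakes before a task is submitted. The tables below are transcribed from the lists in this document; the function name and the message wording are illustrative assumptions.

```python
# Valid values per model, transcribed from this document.
RESOLUTIONS = {
    "wan2.6-i2v": ("720P", "1080P"),
    "wan2.5-i2v-preview": ("480P", "720P", "1080P"),
    "wan2.2-i2v-flash": ("480P", "720P"),
    "wan2.2-i2v-plus": ("480P", "1080P"),
    "wan2.1-i2v-turbo": ("480P", "720P"),
    "wan2.1-i2v-plus": ("720P",),
}
DURATIONS = {
    "wan2.6-i2v": (5, 10, 15),
    "wan2.5-i2v-preview": (5, 10),
    "wan2.2-i2v-plus": (5,),
    "wan2.2-i2v-flash": (5,),
    "wan2.1-i2v-plus": (5,),
    "wan2.1-i2v-turbo": (3, 4, 5),
}


def check_parameters(model, resolution=None, duration=None):
    """Return a list of problems; an empty list means the values look valid."""
    if model not in RESOLUTIONS:
        return [f"unknown model: {model}"]
    problems = []
    if resolution is not None and resolution not in RESOLUTIONS[model]:
        problems.append(f"{model} does not support resolution {resolution}")
    if duration is not None and duration not in DURATIONS[model]:
        problems.append(f"{model} does not support duration {duration}")
    return problems
```

Passing `None` for a value skips its check, which matches leaving the parameter unset so that the model default applies.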

prompt_extend boolean (Optional)

Specifies whether to enable prompt rewriting. If enabled, a large language model (LLM) rewrites the input prompt. This can significantly improve the generation quality for shorter prompts but increases the time required.

  • true (default)

  • false

Example: true.

shot_type string (Optional)

Supported model: wan2.6-i2v.

Specifies the shot type for the generated video, that is, whether the video consists of a single continuous shot or multiple switched shots.

Condition: This parameter is effective only when "prompt_extend": true.

Parameter priority: shot_type > prompt. For example, if shot_type is set to "single", even if the prompt contains "generate a multi-shot video", the model will still output a single-shot video.

Valid values:

  • single: (default) Outputs a single-shot video.

  • multi: Outputs a multi-shot video.

Example: single.

Note

You can use this parameter to strictly control the narrative structure of the video, for example, using a single shot for a product display or multiple shots for a short story.
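Since shot_type takes effect only when prompt_extend is true, a client can enforce that pairing when it builds the parameters object. The sketch below is one possible client-side guard with a hypothetical function name. Rejecting the combination is a design choice made here; this document does not state how the service responds when prompt_extend is false.

```python
def shot_parameters(shot_type: str = "single", prompt_extend: bool = True) -> dict:
    """Build the shot-related part of the parameters object for wan2.6-i2v."""
    if shot_type not in ("single", "multi"):
        raise ValueError(f"invalid shot_type: {shot_type}")
    if shot_type == "multi" and not prompt_extend:
        # Client-side choice: fail early instead of sending a setting
        # that is documented as having no effect.
        raise ValueError("shot_type takes effect only when prompt_extend is true")
    return {"prompt_extend": prompt_extend, "shot_type": shot_type}
```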

watermark boolean (Optional)

Specifies whether to add a watermark. The watermark, which says "AI Generated", is placed in the lower-right corner of the video.

  • false (default)

  • true

Example: false.

seed integer (Optional)

The random number seed. The value must be in the range [0, 2147483647].

If this parameter is not specified, the system automatically generates a random seed. To improve the reproducibility of the generated results, you can set a fixed seed value.

Note that because model generation is probabilistic, using the same seed value does not guarantee that the generated results are identical for every call.

Example: 12345.

Response parameters

Successful response

Save the task_id to query the task status and result.

{
    "output": {
        "task_status": "PENDING",
        "task_id": "0385dc79-5ff8-4d82-bcb6-xxxxxx"
    },
    "request_id": "4909100c-7b5a-9f92-bfe5-xxxxxx"
}

Error response

The task creation failed. For more information, see Error messages to resolve the issue.

{
    "code": "InvalidApiKey",
    "message": "No API-key provided.",
    "request_id": "7438d53d-6eb8-4596-8835-xxxxxx"
}
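The two response shapes above can be handled with one small parser. This is a sketch with hypothetical names (`TaskCreationError`, `extract_task_id`); it relies only on the fields shown in the sample responses.

```python
import json


class TaskCreationError(Exception):
    """Raised when the create-task response carries an error code."""


def extract_task_id(response_text: str) -> str:
    """Return output.task_id, or raise if the response reports an error."""
    payload = json.loads(response_text)
    if payload.get("code"):
        raise TaskCreationError(
            f"{payload['code']}: {payload.get('message', '')} "
            f"(request_id={payload.get('request_id')})"
        )
    return payload["output"]["task_id"]
```

Including the request_id in the exception message makes it easier to trace a failed call later.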

output object

The task output information.

Properties

task_id string

The task ID. It can be used to query the task for 24 hours.

task_status string

The task status.

Enumeration

  • PENDING

  • RUNNING

  • SUCCEEDED

  • FAILED

  • CANCELED

  • UNKNOWN: The task does not exist or its status cannot be determined.

request_id string

The unique request ID. You can use this ID to trace and troubleshoot issues.

code string

The error code for a failed request. This parameter is not returned if the request is successful. For more information, see Error messages.

message string

The detailed information about a failed request. This parameter is not returned if the request is successful. For more information, see Error messages.

Step 2: Query the result by task ID

Singapore region: GET https://dashscope-intl.aliyuncs.com/api/v1/tasks/{task_id}

Beijing region: GET https://dashscope.aliyuncs.com/api/v1/tasks/{task_id}

Note
  • Polling suggestion: Video generation takes several minutes. Use a polling mechanism and set a reasonable query interval, such as 15 seconds, to retrieve the result.

  • Task status transition: PENDING → RUNNING → SUCCEEDED or FAILED.

  • Result link: After the task is successful, a video link is returned. The link is valid for 24 hours. After you retrieve the link, immediately download and save the video to a permanent storage service, such as Object Storage Service.

  • task_id validity: 24 hours. After this period, you cannot query the result, and the API returns a task status of UNKNOWN.
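The polling advice above can be sketched as a loop that stops on any terminal status. In this sketch, `fetch_status` is a caller-supplied function that performs the GET request and returns the parsed JSON. The 15-second interval follows the suggestion above; the 900-second timeout and the decision to treat CANCELED and UNKNOWN as terminal are assumptions.

```python
import time

# Statuses after which further polling is pointless (an assumption of this sketch).
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "CANCELED", "UNKNOWN"}


def wait_for_task(fetch_status, interval=15.0, timeout=900.0, sleep=time.sleep):
    """Poll until the task reaches a terminal status, then return the response."""
    waited = 0.0
    while True:
        result = fetch_status()
        status = result["output"]["task_status"]
        if status in TERMINAL_STATUSES:
            return result
        if waited >= timeout:
            raise TimeoutError(f"task still {status} after {waited:.0f} seconds")
        sleep(interval)
        waited += interval
```

The `sleep` argument exists so that the loop can be exercised in tests without real waiting.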

Request parameters

Query task results

Replace 86ecf553-d340-4e21-xxxxxxxxx with the actual task ID.

The API keys for the Singapore and Beijing regions are different. Create an API key.
The following `base_url` is for the Singapore region. For models in the Beijing region, replace the `base_url` with `https://dashscope.aliyuncs.com/api/v1/tasks/86ecf553-d340-4e21-xxxxxxxxx`.
curl -X GET https://dashscope-intl.aliyuncs.com/api/v1/tasks/86ecf553-d340-4e21-xxxxxxxxx \
--header "Authorization: Bearer $DASHSCOPE_API_KEY"
Request headers

Authorization string (Required)

The identity authentication credentials for the request. This API uses a Model Studio API key for identity authentication. Example: Bearer sk-xxxx.

URL path parameters

task_id string (Required)

The task ID.

Response parameters

Task succeeded

Video URLs are retained for only 24 hours and are automatically purged after this period. You must save the generated videos promptly.

{
    "request_id": "2ca1c497-f9e0-449d-9a3f-xxxxxx",
    "output": {
        "task_id": "af6efbc0-4bef-4194-8246-xxxxxx",
        "task_status": "SUCCEEDED",
        "submit_time": "2025-09-25 11:07:28.590",
        "scheduled_time": "2025-09-25 11:07:35.349",
        "end_time": "2025-09-25 11:17:11.650",
        "orig_prompt": "A scene of urban fantasy art. A dynamic graffiti art character. A boy painted with spray paint comes to life from a concrete wall. He sings an English rap song at a very fast pace while striking a classic, energetic rapper pose. The scene is set under an urban railway bridge at night. The lighting comes from a single streetlight, creating a cinematic atmosphere full of high energy and amazing detail. The audio of the video consists entirely of his rap, with no other dialogue or noise.",
        "video_url": "https://dashscope-result-sh.oss-cn-shanghai.aliyuncs.com/xxx.mp4?Expires=xxx"
    },
    "usage": {
        "duration": 10,
        "input_video_duration": 0,
        "output_video_duration": 10,
        "video_count": 1,
        "SR": 720
    }
}

Task failed

If a task fails, task_status is set to FAILED, and an error code and message are provided. For more information, see Error messages to resolve the issue.

{
    "request_id": "e5d70b02-ebd3-98ce-9fe8-759d7d7b107d",
    "output": {
        "task_id": "86ecf553-d340-4e21-af6e-a0c6a421c010",
        "task_status": "FAILED",
        "code": "InvalidParameter",
        "message": "The size is not match xxxxxx"
    }
}

Task query expired

The task_id is valid for 24 hours. After this period, the query fails and the following error message is returned.

{
    "request_id": "a4de7c32-7057-9f82-8581-xxxxxx",
    "output": {
        "task_id": "502a00b1-19d9-4839-a82f-xxxxxx",
        "task_status": "UNKNOWN"
    }
}

output object

The task output information.

Properties

task_id string

The task ID. It can be used to query the task for 24 hours.

task_status string

The task status.

Enumeration

  • PENDING

  • RUNNING

  • SUCCEEDED

  • FAILED

  • CANCELED

  • UNKNOWN: The task does not exist or its status cannot be determined.

Status transitions during polling:

  • PENDING → RUNNING → SUCCEEDED or FAILED.

  • The status of the first query is usually PENDING or RUNNING.

  • If the status changes to SUCCEEDED, the response contains the generated video URL.

  • If the status is FAILED, check the error message and retry.

submit_time string

The time when the task was submitted. The time is in the UTC+8 time zone. The format is YYYY-MM-DD HH:mm:ss.SSS.

scheduled_time string

The time when the task started running. The time is in the UTC+8 time zone. The format is YYYY-MM-DD HH:mm:ss.SSS.

end_time string

The time when the task was completed. The time is in the UTC+8 time zone. The format is YYYY-MM-DD HH:mm:ss.SSS.
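The three timestamps carry no time zone suffix, so a client has to attach UTC+8 itself before comparing them with other times. The following sketch parses the documented format with the standard library; the function name is illustrative.

```python
from datetime import datetime, timedelta, timezone

UTC8 = timezone(timedelta(hours=8))  # the time zone stated in this document


def parse_task_time(value: str) -> datetime:
    """Parse 'YYYY-MM-DD HH:mm:ss.SSS' as a timezone-aware UTC+8 datetime."""
    naive = datetime.strptime(value, "%Y-%m-%d %H:%M:%S.%f")
    return naive.replace(tzinfo=UTC8)


def running_time(scheduled_time: str, end_time: str) -> timedelta:
    """Return how long the task ran between scheduling and completion."""
    return parse_task_time(end_time) - parse_task_time(scheduled_time)
```

Subtracting submit_time from scheduled_time in the same way gives the time that the task spent in the queue.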

video_url string

The video URL. This parameter is returned only if task_status is SUCCEEDED.

The link is valid for 24 hours. You can use this URL to download the video. The video is in MP4 format with H.264 encoding.
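Because the link expires, it is worth downloading the file as soon as the task succeeds. The following sketch streams the response to a local file with the standard library; the function name and the destination path are illustrative. Uploading the file to permanent storage is a separate step that is not shown.

```python
import shutil
import urllib.request
from pathlib import Path


def save_video(video_url: str, destination: str) -> Path:
    """Stream the video at video_url into a local file and return its path."""
    target = Path(destination)
    with urllib.request.urlopen(video_url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)  # copies in chunks, not all at once
    return target
```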

orig_prompt string

The original input prompt. This corresponds to the prompt request parameter.

actual_prompt string

When prompt_extend=true, the system intelligently rewrites the input prompt. This field returns the optimized prompt that is actually used for generation.

  • If prompt_extend=false, this field is not returned.

  • Note: The wan2.6 model does not return this field, regardless of the value of prompt_extend.

code string

The error code for a failed request. This parameter is not returned if the request is successful. For more information, see Error messages.

message string

The detailed information about a failed request. This parameter is not returned if the request is successful. For more information, see Error messages.

usage object

Usage statistics for the task. Only successful tasks are counted.

Properties

Parameters returned by the wan2.6 model

input_video_duration integer

The duration of the input video in seconds. This is currently fixed at 0 because video input is not supported.

output_video_duration integer

Returned only when you use the wan2.6 model.

The duration of the output video, in seconds. Its value is equal to the value of parameters.duration.

duration integer

The total video duration, used for billing.

Billing formula: duration = input_video_duration + output_video_duration.

SR integer

Returned only when you use the wan2.6 model. The resolution tier of the generated video. Example: 720.

video_count integer

The number of generated videos. The value is fixed at 1.

Parameters returned by the wan2.2 and wan2.5 models

duration integer

The duration of the generated video in seconds. Valid values: 5, 10.

Billing formula: Cost = Video duration in seconds × Unit price.

SR integer

The resolution of the generated video. Valid values: 480, 720, 1080.

video_count integer

The number of generated videos. The value is fixed at 1.

Parameters returned by the wan2.1 model

video_duration integer

The duration of the generated video in seconds. Valid values: 3, 4, or 5.

Billing formula: Cost = Video duration in seconds × Unit price.

video_ratio string

The aspect ratio of the generated video. The value is fixed at "standard".

video_count integer

The number of generated videos. The value is fixed at 1.
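Because the usage object names the billed duration differently across model generations, a small helper can normalize it. This sketch assumes that a zero or missing `duration` means the value is in `video_duration`, which matches the sample responses in this document. The unit price is a placeholder that you must take from the pricing page.

```python
def billed_seconds(usage: dict) -> int:
    """Return the billed duration in seconds for any model generation."""
    # wan2.2, wan2.5, and wan2.6 report "duration"; wan2.1 reports "video_duration".
    return usage.get("duration") or usage.get("video_duration") or 0


def estimate_cost(usage: dict, unit_price_per_second: float) -> float:
    """Apply Cost = Video duration in seconds x Unit price."""
    return billed_seconds(usage) * unit_price_per_second
```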

request_id string

The unique request ID. You can use this ID to trace and troubleshoot issues.

DashScope SDK

The parameter names in the SDK are mostly consistent with the HTTP API. How the parameters are structured follows the conventions of each programming language.

Because image-to-video tasks can take a long time to complete, typically 1 to 5 minutes, the SDK encapsulates the asynchronous HTTP call process at the underlying layer and supports both synchronous and asynchronous call methods.

The actual time required depends on the number of tasks in the queue and the service execution status. Please be patient while you wait for the result.

Python SDK

The Python SDK supports three image input methods: a public URL, a Base64-encoded string, and a local file path (absolute or relative). You can choose one of these methods. For more information, see Input image.

Important
  • The wan2.6-i2v model does not currently support SDK calls.

  • Make sure that your DashScope Python SDK version is at least 1.25.2 before you run the following code.

    If the version is too low, you may encounter errors such as "url error, please check url!". See Install the SDK to update it.
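The version requirement can be checked at startup. The following sketch compares only the first three numeric components of the version string, which is a simplification, and its helper names are illustrative. `importlib.metadata` is part of the Python standard library (3.8 and later).

```python
import re
from importlib.metadata import PackageNotFoundError, version

MINIMUM_SDK_VERSION = "1.25.2"  # minimum version stated in this document


def as_tuple(version_string: str) -> tuple:
    """Turn '1.25.2' into (1, 25, 2), ignoring any suffix such as 'rc1'."""
    return tuple(int(n) for n in re.findall(r"\d+", version_string)[:3])


def sdk_is_recent_enough(installed: str, minimum: str = MINIMUM_SDK_VERSION) -> bool:
    """Compare versions numerically, not as strings."""
    return as_tuple(installed) >= as_tuple(minimum)


def installed_sdk_version():
    """Return the installed dashscope version, or None if it is not installed."""
    try:
        return version("dashscope")
    except PackageNotFoundError:
        return None
```

Comparing tuples of integers avoids the string-comparison trap in which "1.100.0" sorts before "1.25.2".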

Sample code

Synchronous invocation

A synchronous call blocks and waits until the video generation is complete and the result is returned. This example shows three image input methods: a public URL, Base64 encoding, and a local file path.

Request example
import base64
import os
from http import HTTPStatus
from dashscope import VideoSynthesis
import mimetypes
import dashscope

# The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'


# If you have not configured an environment variable, replace the following line with your Model Studio API key: api_key="sk-xxx"
# The API keys for the Singapore and Beijing regions are different. Get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
api_key = os.getenv("DASHSCOPE_API_KEY")

# --- Helper function for Base64 encoding ---
# Format: data:{MIME_type};base64,{base64_data}
def encode_file(file_path):
    mime_type, _ = mimetypes.guess_type(file_path)
    if not mime_type or not mime_type.startswith("image/"):
        raise ValueError("Unsupported or unknown image format")
    with open(file_path, "rb") as image_file:
        encoded_string = base64.b64encode(image_file.read()).decode('utf-8')
    return f"data:{mime_type};base64,{encoded_string}"

"""
Image input methods:
The following are three image input methods.

1. Use a public URL - Suitable for publicly accessible images.
2. Use a local file - Suitable for local development and testing.
3. Use Base64 encoding - Suitable for private images that are not reachable through a public URL.
"""

# [Method 1] Use a publicly accessible image URL
# Example: Use a public image URL
img_url = "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250925/wpimhv/rap.png"

# [Method 2] Use a local file (supports absolute and relative paths)
# Format: file:// + file path
# Example (absolute path):
# img_url = "file://" + "/path/to/your/img.png"    # Linux/macOS
# img_url = "file://" + "/C:/path/to/your/img.png"  # Windows
# Example (relative path):
# img_url = "file://" + "./img.png"                # Path relative to the current execution file

# [Method 3] Use a Base64-encoded image
# img_url = encode_file("./img.png")

# Set the audio URL
audio_url = "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250925/ozwpvi/rap.mp3"

def sample_call_i2v():
    # Synchronous call, returns the result directly
    print('please wait...')
    rsp = VideoSynthesis.call(api_key=api_key,
                              model='wan2.5-i2v-preview',
                              prompt='A scene of urban fantasy art. A dynamic graffiti art character. A boy painted with spray paint comes to life from a concrete wall. He sings an English rap song at a very fast pace while striking a classic, energetic rapper pose. The scene is set under an urban railway bridge at night. The lighting comes from a single streetlight, creating a cinematic atmosphere full of high energy and amazing detail. The audio of the video consists entirely of his rap, with no other dialogue or noise.',
                              img_url=img_url,
                              audio_url=audio_url,
                              resolution="480P",
                              duration=10,
                              prompt_extend=True,
                              watermark=False,
                              negative_prompt="",
                              seed=12345)
    print(rsp)
    if rsp.status_code == HTTPStatus.OK:
        print("video_url:", rsp.output.video_url)
    else:
        print('Failed, status_code: %s, code: %s, message: %s' %
              (rsp.status_code, rsp.code, rsp.message))


if __name__ == '__main__':
    sample_call_i2v()
Response example
The video_url is valid for 24 hours. Download the video promptly.
{
    "status_code": 200,
    "request_id": "55194b9a-d281-4565-8ef6-xxxxxx",
    "code": null,
    "message": "",
    "output": {
        "task_id": "e2bb35a2-0218-4969-8c0d-xxxxxx",
        "task_status": "SUCCEEDED",
        "video_url": "https://dashscope-result-sh.oss-cn-shanghai.aliyuncs.com/xxx.mp4?Expires=xxx",
        "submit_time": "2025-10-28 13:45:48.620",
        "scheduled_time": "2025-10-28 13:45:57.378",
        "end_time": "2025-10-28 13:48:05.361",
        "orig_prompt": "A scene of urban fantasy art. A dynamic graffiti art character. A boy painted with spray paint comes to life from a concrete wall. He sings an English rap song at a very fast pace while striking a classic, energetic rapper pose. The scene is set under an urban railway bridge at night. The lighting comes from a single streetlight, creating a cinematic atmosphere full of high energy and amazing detail. The audio of the video consists entirely of his rap, with no other dialogue or noise.",
        "actual_prompt": "A boy made of spray paint emerges from a concrete wall, stands still, and begins to sing an English rap, his mouth opening and closing, his head nodding to the rhythm, and his eyes focused. He gives a thumbs-up with his right hand, puts his left hand on his hip, and moves his body rhythmically in place. The background is a night scene under a railway bridge, lit by a single streetlight. The audio is his rap performance, with the lyrics: 'Skyscrapers loom, shadows kiss the pavement. Dreams stack high, but the soul's in the basement. Pocket full of lint, chasing gold like it's sacred. Every breath a gamble, the odds never patient.'"
    },
    "usage": {
        "video_count": 1,
        "video_duration": 0,
        "video_ratio": "",
        "duration": 10,
        "SR": 480
    }
}

Asynchronous invocation

This example shows an asynchronous call. The call returns a task ID immediately. You then either poll the task status or block until the task finishes.

Request example
import os
from http import HTTPStatus
from dashscope import VideoSynthesis
import dashscope

# The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'


# If you have not configured an environment variable, replace the following line with your Model Studio API key: api_key="sk-xxx"
# The API keys for the Singapore and Beijing regions are different. Get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
api_key = os.getenv("DASHSCOPE_API_KEY")

# Use a publicly accessible image URL
img_url = "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250925/wpimhv/rap.png"

# Set the audio URL
audio_url = "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250925/ozwpvi/rap.mp3"


def sample_async_call_i2v():
    # Asynchronous call, returns a task_id
    rsp = VideoSynthesis.async_call(api_key=api_key,
                                    model='wan2.5-i2v-preview',
                                    prompt='A scene of urban fantasy art. A dynamic graffiti art character. A boy painted with spray paint comes to life from a concrete wall. He sings an English rap song at a very fast pace while striking a classic, energetic rapper pose. The scene is set under an urban railway bridge at night. The lighting comes from a single streetlight, creating a cinematic atmosphere full of high energy and amazing detail. The audio of the video consists entirely of his rap, with no other dialogue or noise.',
                                    img_url=img_url,
                                    audio_url=audio_url,
                                    resolution="480P",
                                    duration=10,
                                    prompt_extend=True,
                                    watermark=False,
                                    negative_prompt="",
                                    seed=12345)
    print(rsp)
    if rsp.status_code == HTTPStatus.OK:
        print("task_id: %s" % rsp.output.task_id)
    else:
        print('Failed, status_code: %s, code: %s, message: %s' %
              (rsp.status_code, rsp.code, rsp.message))
        # Stop here: without a task_id there is nothing to fetch or wait for
        return

    # Get asynchronous task information
    status = VideoSynthesis.fetch(task=rsp, api_key=api_key)
    if status.status_code == HTTPStatus.OK:
        print(status.output.task_status)
    else:
        print('Failed, status_code: %s, code: %s, message: %s' %
              (status.status_code, status.code, status.message))

    # Wait for the asynchronous task to finish
    rsp = VideoSynthesis.wait(task=rsp, api_key=api_key)
    print(rsp)
    if rsp.status_code == HTTPStatus.OK:
        print(rsp.output.video_url)
    else:
        print('Failed, status_code: %s, code: %s, message: %s' %
              (rsp.status_code, rsp.code, rsp.message))


if __name__ == '__main__':
    sample_async_call_i2v()
Response example

1. Response example for creating a task

{
    "status_code": 200,
    "request_id": "6dc3bf6c-be18-9268-9c27-xxxxxx",
    "code": "",
    "message": "",
    "output": {
        "task_id": "686391d9-7ecf-4290-a8e9-xxxxxx",
        "task_status": "PENDING",
        "video_url": ""
    },
    "usage": null
}

2. Response example for querying a task result

The video_url is valid for 24 hours. Download the video promptly.
{
    "status_code": 200,
    "request_id": "55194b9a-d281-4565-8ef6-xxxxxx",
    "code": null,
    "message": "",
    "output": {
        "task_id": "e2bb35a2-0218-4969-8c0d-xxxxxx",
        "task_status": "SUCCEEDED",
        "video_url": "https://dashscope-result-sh.oss-cn-shanghai.aliyuncs.com/xxx.mp4?Expires=xxx",
        "submit_time": "2025-10-28 13:45:48.620",
        "scheduled_time": "2025-10-28 13:45:57.378",
        "end_time": "2025-10-28 13:48:05.361",
        "orig_prompt": "A scene of urban fantasy art. A dynamic graffiti art character. A boy painted with spray paint comes to life from a concrete wall. He sings an English rap song at a very fast pace while striking a classic, energetic rapper pose. The scene is set under an urban railway bridge at night. The lighting comes from a single streetlight, creating a cinematic atmosphere full of high energy and amazing detail. The audio of the video consists entirely of his rap, with no other dialogue or noise.",
        "actual_prompt": "A boy made of spray paint emerges from a concrete wall, stands still, and begins to sing an English rap, his mouth opening and closing, his head nodding to the rhythm, and his eyes focused. He gives a thumbs-up with his right hand, puts his left hand on his hip, and moves his body rhythmically in place. The background is a night scene under a railway bridge, lit by a single streetlight. The audio is his rap performance, with the lyrics: 'Skyscrapers loom, shadows kiss the pavement. Dreams stack high, but the soul's in the basement. Pocket full of lint, chasing gold like it's sacred. Every breath a gamble, the odds never patient.'"
    },
    "usage": {
        "video_count": 1,
        "video_duration": 0,
        "video_ratio": "",
        "duration": 10,
        "SR": 480
    }
}
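The polling step of the asynchronous flow can also be written as a small helper with an explicit interval and timeout. The following is a minimal sketch, not part of the SDK: `poll_task`, its parameters, and the default interval are hypothetical. Only PENDING and SUCCEEDED appear in the response examples above, so the other terminal statuses in the set are assumptions.

```python
import time
from typing import Callable

# Assumption: statuses treated as final. Only SUCCEEDED is shown in the examples above.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "CANCELED", "UNKNOWN"}


def poll_task(fetch_status: Callable[[], str],
              interval: float = 15.0,
              timeout: float = 600.0,
              sleep: Callable[[float], None] = time.sleep) -> str:
    """Call fetch_status() until it returns a terminal status or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        # Give up if the next sleep would run past the deadline
        if time.monotonic() + interval > deadline:
            raise TimeoutError("Task did not finish in time, last status: %s" % status)
        sleep(interval)
```

With the Python SDK, the callable can wrap the fetch call from the request example, for instance `lambda: VideoSynthesis.fetch(task=rsp, api_key=api_key).output.task_status`.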

Java SDK

The Java SDK supports three image input methods: a public URL, a Base64-encoded string, or a local file path (absolute paths only). For more information, see Input image.

Important
  • The wan2.6-i2v model does not currently support SDK calls.

  • Make sure that your DashScope Java SDK version is at least 2.22.2 before you run the following code.

    An outdated version may trigger errors such as "url error, please check url!". For more information, see Install the SDK to update it.

Sample code

Synchronous invocation

A synchronous call blocks and waits until the video generation is complete and the result is returned. This example shows three image input methods: a public URL, Base64 encoding, and a local file path.

Request example
// Copyright (c) Alibaba, Inc. and its affiliates.

import com.alibaba.dashscope.aigc.videosynthesis.VideoSynthesis;
import com.alibaba.dashscope.aigc.videosynthesis.VideoSynthesisParam;
import com.alibaba.dashscope.aigc.videosynthesis.VideoSynthesisResult;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.InputRequiredException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.utils.JsonUtils;
import com.alibaba.dashscope.utils.Constants;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

 
public class Image2Video {

    static {
        Constants.baseHttpApiUrl = "https://dashscope-intl.aliyuncs.com/api/v1";
        // The preceding is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
    }

    // If you have not configured an environment variable, replace the following line with your Model Studio API key: apiKey="sk-xxx"
    // The API keys for the Singapore and Beijing regions are different. Get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
    static String apiKey = System.getenv("DASHSCOPE_API_KEY");
    
    /**
     * Image input methods: Choose one of the following three.
     *
     * 1. Use a public URL - Suitable for publicly accessible images.
     * 2. Use a local file - Suitable for local development and testing.
     * 3. Use Base64 encoding - Suitable for private images that cannot be exposed through a public URL. Note that Base64 is an encoding, not encryption.
     */

    // [Method 1] Public URL
    static String imgUrl = "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250925/wpimhv/rap.png";

    // [Method 2] Local file path (file:// + absolute path)
    // static String imgUrl = "file://" + "/your/path/to/img.png";    // Linux/macOS
    // static String imgUrl = "file://" + "/C:/your/path/to/img.png";  // Windows

    // [Method 3] Base64 encoding
    // static String imgUrl = Image2Video.encodeFile("/your/path/to/img.png");
    
    // Set the audio URL
    static String audioUrl = "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250925/ozwpvi/rap.mp3";

    public static void image2video() throws ApiException, NoApiKeyException, InputRequiredException {
        // Set the parameters
        Map<String, Object> parameters = new HashMap<>();
        parameters.put("prompt_extend", true);
        parameters.put("watermark", false);
        parameters.put("seed", 12345);

        VideoSynthesis vs = new VideoSynthesis();
        VideoSynthesisParam param =
                VideoSynthesisParam.builder()
                        .apiKey(apiKey)
                        .model("wan2.5-i2v-preview")
                        .prompt("A scene of urban fantasy art. A dynamic graffiti art character. A boy painted with spray paint comes to life from a concrete wall. He sings an English rap song at a very fast pace while striking a classic, energetic rapper pose. The scene is set under an urban railway bridge at night. The lighting comes from a single streetlight, creating a cinematic atmosphere full of high energy and amazing detail. The audio of the video consists entirely of his rap, with no other dialogue or noise.")
                        .imgUrl(imgUrl)
                        .audioUrl(audioUrl)
                        .duration(10)
                        .parameters(parameters)
                        .resolution("480P")
                        .negativePrompt("")
                        .build();
        System.out.println("please wait...");
        VideoSynthesisResult result = vs.call(param);
        System.out.println(JsonUtils.toJson(result));
    }
    
    /**
     * Encodes a file into a Base64 string.
     * @param filePath The file path.
     * @return A Base64 string in the format: data:{MIME_type};base64,{base64_data}
     */
    public static String encodeFile(String filePath) {
        Path path = Paths.get(filePath);
        if (!Files.exists(path)) {
            throw new IllegalArgumentException("File does not exist: " + filePath);
        }
        // Detect MIME type
        String mimeType = null;
        try {
            mimeType = Files.probeContentType(path);
        } catch (IOException e) {
            throw new IllegalArgumentException("Cannot detect file type: " + filePath);
        }
        if (mimeType == null || !mimeType.startsWith("image/")) {
            throw new IllegalArgumentException("Unsupported or unknown image format");
        }
        // Read file content and encode
        byte[] fileBytes = null;
        try {
            fileBytes = Files.readAllBytes(path);
        } catch (IOException e) {
            throw new IllegalArgumentException("Cannot read file content: " + filePath);
        }
    
        String encodedString = Base64.getEncoder().encodeToString(fileBytes);
        return "data:" + mimeType + ";base64," + encodedString;
    }
    

    public static void main(String[] args) {
        try {
            image2video();
        } catch (ApiException | NoApiKeyException | InputRequiredException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}
Response example
The video_url is valid for 24 hours. Download the video promptly.
{
    "request_id": "f1bfb531-6e13-4e17-8e93-xxxxxx",
    "output": {
        "task_id": "9ddebba6-f784-4f55-b845-xxxxxx",
        "task_status": "SUCCEEDED",
        "video_url": "https://dashscope-result-sh.oss-cn-shanghai.aliyuncs.com/xxx.mp4?Expires=xxx"
    },
    "usage": {
        "video_count": 1
    }
}

Asynchronous invocation

This example shows an asynchronous call. The call returns a task ID immediately. You then either poll the task status or block until the task finishes.

Request example
// Copyright (c) Alibaba, Inc. and its affiliates.

import com.alibaba.dashscope.aigc.videosynthesis.VideoSynthesis;
import com.alibaba.dashscope.aigc.videosynthesis.VideoSynthesisListResult;
import com.alibaba.dashscope.aigc.videosynthesis.VideoSynthesisParam;
import com.alibaba.dashscope.aigc.videosynthesis.VideoSynthesisResult;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.InputRequiredException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.task.AsyncTaskListParam;
import com.alibaba.dashscope.utils.JsonUtils;
import com.alibaba.dashscope.utils.Constants;

import java.util.HashMap;
import java.util.Map;

public class Image2Video {

    static {
        // The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
        Constants.baseHttpApiUrl = "https://dashscope-intl.aliyuncs.com/api/v1";
    }

    // If you have not configured an environment variable, replace the following line with your Model Studio API key: apiKey="sk-xxx"
    // The API keys for the Singapore and Beijing regions are different. Get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
    static String apiKey = System.getenv("DASHSCOPE_API_KEY");
    // Set the input image URL
    static String imgUrl = "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250925/wpimhv/rap.png";

    // Set the audio URL
    static String audioUrl = "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250925/ozwpvi/rap.mp3";

    public static void image2video() throws ApiException, NoApiKeyException, InputRequiredException {
        // Set the parameters
        Map<String, Object> parameters = new HashMap<>();
        parameters.put("prompt_extend", true);
        parameters.put("watermark", false);
        parameters.put("seed", 12345);

        VideoSynthesis vs = new VideoSynthesis();
        VideoSynthesisParam param =
                VideoSynthesisParam.builder()
                        .apiKey(apiKey)
                        .model("wan2.5-i2v-preview")
                        .prompt("A scene of urban fantasy art. A dynamic graffiti art character. A boy painted with spray paint comes to life from a concrete wall. He sings an English rap song at a very fast pace while striking a classic, energetic rapper pose. The scene is set under an urban railway bridge at night. The lighting comes from a single streetlight, creating a cinematic atmosphere full of high energy and amazing detail. The audio of the video consists entirely of his rap, with no other dialogue or noise.")
                        .imgUrl(imgUrl)
                        .audioUrl(audioUrl)
                        .duration(10)
                        .parameters(parameters)
                        .resolution("480P")
                        .negativePrompt("")
                        .build();
        // Asynchronous call
        VideoSynthesisResult task = vs.asyncCall(param);
        System.out.println(JsonUtils.toJson(task));
        System.out.println("please wait...");

        // Get the result
        VideoSynthesisResult result = vs.wait(task, apiKey);
        System.out.println(JsonUtils.toJson(result));
    }

    // Get the task list
    public static void listTask() throws ApiException, NoApiKeyException {
        VideoSynthesis is = new VideoSynthesis();
        AsyncTaskListParam param = AsyncTaskListParam.builder().build();
        param.setApiKey(apiKey);
        VideoSynthesisListResult result = is.list(param);
        System.out.println(result);
    }

    // Get a single task result
    public static void fetchTask(String taskId) throws ApiException, NoApiKeyException {
        VideoSynthesis is = new VideoSynthesis();
        // If DASHSCOPE_API_KEY is set as an environment variable, apiKey can be null.
        VideoSynthesisResult result = is.fetch(taskId, apiKey);
        System.out.println(result.getOutput());
        System.out.println(result.getUsage());
    }

    public static void main(String[] args) {
        try {
            image2video();
        } catch (ApiException | NoApiKeyException | InputRequiredException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}
Response example

1. Response example for creating a task

{
    "request_id": "5dbf9dc5-4f4c-9605-85ea-xxxxxxxx",
    "output": {
        "task_id": "7277e20e-aa01-4709-xxxxxxxx",
        "task_status": "PENDING"
    }
}

2. Response example for querying a task result

The video_url is valid for 24 hours. Download the video promptly.
{
    "request_id": "f1bfb531-6e13-4e17-8e93-xxxxxx",
    "output": {
        "task_id": "9ddebba6-f784-4f55-b845-xxxxxx",
        "task_status": "SUCCEEDED",
        "video_url": "https://dashscope-result-sh.oss-cn-shanghai.aliyuncs.com/xxx.mp4?Expires=xxx"
    },
    "usage": {
        "video_count": 1
    }
}

Limitations

  • Data retention: The `task_id` and video URL are retained for only 24 hours. After this period, they cannot be queried or downloaded.

  • Audio support: wan2.5 and later versions generate videos with audio by default, using either automatic dubbing or a custom audio file that you provide. wan2.2 and earlier versions output only silent videos. To add audio to those videos, generate it separately, for example with speech synthesis.

  • Content moderation: The input content, such as prompts and images, and the output video are subject to content moderation. Non-compliant content results in an "IPInfringementSuspect" or "DataInspectionFailed" error. For more information, see Error messages.

  • Network access configuration: Video links are stored in Object Storage Service (OSS). If your business system cannot access external OSS links due to security policies, you must add the following OSS domain names to your network access whitelist.

    # OSS domain name list
    dashscope-result-bj.oss-cn-beijing.aliyuncs.com
    dashscope-result-hz.oss-cn-hangzhou.aliyuncs.com
    dashscope-result-sh.oss-cn-shanghai.aliyuncs.com
    dashscope-result-wlcb.oss-cn-wulanchabu.aliyuncs.com
    dashscope-result-zjk.oss-cn-zhangjiakou.aliyuncs.com
    dashscope-result-sz.oss-cn-shenzhen.aliyuncs.com
    dashscope-result-hy.oss-cn-heyuan.aliyuncs.com
    dashscope-result-cd.oss-cn-chengdu.aliyuncs.com
    dashscope-result-gz.oss-cn-guangzhou.aliyuncs.com
    dashscope-result-wlcb-acdr-1.oss-cn-wulanchabu-acdr-1.aliyuncs.com
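Because the video URL expires after 24 hours, it is usually worth saving the file as soon as the task succeeds. The following is a minimal sketch that uses only the Python standard library. The function name `download_video` and the destination path are hypothetical, and the host running it must be able to reach the OSS domain names listed above.

```python
import shutil
import urllib.request
from pathlib import Path


def download_video(video_url: str, dest: str) -> Path:
    """Stream the generated video to a local file and return the local path."""
    dest_path = Path(dest)
    # Create the target directory if it does not exist yet
    dest_path.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(video_url, timeout=60) as response, \
            open(dest_path, "wb") as out:
        # Copy in chunks so that the whole video is never held in memory
        shutil.copyfileobj(response, out)
    return dest_path
```

A typical call passes the `video_url` from a SUCCEEDED response, for example `download_video(rsp.output.video_url, "./output/result.mp4")`.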

Key parameter descriptions

Input image

The img_url parameter accepts the input image in one of the following three ways:

Method 1: Public URL

  • A publicly accessible address that supports HTTP or HTTPS.

  • Example: https://example.com/images/cat.png.

Method 2: Base64 encoding

Sample code

import base64
import mimetypes


# --- For Base64 encoding ---
# Format: data:{MIME_type};base64,{base64_data}
def encode_file(file_path):
    mime_type, _ = mimetypes.guess_type(file_path)
    if not mime_type or not mime_type.startswith("image/"):
        raise ValueError("Unsupported or unknown image format")
    with open(file_path, "rb") as image_file:
        encoded_string = base64.b64encode(image_file.read()).decode('utf-8')
    return f"data:{mime_type};base64,{encoded_string}"


if __name__ == "__main__":
    print(encode_file("./image_demo_input.png"))
  • Example value: ...... (truncated here because of its length). When you make a call, pass the complete string.

  • Encoding format: Use the data:{MIME_type};base64,{base64_data} format, where:

    • {base64_data}: The Base64-encoded string of the image file.

    • {MIME_type}: The media type of the image, which must correspond to the file format.

      | Image format | MIME type |
      | --- | --- |
      | JPEG | image/jpeg |
      | JPG | image/jpeg |
      | PNG | image/png |
      | BMP | image/bmp |
      | WEBP | image/webp |

Method 3: Local file path (SDK only)

  • Python SDK: Supports both absolute and relative file paths. The file path rules are as follows:

    | System | Passed file path | Example (absolute path) | Example (relative path) |
    | --- | --- | --- | --- |
    | Linux or macOS | file://{absolute or relative path of the file} | file:///home/images/test.png | file://./images/test.png |
    | Windows | file://{absolute or relative path of the file} | file://D:/images/test.png | file://./images/test.png |

  • Java SDK: Supports only the absolute path of the file. The file path rules are as follows:

    | System | Passed file path | Example (absolute path) |
    | --- | --- | --- |
    | Linux or macOS | file://{absolute path of the file} | file:///home/images/test.png |
    | Windows | file:///{absolute path of the file} | file:///D:/images/test.png |
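The Python SDK path rules above can be expressed as a small helper. This is a sketch only: `to_python_sdk_file_url` is a hypothetical name, and it covers the Python SDK format, not the Java SDK format, which expects `file:///` before a Windows drive letter.

```python
from pathlib import PurePath, PurePosixPath, PureWindowsPath


def to_python_sdk_file_url(path: PurePath) -> str:
    """Build the file:// string for img_url following the Python SDK path rules."""
    posix = path.as_posix()  # forward slashes on every platform
    if not path.is_absolute():
        # pathlib drops a leading "./", so restore it for relative paths
        posix = "./" + posix
    return "file://" + posix


print(to_python_sdk_file_url(PurePosixPath("/home/images/test.png")))
# file:///home/images/test.png
print(to_python_sdk_file_url(PureWindowsPath(r"D:\images\test.png")))
# file://D:/images/test.png
print(to_python_sdk_file_url(PurePosixPath("./images/test.png")))
# file://./images/test.png
```

In application code, `pathlib.Path(...)` can be passed instead of the pure path classes, which are used here so that the example behaves the same on any operating system.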

Audio settings

Supported models: wan2.6-i2v, wan2.5-i2v-preview.

Audio settings: wan2.5 and later models generate videos with audio by default. The audio behavior is determined by whether input.audio_url is passed. Two modes are supported:

  1. Automatic dubbing: When audio_url is not passed, the model automatically generates matching background audio or music based on the prompt and visual content.

  2. Use custom audio: When audio_url is passed, the model uses your provided audio file to generate the video. The video content is synchronized with the audio, such as lip movements and rhythm.
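The two audio modes differ only in whether audio_url is sent. The following is a minimal sketch of that choice. The helper `build_i2v_kwargs` is hypothetical, and the keyword names mirror those in the Python request examples.

```python
from typing import Any, Dict, Optional


def build_i2v_kwargs(prompt: str,
                     img_url: str,
                     audio_url: Optional[str] = None) -> Dict[str, Any]:
    """Build keyword arguments for an image-to-video call with either audio mode."""
    kwargs: Dict[str, Any] = {
        "model": "wan2.5-i2v-preview",
        "prompt": prompt,
        "img_url": img_url,
    }
    if audio_url:
        # Custom audio: the video is synchronized with this file
        kwargs["audio_url"] = audio_url
    # Without audio_url the key is omitted, which selects automatic dubbing
    return kwargs
```

The result can be unpacked into the SDK call, for example `VideoSynthesis.async_call(api_key=api_key, **build_i2v_kwargs(prompt, img_url))`.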

Billing and rate limiting

  • For free quota and pricing, see Models.

  • For model rate limits, see Wan series.

  • Billing description:

    • You are charged based on the duration in seconds of successfully generated videos. A charge is incurred only when the query result API returns a task_status of SUCCEEDED and the video is successfully generated.

    • Failed model calls or processing errors do not incur any fees or consume the free quota.
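For rough cost tracking, the billing rule above can be applied to stored task results. The following is a sketch under assumptions: `billable_seconds` is a hypothetical helper, the inputs are response dictionaries shaped like the Python response examples, and `usage.duration` is assumed to be the field that holds the generated length in seconds. Check the pricing page for the authoritative billing details.

```python
from typing import Iterable, Mapping


def billable_seconds(task_results: Iterable[Mapping]) -> int:
    """Sum the video seconds of tasks that finished with task_status SUCCEEDED."""
    total = 0
    for result in task_results:
        output = result.get("output") or {}
        usage = result.get("usage") or {}
        if output.get("task_status") == "SUCCEEDED":
            # Assumption: usage["duration"] is the generated length in seconds
            total += int(usage.get("duration", 0))
        # Failed or unfinished tasks contribute nothing
    return total
```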

Error codes

If a model call fails and returns an error message, see Error messages for a solution.

FAQ

Video FAQ quick link: FAQ.

Q: How do I generate a video with a specific aspect ratio, such as 3:4?

A: The aspect ratio of the output video is determined by the input first frame image (img_url), but an exact ratio, such as a strict 3:4, cannot be guaranteed.

How it works: The model uses the aspect ratio of the input image as a baseline and then adapts it to a supported resolution based on the resolution parameter, such as 480P, 720P, or 1080P. Because the output width and height must each be divisible by 16, the final aspect ratio may deviate slightly, for example from 0.75 to 0.739. This is normal behavior.

  • Example: An input image is 750 × 1000 (aspect ratio 3:4 = 0.75), and `resolution` is set to "720P" (target total pixels approx. 920,000). The actual output is 816 × 1104 (aspect ratio ≈ 0.739, total pixels approx. 900,000).

  • Note that the resolution parameter mainly controls the video's definition (total pixel count). The final video aspect ratio is still based on the input image, with only necessary minor adjustments.
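The arithmetic in the example can be approximated in a few lines. This is an illustrative sketch, not the documented sizing algorithm: the function name, the use of 1280 × 720 as the 720P pixel budget, and rounding each side down to a multiple of 16 are all assumptions, chosen because they reproduce the 816 × 1104 result above.

```python
import math


def estimate_output_size(img_width: int, img_height: int,
                         target_pixels: int, multiple: int = 16):
    """Estimate output width and height for an input image and a pixel budget."""
    ratio = img_width / img_height
    # Ideal dimensions that keep the ratio and match the pixel budget
    height = math.sqrt(target_pixels / ratio)
    width = height * ratio

    def snap(value: float) -> int:
        # Round down to the nearest multiple (assumed rounding direction)
        return max(multiple, int(value // multiple) * multiple)

    return snap(width), snap(height)


w, h = estimate_output_size(750, 1000, 1280 * 720)
print(w, h, round(w / h, 3))  # 816 1104 0.739
```

Running the same estimate on a candidate input image before submitting a task gives a rough idea of how far the output ratio may drift from the target.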

Best practices: To strictly match a target aspect ratio, use an input image with that ratio and then post-process the output video by cropping or padding it. For example, you can use a video editing tool to crop the output video to the target ratio, or add black bars or a blurred background for padding.

Appendix

Examples of basic image-to-video features

Model Features

Input first frame image

Input prompt

Output video

Silent video

image

A cat running on the grass