Alibaba Cloud Model Studio: Wan image-to-video API reference

Last Updated: Apr 03, 2026

The Wan image-to-video model supports multi-modal input (text, images, audio, and video) and can perform three main tasks: video generation from the first frame, video generation from the first and last frames, and video continuation.

Note

The new image-to-video API (wan2.7-i2v model) supports all three tasks. Use the new API.

The original first-frame image-to-video API (wan2.6 and earlier models) supports only video generation from the first frame.

Availability

For API calls to succeed, the model, endpoint URL, and API key must all belong to the same region. Cross-region calls fail.

Note

The sample code in this topic applies to the Singapore region.

HTTP

Important

This API uses the new image-to-video protocol and supports only the wan2.7 model.

Image-to-video tasks take 1 to 5 minutes, so the API uses asynchronous invocation. The process has two steps:

Step 1: Create a task and get the task ID

Singapore

POST https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis

Beijing

POST https://dashscope.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis

Note
  • After the task is created, use the returned task_id to query the result. The task_id is valid for 24 hours. Do not create duplicate tasks. Instead, use polling to retrieve the result.

  • For a beginner's tutorial, see Postman.

Request parameters

Video generation from the first frame

Generate a video based on a first frame image and audio.

curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis' \
    -H 'X-DashScope-Async: enable' \
    -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
    -H 'Content-Type: application/json' \
    -d '{
    "model": "wan2.7-i2v",
    "input": {
        "prompt": "A scene of urban fantasy art. A dynamic graffiti art character. A boy made of spray paint comes to life on a concrete wall. He sings an English rap song at high speed while striking a classic, energetic rapper pose. The scene is set under an urban railway bridge at night. The light comes from a single street lamp, creating a cinematic atmosphere full of high energy and amazing detail. The audio of the video consists entirely of the rap, with no other dialogue or noise.",
        "media": [
            {
                "type": "first_frame",
                "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250925/wpimhv/rap.png"
            },
            {
                "type": "driving_audio",
                "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250925/ozwpvi/rap.mp3"
            }
        ]
    },
    "parameters": {
        "resolution": "720P",
        "duration": 10,
        "prompt_extend": true,
        "watermark": true
    }
}'

Video generation from the first and last frames

Pass a first frame and a last frame to generate a video.

curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis' \
    -H 'X-DashScope-Async: enable' \
    -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
    -H 'Content-Type: application/json' \
    -d '{
    "model": "wan2.7-i2v",
    "input": {
        "prompt": "Realistic style, a small black cat looks up at the sky curiously. The camera angle gradually rises from eye level, finally capturing its curious gaze from a top-down view.",
        "media": [
            {
                "type": "first_frame",
                "url": "https://wanx.alicdn.com/material/20250318/first_frame.png"
            },
            {
                "type": "last_frame",
                "url": "https://wanx.alicdn.com/material/20250318/last_frame.png"
            }
        ]
    },
    "parameters": {
        "resolution": "720P",
        "duration": 10,
        "prompt_extend": false,
        "watermark": true
    }
}'

Video continuation

Generate subsequent content based on an initial video clip.

curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis' \
    -H 'X-DashScope-Async: enable' \
    -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
    -H 'Content-Type: application/json' \
    -d '{
    "model": "wan2.7-i2v",
    "input": {
        "prompt": "A dog wearing sunglasses skateboards on a street, 3D cartoon.",
        "media": [
            {
                "type": "first_clip",
                "url": "http://wanx.alicdn.com/material/20250318/video_extension_1.mp4"
            }
        ]
    },
    "parameters": {
        "resolution": "720P",
        "duration": 10,
        "prompt_extend": true,
        "watermark": true
    }
}'

Headers

Content-Type string (Required)

The content type of the request. Must be application/json.

Authorization string (Required)

The authentication credentials, which use a Model Studio API key.

Example: Bearer sk-xxxx

X-DashScope-Async string (Required)

Enables asynchronous processing. Must be set to enable, because HTTP requests support only asynchronous processing.

Important

If this header is omitted, the API returns the error "current user api does not support synchronous calls".
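As a rough illustration of how the three headers fit together, the following Python sketch builds the task-creation request with the standard library. The helper name build_create_request and the placeholder key are assumptions for illustration; the endpoint, header names, and body fields are taken from the curl examples above. The request is only constructed here, not sent.

```python
import json
import urllib.request

# Singapore endpoint from this topic; use the Beijing URL for that region.
CREATE_URL = (
    "https://dashscope-intl.aliyuncs.com"
    "/api/v1/services/aigc/video-generation/video-synthesis"
)


def build_create_request(api_key, body, url=CREATE_URL):
    """Build (but do not send) the POST request that creates a task.

    build_create_request is a hypothetical helper, not part of any SDK.
    """
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
            # Required: HTTP calls support only asynchronous processing.
            "X-DashScope-Async": "enable",
        },
    )


body = {
    "model": "wan2.7-i2v",
    "input": {
        "prompt": "A kitten runs on the grass.",
        "media": [{"type": "first_frame", "url": "https://xxx/xxx.png"}],
    },
    "parameters": {"resolution": "720P", "duration": 5},
}
request = build_create_request("sk-xxxx", body)
# To send it: urllib.request.urlopen(request), then JSON-decode the response.
```

Keeping construction separate from sending makes it easy to inspect the headers and body before any billable call is made.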

Request body

model string (Required)

The model name. For a list of models and their pricing, see Model pricing.

Example: wan2.7-i2v.

input object (Required)

The basic input information, such as the prompt.

Properties

prompt string (Optional)

The text prompt. Describes the elements and visual characteristics of the generated video.

Chinese and English are supported. Each Chinese character or letter counts as one character. Text that exceeds the limit is automatically truncated. The length limit varies by model version:

  • wan2.7-i2v: up to 5,000 characters.

Example: A kitten runs on the grass.

For more information about how to use prompts, see Prompt guide for text-to-video and image-to-video.

negative_prompt string (Optional)

The negative prompt. Describes content you do not want in the video.

Chinese and English are supported. The prompt can be up to 500 characters long. Text that exceeds the limit is automatically truncated.

Example: low resolution, error, worst quality, low quality, deformed, extra fingers, bad proportions.

media array (Required)

Specifies reference materials (images, audio, and video) for video generation.

Each element in the array is a media object that contains the type and url fields.

Asset combinations

Only the following asset combinations are supported. Invalid combinations result in an error.

  • Video generation from the first frame:

    • First frame: first_frame

    • First frame + audio: first_frame+driving_audio

  • Video generation from the first and last frames:

    • First frame + last frame: first_frame+last_frame

    • First frame + last frame + audio: first_frame+last_frame+driving_audio

  • Video continuation:

    • First video clip continuation: first_clip

    • First video clip + last frame continuation: first_clip+last_frame
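The combinations listed above can be checked on the client before a task is submitted. The sketch below is one possible way to do that in Python; check_media is a hypothetical helper, and the service's own validation remains the source of truth. It also applies the rule that each media type may appear only once in the array.

```python
# Allowed sets of media types, copied from the list of asset combinations.
ALLOWED_COMBINATIONS = {
    frozenset({"first_frame"}),
    frozenset({"first_frame", "driving_audio"}),
    frozenset({"first_frame", "last_frame"}),
    frozenset({"first_frame", "last_frame", "driving_audio"}),
    frozenset({"first_clip"}),
    frozenset({"first_clip", "last_frame"}),
}


def check_media(media):
    """Raise ValueError if the media array is not a supported combination.

    check_media is a hypothetical client-side helper for illustration.
    """
    types = [item["type"] for item in media]
    if len(types) != len(set(types)):
        raise ValueError("each media type can appear at most once")
    if frozenset(types) not in ALLOWED_COMBINATIONS:
        raise ValueError(f"unsupported combination: {'+'.join(types)}")


# A first frame with driving audio is a supported combination.
check_media([
    {"type": "first_frame", "url": "https://xxx/xxx.png"},
    {"type": "driving_audio", "url": "https://xxx/xxx.mp3"},
])
```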

Properties

type string (Required)

The type of media asset. Valid values:

  • first_frame

  • last_frame

  • driving_audio

  • first_clip

Limit: Each type can appear at most once in the media array. For example, you cannot pass two first_frame assets.

url string (Required)

The URL of the media asset.

Pass an image (type=first_frame or last_frame)

The URL of the first or last frame.

Image limits:

  • Format: JPEG, JPG, PNG (alpha channel not supported), BMP, WEBP.

  • Resolution: The width and height must be in the range of [240, 8000] pixels.

  • Aspect ratio: 1:8 to 8:1.

  • File size: up to 20 MB.

Supported input formats:

  1. Public URL:

    • The HTTP and HTTPS protocols are supported.

    • Example: https://xxx/xxx.png.
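The image limits above lend themselves to a quick pre-flight check. The following Python sketch assumes you already know the image's width, height, file size, and format (for example, from an image library); image_problems is a hypothetical helper, and treating 1 MB as 1024 × 1024 bytes is an assumption, since this topic does not define the unit. It does not detect a PNG alpha channel.

```python
SUPPORTED_IMAGE_FORMATS = {"JPEG", "JPG", "PNG", "BMP", "WEBP"}
MAX_IMAGE_BYTES = 20 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes


def image_problems(width, height, size_bytes, fmt):
    """Return a list of limit violations; an empty list means none were found."""
    problems = []
    if fmt.upper() not in SUPPORTED_IMAGE_FORMATS:
        problems.append(f"unsupported format: {fmt}")
    if not (240 <= width <= 8000 and 240 <= height <= 8000):
        problems.append("width and height must be within [240, 8000] pixels")
    if not (1 / 8 <= width / height <= 8):
        problems.append("aspect ratio must be between 1:8 and 8:1")
    if size_bytes > MAX_IMAGE_BYTES:
        problems.append("file size exceeds 20 MB")
    return problems


# A 750 x 1000 PNG of about 1 MB passes every check in this sketch.
print(image_problems(750, 1000, 1_000_000, "png"))
```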

Pass audio (type=driving_audio)

The URL of the audio file.

  • If you pass audio: The model uses the audio as a driving source for the video, for example for lip-syncing and action timing.

  • If you do not pass audio: The model automatically generates matching background music or sound effects based on the video content.

Audio limits:

  • Format: WAV, MP3.

  • Duration: 2 s to 30 s.

  • File size: up to 15 MB.

  • Truncation: If the audio is longer than the duration value (for example, 5 s), only the first 5 s are used and the rest is discarded. If the audio is shorter than the video, the remainder of the video is silent. For example, with 3 s of audio and a 5-s video, the first 3 s of the output have sound and the last 2 s are silent.

Supported input formats:

  1. Public URL:

    • The HTTP and HTTPS protocols are supported.

    • Example: https://xxx/xxx.mp3.
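The truncation rule under the audio limits comes down to simple arithmetic, sketched below in Python. The function name audio_coverage is hypothetical and exists only to illustrate the rule.

```python
def audio_coverage(audio_seconds, video_seconds):
    """Return (seconds of audio used, seconds of silent video)."""
    used = min(audio_seconds, video_seconds)  # extra audio is discarded
    silent = video_seconds - used             # the rest of the video has no sound
    return used, silent


print(audio_coverage(3, 5))  # 3 s of audio, 5-s video -> (3, 2)
print(audio_coverage(8, 5))  # 8 s of audio, 5-s video -> (5, 0)
```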

Pass a video (type=first_clip)

The URL of the video file. The model continues the video based on its content. The duration parameter sets the total length of the output video, including the input clip, so it also caps the length of the generated continuation.

For example, if duration=15 and the input video is 3 s long, the model generates a 12-s continuation. The final output video is 15 s long and is billed for 15 s.
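The arithmetic in this example can be written as a small Python sketch. continuation_plan is a hypothetical helper; this topic does not say what happens when the input clip is as long as or longer than duration, so the sketch rejects that case rather than guess.

```python
def continuation_plan(duration, input_clip_seconds):
    """Return (generated_seconds, billed_seconds) for a continuation task."""
    if input_clip_seconds >= duration:
        # Behavior for this case is not documented in this topic.
        raise ValueError("this sketch assumes the input clip is shorter than duration")
    generated = duration - input_clip_seconds
    return generated, duration  # billing covers the full output length


print(continuation_plan(15, 3))  # -> (12, 15)
```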

Video limits:

  • Format: MP4, MOV.

  • Duration: 2 s to 10 s.

  • Resolution: The width and height must be in the range of [240, 4096] pixels.

  • Aspect ratio: 1:8 to 8:1.

  • File size: up to 100 MB.

Supported input formats:

  1. Public URL:

    • The HTTP and HTTPS protocols are supported.

    • Example: https://xxx/xxx.mp4.

parameters object (Optional)

Video processing parameters, such as resolution, duration, prompt rewriting, and watermarks.

Properties

resolution string (Optional)

Important

The resolution directly affects the cost. Before you make a call, confirm the Model pricing.

The resolution tier for the generated video. Controls the total pixel count.

The model automatically scales the video so that its total pixel count is close to the selected resolution tier. The output aspect ratio follows the input material (first frame or first video clip) as closely as possible. For more information, see FAQ.

  • wan2.7-i2v: Valid values are 720P and 1080P. Default: 1080P.

Example: 1080P.

duration integer (Optional)

Important

The duration directly affects the cost. Billing is by the second. Before you make a call, confirm the Model pricing.

The duration of the generated video in seconds.

  • wan2.7-i2v: an integer from 2 to 15. Default: 5.

Example: 5.

prompt_extend boolean (Optional)

Specifies whether to enable prompt rewriting. When enabled, a large language model rewrites the input prompt. This can improve results for short prompts but increases the running time.

  • true (default)

  • false

Example: true.

watermark boolean (Optional)

Specifies whether to add a watermark. The watermark is placed in the lower-right corner of the video and contains the fixed text "AI Generated".

  • false (default)

  • true

Example: false.

seed integer (Optional)

The random number seed. Must be an integer between 0 and 2147483647.

If not provided, a random seed is generated. Using a fixed seed improves reproducibility, though results may still vary due to model randomness.

Example: 12345
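Because resolution and duration affect cost, it can help to validate the parameters object before sending it. The Python sketch below mirrors the value ranges and defaults documented above for wan2.7-i2v; build_parameters is a hypothetical helper, not part of the API.

```python
def build_parameters(resolution="1080P", duration=5, prompt_extend=True,
                     watermark=False, seed=None):
    """Return a parameters dict after checking the documented value ranges."""
    if resolution not in ("720P", "1080P"):
        raise ValueError("resolution must be 720P or 1080P")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(duration, bool) or not isinstance(duration, int):
        raise ValueError("duration must be an integer")
    if not 2 <= duration <= 15:
        raise ValueError("duration must be between 2 and 15 seconds")
    if seed is not None and not 0 <= seed <= 2147483647:
        raise ValueError("seed must be between 0 and 2147483647")
    params = {
        "resolution": resolution,
        "duration": duration,
        "prompt_extend": prompt_extend,
        "watermark": watermark,
    }
    if seed is not None:
        params["seed"] = seed  # omit the key so the service picks a random seed
    return params


print(build_parameters(resolution="720P", duration=10, seed=12345))
```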

Response parameters

Successful response

Save the task_id to query the task status and result.

{
    "output": {
        "task_status": "PENDING",
        "task_id": "0385dc79-5ff8-4d82-bcb6-xxxxxx"
    },
    "request_id": "4909100c-7b5a-9f92-bfe5-xxxxxx"
}

Error response

Returned when task creation fails. See error codes to resolve the issue.

{
    "code": "InvalidApiKey",
    "message": "No API-key provided.",
    "request_id": "7438d53d-6eb8-4596-8835-xxxxxx"
}
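The two response shapes above can be told apart by the presence of a top-level code field. The following Python sketch shows one way to handle both; TaskCreationError and extract_task_id are hypothetical names used only for illustration.

```python
import json


class TaskCreationError(Exception):
    """Raised when the create-task response carries an error code."""


def extract_task_id(response_text):
    """Return the task_id from a create-task response, or raise on error."""
    payload = json.loads(response_text)
    if "code" in payload:
        # Error responses carry code, message, and request_id at the top level.
        raise TaskCreationError(
            f"{payload['code']}: {payload.get('message', '')} "
            f"(request_id={payload.get('request_id')})"
        )
    return payload["output"]["task_id"]


success = '{"output": {"task_status": "PENDING", "task_id": "0385dc79-xxxxxx"}, "request_id": "4909100c-xxxxxx"}'
print(extract_task_id(success))
```

Including the request_id in the exception text keeps it available for troubleshooting.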

output object

The task output information.

Properties

task_id string

The ID of the task. Can be used to query the task for up to 24 hours.

task_status string

The status of the task.

Enumeration

  • PENDING

  • RUNNING

  • SUCCEEDED

  • FAILED

  • CANCELED

  • UNKNOWN: Task does not exist or status is unknown

request_id string

Unique identifier for the request. Use for tracing and troubleshooting issues.

code string

The error code. Returned only when the request fails. See error codes for details.

message string

Detailed error message. Returned only when the request fails. See error codes for details.

Step 2: Query the result by task ID

Singapore

GET https://dashscope-intl.aliyuncs.com/api/v1/tasks/{task_id}

Beijing

GET https://dashscope.aliyuncs.com/api/v1/tasks/{task_id}

Note
  • Polling suggestion: Video generation can take several minutes. Use a polling mechanism with a reasonable query interval, such as 15 seconds, to retrieve the result.

  • Task status transition: PENDING → RUNNING → SUCCEEDED or FAILED.

  • Result URL: After the task is successful, a video URL is returned. The URL is valid for 24 hours. After you retrieve the URL, you must immediately download and save the video to a permanent storage service, such as Object Storage Service (OSS).

  • task_id validity: 24 hours. After this period, you cannot query the result, and the API returns a task status of UNKNOWN.
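The polling suggestion above might be implemented along the following lines in Python. This is a sketch under stated assumptions: fetch_task and wait_for_task are hypothetical helpers, the 10-minute timeout is an arbitrary choice, and UNKNOWN is treated as a terminal status because further polling cannot recover an expired or missing task.

```python
import json
import time
import urllib.request

# Singapore endpoint from this topic; use the Beijing URL for that region.
TASK_URL = "https://dashscope-intl.aliyuncs.com/api/v1/tasks/{task_id}"
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "CANCELED", "UNKNOWN"}


def fetch_task(task_id, api_key):
    """Send one GET request for the task and return the decoded JSON body."""
    request = urllib.request.Request(
        TASK_URL.format(task_id=task_id),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_task(fetch_status, interval=15.0, timeout=600.0,
                  sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status until the task reaches a terminal status.

    fetch_status is any zero-argument callable that returns the query
    response as a dict. interval follows the 15-second suggestion above;
    timeout is an assumption of this sketch.
    """
    deadline = clock() + timeout
    while True:
        result = fetch_status()
        if result["output"]["task_status"] in TERMINAL_STATUSES:
            return result
        if clock() >= deadline:
            raise TimeoutError("task did not finish before the timeout")
        sleep(interval)
```

A typical call would be wait_for_task(lambda: fetch_task(task_id, api_key)). Passing the fetch function in as an argument keeps the loop testable without network access.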

Request parameters

Query task result

Replace {task_id} with the task_id value returned by the previous API call. The task_id can be queried for 24 hours.

curl -X GET https://dashscope-intl.aliyuncs.com/api/v1/tasks/{task_id} \
    --header "Authorization: Bearer $DASHSCOPE_API_KEY"

Headers

Authorization string (Required)

The authentication credentials, which use a Model Studio API key.

Example: Bearer sk-xxxx

Path parameters

task_id string (Required)

The ID of the task to query.

Response parameters

Task successful

Video URLs are retained for only 24 hours and then automatically purged. Save generated videos promptly.

{
    "request_id": "2ca1c497-f9e0-449d-9a3f-xxxxxx",
    "output": {
        "task_id": "af6efbc0-4bef-4194-8246-xxxxxx",
        "task_status": "SUCCEEDED",
        "submit_time": "2025-09-25 11:07:28.590",
        "scheduled_time": "2025-09-25 11:07:35.349",
        "end_time": "2025-09-25 11:17:11.650",
        "orig_prompt": "A scene of urban fantasy art. A dynamic graffiti art character. A boy made of spray paint comes to life on a concrete wall. He sings an English rap song at high speed while striking a classic, energetic rapper pose. The scene is set under an urban railway bridge at night. The light comes from a single street lamp, creating a cinematic atmosphere full of high energy and amazing detail. The audio of the video consists entirely of his rap, with no other dialogue or noise.",
        "video_url": "https://dashscope-result-sh.oss-cn-shanghai.aliyuncs.com/xxx.mp4?Expires=xxx"
    },
    "usage": {
        "duration": 15,
        "input_video_duration": 0,
        "output_video_duration": 15,
        "video_count": 1,
        "SR": 720
    }
}

Task failed

When a task fails, task_status is set to FAILED with an error code and message. See error codes to resolve the issue.

{
    "request_id": "e5d70b02-ebd3-98ce-9fe8-759d7d7b107d",
    "output": {
        "task_id": "86ecf553-d340-4e21-af6e-a0c6a421c010",
        "task_status": "FAILED",
        "code": "InvalidParameter",
        "message": "The size is not match xxxxxx"
    }
}

Task query expired

The task_id is valid for 24 hours. After this period, the task can no longer be queried and the API returns a task_status of UNKNOWN, as in the following response.

{
    "request_id": "a4de7c32-7057-9f82-8581-xxxxxx",
    "output": {
        "task_id": "502a00b1-19d9-4839-a82f-xxxxxx",
        "task_status": "UNKNOWN"
    }
}
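The three response shapes above (successful, failed, and expired) can be handled in one place. The Python sketch below is one possible mapping from task_status to an outcome; video_url_from_result is a hypothetical helper, and the choice of exception types is an assumption.

```python
def video_url_from_result(result):
    """Return the video URL, None if still in progress, or raise on failure."""
    output = result["output"]
    status = output["task_status"]
    if status == "SUCCEEDED":
        return output["video_url"]
    if status in ("PENDING", "RUNNING"):
        return None  # not finished yet; keep polling
    if status == "FAILED":
        raise RuntimeError(f"{output.get('code')}: {output.get('message')}")
    if status == "UNKNOWN":
        raise LookupError("task not found, or the 24-hour task_id validity has passed")
    raise RuntimeError(f"task ended with status {status}")


expired = {"request_id": "a4de7c32-xxxxxx",
           "output": {"task_id": "502a00b1-xxxxxx", "task_status": "UNKNOWN"}}
try:
    video_url_from_result(expired)
except LookupError as exc:
    print(exc)
```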

output object

The task output information.

Properties

task_id string

The ID of the task. Can be used to query the task for up to 24 hours.

task_status string

The status of the task.

Enumeration

  • PENDING

  • RUNNING

  • SUCCEEDED

  • FAILED

  • CANCELED

  • UNKNOWN: Task does not exist or status is unknown

Status transitions during polling:

  • PENDING → RUNNING → SUCCEEDED or FAILED

  • First query typically returns PENDING or RUNNING

  • SUCCEEDED status includes the generated video URL in the response

  • FAILED status requires checking the error message and retrying

submit_time string

The time when the task was submitted. Time is in UTC+8. Format: YYYY-MM-DD HH:mm:ss.SSS.

scheduled_time string

The time when the task started running. Time is in UTC+8. Format: YYYY-MM-DD HH:mm:ss.SSS.

end_time string

The time when the task was completed. Time is in UTC+8. Format: YYYY-MM-DD HH:mm:ss.SSS.
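Because the three timestamps carry no offset in the string itself, a client has to attach UTC+8 explicitly. The following Python sketch parses the documented format into timezone-aware values; parse_task_time is a hypothetical helper, and the sample values come from the successful response above.

```python
from datetime import datetime, timedelta, timezone

UTC_PLUS_8 = timezone(timedelta(hours=8))


def parse_task_time(value):
    """Parse 'YYYY-MM-DD HH:mm:ss.SSS' and attach the UTC+8 offset."""
    naive = datetime.strptime(value, "%Y-%m-%d %H:%M:%S.%f")
    return naive.replace(tzinfo=UTC_PLUS_8)


submitted = parse_task_time("2025-09-25 11:07:28.590")
finished = parse_task_time("2025-09-25 11:17:11.650")
print(submitted.astimezone(timezone.utc).isoformat())
print((finished - submitted).total_seconds())  # time from submission to completion
```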

video_url string

The URL of the generated video. Returned only when task_status is SUCCEEDED.

The URL is valid for 24 hours. The video is delivered in MP4 format with H.264 encoding.
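Since the URL expires, a client will usually download the file as soon as the task succeeds. A minimal Python sketch using only the standard library follows; download_video and the destination path are assumptions for illustration, and copying the file on to permanent storage such as OSS is left out.

```python
import shutil
import urllib.request


def download_video(video_url, destination):
    """Stream the file at video_url to a local path and return that path."""
    with urllib.request.urlopen(video_url) as response, open(destination, "wb") as out:
        # Copy in chunks so the whole video is never held in memory.
        shutil.copyfileobj(response, out)
    return destination


# Example (hypothetical path): download_video(video_url, "output.mp4")
```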

orig_prompt string

The original input prompt. This is the value of the prompt request parameter.

code string

The error code. Returned only when the request fails. See error codes for details.

message string

Detailed error message. Returned only when the request fails. See error codes for details.

usage object

Usage statistics for the task. Only successful results are counted.

Properties

input_video_duration integer

The duration of the input video in seconds.

output_video_duration integer

The duration of the output video in seconds.

duration integer

The total video duration, used for billing.

SR integer

The resolution tier of the output video. Example: 720.

video_count integer

The number of output videos. The value is fixed at 1.

request_id string

Unique identifier for the request. Use for tracing and troubleshooting issues.

Limitations

  • Data validity: The task_id and video_url are retained for only 24 hours. After this period, you cannot query or download them.

  • Content moderation: The input content (such as prompts, images, and videos) and the output video are subject to content moderation. If the content violates the rules, the system returns an "IPInfringementSuspect" or "DataInspectionFailed" error. For more information, see Error messages.

Error codes

If a model call fails and returns an error message, see Error messages to resolve the issue.

FAQ

Q: How do I generate a video with a specific aspect ratio, such as 3:4?

A: The output video's aspect ratio follows the input material (first frame image or first video clip), but it is not guaranteed to match the input ratio exactly. For example, a 3:4 input may produce an output that is slightly off 3:4.

The following explanation uses an input first frame image as the example:

  • Why does drift occur?

    • Execution logic: The system uses the input image's aspect ratio as a baseline reference. It combines this with the target total pixels of the resolution tier. The video width and height must be multiples of 16 because of video encoding specifications. The system automatically adjusts to the closest valid resolution.

    • Calculation example: An input first frame image measures 750 × 1000 pixels (aspect ratio 3:4 = 0.75). The resolution is set to "720P" (with a target of approximately 920,000 total pixels). The actual output video resolution is 816 × 1104 pixels (aspect ratio ≈ 0.739, approximately 900,000 total pixels).

  • Recommendations:

    • Input control: Use a first frame or video clip that matches your target aspect ratio.

    • Post-processing: If you have strict aspect ratio requirements, use an editing tool to crop the video or add black bars after generation.
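To get a feel for the drift before generating a video, the execution logic described above can be approximated in Python. This is only a guess at the calculation, not the service's actual algorithm: it assumes the 720P target is 1280 × 720 pixels (consistent with the "approximately 920,000" figure), assumes 1920 × 1080 for 1080P, and assumes each side is rounded down to a multiple of 16. Under those assumptions it reproduces the 750 × 1000 example, but other inputs may differ from what the service returns.

```python
import math

# Assumed pixel targets per tier; only the 720P figure is hinted at in this topic.
TIER_PIXELS = {"720P": 1280 * 720, "1080P": 1920 * 1080}


def estimate_output_size(width, height, resolution="720P", multiple=16):
    """Estimate the output (width, height) for an input image size.

    Hypothetical reconstruction: keep the input aspect ratio, scale to the
    tier's pixel target, then round each side down to a multiple of 16.
    """
    target = TIER_PIXELS[resolution]
    ratio = width / height
    ideal_height = math.sqrt(target / ratio)
    ideal_width = ideal_height * ratio
    out_width = int(ideal_width // multiple) * multiple
    out_height = int(ideal_height // multiple) * multiple
    return out_width, out_height


w, h = estimate_output_size(750, 1000, "720P")
print(w, h, round(w / h, 3), w * h)  # 816 1104 0.739 900864
```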

Q: How do I get the whitelist of domain names for video storage access?

A: Generated videos are stored in OSS, and the API returns a temporary public URL for download. Because the underlying storage may change dynamically, this topic does not list a fixed OSS domain name whitelist; an outdated list could cause access issues. If you need to configure a firewall whitelist for security control, contact your account manager to obtain the latest OSS domain name list.