Alibaba Cloud Model Studio: Image-to-video: first frame

Last Updated: Mar 27, 2026

Generate videos from an image and text prompt using the Wan model. Supports multimodal input (text, image, audio) and outputs MP4 videos up to 15 seconds at up to 1080P.

  • Duration: 2 to 15 seconds (integer values only)

  • Resolution: 480P, 720P, or 1080P

  • Audio: Automatic dubbing or custom audio upload for audio-video sync (wan2.5 and wan2.6)

  • Multi-shot narrative: Automatic shot transitions while keeping the subject consistent across shots (wan2.6 only)

  • Additional options: Prompt rewriting and watermarks

Quick links: Try it online (Singapore | Virginia | Beijing) | API reference

Quick start

Example: Generate a multi-shot video with audio from an image and a text prompt.

Input prompt: The camera slowly moves up from below the sea turtle. The turtle swims leisurely, and the details of its belly are clearly visible.

Input first frame: wan-i2v-haigui (image)

Output video (multi-shot with audio): [video]

Prerequisites

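Before you begin, get an API key and set it as the DASHSCOPE_API_KEY environment variable. The Python and Java examples also require the DashScope SDK.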

Python

Important

Requires DashScope Python SDK 1.25.8 or later. Older versions may produce a "url error, please check url!" error. Install or upgrade the SDK.

The VideoSynthesis.call() method submits an async video generation task and blocks until completion.

import os
from http import HTTPStatus
from dashscope import VideoSynthesis
import dashscope

# Set the base URL for your region (Singapore shown here)
# For other regions, see the Availability section below
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

# Read API key from environment variable
api_key = os.getenv("DASHSCOPE_API_KEY", "YOUR_API_KEY")

print('please wait...')
rsp = VideoSynthesis.call(api_key=api_key,
                          model='wan2.6-i2v-flash',
                          prompt='The camera slowly moves up from below the sea turtle. The turtle swims leisurely, and the details of its belly are clearly visible.',
                          img_url="https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20260121/zlpocv/wan-i2v-haigui.webp",
                          resolution="720P",
                          duration=10,
                          shot_type="multi",
                          prompt_extend=True,
                          watermark=True)
print(rsp)
if rsp.status_code == HTTPStatus.OK:
    print("video_url:", rsp.output.video_url)
else:
    print('Failed, status_code: %s, code: %s, message: %s' % (rsp.status_code, rsp.code, rsp.message))

Java

Important

Requires DashScope Java SDK 2.22.6 or later. Older versions may produce a "url error, please check url!" error. Install or upgrade the SDK.

import com.alibaba.dashscope.aigc.videosynthesis.VideoSynthesis;
import com.alibaba.dashscope.aigc.videosynthesis.VideoSynthesisParam;
import com.alibaba.dashscope.aigc.videosynthesis.VideoSynthesisResult;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.InputRequiredException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.utils.JsonUtils;
import com.alibaba.dashscope.utils.Constants;

public class Image2Video {

    static {
        // Set the base URL for your region (Singapore shown here)
        // For other regions, see the Availability section below
        Constants.baseHttpApiUrl = "https://dashscope-intl.aliyuncs.com/api/v1";
    }

    // Read API key from environment variable
    // To hardcode instead: apiKey="sk-xxx"
    static String apiKey = System.getenv("DASHSCOPE_API_KEY");

    public static void image2video() throws ApiException, NoApiKeyException, InputRequiredException {
        VideoSynthesis vs = new VideoSynthesis();
        VideoSynthesisParam param =
                VideoSynthesisParam.builder()
                        .apiKey(apiKey)
                        .model("wan2.6-i2v-flash")
                        .prompt("The camera slowly moves up from below the sea turtle. The turtle swims leisurely, and the details of its belly are clearly visible.")
                        .imgUrl("https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20260121/zlpocv/wan-i2v-haigui.webp")
                        .duration(10)
                        .resolution("720P")
                        .shotType("multi")
                        .promptExtend(true)
                        .watermark(true)
                        .build();
        System.out.println("please wait...");
        VideoSynthesisResult result = vs.call(param);
        System.out.println(JsonUtils.toJson(result));
    }

    public static void main(String[] args) {
        try {
            image2video();
        } catch (ApiException | NoApiKeyException | InputRequiredException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}

curl

The HTTP API is asynchronous. Submit a task in step 1, then poll for the result in step 2.

Step 1: Submit a task

curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis' \
    -H 'X-DashScope-Async: enable' \
    -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
    -H 'Content-Type: application/json' \
    -d '{
    "model": "wan2.6-i2v-flash",
    "input": {
        "prompt": "The camera slowly moves up from below the sea turtle. The turtle swims leisurely, and the details of its belly are clearly visible.",
        "img_url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20260121/zlpocv/wan-i2v-haigui.webp"
    },
    "parameters": {
        "resolution": "720P",
        "prompt_extend": true,
        "watermark": true,
        "duration": 10,
        "shot_type":"multi"
    }
}'

Step 2: Get the result

Replace {task_id} with the task_id value from the step 1 response.

curl -X GET https://dashscope-intl.aliyuncs.com/api/v1/tasks/{task_id} \
--header "Authorization: Bearer $DASHSCOPE_API_KEY"
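
Step 2 usually has to be repeated until the task finishes. The following is a minimal polling sketch, not an official client. It assumes the task response is JSON with `output.task_status` and `output.video_url`, and that `SUCCEEDED`, `FAILED`, `CANCELED`, and `UNKNOWN` are the terminal statuses; none of these names are stated on this page, so verify them against the API reference. `http_fetch` and `poll_task` are hypothetical helper names, and the fetch function is injectable so the loop can be tried without network access.

```python
import json
import os
import time
import urllib.request

# Singapore endpoint, as used in the curl examples above.
TASK_URL = "https://dashscope-intl.aliyuncs.com/api/v1/tasks/{task_id}"
# Assumed terminal statuses; confirm them in the API reference.
TERMINAL = {"SUCCEEDED", "FAILED", "CANCELED", "UNKNOWN"}


def http_fetch(task_id):
    """GET the task record, mirroring the step 2 curl command."""
    req = urllib.request.Request(
        TASK_URL.format(task_id=task_id),
        headers={"Authorization": "Bearer " + os.environ["DASHSCOPE_API_KEY"]},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def poll_task(task_id, fetch=http_fetch, interval=15.0, timeout=900.0, sleep=time.sleep):
    """Call fetch(task_id) until the task reaches a terminal status or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        output = fetch(task_id).get("output", {})
        if output.get("task_status") in TERMINAL:
            return output
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} did not finish within {timeout}s")
        sleep(interval)
```

Calling `poll_task(task_id)` with the task_id from step 1 returns the final `output` object; the 15-second interval and 15-minute timeout are arbitrary defaults chosen for illustration.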

Availability

Supported models vary by region. Model name, endpoint, and API key must all match the same region, or calls fail.

Model overview

| Model | Audio | Multi-shot | Resolution | Duration | Recommended for |
| --- | --- | --- | --- | --- | --- |
| wan2.6-i2v-flash | With or without audio | Yes | 720P, 1080P | 2-15s | General use (International, Chinese Mainland) |
| wan2.6-i2v | With audio | Yes | 720P, 1080P | 2-15s (International, Chinese Mainland); 5s, 10s, 15s (Global) | High-quality generation (International, Chinese Mainland, Global) |
| wan2.6-i2v-us | With audio | Yes | 720P, 1080P | 5s, 10s, 15s | US-restricted compute |
| wan2.5-i2v-preview | With audio | No | 480P, 720P, 1080P | 5s, 10s | Audio sync without multi-shot |
| wan2.2-i2v-flash | No | No | 480P, 720P, 1080P | 5s | Fast silent video (50% faster than 2.1) |
| wan2.2-i2v-plus | No | No | 480P, 1080P | 5s | Improved stability and success rate over 2.1 |
| wan2.1-i2v-plus | No | No | 720P | 5s | - |
| wan2.1-i2v-turbo | No | No | 480P, 720P | 3s, 4s, 5s | - |

All models output MP4 (H.264, 30fps).
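
Because the accepted resolution and duration depend on the model, it can help to check a request locally before submitting it. The sketch below transcribes the model overview table into a lookup. `MODEL_LIMITS` and `check_request` are hypothetical names, not part of the DashScope SDK, and the wan2.6-i2v entry uses the International and Chinese Mainland range (the Global deployment accepts only 5, 10, or 15 seconds).

```python
# Limits transcribed from the model overview table: (resolutions, durations in seconds).
MODEL_LIMITS = {
    "wan2.6-i2v-flash": ({"720P", "1080P"}, set(range(2, 16))),
    "wan2.6-i2v": ({"720P", "1080P"}, set(range(2, 16))),
    "wan2.6-i2v-us": ({"720P", "1080P"}, {5, 10, 15}),
    "wan2.5-i2v-preview": ({"480P", "720P", "1080P"}, {5, 10}),
    "wan2.2-i2v-flash": ({"480P", "720P", "1080P"}, {5}),
    "wan2.2-i2v-plus": ({"480P", "1080P"}, {5}),
    "wan2.1-i2v-plus": ({"720P"}, {5}),
    "wan2.1-i2v-turbo": ({"480P", "720P"}, {3, 4, 5}),
}


def check_request(model, resolution, duration):
    """Return a list of problems; an empty list means the combination appears in the table."""
    if model not in MODEL_LIMITS:
        return [f"unknown model: {model}"]
    resolutions, durations = MODEL_LIMITS[model]
    problems = []
    if resolution not in resolutions:
        problems.append(f"{model} does not list resolution {resolution}")
    # Durations must be whole seconds, so reject floats and booleans explicitly.
    if type(duration) is not int or duration not in durations:
        problems.append(f"{model} does not list duration {duration!r}")
    return problems


if __name__ == "__main__":
    print(check_request("wan2.6-i2v-flash", "720P", 10))
    print(check_request("wan2.2-i2v-plus", "720P", 5))
```

A local check like this only mirrors the table at the time of writing; the service remains the authority on which combinations are accepted.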

Regional availability

Global

Access and data: US (Virginia). Compute: worldwide.

| Model | Features | Input modalities | Resolution | Duration |
| --- | --- | --- | --- | --- |
| wan2.6-i2v (recommended) | Video with audio, multi-shot narrative, audio-video sync | Text, image, audio | 720P, 1080P | 5s, 10s, 15s |

International

Access and data: Singapore. Compute: worldwide (excluding Chinese mainland).

| Model | Features | Input modalities | Resolution | Duration |
| --- | --- | --- | --- | --- |
| wan2.6-i2v-flash (recommended) | Video with or without audio, multi-shot narrative, audio-video sync | Text, image, audio | 720P, 1080P | 2-15s (integer) |
| wan2.6-i2v (recommended) | Video with audio, multi-shot narrative, audio-video sync | Text, image, audio | 720P, 1080P | 2-15s (integer) |
| wan2.5-i2v-preview | Video with audio, audio-video sync | Text, image, audio | 480P, 720P, 1080P | 5s, 10s |
| wan2.2-i2v-flash | Video without audio (50% faster than 2.1) | Text, image | 480P, 720P, 1080P | 5s |
| wan2.2-i2v-plus | Video without audio (improved stability and success rate over 2.1) | Text, image | 480P, 1080P | 5s |
| wan2.1-i2v-plus | Video without audio | Text, image | 720P | 5s |
| wan2.1-i2v-turbo | Video without audio | Text, image | 480P, 720P | 3s, 4s, 5s |

US

Access and data: US (Virginia). Compute: US only.

| Model | Features | Input modalities | Resolution | Duration |
| --- | --- | --- | --- | --- |
| wan2.6-i2v-us (recommended) | Video with audio, multi-shot narrative, audio-video sync | Text, image, audio | 720P, 1080P | 5s, 10s, 15s |

Chinese Mainland

Access and data: Beijing. Compute: Chinese mainland only.

| Model | Features | Input modalities | Resolution | Duration |
| --- | --- | --- | --- | --- |
| wan2.6-i2v-flash (recommended) | Video with or without audio, multi-shot narrative, audio-video sync | Text, image, audio | 720P, 1080P | 2-15s (integer) |
| wan2.6-i2v (recommended) | Video with audio, multi-shot narrative, audio-video sync | Text, image, audio | 720P, 1080P | 2-15s (integer) |
| wan2.5-i2v-preview | Video with audio, audio-video sync | Text, image, audio | 480P, 720P, 1080P | 5s, 10s |
| wan2.2-i2v-flash | Video without audio (50% faster than 2.1) | Text, image | 480P, 720P, 1080P | 5s |
| wan2.2-i2v-plus | Video without audio (improved stability and success rate over 2.1) | Text, image | 480P, 1080P | 5s |
| wanx2.1-i2v-plus | Video without audio | Text, image | 720P | 5s |
| wanx2.1-i2v-turbo | Video without audio | Text, image | 480P, 720P | 3s, 4s, 5s |

Core features

Create multi-shot videos

Models: wan2.6-i2v-flash, wan2.6-i2v

Automatically transitions between shots (for example, from a wide shot to a close-up) while keeping the subject consistent. Ideal for music videos and short narratives.

Required parameters:

  • shot_type: Set to "multi"

  • prompt_extend: Set to true for intelligent prompt rewriting

Input prompt: A scene of urban fantasy art. A dynamic graffiti art character. A boy made of spray paint comes to life on a concrete wall. He performs an English rap at high speed while striking a classic, energetic rapper pose. The scene is set at night under an urban railway bridge. The lighting comes from a single street lamp, creating a cinematic atmosphere full of high energy and amazing detail. The audio of the video consists entirely of the rap, with no other dialogue or noise.

Input first frame: rap-image (image)

Input audio: [audio]

Output video (multi-shot): [video]

Python

Requires DashScope Python SDK 1.25.8 or later. Install the SDK.

import os
from http import HTTPStatus
from dashscope import VideoSynthesis
import dashscope

# Set the base URL for your region (Singapore shown here)
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

# Read API key from environment variable
# To hardcode instead: api_key="sk-xxx"
api_key = os.getenv("DASHSCOPE_API_KEY")

def sample_async_call_i2v():
    # Submit an asynchronous task
    rsp = VideoSynthesis.async_call(api_key=api_key,
                                    model='wan2.6-i2v-flash',
                                    prompt='A scene of urban fantasy art. A dynamic graffiti art character. A boy made of spray paint comes to life on a concrete wall. He performs an English rap at high speed while striking a classic, energetic rapper pose. The scene is set at night under an urban railway bridge. The lighting comes from a single street lamp, creating a cinematic atmosphere full of high energy and amazing detail. The audio of the video consists entirely of the rap, with no other dialogue or noise.',
                                    img_url="https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250925/wpimhv/rap.png",
                                    audio_url="https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250925/ozwpvi/rap.mp3",
                                    resolution="720P",
                                    duration=10,
                                    shot_type="multi",     # Enable multi-shot
                                    prompt_extend=True,
                                    watermark=True,
                                    negative_prompt="",
                                    seed=12345)
    print(rsp)
    if rsp.status_code == HTTPStatus.OK:
        print("task_id: %s" % rsp.output.task_id)
    else:
        print('Failed, status_code: %s, code: %s, message: %s' % (rsp.status_code, rsp.code, rsp.message))

    # Wait for the task to complete
    rsp = VideoSynthesis.wait(task=rsp, api_key=api_key)
    print(rsp)
    if rsp.status_code == HTTPStatus.OK:
        print(rsp.output.video_url)
    else:
        print('Failed, status_code: %s, code: %s, message: %s' % (rsp.status_code, rsp.code, rsp.message))


if __name__ == '__main__':
    sample_async_call_i2v()

Java

Requires DashScope Java SDK 2.22.6 or later. Install the SDK.

import com.alibaba.dashscope.aigc.videosynthesis.VideoSynthesis;
import com.alibaba.dashscope.aigc.videosynthesis.VideoSynthesisParam;
import com.alibaba.dashscope.aigc.videosynthesis.VideoSynthesisResult;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.InputRequiredException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.utils.JsonUtils;
import com.alibaba.dashscope.utils.Constants;

public class Image2Video {
    static {
        // Set the base URL for your region (Singapore shown here)
        Constants.baseHttpApiUrl = "https://dashscope-intl.aliyuncs.com/api/v1";
    }

    // Read API key from environment variable
    // To hardcode instead: apiKey="sk-xxx"
    static String apiKey = System.getenv("DASHSCOPE_API_KEY");

    // Input image and audio
    static String imgUrl = "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250925/wpimhv/rap.png";
    static String audioUrl = "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250925/ozwpvi/rap.mp3";

    public static void image2video() throws ApiException, NoApiKeyException, InputRequiredException {
        VideoSynthesis vs = new VideoSynthesis();
        VideoSynthesisParam param =
                VideoSynthesisParam.builder()
                        .apiKey(apiKey)
                        .model("wan2.6-i2v-flash")
                        .prompt("A scene of urban fantasy art. A dynamic graffiti art character. A boy made of spray paint comes to life on a concrete wall. He performs an English rap at high speed while striking a classic, energetic rapper pose. The scene is set at night under an urban railway bridge. The lighting comes from a single street lamp, creating a cinematic atmosphere full of high energy and amazing detail. The audio of the video consists entirely of the rap, with no other dialogue or noise.")
                        .imgUrl(imgUrl)
                        .audioUrl(audioUrl)
                        .shotType("multi")      // Enable multi-shot
                        .duration(10)
                        .resolution("720P")
                        .negativePrompt("")
                        .promptExtend(true)
                        .watermark(true)
                        .seed(12345)
                        .build();
        // Submit asynchronous task
        VideoSynthesisResult task = vs.asyncCall(param);
        System.out.println(JsonUtils.toJson(task));
        System.out.println("please wait...");

        // Wait for the result
        VideoSynthesisResult result = vs.wait(task, apiKey);
        System.out.println(JsonUtils.toJson(result));
    }

    public static void main(String[] args) {
        try {
            image2video();
        } catch (ApiException | NoApiKeyException | InputRequiredException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}

curl

Step 1: Submit a task

curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis' \
    -H 'X-DashScope-Async: enable' \
    -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
    -H 'Content-Type: application/json' \
    -d '{
    "model": "wan2.6-i2v-flash",
    "input": {
        "prompt": "A scene of urban fantasy art. A dynamic graffiti art character. A boy made of spray paint comes to life from a concrete wall. He raps an English song at high speed while striking a classic, energetic rapper pose. The scene is set under an urban railway bridge at night. The lighting comes from a single street lamp, creating a cinematic atmosphere full of high energy and amazing detail. The audio of the video consists entirely of his rap, with no other dialogue or noise.",
        "img_url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250925/wpimhv/rap.png",
        "audio_url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250925/ozwpvi/rap.mp3"
    },
    "parameters": {
        "resolution": "720P",
        "prompt_extend": true,
        "duration": 10,
        "shot_type":"multi"
    }
}'

Step 2: Get the result

Replace {task_id} with the task_id value from the step 1 response.

curl -X GET https://dashscope-intl.aliyuncs.com/api/v1/tasks/{task_id} \
--header "Authorization: Bearer $DASHSCOPE_API_KEY"

Synchronize audio and video

Models: wan2.6-i2v-flash, wan2.6-i2v, wan2.5-i2v-preview

Animates characters in photos to speak or sing with lip-synced movements. For prompt examples, see Sound generation.

Two modes:

  • Custom audio: Pass audio_url to sync lip movements with your audio

  • Automatic dubbing: Omit audio_url, and the model generates sound effects, music, or vocals based on the visuals
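
The two modes differ only in whether `audio_url` appears in the request input. The sketch below shows that difference using the HTTP request body shape from the curl examples on this page. `build_sync_payload` is a hypothetical helper name, not an SDK function, and the default resolution and duration are illustrative.

```python
import json


def build_sync_payload(model, prompt, img_url, audio_url=None,
                       resolution="720P", duration=10):
    """Build the JSON body for an audio-video sync request.

    Custom audio: pass audio_url.
    Automatic dubbing: leave audio_url as None so the key is omitted entirely.
    """
    request_input = {"prompt": prompt, "img_url": img_url}
    if audio_url is not None:
        request_input["audio_url"] = audio_url
    return {
        "model": model,
        "input": request_input,
        "parameters": {
            "resolution": resolution,
            "prompt_extend": True,
            "duration": duration,
        },
    }


if __name__ == "__main__":
    body = build_sync_payload(
        "wan2.6-i2v-flash",
        "A boy made of spray paint performs an English rap.",
        "https://example.com/img.png",
    )
    # No audio_url key is present, so this request asks for automatic dubbing.
    print(json.dumps(body, indent=2))
```

Omitting the key, rather than sending an empty string, matches the automatic dubbing curl example, which has no `audio_url` field at all.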

Input prompt: A scene of urban fantasy art. A dynamic graffiti art character. A boy made of spray paint comes to life on a concrete wall. He performs an English rap at high speed while striking a classic, energetic rapper pose. The scene is set at night under an urban railway bridge. The lighting comes from a single street lamp, creating a cinematic atmosphere full of high energy and amazing detail. The audio of the video consists entirely of the rap, with no other dialogue or noise.

Input first frame: rap-image (image)

Input audio: [audio]

Output video (with audio): [video]

Python

Requires DashScope Python SDK 1.25.8 or later. Install the SDK.

import os
from http import HTTPStatus
from dashscope import VideoSynthesis
import dashscope

# Set the base URL for your region (Singapore shown here)
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

# Read API key from environment variable
# To hardcode instead: api_key="sk-xxx"
api_key = os.getenv("DASHSCOPE_API_KEY")

def sample_async_call_i2v():
    # Submit an asynchronous task
    rsp = VideoSynthesis.async_call(api_key=api_key,
                                    model='wan2.6-i2v-flash',
                                    prompt='A scene of urban fantasy art. A dynamic graffiti art character. A boy made of spray paint comes to life on a concrete wall. He performs an English rap at high speed while striking a classic, energetic rapper pose. The scene is set at night under an urban railway bridge. The lighting comes from a single street lamp, creating a cinematic atmosphere full of high energy and amazing detail. The audio of the video consists entirely of the rap, with no other dialogue or noise.',
                                    img_url="https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250925/wpimhv/rap.png",
                                    audio_url="https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250925/ozwpvi/rap.mp3",
                                    resolution="720P",
                                    duration=10,
                                    prompt_extend=True,
                                    watermark=True,
                                    negative_prompt="",
                                    seed=12345)
    print(rsp)
    if rsp.status_code == HTTPStatus.OK:
        print("task_id: %s" % rsp.output.task_id)
    else:
        print('Failed, status_code: %s, code: %s, message: %s' % (rsp.status_code, rsp.code, rsp.message))

    # Wait for the task to complete
    rsp = VideoSynthesis.wait(task=rsp, api_key=api_key)
    print(rsp)
    if rsp.status_code == HTTPStatus.OK:
        print(rsp.output.video_url)
    else:
        print('Failed, status_code: %s, code: %s, message: %s' % (rsp.status_code, rsp.code, rsp.message))


if __name__ == '__main__':
    sample_async_call_i2v()

Java

Requires DashScope Java SDK 2.22.6 or later. Install the SDK.

import com.alibaba.dashscope.aigc.videosynthesis.VideoSynthesis;
import com.alibaba.dashscope.aigc.videosynthesis.VideoSynthesisParam;
import com.alibaba.dashscope.aigc.videosynthesis.VideoSynthesisResult;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.InputRequiredException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.utils.JsonUtils;
import com.alibaba.dashscope.utils.Constants;

public class Image2Video {
    static {
        // Set the base URL for your region (Singapore shown here)
        Constants.baseHttpApiUrl = "https://dashscope-intl.aliyuncs.com/api/v1";
    }

    // Read API key from environment variable
    // To hardcode instead: apiKey="sk-xxx"
    static String apiKey = System.getenv("DASHSCOPE_API_KEY");

    // Input image and audio
    static String imgUrl = "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250925/wpimhv/rap.png";
    static String audioUrl = "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250925/ozwpvi/rap.mp3";

    public static void image2video() throws ApiException, NoApiKeyException, InputRequiredException {
        VideoSynthesis vs = new VideoSynthesis();
        VideoSynthesisParam param =
                VideoSynthesisParam.builder()
                        .apiKey(apiKey)
                        .model("wan2.6-i2v-flash")
                        .prompt("A scene of urban fantasy art. A dynamic graffiti art character. A boy made of spray paint comes to life on a concrete wall. He performs an English rap at high speed while striking a classic, energetic rapper pose. The scene is set at night under an urban railway bridge. The lighting comes from a single street lamp, creating a cinematic atmosphere full of high energy and amazing detail. The audio of the video consists entirely of the rap, with no other dialogue or noise.")
                        .imgUrl(imgUrl)
                        .audioUrl(audioUrl)
                        .duration(10)
                        .resolution("720P")
                        .negativePrompt("")
                        .promptExtend(true)
                        .watermark(true)
                        .seed(12345)
                        .build();
        // Submit asynchronous task
        VideoSynthesisResult task = vs.asyncCall(param);
        System.out.println(JsonUtils.toJson(task));
        System.out.println("please wait...");

        // Wait for the result
        VideoSynthesisResult result = vs.wait(task, apiKey);
        System.out.println(JsonUtils.toJson(result));
    }

    public static void main(String[] args) {
        try {
            image2video();
        } catch (ApiException | NoApiKeyException | InputRequiredException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}

curl

Step 1: Submit a task

Custom audio:

curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis' \
    -H 'X-DashScope-Async: enable' \
    -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
    -H 'Content-Type: application/json' \
    -d '{
    "model": "wan2.5-i2v-preview",
    "input": {
        "prompt": "A scene of urban fantasy art. A dynamic graffiti art character. A boy made of spray paint comes to life from a concrete wall. He raps an English song at high speed while striking a classic, energetic rapper pose. The scene is set under an urban railway bridge at night. The lighting comes from a single street lamp, creating a cinematic atmosphere full of high energy and amazing detail. The audio of the video consists entirely of his rap, with no other dialogue or noise.",
        "img_url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250925/wpimhv/rap.png",
        "audio_url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250925/ozwpvi/rap.mp3"
    },
    "parameters": {
        "resolution": "480P",
        "prompt_extend": true,
        "duration": 10
    }
}'

Automatic dubbing:

curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis' \
    -H 'X-DashScope-Async: enable' \
    -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
    -H 'Content-Type: application/json' \
    -d '{
    "model": "wan2.5-i2v-preview",
    "input": {
        "prompt": "A scene of urban fantasy art. A dynamic graffiti art character. A boy made of spray paint comes to life from a concrete wall. He raps an English song at high speed while striking a classic, energetic rapper pose. The scene is set under an urban railway bridge at night. The lighting comes from a single street lamp, creating a cinematic atmosphere full of high energy and amazing detail. The audio of the video consists entirely of his rap, with no other dialogue or noise.",
        "img_url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250925/wpimhv/rap.png"
    },
    "parameters": {
        "resolution": "480P",
        "prompt_extend": true,
        "duration": 10
    }
}'

Step 2: Get the result

Replace {task_id} with the task_id value from the step 1 response.

curl -X GET https://dashscope-intl.aliyuncs.com/api/v1/tasks/{task_id} \
--header "Authorization: Bearer $DASHSCOPE_API_KEY"

Generate videos without audio

Models: wan2.6-i2v-flash, wan2.2 and earlier models

Generates silent videos suitable for dynamic posters, visual-only content, and other scenarios that do not require audio.

Configuration:

  • wan2.6-i2v-flash: Set audio=false for silent output. This setting takes precedence over audio_url if one is provided, and the task is billed at the video-without-audio rate.

  • wan2.2 and earlier: Silent by default, no configuration needed.
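
The configuration rule above can be captured in a small helper. This is only a sketch: `silent_parameters` and `SILENT_BY_DEFAULT` are hypothetical names, and the model set is copied from the availability tables on this page.

```python
# Models listed on this page as producing video without audio by default.
SILENT_BY_DEFAULT = {
    "wan2.2-i2v-flash",
    "wan2.2-i2v-plus",
    "wan2.1-i2v-plus",
    "wan2.1-i2v-turbo",
    "wanx2.1-i2v-plus",   # Chinese Mainland naming
    "wanx2.1-i2v-turbo",  # Chinese Mainland naming
}


def silent_parameters(model, **parameters):
    """Return request parameters that yield a video without audio for the given model."""
    params = dict(parameters)
    if model == "wan2.6-i2v-flash":
        # audio=false takes precedence over audio_url if one is provided.
        params["audio"] = False
    elif model not in SILENT_BY_DEFAULT:
        raise ValueError(f"{model} is not listed as supporting video without audio")
    return params


if __name__ == "__main__":
    print(silent_parameters("wan2.6-i2v-flash", resolution="720P", duration=5))
    print(silent_parameters("wan2.2-i2v-plus", resolution="480P"))
```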

Input prompt: A cat running on the grass

Input first frame: [image]

Output video (without audio): [video]

Python

Requires DashScope Python SDK 1.25.8 or later. Install the SDK.

import os
from http import HTTPStatus
from dashscope import VideoSynthesis
import dashscope

# Set the base URL for your region (Singapore shown here)
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

# Read API key from environment variable
# To hardcode instead: api_key="sk-xxx"
api_key = os.getenv("DASHSCOPE_API_KEY")

def sample_async_call_i2v():
    # Submit an asynchronous task
    rsp = VideoSynthesis.async_call(api_key=api_key,
                                    model='wan2.6-i2v-flash',
                                    prompt='A cat running on the grass',
                                    img_url="https://cdn.translate.alibaba.com/r/wanx-demo-1.png",
                                    audio=False,   # Explicitly disable audio output
                                    resolution="720P",
                                    duration=5,
                                    prompt_extend=True,
                                    watermark=True,
                                    negative_prompt="",
                                    seed=12345)
    print(rsp)
    if rsp.status_code == HTTPStatus.OK:
        print("task_id: %s" % rsp.output.task_id)
    else:
        print('Failed, status_code: %s, code: %s, message: %s' % (rsp.status_code, rsp.code, rsp.message))

    # Wait for the task to complete
    rsp = VideoSynthesis.wait(task=rsp, api_key=api_key)
    print(rsp)
    if rsp.status_code == HTTPStatus.OK:
        print(rsp.output.video_url)
    else:
        print('Failed, status_code: %s, code: %s, message: %s' % (rsp.status_code, rsp.code, rsp.message))


if __name__ == '__main__':
    sample_async_call_i2v()

Java

Requires DashScope Java SDK 2.22.6 or later. Install the SDK.

import com.alibaba.dashscope.aigc.videosynthesis.VideoSynthesis;
import com.alibaba.dashscope.aigc.videosynthesis.VideoSynthesisParam;
import com.alibaba.dashscope.aigc.videosynthesis.VideoSynthesisResult;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.InputRequiredException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.utils.JsonUtils;
import com.alibaba.dashscope.utils.Constants;

public class Image2Video {
    static {
        // Set the base URL for your region (Singapore shown here)
        Constants.baseHttpApiUrl = "https://dashscope-intl.aliyuncs.com/api/v1";
    }

    // Read API key from environment variable
    // To hardcode instead: apiKey="sk-xxx"
    static String apiKey = System.getenv("DASHSCOPE_API_KEY");

    public static void image2video() throws ApiException, NoApiKeyException, InputRequiredException {
        VideoSynthesis vs = new VideoSynthesis();
        VideoSynthesisParam param =
                VideoSynthesisParam.builder()
                        .apiKey(apiKey)
                        .model("wan2.6-i2v-flash")
                        .prompt("A cat running on the grass")
                        .imgUrl("https://cdn.translate.alibaba.com/r/wanx-demo-1.png")
                        .audio(false)   // Explicitly disable audio output
                        .duration(5)    // Matches the Python and curl samples for this scenario
                        .resolution("720P")
                        .negativePrompt("")
                        .promptExtend(true)
                        .watermark(true)
                        .seed(12345)
                        .build();
        // Submit asynchronous task
        VideoSynthesisResult task = vs.asyncCall(param);
        System.out.println(JsonUtils.toJson(task));
        System.out.println("please wait...");

        // Wait for the result
        VideoSynthesisResult result = vs.wait(task, apiKey);
        System.out.println(JsonUtils.toJson(result));
    }

    public static void main(String[] args) {
        try {
            image2video();
        } catch (ApiException | NoApiKeyException | InputRequiredException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}

curl

Step 1: Submit a task

wan2.6-i2v-flash:

curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis' \
    -H 'X-DashScope-Async: enable' \
    -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
    -H 'Content-Type: application/json' \
    -d '{
    "model": "wan2.6-i2v-flash",
    "input": {
        "prompt": "A cat running on the grass",
        "img_url": "https://cdn.translate.alibaba.com/r/wanx-demo-1.png"
    },
    "parameters": {
        "audio": false,
        "resolution": "720P",
        "prompt_extend": true,
        "watermark": true,
        "duration": 5
    }
}'

wan2.2 and earlier models:

curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis' \
    -H 'X-DashScope-Async: enable' \
    -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
    -H 'Content-Type: application/json' \
    -d '{
    "model": "wan2.2-i2v-plus",
    "input": {
        "prompt": "A cat running on the grass",
        "img_url": "https://cdn.translate.alibaba.com/r/wanx-demo-1.png"
    },
    "parameters": {
        "resolution": "480P",
        "prompt_extend": true
    }
}'

Step 2: Get the result

Replace {task_id} with the task_id value from the step 1 response.

curl -X GET https://dashscope-intl.aliyuncs.com/api/v1/tasks/{task_id} \
--header "Authorization: Bearer $DASHSCOPE_API_KEY"

Input and output specifications

Input image

  • Count: 1 image

  • Accepted formats: JPEG, JPG, PNG, BMP, WEBP

Three input methods:

Method 1: Public URL (recommended)

Provide a publicly accessible HTTP/HTTPS URL.

Example: "https://example.com/img.png"

Method 2: Local file path (SDK only)

The file path format varies by SDK and operating system:

Python SDK (absolute and relative paths):

| Operating system | Format | Absolute path example | Relative path example |
| --- | --- | --- | --- |
| Linux / macOS | file://{path} | file:///home/images/test.png | file://./images/test.png |
| Windows | file://{path} | file://D:/images/test.png | file://./images/test.png |
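The Python SDK format is a plain concatenation of file:// and the path, which is why a Linux or macOS absolute path ends up with three slashes. A minimal sketch, where to_file_url is a hypothetical helper name and the backslash normalization is an assumption based on the forward slashes used in the examples above:

```python
def to_file_url(path: str) -> str:
    """Prefix a local path with file:// for the Python SDK."""
    # Assumption: Windows backslashes should become forward slashes,
    # matching the documented examples.
    return "file://" + path.replace("\\", "/")


print(to_file_url("/home/images/test.png"))   # file:///home/images/test.png
print(to_file_url("D:\\images\\test.png"))    # file://D:/images/test.png
print(to_file_url("./images/test.png"))       # file://./images/test.png
```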

Java SDK (absolute paths only):

| Operating system | Format | Absolute path example |
| --- | --- | --- |
| Linux / macOS | file://{absolute_path} | file:///home/images/test.png |
| Windows | file:///{absolute_path} | file:///D:/images/test.png |

Method 3: Base64-encoded string (HTTP API and SDK)

Format: data:{MIME_type};base64,{base64_data}

Example: data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAABDg......

Supported MIME types:

| Image format | MIME type |
| --- | --- |
| JPEG / JPG | image/jpeg |
| PNG | image/png |
| BMP | image/bmp |
| WEBP | image/webp |
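A minimal sketch of building the data URL while accepting only the formats listed above. Mapping file extensions explicitly avoids relying on the platform's MIME database, which may not recognize every extension. The helper name to_data_url is hypothetical, and choosing the MIME type from the extension alone is an assumption; the service may also inspect the file content.

```python
import base64
from pathlib import Path

# Extension-to-MIME map taken from the supported formats listed above.
SUPPORTED_MIME_TYPES = {
    ".jpeg": "image/jpeg",
    ".jpg": "image/jpeg",
    ".png": "image/png",
    ".bmp": "image/bmp",
    ".webp": "image/webp",
}


def to_data_url(file_path):
    """Return data:{MIME_type};base64,{base64_data} for a supported image file."""
    path = Path(file_path)
    mime_type = SUPPORTED_MIME_TYPES.get(path.suffix.lower())
    if mime_type is None:
        raise ValueError(f"Unsupported image extension: {path.suffix!r}")
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

The returned string can be passed wherever img_url is accepted.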

Example: three input methods

Python
import base64
import os
from http import HTTPStatus
from dashscope import VideoSynthesis
import mimetypes
import dashscope

# Singapore region URL
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

# Read API key from environment variable
api_key = os.getenv("DASHSCOPE_API_KEY")

# Helper: encode a local image file to Base64
def encode_file(file_path):
    mime_type, _ = mimetypes.guess_type(file_path)
    if not mime_type or not mime_type.startswith("image/"):
        raise ValueError("Unsupported or unrecognized image format")
    with open(file_path, "rb") as image_file:
        encoded_string = base64.b64encode(image_file.read()).decode('utf-8')
    return f"data:{mime_type};base64,{encoded_string}"

"""
Choose one of the following three input methods:

1. Public URL -- for publicly accessible images
2. Local file -- for local development and testing
3. Base64 encoding -- for private images or encrypted transmission
"""

# [Method 1] Public URL
img_url = "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250925/wpimhv/rap.png"

# [Method 2] Local file (absolute or relative path)
# img_url = "file:///path/to/your/img.png"       # Linux/macOS absolute
# img_url = "file://./img.png"                    # Relative path

# [Method 3] Base64 encoding
# img_url = encode_file("./img.png")

# Audio URL
audio_url = "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250925/ozwpvi/rap.mp3"

def sample_call_i2v():
    print('please wait...')
    rsp = VideoSynthesis.call(api_key=api_key,
                              model='wan2.6-i2v-flash',
                              prompt='A scene of urban fantasy art. A dynamic graffiti art character. A boy made of spray paint comes to life from a concrete wall. He raps an English song at high speed while striking a classic, energetic rapper pose. The scene is set under an urban railway bridge at night. The lighting comes from a single street lamp, creating a cinematic atmosphere full of high energy and amazing detail. The audio of the video consists entirely of his rap, with no other dialogue or noise.',
                              img_url=img_url,
                              audio_url=audio_url,
                              resolution="720P",
                              duration=10,
                              prompt_extend=True,
                              watermark=False,
                              negative_prompt="",
                              seed=12345)
    print(rsp)
    if rsp.status_code == HTTPStatus.OK:
        print("video_url:", rsp.output.video_url)
    else:
        print('Failed, status_code: %s, code: %s, message: %s' %
              (rsp.status_code, rsp.code, rsp.message))


if __name__ == '__main__':
    sample_call_i2v()
Java
// Copyright (c) Alibaba, Inc. and its affiliates.

import com.alibaba.dashscope.aigc.videosynthesis.VideoSynthesis;
import com.alibaba.dashscope.aigc.videosynthesis.VideoSynthesisParam;
import com.alibaba.dashscope.aigc.videosynthesis.VideoSynthesisResult;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.InputRequiredException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.utils.JsonUtils;
import com.alibaba.dashscope.utils.Constants;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;


public class Image2Video {

    static {
        // Singapore region URL
        Constants.baseHttpApiUrl = "https://dashscope-intl.aliyuncs.com/api/v1";
    }

    // Read API key from environment variable
    // To hardcode instead: apiKey="sk-xxx"
    static String apiKey = System.getenv("DASHSCOPE_API_KEY");

    /**
     * Choose one of the following three input methods:
     *
     * 1. Public URL -- for publicly accessible images
     * 2. Local file -- for local development and testing
     * 3. Base64 encoding -- for private images or encrypted transmission
     */

    // [Method 1] Public URL
    static String imgUrl = "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250925/wpimhv/rap.png";

    // [Method 2] Local file (file:// + absolute path)
    // static String imgUrl = "file:///your/path/to/img.png";    // Linux/macOS
    // static String imgUrl = "file:///C:/your/path/to/img.png";  // Windows

    // [Method 3] Base64 encoding
    // static String imgUrl = Image2Video.encodeFile("/your/path/to/img.png");

    // Audio URL
    static String audioUrl = "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250925/ozwpvi/rap.mp3";

    public static void image2video() throws ApiException, NoApiKeyException, InputRequiredException {
        Map<String, Object> parameters = new HashMap<>();
        parameters.put("prompt_extend", true);
        parameters.put("watermark", false);
        parameters.put("seed", 12345);

        VideoSynthesis vs = new VideoSynthesis();
        VideoSynthesisParam param =
                VideoSynthesisParam.builder()
                        .apiKey(apiKey)
                        .model("wan2.6-i2v-flash")
                        .prompt("A scene of urban fantasy art. A dynamic graffiti art character. A boy made of spray paint comes to life from a concrete wall. He raps an English song at high speed while striking a classic, energetic rapper pose. The scene is set under an urban railway bridge at night. The lighting comes from a single street lamp, creating a cinematic atmosphere full of high energy and amazing detail. The audio of the video consists entirely of his rap, with no other dialogue or noise.")
                        .imgUrl(imgUrl)
                        .audioUrl(audioUrl)
                        .duration(10)
                        .parameters(parameters)
                        .resolution("720P")
                        .negativePrompt("")
                        .build();
        System.out.println("please wait...");
        VideoSynthesisResult result = vs.call(param);
        System.out.println(JsonUtils.toJson(result));
    }

    /**
     * Encode a local image file to a Base64 string.
     * @param filePath Path to the image file
     * @return Base64 string in the format: data:{MIME_type};base64,{base64_data}
     */
    public static String encodeFile(String filePath) {
        Path path = Paths.get(filePath);
        if (!Files.exists(path)) {
            throw new IllegalArgumentException("File does not exist: " + filePath);
        }
        String mimeType = null;
        try {
            mimeType = Files.probeContentType(path);
        } catch (IOException e) {
            // Keep the original exception as the cause for easier debugging
            throw new IllegalArgumentException("Cannot detect file type: " + filePath, e);
        }
        if (mimeType == null || !mimeType.startsWith("image/")) {
            throw new IllegalArgumentException("Unsupported or unrecognized image format");
        }
        byte[] fileBytes = null;
        try {
            fileBytes = Files.readAllBytes(path);
        } catch (IOException e) {
            throw new IllegalArgumentException("Cannot read file content: " + filePath, e);
        }

        String encodedString = Base64.getEncoder().encodeToString(fileBytes);
        return "data:" + mimeType + ";base64," + encodedString;
    }


    public static void main(String[] args) {
        try {
            image2video();
        } catch (ApiException | NoApiKeyException | InputRequiredException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}

Input audio

  • Count: 1

  • Input method: Public URL only (HTTP or HTTPS)

Output video

  • Count: 1

  • Format: Specifications vary by model. See Availability.

  • URL validity: 24 hours

  • Dimensions: Determined by the input aspect ratio and the resolution setting. The model preserves the input aspect ratio while scaling to the target pixel count for the chosen resolution, then adjusts each dimension to a multiple of 16 for video encoding, so the output ratio can differ slightly from the input. Example: a 750×1000 input (3:4, ratio 0.75) at resolution="720P" produces approximately 816×1104 output (ratio ≈ 0.739); both dimensions are divisible by 16.
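The dimension rule can be sketched in a few lines. The exact target pixel counts and the rounding direction are not documented here; the sketch assumes that "720P" targets about 1280 × 720 = 921,600 pixels and that each side is rounded down to a multiple of 16, because those assumptions reproduce the 750×1000 example. The name estimate_output_size is hypothetical, and actual output sizes may differ.

```python
import math


def estimate_output_size(width, height, target_pixels, multiple=16):
    """Estimate output dimensions for an input of width x height pixels."""
    # Ideal sides keep the input aspect ratio and multiply to target_pixels:
    #   ideal_width = sqrt(target_pixels * width / height)
    # Integer math avoids floating-point rounding surprises.
    ideal_width = math.isqrt(target_pixels * width // height)
    ideal_height = math.isqrt(target_pixels * height // width)
    # Assumption: each side is rounded down to the nearest multiple of 16.
    return (ideal_width // multiple * multiple,
            ideal_height // multiple * multiple)


# Assumption: "720P" targets about 1280 x 720 pixels.
print(estimate_output_size(750, 1000, 1280 * 720))  # (816, 1104)
```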

Billing

  • Billed per second of successfully generated video

  • Failed calls incur no charges or quota usage

  • Savings plans are available

For pricing details, see Model pricing. For rate limits, see Wan series rate limits.

API reference

Image-to-video API reference

FAQ

Why can't I set the video aspect ratio directly (such as 16:9)?

The API does not support specifying an aspect ratio. The resolution parameter controls the pixel count, not the ratio. The model preserves the input aspect ratio and adjusts the dimensions to multiples of 16.

To get a specific aspect ratio, provide an input image with that aspect ratio.

Why do I get a "url error, please check url!" error?

This error indicates an outdated SDK version. Upgrade to:

  • Python SDK: version 1.25.8 or later

  • Java SDK: version 2.22.6 or later

For upgrade instructions, see Install the SDK.
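To check the installed Python SDK before running the samples, a sketch along these lines can help. It assumes the SDK is installed under the distribution name dashscope; version_tuple, sdk_is_current, and check_dashscope are hypothetical helper names. Versions are compared as integer tuples because a plain string comparison would rank "1.9.0" above "1.25.8".

```python
import re
from importlib import metadata

MIN_PYTHON_SDK = "1.25.8"  # minimum Python SDK version stated above


def version_tuple(version):
    """Turn '1.25.8' into (1, 25, 8); anything after the third number is ignored."""
    return tuple(int(part) for part in re.findall(r"\d+", version)[:3])


def sdk_is_current(installed, minimum=MIN_PYTHON_SDK):
    """Return True if the installed version is at least the minimum."""
    return version_tuple(installed) >= version_tuple(minimum)


def check_dashscope():
    """Describe whether the installed dashscope package meets the minimum."""
    try:
        installed = metadata.version("dashscope")
    except metadata.PackageNotFoundError:
        return "dashscope is not installed"
    if sdk_is_current(installed):
        return f"dashscope {installed} meets the minimum {MIN_PYTHON_SDK}"
    return f"dashscope {installed} is older than {MIN_PYTHON_SDK}; upgrade the SDK"


if __name__ == "__main__":
    print(check_dashscope())
```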