
Alibaba Cloud Model Studio: Reference-to-video

Last Updated: Mar 15, 2026

Generate natural, lifelike performance videos from reference images or videos. The Wan reference-to-video models accept multimodal input (text, image, or video) and replicate character appearance and motion across scenes. Voice timbre replication requires a video reference.

  • Character portrayal: Replicate a character's appearance from a reference image or video. If the reference is a video, the model can also replicate voice timbre.

  • Multi-character interaction: Compose scenes with up to five characters for natural dialogue and interaction.

  • Multi-shot narrative: Maintain character consistency across shots with intelligent multi-shot scheduling.

Quick links: API reference | Prompt guide

Supported models

Available models vary by region. Resources are isolated between regions -- your model, endpoint URL, and API key must all belong to the same region. Cross-region calls fail.

Global

Access point and data storage in US (Virginia). Inference compute is dynamically scheduled worldwide. See Global deployment mode.

| Model | Features | Input | Output |
|---|---|---|---|
| wan2.6-r2v (Recommended) | Video with audio, single or multi-character, multi-shot narrative, audio-video sync | Text, video | 720P or 1080P, 5s or 10s, 30 fps, MP4 (H.264) |

International

Access point and data storage in Singapore. Inference compute is dynamically scheduled worldwide, excluding Chinese mainland. See International deployment mode.

| Model | Features | Input | Output |
|---|---|---|---|
| wan2.6-r2v-flash (Recommended) | Video with or without audio, single or multi-character, multi-shot narrative, audio-video sync. Fast and cost-effective. | Text, image, video | 720P or 1080P, 2-10s (integer), 30 fps, MP4 (H.264) |
| wan2.6-r2v | Video with audio, single or multi-character, multi-shot narrative, audio-video sync | Text, image, video | 720P or 1080P, 2-10s (integer), 30 fps, MP4 (H.264) |

Chinese mainland

Access point and data storage in Beijing. Inference compute is restricted to Chinese mainland. See Chinese Mainland deployment mode.

| Model | Features | Input | Output |
|---|---|---|---|
| wan2.6-r2v-flash (Recommended) | Video with or without audio, single or multi-character, multi-shot narrative, audio-video sync. Fast and cost-effective. | Text, image, video | 720P or 1080P, 2-10s (integer), 30 fps, MP4 (H.264) |
| wan2.6-r2v | Video with audio, single or multi-character, multi-shot narrative, audio-video sync | Text, image, video | 720P or 1080P, 2-10s (integer), 30 fps, MP4 (H.264) |
Note

The sample code in this topic uses the Singapore endpoint. For other regions, see the API reference.

Before you begin

Get a DashScope API key from the Model Studio console and set it as an environment variable:

export DASHSCOPE_API_KEY="your-api-key"

How it works

Reference-to-video tasks run asynchronously:

  1. Submit a task -- POST your model, prompt, reference URLs, and parameters. The API returns a task_id.

  2. Poll for results -- GET the task status using the task_id. When the status is SUCCEEDED, the response includes a URL to the generated video.

| Task status | Description |
|---|---|
| PENDING | Queued |
| RUNNING | Generating |
| SUCCEEDED | Video generation complete; video URL available in output.video_url |
| FAILED | Video generation failed; check the message field for error details |

The output video URL expires after 24 hours -- download or transfer before expiration.
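Because the URL is temporary, a common pattern is to save the file to local storage as soon as the task succeeds. The sketch below uses the requests library; the `save_stream` and `download_video` helper names and the 1 MB chunk size are illustrative choices, not part of the API:

```python
import requests


def save_stream(chunks, dest_path):
    """Write an iterable of byte chunks to dest_path; return bytes written."""
    total = 0
    with open(dest_path, "wb") as f:
        for chunk in chunks:
            f.write(chunk)
            total += len(chunk)
    return total


def download_video(video_url, dest_path):
    """Stream a generated video to disk before the 24-hour URL expiry."""
    with requests.get(video_url, stream=True, timeout=60) as resp:
        resp.raise_for_status()  # fail fast on an expired or invalid URL
        return save_stream(resp.iter_content(chunk_size=1 << 20), dest_path)


# Usage (after a SUCCEEDED poll response):
# download_video(result["output"]["video_url"], "output.mp4")
```

Streaming with `iter_content` avoids holding the whole MP4 in memory, which matters for 1080P 10s outputs.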

Quick start

Submit a single-character reference-to-video task and poll for the result.

# Step 1: Submit the task
curl -s --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis' \
    -H 'X-DashScope-Async: enable' \
    -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
    -H 'Content-Type: application/json' \
    -d '{
    "model": "wan2.6-r2v-flash",
    "input": {
        "prompt": "character1 gives an enthusiastic product introduction to camera, smiling and gesturing naturally.",
        "reference_urls": ["https://example.com/presenter.mp4"]
    },
    "parameters": {
        "size": "1280*720",
        "duration": 5
    }
}'

The response includes a task_id:

{
    "output": {
        "task_id": "your-task-id",
        "task_status": "PENDING"
    },
    "request_id": "abc123"
}
# Step 2: Poll for the result (repeat until task_status is SUCCEEDED or FAILED)
curl -s https://dashscope-intl.aliyuncs.com/api/v1/tasks/your-task-id \
    -H "Authorization: Bearer $DASHSCOPE_API_KEY"

When the task succeeds, the response includes the video URL:

{
    "output": {
        "task_id": "your-task-id",
        "task_status": "SUCCEEDED",
        "video_url": "https://dashscope-result.aliyuncs.com/..."
    },
    "usage": {
        "video_duration": 5,
        "video_ratio": "1280*720"
    },
    "request_id": "abc123"
}

For a complete Python example with built-in polling, see Multi-character interaction.

Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model identifier. Example: "wan2.6-r2v-flash" |
| input.prompt | string | Yes | Text prompt describing the scene. Reference characters as character1, character2, etc., matching the order of reference_urls. |
| input.reference_urls | array | Yes | Up to 5 URLs to reference images or videos (one character per reference). |
| parameters.size | string | No | Output resolution. Example: "1280*720" (16:9). See the API reference for all options. |
| parameters.duration | integer | No | Output video length in seconds. |
| parameters.audio | boolean | No | true for audio, false for silent video. Default: true. Silent video is supported only by wan2.6-r2v-flash. |
| parameters.shot_type | string | No | "multi" for multi-shot switching or "single" for a fixed perspective. |
| parameters.watermark | boolean | No | true to add a watermark. |

Reference input limits

| Type | Maximum count | Notes |
|---|---|---|
| Images | 5 | People, objects, or backgrounds |
| Videos | 3 | Best for characters or objects. Avoid background or empty scene videos. |
| Total (images + videos) | 5 | Combined limit across all references |

Input method: public URL (HTTP or HTTPS).
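It can help to validate these limits client-side before submitting, since a rejected payload still costs a round trip. The sketch below classifies references by file extension, which is a heuristic of this example (the service inspects the actual content); the accepted-extension sets are assumptions, not an official format list:

```python
from urllib.parse import urlparse

# Assumed extension sets for classification; not an official format list.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp"}
VIDEO_EXTS = {".mp4", ".mov", ".avi"}


def validate_references(urls):
    """Enforce the documented limits: at most 3 videos, at most 5 references total."""
    images = videos = 0
    for url in urls:
        path = urlparse(url).path.lower()
        ext = path[path.rfind("."):] if "." in path else ""
        if ext in VIDEO_EXTS:
            videos += 1
        elif ext in IMAGE_EXTS:
            images += 1
        else:
            raise ValueError(f"Cannot classify reference by extension: {url}")
    if videos > 3:
        raise ValueError("At most 3 video references are allowed")
    if images + videos > 5:
        raise ValueError("At most 5 references (images + videos) are allowed")
```

Calling this on a candidate `reference_urls` list raises a ValueError before the request leaves your machine.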

Character referencing

Character identifiers map to reference_urls array positions:

"reference_urls": [
    "https://example.com/person-a.mp4",   // character1
    "https://example.com/person-b.mp4",   // character2
    "https://example.com/guitar.png"      // character3
]

Use these identifiers in your prompt: *"character1 plays guitar while character2 sings along, holding character3."*
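A quick sanity check that every identifier in the prompt has a backing URL can catch off-by-one mistakes before submission. The `check_character_ids` helper below is a hypothetical client-side check, not part of the API:

```python
import re


def check_character_ids(prompt, reference_urls):
    """Verify every characterN in the prompt maps to an index in reference_urls."""
    ids = {int(m) for m in re.findall(r"character(\d+)", prompt, flags=re.IGNORECASE)}
    for n in sorted(ids):
        if not 1 <= n <= len(reference_urls):
            raise ValueError(
                f"Prompt mentions character{n} but only "
                f"{len(reference_urls)} reference URL(s) were provided"
            )
```

For example, a prompt mentioning character3 with only two URLs raises a ValueError instead of producing a confusing generation result.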

Multi-character interaction

Compose scenes with up to five characters for natural dialogue and interaction (interviews, conversations, tutorials, etc.).

Supported models: All models.

Set shot_type to multi for multi-shot switching or single for fixed perspective.

Example: Four-reference scene

Prompt: *"character2 sits on a chair by the window, holding character3, and plays a soothing American country folk song next to character4. character1 says to character2: 'that sounds great'"*

| Input | Type | Reference |
|---|---|---|
| wan-r2v-role1.mp4 | Video | character1 (person) |
| wan-r2v-role2.mp4 | Video | character2 (person) |
| wan-r2v-object4.png | Image | character3 (object) |
| wan-r2v-backgroud5.png | Image | character4 (background) |

Output: Multi-shot video with audio

Example: Two-character dialogue

Prompt: *"character1 says to character2: 'I'll rely on you tomorrow morning!' character2 replies: 'You can count on me!'"*

| Input | Type | Reference |
|---|---|---|
| compressed-video (1).mp4 | Video | character1 |
| compressed-video.mp4 | Video | character2 |

Output: Multi-shot video with audio

Sample code (curl)

All examples use the X-DashScope-Async: enable header to submit asynchronous tasks.

Step 1: Submit the task

curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis' \
    -H 'X-DashScope-Async: enable' \
    -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
    -H 'Content-Type: application/json' \
    -d '{
    "model": "wan2.6-r2v-flash",
    "input": {
        "prompt": "Character2 sits on a chair by the window, holding character3, and plays a soothing American country folk song next to character4. Character1 says to Character2: \"that sounds great\"",
        "reference_urls": [
            "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/en-US/20260205/aacgyk/wan-r2v-role1.mp4",
            "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/en-US/20260205/mmizqq/wan-r2v-role2.mp4",
            "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20260129/qpzxps/wan-r2v-object4.png",
            "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20260129/wfjikw/wan-r2v-backgroud5.png"
        ]
    },
    "parameters": {
        "size": "1280*720",
        "duration": 10,
        "audio": true,
        "shot_type": "multi",
        "watermark": true
    }
}'

Step 2: Retrieve the result

Replace <task-id> with the task_id from the Step 1 response.

curl -X GET https://dashscope-intl.aliyuncs.com/api/v1/tasks/<task-id> \
    --header "Authorization: Bearer $DASHSCOPE_API_KEY"

Sample code (Python)

This example submits a multi-character task and polls until the result is ready.

import os
import time
import requests

API_KEY = os.environ.get("DASHSCOPE_API_KEY")
BASE_URL = "https://dashscope-intl.aliyuncs.com/api/v1"

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
    "X-DashScope-Async": "enable",
}

# Step 1: Submit the task
payload = {
    "model": "wan2.6-r2v-flash",
    "input": {
        "prompt": (
            'Character2 sits on a chair by the window, holding character3, '
            'and plays a soothing American country folk song next to character4. '
            'Character1 says to Character2: "that sounds great"'
        ),
        "reference_urls": [
            "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/en-US/20260205/aacgyk/wan-r2v-role1.mp4",
            "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/en-US/20260205/mmizqq/wan-r2v-role2.mp4",
            "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20260129/qpzxps/wan-r2v-object4.png",
            "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20260129/wfjikw/wan-r2v-backgroud5.png",
        ],
    },
    "parameters": {
        "size": "1280*720",
        "duration": 10,
        "audio": True,
        "shot_type": "multi",
        "watermark": True,
    },
}

response = requests.post(
    f"{BASE_URL}/services/aigc/video-generation/video-synthesis",
    headers=headers,
    json=payload,
)
result = response.json()
task_id = result["output"]["task_id"]
print(f"Task submitted: {task_id}")

# Step 2: Poll for the result
poll_headers = {"Authorization": f"Bearer {API_KEY}"}

while True:
    status_response = requests.get(
        f"{BASE_URL}/tasks/{task_id}", headers=poll_headers
    )
    status = status_response.json()
    task_status = status["output"]["task_status"]

    if task_status == "SUCCEEDED":
        video_url = status["output"]["video_url"]
        print(f"Video ready: {video_url}")
        break
    elif task_status == "FAILED":
        print(f"Task failed: {status['output'].get('message', 'Unknown error')}")
        break
    else:
        print(f"Status: {task_status} -- waiting 10s...")
        time.sleep(10)

Single-character performance

Create a complete character performance across different scenes from a single reference video or image (personal branding, product endorsements, educational training, etc.).

Supported models: All models.

Pass a single URL in reference_urls and use character1 in the prompt. Setting shot_type to multi is recommended.

Example: Holiday unboxing video

Prompt: *"Create a festive holiday unboxing experience. Shot 1 [0-2s]: Character1 sits by a beautifully decorated Christmas tree with twinkling lights, holding a wrapped gift box with elegant red and gold wrapping. Shot 2 [2-4s]: Close-up as Character1 carefully unwraps the gift, revealing premium skincare products inside. Shot 3 [4-6s]: Character1 applies the product with delight, saying: 'This holiday glow is exactly what I wanted!' Shot 4 [6-10s]: Character1 admires their radiant skin in a handheld mirror, surrounded by festive decorations, ending with a warm smile to camera."*

| Input | Output |
|---|---|
| wan-r2v-role-4.mp4 (reference video) | Multi-shot video with audio |

Sample code (curl)

Step 1: Submit the task

curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis' \
    -H 'X-DashScope-Async: enable' \
    -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
    -H 'Content-Type: application/json' \
    -d '{
    "model": "wan2.6-r2v-flash",
    "input": {
        "prompt": "Create a festive holiday unboxing experience.Shot 1 [0-2s]: Character1 sits by a beautifully decorated Christmas tree with twinkling lights, holding a wrapped gift box with elegant red and gold wrapping. Shot 2 [2-4s]: Close-up as Character1 carefully unwraps the gift, revealing premium skincare products inside. Shot 3 [4-6s]: Character1 applies the product with delight, saying: \"This holiday glow is exactly what I wanted!\" Shot 4 [6-10s]: Character1 admires their radiant skin in a handheld mirror, surrounded by festive decorations, ending with a warm smile to camera.",
        "reference_urls":["https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/en-US/20260205/mjgmzx/wan-r2v-role-4.mp4"]
    },
    "parameters": {
        "size": "1280*720",
        "duration": 10,
        "shot_type":"multi",
        "watermark": true
    }
}'

Step 2: Retrieve the result

Replace <task-id> with the task_id from the Step 1 response.

curl -X GET https://dashscope-intl.aliyuncs.com/api/v1/tasks/<task-id> \
    --header "Authorization: Bearer $DASHSCOPE_API_KEY"

Silent video generation

Create visual-only videos without audio (animated posters, silent short videos, etc.).

Supported model: wan2.6-r2v-flash only.

Set audio to false.

Example: Silent dance video

Prompt: *"character1 drinks bubble tea while dancing spontaneously to the music."*

| Input | Output |
|---|---|
| wan-r2v-role-1.mp4 (reference video) | Silent video |

Sample code (curl)

Step 1: Submit the task

curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis' \
    -H 'X-DashScope-Async: enable' \
    -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
    -H 'Content-Type: application/json' \
    -d '{
    "model": "wan2.6-r2v-flash",
    "input": {
        "prompt": "character1 drinks bubble tea while dancing spontaneously to the music.",
        "reference_urls":["https://cdn.wanx.aliyuncs.com/static/demo-wan26/vace.mp4"]
    },
    "parameters": {
        "size": "1280*720",
        "duration": 5,
        "shot_type":"multi",
        "audio": false,
        "watermark": true
    }
}'

Step 2: Retrieve the result

Replace <task-id> with the task_id from the Step 1 response.

curl -X GET https://dashscope-intl.aliyuncs.com/api/v1/tasks/<task-id> \
    --header "Authorization: Bearer $DASHSCOPE_API_KEY"

Output specifications

| Property | Details |
|---|---|
| Number of videos | 1 per task |
| Format | MP4 (H.264, 30 fps) |
| Resolution | Set by the size parameter. Example: 1280*720 produces a 16:9 ratio. |
| URL expiration | 24 hours |

Billing and rate limits

Billing rules

| Item | Billed? | Unit |
|---|---|---|
| Input images | No | -- |
| Input videos | Yes | Per second |
| Output videos | Yes | Per second |
| Failed tasks | No | -- (failed tasks do not consume the free quota) |

Videos with audio and silent videos are priced differently. For example, wan2.6-r2v-flash has separate rates for each.

Total billed duration = Input video duration (capped at 5s) + Output video duration

How input video duration is billed

The 5-second cap is distributed evenly across all references. Each video is billed for min(actual duration, truncation limit). Images are free.

| Number of references | Truncation limit per video |
|---|---|
| 1 | 5s |
| 2 | 2.5s |
| 3 | 1.65s |
| 4 | 1.25s |
| 5 | 1s |

Example: 3 references (1 image + 2 videos) with a 1.65s truncation limit per video:

Billed input duration = min(video 1 duration, 1.65s) + min(video 2 duration, 1.65s). The image is not billed.
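This calculation can be written out directly. The sketch below encodes the truncation limits from the table above; the `billed_duration` helper name and the use of None to mark image references are conventions of this example, not part of the API:

```python
from typing import List, Optional

# Per-video truncation limit, keyed by total reference count (from the table above).
TRUNCATION_LIMIT = {1: 5.0, 2: 2.5, 3: 1.65, 4: 1.25, 5: 1.0}


def billed_duration(reference_durations: List[Optional[float]],
                    output_duration: float) -> float:
    """Total billed seconds: capped input video time plus output video time.

    Pass None for image references: images are free but still count
    toward the reference total that selects the truncation limit.
    """
    limit = TRUNCATION_LIMIT[len(reference_durations)]
    billed_input = sum(min(d, limit) for d in reference_durations if d is not None)
    return billed_input + output_duration
```

For the worked example above (1 image plus an 8s and a 1s video, 10s output): min(8, 1.65) + min(1, 1.65) + 10 = 12.65 billed seconds.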

API reference

For the complete parameter list, response schema, and region-specific endpoints, see the Wan reference-to-video API reference.

FAQ

How do I set the video aspect ratio?

Use the size parameter. Each resolution maps to a fixed aspect ratio -- for example, size=1280*720 produces 16:9. See the size parameter reference for all options.
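Since each resolution maps to a fixed ratio, the ratio can be derived from the size string itself by reducing width and height by their greatest common divisor. The `aspect_ratio` helper is purely illustrative:

```python
from math import gcd


def aspect_ratio(size: str) -> str:
    """Reduce a size string like "1280*720" to its aspect ratio, e.g. "16:9"."""
    w, h = (int(x) for x in size.split("*"))
    g = gcd(w, h)
    return f"{w // g}:{h // g}"
```

For example, aspect_ratio("1280*720") returns "16:9" and aspect_ratio("720*1280") returns "9:16".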

How do I reference characters in the prompt?

Each reference must contain one character. Use character1, character2, etc. -- identifiers map to the URL order in reference_urls:

"reference_urls": [
    "https://example.com/girl.mp4",   // character1
    "https://example.com/clock.png"   // character2
]

What happens when a task fails?

Poll the task status endpoint. If task_status is FAILED, check the message field for error details. Common causes: invalid reference URLs, unsupported file formats, or exceeded rate limits. Failed tasks are not billed.