Generate natural, lifelike performance videos from reference images or videos. The Wan reference-to-video models accept multimodal input (text, image, or video) and replicate character appearance and motion across scenes. Voice timbre replication requires a video reference.
- Character portrayal: Replicate a character's appearance from a reference image or video. If the reference is a video, the model can also replicate voice timbre.
- Multi-character interaction: Compose scenes with up to five characters for natural dialogue and interaction.
- Multi-shot narrative: Maintain character consistency across shots with intelligent multi-shot scheduling.
Quick links: API reference | Prompt guide
Supported models
Available models vary by region. Resources are isolated between regions -- your model, endpoint URL, and API key must all belong to the same region. Cross-region calls fail.
Global
Access point and data storage in US (Virginia). Inference compute is dynamically scheduled worldwide. See Global deployment mode.
| Model | Features | Input | Output |
|---|---|---|---|
| wan2.6-r2v Recommended | Video with audio, single or multi-character, multi-shot narrative, audio-video sync | Text, video | 720P or 1080P, 5s or 10s, 30 fps, MP4 (H.264) |
International
Access point and data storage in Singapore. Inference compute is dynamically scheduled worldwide, excluding Chinese mainland. See International deployment mode.
| Model | Features | Input | Output |
|---|---|---|---|
| wan2.6-r2v-flash Recommended | Video with or without audio, single or multi-character, multi-shot narrative, audio-video sync. Fast and cost-effective. | Text, image, video | 720P or 1080P, 2-10s (integer), 30 fps, MP4 (H.264) |
| wan2.6-r2v | Video with audio, single or multi-character, multi-shot narrative, audio-video sync | Text, image, video | 720P or 1080P, 2-10s (integer), 30 fps, MP4 (H.264) |
Chinese mainland
Access point and data storage in Beijing. Inference compute is restricted to Chinese mainland. See Chinese Mainland deployment mode.
| Model | Features | Input | Output |
|---|---|---|---|
| wan2.6-r2v-flash Recommended | Video with or without audio, single or multi-character, multi-shot narrative, audio-video sync. Fast and cost-effective. | Text, image, video | 720P or 1080P, 2-10s (integer), 30 fps, MP4 (H.264) |
| wan2.6-r2v | Video with audio, single or multi-character, multi-shot narrative, audio-video sync | Text, image, video | 720P or 1080P, 2-10s (integer), 30 fps, MP4 (H.264) |
The sample code in this topic uses the Singapore endpoint. For other regions, see the API reference.
Before you begin
Get a DashScope API key from the Model Studio console and set it as an environment variable:
export DASHSCOPE_API_KEY="your-api-key"
How it works
Reference-to-video tasks run asynchronously:
- Submit a task -- POST your model, prompt, reference URLs, and parameters. The API returns a task_id.
- Poll for results -- GET the task status using the task_id. When the status is SUCCEEDED, the response includes a URL to the generated video.
| Task status | Description |
|---|---|
| PENDING | Queued |
| RUNNING | Generating |
| SUCCEEDED | Video generation complete; video URL available in output.video_url |
| FAILED | Video generation failed; check the message field for error details |
The output video URL expires after 24 hours -- download or transfer before expiration.
Quick start
Submit a single-character reference-to-video task and poll for the result.
# Step 1: Submit the task
curl -s --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis' \
-H 'X-DashScope-Async: enable' \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
"model": "wan2.6-r2v-flash",
"input": {
"prompt": "character1 gives an enthusiastic product introduction to camera, smiling and gesturing naturally.",
"reference_urls": ["https://example.com/presenter.mp4"]
},
"parameters": {
"size": "1280*720",
"duration": 5
}
}'
The response includes a task_id:
{
"output": {
"task_id": "your-task-id",
"task_status": "PENDING"
},
"request_id": "abc123"
}

# Step 2: Poll for the result (repeat until task_status is SUCCEEDED or FAILED)
curl -s https://dashscope-intl.aliyuncs.com/api/v1/tasks/your-task-id \
-H "Authorization: Bearer $DASHSCOPE_API_KEY"
When the task succeeds, the response includes the video URL:
{
"output": {
"task_id": "your-task-id",
"task_status": "SUCCEEDED",
"video_url": "https://dashscope-result.aliyuncs.com/..."
},
"usage": {
"video_duration": 5,
"video_ratio": "1280*720"
},
"request_id": "abc123"
}
For a complete Python example with built-in polling, see Multi-character interaction.
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model identifier. Example: "wan2.6-r2v-flash" |
| input.prompt | string | Yes | Text prompt describing the scene. Reference characters as character1, character2, etc., matching the order of reference_urls. |
| input.reference_urls | array | Yes | Up to 5 URLs to reference images or videos (one character per reference). |
| parameters.size | string | No | Output resolution. Example: "1280*720" (16:9). See the API reference for all options. |
| parameters.duration | integer | No | Output video length in seconds. |
| parameters.audio | boolean | No | true for audio, false for silent video. Default: true. Silent video is supported only by wan2.6-r2v-flash. |
| parameters.shot_type | string | No | "multi" for multi-shot switching or "single" for a fixed perspective. |
| parameters.watermark | boolean | No | true to add a watermark. |
Reference input limits
| Type | Maximum count | Notes |
|---|---|---|
| Images | 5 | People, objects, or backgrounds |
| Videos | 3 | Best for characters or objects. Avoid background or empty scene videos. |
| Total (images + videos) | 5 | Combined limit across all references |
Input method: public URL (HTTP or HTTPS).
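The limits above can be checked client-side before submitting a task. The sketch below is a hypothetical helper (not part of the API) that infers the reference type from the URL's file extension, which is an assumption; the service itself determines the type from the file content.

```python
# Hypothetical client-side check mirroring the documented limits:
# at most 3 videos, at most 5 references in total.
VIDEO_EXTS = (".mp4",)
IMAGE_EXTS = (".png", ".jpg", ".jpeg")

def validate_references(urls):
    """Raise ValueError if the reference list exceeds the documented limits."""
    videos = [u for u in urls if u.lower().endswith(VIDEO_EXTS)]
    images = [u for u in urls if u.lower().endswith(IMAGE_EXTS)]
    if len(urls) > 5:
        raise ValueError("at most 5 references in total")
    if len(videos) > 3:
        raise ValueError("at most 3 video references")
    if len(videos) + len(images) != len(urls):
        raise ValueError("unrecognized reference type")
    return True
```

Running this before submission avoids waiting on a task that is bound to fail validation server-side.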
Character referencing
Character identifiers map to reference_urls array positions:
"reference_urls": [
"https://example.com/person-a.mp4", // character1
"https://example.com/person-b.mp4", // character2
"https://example.com/guitar.png" // character3
]
Use these identifiers in your prompt: *"character1 plays guitar while character2 sings along, holding character3."*
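Because the identifiers are purely positional, it is easy to desynchronize a prompt from a reordered URL list. A small hypothetical helper (names are illustrative, not part of the API) can derive the label-to-URL mapping from the array order:

```python
# Hypothetical helper: derive the character labels from array order so the
# prompt and reference_urls stay in sync.
def build_references(urls):
    """Return (reference_urls, label_map); labels follow the array order."""
    labels = {f"character{i + 1}": url for i, url in enumerate(urls)}
    return list(urls), labels

refs, labels = build_references([
    "https://example.com/person-a.mp4",
    "https://example.com/guitar.png",
])
# labels["character1"] is person-a.mp4; labels["character2"] is guitar.png
```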
Multi-character interaction
Compose scenes with up to five characters for natural dialogue and interaction (interviews, conversations, tutorials, etc.).
Supported models: All models.
Set shot_type to multi for multi-shot switching or single for fixed perspective.
Example: Four-reference scene
Prompt: *"character2 sits on a chair by the window, holding character3, and plays a soothing American country folk song next to character4. character1 says to character2: 'that sounds great'"*
| Input | Type | Reference |
|---|---|---|
| wan-r2v-role1.mp4 | Video | character1 (person) |
| wan-r2v-role2.mp4 | Video | character2 (person) |
| wan-r2v-object4.png | Image | character3 (object) |
| wan-r2v-backgroud5.png | Image | character4 (background) |
Output: Multi-shot video with audio
Example: Two-character dialogue
Prompt: *"character1 says to character2: 'I'll rely on you tomorrow morning!' character2 replies: 'You can count on me!'"*
| Input | Type | Reference |
|---|---|---|
| compressed-video (1).mp4 | Video | character1 |
| compressed-video.mp4 | Video | character2 |
Output: Multi-shot video with audio
Sample code (curl)
All examples use the X-DashScope-Async: enable header to submit asynchronous tasks.
Step 1: Submit the task
curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis' \
-H 'X-DashScope-Async: enable' \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
"model": "wan2.6-r2v-flash",
"input": {
"prompt": "Character2 sits on a chair by the window, holding character3, and plays a soothing American country folk song next to character4. Character1 says to Character2: \"that sounds great\"",
"reference_urls": [
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/en-US/20260205/aacgyk/wan-r2v-role1.mp4",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/en-US/20260205/mmizqq/wan-r2v-role2.mp4",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20260129/qpzxps/wan-r2v-object4.png",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20260129/wfjikw/wan-r2v-backgroud5.png"
]
},
"parameters": {
"size": "1280*720",
"duration": 10,
"audio": true,
"shot_type": "multi",
"watermark": true
}
}'
Step 2: Retrieve the result
Replace <task-id> with the task_id from the Step 1 response.
curl -X GET https://dashscope-intl.aliyuncs.com/api/v1/tasks/<task-id> \
--header "Authorization: Bearer $DASHSCOPE_API_KEY"
Sample code (Python)
This example submits a multi-character task and polls until the result is ready.
import os
import time
import requests
API_KEY = os.environ.get("DASHSCOPE_API_KEY")
BASE_URL = "https://dashscope-intl.aliyuncs.com/api/v1"
headers = {
"Authorization": f"Bearer {API_KEY}",
"Content-Type": "application/json",
"X-DashScope-Async": "enable",
}
# Step 1: Submit the task
payload = {
"model": "wan2.6-r2v-flash",
"input": {
"prompt": (
'Character2 sits on a chair by the window, holding character3, '
'and plays a soothing American country folk song next to character4. '
'Character1 says to Character2: "that sounds great"'
),
"reference_urls": [
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/en-US/20260205/aacgyk/wan-r2v-role1.mp4",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/en-US/20260205/mmizqq/wan-r2v-role2.mp4",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20260129/qpzxps/wan-r2v-object4.png",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20260129/wfjikw/wan-r2v-backgroud5.png",
],
},
"parameters": {
"size": "1280*720",
"duration": 10,
"audio": True,
"shot_type": "multi",
"watermark": True,
},
}
response = requests.post(
f"{BASE_URL}/services/aigc/video-generation/video-synthesis",
headers=headers,
json=payload,
)
result = response.json()
task_id = result["output"]["task_id"]
print(f"Task submitted: {task_id}")
# Step 2: Poll for the result
poll_headers = {"Authorization": f"Bearer {API_KEY}"}
while True:
status_response = requests.get(
f"{BASE_URL}/tasks/{task_id}", headers=poll_headers
)
status = status_response.json()
task_status = status["output"]["task_status"]
if task_status == "SUCCEEDED":
video_url = status["output"]["video_url"]
print(f"Video ready: {video_url}")
break
elif task_status == "FAILED":
print(f"Task failed: {status['output'].get('message', 'Unknown error')}")
break
else:
print(f"Status: {task_status} -- waiting 10s...")
time.sleep(10)
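Because the returned video URL expires after 24 hours, it is worth downloading the file as soon as the task succeeds. A minimal sketch, continuing the example above with the `requests` library (the function name and output path are illustrative):

```python
import requests

def download_video(video_url, path="output.mp4"):
    """Download the generated video; URLs expire 24 hours after generation."""
    resp = requests.get(video_url, timeout=60)
    resp.raise_for_status()  # fail loudly on HTTP errors
    with open(path, "wb") as f:
        f.write(resp.content)
    return path
```

For long videos, `requests.get(..., stream=True)` with chunked writes would avoid holding the whole file in memory.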
Single-character performance
Create a complete character performance across different scenes from a single reference video or image (personal branding, product endorsements, educational training, etc.).
Supported models: All models.
Pass a single URL in reference_urls and use character1 in the prompt. Setting shot_type to multi is recommended.
Example: Holiday unboxing video
Prompt: *"Create a festive holiday unboxing experience. Shot 1 [0-2s]: Character1 sits by a beautifully decorated Christmas tree with twinkling lights, holding a wrapped gift box with elegant red and gold wrapping. Shot 2 [2-4s]: Close-up as Character1 carefully unwraps the gift, revealing premium skincare products inside. Shot 3 [4-6s]: Character1 applies the product with delight, saying: 'This holiday glow is exactly what I wanted!' Shot 4 [6-10s]: Character1 admires their radiant skin in a handheld mirror, surrounded by festive decorations, ending with a warm smile to camera."*
| Input | Output |
|---|---|
| wan-r2v-role-4.mp4 (reference video) | Multi-shot video with audio |
Sample code (curl)
Step 1: Submit the task
curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis' \
-H 'X-DashScope-Async: enable' \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
"model": "wan2.6-r2v-flash",
"input": {
"prompt": "Create a festive holiday unboxing experience.Shot 1 [0-2s]: Character1 sits by a beautifully decorated Christmas tree with twinkling lights, holding a wrapped gift box with elegant red and gold wrapping. Shot 2 [2-4s]: Close-up as Character1 carefully unwraps the gift, revealing premium skincare products inside. Shot 3 [4-6s]: Character1 applies the product with delight, saying: \"This holiday glow is exactly what I wanted!\" Shot 4 [6-10s]: Character1 admires their radiant skin in a handheld mirror, surrounded by festive decorations, ending with a warm smile to camera.",
"reference_urls":["https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/en-US/20260205/mjgmzx/wan-r2v-role-4.mp4"]
},
"parameters": {
"size": "1280*720",
"duration": 10,
"shot_type":"multi",
"watermark": true
}
}'
Step 2: Retrieve the result
Replace <task-id> with the task_id from the Step 1 response.
curl -X GET https://dashscope-intl.aliyuncs.com/api/v1/tasks/<task-id> \
--header "Authorization: Bearer $DASHSCOPE_API_KEY"
Silent video generation
Create visual-only videos without audio (animated posters, silent short videos, etc.).
Supported model: wan2.6-r2v-flash only.
Set audio to false.
Example: Silent dance video
Prompt: *"character1 drinks bubble tea while dancing spontaneously to the music."*
| Input | Output |
|---|---|
| wan-r2v-role-1.mp4 (reference video) | Silent video |
Sample code (curl)
Step 1: Submit the task
curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis' \
-H 'X-DashScope-Async: enable' \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
"model": "wan2.6-r2v-flash",
"input": {
"prompt": "character1 drinks bubble tea while dancing spontaneously to the music.",
"reference_urls":["https://cdn.wanx.aliyuncs.com/static/demo-wan26/vace.mp4"]
},
"parameters": {
"size": "1280*720",
"duration": 5,
"shot_type":"multi",
"audio": false,
"watermark": true
}
}'
Step 2: Retrieve the result
Replace <task-id> with the task_id from the Step 1 response.
curl -X GET https://dashscope-intl.aliyuncs.com/api/v1/tasks/<task-id> \
--header "Authorization: Bearer $DASHSCOPE_API_KEY"
Output specifications
| Property | Details |
|---|---|
| Number of videos | 1 per task |
| Format | MP4 (H.264, 30 fps) |
| Resolution | Set by size parameter. Example: 1280*720 produces 16:9 ratio. |
| URL expiration | 24 hours |
Billing and rate limits
-
For free quota and pricing, see Model invocation pricing.
-
For rate limits, see Wan series.
Billing rules
| Item | Billed? | Unit |
|---|---|---|
| Input images | No | -- |
| Input videos | Yes | Per second |
| Output videos | Yes | Per second |
| Failed tasks | No | -- (failed tasks do not consume the free quota) |
Videos with audio and silent videos are priced differently. For example, wan2.6-r2v-flash has separate rates for each.
Total billed duration = Input video duration (capped at 5s) + Output video duration
How input video duration is billed
The 5-second cap is distributed evenly across all references. Each video is billed for min(actual duration, truncation limit). Images are free.
| Number of references | Truncation limit per video |
|---|---|
| 1 | 5s |
| 2 | 2.5s |
| 3 | 1.65s |
| 4 | 1.25s |
| 5 | 1s |
Example: 3 references (1 image + 2 videos) with a 1.65s truncation limit per video:
Billed input duration = min(video 1 duration, 1.65s) + min(video 2 duration, 1.65s). The image is not billed.
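The rule above can be expressed as a small calculator. This is an unofficial sketch that encodes the truncation-limit table directly (the function name is illustrative); images are excluded from the input list since they are free.

```python
# Truncation limit per reference video, keyed by total reference count
# (values taken from the table above).
TRUNCATION_LIMIT = {1: 5.0, 2: 2.5, 3: 1.65, 4: 1.25, 5: 1.0}

def billed_input_seconds(video_durations, n_references):
    """Billed input duration: each video is billed min(actual, limit)."""
    limit = TRUNCATION_LIMIT[n_references]
    return sum(min(d, limit) for d in video_durations)

# Example from the text: 3 references (1 image + 2 videos), limit 1.65 s.
# A 3.0 s and a 1.2 s video bill as 1.65 + 1.2 = 2.85 s of input.
```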
API reference
For the complete parameter list, response schema, and region-specific endpoints, see the Wan reference-to-video API reference.
FAQ
How do I set the video aspect ratio?
Use the size parameter. Each resolution maps to a fixed aspect ratio -- for example, size=1280*720 produces 16:9. See the size parameter reference for all options.
How do I reference characters in the prompt?
Each reference must contain one character. Use character1, character2, etc. -- identifiers map to the URL order in reference_urls:
"reference_urls": [
"https://example.com/girl.mp4", // character1
"https://example.com/clock.png" // character2
]
What happens when a task fails?
Poll the task status endpoint. If task_status is FAILED, check the message field for error details. Common causes: invalid reference URLs, unsupported file formats, or exceeded rate limits. Failed tasks are not billed.
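A production poller should also bound how long it waits, so a task stuck in RUNNING does not loop forever. A sketch with a deadline, assuming the same endpoints as the samples above (the function name and defaults are illustrative):

```python
import time
import requests

def poll_task(task_id, api_key,
              base_url="https://dashscope-intl.aliyuncs.com/api/v1",
              interval=10, timeout=600):
    """Poll until SUCCEEDED or FAILED; raise TimeoutError past the deadline."""
    headers = {"Authorization": f"Bearer {api_key}"}
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        out = requests.get(f"{base_url}/tasks/{task_id}",
                           headers=headers).json()["output"]
        status = out["task_status"]
        if status == "SUCCEEDED":
            return out["video_url"]
        if status == "FAILED":
            raise RuntimeError(out.get("message", "Unknown error"))
        time.sleep(interval)
    raise TimeoutError(f"task {task_id} did not finish within {timeout}s")
```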

