
Alibaba Cloud Model Studio: Text-to-video/image-to-video prompt guide

Last Updated: Apr 25, 2025

This topic describes how to write prompts for text-to-video and image-to-video generation. It covers prompt formulas and a prompt dictionary that help you quickly get started with Wan video generation models.


Prompt parameters

The text-to-video and image-to-video APIs accept the following prompt-related parameters:

  • prompt: The text description of the video you want to generate. Both Chinese and English are supported. All guidelines in this topic apply to this parameter.

  • prompt_extend: Specifies whether to enable intelligent prompt rewriting. The default is true, which enables intelligent rewriting by an LLM. We recommend using the default value.

{
    "input": {
        "prompt": "A flower shop with exquisite windows, beautiful wooden doors, displaying flowers"
    },
    "parameters": {
        "prompt_extend": true
    }
}
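As a minimal sketch, the request body above can be assembled in Python as follows. The helper name build_request_body is hypothetical. The endpoint, model name, and authentication are not covered in this topic and are deliberately left out.

```python
import json


def build_request_body(prompt: str, prompt_extend: bool = True) -> dict:
    """Return a request body shaped like the JSON example above.

    Hypothetical helper: only the two prompt-related parameters described
    in this topic are included.
    """
    cleaned = prompt.strip()
    if not cleaned:
        raise ValueError("prompt must not be empty")
    return {
        "input": {"prompt": cleaned},
        # True keeps intelligent prompt rewriting enabled (the default).
        "parameters": {"prompt_extend": prompt_extend},
    }


body = build_request_body(
    "A flower shop with exquisite windows, beautiful wooden doors, displaying flowers"
)
print(json.dumps(body, indent=4))
```

Passing prompt_extend=False would send the prompt without rewriting, which may be useful when you want to compare your own wording against the rewritten version.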

Writing effective prompts is not easy. This topic summarizes two categories of prompt techniques that you can learn progressively:

  • Prompt formulas: Four prompt formulas are provided to meet different requirements.

  • Prompt dictionary: Conveys video content through seven key elements: shot size, perspective, camera, camera movement, speed, atmosphere, and style.

Prompt formulas

Prompts describe the content and motion in the video. The more complete, precise, and detailed the prompt, the higher the quality of the generated video and the closer it is to your expectations. The following four prompt formulas address different requirements:

Basic formula

Target users: New users trying AI video for the first time, and users who use AI video as a source of inspiration. Simple, open-ended prompts tend to produce more imaginative videos.

Prompt = Entity + Environment + Motion

  • Entity: The entity is the main object of the video content. It can be a person, animal, plant, object, or an imaginary object that does not physically exist.

  • Environment: The environment is where the entity is located, including background and foreground. It can be a physically existing real space or an imagined fictional scene.

  • Motion: Motion includes the specific movement of the entity and the movement of anything else in the scene. It can be stationary, a subtle movement, a large movement, a partial movement, or an overall trend of movement.
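The basic formula can be sketched as a small Python helper that joins the three elements in order. The function name basic_prompt and the comma separator are assumptions made for illustration. The model does not require any particular separator.

```python
def basic_prompt(entity: str, environment: str, motion: str) -> str:
    """Join Entity + Environment + Motion into one prompt string."""
    parts = {"entity": entity, "environment": environment, "motion": motion}
    cleaned = []
    for name, value in parts.items():
        # Trim whitespace and trailing punctuation so the parts join cleanly.
        value = value.strip().rstrip(".,")
        if not value:
            raise ValueError(f"{name} must not be empty")
        cleaned.append(value)
    return ", ".join(cleaned)


prompt = basic_prompt(
    entity="A black cat",
    environment="under a street lamp at a city street corner",
    motion="staring at the distant neon lights",
)
print(prompt)
```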

Text-to-video

Prompt example

Video effect

A black-haired girl in ancient style, wearing mecha Hanfu with her hair in a bun, turns to look at the camera as her soft, glossy hair dances lightly in the air

Image-to-video

Prompt example

Video effect

Prompt: A man flying with a paraglider

[Input image]

The image is used as the first frame of the video, and the video is then generated based on the prompt.

Advanced formula

Target users: Users with some experience in AI video. Adding richer and more detailed descriptions on top of the basic formula can effectively enhance the video's quality, vividness, and storytelling.

Prompt = Entity (description) + Environment (description) + Motion (description) + Camera language + Atmosphere + Stylization

  • Entity description: Details the appearance of the entity, listed as adjectives or short phrases, such as "a black-haired girl wearing ethnic minority clothing" or "a fairy from another world, wearing tattered yet gorgeous clothing, with a pair of strange wings made of ruins fragments on her back".

  • Environment description: Details the features of the environment where the entity is located, listed as adjectives or short phrases.

  • Motion description: Details the features of the motion, including its amplitude, rate, and effect, such as "shaking violently", "moving slowly", or "breaking the glass".

  • Camera language: Includes shot size, perspective, camera, camera movement, and more. For common camera language, see the Prompt dictionary.

  • Atmosphere: Words that describe the expected atmosphere of the video, such as "dreamy", "lonely", or "magnificent". For common atmosphere words, see the Prompt dictionary.

  • Stylization: Describes the visual style of the video, such as "cyberpunk", "line drawing illustration", or "wasteland style". For common styles, see the Prompt dictionary.
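One way to keep the six elements of the advanced formula organized is a small data class that renders them in formula order. This is only a sketch: the class name AdvancedPrompt, its field names, and the comma separator are assumptions, and the three optional elements are skipped when left empty.

```python
from dataclasses import dataclass


@dataclass
class AdvancedPrompt:
    """Hypothetical container for the elements of the advanced formula."""

    entity: str
    environment: str
    motion: str
    camera_language: str = ""
    atmosphere: str = ""
    stylization: str = ""

    def render(self) -> str:
        # Keep the order given by the formula and drop empty elements.
        ordered = [
            self.entity,
            self.environment,
            self.motion,
            self.camera_language,
            self.atmosphere,
            self.stylization,
        ]
        return ", ".join(part.strip() for part in ordered if part.strip())


advanced = AdvancedPrompt(
    entity="a black-haired girl wearing ethnic minority clothing",
    environment="among flowers",
    motion="moving slowly",
    camera_language="medium shot",
    atmosphere="dreamy",
    stylization="line drawing illustration",
)
print(advanced.render())
```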

Camera movement formula

Target users: Users with specific camera movement requirements, such as professional video production. Adding specific camera movement descriptions on top of the basic or advanced formula can effectively enhance the dynamism and narrative of the video.

Prompt = Camera movement + Entity (description) + Environment (description) + Motion (description) + Camera language + Atmosphere + Stylization

  • Camera movement: A specific description of how the camera moves. Combining camera movement with changes in the frame content along the timeline makes the video's storytelling richer and more professional. Write the camera movement as a director would imagine it. Keep the camera movement within 5 seconds and avoid overly complex movements.

Prompt example

Video effect

The camera starts with a full screen of antique wooden screens and slowly pans to the left, revealing a classical-style girl sitting behind the screen, wearing embroidered Hanfu with her hair in a high bun, conducting an online video meeting.
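The guidance on camera movement can be turned into a rough planning check before the prompt is written. In the sketch below, each move carries the author's own estimate of its duration in seconds. The 5-second limit comes from the guidance above, while the cap of two moves is an assumed stand-in for "avoid overly complex camera movements". The durations are used only for validation and are not written into the prompt.

```python
from typing import List, Tuple

MAX_CAMERA_SECONDS = 5.0  # limit taken from the guidance above
MAX_MOVES = 2  # assumption: a simple proxy for "not overly complex"


def camera_movement_prompt(moves: List[Tuple[str, float]], scene: str) -> str:
    """Place the camera movement first, then the rest of the scene.

    Each move is (description, estimated seconds). Hypothetical helper.
    """
    if not moves:
        raise ValueError("at least one camera move is required")
    if len(moves) > MAX_MOVES:
        raise ValueError(f"use at most {MAX_MOVES} camera moves")
    total = sum(seconds for _, seconds in moves)
    if total > MAX_CAMERA_SECONDS:
        raise ValueError(f"camera movement takes {total}s, limit is {MAX_CAMERA_SECONDS}s")
    movement = ", then ".join(text.strip() for text, _ in moves)
    return f"{movement}, {scene.strip()}"


camera_prompt = camera_movement_prompt(
    moves=[
        ("The camera starts with a full screen of antique wooden screens", 2.0),
        ("slowly pans to the left", 3.0),
    ],
    scene="revealing a classical-style girl sitting behind the screen",
)
print(camera_prompt)
```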

Transformation formula

Target users: Users with specific creative needs. Adding transformation descriptions on top of the basic or advanced formula can make the video more fun and produce unexpected visual effects.

Prompt = Entity A (description) + Transformation process + Entity B (description) + Environment (description) + Motion (description) + Camera language + Atmosphere + Stylization

  • Entity A: The features and state of the entity before the transformation.

  • Transformation process: A description of how the entity transforms from form A to form B. A detailed description of the process makes the transformation look more natural and vivid.

  • Entity B: The features and state of the entity after the transformation.

Prompt example

Video effect

Japanese anime style. At a corner of a city street, a black cat crouches under a street lamp, staring at the distant neon lights. Suddenly, a blue light descends from the sky, quickly enveloping its body. The black cat rises into the air within the light, its black fur gradually dissipates into the air, and its body quickly elongates. Its fur transforms into a black fitted suit, outlining a slender silhouette. The cat ears disappear, the facial contours gradually become clear, and finally transform into a handsome and cold young man's face. He lands lightly on the ground, his suit fluttering slightly in the night breeze, the blue light gradually fades away, like a mysterious young man walking out of a future world, elegant and confident.
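A transformation prompt tends to be easier to write when the process is broken into ordered steps. The sketch below follows the order of the formula (Entity A, process steps, Entity B, then an optional style) and emits one sentence per element. The function names and the sentence-per-step layout are assumptions, and the remaining formula elements are omitted for brevity.

```python
from typing import List


def _sentence(text: str) -> str:
    """Capitalize the first letter and end the text with one period."""
    text = text.strip().rstrip(".")
    if not text:
        raise ValueError("prompt elements must not be empty")
    return text[0].upper() + text[1:] + "."


def transformation_prompt(
    entity_a: str, process: List[str], entity_b: str, stylization: str = ""
) -> str:
    """Render Entity A + Transformation process + Entity B (+ style)."""
    if not process:
        raise ValueError("describe at least one transformation step")
    parts = [entity_a, *process, entity_b]
    if stylization.strip():
        parts.append(stylization)
    return " ".join(_sentence(part) for part in parts)


transform = transformation_prompt(
    entity_a="a black cat crouches under a street lamp",
    process=[
        "a blue light descends from the sky and envelops its body",
        "its fur transforms into a black fitted suit",
    ],
    entity_b="a handsome, cold young man lands lightly on the ground",
    stylization="Japanese anime style",
)
print(transform)
```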

Prompt dictionary

By writing prompts along different dimensions, you can enhance the controllability and expressiveness of the generated video in each of those dimensions. Common dimensions and prompt examples are provided below for your reference.

1. Shot size

Shot size type

Prompt example

Video effect

Close-up

Close-up shot | In the video, a close-up shot shows a classical-style woman's face, with soft light falling on her skin, outlining delicate contours.

Close shot

Close shot | In the video, a close shot shows a classical-style woman holding a cyan folding fan, her fingertips gently sliding over the exquisite patterns on the fan, as if savoring its charm.

Medium shot

Medium shot | In the video, a medium shot shows a classical-style woman gracefully walking among flowers, her long dress fluttering in the wind, as if blending with nature.

Long shot

Long shot | In the video, a long shot shows a bustling city street, with people coming and going on the wide sidewalk, forming a vivid and lively scene.

Bird's eye view

Bird's eye view | In the video, the camera uses a bird's eye view, overlooking the entire city, showing the intertwined streets and buildings.

2. Perspective

Perspective type

Prompt example

Video effect

Low angle

The video starts with a pair of walking legs, the camera uses a low angle shot, focusing on the movement of the feet. In the frame, shoes step on the rough abandoned ground, surrounded by broken concrete and scattered weeds, showing the desolation and decadence of the wasteland style.

Overhead shot

The video shows a person walking in a post-apocalyptic world. The camera shoots from above as the person slowly walks through a sunlit, wasteland-style scene.

Drone

FPV drone perspective | At the beginning of the video, the camera uses FPV (first-person view) drone shooting, bringing an immersive feeling. The camera quickly passes through the skyscrapers of the city, showing the magnificent urban landscape. Buildings quickly flash by in the field of vision, with light and shadow interlaced, reflecting the modernity and prosperity of the city.

3. Camera

Camera type

Prompt example

Video effect

Fisheye

Fisheye lens | The camera uses a fisheye perspective, bringing a unique curved effect, making the busy city street more eye-catching. The wide field of view makes viewers feel as if the buildings are leaning towards the sky, and the high-rise buildings on both sides of the street, under the rendering of the fisheye lens, appear particularly magnificent and three-dimensional, the distorted lines form a strong visual impact, fully showing the vitality and hustle and bustle of the city.

Wide-angle

Wide-angle lens | The video captures a busy city street monitoring scene with a wide-angle lens, the frame is wide and the field of view is open. High-rise buildings stand on both sides of the street, traffic flows continuously, pedestrians come and go, showing the hustle and vitality of city life.

4. Camera movement

Camera movement type

Prompt example

Video effect

Push in

The video shows an enormous cubic stone block, standing in the center of a square, with tranquility around. The camera slowly pushes in, gradually approaching the stone block, with the rugged texture and traces of time gradually becoming clear.

Pull out

The video shows an enormous cubic stone block, standing steadily in the center of the square. The camera slowly pulls out, the magnificent outline of the stone block gradually appears, the stone bricks of the square and the surrounding lawn gradually emerge. Finally, the entire square scene is presented.

Pan

The video shows a cubic stone block, the camera focuses on the cubic stone block, with sunlight shining through, emitting a warm glow. As the camera gently moves, the details of the stone block gradually change in the field of vision, with the surrounding environment blurring into the background. Then, the camera naturally switches to the cubic iron block next to it, with the smooth metal surface glittering in the sunlight. The focus shifts smoothly, forming a sharp contrast between the coldness of the iron block and the heaviness of the stone block. The camera moves between the two, showing their unique charm.

Follow

The video shows a cubic stone block rolling in the center of the square, and the camera follows the movement. As the stone block moves, the surrounding grass is lush. The camera stays close to the stone block, capturing the fine texture on the ground and the dynamic feeling of the spring breeze passing by.

Circle

The video shows an enormous cubic stone block, standing in the center of the square. The camera circles around this stone block, capturing the rugged texture on the surface and the subtle glow in the sunlight.

5. Speed

Speed type

Prompt example

Video effect

Slow

The race car moves slowly, the background gradually becomes clear in the tranquility, everything seems to be stretched by time. The car speed is gentle, bringing a leisurely experience and peace of mind, the driver seems to blend with the surrounding environment in this serenity, showing a focus that inspires awe.

Fast

The race car drives quickly, the background blurs in an instant, everything seems to become an intersection of colors. The car speed is fierce, making people feel intense excitement and a surge of adrenaline, the driver seems to control the entire world in this roar, releasing a passion that makes people's blood boil.

Slow motion

The crowd moves slowly, slow motion magnifies the steps of each pedestrian, pedestrians blend with the environment, highlighting every detail of life. The steps are slow but full of rhythm, as if telling the story of the city, with thinking and feeling, sharing the poetry of life.

Time-lapse

The video shows the amazing process of plants growing and blooming rapidly with a time-lapse effect. In the frame, the buds grow and bloom in just a few seconds, from tender buds to bright flowers, instantly dazzling the audience. Each stage is clearly captured, showing the vigor and beauty of life, making people marvel at the wonder and magic of nature.

6. Atmosphere

Atmosphere type

Prompt example

Video effect

Vitality/Cheerful/Joyful/Beautiful

The video shows a forest full of vitality, sunlight shines through the canopy, casting golden spots of light, birds cheerfully fly in the forest, crisp calls echo in the air, as if singing hymns for this vibrant world. The leaves sway gently in the breeze, as if cheerfully dancing to the rhythm of the music, the thriving scene makes people feel joyful. Small animals frolic on the grass, flowers bloom in the sunlight, the entire forest is full of life and vitality, as if telling the beauty and infinite possibilities of life, making people feel the vigor and hope of nature.

Deep/Soft/Quiet/Dreamy

The video shows a deep forest, the tranquility of the night is like a soft veil, gently covering every inch of land. The surroundings are quiet, as if time has frozen at this moment, only the breeze gently brushes the treetops, bringing a faint rustling sound, as if whispering with nature. The night sky is like deep blue velvet, stars twinkle, like distant jewels embedded in it, emitting cold and soft light. The whole scene appears quiet and harmonious, refreshing, as if entering a dreamy realm, seemingly isolated from the world, fully immersed in the embrace of nature.

Lonely/Desolate/Melancholy/Tranquil

The video shows a lonely forest, the surroundings are silent, as if time has stopped at this moment. Leaves fall gently, slowly spreading on the ground, making a faint rustling sound, as if softly telling the farewell of autumn. Only the sound of wind echoes lonely in the empty space, as if awakening this silent land. Shadows intertwine between the trees, creating a desolate and melancholy atmosphere, the whole scene exudes a hint of solitude, making people feel a slight melancholy, as if paying tribute to lost time. In this tranquility, one cannot help but ponder.

Tense/Uneasy/Oppressive/Gloomy

The video shows a tense forest, the wind is raging, violently blowing the treetops and leaves, making a series of low rustling sounds, as if nature is whispering uneasy omens. Tree trunks sway, as if struggling against this sudden majesty. The sky is overcast, the thick clouds are like an oppressive mood, covering the entire picture, revealing a gloomy and heavy atmosphere. Everything around seems to be holding its breath for the coming changes, the whole scene is full of urgency, making people's hearts slightly tighten, as if an unpredictable storm could break out at any time, making people feel the ruthlessness and power of nature.

Solemn/Powerful/Magnificent/Awe

The video shows a solemn forest, the trees are tall and straight, standing on the earth like guardians, exuding a kind of silent power. Sunlight shines through the dense treetops, casting dappled light, forming layers of light and shadow intertwining on the ground, as if drawing a natural painting. The air is filled with fresh scent, the occasional bird calls are low and solemn, adding a dignified atmosphere. The whole scene exudes a kind of magnificence and tranquility, inspiring respect. This forest seems to be a witness of time, carrying countless stories and secrets, making people feel the solemnity and mystery of nature, inspiring awe, as if being in a spiritual place.

7. Style

Style type

Prompt example

Video effect

Cyberpunk

Retro cyberpunk style - Under the flickering neon lights, a cyber warrior wearing a leather jacket walks through an abandoned electronics factory. The camera pulls back from his back view to show a night view of a city full of futuristic technology.

Wasteland style

The video shows a stunning appearance of a flying fairy from another world against a wasteland style background. She wears tattered yet gorgeous clothing, with a pair of strange wings made of ruins fragments on her back, soaring over the desolate landscape. The camera follows her flight trajectory, pulling up from low altitude to the vast sky, showing the contrast between her light figure and the wasteland world, each flap of her wings seems to tell a story of survival and hope.

Line drawing illustration

In court, a clever, agile, and eloquent fox lawyer, wearing a neat lawyer's robe, is eloquently defending her client, each of her arguments is precise and powerful, moving the audience, line drawing animation.

Chinese style anime

Chinese anime style time-traveling girl, learning etiquette in an ancient palace under candlelight, every move shows ancient elegance.

Felt style

The video shows a vivid scene of a character in a kitchen environment. Notably, this character is made of felt, adding a touch of childlike fun. The felt little person is standing in a mini kitchen, holding a small spatula, as if carefully cooking delicious food. The background is a wall hung with various kitchen utensils, and the whole space is filled with a warm atmosphere. The video captures every action of the felt little person from a fixed perspective, including stirring and stir-frying, showing its exquisite "culinary skills".

Classic masterpiece

In Van Gogh's "Starry Night", a skateboarding teenager wearing modern clothing shuttles between twisted trees, the light and shadow effects under the starry sky intertwine with the trajectory of the skateboard.

Pixel game

A person standing in a game-style pixel world, equipped with the most gorgeous, high resolution 8K texture pack ever.
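The dictionary above can be kept as a simple lookup table so that prompts reuse consistent keywords. The sketch below lists the type labels from the seven tables and prefixes a description with the chosen keyword, in the "keyword | description" pattern that several examples above use. The names PROMPT_DICTIONARY and tag_prompt are hypothetical, and splitting the grouped atmosphere labels into single words is an assumption.

```python
PROMPT_DICTIONARY = {
    "shot size": ["Close-up", "Close shot", "Medium shot", "Long shot", "Bird's eye view"],
    "perspective": ["Low angle", "Overhead shot", "Drone"],
    "camera": ["Fisheye", "Wide-angle"],
    "camera movement": ["Push in", "Pull out", "Pan", "Follow", "Circle"],
    "speed": ["Slow", "Fast", "Slow motion", "Time-lapse"],
    "atmosphere": [
        "Vitality", "Cheerful", "Joyful", "Beautiful",
        "Deep", "Soft", "Quiet", "Dreamy",
        "Lonely", "Desolate", "Melancholy", "Tranquil",
        "Tense", "Uneasy", "Oppressive", "Gloomy",
        "Solemn", "Powerful", "Magnificent", "Awe",
    ],
    "style": [
        "Cyberpunk", "Wasteland style", "Line drawing illustration",
        "Chinese style anime", "Felt style", "Classic masterpiece", "Pixel game",
    ],
}


def tag_prompt(dimension: str, keyword: str, description: str) -> str:
    """Prefix a description with a keyword from the dictionary."""
    if dimension not in PROMPT_DICTIONARY:
        raise ValueError(f"unknown dimension: {dimension}")
    if keyword not in PROMPT_DICTIONARY[dimension]:
        raise ValueError(f"{keyword!r} is not listed under {dimension!r}")
    return f"{keyword} | {description.strip()}"


tagged = tag_prompt(
    "shot size", "Close-up", "a classical-style woman's face in soft light"
)
print(tagged)
```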