
Platform For AI: Generate high-definition long videos using EasyAnimate

Last Updated: Dec 09, 2025

EasyAnimate is a video generation framework developed by Alibaba Cloud Platform for AI (PAI) based on the Diffusion Transformer (DiT) architecture. It rapidly generates high-definition, long videos from text or images and supports model fine-tuning for personalization.

Overview

Solution 1: Generate videos with DSW

Advantages and scenarios: Provides a cloud IDE with built-in tutorials and code. This solution is ideal for users who want to explore the model in depth or customize its functionality.

Billing: This solution uses public resources to create a Data Science Workshop (DSW) instance, which is billed on a pay-as-you-go basis. For more billing details, see DSW billing information.

Solution 2: Generate videos using Model Gallery

Advantages and scenarios: Eliminates environment configuration. You can deploy or fine-tune models with a single click and invoke them via a WebUI or API. This solution is ideal for users who need to quickly validate results or integrate the model into applications.

Billing: This solution uses public resources to create an Elastic Algorithm Service (EAS) service for model deployment and a Deep Learning Containers (DLC) job for model fine-tuning. Both services are billed on a pay-as-you-go basis. For more billing details, see DLC Billing and EAS Billing.

Solution 1: Generate videos with DSW

Step 1: Create a DSW instance

  1. Log on to the PAI console and select a region. In the navigation pane on the left, click Workspaces, then select and click the target workspace.

  2. In the navigation pane on the left, choose Model Training > Data Science Workshop (DSW) to go to the DSW page.

  3. Click Create Instance and configure the following parameters. You can keep the default values for other parameters.

    Instance Name: This tutorial uses the example value AIGC_test_01.

    Resource Type: Select Public Resources.

    Instance Type: Under GPU Instance Type, select ecs.gn7i-c8g1.2xlarge or another instance type with A10 or GU100 GPUs.

    Image config: Select Alibaba Cloud Image, then search for and select easyanimate:1.1.5-pytorch2.2.0-gpu-py310-cu118-ubuntu22.04.

  4. Click Yes to create the instance. Wait until the instance status changes to In operation.

Step 2: Download the EasyAnimate tutorial and model

  1. In the Actions column of the target DSW instance, click Open to enter the DSW development environment.

  2. On the Launcher page of the Notebook tab, click DSW Gallery.

  3. On the DSW Gallery page, search for AI video generation demo based on EasyAnimate (V5) and click Open In DSW to download the tutorial's resources to your DSW instance.

    The AI Video Generation with EasyAnimate example has multiple versions. This guide uses V5 as an example.


  4. Download and install the EasyAnimate-related code and models.

    In the EasyAnimate tutorial notebook, click the run icon to execute the Define functions, Download code, and Download models cells in sequence.

Step 3: Start the WebUI and generate a video

  1. Click the run icon to run the cell titled Start UI and start the WebUI service.

  2. Click the generated link to open the WebUI.


  3. In the WebUI, select the pre-trained model path from the dropdown menu. Configure other parameters as needed.


  4. Click Generate. After about 5 minutes, you can view or download the generated video on the right side of the page.


Solution 2: Generate videos using Model Gallery

Step 1: Deploy the pre-trained model

  1. Log on to the PAI console and select a region. In the navigation pane on the left, click Workspaces, then select and click the target workspace.

  2. In the navigation pane on the left, click QuickStart > Model Gallery. Search for EasyAnimate Long Video Generation Model, click Deploy, and confirm the deployment with the default configuration. Deployment is complete when the service status is Running.


Step 2: Generate videos via WebUI or API

After deployment, you can use the WebUI or the API to generate videos.

To view deployment task details later, choose Model Gallery > Job Management > Deployment Jobs in the navigation pane on the left, and then click the Service name.

Use the WebUI

  1. On the Service Details page, click View Web App.


  2. In the WebUI, select the pre-trained model path. Configure other parameters as needed.


  3. Click Generate. After about 5 minutes, you can view or download the generated video on the right side of the page.


Use the API

  1. On the Service Details page, in the Call Information section, click View Call Information to obtain the service URL and token.


  2. Call the service to generate a video. The following is a Python request example:

    import base64
    import requests
    from typing import Dict, Any
    
    
    class EasyAnimateClient:
        """
        EasyAnimate EAS Service API Client.
        """
    
        def __init__(self, service_url: str, token: str):
            if not service_url or not token:
                raise ValueError("The service URL (service_url) and token cannot be empty")
            self.base_url = service_url.rstrip('/')
            self.headers = {
                'Content-Type': 'application/json',
                'Authorization': token
            }
    
        def update_model(self, model_path: str, edition: str = "v3", timeout: int = 300) -> Dict[str, Any]:
            """
            Updates and loads the specified model version and path.
    
            Args:
                model_path: The path of the model within the service, such as "/mnt/models/Diffusion_Transformer/EasyAnimateV3-XL-2-InP-512x512".
                edition: The model edition. Default value: "v3".
                timeout: The request timeout period in seconds. Model loading is slow, so we recommend that you set a long timeout period.
            """
            # 1. Update the edition.
            requests.post(
                f"{self.base_url}/easyanimate/update_edition",
                headers=self.headers,
                json={"edition": edition},
                timeout=timeout
            ).raise_for_status()
    
            # 2. Update the model path and wait for the model to load.
            print(f"Sending a request to load the model: {model_path}")
            response = requests.post(
                f"{self.base_url}/easyanimate/update_diffusion_transformer",
                headers=self.headers,
                json={"diffusion_transformer_path": model_path},
                timeout=15000
            )
            response.raise_for_status()
            return response.json()
    
        def generate_video(self, prompt_textbox: str, **kwargs) -> bytes:
            """
            Generates a video based on the prompt.
    
            Args:
                prompt_textbox: The English positive prompt.
                **kwargs: Other optional parameters. For more information, see the parameter description table below.
    
            Returns:
                The binary data of the video in MP4 format.
            """
            payload = {
                "prompt_textbox": prompt_textbox,
                "negative_prompt_textbox": kwargs.get("negative_prompt",
                                                      "The video is not of a high quality, it has a low resolution..."),
                "width_slider": kwargs.get("width_slider", 672),
                "height_slider": kwargs.get("height_slider", 384),
                "length_slider": kwargs.get("length_slider", 144),
                "sample_step_slider": kwargs.get("sample_step_slider", 30),
                "cfg_scale_slider": kwargs.get("cfg_scale_slider", 6.0),
                "seed_textbox": kwargs.get("seed_textbox", 43),
                "sampler_dropdown": kwargs.get("sampler_dropdown", "Euler"),
                "generation_method": "Video Generation",
                "is_image": False,
                "lora_alpha_slider": 0.55,
                "lora_model_path": "none",
                "base_model_path": "none",
                "motion_module_path": "none"
            }
    
            response = requests.post(
                f"{self.base_url}/easyanimate/infer_forward",
                headers=self.headers,
                json=payload,
                timeout=1500
            )
            response.raise_for_status()
    
            result = response.json()
            if "base64_encoding" not in result:
                raise ValueError(f"Invalid API return format: {result}")
    
            return base64.b64decode(result["base64_encoding"])
    
    
    # --- Example ---
    if __name__ == "__main__":
        try:
            # 1. Configure service information. Replace the placeholders with the actual service endpoint and token. We recommend that you set them as environment variables.
            EAS_URL = "<eas-service-url>"
            EAS_TOKEN = "<eas-service-token>"
    
            # 2. Create a client.
            client = EasyAnimateClient(service_url=EAS_URL, token=EAS_TOKEN)
    
            # 3. Load the model. After the service is deployed, no model is loaded by default. Before you send a generation request, you must call update_model at least once to specify the model to use. To switch to a different model later, call this method again.
            client.update_model(model_path="/mnt/models/Diffusion_Transformer/EasyAnimateV3-XL-2-InP-512x512")
    
            # 4. Generate a video.
            video_bytes = client.generate_video(
                prompt_textbox="A beautiful cat playing in a sunny garden, high quality, detailed",
                width_slider=672,
                height_slider=384,
                length_slider=72,
                sample_step_slider=20
            )
    
            # 5. Save the video file.
            with open("api_generated_video.mp4", "wb") as f:
                f.write(video_bytes)
            print("The video is saved as api_generated_video.mp4.")
    
        except requests.RequestException as e:
            print(f"Network request error: {e}")
        except (ValueError, KeyError) as e:
            print(f"Data or parameter error: {e}")
        except Exception as e:
            print(f"An unknown error occurred: {e}")

    The following table describes the service interface input parameters.

    Interface parameter description

    prompt_textbox (string, required): The positive prompt. No default value.

    negative_prompt_textbox (string): The negative prompt. Default: "The video is not of a high quality, it has a low resolution, and the audio quality is not clear. Strange motion trajectory, a poor composition and deformed video, low resolution, duplicate and ugly, strange body structure, long and strange neck, bad teeth, bad eyes, bad limbs, bad hands, rotating camera, blurry camera, shaking camera. Deformation, low-resolution, blurry, ugly, distortion."

    sample_step_slider (int, default 30): The number of sampling steps for diffusion model denoising. More steps may produce richer details but take longer.

    cfg_scale_slider (float, default 6): The prompt guidance scale. A higher value makes the output adhere more closely to the prompt, potentially reducing diversity.

    sampler_dropdown (string, default Euler): The sampler type. Valid values: Euler, EulerA, DPM++, PNDM, and DDIM.

    width_slider (int, default 672): The width of the generated video.

    height_slider (int, default 384): The height of the generated video.

    length_slider (int, default 144): The number of frames in the generated video.

    is_image (bool, default False): Specifies whether the input is an image.

    lora_alpha_slider (float, default 0.55): The weight of the LoRA model parameters.

    seed_textbox (int, default 43): The random seed.

    lora_model_path (string, default none): The path to an additional LoRA model. The LoRA model is loaded for the current request only.

    base_model_path (string, default none): The path to the transformer model to be updated.

    motion_module_path (string, default none): The path to the motion_module model to be updated.

    generation_method (string, default none): The generation type. Valid values: Video Generation and Image Generation.
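    To make the mapping between this table and the HTTP interface concrete, the following minimal sketch posts a request to the /easyanimate/infer_forward endpoint directly, without the client class above. The service URL and token are placeholders from your call information, and the sketch assumes that the service applies the defaults in this table to any omitted fields; if your deployment requires the full payload, include every field as the client does.

    import base64
    import requests

    # Placeholders: replace with the URL and token from View Call Information.
    EAS_URL = "<eas-service-url>"
    EAS_TOKEN = "<eas-service-token>"

    # Only prompt_textbox is required; the other fields shown here override
    # the defaults listed in the table above.
    payload = {
        "prompt_textbox": "A beautiful cat playing in a sunny garden, high quality, detailed",
        "width_slider": 672,
        "height_slider": 384,
        "length_slider": 72,
        "sample_step_slider": 20,
        "generation_method": "Video Generation",
    }

    response = requests.post(
        f"{EAS_URL.rstrip('/')}/easyanimate/infer_forward",
        headers={"Content-Type": "application/json", "Authorization": EAS_TOKEN},
        json=payload,
        timeout=1500,
    )
    response.raise_for_status()

    # The service returns the video as a Base64-encoded MP4.
    with open("table_example_video.mp4", "wb") as f:
        f.write(base64.b64decode(response.json()["base64_encoding"]))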

Step 3: (Optional) Fine-tune the pre-trained model

You can fine-tune a model on a custom dataset to generate videos with a specific style or content. Follow these steps to fine-tune the model:

  1. Log on to the PAI console. In the navigation pane on the left, click Workspaces, then select and click the target workspace.

  2. In the navigation pane on the left, click QuickStart > Model Gallery.

  3. On the Model Gallery page, search for EasyAnimate Long Video Generation Model and click Train to go to the configuration page.


  4. For Source, select Public Resources. For Instance Type, select an instance with an A10 GPU or better. You can configure hyperparameters as needed and keep the default values for other parameters.

    To fine-tune the model with a custom dataset, follow these instructions:

    Use a custom dataset

    1. Prepare a data folder and a meta file. The data folder contains the images and videos for training. The meta file is in JSON format, where each entry consists of a file path, a text description, and a data type, represented by the fields "file_path", "text", and "type". For example:

      [
          {
              "file_path": "00031-3640797216.png",
              "text": "1girl, black_hair",
              "type": "image"
          },
          {
              "file_path": "00032-3838108680.png",
              "text": "1girl, black_hair",
              "type": "image"
          }
      ]

      Specify "type": "video" for video data and "type": "image" for image data. A sketch for generating this meta file programmatically appears after these steps.

    2. Upload and select the data folder and meta file. On the training configuration page, click OSS file or directory, then upload and select your data folder and meta file.

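    As referenced in step 1, the meta file can also be generated programmatically. The following is a minimal sketch that scans a local data folder and writes the metadata JSON in the format described above; the folder name, file extensions, and placeholder caption are assumptions for illustration only.

      import json
      from pathlib import Path

      # Hypothetical local copy of the data folder; replace with your own.
      data_dir = Path("train_data")

      # Map file extensions to the "type" field expected in the meta file.
      ext_to_type = {".png": "image", ".jpg": "image", ".jpeg": "image", ".mp4": "video"}

      entries = []
      for path in sorted(data_dir.iterdir()):
          kind = ext_to_type.get(path.suffix.lower())
          if kind is None:
              continue  # Skip files that are neither images nor videos.
          entries.append({
              "file_path": path.name,        # Path relative to the data folder.
              "text": "1girl, black_hair",   # Placeholder; use a real caption per file.
              "type": kind,
          })

      # Write the meta file that you upload together with the data folder.
      with open("metadata.json", "w") as f:
          json.dump(entries, f, indent=4)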

  5. Click Train > OK to create the training task. With the environment configuration selected in this guide, training takes about 40 minutes. Training is complete when the task status changes to Succeeded.

    To view the training task details later, you can click Model Gallery > Job Management > Training Jobs in the navigation pane on the left, and then click the task name.
  6. In the upper-right corner, click Deploy to deploy the fine-tuned model. When the status changes to Running, the model is successfully deployed.


  7. On the Service Details page, click View Web Application in the upper part of the page.

    To view the service details later, you can click Model Gallery > Job Management > Deployment Jobs in the navigation pane on the left, and then click the Service Name.
  8. In the WebUI, select the trained LoRA model to generate videos. For API invocation, refer to Step 2 and the sketch below.

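    For API invocation with the fine-tuned model, note that the EasyAnimateClient in Step 2 hardcodes lora_model_path to "none". The following minimal sketch therefore posts the payload directly and sets the LoRA fields described in the parameter table; the LoRA weight path is a placeholder that depends on where your training job stored its output, and omitted fields are assumed to fall back to the service defaults.

    import base64
    import requests

    # Placeholders: use the call information of the fine-tuned model's service.
    EAS_URL = "<eas-service-url>"
    EAS_TOKEN = "<eas-service-token>"

    payload = {
        "prompt_textbox": "A beautiful cat playing in a sunny garden, high quality, detailed",
        "generation_method": "Video Generation",
        # Load the trained LoRA weights for this request only. The path is a
        # placeholder; replace it with your training job's output location.
        "lora_model_path": "<path-to-trained-lora-weights>",
        "lora_alpha_slider": 0.55,
    }

    response = requests.post(
        f"{EAS_URL.rstrip('/')}/easyanimate/infer_forward",
        headers={"Content-Type": "application/json", "Authorization": EAS_TOKEN},
        json=payload,
        timeout=1500,
    )
    response.raise_for_status()

    # Save the Base64-encoded MP4 returned by the service.
    with open("lora_video.mp4", "wb") as f:
        f.write(base64.b64decode(response.json()["base64_encoding"]))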

Going live

  • Stop resources to save costs: This tutorial uses public resources to create a DSW instance and an EAS model service. When you are finished, stop or delete these resources promptly to avoid further charges:

    • Stop or delete the DSW instance on the DSW page.

    • Stop or delete the EAS model service on the EAS page.

  • Use EAS to deploy services in production: If you need to use the model in a production environment, we recommend using Solution 2 to deploy the model to EAS with a single click. If you used Solution 1, you can deploy the model to EAS by creating a custom image. For more information, see Deploy a model as an online service.

    EAS provides the following capabilities for complex production environments:

    • To test the concurrency that your service endpoint can support, use automatic stress testing in EAS.

    • To automatically scale instances up or down to handle traffic peaks and troughs and ensure stable service operation, use Auto Scaling in EAS.

    • To monitor service status in real time and improve system stability and security, use Log Monitoring and Alerting in EAS.

References

EAS offers one-click deployment for AI video generation services based on ComfyUI and Stable Video Diffusion models. For more information, see AI video generation - ComfyUI deployment.