
Container Compute Service: Accelerate Wan2.1 video generation with DeepGPU

Last Updated: Mar 26, 2026

Container Compute Service (ACS) provides on-demand GPU computing without requiring you to manage the underlying hardware or node configuration. ACS is easy to deploy, supports pay-as-you-go billing, and is well suited to inference workloads such as large language model (LLM) serving, which helps reduce inference costs. This guide walks you through deploying a ComfyUI service on ACS and using the deepgpu-comfyui plugin to accelerate Wan2.1 text-to-video and image-to-video generation.

By the end of this guide, you will have:

  • Downloaded the Wan2.1 model files to a persistent NAS volume

  • Deployed a ComfyUI service on an ACS GPU cluster

  • Run an accelerated text-to-video workflow using the ApplyDeepyTorch node

Background

ComfyUI

ComfyUI is an open-source, node-based UI for running and customizing Stable Diffusion pipelines. Instead of writing code, you build generation workflows by connecting nodes on a visual canvas.

Wan model

Tongyi Wanxiang, also known as Wan, is a large AI-generated content (AIGC) model for art and text-to-image generation from Alibaba's Tongyi Lab. It is the visual generation branch of the Tongyi Qianwen large model series and the world's first AI art model to support Chinese prompts. It has multimodal capabilities and can generate high-quality artwork from text descriptions, hand-drawn sketches, or image style transfers.

ApplyDeepyTorch node

The ApplyDeepyTorch node is included in the deepgpu-comfyui plugin. It optimizes diffusion model inference performance by applying DeepGPU acceleration. Insert this node after the last model-loading node in your workflow — for example, after a Load Diffusion Model, Load Checkpoint, or LoraLoaderModelOnly node.

Prerequisites

Before you begin, ensure that you have:

  • An ACS cluster with GPU compute available (this guide uses the L20 GPU model series)

  • A NAS file system mounted as a persistent volume, with a persistent volume claim (PVC) created in the cluster (this guide uses a PVC named wanx-nas)

  • Public network access from the environment where you download the model files

Step 1: Prepare the model data

Run the following commands in the directory where the NAS volume is mounted.

  1. Clone the ComfyUI repository.

    git clone https://github.com/comfyanonymous/ComfyUI.git
  2. Download the three Wan2.1 model files to their corresponding ComfyUI directories. The files are hosted on the Wan_2.1_ComfyUI_repackaged project on ModelScope.

    • The diffusion model (wan2.1_t2v_14B_fp16.safetensors):

      cd ComfyUI/models/diffusion_models
      wget https://modelscope.cn/models/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/master/split_files/diffusion_models/wan2.1_t2v_14B_fp16.safetensors
    • The VAE (wan_2.1_vae.safetensors):

      cd ComfyUI/models/vae
      wget https://modelscope.cn/models/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/master/split_files/vae/wan_2.1_vae.safetensors
    • The text encoder (umt5_xxl_fp8_e4m3fn_scaled.safetensors):

      cd ComfyUI/models/text_encoders
      wget https://modelscope.cn/models/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/master/split_files/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors
    The download takes around 30 minutes. If your connection is slow, increase the peak public bandwidth before starting.
  3. Download and extract the ComfyUI-deepgpu plugin.

    cd ComfyUI/custom_nodes
    wget https://aiacc-inference-public-v2.oss-cn-hangzhou.aliyuncs.com/deepgpu/comfyui/nodes/20250513/ComfyUI-deepgpu.tar.gz
    tar zxf ComfyUI-deepgpu.tar.gz
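
Before moving on, it can help to confirm that everything landed where ComfyUI expects it. The following sketch, run from the NAS mount root (the same directory the steps above used), checks each path downloaded above:

```shell
# Sanity check: verify the model files and plugin are in place.
# Run from the NAS mount root, i.e. the directory containing ComfyUI/.
missing=0
for f in \
  ComfyUI/models/diffusion_models/wan2.1_t2v_14B_fp16.safetensors \
  ComfyUI/models/vae/wan_2.1_vae.safetensors \
  ComfyUI/models/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors \
  ComfyUI/custom_nodes/ComfyUI-deepgpu
do
  if [ -e "$f" ]; then
    echo "OK      $f"
  else
    echo "MISSING $f"
    missing=$((missing + 1))
  fi
done
echo "$missing path(s) missing"
```

If anything is reported missing, re-run the corresponding wget command before deploying.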

Step 2: Deploy the ComfyUI service

  1. Log in to the ACS console. In the left navigation pane, choose Clusters. Click the target cluster name. Then choose Workloads > Deployments and click Create from YAML.

  2. Paste the following YAML manifest and click Create.

    Replace persistentVolumeClaim.claimName with the name of your persistent volume claim (PVC). This example uses the inference-nv-pytorch 25.07 image from the cn-beijing region to minimize image pull times. To use this image from other regions, see Usage method and update the image path in the manifest. The container image in this example has the deepgpu-torch and deepgpu-comfyui plugins pre-installed. To use these plugins in a different container environment, contact a solution architect (SA) to get the installation packages.
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      labels:
        app: wanx-deployment
      name: wanx-deployment-test
      namespace: default
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: wanx-deployment
      template:
        metadata:
          labels:
            alibabacloud.com/compute-class: gpu
            alibabacloud.com/compute-qos: default
            alibabacloud.com/gpu-model-series: L20 #Supported GPU card types: L20 (GN8IS), G49E
            app: wanx-deployment
        spec:
          containers:
          - command:
            - sh
            - -c
            - DEEPGPU_PUB_LS=true python3 /mnt/ComfyUI/main.py --listen 0.0.0.0 --port 7860
            image: acs-registry-vpc.cn-beijing.cr.aliyuncs.com/egslingjun/inference-nv-pytorch:25.07-vllm0.9.2-pytorch2.7-cu128-20250714-serverless
            imagePullPolicy: Always
            name: main
            resources:
              limits:
                nvidia.com/gpu: "1"
                cpu: "16"
                memory: 64Gi
              requests:
                nvidia.com/gpu: "1"
                cpu: "16"
                memory: 64Gi
            terminationMessagePath: /dev/termination-log
            terminationMessagePolicy: File
            volumeMounts:
            - mountPath: /dev/shm
              name: cache-volume
            - mountPath: /mnt #/mnt is the path in the pod where the NAS volume claim is mapped
              name: data
          dnsPolicy: ClusterFirst
          restartPolicy: Always
          schedulerName: default-scheduler
          securityContext: {}
          terminationGracePeriodSeconds: 30
          volumes:
          - emptyDir:
              medium: Memory
              sizeLimit: 500G
            name: cache-volume
          - name: data
            persistentVolumeClaim:
              claimName: wanx-nas #wanx-nas is the volume claim created from the NAS volume
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: wanx-test
    spec:
      type: LoadBalancer
      ports:
        - port: 7860
          protocol: TCP
          targetPort: 7860
      selector:
        app: wanx-deployment

    Key parameters in this manifest:

    • alibabacloud.com/gpu-model-series: GPU card type. Supported values are L20 (GN8IS instance) and G49E.

    • nvidia.com/gpu: Set to "1" in both requests and limits to allocate one GPU to the container.

    • resources.limits and resources.requests: Set CPU to 16 cores and memory to 64 GiB.

    • cache-volume: An emptyDir volume (medium: Memory, sizeLimit: 500G) mounted at /dev/shm. Required for large model inference.

    • mountPath: The path inside the pod (/mnt) where the NAS volume is mounted. ComfyUI and the model files are accessed from this path.

    • persistentVolumeClaim.claimName: The name of your PVC. Replace wanx-nas with your actual PVC name.
  3. In the dialog that appears, click View to open the workload details page. Click the Logs tab. When the log output shows that the service is listening on port 7860, the service has started successfully.
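
As an alternative to the console's Create from YAML page, the same manifest can be applied with kubectl, assuming your kubeconfig already points at the ACS cluster. The filename wanx.yaml below is illustrative; use whatever name you saved the manifest under.

```shell
# Apply the Deployment and Service from the command line instead of the console.
# Assumes kubectl is installed and configured for the target ACS cluster.
MANIFEST=wanx.yaml
if command -v kubectl >/dev/null 2>&1; then
  # Each call is guarded so a transient failure does not abort the script.
  kubectl apply -f "$MANIFEST" || echo "apply failed; check the manifest path and cluster access"
  kubectl get pods -l app=wanx-deployment || true   # wait for STATUS to become Running
  kubectl logs -l app=wanx-deployment --tail=50 || true
  status=done
else
  echo "kubectl not found; run this where you have cluster access"
  status=skipped
fi
```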

Step 3: Access the ComfyUI interface

  1. On the workload details page, click the Access Method tab to get the external endpoint of the service, such as 8.xxx.xxx.114:7860.

  2. Open http://8.xxx.xxx.114:7860/ in a browser.

    The first time you access the URL, it may take about 5 minutes to load.
  3. In the ComfyUI interface, right-click anywhere and click Add Node to browse the DeepGPU nodes provided by the plugin. The ApplyDeepyTorch node optimizes diffusion model inference by applying GPU-level acceleration. Insert it after the last model-loading node in your workflow.
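
Before opening the browser, you can check from a terminal whether the endpoint is reachable. This sketch uses the redacted example address from above; substitute your actual external endpoint (the 8.xxx.xxx.114 placeholder will not resolve as written).

```shell
# Probe the ComfyUI endpoint; prints the HTTP status code, or 000 if unreachable.
# Replace the placeholder with your service's actual external endpoint.
ENDPOINT="8.xxx.xxx.114:7860"
code=$(curl -s -o /dev/null -w '%{http_code}' --connect-timeout 5 "http://$ENDPOINT/" || true)
if [ "$code" = "200" ]; then
  echo "ComfyUI is up at http://$ENDPOINT/"
else
  echo "Not reachable yet (status: $code); the first start can take about 5 minutes"
fi
```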

Step 4: Run the accelerated workflow

Download one or both of the following pre-built Wan2.1 workflows to your local machine:

  • Image-to-video workflow: https://aiacc-inference-public-v2.oss-cn-hangzhou.aliyuncs.com/deepgpu/comfyui/wan/workflows/workflow_image_to_video_wan_1.3b_deepytorch.json

  • Text-to-video workflow: https://aiacc-inference-public-v2.oss-cn-hangzhou.aliyuncs.com/deepgpu/comfyui/wan/workflows/workflow_text_to_video_wan_deepytorch.json

The following steps use the accelerated text-to-video workflow as an example.

  1. In ComfyUI, choose Workflow > Open and select the downloaded workflow_text_to_video_wan_deepytorch.json file.

  2. Find the Apply DeepyTorch to diffusion model node. Set its enable parameter to true.

    The DeepyTorch-accelerated workflow inserts an ApplyDeepyTorch node after the Load Diffusion Model node.

  3. Click Run and wait for the video to generate.

  4. Click the Queue button on the left to view the generation time and preview the output.

    The first run takes longer than subsequent runs as the model warms up. Run the workflow two or three more times to see stable performance.

  5. (Optional) To compare generation time without acceleration, restart the ComfyUI service and run the non-accelerated workflow: https://aiacc-inference-public-v2.oss-cn-hangzhou.aliyuncs.com/deepgpu/comfyui/wan/workflows/workflow_text_to_video_wan.json
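
Once you have a generation time from each workflow, the speedup is simply the ratio of the two. A trivial sketch; the times below are placeholders, not measured results:

```shell
# Compute the acceleration ratio from two measured generation times.
# Substitute the times you observe in the Queue panel for the
# non-accelerated and accelerated runs (placeholders shown here).
baseline=100     # seconds, non-accelerated workflow (placeholder)
accelerated=60   # seconds, DeepyTorch workflow (placeholder)
speedup=$(awk -v b="$baseline" -v a="$accelerated" 'BEGIN { printf "%.2f", b / a }')
echo "speedup: ${speedup}x"
```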