All Products
Search
Document Center

Elastic GPU Service:Install and use ComfyUI with DeepGPU

Last Updated:Apr 22, 2026

For text-to-image workloads, use ComfyUI with DeepGPU to accelerate the inference speed of FLUX and SD models. For example, using ComfyUI with DeepGPU on a GPU instance can boost text-to-image performance for the FLUX model by approximately 30% compared to a standard setup. This topic explains how to install and use ComfyUI with DeepGPU.

Performance comparison

Note

The DeepGPU inference acceleration component from Alibaba Cloud provides significant performance gains for text-to-image inference with FLUX.1, SD, or SDXL models.

Compared to a setup without the ComfyUI and DeepGPU acceleration nodes, configuring them on a GPU instance (a single-card gn8is instance is recommended) boosts performance for the flux1-dev model by approximately 30% for both bf16 and fp8 precision. The following table compares the text-to-image acceleration performance for select models.

Model weight precision

Image resolution (W x H)

Time (unaccelerated)

Time (accelerated)

Speedup rate

default(bf16)

1024 x 1024

20.83s

16.62s

25.3%

default(bf16)

1280 x 720

19.02s

15.07s

26.2%

default(bf16)

680 x 1024

14.21s

11.21s

26.8%

default(bf16)

576 x 768

8.81s

7.27s

21.2%

fp8_e4m3_fast

1024 x 1024

15.16s

11.15s

36.0%

fp8_e4m3_fast

1280 x 720

13.66s

9.97s

37.0%

fp8_e4m3_fast

680 x 1024

9.93s

7.51s

32.2%

fp8_e4m3_fast

576 x 768

6.07s

4.88s

24.4%

Prerequisites

  • You have created a GPU instance that meets the following requirements:

    • The operating system must be Ubuntu 20.04 or Ubuntu 22.04.

    • The NVIDIA driver and CUDA are installed and meet the version requirements.

      Note

      When you create a GPU instance, we recommend selecting the Install GPU Driver option after choosing an image. Then, select the required versions for CUDA, the driver, and cuDNN.

    • The instance has a fixed public IP address or is associated with an Elastic IP address (EIP). For information about how to enable public network access, see Enable public network access.

  • You have configured security group rules.

    Port 22, required for remote connections, is open by default when you create a security group. The ComfyUI server requires a specific port, such as 7860, to access its web UI. Ensure that inbound rules for ports 22 and 7860 are added to the security group. If they are not open, manually configure security group rules.

Install ComfyUI and DeepGPU

ComfyUI is an open-source project. You must install it before installing the DeepGPU inference acceleration component. For information about DeepGPU nodes and workflows in ComfyUI, see Node and workflow overview.

  1. Set up ComfyUI.

    Note

    For an overview of related nodes and workflows in ComfyUI, see Node and workflow overview.

    If you plan to load a native LoRA model in ComfyUI (by using the LoraLoaderModelOnly node) and accelerate it with deepgpu-torch, modify one line in the ComfyUI source code.

    • For ComfyUI v0.3.6 and earlier

      In the ComfyUI/comfy/sd.py file, add the parameter weight_inplace_update=True to the following line of code.

      779

      The modified code should look like this:

      return comfy.model_patcher.ModelPatcher(model, load_device=load_device, offload_device=offload_device, weight_inplace_update=True)
    • For ComfyUI v0.3.7 and later

      In the ComfyUI/comfy/sd.py file, add the parameter weight_inplace_update=True to the following line of code.

      785

      The modified code should look like this:

      model_patcher = comfy.model_patcher.ModelPatcher(model, load_device=load_device, offload_device=model_management.unet_offload_device(), weight_inplace_update=True)
  2. Install the DeepGPU inference acceleration component.

    1. Connect to the GPU instance by using Workbench.

    2. Check your Python version to ensure it meets the dependency requirements for the deepgpu-torch acceleration component.

      The deepgpu-torch acceleration component requires Python 3.10.

      Ubuntu 22.04

      Run python3 -V to check your Python version. As shown in the following figure, Ubuntu 22.04 comes with Python 3.10.12 by default, which meets the dependency requirement.

      python3

      Ubuntu 20.04

      Run python3 -V to check your Python version. As shown in the following figure, Ubuntu 20.04 comes with Python 3.8.10 by default, which does not meet the dependency requirement.

      python3

      You can install Miniconda to create an isolated Python 3.10 environment, or use another method.

      1. Run the following command to download the Miniconda installation script.

        wget https://repo.anaconda.com/miniconda/Miniconda3-py310_24.1.2-0-Linux-x86_64.sh
      2. Run the following commands to install Miniconda and activate the environment.

        bash ./Miniconda3-py310_24.1.2-0-Linux-x86_64.sh -b -p /workspace/miniconda
        source /workspace/miniconda/bin/activate
      3. Run python3 -V again to check the Python version.

        3

        Note

        Each time you log in, run source /workspace/miniconda/bin/activate to activate the virtual Python 3.10 environment.

    3. Run the following command to install torch.

      The deepgpu-torch acceleration component requires torch 2.5.x+cu124. This example installs torch 2.5.0. To install a different version, replace the version number accordingly.

      pip install torch==2.5.0

      Run python3 -c "import torch; print(torch.__version__)" to confirm that torch 2.5.0+cu124 is installed.

      torch

    4. Run the following commands to install deepgpu-torch.

      apt-get install which curl iputils-ping -y
      pip install deepgpu-torch==0.0.15+torch2.5.0cu124 -f https://aiacc-inference-public-v2.oss-cn-hangzhou.aliyuncs.com/deepgpu/deepytorch/index.html

      Run pip list | grep deepgpu-torch to view the installed version of deepgpu-torch.

      deeptorch

    5. Run the following commands to download and install the ComfyUI plugin for the DeepGPU inference acceleration component.

      Download the plugin for the DeepGPU inference acceleration component and extract it to the ComfyUI/custom_nodes/ directory.

      cd ComfyUI/custom_nodes/
      wget https://aiacc-inference-public-v2.oss-cn-hangzhou.aliyuncs.com/deepgpu/comfyui/flux/20250102/ComfyUI-deepgpu.tar.gz
      tar zxf ComfyUI-deepgpu.tar.gz
      cd ../..
      
      pip install deepgpu-comfyui==1.0.8 -f https://aiacc-inference-public-v2.oss-cn-hangzhou.aliyuncs.com/deepgpu/comfyui/index.html

      Run pip list | grep deepgpu-comfyui to verify that the plugin for the DeepGPU inference acceleration component is installed.

      chajian

Accelerate text-to-image inference

This example demonstrates how to use ComfyUI with DeepGPU to test the acceleration of text-to-image inference for a FLUX model. The setup uses a gn8is instance running Ubuntu 22.04 with DeepGPU installed.

  1. Install the basic environment, including torch, ComfyUI dependencies, and the DeepGPU acceleration component.

    1. Connect to the GPU instance by using Workbench.

    2. Run the python3 -V command to ensure your Python version is 3.10.

      Ubuntu 22.04 comes with Python 3.10.12 by default, which meets the dependency requirement for the deepgpu-torch acceleration component. If you are using a GPU instance with Ubuntu 20.04, you must create an isolated Python 3.10 environment. For more information, see Ubuntu 20.04.

    3. Run the following command to install torch.

      # This example installs torch 2.5.0. To install a different version, replace the version number accordingly.
      pip install torch==2.5.0 torchvision==0.20.0 torchaudio==2.5.0

      Run python3 -c "import torch; print(torch.__version__)" to view the installed torch version.

      torch版本

    4. Run the following command to install ComfyUI dependencies.

      pip install PyYAML safetensors numpy Pillow einops psutil transformers scipy torchsde aiohttp comfyui-frontend-package==1.11.8 kornia spandrel av
    5. Run the following commands to install the DeepGPU acceleration component.

      apt-get install which curl iputils-ping -y
      pip install deepgpu-torch==0.0.15+torch2.5.0cu124 -f https://aiacc-inference-public-v2.oss-cn-hangzhou.aliyuncs.com/deepgpu/deepytorch/index.html
      pip install deepgpu-comfyui==1.0.8 -f https://aiacc-inference-public-v2.oss-cn-hangzhou.aliyuncs.com/deepgpu/comfyui/index.html

      Run pip list | grep deepgpu-torch or pip list | grep deepgpu-comfyui to view the installed version of the DeepGPU acceleration component.

      deepgpu版本

  2. Run the following command to download the ComfyUI source code.

    This example uses the official ComfyUI source code. You can also use your own customized version of the ComfyUI code.

    Ubuntu 22.04

    git clone -b v0.3.26 https://github.com/comfyanonymous/ComfyUI
    sed -i "s|comfy.model_patcher.ModelPatcher(model, load_device=load_device, offload_device=model_management.unet_offload_device())|comfy.model_patcher.ModelPatcher(model, load_device=load_device, offload_device=model_management.unet_offload_device(), weight_inplace_update=True)|g" ComfyUI/comfy/sd.py

    Ubuntu 20.04

    apt install git
    git clone -b v0.3.26 https://github.com/comfyanonymous/ComfyUI
    sed -i "s|comfy.model_patcher.ModelPatcher(model, load_device=load_device, offload_device=model_management.unet_offload_device())|comfy.model_patcher.ModelPatcher(model, load_device=load_device, offload_device=model_management.unet_offload_device(), weight_inplace_update=True)|g" ComfyUI/comfy/sd.py
  3. Run the following commands to download the plugin for the DeepGPU inference acceleration component.

    cd ComfyUI/custom_nodes/
    wget https://aiacc-inference-public-v2.oss-cn-hangzhou.aliyuncs.com/deepgpu/comfyui/flux/20250102/ComfyUI-deepgpu.tar.gz
    tar zxf ComfyUI-deepgpu.tar.gz
    ls
    cd ../..
  4. Run the following commands to download the FLUX model.

    This example uses the official FLUX model. You can also use your own trained models.

    cd ComfyUI
    wget -P models/unet https://aiacc-inference-public-v2.oss-cn-hangzhou.aliyuncs.com/deepgpu/comfyui/flux/models/flux1-dev.safetensors
    wget -P models/clip https://aiacc-inference-public-v2.oss-cn-hangzhou.aliyuncs.com/deepgpu/comfyui/flux/models/t5xxl_fp16.safetensors
    wget -P models/clip https://aiacc-inference-public-v2.oss-cn-hangzhou.aliyuncs.com/deepgpu/comfyui/flux/models/clip_l.safetensors
    wget -P models/vae https://aiacc-inference-public-v2.oss-cn-hangzhou.aliyuncs.com/deepgpu/comfyui/flux/models/ae.safetensors
    Note

    The model download may take some time.

  5. Run the following command to start the ComfyUI service.

    python3 main.py --listen 0.0.0.0 --port 7860
  6. Navigate to http://IP:7860 to access the ComfyUI service.

    Replace IP with the public IP address of your GPU instance.

  7. In the ComfyUI interface, select Workflow > Open, select the JSON file provided in the sample workflow, and then click Queue Prompt.

    Important

    You must restart the ComfyUI service when switching between workflows with and without DeepGPU acceleration.

    • Without DeepGPU acceleration: Select the base flux-dev JSON file (workflow_flux.json). The following figure shows the workflow execution.

      文生图

    • With DeepGPU acceleration: Select the accelerated flux-dev DeepyTorch JSON file (workflow_flux_deepytorch.json). The following figure shows the workflow execution.

      文生图2

    After the process is complete, check the text-to-image execution time in the terminal where you started ComfyUI. This is the time displayed after Prompt executed in, as shown in the following figure. The execution time is significantly shorter when using the ComfyUI and DeepGPU configuration compared to the standard setup. For more performance details, see Performance comparison.

    时间

Node and workflow overview

DeepGPU node types

The plugin for the DeepGPU inference acceleration component includes several DeepGPU-specific nodes. In the ComfyUI interface, you can right-click a blank area and select to view the available DeepGPU node types.

deepseek节点

Navigate to the ComfyUI/custom_nodes/ComfyUI-deepgpu directory and open the __init__.py file to see the corresponding node types.

节点类型

The main node types are described below:

  • ApplyDeepyTorch node: For FLUX models, the ApplyDeepyTorch node depends on other nodes. You must insert it after a Load Diffusion Model, Load Flux LoRA, or Apply Flux IPAdapter node. For other models, this node must be inserted after a Load Checkpoint or LoraLoaderModelOnly node.

  • DeepyTorchSampler node: For FLUX models, this node is a new sampler that offers better performance than the XLabsSampler node (from x-flux-comfyui). When you use this node, you do not need to add an ApplyDeepyTorch node.

  • ApplyPulidFluxDeepyTorch node: For FLUX models, this node outperforms the ApplyPulidFlux node (from ComfyUI-PuLID-Flux-Enhanced) and replaces it. You do not need to add an ApplyDeepyTorch node when using this node.

Example workflows

This section provides example workflows for accelerating FLUX and SD model inference.

FLUX.1 models

  • For the base flux-dev model

    In the ComfyUI interface, insert an ApplyDeepyTorch node after the Load Diffusion Model node, as shown in the following figure:

    image

    Example workflow files:

  • For native ComfyUI LoRA models

    Note

    DeepGPU supports both the XLabs and native ComfyUI LoRA implementations for the FLUX.1-dev model.

    In the ComfyUI interface, insert an ApplyDeepyTorch node after the last LoraLoaderModelOnly node, as shown in the following figure:

    image

    Example workflow files:

  • For FLUX models with the Pulid plugin

    Note

    The Pulid plugin is from ComfyUI-PuLID-Flux-Enhanced.

    In the ComfyUI interface, replace the Apply Pulid Flux node with the ApplyPulidFluxDeepyTorch node. No ApplyDeepyTorch node is needed. The workflow is shown in the following figure:

    image

  • For custom FLUX LoRA models

    Before starting the ComfyUI service, you must set the following environment variable to execute the workflow in the ComfyUI interface.

    export DEEPGPU_ENABLE_FLUX_LORA=true

    In the ComfyUI interface, insert an ApplyDeepyTorch node after the last Load Flux LoRA node, as shown in the following figure:

    image

    Example workflow files:

  • For custom FLUX IP-Adapter

    Before starting the ComfyUI service, you must set the following environment variable to execute the workflow in the ComfyUI interface.

    export DEEPGPU_ENABLE_FLUX_LORA=true

    In the ComfyUI interface, insert an ApplyDeepyTorch node after the Apply Flux IPAdapter node, and replace the XLabsSampler node with the DeepyTorch Sampler node, as shown in the following figure:

    Note

    The test image used as input for this scenario is the XLabs-AI image.

    image

    Example workflow files:

  • For custom FLUX ControlNet

    In the ComfyUI interface, replace the XLabsSampler node with the DeepyTorch Sampler node, as shown in the following figure:

    Note

    The test image used as input for this scenario is the XLabs-AI image.

    image

    Example workflow files:

SD 1.5 models

  • For native ComfyUI LoRA models

    Note

    DeepGPU supports both the XLabs and native ComfyUI LoRA implementations for the FLUX.1-dev model.

    In the ComfyUI interface, insert an ApplyDeepyTorch node after the last LoraLoaderModelOnly node, as shown in the following figure:

    image

    Example workflow files:

  • For custom SD ControlNet

    In the ComfyUI interface, insert an ApplyDeepyTorch node after the Load Checkpoint node, as shown in the following figure:

    image

    Example workflow files:

SDXL models

To accelerate SDXL models, insert an ApplyDeepyTorch node after both the BASE and REFINER Load Checkpoint nodes, as shown in the following figure:

image

Example workflow files: