Build a text-to-image service with ComfyUI and SD/FLUX using Function Compute

Last Updated: Sep 05, 2025

This topic describes how to quickly build a text-to-image service using a ComfyUI + SD/FLUX image and the GPU function feature of Function Compute.

Solution overview

You can quickly build a text-to-image service with Alibaba Cloud Function Compute in two steps:

  1. Choose a public image, or build and push a custom image.

    You can either use a public ComfyUI + SD/FLUX image or build a custom image and push it to an image repository in Alibaba Cloud Container Registry.

  2. Create a GPU function.

    Create a GPU function in Alibaba Cloud Function Compute based on the image. After the function is created, the system provides a domain name that serves as the endpoint for your text-to-image service.

After you complete these steps, your text-to-image service is deployed. Users can access the service over the internet or an internal network. To access the function from a browser, you must configure a custom domain name for the function.

Step 1: Build a text-to-image service using Alibaba Cloud Function Compute

Image building and acceleration

Public image: You can use an existing public ComfyUI + SD/FLUX image for a quick and easy setup.

Custom image: You can build a custom image to meet your specific needs and optimize the user experience and performance.

  1. Prepare a Dockerfile

    When you build the image, follow the installation instructions in the README.md file of the ComfyUI project. Because ComfyUI depends on Python, choose a suitable Python image as the base image. You can obtain Python images from public image repositories such as Docker Hub. The following Dockerfile provides a sample custom image.

    # Dockerfile
    FROM python:3.10
    
    # Install system dependencies
    RUN apt-get update && apt-get install -y \
        git \
        wget \
        && rm -rf /var/lib/apt/lists/*
    
    WORKDIR /app
    
    # Clone the ComfyUI repository
    RUN git clone https://github.com/comfyanonymous/ComfyUI.git
    
    WORKDIR /app/ComfyUI
    
    # Accelerate Python dependency package downloads (pick one mirror; the Tsinghua mirror is shown as a commented-out alternative)
    RUN pip config set global.index-url https://mirrors.cloud.aliyuncs.com/pypi/simple
    # RUN pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
    RUN pip config set install.trusted-host mirrors.cloud.aliyuncs.com
    
    # Install PyTorch (NVIDIA CUDA version by default, modify as needed)
    RUN pip install torch==2.5.0+cu124 torchvision==0.20.0+cu124 torchaudio==2.5.0 --index-url https://download.pytorch.org/whl/cu124
    
    # Install project dependencies
    RUN pip install -r requirements.txt
    
    # Expose the service port
    EXPOSE 8188
    
    # Startup command (listen on all interfaces so the service is reachable from outside the container)
    CMD ["python", "main.py", "--listen", "0.0.0.0"]
  2. Accelerate image downloads

    1. If access to the public image repository is slow, you can configure Docker's registry-mirrors to increase the download speed. For example, on a Linux system, edit the /etc/docker/daemon.json file and add or modify the following configuration.

      {
          ......
          "registry-mirrors": [
              "https://docker.nju.edu.cn",
              "https://dockerproxy.com",
              "https://docker.mirrors.ustc.edu.cn",
              ......
          ]
      }
    2. Reload the configuration and restart Docker for the changes to take effect.

      systemctl daemon-reload  # Reload the configuration file.
      systemctl restart docker # Restart the Docker service.

    Alternatively, you can store frequently used base images in your own image repository or set up a private registry-mirror.

  3. Accelerate Python dependency package downloads

    If the Python dependency packages download slowly when you install them according to the README.md file of the ComfyUI project, you can configure pip's index-url to speed up the process. For example, you can use the Alibaba Cloud or Tsinghua University PyPI mirror.

    ......
    
    # Pick one mirror; the Tsinghua mirror is shown as a commented-out alternative
    RUN pip config set global.index-url https://mirrors.cloud.aliyuncs.com/pypi/simple
    # RUN pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
    RUN pip config set install.trusted-host mirrors.cloud.aliyuncs.com
    
    ......
  4. Build the image

    docker build -t comfyui:latest .
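
    Optionally, you can run the image locally before you push it to confirm that the service starts and listens on port 8188. The following commands are a minimal sketch: they assume a local NVIDIA GPU with the NVIDIA Container Toolkit installed, and /system_stats is ComfyUI's built-in status endpoint.

    # Run the image locally and map the ComfyUI port (requires the NVIDIA Container Toolkit for --gpus)
    docker run --rm --gpus all -p 8188:8188 comfyui:latest
    
    # In another terminal, confirm that ComfyUI responds
    curl http://127.0.0.1:8188/system_stats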

Push the image to Container Registry

In Function Compute, when you create a function that uses a custom image, you must use an image from an Alibaba Cloud Container Registry repository that is in the same region and under the same account. You can push the image to Container Registry with the Docker CLI, as shown in the following example.
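
The following commands are a sketch of pushing with the Docker CLI; the registry endpoint, namespace, and repository name are placeholders that you must replace with the values of your own Container Registry instance (the example assumes the cn-hangzhou region).

    # Log on to your Container Registry instance (replace the account and registry endpoint with your own)
    docker login --username=<your-aliyun-account> registry.cn-hangzhou.aliyuncs.com
    
    # Tag the local image with the repository address, then push it
    docker tag comfyui:latest registry.cn-hangzhou.aliyuncs.com/<your-namespace>/comfyui:latest
    docker push registry.cn-hangzhou.aliyuncs.com/<your-namespace>/comfyui:latest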

Create a GPU function

Function Compute provides an image acceleration feature for all GPU functions by default. This feature supports on-demand image pulling and peer-to-peer (P2P) caching, requires no extra configuration, and lets containers that use large images start quickly and scale elastically. For more information, see Create a GPU function. Note that Function Compute has limits on image size. Avoid including large model data in the image to prevent long image creation times. For more information about image size limits and how to request a quota increase, see Quotas and limits.

The following example shows the startup command and listener port for creating the function.

  • Listener Port: 8188. Note: The listener port must be the same as the port that the image exposes (8188 in this example).

  • Startup Command: python main.py --listen 0.0.0.0

image
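
After the function is created, you can send a request to the domain name that Function Compute assigns to the function to confirm that the ComfyUI service is reachable. This is a minimal check; the domain name below is a placeholder, and /system_stats is ComfyUI's built-in status endpoint, which returns basic runtime information as JSON.

    # Replace the placeholder with the domain name that Function Compute provides for your function
    curl https://<your-function-domain>/system_stats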

Configure a custom domain name for the function

The domain name that Alibaba Cloud Function Compute provides for a function is intended mainly for API access. To operate the service through the ComfyUI visualization interface, you must configure a custom domain name for the function.

image

Access the configured custom domain name in a browser to open the ComfyUI interface. The result is shown in the following figure.

image

Step 2: Model download and acceleration

You can download models from communities such as Hugging Face, ModelScope, and civitai. The ModelScope community provides many models and mirrors for organizations such as Black Forest Labs. If your network access to Hugging Face is restricted, you can use the ModelScope mirror source.

Function Compute recommends that you store model data in NAS or OSS. A Performance NAS instance provides an initial bandwidth of about 600 MB/s. OSS has a higher bandwidth limit and is less prone to bandwidth contention between function instances than NAS. You can also enable the OSS accelerator to obtain higher throughput. For more information, see Best practices for model storage on Function Compute GPU-accelerated instances in AI scenarios.

  1. In the model library of ModelScope, search for Black Forest Labs. The following example shows how to download the FLUX.1-dev model.

    image
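
    For example, you can clone the model repository from ModelScope with Git LFS. This is a sketch: the repository URL below is an assumption, so copy the actual clone URL from the FLUX.1-dev page on ModelScope, and note that the model files are tens of gigabytes in size.

    # Git LFS is required because the model weights are stored as LFS objects
    git lfs install
    # The URL is an assumption; copy the real clone URL from the ModelScope model page
    git clone https://www.modelscope.cn/black-forest-labs/FLUX.1-dev.git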

  2. Upload the downloaded models to OSS and place them in the corresponding folders under ComfyUI/models/. For example, place the flux1-dev.safetensors model in the unet folder. The following example shows the model paths, and a sample upload command follows the table.

    image

    Folder name      Downloaded model
    checkpoints      dreamshaperXL_lightningDPMSDE.safetensors
    clip             clip_l.safetensors, t5xxl_fp8_e4m3fn.safetensors
    clip_vision      clip_vision_g.safetensors, clip_vision_l.safetensors
    controlnet       flux-canny-controlnet-v3.safetensors
    loras            FLUX1_wukong_lora.safetensors, araminta_k_flux_koda.safetensors
    unet             flux1-dev.safetensors
    vae              ae.safetensors

    image
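
    The following commands sketch how the local model directory might be uploaded with the ossutil tool; the bucket name and path are placeholders, and ossutil must already be configured with credentials for your account.

    # Recursively upload the local models directory to the OSS bucket (replace the bucket and prefix with your own)
    ossutil cp -r ./ComfyUI/models oss://<your-bucket>/ComfyUI/models/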

  3. On the function details page, click the Configuration tab. In the left navigation pane, click the Permissions tab and configure a role with permissions to access OSS for the function. Then, click the Storage tab. In the Object Storage Service (OSS) section, click Edit. In the panel that appears, configure the parameters and click Deploy.

    image

  4. After the deployment is complete, log on to the function instance and confirm that the models are successfully mounted to the local directory of the function.

    image
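
    For example, after you log on to the instance, you can list the mount directory. The path below is only a placeholder; use the mount path that you configured when you mounted the OSS bucket.

    # Replace the path with the mount target configured in the OSS mount settings
    ls /mnt/oss/ComfyUI/models/unet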

  5. (Optional) Open ComfyUI. The default workflow requires a model for the Checkpoint loader, so you also need to upload a checkpoint model, such as dreamshaperXL_lightningDPMSDE.safetensors, to the checkpoints folder in OSS.

    The path displayed after you log on to the instance is shown in the following figure.

    image

    Click Execute to view the output image.

    image

  6. Download the pre-configured workflow file FLUX-base.json. Open ComfyUI, choose Workflow > Open, and import the downloaded FLUX-base.json file. This workflow uses the t5xxl_fp8_e4m3fn.safetensors, ae.safetensors, and flux1-dev.safetensors models. Click Run. The result is shown in the following figure.

    image

    image.png
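
    You can also submit a workflow to the service programmatically instead of through the UI. The following sketch assumes that you exported the workflow in API format from ComfyUI and saved it as workflow_api.json, and the custom domain name is a placeholder; ComfyUI's /prompt endpoint expects the workflow wrapped in a "prompt" field.

    # Wrap the API-format workflow in a {"prompt": ...} envelope and submit it to the /prompt endpoint
    curl -X POST "https://<your-custom-domain>/prompt" \
         -H "Content-Type: application/json" \
         -d "{\"prompt\": $(cat workflow_api.json)}"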

Step 3: Inference acceleration

Inference for a text-to-image service usually takes several seconds to tens of seconds. Inference acceleration not only shortens the response time and improves the user experience, but also reduces resource costs. The following sections describe two inference acceleration solutions.

Alibaba Cloud DeepGPU Toolkit (DeepGPU)

The DeepGPU Toolkit (DeepGPU) is a free toolset that enhances GPU computing services and provides inference acceleration for ComfyUI + SD/FLUX. It includes tools for rapid business deployment, GPU splitting, AI training and inference optimization, and dedicated acceleration for popular AI models. The inference component of DeepGPU can currently be used with Alibaba Cloud Function Compute for free, which lets you use the GPU resources of Function Compute more conveniently and efficiently.

1. DeepGPU installation

Before you use DeepGPU to accelerate inference for ComfyUI + SD/FLUX, you must install the required dependency packages:

  • Install torch 2.5

    RUN pip install torch==2.5.0
  • Install deepgpu-torch

    The DeepGPU torch model acceleration package accelerates models such as FLUX.1 and VAE.

    # ubuntu
    RUN apt-get update
    RUN apt-get install which curl iputils-ping -y
    # centos
    # RUN yum install which curl iputils -y
    
    # First, install torch. deepgpu-torch depends on python3.10 and torch2.5.x+cu124 (if you need other versions, contact us).
    RUN pip install deepgpu-torch==0.0.15+torch2.5.0cu124 -f https://aiacc-inference-public-v2.oss-cn-hangzhou.aliyuncs.com/deepgpu/deepytorch/index.html

  • Install the DeepGPU ComfyUI plugin

    Download the plugin and extract it to the custom_nodes/ directory.

    RUN wget https://aiacc-inference-public-v2.oss-cn-hangzhou.aliyuncs.com/deepgpu/comfyui/flux/20250102/ComfyUI-deepgpu.tar.gz
    RUN tar zxf ComfyUI-deepgpu.tar.gz  -C  /app/ComfyUI/custom_nodes
    
    RUN pip install deepgpu-comfyui==1.0.8 -f https://aiacc-inference-public-v2.oss-cn-hangzhou.aliyuncs.com/deepgpu/comfyui/index.html
2. ComfyUI source code modification (Important)
  1. Dependency versions

    Update x-flux-comfyui to the latest version from GitHub.

  2. Native LoRA support

    If you use LoraLoaderModelOnly to load a native ComfyUI LoRA model and use deepgpu-torch for acceleration, you must modify one line of code in the ComfyUI source code.

  3. The following Dockerfile provides a complete sample image with DeepGPU installed.

    # Dockerfile
    FROM python:3.10
    
    # Install system dependencies
    RUN apt-get update && apt-get install -y \
        git \
        wget \
        && rm -rf /var/lib/apt/lists/*
    
    WORKDIR /app
    
    # Clone the ComfyUI repository
    RUN git clone https://github.com/comfyanonymous/ComfyUI.git
    
    WORKDIR /app/ComfyUI
    
    # Accelerate Python dependency package downloads (pick one mirror; the Tsinghua mirror is shown as a commented-out alternative)
    RUN pip config set global.index-url https://mirrors.cloud.aliyuncs.com/pypi/simple
    # RUN pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
    RUN pip config set install.trusted-host mirrors.cloud.aliyuncs.com
    
    # Install PyTorch (NVIDIA CUDA version by default, modify as needed)
    RUN pip install torch==2.5.0+cu124 torchvision==0.20.0+cu124 torchaudio==2.5.0 --index-url https://download.pytorch.org/whl/cu124
    
    # Install project dependencies
    RUN pip install -r requirements.txt
    
    # ubuntu
    RUN apt-get update
    RUN apt-get install which curl iputils-ping -y
    # centos
    # RUN yum install which curl iputils -y
    
    # deepgpu-torch depends on python3.10 and torch2.5.x+cu124 (if you need other versions, contact the DeepGPU team).
    RUN pip install deepgpu-torch==0.0.15+torch2.5.0cu124 -f https://aiacc-inference-public-v2.oss-cn-hangzhou.aliyuncs.com/deepgpu/deepytorch/index.html
    
    # Download the DeepGPU ComfyUI plugin and extract it to the custom_nodes/ directory
    RUN wget https://aiacc-inference-public-v2.oss-cn-hangzhou.aliyuncs.com/deepgpu/comfyui/flux/20250102/ComfyUI-deepgpu.tar.gz
    RUN tar zxf ComfyUI-deepgpu.tar.gz -C /app/ComfyUI/custom_nodes
    
    RUN pip install deepgpu-comfyui==1.0.8 -f https://aiacc-inference-public-v2.oss-cn-hangzhou.aliyuncs.com/deepgpu/comfyui/index.html
    
    # Expose the service port
    EXPOSE 8188
    
    # Startup command (listen on all interfaces so the service is reachable from outside the container)
    CMD ["python", "main.py", "--listen", "0.0.0.0"]
3. Environment variable configuration

When you use DeepGPU in Alibaba Cloud Function Compute, you must configure the DEEPGPU_PUB_LS=true and DEEPGPU_ENABLE_FLUX_LORA=true environment variables.

image
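
If you test the image locally before you deploy it, you can pass the same variables on the docker run command line. This is only a local sketch with a placeholder image tag; in Function Compute, set the variables in the function's environment variable configuration as described above.

    # Pass the DeepGPU environment variables when running the image locally (requires the NVIDIA Container Toolkit for --gpus)
    docker run --rm --gpus all -p 8188:8188 \
        -e DEEPGPU_PUB_LS=true \
        -e DEEPGPU_ENABLE_FLUX_LORA=true \
        <your-image-tag>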

4. (Optional) Configure GPU idle mode

By configuring provisioned instances, you can reduce request latency caused by instance cold starts. You can also configure scaling rules for provisioned instances, such as scheduled scaling and metric-based scaling, to improve instance utilization and prevent resource waste.

5. How to use the DeepGPU ComfyUI plugin

The plugin provides the following DeepGPU nodes. You can find them by entering DeepyTorch in the search box of the ComfyUI interface:

  • Apply DeepyTorch to diffusion model

  • Apply DeepyTorch to vae model

  • DeepTorch Sampler to replace XlabsSampler

image

Usage guide

Insertion point: For Flux, insert the Apply DeepyTorch to diffusion model node after the Load Diffusion Model, Load Flux LoRA, or Apply Flux IPAdapter node. For other models, insert it after the Load Checkpoint or LoraLoaderModelOnly node. The following figure shows an example of inserting the node.

image

Sampler replacement: For Flux, replace the XLabsSampler node with the DeepTorch Sampler to replace XlabsSampler node.

ComfyUI TorchCompile* nodes

Currently, several open-source inference acceleration nodes are available, including but not limited to:

  • TorchCompileModel

  • TorchCompileVAE

  • TorchCompileControlNet

  • TorchCompileModelFluxAdvanced

These nodes use the just-in-time (JIT) compilation capability of PyTorch to optimize and accelerate model execution. They also improve resource utilization by converting dynamic computation graphs into efficient static code.

Note

Currently, most of these nodes are in the beta or experimental stage.

image

Usage and performance comparison of TorchCompile* nodes and DeepGPU

We compared the performance of ComfyUI's built-in TorchCompile nodes and DeepGPU nodes for inference acceleration. We analyzed their usage, acceleration effects, and applicable scenarios to provide a reference.

Model list

Folder name      Downloaded model
clip             clip_l.safetensors, t5xxl_fp8_e4m3fn.safetensors, clip_vision_l.safetensors
loras            FLUX1_wukong_lora.safetensors
unet             flux1-dev.safetensors
vae              ae.safetensors

Configuration parameters
  • Sampler

    • steps: 20

  • Empty Latent Image

    • width: 768

    • height: 1024

Test platform

Alibaba Cloud Function Compute fc.gpu.ada.1 instance.

Inference acceleration framework scenario support matrix & Inference acceleration effects

image

image

The test results show that both the TorchCompile series nodes and DeepGPU cover most of the ComfyUI + SD/FLUX scenarios and achieve about 20% to 30% inference acceleration in Flux-related scenarios.

Test workflows

The following table lists the workflow .json files used to test inference acceleration for each scenario.

Scenario               Workflow
FLUX only              FLUX-base.json, FLUX-torchcompile.json, FLUX-DeepGPU.json
FLUX + Lora            FLUX-Lora-base.json, FLUX-Lora-torchcompile.json, FLUX-Lora-DeepGPU.json
FLUX + ComfyUI Lora    FLUX-ComfyUI-Lora.json, FLUX-ComfyUI-Lora-torchcompile.json, FLUX-ComfyUI-Lora-DeepGPU.json
SDXL                   SDXL-base.json, SDXL-torchcompile.json, SDXL-DeepGPU.json

References

  • Long-running GPU instances may become unhealthy. Function Compute provides a default request-based health check mechanism and also lets you configure custom instance health check logic.

  • Function Compute provides monitoring reports for functions and instances by default, which you can view without extra configuration. To collect function logs for troubleshooting, you can configure log collection.