This topic describes how to quickly build a text-to-image service using a ComfyUI + SD/FLUX image and the GPU function feature of Function Compute.
Solution overview
You can quickly build a text-to-image service with Alibaba Cloud Function Compute in two steps:
Choose a public image, or build and push a custom image.
You can either use a public ComfyUI + SD/FLUX image or build a custom image and push it to an image repository in Alibaba Cloud Container Registry.
Create a GPU function.
Create a GPU function in Alibaba Cloud Function Compute based on the image. After the function is created, the system provides a domain name that serves as the endpoint for your text-to-image service.
After you complete these steps, your text-to-image service is deployed. Users can access the service over the internet or an internal network. To access the function from a browser, you must configure a custom domain name for the function.
Step 1: Build a text-to-image service using Alibaba Cloud Function Compute
Image building and acceleration
Public image: You can use an existing public ComfyUI + SD/FLUX image for a quick and easy setup.
Custom image: You can build a custom image to meet your specific needs and optimize the user experience and performance.
Prepare a Dockerfile
When you build the image, follow the installation instructions in the README.md file of the ComfyUI project. Because ComfyUI depends on Python, choose a suitable Python image as the base image. You can obtain Python images from public image repositories such as Docker Hub. The following code provides a sample custom image.
```dockerfile
# Dockerfile
FROM python:3.10

# Install system dependencies
RUN apt-get update && apt-get install -y \
    git \
    wget \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app

# Clone the ComfyUI repository
RUN git clone https://github.com/comfyanonymous/ComfyUI.git
WORKDIR /app/ComfyUI

# Accelerate Python dependency package downloads
RUN pip config set global.index-url https://mirrors.cloud.aliyuncs.com/pypi/simple
RUN pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
RUN pip config set install.trusted-host mirrors.cloud.aliyuncs.com

# Install PyTorch (NVIDIA CUDA version by default, modify as needed)
RUN pip install torch==2.5.0+cu124 torchvision==0.20.0+cu124 torchaudio==2.5.0 --index-url https://download.pytorch.org/whl/cu124

# Install project dependencies
RUN pip install -r requirements.txt

# Expose the service port
EXPOSE 8188

# Startup command
CMD ["python", "main.py"]
```
Accelerate image downloads
If access to the public image repository is slow, you can configure Docker's registry-mirrors to increase the download speed. For example, on a Linux system, edit the /etc/docker/daemon.json file and add or modify the following configuration.
```json
{
    ......
    "registry-mirrors": [
        "https://docker.nju.edu.cn",
        "https://dockerproxy.com",
        "https://docker.mirrors.ustc.edu.cn",
        ......
    ]
}
```
Reload the configuration file and restart Docker for the changes to take effect.
```bash
systemctl daemon-reload   # Reload the configuration file.
systemctl restart docker  # Restart the Docker service.
```
Alternatively, you can store frequently used base images in your own image repository or set up a private registry-mirror.
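For example, the following is a minimal sketch of caching the base image in your own Container Registry repository. The region (cn-hangzhou), namespace, and repository name are placeholders; replace them with your own values.
```bash
# Pull the public base image once, retag it, and push it to your own registry.
docker pull python:3.10
docker tag python:3.10 registry.cn-hangzhou.aliyuncs.com/<your-namespace>/python:3.10
docker push registry.cn-hangzhou.aliyuncs.com/<your-namespace>/python:3.10
```
You can then reference this cached copy in the FROM instruction of your Dockerfile.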
Accelerate Python dependency package downloads
If the Python dependency packages download slowly when you install them according to the README.md file of the ComfyUI project, you can configure pip's index-url to speed up the process. For example, you can use the Alibaba Cloud or Tsinghua University Python mirror source.
```dockerfile
......
RUN pip config set global.index-url https://mirrors.cloud.aliyuncs.com/pypi/simple
RUN pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
RUN pip config set install.trusted-host mirrors.cloud.aliyuncs.com
......
```
Build the image
```bash
docker build -t comfyui:latest .
```
Push the image to Container Registry
In Function Compute, when you create a function that uses a custom image, the image must come from an Alibaba Cloud Container Registry repository that is in the same region and under the same account. Push the image to Container Registry, for example with the docker CLI as sketched below.
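The following commands are a minimal sketch of the docker CLI flow. The region (cn-hangzhou), namespace, and repository name are placeholders; replace them with the values of your own Container Registry instance.
```bash
# Log on to your Container Registry instance.
docker login --username=<your-acr-username> registry.cn-hangzhou.aliyuncs.com

# Tag the locally built image with the repository address.
docker tag comfyui:latest registry.cn-hangzhou.aliyuncs.com/<your-namespace>/comfyui:latest

# Push the image to Container Registry.
docker push registry.cn-hangzhou.aliyuncs.com/<your-namespace>/comfyui:latest
```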
Create a GPU function
Function Compute provides an image acceleration feature for all GPU functions by default. This feature supports on-demand pulling and peer-to-peer (P2P) caching and requires no extra configuration, so containers can start quickly even from large images, which improves elasticity. For more information, see Create a GPU function. Note that Function Compute has limits on image size. Avoid including large model data in the image to prevent long image creation times. For more information about image size limits and how to request a quota increase, see Quotas and limits.
The following example shows the startup command and listener port for creating the function.
Listener Port: 8188. Note: The listener port must match the port that the service in the image listens on (EXPOSE 8188 in the sample Dockerfile).
Startup Command: python main.py --listen 0.0.0.0
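After the function is created, you can verify that ComfyUI is reachable through the assigned domain name. The following sketch uses a placeholder domain; /system_stats and /prompt are standard ComfyUI HTTP endpoints.
```bash
# Query ComfyUI runtime information; a JSON response confirms that the service is up.
curl https://<your-function-domain>/system_stats

# Submit a workflow in API format to the generation queue (workflow_api.json is a placeholder file).
curl -X POST https://<your-function-domain>/prompt \
     -H "Content-Type: application/json" \
     -d @workflow_api.json
```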

Configure a custom domain name for the function
The domain name that Alibaba Cloud Function Compute provides for a function is intended mainly for API access. To operate the service through the ComfyUI visualization interface, you must configure a custom domain name for the function.

Access the configured custom domain name in a browser to open the ComfyUI interface. The result is shown in the following figure.

Step 2: Model download and acceleration
You can download models from communities such as Hugging Face, ModelScope, and Civitai. The ModelScope community hosts many models and mirrors from organizations such as Black Forest Labs. If your network access to Hugging Face is restricted, you can use the ModelScope mirror source instead.
Function Compute recommends that you store model data in NAS or OSS. A Performance NAS instance provides an initial bandwidth of about 600 MB/s. OSS has a higher bandwidth limit and is less prone to bandwidth contention between function instances than NAS. You can also enable the OSS accelerator to obtain higher throughput. For more information, see Best practices for model storage on Function Compute GPU-accelerated instances in AI scenarios.
In the model library of ModelScope, search for Black Forest Labs. The following example shows how to download the FLUX.1-dev model.
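For example, the following is a minimal sketch of pulling the model over Git. It assumes that git-lfs is installed and that the ModelScope repository ID is black-forest-labs/FLUX.1-dev; confirm the exact repository address on the model page.
```bash
# Enable Git LFS and clone the FLUX.1-dev repository from ModelScope.
git lfs install
git clone https://www.modelscope.cn/black-forest-labs/FLUX.1-dev.git
```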

Upload the downloaded models to OSS, placing them in the corresponding folders under ComfyUI/models/. For example, place the flux1-dev.safetensors model in the unet folder. The following table shows the model paths; a sketch of the upload command follows the table.
| Folder name | Downloaded model |
| --- | --- |
| checkpoints | dreamshaperXL_lightningDPMSDE.safetensors |
| clip | clip_l.safetensors |
| | t5xxl_fp8_e4m3fn.safetensors |
| clip_vision | clip_vision_g.safetensors |
| | clip_vision_l.safetensors |
| controlnet | flux-canny-controlnet-v3.safetensors |
| loras | FLUX1_wukong_lora.safetensors |
| | araminta_k_flux_koda.safetensors |
| unet | flux1-dev.safetensors |
| vae | ae.safetensors |
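As mentioned above, the following is a minimal sketch of uploading the models with the ossutil CLI. The bucket name and prefix are placeholders; use the OSS path that you later mount to the function.
```bash
# Recursively upload the local ComfyUI/models directory to OSS.
ossutil cp -r ./ComfyUI/models/ oss://<your-bucket>/comfyui/models/
```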

On the function details page, click the Configuration tab. In the left navigation pane, click the Permissions tab and configure a role with permissions to access OSS for the function. Then, click the Storage tab. In the Object Storage Service (OSS) section, click Edit. In the panel that appears, configure the parameters and click Deploy.

After the deployment is complete, log on to the function instance and confirm that the models are successfully mounted to the local directory of the function.
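For example, assuming the OSS path is mounted at /mnt/oss (a placeholder; use the mount target that you configured):
```bash
# List the mounted model directories to confirm that the files are visible.
ls -lh /mnt/oss/comfyui/models/unet/
ls -lh /mnt/oss/comfyui/models/vae/
```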

(Optional) Open ComfyUI. The default workflow requires a Checkpoint loader model, so you also need to upload a Checkpoint model, such as dreamshaperXL_lightningDPMSDE.safetensors, to the checkpoints directory in OSS. The path displayed after you log on to the instance is shown in the following figure.

Click Execute to view the output image.

Download the pre-configured workflow file FLUX-base.json. Open ComfyUI and import the downloaded FLUX-base.json file. This workflow uses the t5xxl_fp8_e4m3fn.safetensors, ae.safetensors, and flux1-dev.safetensors models. Click Run. The result is shown in the following figure.

Step 3: Inference acceleration
Inference for a text-to-image service usually takes several seconds to tens of seconds. Inference acceleration not only shortens the response time and improves the user experience, but also reduces resource costs. The following sections describe two inference acceleration solutions.
Alibaba Cloud DeepGPU Toolkit (DeepGPU)
The DeepGPU Toolkit (DeepGPU) is a free toolset that enhances GPU computing services and also provides inference acceleration for ComfyUI + SD/FLUX. It includes tools for rapid business deployment, GPU splitting, AI training and inference optimization, and dedicated acceleration for popular AI models. The inference component of the toolkit can currently be used with Alibaba Cloud Function Compute free of charge, which lets you use the GPU resources of Function Compute more conveniently and efficiently.
1. DeepGPU installation
Before you use DeepGPU to accelerate inference for ComfyUI + SD/FLUX, you must install the required dependency packages:
Install torch 2.5
```dockerfile
RUN pip install torch==2.5.0
```
Install deepgpu-torch
The DeepGPU torch model acceleration package accelerates models such as FLUX.1 and VAE.
```dockerfile
# ubuntu
RUN apt-get update
RUN apt-get install which curl iputils-ping -y
# centos
# RUN yum install which curl iputils -y

# First, install torch. deepgpu-torch depends on python3.10 and torch2.5.x+cu124 (if you need other versions, contact us).
RUN pip install deepgpu-torch==0.0.15+torch2.5.0cu124 -f https://aiacc-inference-public-v2.oss-cn-hangzhou.aliyuncs.com/deepgpu/deepytorch/index.html
```
Extract the downloaded plugin to the custom_nodes/ directory.
```dockerfile
RUN wget https://aiacc-inference-public-v2.oss-cn-hangzhou.aliyuncs.com/deepgpu/comfyui/flux/20250102/ComfyUI-deepgpu.tar.gz
RUN tar zxf ComfyUI-deepgpu.tar.gz -C /app/ComfyUI/custom_nodes
RUN pip install deepgpu-comfyui==1.0.8 -f https://aiacc-inference-public-v2.oss-cn-hangzhou.aliyuncs.com/deepgpu/comfyui/index.html
```
2. ComfyUI source code modification (Important)
Dependency versions
Update x-flux-comfyui to the latest version from GitHub.
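The following is a minimal sketch of pulling the latest version during the image build. The GitHub repository URL is assumed from the node's name; confirm it before use.
```bash
# Clone (or update) the x-flux-comfyui custom node into ComfyUI's custom_nodes directory.
cd /app/ComfyUI/custom_nodes
git clone https://github.com/XLabs-AI/x-flux-comfyui.git || (cd x-flux-comfyui && git pull)
```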
Native LoRA support
If you use LoraLoaderModelOnly to load a native ComfyUI LoRA model and use deepgpu-torch for acceleration, you must modify one line of code in the ComfyUI source code.
If your ComfyUI version is earlier than v0.3.6
Code path: https://github.com/comfyanonymous/ComfyUI/blob/v0.3.6/comfy/sd.py#L779

Add the parameter weight_inplace_update=True to this line.
```python
return comfy.model_patcher.ModelPatcher(model, load_device=load_device, offload_device=offload_device, weight_inplace_update=True)
```
If your ComfyUI version is v0.3.7 or later
Code path: https://github.com/comfyanonymous/ComfyUI/blob/v0.3.7/comfy/sd.py#L785

Add the parameter weight_inplace_update=True to this line.
```python
model_patcher = comfy.model_patcher.ModelPatcher(model, load_device=load_device, offload_device=model_management.unet_offload_device(), weight_inplace_update=True)
```
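If you build the image yourself, you can apply this one-line change during the build instead of editing the file manually. The following sed command is only a sketch for v0.3.7 or later: it assumes the ModelPatcher call appears on a single line exactly as shown above, so verify comfy/sd.py after the build.
```dockerfile
# Append weight_inplace_update=True to the ModelPatcher call in comfy/sd.py (v0.3.7+ layout).
RUN sed -i 's/offload_device=model_management.unet_offload_device())/offload_device=model_management.unet_offload_device(), weight_inplace_update=True)/' /app/ComfyUI/comfy/sd.py
```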
The following code provides a sample image for installing DeepGPU.
```dockerfile
# Dockerfile
FROM python:3.10

# Install system dependencies
RUN apt-get update && apt-get install -y \
    git \
    wget \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app

# Clone the ComfyUI repository
RUN git clone https://github.com/comfyanonymous/ComfyUI.git
WORKDIR /app/ComfyUI

# Accelerate Python dependency package downloads
RUN pip config set global.index-url https://mirrors.cloud.aliyuncs.com/pypi/simple
RUN pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
RUN pip config set install.trusted-host mirrors.cloud.aliyuncs.com

# Install PyTorch (NVIDIA CUDA version by default, modify as needed)
RUN pip install torch==2.5.0+cu124 torchvision==0.20.0+cu124 torchaudio==2.5.0 --index-url https://download.pytorch.org/whl/cu124

# Install project dependencies
RUN pip install -r requirements.txt

# ubuntu
RUN apt-get update
RUN apt-get install which curl iputils-ping -y
# centos
# RUN yum install which curl iputils -y

# First, install torch. deepgpu-torch depends on python3.10 and torch2.5.x+cu124 (if you need other versions, contact the DeepGPU team).
RUN pip install deepgpu-torch==0.0.15+torch2.5.0cu124 -f https://aiacc-inference-public-v2.oss-cn-hangzhou.aliyuncs.com/deepgpu/deepytorch/index.html

# After downloading the plugin, extract it to the custom_nodes/ directory.
RUN wget https://aiacc-inference-public-v2.oss-cn-hangzhou.aliyuncs.com/deepgpu/comfyui/flux/20250102/ComfyUI-deepgpu.tar.gz
RUN tar zxf ComfyUI-deepgpu.tar.gz -C /app/ComfyUI/custom_nodes
RUN pip install deepgpu-comfyui==1.0.8 -f https://aiacc-inference-public-v2.oss-cn-hangzhou.aliyuncs.com/deepgpu/comfyui/index.html

# Expose the service port
EXPOSE 8188

# Startup command
CMD ["python", "main.py"]
```
3. Environment variable configuration
When you use DeepGPU in Alibaba Cloud Function Compute, you must configure the DEEPGPU_PUB_LS=true and DEEPGPU_ENABLE_FLUX_LORA=true environment variables.
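The documented way is to set these variables in the function's environment variable configuration in the Function Compute console. As an alternative sketch, you can bake them into the custom image with Dockerfile ENV instructions; this is an assumption, and variables configured on the function typically take precedence.
```dockerfile
# Enable DeepGPU on Function Compute and FLUX LoRA acceleration (assumed alternative to console configuration).
ENV DEEPGPU_PUB_LS=true
ENV DEEPGPU_ENABLE_FLUX_LORA=true
```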

4. (Optional) Configure GPU idle mode
By configuring provisioned instances, you can reduce request latency caused by instance cold starts. You can also configure scaling rules for provisioned instances, such as scheduled scaling and metric-based scaling, to improve instance utilization and prevent resource waste.
5. How to use the DeepGPU ComfyUI plugin
The plugin provides the following DeepGPU nodes. You can find them by entering DeepyTorch in the search box of the ComfyUI interface:
Apply DeepyTorch to diffusion model
Apply DeepyTorch to vae model
DeepTorch Sampler to replace XlabsSampler

Usage guide
Insertion point: For Flux, insert the Apply DeepyTorch to diffusion model node after the Load Diffusion Model, Load Flux LoRA, or Apply Flux IPAdapter node. For other models, insert it after the Load Checkpoint or LoraLoaderModelOnly node. The following figure shows an example.

Sampler replacement: For Flux, use the DeepTorch Sampler to replace XlabsSampler node to replace the XLabsSampler node.
ComfyUI TorchCompile* nodes
Currently, several open-source inference acceleration nodes are available, including but not limited to:
TorchCompileModel
TorchCompileVAE
TorchCompileControlNet
TorchCompileModelFluxAdvanced
These nodes use the just-in-time (JIT) compilation capability of PyTorch to optimize and accelerate model execution. They also improve resource utilization by converting dynamic computation graphs into efficient static code.
Currently, most of these nodes are in the beta or experimental stage.

Usage and performance comparison of TorchCompile* nodes and DeepGPU
We compared the performance of ComfyUI's built-in TorchCompile nodes and DeepGPU nodes for inference acceleration. We analyzed their usage, acceleration effects, and applicable scenarios to provide a reference.
Model list
| Folder name | Downloaded model |
| --- | --- |
| clip | clip_l.safetensors |
| | t5xxl_fp8_e4m3fn.safetensors |
| | clip_vision_l.safetensors |
| loras | FLUX1_wukong_lora.safetensors |
| unet | flux1-dev.safetensors |
| vae | ae.safetensors |
Configuration parameters
Sampler
steps: 20
Empty Latent Image
width: 768
height: 1024
Test platform
Alibaba Cloud Function Compute fc.gpu.ada.1 instance.
Inference acceleration framework scenario support matrix & Inference acceleration effects


The test results show that both the TorchCompile series nodes and DeepGPU cover most of the ComfyUI + SD/FLUX scenarios and achieve about 20% to 30% inference acceleration in Flux-related scenarios.
Test workflows
The following table provides the workflow (.json) files for inference acceleration that are used with different models.
| Scenario | Workflow |
| --- | --- |
| FLUX only | |
| FLUX + Lora | |
| FLUX + ComfyUI Lora | |
| SDXL | |
References
Long-running GPU instances may fail. Function Compute provides a default request-based health check mechanism and lets you configure custom instance health check logic.
Function Compute provides monitoring reports for functions and instances by default, which you can view without extra configuration. To collect function logs for troubleshooting, you can configure log collection.






