This topic describes how to quickly build a text-to-image service using a ComfyUI + SD/FLUX image and the GPU function feature of Function Compute.
Solution overview
You can quickly build a text-to-image service with Alibaba Cloud Function Compute in two steps:
Choose a public image, or build and push a custom image.
You can either use a public ComfyUI + SD/FLUX image or build a custom image and push it to an image repository in Alibaba Cloud Container Registry.
Create a GPU function.
Create a GPU function in Alibaba Cloud Function Compute based on the image. After the function is created, the system provides a domain name that serves as the endpoint for your text-to-image service.
After you complete these steps, your text-to-image service is deployed. Users can access the service over the internet or an internal network. To access the function from a browser, you must configure a custom domain name for the function.
Step 1: Build a text-to-image service using Alibaba Cloud Function Compute
Image building and acceleration
Public image: You can use an existing public ComfyUI + SD/FLUX image for a quick and easy setup.
Custom image: You can build a custom image to meet your specific needs and optimize the user experience and performance.
Prepare a Dockerfile
When you build the image, follow the installation instructions in the README.md file of the ComfyUI project. Because ComfyUI depends on Python, choose a suitable Python image as the base image. You can obtain Python images from public image repositories such as Docker Hub. The following code provides a sample custom image.
```dockerfile
# Dockerfile
FROM python:3.10

# Install system dependencies
RUN apt-get update && apt-get install -y \
    git \
    wget \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app

# Clone the ComfyUI repository
RUN git clone https://github.com/comfyanonymous/ComfyUI.git
WORKDIR /app/ComfyUI

# Accelerate Python dependency package downloads
RUN pip config set global.index-url https://mirrors.cloud.aliyuncs.com/pypi/simple
RUN pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
RUN pip config set install.trusted-host mirrors.cloud.aliyuncs.com

# Install PyTorch (NVIDIA CUDA version by default, modify as needed)
RUN pip install torch==2.5.0+cu124 torchvision==0.20.0+cu124 torchaudio==2.5.0 --index-url https://download.pytorch.org/whl/cu124

# Install project dependencies
RUN pip install -r requirements.txt

# Expose the service port
EXPOSE 8188

# Startup command
CMD ["python", "main.py"]
```
Accelerate image downloads
If access to the public image repository is slow, you can configure Docker's registry-mirrors to increase the download speed. For example, on a Linux system, edit the /etc/docker/daemon.json file and add or modify the following configuration.
```json
{
    ......
    "registry-mirrors": [
        "https://docker.nju.edu.cn",
        "https://dockerproxy.com",
        "https://docker.mirrors.ustc.edu.cn",
        ......
    ]
}
```
Reload the configuration file and restart Docker for the changes to take effect.
```bash
systemctl daemon-reload   # Reload the configuration file.
systemctl restart docker  # Restart the Docker service.
```
Alternatively, you can store frequently used base images in your own image repository or set up a private registry-mirror.
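For example, the following is a minimal sketch of caching the base image in your own Container Registry repository. The region (cn-hangzhou), namespace, and repository name are placeholders; replace them with your own values.
```bash
# Pull the public base image once, retag it, and push it to your own registry.
docker pull python:3.10
docker tag python:3.10 registry.cn-hangzhou.aliyuncs.com/<your-namespace>/python:3.10
docker push registry.cn-hangzhou.aliyuncs.com/<your-namespace>/python:3.10
```
You can then reference this cached copy in the FROM instruction of your Dockerfile.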
Accelerate Python dependency package downloads
If the Python dependency packages download slowly when you install them according to the README.md file of the ComfyUI project, you can configure pip's index-url to speed up the process. For example, you can use the Alibaba Cloud or Tsinghua University Python mirror source.
```dockerfile
......
RUN pip config set global.index-url https://mirrors.cloud.aliyuncs.com/pypi/simple
RUN pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
RUN pip config set install.trusted-host mirrors.cloud.aliyuncs.com
......
```
Build the image
```bash
docker build -t comfyui:latest .
```
Push the image to Container Registry
In Function Compute, when you create a function that uses a custom image, the image must come from an Alibaba Cloud Container Registry repository that is in the same region and under the same account. Push the image to Container Registry, for example with the docker CLI as sketched below.
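The following commands are a minimal sketch of the docker CLI flow. The region (cn-hangzhou), namespace, and repository name are placeholders; replace them with the values of your own Container Registry instance.
```bash
# Log on to your Container Registry instance.
docker login --username=<your-acr-username> registry.cn-hangzhou.aliyuncs.com

# Tag the locally built image with the repository address.
docker tag comfyui:latest registry.cn-hangzhou.aliyuncs.com/<your-namespace>/comfyui:latest

# Push the image to Container Registry.
docker push registry.cn-hangzhou.aliyuncs.com/<your-namespace>/comfyui:latest
```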
Create a GPU function
Function Compute provides an image acceleration feature for all GPU functions by default. This feature supports on-demand pulling and peer-to-peer (P2P) caching and requires no extra configuration, so containers can start quickly even from large images, which improves elasticity. For more information, see Create a GPU function. Note that Function Compute has limits on image size. Avoid including large model data in the image to prevent long image creation times. For more information about image size limits and how to request a quota increase, see Quotas and limits.
The following example shows the startup command and listener port for creating the function.
Listener Port: 8188. Note: The listener port must match the port that the service in the image listens on (EXPOSE 8188 in the sample Dockerfile).
Startup Command: python main.py --listen 0.0.0.0
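After the function is created, you can verify that ComfyUI is reachable through the assigned domain name. The following sketch uses a placeholder domain; /system_stats and /prompt are standard ComfyUI HTTP endpoints.
```bash
# Query ComfyUI runtime information; a JSON response confirms that the service is up.
curl https://<your-function-domain>/system_stats

# Submit a workflow in API format to the generation queue (workflow_api.json is a placeholder file).
curl -X POST https://<your-function-domain>/prompt \
     -H "Content-Type: application/json" \
     -d @workflow_api.json
```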

Configure a custom domain name for the function
The domain name that Alibaba Cloud Function Compute provides for a function is intended mainly for API access. To operate the service through the ComfyUI visualization interface, you must configure a custom domain name for the function.

Access the configured custom domain name in a browser to open the ComfyUI interface. The result is shown in the following figure.

Step 2: Model download and acceleration
You can download models from communities such as Hugging Face, ModelScope, and Civitai. The ModelScope community hosts many models and mirrors from organizations such as Black Forest Labs. If your network access to Hugging Face is restricted, you can use the ModelScope mirror source instead.
Function Compute recommends that you store model data in NAS or OSS. A Performance NAS instance provides an initial bandwidth of about 600 MB/s. OSS has a higher bandwidth limit and is less prone to bandwidth contention between function instances than NAS. You can also enable the OSS accelerator to obtain higher throughput. For more information, see Best practices for model storage on Function Compute GPU-accelerated instances in AI scenarios.
In the model library of ModelScope, search for Black Forest Labs. The following example shows how to download the FLUX.1-dev model.
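For example, the following is a minimal sketch of pulling the model over Git. It assumes that git-lfs is installed and that the ModelScope repository ID is black-forest-labs/FLUX.1-dev; confirm the exact repository address on the model page.
```bash
# Enable Git LFS and clone the FLUX.1-dev repository from ModelScope.
git lfs install
git clone https://www.modelscope.cn/black-forest-labs/FLUX.1-dev.git
```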

Upload the downloaded models to OSS, placing them in the corresponding folders under ComfyUI/models/. For example, place the flux1-dev.safetensors model in the unet folder. The following table shows the model paths; a sketch of the upload command follows the table.
| Folder name | Downloaded model |
| --- | --- |
| checkpoints | dreamshaperXL_lightningDPMSDE.safetensors |
| clip | clip_l.safetensors |
| | t5xxl_fp8_e4m3fn.safetensors |
| clip_vision | clip_vision_g.safetensors |
| | clip_vision_l.safetensors |
| controlnet | flux-canny-controlnet-v3.safetensors |
| loras | FLUX1_wukong_lora.safetensors |
| | araminta_k_flux_koda.safetensors |
| unet | flux1-dev.safetensors |
| vae | ae.safetensors |
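As mentioned above, the following is a minimal sketch of uploading the models with the ossutil CLI. The bucket name and prefix are placeholders; use the OSS path that you later mount to the function.
```bash
# Recursively upload the local ComfyUI/models directory to OSS.
ossutil cp -r ./ComfyUI/models/ oss://<your-bucket>/comfyui/models/
```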

On the function details page, click the Configuration tab. In the left navigation pane, click the Permissions tab and configure a role with permissions to access OSS for the function. Then, click the Storage tab. In the Object Storage Service (OSS) section, click Edit. In the panel that appears, configure the parameters and click Deploy.

After the deployment is complete, log on to the function instance and confirm that the models are successfully mounted to the local directory of the function.
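For example, assuming the OSS path is mounted at /mnt/oss (a placeholder; use the mount target that you configured):
```bash
# List the mounted model directories to confirm that the files are visible.
ls -lh /mnt/oss/comfyui/models/unet/
ls -lh /mnt/oss/comfyui/models/vae/
```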

(Optional) Open ComfyUI. The default workflow requires a Checkpoint loader model, so you also need to upload a Checkpoint model, such as dreamshaperXL_lightningDPMSDE.safetensors, to the checkpoints directory in OSS. The path displayed after you log on to the instance is shown in the following figure.

Click Execute to view the output image.

Download the pre-configured workflow file FLUX-base.json. Open ComfyUI and import the downloaded FLUX-base.json file. This workflow uses the t5xxl_fp8_e4m3fn.safetensors, ae.safetensors, and flux1-dev.safetensors models. Click Run. The result is shown in the following figure.

Step 3: Inference acceleration
Inference for a text-to-image service usually takes several seconds to tens of seconds. Inference acceleration not only shortens the response time and improves the user experience, but also reduces resource costs. The following sections describe two inference acceleration solutions.
Alibaba Cloud DeepGPU Toolkit (DeepGPU)
The DeepGPU Toolkit (DeepGPU) is a free toolset that enhances GPU computing services and also provides inference acceleration for ComfyUI + SD/FLUX. It includes tools for rapid business deployment, GPU splitting, AI training and inference optimization, and dedicated acceleration for popular AI models. The inference component of the toolkit can currently be used with Alibaba Cloud Function Compute free of charge, which lets you use the GPU resources of Function Compute more conveniently and efficiently.
1. DeepGPU installation
Before you use DeepGPU to accelerate inference for ComfyUI + SD/FLUX, you must install the required dependency packages:
Install torch 2.5
```dockerfile
RUN pip install torch==2.5.0
```
Install deepgpu-torch
The DeepGPU torch model acceleration package accelerates models such as FLUX.1 and VAE.
```dockerfile
# ubuntu
RUN apt-get update
RUN apt-get install which curl iputils-ping -y
# centos
# RUN yum install which curl iputils -y

# First, install torch. deepgpu-torch depends on python3.10 and torch2.5.x+cu124 (if you need other versions, contact us).
RUN pip install deepgpu-torch==0.0.15+torch2.5.0cu124 -f https://aiacc-inference-public-v2.oss-cn-hangzhou.aliyuncs.com/deepgpu/deepytorch/index.html
```
Extract the downloaded plugin to the custom_nodes/ directory.
```dockerfile
RUN wget https://aiacc-inference-public-v2.oss-cn-hangzhou.aliyuncs.com/deepgpu/comfyui/flux/20250102/ComfyUI-deepgpu.tar.gz
RUN tar zxf ComfyUI-deepgpu.tar.gz -C /app/ComfyUI/custom_nodes
RUN pip install deepgpu-comfyui==1.0.8 -f https://aiacc-inference-public-v2.oss-cn-hangzhou.aliyuncs.com/deepgpu/comfyui/index.html
```
2. ComfyUI source code modification (Important)
Dependency versions
Update x-flux-comfyui to the latest version from GitHub.
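The following is a minimal sketch of pulling the latest version during the image build. The GitHub repository URL is assumed from the node's name; confirm it before use.
```bash
# Clone (or update) the x-flux-comfyui custom node into ComfyUI's custom_nodes directory.
cd /app/ComfyUI/custom_nodes
git clone https://github.com/XLabs-AI/x-flux-comfyui.git || (cd x-flux-comfyui && git pull)
```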
Native LoRA support
If you use LoraLoaderModelOnly to load a native ComfyUI LoRA model and use deepgpu-torch for acceleration, you must modify one line of code in the ComfyUI source code.
If your ComfyUI version is earlier than v0.3.6
Code path: https://github.com/comfyanonymous/ComfyUI/blob/v0.3.6/comfy/sd.py#L779

Add the parameter weight_inplace_update=True to this line.
```python
return comfy.model_patcher.ModelPatcher(model, load_device=load_device, offload_device=offload_device, weight_inplace_update=True)
```
If your ComfyUI version is v0.3.7 or later
Code path: https://github.com/comfyanonymous/ComfyUI/blob/v0.3.7/comfy/sd.py#L785

Add the parameter weight_inplace_update=True to this line.
```python
model_patcher = comfy.model_patcher.ModelPatcher(model, load_device=load_device, offload_device=model_management.unet_offload_device(), weight_inplace_update=True)
```
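If you build the image yourself, you can apply this one-line change during the build instead of editing the file manually. The following sed command is only a sketch for v0.3.7 or later: it assumes the ModelPatcher call appears on a single line exactly as shown above, so verify comfy/sd.py after the build.
```dockerfile
# Append weight_inplace_update=True to the ModelPatcher call in comfy/sd.py (v0.3.7+ layout).
RUN sed -i 's/offload_device=model_management.unet_offload_device())/offload_device=model_management.unet_offload_device(), weight_inplace_update=True)/' /app/ComfyUI/comfy/sd.py
```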
The following code provides a sample image for installing DeepGPU.
```dockerfile
# Dockerfile
FROM python:3.10

# Install system dependencies
RUN apt-get update && apt-get install -y \
    git \
    wget \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app

# Clone the ComfyUI repository
RUN git clone https://github.com/comfyanonymous/ComfyUI.git
WORKDIR /app/ComfyUI

# Accelerate Python dependency package downloads
RUN pip config set global.index-url https://mirrors.cloud.aliyuncs.com/pypi/simple
RUN pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
RUN pip config set install.trusted-host mirrors.cloud.aliyuncs.com

# Install PyTorch (NVIDIA CUDA version by default, modify as needed)
RUN pip install torch==2.5.0+cu124 torchvision==0.20.0+cu124 torchaudio==2.5.0 --index-url https://download.pytorch.org/whl/cu124

# Install project dependencies
RUN pip install -r requirements.txt

# ubuntu
RUN apt-get update
RUN apt-get install which curl iputils-ping -y
# centos
# RUN yum install which curl iputils -y

# First, install torch. deepgpu-torch depends on python3.10 and torch2.5.x+cu124 (if you need other versions, contact the DeepGPU team).
RUN pip install deepgpu-torch==0.0.15+torch2.5.0cu124 -f https://aiacc-inference-public-v2.oss-cn-hangzhou.aliyuncs.com/deepgpu/deepytorch/index.html

# After downloading the plugin, extract it to the custom_nodes/ directory.
RUN wget https://aiacc-inference-public-v2.oss-cn-hangzhou.aliyuncs.com/deepgpu/comfyui/flux/20250102/ComfyUI-deepgpu.tar.gz
RUN tar zxf ComfyUI-deepgpu.tar.gz -C /app/ComfyUI/custom_nodes
RUN pip install deepgpu-comfyui==1.0.8 -f https://aiacc-inference-public-v2.oss-cn-hangzhou.aliyuncs.com/deepgpu/comfyui/index.html

# Expose the service port
EXPOSE 8188

# Startup command
CMD ["python", "main.py"]
```
3. Environment variable configuration
When you use DeepGPU in Alibaba Cloud Function Compute, you must configure the DEEPGPU_PUB_LS=true and DEEPGPU_ENABLE_FLUX_LORA=true environment variables.
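The documented way is to set these variables in the function's environment variable configuration in the Function Compute console. As an alternative sketch, you can bake them into the custom image with Dockerfile ENV instructions; this is an assumption, and variables configured on the function typically take precedence.
```dockerfile
# Enable DeepGPU on Function Compute and FLUX LoRA acceleration (assumed alternative to console configuration).
ENV DEEPGPU_PUB_LS=true
ENV DEEPGPU_ENABLE_FLUX_LORA=true
```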

4. (Optional) Configure GPU idle mode
By configuring provisioned instances, you can reduce request latency caused by instance cold starts. You can also configure scaling rules for provisioned instances, such as scheduled scaling and metric-based scaling, to improve instance utilization and prevent resource waste.
5. How to use the DeepGPU ComfyUI plugin
The plugin provides the following DeepGPU nodes. You can find them by entering DeepyTorch in the search box of the ComfyUI interface:
Apply DeepyTorch to diffusion model
Apply DeepyTorch to vae model
DeepTorch Sampler to replace XlabsSampler

Usage guide
Insertion point: For Flux, insert the Apply DeepyTorch to diffusion model node after the Load Diffusion Model, Load Flux LoRA, or Apply Flux IPAdapter node. For other models, insert it after the Load Checkpoint or LoraLoaderModelOnly node. The following figure shows an example.

Sampler replacement: For Flux, use the DeepTorch Sampler to replace XlabsSampler node to replace the XLabsSampler node.
ComfyUI TorchCompile* nodes
Currently, several open-source inference acceleration nodes are available, including but not limited to:
TorchCompileModel
TorchCompileVAE
TorchCompileControlNet
TorchCompileModelFluxAdvanced
These nodes use the just-in-time (JIT) compilation capability of PyTorch to optimize and accelerate model execution. They also improve resource utilization by converting dynamic computation graphs into efficient static code.
Currently, most of these nodes are in the beta or experimental stage.

Usage and performance comparison of TorchCompile* nodes and DeepGPU
We compared the performance of ComfyUI's built-in TorchCompile nodes and DeepGPU nodes for inference acceleration. We analyzed their usage, acceleration effects, and applicable scenarios to provide a reference.
Model list
| Folder name | Downloaded model |
| --- | --- |
| clip | clip_l.safetensors |
| | t5xxl_fp8_e4m3fn.safetensors |
| | clip_vision_l.safetensors |
| loras | FLUX1_wukong_lora.safetensors |
| unet | flux1-dev.safetensors |
| vae | ae.safetensors |
Configuration parameters
Sampler
steps: 20
Empty Latent Image
width: 768
height: 1024
Test platform
Alibaba Cloud Function Compute fc.gpu.ada.1 instance.
Inference acceleration framework scenario support matrix & Inference acceleration effects


The test results show that both the TorchCompile series nodes and DeepGPU cover most of the ComfyUI + SD/FLUX scenarios and achieve about 20% to 30% inference acceleration in Flux-related scenarios.
Test workflows
The following table provides the workflow (.json) files for inference acceleration that are used with different models.
| Scenario | Workflow |
| --- | --- |
| FLUX only | |
| FLUX + Lora | |
| FLUX + ComfyUI Lora | |
| SDXL | |
References
Long-running GPU instances may fail. Function Compute provides a default request-based health check mechanism and lets you configure custom instance health check logic.
Function Compute provides monitoring reports for functions and instances by default, which you can view without extra configuration. To collect function logs for troubleshooting, you can configure log collection.






