Platform For AI: Deploy a LoRA SD model using Kohya_ss in EAS

Last Updated: Nov 28, 2025

This topic describes how to deploy the open source Kohya_ss in Elastic Algorithm Service (EAS) of Platform for AI (PAI) and use it to train a Low-Rank Adaptation (LoRA) model. In AI painting scenarios, you can use the trained LoRA model as an auxiliary model in a Stable Diffusion (SD) service to improve the quality of the generated images.

Prerequisites

  • EAS is activated and the default workspace is created. For more information, see Activate PAI and create the default workspace.

  • If you use a RAM user to deploy the model, make sure that the RAM user has the management permissions on EAS. For more information, see Grant the permissions that are required to use EAS.

  • An Object Storage Service (OSS) bucket is created in the region where the PAI workspace resides. The OSS bucket is used to store training files, output model files, and logs. For information about how to upload objects, see Upload objects.

Preparations

  1. Log on to the OSS console. Navigate to the path of the bucket that you created for training. The bucket must be in the same region as the PAI workspace. Example: oss://kohya-demo/kohya/.

  2. Create a project folder in the bucket path. Example: KaraDetroit_loar. Create the following folders in this project folder: Image, Log, and Model. If you have a JSON configuration file, you can also upload it to this project folder.

    • Image: stores the source files used for training.

    • Model: stores the output model file.

    • Log: stores the logs.

    • SS_config.json: a JSON file used to configure multiple parameters simultaneously. This file is optional. You can modify related parameters in the JSON file, such as the folder path or output model name. For more information about the configuration, see GitHub. The sample file SS_config.json provides a reference.

  3. Upload the images that you want to use for training to the Image folder. The sample file 100_pic.zip is used in this example. Download and extract the file, and then upload the extracted folder to OSS. For a scripted upload example, see the sketch after this procedure.

    Important
    • The images must be in one of the following formats: .png, .jpg, .jpeg, .webp, or .bmp.

    • Each image must have a description file with the same name. The description file can be in the .txt format. The description must be on the first line of the file. Separate multiple descriptions with commas (,).

    • The name of the packaged folder must be in the number_name format. Example: 100_pic. The name must meet the naming requirements for objects in OSS. The number indicates how many times each image is repeated during training. The value must be greater than or equal to 100, and the total number of repetitions (number × number of images) must be at least 1500. For a small calculation sketch, see the example after this procedure.

      • For example, if the folder contains 10 images, each image must be repeated 1500/10 = 150 times, so the number is set to 150.

      • If the folder contains 20 images, the calculated value is 1500/20 = 75, which is less than 100, so the number is increased to 100.
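
If you prefer to prepare the OSS directory from a script instead of the console, the following sketch shows one way to do it with the OSS Python SDK (oss2). The credentials, endpoint, bucket name, and local folder are placeholders that you must replace; this is an illustration, not part of the Kohya_ss deployment.

    import os
    import oss2

    # Placeholders: replace the credentials, endpoint, bucket name, and local folder.
    auth = oss2.Auth("<your-access-key-id>", "<your-access-key-secret>")
    bucket = oss2.Bucket(auth, "https://oss-cn-hangzhou.aliyuncs.com", "kohya-demo")

    project = "kohya/KaraDetroit_loar"   # project folder inside the bucket
    local_images = "./100_pic"           # extracted training folder, for example 100_pic

    # Create the Image, Log, and Model "folders" (zero-byte objects that end with a slash).
    for sub in ("Image", "Log", "Model"):
        bucket.put_object(f"{project}/{sub}/", b"")

    # Upload every image and caption file into Image/<number_name folder>/.
    folder_name = os.path.basename(local_images.rstrip("/"))
    for name in os.listdir(local_images):
        bucket.put_object_from_file(
            f"{project}/Image/{folder_name}/{name}",
            os.path.join(local_images, name),
        )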
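
The repetition number in the number_name rule can be computed as follows. The function is only a hypothetical helper that illustrates the arithmetic in this step.

    import math

    def repeat_count(num_images: int, min_total: int = 1500, min_repeats: int = 100) -> int:
        """Return the number prefix for the number_name training folder."""
        repeats = math.ceil(min_total / num_images)  # reach at least min_total total repetitions
        return max(repeats, min_repeats)             # never go below the floor of 100 repetitions

    print(repeat_count(10))  # 150, so the folder is named 150_<name>
    print(repeat_count(20))  # 100, because 1500/20 = 75 is below the floor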

Deploy a Kohya_ss service

  1. Log on to the PAI console. Select a region on the top of the page. Then, select the desired workspace and click Elastic Algorithm Service (EAS).

  2. Click Deploy Service. In the Custom Model Deployment section, click Custom Deployment.

  3. On the Custom Deployment page, configure the parameters using a form or a JSON script.

    Configure parameters using a form

    Basic Information

    • Service Name: The name of the service. The name kohya_ss_demo is used in this example.

    Environment Information

    • Deployment Method: Select Image-based Deployment, and select Enable Web App.

    • Image Configuration: Select Kohya_ss > kohya_ss:2.2 from the Alibaba Cloud Image list.

      Note
      You can select the latest version of the image when you deploy the model service.

    • Mount storage: Select Mount OSS Path, and configure the following parameters:

      • Uri: Select an OSS path in the same region as the workspace. The path oss://kohya-demo/kohya/ is used in this example.

      • Mount Path: You can use a custom mount path. The path /workspace is used in this example.

        Important
        Turn off Enable Read-only Mode. Otherwise, the model file cannot be exported to OSS.

    • Command: After you select the image, the system automatically configures the command to run. Example: python -u kohya_gui.py --listen=0.0.0.0 --server_port=8000 --headless.

      • --listen: binds the program to the specified IP address so that it can receive and process external requests.

      • --server_port: the port on which the program listens.

    Resource Information

    • Resource Type: Select Public Resources.

    • Deployment Resources: For cost-effectiveness, we recommend the GPU > ml.gu7i.c16m60.1-gu30 instance type. In this example, the ml.gu7i.c8m30.1-gu30 instance type is used.

    Configure parameters using a JSON script

    Click Edit in the Service Configuration section, and configure the JSON script.

    Sample JSON file:

    Important

    Replace the value of "name" (the service name) and the value of "path" under "oss" (the OSS path) with your actual values. A sketch that generates this configuration from a script is provided after this procedure.

    {
        "metadata":
        {
            "name": "kohya_ss_demo",
            "instance": 1,
            "enable_webservice": true
        },
        "cloud":
        {
            "computing":
            {
                "instance_type": "ecs.gn6e-c12g1.12xlarge",
                "instances": null
            }
        },
        "storage": [
        {
            "oss":
            {
                "path": "oss://kohya-demo/kohya/",
                "readOnly": false
            },
            "properties":
            {
                "resource_type": "model"
            },
            "mount_path": "/workspace"
        }],
        "containers": [
        {
            "image": "eas-registry-vpc.cn-hangzhou.cr.aliyuncs.com/pai-eas/kohya_ss:1.2",
            "script": "python -u kohya_gui.py --listen=0.0.0.0 --server_port=8000 --headless",
            "port": 8000
        }]
    }
  4. Confirm the configuration, and then click Deploy. The deployment requires a few minutes to complete. When the Service Status changes to Running, the service is deployed.
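
If you keep the service configuration in code, the following sketch builds the same JSON configuration shown above and writes it to a file. This is only a convenience sketch; the field values (service name, OSS path, image address, and instance type) are the sample values from this topic and must be replaced with your own.

    import json

    # Sample values from this topic. Replace the name, OSS path, image, and instance type.
    service = {
        "metadata": {
            "name": "kohya_ss_demo",
            "instance": 1,
            "enable_webservice": True,
        },
        "cloud": {
            "computing": {"instance_type": "ecs.gn6e-c12g1.12xlarge", "instances": None}
        },
        "storage": [
            {
                "oss": {"path": "oss://kohya-demo/kohya/", "readOnly": False},
                "properties": {"resource_type": "model"},
                "mount_path": "/workspace",
            }
        ],
        "containers": [
            {
                "image": "eas-registry-vpc.cn-hangzhou.cr.aliyuncs.com/pai-eas/kohya_ss:1.2",
                "script": "python -u kohya_gui.py --listen=0.0.0.0 --server_port=8000 --headless",
                "port": 8000,
            }
        ],
    }

    # Write the configuration so that it can be pasted into the JSON editor on the deployment page.
    with open("kohya_service.json", "w") as f:
        json.dump(service, f, indent=4)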

Train a LoRA model

  1. Click Web Application in the Overview section of the service that you deployed.

  2. Click the LoRA tab.

  3. Click Configuration file to specify the path of the configuration file. Skip this step if no SS_config.json file is available.

    Note

    The path of the configuration file consists of the Mount Path that you specified in the Configure parameters using a form step, the path of the folder that you created in OSS, and the SS_config.json file. Example: /workspace/KaraDetroit_loar/SS_config.json.

  4. Configure the parameters on the Source Model tab. In this example, the Save trained model as parameter is set to safetensors, which provides better security than the checkpoint format.

  5. Configure the parameters on the Folders tab. Specify the output model name and the paths of the Image, Log, and Model folders that you created in OSS.

    • Image folder: The path of the folder that contains the training images. The path consists of the Mount Path that you specified in the Configure parameters using a form step and the path of the Image folder that you created in OSS. Example: /workspace/KaraDetroit_loar/Image.

    • Logging folder: The path of the folder for the output logs. The path consists of the Mount Path that you specified in the Configure parameters using a form step and the path of the Log folder that you created in OSS. Example: /workspace/KaraDetroit_loar/Log.

    • Output folder: The path of the folder for the output model. The path consists of the Mount Path that you specified in the Configure parameters using a form step and the path of the Model folder that you created in OSS. Example: /workspace/KaraDetroit_loar/Model.

    • Model output name: The name of the output model. Example: my_model.

  6. Configure parameters on the Parameters tab. The following example uses the content of the SS_config.json file in the Preparations step.

    • LoRA type: The type of LoRA:

      • LoCon: You can adjust each layer of the SD model, such as the Res, Block, and Transformer layers.

      • LoHA: The model can process more information without requiring more memory.

    • LoRA network weights: Optional. The weights of an existing LoRA network. If you want to resume training based on previous training results, select the most recently trained LoRA model.

    • Train batch size: The size of each training batch. A larger value requires more GPU memory.

    • Epoch: The number of training epochs. All data is trained once in one epoch. Configure the parameter as needed. In most cases:

      • Total number of training steps in Kohya = Number of training images × Number of repetitions × Number of epochs / Train batch size.

      • Total number of training steps in the web UI = Number of training images × Number of repetitions.

      If you use images in the same directory, the total number of training steps is doubled and the number of times that the model is saved is halved in Kohya. For a worked example of this calculation, see the sketch after this procedure.

    • Save every N epochs: The training results are saved every N epochs. For example, if you set the value to 2, the training results are saved every two epochs.

    • Caption Extension: Optional. The file name extension of the caption files. Example: .txt.

    • Mixed precision: The precision for mixed-precision training. Configure the parameter based on the GPU performance. Valid values: no, fp16, and bf16. If the GPU that you use has more than 30 GB of memory, we recommend that you set the value to bf16.

    • Save precision: The precision used to save the model. The same considerations as Mixed precision apply.

    • Number of CPU threads per core: The number of threads per vCPU. Configure the parameter based on the instance type that you purchased and your business requirements.

    • Learning rate: The learning rate. Default value: 0.0001.

    • LR Scheduler: The learning rate scheduler. Configure the parameter as needed. You can select functions such as cosine or cosine with restarts.

    • LR Warmup (% Of Steps): The percentage of steps used to warm up the learning rate. Configure the parameter as needed. Default value: 10. You can set the value to 0 if no warm-up is required.

    • Optimizer: The optimizer. Configure the parameter as needed. Default value: AdamW8bit. The value DAdaptation enables automatic optimization.

    • Max Resolution: The maximum resolution. Configure the parameter based on your image requirements.

    • Network Rank (Dimension): The complexity of the model. In most cases, you can set the value to 128.

    • Network Alpha: In most cases, the value of this parameter is less than or equal to the value of the Network Rank (Dimension) parameter. We recommend that you set Network Rank to 128 and Network Alpha to 64.

    • Conv dims & Conv alphas: The convolution dimensions and alpha values, which control the degree to which the convolution layers are fine-tuned by LoRA. Configure the parameters based on the LoRA type. Based on the official Kohya guide:

      • LoCon: Set dim <= 64 and alpha = 1 (or lower).

      • LoHA: Set dim <= 32 and alpha = 1.

    • Clip skip: The number of layers that are skipped at the end of the CLIP text encoder. Valid values: 1 to 12. A smaller value makes the generated image closer to the original or input image.

      • Realism: Set the value to 1.

      • Anime, comics, and games (ACG): Set the value to 2.

    • Sample every n steps: A sample image is generated every N training steps.

    • Sample prompts: The sample prompt. Append the parameters to the prompt text, in the form <prompt> --n <negative prompt> --w <width> --h <height> --d <seed> --l <CFG scale> --s <steps>. Valid parameters:

      • --n: the negative prompt.

      • --w: the width of the image.

      • --h: the height of the image.

      • --d: the seed of the image.

      • --l: the Classifier Free Guidance (CFG) scale, which indicates how closely the image generation follows the prompt.

      • --s: the number of iteration steps.

  7. At the bottom of the page, click Start training.

  8. On the Elastic Algorithm Service (EAS) page, in the service list, click the corresponding Service Name. Click Service Logs to view the training progress in real time.

    When model saved appears in the log, the training is complete.

  9. After the training is complete, obtain the LoRA model file from the Model folder that you specified. Example: my_model.safetensors.

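The Epoch parameter description above gives the step-count formulas. The following sketch is a worked example of that arithmetic with illustrative values; it is not part of Kohya_ss.

    # Worked example of the step-count formulas from the Epoch parameter. All values are illustrative.
    num_images = 10        # images in the training folder
    repeats = 150          # the number prefix of the folder name, for example 150_pic
    epochs = 10            # Epoch parameter
    batch_size = 2         # Train batch size parameter

    total_steps_kohya = num_images * repeats * epochs // batch_size
    total_steps_webui = num_images * repeats

    print(total_steps_kohya)  # 7500
    print(total_steps_webui)  # 1500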

Use a trained LoRA model for AI image generation based on Stable Diffusion

After you train a LoRA model, you can upload the model to the directory of the SD web application. This lets you use the trained LoRA model to generate images. For information about how to deploy SD, see Deploy Stable Diffusion for AI image generation with EAS in a few clicks.

The following section describes how to upload a LoRA model to the SD Web Application.

SD web application (cluster edition)

  1. Configure the SD web application image. You must select a -cluster version (such as stable-diffusion-webui:4.2-cluster-webui). After the service is started, the /data-{User_ID}/models/Lora path is automatically created in the mounted OSS path.

  2. Add the following parameters to Command To Run:

    • --lora-dir: optional.

      • If you do not specify the --lora-dir parameter, the model files of users are isolated. Only the model files in the {OSS path}/data-{User_ID}/models/Lora directory are loaded.

      • If you specify the --lora-dir parameter, the files in the specified directory and the directory {OSS path}/data-{User_ID}/models/Lora are loaded. Example: --lora-dir /code/stable-diffusion-webui/data-oss/models/Lora.

    • --data-dir {OSS mount path}. Example: --data-dir /code/stable-diffusion-webui/data-oss.

  3. Upload the LoRA model file to the {OSS path}/data-{User_ID}/models/Lora directory. Example: oss://bucket-test/data-oss/data-1596******100/models/Lora. For a scripted upload example, see the sketch after this procedure.

    Note

    After the service is started, the /data-{User_ID}/models/Lora path is automatically created in OSS. You must upload the LoRA model file after the service is started.

    You can obtain the {User_ID} in the upper-right corner of the page next to the profile picture.

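If you prefer to upload the trained model file from a script instead of the OSS console, the following sketch shows one way to do it with the OSS Python SDK (oss2). The credentials, endpoint, bucket name, and {User_ID} are placeholders; for the basic edition described in the next section, remove the data-{User_ID} part of the object key.

    import oss2

    # Placeholders: replace the credentials, endpoint, bucket name, and user ID.
    auth = oss2.Auth("<your-access-key-id>", "<your-access-key-secret>")
    bucket = oss2.Bucket(auth, "https://oss-cn-hangzhou.aliyuncs.com", "bucket-test")

    # Cluster edition: upload into the per-user Lora directory created by the service.
    bucket.put_object_from_file(
        "data-oss/data-<User_ID>/models/Lora/my_model.safetensors",
        "./my_model.safetensors",
    )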

SD web application (basic edition)

  1. Configure the SD web application image. You must select a non-cluster version (such as stable-diffusion-webui:4.2-standard). After the service is started, the /models/Lora path is automatically created in the mounted OSS path.

  2. Add the --data-dir {OSS mount path} parameter to the Command. Example: --data-dir /code/stable-diffusion-webui/data-oss.

  3. Upload the LoRA model file to the {OSS path}/models/Lora directory. Example: oss://bucket-test/data-oss/models/Lora.

    Note

    After the service is started, the /models/Lora path is automatically created in the mounted OSS bucket. You do not need to create a path. You must upload the LoRA model file after the service is started.
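
After the model file is uploaded, you can select the LoRA model in the SD web application or reference it in the prompt. In the standard Stable Diffusion web UI, a LoRA model is typically referenced in the form <lora:my_model:0.8>, where my_model is the model output name that you specified during training and 0.8 is the weight.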