Platform For AI: Train a LoRA model by using PAI ArtLab

Last Updated: Aug 14, 2025

This topic explains how to train a Low-Rank Adaptation (LoRA) model using Kohya.

Log on to the PAI ArtLab console.

Background information

Stable Diffusion (SD) is an open-source deep learning model that generates images from text. SD WebUI is a web-based user interface for SD that supports text-to-image and image-to-image operations. It is also highly customizable through extensions and model imports.

Generating images with SD WebUI requires various models, each with unique features and applications. Each model requires specific training datasets and strategies. LoRA is a lightweight model fine-tuning method that is fast, produces small files, and has low hardware requirements.

Kohya is a popular open-source service for training LoRA models. The Kohya GUI package provides a dedicated training environment and user interface, which prevents interference from other programs. Although SD WebUI also supports model training through extensions, this method can cause conflicts and errors.

For more information about other model fine-tuning methods, see Models.

Introduction to LoRA models

LoRA (Low-Rank Adaptation of Large Language Models) is a method for training a stylized model based on a foundation model and a dataset. This process enables highly customized image generation.
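
Conceptually, LoRA keeps the foundation model's weights frozen and learns only a small low-rank correction for selected layers. The following minimal NumPy sketch illustrates the idea; the layer shape, rank, and alpha values are arbitrary assumptions chosen for illustration, not values used by PAI ArtLab or Kohya.

    import numpy as np

    # Frozen weight matrix of one layer in the foundation model (shape chosen for illustration).
    d_out, d_in, rank, alpha = 320, 768, 16, 16
    W = np.random.randn(d_out, d_in)            # frozen; not updated during fine-tuning

    # LoRA trains only these two small matrices (the "low-rank" part).
    A = np.random.randn(rank, d_in) * 0.01      # down-projection, trained
    B = np.zeros((d_out, rank))                 # up-projection, trained (starts at zero)

    # The learned update is added to the frozen weight when the LoRA is applied.
    W_adapted = W + (alpha / rank) * (B @ A)

    # Only A and B are saved, which is why a LoRA file is much smaller than a full checkpoint.
    print("full weight parameters:", W.size)            # 245760
    print("LoRA parameters:       ", A.size + B.size)   # 17408

Because only the two small matrices are trained and saved, the resulting file stays small, which is reflected in the file specifications below.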

The file specifications are as follows:

  • File size: Typically ranges from a few to several hundred MB. The exact size depends on the trained parameters and the complexity of the foundation model.

  • File format: Uses .safetensors as the standard file name extension.

  • File application: Must be used with a specific Checkpoint foundation model.

  • File version: You must distinguish between Stable Diffusion v1.5 and Stable Diffusion XL versions. Models are not interchangeable between these versions.
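
As a quick illustration of the points above, the following sketch (a minimal example, not part of the PAI ArtLab workflow) loads a .safetensors LoRA file on top of its matching Stable Diffusion v1.5 checkpoint with the open-source diffusers library. The file name my_style_lora.safetensors is hypothetical, and a GPU is assumed; a LoRA trained for SD v1.5 cannot be loaded into an SDXL pipeline this way.

    import torch
    from diffusers import StableDiffusionPipeline

    # Load the SD v1.5 foundation model (Checkpoint) that the LoRA was trained against.
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",
        torch_dtype=torch.float16,
    ).to("cuda")

    # Apply the LoRA weights stored in a .safetensors file (hypothetical file name).
    pipe.load_lora_weights(".", weight_name="my_style_lora.safetensors")

    # Generate an image; the LoRA shifts the style of the foundation model's output.
    image = pipe("a 3d product icon, studio lighting, white background").images[0]
    image.save("lora_preview.png")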

LoRA fine-tuning models

Foundation models, such as the Stable Diffusion v1.5 Model, v2.1 Model, or Stable Diffusion XL base 1.0 Model, can be considered basic ingredients. A LoRA model acts as a special seasoning, adding unique style and creativity. The LoRA model helps overcome the limitations of the foundation model, making content creation more flexible, efficient, and personalized.

For example, the Stable Diffusion v1.5 model has the following limitations:

  1. Imprecise details: When generating images with specific details or complex content, the model may struggle to reproduce all details accurately. This can result in images that lack detail or realism.

  2. Inconsistent logical structure: The layout, proportions, and lighting of objects in generated images may not adhere to real-world principles.

  3. Inconsistent style: The highly complex and random generation process makes it hard to maintain a consistent style or perform neural style transfer reliably.

The open-source community provides many excellent models created by fine-tuning foundation models. Compared to the original foundation models, these fine-tuned models generate images with richer details, more distinct stylistic features, and more controllable content. For example, the following image compares the results of the Stable Diffusion v1.5 model with a fine-tuned model, showing a significant improvement in image quality.

image

Different types of LoRA models

  • LyCORIS (encompasses LoHa and LoCon)

    LyCORIS is an enhanced version of LoRA that can fine-tune 26 layers of a neural network, compared to LoRA's 17 layers. This results in better performance. LyCORIS is more expressive, has more parameters, and can handle more information than LoRA. The core components of LyCORIS are LoHa and LoCon. LoCon adjusts each level of the SD model, while LoHa doubles the amount of information processed.

    It is used in the same way as LoRA. You can achieve advanced results by adjusting the weights of the text encoder, U-Net, and DyLoRA.

  • LoCon

    Conventional LoRA only adjusts the cross-attention layers. LoCon uses the same method to adjust the ResNet matrix. LoCon has now been merged into LyCORIS, so the old LoCon extensions are no longer needed. For more information, see LoCon-LoRA for Convolution Network.

  • LoHa

    LoHa (LoRA with Hadamard Product) replaces the matrix dot product in the original method with the Hadamard product. In theory, it can hold more information under the same conditions. For more information, see FedPara: Low-rank Hadamard Product for Communication-Efficient Federated Learning. A small sketch that contrasts the two update forms follows this list.

  • DyLoRA

    For LoRA, a higher rank is not always better. The optimal value depends on the specific model, dataset characteristics, and the task. DyLoRA can explore and learn various LoRA rank configurations within a specified dimension (rank). This simplifies the process of finding the most suitable rank and improves the efficiency and accuracy of model fine-tuning.
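
The following NumPy sketch contrasts the standard LoRA update, a single low-rank product, with a LoHa-style update that combines two low-rank products element-wise (the Hadamard product). The shapes and rank are arbitrary assumptions chosen only for illustration.

    import numpy as np

    d_out, d_in, rank = 320, 768, 8
    rng = np.random.default_rng(0)

    # Standard LoRA: a single low-rank product B @ A.
    A = rng.standard_normal((rank, d_in))
    B = rng.standard_normal((d_out, rank))
    delta_lora = B @ A

    # LoHa: two low-rank products combined element-wise (the Hadamard product, "*").
    A1, B1 = rng.standard_normal((rank, d_in)), rng.standard_normal((d_out, rank))
    A2, B2 = rng.standard_normal((rank, d_in)), rng.standard_normal((d_out, rank))
    delta_loha = (B1 @ A1) * (B2 @ A2)

    # Both updates have the full weight shape, but the Hadamard combination can reach a
    # higher effective rank than a single B @ A product with the same rank setting.
    print(delta_lora.shape, delta_loha.shape)   # (320, 768) (320, 768)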

Prepare a dataset

Determine the LoRA type

First, determine the type of LoRA model you want to train, such as a character type or a style type.

For example, you might need to train a style model for Alibaba Cloud 3D product icons based on the Alibaba Cloud Evolving Design language system.

Dataset content requirements

A dataset consists of two types of files: images and corresponding text files for annotation.

Prepare dataset content: Images

  • Image requirements

    • Quantity: 15 or more images.

    • Quality: Moderate resolution and clear image quality.

    • Style: A set of images with a consistent style.

    • Content: The images must highlight the subject to be trained. Avoid complex backgrounds and other irrelevant content, especially text.

    • Size: The resolution must be a multiple of 64, ranging from 512 to 768. If video memory is low, crop images to 512 × 512. If video memory is high, crop images to 768 × 768.

  • Image pre-processing

    • Quality adjustment: A moderate resolution is sufficient as long as the image quality is clear. If the image resolution is low, use the Extras tab in SD WebUI or an image editing tool to increase the resolution.

    • Size adjustment: Use a batch cropping tool to crop the images. A minimal scripting sketch follows this list.

  • Example of prepared images

    image.png

    Store the images in a local folder.

    image.png
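
If you prefer to script the cropping step, the following minimal sketch center-crops each image to a square and resizes it to 512 × 512 with Pillow. The folder names raw_images and dataset_512 are hypothetical examples; use 768 instead of 512 if your GPU has ample video memory.

    from pathlib import Path
    from PIL import Image

    SRC = Path("raw_images")      # hypothetical folder containing the collected images
    DST = Path("dataset_512")     # hypothetical output folder for the training images
    TARGET = 512                  # use 768 instead if your GPU has ample video memory
    DST.mkdir(exist_ok=True)

    for path in SRC.iterdir():
        if path.suffix.lower() not in {".png", ".jpg", ".jpeg", ".webp"}:
            continue
        img = Image.open(path).convert("RGB")
        # Center-crop to a square, then resize to the target resolution.
        side = min(img.size)
        left = (img.width - side) // 2
        top = (img.height - side) // 2
        img = img.crop((left, top, left + side, top + side))
        img = img.resize((TARGET, TARGET), Image.LANCZOS)
        img.save(DST / (path.stem + ".png"))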

Create a dataset and upload files

Before uploading, note the file attribute and naming requirements. If you only use the platform to manage dataset files or annotate images, you can upload files or folders directly. There are no special naming requirements for these files and folders.

To use Kohya on the platform to train a LoRA model after the dataset is annotated, the uploaded files must meet the following attribute and naming requirements.

  • Naming format: Number_CustomName

    • Number: A user-defined repeat count for the images in the folder. For example, if a folder contains 10 images, each image is trained 1500 / 10 = 150 times, so the number in the folder name can be 150. If a folder contains 20 images, each image is trained 1500 / 20 = 75 times; because 75 is less than 100, set the number in the folder name to 100. A minimal sketch of this calculation follows this list.

    • CustomName: A descriptive name for your dataset. This topic uses 100_ACD3DICON as an example.
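
The repeat number described above can be computed as in the following sketch, assuming the target of roughly 1,500 training passes and the minimum of 100 repeats implied by the example.

    def folder_repeat_number(num_images: int, target_passes: int = 1500, floor: int = 100) -> int:
        """Return the number to place before the folder name (Number_CustomName)."""
        # Divide the target number of passes by the image count, but never go below the floor.
        return max(target_passes // num_images, floor)

    # 10 images -> 150 repeats; 20 images -> 75, which is raised to the floor of 100.
    print(folder_repeat_number(10))   # 150
    print(folder_repeat_number(20))   # 100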

  1. Log on to PAI ArtLab and select Kohya (Exclusive Edition) to open the Kohya-SS page.

  2. Create a dataset.

    On the dataset page, click Create Dataset and enter a dataset name. For example, enter acd3dicon.

    image

  3. Upload dataset files.

    Click the name of the dataset you created. Then, drag the prepared image folder from your local computer to the upload area.

    image

    After the upload is successful, the folder appears on the page.

    image.png

  4. Click the folder to view the uploaded images.

    image.png

Prepare dataset content: Image annotations

Image annotation refers to the text description for each image. The annotation file is a TXT file with the same name as the image.

  • Image annotation requirements

    Elements with a clear structural layout, standard perspective, and specific lighting, such as product icons, require a different annotation process than portraits or landscapes. Use basic descriptive annotations. Focus on the simple geometric shapes of the main elements, such as "sphere" or "cube".

    The following categories and keywords are commonly used for this type of dataset:

    • Service

      • Product/Service: database, cloud security, computing platform, container, cloud-native, etc. (in English)

      • Cloud computing elements: Data processing, Storage, Computing, Cloud computing, Elastic computing, Distributed storage, Cloud database, Virtualization, Containerization, Cloud security, Cloud architecture, Cloud services, Server, Load balancing, Automated management, Scalability, Disaster recovery, High availability, Cloud monitoring, Cloud billing

    • Design (Texture)

      • Environment & Composition: viewfinder, isometric, hdri environment, white background, negative space

      • Material: glossy texture, matte texture, metallic texture, glass texture, frosted glass texture

      • Lighting: studio lighting, soft lighting

      • Color: alibaba cloud orange, white, black, gradient orange, transparent, silver

      • Emotion: rational, orderly, energetic, vibrant

      • Quality: UHD, accurate, high details, best quality, 1080P, 16k, 8k

    • Design (Atmosphere)

      • ...

  • Add annotations to images

    You can manually add a text description to each image. However, for large datasets, manual annotation is inefficient. To save time, you can use a neural network to generate text descriptions for all images in a batch. In Kohya, you can use the BLIP image annotation model and then manually refine the results to meet your requirements.
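
The steps in the next section use Kohya's built-in BLIP Captioning tab. If you want to generate the .txt files outside the platform, the following sketch shows the general idea using the open-source BLIP model from Hugging Face Transformers; the folder name and prefix string are examples from this topic, and this is not the exact pipeline that Kohya runs.

    from pathlib import Path
    from PIL import Image
    from transformers import BlipProcessor, BlipForConditionalGeneration

    processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
    model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

    folder = Path("100_ACD3DICON")    # the dataset image folder used in this topic
    prefix = "ACD3DICON, 3d icon, "   # hypothetical prefix describing the dataset's key features

    for path in sorted(folder.glob("*.png")):
        inputs = processor(images=Image.open(path).convert("RGB"), return_tensors="pt")
        output_ids = model.generate(**inputs, max_new_tokens=40)
        caption = prefix + processor.decode(output_ids[0], skip_special_tokens=True)
        # Write a .txt annotation file with the same name as the image.
        path.with_suffix(".txt").write_text(caption, encoding="utf-8")
        print(path.name, "->", caption)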

Annotate the dataset

  1. On the Kohya-SS page, select Utilities > Captioning > BLIP Captioning.

  2. Select the uploaded image folder in the created dataset.

  3. In the prefix field, enter keywords that will be added to the beginning of each annotation. These keywords should be based on the key features of your dataset images. The annotation features vary for different types of images.

  4. Click Caption Image to start annotating.

    image

  5. In the log at the bottom, you can view the annotation progress and completion status.

  6. Return to the dataset page. A corresponding annotation file now exists for each image.

  7. (Optional) Manually modify any inappropriate annotations.

Train the LoRA model

  1. On the Kohya-SS page, go to LoRA > Training > Source Model.

  2. Configure the following parameters:

    • For Model Quick Pick, select runwayml/stable-diffusion-v1-5.

    • Set Save Trained Model As to safetensors.

    Note

    If you cannot find the model you want in the Model Quick Pick drop-down list, you can select custom and then choose your model. In the custom path, you can find either base models that you added from the Model Gallery to My Models or models that you uploaded to My Models from your local computer.

  3. On the Kohya-SS page, go to LoRA > Training > Folders.

  4. Select the dataset that contains the image folder and configure the training parameters.

    image

    Note

    When you annotate dataset files, you select the specific image folder within the dataset. When you train the model, you select the parent dataset that contains this folder.

  5. Click Start training.

    For more information about the parameters, see Frequently used training parameters.

  6. In the log at the bottom, you can view the model training progress and completion status.

Frequently used training parameters

Parameters

Number of images × Repeats × Epochs / Batch size = Total training steps

For example: 10 images × 20 repeats × 10 epochs / 2 (batch size) = 1000 steps.
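
The same calculation in code, using the values from the example:

    # Values from the example above.
    num_images, repeats, epochs, batch_size = 10, 20, 10, 2
    total_steps = num_images * repeats * epochs // batch_size
    print(total_steps)   # 1000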

On the Kohya-SS page, go to LoRA > Training > Parameters to configure the parameters for model training. The following are the common parameters:

  • Basic tab

    image

    • repeat: The number of times each image is read, set by the number in the folder name. A higher number improves the learning effect. Recommended values for initial training:

      • Animation and comics: 7 to 15

      • Portrait: 20 to 30

      • Real object: 30 to 100

    • LoRA type: The LoRA type to use. Keep the default selection, Standard.

    • LoRA network weights: Optional. To continue training from an earlier run, select the last trained LoRA.

    • Train batch size: The training batch size. Select a value based on your graphics card performance: at most 2 with 12 GB of video memory and at most 1 with 8 GB of video memory.

    • Epoch: The number of training rounds. One epoch is one full training pass over all data. Calculate as needed. Generally:

      • Total training steps in Kohya = Number of training images × Repeats × Epochs / Training batch size

      • Total training steps in WebUI = Number of training images × Repeats

      When category images are used, the total training steps in Kohya or WebUI are doubled, and in Kohya the number of model saves is halved.

    • Save every N epochs: Save the training result every N epochs. If set to 2, the result is saved after every 2 training epochs.

    • Caption Extension: The file name extension of the annotation files. Optional. The annotation/prompt files in the training dataset use the .txt format.

    • Mixed precision: Determined by graphics card performance. Valid values: no, fp16 (default), and bf16 (available on RTX 30 series or later graphics cards).

    • Save precision: Determined by graphics card performance. Valid values: no, fp16 (default), and bf16 (available on RTX 30 series or later graphics cards).

    • Number of CPU threads per core: Depends mainly on CPU performance. Adjust it based on the purchased instance and your requirements. You can keep the default value.

    • Seed: The random number seed. Can be used to reproduce results during image generation verification.

    • Cache latents: Enabled by default. When enabled, image information is cached as latent files during training.

    • LR Scheduler: The learning rate scheduler. There is no single best schedule in theory; Cosine is generally a good choice.

    • Optimizer: The default is AdamW8bit. If you train based on the SD 1.5 foundation model, keep the default value.

    • Learning rate: For initial training, set the learning rate to a value from 0.01 to 0.001. The default value is 0.0001. You can adjust the learning rate based on the loss value: when the loss is high, you can moderately increase the learning rate; when the loss is low, gradually decreasing it helps fine-tune the model.

      • A high learning rate speeds up training but can cause overfitting due to coarse learning, meaning the model adapts too closely to the training data and generalizes poorly.

      • A low learning rate allows for detailed learning and reduces overfitting, but it can lead to long training times and underfitting, meaning the model is too simple to capture the data's characteristics.

    • LR Warmup (% of steps): The percentage of steps used for learning rate warmup. The default value is 10.

    • Max Resolution: The maximum training resolution. Set it based on the images. The default value is 512,512.

    • Network Rank (Dimension): Model complexity. A setting of 64 is generally sufficient for most scenarios.

    • Network Alpha: Set a small value. The Rank and Alpha settings affect the final size of the output LoRA file.

    • Clip skip: The number of layers to skip in the text encoder. Select 2 for anime models and 1 for realistic models. Anime foundation models already skip one layer; if the training material is also anime images, skip one more layer, for a total of 2.

    • Sample every n epochs: Saves a sample image every N training epochs.

    • Sample prompts: The prompts used for the sample images. Specify them as a command with the following parameters:

      • --n: Negative prompt.

      • --w: Image width.

      • --h: Image height.

      • --d: Image seed.

      • --l: Prompt relevance (CFG Scale).

      • --s: Iteration steps.

  • Advanced tab

    image

    • Clip skip: The number of layers to skip in the text encoder. Select 2 for anime models and 1 for realistic models. Anime foundation models already skip one layer; if the training material is also anime images, skip one more layer, for a total of 2.

  • Samples tab

    image

    • Sample every N epochs: Saves a sample image every N training epochs.

    • Sample prompts: The prompts used for the sample images. Specify them as a command with the following parameters (an example follows this list):

      • --n: Negative prompt.

      • --w: Image width.

      • --h: Image height.

      • --d: Image seed.

      • --l: Prompt relevance (CFG Scale).

      • --s: Iteration steps.
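
    For example, a single line in the Sample prompts box might look like the following. The prompt text and the flag values are illustrative only; the flags correspond to the parameters listed above.

      ACD3DICON, a glossy 3d cube icon, studio lighting, white background --n low quality, text, complex background --w 512 --h 512 --d 1 --l 7.5 --s 28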

Loss value

During the LoRA model fine-tuning process, the Loss value is an important metric for evaluating model quality. Ideally, the Loss value gradually decreases as training progresses, which indicates that the model is learning effectively and fitting the training data. A Loss value between 0.08 and 0.1 generally indicates that the model is well-trained. A Loss value around 0.08 suggests that the model training was highly effective.

LoRA learning is a process where the Loss value decreases over time. Assume you set the number of training epochs to 30. If your goal is to obtain a model with a Loss value between 0.07 and 0.09, this target is likely to be reached between the 20th and 24th epochs. Setting an appropriate number of epochs helps prevent the Loss value from dropping too quickly. For example, if the number of epochs is too low, the Loss value might drop from 0.1 to 0.06 in a single step, causing you to miss the optimal range.

image