Platform For AI: Train a LoRA model in PAI ArtLab

Last Updated: May 27, 2026

Train LoRA models with Kohya in PAI ArtLab to generate customized Stable Diffusion images.

Log on to the PAI ArtLab console.

Overview

Stable Diffusion (SD) is an open-source text-to-image model. SD WebUI provides a web interface for text-to-image and image-to-image generation with extensions and model support.

SD WebUI supports multiple models, each trained on specific datasets. LoRA is a lightweight fine-tuning method: fast training, small output files, and low hardware requirements.

Kohya is an open-source LoRA training tool with a standalone GUI, avoiding extension conflicts in SD WebUI.

For other fine-tuning methods, see Models.

LoRA models

LoRA (Low-Rank Adaptation) trains stylized models on top of foundation models, enabling highly customized image generation.

File specifications:

  • File size: A few to several hundred MB, depending on trained parameters and foundation model complexity.

  • File format: .safetensors.

  • Usage: Must be loaded together with a specific checkpoint foundation model.

  • Version: LoRA models trained for SD v1.5 and LoRA models trained for SDXL are not interchangeable.
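
The small file size follows from how low-rank adaptation works: instead of storing a full update for a weight matrix, LoRA stores two thin matrices whose product approximates that update. The following is a minimal sketch of the resulting parameter count. The 768 × 768 layer size and the ranks are illustrative assumptions, not values read from a real model file.

```python
def full_param_count(d_out: int, d_in: int) -> int:
    """Parameters in a full update of one d_out x d_in weight matrix."""
    return d_out * d_in


def lora_param_count(d_out: int, d_in: int, rank: int) -> int:
    """Parameters in a LoRA update: B (d_out x rank) plus A (rank x d_in)."""
    return rank * (d_out + d_in)


# Illustrative layer size and ranks (assumptions for this sketch only).
for rank in (8, 64):
    lora = lora_param_count(768, 768, rank)
    ratio = lora / full_param_count(768, 768)
    print(f"rank={rank}: {lora} parameters ({ratio:.1%} of the full matrix)")
```

A larger rank stores more parameters per adapted layer, which is one reason file sizes range from a few MB to several hundred MB.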

Why use LoRA

Foundation models (SD v1.5, v2.1, SDXL base 1.0) provide baseline generation. LoRA adds targeted style and creativity, enabling more personalized content.

For example, SD v1.5 has these limitations:

  1. Imprecise details: Struggles with specific details and complex content, producing less realistic images.

  2. Inconsistent logical structure: Object layout, proportions, and lighting may not adhere to real-world principles.

  3. Inconsistent style: The random generation process makes consistent style transfer difficult.

Fine-tuned community models produce richer details and more controllable styles than raw foundation models:

image

LoRA model types

  • LyCORIS (successor to LoCon/LoHa)

    LyCORIS fine-tunes 26 neural network layers (vs. LoRA's 17), offering greater expressiveness. Its core components are LoCon, which adjusts additional layers of the SD model, and LoHa, which is designed to double the information capacity.

    Used like LoRA, with additional text encoder, U-Net, and DyLoRA weight adjustments.

  • LoCon

    Conventional LoRA adjusts only cross-attention layers. LoCon (LoRA for Convolution Network) extends this to the convolution layers in the ResNet blocks. LoCon has been merged into LyCORIS, making old LoCon extensions obsolete.

  • LoHa

    LoHa (LoRA with Hadamard Product) replaces the matrix dot product with the Hadamard product, theoretically holding more information under the same conditions. The technique comes from the paper "FedPara: Low-Rank Hadamard Product for Communication-Efficient Federated Learning".

  • DyLoRA

    Optimal LoRA rank varies by model, dataset, and task. DyLoRA explores multiple rank configurations within a specified dimension, simplifying rank selection and improving fine-tuning efficiency.

Prepare a dataset

Determine the LoRA type

Determine the LoRA model type to train, such as character or style.

For example, train a style model for Alibaba Cloud 3D product icons based on the Alibaba Cloud Evolving Design language system:

image.png

Dataset content requirements

A dataset consists of images and corresponding text annotation files.

Prepare dataset content: Images

  • Image requirements

    • Quantity: 15 or more images.

    • Quality: Moderate resolution and a sharp, clear image.

    • Style: Consistent style across all images.

    • Content: Highlight the training subject. Avoid complex backgrounds and text.

    • Size: Width and height must each be a multiple of 64, in the range 512 to 768. Crop to 512×512 for low GPU memory, or 768×768 for high GPU memory.

  • Image pre-processing

    • Quality adjustment: Use moderate resolution. Upscale low-resolution images with the SD WebUI Extras feature or other tools.

    • Size adjustment: Use batch cropping tools to crop images.

  • Example of prepared images

    image.png

    Store the images in a local folder.

    image.png
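
The size requirement above (multiples of 64 between 512 and 768, with square crops) can be checked before any cropping tool is run. The following is a rough sketch in plain Python. The function names are hypothetical, and rounding down to the nearest multiple of 64 is an assumption of this example rather than a rule from this topic.

```python
def snap_to_multiple_of_64(value: int, low: int = 512, high: int = 768) -> int:
    """Round down to a multiple of 64, then clamp to the [low, high] range."""
    snapped = (value // 64) * 64
    return max(low, min(high, snapped))


def center_crop_box(width: int, height: int, target: int) -> tuple[int, int, int, int]:
    """Return (left, top, right, bottom) for a centered target x target crop."""
    if width < target or height < target:
        raise ValueError("image is smaller than the crop size; upscale it first")
    left = (width - target) // 2
    top = (height - target) // 2
    return left, top, left + target, top + target


# Example: a 1024 x 768 source image cropped to 512 x 512.
print(center_crop_box(1024, 768, 512))
```

The returned tuple uses the (left, top, right, bottom) order that Pillow's Image.crop accepts, so it can be passed to a batch cropping script.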

Create a dataset and upload files

Note the naming requirements before uploading. If you only use the platform to manage or annotate files, no special naming is needed.

For Kohya LoRA training, the uploaded image folder must follow this naming convention:

  • Naming format: Number_CustomName

  • Number: The number of times each image is repeated during training. The value is user-defined. The examples in this topic target about 1,500 image passes per folder and use 100 as the minimum:

    • 10 images: 1500 / 10 = 150, so set the number to 150.

    • 20 images: 1500 / 20 = 75. Because 75 < 100, set the number to 100.

  • CustomName: A descriptive dataset name. This topic uses 100_ACD3DICON as an example.
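
The folder-name arithmetic above can be sketched as a small helper. The function name and the use of integer division for image counts that do not divide 1,500 evenly are assumptions of this example. The 1,500 target and the minimum of 100 come from the examples in this topic.

```python
def kohya_folder_name(num_images: int, custom_name: str,
                      target_passes: int = 1500, min_repeats: int = 100) -> str:
    """Build a Number_CustomName folder name from the image count."""
    if num_images <= 0:
        raise ValueError("num_images must be positive")
    # Repeats per image, raised to the minimum when the quotient is too small.
    repeats = max(min_repeats, target_passes // num_images)
    return f"{repeats}_{custom_name}"


print(kohya_folder_name(10, "ACD3DICON"))  # 1500 / 10 = 150
print(kohya_folder_name(20, "ACD3DICON"))  # 1500 / 20 = 75, raised to 100
```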

  1. Log in to PAI ArtLab and select Kohya (Exclusive Edition) to open the Kohya-SS page.

  2. Create a dataset.

    On the dataset page, click Create Dataset and enter a dataset name. This example uses acd3dicon.

    image

  3. Upload dataset files.

    Click the dataset name, then drag the image folder from your computer to the upload area.

    image

    After upload, the folder appears on the page.

    image.png

  4. Click the folder to view the uploaded images.

    image.png

Prepare dataset content: Image annotations

Each image needs a TXT annotation file with the same name.

  • Image annotation requirements

    For elements with clear structure and standard perspective (e.g., product icons), use basic descriptive annotations focusing on simple geometric shapes like "sphere" or "cube".

    | Category | Subcategory | Keywords |
    | --- | --- | --- |
    | Service | Product/Service | database, cloud security, computing platform, container, cloud-native, etc. (in English) |
    | | Cloud computing elements | Data processing, Storage, Computing, Cloud computing, Elastic computing, Distributed storage, Cloud database, Virtualization, Containerization, Cloud security, Cloud architecture, Cloud services, Server, Load balancing, Automated management, Scalability, Disaster recovery, High availability, Cloud monitoring, Cloud billing |
    | Design (Texture) | Environment & Composition | viewfinder, isometric, hdri environment, white background, negative space |
    | | Material | glossy texture, matte texture, metallic texture, glass texture, frosted glass texture |
    | | Lighting | studio lighting, soft lighting |
    | | Color | alibaba cloud orange, white, black, gradient orange, transparent, silver |
    | | Emotion | rational, orderly, energetic, vibrant |
    | | Quality | UHD, accurate, high details, best quality, 1080P, 16k, 8k |
    | Design (Atmosphere) | ... | ... |

  • Add annotations to images

    Manual annotation works but is slow for large datasets. Kohya's BLIP model batch-generates text descriptions that you can refine.
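
As an illustration of how keywords from the categories above might be combined into one annotation, the following sketch joins a subject and several keyword groups into a single line. The comma-separated layout and the function name are assumptions of this example. The keywords themselves are taken from the table.

```python
def compose_caption(subject: str, *keyword_groups: list[str]) -> str:
    """Join a subject and keyword groups into one comma-separated caption.

    Duplicate keywords (compared case-insensitively) are kept only once.
    """
    seen = set()
    parts = []
    for keyword in [subject, *(k for group in keyword_groups for k in group)]:
        key = keyword.strip().lower()
        if key and key not in seen:
            seen.add(key)
            parts.append(keyword.strip())
    return ", ".join(parts)


caption = compose_caption(
    "cube",                             # simple geometric shape
    ["database"],                       # Product/Service
    ["isometric", "white background"],  # Environment & Composition
    ["glossy texture"],                 # Material
    ["studio lighting"],                # Lighting
    ["alibaba cloud orange", "white"],  # Color
)
print(caption)
```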

Annotate the dataset

  1. On the Kohya-SS page, choose Utilities > Captioning > BLIP Captioning.

  2. Select the uploaded image folder in the created dataset.

  3. In the prefix field, enter keywords to prepend to each annotation based on key features of your dataset images.

  4. Click Caption Image to start annotating.

    image

  5. View annotation progress in the log at the bottom.

    image.png

  6. Return to the dataset page. Each uploaded image now has a corresponding annotation file.

  7. (Optional) Manually modify any inappropriate annotations.
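
If you keep a local copy of the dataset, a quick check can confirm that every image has a same-named annotation file. This is a minimal sketch that assumes a flat local folder. The list of image suffixes and the function name are assumptions of this example.

```python
from pathlib import Path

# Assumed image file types; adjust to match your dataset.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}


def find_missing_captions(folder: str, caption_ext: str = ".txt") -> list[str]:
    """Return the names of images that have no same-named caption file."""
    missing = []
    for path in sorted(Path(folder).iterdir()):
        is_image = path.suffix.lower() in IMAGE_SUFFIXES
        if is_image and not path.with_suffix(caption_ext).exists():
            missing.append(path.name)
    return missing
```

For example, find_missing_captions("100_ACD3DICON") lists the images in that folder that still need a .txt file.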

Train the LoRA model

  1. On the Kohya-SS page, go to LoRA > Training > Source Model.

  2. Configure the following parameters:

    • Model Quick Pick: runwayml/stable-diffusion-v1-5

    • Save trained model as: safetensors

    Note

    If the model you want is not in the Model Quick Pick drop-down list, select custom and then select your model. Base models that you added to My Models from Models, and base models that you uploaded to My Models from your computer, are available under the custom path.

  3. On the Kohya-SS page, go to LoRA > Training > Folders.

  4. Select the dataset that contains the image folder and configure the training parameters.

    image

    Note

    For annotation, select the image folder within the dataset. For training, select the parent dataset containing this folder.

  5. Click Start training.

    For frequently used training parameters, see the Training parameters section.

  6. View training progress in the log at the bottom.

    image.png

Training parameters

Common parameters

Number of images × Repeats × Epochs / Batch size = Total training steps

For example: 10 images × 20 repeats × 10 epochs / 2 (batch size) = 1000 steps.
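
The same formula as a small helper, reproducing the worked example above. The function name is hypothetical, and rounding up when the product is not evenly divisible by the batch size is an assumption of this sketch. The trainer may round differently.

```python
import math


def total_training_steps(num_images: int, repeats: int, epochs: int, batch_size: int) -> int:
    """Number of images x Repeats x Epochs / Batch size, rounded up."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return math.ceil(num_images * repeats * epochs / batch_size)


# The example from the text: 10 images, 20 repeats, 10 epochs, batch size 2.
print(total_training_steps(10, 20, 10, 2))
```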

On the Kohya-SS page, go to LoRA > Training > Parameters to configure training:

  • Basic tab

    image

    | Parameter | Function | Settings |
    | --- | --- | --- |
    | repeat | Number of times to read an image | Set in the folder name. Higher values improve learning. Recommended initial values: animation and comics, 7 to 15; portrait, 20 to 30; real object, 30 to 100. |
    | LoRA type | LoRA type to use | Keep the default: Standard. |
    | LoRA network weights | LoRA network weights | Optional. To continue training, select the last trained LoRA. |
    | Train batch size | Training batch size | Based on GPU memory: max 2 for 12 GB, max 1 for 8 GB. |
    | Epoch | Number of training rounds. One round is one full training pass over all data. | Calculate as needed. Generally: total training steps in Kohya = number of training images × repeats × epochs / training batch size; total training steps in WebUI = number of training images × repeats. When category (regularization) images are used, the total training steps in Kohya or WebUI are doubled. In Kohya, the number of model saves is halved. |
    | Save every N epochs | Save the result every N training epochs | If set to 2, the training result is saved after every 2 training epochs. |
    | Caption Extension | Annotation file name extension | Optional. Annotation file format: .txt. |
    | Mixed precision | Mixed precision | Determined by graphics card performance. Valid values: no; fp16 (default); bf16 (can be selected for RTX 30 series or later graphics cards). |
    | Save precision | Save precision | Determined by graphics card performance. Valid values: no; fp16 (default); bf16 (can be selected for RTX 30 series or later graphics cards). |
    | Number of CPU threads per core | Number of CPU threads per core | Adjust based on CPU performance. Default is sufficient for most cases. |
    | Seed | Random number seed | Can be used for image generation verification. |
    | Cache latents | Cache latents | Enabled by default. Encodes the training images into latents once and caches them for reuse during training. |
    | LR Scheduler | Learning rate scheduler | No single best scheduler exists. Cosine is a good general-purpose choice. |
    | Optimizer | Optimizer | Default: AdamW8bit. Keep the default for SD 1.5-based training. |
    | Learning rate | Learning rate | For initial training, set to a value from 0.001 to 0.01. Default: 0.0001. Adjust based on the loss: increase when loss is high, decrease when loss is low. A high learning rate trains faster but risks overfitting (poor generalization). A low learning rate learns more finely with less overfitting, but training takes longer and may underfit. |
    | LR Warmup (% of steps) | Learning rate warmup (% of steps) | Default: 10. |
    | Max Resolution | Maximum resolution | Default: 512,512. Adjust based on image size. |
    | Network Rank (Dimension) | Model complexity | 64 is sufficient for most scenarios. |
    | Network Alpha | Network Alpha | Set a small value. Rank and Alpha affect the output LoRA file size. |

  • Advanced tab

    image

    | Parameter | Function | Settings |
    | --- | --- | --- |
    | Clip skip | Number of layers to skip in the text encoder | Select 2 for anime and 1 for realistic models. Anime model training initially skips one layer. If the training material is also anime images, skip another layer for a total of 2. |

  • Samples tab

    image

    | Parameter | Function | Settings |
    | --- | --- | --- |
    | Sample every N epochs | Sample every N training epochs | Generates and saves a sample image every N epochs. |
    | Sample prompts | Sample prompts | Prompts for sample generation. Use these parameters: --n: negative prompt; --w: image width; --h: image height; --d: image seed; --l: prompt relevance (CFG scale); --s: iteration steps. |
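
The sample prompt options listed above can be assembled into a single prompt line. This sketch assumes that all options follow the prompt text on the same line. The function name, the prompt text, and the numeric values are placeholders for illustration.

```python
def build_sample_prompt(prompt: str, negative: str = "", width: int = 512, height: int = 512,
                        seed=None, cfg_scale=None, steps=None) -> str:
    """Append the --n, --w, --h, --d, --l, and --s options to a prompt."""
    parts = [prompt.strip()]
    if negative:
        parts.append(f"--n {negative.strip()}")
    parts.append(f"--w {width}")
    parts.append(f"--h {height}")
    # Optional settings are added only when a value is supplied.
    if seed is not None:
        parts.append(f"--d {seed}")
    if cfg_scale is not None:
        parts.append(f"--l {cfg_scale}")
    if steps is not None:
        parts.append(f"--s {steps}")
    return " ".join(parts)


line = build_sample_prompt(
    "acd3dicon, cube, white background",  # placeholder prompt
    negative="low quality, text",
    seed=42, cfg_scale=7, steps=28,
)
print(line)
```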

Loss value

Loss measures model quality during training and should decrease gradually. A value between 0.08 and 0.1 indicates good training; around 0.08 is highly effective.

With 30 epochs, Loss typically reaches 0.07 to 0.09 between epochs 20 and 24. Setting enough epochs keeps Loss from dropping too fast (for example, from 0.1 straight to 0.06) and skipping past the optimal range.

image
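
To decide which saved epoch to try first, the loss ranges described above can be applied to a list of per-epoch loss values. The loss numbers below are made up for illustration, and treating the logged value as one average per epoch is an assumption of this sketch.

```python
def epochs_in_target_range(losses: list[float], low: float = 0.08, high: float = 0.1) -> list[int]:
    """Return 1-based epoch numbers whose loss lies within [low, high]."""
    return [epoch for epoch, loss in enumerate(losses, start=1) if low <= loss <= high]


def epoch_closest_to(losses: list[float], target: float = 0.08) -> int:
    """Return the 1-based epoch whose loss is closest to the target."""
    return min(range(1, len(losses) + 1), key=lambda epoch: abs(losses[epoch - 1] - target))


# Hypothetical per-epoch loss values, not taken from a real training run.
example_losses = [0.15, 0.12, 0.10, 0.09, 0.08, 0.07]
print(epochs_in_target_range(example_losses))
print(epoch_closest_to(example_losses))
```

Loss is only a guide. Compare the sample images from the candidate epochs before you pick a final model.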