Deploy, fine-tune, evaluate, and process data for LLMs on PAI. Choose a workflow based on your goal: one-click deployment through Model Gallery, custom training with DSW and DLC, or large-scale distributed training on Lingjun.
Deploy models
PAI provides multiple deployment paths. Use Model Gallery for one-click deployment of popular open-source models, or use Elastic Algorithm Service (EAS) for custom deployment with advanced configurations such as accelerated inference engines and auto-scaling.
Model Gallery: one-click deployment
Model Gallery supports deploying models with built-in inference optimization. Each tutorial covers the full workflow from deployment to API invocation. Select a model series:
-
Quick start: Deploy, fine-tune, and evaluate Qwen3 models - Deploy using SGLang, vLLM, or BladeLLM. Includes online debugging and API invocation examples.
-
Deploy, fine-tune, and evaluate QwQ-32B - A reasoning model optimized for math, coding, and scientific reasoning tasks.
-
Deploy, fine-tune, and evaluate Qwen2.5 models - Multiple sizes (0.5B to 72B) with improvements in coding, math, and structured data handling.
-
Train, evaluate, compress, and deploy Qwen2.5-Coder models - Specialized for code generation, completion, and reasoning. Supports training, quantization, and deployment.
-
Train, evaluate, and deploy DistilQwen2 models - A distilled model series that reduces model size while preserving performance through knowledge distillation.
-
Deploy and fine-tune Llama 3 series models - Meta's open-source models trained on 15+ trillion tokens. Supports SFT and DPO fine-tuning algorithms.
-
Deploy and fine-tune a Mixtral-8x7B MoE model - A sparse Mixture-of-Experts (MoE) model that activates only 2 of 8 experts per token for efficient inference.
EAS: custom deployment
-
Quickly deploy LLMs in EAS - Deploy open-source LLMs via EAS with standard or accelerated inference modes. Supports WebUI and API access.
Lingjun: distributed serving at scale
-
Fully managed Qwen on Lingjun - End-to-end workflow for distributed training, three-stage instruction tuning, offline inference, and online deployment of Qwen models (7B to 72B) on serverless GPU clusters.
Fine-tune and train models
Adapt pre-trained LLMs to your domain or task. Model Gallery provides a no-code fine-tuning interface. For more control, use DSW notebooks for parameter-efficient fine-tuning (PEFT) or DLC for distributed full-parameter training.
Model Gallery: no-code fine-tuning
The Model Gallery tutorials listed in the Deploy models section also cover fine-tuning for each model. Select a model series above to get started.
DSW and DLC: custom training
-
Fine-tune a Llama3-8B model - Use PEFT techniques in a DSW notebook for cost-effective domain adaptation while preserving the base model's capabilities.
Advanced training techniques
-
Continued pre-training for LLMs - Adapt models to specific domains using unlabeled text data. Unlike fine-tuning (supervised), continued pre-training uses unsupervised learning to extend a model's domain knowledge.
-
Data augmentation and model distillation for LLMs - Transfer knowledge from large teacher models to smaller student models. Combines data augmentation, instruction refinement, and distillation to create efficient models that preserve performance.
Evaluate models
Compare the performance of foundation models, fine-tuned versions, and quantized versions to determine which meets your requirements. PAI supports automated evaluation using custom or public datasets such as MMLU and C-Eval.
-
Best practices for LLM evaluation - Set up evaluation tasks with 10+ NLP metrics across custom and public benchmarks. Compare model variants side by side.
Process training data
Machine Learning Designer provides algorithms for processing text, video, and image data to improve training data quality. Built-in templates are available and can be extended through secondary development.
Text data
-
Clean GitHub code data for LLM training - Deduplicate, filter, and transform raw GitHub repository data into clean training samples.
-
LLM data processing: Wikipedia web text - Process Wikipedia dumps for pre-training: extract, clean, and deduplicate web-crawled text.
-
LLM data processing: arXiv thesis data - Clean and prepare academic papers from arXiv for scientific domain pre-training.
-
LLM data processing: Alpaca-CoT SFT data - Process instruction-following datasets in Alpaca format for supervised fine-tuning.
-
LLM data processing: Alpaca-CoT SFT data (DLC) - Run the Alpaca-CoT SFT data pipeline on DLC for large-scale distributed processing.
-
LLM data processing: GitHub code (DLC) - Run the GitHub code data pipeline on DLC for large-scale distributed processing.
Image and video data
-
Filter images and generate captions for model training - Automatically filter low-quality images and generate descriptive captions for multimodal model training.
-
Filter and label video data for model training - Clean, filter, and label video data with metadata extraction for video understanding model training.
Build LLM applications
Apply fine-tuned LLMs to production use cases. These tutorials demonstrate end-to-end workflows from data preparation through deployment.
-
Develop an LLM-based intent recognition solution - Build an intent recognition system for voice assistants or customer service chatbots. Covers data labeling with iTAG, fine-tuning Qwen1.5, evaluation, and service deployment.