AutoML in Platform for AI (PAI) automates hyperparameter search across multiple trials. Define a search space and let AutoML find the optimal hyperparameter combination—no tuning code required.
How it works
Each experiment runs multiple trials in parallel or sequence. For each trial, AutoML selects a hyperparameter combination from your configured search space, then submits either a Deep Learning Containers (DLC) job or one or more MaxCompute jobs to train the model. After all trials complete, AutoML compares evaluation metrics across trials and identifies the optimal combination.
For a deeper explanation, see How AutoML works.
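The trial loop described above can be sketched in Python. This is a hypothetical illustration only: `run_trial`, the sample search space, and the metric formula are stand-ins for an actual DLC or MaxCompute job, not AutoML's real implementation.

```python
import random

# Hypothetical stand-in for submitting a DLC or MaxCompute training job
# and reading back its evaluation metric.
def run_trial(params):
    # Pretend the metric peaks when lr is near 0.01.
    return -(params["lr"] - 0.01) ** 2

# An illustrative configured search space.
search_space = {"lr": [0.001, 0.01, 0.1], "batch_size": [32, 64]}

best_params, best_metric = None, float("-inf")
for trial_id in range(4):  # Maximum Trials
    # The search algorithm (here: random) picks the next combination.
    params = {name: random.choice(values) for name, values in search_space.items()}
    metric = run_trial(params)
    if metric > best_metric:  # Optimization: Maximize
        best_params, best_metric = params, metric

print(best_params, best_metric)
```

After all trials finish, `best_params` plays the role of the optimal hyperparameter combination that AutoML reports.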
Prerequisites
Before you begin, make sure you have:
Granted the permissions required to use AutoML (required for first-time use)
If you plan to use DLC jobs, also complete the following:
If you plan to use MaxCompute jobs, also complete the following:
Create an experiment
Log on to the PAI console.
In the left-side navigation pane, click Workspaces. On the Workspaces page, click the name of the workspace you want to manage.
In the left-side navigation pane, choose Model Training > AutoML.
On the AutoML page, click Create Experiment.
On the Create Experiment page, configure the following sections:
Click Submit.
The experiment appears in the experiment list.
Basic information
Parameter | Description |
Name | The name of the experiment. |
Description | A brief description to distinguish this experiment from others. |
Visibility | Who can see the experiment. Visible to Me: visible to your account and workspace administrators. Visible to Current Workspace: visible to all users in the workspace. |
Execution configurations
Select a Job Type: DLC or MaxCompute.
DLC
DLC jobs use Deep Learning Containers to run the training process. Configure the following parameters:
Parameter | Description |
Resource Group | The resource group for running DLC jobs. Select a public resource group or a dedicated resource group you have purchased. See Create a dedicated resource group and purchase general computing resources and Create resource quotas. |
Framework | The training framework. Valid values: TensorFlow, PyTorch. |
Datasets | The datasets to use for training. |
Code | The code repository containing the training script. DLC downloads code to a specified working directory, so make sure you have access to the repository. |
Node Image | The container image for worker nodes. Options: Alibaba Cloud Image (PAI-provided images for different resource types, Python versions, and deep learning frameworks—see Public images), Custom Image (images you have added to PAI), or Image Address (a Docker registry URL for a custom, community, or PAI image). |
Instance Type | The compute instance type for the job. Pricing varies by instance type—see Billing of Deep Learning Containers (DLC). |
Nodes | The number of compute nodes. Each node is billed separately, so factor in per-node costs when setting this value. |
vCPUs | Available when you select a dedicated resource group. Set based on the purchased resource specifications. |
Memory (GiB) | Available when you select a dedicated resource group. Set based on the purchased resource specifications. |
Shared Memory (GiB) | Available when you select a dedicated resource group. Set based on the purchased resource specifications. |
GPUs | Available when you select a dedicated resource group. Set based on the purchased resource specifications. |
Advanced Settings | Additional settings for PyTorch jobs to improve training flexibility. See Configure advanced settings. |
Node Startup Command | The command to start each node. Reference each hyperparameter as a ${variable_name} placeholder so that AutoML can substitute trial values. |
Hyperparameter | The hyperparameter list, auto-populated from the variables in your startup command. For each hyperparameter, set Constraint Type and Search Space. |
Node startup command example
Reference hyperparameter variables using ${variable_name} syntax. In the following example, ${batch_size} and ${lr} are the hyperparameter variables AutoML searches over:
python /mnt/data/examples/search/dlc_mnist/mnist.py \
--data_dir=/mnt/data/examples/search/data \
--save_model=/mnt/data/examples/search/model/model_${exp_id}_${trial_id} \
--batch_size=${batch_size} \
--lr=${lr} \
--metric_filepath=/mnt/data/examples/search/metric/metric_${exp_id}_${trial_id}
After you enter the command, AutoML automatically loads batch_size and lr into the Hyperparameter list. Set the Constraint Type and Search Space for each.
Constraint Type: the constraint applied to the hyperparameter. Hover over the icon next to Constraint Type to view the supported types and their descriptions.
Search Space: the value range for the hyperparameter. The configuration method varies by constraint type. Click the icon and add values as prompted.
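The ${variable_name} placeholder mechanism can be illustrated with Python's `string.Template`, which uses the same syntax. The command, hyperparameter values, and IDs below are hypothetical; this is a sketch of the substitution behavior, not AutoML's actual code.

```python
from string import Template

# A startup command with hyperparameter placeholders, as entered in the console.
startup_command = (
    "python mnist.py --batch_size=${batch_size} --lr=${lr} "
    "--save_model=model_${exp_id}_${trial_id}"
)

# Values a single trial might receive (illustrative).
trial_values = {"batch_size": 64, "lr": 0.01, "exp_id": "exp1", "trial_id": "t3"}

# Each ${name} is replaced with the trial's value before the job is submitted.
rendered = Template(startup_command).substitute(trial_values)
print(rendered)
# python mnist.py --batch_size=64 --lr=0.01 --save_model=model_exp1_t3
```

Because ${exp_id} and ${trial_id} appear in the save path, each trial writes its model to a distinct location.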
MaxCompute
MaxCompute jobs run SQL commands or PAI commands from Machine Learning Designer components to perform hyperparameter tuning. Configure the following parameters:
Parameter | Description |
Command | The SQL command or PAI command to run. Reference each hyperparameter as a ${variable_name} placeholder so that AutoML can substitute trial values. |
Hyperparameter | The hyperparameter list, auto-populated from the variables in your command. For each hyperparameter, set Constraint Type and Search Space. |
Command example
In the following example, ${centerCount} and ${distanceType} are the hyperparameter variables:
pai -name kmeans
-project algo_public
-DinputTableName=pai_kmeans_test_input
-DselectedColNames=f0,f1
-DappendColNames=f0,f1
-DcenterCount=${centerCount}
-Dloop=10
-Daccuracy=0.01
-DdistanceType=${distanceType}
-DinitCenterMethod=random
-Dseed=1
-DmodelName=pai_kmeans_test_output_model_${exp_id}_${trial_id}
-DidxTableName=pai_kmeans_test_output_idx_${exp_id}_${trial_id}
-DclusterCountTableName=pai_kmeans_test_output_counter_${exp_id}_${trial_id}
-DcenterTableName=pai_kmeans_test_output_center_${exp_id}_${trial_id};
Constraint Type: the constraint applied to the hyperparameter. Hover over the icon next to Constraint Type to view the supported types and their descriptions.
Search Space: the value range for the hyperparameter. The configuration method varies by constraint type. Click the icon and add values as prompted.
Trial configuration
The trial configuration defines how AutoML evaluates each trial's result.
Parameter | Description |
Metric Type | The source and format of the evaluation metric. Valid values: |
Method | How to compute the final metric from intermediate values logged during the job. Valid values: |
Metric Weight | When optimizing multiple metrics simultaneously, configure a name and weight for each metric. AutoML uses the weighted sum as the final score. The weight can be negative, and the weights do not need to sum to 1. |
Metric Source | The location or identifier of the metric data. Configuration varies by Metric Type—see the table below. |
Optimization | Whether to maximize or minimize the final metric. Valid values: Maximize, Minimize. |
Model Name | The path where the trained model is saved. Include ${exp_id} and ${trial_id} in the path so that each trial saves its model to a distinct location. |
Metric source configuration by metric type
Metric type | Metric source format | Example |
| OSS file path | |
| SQL statement that returns the metric value | |
| Command keyword | Set to |
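The weighted-sum scoring described under Metric Weight can be sketched as follows. The metric names, values, and weights are hypothetical; the point is that a negative weight penalizes a metric you want to drive down, and weights need not sum to 1.

```python
# Hypothetical per-trial metrics and configured weights.
metrics = {"accuracy": 0.92, "latency_ms": 35.0}
weights = {"accuracy": 1.0, "latency_ms": -0.01}  # negative weight penalizes latency

# Final score = weighted sum of the metrics.
final_score = sum(weights[name] * value for name, value in metrics.items())
print(round(final_score, 4))  # 0.92 - 0.35 = 0.57
```

With Optimization set to Maximize, the trial with the highest `final_score` is reported as optimal.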
Search configurations
Parameter | Description |
Search Algorithm | The algorithm AutoML uses to select the next hyperparameter combination based on prior trial results. See Supported search algorithms. |
Maximum Trials | The total number of trials to run in the experiment. |
Maximum Concurrent Trials | The number of trials that can run at the same time. Higher concurrency speeds up exploration but reduces the benefit of sequential algorithms such as TPE and GP, which rely on prior results to guide the next selection. For those algorithms, set this to 1 or a small number to preserve guided search. |
Supported search algorithms
AutoML supports six search algorithms. Choose based on your dataset size, compute budget, and whether you want sequential or parallel exploration.
Algorithm | When to use |
TPE (Tree-structured Parzen Estimator) | Good default choice. Uses prior trial results to focus the search on promising regions of the hyperparameter space. Best for sequential exploration with a limited trial budget. Set Maximum Concurrent Trials to a low value (1–2) to preserve the benefit of guided search. |
Random | Use when running a large number of parallel trials, or as a baseline. Each trial is independent, so it scales well with high concurrency. |
GridSearch | Use when the search space is small and discrete. Exhaustively evaluates every combination. |
Evolution | Use when the search space is large and complex. Applies evolutionary strategies to iteratively improve the hyperparameter population. |
GP (Gaussian Process) | Similar to TPE—uses a probabilistic model to balance exploration and exploitation. Suitable for small to medium search spaces. Like TPE, works best with low concurrency. |
PBT (Population Based Training) | Use for training jobs that support periodic checkpointing. PBT dynamically reallocates resources and updates hyperparameters during training, rather than between trials. |
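For GridSearch, the number of trials equals the product of the value counts in the search space, which is why it only suits small discrete spaces. A minimal sketch, using illustrative values for the K-means hyperparameters from the example above:

```python
from itertools import product

# A discrete search space (illustrative values, not PAI defaults).
search_space = {
    "centerCount": [2, 4, 8],
    "distanceType": ["euclidean", "cosine"],
}

# GridSearch exhaustively enumerates every combination: 3 * 2 = 6 trials.
grid = [dict(zip(search_space, combo)) for combo in product(*search_space.values())]
print(len(grid))  # 6
```

Adding one more value to either hyperparameter multiplies the trial count, so grid search cost grows combinatorially; Random or TPE scales better for larger spaces.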
For full details, see the "Supported search algorithms" section in Limits and usage notes of AutoML.
What's next
View experiment details to track progress and identify the optimal hyperparameter combination from trial results.
Manage experiments to clone, stop, or delete experiments.
Appendix: References
The following example shows two chained MaxCompute commands for K-means clustering hyperparameter tuning, using the K-means Clustering and Clustering model evaluation components. The commands run in the listed order. For the full procedure, see Best practices for MaxCompute k-means clustering.
cmd1 — K-means clustering
pai -name kmeans
-project algo_public
-DinputTableName=pai_kmeans_test_input
-DselectedColNames=f0,f1
-DappendColNames=f0,f1
-DcenterCount=${centerCount}
-Dloop=10
-Daccuracy=0.01
-DdistanceType=${distanceType}
-DinitCenterMethod=random
-Dseed=1
-DmodelName=pai_kmeans_test_output_model_${exp_id}_${trial_id}
-DidxTableName=pai_kmeans_test_output_idx_${exp_id}_${trial_id}
-DclusterCountTableName=pai_kmeans_test_output_counter_${exp_id}_${trial_id}
-DcenterTableName=pai_kmeans_test_output_center_${exp_id}_${trial_id};
cmd2 — Clustering model evaluation
PAI -name cluster_evaluation
-project algo_public
-DinputTableName=pai_cluster_evaluation_test_input
-DselectedColNames=f0,f1
-DmodelName=pai_kmeans_test_output_model_${exp_id}_${trial_id}
-DoutputTableName=pai_ft_cluster_evaluation_out_${exp_id}_${trial_id};