AutoML is an enhanced machine learning service that is provided by Platform for AI (PAI). It integrates multiple algorithms and distributed computing resources. If you use AutoML, you do not need to write code. You can create experiments to fine-tune model hyperparameters and improve the efficiency and performance of machine learning. This topic describes how to create an experiment.
Background information
How AutoML works:
An experiment generates multiple hyperparameter combinations based on the configured algorithm. The experiment creates a trial for each hyperparameter combination. Each trial may correspond to one Deep Learning Containers (DLC) job or one or more MaxCompute jobs. The job type varies based on the execution configuration of the experiment. The system runs a trial based on the configured job. The experiment schedules and runs multiple trials and compares the results of these trials to find the optimal hyperparameter combination. For more information about how AutoML works, see How AutoML works.
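As a rough illustration of this workflow, the following sketch enumerates hyperparameter combinations, runs one stand-in trial per combination, and keeps the best result. The search space, the run_trial function, and the score are hypothetical; the real system schedules DLC or MaxCompute jobs and supports smarter search algorithms than this exhaustive loop.

```python
import itertools

# Hypothetical search space: every combination becomes one trial.
search_space = {
    "batch_size": [32, 64],
    "lr": [0.01, 0.001],
}

def run_trial(params):
    # Stand-in for one DLC or MaxCompute job; returns the trial's
    # final metric. The score here is made up so the loop is runnable.
    return params["batch_size"] * params["lr"]

# The experiment creates one trial per hyperparameter combination ...
trials = [
    dict(zip(search_space, values))
    for values in itertools.product(*search_space.values())
]

# ... runs the trials, and keeps the combination with the best metric.
best = max(trials, key=run_trial)
print(best)
```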
Prerequisites
The permissions required to use AutoML are granted to your account. This prerequisite must be met if you use AutoML for the first time.
The resources that are required by DLC jobs, such as datasets and code repositories, are prepared. This prerequisite must be met if you want to create a DLC job.
MaxCompute resources are prepared and associated with the created workspace. This prerequisite must be met if you want to create a MaxCompute job.
Procedure
Go to the AutoML page.
Log on to the PAI console.
In the left-side navigation pane, click Workspaces. On the Workspaces page, click the name of the workspace that you want to manage.
In the left-side navigation pane, choose AutoML.
On the AutoML page, click Create Experiment.
On the Create Experiment page, configure the parameters.
Parameters in the Basic Information section
Parameter
Description
Name
The name of the experiment. You can specify this parameter as prompted.
Description
The brief description of the experiment that you want to create. The description is used to distinguish between different experiments.
Visibility
The visibility of the experiment. Valid values:
Visible to Me: The experiment is visible only to your account and the administrators of the current workspace.
Visible to Current Workspace: The experiment is visible to all users in the workspace.
Parameters in the Execution Configurations section
Job Type: the execution environment of the trial. You can select DLC or MaxCompute.
DLC: DLC jobs are executed for hyperparameter fine-tuning. For more information about DLC jobs, see Create a training job.
MaxCompute: SQL commands or PAI commands of Machine Learning Designer components are run by consuming MaxCompute computing resources to perform hyperparameter fine-tuning. For more information about Machine Learning Designer components and the PAI commands supported by each component, see Component reference: Overview of all components.
DLC
If you select DLC for Job Type, configure the following parameters.
Parameter
Description
Resource Group
The public resource group or a dedicated resource group that you have purchased. For more information about how to prepare a resource group, see Create a dedicated resource group and purchase general computing resources and Lingjun resource quotas.
Framework
The supported framework. Valid values:
TensorFlow
PyTorch
Datasets
The datasets that you prepared.
Code
The repository in which the code file of the job is stored. Specify the code repository that you prepared.
Note: DLC downloads the code to the specified working directory. Therefore, you must have the permissions to access the code repository.
Node Image
The image used by the worker nodes. Valid values:
Alibaba Cloud Image: images provided by Alibaba Cloud PAI. PAI images support different types of resources, Python versions, and deep learning frameworks (TensorFlow and PyTorch). For more information about PAI images, see Public images.
Custom Image: custom images that you add to PAI. Before you select a custom image, you must add the custom images to PAI.
Image Address: the address of a custom, community, or PAI image. If you select Image Address, you must also specify the URL of the Docker registry image that you want to access over the Internet.
Instance Type
The instance type that is required to run the job. The prices of instances vary based on their types. For billing details of each instance type, see Billing of DLC.
Nodes
The number of compute nodes on which the DLC job is executed.
Important: If you configure multiple nodes, you are billed for each node separately, not at a single price for the instance type. When you specify this parameter, estimate the cost of each node and strike a balance between cost and performance.
vCPUs, Memory (GiB), Shared Memory (GiB), and GPUs
If you select a purchased dedicated resource group from the Resource Group drop-down list, specify these parameters based on the specifications of the purchased resources.
Advanced Settings
Advanced settings help you improve training flexibility or adapt to specific training scenarios. If you use the PyTorch framework, you can configure advanced settings.
Node Startup Command
The command that is run to start a node. You must specify ${Custom hyperparameter variables} in the command to configure hyperparameter variables. Example:
python /mnt/data/examples/search/dlc_mnist/mnist.py --data_dir=/mnt/data/examples/search/data --save_model=/mnt/data/examples/search/model/model_${exp_id}_${trial_id} --batch_size=${batch_size} --lr=${lr} --metric_filepath=/mnt/data/examples/search/metric/metric_${exp_id}_${trial_id}
In the preceding command, ${batch_size} and ${lr} are the hyperparameter variables that you define.
Hyperparameter
The hyperparameter list is automatically loaded based on the hyperparameter variables configured in the startup command. You must specify Constraint Type and Search Space for each hyperparameter.
Constraint Type: the constraint that is imposed on the hyperparameter. You can move the pointer over the icon next to Constraint Type to view the supported constraint types and their descriptions.
Search Space: the value range of the hyperparameter. The method that is used to configure a search space varies based on the constraint type of the hyperparameter. You can click the icon and add values as prompted.
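Conceptually, each trial replaces the ${...} variables in the startup command with the values that the search algorithm chooses for that trial. The following sketch illustrates this substitution with Python's string.Template; the actual substitution is performed internally by AutoML, and the command and values below are hypothetical.

```python
from string import Template

# Simplified startup command with hyperparameter variables, as configured
# in the experiment (hypothetical).
template = Template(
    "python mnist.py --batch_size=${batch_size} --lr=${lr} "
    "--save_model=model_${exp_id}_${trial_id}"
)

# Values that a single trial might receive (hypothetical).
command = template.substitute(
    batch_size=64, lr=0.01, exp_id="exp1", trial_id="t3"
)
print(command)
```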
MaxCompute
If you select MaxCompute for Job Type, configure the following parameters.
Parameter
Description
Command
The SQL command or the PAI command of a specific Machine Learning Designer component. You must specify ${Custom hyperparameter variables} in the command to configure hyperparameter variables. Example:
pai -name kmeans -project algo_public -DinputTableName=pai_kmeans_test_input -DselectedColNames=f0,f1 -DappendColNames=f0,f1 -DcenterCount=${centerCount} -Dloop=10 -Daccuracy=0.01 -DdistanceType=${distanceType} -DinitCenterMethod=random -Dseed=1 -DmodelName=pai_kmeans_test_output_model_${exp_id}_${trial_id} -DidxTableName=pai_kmeans_test_output_idx_${exp_id}_${trial_id} -DclusterCountTableName=pai_kmeans_test_output_couter_${exp_id}_${trial_id} -DcenterTableName=pai_kmeans_test_output_center_${exp_id}_${trial_id};
In the preceding command, ${centerCount} and ${distanceType} are the hyperparameter variables that you define. For more configuration examples, see Appendix: References in this topic.
Hyperparameter
The hyperparameter list is automatically loaded based on the hyperparameter variables configured in the command. You must specify Constraint Type and Search Space for each hyperparameter.
Constraint Type: the constraint that is imposed on the hyperparameter. You can move the pointer over the icon next to Constraint Type to view the supported constraint types and their descriptions.
Search Space: the value range of the hyperparameter. The method that is used to configure a search space varies based on the constraint type of the hyperparameter. You can click the icon and add values as prompted.
Parameters in the Trial Configuration section
Configure the following parameters to specify how each trial, which runs a job with a specific hyperparameter combination, is evaluated.
Parameter
Description
Metric Type
The type of the metric that is used to evaluate the trial. Valid values:
summary: The final metric values are extracted from the TensorFlow summary file that is obtained from Object Storage Service (OSS).
table: The final metric values are extracted from a MaxCompute table.
stdout: The final metric values are extracted from stdout in the running process.
json: The final metric values are stored in OSS as JSON files.
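For example, a training script can expose its final metric through stdout or a JSON file. The metric name, the print format, and the file name below are illustrative assumptions; check them against the format that your experiment's metric configuration expects.

```python
import json

accuracy = 0.93  # hypothetical final metric computed by the training script

# stdout: print the metric so that it can be extracted from the job output.
print(f"accuracy={accuracy}")

# json: store the metric as a JSON file, for example under an OSS-mounted path.
with open("metric_exp1_t3.json", "w") as f:  # hypothetical file name
    json.dump({"accuracy": accuracy}, f)
```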
Method
The method that is used to calculate the final metric value when multiple intermediate metric values are generated during job execution. Valid values:
final: The last metric value is used as the final metric value of an entire trial.
best: The optimal metric value that is obtained during the job execution process is used as the final metric value of an entire trial.
avg: The average value of all intermediate metric values that are obtained during the job execution process is used as the final metric value of an entire trial.
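For example, if a trial reports intermediate metric values over several epochs, the three methods reduce them as follows. The values are hypothetical, and the sketch assumes a metric for which larger is better.

```python
intermediate = [0.80, 0.91, 0.88]  # hypothetical metric values over epochs

final = intermediate[-1]                      # final: the last reported value
best = max(intermediate)                      # best: the optimal value
avg = sum(intermediate) / len(intermediate)   # avg: the mean of all values

print(final, best, round(avg, 4))
```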
Metric Weight
If you need to optimize multiple metrics at the same time, you can configure the names and weights of the metrics. The system then uses the weighted sum value as the final metric value.
key: the name of a metric. Regular expressions are supported.
value: the weight of a metric.
Note: The weight can be a negative value, and the sum of the weights can be a value other than 1. You can configure custom values.
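The weighted sum can be sketched as follows. The metric names, the regular-expression keys, and the weights are hypothetical; a weight may be negative, and the weights need not sum to 1.

```python
import re

# Hypothetical key/value pairs: metric-name pattern -> weight.
weights = {r"acc.*": 1.0, r"loss": -0.5}

# Metrics reported by one trial (hypothetical).
metrics = {"accuracy": 0.9, "loss": 0.2}

# Final metric: weighted sum over the metrics whose names match a key pattern.
final = sum(
    w * v
    for pattern, w in weights.items()
    for name, v in metrics.items()
    if re.fullmatch(pattern, name)
)
print(final)
```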
Metric Source
The source of the metric.
If you select summary or json from the Metric Type drop-down list, you must specify a file path. Example: oss://examplebucket/examples/search/pai/model/model_${exp_id}_${trial_id}.
If you select table from the Metric Type drop-down list, you must specify an SQL statement that returns a specific result. Example: select GET_JSON_OBJECT(summary, '$.calinhara') as vrc from pai_ft_cluster_evaluation_out_${exp_id}_${trial_id}.
If you select stdout from the Metric Type drop-down list, you must specify a command keyword in the cmdx or cmdx;xxx format. Example: cmd1;worker.
Optimization
The optimization goal that is used to evaluate the trial result. Valid values:
Maximize
Minimize
Model Storage Path
The path where the model is stored. The path must contain ${exp_id}_${trial_id} to distinguish between models that are generated by using different hyperparameter combinations. Example: oss://examplebucket/examples/search/pai/model/model_${exp_id}_${trial_id}.
Parameters in the Search Configurations section
Parameter
Description
Search Algorithm
The automated machine learning algorithm. Based on the hyperparameter search space and the execution results and metrics of completed trials, the system uses this algorithm to find the optimal hyperparameter combination for the next trial. Valid values:
TPE
Random
GridSearch
Evolution
GP
PBT
For more information about the search algorithms, see the "Supported search algorithms" section in Limits and usage notes of AutoML.
Maximum Trials
The maximum number of trials that can be run in the experiment.
Maximum Concurrent Trials
The maximum number of trials that can be concurrently run in the experiment.
Click Submit.
You can view the created experiment in the experiment list.
What to do next
You can view the experiment details at any time to obtain the progress of the experiment. You can view the execution result of each trial to obtain the optimal hyperparameter combination.
You can manage experiments.
Appendix: References
Configuration example for fine-tuning hyperparameters of MaxCompute jobs:
Machine Learning Designer components: K-means Clustering and Clustering Model Evaluation.
The following code shows the configurations of the cmd1 and cmd2 commands that are used for the two components. The two commands are listed in the execution sequence. For the detailed procedure, see Best practice for running the K-means Clustering component.
cmd1
pai -name kmeans -project algo_public -DinputTableName=pai_kmeans_test_input -DselectedColNames=f0,f1 -DappendColNames=f0,f1 -DcenterCount=${centerCount} -Dloop=10 -Daccuracy=0.01 -DdistanceType=${distanceType} -DinitCenterMethod=random -Dseed=1 -DmodelName=pai_kmeans_test_output_model_${exp_id}_${trial_id} -DidxTableName=pai_kmeans_test_output_idx_${exp_id}_${trial_id} -DclusterCountTableName=pai_kmeans_test_output_couter_${exp_id}_${trial_id} -DcenterTableName=pai_kmeans_test_output_center_${exp_id}_${trial_id};
cmd2
PAI -name cluster_evaluation -project algo_public -DinputTableName=pai_cluster_evaluation_test_input -DselectedColNames=f0,f1 -DmodelName=pai_kmeans_test_output_model_${exp_id}_${trial_id} -DoutputTableName=pai_ft_cluster_evaluation_out_${exp_id}_${trial_id};