Background information

AutoML is available only in Machine Learning Studio. This topic describes the algorithms used by AutoML for automatic parameter tuning and how to use AutoML.

Procedure

  1. Log on to the Machine Learning Platform for AI (PAI) console and go to Machine Learning Studio for a project. For more information about how to go to Machine Learning Studio, see Use DataWorks tasks to schedule experiments in Machine Learning Studio.
  2. In the left-side navigation pane, click Experiments and select an experiment.

    The Air Quality Prediction experiment is used in this example.

  3. On the tab that appears, choose Auto ML > Auto Parameter Tuning.
  4. In the Auto Parameter Tuning dialog box, set the Select Algorithm parameter and click Next.
    Note If multiple algorithms are used in the experiment, select only one algorithm.
  5. In the Configure Parameter Tuning step, set the Parameter Tuning Method parameter and click Next.
    PAI provides the following tuning methods:
    • EVOLUTIONARY_OPTIMIZER
      1. Randomly selects A parameter candidate sets. A is the value of the Exploration Samples parameter.
      2. Takes the N parameter candidate sets that have the highest evaluation metrics as the parameter candidate sets for the next iteration.
      3. Explores new parameter sets within a range of R times the standard deviation around these N parameter sets. R is the value of the Convergence Coefficient parameter. The new parameter sets replace the (A - N) parameter sets that had the lowest evaluation metrics in the previous round.
      4. Repeats the preceding exploration for M rounds until the optimal parameter set is found. M is the value of the Explorations parameter.

      Based on the preceding logic, the final number of trained models is calculated by using the following formula: A + (A - N) × M. A minimal sketch of this strategy is provided after this procedure.

      Important In the first iteration, N is A/2 - 1. In each subsequent iteration, N defaults to N/2 - 1 of the previous round. If the result is a decimal number, it is rounded up.
      • Data Splitting Ratio: the ratio of training sets to evaluation sets. Input data sources are divided into training and evaluation sets. A value of 0.7 indicates that 70% of data is used for model training and 30% for evaluation.
      • Exploration Samples: the number of parameter sets explored in each iteration. A larger value produces more accurate results but requires more computation. This parameter must be set to a positive integer in the range of 5 to 30.
      • Explorations: the number of iterations. A larger value produces more accurate results but requires more computation. This parameter must be set to a positive integer in the range of 1 to 10.
      • Convergence Coefficient: the coefficient used to tune the exploration range. A smaller value results in faster convergence, but the optimal parameters may be missed. This parameter must be set to a value in the range of 0.1 to 1.
      • Custom Range: the custom tuning range for a single parameter. If no range is specified for a parameter, the default range of the parameter is used, and this parameter is not included in automatic parameter tuning.
    • RANDOM_SEARCH

      1. Randomly selects a value for each parameter within the parameter range.
      2. Combines the random values into a parameter set for model training.
      3. Repeats the preceding steps for M rounds and then sorts the output models. M is the value of the Iterations parameter. A minimal sketch of this strategy is provided after this procedure.
      • Iterations: the number of searches in the specified range. This parameter must be set to a positive integer in the range of 2 to 50.
      • Data Splitting Ratio: the ratio of training sets to evaluation sets. Input data sources are divided into training and evaluation sets. A value of 0.7 indicates that 70% of data is used for model training and 30% for evaluation.
      • Custom Range: the custom tuning range for a single parameter. If no range is specified for a parameter, the default range of the parameter is used, and this parameter is not included in automatic parameter tuning.
    • GRID_SEARCH

      1. Splits the value range of each parameter into N segments. N is the value of the Grids parameter.
      2. Randomly selects a value from each of the N segments. If M parameters exist, N^M parameter sets can be combined.
      3. Trains N^M models based on the N^M parameter sets and then sorts the models. A minimal sketch of this strategy is provided after this procedure.
      • Grids: the number of segments into which each value range is split. This parameter must be set to a positive integer in the range of 2 to 10.
      • Data Splitting Ratio: the ratio of training sets to evaluation sets. Input data sources are divided into training and evaluation sets. A value of 0.7 indicates that 70% of data is used for model training and 30% for evaluation.
      • Custom Range: the custom tuning range for a single parameter. If no range is specified for a parameter, the default range of the parameter is used, and this parameter is not included in automatic parameter tuning.
    • UserDefine


      Custom Range: The system calculates the score based on all combinations of the parameter values that you specify. If no range is specified for a parameter, the default range of the parameter is used.

    Note In PAI V2.0, the number of parameter tuning algorithms increases from 4 to 7. The following list describes the algorithms.
    • GAUSE: Gaussian processes. A Gaussian process is a Bayesian nonparametric model that is widely used for hyperparameter tuning. The system fits a surrogate model to the observed performance of hyperparameter configurations and uses the model's predictions to guide decision-making, so that appropriate hyperparameters can be selected within a limited number of attempts. A minimal sketch of this idea is provided after this procedure.
    • SAMPLE: An algorithm developed by the PAI team and DAMO Academy. When large amounts of data are processed, this algorithm lets the system estimate the final result that a set of hyperparameters can achieve based on partial data. Combined with the population-based training (PBT) algorithm, it gradually increases the sampling ratio as the number of hyperparameters increases, which enables broader and faster exploration.
    • EVOLUTIONARY_OPTIMIZER: An algorithm developed by the PAI team based on PBT techniques. Parameter tuning is treated as a multi-round iteration that progressively explores an optimal solution. The Exploration Samples parameter specifies the number of samples in each iteration, the Explorations parameter specifies the number of iterations, and the Convergence Coefficient parameter specifies the step size of each iteration. After each iteration, the system discards the exploration samples that produce unsatisfactory results and expands the samples that produce satisfactory results to form the sample set for the next iteration. This process is repeated until all iterations are complete.
    • PBT: An evolutionary algorithm based on the concept of a population. Hyperparameter configurations are treated as a population and the search process as a dynamic environment. In each iteration, the system filters hyperparameter configurations based on the principle of survival of the fittest to obtain the configurations with the best performance. The algorithm is simple, adapts to different data structures, and helps you obtain optimal results in deep learning model training.
    • GRID_SEARCH: The system evenly splits the value range of each tuning parameter into several segments and combines values from the segments into parameter candidate sets for calculation and comparison.
    • RANDOM_SEARCH: The system randomly samples a value from each parameter's search space and combines the samples into parameter candidate sets for calculation and comparison.
    • UserDefine: You define the parameter sets yourself.
  6. In the Configure Model Output step, set model output parameters and click Next.
    • Evaluation Criteria: Valid values are AUC, F1-score, Precision, and Recall.
    • Saved Models: Valid values are 1, 2, 3, 4, and 5. The system ranks models based on the value of the Evaluation Criteria parameter that you set and saves the top-ranked models based on the value of the Saved Models parameter.
    • Pass Down Model: By default, this switch is turned on. If the switch is turned on, the optimal model generated by automatic parameter tuning is passed down to the node of the subsequent component. If the switch is turned off, the model generated by the default parameters of the current component is passed down instead.
  7. After the configuration is complete, click Run in the upper-left corner of the canvas.
    After the preceding configuration is complete, the Auto ML switch of the related algorithm is turned on. You can turn the switch off or on as needed.
  8. Optional: Right-click the model component and select Edit AutoML Parameters to modify AutoML parameters.
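
The following Python sketches illustrate the tuning strategies described in step 5. They are minimal illustrations under assumed objective functions and parameter ranges, not the PAI implementation. This first sketch shows the evolutionary strategy of EVOLUTIONARY_OPTIMIZER: keep the best N parameter sets per round, perturb them within R times the standard deviation to replace the A - N worst sets, and repeat for M rounds. For simplicity, N stays fixed here, whereas PAI shrinks N across iterations as described in the Important note above.

    import random
    import statistics

    def evaluate(params):
        # Placeholder evaluation metric, for example the AUC of a trained model.
        return -((params["x"] - 0.3) ** 2 + (params["y"] - 0.7) ** 2)

    def sample(ranges):
        return {k: random.uniform(lo, hi) for k, (lo, hi) in ranges.items()}

    def evolutionary_search(ranges, A=10, N=4, M=5, R=0.5):
        population = [sample(ranges) for _ in range(A)]   # A = Exploration Samples
        trained = len(population)
        for _ in range(M):                                # M = Explorations
            population.sort(key=evaluate, reverse=True)
            survivors = population[:N]                    # keep the N best sets
            children = []
            for _ in range(A - N):                        # replace the A - N worst sets
                parent = random.choice(survivors)
                child = {}
                for k, (lo, hi) in ranges.items():
                    spread = statistics.pstdev(p[k] for p in survivors) or (hi - lo) * 0.1
                    child[k] = min(hi, max(lo, random.gauss(parent[k], R * spread)))
                children.append(child)
            population = survivors + children
            trained += len(children)
        # The number of trained models matches the formula A + (A - N) * M.
        print("models trained:", trained, "expected:", A + (A - N) * M)
        return max(population, key=evaluate)

    best = evolutionary_search({"x": (0.0, 1.0), "y": (0.0, 1.0)})
    print("best parameter set:", best)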
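
RANDOM_SEARCH can be sketched in a few lines: each of the M iterations draws one random value per parameter, and the resulting candidates are sorted by the evaluation metric. The objective and ranges below are the same illustrative assumptions as above.

    import random

    def random_search(ranges, evaluate, M=20):
        # M = Iterations: draw one random value per parameter in each round.
        candidates = [
            {k: random.uniform(lo, hi) for k, (lo, hi) in ranges.items()}
            for _ in range(M)
        ]
        # Sort the resulting candidates (models) by the evaluation metric.
        return sorted(candidates, key=evaluate, reverse=True)

    ranked = random_search({"x": (0.0, 1.0), "y": (0.0, 1.0)},
                           lambda p: -((p["x"] - 0.3) ** 2 + (p["y"] - 0.7) ** 2))
    print("best candidate:", ranked[0])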
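
GRID_SEARCH splits the value range of each parameter into N segments (N = Grids), draws one random value from every segment, and combines them, which yields N^M candidate sets for M parameters. A minimal sketch under the same assumptions:

    import itertools
    import random

    def grid_candidates(ranges, N=4):
        # N = Grids: split each value range into N segments and pick one
        # random value from every segment.
        per_param = {}
        for k, (lo, hi) in ranges.items():
            step = (hi - lo) / N
            per_param[k] = [random.uniform(lo + i * step, lo + (i + 1) * step)
                            for i in range(N)]
        names = list(per_param)
        combos = [dict(zip(names, values))
                  for values in itertools.product(*(per_param[k] for k in names))]
        assert len(combos) == N ** len(ranges)   # N^M parameter sets for M parameters
        return combos

    candidates = grid_candidates({"x": (0.0, 1.0), "y": (0.0, 1.0)}, N=4)
    print(len(candidates), "candidate sets")     # 4^2 = 16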
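
The GAUSE entry in the algorithm list describes Bayesian optimization with a Gaussian-process surrogate. The sketch below shows the general idea using scikit-learn; the objective, the search range, and the upper-confidence-bound selection rule are assumptions for illustration only.

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor

    def objective(x):
        # Stand-in for the metric of a model trained with hyperparameter x.
        return -(x - 0.3) ** 2

    # Hyperparameter values tried so far and their observed metrics.
    X = np.array([[0.1], [0.5], [0.9]])
    y = np.array([objective(v[0]) for v in X])

    for _ in range(10):
        # Fit the surrogate model to the observations made so far.
        gp = GaussianProcessRegressor().fit(X, y)
        grid = np.linspace(0.0, 1.0, 101).reshape(-1, 1)
        mean, std = gp.predict(grid, return_std=True)
        # Pick the next candidate with an upper-confidence-bound rule.
        nxt = grid[np.argmax(mean + 1.96 * std)]
        X = np.vstack([X, [nxt]])
        y = np.append(y, objective(nxt[0]))

    print("best hyperparameter found:", float(X[np.argmax(y)][0]))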

Result

Model output:

  1. During parameter tuning, right-click the model component and select Parameter Tuning Details.
  2. In the AutoML-Parameter Tuning Details dialog box, click the Indicator Data tab to view the tuning progress and the status of each model.
  3. Sort candidate models by metric as needed. The metrics include AUC, F1 Score, PRECISION, and RECALL. A minimal sketch of this ranking logic is provided after this list.
  4. In the View Details column, click Log or Parameters to view the logs or parameters of each candidate model.
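
The ranking used in the Configure Model Output step and in the sorting above is a plain sort-and-slice by the chosen metric. A minimal sketch, assuming hypothetical candidate models represented as dictionaries of metrics:

    # Hypothetical candidate models with their evaluation metrics.
    candidates = [
        {"name": "model_1", "AUC": 0.83, "F1-score": 0.71},
        {"name": "model_2", "AUC": 0.88, "F1-score": 0.69},
        {"name": "model_3", "AUC": 0.79, "F1-score": 0.75},
    ]

    criterion = "AUC"    # Evaluation Criteria
    saved_models = 2     # Saved Models

    # Rank by the chosen criterion and keep only the top-ranked models.
    top = sorted(candidates, key=lambda m: m[criterion], reverse=True)[:saved_models]
    print([m["name"] for m in top])  # ['model_2', 'model_1']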

Display of parameter tuning effects:

You can view the growth trend of the evaluation metrics of updated parameters in the Hyperparameter Iteration Result Comparison chart.

Model storage:

  1. In the left-side navigation pane, click Models.
  2. Click Experiment Models.
  3. Click the required experiment folder to view the model saved by using AutoML.