
Databricks Data Insight Open Course - Use Databricks + MLflow to Train and Deploy Machine Learning Models

This article describes how to use Databricks and MLflow to build a machine learning lifecycle management platform, covering the entire process from data preparation and model training to parameter and performance metric tracking and model deployment.

By Li Jingui (Development Engineer of Alibaba Cloud Open-Source Big Data Platform)

Pain Points That MLflow Solves

[Figure 1]

MLflow addresses three pain points in the machine learning workflow:

  1. Machine learning experiments are difficult to track. Machine learning algorithms have a large number of configurable parameters, so it is hard to record which combination of parameters, code version, and data produced a specific result.
  2. The results of machine learning experiments are difficult to reproduce. There is no standard way to package the environment, so even with the same code, parameters, and data, results can differ because they also depend on the libraries (and library versions) used.
  3. There is no standard way to manage the model lifecycle. Algorithm teams create large numbers of models that need to be managed on a central platform: their metadata (such as each version's stage and annotations), the code, data, and parameters that produced them, and their performance metrics. There is also no uniform way to deploy these models.

MLflow was created to solve these pain points in the machine learning workflow. Through simple APIs, it covers the whole process of experiment parameter tracking, environment packaging, model management, and model deployment.

The First Core Feature of MLflow: MLflow Tracking

[Figure 2]

MLflow Tracking records experiment parameters, model performance metrics, and arbitrary files associated with a model. Machine learning experiments usually require recording parameter configurations and performance metrics, and MLflow saves users from having to do this manually. It can record parameters, metrics, and any file, including models, images, and source code.

As shown in the code on the left side of the preceding figure, a run is started with MLflow's start_run. log_param records the model's parameter configuration, log_metric records the model's performance metrics (both scalar and vector metrics), log_model records the trained model, and log_artifact records any file you want to keep; in the figure, for example, it records the source code.
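The code in the figure is not reproduced here, but a minimal sketch of those calls, using an illustrative scikit-learn ElasticNet model and hypothetical file names, might look like this:

```python
# Minimal tracking sketch: the dataset, model, and file names are illustrative.
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet

X, y = make_regression(n_samples=100, n_features=4, random_state=42)

with mlflow.start_run():                        # start an experiment run
    alpha, l1_ratio = 0.5, 0.1
    model = ElasticNet(alpha=alpha, l1_ratio=l1_ratio).fit(X, y)

    mlflow.log_param("alpha", alpha)            # record parameter configuration
    mlflow.log_param("l1_ratio", l1_ratio)
    mlflow.log_metric("r2", model.score(X, y))  # record a performance metric
    mlflow.sklearn.log_model(model, "model")    # record the trained model
    mlflow.log_artifact("train.py")             # record any file, e.g. the source code
                                                # (assumes this script is saved as train.py)
```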

The Second Core Feature of MLflow: MLflow Project

[Figure 3]

MLflow Project packages training code according to a standard layout and specifies the execution environment, entry point, and parameters so that experimental results can be reproduced. This standard packaging also makes it easier to share code and migrate between platforms.

As shown in the preceding figure, the MLflow training project contains two important files: conda.yaml and MLproject. The conda.yaml file defines the project's runtime environment, listing all the libraries it depends on and their versions. The MLproject file points to that runtime environment (conda.yaml) and defines the entry point, that is, how the project is run; the entry point also declares the runtime parameters, in this case alpha and l1_ratio.
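The actual files only appear in the figure; the sketch below is an illustrative reconstruction, where the project name, library versions, and training script are assumptions rather than the article's files:

```yaml
# MLproject (illustrative sketch)
name: mlflow-training
conda_env: conda.yaml            # runtime environment definition
entry_points:
  main:                          # how to run the project
    parameters:
      alpha: {type: float, default: 0.5}
      l1_ratio: {type: float, default: 0.1}
    command: "python train.py --alpha {alpha} --l1_ratio {l1_ratio}"
```

```yaml
# conda.yaml (illustrative sketch)
name: mlflow-training
channels:
  - conda-forge
dependencies:
  - python=3.8
  - scikit-learn
  - pip
  - pip:
      - mlflow
```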

In addition, MLflow provides command-line tools that make it easy to run MLflow projects. For example, if the project is packaged and pushed to a Git repository, the user only needs to run the mlflow run command to execute the project, passing the alpha parameter with -P.
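As a hedged illustration (the repository URI and parameter values are placeholders), the command would look roughly like `mlflow run <git-repo-uri> -P alpha=0.5`, and the same thing can be done programmatically through MLflow's Python API:

```python
# Illustrative sketch: run a packaged MLflow project straight from a Git repository.
# The repository URI and parameter values are placeholders.
import mlflow

submitted_run = mlflow.run(
    uri="https://github.com/<your-org>/<your-mlflow-project>.git",  # repo containing MLproject
    parameters={"alpha": 0.5, "l1_ratio": 0.1},                     # same as passing -P alpha=0.5
)
print(submitted_run.run_id)  # the tracking run created for this execution
```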

The Third Core Feature of MLflow: MLflow Models

[Figure 4]

MLflow Models packages, records, and deploys models from multiple algorithm frameworks in a unified manner. After a model is trained, you can record it with MLflow's log_model, and MLflow automatically stores the model locally or on OSS. On the MLflow web UI, you can then see which code version, parameters, and metrics the model is associated with, as well as the model's storage path.
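As a hedged illustration of that unified format (the run ID and input columns below are placeholders), a logged model can be loaded back through MLflow's generic pyfunc flavor and used for prediction regardless of the framework that produced it:

```python
# Illustrative sketch: load a model recorded with log_model and run a prediction.
# The run ID and feature values are placeholders.
import mlflow.pyfunc
import pandas as pd

model_uri = "runs:/<run_id>/model"           # run ID and artifact path of the logged model
model = mlflow.pyfunc.load_model(model_uri)  # same call for any framework flavor
print(model.predict(pd.DataFrame({"x1": [1.0], "x2": [2.0]})))
```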

In addition, MLflow provides an API to deploy models. After deploying a model with mlflow models serve, you can call it through a REST API to obtain predictions.
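For example, assuming the model has been deployed locally with something like `mlflow models serve -m <model_uri> -p 1234` (the port here is an assumption), a minimal Python client might look like the sketch below. The exact JSON payload format depends on the MLflow version; the pandas "split" orientation shown follows the MLflow 1.x convention:

```python
# Illustrative sketch: call a model served by `mlflow models serve` over REST.
# The port, column names, and payload format are assumptions (MLflow 1.x style).
import requests

payload = {
    "columns": ["x1", "x2"],   # feature names expected by the model
    "data": [[1.0, 2.0]],      # one row of input features
}
resp = requests.post(
    "http://127.0.0.1:1234/invocations",
    json=payload,
    headers={"Content-Type": "application/json; format=pandas-split"},
)
print(resp.json())             # predictions returned by the scoring server
```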

The Fourth Core Feature of MLflow: MLflow Model Registry

[Figure 5]

MLflow stores models and provides a web UI to manage them. The main page shows each model's versions and their stages, and a model's detail page shows its description, tags, and schema. Tags can be used to label and search for models, while the schema describes the format of the model's inputs and outputs. In addition, MLflow records the relationship between a model and the environment, code, and parameters that produced it, namely, the model's lineage.
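A hedged sketch of the corresponding registry calls follows; the model name, run ID, description, and stage are illustrative, and the Model Registry requires a database-backed tracking server:

```python
# Illustrative sketch: register a logged model and move a version through stages.
# The model name and run ID are placeholders.
import mlflow
from mlflow.tracking import MlflowClient

# Register the model artifact of an existing run under a registry name.
result = mlflow.register_model("runs:/<run_id>/model", "demo-elasticnet")

client = MlflowClient()
# Attach a free-form description to this version.
client.update_model_version(
    name="demo-elasticnet",
    version=result.version,
    description="ElasticNet trained in the tracking example",
)
# Promote the version to the Staging stage.
client.transition_model_version_stage(
    name="demo-elasticnet",
    version=result.version,
    stage="Staging",
)
```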

Together, the four core features of MLflow address the pain points in the machine learning workflow. They can be grouped into three aspects:

  1. MLflow Tracking solves the problem that machine learning experiments are difficult to track.
  2. MLflow Project solves the problem that there is no standard way to package the environment, which makes experimental results difficult to reproduce.
  3. MLflow Models and the Model Registry solve the problem that there is no standard way to manage the model lifecycle.

Demo

Next, I will describe how to use MLflow and Databricks Data Insight (DDI) to build a machine learning platform that manages the full model lifecycle.

[Figure 6]

As the architecture diagram shows, the main components are a DDI cluster, OSS, and an ECS instance. The DDI cluster runs the machine learning training. An ECS instance is needed to host the MLflow tracking server, which provides the web UI, and MySQL must be installed on that ECS instance to store metadata such as training parameters, performance metrics, and tags. OSS stores the training data, models, and source code.
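The sketch below shows, under stated assumptions, how these pieces might be wired together: the server command appears in the comment, and the client-side code points training jobs on the DDI cluster at the tracking server. The host, port, credentials, bucket path, and the OSS artifact-store plugin are all assumptions, not values from the demo:

```python
# Illustrative sketch: connect training code to the MLflow tracking server on ECS.
# The tracking server itself would be started on the ECS instance roughly like:
#   mlflow server \
#     --backend-store-uri mysql+pymysql://<user>:<password>@localhost/mlflow \
#     --default-artifact-root oss://<bucket>/mlflow-artifacts \
#     --host 0.0.0.0
# (Writing artifacts to OSS assumes an OSS artifact-store plugin is installed.)
import mlflow

mlflow.set_tracking_uri("http://<ecs-public-ip>:5000")  # tracking server on the ECS instance
mlflow.set_experiment("ddi-demo")                        # runs recorded in MySQL, artifacts in OSS

with mlflow.start_run():
    mlflow.log_param("alpha", 0.5)    # placeholder parameter
    mlflow.log_metric("rmse", 0.78)   # placeholder metric
```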

Please watch the demonstration video (only available in Chinese):
https://developer.aliyun.com/live/248988
