This article focuses on the machine learning (ML) pipeline component of machine/deep learning infrastructure operations (MLOps).
This article covers the following topics:
- The definition of the requirements for ML pipelines in the production environment
- A Serverless ML pipeline solution on GitHub based on Alibaba Cloud Serverless Workflow (FnF) and Function Compute (FC)
- Guidance from the GitHub project on combining FC with Alibaba Cloud Container Service for Kubernetes (ACK), including task triggering, deployment of prediction/inference services, and exposing those services
- Analysis and comparison of this solution with similar solutions. The Serverless ML pipeline can improve R&D efficiency, reduce O&M costs, and help ML generate value faster.
- Discussion of the selection of ML infrastructure, where FC can complement Kubernetes clusters.
As the commercial value of machine/deep learning has grown, the supporting software technologies have evolved rapidly. Concepts such as training, models, algorithms, prediction, and inference, along with software frameworks such as Spark MLlib and TensorFlow, are referenced frequently. A Jupyter notebook on a local machine can call TensorFlow to train on tens of thousands of images, and after continuous parameter tuning, the output model produces accurate inference/prediction results. A figure in the NIPS paper Hidden Technical Debt in Machine Learning Systems captures one point very accurately: in the process of generating commercial value, the MLOps workload, that is, the development of everything surrounding machine learning, is much larger than the core machine learning development itself.

MLOps in the production environment varies by business scenario and involves many modules, more than can be explained in one article. This article focuses on one of them, the machine learning pipeline. It introduces how to use Alibaba Cloud Serverless services to improve R&D and O&M efficiency and automatically convert algorithms into trained models so that, after testing and approval, prediction/inference in the production environment finally generates business value.
Note: This is a figure from Hidden Technical Debt in Machine Learning Systems.
Scenario Abstraction and Problem Definition
As shown in the following figure, the complex MLOps process is abstracted and simplified into a closed feedback loop covering algorithm development, model building, training, serving, testing and approval, and release.
Although the logic is clear and simple, the pipeline system must meet the following requirements before being used in the production environment:
Support for Long-Term Execution: Model training takes anywhere from minutes to hours (sometimes longer), depending on the data volume and algorithm execution time.
Process Visualization: The technology stack used by ML engineers and data scientists differs from that used by DevOps engineers. Visualization decouples the logic description from implementation details.
State Observability: ML engineers, data scientists, and DevOps engineers need to communicate, coordinate, and cooperate across the steps of the pipeline, and they need visibility into its progress.
Rich Description Capability: Compared with a CI/CD pipeline, an ML pipeline has more flexible and complex business logic. For example, a step's results may determine whether to proceed to the next step or loop back to a previous one.
No Event Loss: Pipeline events, such as code pushes, builds, and the start and end of deployments, must still be received and processed in the face of machine crashes or process exceptions.
Step Retry: Any task in the pipeline can fail randomly. Retrying failed steps improves the success rate of the entire process.
State Persistence: After a machine crash or process exception, re-execution of the pipeline can continue from the latest successful step rather than restarting from the beginning.
High Availability, Low Latency, and Extensibility: These are standard requirements and need no further explanation.
Cost-Effectiveness: Pipeline orchestration should account for a low proportion of total costs compared with computing resources.
Low O&M Cost: The pipeline system itself should require minimal deployment and maintenance effort compared with computing resources.
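FnF provides several of these guarantees (step retry, state persistence) natively. For intuition only, here is a minimal Python sketch, not FnF's implementation, of how step retry and a persisted checkpoint interact so that a re-executed pipeline resumes from the last successful step:

```python
import json
import os
import tempfile

def run_pipeline(steps, state_path, max_retries=3):
    """Execute named steps in order, persisting progress after each
    success so a crashed run resumes from the last completed step."""
    done = []
    if os.path.exists(state_path):
        with open(state_path) as f:
            done = json.load(f)  # checkpoint from a previous run
    for name, fn in steps:
        if name in done:
            continue  # already completed earlier; skip on re-execution
        for attempt in range(max_retries):
            try:
                fn()
                break  # step succeeded
            except Exception:
                if attempt == max_retries - 1:
                    raise  # retries exhausted; surface the failure
        done.append(name)
        with open(state_path, "w") as f:
            json.dump(done, f)  # persist progress after every step
    return done
```

A production workflow engine additionally persists step inputs/outputs and survives orchestrator crashes, which is exactly what FnF's managed state machine provides.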
The concept of a pipeline and the requirements above are not novel. Some open-source solutions are also widely used, such as Jenkins, which is common in CI/CD systems, the workflow engine Airflow, and Uber's Cadence. However, there is no dominant ML pipeline solution on Alibaba Cloud. This article introduces a solution that combines Alibaba Cloud Serverless services with ACK.
This article assumes that ACK clusters, or self-built Kubernetes clusters based on Elastic Compute Service (ECS), are used for training and prediction/inference. The Fashion MNIST dataset is used for training. The prediction service accepts an image and returns the corresponding prediction result, such as clothes, hats, or shoes. Training and serving use different Docker images. FnF and FC are used for pipeline orchestration. Both are Serverless cloud services: fully hosted, maintenance-free, billed by usage, and able to scale without limits. The pipeline logic diagram is below:
1. Image building in Alibaba Cloud Container Registry (ACR) is triggered by algorithm changes, parameter adjustments, or other code modifications.
2. After the image is built successfully, the configured webhook fires the FC HTTP trigger, and the function starts the Serverless Workflow (FnF) execution.
3. FnF calls an FC function to send a request to the Kubernetes apiserver in the user VPC to create a training job. The apiserver config is obtained through the DescribeClusterUserKubeconfig interface of ACK. After model training completes, the model file is uploaded to OSS.
4. FnF calls an FC function to send a request to the Kubernetes apiserver to create a deployment. The container downloads the model from OSS and listens on local port 8501 for prediction/inference requests. At this point, the serving service cannot be accessed from outside the cluster.
5. FnF calls an FC function to send a request to the Kubernetes apiserver to create a service with spec type LoadBalancer. Based on the spec, Kubernetes creates an SLB instance accessible from the public network and exposes an Internet IP to accept HTTP requests.
6. FnF calls an FC function to send an HTTP RESTful API call to the serving service to check the prediction accuracy of the model. After the test completes, manual approval is initiated.
7. If the manual approval passes, the solution continues to deploy the model to the production environment by repeating Steps 3, 4, and 5.
- Complete the deployment according to the README instructions of the GitHub awesome-fnf/machine-learning-pipeline project.
- Execute the process from the Serverless Workflow console. See the GitHub README.md for the "input". Click the following explainer video about the pipeline effect (in Chinese):
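The FC functions in Steps 3 through 5 essentially render Kubernetes manifests and submit them to the apiserver. The following Python sketch shows roughly what two of those manifests could look like; all names, labels, ports, and environment variables here are illustrative assumptions, not the project's actual spec:

```python
def training_job_manifest(image, model_bucket):
    """Kubernetes Job spec an FC function could POST to the apiserver
    to run one training task (Step 3). Names/env vars are illustrative."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": "fashion-mnist-train"},
        "spec": {
            "backoffLimit": 2,  # let Kubernetes retry a failed pod
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [{
                        "name": "train",
                        "image": image,
                        # Where the container uploads the trained model
                        "env": [{"name": "MODEL_BUCKET",
                                 "value": model_bucket}],
                    }],
                }
            },
        },
    }

def serving_service_manifest(port=8501):
    """Service of type LoadBalancer (Step 5) exposing the serving
    container's port 8501 through an SLB instance."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": "fashion-mnist-serving"},
        "spec": {
            "type": "LoadBalancer",  # ACK provisions an Internet-facing SLB
            "selector": {"app": "fashion-mnist-serving"},
            "ports": [{"port": 80, "targetPort": port}],
        },
    }
```

In the real solution, these dicts would be serialized and sent to the apiserver endpoint obtained from the DescribeClusterUserKubeconfig response.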
As mentioned earlier, FC and FnF are used to coordinate the ML pipeline. What are the advantages of FC and FnF compared with existing open-source workflow engines? The answer lies in the capabilities required by ML pipelines. The following table compares the capabilities listed in the "Scenario Abstraction and Problem Definition" section:
Compared with open-source workflow/pipeline solutions, the FnF and FC solution meets more of the requirements for ML pipelines in the production environment mentioned above. It also has the following outstanding features:
High Availability, Low Latency, and Unlimited Horizontal Extensibility
Native Integration With Alibaba Cloud Services: FnF is the first Alibaba Cloud product focused on workflow orchestration. Through native integration with services such as FC and MNS, it covers orchestration of almost all Alibaba Cloud products. With its flexible FDL description language, it accommodates a variety of simple and complex workflow scenarios.
Cost-Effectiveness: No dedicated pipeline service cluster sits idle; billing is based on actual usage, which maximizes resource utilization.
Low O&M Cost: The services are fully hosted and free of O&M, maximizing R&D and delivery efficiency. Machine learning teams already operate substantial computing resources, such as Kubernetes clusters and GPU instances, which carry heavy O&M pressure. Running the ML pipeline system on Serverless services separates it from the primary computing resources. This not only simplifies the deployment of Kubernetes clusters but also eliminates O&M for the pipeline system, leaving more time for model training and algorithm optimization.
The Combination of Flexible Description and Process Visualization: The ML pipeline described above is intentionally simplified; process logic in real-world scenarios is more complex. For example, if a step fails, the system may need to automatically roll back to a previous step. Compared with a DAG, the FnF FDL description is more flexible and can adapt to more complex pipeline scenarios. The built-in visualization feature reduces the difficulty of process development and enhances observability during operation.
In the solution described in this article, the training and prediction stages are implemented through jobs and deployments on Kubernetes. Kubernetes clusters hold relatively large amounts of resources and commonly suffer from low utilization. One advantage of the FnF and FC solution is its extremely high flexibility, as shown below:
FC Prediction Service: For users with strict cost-control and resource-utilization requirements, prediction services can be exposed through the FC HTTP trigger. This reduces the number of Kubernetes cluster nodes needed and makes the prediction service Serverless.
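As a rough illustration of this pattern, here is a minimal Python sketch of an FC HTTP-trigger handler for prediction. The WSGI-style entrypoint follows FC's Python HTTP trigger convention, but the `classify` logic is a placeholder assumption; a real function would load the trained model (for example, from OSS at cold start) and run inference:

```python
import json

FASHION_MNIST_LABELS = ["t-shirt", "trouser", "pullover", "dress", "coat",
                        "sandal", "shirt", "sneaker", "bag", "ankle-boot"]

def classify(payload):
    """Placeholder for model inference: picks a label from the input
    length instead of running a real model. Illustrative only."""
    return FASHION_MNIST_LABELS[len(payload.get("image", ""))
                                % len(FASHION_MNIST_LABELS)]

def handler(environ, start_response):
    """FC HTTP-trigger entrypoint (WSGI-style): read the JSON request
    body, classify it, and return a JSON response."""
    try:
        size = int(environ.get("CONTENT_LENGTH", 0) or 0)
    except ValueError:
        size = 0
    body = environ["wsgi.input"].read(size) if size else b"{}"
    result = {"label": classify(json.loads(body))}
    data = json.dumps(result).encode()
    start_response("200 OK", [("Content-Type", "application/json")])
    return [data]
```

Because FC scales the handler per request and bills per invocation, an idle prediction service costs nothing, which is the cost advantage described above.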
FC Model Training: If model training is fast and requires no GPU resources, training tasks can also run on FC, making the ML pipeline Serverless end to end.
FC Data Flush and ETL Preprocessing: In scenarios requiring large-scale data flushing or map-reduce operations, the FnF and FC solution performs data flush and ETL preprocessing efficiently and reliably.
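A map-reduce style data flush typically splits the work into chunks, fans each chunk out to one FC invocation (for example, through an FnF parallel or foreach step), and merges the partial results. The helpers below sketch the split and merge sides in Python; the chunking granularity and count-merging logic are illustrative assumptions:

```python
def split_into_chunks(keys, chunk_size):
    """Map phase: partition a list of object keys (e.g. OSS objects)
    so each chunk can be handled by one FC invocation."""
    return [keys[i:i + chunk_size] for i in range(0, len(keys), chunk_size)]

def reduce_counts(partials):
    """Reduce phase: merge per-chunk record counts returned by the
    mapper invocations into a single summary."""
    total = {}
    for part in partials:
        for key, count in part.items():
            total[key] = total.get(key, 0) + count
    return total
```

In an FnF flow, the split output would feed a foreach step that invokes one FC function per chunk, and the reduce step would run once all mappers report back.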
Heterogeneous Computing Resources: For the training steps in FnF, FC can also launch Elastic Container Instance (ECI) tasks or submit training jobs to Alibaba Cloud Machine Learning Platform for PAI.
More usage patterns are waiting to be explored by developers.