Binpack Scheduling for Batch Tasks
Why is the Binpack feature needed?
By default, Kubernetes schedules with the LeastRequestedPriority strategy: the node consuming the fewest resources is scheduled first, so that resource usage is spread relatively evenly across all nodes. However, this strategy tends to leave resource fragments on individual nodes.
Let's take a simple example to illustrate the problem. As shown in the figure below, resources are used evenly across nodes, so each node uses 3 GPU cards and 1 GPU is left free on each of the two nodes. Now a new job requesting 2 GPUs is submitted to the scheduler; scheduling fails because no single node can provide enough resources.
In this situation, each node has one GPU card that is idle but unusable, wasting expensive GPU resources. With the Binpack scheduling strategy, one node's resources are filled up before the next node is considered, which resolves the fragmentation shown in the figure above: the job requesting 2 GPUs is scheduled normally, improving the cluster's resource utilization.
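The fragmentation effect described above can be reproduced with a small simulation (a toy model of the example, not scheduler code: two nodes with 4 GPUs each, six 1-GPU pods, then a 2-GPU job):

```python
# Toy simulation of the example above: two nodes with 4 GPUs each.
# Under a spreading policy the six 1-GPU pods alternate between nodes,
# leaving 1 free GPU on each node, so a 2-GPU job fails even though
# 2 GPUs are free cluster-wide. Under binpack the job fits.

def place(free, request, policy):
    """Pick a node index with enough free GPUs, or None if nothing fits."""
    candidates = [i for i, f in enumerate(free) if f >= request]
    if not candidates:
        return None
    # spread prefers the emptiest node; binpack prefers the fullest feasible node
    key = (lambda i: -free[i]) if policy == "spread" else (lambda i: free[i])
    return min(candidates, key=key)

def run(policy):
    free = [4, 4]                      # 2 nodes x 4 GPUs
    for _ in range(6):                 # six 1-GPU pods
        free[place(free, 1, policy)] -= 1
    return place(free, 2, policy)      # now try a 2-GPU job

print(run("spread"))   # None: 1 GPU left on each node, the job cannot fit
print(run("binpack"))  # a node index: one node was filled, the other has 2 free
```

Under spread the cluster ends at [1, 1] free GPUs and the 2-GPU request fails; under binpack it ends at [0, 2] and the request succeeds.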
Implementation
Binpack is implemented as the RequestedToCapacityRatio Score plug-in of the Kubernetes Scheduler Framework, which scores nodes in the scoring (prioritization) phase according to a user-defined configuration. The implementation can be divided into two parts: building the scoring function, and scoring.
Building the scoring function
Building the scoring function is straightforward: the user defines the score that corresponds to each resource utilization ratio, thereby influencing scheduling decisions.
For example, if the user configures a score of 0 at 0% resource utilization and a score of 10 at 100% utilization, then the higher the utilization, the higher the score. This is Binpack's resource allocation behavior.
The user can instead configure a score of 10 at 0% utilization and a score of 0 at 100% utilization, so that the lower the utilization, the higher the score. This is the spreading allocation behavior.
Beyond two points, users can add more, and the relationship need not be linear. For example, with a score of 8 at 50% utilization, the function is split into two intervals: 0-50 and 50-100.
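The shape described above is a piecewise-linear function over (utilization, score) points. A minimal sketch of how such a shape is evaluated (illustrative only; the real plug-in implements this in Go inside the scheduler):

```python
def shape_score(points, utilization):
    """Piecewise-linear interpolation over sorted (utilization, score) pairs,
    mirroring the plug-in's shape configuration."""
    if utilization <= points[0][0]:
        return points[0][1]
    for (u0, s0), (u1, s1) in zip(points, points[1:]):
        if utilization <= u1:
            return s0 + (s1 - s0) * (utilization - u0) / (u1 - u0)
    return points[-1][1]

binpack = [(0, 0), (100, 10)]           # higher utilization -> higher score
spread  = [(0, 10), (100, 0)]           # lower utilization  -> higher score
custom  = [(0, 0), (50, 8), (100, 10)]  # non-linear: intervals 0-50 and 50-100

print(shape_score(binpack, 75))   # 7.5
print(shape_score(spread, 75))    # 2.5
print(shape_score(custom, 75))    # 9.0
```

With the same 75% utilization, the binpack shape rewards the fuller node while the spread shape penalizes it, which is exactly how the configured points steer the scheduler's preference.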
Scoring
Users define which resources, and with what weights, are considered in the Binpack calculation; for example, they can configure only the GPU and CPU resources and their weights.
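For reference, a configuration of this kind looks roughly like the following KubeSchedulerConfiguration fragment. Treat it as a sketch: the exact API version and field layout depend on your Kubernetes release, and the resource names and weights here are illustrative.

```yaml
apiVersion: kubescheduler.config.k8s.io/v1beta1
kind: KubeSchedulerConfiguration
profiles:
- pluginConfig:
  - name: RequestedToCapacityRatio
    args:
      shape:                 # the scoring function built in the previous section
      - utilization: 0
        score: 0
      - utilization: 100
        score: 10
      resources:             # only these resources enter the calculation
      - name: nvidia.com/gpu
        weight: 5
      - name: cpu
        weight: 1
```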
During scoring, each configured resource's utilization is computed as (pod.Request + node.Allocated) / node.Total, and that utilization is fed into the scoring function above to obtain the resource's score. Finally, the per-resource scores are combined according to their weights to produce the final node score.
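Putting the two steps together, the per-node score can be sketched as follows (a simplified, self-contained model with the linear binpack shape baked in; the resource names and weights are hypothetical):

```python
# Simplified model of the node-scoring step: for each configured resource,
# utilization = (pod request + already-allocated) / node capacity, mapped
# through the binpack shape (0% -> 0 points, 100% -> 10 points), then the
# per-resource scores are combined as a weighted average.

def node_score(pod_request, node_allocated, node_total, weights):
    total_weight = sum(weights.values())
    score = 0.0
    for res, weight in weights.items():
        utilization = 100 * (pod_request[res] + node_allocated[res]) / node_total[res]
        score += weight * (utilization / 10)  # linear shape: u=100 -> 10 points
    return score / total_weight

weights = {"gpu": 2, "cpu": 1}  # hypothetical weights: GPU matters twice as much
pod = {"gpu": 2, "cpu": 4}
# node A is already half full, node B is emptier
node_a = node_score(pod, {"gpu": 2, "cpu": 8}, {"gpu": 4, "cpu": 16}, weights)
node_b = node_score(pod, {"gpu": 0, "cpu": 4}, {"gpu": 4, "cpu": 16}, weights)
print(node_a > node_b)  # True: the fuller node scores higher under binpack
```

Because the binpack shape rewards high utilization, the node that would end up fuller after placing the pod wins, which is the packing behavior the article describes.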
Using Binpack
Demo
Next, we will demonstrate the effect of Binpack by running a distributed TensorFlow job. The test cluster has two GPU machines with 4 cards each.
Deploy tf-operator in an existing Kubernetes cluster through Kubeflow's Arena.
Arena is a sub-project of Kubeflow, an open-source community for Kubernetes-based machine learning systems. Through a command line and SDK, Arena supports the main lifecycle management of machine learning tasks (environment installation, data preparation, model development, model training, model prediction, and so on), effectively improving data scientists' productivity.
Check whether the deployment succeeded.
Postscript
Above, we introduced how to use Kubernetes' native RequestedToCapacityRatio scheduling policy to support Binpack scheduling, reduce resource fragmentation, and improve GPU utilization. It is simple to use, but the effect is obvious. Continuing the topic of improving GPU resource utilization, the next article in this series will show how GPU shared scheduling for inference services can greatly improve GPU utilization.