Learning about Distributed Systems - Part 6: Saving Costs through Resource Scheduling

Part 6 of this series discusses how to save costs through resource scheduling.

Disclaimer: This is a translated work of Qinxia's 漫谈分布式系统. All rights are reserved to the original author.

Section 1

As mentioned in the previous two articles on distributed storage, we need not only to store the data but also to do so cost-effectively.

Massive amounts of data mean massive costs. The same applies to computing: it is not enough to compute fast, and we also need to squeeze as much as possible out of the computing resources to save costs.

Unlike distributed storage engines, which focus on how data is stored, distributed computing frameworks mostly try to optimize resource scheduling, because there is limited room to optimize the computing logic itself, and much of that is up to the application layer. This led to the creation of a series of resource managers, such as Google's Borg, Kubernetes (K8s), Apache YARN, and Apache Mesos.

Section 2

In the past, each company had its own dedicated servers and clusters for different businesses. This is easy to understand, as each business hopes to operate independently without interference from other programs and security hazards.

However, business load changes dynamically. There are regular peaks and troughs, as well as irregular growth and decline of the business, and frequent scaling out and in follows. To reduce the impact of this scaling on the business, a margin of resources is generally reserved.

The so-called margin is, to put it another way, waste.

Therefore, how to use these wasted resources flexibly and in a timely manner, and reduce the maintenance cost of scaling along the way, is the main problem that the various resource managers set out to solve.

Section 3

The most intuitive fix is to co-locate workloads. Machines are no longer exclusive to a certain business but shared by everyone. When Jack's business is not using its resources, Harry can use them, and vice versa. This way, the overall utilization rate goes up.

This is called multi-tenancy.

How do multiple tenants share resources? If everyone is submitting their tasks, how should resource scheduling be done?

The simplest way is first-in-first-out (FIFO) queuing. Once the earlier tasks finish and release their resources, those resources are allocated to the tasks that follow.

The benefit of multi-tenancy is higher overall resource utilization, which is good for everyone as a whole. The disadvantage is that it sacrifices the individual: any tenant may end up waiting behind someone else's tasks, so in the end all of them are affected.

How can we guarantee the resource needs of individuals?
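To make the FIFO idea concrete, here is a minimal simulation sketch. It assumes a single resource dimension and tasks described as hypothetical `(name, demand, duration)` tuples; the function name `fifo_schedule` is made up for illustration and does not come from any real scheduler.

```python
import heapq
from collections import deque

def fifo_schedule(tasks, capacity):
    """Simulate strict FIFO scheduling on one resource dimension.

    tasks: list of (name, demand, duration) in submission order (illustrative format).
    Returns a dict mapping each task name to the time it starts.
    """
    pending = deque(tasks)
    running = []          # min-heap of (finish_time, demand)
    free = capacity
    now = 0
    start_times = {}
    while pending:
        name, demand, duration = pending[0]
        if demand > capacity:
            raise ValueError(f"task {name} can never fit in the cluster")
        if demand <= free:
            # The head of the queue fits: start it now.
            pending.popleft()
            free -= demand
            start_times[name] = now
            heapq.heappush(running, (now + duration, demand))
        else:
            # The head does not fit: wait for the next running task to finish.
            finish, released = heapq.heappop(running)
            now = finish
            free += released
    return start_times

print(fifo_schedule([("a", 8, 5), ("b", 4, 1), ("c", 2, 1)], capacity=10))
# {'a': 0, 'b': 5, 'c': 5}
```

Note that task `c` would have fit at time 0, but it has to wait behind `b`. This head-of-line blocking is one simple way to see how a shared queue can hurt an individual tenant.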

Section 4

There is only one way to guarantee the needs of individuals: isolation.

Therefore, various resource scheduling systems proposed concepts (such as pool and queue) to allocate resources logically. You can set a quota of computing resources for each pool and only allow a certain business to use this pool. Nested and multi-layered pools also facilitate isolation within the business.

After isolation, resources can no longer be shared, and the overall utilization rate drops.

Therefore, isolation cannot be hard, only soft. The quota of each pool is dynamically balanced and supports borrowing and returning.
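A rough sketch of soft isolation with borrowing could look like the following. It assumes one resource dimension and a hypothetical `allocate` helper; real schedulers are far more elaborate, so treat this only as an illustration of the idea.

```python
def allocate(quotas, demands):
    """Soft isolation sketch: guarantee each pool up to its quota,
    then lend whatever is idle to pools that still want more.

    quotas, demands: dicts of pool name -> amount (illustrative, single resource).
    """
    # Step 1: every pool gets what it asks for, up to its own quota.
    alloc = {pool: min(demands.get(pool, 0), quota) for pool, quota in quotas.items()}
    # Step 2: quota that nobody is using can be borrowed.
    idle = sum(quotas.values()) - sum(alloc.values())
    for pool in quotas:
        extra = min(demands.get(pool, 0) - alloc[pool], idle)
        if extra > 0:
            alloc[pool] += extra
            idle -= extra
    return alloc

print(allocate({"a": 50, "b": 50}, {"a": 80, "b": 20}))
# {'a': 80, 'b': 20}
```

With hard isolation, pool `a` would be stuck at 50 while 30 units sat idle in pool `b`. With borrowing, `a` can use them until `b` needs them back.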

If the borrowed share is not returned for a long time (such as a resource being borrowed by a program that runs for three days), should we wait for three days?

So, preemption comes into play. “Excuse me, I can't wait that long, so please return the part that exceeds your quota immediately.” To avoid killing a large number of tasks, preemption is only triggered when a certain percentage and/or amount of time is exceeded.

A pool can also be given a hard upper limit on resources: no matter how much it borrows, it can never use more than that cap.

There are various schedulers (such as Capacity Scheduler and Fair Scheduler). After learning from each other and improving, they have gradually converged.
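The trigger condition can be sketched as follows. Everything here is an assumption made for illustration: the `PoolState` fields, the `min_share` and `timeout` parameters, and the choice to require both conditions at once are not taken from any particular scheduler.

```python
from dataclasses import dataclass

@dataclass
class PoolState:
    quota: int
    usage: int
    starved_seconds: float = 0.0  # how long the pool has been waiting below its quota

def plan_preemption(pools, starved_name, min_share=0.5, timeout=60.0):
    """Return {pool: amount to reclaim} for the starved pool,
    or {} if preemption should not be triggered yet."""
    starved = pools[starved_name]
    below_threshold = starved.usage < min_share * starved.quota
    waited_too_long = starved.starved_seconds >= timeout
    if not (below_threshold and waited_too_long):
        return {}
    needed = starved.quota - starved.usage
    plan = {}
    for name, pool in pools.items():
        if name == starved_name or needed <= 0:
            continue
        over = pool.usage - pool.quota  # only the part above quota is reclaimable
        if over > 0:
            take = min(over, needed)
            plan[name] = take
            needed -= take
    return plan

pools = {
    "a": PoolState(quota=50, usage=20, starved_seconds=120),
    "b": PoolState(quota=50, usage=80),
}
print(plan_preemption(pools, "a"))
# {'b': 30}
```

Only the borrowed share above a pool's quota is reclaimed, which matches the "return the part that exceeds your quota" idea, and a pool that has waited only briefly does not trigger anything.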

  • For example, Fair Scheduler also supports weighted quotas, so fairness becomes relative rather than absolute.
  • For example, the fairness strategy has been improved from prioritizing tasks with low memory usage to a Dominant Resource Fairness (DRF) strategy that considers both memory and CPU.
  • Another example is how Google's Borg supports priorities so that streaming and batch workloads can be co-located.
  • Yet another example is using different allocation strategies on weekdays and weekends, and staggering business peaks, to utilize resources effectively.

All of them are trade-offs between improving the overall utilization rate and guaranteeing individual quotas.
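As an illustration of the DRF idea mentioned in the list above, here is a simplified sketch: repeatedly give one more task to the user whose dominant share (their largest share of any single resource) is currently the smallest. The function name and data layout are assumptions for this example, and every user is assumed to run identical tasks.

```python
def drf_allocate(capacity, demands):
    """Simplified Dominant Resource Fairness sketch.

    capacity: dict of resource -> total amount in the cluster.
    demands:  dict of user -> dict of resource -> amount needed per task.
    Returns a dict of user -> number of tasks launched.
    """
    for user, need in demands.items():
        if all(need[r] == 0 for r in capacity):
            raise ValueError(f"user {user} must demand some resource")
    used = {r: 0 for r in capacity}
    tasks = {user: 0 for user in demands}
    active = set(demands)

    def dominant_share(user):
        # Largest fraction of any one resource currently held by this user.
        return max(tasks[user] * demands[user][r] / capacity[r] for r in capacity)

    while active:
        user = min(sorted(active), key=dominant_share)
        need = demands[user]
        if all(used[r] + need[r] <= capacity[r] for r in capacity):
            for r in capacity:
                used[r] += need[r]
            tasks[user] += 1
        else:
            active.remove(user)  # this user's next task no longer fits
    return tasks

capacity = {"cpu": 9, "mem": 18}
demands = {"A": {"cpu": 1, "mem": 4}, "B": {"cpu": 3, "mem": 1}}
print(drf_allocate(capacity, demands))
# {'A': 3, 'B': 2}
```

In this example, user A is memory-heavy and user B is CPU-heavy. Both end up with a dominant share of two thirds (A holds 12 of 18 GB of memory, B holds 6 of 9 CPUs), which is the sense in which DRF is "fair" across more than one resource type.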

Section 5

The solutions mentioned above are presented as ideas, deliberately leaving out the architecture and internals of any specific framework. Those are implementation details and will be covered later when necessary. The process of thinking through a design and solving the problem is more important than the implementation.

Everything so far has been technical. But how should all these mechanisms and parameters actually be set?

Let's take quotas as an example.

This is a problem of resource allocation. Just like resource allocation in real life, it needs convincing rules. Then, either all the businesses discuss and reach an agreement (which is usually hard), or the boss decides.

If you are in an infrastructure team (such as a data platform), remember that you are only an administrator of resources: you have the responsibility to manage them, not the right to allocate them.

You have to explain the rules, use technical means to ensure the overall utilization, and provide various indicators to help with decision-making. Don't overstep and put yourself in the line of fire.

Below is a brief list of some indicators worthy of attention and reference. (This list is incomplete, just for reference.)

  • The overall computing resource utilization trend of the platform
  • The trend of the number of running and pending tasks on the platform as a whole
  • The total amount of computing resources used by each business/queue, broken down into CPU and memory or expressed as a single unified compute unit, as Alibaba Cloud does (better still, convert it into a monetary cost in RMB, which is more impactful)
  • Distribution of computing resource utilization ratio by business/queue
  • Trends in the number of running and pending tasks in each business/queue
  • Amount and frequency of resources that each business/queue preempts from others and has preempted by others

Section 6

Here are two past examples from the system we operate to illustrate how to use some indicators to guide the allocation of resources.

The following figure captures the distribution of the overall utilization rate of each queue over a certain time interval.


The figure above shows that most of the queues had a usage rate of less than 50% most of the time, indicating that the quota settings at that time were inappropriate and that idle resources should be allocated to the busier queues.

The following figure shows the details of computing resource usage on some queues. The metric is actual resource usage divided by resource quota.


The figure shows that the first queue is constantly short of resources and, through preemption, uses twice its quota, while the last queue uses only a little over half of its quota on average. It is precisely because queues like the last one leave part of their quota unused that the first queue is able to obtain extra resources.

However, preemption lags behind demand and is constrained by the hard cap, so overall utilization stays low. Therefore, the solution should be to reduce the quota for the last queue and increase the quota for the first queue.

In summary, from the first graph, we know that the allocation of resources is unreasonable: in general, some queues do not have enough while others have more than they need. Furthermore, from the second graph, we know which queues are short of resources and which have a surplus, so we know how to adjust the quotas.

All that remains is to apply this to our own business scenarios.
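One naive way to turn that adjustment into numbers is sketched below: compute the usage-to-quota ratio for each queue, then propose new quotas proportional to observed average usage while keeping the total unchanged. The queue names and figures are invented for illustration, and proportional rebalancing is only one possible policy; in practice you would likely keep some headroom and agree on the rule with the businesses first.

```python
def usage_ratios(quotas, avg_usage):
    """Actual resource usage divided by resource quota, per queue."""
    return {queue: avg_usage[queue] / quotas[queue] for queue in quotas}

def rebalance_quotas(quotas, avg_usage):
    """Propose new quotas proportional to average usage, keeping the total quota fixed."""
    total_quota = sum(quotas.values())
    total_usage = sum(avg_usage.values())
    return {queue: total_quota * avg_usage[queue] / total_usage for queue in quotas}

# Hypothetical numbers: "first" runs at twice its quota, "last" at half.
quotas = {"first": 100, "middle": 200, "last": 100}
avg_usage = {"first": 200, "middle": 150, "last": 50}

print(usage_ratios(quotas, avg_usage))
# {'first': 2.0, 'middle': 0.75, 'last': 0.5}
print(rebalance_quotas(quotas, avg_usage))
# {'first': 200.0, 'middle': 150.0, 'last': 50.0}
```

Once the quotas match real demand, the first queue no longer has to depend on lagging preemption to get the resources it needs.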


  • Distributed computing frameworks try to optimize the scheduling of resources to save costs.
  • The way to improve the overall utilization of resources is to co-locate workloads and stagger their peaks.
  • Co-located workloads interfere with each other, so resource isolation is required.
  • Isolation limits resource sharing, so soft isolation with borrowing is required.
  • Borrowed resources cannot be held indefinitely, so preemption must be supported.
  • Resource borrowing requires a capacity limit, so a hard cap must be set.
  • There is no absolute fairness. Weighted fairness and DRF are both compromises and improvements.
  • The platform department's role should be that of an administrator. It does not have the right to allocate resources but should provide sufficient indicators to assist in decision-making.


In summary, after reading the recent articles, you should have a basic understanding of distributed storage engines and distributed computing frameworks.

You should understand why there is a distributed system, how the distributed system is designed, and how to use the distributed system to save costs.

However, there is no free lunch in the world, and with benefits come problems. Using a distributed system is far from a hands-free matter of simply calling APIs.

In the following articles, we will take a look at a series of problems introduced by the distributed system and how to solve them. Stay tuned.

This is a carefully conceived series of 20-30 articles. I hope to give everyone a core grasp of the distributed system in a story-telling way. Stay tuned for the next one!
