Five Capabilities Accelerate Enterprise FinOps Progress

Cloud native technology, cost reduction, and efficiency enhancement

In 2020, the COVID-19 pandemic swept across the world. Large numbers of enterprises suspended operations, factories halted production, and supply chains were disrupted, dealing a huge blow to the global economy. 65% of enterprises began to consider strengthening their IT capabilities by moving to the cloud, to guard against other systemic risks that may arise in the future. Cloud native technology, currently the most advanced way to adopt the cloud, has become the preferred choice for most enterprises undertaking IT transformation.

According to the "Cloud Native Comes of Age" survey published by the consulting firm Capgemini in 2020, only 15% of enterprises were building new applications in a cloud native environment, but this proportion was expected to rise to 52% over the following three years. The report defines enterprises that deploy more than 20% of their applications in cloud native environments as leaders. How do these leaders view cloud native technology?

Among the enterprises surveyed, 87% said that cloud native technology improved efficiency and reduced costs, 84% said that it enabled a better customer experience, and 80% said that the time needed to launch new products and services had decreased significantly.

Yet in the 2021 CNCF "FinOps Kubernetes Report", 68% of respondents said that their enterprise's computing resource costs increased after migrating to Kubernetes, and 36% said that costs rose by more than 20%. Most leading enterprises recognize cloud native technology as a way to reduce costs and improve efficiency. Even so, many enterprises still face numerous obstacles during cloud native transformation, and some end up paying more. Why does cloud native adoption so often fall short of the ideal?

Starting from a real case

Raymond is the head of the IT platform at an internet e-commerce company. Over the past two years, he has led his team in migrating all of the company's businesses to a cloud native architecture. His reasons for choosing cloud native technology as the platform architecture were simple:

- Microservices, containers, DevOps, and other cloud native technologies can deliver and maintain different types of applications in a uniform way, which reduces management costs.
- Automated build and delivery pipelines speed up research and development.
- Containers let applications share resources and scale flexibly, which reduces waste.
- Co-locating different types of applications and letting them preempt one another's resources can push cluster utilization even higher.

Raymond's team is responsible for the stable operation of the company's five major platforms. Raymond divided the business into three clusters based on business characteristics, ease of operation and maintenance, security levels, and cost:

• Cluster A - main site/transcoding cluster

The main site has high stability requirements. The cluster is planned mainly around static node pools, with scheduled scaling that adds capacity ahead of business peaks. During daytime off-peak hours, spare cluster capacity is time-shared with co-located transcoding jobs, which improves resource efficiency.

• Cluster B - live streaming/big data cluster

Live streaming and big data services share a cluster because both consume significant computing resources per unit of time. This applies to ad hoc queries on the data lake, to live streaming sessions, and to big data ETL jobs alike. The capacity both services need is also highly unpredictable, so both are best suited to highly elastic scenarios.

• Cluster C - Micro merchant cluster

Micro merchant businesses run in a separate cluster mainly for security, so that tenant data stays isolated from business data. An independent cluster also makes cost accounting easier.

Raymond is a very senior expert in the cloud native field, and his technology choices, cluster splits, and optimization strategies were hard to fault. In the first month after the business went cloud native, everything seemed to be heading toward the expected results: stable and efficient.

"Last month's expenses increased by 70%?" Raymond muttered to himself after receiving the latest bill, wondering what the problem was?

Difficulties in enterprise cloud native IT cost governance

Previously, Raymond's team used a traditional, mature, static model of enterprise IT cost governance. Its cycle is usually monthly or quarterly. The model proceeds in four stages: resource planning, cost estimation, cost budgeting, and cost control. IT assets are then purchased according to the plan to meet the enterprise's cost governance goals.

The advantage of this model is that each governance cycle produces a fixed cost budget, which is very convenient for IT asset management. The disadvantage is equally obvious. When business capacity changes frequently, cost estimates can deviate significantly from reality, and the result is a large amount of waste.

Cloud native technology commonly reduces costs and improves efficiency through intelligent scheduling, elastic scaling, co-location, and time-sharing preemption. These methods essentially turn exclusive use of resources into sharing, and static resource supply into dynamic supply. Adopting any new technology inevitably reshapes the architecture of existing systems. The dynamic nature of cloud native architecture often breaks an enterprise's traditional IT cost management system, and cost management can then spin out of control. Once that happens, every optimization strategy becomes a tree without roots.

Raymond tried to trace the problem through the bill and found a monthly bill with hundreds of pages of line items. From those details it was almost impossible to tell which applications and departments had caused the abnormal expenses. Almost everyone responsible for a cloud native architecture must overcome the same challenge Raymond faced.

So, what causes the difficulty of enterprise cloud native IT cost governance?

• Differences in the lifecycle of business units and billing units

In traditional enterprise IT cost governance models, business units map cleanly onto billing units. For example, a portal site might consist of two ECS instances, an SLB access-layer gateway, and an RDS database. Its business units and billing units correspond one-to-one, so the bill is the cost.

In a cloud native scenario, the picture changes. When an application is deployed in a container cluster such as Kubernetes, all resources are pooled, and the smallest unit in which the business is measured is the Pod. The lifecycle of a Pod does not match the lifecycle of the node that actually generates the bill. In most scenarios, each redeployment reschedules the application's Pods onto other nodes. As a result, the business unit and the billing unit no longer correspond one-to-one in any of the three dimensions of logic, space, and time.

Because of this mismatch, business departments struggle to get concrete numbers when they want to measure, plan, and budget for a given business.
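A rescheduled Pod shows why per-node bills do not map onto a business. The sketch below is a hypothetical illustration, not Alibaba Cloud's model. It assumes that each node's cost is split by requested CPU cores. The node names and per-core-hour prices are invented.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Placement:
    """One contiguous interval during which a Pod ran on a single node."""
    node: str
    start_h: float      # hours since the start of the billing period
    end_h: float        # hours since the start of the billing period
    cpu_request: float  # cores requested by the Pod


def pod_period_cost(placements: List[Placement],
                    node_core_hour_price: Dict[str, float]) -> float:
    """Cost of one Pod over a billing period, summed across every node it ran on.

    Hypothetical rule: each node's bill is split by requested cores, so a Pod
    pays cpu_request * hours * that node's per-core-hour price.
    """
    total = 0.0
    for p in placements:
        hours = p.end_h - p.start_h
        total += p.cpu_request * hours * node_core_hour_price[p.node]
    return total


# The same Pod before and after a redeployment moved it to another node.
history = [
    Placement(node="node-a", start_h=0, end_h=10, cpu_request=2),
    Placement(node="node-b", start_h=10, end_h=24, cpu_request=2),
]
prices = {"node-a": 0.05, "node-b": 0.08}
print(round(pod_period_cost(history, prices), 2))
```

The business sees one Pod, while the bill shows two nodes that each carry part of its cost. Both nodes also carry costs from other Pods. Only time-sliced allocation like this reconnects the two views.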

• Conflicts between dynamic resource delivery and static capacity planning

In traditional enterprise IT governance models, planning and budgeting are linked to resource delivery statically. Business departments submit budgets on monthly, quarterly, or annual cycles, and the IT department then handles procurement and allocation centrally. Static capacity planning wastes resources. To address this, containers adopt technologies such as elastic scaling and control capacity costs through dynamic resource delivery.

However, dynamic resource delivery can introduce new cost traps in production. Traditional static planning relies mostly on monthly subscription billing. Dynamic delivery mixes several billing methods, such as monthly subscription and pay-as-you-go. Some scenarios also introduce special payment options such as savings plans, reserved instance coupons, and spot (bidding) instances. The unit price of a monthly subscription is roughly 30-50% of the pay-as-you-go price. If the proportion of dynamically delivered resources is poorly chosen, IT costs can be wasted on a large scale.
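A small calculation shows how the billing methods interact. The sketch below is hypothetical. It assumes that a subscription core-hour costs 40% of a pay-as-you-go core-hour, which falls within the 30-50% range cited above, and it uses an invented demand curve. It searches for the subscription baseline that minimizes total cost.

```python
from typing import List


def capacity_cost(demand: List[int], subscribed: int,
                  sub_price: float, payg_price: float) -> float:
    """Total cost of serving an hourly core demand with a subscription baseline.

    Subscribed cores are paid for every hour whether used or not; any demand
    above the baseline is bought pay-as-you-go for just the hours it is needed.
    """
    hours = len(demand)
    baseline_cost = subscribed * hours * sub_price
    overflow_core_hours = sum(max(0, d - subscribed) for d in demand)
    return baseline_cost + overflow_core_hours * payg_price


def best_subscription(demand: List[int], sub_price: float,
                      payg_price: float) -> int:
    """Brute-force the subscribed core count that minimizes total cost."""
    return min(range(max(demand) + 1),
               key=lambda s: capacity_cost(demand, s, sub_price, payg_price))


# Hypothetical day: 20 quiet hours at 10 cores, a 4-hour peak at 50 cores.
demand = [10] * 20 + [50] * 4
PAYG, SUB = 1.0, 0.4  # subscription assumed to cost 40% of pay-as-you-go
for s in (0, 10, 50):
    print(s, capacity_cost(demand, s, SUB, PAYG))
print("best:", best_subscription(demand, SUB, PAYG))
```

Under these assumptions, two options both cost more than the alternative: sizing the subscription for the peak, and buying everything pay-as-you-go. The cheaper choice is a subscription baseline matched to the quiet hours, with pay-as-you-go covering the peaks. This is why the proportion of dynamically delivered resources matters.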

In traditional static capacity planning, budgeting and procurement happen in a single step, so IT cost governance does not need to track cost trends. Once dynamic resource delivery is used extensively, enterprise IT administrators must watch cost trends as well as overall cost changes. In some scenarios, they also need to forecast expenses so that an unexpected large-scale expansion does not push the cluster's spending over budget.
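A trend prediction does not need to be sophisticated to be useful. This minimal sketch assumes that daily cost totals are already available; the numbers and function names are hypothetical. It fits a least-squares line to the month so far, extrapolates that line to month end, and compares the projection with the budget.

```python
from typing import List, Tuple


def linear_trend(ys: List[float]) -> Tuple[float, float]:
    """Least-squares fit y = intercept + slope * day, with day = 0, 1, 2, ..."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two days of cost data")
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(range(n), ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def projected_month_total(daily_costs: List[float], days_in_month: int) -> float:
    """Actual spend so far plus the trend line extrapolated to month end."""
    slope, intercept = linear_trend(daily_costs)
    future_days = range(len(daily_costs), days_in_month)
    return sum(daily_costs) + sum(intercept + slope * d for d in future_days)


def budget_alarm(daily_costs: List[float], days_in_month: int,
                 budget: float) -> bool:
    """True if the projected monthly total would exceed the budget."""
    return projected_month_total(daily_costs, days_in_month) > budget


costs = [100.0, 110.0, 120.0, 130.0, 140.0]  # hypothetical first five days
print(projected_month_total(costs, days_in_month=10))
print(budget_alarm(costs, days_in_month=10, budget=1200.0))
```

A production forecaster would also account for weekly seasonality and planned scaling events. A straight line is simply the smallest baseline that still catches a steady upward drift before the bill arrives.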

• Adaptation of enterprise IT cost governance models to cloud native architectures

In terms of cost control, traditional IT cost governance models focus mainly on efficiency: raising machine utilization lowers the cost of the next capacity planning phase. In cloud native IT cost governance, efficiency gains and cost reductions happen at the same time. Enterprises can raise resource utilization by adjusting resource quotas based on monitoring and intelligent recommendations. They can cut resource costs through elastic scaling and dynamic resource delivery. Doing both at once greatly shortens the governance cycle. It also places higher demands on budget management, quota management, cost trend prediction, and cost trend alerting.

• Side effects of misusing inappropriate cost optimization schemes

Optimization in traditional IT cost governance models is relatively simple and is usually guided by indicators such as resource utilization. In cloud native scenarios, new optimization methods keep appearing. Every one of them challenges the stability of the existing architecture, for example:

• When using elastic scaling, you need to consider:
  - how well the scaling sensitivity matches business traffic peaks
  - whether the business shuts down gracefully when scaling in
  - whether scaling could create a cost black hole, meaning large-scale resource waste caused by abnormal events, such as excessive CDN usage triggered by a DDoS attack

• When using elastic provisioning for big data, you need to consider:
  - whether idle resources already in the cluster could be reused
  - whether temporary data jobs run so long that their billing method becomes unsuitable
  - whether resource utilization during elastic provisioning meets expectations
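One way to guard against a cost black hole is to alert on sudden surges in spending. The sketch below illustrates the idea under several assumptions; it is not a product feature. It flags any hour whose cost exceeds the trailing 24-hour mean by three standard deviations. The window, the threshold, and the data are all hypothetical.

```python
from statistics import mean, pstdev
from typing import List


def cost_spikes(hourly_costs: List[float], window: int = 24,
                k: float = 3.0) -> List[int]:
    """Indices of hours whose cost exceeds the trailing mean by k standard deviations.

    Each hour is compared only with the `window` hours before it, so a sudden
    surge (for example CDN traffic from a DDoS attack) stands out against the
    recent baseline.
    """
    spikes = []
    for i in range(window, len(hourly_costs)):
        baseline = hourly_costs[i - window:i]
        mu, sigma = mean(baseline), pstdev(baseline)
        # Small floor so a perfectly flat baseline does not flag rounding noise.
        if hourly_costs[i] > mu + k * max(sigma, 1e-6):
            spikes.append(i)
    return spikes


costs = [10.0] * 24 + [10.0, 50.0, 10.0]  # hypothetical hourly spend
print(cost_spikes(costs))
```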

In essence, optimization in cloud native scenarios exploits the dynamic nature of scheduling and resources. It uses workload shifting, time sharing, preemption, and scaling to raise resource utilization and to lower the overall cluster water level or total core-hour cost. Most optimizations target specific domain scenarios. Before adopting a cloud native IT cost optimization scheme, enterprises need to weigh the risks introduced by the architectural changes against the expected benefits of the scheme.

Every enterprise must overcome these four obstacles when implementing IT cost management during cloud native transformation. They slow the pace of cloud native transformation and trouble many cloud native technology leaders like Raymond. Cloud native IT cost governance solutions have emerged to address them.

Alibaba Cloud's Enterprise Cloud Native IT Cost Governance Solution

Alibaba Cloud Container Service ranks first alongside AWS, making Alibaba Cloud the cloud service provider with the most complete container product portfolio in the world. Alibaba Group began promoting cloud native technology internally as early as 2006. Sixteen years of cloud native practice have given Alibaba Cloud a deep understanding of the technology, which it uses to help enterprises carry out their IT transformation.

In recent years, as enterprises have accelerated their move to the cloud, more and more of them have discussed and adopted the concept of cloud financial management (FinOps). FinOps is a cloud operating model that combines systems, best practices, and culture to improve an organization's understanding of cloud costs. It brings financial accountability to cloud spending so that teams can make informed business decisions. FinOps also strengthens collaboration among IT, engineering, finance, procurement, and the business. It enables IT to evolve into a service organization focused on using cloud technology to add value to the business. Where cloud native technology and FinOps intersect, the concept of cloud native IT cost governance (Cloud Native FinOps) emerges as the evolution of FinOps for cloud native scenarios.

Alibaba Cloud Container Service has launched an enterprise cloud native IT cost governance solution that helps enterprises manage, visualize, and optimize IT costs in cloud native scenarios. The solution has five core functions:

Core function 1: Unique cloud native container scenario cost allocation and estimation model

In container scenarios, the lifecycles of business units and billing units do not match. To resolve this, Container Service proposes a cost estimation model that combines billing and metering. The model takes into account three groups of factors:

- Cost strategies: payment types, savings plans, vouchers, user discounts, and spot price fluctuations.
- Allocation factors: CPU, memory, GPU cards, GPU memory, and so on.
- Resource forms: ECS, ECI, and HPC.

With these factors, the model estimates costs at the Pod level and allocates the cluster's costs as shares. Bill analysis first aggregates all resource costs of the cluster for a billing period. Pod-level cost allocation is then applied to those aggregated costs, which yields a complete cost allocation and estimation model for cloud native container scenarios.
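Pod-level cost allocation can be sketched as splitting each node's effective price according to the resources each Pod requests. This is a simplified, hypothetical model, not the Container Service implementation. It considers only CPU and memory requests, assumes that discounts are already included in the node price, and treats a 50/50 CPU/memory weighting as an adjustable allocation factor.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Node:
    hourly_price: float  # effective price after discounts and savings plans
    cpu: float           # allocatable cores
    mem_gib: float       # allocatable memory, GiB


@dataclass
class Pod:
    name: str
    cpu_request: float
    mem_request_gib: float


def pod_hourly_cost(pod: Pod, node: Node, cpu_weight: float = 0.5) -> float:
    """Pod's share of the node price, weighting CPU and memory shares.

    cpu_weight is a hypothetical allocation factor; memory gets the rest.
    """
    share = (cpu_weight * pod.cpu_request / node.cpu
             + (1.0 - cpu_weight) * pod.mem_request_gib / node.mem_gib)
    return share * node.hourly_price


def allocate_node(node: Node, pods: List[Pod],
                  cpu_weight: float = 0.5) -> Tuple[dict, float]:
    """Split one node's hourly price among its Pods; the remainder is idle cost."""
    costs = {p.name: pod_hourly_cost(p, node, cpu_weight) for p in pods}
    idle = node.hourly_price - sum(costs.values())
    return costs, idle


node = Node(hourly_price=1.0, cpu=4, mem_gib=16)
pods = [Pod("web", 1, 8), Pod("etl", 2, 4)]
print(allocate_node(node, pods))
```

Reporting the unallocated remainder separately as idle cost keeps the Pod costs and the node bill reconcilable. That property is what lets a share-based model be checked against the actual bill.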

Core function 2: Multi-dimensional cost insight, trend prediction, and root cause drill-down

Cost insights are available in four dimensions: cluster, namespace, node pool, and application (label wildcard matching).

• The cluster dimension focuses on the distribution of cloud resources, changes in resource costs, the ratio of cluster water level to waste, and cluster cost trends and predictions. It helps IT administrators track cost consumption accurately and avoid exceeding the budget.

• The namespace dimension focuses on cost allocation. It supports short-term cost estimation and long-term cost allocation. It also supports correlation analysis of scheduling water levels, resource usage, and cost trends. This helps department administrators estimate costs, drill down into waste, and improve their departments' resource utilization.

• The node pool dimension focuses on resource cost planning and governance. It correlates instance types, per-core-hour prices, scheduling water levels, and utilization water levels, which helps IT asset managers optimize resource portfolios and payment strategies.

• The application (label wildcard matching) dimension focuses on cost optimization in domain scenarios such as big data, AI, offline jobs, online applications, and other upper-layer application scenarios. It supports real-time cost estimation and task-level cost accounting.
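Once costs exist at the Pod level, each of the four dimensions amounts to a different group-by key over the same records. The sketch below assumes a hypothetical record layout. Python's shell-style `fnmatchcase` stands in for label wildcard matching, and the product's actual matching rules may differ.

```python
from collections import defaultdict
from fnmatch import fnmatchcase
from typing import Callable, Dict, List

# Hypothetical per-Pod cost records, e.g. produced by a Pod-level allocation model.
records: List[dict] = [
    {"namespace": "live", "node_pool": "spot", "labels": {"app": "spark-etl-1"}, "cost": 3.0},
    {"namespace": "live", "node_pool": "payg", "labels": {"app": "spark-etl-2"}, "cost": 2.0},
    {"namespace": "shop", "node_pool": "payg", "labels": {"app": "web"}, "cost": 5.0},
]


def cost_by(records: List[dict], key: Callable[[dict], str]) -> Dict[str, float]:
    """Aggregate cost along any dimension (namespace, node pool, ...)."""
    totals: Dict[str, float] = defaultdict(float)
    for r in records:
        totals[key(r)] += r["cost"]
    return dict(totals)


def cost_for_label(records: List[dict], label: str, pattern: str) -> float:
    """Total cost of Pods whose label value matches a shell-style wildcard."""
    return sum(r["cost"] for r in records
               if fnmatchcase(r["labels"].get(label, ""), pattern))


print(cost_by(records, lambda r: r["namespace"]))
print(cost_by(records, lambda r: r["node_pool"]))
print(cost_for_label(records, "app", "spark-*"))
```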

These four dimensions of cost insight supply the data behind cost optimization features and solutions in every scenario. Cost reduction and efficiency gains can therefore rest on solid evidence.

Core function 3: Full-scenario cost optimization capabilities and solution coverage

For actual business scenarios of different enterprises, Alibaba Cloud Container Service provides full scenario resource portrait creation, cost optimization capabilities, and solutions (see the end of the article for details):

• Elastic scaling

• Co-location (mixed deployment)

• Intelligent resource profiling

• Cloud native big data/AI

• Cloud native workflow

Most enterprise cost optimization strategies depend on specific business scenarios, and many of those scenarios involve customization and secondary development. For this reason, the solution's cost insights are completely decoupled from the upper-layer optimization solutions. Any cost optimization method, in any scenario, can be measured and evaluated through the four dimensions of cost insight.

Core function 4: Cost management across multi-cluster, multi-cloud, and hybrid cloud environments

Multi-cloud is an emerging trend in how enterprises use the cloud. Billing models differ significantly across cloud vendors:

- Domestic cloud service providers commonly use monthly subscriptions.
- International providers commonly use credit card withholding or post-payment.
- Some providers also support savings plans and reserved instances.

All of this makes cost analysis harder for the cloud management plane. The solution provides a unified interface for billing and price queries, with default implementations for cloud service providers. Through this interface it ingests cost data from mainstream cloud service providers and from self-built IDC data centers. It then manages all of these costs through the same cost allocation and estimation model used for cloud native container scenarios. Combined with the enterprise-grade distributed cloud container platform ACK One (Alibaba Cloud Distributed Cloud Container Platform), it provides a unified control plane for multi-cloud management and asset management.
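A unified billing interface usually comes down to one adapter per provider, with each adapter normalizing bills into a shared record shape. The classes below are a hypothetical sketch, not Alibaba Cloud's interface. The names are invented, currency conversion is assumed to happen upstream, and only two billing styles are modeled.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CostRecord:
    """Provider-neutral cost line, in one common currency (assumed converted upstream)."""
    source: str
    resource_id: str
    hours: float
    cost: float


class BillingSource(ABC):
    """Hypothetical adapter interface: one implementation per cloud or IDC."""

    @abstractmethod
    def fetch(self) -> List[CostRecord]:
        ...


class SubscriptionSource(BillingSource):
    """Monthly prepaid resources: one record per resource covering the whole month."""

    def __init__(self, name: str, items: List[Tuple[str, float]], hours: float = 720):
        self.name, self.items, self.hours = name, items, hours

    def fetch(self) -> List[CostRecord]:
        return [CostRecord(self.name, rid, self.hours, fee) for rid, fee in self.items]


class PayAsYouGoSource(BillingSource):
    """Post-paid resources billed as hours used times an hourly price."""

    def __init__(self, name: str, items: List[Tuple[str, float, float]]):
        self.name, self.items = name, items

    def fetch(self) -> List[CostRecord]:
        return [CostRecord(self.name, rid, h, h * price) for rid, h, price in self.items]


def total_cost(sources: List[BillingSource]) -> float:
    """Sum normalized costs across every provider adapter."""
    return sum(r.cost for s in sources for r in s.fetch())


sources = [
    SubscriptionSource("cloud-a", [("i-001", 360.0)]),
    PayAsYouGoSource("cloud-b", [("vm-001", 100, 0.5)]),
]
print(total_cost(sources))
```

After records share one shape, the same Pod-level allocation and dimension analysis can run on them regardless of where the bill came from.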

Core Function 5: Expert Services for Enterprise Cloud Native IT Cost Governance

Enterprise cloud native IT cost governance is not only a product capability or a solution. It is also an evolution of enterprise IT management, organizational processes, and culture in the cloud native era. Through Alibaba Cloud Asset Manager, the Alibaba Cloud Container Service team and the Alibaba Cloud space-based team jointly provide complete products and expert services that cover the FinOps concept.

Alibaba Cloud Asset Manager has been evaluated under China's "Common Maturity Model for Cloud Resource Oriented Financial Operation Capability". It helps enterprises implement cost process governance, cost insight, cost optimization, and cost operations. With it, enterprises can build an overall cloud native IT cost platform and accelerate IT innovation and IT decision-making after full cloud migration.

Back to the real case

How can the enterprise cloud native IT cost governance solution in Alibaba Cloud Container Service resolve Raymond's dilemma?

Step 1: Raymond first uses cluster-level cost analysis to compare each cluster's cost trend with its budget. This comparison gives him a preliminary view of where the cost anomaly lies.

The cluster cost data show that most of the waste is in Cluster B, so the drill-down analysis focuses on Cluster B.

Step 2: View the cost composition of the cluster to determine the optimization direction and the drill-down strategy.

In this cluster, computing resources make up most of the cost, so the drill-down should focus on resource utilization and unit price.

Step 3: View the resource utilization and unit price cost of the cluster

The cluster's scheduling water level has reached 78%. This is a fairly healthy figure that leaves some room for further scheduling without excessive waste. Actual resource utilization, however, is only 3%, which means resources have been allocated but not used. In addition, one node pool contains spot instances, and its per-core-hour price is close to the pay-as-you-go price. This indicates that the chosen spot instance types are unsuitable and have pushed the price per core-hour too high.
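Both signals in this step can be expressed as simple ratios. In the sketch below, utilization is measured against allocatable capacity, and a spot pool is flagged when its per-core-hour price reaches 80% of the pay-as-you-go price. Both choices are illustrative assumptions, as are all the numbers except the 78% and 3% quoted above.

```python
from typing import List, Tuple


def cluster_levels(requested: float, used: float, allocatable: float) -> Tuple[float, float]:
    """Scheduling water level (requests/allocatable) and utilization (usage/allocatable)."""
    return requested / allocatable, used / allocatable


def pricey_spot_pools(pools: List[Tuple[str, float, float]],
                      payg_core_hour_price: float,
                      threshold: float = 0.8) -> List[str]:
    """Names of spot pools whose cost per core-hour reaches `threshold` x pay-as-you-go."""
    flagged = []
    for name, cost, core_hours in pools:
        if cost / core_hours >= threshold * payg_core_hour_price:
            flagged.append(name)
    return flagged


sched, util = cluster_levels(requested=78, used=3, allocatable=100)
print(f"scheduling level {sched:.0%}, utilization {util:.0%}")
pools = [("spot-a", 30.0, 1000.0), ("spot-b", 95.0, 1000.0)]
print(pricey_spot_pools(pools, payg_core_hour_price=0.1))
```

A large gap between the two levels points to over-requested resources. A spot pool priced near pay-as-you-go points to a poor choice of instance types or zones. These are two distinct fixes, which is why the step checks both.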

Step 4: Drill down the application dimension and locate the problem application

In the namespace dimension, some namespaces show large swings in capacity between peaks and troughs. Resource utilization barely changes after capacity is expanded, which shows that the scheduled scaling brings no benefit to the business.

The namespace's resource waste list names the applications with the most waste. Filtering by one application's label shows that the application is essentially idle, yet it accounts for 34.74% of the cluster's total cost.
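A waste list like this one can be approximated by pricing the gap between requested and used resources. This is a hypothetical sketch, and the product's own waste calculation may differ. The application names and figures are invented; the 34.74% share only mirrors the case.

```python
from typing import List, Tuple


def waste_list(apps: List[Tuple[str, float, float, float]],
               cluster_cost: float) -> List[Tuple[str, float, float]]:
    """Rank applications by estimated wasted spend.

    Each app is (name, cost, requested_cores, used_cores). Waste is the share of
    the app's cost paid for requested-but-unused cores; the third field is the
    app's share of total cluster cost.
    """
    rows = []
    for name, cost, requested, used in apps:
        idle_ratio = max(0.0, 1 - used / requested) if requested else 0.0
        rows.append((name, cost * idle_ratio, cost / cluster_cost))
    return sorted(rows, key=lambda r: r[1], reverse=True)


apps = [("test-svc", 347.4, 40, 0.4), ("web", 200.0, 10, 8)]
for name, wasted, share in waste_list(apps, cluster_cost=1000.0):
    print(f"{name}: wasted {wasted:.1f}, {share:.2%} of cluster cost")
```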

After checking with his R&D colleagues, Raymond found the first cause. Scheduled scaling had been configured with a relatively large number of replicas for a test service that was not yet online, and this wasted a significant amount of resources. The second cause was inventory shortages, which had driven up the cost of the cluster's spot instance combination; new availability zones and instance types had to be configured for the spot instances. Raymond reconfigured the scheduled scaling rules and corrected the spot instance combination, and the problem that had troubled him for so long was resolved.

Looking back, Raymond's problems were all small things that can happen in real production. Yet these seemingly insignificant things can cause significant losses in enterprise IT cost governance. The more complex an IT system becomes, the more automated its operations and maintenance must be. Likewise, the more means an enterprise has for cloud native cost reduction and efficiency enhancement, the more data-driven and transparent its IT cost management must be. Reducing costs and increasing efficiency is the goal, and what matters is results rather than process. An enterprise cloud native IT cost governance solution lets enterprises optimize IT costs transparently, digitally, and automatically.

Future prospects for IT cost governance in cloud native enterprises

More and more enterprises will likely adopt the concept of cloud financial management (FinOps), and capabilities and solutions for reducing costs and increasing efficiency will spring up like mushrooms. In practice, however, most enterprises' thinking about IT cost governance has not kept pace with the evolution of their architectures, and this places an extra burden on their move to the cloud. To fully drive and implement a cloud native IT cost optimization strategy, enterprises need to put the concepts, tools, and processes of cloud native IT cost governance first. Only optimization schemes that are observable, quantifiable, and measurable can truly prove their value.

Alibaba Cloud's cloud native IT cost governance solution helps enterprises put the concepts, tools, and processes of IT cost governance into practice. With it, enterprises can manage and optimize IT costs digitally throughout their cloud native journey and become practitioners and leaders in FinOps.
