Best Practices of Spring Cloud Applications in Kubernetes: High Availability (Capacity Assessment)

This article will introduce how to ensure high business availability through accurate bottleneck location and capacity assessment.

By Alibaba Container Service


This is the latest article in the Best Practices of Spring Cloud Applications in Kubernetes blog series. It looks at high business availability from another perspective: locating bottlenecks and assessing capacity online, in advance, during the business preparation stage. Doing so helps users reduce costs, discover system bottlenecks more efficiently and precisely, and arrive at an accurate capacity assessment.

Highly-Available Systems

First, let's take a look at the highly available system. What high-availability policies apply during the application lifecycle, and what capabilities do they provide?



As shown above, corresponding high-availability policies are configured throughout the application lifecycle. For example, as introduced in previous articles, traffic protection is the management policy during online operation, and chaos engineering is the policy for system drills. Full-procedure stress testing is an important policy in the planning stage. It includes online stress testing (environment selection), capacity planning (stress testing implementation), and elastic scaling (intra-ecosystem collaboration).

The following sections focus on the importance of capacity assessment and its implementation through stress testing.

Why Is Capacity Assessment Needed?

The importance and necessity of capacity assessment is a recurring topic. Here, I would like to summarize it from the perspectives of technology and business strategy.


The purpose of capacity assessment is to solve capacity problems, such as preparing for a new business launch or a large marketing activity. Uncertainty about system performance under the peak traffic of a large activity is the typical scenario where capacity assessment is needed most. Ideally, a marketing activity follows a closed-loop process, as follows:


Performance testing is the core means of capacity assessment. After performance testing, monitoring and analysis are performed on the client, the application system, and the basic load. Bottlenecks can then be located and the corresponding optimizations determined. As shown in the above figure, performance testing assesses capacity, and precise and efficient stress testing locates and resolves bottlenecks, which keeps activities running stably.
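As a rough illustration of how stress testing results feed a capacity estimate, the sketch below applies Little's Law (in-flight requests = throughput x average response time) and a simple instance count calculation. The function names, the 70% headroom default, and the sample numbers are assumptions made for this example. They are not figures from Alibaba Cloud.

```python
import math


def concurrency_needed(target_qps: float, avg_rt_seconds: float) -> float:
    """Little's Law: average in-flight requests = arrival rate x time in system."""
    return target_qps * avg_rt_seconds


def instances_needed(target_qps: float, per_instance_qps: float,
                     headroom: float = 0.7) -> int:
    """Instances required when each one is loaded to only `headroom`
    (a fraction between 0 and 1) of the limit measured in stress testing."""
    if not 0 < headroom <= 1:
        raise ValueError("headroom must be in (0, 1]")
    if per_instance_qps <= 0:
        raise ValueError("per_instance_qps must be positive")
    return math.ceil(target_qps / (per_instance_qps * headroom))


if __name__ == "__main__":
    # Hypothetical inputs: 12,000 QPS target peak, 800 QPS measured per instance.
    print(concurrency_needed(2000, 0.25))
    print(instances_needed(12000, 800))
```

The headroom factor is a design choice: running instances below their measured limit leaves room for traffic that exceeds the forecast.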

How to Implement Performance Testing

It has been seven years since Alibaba Cloud launched full-procedure stress testing in 2013. Over these seven years, Alibaba Cloud has continuously accumulated and summarized experience and made optimizations and improvements. Effective management of process and division of labor is indispensable for such a large-scale project. The preliminary preparation for full-procedure stress testing will not be described again here. If you are interested, see the article How Does Alibaba Cloud Implement Full-procedure Stress Testing?. The following parts focus on stress testing operations in the execution stage.

Before full-procedure stress testing, each application undergoes its own internal stress testing, which makes the full-procedure stress testing more efficient. In other words, problems inside individual applications are solved first, and problems in the links between them are solved afterward. The following sections describe how to implement single-application stress testing on Spring Cloud applications and then full-procedure stress testing.

Single-application Stress Testing for Spring Cloud Applications

Many developers choose open-source JMeter for single-application stress testing, and some even build their own platforms to reach high concurrency. However, neither approach is recommended because both have obvious disadvantages. Alibaba Cloud Performance Testing Service (PTS), by contrast, is an on-cloud stress testing service that is compatible with JMeter. Users only need to upload scripts to initiate stress testing.


PTS also supports direct stress testing for microservices. Users do not need to manually manage and upgrade plug-ins. They only need to select the corresponding information in PTS to initiate stress testing quickly.


Full-procedure Stress Testing

Full-procedure stress testing involves many preliminary steps, such as environment selection and transformation, data preparation, and security policies, just like the performance stress testing introduced before. These preliminary steps will not be described here. This section mainly introduces the implementation of full-procedure stress testing: a business scenario is configured to match the online business model, and multi-dimensional, multi-scenario stress testing with real traffic from the public network is used to assess capacity and locate bottlenecks.

Generally, multiple stress testing policies are implemented in formal testing, according to the stress testing plan. For example, stress testing for Taobao's Double 11 generally includes the following steps:

Peak pulse simulation

Peak pulse simulation reproduces the exact target peak traffic at 00:00 of the promotion day. Through the simulation, Taobao can conduct promotion-state stress testing and observe the system performance.

System limit testing

System limit testing disables traffic limiting and degradation protection and raises the current stress testing value to observe the system limit. Note that limit testing can be carried out only after the target stress testing value has been reached. The stress testing value can be increased several times until the system encounters an exception.
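The stepwise procedure described here can be sketched as a small loop. This is a minimal illustration, not the way PTS runs limit testing: `run_step` stands for whatever executes one round of stress testing and returns the observed error rate, and `fake_system` is a hypothetical stand-in with a made-up capacity of 1,500 RPS.

```python
from typing import Callable, List, Tuple


def find_system_limit(run_step: Callable[[int], float],
                      target_rps: int,
                      step_rps: int,
                      max_error_rate: float = 0.01,
                      max_steps: int = 20) -> Tuple[int, List[Tuple[int, float]]]:
    """Start at the target value and raise the load in fixed steps until the
    error rate exceeds the threshold. Returns the highest load that passed
    (0 if even the target failed) and the (rps, error_rate) history."""
    history: List[Tuple[int, float]] = []
    last_ok = 0
    rps = target_rps
    for _ in range(max_steps):
        error_rate = run_step(rps)
        history.append((rps, error_rate))
        if error_rate > max_error_rate:
            break  # the system hit an exception state; stop raising the load
        last_ok = rps
        rps += step_rps
    return last_ok, history


def fake_system(rps: int) -> float:
    """Hypothetical system: error-free up to 1500 RPS, excess requests fail."""
    capacity = 1500
    return 0.0 if rps <= capacity else (rps - capacity) / rps


if __name__ == "__main__":
    limit, steps = find_system_limit(fake_system, target_rps=1000, step_rps=200)
    print(limit, steps)
```

A smaller step gives a more precise limit at the cost of more testing rounds.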

Traffic limiting and degradation verification

Traffic limiting and degradation verification checks whether traffic limiting and degradation protection work as expected. Application High Availability Service (AHAS) provides comprehensive traffic limiting and degradation capabilities for full-procedure degradation protection.
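To make the idea of verification concrete, the sketch below offers more traffic than a limiter allows and checks how much gets through. The token bucket here is a generic stand-in written for this example. It is not the AHAS API or its algorithm, and all names and numbers are assumptions.

```python
from typing import Tuple


class TokenBucket:
    """Generic token-bucket limiter driven by an explicit clock value."""

    def __init__(self, rate_per_s: float, burst: int) -> None:
        self.rate = rate_per_s
        self.burst = burst
        self.tokens = float(burst)  # start with a full bucket
        self.last = 0.0

    def allow(self, now: float) -> bool:
        # Refill according to elapsed time, capped at the burst size.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(float(self.burst), self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def verify_limit(limiter: TokenBucket, offered_rps: int,
                 seconds: int) -> Tuple[int, int]:
    """Offer evenly spaced requests and return (passed, rejected)."""
    total = offered_rps * seconds
    passed = sum(1 for i in range(total) if limiter.allow(i / offered_rps))
    return passed, total - passed


if __name__ == "__main__":
    # Offer 500 RPS for 2 s against a limit of 100 RPS with a burst of 100.
    print(verify_limit(TokenBucket(rate_per_s=100, burst=100), 500, 2))
```

With these inputs, the number of passed requests should be close to the burst plus the rate multiplied by the duration (roughly 300 of 1,000), and the rest should be rejected.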

Destructive testing

Destructive testing is designed to verify the effectiveness of the plan. It is similar to executing the plan in a disaster recovery exercise. In this test, promotion-state stress testing is kept running while the plan is executed, to verify its effectiveness and observe the impact on the system.

Stress Testing in PTS

All of the stress testing steps mentioned above can be implemented in PTS. In addition, PTS allows data to be configured at different testing levels, so that stress testing can be run multiple times and system performance observed. Stress testing should be implemented and verified repeatedly. The following section takes peak pulse simulation as an example to introduce the steps of stress testing in PTS.

The first step is scenario construction. PTS provides various methods for scenario construction, including importing JSON, JMX, and YAML scripts, creating scenarios in a zero-code interactive UI, and importing results from the cloud recorder. PTS is also fully compatible with JMeter scripts. The following diagram shows the scenario construction.


Take the PTS exclusive native engine, also called the interactive UI orchestration model, as an example. After a business scenario is created, PTS provides a stress source customization capability. With this capability, users can customize stress sources from multiple regions and operators to simulate real traffic conditions more authentically.
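One way to think about stress source customization is as a weighted split of the total load across sources. The sketch below is only an illustration of that arithmetic. The region and operator labels and the weights are hypothetical, and the function does not reflect how PTS distributes traffic internally.

```python
from typing import Dict


def split_load(total_rps: int, weights: Dict[str, float]) -> Dict[str, int]:
    """Split an integer load across sources in proportion to their weights.
    Uses the largest-remainder method so the parts always sum to total_rps."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    raw = {name: total_rps * w / total_weight for name, w in weights.items()}
    alloc = {name: int(value) for name, value in raw.items()}
    leftover = total_rps - sum(alloc.values())
    # Hand the remaining units to the sources with the largest fractional parts.
    by_remainder = sorted(raw, key=lambda name: raw[name] - alloc[name],
                          reverse=True)
    for name in by_remainder[:leftover]:
        alloc[name] += 1
    return alloc


if __name__ == "__main__":
    # Hypothetical sources, labeled "region/operator".
    print(split_load(1000, {"east/telecom": 5, "north/unicom": 3, "south/mobile": 2}))
```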


In addition, "unmanned" stress testing can be implemented through the SLA and scheduled task capabilities. In this way, performance limit testing of core business procedures can be run periodically.
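The idea behind an SLA rule in unattended testing is that the test stops itself when the metrics stay outside the agreed limits. The sketch below shows one possible stop rule. `SlaRule`, its fields, and the sample values are assumptions made for this example and are not the PTS configuration format.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class SlaRule:
    max_avg_rt_ms: float    # response time limit per sample
    max_error_rate: float   # error rate limit per sample (0..1)
    consecutive: int = 3    # breaches in a row required to stop the test


def first_sla_breach(samples: Iterable[Tuple[float, float]],
                     rule: SlaRule) -> Optional[int]:
    """samples yields (avg_rt_ms, error_rate) per interval. Returns the index
    of the sample at which the test should stop, or None if it never should."""
    streak = 0
    for index, (avg_rt_ms, error_rate) in enumerate(samples):
        breached = (avg_rt_ms > rule.max_avg_rt_ms
                    or error_rate > rule.max_error_rate)
        streak = streak + 1 if breached else 0
        if streak >= rule.consecutive:
            return index
    return None


if __name__ == "__main__":
    metrics = [(120, 0.0), (520, 0.0), (130, 0.0),
               (510, 0.0), (530, 0.02), (600, 0.05)]
    print(first_sla_breach(metrics, SlaRule(max_avg_rt_ms=500, max_error_rate=0.01)))
```

Requiring several consecutive breaches avoids stopping a scheduled run because of a single noisy sample.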


After the stress testing is completed, PTS provides a downloadable stress testing report. The report contains detailed statistical data, trend chart data, sampling logs, and monitoring data. With it, users can quickly locate and analyze problems.
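For readers who want to cross-check report figures against their own sampling logs, the sketch below computes a few common statistics. The sample format, a list of (latency in milliseconds, success flag) pairs, and the chosen metrics are assumptions for this example. The actual PTS report layout may differ.

```python
import math
from typing import Dict, List, Sequence, Tuple


def percentile(values: Sequence[float], p: int) -> float:
    """Nearest-rank percentile: the smallest value such that at least
    p percent of the samples are less than or equal to it."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


def summarize(samples: List[Tuple[float, bool]]) -> Dict[str, float]:
    """Summarize (latency_ms, ok) samples into report-style statistics."""
    latencies = [latency for latency, _ in samples]
    failures = sum(1 for _, ok in samples if not ok)
    return {
        "count": len(samples),
        "error_rate": failures / len(samples),
        "avg_ms": sum(latencies) / len(latencies),
        "p50_ms": percentile(latencies, 50),
        "p99_ms": percentile(latencies, 99),
    }


if __name__ == "__main__":
    # Hypothetical log: latencies 1..100 ms, requests slower than 95 ms failed.
    log = [(float(ms), ms <= 95) for ms in range(1, 101)]
    print(summarize(log))
```

Percentiles such as P99 are usually more informative than the average when locating bottlenecks, because a small share of slow requests can hide behind a healthy mean.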

Stress Testing of Spring Cloud Applications in EDAS

Alibaba Cloud Enterprise Distributed Application Service (EDAS) supports microservices governance and integrates the stress testing capabilities of PTS. On the service query page, users can click the stress testing button to start performance testing in PTS, as shown in the following figure:



This article briefly introduces the policies that make up a highly available business system, the importance of capacity assessment, and the implementation methods of performance testing. It also introduces how to quickly apply stress testing to Spring Cloud applications. In addition, PTS offers many more functions:

• Isolated transformation of full-procedure stress testing traffic
• Environment management and localized plug-ins of JMeter
• Architecture monitoring of cloud business during stress testing
• Advanced traffic customization of JMeter

With performance stress testing as the main line, Alibaba Cloud conducts system capacity assessment during the planning period. Using the stress testing results as a reference, Alibaba Cloud applies multi-dimensional system protection from gateways to applications through the traffic protection of AHAS. In this way, high availability is achieved after the business system is launched. In the future, PTS and AHAS will provide more intelligent functions to better help online services maintain continuity in various extreme scenarios.
