Static Compilation of Java Applications at Alibaba at Scale

By Sanhong Li, Ziyi Lin, Chuansheng Lu and Kingsum Chow from the Alibaba JVM team.

Background

Cloud computing aims to provide computing resources as a service and the core principle of cloud computing is to use only those resources that are necessary to run an application and scale when needed. To take advantage of the benefits of cloud computing, developers should architect and write applications according to this principle.

A microservice architecture breaks a monolith application into many micro-applications (microservices). This is an attractive approach for applications targeting cloud computing platforms. We can start with the number of microservice instances needed to handle the initial load and scale out with more instances when demand is higher, improving resilience by leveraging the ability of clouds to scale horizontally.

The Java platform has become one of the most widely used platforms. Despite its popularity, Java has drawn many criticisms: it is slow to boot, it takes too much memory, and its syntax is verbose. Long boot time in particular has inhibited horizontal scalability. From a business standpoint, customers may have to wait a long time for an application to boot before they receive the results of a request. Speeding up the boot time of Java applications on a horizontally scalable platform is our motivation. To achieve that, we adopted GraalVM native image in serverless computing.

GraalVM native image at Alibaba

Over the years, Java has proliferated in Alibaba. Many applications are written in Java. Approximately 10,000 Java developers have written more than a billion lines of Java code! Alibaba has customized most of its Java software based on the vibrant open-source ecosystem. In Alibaba Cloud, these Java programs are developed for online trading, payments, and logistics operations. Many of them are developed as microservices running in a Kubernetes-native environment to serve online requests.

At Alibaba, we use the native image technology of GraalVM to statically compile a microservice application into an ELF executable file, which gives Java applications native-code startup times. This is needed to address the horizontal scaling challenge described above.

In our scenario, the serverless application is developed based on the SOFABoot framework. Its fat jar size is 120MB+. It includes many typical components in the Java Enterprise space, such as Spring, Spring Boot, Tomcat, MySQL-Connector, and many others. We refer to applications using this framework as SOFABoot applications. SOFABoot applications originally ran on top of Alibaba Dragonwell (OpenJDK based). They are designed for a distributed architecture: they handle online transactions and communicate with many other applications through RPC.

In the global online shopping festival (also called Double 11, or Nov 11) last year, we deployed a number of SOFABoot applications compiled as native images. They managed to serve real online requests in our production environment on a day with the highest transaction volume.

Besides the SOFABoot application, we have also explored the possibility of introducing statically compiled applications into Alibaba Cloud. We successfully deployed a native image version of the Micronaut demo application on Alibaba Cloud's function computing platform.

In the following sections, we describe the challenges we overcame in using GraalVM native image for static compilation and the performance gains we achieved in our production environment.

How We Did That

GraalVM native image provides a great set of tools for developers to close the gap between traditional and statically compiled Java and provides a way to migrate from the former to the latter. In this section, we will focus on the challenges we faced and the approaches we developed at Alibaba for compiling Java applications into native images. We also contributed many of the solutions back to the GraalVM community.

While most traditional Java features are supported by native image to build and run applications, there are still some limitations that prevent the automatic migration from traditional Java to statically compiled Java programs. Native image requires programmers to provide additional information or modify the original implementation of an application to get the program compiled and run as expected. The challenges we faced while adapting the SOFABoot application were:

Slow build time: Static compilation consumes a large amount of memory and time. In the beginning, it took around 100 GB of memory and 4000 seconds to build the SOFABoot application. We observed that most of the time was spent on type flow analysis in the static analysis phase, so for scenarios that require a fast build we replaced the original type flow analysis with a less precise but much more lightweight class hierarchy analysis (CHA). With the CHA approach, the memory needed for the build dropped from 100 GB to 20 GB and the build time dropped from 4000 seconds to less than 1000 seconds. We were delighted to see a 4X speedup in build time, which helped speed up the deployments of our applications.
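
To make the precision trade-off concrete, here is a minimal sketch of what a class hierarchy analysis does at a virtual call site. It is a toy model and not GraalVM's implementation: the ChaSketch class, its string-based hierarchy, and the Shape/Circle/Square example are all assumptions made for illustration. CHA simply reports every override at or below the receiver's static type, without checking which types the program actually instantiates.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy class hierarchy analysis (CHA). Illustrative only; not GraalVM's implementation.
final class ChaSketch {
    private final Map<String, String> superOf = new HashMap<>();
    private final Map<String, List<String>> subclasses = new HashMap<>();
    private final Map<String, Set<String>> declared = new HashMap<>();

    // Registers a class, its superclass (null for a root), and the methods it declares itself.
    void addClass(String name, String superName, String... methods) {
        superOf.put(name, superName);
        declared.put(name, Set.of(methods));
        if (superName != null) {
            subclasses.computeIfAbsent(superName, k -> new ArrayList<>()).add(name);
        }
    }

    private boolean declares(String cls, String method) {
        return declared.getOrDefault(cls, Set.of()).contains(method);
    }

    // All implementations a call receiver.method() may dispatch to, given the receiver's static type.
    Set<String> possibleTargets(String staticType, String method) {
        Set<String> targets = new TreeSet<>();
        // The implementation the static type itself declares or inherits.
        for (String c = staticType; c != null; c = superOf.get(c)) {
            if (declares(c, method)) {
                targets.add(c + "." + method);
                break;
            }
        }
        // Every override below the static type, whether or not that class is ever instantiated.
        Deque<String> work = new ArrayDeque<>(subclasses.getOrDefault(staticType, List.of()));
        while (!work.isEmpty()) {
            String cls = work.pop();
            if (declares(cls, method)) {
                targets.add(cls + "." + method);
            }
            work.addAll(subclasses.getOrDefault(cls, List.of()));
        }
        return targets;
    }

    // Hypothetical hierarchy used only for the demonstration.
    static ChaSketch demoHierarchy() {
        ChaSketch cha = new ChaSketch();
        cha.addClass("Shape", null, "area");
        cha.addClass("Circle", "Shape", "area");
        cha.addClass("Square", "Shape", "area");
        cha.addClass("UnitSquare", "Square"); // inherits Square.area
        return cha;
    }

    public static void main(String[] args) {
        ChaSketch cha = demoHierarchy();
        System.out.println(cha.possibleTargets("Shape", "area"));      // [Circle.area, Shape.area, Square.area]
        System.out.println(cha.possibleTargets("UnitSquare", "area")); // [Square.area]
    }
}
```

Even if the program only ever creates Circle objects, the sketch still reports three targets for a call on a Shape. A type flow analysis could narrow that set by tracking which types actually reach the call site, at the cost of the much heavier computation described above.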

Class initialization: Classes are initialized at runtime in traditional Java programs. Native image initializes classes at build time whenever possible to improve runtime performance. However, eager class initialization at build time is not always safe, and programmers still need to adjust the class initialization timing manually. Class initialization may happen in a chain, so postponing one class's initialization to runtime without also postponing its predecessors in the chain may lead to class initialization errors at build time. For example, the following code has a class initialization chain of A->B->C.

class A {
    static B b = new B();
}

class B {
    static {
        C.dosomething();
    }
}

class C {
    static long currentTime;

    static {
        currentTime = System.currentTimeMillis();
    }

    static void dosomething() {…}
}

For application correctness, class C must be initialized at runtime because it calls System.currentTimeMillis(). As a result, class A must also be initialized at runtime, since A is the root of this initialization chain: initializing A triggers the initialization of B and then, eventually, C. In practice, however, when developers observed that class C had been mistakenly initialized at build time, it was difficult for them to work out that class A was the root cause, i.e., that A had been mistakenly configured for build-time initialization. Native image provides an instrumentation-based initialization tracing feature to resolve this kind of issue, but it fails when the class cannot be instrumented, e.g., when the class is loaded by the bootstrap class loader. In our solution, we modified the HotSpot code to track the class initialization chain at the VM level, which helps our developers follow the chain and locate the root cause of this kind of mistake. Our solution thus enables broader use of ahead-of-time compilation by Java developers.
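
The idea of an initialization chain can be observed from plain Java, without any VM changes. The following is a minimal sketch, not the HotSpot modification described above: it records, inside C's static initializer, which static initializers (reported as <clinit> frames) are on the call stack at that moment. The InitChainDemo class and its helper names are assumptions introduced for illustration, and this user-level approach only works for classes whose source you can edit.

```java
import java.util.ArrayList;
import java.util.List;

// User-level illustration of a class initialization chain A -> B -> C.
final class InitChainDemo {

    // Classes whose static initializers are currently running, outermost first.
    static List<String> currentInitChain() {
        List<String> chain = new ArrayList<>();
        for (StackTraceElement frame : new Throwable().getStackTrace()) {
            if ("<clinit>".equals(frame.getMethodName())) {
                chain.add(0, frame.getClassName()); // stack is innermost first, so prepend
            }
        }
        return chain;
    }

    static class A {
        static B b = new B(); // initializing A initializes B
    }

    static class B {
        static {
            C.doSomething(); // initializing B initializes C
        }
    }

    static class C {
        static final List<String> initializedVia;
        static long currentTime;

        static {
            currentTime = System.currentTimeMillis(); // the reason C must wait until runtime
            initializedVia = currentInitChain();      // remember who triggered us
        }

        static void doSomething() {
            // intentionally empty for the demonstration
        }
    }

    // Touches A first, then reports the chain that C recorded.
    static List<String> triggerAndGetChain() {
        Object root = A.b;
        return C.initializedVia;
    }

    public static void main(String[] args) {
        // Prints the binary names of A, B and C in trigger order, joined by " -> ".
        System.out.println(String.join(" -> ", triggerAndGetChain()));
    }
}
```

With the chain in hand, the fix for the example is to defer the root as well as the leaf, for instance by passing all three classes to native image's --initialize-at-run-time option rather than class C alone.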

Dynamic class loading: Dynamic class loading means defining and loading classes at runtime whose bytecode is not known at build time. It is widely used in real-world applications, libraries, and frameworks. Typical examples include Java's serialization/deserialization mechanism, which relies on dynamically generated constructor accessors; Spring, which uses cglib for proxies; and Derby, which uses dynamically generated classes for SQL statements. We support dynamic class loading in four steps:
1) Replace the dynamic names of generated classes with fixed ones, so that the same class always has the same name across different runs.
2) Implement method interceptors in the native image agent to dump dynamically generated classes, following the fixed name pattern, to the file system.
3) Compile the dumped classes into the native image at build time.
4) At native image runtime, find the prepared "dynamically generated" classes instead of defining them.
We have committed this feature to the community.
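
The flow of these steps can be sketched in a few lines of Java. This is a simplified model under stated assumptions, not the code contributed to GraalVM: the PreparedClassRegistry class, the hash-based naming scheme, and the in-memory map standing in for the file system are all hypothetical. The point it illustrates is that a name derived from stable inputs lets a class dumped during one run be found again in another.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Simplified model of "dump under a fixed name, then look up instead of define".
final class PreparedClassRegistry {
    // Stand-in for the dump directory that is compiled into the image at build time.
    private final Map<String, byte[]> prepared = new HashMap<>();

    // Step 1 (assumed scheme): derive the class name from a stable description of what
    // is being generated, rather than from a per-run counter or an identity hash.
    static String fixedName(String prefix, String generationKey) {
        try {
            byte[] hash = MessageDigest.getInstance("SHA-256")
                    .digest(generationKey.getBytes(StandardCharsets.UTF_8));
            StringBuilder name = new StringBuilder(prefix).append('$');
            for (int i = 0; i < 4; i++) {
                name.append(String.format("%02x", hash[i] & 0xff));
            }
            return name.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is available on every Java platform", e);
        }
    }

    // Step 2: record the generated bytecode under its fixed name (agent run).
    void dump(String name, byte[] bytecode) {
        prepared.put(name, bytecode.clone());
    }

    // Step 4: at image runtime, look the class up; defining a new one is not an option.
    byte[] findPrepared(String name) {
        byte[] bytecode = prepared.get(name);
        if (bytecode == null) {
            throw new IllegalStateException("No prepared class named " + name);
        }
        return bytecode;
    }

    public static void main(String[] args) {
        PreparedClassRegistry registry = new PreparedClassRegistry();
        byte[] placeholder = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE}; // not a real class file

        // "Agent run": the generated class is dumped under its fixed name.
        registry.dump(fixedName("Proxy", "com.example.Service"), placeholder);

        // "Image run": the same key yields the same name, so the lookup succeeds.
        String name = fixedName("Proxy", "com.example.Service");
        System.out.println(name + " -> " + registry.findPrepared(name).length + " bytes");
    }
}
```

Step 3, compiling the dumped classes into the image, has no counterpart in the sketch; it is represented only by the registry already containing the classes when the lookup happens.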

Slow GC performance

In the world of static compilation, garbage collection is still an indispensable component. The default garbage collector in native image is a pure 'Copy' GC, which divides the heap into two parts: a young space and an old space. Java threads keep allocating objects in the young space, and when the young space is full, a 'Young GC' is performed by evacuating all the live objects from the young space to the old space. When the old space is full, a 'Full GC' is performed by compacting all the live objects in the Java heap and releasing the free space.

This approach is relatively straightforward and works well for many small workloads, but when we tried to support larger workloads such as Spring-based services, the full GC time and frequency became a headache. We observed that a single GC pause could exceed 1.5 seconds for some Java services. That is unacceptable for online applications. So we made the following improvements to the garbage collector component of native image:

Age information is added to objects in the young generation. The age is stored on the memory chunk that holds a group of objects, and it is increased by 1 each time these objects survive a young GC; live objects are promoted to the old generation only after reaching a certain age threshold.
A background thread is used to un-map memory asynchronously. Native image uses memory chunks to hold Java objects. When it wants to release free chunks to the OS to lower the footprint, it simply un-maps them. We observed that for a typical Java application un-mapping memory can take a long time, so we made this operation asynchronous and execute it outside the stop-the-world pause.
Image roots are scanned based on a card table in the young GC. For some workloads the final executable image may be large after static compilation; it usually holds a vast set of GC roots, which have to be scanned thoroughly on every GC. In the existing design of the native image garbage collector this can take a long time. We added a card table for the image roots, and for young GC operations we scan only those references that have been dirtied since the last GC pause.
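
The card table idea from the last item can be sketched as follows. This is a minimal model and not the native image collector: the CardTableSketch class, the tiny card size, and the use of an Object array as a stand-in for reference fields in the image heap are assumptions chosen to keep the example small. A write barrier marks the card covering each updated slot, and the young GC root scan visits only the dirty cards.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal card table over a block of reference slots; illustrative only.
final class CardTableSketch {
    static final int CARD_SIZE = 4; // slots per card, kept tiny for the demo

    private final Object[] imageSlots; // stand-in for reference fields in the image heap
    private final boolean[] dirty;     // one flag per card

    CardTableSketch(int slotCount) {
        imageSlots = new Object[slotCount];
        dirty = new boolean[(slotCount + CARD_SIZE - 1) / CARD_SIZE];
    }

    // Write barrier: perform the store, then mark the covering card dirty.
    void writeRef(int slot, Object value) {
        imageSlots[slot] = value;
        dirty[slot / CARD_SIZE] = true;
    }

    // Young GC root scan: visit only slots on dirty cards, then clean those cards.
    // Simplification: a real collector keeps a card dirty while it still points to young objects.
    List<Integer> scanDirtySlots() {
        List<Integer> visited = new ArrayList<>();
        for (int card = 0; card < dirty.length; card++) {
            if (!dirty[card]) {
                continue; // nothing on this card changed since the last pause
            }
            int end = Math.min((card + 1) * CARD_SIZE, imageSlots.length);
            for (int slot = card * CARD_SIZE; slot < end; slot++) {
                if (imageSlots[slot] != null) {
                    visited.add(slot);
                }
            }
            dirty[card] = false;
        }
        return visited;
    }

    public static void main(String[] args) {
        CardTableSketch table = new CardTableSketch(16);
        table.writeRef(1, "young object a");
        table.writeRef(9, "young object b");
        System.out.println(table.scanDirtySlots()); // [1, 9]
        System.out.println(table.scanDirtySlots()); // []
    }
}
```

In the demo only two of the four cards are examined on the first scan and none on the second, which is where the saving comes from when the image holds a very large set of roots and few of them change between pauses.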

Some of these changes have been committed into the GraalVM project.

Performance Gains

Startup time speedup

After we made the changes to address the challenges of using native image, we collected performance data for statically compiled SOFABoot applications in our production environment. As shown in the Figure, the startup time decreased from 60 seconds to 3 seconds, i.e., a 20X speedup in the startup time of the Java app. In addition, the GC pause time was kept under 100 milliseconds.

SOFABoot Application Startup Time Comparison

We also ran the statically compiled version of the Micronaut-based application on the function computing platform of Alibaba Cloud. The result is also fascinating. native_image_hello is a statically compiled application, and springboot_hello is the same application deployed as a jar and run on top of a traditional Java runtime. As shown in the Figure below, native_image_hello is 100x faster at startup with 1/6 of the memory cost, which can help customers save 80% ("billed duration" is the time the customer is charged for on the cloud platform). The response time of these two deployed applications is nearly the same.

Traditional Java Function vs Statically Compiled Function

GC performance

With the GC enhancements described above, we successfully reduced the p90 pause time of a typical Java microservice from 1.5+ seconds to around 100 milliseconds.

GC time improvements

Conclusion

If you're exploring ways to develop a serverless application for the cloud, it's worth evaluating GraalVM native image, especially when you're looking for the best startup performance and a lower memory footprint.

We are very pleased with the results in our production environment. We are looking forward to driving innovation through the collaboration with the GraalVM community.

Big thanks to the Alibaba JVM team for sharing their experience using GraalVM and contributions to the community!

Thanks to Shaun Smith.

Source: https://medium.com/graalvm/static-compilation-of-java-applications-at-alibaba-at-scale-2944163c92e

The views expressed herein are for reference only and don't necessarily represent the official views of Alibaba Cloud.

5035318385119756 April 14, 2021 at 7:26 am

Hi! Impressive work on replacing the analysis engine! The speedup figures look great, too. I wonder how you view the tradeoff between build time and performance. Sure, 4000 seconds is a long time, but you only run native image once per deployment, right? Since a less precise analysis gets you a less performant executable, is that justified by a faster build process?

Alibaba Clouder April 16, 2021 at 3:32 am

From the deployment-and-run perspective, it is true that a long offline compilation is not an issue, and a longer analysis leads to a more precise result and thus a smaller output artifact. But from the development perspective, it is too slow. Developers definitely don't want to edit a few lines of code and then wait an hour to see the result.

Alibaba Clouder April 16, 2021 at 3:33 am

But the GraalVM community has improved the build time significantly since we posted this article. The analysis is now multithreaded, and the same SOFABoot application can be built in 1000 seconds with the same analysis precision as the previous 4000-second build.

5035318385119756 April 14, 2021 at 7:30 am

Also, I wonder if you've considered precompiling dependencies that do not change between builds (Spring, Tomcat, etc.) into a library that would link with the main app. Is that feasible? Thanks for the article!

Alibaba Clouder April 16, 2021 at 3:38 am

Your second question is about modular builds. GraalVM native image was not designed to build in modules, but there is still a way to achieve your goal. GraalVM native image supports building a Java application into a native library, but you need to write a layer that exposes all public APIs as entry points under GraalVM's C-Java interaction protocol. That is a huge amount of manual work (there is no tool to automate this now, although it is possible to write one). Once the library is built into a native .so file, you can use it as a JNI library in your business code.

Alibaba Clouder April 16, 2021 at 3:39 am

But there are still many problems to face: 1. Your call-site code has to change to JNI style. 2. You'll have a large library file to deploy, although you may need only a very small part of it. 3. Your Java context is divided: your business code and library code live in two different Java environments, and you have to synchronize the context if you still want it to behave as one. We strongly suggest you not do it.
