Alibaba Dragonwell ZGC – Part 2: The Principles and Tuning of ZGC | A New Garbage Collector

Part 2 of this 3-part series introduces the principles and tuning of ZGC.

By Hao Tang

Part 1 of this 3-part series introduced the basic concepts of the Z Garbage Collector (ZGC) and Alibaba's large-scale ZGC practice. Alibaba's business teams and cloud customers benefited from the improved response time that ZGC brings, but they also ran into some practical problems. To use ZGC well, you need to understand its principles and learn to analyze ZGC logs in order to tune it.

The Principles of ZGC

From a macro perspective, ZGC is a concurrent and compacting GC algorithm:

  • Concurrency: While Java threads are running, GC threads are executed silently in the background.
  • Compaction: The live objects in the heap are periodically compacted to solve the problem of memory fragmentation.

Parallel GC and G1 pause Java threads for on the order of 100 milliseconds, and CMS does not solve the fragmentation problem. Compared with them, the concurrent and compacting ZGC can be regarded as a major leap forward in Java's GC capability: while GC threads compact memory, Java threads continue to execute.

ZGC uses a mark-compact strategy to collect the Java heap. It first marks the live objects in the heap concurrently and then concurrently relocates them so that they sit together in a set of regions. Unlike the earlier GCs in Java, ZGC is a single-generation garbage collector that traverses all objects in the heap during the marking phase.

This raises a question: how does ZGC achieve concurrent marking and relocation? The answer lies in the two core technologies of ZGC: the load barrier and the colored pointer.

The load barrier of ZGC inserts processing logic that runs whenever a pointer is loaded:

  • If the pointer points to an object that has been relocated, the load barrier will correct the pointer.
  • In the marking phase, if the pointer is not marked, the load barrier will mark the pointer.
  • In the relocating phase, if the pointer points to the regions where objects will be relocated, the object pointed to by the pointer will be relocated, and then the pointer will be corrected.

Even though GC threads and Java threads run concurrently, the load barrier ensures that every pointer load reaches the correct object.
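The three cases above can be modeled in a few lines of Java. This is only a toy sketch, not HotSpot code: addresses are plain long values, and the class, the Phase enum, the forwarding map, and the bump-pointer "relocation" are all hypothetical names invented for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model of the three load-barrier cases; not HotSpot code.
// Addresses are plain longs and every name here is hypothetical.
class LoadBarrierSketch {
    enum Phase { IDLE, MARKING, RELOCATING }

    Phase phase = Phase.IDLE;
    // Old address -> new address for objects that were already relocated.
    final Map<Long, Long> forwarding = new HashMap<>();
    // Addresses marked as live in the current cycle.
    final Set<Long> marked = new HashSet<>();
    // Addresses lying in regions selected for relocation.
    final Set<Long> relocationSet = new HashSet<>();
    // Pretend allocator: the next free slot for a relocated object.
    private long nextFreeAddress = 0x1000L;

    // Called on every simulated pointer load; returns the corrected pointer.
    long load(long pointer) {
        // Case 1: the object was relocated earlier, so correct the pointer.
        long address = forwarding.getOrDefault(pointer, pointer);

        if (phase == Phase.MARKING) {
            // Case 2: during marking, mark what the pointer refers to.
            marked.add(address);
        } else if (phase == Phase.RELOCATING && relocationSet.contains(address)) {
            // Case 3: relocate the object first, then correct the pointer.
            long target = nextFreeAddress;
            nextFreeAddress += 0x10L;
            forwarding.put(address, target);
            address = target;
        }
        return address;
    }
}
```

In this model, a second load of the same stale pointer takes the first branch and returns the forwarded address without relocating again, which mirrors the "correct the pointer" behavior described above.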

The colored pointer of ZGC uses otherwise unused upper bits of the pointer as its color, which indicates the state of the pointer. When the load barrier processes a pointer, it can therefore read the state directly from the pointer and decide how to handle it. The product-ready ZGC supports an address space of 2^44 bytes = 16 TB. A colored pointer uses 44 + 4 = 48 bits: the lower 44 bits hold the address, and the upper 4 bits hold the color. The colored pointer and the load barrier work together so that some of the conditional checks in the load barrier become a check of the pointer color. If the pointer color is wrong, the load barrier corrects the pointer.
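The bit layout described above can be sketched with ordinary long arithmetic. The 44 address bits and 4 color bits come from the text; the class name, the helper methods, and the numeric color values used in any caller are assumptions made for illustration. The real slow path also marks or relocates the object, which this sketch leaves out and replaces with a simple recoloring.

```java
// Sketch of a 48-bit colored pointer: 44 address bits plus 4 color bits.
// Helper names are hypothetical; real ZGC does more work on the slow path.
class ColoredPointerSketch {
    static final int ADDRESS_BITS = 44; // 2^44 bytes = 16 TB of address space
    static final int COLOR_BITS = 4;
    static final long ADDRESS_MASK = (1L << ADDRESS_BITS) - 1;
    static final long COLOR_MASK = ((1L << COLOR_BITS) - 1) << ADDRESS_BITS;

    // Strip the color and keep only the heap address.
    static long address(long pointer) {
        return pointer & ADDRESS_MASK;
    }

    // Extract the 4 color bits as a small integer.
    static long color(long pointer) {
        return (pointer & COLOR_MASK) >>> ADDRESS_BITS;
    }

    // Return the same address tagged with a different color.
    static long withColor(long pointer, long color) {
        return address(pointer) | (color << ADDRESS_BITS);
    }

    // Fast path: a pointer with the expected color passes through untouched.
    // Slow path (simplified): the pointer is recolored to the good color.
    static long loadBarrier(long pointer, long goodColor) {
        if (color(pointer) == goodColor) {
            return pointer;
        }
        return withColor(pointer, goodColor);
    }
}
```

The point of the design is that the common case costs a single comparison on bits already present in the loaded value, with no lookup in a side table.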

Log Analysis of ZGC

A single ZGC cycle requires three short pauses. Each pause is followed by several concurrent phases.

[2020-12-23T13:30:57.402+0800] GC(10) Garbage Collection (Allocation Rate)
[2020-12-23T13:30:57.408+0800] GC(10) Pause Mark Start 2.918ms
[2020-12-23T13:30:58.083+0800] GC(10) Concurrent Mark 674.216ms
[2020-12-23T13:30:58.087+0800] GC(10) Pause Mark End 1.336ms
[2020-12-23T13:30:58.105+0800] GC(10) Concurrent Process Non-Strong References 18.293ms
[2020-12-23T13:30:58.111+0800] GC(10) Concurrent Reset Relocation Set 5.533ms
[2020-12-23T13:30:58.111+0800] GC(10) Concurrent Destroy Detached Pages 0.001ms
[2020-12-23T13:30:58.121+0800] GC(10) Concurrent Select Relocation Set 10.148ms
[2020-12-23T13:30:58.130+0800] GC(10) Concurrent Prepare Relocation Set 9.083ms
[2020-12-23T13:30:58.136+0800] GC(10) Pause Relocate Start 2.452ms
[2020-12-23T13:30:58.203+0800] GC(10) Concurrent Relocate 66.595ms
... (Omit some data statistics here)
[2020-12-23T13:30:58.203+0800] GC(10) Garbage Collection (Allocation Rate) 62020M(76%)->41270M(50%)

The GC log above shows a typical ZGC cycle. Each row whose phase name starts with Pause is a pause phase. The three pause phases are listed below:

  • Pause Mark Start
  • Pause Mark End
  • Pause Relocate Start

In the GC log above, each of the three pause phases lasts well under 10 ms. These pauses are mainly responsible for marking and relocating GC roots and for synchronizing threads at the end of marking.
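A quick way to verify that pauses stay short is to scan the log for the rows that start with Pause. The following is a minimal sketch that assumes lines shaped like the excerpt above; the class name, method names, and regular expression are written for this article and are not part of any JDK tool.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Extracts pause durations from log lines shaped like the excerpt above.
// The line format is assumed from that excerpt; other formats will not match.
class ZgcPauseScanner {
    // Matches e.g. "GC(10) Pause Mark Start 2.918ms".
    private static final Pattern PAUSE =
            Pattern.compile("GC\\((\\d+)\\) (Pause [A-Za-z ]+?) (\\d+(?:\\.\\d+)?)ms");

    // Returns the longest pause in milliseconds, or 0.0 if no pause was found.
    static double maxPauseMs(List<String> lines) {
        double max = 0.0;
        for (String line : lines) {
            Matcher m = PAUSE.matcher(line);
            if (m.find()) {
                max = Math.max(max, Double.parseDouble(m.group(3)));
            }
        }
        return max;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "[2020-12-23T13:30:57.408+0800] GC(10) Pause Mark Start 2.918ms",
                "[2020-12-23T13:30:58.083+0800] GC(10) Concurrent Mark 674.216ms",
                "[2020-12-23T13:30:58.087+0800] GC(10) Pause Mark End 1.336ms",
                "[2020-12-23T13:30:58.136+0800] GC(10) Pause Relocate Start 2.452ms");
        System.out.println("Longest pause (ms): " + maxPauseMs(lines));
    }
}
```

Concurrent phases such as Concurrent Mark are skipped on purpose, because they do not stop Java threads.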

Each pause is followed by concurrent phases, whose names start with Concurrent. The two core concurrent phases are Concurrent Mark and Concurrent Relocate.

The other concurrent phases mainly do preparatory work before Concurrent Relocate.

An Illustration of the ZGC Stages

Currently, the Concurrent Mark phase of ZGC marks all live objects in the entire heap. This makes ZGC a single-generation GC, unlike generational GCs such as G1, CMS, and Parallel GC. During Concurrent Mark, stale pointers in the heap are also corrected. To reduce the burden of relocating objects, the Concurrent Relocate strategy of ZGC selects only regions whose degree of fragmentation reaches a certain threshold (ZFragmentationLimit), which is similar to the Garbage First strategy of G1.

ZGC Tuning

The following sections describe the tuning details of ZGC. Users should complete at least the basic tuning.

Basic Tuning

In general, ZGC users set the heap space size (Xmx) and the number of concurrent GC threads (ConcGCThreads). All ZGC users should also enable GC logs with -Xlog:gc*:gc.log:time to record more ZGC details.

Heap Space Size

A GC usually requires the developer to specify the heap space size. The value must be greater than the total size of the live objects in the heap, and the higher the proportion of spare space, the better the GC performs. For example, if the total size of live objects is estimated at 32 GB, the heap space size can be set with -Xmx40g, which gives the heap 40 GB.

ZGC differs from traditional GCs. While ZGC collects objects, Java threads keep allocating new objects. Therefore, ZGC requires a higher proportion of spare space than traditional GCs.

The total size of objects allocated during one ZGC cycle can be estimated from the allocation speed and the duration of a single ZGC cycle. The heap space size should therefore be greater than the total size of live objects plus the total size of objects allocated during a single ZGC cycle.

Both the allocation speed and the duration of a single ZGC cycle can be found in the GC logs.
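The sizing rule above is simple arithmetic, and a short sketch makes it concrete. The 32 GB of live objects comes from the example in this section, and the 0.8-second cycle is roughly the duration of GC(10) in the log shown earlier. The allocation rate of 5 GB/s is a made-up value for illustration, and the class and method names are hypothetical.

```java
// Worked example of: heap > live objects + objects allocated during one cycle.
// The allocation rate below is a hypothetical value, not a measurement.
class HeapSizeEstimate {
    // Lower bound for the heap size, in GB.
    static double minHeapGb(double liveGb, double allocRateGbPerSec, double cycleSec) {
        double allocatedDuringCycleGb = allocRateGbPerSec * cycleSec;
        return liveGb + allocatedDuringCycleGb;
    }

    public static void main(String[] args) {
        double liveGb = 32.0;            // live objects, from the example in the text
        double allocRateGbPerSec = 5.0;  // hypothetical allocation speed
        double cycleSec = 0.8;           // approximate duration of GC(10) in the log
        double bound = minHeapGb(liveGb, allocRateGbPerSec, cycleSec);
        System.out.printf("Heap should exceed about %.1f GB%n", bound);
    }
}
```

With these inputs the bound is roughly 36 GB, so a 40 GB heap leaves some headroom. A higher or burstier allocation rate would call for more.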

The Number of Concurrent GC Threads

The default number of concurrent GC threads in ZGC is one-eighth of the number of CPU cores. For example, on a 16-core machine, ZGC uses two concurrent GC threads if ConcGCThreads is not specified.
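As a rough check of what the default would be on a given machine, the one-eighth rule can be written out as below. The floor of one thread is an assumption added here so that small machines do not end up with zero, and the class and method names are hypothetical. The JVM's own calculation may differ in detail.

```java
// Sketch of the one-eighth rule from the text; the minimum of 1 is an assumption.
class ConcGcThreadsDefault {
    static int defaultConcGcThreads(int cpuCores) {
        return Math.max(1, cpuCores / 8);
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println(cores + " cores -> " + defaultConcGcThreads(cores)
                + " concurrent GC thread(s) by the one-eighth rule");
    }
}
```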

If Allocation Stall appears frequently in the GC logs, collection cannot keep up with allocation, and ConcGCThreads may need to be increased. However, ConcGCThreads cannot be increased indefinitely, because too many concurrent GC threads take CPU resources away from Java threads and affect their normal execution.
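To see how often this happens, one option is to count the log lines that mention Allocation Stall. The sketch below assumes only that the phrase appears verbatim in the log, as stated above. The class name and the file name gc.log are illustrative, and statistics rows in the log may contain the same phrase, so the count is a rough signal rather than an exact number of stalls.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Counts log lines that mention "Allocation Stall"; a rough signal only.
class AllocationStallCounter {
    static long countStallLines(List<String> lines) {
        return lines.stream()
                .filter(line -> line.contains("Allocation Stall"))
                .count();
    }

    public static void main(String[] args) throws IOException {
        // The log file name is an assumption; pass another path as the first argument.
        String path = args.length > 0 ? args[0] : "gc.log";
        List<String> lines = Files.readAllLines(Paths.get(path));
        System.out.println("Lines mentioning Allocation Stall: " + countStallLines(lines));
    }
}
```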

Note: Concurrent GC threads (ConcGCThreads) are different from parallel GC threads (ParallelGCThreads). The former run concurrently with Java threads, and the latter are the GC threads that run during GC pauses.

Advanced Tuning

The product-ready ZGC also supports several advanced tuning options. See the documentation of Alibaba Dragonwell 11.0.11.7 for more information.

The core of advanced tuning is controlling when GC is triggered. Since Java threads keep allocating objects while ZGC is collecting, ZGC must trigger GC some time in advance rather than when the heap space is already full. This keeps the heap from filling up during a ZGC cycle, which would result in Allocation Stall or OOM. However, if ZGC is triggered too frequently, it consumes more CPU resources and reduces throughput.

Dragonwell 11 supports the following options related to GC trigger timing:

  • ZAllocationSpikeTolerance: ZGC estimates the total size of objects allocated during a single ZGC cycle from the allocation speed and the duration of a single ZGC cycle. Once this total size is no longer less than the remaining heap space, GC needs to be triggered. However, since the allocation speed of a Java application is often unstable, the allocation speed is multiplied by a spike coefficient, ZAllocationSpikeTolerance, so that GC is triggered conservatively in advance. If the allocation speed of the Java application is unstable and Allocation Stall occurs occasionally, ZAllocationSpikeTolerance should be increased appropriately.
  • ZCollectionInterval: GC is triggered at regular intervals to avoid excessively long gaps between GCs.
  • ZProactive: As the name suggests, GC is triggered proactively. It handles cases where the allocation rate is low.
  • ZHighUsagePercent: GC is triggered if the heap usage exceeds this percentage.

ZGC is triggered as long as one of the conditions above for GC trigger timing is met.
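The way these rules combine can be sketched as a set of independent checks joined by a logical OR. This is a simplified model based only on the descriptions above, not the Dragonwell or OpenJDK implementation. ZProactive is left out because its exact condition is not described here, and all class, field, and method names are hypothetical.

```java
// Simplified model of the trigger rules described above; not JVM source code.
class GcTriggerSketch {
    final double spikeTolerance;        // models ZAllocationSpikeTolerance
    final double collectionIntervalSec; // models ZCollectionInterval; 0 disables the rule
    final double highUsagePercent;      // models ZHighUsagePercent

    GcTriggerSketch(double spikeTolerance, double collectionIntervalSec, double highUsagePercent) {
        this.spikeTolerance = spikeTolerance;
        this.collectionIntervalSec = collectionIntervalSec;
        this.highUsagePercent = highUsagePercent;
    }

    // Trigger when the allocation expected during one cycle, scaled by the
    // spike tolerance, would not fit into the remaining free space.
    boolean allocationRateRule(double allocRateMbPerSec, double gcDurationSec, double freeMb) {
        double expectedMb = allocRateMbPerSec * spikeTolerance * gcDurationSec;
        return expectedMb >= freeMb;
    }

    // Trigger when the configured interval has passed since the last GC.
    boolean intervalRule(double secondsSinceLastGc) {
        return collectionIntervalSec > 0 && secondsSinceLastGc >= collectionIntervalSec;
    }

    // Trigger when heap usage exceeds the configured percentage of the limit.
    boolean highUsageRule(double usedMb, double heapLimitMb) {
        return usedMb * 100.0 > highUsagePercent * heapLimitMb;
    }

    // GC starts as soon as any single rule fires.
    boolean shouldTrigger(double allocRateMbPerSec, double gcDurationSec, double freeMb,
                          double secondsSinceLastGc, double usedMb, double heapLimitMb) {
        return allocationRateRule(allocRateMbPerSec, gcDurationSec, freeMb)
                || intervalRule(secondsSinceLastGc)
                || highUsageRule(usedMb, heapLimitMb);
    }
}
```

In this model, raising the spike tolerance makes the allocation-rate rule fire earlier, which trades extra GC cycles for a lower risk of Allocation Stall.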

The SoftMaxHeapSize option sets a soft upper limit for the ZGC heap space, between Xms and Xmx. The ZAllocationSpikeTolerance, ZProactive, and ZHighUsagePercent rules above all use the SoftMaxHeapSize value as the soft upper limit of the ZGC heap space. When the allocation speed is too fast, the heap space can expand up to Xmx. When the allocation speed is slow, the heap space can shrink down to Xms. SoftMaxHeapSize is usually used together with -XX:+ZUncommit.

In addition, there are some useful advanced tuning features:

  • ZFragmentationLimit: Controls the fragmentation degree of ZGC regions. The lower the ZFragmentationLimit, the more thorough the ZGC collection is.
  • ZMarkStackSpaceLimit: Adjusts the size of the ZGC marking stack space.
  • ZUnloadClassesFrequency: Controls the frequency of ZGC class unloading.
  • ZRelocationReservePercent: Controls the reserved allocation space of ZGC to reduce the risk of OOM.
  • ZStatisticsInterval: Controls how often statistics are written to the ZGC logs. By default, statistics are output once every ten seconds, which may get in the way when reading GC details.

ZHighUsagePercent, ZUnloadClassesFrequency, and ZRelocationReservePercent above are specific to Dragonwell 11. If you switch to another OpenJDK distribution, avoid using these options.

Part 3 of this 3-part series will introduce Alibaba Dragonwell 11 in terms of its production-ready transformation for ZGC.

Dragonwell has joined the Java language and virtual machine SIG in the Anolis community (OpenAnolis). At the same time, the Anolis operating system (Anolis OS) 8 supports Dragonwell cloud-native Java. You are welcome to join the SIG and help build the community together.

About the Author

Hao Tang joined the Alibaba Cloud programming language and compiler team in 2019 and is currently engaged in JVM memory management optimization.

SIG Address: https://openanolis.cn/sig/java/doc/216166872482840581
