By Zhiqi
We have a project that keeps its entire data set cached in memory. The main goal is excellent RT (Response Time), and the data volume is small enough to be handled comfortably by a standard 4C8G container. One day, however, the staging environment started raising frequent Full GC alerts, which we traced back to the cache having grown too large.
We usually load about 100 configuration items into memory. A recent requirement expanded the configuration data to about 100,000 items, which significantly increased memory usage. On analysis, we found that the information entropy of this data is not very high. Most of the JSON stores strings drawn from a limited number of permutations and combinations, but the deserialization framework loads each occurrence into the heap as a fresh String object, as if by new String.
Inspired by the concept of a constant pool, I decided to apply it here, which solves the issue without any changes to the business logic or design.
The Fastjson serialization tool we use does not apply constant pooling to value fields. This makes sense, since a value can in general be anything, and putting every incoming string into the constant pool would hurt the system. In our specific business scenarios, however, we know that certain values come from a limited set and need not be churned through Young GC. Hence, we need to turn these specific values into constants by explicitly calling the String.intern() method.
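To make the effect concrete, here is a minimal sketch. The class and method names are hypothetical and not taken from the project; `load` merely simulates a deserializer that allocates a new String for every value. It counts distinct objects by identity, with and without `intern()`.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class; the names are illustrative only.
class InternDemo {

    // Counts how many distinct String objects (by identity, not equals) the list holds.
    static int distinctObjects(List<String> values) {
        Map<String, Boolean> seen = new IdentityHashMap<>();
        for (String v : values) {
            seen.put(v, Boolean.TRUE);
        }
        return seen.size();
    }

    // Simulates a deserializer that allocates a fresh String for every value.
    static List<String> load(int copies, boolean intern) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < copies; i++) {
            String fresh = new String("ENABLED"); // a new heap object each time
            out.add(intern ? fresh.intern() : fresh);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(distinctObjects(load(1000, false))); // 1000 separate objects
        System.out.println(distinctObjects(load(1000, true)));  // 1 shared object
    }
}
```

Without interning, the list holds 1000 equal but separate objects; with interning, every entry points to the same pooled instance, so the temporary copies become garbage right away.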
To achieve this, let’s start with String.intern().
Fastjson picks an appropriate ObjectDeserializer to deserialize each field, and the @JSONField(deserializeUsing = xxx.class) annotation lets us plug in a custom one. Therefore, we write a custom deserializer that calls the intern method.
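import java.lang.reflect.Type;

import com.alibaba.fastjson.JSONException;
import com.alibaba.fastjson.parser.DefaultJSONParser;
import com.alibaba.fastjson.parser.deserializer.ObjectDeserializer;

public class StringPoolDeserializer implements ObjectDeserializer {
    @SuppressWarnings("unchecked")
    @Override
    public <T> T deserialze(DefaultJSONParser parser, Type type, Object fieldName) {
        if (!String.class.equals(type)) {
            throw new JSONException("StringPoolDeserializer can only deserialize String");
        }
        // Parse the value, then return the pooled instance; a JSON null stays null.
        String value = (String) parser.parse(fieldName);
        return (T) (value == null ? null : value.intern());
    }

    @Override
    public int getFastMatchToken() {
        return 0;
    }
}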
After this optimization, about 800M of heap memory is released, and the metaspace hardly grows. After all, our data is highly repetitive, with very low entropy.
However, the remaining size is still larger than expected, and we later found that this method cannot process the values in members of type Map<String, String>.
Let's take a closer look at how Map is processed. Fastjson internally implements MapDeserializer to deserialize fields of Map type. However, this deserializer is relatively complex and the methods at its core are declared final, so inheriting from it and overriding them is not a workable way to solve the problem. Reading further, however, reveals an opening in the code.
On the Map path, we can control the concrete type of the Map and override its put method, which gives us a suitable place to call String.intern().
Note:
Apart from the put method, other operations, such as putAll and constructing a Map from another Map, can also add to a Map's contents. These do not go through put; they share the putVal method, which is also final. Strictly speaking, since putVal cannot be overridden, those methods would need to be overridden as well. However, since MapDeserializer only calls the put method and the other methods are more complicated to implement, only the put method is overridden.
My solution is to override the put method directly, which is simple and easy, and then to replace the original HashMap type declaration of the JavaBean member with StringPoolMap:
import java.util.HashMap;

public class StringPoolMap extends HashMap<String, String> {
    // MapDeserializer fills the map through put, so interning here covers keys and values.
    @Override
    public String put(String key, String value) {
        if (key != null) {
            key = key.intern();
        }
        if (value != null) {
            value = value.intern();
        }
        return super.put(key, value);
    }
}
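The following sketch shows the behavior described in the note: entries added through put are interned, while entries added through putAll are not. The demo class and helper names are hypothetical, and the map is repeated as a nested class only so that the example compiles on its own. The putAll result depends on HashMap internals, so treat it as an observation about OpenJDK 8 and later rather than a guarantee.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class; the names are illustrative only.
class StringPoolMapDemo {

    // Same idea as StringPoolMap: intern keys and values on put.
    static class StringPoolMap extends HashMap<String, String> {
        @Override
        public String put(String key, String value) {
            return super.put(key == null ? null : key.intern(),
                             value == null ? null : value.intern());
        }
    }

    // Adds a freshly allocated value through put and checks whether the stored
    // value is the pooled instance (identity comparison against the literal).
    static boolean putInterns() {
        Map<String, String> m = new StringPoolMap();
        m.put(new String("color"), new String("red"));
        return m.get("color") == "red";
    }

    // Same check, but the entry arrives through putAll instead of put.
    static boolean putAllInterns() {
        Map<String, String> source = new HashMap<>();
        source.put("color", new String("red"));
        Map<String, String> m = new StringPoolMap();
        m.putAll(source);
        return m.get("color") == "red";
    }

    public static void main(String[] args) {
        System.out.println("put interns:    " + putInterns());
        // On OpenJDK 8 and later, putAll does not route through put.
        System.out.println("putAll interns: " + putAllInterns());
    }
}
```

If other code paths ever fill these maps with putAll or a copy constructor, those methods would need the same treatment.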
At this point, the optimization is complete: memory usage drops from 800M to 619M, saving about 1G compared with the original 1.6G+.
The essence of this problem is not the call to String.intern(), but the fact that data with low information entropy is not stored compactly. Therefore, the second iteration will revisit the problem from the design of the data structure.
I've been reading the JDK (OpenJDK) source code recently, so I'd like to expand on it.
String.intern() is a native method. Its corresponding native source code is:
#include "jvm.h"
#include "java_lang_String.h"

JNIEXPORT jobject JNICALL
Java_java_lang_String_intern(JNIEnv *env, jobject this)
{
    return JVM_InternString(env, this);
}
It calls JVM_InternString and passes in the "this" object.
JVM_InternString
JVM_InternString is implemented inside the HotSpot VM and, in turn, hands the string to StringTable::intern.
StringTable
oop StringTable::intern(Handle string_or_null_h, const jchar* name, int len, TRAPS) {
  unsigned int hash = java_lang_String::hash_code(name, len);
  // Check the shared table and the local table for the string
  // Return quickly if found
  oop found_string = lookup_shared(name, len, hash);
  if (found_string != nullptr) {
    return found_string;
  }
  if (_alt_hash) {
    hash = hash_string(name, len, true);
  }
  found_string = do_lookup(name, len, hash);
  if (found_string != nullptr) {
    return found_string;
  }
  // If not, create one and insert it
  return do_intern(string_or_null_h, name, len, hash, THREAD);
}
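The lookup-then-insert flow above can be mirrored in plain Java. The sketch below is only an analogy with hypothetical names: a user-level pool backed by ConcurrentHashMap, not HotSpot's StringTable, and without the shared-table lookup or the alternate hashing.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical user-level pool; an analogy for the flow above, not the JVM's implementation.
class SimpleStringPool {
    private final ConcurrentHashMap<String, String> table = new ConcurrentHashMap<>();

    String intern(String s) {
        // Fast path: return the canonical instance if one is already registered.
        String found = table.get(s);
        if (found != null) {
            return found;
        }
        // Slow path: register s; if another thread won the race, use its instance.
        String raced = table.putIfAbsent(s, s);
        return raced != null ? raced : s;
    }

    int size() {
        return table.size();
    }

    public static void main(String[] args) {
        SimpleStringPool pool = new SimpleStringPool();
        String a = pool.intern(new String("ENABLED"));
        String b = pool.intern(new String("ENABLED"));
        System.out.println(a == b);      // true: both calls return the first instance
        System.out.println(pool.size()); // 1
    }
}
```

As in the native code, the common case is a pure lookup, and an insertion happens only the first time a given string is seen.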
Disclaimer: The views expressed herein are for reference only and don't necessarily represent the official views of Alibaba Cloud.