By Wen Shaojin (Gaotie)
The String implementation differs between JDK 8 and JDK 9 and later. The String structure in JDK 8 is listed below:
class String {
char[] value;
// The constructor can copy.
public String(char value[]) {
this.value = Arrays.copyOf(value, value.length);
}
// No Copy Constructor
String(char[] value, boolean share) {
// assert share : "unshared not supported";
this.value = value;
}
}
The String structure in JDK 9 and later is listed below:
class String {
static final byte LATIN1 = 0;
static final byte UTF16 = 1;
byte coder;
byte[] value;
// No Copy Constructor
String(byte[] value, byte coder) {
this.value = value;
this.coder = coder;
}
}
From JDK 9 onward, the value is stored in a byte[], and the coder field distinguishes between LATIN1 and UTF16. Most strings are LATIN1. As a result, when we construct strings or encode strings into binary, we can use ZeroCopy to achieve extreme performance.
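To make the LATIN1/UTF16 distinction concrete, here is a minimal sketch of the rule: a string can use the one-byte-per-char LATIN1 layout only when every char fits in 8 bits. The class name `Latin1CoderSketch` and its methods are hypothetical and only mirror the idea; this is not the JDK's own code, and it assumes compact strings are enabled.

```java
final class Latin1CoderSketch {
    static final byte LATIN1 = 0;
    static final byte UTF16 = 1;

    // Returns LATIN1 when every char fits in one byte, otherwise UTF16.
    static byte coderOf(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return UTF16;
            }
        }
        return LATIN1;
    }

    // Number of bytes the backing array would need under the chosen coder.
    static int storageBytes(String s) {
        return coderOf(s) == LATIN1 ? s.length() : s.length() * 2;
    }

    public static void main(String[] args) {
        System.out.println(coderOf("hello"));              // prints 0
        System.out.println(storageBytes("hello"));         // prints 5
        System.out.println(storageBytes("\u4f60\u597d"));  // prints 4
    }
}
```

Because a LATIN1 string already holds one byte per character, an encoder that targets a LATIN1-compatible format can reuse or bulk-copy the array instead of converting char by char.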
sun.misc.Unsafe, available in JDK 8 and later, can perform some native operations with better performance. It bypasses the usual language-level checks (visibility, bounds, type safety), so incorrect calls can crash the JVM. Used correctly, it can improve performance.
import java.lang.reflect.Field;
import sun.misc.Unsafe;
public class UnsafeUtils {
public static final Unsafe UNSAFE;
static {
Unsafe unsafe = null;
try {
Field theUnsafeField = Unsafe.class.getDeclaredField("theUnsafe");
theUnsafeField.setAccessible(true);
unsafe = (Unsafe) theUnsafeField.get(null);
} catch (Throwable ignored) {
// ignored
}
UNSAFE = unsafe;
}
}
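As a small illustration of what Unsafe allows, the sketch below reads a private field by its raw offset. The `Point` class and the `readX` helper are hypothetical names used only for this example. It assumes a JDK where `sun.misc.Unsafe` memory access is still permitted; recent JDKs deprecate these methods and may print warnings or reject them.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

final class UnsafeFieldReadSketch {
    // Hypothetical holder class, used only for this illustration.
    static final class Point {
        private int x = 42;
    }

    static final Unsafe UNSAFE = loadUnsafe();

    private static Unsafe loadUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not available", e);
        }
    }

    // Reads the private field Point.x by raw offset, skipping access checks.
    static int readX(Point p) {
        try {
            long offset = UNSAFE.objectFieldOffset(Point.class.getDeclaredField("x"));
            return UNSAFE.getInt(p, offset);
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readX(new Point())); // prints 42
    }
}
```

Note that nothing verifies the offset or the object type here: passing an offset from a different class would read arbitrary memory, which is why wrong calls can crash the JVM.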
JDK 8 added support for lambdas, which makes it possible to map a Method to a lambda function and avoid reflection overhead. java.lang.invoke.LambdaMetafactory can do this, but it is limited by visibility, which means private methods cannot be called. There is a trick: use Unsafe to obtain a trusted MethodHandles.Lookup, which works across different JDK versions and bypasses the visibility restriction so that any JDK internal method can be called:
import static com.alibaba.fastjson2.util.UnsafeUtils.UNSAFE;
static final MethodHandles.Lookup IMPL_LOOKUP;
static {
Class<?> lookupClass = MethodHandles.Lookup.class;
try {
// IMPL_LOOKUP is a private static field that holds the fully trusted Lookup.
Field implLookup = lookupClass.getDeclaredField("IMPL_LOOKUP");
long fieldOffset = UNSAFE.staticFieldOffset(implLookup);
IMPL_LOOKUP = (MethodHandles.Lookup) UNSAFE.getObject(lookupClass, fieldOffset);
} catch (NoSuchFieldException e) {
throw new ExceptionInInitializerError(e);
}
}
static MethodHandles.Lookup trustedLookup(Class<?> objectClass) throws Exception {
return IMPL_LOOKUP.in(objectClass);
}
Note: In IBM OpenJ9 JDK 8/11, this approach is still limited by visibility and requires additional processing. See the JDKUtils#trustedLookup code in FASTJSON2 for more information.
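The mapping from a method to a lambda function can be tried without any trusted lookup, as long as the target method is public. The following sketch maps `String.length()` to a `ToIntFunction<String>` with an ordinary `MethodHandles.lookup()`. The class name `LambdaMappingSketch` is hypothetical.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.LambdaMetafactory;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.util.function.ToIntFunction;

import static java.lang.invoke.MethodType.methodType;

final class LambdaMappingSketch {
    @SuppressWarnings("unchecked")
    static ToIntFunction<String> lengthFunction() {
        try {
            // An ordinary lookup is enough because String.length() is public.
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodHandle handle = lookup.findVirtual(
                    String.class, "length", methodType(int.class));
            CallSite site = LambdaMetafactory.metafactory(
                    lookup,
                    "applyAsInt",                         // method of the functional interface
                    methodType(ToIntFunction.class),      // factory type: no captured arguments
                    methodType(int.class, Object.class),  // erased signature of applyAsInt
                    handle,                               // implementation method
                    methodType(int.class, String.class)); // signature specialized for String
            return (ToIntFunction<String>) site.getTarget().invokeExact();
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(lengthFunction().applyAsInt("hello")); // prints 5
    }
}
```

Replacing the ordinary lookup with a trusted one is what lets the same pattern reach non-public members.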
The key to constructing strings quickly is to reduce copying or avoid it entirely (ZeroCopy). The implementation differs among JDK 8, JDK 9 to 15, and JDK 16 and later.
In JDK 8, you need to call its constructor String(char[], boolean) to construct a String object with ZeroCopy. For example:
BiFunction<char[], Boolean, String> stringCreatorJDK8
= (char[] value, Boolean share) -> new String(value, share);
Because the String(char[], boolean) constructor is not public, the preceding code will report an error. Instead, obtain a trusted MethodHandles.Lookup through reflection, use it to look up the internal String constructor, and map that constructor to a BiFunction. The code is listed below:
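import com.alibaba.fastjson2.util.JDKUtils;
import java.lang.invoke.CallSite;
import java.lang.invoke.LambdaMetafactory;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.util.function.BiFunction;
import static java.lang.invoke.MethodType.methodType;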
MethodHandles.Lookup caller = JDKUtils.trustedLookup(String.class);
MethodHandle handle = caller.findConstructor(
String.class,
methodType(void.class, char[].class, boolean.class)
);
CallSite callSite = LambdaMetafactory.metafactory(
caller,
"apply",
methodType(BiFunction.class),
methodType(Object.class, Object.class, Object.class),
handle,
methodType(String.class, char[].class, boolean.class)
);
BiFunction<char[], Boolean, String> STRING_CREATOR_JDK8
= (BiFunction<char[], Boolean, String>)
callSite.getTarget().invokeExact();
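For comparison, the same findConstructor and metafactory pattern can be applied to the public copying constructor `String(char[])` with an ordinary lookup. This sketch, with the hypothetical class name `ConstructorMappingSketch`, also shows the copy that the non-public constructor avoids: changing the array afterwards does not affect the String.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.LambdaMetafactory;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.util.function.Function;

import static java.lang.invoke.MethodType.methodType;

final class ConstructorMappingSketch {
    @SuppressWarnings("unchecked")
    static Function<char[], String> copyingCreator() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            // Public constructor String(char[]); constructors use a void return type here.
            MethodHandle ctor = lookup.findConstructor(
                    String.class, methodType(void.class, char[].class));
            CallSite site = LambdaMetafactory.metafactory(
                    lookup,
                    "apply",
                    methodType(Function.class),
                    methodType(Object.class, Object.class),
                    ctor,
                    methodType(String.class, char[].class));
            return (Function<char[], String>) site.getTarget().invokeExact();
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        char[] chars = {'a', 'b', 'c'};
        String s = copyingCreator().apply(chars);
        chars[0] = 'x';        // the String holds its own copy of the array
        System.out.println(s); // prints abc
    }
}
```

With a ZeroCopy creator the array is shared rather than copied, so the caller must not modify it after the String has been constructed.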
From JDK 9 to JDK 15, we want to construct a function like the following to create a String object with ZeroCopy:
BiFunction<byte[], Byte, String> STRING_CREATOR_JDK11
= (byte[] value, byte coder) -> new String(value, coder);
Similarly, the String(byte[], byte) constructor in JDK 9 is not public and cannot be called directly, so the preceding code will report an error. Use a trusted MethodHandles.Lookup to look up the internal String constructor and map it to a BiFunction, as shown below:
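import com.alibaba.fastjson2.util.JDKUtils;
import java.lang.invoke.CallSite;
import java.lang.invoke.LambdaMetafactory;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.util.function.BiFunction;
import static java.lang.invoke.MethodType.methodType;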
MethodHandles.Lookup caller = JDKUtils.trustedLookup(String.class);
MethodHandle handle = caller.findConstructor(
String.class,
methodType(void.class, byte[].class, byte.class)
);
CallSite callSite = LambdaMetafactory.metafactory(
caller,
"apply",
methodType(BiFunction.class),
methodType(Object.class, Object.class, Object.class),
handle,
methodType(String.class, byte[].class, Byte.class)
);
BiFunction<byte[], Byte, String> STRING_CREATOR_JDK11
= (BiFunction<byte[], Byte, String>)
callSite.getTarget().invokeExact();
Note: The preceding method does not work when the user configures the JVM parameter -XX:-CompactStrings.
static BiFunction<char[], Boolean, String> STRING_CREATOR_JDK8 = ...
static BiFunction<byte[], Byte, String> STRING_CREATOR_JDK11 = ...
static String formatYYYYMMDD(LocalDate date) {
int year = date.getYear();
int month = date.getMonthValue();
int dayOfMonth = date.getDayOfMonth();
int y0 = year / 1000 + '0';
int y1 = (year / 100) % 10 + '0';
int y2 = (year / 10) % 10 + '0';
int y3 = year % 10 + '0';
int m0 = month / 10 + '0';
int m1 = month % 10 + '0';
int d0 = dayOfMonth / 10 + '0';
int d1 = dayOfMonth % 10 + '0';
String str;
if (STRING_CREATOR_JDK11 != null) {
byte[] bytes = new byte[10];
bytes[0] = (byte) y0;
bytes[1] = (byte) y1;
bytes[2] = (byte) y2;
bytes[3] = (byte) y3;
bytes[4] = '-';
bytes[5] = (byte) m0;
bytes[6] = (byte) m1;
bytes[7] = '-';
bytes[8] = (byte) d0;
bytes[9] = (byte) d1;
str = STRING_CREATOR_JDK11.apply(bytes, JDKUtils.LATIN1);
} else {
char[] chars = new char[10];
chars[0] = (char) y0;
chars[1] = (char) y1;
chars[2] = (char) y2;
chars[3] = (char) y3;
chars[4] = '-';
chars[5] = (char) m0;
chars[6] = (char) m1;
chars[7] = '-';
chars[8] = (char) d0;
chars[9] = (char) d1;
if (STRING_CREATOR_JDK8 != null) {
str = STRING_CREATOR_JDK8.apply(chars, Boolean.TRUE);
} else {
str = new String(chars);
}
}
return str;
}
In the preceding example, depending on the JDK version, a char[] is created directly in JDK 8 and a byte[] is created directly in JDK 9 and later, and the String object is then constructed with ZeroCopy. This formats a LocalDate to a String quickly, and is faster than using SimpleDateFormat, java.time.format.DateTimeFormatter, and other implementations.
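The digit arithmetic can be tried on its own with only public APIs. The sketch below is a portable variant under stated assumptions: it uses the public, copying `String(byte[], Charset)` constructor in place of the ZeroCopy creators, and it falls back to `LocalDate.toString()` for years outside 0 to 9999, a range the original code does not check. The class name `DateFormatSketch` is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.time.LocalDate;

final class DateFormatSketch {
    static String formatYYYYMMDD(LocalDate date) {
        int year = date.getYear();
        if (year < 0 || year > 9999) {
            // The digit arithmetic below assumes a four-digit year.
            return date.toString();
        }
        int month = date.getMonthValue();
        int day = date.getDayOfMonth();

        byte[] bytes = new byte[10];
        bytes[0] = (byte) (year / 1000 + '0');
        bytes[1] = (byte) (year / 100 % 10 + '0');
        bytes[2] = (byte) (year / 10 % 10 + '0');
        bytes[3] = (byte) (year % 10 + '0');
        bytes[4] = '-';
        bytes[5] = (byte) (month / 10 + '0');
        bytes[6] = (byte) (month % 10 + '0');
        bytes[7] = '-';
        bytes[8] = (byte) (day / 10 + '0');
        bytes[9] = (byte) (day % 10 + '0');
        // Public constructor: copies the array, unlike the ZeroCopy creators.
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(formatYYYYMMDD(LocalDate.of(2024, 7, 29))); // prints 2024-07-29
    }
}
```

The only difference from the ZeroCopy version is the final line, where the array is copied once instead of being handed to the String directly.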
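In JDK 8, the char[] value of a String can be read directly through Unsafe, which avoids the copy made by toCharArray():
static final Field FIELD_STRING_VALUE;
static final long FIELD_STRING_VALUE_OFFSET;
// Set when the Unsafe path is unavailable, so that callers fall back to toCharArray().
static volatile boolean FIELD_STRING_ERROR;
static {
Field field = null;
long fieldOffset = -1;
try {
field = String.class.getDeclaredField("value");
fieldOffset = UnsafeUtils.UNSAFE.objectFieldOffset(field);
} catch (Exception ignored) {
FIELD_STRING_ERROR = true;
}
FIELD_STRING_VALUE = field;
FIELD_STRING_VALUE_OFFSET = fieldOffset;
}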
public static char[] getCharArray(String str) {
if (!FIELD_STRING_ERROR) {
try {
return (char[]) UnsafeUtils.UNSAFE.getObject(
str,
FIELD_STRING_VALUE_OFFSET
);
} catch (Exception ignored) {
FIELD_STRING_ERROR = true;
}
}
return str.toCharArray();
}
In JDK 9 and later, we need to construct the following functions:
ToIntFunction<String> stringCoder = (String str) -> str.coder();
Function<String, byte[]> stringValue = (String str) -> str.value();
However, the String.coder and String.value methods are not public. As with the String constructors above, they need to be looked up through a trusted MethodHandles.Lookup, as shown below:
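import com.alibaba.fastjson2.util.JDKUtils;
import java.lang.invoke.CallSite;
import java.lang.invoke.LambdaMetafactory;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.util.function.Function;
import java.util.function.ToIntFunction;
import static java.lang.invoke.MethodType.methodType;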
MethodHandles.Lookup lookup = JDKUtils.trustedLookup(String.class);
MethodHandle coder = lookup.findSpecial(
String.class,
"coder",
methodType(byte.class),
String.class
);
CallSite applyAsInt = LambdaMetafactory.metafactory(
lookup,
"applyAsInt",
methodType(ToIntFunction.class),
methodType(int.class, Object.class),
coder,
methodType(byte.class, String.class)
);
ToIntFunction<String> STRING_CODER
= (ToIntFunction<String>) applyAsInt.getTarget().invokeExact();
MethodHandle value = lookup.findSpecial(
String.class,
"value",
methodType(byte[].class),
String.class
);
CallSite apply = LambdaMetafactory.metafactory(
lookup,
"apply",
methodType(Function.class),
methodType(Object.class, Object.class),
value,
methodType(byte[].class, String.class)
);
Function<String, byte[]> STRING_VALUE
= (Function<String, byte[]>) apply.getTarget().invokeExact();
static final byte LATIN1 = 0;
static ToIntFunction<String> STRING_CODER = ...
static Function<String, byte[]> STRING_VALUE = ...
byte[] buf = ...;
int off;
void writeString(String str) {
if (STRING_CODER != null && STRING_VALUE != null) {
// improved for JDK 9 LATIN1
int coder = STRING_CODER.applyAsInt(str);
if (coder == LATIN1) {
// str.getBytes(0, str.length(), buf, off);
byte[] value = STRING_VALUE.apply(str);
System.arraycopy(value, 0, buf, off, value.length);
off += value.length;
return;
}
}
// normal logic
}
String has a deprecated getBytes(int, int, byte[], int) method that keeps only the low 8 bits of each char, so the result is incorrect when the string contains non-LATIN1 characters. However, when the coder is LATIN1, it can be used to copy the value directly.
class String {
@Deprecated
public void getBytes(int srcBegin, int srcEnd, byte dst[], int dstBegin) {
int j = dstBegin;
int n = srcEnd;
int i = srcBegin;
char[] val = value; /* avoid getfield opcode */
while (i < n) {
dst[j++] = (byte)val[i++];
}
}
}
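The truncation is easy to observe with the public API alone. In this short sketch (the class name `DeprecatedGetBytesSketch` is hypothetical), ASCII characters survive unchanged, while U+4E2D is reduced to its low byte 0x2D.

```java
import java.util.Arrays;

final class DeprecatedGetBytesSketch {
    // Copies the low 8 bits of each char via the deprecated String.getBytes overload.
    @SuppressWarnings("deprecation")
    static byte[] lowBytes(String s) {
        byte[] dst = new byte[s.length()];
        s.getBytes(0, s.length(), dst, 0);
        return dst;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(lowBytes("Az")));     // prints [65, 122]
        System.out.println(Arrays.toString(lowBytes("\u4e2d"))); // prints [45]
    }
}
```

This is why the method is only safe as a fast path after confirming that the string is LATIN1.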
static final byte LATIN1 = 0;
static ToIntFunction<String> STRING_CODER = ...
byte[] buf = ...;
int off;
void writeString(String str) {
if (STRING_CODER != null) {
// improved for JDK 9 LATIN1
int coder = STRING_CODER.applyAsInt(str);
if (coder == LATIN1) {
str.getBytes(0, str.length(), buf, off);
off += str.length();
return;
}
}
// normal logic
}
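A self-contained variant of this fast path might look like the sketch below. It assumes no access to the internal coder, so it detects LATIN1 by scanning the chars, which costs a pass that STRING_CODER avoids. The buffer growth policy and the names `Latin1WriterSketch`, `tryWriteLatin1`, and `toByteArray` are hypothetical.

```java
import java.util.Arrays;

final class Latin1WriterSketch {
    private byte[] buf = new byte[8];
    private int off;

    // Assumption: without the internal coder, scan the chars to detect LATIN1.
    private static boolean isLatin1(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return false;
            }
        }
        return true;
    }

    // Returns false when the string is not LATIN1, leaving the normal logic to the caller.
    @SuppressWarnings("deprecation")
    boolean tryWriteLatin1(String str) {
        if (!isLatin1(str)) {
            return false;
        }
        int len = str.length();
        if (off + len > buf.length) {
            buf = Arrays.copyOf(buf, Math.max(buf.length * 2, off + len));
        }
        str.getBytes(0, len, buf, off); // one byte per char
        off += len;                     // advance the write position
        return true;
    }

    byte[] toByteArray() {
        return Arrays.copyOf(buf, off);
    }

    public static void main(String[] args) {
        Latin1WriterSketch w = new Latin1WriterSketch();
        w.tryWriteLatin1("2024-");
        w.tryWriteLatin1("caf\u00e9");
        System.out.println(w.tryWriteLatin1("\u4e2d")); // prints false
        System.out.println(w.toByteArray().length);     // prints 9
    }
}
```

Advancing `off` after each write keeps consecutive strings from overwriting one another in the shared buffer.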
The FASTJSON2 project uses these techniques; they are implemented in its JDKUtils and UnsafeUtils classes.
These techniques are not recommended for beginners. Make sure you understand how they work before using them.