String Deduplication of G1 Garbage Collector in Java 8

saurav omar
3 min read · Aug 27, 2019


It is well known that strings take up a lot of memory, which is why Java offers several ways to reduce String memory usage (intern(), the String pool, etc.). Even so, many duplicate strings are usually present in JVM memory.

As per JEP 192:

"Many large-scale Java applications are currently bottlenecked on memory. Measurements have shown that roughly 25% of the Java heap live data set in these types of applications is consumed by String objects. Further, roughly half of those String objects are duplicates, where duplicates means string1.equals(string2) is true. Having duplicate String objects on the heap is, essentially, just a waste of memory."

Java provides two ways to handle duplicate strings and reduce the memory they use:

  • By using String.intern() method
  • By enabling string deduplication (only available with G1 garbage collector)

String Intern:

With String interning, Java checks whether an equal string is already present in the String pool. If it is, the reference to that pooled string is returned; otherwise the string is added to the pool and a reference to the newly pooled string is returned.

A string is interned when you create it like this:

  • String str = "abc";
  • String str = new String("abc").intern();

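If you create a String like this:

  • String str = new String("abc");

then two objects are involved: the literal "abc", which lives in the String pool, and a separate String object on the heap that str refers to. The heap object is not the pooled one, so str == "abc" is false.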
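The difference between a literal, a new String and an interned String can be checked with reference comparisons. This is a minimal sketch; the class name InternDemo is only chosen for illustration.

```java
public class InternDemo {
    public static void main(String[] args) {
        String literal = "abc";            // literal: taken from the String pool
        String copy = new String("abc");   // always a new object on the heap
        String interned = copy.intern();   // returns the pooled "abc"

        System.out.println(literal == copy);       // false: different objects
        System.out.println(literal == interned);   // true: same pooled object
        System.out.println(literal.equals(copy));  // true: same characters
    }
}
```

So equals() is true in all cases, but only the literal and the interned string share one object.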

Important Points:

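  • Calling intern() on a large number of strings is relatively expensive and slow.
  • String.intern() is a native method in Java 8, and every call has to look the string up in the JVM-wide string table, which adds overhead.
  • The string table is a fixed-size hashtable (the default was 1009 buckets in Java 6 and early Java 7 and was raised to 60013 in Java 7u40 and Java 8; it can be set using -XX:StringTableSize=N), so as the number of entries grows far beyond the bucket count, lookups degrade towards O(n).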

String Deduplication:

In Java 8, the String class keeps its characters in a private char[] array. The array cannot be accessed or modified from outside, which means a single char array can safely be shared by multiple String instances at the same time.

Deduplication candidates are found during garbage collection, for example a minor GC. The garbage collector visits String objects and keeps a hashtable that stores the hash value next to a weak reference to each char array. When it finds another String whose char array has the same hash code, it compares the two arrays char by char. If they match, the first String's value field is re-assigned to point to the second String's char array, and the now unreferenced char array of the first String becomes available for GC.
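The lookup-and-reassign step can be modelled in plain Java. This is only a conceptual sketch and not HotSpot's implementation: Str is a hypothetical stand-in for java.lang.String, and the table holds strong references where the real collector uses weak ones.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DedupModel {
    /** Hypothetical stand-in for java.lang.String: just a holder of a char[]. */
    static final class Str {
        char[] value;
        Str(String s) { this.value = s.toCharArray(); }
    }

    // hash code -> char arrays already seen with that hash code
    // (assumption: strong references here, the real table uses weak references)
    private final Map<Integer, List<char[]>> table = new HashMap<>();

    /** Returns true if str now shares an identical array that was seen earlier. */
    boolean deduplicate(Str str) {
        int hash = Arrays.hashCode(str.value);
        List<char[]> bucket = table.computeIfAbsent(hash, h -> new ArrayList<>());
        for (char[] seen : bucket) {
            if (seen == str.value) {
                return false;               // already uses the shared array
            }
            if (Arrays.equals(seen, str.value)) {
                str.value = seen;           // re-point; the old array becomes garbage
                return true;
            }
        }
        bucket.add(str.value);              // first array with this content
        return false;
    }

    public static void main(String[] args) {
        DedupModel model = new DedupModel();
        Str a = new Str("RICKY");
        Str b = new Str("RICKY");
        System.out.println(a.value == b.value);  // false: two separate arrays
        model.deduplicate(a);
        model.deduplicate(b);
        System.out.println(a.value == b.value);  // true: one shared array
    }
}
```

Note that the two Str objects still exist after deduplication; only the backing arrays are shared.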

As we have seen, String.intern() is slow and has to be called explicitly, which adds overhead. String deduplication needs no code changes and is fast compared to interning.

Why is String Deduplication needed?

Let's see with an example:

Suppose you have a class Employee:

class Employee {
    Long id;
    String firstName;
    String lastName;
}

  • You are reading this object's data from a database or file.
  • If the database or file contains many employees whose first name is "RICKY", a separate "RICKY" String is created for every one of them, because strings read directly from a database or file are created at runtime and do not get interned.
  • This wastes a lot of memory unnecessarily.
  • That is why String Deduplication becomes important.
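The duplication described in the points above can be reproduced without a database. The following is a minimal sketch in which load() is a hypothetical helper that builds each first name from raw characters, the way a file or JDBC reader would, instead of using a literal.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class EmployeeDuplicates {
    static class Employee {
        Long id;
        String firstName;
        String lastName;

        Employee(Long id, String firstName, String lastName) {
            this.id = id;
            this.firstName = firstName;
            this.lastName = lastName;
        }
    }

    /** Hypothetical loader: simulates rows that all carry the first name RICKY. */
    static List<Employee> load(int count) {
        char[] raw = {'R', 'I', 'C', 'K', 'Y'};
        List<Employee> employees = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            // new String(char[]) creates a fresh, non-interned String every time
            employees.add(new Employee((long) i, new String(raw), "LAST" + i));
        }
        return employees;
    }

    /** Counts first-name String objects by identity (==), not by equals(). */
    static int distinctObjects(List<Employee> employees) {
        Set<String> seen = Collections.newSetFromMap(new IdentityHashMap<String, Boolean>());
        for (Employee e : employees) {
            seen.add(e.firstName);
        }
        return seen.size();
    }

    /** Counts first names by content (equals()). */
    static int distinctValues(List<Employee> employees) {
        Set<String> seen = new HashSet<String>();
        for (Employee e : employees) {
            seen.add(e.firstName);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        List<Employee> employees = load(1000);
        System.out.println(distinctObjects(employees)); // 1000 String objects
        System.out.println(distinctValues(employees));  // 1 distinct value
    }
}
```

One thousand String objects hold a single distinct value here. Deduplication would not change the object count, but it would let all of them share one char array.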

Enable String Deduplication:

This feature is only available from Java 8 onward and can be enabled only when the G1 garbage collector is used.

You can enable deduplication using the flags below.

-XX:+UseG1GC -XX:+UseStringDeduplication

Important Points:

  • This option is only available from the Java 8 Update 20 JDK release onward.
  • This feature only works with the G1 garbage collector.
  • You need to provide both the -XX:+UseG1GC and -XX:+UseStringDeduplication JVM options to enable this feature.
  • To check whether it is happening in your system on Java 8, you can use the -XX:+PrintStringDeduplicationStatistics option.
  • You can use the -XX:StringDeduplicationAgeThreshold=3 option (3 is the default) to change when Strings become eligible for deduplication, that is, how many garbage collections a String must survive first.
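To confirm that the options from the points above really reached the JVM, the start-up arguments can be read through the standard RuntimeMXBean. This is a small sketch; the class DedupFlagCheck and its helper methods are hypothetical names, and they only detect flags that were passed explicitly on the command line.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

public class DedupFlagCheck {
    /** True if -XX:+UseStringDeduplication is among the given JVM arguments. */
    static boolean dedupRequested(List<String> jvmArgs) {
        return jvmArgs.contains("-XX:+UseStringDeduplication");
    }

    /** True if -XX:+UseG1GC is among the given JVM arguments. */
    static boolean g1Requested(List<String> jvmArgs) {
        return jvmArgs.contains("-XX:+UseG1GC");
    }

    public static void main(String[] args) {
        // Arguments given to the JVM itself, not to the main method
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("G1 requested: " + g1Requested(jvmArgs));
        System.out.println("String deduplication requested: " + dedupRequested(jvmArgs));
    }
}
```

Running it as java -XX:+UseG1GC -XX:+UseStringDeduplication DedupFlagCheck should report both as requested.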

As we have seen, enabling String Deduplication helps us optimize memory usage without any code changes. This becomes very important when dealing with large amounts of data. I always suggest that, before enabling this feature, you inspect a heap dump and then decide whether you need it or not.

That’s it.

Happy Learning.
