Garbage Collector Latency

As you know, garbage collection should happen neither too rarely nor too often, because both extremes hurt system performance. If GC cycles are delayed, unreclaimed garbage accumulates and memory pressure grows; if GC cycles run too frequently, they increase CPU usage and reduce throughput. Along with this, we also need to understand how much time a GC takes to complete, because many GC algorithms cause stop-the-world events, during which the system serves no other requests and performs only garbage collection. The longer the stop-the-world time, the lower the system performance.

Now, the question is: which metric shows the duration of a garbage collection event?

And the answer is Latency.

Garbage Collector Latency

Definition:

Garbage Collector Latency is defined as the amount of time taken to complete a single garbage collection event. Example: 1 second
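
As a rough illustration of how GC time can be observed from inside a JVM, the sketch below reads the standard `GarbageCollectorMXBean` counters. The class name `GcTimeProbe` and its helper methods are made up for this example. Note that these counters are cumulative and approximate, so they yield totals and a crude average rather than the latency of each individual event; per-event durations are usually taken from GC logs (for instance `-Xlog:gc` on JDK 9 and later).

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper for this article: reads cumulative GC counters from the JVM.
final class GcTimeProbe {

    /** Total number of collections reported by all collectors; undefined values (-1) are skipped. */
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0, gc.getCollectionCount());
        }
        return total;
    }

    /** Approximate accumulated collection time in milliseconds; undefined values (-1) are skipped. */
    static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0, gc.getCollectionTime());
        }
        return total;
    }

    public static void main(String[] args) {
        // One line per collector registered in this JVM (names depend on the collector in use).
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: count=%d, time=%d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        long count = totalGcCount();
        if (count > 0) {
            // Crude average: accumulated time divided by number of collections.
            System.out.printf("average per collection: %.2f ms%n",
                    (double) totalGcTimeMillis() / count);
        }
    }
}
```

With concurrent collectors the reported time may include more than pure pause time, so treat the output as an indication only.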

However, the latency of a single event is not enough for a thorough investigation of GC. Hence you also need to look into the metrics below:

Average GC time: The average amount of time spent per GC event, across all the GCs that occurred in the given timeframe.
Maximum GC time: The maximum amount of time spent on a single GC event. Your application may have service level agreements such as “no transaction can run beyond 10 seconds”. In such cases, your maximum GC pause time must stay well below 10 seconds, because during GC pauses the entire JVM freezes and no customer transactions are processed. So it’s essential to understand the maximum GC pause time.
GC Time Distribution: You should also understand how many GC events complete within each time range (e.g. 200 GC events complete within 0 to 1 second, 10 GC events complete within 1 to 2 seconds, etc.)
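
The three metrics above can be sketched as simple computations over a list of pause durations. This is a minimal illustration, assuming the pause times have already been extracted (for example from GC logs) into an array of milliseconds; the class `GcPauseStats`, its method names, and the sample values are all hypothetical.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for this article: summary statistics over GC pause durations (ms).
final class GcPauseStats {

    /** Average GC time: mean pause duration, or 0.0 when there are no events. */
    static double average(long[] pausesMs) {
        return Arrays.stream(pausesMs).average().orElse(0.0);
    }

    /** Maximum GC time: longest single pause, or 0 when there are no events. */
    static long max(long[] pausesMs) {
        return Arrays.stream(pausesMs).max().orElse(0L);
    }

    /**
     * GC time distribution: number of events per bucket.
     * Key k covers the range [k * bucketWidthMs, (k + 1) * bucketWidthMs).
     */
    static Map<Long, Integer> distribution(long[] pausesMs, long bucketWidthMs) {
        Map<Long, Integer> buckets = new TreeMap<>();
        for (long pause : pausesMs) {
            buckets.merge(pause / bucketWidthMs, 1, Integer::sum);
        }
        return buckets;
    }

    public static void main(String[] args) {
        long[] pauses = {120, 450, 980, 1300, 60}; // made-up sample data, in milliseconds
        System.out.println("average ms: " + average(pauses));
        System.out.println("max ms:     " + max(pauses));
        System.out.println("per-second buckets: " + distribution(pauses, 1000));
    }
}
```

Comparing the maximum against an SLA threshold, rather than the average, is what catches the rare long pause that the average hides.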

Important Points on GC Latency Tuning:

Ultra-low Pauses (<1ms):
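- Use Epsilon GC, which is referred to as the JDK’s do-nothing GC: it allocates memory but never reclaims it, so there are effectively no GC cycles. It is suitable only when the heap is large enough to hold everything allocated between restarts, e.g. large heaps with daily restarts of the servers.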
Pauses between 1ms-2ms:
- Use ZGC/Shenandoah. They are a good fit for small applications that have spare CPU and memory.
- ZGC/Shenandoah can also handle pause targets of up to around 10ms, given sufficient CPU and memory.
Pauses around 100ms with ample resources:
- ZGC/Shenandoah is a good choice.
Pauses around 100ms with limited resources:
- G1GC or Parallel GC would be a good choice. On older JDKs, CMS might also fit the bill; note that CMS was deprecated in JDK 9 and removed in JDK 14.
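
The guidance above can be expressed as a small decision helper that maps a pause target to HotSpot command-line flags. This is only a sketch of the rules of thumb in this section, not an authoritative selection procedure: the class `CollectorAdvisor`, the method `suggestFlags`, and the exact thresholds are assumptions made for illustration. The flags themselves are standard HotSpot options, though their availability depends on the JDK version and build (Epsilon is experimental, and `-XX:MaxGCPauseMillis` is a soft goal rather than a guarantee).

```java
// Hypothetical helper for this article: maps a pause target to candidate JVM flags.
final class CollectorAdvisor {

    /**
     * @param targetPauseMs  desired maximum GC pause in milliseconds
     * @param ampleResources true when spare CPU and memory are available
     * @return a flag string to try, following the rules of thumb in this section
     */
    static String suggestFlags(double targetPauseMs, boolean ampleResources) {
        if (targetPauseMs < 1.0) {
            // Do-nothing collector: never reclaims memory, so the heap must last until restart.
            return "-XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC";
        }
        if (targetPauseMs <= 10.0 || ampleResources) {
            // Low-pause concurrent collector; -XX:+UseShenandoahGC is the alternative.
            return "-XX:+UseZGC";
        }
        // Limited resources: G1 with a soft pause goal; -XX:+UseParallelGC is the alternative.
        return "-XX:+UseG1GC -XX:MaxGCPauseMillis=" + Math.round(targetPauseMs);
    }

    public static void main(String[] args) {
        System.out.println(suggestFlags(0.5, false));
        System.out.println(suggestFlags(5, true));
        System.out.println(suggestFlags(100, true));
        System.out.println(suggestFlags(100, false));
    }
}
```

Whatever the helper suggests, the choice should be confirmed by measuring average, maximum, and distribution of pause times under a realistic load.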