
Understanding and Resolving OOMKilled errors in JVM microservices

December 14, 2023  ·  By Harry Tran  ·  Filed under backend, devops, jvm

Have you ever faced the dreaded “OOMKilled” (Out Of Memory Killed) message in your Kubernetes environments? If so, you’re not alone!

In this post, we’ll dive into a real-world scenario where some of our backend services faced exactly this challenge. We’ll unpack what happened, how it was investigated, and the lessons learned. We have tried to keep it understandable even if you’re not a JVM or Kubernetes expert. Feedback welcome!

A Quick TL;DR

In essence, our backend services were getting OOMKilled due to exceeding the assigned container memory usage limits. The crux of the issue? A blanket allocation of 80% of container memory to the JVM heap, without adequately considering off-heap memory usage. The solution? More nuanced memory limit settings per backend service based on usage profile and a reduction in JVM heap memory allocation to make room for off-heap usage.

Screenshot: a Kubernetes pod that’s repeatedly OOMKilled.

Our Backend Setup: A Primer

Before diving into the issue, let’s set the stage with an overview of our backend infrastructure.

Our backend comprises nearly three dozen microservices, built on the Akka framework, that work together using event sourcing to manage all of the complexity and functionality of our digital platform.

Observations from our Data

As we mentioned at the beginning, many of our backend services were getting terminated regularly by Kubernetes due to exceeding memory limits.

Screenshot: Grafana dashboard of JVM metrics for the rebalance service.

The Grafana screenshot above shows memory usage metrics for one of our services named rebalance. This service is responsible for rebalancing client portfolios whenever they deviate from their target asset allocation.

This service, running with a 6GB container memory limit, was regularly hitting that ceiling. You can see it in the screenshot: the committed heap usage kept growing until the service was OOMKilled, as indicated by the drop to zero.

The JVM max heap was configured to 80% of container memory (-XX:MaxRAMPercentage=80), and the actual heap usage, which peaked at 4.37GB before the container was killed, was well within that 80% (4.8GB) limit.
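
As an aside, a quick way to sanity-check the heap ceiling that -XX:MaxRAMPercentage actually produces inside the container is to ask the JVM itself. A minimal sketch, reusing the pod name from the screenshots and assuming the java binary is on the PATH in the container image:

$ # in a 6GB container with MaxRAMPercentage=80, this should report a MaxHeapSize of roughly 4.8GB
$ kubectl exec rebalance-844cf67689-fz6bg -- java -XX:MaxRAMPercentage=80 -XX:+PrintFlagsFinal -version | grep -i MaxHeapSize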

This indicated that approximately 1.63GB (6GB - 4.37GB) was being used off-heap. But how and why?

JVM and Off-Heap Memory Analysis

Our Grafana dashboards, which are populated with JVM-reported metrics, already show “Non Heap” and “Buffer pool” usage in separate visualizations. If you look at the screenshot above once again, you’ll see that about 346MB was used out of about 397MB committed for non-heap pools. Together with another ~630MB used in the buffer pool, the total reported off-heap memory only adds up to about 1027MB (397MB committed non-heap + 630MB buffer pool), which is clearly quite a bit lower than the 1.63GB of actual off-heap usage implied by the container memory metrics!

To get a more detailed breakdown of this usage, we turned to a JVM feature named Native Memory Tracking.

Native Memory Tracking is disabled by default and can be enabled by passing the flag -XX:NativeMemoryTracking=summary. Once enabled, we can get a current snapshot of native memory usage:

$ kubectl exec rebalance-844cf67689-fz6bg -- jcmd 1 VM.native_memory
[... detailed NMT output ...]

We’ve elided the detailed NMT output for brevity, but what this breakdown revealed is that the Non-Heap pools reported in JVM metrics only cover the standard pools, such as codeheap-non-nmethods, codeheap-non-profiled-nmethods, codeheap-profiled-nmethods, compressed-class-space, metaspace, etc. They do not include memory used by the garbage collector itself. This additional GC allocation adds up to about 100-150 MB as the committed heap grows, which partially explains the extra off-heap usage.
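
To see how these GC-related allocations grow over time rather than at a single point, NMT can also report a diff against a baseline. A minimal sketch using the same pod (detailed output again elided):

$ kubectl exec rebalance-844cf67689-fz6bg -- jcmd 1 VM.native_memory baseline
$ # ... let the service run and the committed heap grow ...
$ kubectl exec rebalance-844cf67689-fz6bg -- jcmd 1 VM.native_memory summary.diff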

The other significant usage of memory – around 630MB – was in direct byte buffers. These are typically allocated by code that wants more efficient memory management than the garbage collector affords. Usage of these buffers to such an extent was surprising since our code didn’t explicitly use them, pointing towards library usage.

Now, tuning the off-heap memory used by the G1 GC itself is not something we have control over. So at this point, we knew that diving into the rather considerable byte buffer usage would give us the best bang for the buck in getting a handle on this OOMKilled issue.

Do ByteBuffers directly cause OOMKilled?

As mentioned earlier, our backend platform comprises nearly three dozen microservices. Repeating the above memory allocation analysis for all these services revealed that every single one of these had a significant memory allocation in byte buffers. Yet, not every service was equally affected by OOMKilled issues.

When we looked at the allocation profile, we saw that the number of Direct ByteBuffers quickly increases from startup and then stabilises after a while. Post startup and during runtime operations, the demand for more memory within the JVM does not originate from byte buffer allocations; it originates from regular heap allocations. But because byte buffers and other non-heap usage have already consumed more of the container’s memory than expected, not enough room is left for the heap to grow, and the container gets OOMKilled.

Screenshot: buffer pool memory growth over time.

Thus, whether a service is more or less likely to get OOMKilled is determined by how its heap memory is consumed during runtime.

Delving Deeper: ByteBuffer Usage

In order to rein in OOMKilled issues, we really had to understand our byte buffer usage. To do so, we first had to dump the heap of the JVM process for offline analysis.

Taking a heap dump

In a Kubernetes environment, a JVM heap dump can be taken quite easily:

$ kubectl exec rebalance-844cf67689-fz6bg -- jmap -dump:live,format=b,file=/tmp/dump.hprof 1
$ kubectl exec rebalance-844cf67689-fz6bg -- tar -czvf /tmp/dump.hprof.tar.gz /tmp/dump.hprof
$ kubectl exec rebalance-844cf67689-fz6bg -- cat /tmp/dump.hprof.tar.gz > heap_dump/rebalance-844cf67689-fz6bg.tar.gz

Compressing the dump with tar is necessary as the raw heap dump is really large (~500MB) and it easily gets corrupted when copied directly, even with the kubectl cp command.

Offline analysis of heap dump

There are many tools that can be used to analyse a heap dump file. We used VisualVM because it supports OQL, which makes searching easier, especially when we already know what we are after.

To query all java.nio.DirectByteBuffer objects:

select map(
   sort(
       filter(
           heap.objects('java.nio.DirectByteBuffer'),
           'count(referees(it)) == 0'
       ),
       'rhs.capacity - lhs.capacity'
   ),
   '{ bb: it, capacity: it.capacity }'
)

The heap dump analysis revealed three significant usages of byte buffers in our services.

JImage ImageReader

Screenshot: 1 x buffer of size 161,255,629 bytes (~153 MB) used by jdk.internal.jimage.ImageReader$SharedImageReader.

The first significant usage was in the JImage ImageReader, which is used from JDK 9 onwards to load modules. This is the first byte buffer that’s allocated (DirectByteBuffer#1), suggesting that it is used during the JVM’s initialization phase. Interestingly, this memory usage is captured neither in JVM metrics nor in the NMT summary, which leads to further errors in our estimation of heap vs non-heap memory usage and allocation.

There’s not much we can do about this usage other than ensuring that we budget for it correctly.

Akka remote Artery EnvelopeBuffer

Screenshot: akka.remote.artery.EnvelopeBuffer allocations.

As we mentioned in the backend setup section, our services use the Akka framework. Within Akka, the actor pattern is used to implement event sourcing, and these actors use akka.remote.artery.EnvelopeBuffer to communicate with each other across the cluster. Each of these buffers has a size of 30,000,000 bytes, or roughly 28.6 MiB.

There are a couple of Akka configuration settings that affect this buffer usage: the maximum frame size, which determines how large each envelope buffer is, and the buffer pool size, which caps how many such buffers can be kept around.

Based on our configuration, the maximum off-heap usage from these buffers could grow as large as 128 * 30,000,000 bytes ~= 3.6GB, which in a 6GB container would guarantee an OOMKilled event well before we ever hit the 128-buffer pool limit.
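
For illustration, these limits can be adjusted without code changes by overriding the Artery settings, for example via JVM system properties that Akka’s configuration loader picks up. The setting names below come from Akka’s reference configuration, but the values and the deployment name are purely illustrative, not what we actually deployed:

$ # illustrative values only — these must be sized to each service's messaging patterns
$ kubectl set env deployment/rebalance JAVA_TOOL_OPTIONS="-Dakka.remote.artery.advanced.maximum-frame-size=4MiB -Dakka.remote.artery.advanced.buffer-pool-size=32"

Lowering either value trades off-heap footprint against maximum message size and buffer reuse, so it has to be weighed against each service’s actual messaging behaviour.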

Cassandra Datastax driver’s Netty buffers

Screenshot: io.netty:netty-buffer PooledByteBufAllocator allocations.

Our primary database is Cassandra, and it’s used by Akka via the akka-persistence-cassandra library. Communication with Cassandra is managed by the Datastax Cassandra driver. This driver makes use of io.netty:netty-common FastThreadLocalThread, which in turn uses io.netty:netty-buffer PooledByteBufAllocator. By default, this allocator creates a pool of direct-memory arenas, with the number of arenas scaled to the available processors and each arena allocating memory in fixed-size chunks.

Neither we nor the Datastax library was overriding any of these defaults. Thus, this usage would consume 256 MB off-heap upon driver initialization for every service, and this too needed to be factored into JVM off-heap memory sizing.
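
For completeness, Netty’s pooled allocator can also be reined in through its own system properties, for example by reducing the number of direct arenas or the chunk size. This is only a sketch of the knobs that exist, with illustrative values; as noted above, our approach was to budget for this usage rather than change it:

$ # maxOrder=9 with Netty's default 8KiB page size gives 4MiB chunks
$ export JAVA_TOOL_OPTIONS="-Dio.netty.allocator.numDirectArenas=4 -Dio.netty.allocator.maxOrder=9"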

Killing the OOMKilled

Armed with the insights that this analysis revealed, we identified two sets of changes.

The first set of changes was applied across the board: we revisited our approach of allocating 80% of container memory to the heap and made it more nuanced to the nature of each microservice.

Essentially, we kept the overall container memory for each service at the same level but allocated more of it for non-heap usage.
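
As a concrete illustration of what such a change looks like (the exact per-service percentages aren’t reproduced here, and the jar name is just a placeholder), an off-heap-heavy service might move from the blanket 80% to something more conservative:

$ # before: -XX:MaxRAMPercentage=80 across the board
$ java -XX:MaxRAMPercentage=65 -jar rebalance.jar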

The second set of changes was focussed on tweaking the Akka messaging configuration (such as the Artery frame and buffer pool settings discussed above) based on each microservice’s specific messaging patterns.

The alert reader may have noticed that the screenshots shared in this post are from May 2023 while this post was published in December 2023. We’ve been progressively rolling out these changes in production over the last several months and tweaking things along the way based on continued observations. We are happy to share that the frequency of these errors has reduced to such an extent that we feel we’ve really killed the OOMKilled, at least for now!

Conclusion: Embracing the Complexity of Memory Management

Monitoring and Analysis: A Necessity

The complex nature of memory usage in JVM-based applications, especially those using frameworks like Akka, necessitates comprehensive monitoring and detailed analysis. Tools like Prometheus and Grafana, coupled with native memory tracking and heap dump analysis, are indispensable.

Memory Allocation Strategy: Rethinking

Our initial approach of allocating 80% of container memory to the JVM heap didn’t consider off-heap usage. The data clearly indicated the need for a more nuanced strategy, tailored to each service’s specific memory usage profile.

One of the key revelations was the extensive use of Direct ByteBuffers by our libraries. These buffers reside outside the JVM heap and were not accounted for in our initial memory allocation strategy.

Key Takeaways and Recommendations

Based on our findings, we recommend treating the container memory limit and the JVM heap limit as related but distinct budgets, profiling each service’s off-heap usage (GC overhead, metaspace, and direct byte buffers from libraries) before settling on a heap percentage, and keeping detailed JVM and container memory monitoring in place so that regressions surface early.

Through this deep dive into memory management challenges in our backend services, we’ve seen how intricate and crucial proper memory allocation and monitoring are. By understanding the nuances of JVM and container memory usage, and by leveraging the right tools and strategies, we can ensure that our services run reliably, even under heavy load.

Keep exploring, keep optimizing, and may your services never be OOMKilled again!