Introduction to Monitoring Java Applications

Posted by: Todd Radel | Java, APM, JMX

Tell me if this sounds familiar. Your users are complaining about the performance of a Java application in production, so you take a quick look at CPU and memory usage on the host. Both are fine. You dig a little deeper with tools like ps, nmon, sar, and iostat. Still nothing terribly wrong. With a sinking feeling, you realize the problem lies somewhere inside the JVM. What do you do now? 

Basically, you have two options.

Option one is to get out the big hammer and install an application performance management (APM) tool. APM tools can instrument your application down to individual lines of code. They can find slow code, memory leaks, slow SQL queries, and other performance gremlins. But APM tools are often expensive, and require you to install an agent into the JVM and restart it — thus destroying any evidence of the problem until it happens again. And that's before we consider the RAM, storage, and bandwidth requirements these products often have.

Option two is to use free tools that you may already have, like JConsole or VisualVM. Either of these can connect to a running JVM and examine performance metrics exposed by Java Management Extensions (JMX) objects called MBeans.

Here’s a chart comparing the various options:

APM Tools
  Advantages:
  • Identifies slow code, memory leaks, slow SQL queries, etc.
  • Very little configuration required
  Disadvantages:
  • Requires a JVM restart (useless if your JVM is already in trouble)
  • Agent must be installed into JVM
  • High overhead
  • Expensive (or free and feature-limited)
JMX Tools
  Advantages:
  • Lightweight solution to identify most common performance problems
  Disadvantages:
  • Not as comprehensive as APM tools
  • Requires some manual configuration

There is also a middle way, using a low-cost tool like Outlyer to do most of the JMX heavy lifting for you, but more on that later.

If your back is up against the wall, I'd suggest trying JMX first. If JMX tools don't give you enough detail to figure out the root cause, then you can investigate more heavyweight APM tools.

So once you’ve followed one of the many tutorials on how to launch JConsole or VisualVM and connect it to a running JVM -- including one we posted a while back -- what should you look for?

I’m glad you asked.

Key Java Performance Metrics

In this section I’ll explain some of the most critical metrics related to Java performance. I’ll explain where you can find them with an MBean browser (like JConsole), why they’re important, and what danger signs to look for.

Heap Memory

Heap memory is the space where Java stores objects. Everything from strings to the most complex data structures is stored here. The JVM has a set of very efficient routines to allocate, free, and manage this memory, but guess what? It’s not perfect and memory is not unlimited.

You can’t tell what Java is doing in the heap with your usual command-line tools like ps. That’s because Java asks the OS for heap memory on startup, and manages within that space itself. You need to use tools that look inside the JVM.

Once you do, you’ll find a wealth of metrics. Here are some of the most important:

Bean name: java.lang:type=Memory

Attributes:
  • HeapMemoryUsage.max: The maximum amount of memory your JVM can request from the OS, in bytes
  • HeapMemoryUsage.committed: Memory the JVM has actually requested from the OS so far, in bytes
  • HeapMemoryUsage.used: Memory actually in use, in bytes

If HeapMemoryUsage.used reaches HeapMemoryUsage.max, the JVM is no longer able to allocate memory. The application will start throwing java.lang.OutOfMemoryError, the log file will fill up with those messages, and the JVM will likely crash.

But your problems will actually start long before this. Once the heap usage (the percentage of “used” divided by “max”) exceeds 90%, the JVM will start to frantically throw out old objects and compact the remaining memory. This process is called garbage collection and it can have a terrible effect on performance. More on this below.
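To make the "used divided by max" calculation concrete, here is a minimal sketch that reads the same three values from inside a JVM through the standard java.lang.management API, rather than over a remote JMX connection. The class name HeapCheck and the helper percentUsed are names made up for this example, and the 90% threshold is simply the rule of thumb from the paragraph above.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

class HeapCheck {
    // Rule-of-thumb threshold from the text, not a JVM constant.
    static final double WARN_PERCENT = 90.0;

    /** Returns used/max as a percentage, or -1 when max is undefined (the JVM reports -1). */
    static double percentUsed(long usedBytes, long maxBytes) {
        if (maxBytes <= 0) {
            return -1.0;
        }
        return 100.0 * usedBytes / maxBytes;
    }

    public static void main(String[] args) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();

        double pct = percentUsed(heap.getUsed(), heap.getMax());
        System.out.printf("heap used=%d committed=%d max=%d (%.1f%%)%n",
                heap.getUsed(), heap.getCommitted(), heap.getMax(), pct);

        if (pct >= WARN_PERCENT) {
            System.out.println("WARNING: heap usage is above " + WARN_PERCENT + "%");
        }
    }
}
```

A monitoring tool does the same arithmetic, only with values fetched from the java.lang:type=Memory MBean of a remote JVM.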

If your application is running out of heap, and you have more RAM available on the host, try giving more memory to the JVM. Here’s a calculator that shows you what command line options to add to your startup script.

Non-Heap Memory

Besides data objects, the JVM needs to store the loaded bytecode itself, as well as internal data structures. These are stored in an area called non-heap memory. As with the heap, its upper limits are set by startup options; note that on Java 8 and later, the class metadata area (Metaspace) has no limit unless you set one. If you have a large application with many classes or methods, it’s possible you will need to increase the size of this area. Here are some metrics to watch:

Bean name: java.lang:type=Memory

Attributes:
  • NonHeapMemoryUsage.max: Maximum size of the non-heap area
  • NonHeapMemoryUsage.used: Non-heap bytes in use

If you need to resize the non-heap area, the JVM memory calculator I linked earlier is a good reference.
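If you want to see which pools make up the non-heap area before resizing anything, the sketch below lists them from inside a JVM using the standard memory pool MXBeans. The class and method names are invented for this example, and the pool names you get back depend on your Java version and collector.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

class NonHeapPools {
    /** Names of the memory pools the running JVM classifies as non-heap. */
    static List<String> nonHeapPoolNames() {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.NON_HEAP) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.NON_HEAP) {
                continue;
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            // max is reported as -1 when the pool has no defined upper limit
            System.out.printf("%s: used=%d max=%d%n",
                    pool.getName(), usage.getUsed(), usage.getMax());
        }
    }
}
```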

Garbage Collection

As your JVM runs, there are background processes to sweep up old objects and compact the remaining free space in a process known as garbage collection. Heap memory gets fragmented over time, and garbage collection is like defragmenting a hard drive: it moves the empty spaces together so the JVM can more efficiently allocate large blocks of memory.

How does the JVM know that an object can be swept up and disposed of? It checks whether the object is still reachable. Starting from a set of roots (local variables on thread stacks, static fields, and the like), the collector follows every reference it finds and marks each object it visits. Anything left unmarked can no longer be reached by running code, so the collector sweeps it up and disposes of it on its next pass. Java does not rely on reference counters, which is why objects that only refer to each other can still be collected.

There are actually many types of garbage collectors available, depending on the version of Java you run. For monitoring purposes, their work boils down to two types: minor GC, which frees up short-lived objects in the young part of the heap, and major GC, which cleans up and compacts the whole heap.

Minor GC is generally not a concern. It does pause the application, but only a small part of the heap is involved, so the pauses are usually short. Major GC is much more of a problem, as with many collectors it is a long “stop-the-world” event. All application threads are frozen until the major GC cycle completes. The application completely stops responding to requests. This isn’t much of a problem if the GC cycle takes a few milliseconds. But if it takes much longer, or if the JVM is running major GC cycles constantly, your application will slow to a crawl.

Luckily each collector provides metrics you can watch to tell you what the garbage collectors are doing.

Bean name: java.lang:type=GarbageCollector

Attributes:
  • CollectionCount: Total number of GC cycles run by this collector
  • CollectionTime: Total time spent, in milliseconds

Again, you mostly need to pay attention to major GC. I usually advise people that more than one or two major cycles per minute, or more than 5000 ms of collection time in a single minute, means your JVM is seriously unhealthy. Usually this points to a code problem, like a memory leak, or too many objects being created and destroyed when they should be kept around for a while.
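CollectionCount and CollectionTime are running totals, so "cycles per minute" has to be derived from two samples taken some time apart. The sketch below shows one way to do that from inside a JVM with the standard GarbageCollectorMXBean API. The class name GcRate, the helper perMinute, and the 5-second sample window are assumptions made for illustration.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

class GcRate {
    /** Scales the change in a cumulative counter to a per-minute rate. */
    static double perMinute(long before, long after, long elapsedMillis) {
        if (elapsedMillis <= 0) {
            throw new IllegalArgumentException("elapsedMillis must be positive");
        }
        return (after - before) * 60_000.0 / elapsedMillis;
    }

    public static void main(String[] args) throws InterruptedException {
        long sampleMillis = 5_000; // hypothetical sample window
        List<GarbageCollectorMXBean> collectors = ManagementFactory.getGarbageCollectorMXBeans();

        // First sample: remember the running totals for every collector.
        long[] counts = new long[collectors.size()];
        long[] times = new long[collectors.size()];
        for (int i = 0; i < collectors.size(); i++) {
            counts[i] = collectors.get(i).getCollectionCount();
            times[i] = collectors.get(i).getCollectionTime();
        }

        Thread.sleep(sampleMillis);

        // Second sample: the difference, scaled to one minute, is the rate.
        for (int i = 0; i < collectors.size(); i++) {
            GarbageCollectorMXBean gc = collectors.get(i);
            System.out.printf("%s: %.1f cycles/min, %.0f ms/min%n", gc.getName(),
                    perMinute(counts[i], gc.getCollectionCount(), sampleMillis),
                    perMinute(times[i], gc.getCollectionTime(), sampleMillis));
        }
    }
}
```

Which collector names count as "major" depends on the collector in use, so check the names your own JVM reports before setting thresholds.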

Threads

Within the JVM, there are lots of threads running various tasks. You’ve already seen that the garbage collectors run in their own threads. Applications will often create thread pools to handle incoming HTTP requests or long-running tasks.

If there are too many threads running, the system can spend so much time switching control between threads (in a process called context switching) that the actual application code doesn’t get enough CPU time.

To guard against this, watch the following metrics:

Bean name: java.lang:type=Threading

Attributes:
  • ThreadCount: Number of threads currently running
  • PeakThreadCount: Highest thread count recorded since JVM start
  • TotalStartedThreadCount: Total number of threads created since JVM start

If you see TotalStartedThreadCount climbing rapidly, it may indicate that the application is creating and destroying too many threads. If the CPU usage is also high, you may want to talk to your developers about reducing the number of threads they create.
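For reference, the three thread metrics map directly onto getters of the standard ThreadMXBean. Here is a minimal sketch that prints them from inside a JVM; the class name ThreadStats and its summary method are made up for this example.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

class ThreadStats {
    /** One-line summary of the three thread metrics discussed above. */
    static String summary() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        return String.format("live=%d peak=%d totalStarted=%d",
                threads.getThreadCount(),
                threads.getPeakThreadCount(),
                threads.getTotalStartedThreadCount());
    }

    public static void main(String[] args) {
        System.out.println(summary());
    }
}
```

If totalStarted keeps growing between runs while live stays flat, threads are being created and thrown away rather than reused.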

Application-Specific Metrics

That takes care of some of the most critical JVM metrics. Obviously there are many more than I can cover here, but in my experience, those are the primary sentinels that will warn you when your JVM is in danger of falling over.

But besides these, your application itself (or the container it runs in, such as Tomcat or WebLogic) can also publish JMX metrics. And most of them do!

How do you find out what metrics are available? Fire up a tool like JConsole or VisualVM, attach it to your JVM, and browse the MBeans that are available. Chances are you’ll find some pretty juicy stuff in there.
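If you would rather get a plain list than click through a GUI, the sketch below dumps the names of every MBean registered in the local platform MBean server. It runs inside the JVM it inspects, so treat it as an illustration of what a browser like JConsole shows you; the class and method names are invented for this example.

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import java.util.TreeSet;
import javax.management.MBeanServer;
import javax.management.ObjectName;

class ListMBeans {
    /** Sorted canonical names of all MBeans registered in this JVM. */
    static Set<String> localMBeanNames() {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        Set<String> names = new TreeSet<>();
        // Passing null for both the name pattern and the query matches every MBean.
        for (ObjectName name : server.queryNames(null, null)) {
            names.add(name.getCanonicalName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (String name : localMBeanNames()) {
            System.out.println(name);
        }
    }
}
```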

Here are some hints.

Database Connections

Many application servers provide connection pooling for JDBC databases. Usually they will also expose some MBeans so you can monitor the number of connections in use. Look around for a bean with a type named ConnectionPool or DataSource, or similar.
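As a rough sketch of that search, the code below filters MBean names by their type key. The matching rule (type contains "ConnectionPool" or "DataSource") is only the hint from the paragraph above turned into code, and the class name PoolBeanFinder is hypothetical. On a plain JVM with no application server it will simply print an empty list.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;
import javax.management.MBeanServerConnection;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

class PoolBeanFinder {
    /** True if the MBean's "type" key hints at a connection pool or data source. */
    static boolean looksLikePool(ObjectName name) {
        String type = name.getKeyProperty("type");
        return type != null
                && (type.contains("ConnectionPool") || type.contains("DataSource"));
    }

    /** Convenience overload for names given as plain strings. */
    static boolean looksLikePool(String objectName) {
        try {
            return looksLikePool(new ObjectName(objectName));
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException("not a valid ObjectName: " + objectName, e);
        }
    }

    /** Scans every MBean visible through the connection and keeps the likely pools. */
    static List<ObjectName> findPoolBeans(MBeanServerConnection connection) throws IOException {
        List<ObjectName> matches = new ArrayList<>();
        for (ObjectName name : connection.queryNames(null, null)) {
            if (looksLikePool(name)) {
                matches.add(name);
            }
        }
        return matches;
    }

    public static void main(String[] args) throws IOException {
        // The platform MBean server stands in for a remote connection here.
        System.out.println(findPoolBeans(ManagementFactory.getPlatformMBeanServer()));
    }
}
```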

Request Metrics

Web containers like Tomcat will provide lots of metrics about incoming HTTP requests: how many concurrent connections there are, how long each request takes to process, how many bytes have been received and sent. Request processing time is especially important because it tells you how long users are waiting for a page to load or a request to complete.

Our Tomcat monitoring pack keeps an eye on all of these, plus cache hit ratios and more.

Now that you understand the basics, you’ll be happy to hear we took care of most of the hard work for you.

The Outlyer Java Pack

Out of the box, Outlyer provides a Java Pack that monitors these key metrics and a few others besides. It also calculates some synthetic metrics for you -- grouping various GC events into “major” and “minor” garbage collections, for example, and calculating the percentage of heap used.

The pack provides a dashboard that shows the most critical metrics of your JVMs:

[Figure: Java Dashboard in Outlyer]

The pack also sets up an alert based on some key performance metrics. If any of these criteria are violated, you’ll get an email. Of course you can also set up any of the usual webhooks to post a message to Slack, open an incident with PagerDuty, etc. Here are the standard criteria for the alert:

[Figure: Java Alert Criteria in Outlyer]

Why did I choose to alert on these particular metrics? Because, as I explained above in the section on garbage collection, if your heap usage is over 90% or you’re spending more than 5 seconds every minute doing major garbage collections, you’re going to be in for a bad time.

You're gonna have a bad time

Of course, as with any pack provided by Outlyer, you are free to edit and customize it however you want. You can edit the plugin script to gather additional metrics beyond the defaults, build new dashboards, or set up your own alerts.

Installing the Pack

Installing the Java monitoring pack is super easy:

  1. Log into Outlyer.
  2. On the left navigation menu, click "Setup Monitoring", then "Pack Library".
  3. Find the Java pack and click the green Install button.

Setting up the Pack

Next, you have to tell Outlyer to run the pack on the hosts that have JVMs.

  1. On the left nav menu, click "Setup Monitoring", then "Plugins List".
  2. Click on the plugin named java.py.
  3. Flip to the second tab, named "Details & Edit".
  4. Use the "+" buttons at the bottom to assign the plugin to run based on hostname or tags. (Tags are preferred.)

Configuring the Pack

Finally, you need to tell the pack how to connect to your JVMs.

  1. Assuming you're still looking at the "Details & Edit" tab, flip back to the "Code Editor" tab.
  2. Take a look at the script, specifically lines 22-52. Don't be afraid if you're not familiar with Python. Everything has comments for you.
  3. Edit the variables to correspond to your JVM. For example, if you have set up the JMX listener on port 8888, then change the value of JMX_PORTS to a list containing only that port: [8888].
  4. At the bottom right corner, select one of your agents and click "Run Plugin" to make sure the plugin works. 
  5. Finally, make sure you click "Save Changes" before leaving the screen.

Here's what all of the configuration parameters mean:

Name Meaning
JMX_HOST Name of the host where the JVM is running. If the agent is installed on the JVM host, you should leave this as "localhost".
JMX_PORTS List of port numbers to probe for a JMX connection. If your JMX listener is not configured on one of these ports, replace the contents of the list with your selected port.
JMX_USERNAME Some JMX connections require authentication. If so, specify the username here. Otherwise, leave it blank.
JMX_PASSWORD Password for authenticated JMX connection. Leave blank if authentication is not needed.
JAVA_BIN Path to the java executable on your system. The default is to search the PATH for the system JVM.
SAMPLE_TIME In order to calculate rate metrics like "GC events per minute", we take two samples a few seconds apart. Increasing the sample time might make the calculations more accurate, but will cause the plugin to take a few extra seconds to run.
JMX_URL The full connection URL. Normally you should not need to change this, as we construct it from the other parameters above.
EXTRA_METRICS

A list of extra MBeans to query besides the standard JVM health metrics. Each entry should be in a specific format like this:

'outlyer_metric_name=mbean_name/attribute_name'

Example:

'tomcat.threads.8080.count=Catalina:type=ThreadPool,name="http-apr-8080"/currentThreadCount'
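To show how an entry in this format breaks down into its three parts, here is a small sketch in Java. It is not the plugin's own parser (the plugin is a Python script); the class ExtraMetric and the splitting rule (first "=" ends the metric name, last "/" starts the attribute name) are assumptions inferred from the format and example above.

```java
class ExtraMetric {
    final String metricName;
    final String mbeanName;
    final String attributeName;

    private ExtraMetric(String metricName, String mbeanName, String attributeName) {
        this.metricName = metricName;
        this.mbeanName = mbeanName;
        this.attributeName = attributeName;
    }

    /** Splits 'metric_name=mbean_name/attribute_name' into its three parts. */
    static ExtraMetric parse(String entry) {
        int eq = entry.indexOf('=');        // MBean names contain '=' too, so take the first one
        int slash = entry.lastIndexOf('/'); // quoted MBean names may contain '/', so take the last one
        if (eq <= 0 || slash <= eq + 1 || slash == entry.length() - 1) {
            throw new IllegalArgumentException("expected name=mbean/attribute but got: " + entry);
        }
        return new ExtraMetric(
                entry.substring(0, eq),
                entry.substring(eq + 1, slash),
                entry.substring(slash + 1));
    }

    public static void main(String[] args) {
        ExtraMetric m = parse(
                "tomcat.threads.8080.count=Catalina:type=ThreadPool,name=\"http-apr-8080\"/currentThreadCount");
        System.out.println("metric:    " + m.metricName);
        System.out.println("mbean:     " + m.mbeanName);
        System.out.println("attribute: " + m.attributeName);
    }
}
```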

That takes care of basic JVM health and hygiene. What if you want to capture some metrics that are more specific to your application, like database usage, or even some business KPIs to impress the bosses?

Capturing Application-Specific Metrics

First of all, use a tool like JConsole to see what MBeans are available. Your application developers might have chosen to expose some application-specific MBeans of their own.

Once you've found the metrics you want, copy the ObjectName of the MBean and paste it into the java.py script in the EXTRA_METRICS section, and give your metric a name.

 

For example, to capture request processing time, you would add this to EXTRA_METRICS:

'tomcat.request_time=Catalina:type=GlobalRequestProcessor,name="http-bio-8080"/processingTime'
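If you want to check an ObjectName and attribute by hand before adding them to the plugin, the sketch below reads the same value over a remote JMX connection using the standard javax.management.remote API. The host, the port 8888, and the class name ReadAttribute are placeholders. The URL uses the common RMI connector form, which is an assumption here and may differ from what the plugin builds for JMX_URL. It also assumes the listener does not require authentication.

```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

class ReadAttribute {
    /** Builds the usual RMI-based JMX service URL for a host and port. */
    static String jmxUrl(String host, int port) {
        return "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi";
    }

    public static void main(String[] args) throws Exception {
        String host = "localhost"; // placeholder: where the JVM is running
        int port = 8888;           // placeholder: your JMX listener port

        ObjectName bean = new ObjectName(
                "Catalina:type=GlobalRequestProcessor,name=\"http-bio-8080\"");

        // try-with-resources closes the connection when we are done
        try (JMXConnector connector =
                     JMXConnectorFactory.connect(new JMXServiceURL(jmxUrl(host, port)))) {
            MBeanServerConnection connection = connector.getMBeanServerConnection();
            Object value = connection.getAttribute(bean, "processingTime");
            System.out.println("tomcat.request_time=" + value);
        }
    }
}
```

Running it requires a Tomcat JVM with a JMX listener on that port; without one, the connect call fails with an exception.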

Of course, once you have the plugin capturing your metrics, you can then create custom dashboards to show them off, or alerts that fire when the values deviate from expected norms. More on that in a future post.

Conclusion

JMX gives us an easy way to peer inside a running JVM and assess its health without spending a fortune on APM tools. There's a whole wealth of metrics waiting to be discovered.

One final bit of advice, if I may: set up JMX in your containers now and poke around. You'll be able to see what normal values look like, which will help you spot the warning signs as soon as things deviate from normal. Good luck!

 
