This is the first installment of our three-part "Monitoring Kafka" series. It gives an overview of Kafka and discusses how to monitor it using JConsole and Prometheus. Part 2 shows how to monitor Kafka using Outlyer, and Part 3 discusses the important metrics you should be aware of to ensure your Kafka cluster is working properly.
Back in December 2014, we had a team Christmas hackathon, and I decided I wanted to make Java monitoring really simple via our agent and integration plugins. At that time we were recommending that new users monitor Java services using solutions like Jolokia, listed in this blog we wrote back then. It was frustrating: our users could monitor non-Java services in a few clicks, but the moment they wanted to monitor Java, we had to get them to install third-party agents. There were so many ways of doing it that we had no consistent approach we could rely on to make it a one-click setup like our non-Java integrations. It broke the whole setup experience.
Tell me if this sounds familiar. Your users are complaining about the performance of a Java application in production, so you take a quick look at CPU and memory usage on the host. Both are fine. You dig a little deeper with tools like ps, nmon, sar, and iostat. Still nothing terribly wrong. With a sinking feeling, you realize the problem lies somewhere inside the JVM. What do you do now?