Nagios is the 800lb gorilla of open source monitoring software. It's a bit clunky by modern standards but is incredibly powerful and has permeated a lot of organisations because it's free and extremely extensible. I won't dig into how you configure Nagios in this blog post, but suffice to say a lot of the flexibility comes from the ability to create plugins.
So what's a plugin? In their simplest form plugins are small programs that output a return code and some text. They extend the functionality of what Nagios can monitor and over time the format has been adopted by various other monitoring tools. The most basic plugin isn't much harder to write than the typical 'hello world' example that you'll find for almost every language.
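To make that concrete, here is a minimal sketch of about the smallest plugin I can think of, written in Python. The function name and the message text are made up for illustration; the only parts Nagios cares about are the line of text that gets printed and the return code.

```python
import sys


def main():
    # The line of text Nagios shows next to the check.
    print("OK! Hello from my first plugin.")
    # 0 is the return code for OK.
    return 0


if __name__ == "__main__":
    # Hand the return code back to whatever ran the script.
    sys.exit(main())
```

Run it from a shell and then look at `$?` and you should see the 0 come back.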
Why would you write one? Usually because you want to detect a change in state of something you consider to be important. Common examples include: is my site up, have I run out of disk space, are my backups working, etc.
Disclaimer: I'll be recommending that we cut some corners on the full plugin spec to make things simpler and easier to work with.
What makes a beautiful Nagios plugin?
I think the most beautiful plugins are the ones that work straight out of the box. When I'm browsing Nagios Exchange or searching GitHub or Google in the hope of finding something pre-written that monitors exactly what I want, I get this overwhelming sense of joy if it just works. It's a similar feeling to finding a pistachio nut in the bag without its shell on. You've essentially gained something without having to put in any work.
As per the official documentation you can create either binary plugins or scripts.
Plugins can either be compiled binaries (written in C, C++, etc) or executable scripts (shell, Perl, PHP, etc).
In my opinion there is no need to go through the painful process of creating compiled binaries for monitoring when you have cool interpreted languages like Python, Ruby and PowerShell. In most cases if I find a plugin written in C or any other language that requires compilation to binary I'll move on with my search or write my own in an interpreted language.
How can I make my Plugins beautiful?
Firstly, it needs a name. A lot of people seem to prefix their plugins with check_ as per the original batch of plugins created by the Nagios team. This has always seemed a bit superfluous to me. These files are generally put into a folder that denotes they check something, so I typically just give mine a name that conveys what it monitors. For instance if I create a script that checks if eBay is up I'll probably just call it ebay.sh.
Does it really matter what it's called? Actually, probably not, because typically you end up grouping plugins together and managing them at a higher level anyway.
Let's say that you want to monitor that a WordPress blog post is online and can be viewed by everyone. You could write a simple script in any language that pulls down the page, searches for some known text and exits with the correct return code depending on whether it finds the text or not.
Nagios defines the return codes as 0 for Ok or Up, 1 for Warning, 2 for Critical or Down and 3 for Unknown.
In a lot of cases you probably just want to use 0 and 2 as some things either work or they don't. There are monitoring systems that have up to 9 status codes so be thankful that Nagios only has 4 to remember.
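As a rough sketch of how those four codes might look in a script, here they are as Python constants along with a helper for the simple works-or-doesn't case. The names `up_or_down` and `finish` are my own inventions for this example and not part of any Nagios library.

```python
import sys

# Return codes as defined by Nagios.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def up_or_down(is_up, name):
    """Map a simple yes/no result onto the OK and CRITICAL codes."""
    if is_up:
        return OK, "OK! %s is up." % name
    return CRITICAL, "FAIL! %s is down." % name


def finish(code, message):
    """Print the plugin text and exit with the matching return code."""
    print(message)
    sys.exit(code)
```

A check would then end with something like `finish(*up_or_down(site_is_up, "blog"))`, where `site_is_up` is whatever your check worked out.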
The full development documentation can be found here :
Our next job is deciding what language to write our plugin in. If this was a simple Linux check script where I could get away with running a command, grepping the output and cutting it into fields for comparisons I'd be thinking about using a simple shell script. If I'm checking something on a modern variant of Windows then I'm likely to reach for PowerShell.
In this particular case I want to interact with a website which means Python and Ruby are probably my best options. For this example I'll go with Python because it's easier to read.
import sys
import requests
check_url = 'http://blog.dataloop.io/2013/10/26/notes-from-monitorama-eu/'
try:
    html_content = requests.get(check_url, timeout=10).text
    if 'I recently attended Monitorama' in html_content:
        print("OK! Content found.")
        sys.exit(0)  # 0 = OK
    print("FAIL! Content not found.")
except Exception as e:
    print("FAIL! %s" % e)
sys.exit(2)  # 2 = Critical
If you don't know Python then I'd highly recommend the Codecademy course. For the amount of time it would take you to learn how to set up Nagios you could become a Python coder and use those skills across the board for automation.
So what did the code above do? The top lines import some libraries which give you enough to talk to a web page. In Python the requests library is great for this.
We read the web page content into the variable html_content so you can imagine that variable containing exactly the same text as what you'd see if you right clicked on the page in a browser and clicked 'view page source'.
Then we search for some known text within the page. If it can find that phrase then it's going to exit successfully (with a 0). If not, or if anything goes wrong while fetching the page, it's going to fail with an exit status of 2 which we can then alert on.
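Here's a rough equivalent in Ruby which does much the same thing as the Python example:
require 'net/http'
check_url = 'http://blog.dataloop.io/2013/10/26/notes-from-monitorama-eu/'
html_content = Net::HTTP.get(URI.parse(check_url))
found = html_content.match('I recently attended Monitorama')
puts(found ? 'OK! Content found.' : 'FAIL! Content not found.')
exit(found ? 0 : 2)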
Is that script really beautiful? I guess it depends how you look at things. For the Python script to just work out of the box you need Python and the requests library installed and nothing else.
To adhere to the Nagios spec it really should have a bunch of arguments like -w and -c for warning and critical thresholds. It also needs a -V (version) and a -h (help), which in this case are pretty irrelevant as the code is so small you can see what it does by quickly skimming it, and if it gets any more complicated you could add a couple of lines of comments in the script.
You don't have to provide arguments to a script if they are meaningless so why clutter things up? Do I want an entire screen of code dedicated to setting up command line options in every file? Probably not.
Originally the Nagios plugins were designed to be self-contained binaries or scripts that could be used by multiple checks via the use of arguments. For instance check_http in Nagios does the same as the above piece of Python code and you'd just pass in the URL of the blog post and the string to check for. Nowadays you can do what those binaries do in a few lines of Python or Ruby, plus you can extend them a lot more easily.
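If you did want one script to serve several checks in that style, a sketch might look something like this. To be clear about what's assumed: the `--url` and `--string` option names are hypothetical and are not the flags that check_http uses, the function names are made up, and I've swapped requests for the standard library's urllib so the example has no dependencies.

```python
import argparse
import urllib.request


def parse_args(argv):
    """Read the page to fetch and the text to look for (hypothetical option names)."""
    parser = argparse.ArgumentParser(description="Check that a page contains some text.")
    parser.add_argument("--url", required=True, help="page to fetch")
    parser.add_argument("--string", required=True, help="text that must appear in the page")
    return parser.parse_args(argv)


def evaluate(html_content, expected):
    """Turn the page content into a Nagios return code and message."""
    if expected in html_content:
        return 0, "OK! Content found."
    return 2, "FAIL! Content not found."


def run(argv):
    """Fetch the page and evaluate it; any error counts as Critical."""
    args = parse_args(argv)
    try:
        with urllib.request.urlopen(args.url, timeout=10) as response:
            html_content = response.read().decode("utf-8", errors="replace")
    except Exception as e:
        return 2, "FAIL! %s" % e
    return evaluate(html_content, args.string)
```

To use it as a plugin you'd call `run(sys.argv[1:])`, print the message and pass the code to `sys.exit`. Whether that's worth the extra lines over a tiny single-purpose script is exactly the trade-off I'm describing here.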
This post shows how you can write a simple check script in a couple of minutes that is self-contained and outputs in Nagios plugin format. We've done the absolute minimum required in order to tell if something is Up or Down and stopped at that point.
In a future post I'll cover how we can slightly extend the text part of the Nagios plugin output with some performance data. For instance we may be interested in how fast our WordPress page loaded, and we may want to record that in a graph over time (in addition to getting alerted when the page goes offline).
At Dataloop.IO we are constantly trying to simplify monitoring. We put an emphasis on working out what to monitor and how to do it in the shortest time possible. We're advocates of slick drag and drop interfaces, copying and pasting scripts, rapid script deployment and online debugging. If you're interested in learning a bit more about how we can save you a ton of time managing your monitoring solution then sign up at www.dataloop.io
Edit: The follow-up is now posted at http://blog.dataloop.io/2013/10/28/fancy-graphs-from-nagios-plugins/