This is a guest post by Grant Smith, author of Next Gen DevOps and former Head of Operations at British Gas's Hive Home service, a Dataloop.IO customer. Grant shares why his team chose Dataloop.IO to monitor the Hive Home service.
Dataloop.IO presents a unique opportunity to DevOps teams. Far more than just a monitoring solution, Dataloop.IO allows teams to easily access real-time data from their online services and correlate that with system and business metrics to create meaningful real-time business intelligence.
Online services require different tools
The advent of the cloud brought with it a transition to a new world of DevOps. Previously, developers were responsible only for writing software and throwing it over the wall to QA, who were responsible only for testing it before throwing it over the wall to Operations, who were then responsible for deploying and running the software in production.
In this old world of skill-set silos, problems were passed down the chain to the next team, slowing down everyone’s work, and ultimately slowing down new software releases. The good news is everyone has their own motivation for breaking down these silos and moving to DevOps:
Operations engineers hope that by getting software engineers to focus on deployment, configuration and systems, some of their frustrations will be addressed.
Software engineers hope that by taking a more active role in the management of their services they can make more rapid progress through their stories and provide better solutions.
Test engineers hope that a focus on rapid development and automated deployment leads to a continuous investment in automated testing.
In this new automated, collaborative world of DevOps, online services have seen their software iteration accelerate dramatically, and teams have become more productive and happier.
However, in order to address the new processes and automation needs of DevOps, new tools are required, and fortunately there has been an explosion in tools development. Configuration management and deployment tools came first; now, with the advent of Dataloop.IO, monitoring is next.
Breaking down the silos
In the collaborative world of DevOps, it's not only skill-set silos that need to be broken down, but also data silos, so that everyone has visibility into how their changes affect the service and the business.
Most monitoring tools today focus only on system data. The problem is that system data isn't terribly valuable most of the time: at best it's circumstantial evidence that hints at the function and performance of applications and lets us attempt to use infrastructure efficiently. Application function data is often hidden in log files, obscured by gigabytes of acceptable exceptions, although in recent years it has become available from alternative sources.
Imagine what’s possible when a monitoring, alerting and dashboarding solution has access to real-time application, system performance, sales journey, payment transaction and usage data.
Tools like Ftrace, New Relic and AppDynamics have arisen to give teams insight into the inner workings of their applications. In the past, correlating the data these tools provide with system data has required some development effort on the part of operations teams.
Real-time user journey data has been available for years, but it's rarely used in conjunction with other data because it's usually available only in log files, which require time-consuming parsing, or in web analytics services like Google or Adobe Analytics.
Payment transaction data usually sits in databases and so is considered the remit of data warehouses and big data. However, it has a very real real-time value that's often overlooked: wouldn't a major drop in transactions signify an issue on the site?
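To make that last point concrete, here is a minimal sketch of a drop-in-transactions check. The function name, window shape and 50% threshold are illustrative assumptions, not anything from Dataloop.IO itself: the idea is simply to compare the current window's transaction count against a trailing baseline.

```python
def transaction_drop_alert(current_count, baseline_counts, threshold=0.5):
    """Return True when the current window's transaction count falls
    below `threshold` (default 50%) of the trailing baseline average.

    `baseline_counts` would typically hold counts from the preceding
    windows, e.g. one count per minute for the last 15 minutes.
    """
    if not baseline_counts:
        return False  # no history yet: nothing to compare against
    baseline_avg = sum(baseline_counts) / len(baseline_counts)
    if baseline_avg == 0:
        return False  # a quiet site isn't a drop
    return current_count < baseline_avg * threshold
```

A check like this, fed by a simple count query against the payments database each minute, is enough to surface a site-wide problem long before the daily data-warehouse reports would.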
Software engineering teams are usually forced to focus exclusively on the functional capabilities of the product so product performance falls to the operations team. Accessing and processing all this data stored in separate systems and services without tools is difficult and time consuming, particularly for a team constantly context switching.
So the data sits in its separate systems, used only by the teams that have a specific need for it. A multi-discipline team following DevOps principles should have access to all of this data so they can make better data-driven decisions, but developing the mechanisms to extract and process it would be prohibitively time consuming with most tools available today.
With its ability to execute scripts in almost any language, and hence access data from almost any source, Dataloop.IO makes it trivial to pull data from multiple sources simultaneously. Correlating, aggregating, summarising and displaying data from sources as varied as relational databases, key-value stores, log files and SNMP is Dataloop.IO's forte.
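A sketch of what such a script can look like, assuming (as is common for agent-run checks of this kind) that the agent executes the script and parses metrics from a Nagios-style 'STATUS | name=value ...' line on stdout. The metric names and data sources below are hypothetical examples, not taken from the original post.

```python
#!/usr/bin/env python3
"""Sketch of an agent-run check script that gathers metrics from
several sources and emits them on one line of stdout."""


def format_perfdata(status, metrics):
    """Render metrics in Nagios plugin perfdata style, e.g.
    'OK | orders_per_min=42 error_rate=0.01'."""
    perf = " ".join(f"{name}={value}" for name, value in metrics.items())
    return f"{status} | {perf}"


if __name__ == "__main__":
    # In a real check these values would come from a SQL query against
    # the orders database, a tail of the web server log, or an HTTP
    # health endpoint; fixed numbers keep the sketch self-contained.
    metrics = {"orders_per_min": 42, "error_rate": 0.01}
    print(format_perfdata("OK", metrics))
```

Because each script is just a program that prints a line, one check can freely mix a database count, a log-derived rate and a system figure in a single set of metrics, which is exactly the correlation across silos described above.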
Now a team can make changes to their service and get instant feedback. Did that last update change the product usage pattern? Did system performance change? Did that affect transaction completion rates? Have the number of users changed at any point in the user journey when compared to the stats recorded prior to the change?
Because Dataloop.IO provides a powerful way of quickly correlating real-world events with system, application, transaction and traffic data, it's a boon to troubleshooting and performance management, but that is just the tip of the iceberg. The ability to observe usage patterns in real time, compare them to the same period last week, month or year, and set them alongside system, application and transaction data for those periods enables real-time A/B or multivariate testing and decision making throughout the day.
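The period-over-period comparison above reduces to a simple calculation once the two series are available. A minimal sketch, with a hypothetical helper name:

```python
def percent_change(current, previous):
    """Change versus the same window in a prior period, as a
    percentage: positive means the metric has risen week over week."""
    if previous == 0:
        # No baseline traffic: report flat rather than divide by zero.
        return float("inf") if current else 0.0
    return (current - previous) / previous * 100.0
```

Applied per metric after a deployment (checkout completions this hour versus the same hour last week, say), a dashboard of these deltas answers the "did that last update change anything?" questions directly.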
Real-time and evidence-based decision making
Bringing Dataloop.IO into an organisation that still has separate teams for development, testing and operations provides significant benefits on its own: it makes it much easier to pull data from many different sources and operate on all of it together. A multi-discipline team using Dataloop.IO, though, can make an evolutionary leap to a truly aware, product-focussed team capable of rapid decision-making using real-time product information, and of acting on those decisions just as rapidly.
This is why I brought Dataloop.IO into Connected Homes. The Hive product is provided by the interaction of several different services operating on different systems built by different teams. Presenting all the data we needed to understand the performance of the service as a whole in Graphite wasn’t practically possible within the time we had available.
Dataloop.IO gave me the ability to specify the metrics I wanted and make them available to me within an hour.
Compared to the days it was taking to pull data from all the different sources with the custom scripts and open-source tools available to us, this helped us rapidly break down the silos and run a better service for our users.
About the author
Grant has created and led high performance Operations teams in some of the largest and fastest growing companies in the UK over the last 15 years and has been at the forefront of the DevOps movement for the last 4 years.
He has delivered game platforms running in the cloud enjoyed by millions of players per day at Electronic Arts and websites serving a billion page views per month at AOL. Most recently he has delivered a high performance, scalable Internet-of-things platform for British Gas’s Hive Home service.
Grant is frequently sought out for his cloud and DevOps expertise and can be reached at firstname.lastname@example.org. More of Grant’s work can be found at www.nextgendevops.com and his new book Next Gen DevOps: Creating the DevOps Organisation will be available on Amazon soon.