Quantifying the Staggering Cost of IT Outages

This post is from Splunk, a Priceonomics Data Studio customer.

***

Notable technology investor Elad Gil recently observed that online markets are about ten times larger than they were ten to fifteen years ago. For companies this is mostly good news: they can reach more customers and earn more revenue than was conceivable a decade ago.

But with great opportunity come great opportunity costs. For companies with a substantial online presence, every moment of downtime takes a staggering toll: lost revenue, damaged customer goodwill, and the expense of fixing the issue.

A decade or two ago, a server outage was a minor inconvenience; today it can mean a material loss of revenue. But just how much does it cost when a data center goes down, a network crashes, or a mysterious technical failure strikes? We analyzed the data to look at trends in the cost of data center outages.

As one would expect, the cost of technical outages has skyrocketed in recent years. What's driving the growth, however, isn't that outages have become expensive to remedy. The cost driver is that when your product or service is down, you lose out on a lot of money. And while outages cost more than ever, the tools to prevent them or mitigate their impact are, thankfully, also increasingly available.

***

To quantify the cost of downtime, data center company Vertiv has periodically commissioned the research firm Ponemon to calculate the direct and indirect costs of an outage. The study was conducted in 2010, 2013, and 2016 and used an activity-based costing method. The 2016 edition was based on 63 firms that experienced an outage that year.

The chart below shows the cost per minute of a data center outage:

In 2016, every minute of server downtime cost nearly nine thousand dollars; put differently, an hour of downtime cost over half a million dollars. In 2010, the cost per minute already exceeded a substantial five thousand dollars, and by 2016 that figure had increased by nearly 60%.
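As a quick sanity check on these figures, the sketch below scales the per-minute cost to an hourly cost and backs out the 2010 per-minute cost implied by roughly 60% growth. It assumes the rounded values from the text ($9,000 per minute and exactly 60% growth), not the precise figures in the Ponemon report, and the helper names are our own.

```python
# Rounded figures taken from the text; these are approximations,
# not the exact values published in the Ponemon study.
COST_PER_MINUTE_2016 = 9_000  # "nearly nine thousand dollars"
GROWTH_2010_TO_2016 = 0.60    # "increased by nearly 60%"


def cost_per_hour(cost_per_minute: float) -> float:
    """Scale a per-minute downtime cost to a per-hour cost."""
    return cost_per_minute * 60


def implied_base(final_value: float, growth: float) -> float:
    """Back out a starting value from an ending value and a fractional growth rate."""
    return final_value / (1 + growth)


hourly_2016 = cost_per_hour(COST_PER_MINUTE_2016)
per_minute_2010 = implied_base(COST_PER_MINUTE_2016, GROWTH_2010_TO_2016)

print(f"2016 cost per hour: ${hourly_2016:,.0f}")              # $540,000
print(f"Implied 2010 cost per minute: ${per_minute_2010:,.0f}")  # $5,625
```

The implied 2010 figure of about $5,625 per minute is consistent with the statement that the 2010 cost "exceeded five thousand dollars."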

Outages, unfortunately, rarely last only a minute. The following chart shows the average cost of an outage by year:

In 2016, the average outage cost nearly three quarters of a million dollars, an increase of nearly 50% from 2010, when it cost almost half a million dollars. Given that each minute of downtime cost nearly $9,000, that implies a typical outage of approximately an hour and twenty minutes. Although that is a relatively short period, the cost (and the stress for the IT department) of an outage is staggering.
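The duration estimate above is simply the average cost per outage divided by the cost per minute. A minimal sketch, assuming round figures of $750,000 per outage and $9,000 per minute (our rounding of "nearly three quarters of a million" and "nearly $9,000"):

```python
# Assumed round figures based on the text (both slightly overstate "nearly").
AVG_COST_PER_OUTAGE = 750_000  # "nearly three quarters of a million dollars"
COST_PER_MINUTE = 9_000        # "nearly $9,000" per minute


def implied_duration_minutes(cost_per_outage: float, cost_per_minute: float) -> float:
    """Estimate outage length as total outage cost divided by per-minute cost."""
    return cost_per_outage / cost_per_minute


minutes = implied_duration_minutes(AVG_COST_PER_OUTAGE, COST_PER_MINUTE)
hours, rem = divmod(minutes, 60)
print(f"Implied duration: {minutes:.1f} min (~{int(hours)} h {rem:.0f} min)")
# Implied duration: 83.3 min (~1 h 23 min)
```

This is a rough estimate: dividing one average by another is not the same as averaging the actual outage durations, so the result is only a ballpark figure.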

In another report, the Uptime Institute examines how severe different outage scenarios can be. In a 2018 survey of nearly three hundred firms that had experienced downtime incidents, outage costs were distributed as follows:

While the majority of incidents cost less than one million dollars, approximately 15% of the outages in the survey cost over a million dollars, and in one case the cost exceeded $50 million. As online markets grow larger, the downside risk from an outage is increasingly open-ended.

What’s driving the exorbitant cost of an outage? Let’s break the overall cost into its subcomponents to better understand the cost drivers:

By far, the largest costs associated with outages are business disruption and lost revenue, which together make up over 60% of the total cost of an outage. Note that these are opportunity costs of the outage rather than direct costs of fixing it. In fact, each of the four largest cost buckets could be classified as an opportunity cost, and together they account for 90% of the total cost of an outage.

In the grand scheme of things, fixing an outage is relatively cheap, but having your product go down is extremely expensive. New customers can’t sign up, existing customers aren’t being serviced, and your staff may have their daily activities brought to a standstill.

Of these cost categories, which ones are increasing the fastest? The following chart shows the growth rate of each cost bucket between 2010 and 2016:

The "opportunity cost" categories of downtime are not only the largest but also the fastest growing. By a significant margin, the fastest-growing category is lost revenue.

Why are companies losing so much more revenue from downtime than ever before? Simply put, far more commerce and operations now happen online. Consider Microsoft. A decade ago, if its data centers went down, sales of Microsoft Office or Windows, which were distributed and run offline, would not have been affected. Today, much of its revenue comes from online subscriptions to its products and from access to its data center products. Downtime today would hit revenue in a way that was unimaginable 10 or 20 years ago.

While some industries are relatively insulated from the risk of downtime, others now operate entirely online. The next chart shows the average cost of downtime by industry (keep in mind that this analysis covers only 63 firms, so its conclusions should be treated with caution).

Costs are highest in high-transaction industries like finance and ecommerce, where downtime directly means lost money. For example, when the Visa credit card network had half a day of downtime in Europe in 2018, nearly 5.2 million transactions were affected, even though 90% of transactions in the region still went through without issue. Or consider the severe and costly impact of a hospital network outage: last year, Sutter Health of California experienced a network outage across all its hospitals. During the outage, patients were turned away and medical providers could not access electronic health records.

***

The biggest cost of downtime isn't fixing the issue but the opportunity costs that stem from the outage. Lost revenue, lost productivity, and business disruption dwarf all other outage-related costs.

The old adage "an ounce of prevention is worth a pound of cure" is especially apropos when it comes to data center outages. Companies can invest in predicting and preventing future system crashes and save an enormous amount of money by avoiding the costs of outages.

What are companies to do? One solution championed by Splunk is to deploy "predictive analytics": mining data for early signals of an outage in order to prevent it. Given the enormous amount of data generated by the modern enterprise, these solutions use machine learning and artificial intelligence to help humans anticipate future events. Anticipating events in one's IT infrastructure has myriad applications, from cybersecurity to operational efficiency to preventing outages.
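To make "early signals" concrete, here is a minimal sketch of one of the simplest such techniques: a rolling z-score that flags a metric reading (say, request latency) when it strays far from its recent history. This is a generic statistical baseline, not Splunk's method or its machine learning models; the function name `flag_anomalies`, the 20-sample window, the 3-sigma threshold, and the sample latency data are all hypothetical choices for illustration.

```python
from collections import deque
from statistics import mean, stdev
from typing import Deque, Iterable, List


def flag_anomalies(values: Iterable[float], window: int = 20,
                   threshold: float = 3.0) -> List[int]:
    """Hypothetical helper: return indices of readings that deviate from the
    trailing-window mean by more than `threshold` standard deviations."""
    history: Deque[float] = deque(maxlen=window)
    flagged: List[int] = []
    for i, value in enumerate(values):
        if len(history) == window:  # only score once the window is full
            recent = list(history)
            mu, sigma = mean(recent), stdev(recent)
            # Skip perfectly flat windows, where a z-score is undefined.
            if sigma > 0 and abs(value - mu) / sigma > threshold:
                flagged.append(i)
        history.append(value)
    return flagged


# Hypothetical latency readings (ms): a steady baseline, one spike, then recovery.
latencies = [100 + (i % 5) for i in range(40)] + [180] + [100 + (i % 5) for i in range(10)]
print(flag_anomalies(latencies))  # [40]
```

Production systems typically use richer models (seasonality, multiple correlated metrics), but the idea is the same: learn what "normal" looks like and raise an alert before a deviation turns into an outage.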

What other "low-hanging fruit" can companies address to prevent future outages? Given that more than 20% of all outages result from human error, investing in training and in systems that prevent those kinds of errors can pay immediate dividends.

Furthermore, since hardware failures and natural disasters can upend even the best-laid plans, a well-rehearsed recovery plan is essential. After all, if downtime costs about $9,000 a minute and rising, it pays to keep every outage as short as possible.
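A rough way to see why rehearsed recovery matters is to price each minute saved. The sketch below assumes the article's figure of about $9,000 per minute and a simple linear cost model; the 80-minute and 30-minute recovery times in the example are hypothetical.

```python
COST_PER_MINUTE = 9_000  # "about $9,000 a minute", per the figure cited in the text


def downtime_cost(minutes: float, cost_per_minute: float = COST_PER_MINUTE) -> float:
    """Linear model: total cost is minutes of downtime times cost per minute."""
    return minutes * cost_per_minute


def recovery_savings(before_min: float, after_min: float,
                     cost_per_minute: float = COST_PER_MINUTE) -> float:
    """Money saved by shortening recovery from `before_min` to `after_min` minutes."""
    return downtime_cost(before_min - after_min, cost_per_minute)


# Hypothetical example: a rehearsed plan cuts recovery from 80 to 30 minutes.
print(f"${recovery_savings(80, 30):,.0f} saved per outage")  # $450,000 saved per outage
```

The linear model is a simplification; real costs may rise faster for long outages as customers churn and penalties accrue, which would make faster recovery even more valuable.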

And finally, if the biggest technology companies in the world occasionally suffer outages on their biggest days of the year, it can happen to anyone. Anything you can do to get ahead of issues before outages occur will save you massive amounts of lost revenue and productivity.

Splunk Inc. turns machine data into answers with the leading platform to tackle the toughest security and IT challenges.

Team Recurrency

Published June 4, 2019 by Team Recurrency