What enterprises say the CrowdStrike outage really teaches

Misjudging cloud’s impact on application reliability

But what does all this mean for cloud usage?

Enterprises said they are looking at their use of the cloud as a means of improving application reliability. In fact, the share who said they believed they’d misjudged the cloud’s value in that area rose from less than 15% before the CrowdStrike event to 35% immediately after it, and to 55% by early August. The biggest factor in that growth was the realization that massive endpoint faults could take down their operations, and that no cloud backup would help. The fault forced enterprises to examine just how the cloud affects application reliability.

Let’s say you have a data center application linked to a Windows PC device, and that each is likely to be down one percent of the time. You want to improve reliability by adding a cloud front-end, and let’s say that it’s also down one percent of the time. What’s your reliability? Assuming the failures are independent, it depends on whether the cloud and data center are able to back each other up. If they can’t, the chance that all three will be up is 0.99 cubed, or about 97%, which is less than the roughly 98% (0.99 squared) you had without the cloud. But if the cloud and data center can back each other up, then both would have to fail to take your application down. The chance of both cloud and data center failing is 1% times 1%, or 0.0001, which is one in ten thousand, and application reliability is improved.
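The arithmetic in that example can be sketched in a few lines of Python. This is only an illustration: it assumes failures are independent, and the helper names (`all_required`, `any_sufficient`) are made up for this sketch rather than taken from any library.

```python
from math import prod


def all_required(availabilities):
    """Availability when every component must be up (independent failures assumed)."""
    return prod(availabilities)


def any_sufficient(availabilities):
    """Availability when any one component can carry the load (independent failures assumed)."""
    return 1 - prod(1 - a for a in availabilities)


UP = 0.99  # each component is down 1% of the time, as in the example

# PC plus data center, before any cloud is added
without_cloud = all_required([UP, UP])

# PC, cloud front-end, and data center all have to be up
no_backup = all_required([UP, UP, UP])

# Cloud and data center back each other up; the PC is still required
hosting_pair = any_sufficient([UP, UP])
with_backup = all_required([UP, hosting_pair])

print(f"without cloud:        {without_cloud:.2%}")
print(f"cloud, no backup:     {no_backup:.2%}")
print(f"cloud/DC hosting pair: {hosting_pair:.2%}")
print(f"cloud, with backup:   {with_backup:.2%}")
```

Under these assumptions the end-to-end figures come out to roughly 98.01% without the cloud, 97.03% with a cloud that can’t act as a backup, and 98.99% when cloud and data center back each other up. The PC is still a required component in every case.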

The same thing has to be considered in multi-cloud. Of 110 enterprises who commented on the reliability impact of multi-cloud, 108 said it made applications more reliable. Does it? It depends. If two clouds back each other up, the risk of failure is indeed lower, just like in my cloud/data-center example above. But many enterprises admitted that at least some of their applications needed both clouds because components relied on features specific to each cloud. Now they both need to be up, and so multi-cloud actually reduced reliability!
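To see how far apart those two multi-cloud cases can be, here is a minimal sketch. It assumes each cloud is up 99% of the time (a figure borrowed from the earlier example, not something enterprises reported) and that the clouds fail independently; `multicloud_availability` is a hypothetical helper written for this illustration.

```python
def multicloud_availability(cloud_up: float, clouds: int, needs_all: bool) -> float:
    """Availability of an application spread across several clouds.

    needs_all=True  -> components depend on every cloud, so all must be up.
    needs_all=False -> the clouds are redundant, so any one being up is enough.
    Failures are assumed to be independent.
    """
    if needs_all:
        return cloud_up ** clouds
    return 1 - (1 - cloud_up) ** clouds


CLOUD_UP = 0.99  # assumed per-cloud availability, for illustration only

single_cloud = multicloud_availability(CLOUD_UP, clouds=1, needs_all=True)
redundant = multicloud_availability(CLOUD_UP, clouds=2, needs_all=False)
dependent = multicloud_availability(CLOUD_UP, clouds=2, needs_all=True)

print(f"single cloud:             {single_cloud:.2%}")
print(f"two clouds, redundant:    {redundant:.2%}")
print(f"two clouds, both needed:  {dependent:.2%}")
```

With these assumed numbers, two redundant clouds beat a single cloud, while an application that needs both clouds is less available than it would be on one.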

What this shows is that enterprises may be deluding themselves about the cloud and reliability. The cloud isn’t always going to improve reliability, any more than it always lowers costs. There’s no substitute for knowing what you’re doing, especially when it comes to managing reliability. Instincts are a poor substitute for a tutorial in probability and statistics.

But let’s go back to my cloud reliability calculation. Yes, the chance of both cloud and data center failing is one in ten thousand, but the chance of the endpoint failing in that example is one in a hundred. Endpoint risk is clearly the bigger problem, so what can enterprises do about it?


