High-fiving failure and a lesson on how not to treat your users…

I saw today via Techmeme that Twitter was excited and proud that they were able to achieve 97.3% uptime during the Apple WWDC keynote yesterday.

If I were them, I’d be a little more humble and a little more circumspect. Reading through the comments, you’d think they had just landed a man on the moon.

First, let me say that under many circumstances achieving 97.3% availability is grounds for termination. Most enterprise SLAs specify 99.9% or more, with service credits applied for failure. Amazon’s SLA provides for 99.9%, with 10% credited back if they fall below that and a 25% credit below 99%.

Salesforce.com had some serious trouble with availability a while back, and people were legitimately wondering if they would survive the crisis. Today they make a provision for these SLA failure expenses and, so far, have been lucky enough (i.e., smart enough) not to have to make any payments.

Just to put that in perspective, 99.9% uptime translates to about 44 minutes of downtime per month (99%, about 440). So, at 97.3% for the (roughly) four hours of peak usage during the keynote, they were unavailable for about 6.5 minutes, or nearly 15% of a 99.9% SLA’s downtime budget for the month.
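
For anyone who wants to check the arithmetic, here is a rough sketch in Python. It assumes a 30-day month (my assumption; the figures above are rounded) and a four-hour keynote window, and the function name is just one I made up for illustration.

```python
# Back-of-the-envelope downtime math. Assumes a 30-day month and a
# four-hour peak window; both are simplifications for illustration.
MINUTES_PER_MONTH = 30 * 24 * 60   # 43,200 minutes
KEYNOTE_MINUTES = 4 * 60           # roughly four hours of peak usage


def downtime_minutes(uptime_pct, window_minutes):
    """Minutes of downtime implied by an uptime percentage over a window."""
    return window_minutes * (100.0 - uptime_pct) / 100.0


# Monthly downtime allowed by a 99.9% SLA.
monthly_budget = downtime_minutes(99.9, MINUTES_PER_MONTH)

# Downtime implied by 97.3% uptime over the keynote window.
keynote_outage = downtime_minutes(97.3, KEYNOTE_MINUTES)

# Fraction of the monthly budget used up in a single afternoon.
share_of_budget = keynote_outage / monthly_budget

print(f"99.9% monthly budget: {monthly_budget:.1f} min")
print(f"Keynote downtime:     {keynote_outage:.2f} min")
print(f"Share of budget:      {share_of_budget:.0%}")
```

With a 30-day month the budget comes out slightly under the 44 minutes quoted above, which is why the share lands at roughly 15%.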

This isn’t something I would be proud of.

Second, your users are not your QA or test engineering department. They claim:

…we learned a lot during this stress test and that will translate to better performance down the line.

Finally, turning off features to support peak loads is treating the symptoms, not the underlying problem.

Is it any wonder their site is as unreliable as it is? With this kind of attitude, I don’t think things are going to materially change anytime soon.

Category: News
Topics: Events