Service outages (2)

Since my previous post (Christmas Eve, no less!) about’s outage, I have been monitoring some bloggers’ and other writers’ responses.

I wouldn’t go as far as Ralph Grabowski and – after the interruptions to TypePad and’s services – conclude that ‘Web 2.0’ is fatally flawed (after all, I could access TypePad and Newsgator via an unfamiliar computer in a rented apartment in central Belfast last week, allowing me to update this blog!). Echoing my own humble opinion, Susan Kuchinskas in an article quotes someone who argues:

"To say that businesses should not rely upon SaaS or other on-demand applications and services because of potential outages is disingenuous at best…. One may as well claim that network or telecom outages are a reason not to use those services for mission-critical business."

In The glitch that stole, Phil Wainewright seems to hit the nail on the head with his suggestion that what matters most about the incident is how the ASP responds to the outage (some of his readers are less charitable – jmjames uses the incident as an excuse to rant and damn all ‘third-party vendors’). In his next post, 0.1% downtime is more than 8 hours a year, Phil responds by pointing out that very few IT shops would be capable of delivering 99.9% uptime – a point supported by another Wainewright reader (chrisbaggott):

"The reality is that that 8 hours a year is probably a lot better than most in house systems. You might also want to consider the thoughts on security as well. Who’s more qualified to manage your data securly? "Bob" your tech guy who is also responsible for cleaning spam of the CEO’s laptop? Or a company that manages data full time for a living?"
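The arithmetic behind Phil’s headline is easy to check. Here is a minimal sketch in Python, assuming a 365-day year of round-the-clock service; the function name and the extra availability levels are my own, added only for comparison:

```python
# Assumption: a non-leap year of 24x7 service, i.e. 8760 hours.
HOURS_PER_YEAR = 365 * 24


def annual_downtime_hours(availability_percent: float) -> float:
    """Hours of downtime per year permitted by a given availability percentage."""
    return HOURS_PER_YEAR * (100.0 - availability_percent) / 100.0


# 99.9% is the figure discussed in the post; 99.0% and 99.99% are for comparison only.
for level in (99.0, 99.9, 99.99):
    hours = annual_downtime_hours(level)
    print(f"{level}% uptime allows about {hours:.2f} hours of downtime a year")
```

At 99.9% this works out to roughly 8.76 hours, which is where "more than 8 hours a year" comes from.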

Similar points are also made in the article:

… the issue is the reliability of the particular application, not of the on-demand model. "When it comes down to it, I’d rather pay $65 a user and get an army of IT people at fixing issues as they come up, [rather] than install Siebel locally and have to pay for that army of IT people myself."

Acknowledging that nearly six hours is a painful outage for anyone, Treb Ryan, CEO of OpSource, said, "’s uptime is way better than any corporate application I know of. How often are even simple apps, such as e-mail or the phone, down in most organizations?"
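To put the six-hour figure in context, the calculation can be run the other way round. This is only a rough sketch: it assumes, purely for illustration, that a six-hour outage was the only downtime in a 365-day year, which none of the quoted sources actually claims, and the function name is hypothetical:

```python
# Assumption: a non-leap year of 24x7 service, i.e. 8760 hours.
HOURS_PER_YEAR = 365 * 24


def availability_percent(downtime_hours: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Availability over a period, as a percentage, given total downtime in hours."""
    return 100.0 * (1.0 - downtime_hours / period_hours)


# Illustrative input: six hours of downtime and nothing else all year.
print(f"{availability_percent(6.0):.3f}% availability")
```

On that assumption, a single six-hour outage would still leave the service slightly above 99.9% availability for the year.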

What are the learning points from the episode? Phil Wainewright offers two cardinal rules for ASPs:

  1. Keep users informed – ASPs should alert customers immediately when there’s an outage and keep in touch with status reports. "Salesnet has four tiers of customers; those at the top can expect hourly calls from account executives and engineers during a ‘code red,’ while the lower tiers can expect e-mails to their administrators."
  2. Be upfront about service levels – ASPs should "spell out to customers the service levels they’ll commit to — and in what circumstances they’ll forfeit penalties, if any."

Finally, in a further post, Can you trust your service provider?, Wainewright responds to other points from jmjames and suggests a five-point code of good practice for ASPs:

  1. Say exactly what the contract does and doesn’t deliver (eg: specify the service levels).
  2. Spell out what to do if something does go wrong (is there another website users can go to for news, or will they receive an email within the hour telling them what’s going on, etc).
  3. Report live service level metrics (can customers view the same dashboard for their services that the provider’s own operations staff get to see?).
  4. Let customers download their data whenever they like ("Nothing else a provider can do offers a better expression of confidence in customer loyalty").
  5. Accept 30 days’ notice of termination at any time, no questions asked.
