In the software industry you’ll often hear statements suggesting that the quality of the underlying code isn’t really that important, and that what matters most is immediate customer satisfaction, primarily achieved by delivering what the customer asked for on time and within budget. This position is usually put forward by the customer-facing folks within the industry who, in their defence, probably don’t have a strong grasp of why code quality is important simply because they’re not living and breathing it. So let me give you an example.
Yesterday I was both surprised and delighted to receive the following email from Virgin Blue, our local Australian arm of the Virgin airline empire:
As excited as I was about the prospect of receiving not one, but two personalised luggage tags, I was a little surprised that my meagre two short haul, domestic trips would qualify me to jump a couple of status levels in one go. But hey, who am I to complain?!
If something sounds too good to be true…
A mere two hours later the truth was revealed. I didn’t qualify for an upgrade and I would not be receiving my personalised luggage tags. Not even one of them. As it turns out, the root cause of the problem was the day of the week being a Friday and the day of the month being the 13th. Right, nice try.
Digging a little deeper
It turns out someone screwed up big time, judging by the breadth of the problem as reported across the Twittersphere. The Australian IT News website reports Virgin Blue "error" upgrades passengers to gold status and explains the issue as follows:
The Velocity website appeared to crash at 6:40PM AEDST Friday, hinting that a large number of customers had visited the site to find out if the upgrade was true.
A call centre representative from Virgin Blue said that the email was “a system error”.
“IT have advised that we do have a system error,” the representative said.
“Right now we do have a lot of phone calls because of this error.”
And this brings me right back around to the topic at hand: software quality really is important. The thing about software quality is that it often only becomes a priority once it’s too late and the pain of financial loss and public humiliation is felt.
Counting the cost
Let me take a stab at how this has hit Virgin Blue in the hip pocket:
- Their website went down so they lost bookings
- Their call centre was tied up dealing with unhappy customers so they lost even more bookings
- IT staff and management were kept preoccupied in “damage control” rather than doing their normal job
- A carefully managed corporate image (and the Virgin brand is very carefully managed), has been tarnished
It’s the last point that really hurts. Virgin has constructed a carefully honed brand image as an organisation that really relates to people. Dangling a carrot in front of their customers then taking it away in the blink of an eye is very, very bad for business.
What went wrong
I can only speculate; I honestly have no inside knowledge of what happened within Virgin’s software yesterday. What I can do is take a pretty educated guess at the conditions which led to this, having spent a lot of time seeing the inner workings of software products.
Assuming the content of the email was correct, in that some customers who would not normally be eligible for Gold were “close enough” to be upgraded anyway, there is inevitably some conditional logic along the lines of “if customer points balance is > 90% of points required for gold then upgrade”. There may even be a couple of extra conditions such as “and > 6 flights taken this year” or “and has been a member for > 2 years”. Now imagine a “>” was accidentally replaced with a “<”, or the “and” condition accidentally became an “or” condition. It’s easy to do and every software developer has done it before (or they lie and say they haven’t). The difference is that this went through to production unchecked and caused Virgin a big headache.
The point is, the conditions required to legitimately qualify for this offer were probably pretty simple and were equally simple to get wrong. The issue also occurred just before 5pm on a Friday night, so picture this situation: it’s been a long, hot week, everyone’s about to head out for beers, the office is emptying and there’s just this one little task that needs to go out the door beforehand…
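To make that concrete, here is a minimal Python sketch of the kind of rule I’m imagining. Everything in it is hypothetical: the Member fields, the 1000 point Gold threshold and the function names are my own inventions for illustration, not anything taken from Virgin’s systems. It simply shows how swapping “and” for “or” turns a narrow offer into one that almost everybody qualifies for.

```python
from dataclasses import dataclass

# Hypothetical threshold, chosen only to make the arithmetic easy to follow.
GOLD_POINTS_REQUIRED = 1000


@dataclass
class Member:
    points: int
    flights_this_year: int
    years_as_member: float


def qualifies_intended(m: Member) -> bool:
    """The rule as (I am guessing) it was meant to be: ALL conditions must hold."""
    return (
        m.points > 0.9 * GOLD_POINTS_REQUIRED
        and m.flights_this_year > 6
        and m.years_as_member > 2
    )


def qualifies_buggy(m: Member) -> bool:
    """The same rule with 'and' mistyped as 'or': ANY one condition is enough."""
    return (
        m.points > 0.9 * GOLD_POINTS_REQUIRED
        or m.flights_this_year > 6
        or m.years_as_member > 2
    )


# A casual flyer like me: few points, two flights, but a long-standing member.
casual = Member(points=120, flights_this_year=2, years_as_member=3)

# A genuinely "close enough" frequent flyer.
nearly_gold = Member(points=950, flights_this_year=8, years_as_member=3)

print(qualifies_intended(casual), qualifies_buggy(casual))
print(qualifies_intended(nearly_gold), qualifies_buggy(nearly_gold))
```

Under these made-up numbers the casual flyer is rejected by the intended rule but waved through by the buggy one, purely on the strength of having been a member for more than two years. A single unit test with one “should not qualify” customer would catch it.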
Of course I’m speculating but without any information from Virgin to the contrary, I think they’re pretty fair assumptions. Regardless, it’s safe to assume we’re talking about a software quality issue here.
Because quality does matter
There are probably a number of different ways this could have been avoided by looking at software from a quality standpoint. Perhaps there were no automated tests for this particular condition. Perhaps an activity with such broad-ranging ramifications didn’t alert the operator to the scope of the audience. Perhaps the logic was embedded in a 500-line file and indented through half a dozen “if” statements, hence obfuscating its true intent.
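The second of those ideas, warning the operator about the size of the audience, is cheap to build. The sketch below is one way it might look in Python, assuming a send job that knows how many members it selected and how many exist in total. The check_audience function, the AudienceTooLargeError exception and the 5% default limit are all hypothetical names and numbers of my own choosing.

```python
class AudienceTooLargeError(Exception):
    """Raised when a campaign would reach more members than expected."""


def check_audience(recipients: int, total_members: int,
                   max_fraction: float = 0.05) -> float:
    """Return the fraction of members selected, or refuse to continue.

    max_fraction is an assumed sanity limit: an offer aimed at a few
    "nearly Gold" members should never select a large share of everyone.
    """
    if total_members <= 0:
        raise ValueError("total_members must be positive")

    fraction = recipients / total_members
    if fraction > max_fraction:
        raise AudienceTooLargeError(
            f"{recipients} of {total_members} members selected "
            f"({fraction:.0%}); expected at most {max_fraction:.0%}"
        )
    return fraction
```

Calling check_audience before anything is sent means a flipped condition that selects half the membership stops the job and forces a human to look at it, rather than landing in thousands of inboxes on a Friday afternoon.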
Whatever it was, someone skimped on quality and now the true ramifications of this have come to bear. So the next time someone attempts to convince you software quality doesn’t matter, ask them if they ever received their personalised luggage tags. Hopefully the raw emotion and bitterness of their loss will help them see the error of their ways!