Another technology blog...

The commoditisation of the coder

Friday, January 29, 2010

I love a cold beer. Not just because it’s refreshing and makes me worry less about the world’s problems, but also because of beer’s fungibility. Let me explain: I can go down to the store and buy a beer and it’s pretty much the same as any other beer I might purchase elsewhere. Sure, there are different standards of beer and I’m going to pay a few dollars more for my favourite Little Creatures than I am for VB but in essence I’m still getting hops and yeast with some water.

The point about fungibility is that beer is the same basic product no matter which one it is you’re drinking. In short, beer is a commodity.

Coder commoditisation

Commoditisation occurs once people start believing that a good, in the case of this post a coder, can be acquired without qualitative differentiation. In simple terms it’s the mindset that CoderA == CoderB == CoderC. There are obviously cases where this is true but the commoditisation I’m referring to is the broad-brush assumption that software development is a consistently repeatable process with the same end result from the same amount of effort regardless of the person at the keyboard.

You can spot coder commoditisation in action when you see comments like this:

Don’t worry, we’ll just get unknown CoderB to step in while CoderA is away.

Or the classic financial argument:

Hey, we could get unknown CoderB for 25% less than what CoderA costs!

These are commodity based statements in that they assume CoderA can be mutually substituted by CoderB whilst achieving the same results in the absence of quantifiable knowledge of their skill level.

Building software is simple - if you know where to look

I don’t actually believe that software development is a complex process. Very tricky problems can be solved by smart coders with such ease and grace that the practitioner barely raises a sweat. They know the shortcuts, the pitfalls, the well trodden paths to coding success and when they tie it all together the process becomes simple, elegant and most likely successful.

The key observation here though is that this person is clear about what they’re setting out to do. They have the pieces of the puzzle and a mental image of what it should look like once they put it all together. What’s more, they’ve successfully performed this process many times before. These are the people who take something complex and make it look easy.

Compare this to the coder who simply cannot put the pieces together, or who can only do so with substantially more effort. There are numerous potential causes of this; they may be poor at problem solving, unable to ask the right questions to understand the requirements, have a poor grasp of the technology or possibly they’re just lazy!

Oils Ain’t oils

Just like engine lubricants, when it comes to coders Oils Ain’t Oils. Castrol would have you believe that not all oils are created equal and the same is true of the coder. When it comes to coding we’re talking about a process which has a huge number of variables in terms of how software is built. The process is then performed by people from extremely diverse backgrounds in terms of culture, education and experience.

One thing that separates software development from many other industries is that people become coders by following very diverse paths. If you want to become a doctor, you go to medical school. If you want to become a lawyer, you go to law school. If you want to become a coder, you start writing software in your basement when you’re 15. Or you start doing it as the “shadow IT” guy alongside the reason you’re actually at your place of work. Or you possibly go to university but often it’s to study engineering. It’s because of this people diversity that we simply cannot expect consistency in either effort or output from the process of writing software.

Quite frankly, some people are just very bad coders. Certainly underperformance is not unique to the software world; you could be a bad chef or a bad cleaner, but the difference is that their poor quality work is immediately apparent whereas the handiwork of a bad coder often does not become obvious until long after the work is done. Oftentimes the coder has the advantage of relatively anonymous work. I know if my steak is overcooked and I know if the kitchen in the office hasn’t been cleaned but I have no idea what happens to my internet banking credentials after I hit the submit button. Unlike the chef or the cleaner, the coder’s work is not usually unmasked as sub-standard unless a peer of equal or greater knowledge is exposed directly to it.

Offshoring

One area ripe for commoditisation is outsourcing to low cost markets. A key rationale for sending work offshore is that resource costs are lower. Why pay $100/h in the Western World when you can send the work to an emerging market at a fraction of the cost? Once again though, this assumes the people can be commoditised based purely on an hourly rate. What this rationale often does not consider is the quality of the resource.

I’m not saying resources in your typical offshoring locations are necessarily inferior. The commoditisation argument would be the same if the tables were turned; without sufficient due diligence to quantify the capabilities of the resources you’re still making the assumption that writing code is a mutually exchangeable activity regardless of the individual.

Obviously the “win-win” situation is to leverage high quality, low cost resources but this only happens when commoditisation is not the sole focus and an understanding of the coder’s competence is gained. Only then can the decision be made holistically based on both competence and cost; without both pieces of information only part of the picture is clear.

The Mythical Man-Month

So why not worry less about putting high quality coders on a project and just focus on cost? Why not grab twice as many people for possibly a third of the price from a low cost market and not even bother with all the rigmarole of attempting to understand their capabilities? The answer is described in Fred Brooks’ The Mythical Man-Month through Brooks’ Law:

Adding manpower to a late software project makes it later

Brooks describes his point through this now popularised saying:

Nine women can't make a baby in one month

In The Mythical Man-Month, the author talks about how an increasing number of people in a project increases not only the total amount of project familiarisation which is required (which is linear as numbers increase), but more importantly how much extra communication is required, which grows roughly with the square of the number of people. Joel Spolsky illustrated this recently in his column A Little Less Conversation where he demonstrates how connections increase in relation to people. More connections mean disproportionately more communication which means less output.

People Connections
1 0
2 1
3 3
4 6
5 10
6 15
7 21
8 28
9 36
10 45

So by attempting to overcome inadequacies in skill level by doubling the number of coders on a project from say, three up to six, you’re potentially introducing five times as much communication. This level of non-productiveness will very quickly erode cost savings. And you’re still left with an application which although functional, will likely suffer from quality related issues over the longer term.
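
The figures in the table follow the formula n(n-1)/2 for the number of distinct pairs among n people. A minimal Python sketch reproduces them along with the three-to-six comparison; the function name is just for illustration and the model assumes every coder potentially talks to every other coder.

```python
def connections(people: int) -> int:
    """Number of distinct pairs (communication paths) among `people` coders."""
    if people < 0:
        raise ValueError("people must be non-negative")
    return people * (people - 1) // 2


# Reproduce the People / Connections table for teams of 1 to 10.
for n in range(1, 11):
    print(n, connections(n))

# Ratio of connections when a team of three is doubled to six.
print(connections(6) / connections(3))
```

The growth is quadratic rather than linear, which is why each extra person costs more in communication than the one before.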

Commoditisation fallout

Assuming coder commoditisation has occurred and the wrong people are left to run wild at the keyboard, there are a number of ways things can rapidly go downhill:

  1. Required effort to finish tasks becomes high. Seemingly simple jobs become a chore and durations rapidly blow out well beyond what’s required by the competent coder.
  2. Sustain costs increase either due to buggy software or high degrees of effort required for future changes.
  3. Customer satisfaction decreases as durations increase and confidence wanes when expectations are not met.

All of these can be damaging for the project and for the organisation as a whole. It can also be extremely difficult to manage the wrong people out of the roles they’re in.

Coders are not beer

Coders are not very fungible and treating them as interchangeable commodities is simply a recipe for failure. The mindset that units of work can be defined and quantified then assigned to any coder and still get a consistent result is born of a misunderstanding of what the software development process consists of. Here’s what needs to be understood if we’re to avoid commoditised disappointment:

  1. Coders come from a very broad range of backgrounds and have wildly varying skill levels.
  2. This varying skill level can have a major impact on both the duration of development and the ongoing sustainability of the software.
  3. Merely throwing a greater number of less skilled individuals at a problem will not necessarily solve it faster or more cost effectively.
  4. Software development is a profession requiring skilled practitioners to produce a quality product.

Ignore these at your own peril; I’m off for a beer!


Why ReSharper recommends the “var” keyword in .NET 2.0 projects

Sunday, January 24, 2010

I was a little confused this week as to why ReSharper was recommending implicitly typed variable declarations in a VS2010 solution targeting .NET 2.0. Somewhere in my mind I had directly associated the “var” keyword with the release of .NET 3.5 so the recommendation looked a little odd to me.

As it turns out, the var keyword is a feature of the C# compiler, not the .NET CLR. The same is true for automatic properties and object initialisers. The bottom line is that you can use these features in VS2008 or VS2010 and the compiler will happily go along with it, producing compiled output that runs on the .NET 2.0 runtime.

There’s an excellent post on Shahar Gvirtz's blog where he disassembles code using this syntax in Reflector to reveal plain old .NET 2.0 syntax. So in short, implicit typing is fine for anyone running a recent version of Visual Studio and, as usual, ReSharper is correct in identifying this as an opportunity to polish the code.


SVN “Can’t create directory” Error

Friday, January 15, 2010

Here’s another one of those Subversion idiosyncrasies which threw me the other day and which I couldn’t readily find an answer for. When committing a changeset I kept getting the error “Can’t create directory” followed by the path of the repository on the server then “The system cannot find the path specified”.

The first thing to get clear is that this is a Subversion error; it’s not related to the local working directory nor is it related to TortoiseSVN.

Looking at the path in the error message, you’ll see it specifies the subdirectory “db\transactions”. After inspecting the folder structure of the repository, I found the subdirectory was missing. Comparing it to a newly created test repository I found that not only was the “db\transactions” directory missing but so was “db\txn-protorevs”.

I’m not sure how these folders disappeared. I run a robocopy script to backup my repositories to a NAS device and although the source shouldn’t be touched, the timing is rather coincidental. Whatever the cause, the folders disappeared and this was what caused the issue.

The fix

Really, really basic; just manually recreate the folders. They don’t retain any information post-commit, they just need to exist so transactions can be established. Simple as it may be, I found very few online references to this error and nothing around the fix so hopefully this will save someone a bit of time in the future.
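
If you’d rather script the fix than click around in Explorer, something along these lines should do it. This is only a sketch: the function name is made up for illustration, it needs to be run against the repository folder on the server (not a working copy), and the only directory names it knows about are the two that were missing in my case.

```python
from pathlib import Path

# The two subdirectories that were missing from the repository in this post.
REQUIRED_DIRS = ("db/transactions", "db/txn-protorevs")


def recreate_missing_dirs(repo_root):
    """Create whichever of the required directories are absent.

    Returns the list of directories that had to be created, so an empty
    list means the repository already had both of them.
    """
    created = []
    for relative in REQUIRED_DIRS:
        target = Path(repo_root) / relative
        if not target.is_dir():
            target.mkdir(parents=True)
            created.append(target)
    return created
```

Calling `recreate_missing_dirs(r"D:\Repositories\MyRepo")` (a hypothetical path) and then retrying the commit is the scripted equivalent of the manual fix.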

BTW, small sidenote and a quick plug for some very good software; the project I was committing is called “TotalBabyReport”. Total Baby is an iPhone app which tracks everything your baby does which is pretty handy when you’re sleep deprived and can’t remember what you had for breakfast let alone when the baby last slept. Or ate something. The only thing it doesn’t do well is report on trends across time such as sleep patterns so I’ve created a little personal app to try and get some baby business intelligence metrics :)


Foolproof personal backups with Mozy

Sunday, January 10, 2010

It starts with that sinking feeling and all sorts of questions running through your head:

When did I last backup?

Did I include everything?

Does the backup actually work?

When did I last try restoring it?

How much unrecoverable data could I have lost?

Losing data can be an absolute stomach churning experience. Those first moments when the penny drops and you realise you’ve got a big, big problem are absolutely nightmarish and without a robust backup strategy it’s just a matter of time until you experience this firsthand.

This last happened to me a couple of years ago when a disk had a catastrophic failure and refused to boot. Slaving the disk off another machine didn’t do much good so it was off to the data recovery specialists and let me tell you, this is not cheap and it’s not fast. I used a group in Sydney called Payam Data Recovery and whilst their service was fantastic, I always felt like each conversation was akin to speaking to your doctor and expecting him to tell you you’ve got a terminal illness and that you need to “start making arrangements”.

Anyway, I got to thinking about the whole backup strategy thing again recently after both Jeff Atwood and Phil Haack suffered data loss on their blogs after a server hardware failure. Being pretty popular guys this led to a lot of feedback and suggestions from the community as to how to both retrieve their data (some very innovative feedback there!) and how to implement a suitable backup strategy in the future. It brought the whole disaster recovery topic back into focus for me so I thought I’d share my approach which, personally, I think works pretty well.

Personal, not enterprise

Let me quickly contextualise this; I’m talking about personal backups, not enterprise data which is probably more consistent with Jeff and Phil’s experience above, although many of the principles are the same. Let me typify what I mean by personal data:

  1. Documents such as Word, Excel, PDF.
  2. Audio content such as MP3s (could be from iTunes, ripped from CD, etc).
  3. Photos (predominantly JPG and possibly RAW) and video.
  4. Any other personal content you don’t want to lose, which for me, includes things like my Subversion repositories and database backups for personal projects.

Essentially I’m talking about the sort of content most of us accrue during our daily lives as civilians. It’s the same sort of data your parents or kids or cousins accrue regardless of whether they’re techie people or not.

Manual processes will fail; it’s just a question of when

Here’s a pretty typical backup strategy:

  1. Grab the thumb drive / USB hard drive / blank DVD from the cupboard / safe / relative’s house.
  2. Copy and paste the important stuff onto the device.
  3. Return said device to its storage location.

The biggest problem here is that it requires you to be proactive. You have to consciously think “Wow, haven’t done a backup for a while, might be a good idea” and the reality is this never happens as frequently as it needs to and you carry a lot of risk between backups. What’s more, the process will become increasingly lengthy as the data expands. I could already fill nearly 20 DVDs with my data and it would take a long time to copy the data over via USB to an external device. Inevitably, the process deteriorates and becomes suboptimal increasing the risk of losing important data.

Onsite backups are vulnerable

One thing you need to get your head around, and I don’t think most people do, is that if your backup strategy doesn’t involve getting your data outside of your house then you need to consider what happens in the event of your backups disappearing. This usually happens one of two ways:

  1. You get burgled and the thieves take anything that has a flashing light or looks valuable which might include your NAS device, your backup server or your carefully hidden thumb drive.
  2. You literally experience a disaster such as flood or fire. 11 months ago we had absolutely catastrophic fires in Australia with the loss of 173 lives and, I suspect, huge amounts of now unrecoverable data.

The point is that if your data and your backups are anywhere near each other then they are both at risk of the same events causing you to lose both. Irrecoverably.

“Last version” backups can still lead to data loss

A typical use case for restoring from backup is corrupted data. This could occur via program error or malicious activity but the net result is still the same; you need to restore from an uncompromised copy. The problem is, if you’re using one of the typical approaches mentioned above and you’re actually using it frequently, there’s a good chance your corrupted copy has overwritten a previous good version. The bottom line is that without the ability to restore from point in time you are vulnerable if files become corrupted. You could take the approach of continually making new copies, rather than overwriting existing ones, but the feasibility of doing this decreases dramatically with increasing content size and frequency of backups.
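
To make the “new copy every time” idea concrete, here’s a rough Python sketch. It’s purely illustrative and has nothing to do with how Mozy works internally; the function name and the timestamp format are my own assumptions. It also makes the drawback obvious, because every run stores another full copy of the file.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_with_timestamp(source, backup_dir, now=None):
    """Copy `source` into `backup_dir` under a timestamped name.

    Earlier copies are left alone, so a corrupted file cannot overwrite a
    good version (unless two backups run within the same second).
    Returns the path of the new copy.
    """
    source = Path(source)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    target = backup_dir / f"{source.stem}.{stamp}{source.suffix}"
    shutil.copy2(source, target)  # copy2 also preserves modification times
    return target
```

Run nightly against a single document this is fine; run against a photo library it would chew through disk space very quickly, which is exactly the feasibility problem described above.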

You need to be able to get the data back again

In the wake of Jeff and Phil’s data loss I mentioned earlier, Joel Spolsky posted about Let’s stop talking about “backups”. His point was simply that although backups are great, you need to be able to reliably get the data back again. This means you need confidence in the backup process, the integrity of the media it’s stored on, the ability to read the content again at a later date and of course that you’ve actually captured everything you need to restore from square one i.e. new machine with zero data. Entrusting media such as DVDs and thumb drives to reliably hold data without physical degradation for possibly many years is again, a risky proposition.
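
One simple way to build that confidence is to restore a sample file every so often and compare its checksum with the original. A minimal sketch in Python, assuming you still have the original on hand to compare against (the function names are illustrative):

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restored_copy_matches(original, restored):
    """True if the restored file is byte-for-byte identical to the original."""
    return sha256_of(original) == sha256_of(restored)
```

This only proves the sampled files came back intact; it says nothing about whether everything you need was included in the backup in the first place.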

Mozy backup

As far as I’m concerned, you’re really only left with one viable choice if your data is important and that’s using an online service such as Mozy. After my own data loss I spent a great deal of time researching suitable backup providers and although there were a few out there at the time, Mozy is the one that really stood out to me. The status quo may have changed by now but certainly Mozy is still doing a great job of managing my backups.

The purpose of Mozy is pretty straightforward:

Mozy is a simple and safe way to back up all the important stuff on your computer. A copy of your data is stored in a secure, remote location for safekeeping, so that in the event of disaster your data is still retrievable.

It sounds easy, and it is. Mozy provides a small client which runs on your machine and allows you to specify backup sets. By default, you get a bunch of common sets such as photos and documents.

Alternatively, you can just backup based on file system path as I’ve done (more on that later).

Finally, you decide on what sort of schedule you’d like your backups to happen on. By default it will try and do it when the PC is not in use but you can manually define a particular time of day if it suits. I found there was no noticeable performance impact on the PC when it ran (you can actually adjust the balance between backup speed and PC performance if you like) so I just ran with the defaults.

That’s it, you’re done! Any content you place in the specified backup sets or file paths will automatically be backed up offsite.

Restoring

Mozy provides a few different options here, the easiest of which is that your entire backup set gets mapped as another drive you can simply navigate to in Windows Explorer. You see a list of each drive you’ve previously backed up, from which you can drill down to the appropriate path, locate the content you’re looking for then just right click and “Restore”.

That’s it; restoring lost content could not possibly be easier! Obviously there is a bandwidth consideration depending on the volume of content you’re restoring but I’ll touch on that again in a moment.

But what if the latest backup was a corrupted version? Not a problem, right click on any folder, choose “Change time” and you can restore from any point in time in the past so you can easily roll back to a known good version. You can access the same functionality by right clicking on the source file and choosing “Restore Previous Version” which then lists all the previous versions you’ve successfully backed up. Although not its primary purpose, this is also a neat little version control feature.

The other online mechanism of restoring files is via a web interface. This is pretty handy if, for whatever reason, you don’t have the Mozy client on a machine and still want to access backups.

All this works great for restoring a reasonable size collection of files, but what if you’re talking about catastrophic loss and the need to restore tens of GB? Mozy provides a sneakernet solution involving burning to DVD and FedExing the content to you. Obviously they’re going to charge for this service but it’s US$30 plus US$0.50 per GB so even if you’re talking 100GB you’re only looking at 80 bucks to get all your data back. That’s a very small fraction of what I paid Payam to recover my data and even then, some of it wasn’t recoverable.
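
For anyone wanting to plug in their own numbers, the DVD restore pricing quoted above works out as a flat fee plus a per-GB charge. A quick Python sketch, using only the two figures from this post (the constant and function names are mine, and the pricing may well have changed since):

```python
BASE_FEE_USD = 30.00    # flat fee quoted in this post
PER_GB_FEE_USD = 0.50   # per-gigabyte fee quoted in this post


def dvd_restore_cost(gigabytes: float) -> float:
    """Estimated cost in US dollars of a DVD restore for `gigabytes` of data."""
    if gigabytes < 0:
        raise ValueError("gigabytes must be non-negative")
    return BASE_FEE_USD + PER_GB_FEE_USD * gigabytes


print(dvd_restore_cost(100))  # 30 + 0.50 * 100 = 80.0
```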

Security

When I talk to people about this backup strategy one of the most common questions is “is it secure?” It’s a bit of a tough one because there’s no simple yes or no answer. Is anything on the net “secure”? Banks get compromised! Mozy reports using 128 bit SSL for the transport layer and 448 bit Blowfish encryption in storage so at least on the surface of it, the encryption appears to be solid.

Just a couple of other thoughts on security though. Firstly, how secure is your data at present? I mean what is the possibility of someone walking out with your PC or your backup device under their arm and reading the data? There might be a password on the PC but I’d hazard a guess that’s about the extent of it.

The other consideration is the balance of impact of loss versus impact of disclosure. Unless you’ve got your own Paris Hilton style home video on your machine, I’d personally be far more upset about losing my data altogether than someone getting hold of my photos or even my financial statements. Passwords can be changed; family photos can never be recreated.

Bandwidth

The other obvious question is “What’s it going to do to my bandwidth?” Obviously large backups are going to chew up a lot of upstream capacity. I just came back from holidays with over 10GB of photos and video and it took several days for Mozy to work through it all. In terms of the impact of using this bandwidth, I haven’t noticed any adverse behaviour primarily because I have Mozy configured not to run while I’m using the PC which means a lot of the network utilisation is happening during the night.

The other big factor is how your ISP handles upstream data billing. Providers such as Telstra BigPond count it along with your downstream traffic but someone like TPG, who I use, don’t count any sent traffic. Even if they did, when you’re looking at less than A$70 a month for 160GB of traffic (ok, half of that is off peak), bandwidth is starting to get pretty cheap.

Cost

First, the good news; you can use up to 2GB for free which means you can download the client, test it out and make sure you’re happy with it before making a financial commitment. After that you’re looking at US$4.95 a month for unlimited data. On my cappuccino-meter (i.e. the price of a coffee which you wouldn’t think twice about spending – around A$4), that’s less than a cappuccino and a half a month which I reckon is very, very good value.

Limitations

There’s only one gotcha which really causes me any grief and that is the “Home” version (as opposed to the “Business” version which is significantly more expensive) must backup from a local drive. I use a D-Link DNS-323 NAS device to store absolutely everything because it’s easily accessible from any machine, has built in RAID 1 with two mirrored drives and acts as an FTP server. Unfortunately because this drive is considered external there was absolutely no way I could get Mozy to backup from it.

What I’ve ended up doing is installing a spare disk in my primary PC then configuring Robocopy to mirror all the content I actually want on the NAS to this drive. It runs nightly and in some ways makes things a little simpler as I just tell Mozy to backup absolutely everything on the F drive which is the spare disk. It would be nice to be able to avoid this but I can see where they’re coming from in terms of segregating their business and personal offerings.
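
For the curious, the nightly mirror can be wrapped in a few lines of Python. Treat this as a sketch under assumptions rather than my exact script: the function names are made up, the paths in the usage note are hypothetical, and /MIR is simply the standard Robocopy switch for mirroring a directory tree.

```python
import subprocess


def build_mirror_command(source, destination):
    """Build a robocopy command that mirrors `source` into `destination`.

    /MIR copies subdirectories and also deletes files from the destination
    that no longer exist in the source, so point it at the right folder.
    """
    return ["robocopy", source, destination, "/MIR"]


def mirror_succeeded(exit_code):
    """Robocopy signals failure with exit codes of 8 or higher."""
    return exit_code < 8


def run_mirror(source, destination):
    """Run the mirror (Windows only); True if it finished without failures."""
    result = subprocess.run(build_mirror_command(source, destination))
    return mirror_succeeded(result.returncode)
```

Something like `run_mirror(r"\\NAS\Share", r"F:\Mirror")` scheduled nightly would do the job. Note that a non-zero exit code from Robocopy is not necessarily an error; codes below 8 just describe what was copied.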

Alternatives

There are a couple of alternatives to this sort of approach. Apple has the Time Machine which I hear good things about and Microsoft has Windows Home Server which both provide personal backup solutions. I’m not sure how the respective manufacturers recommend taking backups offsite but this to me is an essential requirement if the backups are to be seriously foolproof. In my case, neither solution was suitable (I don’t have a Mac and don’t particularly want to run a server at home), but they might be viable alternatives for some assuming the data can be safely and reliably stored somewhere redundant.

Summary

In reading back through this post, it does appear overtly favourable towards Mozy but I’m honestly just really, really impressed with the product and am not incentivised in any way to write this. I just love the fact that you install it and forget about it and the only time you’ll ever need to actually think about it is when you need it most. It sure makes me sleep a lot better at night and so long as I myself don’t suffer a catastrophic failure (incidentally, I have a backup for me as well!), I’m confident I’m never going to be experiencing that sinking feeling again.


Disclaimer

Opinions expressed here are my own and may not reflect those of my employer, my colleagues, my mates, my wife and so on and so forth. Unless I’m quoting someone, they’re my own opinions and may not necessarily be cohesive nor entertaining but hey, at least they’re original!