Troy Hunt: The Capgemini leak of Michael Page data via publicly facing database backup

A couple of weeks ago I wrote about the leak of data from the Red Cross' Blood Service down here in Australia. Many people were shocked that you could have a situation where troves of personal data were obtainable not through any advanced hacking technique, but by merely downloading a database backup from the website it had been published to. It's literally that simple and it's shockingly common.

I've had this blog post in mind for a little while and the basic premise is that as much as we're working through really creative new defences against attacks, we're also still alarmingly bad at the basics. For example, we're doing cool things with security headers like content security policies, but then we've still got SQL injection all over the place. We're making great inroads with getting SSL on all our things (particularly due to the likes of Let's Encrypt and Cloudflare), but then we're publishing our databases to publicly facing websites. And that brings us to Capgemini and Michael Page.

On Sunday 30 October (all times by my Australian clock), I was contacted by someone who provided the following screen cap:

Michael Page directory listing of database backups

It was the same individual who located the Red Cross data, and it was the same story in terms of both discovery and the underlying risk on the server end: a publicly exposed website, directory listing enabled and .sql files exposed. This time, the data was identified as belonging to Michael Page, the British-based (yet very global) recruitment firm. Per the directory listing above, he'd identified backups from a variety of different global assets totalling several gigabytes.
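For site owners who want to check their own servers for this class of exposure, here is a minimal defensive sketch in Python using only the standard library. It parses an auto-generated directory listing page and flags links that look like database dumps. The suffix list and the `example.com` URL are assumptions for illustration, not anything taken from the Michael Page server, and `audit` should only be pointed at sites you own or are authorised to test.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

# File suffixes treated as likely database dumps (an assumption; extend as needed).
BACKUP_SUFFIXES = (".sql", ".sql.gz", ".sql.zip", ".sql.bz2", ".dump")


class LinkCollector(HTMLParser):
    """Collects the href attribute of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def find_exposed_backups(html, base_url=""):
    """Return the links on a listing page whose names look like database dumps."""
    parser = LinkCollector()
    parser.feed(html)
    return [
        urljoin(base_url, link)  # resolve relative links against the listing URL
        for link in parser.links
        if link.lower().endswith(BACKUP_SUFFIXES)  # case-insensitive suffix match
    ]


def audit(url):
    """Fetch a listing page you control and report any backup files it exposes."""
    with urlopen(url, timeout=10) as response:
        html = response.read().decode("utf-8", errors="replace")
    return find_exposed_backups(html, base_url=url)
```

Sort links and the parent directory entry that web servers add to listings don't match the suffixes, so only file links like `uk.sql.gz` are reported. Any hit at all means the bigger fix is to disable directory listing and move backups off the web root entirely.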

As proof, he sent over a file which appeared to have been sourced from the UK site. It was a 362MB compressed file which extracted out to 4.55GB. Assuming a similar compression ratio, the files in the directory listing above would total well over 30GB of raw data, which is a very large set of data to leak publicly.
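The extrapolation is simple arithmetic, sketched below. It assumes decimal units (1GB = 1,000MB) and that every file in the listing compresses about as well as the UK sample; the 3GB total is a hypothetical stand-in for "several gigabytes", since the exact listing sizes aren't given here.

```python
def compression_ratio(compressed_mb: float, extracted_gb: float) -> float:
    """Raw-to-compressed size ratio, treating 1GB as 1,000MB (decimal units assumed)."""
    return (extracted_gb * 1000) / compressed_mb


def estimate_raw_gb(compressed_gb: float, ratio: float) -> float:
    """Extrapolate raw size, assuming every file compresses like the sample."""
    return compressed_gb * ratio


# Sample from the post: a 362MB archive that extracted to 4.55GB.
ratio = compression_ratio(362, 4.55)
print(f"ratio: {ratio:.1f}x")  # ratio: 12.6x

# Hypothetical 3GB of compressed backups, standing in for "several gigabytes".
print(f"raw estimate: {estimate_raw_gb(3, ratio):.1f}GB")  # raw estimate: 37.7GB
```

At roughly 12.6x, anything beyond about 2.4GB of compressed backups already exceeds 30GB of raw data, which is why "several gigabytes" of archives translates to "well over 30GB".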

The file I received indicated that, as with the Red Cross, this was the output of mysqldump, and in this case it contained table names pointing to Acquia, a hosted Drupal platform. Further info followed by way of screen caps showing various other fields and data snippets that you'd expect people to provide to a recruitment company:
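Recognising a mysqldump file is usually straightforward because the tool writes a descriptive comment header and plain `CREATE TABLE` statements. The heuristic sketch below assumes the conventional `-- MySQL dump` opening comment (MariaDB builds may write `-- MariaDB dump` instead, hence the alternation) and backtick-quoted table names; the sample text is invented for illustration and doesn't reflect the actual Michael Page schema.

```python
import re

# mysqldump conventionally opens with a comment such as "-- MySQL dump 10.13 ...".
HEADER = re.compile(r"^-- (?:MySQL|MariaDB) dump", re.MULTILINE)
# Table definitions in the dump look like: CREATE TABLE `table_name` (
CREATE_TABLE = re.compile(r"^CREATE TABLE `([^`]+)`", re.MULTILINE)


def looks_like_mysqldump(text: str) -> bool:
    """Heuristic: check only the start of the file, where the header lives."""
    return bool(HEADER.search(text[:4096]))


def table_names(text: str) -> list:
    """List table names in the order their CREATE TABLE statements appear."""
    return CREATE_TABLE.findall(text)


sample = (
    "-- MySQL dump 10.13  Distrib 5.6.30, for Linux (x86_64)\n"
    "--\n"
    "DROP TABLE IF EXISTS `users`;\n"
    "CREATE TABLE `users` (\n  `id` int(11) NOT NULL\n);\n"
    "CREATE TABLE `cache` (\n  `cid` varchar(255)\n);\n"
)
print(looks_like_mysqldump(sample), table_names(sample))  # True ['users', 'cache']
```

Table names alone are often enough to fingerprint the platform behind a dump, which is how a prefix or naming pattern can point to a particular hosting stack.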

Michael Page fields

This is just one of many tables, and there was a degree of sanitisation that nulled out some fields; not every piece of data existed for every record. I'll refer to Michael Page's disclosure a little later on, but what I will say here is that there were over 780k unique email addresses in that one file and plenty of data about candidates' employment, such as cover letters describing their experience.

Moving on, in the early messages I received from the individual, one in particular stuck out:

michaelpage is capgemini

This changed things somewhat because Capgemini is a multinational consulting and outsourcing firm with 180k people across 40 countries. As the messages flowed, the story that unfolded was that whilst it was Michael Page's data, it was Capgemini that had exposed it. Again, this mirrors the Red Cross incident, where the data was also leaked by a partner. Coincidentally, I had a contact within Capgemini, so I reached out to him on Monday 31 with the preface of "you're probably about to have a very bad day". It turned out to be more like a bad week as they worked to understand the scope of the leak and remediate the underlying risks.

As with the Red Cross situation, there were numerous failings which led to the exposure of this data. I won't go into those here; some of them are obvious and others are up to Capgemini to choose how transparent they wish to be. Also, as with the Red Cross, the individual who reported the leak has deleted the data he obtained and every trace of the backup I had is also gone. Of course, these are only the copies of the data we know of, but the commitment is the same as last time: all known copies of the data have been removed.

This obviously came as a shock to Michael Page, but it was also a shock for Capgemini, especially given the nature of some of the organisations they provide services to:

It's a big company, and those of us who have spent time in organisations of serious size (particularly globally distributed ones) understand how there can be pockets that follow, shall we say, "quaint" approaches to security. And that's another really important observation in all this: Capgemini has an annual revenue of €12B, but that hasn't prevented a series of egregious mistakes. Security flaws simply don't discriminate by organisation size.

I've held off posting this story until impacted parties could begin to be notified which, clearly, has now happened:

@troyhunt @michaelpageuk "You don't have to change your password". Interesting breach notice. pic.twitter.com/YRz1nXt16N
— Lewis Forfar (@lewisforfar) November 10, 2016

There is also now an FAQ by Michael Page.

A final comment on these incidents: this is a perfect illustration of where companies need bug bounties. These were such low-hanging vulnerabilities that had there been even the slightest inkling of incentivisation, they would have been found very quickly and reported ethically via a channel that researchers could trust. Check out Bugcrowd as a way of managing the entire process and look at a case like their bounty program with Tesla. Ask yourself this: would these incidents be making news if they had people looking for these risks early on? I highly doubt it.
