It's been a crazy time for data breaches and as I wrote yesterday, we've seen a very distinct pattern of historical mega breaches lately. Fling in 2011, LinkedIn in 2012, tumblr in 2013 and the mother of them all, MySpace in, well, we don't quite know. There's been no information forthcoming from anyone about when this breach actually occurred and there are no explicit indicators in the data dump either (sometimes there are timestamps on account creation or website activity). So when did it actually happen? Let's work it out.
Firstly, the only data in the breach is an incrementing ID (possibly an internal MySpace identifier which would enable them to date it), an email address, username and one or two passwords. The passwords are stored as SHA1 hashes of the first 10 characters of the password converted to lowercase. That's right, truncated and case insensitive passwords stored without a salt. There are likely some interesting insights to take away from the passwords alone, but it's the email addresses that can help us actually date the thing.
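To make that storage scheme concrete, here's a minimal Python sketch of a hash computed the way described above: take the first 10 characters, lowercase them, SHA1 the result, no salt. The function name is mine and it assumes the text is UTF-8 encoded before hashing, which the data doesn't tell us; it just shows why so many different passwords end up with the same hash.

```python
import hashlib


def myspace_style_hash(password: str) -> str:
    """SHA1 of the first 10 characters, lowercased, with no salt.

    Hypothetical helper illustrating the scheme described in the post.
    Assumption: UTF-8 encoding before hashing (not confirmed by the data).
    """
    normalised = password[:10].lower()  # truncate, then discard case
    return hashlib.sha1(normalised.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Two different passwords, one hash: case and everything past
    # the 10th character are thrown away before hashing.
    print(myspace_style_hash("Password123456") == myspace_style_hash("PASSWORD12xyz"))
    # No salt: every account using this password gets the identical hash.
    print(myspace_style_hash("hunter2"))
```

Because there's no salt, a single precomputed list of lowercase, 10-character-or-shorter guesses cracks every account that shares a password in one pass.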
When we look at the top 3 email domains in the MySpace breach, we see an interesting distribution:
What's up with Gmail?! Here we have the world's largest provider of email addresses and it has only a fifth of the prevalence of Yahoo addresses. Think of the email account distribution like this:
- For every one Gmail account there are 5 Yahoo accounts
- For every one Gmail account there are 3 Hotmail accounts
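Those ratios are nothing more than per-domain counts divided by the Gmail count. Here's a rough Python sketch of how you might produce them from a list of addresses; the function names are hypothetical and it treats each exact domain separately, so in practice you'd probably want to roll variants like regional Yahoo or Hotmail domains together before comparing.

```python
from collections import Counter
from typing import Iterable


def domain_counts(emails: Iterable[str]) -> Counter:
    """Count addresses per domain (text after the last '@'), case-insensitively."""
    counts: Counter = Counter()
    for email in emails:
        _, sep, domain = email.strip().lower().rpartition("@")
        if sep:  # skip malformed rows that have no '@' at all
            counts[domain] += 1
    return counts


def ratios_to(counts: Counter, baseline: str = "gmail.com") -> dict:
    """Accounts on each domain for every one account on the baseline domain."""
    base = counts[baseline]
    if base == 0:
        raise ValueError(f"no addresses on {baseline}")
    return {domain: n / base for domain, n in counts.items()}


if __name__ == "__main__":
    # Tiny made-up sample, NOT breach data: 1 Gmail, 5 Yahoo, 3 Hotmail.
    sample = (["a@gmail.com"]
              + [f"y{i}@yahoo.com" for i in range(5)]
              + [f"h{i}@hotmail.com" for i in range(3)])
    ratios = ratios_to(domain_counts(sample))
    print(ratios["yahoo.com"], ratios["hotmail.com"])  # 5.0 3.0
```

Since `domain_counts` accepts any iterable, it can be fed an open file handle line by line rather than loading hundreds of millions of addresses into memory at once.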
But what we need to remember with Gmail is that they're a relatively new player. They entered private beta in April 2004 and didn't hit the mainstream until February 2007. There are still 25M accounts in the MySpace data so the incident certainly happened after that early 2007 time frame (I recall there were a lot of people in the beta, but I doubt it was enough for 25M of them to have MySpace accounts), but how much after?
Going back to another recent large incident, here's how the data on LinkedIn breaks down:
This is obviously a really different split; Gmail is now well and truly out front which is more commensurate with what we'd expect today. Keep in mind that LinkedIn was hacked in May 2012 so now we have a window somewhere between then and 2007. Of course this is all assuming an even distribution of accounts over services at similar times which will never be exactly the case, but at least we're comparing two truly global services that launched at similar times (2002 for LinkedIn and 2003 for MySpace). It would be a reasonable assumption that MySpace was hacked well before LinkedIn was. What we really need though is more data from between 2007 and 2012.
One source of info is to look at the stats comparing mail providers from around the turn of the decade. For example, this Business Insider chart:
Keep in mind that this is a US chart (although arguably MySpace was US-centric), and even if Gmail was at parity with Yahoo back then we'd still expect more Yahoo accounts as the incumbent mail provider; people were creating MySpace accounts before Gmail existed. But check the proportions in April 09:
- There are about 3 Yahoo accounts for every 1 Gmail account
- There are about 1.25 Hotmail accounts for every 1 Gmail account
This data feels too late when we think back to the 1:5 and 1:3 ratios for Yahoo and Hotmail versus Gmail. Let's find some stats on the previous year, for example this chart from TechCrunch in Jan 2009:
Here the ratio of Gmail to Hotmail starts out at about 1:2.5 in Oct 2007 and finishes at about 1:1.5 in Dec 2008. Now again, this is not to say that as of the end of 2008 MySpace would have had 1.5 Hotmail accounts for every 1 Gmail account, as they'd already had 6 years of accumulating Hotmail customers and only a couple of years of Gmail customers (at least since Gmail went out of beta). But the patterns do start to fit much better, most notably because Gmail is far enough out of beta to have tens of millions of people use it on MySpace but not far enough along its growth curve to come anywhere near the Hotmail and especially the Yahoo numbers.
What I really needed though was evidence of people who created accounts around this time frame, so I asked for some help yesterday:
I'm trying to date the MySpace breach. If you have an account you created between about 2006 and 2009, DM me! I'll write about it later.— Troy Hunt (@troyhunt) May 31, 2016
I had a few followers get back to me and I checked their accounts in the data set. One person whose account is in the breach stated quite emphatically:
if my memory serves me well, the account must have been created on April 28th, 2007
That was a good baseline, but then there was this:
Account was created around late November - early December 2007
And another with this:
4 November 2007 18:08 CET
(Many people actually kept the original welcome email from MySpace which proved enormously useful in this exercise.)
All of this was pointing to my gut feel that the incident didn't occur earlier than a 2008 time frame. What I really needed though was a max date, that is, a point in time after which people registered yet weren't in the data breach.
The oldest account provided by someone who wasn't in the breach was also emphatic about the date:
I have a MySpace account (apparently) that I appear to have created on Dec 26, 2009
That last one used a MySpace prefix on a personal domain for their email address so was obviously taking care to track which accounts were created where.
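The logic here is a simple bracketing exercise: the newest account that *is* in the breach gives a "no earlier than" date, and the oldest account that *isn't* gives a "no later than" date. A minimal Python sketch of that reasoning follows; the function name is hypothetical, and the upper bound only holds if we assume the dump contains every account that existed when the data was taken.

```python
from datetime import date
from typing import Iterable, Optional, Tuple


def breach_window(
    reports: Iterable[Tuple[date, bool]],
) -> Tuple[Optional[date], Optional[date]]:
    """Bound the breach date from (account_created_on, found_in_breach) reports.

    Lower bound: the latest creation date among accounts IN the breach.
    Upper bound: the earliest creation date among accounts NOT in the breach
    (assumes the dump is complete, which is not confirmed).
    """
    reports = list(reports)
    in_breach = [d for d, found in reports if found]
    not_in_breach = [d for d, found in reports if not found]
    lower = max(in_breach) if in_breach else None
    upper = min(not_in_breach) if not_in_breach else None
    if lower is not None and upper is not None and lower > upper:
        raise ValueError("reports contradict each other, or the dump is incomplete")
    return lower, upper


if __name__ == "__main__":
    # The precise dates reported above; the "late November - early
    # December 2007" account is left out because it's only approximate.
    reports = [
        (date(2007, 4, 28), True),
        (date(2007, 11, 4), True),
        (date(2009, 12, 26), False),
    ]
    print(breach_window(reports))
    # (datetime.date(2007, 11, 4), datetime.date(2009, 12, 26))
```

Adding the approximate report would nudge the lower bound to around early December 2007; every additional data point can only narrow the window, never widen it.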
From about mid-2008, everything started going downhill for MySpace's business. Facebook was on a massive climb and MySpace was losing visitors. However, it wasn't losing registrations and you could speculate that they could have been frozen in time in 2008 then hacked years later and the evidence would still point to the incident occurring 8 years ago. It may be that the incident occurred after 2008, but I doubt it was much later as they were still getting registrations and they would have been very heavily Gmail biased by that time.
One possibility that would explain the ginormous volume of data that was taken (the extracted breach file is 33GB) is an insider threat. Keep in mind that we're talking about a time when MySpace was running into serious trouble and there was a raft of lay-offs; it's highly conceivable that someone literally walked out the door with the data. That's not to say it couldn't have been hacked in a more traditional external-actor sort of way, but the timing is coincidental...
So that's the best estimate I can draw on the evidence here - MySpace was probably hacked in the mid-2008 to early-2009 time frame. The data is now searchable in Have I been pwned (yes, all 359,420,698 unique email addresses) and if you do find yourself in there and know when your account was created, drop a note in the comments below and we might be able to crowd source a more accurate picture of when this event occurred.
Update: Since writing this piece, MySpace have posted a blog about the incident which they've dated as having occurred before 11 June 2013. They're not clear about how much before then the incident occurred and indeed it could have been many years earlier (it seems like they had a major architecture change then which has allowed them to provide some context around the date). Interesting side note: the MySpace blog post specifically names "peace" - the seller on the dark market site - which is very unusual in an announcement like this, particularly given he may just be the seller and not the individual who actually hacked the system in the first place.