Adobe had a little issue the other day with the small matter of 150 million accounts being breached and released to the public. Whoops. So what are we talking about? A shed load of records containing an internal ID, username, email, encrypted password and a password hint. Naked Security did a very good write up on Adobe’s giant-sized cryptographic blunder in terms of what they got wrong with their password storage so I won’t try to replicate that, rather I’d like to take a look at the password hints.
This is an interesting one from an application security perspective and the rationale basically goes like this: In order to help people remember their passwords, you give them the ability to create a “hint” or in other words, record a piece of information that will later help them recall their password. Password hints are an absolutely ridiculous security measure. The whole premise that the secret that is the password can be unlocked by referring to a retrievable user-generated piece of text is just completely nonsensical.
The other thing that’s completely nonsensical is this: Whilst Adobe encrypted their passwords (albeit poorly), the password hints had absolutely no protection whatsoever. Right, so protect the password but don’t protect the data that helps you determine the password! Adobe had a penchant for encryption rather than hashing (hashing being how the passwords themselves should have been stored), and that same reversible mechanism could have been used to protect the hints while still allowing them to be decrypted for display when required. Except that’s not what happened.
Password hints or any other security mechanism designed around allowing users to create their own security related questions fundamentally compromise the security of the account. I touched on this in the context of secret questions in my article on Everything you ever wanted to know about building a secure password reset feature but now we’ve got some real hard data on which to draw some conclusions. Heaps of real hard data.
About the data
Yesterday I wrote about Using high-spec Azure SQL Server for short term intensive data processing where I did a bunch of analysis of the data. During that exercise I imported 152,989,508 records using the SQL Server bcp utility. That’s actually a fraction less than the just over 153 million records in the dump, as there were a few where the delimiters didn’t play nice – you just can’t trust hackers to always give you a clean data dump! Regardless, each record looks like this:
84557956-|--|-[redacted]@parponline.org-|-0tlHzKbr18uO6Wu5iaXtPQ==-|-mother's maiden name-Wilson|--
Let’s take a look inside the dump and see what sort of conclusions we can draw from all this.
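To make the record layout concrete, here is a minimal Python sketch of splitting one line into its five fields. It assumes the `-|-` field separator and the trailing `|--` marker exactly as they appear in the sample record above. The `AdobeRecord` type and `parse_record` helper are hypothetical names used for illustration, and this is not how the import described here was done (that used bcp).

```python
from typing import NamedTuple, Optional

FIELD_SEP = "-|-"   # separator seen between fields in the sample record
RECORD_END = "|--"  # marker seen at the end of the sample record


class AdobeRecord(NamedTuple):
    internal_id: str
    username: str
    email: str
    password_cipher: str
    hint: str


def parse_record(line: str) -> Optional[AdobeRecord]:
    """Split one dump line into its five fields, or return None if malformed."""
    line = line.rstrip("\r\n")
    if line.endswith(RECORD_END):
        line = line[: -len(RECORD_END)]
    # maxsplit=4 keeps any stray separator inside the hint, which is the last field
    parts = line.split(FIELD_SEP, 4)
    if len(parts) != 5:
        return None  # a row where the delimiters didn't play nice; skip it
    return AdobeRecord(*parts)


sample = (
    "84557956-|--|-[redacted]@parponline.org-|-"
    "0tlHzKbr18uO6Wu5iaXtPQ==-|-mother's maiden name-Wilson|--"
)
record = parse_record(sample)
print(record)
```

Returning `None` for rows with the wrong field count mirrors the handful of records that were dropped during the import, rather than guessing at where a broken row should be split.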
Hint prevalence and reuse
As I mentioned earlier, each record contains an internal ID, username, email, encrypted password and a password hint. The first thing I did was to take a look at the top 100 password hints. Not all records have a hint (my own personal breached account doesn’t), in fact there are only 109,305,888 records that do. Of these, 7,365,869 used the same 100 top hints. Here they are ranked by the number of occurrences of each:
|la de siempre||38,428|
|same as always||29,233|
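A ranking like the one above comes down to a frequency count over the non-empty hints. The following is a rough sketch in Python rather than the SQL used for the actual analysis. The `top_hints` helper is a hypothetical name, and trimming and lower-casing the hints before counting is an assumption on my part, so the real figures may have been grouped differently.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_hints(hints: Iterable[str], n: int = 100) -> List[Tuple[str, int]]:
    """Return the n most common non-empty hints with their counts."""
    # Assumption: treat hints as equal if they match after trimming and lower-casing
    counts = Counter(h.strip().lower() for h in hints if h and h.strip())
    return counts.most_common(n)


# Made-up sample data, not taken from the dump
sample_hints = ["same as always", "Same As Always ", "", "la de siempre", "dog"]
print(top_hints(sample_hints, 2))
# → [('same as always', 2), ('la de siempre', 1)]
```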
Within here there are actually some very common patterns that break down into the following categories:
Family names are obviously the biggies and within here we’re looking at things like kids or partner names. Clearly these are very easily discoverable and obviously next to useless as a password. Of course the account owner could have done some character substitution to strengthen the password (for example, “e” becomes “3”), but that’s very easily tested for if you’re looking to breach the account.
Next were password repeats: instances of hints such as “the usual” or “same as always” (including variations in foreign languages). Clearly this indicates a reused password and it shouldn’t really come as a surprise – we know password reuse is rampant.
The “Other” category is a bit of a catch-all: the hints include things like “adobe” and short names such as “id” and “yo”. It’s a bit hard to tell what to make of these – they could be junk entries but their prevalence suggests there is a common approach here. The “adobe” hint quite possibly belongs to a password of “adobe” and so on and so forth.
Animals are obviously very popular so think about dog and cat names. That also should come as no surprise as we know pets are a favourite for passwords. Remember that next time you put your dog’s name on your Facebook profile – you don’t want to have to rename Fido after a password breach!
I won’t go through each of the remaining categories as they’re largely self-explanatory. What I will say though is that the “Discoverable names” were all hints about things that could be easily found or guessed. For example, “colors” has a pretty finite set of options. Speaking of which…
Matching passwords to the hint
Due to Adobe’s choice of encryption, all instances of a password of, say “blue” will produce the same cipher. Determine what that cipher is and you’ve just cracked every “blue” password in the DB. Indeed this is the very risk we set out to avoid by salting our hashes, but of course Adobe was a long way off having this option to begin with.
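The difference between a deterministic scheme and a salted one is easy to see in a few lines of Python. This is only an illustration: an unsalted SHA-256 stands in for “same password, same output” because the standard library has no block cipher, and it is not what Adobe used. The function names, the PBKDF2 choice and the iteration count are all assumptions made for the sketch.

```python
import hashlib
import os
from typing import Optional, Tuple


def unsalted(password: str) -> str:
    """Deterministic stand-in: the same password always gives the same output."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def salted(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Salted PBKDF2: a fresh random salt gives a different digest per account."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)
    return salt, digest


# Two accounts with the password "blue"
print(unsalted("blue") == unsalted("blue"))  # identical: crack one, crack them all

salt_a, digest_a = salted("blue")
salt_b, digest_b = salted("blue")
print(digest_a == digest_b)  # different salts, so the stored values differ
```

With per-account salts, working out one user’s “blue” tells you nothing about which other rows also hold “blue”, which is exactly the grouping trick the Adobe data allows.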
Anyway, what this now means is that we can draw a number of conclusions about the passwords from the hints. For example, we can group all the ciphers for the hint “color” and because we can quite rightly assume that the password will be, well, a colour, there aren’t too many options of what that will be (character substitutions and mutations aside).
Let’s take an example: here are the top 20 ciphers by occurrence for the hint “color” (just the US spelling of the word):
The dominance of the first three ciphers (together they add up to about the sum of the next sixty ciphers) means you can pretty reliably conclude that we’re looking at passwords of “red”, “green” and “blue”. The broad range of other possibilities can be explained by a combination of foreign language versions of colours along with possible character substitutions. Probably something to be said for minimum strength criteria there too…
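The grouping described above (take every record with a given hint, then rank the ciphers by how often they occur) can be sketched as follows. The `top_ciphers_for_hint` helper and the `CIPHER_A` / `CIPHER_B` placeholder values are hypothetical; no real ciphers from the dump are used here.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_ciphers_for_hint(
    records: Iterable[Tuple[str, str]], hint: str, n: int = 20
) -> List[Tuple[str, int]]:
    """Rank password ciphers by occurrence among records with the given hint.

    `records` is an iterable of (hint, cipher) pairs.
    """
    wanted = hint.strip().lower()
    counts = Counter(
        cipher for h, cipher in records if h.strip().lower() == wanted
    )
    return counts.most_common(n)


# Placeholder data: identical passwords produce identical ciphers
records = [
    ("color", "CIPHER_A"),
    ("Color", "CIPHER_A"),
    ("color", "CIPHER_B"),
    ("pet", "CIPHER_A"),
]
print(top_ciphers_for_hint(records, "color"))
# → [('CIPHER_A', 2), ('CIPHER_B', 1)]
```

Once a cipher is tied to a likely password this way, every other record carrying that cipher is compromised too, regardless of its hint.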
Reconciliation with the Sony breach
Looking at breaches across systems gives you some interesting insights, in particular the prevalence of password reuse. It gets really interesting when you look at the hints from Adobe next to the plain text passwords from Sony. There are way too many to list in their entirety – there were nearly 7,000 matches between the sources based on correlating the email address.
Let’s take a look at those hints next to the passwords:
|Adobe hint||Sony password|
|grandmas maiden name||jostone|
|The most awesome dog ever!||Schultz|
|how many cats do I have||ihave3athome|
|My grandmother's name||bosco|
|name of reindeer||prancer|
|KING OF KINGS||JESUSISLORD|
|animal that says neigh||horses|
|black and white cat||sebastian|
A couple there might require a bit of imagination, but there’s a link :) Then again, a bunch of them aren’t so much personal questions as they are common knowledge; an “animal that says neigh” – good one!
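Correlating the two breaches by email address is essentially a join on a normalised key. Here is a minimal sketch, assuming each source has already been reduced to a dictionary keyed by email. The `correlate_by_email` helper and the example.com addresses are made up for illustration.

```python
from typing import Dict, List, Tuple


def correlate_by_email(
    adobe_hints: Dict[str, str], sony_passwords: Dict[str, str]
) -> List[Tuple[str, str, str]]:
    """Return (email, Adobe hint, Sony password) for addresses found in both."""
    # Normalise case and whitespace so the same address matches across sources
    sony = {email.strip().lower(): pw for email, pw in sony_passwords.items()}
    matches = []
    for email, hint in adobe_hints.items():
        key = email.strip().lower()
        if hint and key in sony:
            matches.append((key, hint, sony[key]))
    return matches


# Hypothetical addresses; the hint/password pair is one from the table above
adobe = {"Someone@example.com": "name of reindeer", "nohint@example.com": ""}
sony = {"someone@example.com": "prancer", "other@example.com": "letmein"}
print(correlate_by_email(adobe, sony))
# → [('someone@example.com', 'name of reindeer', 'prancer')]
```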
On the point of the personal nature of some of these, ponder this for a moment: Even though the password itself may be nonsensical in isolation, the hint can disclose its purpose and inadvertently leak sensitive information about the account holder. Here are some pretty serious examples:
|Adobe hint||Sony password|
|moms maiden name||schniggy|
|the password you usuallly use||torrie|
|bank pin once||raw3223|
|my initials and code||sb5501|
The couple of SSN examples in particular are bad news – tie this back to an ID (and of course we now have the email addresses for these individuals) and now you have something rather personal out there. Same again with the birthdays. And passwords from other sites. And PINs for crying out loud! In these cases, the password hint has been extremely detrimental to the individual’s privacy.
You don’t often see this discussed and in the Adobe case all the focus has been on the passwords, but it’s the hints that can be seriously bad news for people and there was absolutely no cryptographic protection on these to begin with.
Other incidental statistics
One of the things that makes the Adobe breach so significant is the range of countries it covers. Very often we see breaches that don’t have great international representation but this one is a little different. Take a look at the top 100 domains of the email addresses and how many appear on each:
hotmail.com (32,571,130), gmail.com (24,035,771), yahoo.com (17,816,528), aol.com (3,478,727), hotmail.fr (1,498,847), msn.com (1,443,862), hotmail.co.uk (1,411,400), comcast.net (1,249,918), mail.ru (1,248,392), live.com (1,235,491), web.de (1,226,283), yahoo.co.jp (997,808), qq.com (967,112), gmx.de (962,019), yahoo.com.tw (664,356), naver.com (659,968), sbcglobal.net (656,133), yahoo.fr (653,995), 163.com (645,375), yahoo.co.uk (624,446), hotmail.it (596,945), ymail.com (569,064), t-online.de (517,467), yahoo.com.br (490,986), verizon.net (452,069), libero.it (446,540), yahoo.com.hk (444,380), googlemail.com (441,136), me.com (402,349), yandex.ru (386,843), yahoo.co.in (382,410), yahoo.es (380,856), hotmail.es (375,316), live.fr (369,010), yahoo.de (367,180), cox.net (362,993), hotmail.de (361,869), mac.com (358,295), aim.com (348,749), hanmail.net (339,228), wanadoo.fr (326,855), btinternet.com (320,511), adobe.com (315,401), bellsouth.net (315,189), orange.fr (313,532), att.net (296,584), gmx.net (285,376), wp.pl (282,902), 126.com (278,700), free.fr (270,278), rediffmail.com (267,777), earthlink.net (240,916), rocketmail.com (240,308), live.co.uk (239,045), yahoo.ca (227,496), yahoo.it (216,798), yahoo.com.mx (201,963), shaw.ca (197,060), bigpond.com (190,732), charter.net (187,788), freenet.de (180,192), hotmail.co.jp (175,968), nate.com (171,951), yahoo.co.id (169,780), mundopositivo.com.br (169,373), o2.pl (165,186), live.it (162,987), bluewin.ch (162,299), alice.it (160,435), rambler.ru (153,155), bol.com.br (152,311), rogers.com (151,084), live.nl (148,760), gmx.at (148,317), arcor.de (146,084), sina.com (144,483), sympatico.ca (142,470), windowslive.com (140,690), live.com.mx (140,664), seznam.cz (140,624), laposte.net (136,782), ig.com.br (135,812), ntlworld.com (132,927), optonline.net (132,861), tiscali.it (130,429), mail.com (124,425), yahoo.com.cn (120,887), yahoo.com.au (120,540), yahoo.com.ar (119,279), live.ca (107,419), live.de (106,387), 
nifty.com (106,305), abv.bg (105,694), uol.com.br (104,572), yahoo.com.vn (99,435), terra.com.br (96,851), optusnet.com.au (94,705), telus.net (94,594), bk.ru (94,043), juno.com (93,590)
This is just over 113 million addresses and 9 of the top 20 domains come from France, the UK, Russia, Germany, Japan or Taiwan. Another thing that’s interesting in this is that about 37 million of the addresses are on Hotmail domains. That may also give some insight into the age of the accounts – I know from various sources that many of these accounts are quite old.
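For anyone wanting to reproduce this sort of breakdown, the domain tally is a simple count over the part of each address after the last `@`. This is a sketch with hypothetical helper names (`email_domain`, `top_domains`, `hotmail_total`), and treating anything starting with `hotmail.` as a Hotmail domain is my assumption about how that roughly 37 million figure was grouped.

```python
from collections import Counter
from typing import Iterable, Optional


def email_domain(email: str) -> Optional[str]:
    """Return the lower-cased domain of an address, or None if there isn't one."""
    _, sep, domain = email.strip().lower().rpartition("@")
    return domain if sep and domain else None


def top_domains(emails: Iterable[str], n: int = 100):
    """Count addresses per domain and return the n most common."""
    counts = Counter(d for d in map(email_domain, emails) if d)
    return counts.most_common(n)


def hotmail_total(domain_counts) -> int:
    """Sum the counts for every domain beginning with 'hotmail.'."""
    return sum(c for d, c in domain_counts if d.startswith("hotmail."))


# Made-up addresses for illustration
emails = ["a@hotmail.com", "b@Hotmail.fr", "c@gmail.com", "d@hotmail.com", "broken"]
ranked = top_domains(emails)
print(ranked)
# → [('hotmail.com', 2), ('hotmail.fr', 1), ('gmail.com', 1)]
print(hotmail_total(ranked))
# → 3
```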
Here’s another oddity for you: Of those 150 million accounts, only 8,052 have a username. Of those 8,052 usernames, 6,260 are against adobe.com email addresses. Of those 6,260 Adobe addresses, 1,070 have the same password cipher of “2GtbVrmsERzioxG6CatHBw==” and another 1,070 have no password at all. I’m not sure why this is the case – I suspect there are some internal idiosyncrasies around the fact that Adobe staff seem to have their own usernames, and you can only hope that these aren’t actually their passwords! At the very least though, there might be a bit of social engineering leverage from having Adobe usernames now made public.
A little tangential, but another interesting result of this breach is that Facebook have been notifying their own users whose accounts were in the Adobe breach and who used the same password on FB.
Let that sink in for a moment: Facebook have retrieved plain text versions of the breached Adobe passwords, then hashed them per-user and compared them to the passwords they have on file. Of course I’m assuming FB has unique salts and strong hashing and this is what they’ve done, but it seems to reconcile and has been confirmed by their security team. Initially this was a bit of a “what the?!” moment but on reflection, it makes a lot of sense; this is a great way of highlighting the breach to users and I can’t see a security downside to their approach. After all, the passwords are (almost) as good as public now (see the link in the opening paragraph of this post re decrypting them). At the end of the day, shame on anyone who actually sees this message for reusing passwords!
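The per-user comparison can be sketched like this: take the recovered plain text password, hash it with the salt stored for that user, and compare the result to the hash on file. Everything specific here is an assumption for illustration, including the use of PBKDF2, the iteration count and the helper names. Nothing in this post says which scheme Facebook actually uses.

```python
import hashlib
import hmac
import os


def hash_password(password: str, salt: bytes, iterations: int = 100_000) -> bytes:
    """Hypothetical storage scheme: PBKDF2-HMAC-SHA256 with a per-user salt."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)


def reuses_breached_password(
    breached_plaintext: str, stored_salt: bytes, stored_hash: bytes
) -> bool:
    """Hash the breached password with this user's salt and compare to the stored hash."""
    candidate = hash_password(breached_plaintext, stored_salt)
    return hmac.compare_digest(candidate, stored_hash)  # constant-time comparison


# A made-up user whose stored password matches the one recovered from the breach
user_salt = os.urandom(16)
user_hash = hash_password("hunter2", user_salt)

print(reuses_breached_password("hunter2", user_salt, user_hash))    # reused
print(reuses_breached_password("different", user_salt, user_hash))  # not reused
```

The site never needs its own users’ plain text passwords for this. It only needs the breached plain text plus the salt and hash it already stores, which is why the approach has no obvious security downside.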
Firstly there are the obvious conclusions that have come before me: This is quite possibly the largest breach of user accounts we’ve ever seen and Adobe have done a very poor job of securing passwords. This should be one of those incidents that we point to in the future and say “There – that’s why encryption of passwords instead of hashing is bad!”
But I’ll add to that another conclusion: The presence of a plain text password hint significantly weakened the password irrespective of the poor cryptography applied. It’s clear from the analysis above that very often the hint narrows down the possible password values to an unacceptably low number of possible results.
The other alarming conclusion came when I reconciled the data with the Sony breach: When matched to plain text passwords from other breaches, the hint often disclosed sensitive personal information which wouldn’t otherwise have been gleaned. That’s a serious issue and it’s only a problem because of the presence of the hint.
Ultimately, password hints are evil and they add nothing to an online system that can’t be achieved with a secure password reset feature.