Troy Hunt: Adobe credentials and the serious insecurity of password hints

Adobe had a little issue the other day with the small matter of 150 million accounts being breached and released to the public. Whoops. So what are we talking about? A shed load of records containing an internal ID, username, email, encrypted password and a password hint. Naked Security did a very good write-up on Adobe’s giant-sized cryptographic blunder in terms of what they got wrong with their password storage, so I won’t try to replicate that; rather, I’d like to take a look at the password hints.

This is an interesting one from an application security perspective and the rationale basically goes like this: In order to help people remember their passwords, you give them the ability to create a “hint” or in other words, record a piece of information that will later help them recall their password. Password hints are an absolutely ridiculous security measure. The whole premise that the secret that is the password can be unlocked by referring to a retrievable user-generated piece of text is just completely nonsensical.

The other thing that’s completely nonsensical is this: Whilst Adobe encrypted their passwords (albeit poorly), the password hints had absolutely no protection whatsoever. Right, so protect the password but don’t protect the data that helps you determine the password! Given their penchant for encryption rather than hashing (which is how the passwords themselves should have been stored), the same mechanism could have been used to protect the hints and still allow the cipher to be reversed for display when required. Except that’s not what happened.

Password hints or any other security mechanism designed around allowing users to create their own security related questions fundamentally compromise the security of the account. I touched on this in the context of secret questions in my article on Everything you ever wanted to know about building a secure password reset feature but now we’ve got some real hard data on which to draw some conclusions. Heaps of real hard data.

About the data

Yesterday I wrote about Using high-spec Azure SQL Server for short term intensive data processing where I did a bunch of analysis of the data. During that exercise I imported 152,989,508 records using the SQL Server bcp utility. That’s actually a fraction less than the just over 153 million records in the dump but there were a few where delimiters didn’t play nice – you just can’t trust hackers to always give you a clean data dump! Regardless, each record looks like this:

84557956-|--|-[redacted]@parponline.org-|-0tlHzKbr18uO6Wu5iaXtPQ==-|-mother's maiden name-Wilson|--
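
To make the layout concrete, here is a minimal sketch of how a line like the one above could be split into its five fields. It assumes the separator is "-|-" and that each line ends with "|--", which is what the sample record shows; the names AdobeRecord and parse_record are hypothetical and this is not the bcp import described above.

```python
from typing import NamedTuple, Optional

DELIMITER = "-|-"    # field separator seen in the sample record (assumption)
TERMINATOR = "|--"   # trailing marker seen at the end of the sample record (assumption)


class AdobeRecord(NamedTuple):
    internal_id: str
    username: str
    email: str
    password_cipher: str
    hint: str


def parse_record(line: str) -> Optional[AdobeRecord]:
    """Return a parsed record, or None when the delimiters do not line up."""
    line = line.rstrip("\r\n")
    if line.endswith(TERMINATOR):
        line = line[: -len(TERMINATOR)]
    fields = line.split(DELIMITER)
    if len(fields) != 5:
        return None  # malformed row, e.g. a delimiter sequence inside a field
    return AdobeRecord(*fields)
```

Rows that come back as None are the sort that would fall out of an import when the delimiters don’t play nice.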

Let’s take a look inside the dump and see what sort of conclusions we can draw from all this.

Hint prevalence and reuse

As I mentioned earlier, each record contains an internal ID, username, email, encrypted password and a password hint. The first thing I did was to take a look at the top 100 password hints. Not all records have a hint (my own personal breached account doesn’t), in fact there are only 109,305,888 records that do. Of these, 7,365,869 used the same 100 top hints. Here they are ranked by the number of occurrences of each:

Hints	Occurrences
dog	559,358
name	479,828
usual	387,222
same	242,932
Me	242,160
cat	227,979
son	183,943
daughter	181,047
nickname	165,648
pet	142,626
normal	140,075
work	125,695
car	113,530
school	112,000
birthday	109,731
my name	102,843
love	102,404
The Usual	97,711
kids	95,938
123	95,348
wife	90,406
Home	83,856
standard	83,455
numbers	80,939
NONE	80,308
password	80,269
123456	70,891
mom	67,260
adobe	66,418
number	65,640
email	61,761
middle name	59,040
bday	57,832
baby	57,341
always	54,022
1	53,485
1to6	53,465
last name	53,086
Nombre	52,637
animal	52,624
company	50,435
yo	49,513
family	47,474
dogs name	47,036
sport	46,262
msn	45,633
lol	45,202
a	45,003
Phone	44,522
yahoo	44,195
band	42,972
husband	42,858
NO	41,962
numeros	40,930
birth	40,622
regular	40,551
myspace	38,799
la de siempre	38,428
city	37,960
dad	37,637
duh	37,564
you know	37,466
football	37,361
hotmail	36,995
date	36,124
hi	36,056
color	35,691
street	35,667
mother	35,627
id	34,730
middle	33,540
Address	32,744
hund	32,463
DOB	32,446
mama	32,385
hello	31,991
1234	31,709
sister	31,550
My Dog	31,500
perro	31,359
amor	31,199
maiden name	31,149
wie immer	31,079
Job	30,891
old	30,057
Anniversary	29,535
facebook	29,506
dogs	29,473
pet name	29,241
same as always	29,233
team	28,573
Horse	28,411
Food	28,275
Bird	27,971
nome	27,910
girl	27,879
pass	27,527
moi	27,475
tel	27,444
123456789	27,326
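
A ranking like the one above boils down to a frequency count over the hint column. The following is a rough sketch only: the function name top_hints is hypothetical, and lowercasing the hints before counting is an assumption, since the article does not say whether case was folded.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_hints(hints: Iterable[str], n: int = 100) -> Tuple[List[Tuple[str, int]], float]:
    """Return the n most common non-empty hints and the share of hinted records they cover."""
    # Assumption: trim whitespace and fold case so "Dog" and "dog" count together.
    counts = Counter(h.strip().lower() for h in hints if h and h.strip())
    total = sum(counts.values())
    top = counts.most_common(n)
    covered = sum(count for _, count in top)
    share = covered / total if total else 0.0
    return top, share
```

Using the figures quoted above, 7,365,869 of the 109,305,888 records with a hint works out to roughly 6.7% of hinted accounts sharing just 100 hints.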

Within here there are actually some very common patterns that break down into the following categories:

[Chart: Common password hints by category]

Family names are obviously the biggies and within here we’re looking at things like kids or partner names. Clearly these are very easily discoverable and obviously next to useless as a password. Of course the account owner could have done some character substitution to strengthen the password (for example, “e” becomes “3”), but that’s very easily tested for if you’re looking to breach the account.
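
To see why character substitution adds so little, consider how few spellings it actually produces. This is a minimal sketch under stated assumptions: the substitution table is a small illustrative one, and substitution_variants is a hypothetical helper, not a tool used in this analysis.

```python
from itertools import product
from typing import Dict, List

# Illustrative substitution table (assumption); real-world lists are longer.
SUBSTITUTIONS: Dict[str, str] = {"a": "4", "e": "3", "i": "1", "o": "0", "s": "5"}


def substitution_variants(word: str) -> List[str]:
    """Return every spelling of word with each substitutable letter either kept or swapped."""
    choices = [
        (ch, SUBSTITUTIONS[ch.lower()]) if ch.lower() in SUBSTITUTIONS else (ch,)
        for ch in word
    ]
    return ["".join(combo) for combo in product(*choices)]
```

A four-letter pet name with two substitutable letters yields only four candidates, so the “strengthened” version is barely harder to test than the original.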

Next was password repeats and this is instances of hints such as “the usual” or “same as always” (including variations in foreign languages). Clearly this is indicating a reused password and it shouldn’t really come as a surprise – we know password reuse is rampant.

The “Other” category is a bit of a catch-all: The hints include things like “adobe” and short names such as “id” and “yo”. It’s a bit hard to tell what to make of these – they could be junk entries but their prevalence suggests there is a common approach here. The “adobe” hint quite possibly belongs to a password of “adobe” and so on and so forth.

Animals are obviously very popular so think about dog and cat names. That also should come as no surprise as we know pets are a favourite for passwords. Remember that next time you put your dog’s name on your Facebook profile – you don’t want to have to rename Fido after a password breach!

I won’t go through each of the remaining categories as they’re largely self-explanatory. What I will say, though, is that the “Discoverable names” were all hints about things that could be easily found or guessed. For example, “colors” has a pretty finite set of options. Speaking of which…

Matching passwords to the hint

Due to Adobe’s choice of encryption, all instances of a password of, say “blue” will produce the same cipher. Determine what that cipher is and you’ve just cracked every “blue” password in the DB. Indeed this is the very risk we set out to avoid by salting our hashes, but of course Adobe was a long way off having this option to begin with.
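
The difference between a deterministic scheme and salted hashing can be shown in a few lines. This is a sketch only: the unsalted SHA-256 below is merely a stand-in for any scheme where the same input always gives the same output (it is not Adobe’s cipher), and the PBKDF2 iteration count is an illustrative value.

```python
import hashlib
import os


def deterministic_transform(password: str) -> bytes:
    """Stand-in for any unsalted, deterministic scheme: same input, same output."""
    return hashlib.sha256(password.encode("utf-8")).digest()


def salted_hash(password: str, salt: bytes) -> bytes:
    """Per-record salted hash using PBKDF2-HMAC-SHA256 (iteration count is illustrative)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


# Two users who both chose "blue".
same_a = deterministic_transform("blue")
same_b = deterministic_transform("blue")    # identical to same_a

salt_a, salt_b = os.urandom(16), os.urandom(16)
salted_a = salted_hash("blue", salt_a)
salted_b = salted_hash("blue", salt_b)      # differs from salted_a because the salts differ
```

With the deterministic version, working out one “blue” reveals all of them; with a unique salt per record, each one has to be attacked separately.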

Anyway, what this now means is that we can draw a number of conclusions about the passwords from the hints. For example, we can group all the ciphers for the hint “color” and because we can quite rightly assume that the password will be, well, a colour, there aren’t too many options of what that will be (character substitutions and mutations aside).

Let’s take an example: here are the top 20 ciphers by occurrence for the hint “color” (just the US spelling of the word):

[Chart: Distribution of ciphers used for "colour" passwords]

The dominance of the first three ciphers (together they add up to about the sum of the next sixty ciphers) means you can pretty reliably conclude that we’re looking at passwords of “red”, “green” and “blue”. The broad range of other possibilities can be explained by a combination of foreign language versions of colours along with possible character substitutions. Probably something to be said for minimum strength criteria there too…
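
The grouping described above is essentially a filtered frequency count. Here is a minimal sketch, assuming the records are available as (hint, cipher) pairs; top_ciphers_for_hint is a hypothetical name and the case-insensitive match on the hint is an assumption.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_ciphers_for_hint(
    records: Iterable[Tuple[str, str]], hint: str, n: int = 20
) -> List[Tuple[str, int]]:
    """Count password ciphers among (hint, cipher) pairs whose hint matches the given one."""
    wanted = hint.strip().lower()
    counts = Counter(
        cipher
        for rec_hint, cipher in records
        if cipher and rec_hint.strip().lower() == wanted
    )
    return counts.most_common(n)
```

Because identical passwords share a cipher, a handful of ciphers towering over the rest for a narrow hint is a strong signal of what the underlying words are.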

Reconciliation with the Sony breach

Looking at breaches across systems gives you some interesting insights, in particular the prevalence of password reuse. It gets really interesting when you look at the hints from Adobe next to the plain text passwords from Sony. There are way too many to list in their entirety – there were nearly 7,000 matches between the sources based on correlating the email address.
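
Correlating on the email address amounts to an inner join between the two data sets. The following is a rough sketch under stated assumptions: both sources are reduced to simple pairs, addresses are compared case-insensitively, and match_hints_to_passwords is a hypothetical name rather than the query actually used.

```python
from typing import Dict, Iterable, List, Tuple


def match_hints_to_passwords(
    adobe: Iterable[Tuple[str, str]],  # (email, hint)
    sony: Iterable[Tuple[str, str]],   # (email, plain text password)
) -> List[Tuple[str, str, str]]:
    """Inner-join the two sources on a case-insensitive email address."""
    sony_by_email: Dict[str, str] = {
        email.strip().lower(): password for email, password in sony
    }
    matches = []
    for email, hint in adobe:
        key = email.strip().lower()
        # Only keep accounts that have a hint and appear in both sources.
        if hint and key in sony_by_email:
            matches.append((key, hint, sony_by_email[key]))
    return matches
```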

Let’s take a look at those hints next to the passwords:

Adobe hint	Sony password
grandmas maiden name	jostone
name birthyear	mike1953
Usual	75857585
church	lovejoy
profession	acting
spice	tumeric
dog's name	sunny
The most awesome dog ever!	Schultz
Bibleology	psalm54
how many cats do I have	ihave3athome
My grandmother's name	bosco
name of reindeer	prancer
favorite animal	elephant
KING OF KINGS	JESUSISLORD
animal that says neigh	horses
black and white cat	sebastian
National bird	toucan
Old jeep	97laredo
Money	million
fruit	grapefruit

A couple there might require a bit of imagination, but there’s a link :) Then again, a bunch of them aren’t so much personal questions as they are common knowledge; an “animal that says neigh” – good one!

On the point of the personal nature of some of these, ponder this for a moment: Even though the password itself may be nonsensical in isolation, the hint can disclose its purpose and inadvertently leak sensitive information about the account holder. Here are some pretty serious examples:

Adobe hint	Sony password
and ssnumb	k156668818
bdayssn	83881090
birthday	072586
birthday	102048
maiden	lubeck
moms maiden name	schniggy
License Plate	pjrn19
license plate	ibt123
yahoo password	092187
the password you usuallly use	torrie
bank pin once	raw3223
my initials and code	sb5501

The couple of SSN examples in particular are bad news – tie this back to an ID (and of course we now have the email addresses for these individuals) and now you have something rather personal out there. Same again with the birthdays. And passwords from other sites. And PINs for crying out loud! In these cases, the password hint has been extremely detrimental to the individual’s privacy.

You don’t often see this discussed and in the Adobe case all the focus has been on the passwords, but it’s the hints that can be seriously bad news for people and there was absolutely no cryptographic protection on these to begin with.

Other incidental statistics

One of the things that makes the Adobe breach so significant is the range of countries it covers. Very often we see breaches that don’t have great international representation but this one is a little different. Take a look at the top 100 domains of the email addresses and how many appear on each:

hotmail.com (32,571,130), gmail.com (24,035,771), yahoo.com (17,816,528), aol.com (3,478,727), hotmail.fr (1,498,847), msn.com (1,443,862), hotmail.co.uk (1,411,400), comcast.net (1,249,918), mail.ru (1,248,392), live.com (1,235,491), web.de (1,226,283), yahoo.co.jp (997,808), qq.com (967,112), gmx.de (962,019), yahoo.com.tw (664,356), naver.com (659,968), sbcglobal.net (656,133), yahoo.fr (653,995), 163.com (645,375), yahoo.co.uk (624,446), hotmail.it (596,945), ymail.com (569,064), t-online.de (517,467), yahoo.com.br (490,986), verizon.net (452,069), libero.it (446,540), yahoo.com.hk (444,380), googlemail.com (441,136), me.com (402,349), yandex.ru (386,843), yahoo.co.in (382,410), yahoo.es (380,856), hotmail.es (375,316), live.fr (369,010), yahoo.de (367,180), cox.net (362,993), hotmail.de (361,869), mac.com (358,295), aim.com (348,749), hanmail.net (339,228), wanadoo.fr (326,855), btinternet.com (320,511), adobe.com (315,401), bellsouth.net (315,189), orange.fr (313,532), att.net (296,584), gmx.net (285,376), wp.pl (282,902), 126.com (278,700), free.fr (270,278), rediffmail.com (267,777), earthlink.net (240,916), rocketmail.com (240,308), live.co.uk (239,045), yahoo.ca (227,496), yahoo.it (216,798), yahoo.com.mx (201,963), shaw.ca (197,060), bigpond.com (190,732), charter.net (187,788), freenet.de (180,192), hotmail.co.jp (175,968), nate.com (171,951), yahoo.co.id (169,780), mundopositivo.com.br (169,373), o2.pl (165,186), live.it (162,987), bluewin.ch (162,299), alice.it (160,435), rambler.ru (153,155), bol.com.br (152,311), rogers.com (151,084), live.nl (148,760), gmx.at (148,317), arcor.de (146,084), sina.com (144,483), sympatico.ca (142,470), windowslive.com (140,690), live.com.mx (140,664), seznam.cz (140,624), laposte.net (136,782), ig.com.br (135,812), ntlworld.com (132,927), optonline.net (132,861), tiscali.it (130,429), mail.com (124,425), yahoo.com.cn (120,887), yahoo.com.au (120,540), yahoo.com.ar (119,279), live.ca (107,419), live.de (106,387), 
nifty.com (106,305), abv.bg (105,694), uol.com.br (104,572), yahoo.com.vn (99,435), terra.com.br (96,851), optusnet.com.au (94,705), telus.net (94,594), bk.ru (94,043), juno.com (93,590)

This is just over 113 million addresses and 9 of the top 20 domains come from France, the UK, Russia, Germany, Japan or Taiwan. Another thing that’s interesting in this is that about 37 million of the addresses are on Hotmail domains. That may also give some insight into the age of the accounts – I know from various sources that many of these accounts are quite old.
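
A domain tally like this is straightforward to reproduce. This is a minimal sketch with hypothetical function names; it assumes the domain is whatever follows the last "@" and that “Hotmail domains” means anything starting with "hotmail.".

```python
from collections import Counter
from typing import Iterable


def domain_counts(emails: Iterable[str]) -> Counter:
    """Count addresses per domain, using the text after the last '@'."""
    counts: Counter = Counter()
    for email in emails:
        _, sep, domain = email.strip().lower().rpartition("@")
        if sep and domain:  # skip values with no '@' or nothing after it
            counts[domain] += 1
    return counts


def hotmail_total(counts: Counter) -> int:
    """Sum every domain that starts with 'hotmail.', e.g. hotmail.com or hotmail.fr."""
    return sum(n for domain, n in counts.items() if domain.startswith("hotmail."))
```

Adding up the seven hotmail.* entries in the list above gives 36,991,475, which is where the figure of about 37 million comes from.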

Here’s another oddity for you: Of those 150 million accounts, only 8,052 have a username. Of those 8,052 usernames, 6,260 are against adobe.com email addresses. Of those 6,260 Adobe addresses, 1,070 have the same password cipher of “2GtbVrmsERzioxG6CatHBw==” and another 1,070 have no password at all. I’m not sure why this is the case – I suspect there are some internal idiosyncrasies around the fact that Adobe staff seem to have their own usernames and you can only hope that these aren’t actually their passwords! At the very least though, there might be a bit of social engineering leverage from having Adobe usernames now made public.

A little tangential but another interesting result of this breach is that Facebook have been notifying their own users whose accounts were in the Adobe breach and who used the same password on FB:

[Image: Facebook message about compromised Adobe account]

Let that sink in for a moment: Facebook have retrieved plain text versions of the breached Adobe passwords, then hashed them per-user and compared them to the passwords they have on file. Of course I’m assuming FB has unique salts and strong hashing and this is what they’ve done, but it seems to reconcile and has been confirmed by their security team. Initially this was a bit of a “what the?!” moment but on reflection, it makes a lot of sense; this is a great way of highlighting the breach to users and I can’t see a security downside to their approach; after all, the passwords are (almost) as good as public now (see the link in the opening paragraph of this post re decrypting them). At the end of the day, shame on anyone who actually sees this message for reusing passwords!
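
The check described above could look something like this. It is a sketch under assumptions only: Facebook’s actual hashing scheme is not described here, so PBKDF2-HMAC-SHA256 with a per-user salt is used as a stand-in, and the function names are hypothetical.

```python
import hashlib
import hmac
import os
from typing import Tuple


def hash_password(password: str, salt: bytes) -> bytes:
    """Salted hash; the algorithm and iteration count are illustrative assumptions."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


def reuses_breached_password(stored: Tuple[bytes, bytes], breached_plaintext: str) -> bool:
    """stored is (salt, hash) for one user; re-hash the breached password with that salt."""
    salt, stored_hash = stored
    candidate = hash_password(breached_plaintext, salt)
    return hmac.compare_digest(candidate, stored_hash)


# Usage: one user's stored credentials, checked against a recovered password.
user_salt = os.urandom(16)
stored_credentials = (user_salt, hash_password("blue", user_salt))
is_reused = reuses_breached_password(stored_credentials, "blue")
```

The site never needs its own users’ passwords in plain text for this; it only needs the breached plain text and the salt it already holds for the matching account.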

Conclusions

Firstly there are the obvious ones that have come before me: This is quite possibly the largest breach of user accounts we’ve ever seen and Adobe have done a very poor job of securing passwords. This should be one of those incidents that we now point to in the future and say “There – that’s why encryption of passwords instead of hashing is bad!”

But I’ll add to that another conclusion: The presence of a plain text password hint significantly weakened the password irrespective of the poor cryptography applied. It’s clear from the analysis above that very often the hint narrows down the possible password values to an unacceptably low number of possible results.

The other alarming conclusion came when I reconciled the data with the Sony breach: When matched to plain text passwords from other breaches, the hint often disclosed sensitive personal information which wouldn’t otherwise have been gleaned. That’s a serious issue and it’s only a problem because of the presence of the hint.

Ultimately, password hints are evil and they add nothing to an online system that can’t be achieved with a secure password reset feature.
