Tuesday, November 12, 2013

Adobe credentials and the serious insecurity of password hints

Adobe had a little issue the other day with the small matter of 150 million accounts being breached and released to the public. Whoops. So what are we talking about? A shed load of records containing an internal ID, username, email, encrypted password and a password hint. Naked Security did a very good write up on Adobe’s giant-sized cryptographic blunder in terms of what they got wrong with their password storage so I won’t try to replicate that, rather I’d like to take a look at the password hints.

This is an interesting one from an application security perspective and the rationale basically goes like this: In order to help people remember their passwords, you give them the ability to create a “hint” or in other words, record a piece of information that will later help them recall their password. Password hints are an absolutely ridiculous security measure. The whole premise that the secret that is the password can be unlocked by referring to a retrievable user-generated piece of text is just completely nonsensical.

The other thing that’s completely nonsensical is this: Whilst Adobe encrypted their passwords (even though done poorly), password hints had absolutely no security whatsoever. Right, so protect the password but don’t protect the data that helps you determine the password! Given their penchant for encryption rather than hashing (hashing being how the passwords themselves should have been stored), that same mechanism could have been used to protect the hints and still allow them to be decrypted for display when required. Except that’s not what happened.

Password hints or any other security mechanism designed around allowing users to create their own security related questions fundamentally compromise the security of the account. I touched on this in the context of secret questions in my article on Everything you ever wanted to know about building a secure password reset feature but now we’ve got some real hard data on which to draw some conclusions. Heaps of real hard data.

About the data

Yesterday I wrote about Using high-spec Azure SQL Server for short term intensive data processing where I did a bunch of analysis of the data. During that exercise I imported 152,989,508 records using the SQL Server bcp utility. That’s actually a fraction less than the just over 153 million records in the dump but there were a few where delimiters didn’t play nice – you just can’t trust hackers to always give you a clean data dump! Regardless, each record looks like this:

84557956-|--|-[redacted]@parponline.org-|-0tlHzKbr18uO6Wu5iaXtPQ==-|-mother's maiden name-Wilson|--
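
To make that layout concrete, here's a minimal sketch of how a single line could be split into its five fields. It assumes the `-|-` separator and the trailing `|--` marker seen in the sample record apply to every line; the `Record` type and `parse_record` function are names made up for illustration, not anything from the import I actually ran with bcp. Lines that don't split into exactly five fields are rejected, which is one way of handling the records where the delimiters didn't play nice.

```python
from typing import NamedTuple, Optional

FIELD_SEP = "-|-"   # separator as seen in the sample record (assumed for all lines)
RECORD_END = "|--"  # trailing marker as seen in the sample record (assumed)


class Record(NamedTuple):
    internal_id: str
    username: str
    email: str
    password_cipher: str
    hint: str


def parse_record(line: str) -> Optional[Record]:
    """Split one dump line into its five fields, or return None if malformed."""
    line = line.rstrip("\r\n")
    if line.endswith(RECORD_END):
        line = line[: -len(RECORD_END)]
    parts = line.split(FIELD_SEP)
    if len(parts) != 5:
        # The separator appeared inside a field, or fields are missing.
        return None
    return Record(*parts)


sample = (
    "84557956-|--|-[redacted]@parponline.org-|-"
    "0tlHzKbr18uO6Wu5iaXtPQ==-|-mother's maiden name-Wilson|--"
)
record = parse_record(sample)
print(record)
```

The empty second field is the username, which almost no records have.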

Let’s take a look inside the dump and see what sort of conclusions we can draw from all this.

Hint prevalence and reuse

As I mentioned earlier, each record contains an internal ID, username, email, encrypted password and a password hint. The first thing I did was to take a look at the top 100 password hints. Not all records have a hint (my own personal breached account doesn’t), in fact there are only 109,305,888 records that do. Of these, 7,365,869 used the same 100 top hints. Here they are ranked by the number of occurrences of each:

Hints Occurrences
dog 559,358
name 479,828
usual 387,222
same 242,932
Me 242,160
cat 227,979
son 183,943
daughter 181,047
nickname 165,648
pet 142,626
normal 140,075
work 125,695
car 113,530
school 112,000
birthday 109,731
my name 102,843
love 102,404
The Usual 97,711
kids 95,938
123 95,348
wife 90,406
Home 83,856
standard 83,455
numbers 80,939
NONE 80,308
password 80,269
123456 70,891
mom 67,260
adobe 66,418
number 65,640
email 61,761
middle name 59,040
bday 57,832
baby 57,341
always 54,022
1 53,485
1to6 53,465
last name 53,086
Nombre 52,637
animal 52,624
company 50,435
yo 49,513
family 47,474
dogs name 47,036
sport 46,262
msn 45,633
lol 45,202
a 45,003
Phone 44,522
yahoo 44,195
band 42,972
husband 42,858
NO 41,962
numeros 40,930
birth 40,622
regular 40,551
myspace 38,799
la de siempre 38,428
city 37,960
dad 37,637
duh 37,564
you know 37,466
football 37,361
hotmail 36,995
date 36,124
hi 36,056
color 35,691
street 35,667
mother 35,627
id 34,730
middle 33,540
Address 32,744
hund 32,463
DOB 32,446
mama 32,385
hello 31,991
1234 31,709
sister 31,550
My Dog 31,500
perro 31,359
amor 31,199
maiden name 31,149
wie immer 31,079
Job 30,891
old 30,057
Anniversary 29,535
facebook 29,506
dogs 29,473
pet name 29,241
same as always 29,233
team 28,573
Horse 28,411
Food 28,275
Bird 27,971
nome 27,910
girl 27,879
pass 27,527
moi 27,475
tel 27,444
123456789 27,326
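
A ranking like the one above boils down to a frequency count over the hint column. The sketch below shows the idea on a handful of made-up hints. Note that folding case and trimming whitespace is an assumption on my part for the example; it isn't necessarily how the counts in the table were grouped, and `top_hints` is just an illustrative name.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_hints(hints: Iterable[str], n: int = 100) -> List[Tuple[str, int]]:
    """Return the n most common non-empty hints.

    Hints are compared after trimming whitespace and folding case,
    which is an assumption made for this example.
    """
    counts: Counter = Counter()
    for hint in hints:
        key = hint.strip().casefold()
        if key:  # records without a hint are skipped
            counts[key] += 1
    return counts.most_common(n)


# Made-up sample data, purely for illustration.
sample_hints = ["dog", "Dog", "usual", "", "dog ", "name", "usual"]
print(top_hints(sample_hints, n=2))  # [('dog', 3), ('usual', 2)]
```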

Within the top 100 hints there are actually some very common patterns that break down into the following categories:

Common password hints by category

Family names are obviously the biggies and within here we’re looking at things like kids or partner names. Clearly these are very easily discoverable and obviously next to useless as a password. Of course the account owner could have done some character substitution to strengthen the password (for example, “e” becomes “3”), but that’s very easily tested for if you’re looking to breach the account.
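
To show just how cheap it is to test for character substitution, here's a rough sketch that generates every variant of a word under a small substitution table. The table itself is an assumption (a few common swaps), and `substitution_variants` is a made-up name; real cracking tools use far richer rule sets.

```python
from itertools import product
from typing import Dict, Iterator

# A small, assumed table of common swaps; real rule sets are much larger.
SUBSTITUTIONS: Dict[str, str] = {"a": "@", "e": "3", "i": "1", "o": "0", "s": "5"}


def substitution_variants(word: str) -> Iterator[str]:
    """Yield the word with every combination of the substitutions applied."""
    choices = []
    for ch in word:
        swap = SUBSTITUTIONS.get(ch.lower())
        # Each position is either the original character or its swap.
        choices.append((ch, swap) if swap else (ch,))
    for combo in product(*choices):
        yield "".join(combo)


variants = list(substitution_variants("fido"))
print(variants)  # ['fido', 'fid0', 'f1do', 'f1d0']
```

A word with k substitutable characters only yields 2^k candidates, so a name with three or four of them adds a handful of guesses, not a meaningful amount of work.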

Next was password repeats and this is instances of hints such as “the usual” or “same as always” (including variations in foreign languages). Clearly this is indicating a reused password and it shouldn’t really come as a surprise – we know password reuse is rampant.

The “Other” category is a bit of a catch-all: the hints include things like “adobe” and short names such as “id” and “yo”. It’s a bit hard to tell what to make of these – they could be junk entries but their prevalence suggests there is a common approach here. The “adobe” hint quite possibly belongs to a password of “adobe” and so on and so forth.

Animals are obviously very popular so think about dog and cat names. That also should come as no surprise as we know pets are a favourite for passwords. Remember that next time you put your dog’s name on your Facebook profile – you don’t want to have to rename Fido after a password breach!

I won’t go through each of the remaining categories as they’re largely self-explanatory. What I will say though is that the “Discoverable names” were all hints about things that could be easily found or guessed. For example, “colors” has a pretty finite set of options. Speaking of which…

Matching passwords to the hint

Due to Adobe’s choice of encryption, all instances of a password of, say “blue” will produce the same cipher. Determine what that cipher is and you’ve just cracked every “blue” password in the DB. Indeed this is the very risk we set out to avoid by salting our hashes, but of course Adobe was a long way off having this option to begin with.
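
The difference is easy to see in a few lines. This is only a sketch of the property, not of Adobe's scheme: Python's standard library has no block cipher, so a fixed-key HMAC stands in for "deterministic, same key for everyone" (unlike real encryption, it can't be reversed), and PBKDF2 with a random salt stands in for a properly salted hash. The key, iteration count and function names are all made up for the example.

```python
import hashlib
import hmac
import os

# Hypothetical key shared by every account, as with unsalted fixed-key encryption.
FIXED_KEY = b"one key shared by every account"


def deterministic_transform(password: str) -> bytes:
    """Stand-in for deterministic encryption: same input always gives same output.

    This is an HMAC, not encryption; it only illustrates the determinism.
    """
    return hmac.new(FIXED_KEY, password.encode("utf-8"), hashlib.sha256).digest()


def salted_hash(password: str, salt: bytes) -> bytes:
    """Salted, iterated hash: the per-user salt makes equal passwords differ."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


# Two users who both chose "blue".
alice = deterministic_transform("blue")
bob = deterministic_transform("blue")
print(alice == bob)  # True: crack one and you've cracked them all

alice_salted = salted_hash("blue", os.urandom(16))
bob_salted = salted_hash("blue", os.urandom(16))
print(alice_salted == bob_salted)  # False: each user has a different random salt
```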

Anyway, what this now means is that we can draw a number of conclusions about the passwords from the hints. For example, we can group all the ciphers for the hint “color” and because we can quite rightly assume that the password will be, well, a colour, there aren’t too many options of what that will be (character substitutions and mutations aside).

Let’s take an example: here are the top 20 ciphers by occurrence for the hint “color” (just the US spelling of the word):

Distribution of ciphers used for "colour" passwords

The dominance of the first three ciphers (together they add up to about the sum of the next sixty ciphers) means you can pretty reliably conclude that we’re looking at passwords of “red”, “green” and “blue”. The broad range of other possibilities can be explained by a combination of foreign language versions of colours along with possible character substitutions. Probably something to be said for minimum strength criteria there too…
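
The grouping itself is nothing fancy. Here's a minimal sketch of it, assuming the records have already been reduced to (cipher, hint) pairs; the function names and the sample ciphers are invented for illustration and the hint comparison ignores case, which is my assumption.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_ciphers_for_hint(
    records: Iterable[Tuple[str, str]], hint: str, n: int = 20
) -> List[Tuple[str, int]]:
    """Rank the password ciphers that appear alongside a given hint."""
    wanted = hint.strip().casefold()
    counts = Counter(
        cipher
        for cipher, record_hint in records
        if record_hint.strip().casefold() == wanted
    )
    return counts.most_common(n)


def share_of_top(ranked: List[Tuple[str, int]], k: int = 3) -> float:
    """Fraction of the ranked occurrences accounted for by the first k ciphers."""
    total = sum(count for _, count in ranked)
    if total == 0:
        return 0.0
    return sum(count for _, count in ranked[:k]) / total


# Invented (cipher, hint) pairs; the cipher strings are placeholders.
sample = [
    ("cipher-A", "color"), ("cipher-A", "color"), ("cipher-A", "Color"),
    ("cipher-B", "color"), ("cipher-B", "color"),
    ("cipher-C", "color"),
    ("cipher-Z", "dog"),
]
ranked = top_ciphers_for_hint(sample, "color")
print(ranked)  # [('cipher-A', 3), ('cipher-B', 2), ('cipher-C', 1)]
```

Once a handful of ciphers dominate a hint with so few plausible answers, guessing which cipher maps to which colour is not much of a stretch.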

Reconciliation with the Sony breach

Looking at breaches across systems gives you some interesting insights, in particular the prevalence of password reuse. It gets really interesting when you look at the hints from Adobe next to the plain text passwords from Sony. There are way too many to list in their entirety – there were nearly 7,000 matches between the sources based on correlating the email address.

Let’s take a look at those hints next to the passwords:

Adobe hint Sony password
grandmas maiden name jostone
name birthyear mike1953
Usual 75857585
church lovejoy
profession acting
spice tumeric
dog's name sunny
The most awesome dog ever! Schultz
Bibleology psalm54
how many cats do I have ihave3athome
My grandmother's name bosco
name of reindeer prancer
favorite animal elephant
animal that says neigh horses
black and white cat sebastian
National bird toucan
Old jeep 97laredo
Money million
fruit grapefruit

A couple there might require a bit of imagination, but there’s a link :) Then again, a bunch of them aren’t so much personal questions as they are common knowledge; an “animal that says neigh” – good one!
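
Correlating the two sources is just a join on the email address. The sketch below assumes each breach has been reduced to simple pairs (email and hint for Adobe, email and password for Sony) and that matching on a trimmed, lower-cased address is good enough. The function name and the example.com addresses are invented; the hint and password values are taken from the table above.

```python
from typing import Dict, Iterable, List, Tuple


def match_hints_to_passwords(
    adobe: Iterable[Tuple[str, str]],
    sony: Iterable[Tuple[str, str]],
) -> List[Tuple[str, str, str]]:
    """Return (email, Adobe hint, Sony password) for addresses in both sources."""
    sony_by_email: Dict[str, str] = {
        email.strip().lower(): password for email, password in sony
    }
    matches = []
    for email, hint in adobe:
        key = email.strip().lower()
        if hint and key in sony_by_email:  # skip Adobe records with no hint
            matches.append((key, hint, sony_by_email[key]))
    return matches


# Invented addresses paired with hints and passwords from the table above.
adobe_rows = [
    ("Someone@example.com", "name of reindeer"),
    ("other@example.com", "fruit"),
    ("nohint@example.com", ""),
]
sony_rows = [
    ("someone@example.com", "prancer"),
    ("nohint@example.com", "whatever"),
]
matches = match_hints_to_passwords(adobe_rows, sony_rows)
print(matches)  # [('someone@example.com', 'name of reindeer', 'prancer')]
```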

On the point of the personal nature of some of these, ponder this for a moment: Even though the password itself may be nonsensical in isolation, the hint can disclose its purpose and inadvertently leak sensitive information about the account holder. Here are some pretty serious examples:

Adobe hint Sony password
and ssnumb k156668818
bdayssn 83881090
birthday 072586
birthday 102048
maiden lubeck
moms maiden name schniggy
License Plate pjrn19
license plate ibt123
yahoo password 092187
the password you usuallly use torrie
bank pin once raw3223
my initials and code sb5501

The couple of SSN examples in particular are bad news – tie this back to an ID (and of course we now have the email addresses for these individuals) and now you have something rather personal out there. Same again with the birthdays. And passwords from other sites. And PINs for crying out loud! In these cases, the password hint has been extremely detrimental to the individual’s privacy.

You don’t often see this discussed and in the Adobe case all the focus has been on the passwords, but it’s the hints that can be seriously bad news for people and there was absolutely no cryptographic protection on these to begin with.

Other incidental statistics

One of the things that makes the Adobe breach so significant is the range of countries it covers. Very often we see breaches that don’t have great international representation but this one is a little different. Take a look at the top 100 domains of the email addresses and how many appear on each:

hotmail.com (32,571,130), gmail.com (24,035,771), yahoo.com (17,816,528), aol.com (3,478,727), hotmail.fr (1,498,847), msn.com (1,443,862), hotmail.co.uk (1,411,400), comcast.net (1,249,918), mail.ru (1,248,392), live.com (1,235,491), web.de (1,226,283), yahoo.co.jp (997,808), qq.com (967,112), gmx.de (962,019), yahoo.com.tw (664,356), naver.com (659,968), sbcglobal.net (656,133), yahoo.fr (653,995), 163.com (645,375), yahoo.co.uk (624,446), hotmail.it (596,945), ymail.com (569,064), t-online.de (517,467), yahoo.com.br (490,986), verizon.net (452,069), libero.it (446,540), yahoo.com.hk (444,380), googlemail.com (441,136), me.com (402,349), yandex.ru (386,843), yahoo.co.in (382,410), yahoo.es (380,856), hotmail.es (375,316), live.fr (369,010), yahoo.de (367,180), cox.net (362,993), hotmail.de (361,869), mac.com (358,295), aim.com (348,749), hanmail.net (339,228), wanadoo.fr (326,855), btinternet.com (320,511), adobe.com (315,401), bellsouth.net (315,189), orange.fr (313,532), att.net (296,584), gmx.net (285,376), wp.pl (282,902), 126.com (278,700), free.fr (270,278), rediffmail.com (267,777), earthlink.net (240,916), rocketmail.com (240,308), live.co.uk (239,045), yahoo.ca (227,496), yahoo.it (216,798), yahoo.com.mx (201,963), shaw.ca (197,060), bigpond.com (190,732), charter.net (187,788), freenet.de (180,192), hotmail.co.jp (175,968), nate.com (171,951), yahoo.co.id (169,780), mundopositivo.com.br (169,373), o2.pl (165,186), live.it (162,987), bluewin.ch (162,299), alice.it (160,435), rambler.ru (153,155), bol.com.br (152,311), rogers.com (151,084), live.nl (148,760), gmx.at (148,317), arcor.de (146,084), sina.com (144,483), sympatico.ca (142,470), windowslive.com (140,690), live.com.mx (140,664), seznam.cz (140,624), laposte.net (136,782), ig.com.br (135,812), ntlworld.com (132,927), optonline.net (132,861), tiscali.it (130,429), mail.com (124,425), yahoo.com.cn (120,887), yahoo.com.au (120,540), yahoo.com.ar (119,279), live.ca (107,419), live.de (106,387), 
nifty.com (106,305), abv.bg (105,694), uol.com.br (104,572), yahoo.com.vn (99,435), terra.com.br (96,851), optusnet.com.au (94,705), telus.net (94,594), bk.ru (94,043), juno.com (93,590)

This is just over 113 million addresses and 9 of the top 20 domains come from France, the UK, Russia, Germany, Japan or Taiwan. Another thing that’s interesting in this is that about 37 million of the addresses are on Hotmail domains. That may also give some insight into the age of the accounts – I know from various sources that many of these accounts are quite old.
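
For what it's worth, the domain breakdown is another simple frequency count, this time on whatever follows the last "@". A rough sketch, with invented function names and made-up addresses, including the sort of roll-up behind the Hotmail figure (every domain starting with "hotmail."):

```python
from collections import Counter
from typing import Iterable


def domain_counts(emails: Iterable[str]) -> Counter:
    """Count addresses per domain, ignoring case and anything malformed."""
    counts: Counter = Counter()
    for email in emails:
        local, sep, domain = email.strip().rpartition("@")
        if sep and local and domain:
            counts[domain.lower()] += 1
    return counts


def hotmail_total(counts: Counter) -> int:
    """Sum the counts for every domain beginning with 'hotmail.'."""
    return sum(n for domain, n in counts.items() if domain.startswith("hotmail."))


# Made-up addresses, purely for illustration.
sample_emails = [
    "a@hotmail.com", "b@Hotmail.com", "c@hotmail.fr",
    "d@gmail.com", "not-an-email",
]
counts = domain_counts(sample_emails)
print(counts.most_common(2))  # [('hotmail.com', 2), ('hotmail.fr', 1)]
print(hotmail_total(counts))  # 3
```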

Here’s another oddity for you: Of those 150 million accounts, only 8,052 have a username. Of those 8,052 usernames, 6,260 are against adobe.com email addresses. Of those 6,260 Adobe addresses, 1,070 have the same password cipher of “2GtbVrmsERzioxG6CatHBw==” and another 1,070 have no password at all. I’m not sure why this is the case – I suspect there are some internal idiosyncrasies around the fact that Adobe staff seem to have their own usernames and you can only hope that these aren’t actually their passwords! At the very least though, there might be a bit of social engineering leverage from having Adobe usernames now made public.

A little tangential but another interesting result of this breach is that Facebook have been notifying those of their own users whose accounts were in the Adobe breach and who used the same password on FB:

Facebook message about compromised Adobe account

Let that sink in for a moment: Facebook have retrieved plain text versions of the breached Adobe passwords then hashed them per-user and compared them to the passwords they have on file. Of course I’m assuming FB has unique salts and strong hashing and this is what they’ve done, but it seems to reconcile and has been confirmed by their security team. Initially this was a bit of a “what the?!” moment but on reflection, it makes a lot of sense; this is a great way of highlighting the breach to users and I can’t see a security downside to their approach, after all, the passwords are (almost) as good as public now (see the link in the opening paragraph of this post re decrypting them). At the end of the day, shame on anyone who actually sees this message for reusing passwords!
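
Here's a sketch of what that sort of check could look like. To be clear, this is my guess at the general shape and not Facebook's actual implementation: the choice of PBKDF2, the iteration count and the function names are all assumptions. The point is that the site never needs its own users' plain text passwords; it hashes the recovered Adobe password with the salt it already holds for that user and compares the result.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor, purely illustrative


def hash_password(password: str, salt: bytes) -> bytes:
    """Salted, iterated hash as a site might store it (PBKDF2 is an assumption)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def reuses_breached_password(
    breached_plaintext: str, stored_salt: bytes, stored_hash: bytes
) -> bool:
    """Hash the breached password with the user's own salt and compare."""
    candidate = hash_password(breached_plaintext, stored_salt)
    return hmac.compare_digest(candidate, stored_hash)


# A user who registered here with the password "sunny".
salt = os.urandom(16)
stored = hash_password("sunny", salt)

print(reuses_breached_password("sunny", salt, stored))    # True: same password reused
print(reuses_breached_password("prancer", salt, stored))  # False: different password
```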

Summary

Firstly there are the obvious conclusions that have come before me: This is quite possibly the largest breach of user accounts we’ve ever seen and Adobe have done a very poor job of securing passwords. This should be one of those incidents that we now point to in the future and say “There – that’s why encryption of passwords instead of hashing is bad!”

But I’ll add to that another conclusion: The presence of a plain text password hint significantly weakened the password irrespective of the poor cryptography applied. It’s clear from the analysis above that very often the hint narrows down the possible password values to an unacceptably low number of possible results.

The other alarming conclusion came when I reconciled the data with the Sony breach: When matched to plain text passwords from other breaches, the hint often disclosed sensitive personal information which wouldn’t otherwise have been gleaned. That’s a serious issue and it’s only a problem because of the presence of the hint.

Ultimately, password hints are evil and they add nothing to an online system that can’t be achieved with a secure password reset feature.

