A couple of months ago, I launched version 2 of Pwned Passwords. This is a collection of over half a billion passwords which have previously appeared in data breaches, and the intention is that they're used as a blacklist; these are the "secrets" that NIST referred to in their recent guidance:
When processing requests to establish and change memorized secrets, verifiers SHALL compare the prospective secrets against a list that contains values known to be commonly-used, expected, or compromised.
In other words, once a password has appeared in a data breach and it ends up floating around the web for all sorts of nefarious parties to use, don't let your customers use that password! Now, as I say in the aforementioned blog post (and in the post launching V1 before it), it's not always that black and white and indeed outright blocking every pwned password has all sorts of usability ramifications as well. But certainly, many organisations have taken precisely this approach and have used the service to keep known bad passwords out of their systems.
But I always wondered - what sort of percentage of passwords would this actually block? I mean if you had 1 million people in your system, would a quarter of them be using previously breached passwords? A half? More? To test this theory, I needed a data breach that contained a significant volume of plain text passwords, and it had to be one I hadn't seen before and that didn't form part of the sources I used to create the Pwned Passwords list in the first place. (Strictly speaking, I could have used a breach with hashed passwords and used the source Pwned Passwords as a dictionary in a hash cracking exercise, but plain text was always going to be much easier, much faster and would allow me to quickly see which passwords weren't already in my list.)
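To make that measurement concrete, here's a minimal sketch of the check in Python. It assumes the Pwned Passwords set is available locally as lines of uppercase SHA-1 hashes in `HASH:COUNT` form (an assumption about the download format) and that the breached plain text passwords sit in a list, one entry per account. The function names are hypothetical, not part of any published tooling.

```python
import hashlib
from typing import Iterable


def sha1_upper(password: str) -> str:
    """Uppercase hex SHA-1 of the UTF-8 encoded password."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest().upper()


def load_pwned_hashes(lines: Iterable[str]) -> set[str]:
    """Build a set of hashes from lines assumed to look like 'HASH:COUNT'."""
    hashes = set()
    for line in lines:
        line = line.strip()
        if line:
            # Keep only the hash; the prevalence count isn't needed here.
            hashes.add(line.split(":", 1)[0].upper())
    return hashes


def coverage(passwords: list[str], pwned_hashes: set[str]) -> tuple[int, int]:
    """Return (accounts whose password is already known, total accounts).

    Duplicates are counted per account, which is what a "percentage of
    subscribers" figure needs.
    """
    found = sum(1 for p in passwords if sha1_upper(p) in pwned_hashes)
    return found, len(passwords)


if __name__ == "__main__":
    # Toy data only: one known hash (SHA-1 of "123456") and three accounts.
    known = load_pwned_hashes(["7C4A8D09CA3762AF61E59520943DC26494F8941B:1000"])
    found, total = coverage(["123456", "123456", "zq8-example-only"], known)
    print(f"{found}/{total} = {found / total:.0%}")
```

Because `load_pwned_hashes` accepts any iterable of lines, an open file handle works too (for example a hypothetical `pwned-passwords.txt`). Bear in mind that holding half a billion hashes in a Python set takes a lot of memory; a sorted file with binary search or a database table would scale better.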
And then CashCrate came along:
New breach: CashCrate had 6.8M records breached in November 2016. The data included names, physical and email addresses and a combination of both plain text passwords and MD5 hashes. 71% were already in @haveibeenpwned. Read more: https://t.co/NYUgAiAcdg— Have I Been Pwned (@haveibeenpwned) April 20, 2018
Of those 6.8M records, 2,232,284 of the passwords were in plain text. The remainder were MD5 hashes, presumably because they were in the process of rolling over to this hashing algorithm when the breach occurred (although when you have all the source passwords in plain text to begin with, it's kinda weird they didn't just hash all those in one go). So to the big question raised earlier, how many of these were already in Pwned Passwords? Or in other words, how many CashCrate subscribers were using terrible passwords already known to have been breached?
In total, 1,910,144 of the 2,232,284 plain text passwords were already in the Pwned Passwords set. In other words, 86% of the subscribers whose passwords were stored in plain text were using passwords already leaked in other data breaches and available to attackers in plain text.
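Working through the figures above shows where the 86% comes from, and also where the count of passwords not yet in Pwned Passwords comes from:

```latex
\frac{1\,910\,144}{2\,232\,284} \approx 0.856 \approx 86\%,
\qquad
2\,232\,284 - 1\,910\,144 = 322\,140
```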
So, what sort of passwords are we talking about here? All the usual terrible ones you'd expect people to choose which, by order of prevalence in the Pwned Passwords data set, means passwords like these:
These are terrible and, granted, who knows how far back they date, but as of today you can still sign up with a password of "123456" if you'd like:
You can't use "12345" - that's not long enough - and its appearance in position 10 above likely indicates an even weaker password policy in the past. Obviously, the password criteria are terrible, but I appreciate some people may suggest the nature of the site predisposes people to making terrible password choices (it's a "cash-for-surveys" site).
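The gap between a length rule and a blacklist is easy to show in code. This is a minimal sketch assuming a minimum length of 6, inferred from "12345" being rejected while "123456" is accepted (not a published CashCrate rule), with a tiny hypothetical blacklist standing in for Pwned Passwords.

```python
import hashlib

# Hypothetical stand-in for the Pwned Passwords set: uppercase SHA-1 hashes.
BLACKLIST = {
    hashlib.sha1(p.encode("utf-8")).hexdigest().upper()
    for p in ("123456", "password", "12345678")
}

MIN_LENGTH = 6  # assumed from observed sign-up behaviour, not a documented policy


def passes_length_rule(password: str, min_length: int = MIN_LENGTH) -> bool:
    """The only check the observed policy appears to apply."""
    return len(password) >= min_length


def is_blacklisted(password: str) -> bool:
    """NIST-style check: is this password already known to be compromised?"""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest in BLACKLIST


def accept(password: str) -> bool:
    """Accept only passwords that are long enough AND not previously seen."""
    return passes_length_rule(password) and not is_blacklisted(password)
```

With these rules, "123456" passes the length check but `accept("123456")` is `False` because it's on the blacklist, while "12345" fails on length alone.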
But I was also interested in some of the more obscure CashCrate passwords that were already in my data set and found ones like these that I've only ever seen once before (I'll substitute several characters in each to protect the source password but still illustrate the point):
- anchorage alaska
- nikki i love u
- i like to have sex
I didn't substitute any characters in the last 3 because I wanted to illustrate that even passphrases can be useless once exposed. Having a good password isn't enough; uniqueness still matters enormously.
So which passwords weren't in Pwned Passwords already? Predictably, some of the most popular ones were named after the site itself:
And so on and so forth (the last one makes sense once you think about it). Many of the other most common ones were just outright terrible in other ways, for example number combinations or a person's name followed by a number (some quite unique variants appeared many times over, suggesting possible bulk account creation). All of those will go into the next release of Pwned Passwords, which will go out once there's a sufficiently large volume of new passwords.
Getting back to the whole point of the service for a moment, traditional password complexity rules are awful and they must die a fiery death:
Identify what is wrong with these pictures. pic.twitter.com/KJ1XgrZI1g— Aaron Toponce ☕ (@AaronToponce) April 24, 2018
I wrote last year about how password strength indicators help people make ill-informed choices and, based on the tweet above, that's clearly still absolutely true today.
Getting back to the issue of how terrible passwords are and the impact this then has on individuals and organisations alike, one of the big problems I've seen really accelerate over the last year is credential stuffing. In other words, bad guys grabbing huge stashes of username and password pairs from other data breaches and seeing which ones work on totally unrelated sites. I have a much more comprehensive blog post on this in the works and it's a non-trivial challenge I want to devote more time to, but imagine this:
If you're responsible for running a website, how are you going to be resilient against attackers who come to your site with legitimate usernames and passwords of your members?
And just to make things even harder, the site being attacked isn't necessarily viewed as the victim either. Earlier this year, the FTC had this to say:
The FTC's message is loud and clear: If customer data was put at risk by credential stuffing, then being the innocent corporate victim is no defence to an enforcement case. Rather, in the FTC's view companies holding sensitive customer information should be taking affirmative action to reduce the risk of credential stuffing.
That's a hard challenge and the solution is non-trivial too. Again, I've got something more comprehensive in draft and I'll definitely come back to that but for now, this is a great start:
WIP: Helping our @EveOnline players to be aware if their passwords are on a list of known compromised passwords. Thanks @haveibeenpwned ! CC: @troyhunt #tweetfleet #security #workinprogress pic.twitter.com/miovu6g25q— Stefán Jökull Sigurðarson (@stebets) April 27, 2018
I like this because it is trivial! It's not the whole picture in terms of defences, but it's a great start. I don't know if EVE Online would have 86% of members using known breached passwords (it's not exactly "cash-for-surveys", but then again, it's also used by a lot of kids), but I do know that it would still be a significant number. (Incidentally, this should go live on EVE Online about the same time I plan to publish this blog post.)
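For anyone wanting to build a similar check, one approach is the Pwned Passwords range API: you send only the first five characters of the password's SHA-1 hash and get back every hash suffix sharing that prefix, each with a count, then look for your suffix locally. The sketch below assumes the `https://api.pwnedpasswords.com/range/{prefix}` endpoint and its `SUFFIX:COUNT` response lines. It's not necessarily how EVE Online implemented their feature, and the helper names and User-Agent string are hypothetical.

```python
import hashlib
import urllib.request

RANGE_URL = "https://api.pwnedpasswords.com/range/{}"


def split_hash(password: str) -> tuple[str, str]:
    """Return the 5-char SHA-1 prefix (sent) and 35-char suffix (kept locally)."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def count_in_response(body: str, suffix: str) -> int:
    """Find our suffix among the 'SUFFIX:COUNT' lines; 0 means not found."""
    for line in body.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix:
            return int(count)
    return 0


def pwned_count(password: str, timeout: float = 10.0) -> int:
    """Query the range API; only the hash prefix ever leaves this machine."""
    prefix, suffix = split_hash(password)
    request = urllib.request.Request(
        RANGE_URL.format(prefix),
        # A descriptive User-Agent is good practice; this name is hypothetical.
        headers={"User-Agent": "pwned-password-check-sketch"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = response.read().decode("utf-8")
    return count_in_response(body, suffix)
```

For example, "password" hashes to 5BAA61E4C9B93F3F0682250B6CF8331B7EE68FD8, so only "5BAA6" is sent over the wire. A sign-up or password-change form could call `pwned_count` and warn (or block) whenever it returns a non-zero value.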
As I come across more plain text data breaches (which is inevitable), I'll do the same sanity check again. For now, I've taken the 322,140 passwords not already in Pwned Passwords, distilled them down to 307,016 unique ones and queued those up for version 3 of the password list. While you're waiting for that one, it might be worth thinking about how many subscribers of your own service are using a previously seen password, because if it's even a fraction of the CashCrate number, that's rather worrying.
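Distilling the misses down to unique candidates for the next release can be sketched in a few lines. This assumes the breach's plain text passwords are in a list (one per account) and the existing set is held as uppercase SHA-1 hashes; `new_candidates` is a hypothetical name. Matching is exact and case-sensitive, since "Password1" and "password1" hash differently.

```python
import hashlib


def sha1_upper(password: str) -> str:
    """Uppercase hex SHA-1 of the UTF-8 encoded password."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest().upper()


def new_candidates(passwords: list[str], known_hashes: set[str]) -> tuple[int, list[str]]:
    """Return (accounts with an unseen password, unique unseen passwords).

    Unique passwords keep first-seen order because dict preserves insertion order.
    """
    missing = [p for p in passwords if sha1_upper(p) not in known_hashes]
    unique = list(dict.fromkeys(missing))
    return len(missing), unique


if __name__ == "__main__":
    # Toy data only: "123456" is known, the rest are not.
    known = {sha1_upper("123456")}
    accounts, unique = new_candidates(["123456", "abc9876", "abc9876", "Abc9876"], known)
    print(accounts, unique)
```

The first number corresponds to the per-account figure (322,140 in the CashCrate data) and the length of the second to the unique figure (307,016); hashing `unique` with `sha1_upper` would give entries in the same form as the existing list.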