Troy Hunt: We Didn't Encrypt Your Password, We Hashed It. Here's What That Means:

You've possibly just found out you're in a data breach. The organisation involved may have contacted you and advised your password was exposed but fortunately, they encrypted it. But you should change it anyway. Huh? Isn't the whole point of encryption that it protects data when exposed to unintended parties? Ah, yes, but it wasn't encrypted it was hashed and therein lies a key difference:

Saying that passwords are “encrypted” over and over again doesn’t make it so. They’re bcrypt hashes so good job there, but the fact they’re suggesting everyone changes their password illustrates that even good hashing has its risks. https://t.co/21V6Vte6Wa
— Troy Hunt (@troyhunt) September 2, 2020

I see this over and over again and I'm not just on some nerdy pedantic rant, the difference between encryption and hashing is fundamental to how at-risk your password is from being recovered and abused after a data breach. I often hear people excusing the mischaracterisation of password storage on the basis of users not understanding what hashing means, but what I'm actually hearing is that breached organisations just aren't able to explain it in a way people understand. So here it is in a single sentence:

A password hash is a representation of your password that can't be reversed, but the original password may still be determined if someone hashes it again and gets the same result.

Let's start to drill deeper in a way that can be understood by everyday normal people, beginning with what a password hash actually is: there are two defining attributes that are relevant to this discussion:

A password hash is one-way: you can hash but you can never un-hash
The hashing procedure is deterministic: you will always get the same output with the same input

This is important for password storage as it means the following as they relate to the previous points:

The original password is never stored thus keeping it a secret even from the website you provided it to
By being deterministic, when the password is hashed at registration it will match the same password provided and hashed at login

Take, for example, the following password:

P@ssw0rd

This is a good password because it has lowercase, uppercase, numeric and non-alphanumeric values plus is 8 characters long. Yet somehow, your human brain looked at it and decided "no, it's not a good password" because what you're seeing is merely character substitution. The hackers have worked this out too which is why arbitrary composition rules on websites are useless. Regardless, here's what the hash of that password looks like:

161ebd7d45089b3446ee4e0d86dbcf92

This hash was created with the MD5 hashing algorithm and is 32 characters long. A shorter password hashed with MD5 is still 32 characters long. This entire blog post hashed with Md5 is still 32 characters long. This helps demonstrate the fundamental difference between hashing and encryption: a hash is a representation of data whilst encryption is protected data. Encryption can be reversed if you have the key which is why it's used for everything from protecting the files on your device to your credit card number if you save it on a website you use to the contents of this page as it's sent over the internet. In each one of these cases, the data being protected needs to be retrieved in its original format at some point in the future hence the need for encryption. That's the fundamental difference with passwords: you never need to retrieve the password you provided to a website at registration, you only need to ensure it matches the one you provide at login hence the use of hashing.

So, where does hashing go wrong and why do websites still ask you to change your password when hashes are exposed? Here's an easy demo - let's just Google the hash from above:

And here we have a whole bunch of websites that match the original password with the hashed version. This is where the deterministic nature of hashes becomes a weakness rather than a strength because once the hash and the plain text version are matched to each other, you've now got a handy little searchable index. Another way of thinking about this is that password hashes are too predictable, so what do we do? Add randomness, which brings us to salt.

Imagine that if instead of just hashing the word "P@ssw0rd", we added another dozen characters to it first - totally random characters - and then we hashed it. Someone else comes along and uses the same password and they get their own salt (which means their own collection of totally random characters) which gets added to the password then hashed. Even with the same password, when combined with a unique salt the resultant hash will, itself, also be unique. So long as the same salt used at registration is added to your password at login (and yes, this means storing the salt alongside the hash in a database somewhere), the process can be repeated and the website can confirm if the password is correct.

Problem is, if someone has all the data out of a database Wattpad style, can't they just reproduce the salting and hashing process? I mean you've got the salt and the hash sitting right there, what's to stop someone from having a great big list of passwords, picking a salt from the database then adding it to each password, hashing it and seeing if it matches the one from the breach? The only thing hampering this effort is time; how long would it take to hash that big list of passwords for one user's record from the database? How long for, in Wattpad's case, more than a quarter of a billion users? That all depends on the hashing algorithm that's been chosen. Old, antiquated hashing algorithms that were never really designed for password storage in the first place can be calculated at a rate of tens of billions per second on consumer-grade hardware. Yes, that's "billion" with a "b" for bravo and for the more technical folks, that's where you're at with MD5 or SHA-1. How long is a hashed password going to remain uncracked at that rate of guesses? Usually, not very long.

Going back to the example in the tweet at the start of this blog post, Wattpad didn't encrypt their customers' passwords, they hashed them. With bcrypt. This is a hashing algorithm designed for storing passwords and what really sets it apart from the aforementioned ones is that it's slow. I mean really slow, like it takes tens of millions of times longer to create the hash. You don't notice this as a customer when you're registering on the site or logging on because it's still only a fraction of a second to calculate a hash of your password, but for someone attempting to crack your password by hashing different possible examples and comparing them to the one in Wattpad, it makes life way harder. But not impossible...

Let me demonstrate: here's the Wattpad registration page:

I was interested in what the password criteria was so I entered a single character and was told that it must be at least 6. Righto, let's now check complexity requirements:

Will 6 all lowercase characters be allowed? Let's submit the registration form and find out:

Yep 😎

Here's the problem with this and it's all going to bring us back to Wattpad's earlier statement about changing your password: because Wattpad's entire password criteria appears to boil down to "just make sure you have 6 or more characters", people are able to register using passwords like the one above. That particular password - "passwo" - appears in Have I Been Pwned's Pwned Password service 3,649 times:

It's a very poor password not because of a lack of numbers, uppercase or non-alphanumeric characters (I could easily make a very strong password that's all lowercase), but because of its predictability and prevalence.

Armed with the knowledge that Wattpad allows very simple passwords, I took a small list of the most common ones that were 6 characters or longer and checked them against a sample of their bcrypt hashes. Let's consider a bcrypt hash like this, for example:

$2y$10$5sqOeY.NDcW8vkr47BIG..PeSddwTT/Z8z0MwvF/92NSSh3UsxA.u

The plain text password that generated that hash is "iloveyou". That's in Pwned Passwords 1.6M times and I would argue it's a rather risky one to allow. But because Wattpad's password criteria is so weak, someone (probably many people) used that password and it was easily cracked.

How about this one:

$2y$10$1Gs7jtaGKJjX/7A1GqE2E.0r94/FnKphjp8dyhOVB0jZXirrkfNZW

The plain text version of that one is... wait for it... "wattpad"! These are easy to verify yourself by using an online tool like bcrypt-generator.com that checks a given password against a given hash.

This is why Wattpad recommends changing passwords - because they can be cracked even when using a good password hashing algorithm. They can't be unencrypted because they weren't encrypted in the first place. If they were encrypted and there was genuine concern they may be unencrypted then that would imply a key compromise in which case all passwords would be immediately decrypted.

So there's your human-readable version of what password hashing is. I'll leave you again with the quote from above I'd far prefer to see in disclosure notices and ideally, a link through to this blog post too so people have accurate information they can make informed decisions on:

A password hash is a representation of your password that can't be reversed, but the original password may still be determined if someone hashes it again and gets the same result.

Security Passwords

We Didn't Encrypt Your Password, We Hashed It. Here's What That Means:

Troy Hunt

Upcoming Events

Must Read