Today, almost one year after the release of version 5, I'm happy to release the 6th version of Pwned Passwords. The data set has increased from 555,278,657 known compromised passwords to a grand total of 572,611,621, up 17,332,964 (just over 3%). As with previous releases, I've made the call to push the data now simply because there were enough new records to justify the overhead in doing so.
Also as with previous releases, version 6 not only introduces a heap of new records but also updates the prevalence count on the existing ones. For example, the old favourite "P@55w0rd" has gone from 2,929 occurrences to 3,069, so it's still a terrible password, just a little more terrible than it was before.
As the size of the corpus increases, new passwords tend to be less common than those that were already in there. For example, the password "Your password" now makes an appearance as does "bullet_hole" and "Pssw0r". Further, a whole bunch of passwords that, um, well, I can't really print here also make an appearance, but use your imagination and you'll probably be able to work out a few of those.
In terms of the Pwned Passwords service itself, it continues to see steady growth:
Pwned Passwords in @haveibeenpwned is going from strength to strength - 25M requests in the last 24 hours with a cache hit ratio of 99% 😎 /cc @IcyApril pic.twitter.com/heWBVEVxDD
— Troy Hunt (@troyhunt) June 18, 2020
I decided to frame that tweet in precisely the same fashion as I did the one in last year's blog post, but that was "only" 16M requests in the previous 24 hours back then. I can't attribute the growth to any one single source, rather a heap of individual cases just like this one:
Building in some pwned passwords goodness. No more making every password 'avid', post production industry, muwahahaha. @troyhunt pic.twitter.com/jHSMFqFbUw
— Matt Dwen (@mattdwen) June 9, 2020
I have every intention of keeping Pwned Passwords freely available and not requiring any sort of auth at all because I genuinely want to see growth like this continue. Mind you, I also encourage anyone not keen on using the k-anonymity model to just download the whole set. As with previous years, it's all available as either SHA-1 or NTLM hashes (read the rationale behind the choice of these algorithms if you think they look a little dated). If you want to use the API, an improvement to the implementation since the last release is the padding feature I wrote about a few months ago. Today's release changes nothing on that front; the same amount of padding is still used as there's only a 3% increase in response sizes (we catered for way more than that when choosing the amount of padding to use).
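For anyone who hasn't used the k-anonymity search before, here's a minimal sketch of what a padded range lookup might look like in Python. The endpoint URL, the `Add-Padding` request header and the "padding entries have a count of 0" behaviour are assumptions written from memory of the API documentation rather than anything stated in this post, and the function names are purely illustrative.

```python
import hashlib
import urllib.request

# Assumed range endpoint; confirm against the API documentation.
RANGE_URL = "https://api.pwnedpasswords.com/range/{prefix}"


def split_hash(password: str) -> tuple[str, str]:
    """Return the 5-char SHA-1 prefix sent to the API and the 35-char suffix kept locally."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def count_in_range(body: str, suffix: str) -> int:
    """Look for the suffix in a 'SUFFIX:COUNT' response body; 0 means not found."""
    for line in body.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix and count.strip().isdigit():
            # Padding entries are assumed to carry a count of 0, so they
            # naturally read as "not pwned" without special handling.
            return int(count)
    return 0


def pwned_count(password: str) -> int:
    """Query the range API with padding requested (makes a network request)."""
    prefix, suffix = split_hash(password)
    request = urllib.request.Request(
        RANGE_URL.format(prefix=prefix),
        # "Add-Padding" is the assumed opt-in header for padded responses.
        headers={"Add-Padding": "true", "User-Agent": "pwned-passwords-sketch"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return count_in_range(response.read().decode("utf-8"), suffix)
```

Only the first 5 characters of the hash ever leave the machine; the matching against the remaining 35 happens locally in `count_in_range`, which is also why the padding needs no special handling on the client side.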
If you're not sure whether or not you're searching against the latest data set, check the "last-modified" response header and make sure it's the 19th of June this year (the day I uploaded the data to Azure):
last-modified: Fri, 19 Jun 2020 00:47:46 GMT
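If you'd rather check that programmatically, the following is a rough sketch in Python. The endpoint URL, the default prefix and the function names are assumptions for illustration only; the one thing taken from this post is the 19 June 2020 upload date.

```python
from datetime import date
from email.utils import parsedate_to_datetime
import urllib.request

V6_UPLOAD_DATE = date(2020, 6, 19)  # upload date quoted in this post


def is_v6_or_later(last_modified: str) -> bool:
    """True if an HTTP date header falls on or after the version 6 upload date."""
    return parsedate_to_datetime(last_modified).date() >= V6_UPLOAD_DATE


def fetch_last_modified(prefix: str = "00000") -> str:
    """Fetch the last-modified header for one hash range (makes a network request)."""
    request = urllib.request.Request(
        # Assumed range endpoint; any 5-character hex prefix should do.
        f"https://api.pwnedpasswords.com/range/{prefix}",
        headers={"User-Agent": "pwned-passwords-sketch"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        # Header lookup is case-insensitive; empty string if it is missing.
        return response.headers.get("Last-Modified", "")
```

Passing the header value shown above to `is_v6_or_later` returns `True`, whereas a date of 18 June 2020 or earlier would suggest a stale cached response.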
All cache at Cloudflare should have been flushed and any searches from here on in should show a date and time around the one above. Because of that Cloudflare cache, anyone measuring response times of the service might also see a small increase whilst those 16^5 different possible hash ranges populate back out into Cloudflare's edge nodes. And while I'm talking about Cloudflare, I want to recognise their support again in providing the services that make Pwned Passwords not just super fast for everyone hitting the API, but also super cheap for me to run:
With @Cloudflare's support, I've saved almost 38TB of bandwidth from @haveibeenpwned Pwned Passwords in the last month. If I'd served all that directly from @Azure, the bill would have been... unpleasant 😲 pic.twitter.com/kMG10SOqLG
— Troy Hunt (@troyhunt) June 19, 2020
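As a quick sanity check on the 16^5 figure mentioned earlier, here's the arithmetic as a small Python sketch. It assumes the hashes are spread evenly across the ranges, which is only an approximation, and the variable names are illustrative.

```python
PREFIX_LENGTH = 5           # hex characters sent to the range search
TOTAL_HASHES = 572_611_621  # version 6 total from this post

# Each hex character has 16 possible values, so 16^5 distinct prefixes.
range_count = 16 ** PREFIX_LENGTH

# Rough average, assuming an even spread of hashes across ranges.
average_per_range = TOTAL_HASHES / range_count

print(f"{range_count:,} ranges, about {average_per_range:.0f} hashes per range")
# prints "1,048,576 ranges, about 546 hashes per range"
```

That's a little over a million ranges that each need to be requested once before they're back in the edge cache.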
So that's version 6. No promises on when there'll be a version 7 because as with all the previous ones, it's entirely predicated on having enough new passwords to justify a new release. I'm still dependent on having them in plain text either due to the way they were stored or by virtue of someone going and cracking a bunch of them. With more and more websites actually doing their password storage well, that's all becoming a rarer circumstance. Which is good 😊