The 42M Record Credential Stuffing Data

This is going to be a brief blog post but it's a necessary one because I can't load the data I'm about to publish into Have I Been Pwned (HIBP) without providing more context than I can in a single short breach description. Here's the story: the service in question is a free, public, anonymous hosting service. The operator of the service (Kayo) reached out to me earlier this week and advised they'd noticed a collection of files uploaded to the site which appeared to contain personal data from a breach. Let me be crystal clear about one thing early on:

This is not about a data breach of the hosting service itself; there's absolutely no indication of any sort of security incident involving a vulnerability of that service.

Concerned that the data may indicate a previously unknown breach, Kayo then sent me over a total of 755 files totaling 1.8GB. The vast majority of the files were in a format similar to this: [image: data]

This is a very typical username:password pair format used in credential stuffing attacks. These attacks typically take data from multiple breaches then combine them into a single unified list so that they can be used in account takeover attempts on other services. In May last year, I loaded more than 1 billion records from other incidents very similar to this. The real risk it poses to people is that if they've reused their password in multiple places, each of those accounts is now in jeopardy if the username and password appear in one of these lists.

The data also contained a variety of other files; some with logs, some with partial credit card data and some with Spotify details. This doesn't indicate a Spotify breach, however, as I consistently see pastes implying a breach yet every time I've delved into it, it's always come back to account takeover via password reuse. In short, this data is a combination of sources intended to be used for malicious purposes.

When I pulled the email addresses out of the files, I found almost 42M unique values. I took a sample set and found about 89% of them were already in HIBP which meant there was a significant amount of data I'd never seen before. (Later, after loading the entire data set, that figure went up to 93%.) There was no single pattern for the breaches they appeared in and the only noteworthy thing that stood out was a high hit rate against numeric email address aliases from Facebook also seen in the (most likely fabricated) Badoo incident. Inverting the 89% figure and pro-rata'ing it to the entire data set, more than 4M of the addresses were ones I'd never seen before. So I loaded the data.
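To make the arithmetic above concrete, here's a minimal sketch of how email addresses might be pulled out of username:password lines, de-duplicated, and how a sample hit rate can be pro-rata'd to the full set. This isn't the tooling actually used for HIBP; the function names, the simple email pattern and the assumption that the separator is always the first colon are all illustrative.

```python
import re
from typing import Iterable, Set

# Deliberately simple email pattern (an assumption for this sketch,
# not a full RFC 5322 validator).
EMAIL_RE = re.compile(r"^[^@\s:;]+@[^@\s:;]+\.[^@\s:;]+$")


def extract_emails(lines: Iterable[str]) -> Set[str]:
    """Collect unique, lower-cased email addresses from username:password lines."""
    emails = set()
    for line in lines:
        # Split on the FIRST colon only, since passwords may contain colons.
        username, sep, _password = line.strip().partition(":")
        if not sep:
            continue  # not a username:password pair (e.g. a log line)
        candidate = username.lower()
        if EMAIL_RE.match(candidate):
            emails.add(candidate)
    return emails


def estimate_unseen(total_unique: int, sample_seen_rate: float) -> int:
    """Pro-rata the share of a sample NOT already seen to the full data set."""
    return round(total_unique * (1.0 - sample_seen_rate))
```

With roughly 42M unique addresses and an 89% sample hit rate, `estimate_unseen(42_000_000, 0.89)` gives about 4.6M, which lines up with the "more than 4M" figure. Plugging in the later 93% figure would bring that down to roughly 2.9M.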

There are always questions after data like this is loaded so let me do a very brief Q&A:

Do the filenames indicate the source? No, each file name is obfuscated, I believe as part of the upload process to the service.

Can I provide the password used? No, I've written about why not and it still poses an unacceptable risk to both individuals in the breach and myself.

Had these passwords been seen before? I found a sample set of the data showed that more than 91% of the passwords were already in Pwned Passwords so if you're worried about yours, check there.

Will you load these into Pwned Passwords? Possibly. My hesitation is that there's a large number of files that aren't all in a consistent format so it's a non-trivial exercise. I'm committing to looking at it, but I can't put a timeframe on it.

Doesn't this make the data useless in HIBP? Time and time again, I've asked if I should load incidents like this under the constraints mentioned above and I always get a resounding "yes". If it's not of use to you, ignore it.

What can I actually do about this? These lists take advantage of password reuse so if you're not reusing passwords, you're all good. If you are, get a password manager (I use 1Password).
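For anyone who'd rather check a password against Pwned Passwords programmatically than via the website, here's a rough sketch using the public range (k-anonymity) endpoint at `api.pwnedpasswords.com`. Only the first 5 characters of the SHA-1 hash leave your machine and the matching is done locally. The helper names and the User-Agent string are my own placeholders, and error handling is left out for brevity.

```python
import hashlib
import urllib.request

RANGE_URL = "https://api.pwnedpasswords.com/range/"


def split_hash(password: str):
    """Return (first 5 chars, remaining 35 chars) of the upper-case SHA-1 hex digest."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def count_in_range(suffix: str, body: str) -> int:
    """Find our hash suffix in a range response made of 'SUFFIX:COUNT' lines."""
    for line in body.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix:
            return int(count)
    return 0  # suffix not present: password not found in the corpus


def pwned_count(password: str) -> int:
    """Query the range endpoint with only the 5-character hash prefix."""
    prefix, suffix = split_hash(password)
    request = urllib.request.Request(
        RANGE_URL + prefix,
        headers={"User-Agent": "pwned-check-sketch"},  # placeholder value
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        body = response.read().decode("utf-8")
    return count_in_range(suffix, body)
```

Calling `pwned_count("some password")` returns how many times that password has been seen, with 0 meaning it wasn't found. A non-zero result for a password you actually use is a good prompt to change it everywhere it's been reused.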

In short, this is another one of those awareness incidents. I made a commitment to HIBP subscribers to let them know when I see their data so here we are, even if it's not as immediately actionable as a data breach with a clearly identifiable source is. To be honest, if your personal security practices are up to scratch (password manager plus 2FA), this is a bit of a non-event.

Finally, I want to thank Kayo for the support and I'll ask for their input in the comments below if there are any questions related specifically to the service.

The credential stuffing data is now in HIBP and as with previous similar data sets, it's flagged as unverified.


Hi, I'm Troy Hunt, I write this blog, create courses for Pluralsight and am a Microsoft Regional Director and MVP who travels the world speaking at events and training technology professionals