Telegram Combolists and 361M Email Addresses

Last week, a security researcher sent me 122GB of data scraped out of thousands of Telegram channels. It contained 1.7k files with 2B lines and 361M unique email addresses of which 151M had never been seen in HIBP before. Alongside those addresses were passwords and, in many cases, the website the data pertains to. I've loaded it into Have I Been Pwned (HIBP) today because there's a huge amount of previously unseen email addresses and based on all the checks I've done, it's legitimate data. That's the high-level overview, now here are the details:

Telegram is a popular messaging platform that makes it easy to stand up a "channel" and share information to those who wish to visit it. As Telegram describes the service, it's simple, private and secure and as such, has become very popular with those wishing to share content anonymously, including content related to data breaches. Many of the breaches I've previously loaded into HIBP have been distributed via Telegram as it's simple to publish this class of data to the platform. Here's what data posted to Telegram often looks like:

These are referred to as "combolists", that is they're combinations of email addresses or usernames and passwords. The combination of these is obviously what's used to authenticate to various services, and we often see attackers using these to mount "credential stuffing" attacks where they use the lists to attempt to access accounts en mass. The list above is simply breaking the combos into their respective email service providers. For example, that last Gmail example contains over a quarter of a million rows like this:

That's only one of many files across many different Telegram channels. The data that was sent to me last week was sourced from 518 different channels and amounted to 1,748 separate files similar to the one above. Some of the files have literally no data (0kb), others are many gigabytes with many tens of millions of rows. For example, the largest file starts like this:

That looks very much like the result of info stealer malware that has obtained credentials as they were entered into websites on compromised machines. For example, the first record appears to have been snared when someone attempted to login to Nike. There's an easy way to get a sense of the accuracy of this data, just head over to the Nike homepage and click the login link which presents the following screen:

They serve the same page to both existing subscribers and new ones but then serve different pages depending on whether the email address already has an account (a classic enumeration vector). Mash the keyboard to create a fake email address and you'll be shown a registration form, but enter the address in the stealer log and, well, you get something different:

The email address has an account, hence the prompt for a password. I'm not going to test the password because that would constitute unauthorised access, but I also don't need to as the goal has already been achieved: I've demonstrated that the address has an account on Nike. (Also note that if the password didn't work it wouldn't necessarily mean it wasn't valid at some point in time at the past, it would simply mean it isn't valid now.)

Footlocker tries to be a bit more clever in avoiding enumeration on password reset, but they'll happily tell you via the registration page if the email address you've entered already exists:

Even the Italian tyre retailer happily confirmed the existence of the tested account:

Time and time again, each service I tested confirmed the presence of the email address in the stealer log. But are (or were) the passwords correct? Again, I'm not going to test those myself, but I have nearly 5M subscribers in HIBP and there's always a handful of them in any new breach that are happy to help out. So, I emailed some of the most recent ones, asked if they could help with verification and upon confirmation, sent them their data.

In reaching out to existing subscribers, I expected some repetition in terms of them already appearing in existing data breaches. For one person already in 13 different breaches in HIBP, this was their response:

Thanks Troy. These details were leaked in previous data breaches.

So accurate, but not new, and several of the breaches for this one were of a similar structure to the one we're talking about today in terms of them being combolists used for credential stuffing attacks. Same with another subscriber who was in 7 prior breaches:

Yes that’s familiar. Most likely would have used those credentials on the previous data breaches. 

That one was more interesting as of the 7 prior breaches, only 6 had passwords exposed and none of them were combolists. Instead, it was incidents including MyFitnessPal, 8fit, FlexBooker, Jefit, MyHeritage and ShopBack; have passwords been cracked out of those (most were hashed) and used to create new lists? Very possibly. (Sidenote: this unfortunate person is obviously a bit of a fitness buff and has managed to end up in 3 different "fit" breaches.)

Another subscriber had an entry in the following format, similar to what we saw earlier on in the stealer log:[email]:[password]

They responded to my queries with the following:

I think that epic games account was for my daughter a couple of years ago but I cancelled it last year from memory. That sds like a password she may have chosen so I'll check with her in an hour or two when I see her again. 

And then, a little bit later

My daughter doesn't remember if that was her password as it was 4-5 years ago when she was only 8-9 years old. However it does sound like something she would have chosen so in all probability, I would say that is a legitimate link. We believe it was used when she played a game called Fortnite which she did infrequently at that time hence her memory is sketchy. 

I realised that whilst each of these responses confirmed the legitimacy of the data, they really weren't giving me much insight into the factor that made it worth loading into HIBP: the unseen addresses. So, I went through the same process of contacting HIBP subscribers again but this time, only the ones that I'd never seen in a breach before. This would then rule out all the repurposed prior incidents and give me a much better idea of how impactful this data really was. And that's when things got really interesting.

Let's start with the most interesting one and what you're about to see is two hundred rows of stealer logs:[email]:[password][email]:[password][email]:[password][email]:[password][email]:[password][email]:[password][email]:[password][email]:[password][email]:[password][email]:[password][email]:[password][email]:[password][email]:[password][email]:[password]
android://pfDvxsQIIXYFer6DxBcqXjgyr9X3z0_f4GlJfpZMErP2oGHX74fUnXpWA29CNgnCyZ_phC8IyV0exIV6hg3iyQ==@com.sixt.reservation/:[email]:[password][email]:[password][email]:[password][email]:[password][email]:[password][email]:[password][email]:[password][email]:[password][email]:[password][email]:[password][email]:[password][email]:[password][email]:[password][email]:[password][email]:[password][email]:[password][email]:[password][email]:[password][email]:[password][email]:[password][email]:[password][email]:[password][email]:[password][email]:[password][email]:[password][email]:[password].[email]:[password][email]:[password][email]:[password][email]:[password][email]:[password][email]:[password][email]:[password][email]:[password][email]:[password][email]:[password][email]:[password] [email]:[password] [email]:[password] [email]:[password] [email]:[password] [email]:[password] [email]:[password] [email]:[password] [email]:[password] [email]:[password] [email]:[password] [email]:[password] [email]:[password] [email]:[password] [email]:[password]
android://pfDvxsQIIXYFer6DxBcqXjgyr9X3z0_f4GlJfpZMErP2oGHX74fUnXpWA29CNgnCyZ_phC8IyV0exIV6hg3iyQ==@com.sixt.reservation/[email]:[password] [email]:[password] [email]:[password] [email]:[password] [email]:[password] [email]:[password] [email]:[password] [email]:[password] [email]:[password] [email]:[password] [email]:[password] [email]:[password] [email]:[password] [email]:[password] [email]:[password] [email]:[password] [email]:[password] [email]:[password] [email]:[password] [email]:[password] [email]:[password] [email]:[password] [email]:[password] [email]:[password] [email]:[password] [email]:[password]. [email]:[password] [email]:[password] [email]:[password] [email]:[password] [email]:[password] [email]:[password] [email]:[password] [email]:[password] [email]:[password] [email]:[password] [email]:[password][email]:[password][email]:[password][email]:[password][email]:[password][email]:[password][email]:[password][email]:[password][email]:[password][email]:[password][email]:[password][email]:[password][email]:[password][email]:[password][email]:[password]

Even without seeing the email address and password, the commonality is clear: German websites. Whilst the email address is common, the passwords are not... at least not always. In 168 instances they were near identical with only a handful of them deviating by a character or two. There's some duplication across the lines (9 different rows of Netflix, 4 of Disney Plus, etc), but clearly this remains a significant volume of data. But is it real? Let's find out:

The data seems accurate so far. I have already changed some of the passwords as I was notified by the provider that my account was hacked. It is strange that the Telekom password was already generated and should not be guessable. I store my passwords in Firefox, so is it possible that they were stolen from there?

It's legit. Stealer malware explains both the Telekom password and why passwords in Firefox were obtained; there's not necessarily anything wrong with either service, but if a machine is infected with software that can grab passwords straight out of the fields they've been entered into in the browser, it's game over.

We started having some to-and-fro as I gathered more info, especially as it related to the timeframe:

It started about a month ago, maximum 6 weeks. I use a Macbook and an iPhone, only a Windows PC at work, maybe it happened there? About a week ago there was an extreme spam attack on my Gmail account, and several expensive items were ordered with my accounts in the same period, which fortunately could be canceled.

We had the usual discussion about password managers and of course before that, tracking down which device is infected and siphoning off secrets. This was obviously distressing for her to see all her accounts laid out like this, not to mention learning that they were being exchanged in channels frequented by criminals. But from the perspective of verifying both the legitimacy and uniqueness of the data (not to mention the freshness), this was an enormously valuable exchange.

Next up was another subscriber who'd previously dodged all the data breaches in HIBP yet somehow managed to end up with 53 rows of data in the corpus:

[email]:Gru[redacted password]
[email]:fux[redacted password]
[email]:zWi[redacted password]
[email]:6ii[redacted password]
[email]:qTM[redacted password]
[email]:Pre[redacted password]
[email]:i8$[redacted password]
[email]:9cr[redacted password]
[email]:fuc[redacted password]
[email]:kuM[redacted password]
[email]:Fuc[redacted password]
[email]:Pre[redacted password]
[email]:Vxt[redacted password]
[email]:%3r[redacted password]
[email]:But[redacted password]
[email]:1qH[redacted password]
[email]:^VS[redacted password]
[email]:But[redacted password]
[email]:Nbs[redacted password]
[email]:*W2[redacted password]
[email]:$aM[redacted password]
[email]:DA^[redacted password]
[email]:vPE[redacted password]
[email]:Z8u[redacted password]
[email]:But[redacted password]
[email]:aXi[redacted password]
[email]:rPe[redacted password]
[email]:b4F[redacted password]
[email]:2u&[redacted password]
[email]:5%f[redacted password]
[email]:Lmt[redacted password]
[email]:Tem[redacted password]
[email]:fuc[redacted password]
[email]:*e@[redacted password]
[email]:(k+[redacted password]
[email]:Ste[redacted password]
[email]:^@f[redacted password]
[email]:XT$[redacted password]
[email]:25@[redacted password]
[email]:Jav[redacted password]
[email]:U8![redacted password]
[email]:LsZ[redacted password]
[email]:But[redacted password]
[email]:g$V[redacted password]
[email]:M9@[redacted password]
[email]:!6D[redacted password]
[email]:Fac[redacted password]
[email]:but[redacted password]
[email]:Why[redacted password]
[email]:h45[redacted password]
[email]:blo[redacted password]
[email]:azT[redacted password]

I've redacted everything after the first three characters of the password so you can get a sense of the breadth of different ones here. In this instance, there was no accompanying website, but the data checked out:

Oh damn a lot of those do seem pretty accurate. Some are quite old and outdated too. I tend to use that gmail account for inconsequential shit so I'm not too fussed, but I'll defintely get stuck in and change all those passwords ASAP. This actually explains a lot because I've noticed some pretty suspicious activity with a couple of different accounts lately.

Another with 35 records of website, email and password triplets responded as follows (I'll stop pasting in the source data, you know what that looks like by now):

Thank you very much for the information, although I already knew about this (I think it was due to a breach in LastPass) and I already changed the passwords, your information is much more complete and clear. It helped me find some pages where I haven't changed the password.

The final one of note really struck a chord with me, not because of the thrirteen rows of records similar to the ones above, but because of what he told me in his reply:

Thank you for your kindness. Most of these I have been able to change the passwords of and they do look familiar. The passwords on there have been changed. Is there a way we both can fix this problem as seeing I am only 14?

That's my son's age and predictably, all the websites listed were gaming sites. The kid had obviously installed something nasty and had signed up to HIBP notifications only a week earlier. He explained he'd recently received an email attempting to extort him for $1.3k worth of Bitcoin and shared the message. It was clearly a mass-mailed, indiscriminate shakedown and I advised him that it in no way targeted him directly. Concerned, he countered with a second extortion email he'd received, this time it was your classic "we caught you watching porn and masturbating" scam, and this one really had him worried:

I have been stressed and scared about these scams (even though I shouldn’t be). I have been very stressed and scared today because of another one of those emails.

Imagine being a young teenage boy and receiving that?! That's the sort of thing criminals frequenting Telegram channels such as the ones in question are using this data for, and it's reprehensible. I gave him some tips (I see the sorts of things my son's friends randomly install!) and hopefully, that'll set him on the right course.

They were the most noteworthy responses, the others that were often just a single email address and password pair just simply reinforced the same message:

Yes, this is an old password that I have used in the past, and matches the password of my accounts that had been logged into recently.


Yes that password is familiar and accurate. I used to practice password re-use with this password across many services 5+ years ago.This makes it impossible to correlate it to a particular service or breach. It is known to me to be out there already, I've received crypto extortion emails containing it.

I know that many people who find themselves in this incident will be confused; which breach is it? I've never used Telegram before, why am I there? Those questions came through during my verification process and I know from loading previous similar breaches, they'll come up over and over again in the coming days and I hope that the overview above sufficiently answers these.

The questions that are harder to answer (and again, I know these will come up based on prior experience), are what the password is that was exposed, what the website it appeared next to was and, indeed, if it appeared next to a website at all or just alongside an email address. Right at the beginning of this project more than a decade ago, I made the decision not to load the data that would answer these questions due to the risk it posed to individuals and by extension, the risk to my ability to continue running HIBP. We were reminded of how important this decision was earlier in the year when a service aggregating data breaches left the whole thing exposed and put everyone in there at even more risk.

So, if you're in here, what do you do? It's a repeat of the same old advice we've been giving in this industry for decades now, namely keeping devices patched and updated, running security software appropriate for your device (I use Microsoft Defender on my PCs), using strong and unique passwords (get a password manager!) and enabling 2FA wherever possible. Each HIBP subscriber I contacted wasn't doing at least one of these things, which was evident in their password selection. Time and time again, passwords consisted of highly predictable patterns and often included their name, year of birth (I assume) and common character substitutions, usually within a dozen characters of length too. It's the absolute basics that are going wrong here.

To the point one of the HIBP subscribers made above, loading this data will help many people explain why they've been seeing unusual behaviour on their accounts. It's also the wakeup call to lift everyone's security game per the previous paragraph. But this also isn't the end of it, and more combolists have been posted in more Telegram channels since loading this incident. Whilst I'm still of the view from years ago that I'm not going to continuously load endless lists, I do hope people recognise that their security posture is an ongoing concern and not just something you think about after appearing in a breach.

The data is now searchable in Have I Been Pwned.

Have I Been Pwned
Tweet Post Update Email RSS

Hi, I'm Troy Hunt, I write this blog, create courses for Pluralsight and am a Microsoft Regional Director and MVP who travels the world speaking at events and training technology professionals