I hit a bit of a milestone last week with HIBP which I thought deserved a little celebration:
Sometime today, @haveibeenpwned broke through the 1M verified subscriber mark. Having a quiet champagne alone before flying home pic.twitter.com/whIss3OXeO - Troy Hunt (@troyhunt), February 2, 2017
A million verified subscribers (that is, they've received a welcome email and clicked a link to confirm they actually want in) is a pretty major feat in my books, especially for a somewhat niche service. As I sat on the plane back home, I started to think about where the service now stood in terms of things like subscribers, the notifications it's sent and indeed who's using it for what purposes. I decided the milestone was a good reason to take some time to look at those numbers and to reflect a little on how the service has grown. I haven't done this for a while so I'm kind of interested in the stats myself; hopefully they'll be of interest to you too.
Let's kick off here because that was the impetus for this post anyway. I launched the notification feature in December 2013 because, well, it just seemed like a good idea at the time. This was only a few weeks after launching the project itself so the notification service was one of the first real features I built.
Let's start with how that subscriber base has grown over time:
Obviously slow beginnings, but a couple of significant observations in there. One is that there are events which cause massive spikes in registrations. August 2015 was the Ashley Madison breach and on the 20th, there were 30,779 registrations in one day and almost 70k in just a 3-day period. Then in August last year, it was the Dropbox hack and the piece I wrote on that went absolutely viral. That saw 42,599 people sign up on August 31st alone and over 77k across a 3-day period.
The other thing that stands out to me and to be honest, I wasn't completely aware of this, is the rate of growth since about that Dropbox time. In the last 4 months (so well after the Dropbox news had died down), I've seen 218k new subscribers so in other words, almost 1.8k per day. As you can see from the graph, that's a pretty linear growth rate too so we're not talking about one or two newsworthy events driving growth, rather a sustained pattern of people taking an interest in their personal exposure to data breaches.
The big question now though is how many of those people are actually receiving notifications? Let's drill into it.
Every time I load a data breach into HIBP, I run a process that finds the intersection between the email addresses in a breach and the verified subscribers in the system. Every match gets sent an email which over time, has meant the following:
To date, I've sent 868,580 notification emails after loading a breach. That graph obviously has a very slow start in part because back at, say, July 2015 I only had 125k subscribers, but also in part because the scale of breaches changed dramatically last year. That Dropbox breach resulted in 144,136 emails being sent to subscribers. LinkedIn was 111k. Last.fm was almost 100k. MySpace 70k. What makes this challenging as I move forward is that the rapid growth of subscribers is dramatically increasing the emails I need to send; there are more than 50% more subscribers now than when I sent all those Dropbox emails and let's face it, Dropbox was "only" 68 million records so if something like that 1 billion record Yahoo breach were ever to surface...
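The matching step run after each breach load (finding the intersection between the addresses in a breach and the verified subscribers) can be sketched as a plain set intersection. This is only an illustration and not how HIBP is actually implemented; the function names, the lower-casing and trimming, and the sample addresses are all assumptions made for the example.

```python
def normalise(address: str) -> str:
    """Trim and lower-case an address (an assumed normalisation, not HIBP's)."""
    return address.strip().lower()


def subscribers_to_notify(breach_addresses, verified_subscribers):
    """Return the verified subscribers whose address appears in the breach."""
    breached = {normalise(a) for a in breach_addresses}
    verified = {normalise(a) for a in verified_subscribers}
    # Set intersection: only addresses present on both sides get an email.
    return breached & verified


# Made-up data purely for illustration.
breach = ["Alice@example.com", "bob@example.com ", "carol@example.com"]
subscribers = ["alice@example.com", "dave@example.com", "carol@example.com"]

to_notify = subscribers_to_notify(breach, subscribers)
print(sorted(to_notify))
```

Because the result can never be larger than the subscriber set, the number of emails per breach grows with both the size of the breach and the size of the subscriber base, which is the scaling concern raised above.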
I was talking to an organisation today who's interested in how they can use the data to help their hundreds of millions of customers better understand their exposure and they asked an interesting question - how many people are receiving multiple notifications? Well firstly, of those 868,580 breach notifications sent, there are a total of 396,775 unique subscribers so clearly there's a bunch of people getting multiple notifications. Here's the distribution of how many notifications individual subscribers receive:
Aspects of this were predictable: it tapers off very quickly. But there were things that really surprised me too, for example more people have received 2 or more notifications than have received just a single one. Well over 60k subscribers have received 4 or more notifications. Stunningly, 25 individual subscribers have received 20 or more notifications. That's just for data breaches (not pastes) and yes, they're real email addresses. I don't know what these people have been signing up to... (well actually, I do, I just don't know why there's so many!)
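A distribution like this can be derived by counting notifications per recipient and then counting how many recipients share each count. The sketch below assumes a hypothetical log with one entry per notification sent; the log and the addresses in it are invented for illustration, and only the two totals at the end come from the figures quoted above.

```python
from collections import Counter

# Hypothetical log: one entry per breach notification, holding the recipient.
notification_log = [
    "alice@example.com", "bob@example.com", "alice@example.com",
    "carol@example.com", "alice@example.com", "bob@example.com",
]

# Recipient -> number of notifications received.
per_subscriber = Counter(notification_log)

# Number of notifications received -> number of subscribers with that count.
distribution = Counter(per_subscriber.values())

for received, subscriber_count in sorted(distribution.items()):
    print(f"{received} notification(s): {subscriber_count} subscriber(s)")

# Average notifications per notified subscriber, from the totals in the post.
mean_notifications = 868_580 / 396_775
print(f"{mean_notifications:.2f}")
```

An average of a little over two notifications per notified subscriber is consistent with the observation that more people have received two or more notifications than just one.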
Some of these breaches have been pretty, well, "sensitive" too and a bunch of subscribers have been caught up in that. There were 56 subscribers in the anal fisting site "Rosebutt" (don't worry, that link goes to news and not the actual site!) There were 295 subscribers in the non-consensual voyeurism site "Candid Board", a site to share photos taken up unsuspecting girls' skirts. Just last week there were 429 subscribers in the Freedom Hosting II breach, a Tor-based hosting provider where half the content was allegedly paedophilia. Clearly, I don't get to choose who appears where, but it goes to show just how far and wide data is spread and how many HIBP subscribers are learning of these (often very unwelcome) incidents.
That's breaches, but notifications are also sent to subscribers who appear in pastes. The paste feature is a construct I introduced in September 2014 and it identifies email addresses dumped at a variety of locations such as Pastebin. Often these are very early indicators of compromise (attackers often post samples of their exploits there) and HIBP automatically trawls multiple sources looking for data. The chart of paste notifications sent is a somewhat interesting one:
HIBP has sent out a total of 68,974 paste notifications but obviously, there's some really big ones in there. In one case, there were over 25k notifications sent for a single paste. In this case, it was an alleged Gmail breach with a very large number of records. In another case, the Plex data breach was pasted publicly and nearly 10k notifications went out then. Other times I've added new paste sources and there's been a flurry of activity as they've been indexed and notifications sent, hence big jumps in one go. For something that predominantly runs automatically in the background, the paste service has proven to be pretty effective.
The subscriptions and notifications above are all happening at an individual level, but I've also offered the ability to monitor at the domain level since just after I launched HIBP. There's a verification process where the requestor needs to prove they control either the domain or the site behind it and then, as with the individual subscriptions, notifications are automatically sent out. Plus, the domain subscriber can run on-demand searches whenever they please too. Here's how the searches have stacked up over time:
That chart aligns somewhat with the verified subscribers earlier on; inevitably, as the service gets more airtime in the media and in the wake of large data breaches, more people use it. There have been 55,414 verified searches to date, that is, searches where the person running them has since gone on to demonstrate that they do indeed have control of the domain or website.
Naturally, there are also many notifications sent when email addresses appear on those domains:
A total of 96,790 to date which is much less than the number sent to individuals, but then each of those notifications also often relates to multiple email accounts for the one domain (sometimes there are thousands of one organisation's addresses in a single breach). I don't actually store this data (how many email addresses were impacted by a notification) so I can't report on it, but it's expressed in the notification email sent to the subscriber. The paste notification trend is pretty much what you'd expect so I'll save you from another graph and just share the total figure: there have been 47,180 paste notifications sent to people monitoring domains which means this:
HIBP has now sent 1,081,524 notification emails to individuals and domain owners.
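That headline figure is the sum of the four totals quoted earlier in this post: breach and paste notifications to individuals, plus breach and paste notifications to domain subscribers. As a quick check (the dictionary labels are just mine for the example):

```python
# The four notification totals quoted in this post.
notifications = {
    "breach notifications to individuals": 868_580,
    "paste notifications to individuals": 68_974,
    "breach notifications to domain subscribers": 96_790,
    "paste notifications to domain subscribers": 47_180,
}

total = sum(notifications.values())
print(f"{total:,}")  # prints 1,081,524
```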
That's a stat I'm enormously proud of because it's a huge number of people that have benefited from the service. But there's another really interesting angle to all this, and that's who the organisations are that are using HIBP.
Who's monitoring domains?
I'm obviously going to be careful to protect the identities of the organisations involved here, but the topic deserves some attention because even I didn't fully appreciate the scope involved until I reviewed the actual data.
One aspect I was interested in was the use of HIBP by large companies, so I turned to a list of the Fortune 500 companies and their primary domain names. Now this was always going to be a low-ball figure because it doesn't include the hundreds (or even thousands) of other domains for brand names those top 500 companies hold, but I still found what I thought was a surprisingly high figure:
At least 70 of the Fortune 500 companies have successfully run verified domain searches on HIBP.
Again, that number is inevitably way too low. If, for example - and it's only an example - Apple was using the service but never did a search for apple.com and instead did one for me.com or icloud.com, they'd be excluded from the result.
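To show why matching on primary domains only gives a lower bound, here's a minimal sketch. The data is entirely hypothetical: the company-to-domain mapping and the set of searched domains are invented, and the Apple entry simply mirrors the "only an example" scenario above rather than saying anything about real usage.

```python
# Hypothetical list: company -> primary domain (as in a Fortune 500 list).
primary_domains = {
    "Apple": "apple.com",
    "ExampleCorp": "examplecorp.example",
    "Acme": "acme.example",
}

# Hypothetical set of domains with at least one verified search.
verified_search_domains = {"icloud.com", "me.com", "examplecorp.example"}

# A company only counts when its primary domain itself was searched.
matched = {
    company
    for company, domain in primary_domains.items()
    if domain in verified_search_domains
}

print(sorted(matched))
print(f"At least {len(matched)} of {len(primary_domains)} companies")
```

In this made-up data the first company is missed even though two of its other domains were searched, so the true count can only be equal to or higher than the matched count.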
Let's try notifications:
At least 62 of the Fortune 500 companies have received domain notifications after a breach was loaded into HIBP.
I'm pretty chuffed knowing that 12% of the largest companies in the US have been using this service for proactive monitoring and they're actually getting value out of it. It's a very diverse group of companies too spanning all sorts of different industries, everything I could imagine just by eyeballing the list. They're companies whose products you use every single day and they're using HIBP to identify risks to their organisations as soon as I know of them.
As interesting as that is, it's data I've pulled from the system and prepared into the stats above. But there are many, many other organisations I've been working with that have reached out directly and requested support. Let me give you a sense of that and again, I'll obviously be cautious about protecting identities here.
I've had a lot of requests over the years by orgs with very large numbers of domain names they want to monitor. Often this is due to the diverse range of brands a company has which makes automatic verification difficult. I've ended up having some great chats with these companies and ultimately loaded domains into the system after doing my own manual verification. Think about companies such as telcos, automotive, airlines and even government and military departments. You'd be genuinely surprised at some of the brands that are so ingrained into our cultures that have reached out for support and now use HIBP on a regular basis.
The service is being used extensively for precisely what I'd hoped - making security better for those impacted by data breaches, both individuals and organisations alike. The stats are great as is the nature of the orgs proactively using HIBP, but I was also really interested to hear firsthand how it was going. Here's what I heard:
How it's helping people
I actually reached out via Twitter for comments on how orgs were using HIBP and I got some great feedback which I'm sharing with permission here, albeit anonymously.
One of the early comments I got related to a more consumer-centric use case, but I'm relaying it here because I think it's a really cool example:
HIBP has helped me because I was able to use it to show my non-tech family members how easy it was to find out if they had information leaked out there. Now most of my family is signed up for notifications!
By virtue of you being here reading this post, you're almost certainly more aware of how exposed we all are than your average person and we kinda forget that; we forget that despite how "normal" this data breach thing is to many of us, it's a foreign concept to most people. I had a recent experience which was somewhat similar where I showed my father how to search his email address and he promptly discovered that he was in the Dropbox breach. Even though he has a fair idea of what it is I do, it still came as a shock and a bit of a wakeup call.
That same awareness observation was made in this comment:
The website made me realize how I should never trust the security of any website, no matter how large or how small it is. While I had already begun compiling a list of my various accounts around the web, your site was the tipping point for me. I made it a point to ensure each site had a different password and 2FA was enabled wherever possible. For sites I no longer used, I set throwaway passwords, changed my email, and closed accounts.
And that's great because it's a genuine behaviour-changing event. Unique passwords combined with multi-step verification have a fundamentally positive impact on an individual's security profile. Onto genuine org uses:
It's been of great help convincing customers that database breaches do happen, that password hashing makes a big difference, and that password reuse is a real thing.
I hear a lot of this - using HIBP to show the harsh, cold reality of our exposure. For an organisation, seeing the names of employees next to data breaches can have a profound effect on the organisational attitude towards security. To that effect, many organisations have made handling HIBP notifications a formal part of how they manage their security posture:
I monitor 50 of our company domains with this. our NOC team notifies the Security officer which in his case creates tickets for the service desk to contact the affected users and guides them in changing their passwords and explaining the impact, also on their personal life/accounts.
I like the responsibility shown here in terms of recognition that security incidents impact personal lives too. In fact, I think what I most like about this is that it moves us away from the corp-only or personal-only attitudes so many organisations have traditionally held. The reality is that compromises of personal security can have consequences on organisational security as well.
* Alerting users so they can take (reactive) pro-active measures once the dumps are (semi) public
* Real-world examples for security awareness sessions
* 're-education' of users whose corporate accounts repeatedly appear in breaches of non-work related systems
That last point is particularly interesting because it raises a prickly question: what should organisations do when they find employees in, well, "sensitive" data breaches? I first started having these discussions with organisations after the Ashley Madison data breach and as you can imagine, organisational responses are somewhat varied. Just this week, I had an interesting comment on my post about fabricated breaches which speaks to this issue:
Having been signed up for domain notifications for well over a year now, I can start to correlate the data in the alerts we're getting. We have a small handful of employees who (incredibly!) signed up for every dating & porn site out there that have had breaches and have been ingested into HIBP.
I was curious as to how this organisation dealt with these incidents so I asked for some more info and got a great response:
In terms of how we deal with the info, it depends on the nature of the breach. For all breaches where alerts are received, employees are notified, told to change their passwords on the site in question and at all other sites where the same creds were used. This tends to become a valuable teaching moment as nothing concentrates the mind more than a breach of an active account.
Which is pretty consistent with earlier feedback, but it's the sensitive stuff I was most interested in:
For 'sensitive' breaches, HR tends to get involved. Two of the most prolific offenders had all Internet access removed (they were warehouse guys so not critical for job) and were warned that any more violations of company policy would result in termination.
Now there's a part of me that's very conscious of the privacy aspect here; my system has just been used to "out" two blokes who were doing something of a very personal nature. However, it's also pretty evident that they shouldn't have been doing it on the company time with the company machine using the company email! But it gets really interesting when it's execs caught in the same situation:
For high-profile users (senior managers & C-suite), they get a personal visit and a review of their password habits, patching status...etc. If a senior exec has been involved in a sensitive breach....well that's where it gets interesting as there is the potential for blackmail, reputational damage...etc. We have only had one of those and it resulted in discussions with HR, CEO, CIO, CISO. I was not privy to how these discussions went and how it all turned out.
Consider how this can turn the discussion to "how can we protect the organisation based on what we know of the exposure of our executives". I'll give you a perfect example: after the Ashley Madison incident, I was contacted by someone in desperate need of advice. He'd found himself in the breach after using HIBP and was in a position that put him and his organisation at particular risk. He wanted to talk, which is something I wouldn't normally do, but it was a pretty unique predicament so we caught up via Skype. Turned out he was a C-suite exec at a large company in merger talks and his exposure via the breach could be seen as a black mark against his reputation and by extension, against his company's. I find it fascinating how these stories have emerged and how this little service has contributed to awareness.
There was a time when I used to track where HIBP appeared in the media, in fact there's still a page on the site with media appearances (which I don't explicitly link to anywhere) but as you'll see, I gave up on that about 18 months ago. I didn't maintain it simply due to the crazy volumes of references, sometimes dozens per day turning up in my Google alerts. Even in recent weeks outside of any notable news stories, I don't think a day has gone by where HIBP hasn't been referenced in a story somewhere or other.
A search of the news via Google is probably the best reference these days and at the time of writing, there's going on 4k results with a bunch of press about the demise of Freedom Hosting II whose data I loaded in last week. There's various forum breaches and even a couple of totally unrelated Trump stories, one due to his press secretary appearing in a breach and another related to his appointment of Giuliani, another individual who's apparently been breached.
In more focused press, Motherboard did a very nice writeup last year on The Rise of ‘Have I Been Pwned?’, an Invaluable Resource in the Hacking Age which I thought they did a great job of. WIRED also did a very nice feature piece and I particularly like the outcome of the photo shoot we did, in fact I now have that story framed on my wall (although standing around on rainy London streets with photographers, a cold and no voice was much less fun...)
The attention the service has received via the security industry has also been really heartening to see. For example, in his recent Reddit AMA, Mikko Hypponen encouraged everyone to sign up to the service as part of protecting themselves online. He's a guy I respect enormously so obviously, I was pretty excited to see that. I've also had a chance to present what I've learned in running the project at events all around the world, including next week when I'm at the RSA conference in San Francisco (in the same week as I'm presenting how I built it at Microsoft Ignite here at home!)
Summary... and what's next
These were the metrics which came to mind and struck me as particularly interesting. I've learned a bunch in writing this (as I do with so many of my blog posts), and I hope the stats here have been interesting to read. If there are other things you'd like to know then do ask in the comments section below.
As for what's next, I've still got a heap of data to trawl through and there are always interesting discussions happening about the way people want to use the service. I'm also noticing increased interest from organisations wanting to take a more active role in the project; VCs, partnerships, acquisitions and so on. Many of these approaches aren't in keeping with what I believe the project should do, but a few of them have started to get closer to where I'd like HIBP to go in the long term. At some stage, it may well make sense to head in that direction but until that time, it's business as usual and you'll continue to see much more writing, news and general anecdotes about the things I've learned right here.