Have I been pwned, opting out, VTech and general privacy things

It’s now going on two and a half years since I launched Have I been pwned (HIBP) and I’m continually amazed by how much has happened in that time. It started out with a “mere” 152M breached records and has now more than doubled in volume. I’ve added an API, notifications, domain searches, pastes and a heap of other things both visible to the public and behind the scenes. It’s also gone from a hobby project which I thought only a few curious technology people would visit to a site that’s seen over a million visitors in a single day in the wake of the Ashley Madison breach and has been covered by some of the world’s largest media outlets.

The breaches themselves have gotten me involved with a side of security I previously had limited insight into, a side that many of us probably don’t think about that much: for example, the outpouring of deeply sensitive information from Ashley Madison members. Another side of data breaches I didn’t fully appreciate earlier on is the prevalence with which they’re traded online. In fact, to my delight, HIBP appears to have proven detrimental to this trade, which is an entirely unexpected yet pleasing outcome. As you can imagine, running this service has also exposed me to some very interesting characters: sometimes highly intelligent people, other times (mostly) innocent juveniles and other times, a downright nasty side of society.

Dealing with data of this nature means that privacy is always going to be something I need to be especially mindful of. From the outset, I made the call to never store anything other than email addresses in the system so no other data breach attributes such as passwords or even names touch HIBP. Last year I introduced “sensitive” breaches as well, that is I made certain incidents such as Ashley Madison, Adult Friend Finder and YouPorn unavailable for public search due to the potential for it to have a serious adverse impact on those exposed. Yes, they’re all readily obtainable data breaches circulating around in public anyway, but I don’t want to be the one that facilitates a simple online search that then causes someone serious harm as a result of them being found in one of these sites.

The ethics around how I handle data on HIBP is something I put a huge amount of thought into. I hope that by virtue of the transparency I’ve always had with how I run this site, people see it as a resource for good and that’s almost unanimously been the feedback from people – 99.9% positive, I’d say. To that effect, I want to write about two more things I’ve changed on the site just today in order to continue focusing on privacy in an ethical way: the ability to opt out of being publicly discoverable in any data breaches in HIBP and the removal of the VTech data. Let me fully explain the rationale behind both of these changes.

Opting out of HIBP

I’m not sure how many people have visited HIBP since I started it back in 2013. Google Analytics reckons about 15M page views and there’s a heap more that wouldn’t have been counted thanks to ad blockers and VPNs that keep the trackers away. All I know is that it’s 8 figures which at least in my books, is a hell of a lot.

To date, I’ve had 5 people ask me to remove their data. Five. That is all. A few of them have come across tersely to begin with but I’ve always been conciliatory and immediately obliged after which they usually turn out to be pretty cool. Often they’re unaware that sites are disclosing their presence on them anyway via enumeration risks; try entering an email address into Adult Friend Finder’s password reset page and you’ll see what I mean.

But regardless of the ease with which you can discover someone’s presence on almost every one of the sites already loaded into HIBP, I understand the sentiment. Last year I had someone create a UserVoice suggestion requesting I provide a way for people to remove themselves and that’s now precisely what I’ve done. Here’s how it works: There’s now an “Opt out” link under the “About” menu item:

Opt out link in the navigation

That brings you to the opt out page:

Opt out page asking for email address

The reCAPTCHA is necessary to ensure there’s no abuse that causes emails to be sent to unsuspecting people en masse. Sending an email is necessary because that’s the verification channel to ensure someone does indeed want to opt out:

Opt out verification email

The email and what happens next are pretty self-explanatory, so let me explain how the mechanism works on the back end.

Firstly, in order to know who wants to opt out, I need to store the person’s email address. I could always just nuke their entire record now, but that doesn’t keep them opted out in the future if a new breach appears that includes them. I also wanted to give people the flexibility to still use the service if they wanted to; they should be able to privately search for their exposure and the notification system should still work. To make all of this possible and still ensure both data imports and searches remain efficient, I now flag someone’s record in both the breaches table and the pastes table when they want to opt out. These are Azure Table Storage entities which are super fast to retrieve by key but slow to run ad hoc queries on. What this model means is that when a search is done for someone and their record is flagged as “opt out”, the result is simply 404 (not found), which is identical to the result if they had never been in there in the first place.
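A minimal sketch of that flag-and-404 behaviour follows. It is not the actual HIBP code: an in-memory dictionary stands in for Azure Table Storage, and the names `AccountEntity`, `BreachTable`, `opted_out` and `public_search` are all hypothetical. The point it illustrates is that flagging (rather than deleting) a record keeps the address hidden even when a later breach import includes it, and that an opted-out address and an unknown address produce the same response.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class AccountEntity:
    """Hypothetical record shape; not HIBP's actual schema."""
    breaches: List[str] = field(default_factory=list)
    opted_out: bool = False  # assumed flag name


class BreachTable:
    """In-memory stand-in for a table that is fetched by exact key."""

    def __init__(self) -> None:
        self._rows: Dict[str, AccountEntity] = {}

    @staticmethod
    def _key(email: str) -> str:
        return email.strip().lower()

    def load_breach(self, breach: str, emails: List[str]) -> None:
        """Import a breach; any existing opt-out flag is left untouched."""
        for email in emails:
            row = self._rows.setdefault(self._key(email), AccountEntity())
            if breach not in row.breaches:
                row.breaches.append(breach)

    def opt_out(self, email: str) -> None:
        """Flag the record instead of deleting it, so later imports stay hidden."""
        self._rows.setdefault(self._key(email), AccountEntity()).opted_out = True

    def public_search(self, email: str) -> Tuple[int, List[str]]:
        """Return (status, breaches); opted-out and unknown addresses look identical."""
        row = self._rows.get(self._key(email))
        if row is None or row.opted_out or not row.breaches:
            return 404, []
        return 200, list(row.breaches)


table = BreachTable()
table.load_breach("Adobe", ["alice@example.com", "bob@example.com"])
table.opt_out("alice@example.com")
table.load_breach("VTech", ["alice@example.com"])  # a later import stays hidden

print(table.public_search("alice@example.com"))   # (404, [])
print(table.public_search("nobody@example.com"))  # (404, [])
print(table.public_search("bob@example.com"))     # (200, ['Adobe'])
```

Because the flag lives on the same entity that a search fetches by key, checking it costs no extra lookup, which fits the "fast by key, slow for ad hoc queries" trade-off described above.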

That last point is important too: there’s no public indication that someone has opted out. If they’re taking the view that they don’t want to be publicly located then I think it’s fair to assume that applies to not disclosing that fact as well. That means that both the search facility and the opt out process itself will not publicly disclose their opted out state – there are no known enumeration risks either explicitly or via timing attacks or anything like that. Having said that, if someone was to trawl through the Adobe breach (which is still easily located on the web) and find that an address in there is not in HIBP then clearly that would imply the opt out. There’s not really anything I can do about that, but I thought it was worth a mention.

To the earlier point about still being able to search for your own exposure, at any time anyone can use the notification service to check breaches and pastes they’ve been involved in. It will send them an email with a unique link which will take them to a page with the search results regardless of whether they’ve opted out or not and regardless of whether the breach was sensitive or not (e.g. Ashley Madison).
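To illustrate the difference between the anonymous search and the emailed-link search, here is a rough sketch under stated assumptions. The data, the function names and the single-use token are all hypothetical and are not taken from HIBP itself; the only behaviour drawn from the description above is that the verified path ignores both the opt-out flag and the sensitive-breach restriction.

```python
import secrets
from typing import Dict, List, Optional, Set

# Hypothetical data, keyed by email address.
BREACHES: Dict[str, List[str]] = {
    "alice@example.com": ["Adobe", "Ashley Madison"],
}
SENSITIVE: Set[str] = {"Ashley Madison"}  # hidden from public search
OPTED_OUT: Set[str] = set()               # addresses flagged as opted out
PENDING: Dict[str, str] = {}              # token -> email awaiting verification


def public_search(email: str) -> Optional[List[str]]:
    """Anonymous search; None stands in for a 404 response."""
    if email in OPTED_OUT:
        return None
    visible = [b for b in BREACHES.get(email, []) if b not in SENSITIVE]
    return visible or None


def request_verified_link(email: str) -> str:
    """Create the unique token that would be emailed to the address owner."""
    token = secrets.token_urlsafe(32)
    PENDING[token] = email
    return token


def verified_search(token: str) -> Optional[List[str]]:
    """Owner search: the token shows control of the mailbox, so nothing is hidden."""
    email = PENDING.pop(token, None)  # single use is an assumption of this sketch
    if email is None:
        return None
    return list(BREACHES.get(email, []))


OPTED_OUT.add("alice@example.com")
print(public_search("alice@example.com"))  # None
token = request_verified_link("alice@example.com")
print(verified_search(token))              # ['Adobe', 'Ashley Madison']
```

The token is only ever delivered to the mailbox being searched, so possession of it is what separates the owner from an anonymous third party.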

So that’s the opt out facility, let’s get onto VTech.

VTech and breach retirement

In November last year, Motherboard reporter Lorenzo Franceschi-Bicchierai contacted me about a serious data breach. He’d been handed a large number of records directly by an individual claiming to have broken into VTech’s systems and thus began a saga that resulted in the details of millions of adults, their children and even their kids’ photos being exfiltrated by the individual. I helped Lorenzo verify that the breach was indeed legitimate and he communicated backwards and forwards with VTech and the attacker. I wrote about it at the time in When children are breached – inside the massive VTech hack.

The VTech data breach was unique for a number of reasons. Firstly, there were only ever three individuals who held the data, those being the guy who obtained it in the first place, Lorenzo and myself. To this day, it remains the only breach I’ve ever loaded into HIBP that other parties didn’t already have their hands on. Secondly, it’s kids and there’s really not much more to say about that; when children are involved, the incident takes on a whole new level of emotion for those caught up in it (put yourself in the parents’ shoes for a moment).

An incident of this nature has broad-reaching and long-lasting ramifications. One of those is that the alleged perpetrator was arrested in December. I know nothing more of his fate since that time nor do I know his identity and it may well remain permanently suppressed. Whilst he clearly stepped way over the line of ethical disclosure (he originally claimed he simply wanted to make VTech aware of risks in their assets), a saving grace is that to the best of my knowledge, he never redistributed the data beyond Lorenzo. In fact I’ve frequently had people request it from me but as I’ve said in the past, I don’t give anybody any of this data under any circumstances.

I’ve had chats with many parties about the VTech incident and one thing I’ll make crystal clear up front is that they’ve all been casual, productive and friendly chats. Those parties have included lawyers involved in class actions on behalf of those in the breach, three-letter law enforcement acronyms in multiple countries and legal representatives for VTech themselves. Obviously the nature of those discussions has differed but there’s been one constant: nobody wants to see the data spread in the way that so many other data breaches have in the past. Because of this, I’ve agreed to ensure that can never happen due to anything on my end going wrong and I’ve now permanently deleted both the original data breach Lorenzo sent to me and the VTech data in HIBP.

For the sake of full transparency, this was not as a result of a demand or a threat or anything of that nature, rather it’s to give families impacted by the incident that little bit of extra certainty that they can put this incident behind them. This was a concern that was raised by all of the parties I mentioned earlier and regardless of how capable I may think I am of protecting data, it gives these families further peace of mind because I simply cannot lose what I do not have. I’ve been assured that every single person impacted by that incident has been contacted directly by VTech so there should be no interested parties left who aren’t aware of their exposure. These factors together – me being one of the only parties with the data, the concern families have over their kids and everyone in there having been notified already – are what’s driven me to remove that data because it’s the right thing to do.

In terms of the mechanics of how this works in HIBP, I actually had to build a new process to remove the data. As I’ve written before, it’s in Azure Table Storage rather than a relational database so it’s not a simple matter of DELETE FROM, rather I had to enumerate over every single record and remove entries one by one, all 4.8 million of them. You’ll still see VTech listed as a site that was breached, but I’ve now introduced the concept of a “retired breach” and you’ll see a corresponding logo next to it:

Top 10 breaches showing VTech as "retired"

I elected to continue showing it here and not erase all memory of it in part because it has historical significance as a major industry data breach and also due to the referential integrity dependency of HIBP subscribers. I notified 1.7k of them who were found in the VTech breach and I have a historical record of having sent them an email. Whilst removing the 4.8M addresses themselves from the searchable index in HIBP was the right thing to do, it’s a different story for that small number who willingly provided their info to HIBP and asked the service to watch out for them.
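The one-by-one removal and the “retired” marker described above can be sketched roughly as follows. This is an illustration only: plain dictionaries stand in for the storage tables, and `retire_breach`, `rows`, `catalogue` and the `retired` key are hypothetical names rather than anything from the real system. It shows why there’s no single bulk delete here: every record has to be visited, the breach removed from it, and the breach itself left in the catalogue with a flag.

```python
from typing import Dict, Set


def retire_breach(rows: Dict[str, Set[str]], catalogue: Dict[str, dict], name: str) -> int:
    """Remove `name` from every record one at a time and mark it retired.

    `rows` maps email -> set of breach names; `catalogue` maps breach name -> metadata.
    Returns the number of records that were touched.
    """
    removed = 0
    # Iterate over a snapshot of the keys because rows may be deleted as we go.
    for email in list(rows):
        breaches = rows[email]
        if name in breaches:
            breaches.discard(name)
            removed += 1
            if not breaches:
                del rows[email]  # nothing left to store for this address
    catalogue[name]["retired"] = True  # the breach stays listed, the data is gone
    return removed


rows = {
    "a@example.com": {"VTech"},
    "b@example.com": {"VTech", "Adobe"},
    "c@example.com": {"Adobe"},
}
catalogue = {"VTech": {"retired": False}, "Adobe": {"retired": False}}

print(retire_breach(rows, catalogue, "VTech"))  # 2
print(sorted(rows))                             # ['b@example.com', 'c@example.com']
print(catalogue["VTech"])                       # {'retired': True}
```

Keeping the catalogue entry is what lets the breach still appear in the list of breached sites, and lets any historical notification records keep pointing at something, after the addresses themselves are gone.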

This is a natural evolution for HIBP that has negligible real world impact. Continuing to run this service responsibly is absolutely paramount and measures such as the opt out feature and removing the VTech data are consistent with the ethical approach I believe this class of information deserves.


The only reason this service has been able to continue running unhindered is because I’ve always erred on the side of caution when it comes to how I handle data. A month before the Ashley Madison data even hit the torrents, I’d concluded that it shouldn’t be publicly searchable which I’m confident is the reason HIBP never received a DMCA takedown or even a hint of a legal threat. Regardless of any concerns about that, it was the right thing to do as is removing people’s data from public searching and retiring the VTech breach.

The things I’ve outlined in this blog post might be viewed as taking the moral high-ground and that may well be right – I hope it’s right – but more than anything, it’s to try and help this project achieve what I ultimately created it for and that’s to help people better protect themselves online.


Hi, I'm Troy Hunt, I write this blog, create courses for Pluralsight and am a Microsoft Regional Director and MVP who travels the world speaking at events and training technology professionals