The public Have I been pwned API now has a Creative Commons Attribution licence

It's now been almost 3 years since I introduced the Have I been pwned (HIBP) API. In fact, it was one of the first things I did after creating HIBP because I wanted to make the data as accessible as possible and create an ecosystem of third-party apps.

However, over time I've also had to deal with the API being used in ways I never intended. For example, I recently introduced the rate limit because I saw evidence of the API being used in a fashion that was inconsistent with what I (and most others) would reasonably deem to be ethical use. I wanted to make it as broadly accessible as possible in the early days, but ultimately there came a time where it needed to evolve.

One aspect of the API I didn't give a lot of thought to early on is licensing. In fact I deliberately didn't touch on it because I wanted to keep things as open as possible so I could see the directions people wanted to go with the service. The API consumers page lists many excellent uses of it so far including mobile apps, various use cases in different programming languages and all sorts of implementations I never thought of. All this is great - it makes me enormously happy :)

Increasingly though, I've found uses of the API that make me less happy, namely cases where I've looked at a site that represents itself as being a data breach monitoring service and thought "Wow, that's really similar to HIBP", only to realise that each request to it is then passed directly to the HIBP API anyway!

So what's the problem with this? Well, there are a few things, and one of them is simply the confusion it creates. I often have people contacting me and asking "Hey, is this HIBP data?" because they're using another service which not only operates identically, but returns precisely the same results. Are they getting data from HIBP? Is HIBP getting data from them? It's confusing.

In some cases, another service isn't simply using the HIBP API for the public good; they're commercialising it too. Here I have a more serious issue, and it's not that I'm losing money (I want people to use HIBP for free); rather, it's that work I wanted to make accessible to the masses is being monetised by someone else. It's hard to put into words precisely how that makes me feel, but it just doesn't sit well and I think most people understand that.

I've always run HIBP very transparently and I want that ethos to extend to those who wish to leverage the public API. I don't want to stop people using it in the way they are, but I do want them to show the same levels of transparency that I do when they create publicly facing services. As such, the Creative Commons Attribution Licence was a perfect fit and I've now added this to the API documentation:

[Image: HIBP Licence]

This is really simple: if you want to use the free, publicly available API then just let people know where the data is coming from. I don't mind if services want to charge for that as part of a broader offering but I do want people to know the source of the information. If you're using the API and you want to "white-label" the HIBP data, then contact me and we'll discuss commercial options, some of which I've written about in the past.
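As a rough illustration of what attribution could look like in a consuming service, here is a minimal sketch. It assumes the service has already retrieved a list of breach names for an account; the names `AttributedResult`, `with_attribution` and `render`, as well as the exact credit wording, are hypothetical and not part of the HIBP API or the licence text.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical credit text and link: the precise wording is the service's choice,
# the point is only that the source of the data is visible to the end user.
ATTRIBUTION_TEXT = "Breach data provided by Have I been pwned"
ATTRIBUTION_URL = "https://haveibeenpwned.com"


@dataclass
class AttributedResult:
    """Breach names for one account, bundled with a visible source credit."""
    account: str
    breaches: List[str]
    source: str = ATTRIBUTION_TEXT
    source_url: str = ATTRIBUTION_URL


def with_attribution(account: str, breaches: List[str]) -> AttributedResult:
    # `breaches` stands in for whatever the service already fetched from the API.
    return AttributedResult(account=account, breaches=list(breaches))


def render(result: AttributedResult) -> str:
    # Build the user-facing message and always append the source credit.
    if result.breaches:
        body = f"{result.account} appears in: {', '.join(result.breaches)}"
    else:
        body = f"{result.account} was not found in any known breach"
    return f"{body}\n{result.source} ({result.source_url})"
```

Keeping the credit in the result object itself, rather than only in a page footer, means it travels with the data wherever the service displays or re-exports it.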

The last thing I'll touch on here is the premise of data being free and open: this is not that. In fact, I put a lot of effort into making sure the data is not free and open because, frankly, it's just not that sort of data. For example, last year I wrote about how I'd handle the Ashley Madison data precisely to ensure it wasn't open. The rate limit I mentioned earlier is also a good example of how I've deliberately worked to ensure the data is not free, for all the sorts of reasons I explained there. It's not in the same class as open source software either; this isn't a funky JavaScript library you want spread as far and wide as possible to help developers do their job, it's a serious amount of often sensitive information that needs to be carefully guarded.

I hinted at this earlier today on Twitter and the response was (almost) unanimously positive with most people being very understanding of why attribution is important. By adding this licence, not only does it clear up some of the use cases I've already alluded to, it also paves the way for new consumers to have a much better understanding of how they can use the service right from the outset. I'll be reaching out to any services I know of that aren't in keeping with the philosophy I've outlined here and supporting them whilst they make the transition - I'm certainly not about to cut anyone off overnight! But I do hope that ultimately, this leads to greater transparency and more people having a better understanding of where their data has been compromised.

Hi, I'm Troy Hunt. I write this blog, create courses for Pluralsight and am a Microsoft Regional Director and MVP who travels the world speaking at events and training technology professionals.