Handling Chinese data breaches in Have I been pwned

China is an immensely fascinating place for many reasons. It's geographically bigger than the US, it has almost double the population of Europe and it's had the world's largest economy for the majority of the last two thousand years.

On the technology front, there are more internet users than the US, Brazil, Japan, Russia and Indonesia combined (which make up 5 of the top 7 most connected countries), yet there's still only about half the population online. When that half does connect, it's usually not to the services that you and I use (assuming you form part of my predominantly Western cultured audience demographic). We use Google, they use Baidu. We use Twitter, they use Sina. We use YouTube, they use Youku Tudou.

It's not just different websites either; there are some fundamental differences in the way they browse the web, starting with the PCs they use. Windows XP still has over 20% of the operating system market share, at a time when Australian usage is close to dropping below 1% (I've written before about the drivers keeping it alive in China). When folks in China get online, there's a bunch of places they won't be going due to the Great Firewall of China. Everyday sites we take for granted elsewhere are largely inaccessible within China: Google, Facebook, YouTube, Twitter, Instagram. The list is extensive and whilst many people use VPNs, they're regularly blocked or unreliable.

The point of all this is that China is a very different place to so much of the rest of the world, including neighbouring Asian countries. Yet one thing that's not different is that like everywhere else, data breaches are a serious ongoing problem. In fact, in some ways it's worse in China due to the country's massive size combined with a very different social tolerance for privacy. In my experience travelling there frequently for work and working closely with many Chinese colleagues, there's just not the same outrage we'd have here knowing that others have access to our personal data. I want to be careful how I put this and caveat it with "my personal experiences", but where we'd be very unhappy with, say, government monitoring of our personal communications, they accept it as a more normal part of life.

When I see alleged Chinese data breaches, it's enormously hard for me to do the same level of due diligence as I'd normally do when I verify these incidents. This is due to a combination of the language barrier (there's Google Translate, but that only takes you so far), breach origin (site domain names often don't match the name of the service) and a general lack of understanding on my part about how some of the sites implicated in these breaches are used by the local population. I've certainly tried the usual means I wrote about in the above link, including reaching out to Have I been pwned (HIBP) subscribers impacted in those breaches and asking for their support in verification. I've had really mixed results from them, for example when providing one subscriber with his data from an incident this week:

Those are sadly legitimate. The ip resolves to my internet provider

And someone else in the same incident who didn't believe they ever had an account on the site:

After triggering "forgot password" I got the email in my spam

Then there was an earlier alleged breach which resulted in feedback from HIBP subscribers such as this:

I have now looked at these sites and have never knowingly used them

Yet she then went on to say:

The word next to my email address is one I used to use as a password

This particular incident had many other subscribers respond in similar ways:

I have never used [redacted] or been to China, so my data being listed as part of a Chinese breach does not make sense. However, I was in Malaysia in 2009, where i would have used those credentials to access various hotel internet services.

Time and time again, Chinese data breaches would pop up and I'd verify them enough to establish that there's some merit to them, but I just haven't been certain enough to put them into HIBP with the same degree of confidence as, say, Dropbox or LinkedIn or any of the others where I've been as close to certain as I can be. As a result, I've been sitting on a lot of large Chinese data breaches that I know have a significant portion of legitimate user information in them. So here's how I'm going to handle them:

Back in July, I introduced the concept of unverified breaches, which are incidents that have enough legitimacy to take seriously, but not enough to represent them in the same class as the likes of Dropbox et al I just mentioned. This is pretty much where I'm at with these Chinese incidents so that's how I'm going to handle them - as unverified breaches.
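For anyone consuming breach data programmatically, the verified/unverified distinction can be treated as a simple boolean on each breach record. The snippet below is a minimal sketch, not HIBP's own implementation: it assumes each record is a dictionary with an `IsVerified` flag, and the function name `split_by_verification` and the sample records are hypothetical, made up purely for illustration.

```python
from typing import Dict, Iterable, List, Tuple

Breach = Dict[str, object]


def split_by_verification(breaches: Iterable[Breach]) -> Tuple[List[Breach], List[Breach]]:
    """Partition breach records into (verified, unverified) lists.

    Assumption: each record carries a boolean "IsVerified" field.
    A record without the flag is treated as unverified, which is the
    more cautious choice.
    """
    verified: List[Breach] = []
    unverified: List[Breach] = []
    for breach in breaches:
        if breach.get("IsVerified") is True:
            verified.append(breach)
        else:
            unverified.append(breach)
    return verified, unverified


# Hypothetical sample records, not real breach data.
sample_breaches: List[Breach] = [
    {"Name": "ExampleVerifiedSite", "IsVerified": True},
    {"Name": "ExampleUnverifiedSite", "IsVerified": False},
    {"Name": "ExampleMissingFlagSite"},
]

verified, unverified = split_by_verification(sample_breaches)
print([b["Name"] for b in verified])
print([b["Name"] for b in unverified])
```

Keeping the two groups separate lets a consumer still surface an unverified incident while flagging that it hasn't met the same bar of confidence.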

Starting today, I'm going to feed some of these big breaches into HIBP. Some of them are millions of records, some of them are tens of millions and one of them is hundreds of millions. As I've outlined above, whilst it's hard to be emphatic about their legitimacy, they're legit enough to warrant inclusion.


Hi, I'm Troy Hunt, I write this blog, create courses for Pluralsight and am a Microsoft Regional Director and MVP who travels the world speaking at events and training technology professionals