Troy Hunt: The Red Cross Blood Service: Australia's largest ever leak of personal data

I don't give blood as much as I should. My wife has a much better track record than me, regularly donating not just blood but plasma and platelets as well. I know this not just because it's the sort of thing we talk about, but because her data - along with mine - has been leaked publicly in what I believe is the largest ever leak of Aussie data from a local service.

Because of the coverage this incident will inevitably receive, I'm writing this piece in advance of them publicly disclosing it in order to answer as many of the inevitable questions which will arise as possible. I also want to make it abundantly clear up front that this should not discourage anyone from giving blood in the future because as important as this incident is, it pales in comparison to making a donation that could save lives. I'll come back to that later, let's just start with the facts.

Discovery, verification and disclosure timeline

I run the data breach notification service known as Have I been pwned (HIBP), an ethical project designed to help individuals and organisations understand their exposure on the web which often follows online security incidents such as the LinkedIn and Dropbox hacks. In running this service, I frequently have people contact me with data breaches. Often this is after they've received the data from someone else as part of a trade, sometimes it's provided by the individual who hacked into the system itself and occasionally, it's because they simply found the data lying exposed somewhere.

On Tuesday morning, I was contacted by someone who fell into that last category. He claimed to have data from donateblood.com.au and he provided me with a snippet to prove it - a snippet of my own data. There was my name, my email, gender, date of birth, phone number and the date I'd last donated. He then provided me with the entire data set, a 1.74GB file with 1,286,366 records in a "donor" table which was just one out of a total of 647 different tables. I checked my wife's record and found all the same info as I had albeit across 9 different records reflecting the different occasions she'd donated. In addition to the fields in my data, her data also had our home address and her blood type. There was no doubt in my mind that this data was legitimate.

I queried the sender of the information about how he'd come across the data, expecting it to be as a result of an attack using a technique such as SQL injection, widely regarded as the most serious risk to web security today and frequently the "vector" which leads to the disclosure of data like this. But it actually turned out to be much simpler than that to the point where I initially had trouble grasping what he was saying.

What he'd actually been doing is simply scanning internet IP addresses and looking for publicly exposed web servers returning directory listings. This is literally as simple as going to an address such as http://127.0.0.1 and seeing a list of all the files on the system (sample address only). He'd then look to see if any of those files contained a .sql extension which would indicate a database backup... and that is all. I'll come back to why this data was there a little later.
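For anyone wanting to check their own site for this kind of exposure, here is a minimal sketch in Python. It works on the HTML of a page you have already fetched from a server you own. The extension list, the sample page and the function names are all assumptions made for illustration, and the "Index of /" check is only a heuristic based on the title that auto-generated listings from servers such as Apache and nginx typically carry.

```python
from html.parser import HTMLParser

# Assumed, non-exhaustive list of extensions that often indicate a database backup.
RISKY_EXTENSIONS = (".sql", ".sql.gz", ".bak", ".dump")


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def looks_like_directory_listing(html):
    """Heuristic only: auto-generated listings are commonly titled 'Index of /...'."""
    return "index of /" in html.lower()


def find_exposed_backups(html):
    """Return the linked file names that end in one of the risky extensions."""
    parser = LinkCollector()
    parser.feed(html)
    return [link for link in parser.links if link.lower().endswith(RISKY_EXTENSIONS)]


# Hypothetical listing page, standing in for a response from your own server.
SAMPLE_LISTING = (
    "<html><head><title>Index of /</title></head><body>"
    '<a href="../">../</a>'
    '<a href="site_backup.sql">site_backup.sql</a>'
    '<a href="logo.png">logo.png</a>'
    "</body></html>"
)

if looks_like_directory_listing(SAMPLE_LISTING):
    print(find_exposed_backups(SAMPLE_LISTING))
```

The point of the sketch is how little is involved: one page, one list of links and one extension check.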

It can be difficult to know how to proceed after making a discovery like this. I could go direct to the Red Cross who runs the website but there's always the risk of it being swept under the carpet (I had no reason to believe that the Red Cross specifically would do this, but it frequently happens with other organisations). I could go to the Australian Federal Police but frankly, they've got enough really serious crime to deal with as it is. I could go to the media and it would certainly get immediate attention, but it would catch the Red Cross off guard and particularly given the fantastic work they do for the community, that's not something I wanted to see happen.

Ultimately, I elected to reach out to a contact at AusCERT. Many countries have their own CERT (Computer Emergency Response Team) and our local one was a channel I trusted to both take the incident seriously and handle it ethically. AusCERT is a not for profit organisation based out of the University of Queensland and they provide various services to member organisations (membership is a small annual fee) and fortunately, the Red Cross had a pre-existing membership with them. I also knew they were properly equipped with the right people and processes to take something with this degree of sensitivity and do the right thing by those impacted, my wife and I included amongst a huge number of other Aussies.

I spoke to AusCERT Tuesday afternoon and outlined the situation. They reached out immediately to the Red Cross and got back in touch with me Wednesday morning. We spoke again Wednesday evening, Thursday morning and again Thursday afternoon. The constant overarching theme of the discussions was how we could best contain the data and minimise the impact on the donors within it.

As of 12:00 NSW time today, the Red Cross has now issued a public statement which explains the situation and has been covered by the ABC.

What actually happened?

Being conscious that many non-technical people will read this post, let me set some context first. Most organisations have a raft of different systems, processes, people and partners that handle their data. I spent 14 years in one of the world's largest companies and saw firsthand on countless occasions just how far customer data spreads either by design (working with a marketing agency or a data processing partner) or through lack of proper process (developers with access to customer databases, people managing servers without appropriate skills etc). It's not unusual to see data pass through many hands. It shouldn't happen, but it's extremely common.

In the Red Cross' case, the data that was ultimately leaked was a database backup. That 1.74GB was simply a mysqldump file that had everything in it. Taking a database backup is not unusual (in fact it's pretty essential for disaster recovery), it's what happened next that was the problem.

The database backup was published to a publicly facing website. This is really the heart of the problem because no way, no how should that ever happen. There is no good reason to place database backups on a website, let alone a publicly facing one. There are many bad reasons (usually related to convenience), but no good ones. In fact, I show this anti-pattern in my security workshops; I've just spent the last few weeks training software developers in Europe about how precisely this behaviour is risky and even have a live demo of this in the site I use for my workshops. Often, people don't believe that such an egregiously bad security pattern would ever happen "in the real world", but here we are.

The final piece that made all this possible was having directory browsing enabled on the server. The database backup should never have been there in the first place, but it's highly unlikely it would have been found without directory browsing enabled (the file name would not have been easily guessed, it wasn't as obvious as something like "database.sql"). Showing a public listing of the file contents of the server is a well-known risk and there's rarely a valid justification for this, precisely for the sorts of reasons demonstrated with this incident.
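Turning directory browsing off hides the file, but the safer fix is making sure backups never sit under the web root at all. As a rough sketch of how a site owner might audit for that, the Python below walks a directory tree and reports anything that looks like a database backup. The suffix list and the function name are assumptions chosen for illustration, not a description of any tooling used in this incident.

```python
from pathlib import Path

# Assumed, non-exhaustive set of suffixes that suggest a database backup.
BACKUP_SUFFIXES = {".sql", ".bak", ".dump"}


def find_backups_in_web_root(web_root):
    """Return relative paths of files under web_root that look like backups.

    Every suffix is checked, so a compressed dump such as 'export.sql.gz'
    is reported as well as a plain 'export.sql'.
    """
    root = Path(web_root)
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("*")
        if path.is_file()
        and any(suffix.lower() in BACKUP_SUFFIXES for suffix in path.suffixes)
    )
```

Calling `find_backups_in_web_root` on the folder your web server publishes (the path depends entirely on your setup) should ideally return an empty list. Anything it does return is a file that a directory listing, or a lucky guess at the name, could hand to a stranger.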

One really important point to make here is that whilst the data originally came from the Red Cross, it ultimately wasn't them that published it to a publicly facing server, rather it was a partner. That doesn't change the end result and certainly the Red Cross has still taken responsibility for the incident, but it's an important detail in the overall chronology of events. Still, it's highly unlikely there was a valid reason for them to provide the partner with such an extensive amount of data and I'm sure there will be many questions asked as to how so much information should have been shared in the first place and indeed how much is shared in the future.

The other issue that's exacerbated the seriousness of this leak is the nature of the data within it. Let's take a look at what was involved.

What data was included?

The obvious one which should already be apparent by now is a list of blood donors. It's not just donors though, it's the appointments they've made and the other data around their identities. I briefly touched on it before, but here's the list in full:

First name
Last name
Gender
Physical address
Email address
Phone number
Date of birth
Blood type
If they'd previously donated
Country of birth
When their record was created
The type of donation (Plasma, Plasmapheresis, Platelet, Plateletpheresis, Whole Blood)
When each donation occurred
Donor eligibility answers

As I mentioned earlier, there were almost 1.3M records in the donor table, but that doesn't mean there were that many actual donors. Of these, 602k had no email address which is entirely feasible given many people would have donated blood by channels which didn't require them to provide one at registration, or they elected to withhold it. There were a total of 413k unique email addresses and many entries that used the same address due to multiple donations (e.g. my wife's). What all this means is that, according to the Red Cross' statement today, there were approximately 550k actual people in the data.
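To make the arithmetic concrete using the rounded figures above: 1,286,366 records minus roughly 602k without an email leaves about 684k records that carry one, spread across 413k unique addresses, or roughly 1.7 records per address. The Python sketch below shows how counts of this kind can be produced from a set of rows. The field name `email`, the sample rows and the choice to treat addresses as case-insensitive are all assumptions for illustration. It does not attempt to reproduce the approximately 550k figure, which comes from the Red Cross' statement.

```python
def summarise_donor_records(records):
    """Count total rows, rows lacking an email, and distinct email addresses."""
    # Normalise so that case and stray whitespace don't create false "unique" addresses.
    normalised = [(row.get("email") or "").strip().lower() for row in records]
    with_email = [email for email in normalised if email]
    return {
        "total_records": len(records),
        "without_email": len(records) - len(with_email),
        "unique_emails": len(set(with_email)),
    }


# Hypothetical toy rows; none of these values come from the real data.
SAMPLE_ROWS = [
    {"email": "donor.a@example.com"},
    {"email": "Donor.A@example.com"},  # same person, second donation
    {"email": "donor.b@example.com"},
    {"email": None},                   # registered without an email address
]

print(summarise_donor_records(SAMPLE_ROWS))
```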

As with most data breaches, not all attributes are complete for every person. For example, I mentioned earlier that my blood type wasn't in the data and I suspect that's because if I'm honest, I don't know what it is myself therefore I wouldn't have provided it at registration when I booked the appointment. (There doesn't appear to be any post-donation data in there such as the results of any tests on the blood.)

One attribute of particular sensitivity is the collection of donor eligibility answers. Each donor is asked questions such as whether or not they're on antibiotics, if they're under or over weight and if they've had any recent surgical procedures. They're personal questions, no doubt, but one of them particularly stands out in terms of sensitivity:

In the last 12 months, have you engaged in at-risk sexual behaviour?

Clearly that is a deeply personal, private attribute that could be enormously sensitive if the answer is in the affirmative. Because there are many eligibility questions for each donor, there are a total of 7,343,537 answers in the system and naturally, many of these relate to the question of at-risk sexual behaviour.

Per the title of this post, I believe this incident has the unenviable title of being Australia's largest ever leak of personal data. There was Aussie Farmers Direct with 5k accounts a year ago and Aussie Travel Cover lost a six figure number of records earlier that same year (one person may have multiple records). Both Kmart Australia and David Jones had incidents just before that (both impacting a small subset of online customers) as well as Catch of the Day the year before with an undisclosed number of records exposed, although highly unlikely at the numbers we're talking about here. One thing is certain though - none of them had data anywhere near as sensitive as what the Red Cross holds on blood donors.

Who else has the data?

This is the question which is most concerning and the only answer anyone can confidently give is "we don't know". Part of the reason for this is that the mechanism used by the guy that found it is very simple and very widespread. Scanning the internet for everything from vulnerable code to connected devices to publicly facing backup files is something that is done constantly by many different parties. We saw that recently in the story I wrote up about how Regpack "lost" 324k payment records, again in the exposed Modern Business Solutions MongoDB database a couple of weeks ago and on countless other occasions. Scanning the internet for "things" has just become the norm and sooner or later, data like this will inevitably be found.

Then there's the individual himself who first reached out to me. Trading data such as this is an alarmingly frequent practice and it's common for individuals to exchange it with others. Obviously, that's not something we wanted to see happen and we'd need his support to minimise the exposure. I had a discussion with him on Thursday morning which produced two important outcomes:

He maintains that he hasn't redistributed the data to anyone else

He also agreed to permanently delete the existing Red Cross data he had

However, by his own admission, we can only take his responses at face value. The Red Cross has done the right thing in making a public statement about this and notifying impacted donors; with AusCERT's support, they've approached this as though the data is out there and proceeding with an abundance of caution is the responsible path to take.

HIBP and my own personal data retention

As I mentioned earlier on, people send me data like this to load into HIBP so that impacted individuals can learn of their exposure in incidents they may never have been aware of elsewise. However, this case is unique for two main reasons:

The Red Cross has committed to notifying all impacted parties; there should be nobody who was exposed in this incident that doesn't hear directly from them about what happened, my wife and I included

With the original party who located the data having deleted his copy, mine was the only one we know for sure still existed outside the Red Cross' systems

As a result, I offered to permanently delete the copy I was sent and not load it into HIBP. As of Thursday evening, that's precisely what I did - permanently deleted every trace of it I had. This isn't unprecedented, I took the same steps as part of the clean-up in the wake of the VTech data breach and for all the same reasons it made sense then, it makes sense now. As with VTech, this should give those who were exposed in the incident just a little bit more peace of mind that their data has been contained to the fullest extent possible.

Is it a hack? Or a breach? Or a leak? Does it matter?

One of the things that often happens after an incident like this is fairly selective wording of what actually transpired. For example, the Regpack situation I mentioned above was very similar to this one, namely that someone inadvertently published data to a publicly accessible location. Regpack were at pains to point out that their systems were not breached, instead referring to the event as a "data incident". I was critical of this approach at the time because it was obvious their intention was to downplay the severity of the issue as opposed to owning the problem and communicating honestly and transparently (there was other behaviour also consistent with this).

In terms of the Red Cross, it's hard to call this a "hack" simply because it didn't involve exploiting any weaknesses within their software. I mentioned SQL injection earlier on and that's frequently the root cause of breaches where exploiting system flaws is involved, as are attacks such as enumerating direct object references and numerous other methods that rely on faulty code. I've used the term "leak" throughout this post because in my view that's a fairer definition; they inadvertently published the data to the world wide web and someone simply downloaded it.

But frankly, it makes very little difference to the people in the data set as the end result is the same: their very personal information fell into the hands of someone who should never have had it in the first place.

AusCERT and Red Cross' handling of the incident

There's no escaping the fact that this was a major cock-up on many levels and that's the simple, honest truth. Like many people who report security vulnerabilities or incidents like this, I've had experiences in the past where responsible disclosure and appropriate action on behalf of the organisation involved has not been a smooth experience (000webhost and Nissan are two notable examples).

AusCERT have been absolutely outstanding. I won't go into all the details but they handled this with a professionalism and urgency far beyond what I expected. They were instrumental in helping the Red Cross contain the risk, prepare their public communications and identify the necessary steps to advise impacted parties. I always had complete confidence that they'd handle this precisely in the way that those of us reporting incidents like this would expect and they've done a standout job of that. I'd encourage not just other Aussies to reach out to AusCERT in cases like this, but folks around the world to contact their local CERT if ever they learn of serious digital risks like this (Wikipedia has a list of them).

As for the Red Cross, this was a massive shock to the leadership there. As unforgiving as I tend to be about serious security oversights like this, I can also sympathise with the situation they've found themselves in, especially given their purpose and the role they play within the community. There was never any indication that they wouldn't handle this in precisely the fashion an incident of this type deserves: it was a factual, honest and expeditious process. No sugar-coating what happened, no suppressing information from impacted parties and no blame apportioned to the individual who originally identified the leak. It shouldn't have happened in the first place, but it couldn't have been handled better once it did.

If nothing else, I hope that the outcome of this incident encourages others to exercise responsible disclosure themselves in the future.

Please continue to give blood generously

I was really conscious when I first started looking into this that the incident would make life hard on the Red Cross. It's going to cost them money, it's bad publicity and there's a real chance that people may actually feel less inclined to give blood. I want to lead by example here and do what I should have been doing far more frequently anyway:

(Image: my blood donation booking)

I've booked an appointment for the first available spot at my local donation centre so come Monday, the Red Cross will have my blood. They also now have my data (again) and yes, it's the correct data with honest answers to all questions. This is not one of those "oh, just fabricate the answers" deals, this is an act that can save lives. I don't like that my data was exposed in this way but let us not lose focus on life's bigger issues.

If you read this and found it interesting, wherever you are in the world, take a short bit of time out of your day and donate blood. Help find a silver lining in this incident and use it as an opportunity to make a positive difference.

Edit: The source of the leak has now been identified as a contractor named Precedent.
