
When a nation is hacked: Understanding the ginormous Philippines data breach

Remember when OPM got breached last year? There was a lot of excitement in various parts of the world (namely the US) because here we had a government department (Office of Personnel Management), and they’d just lost 21.5 million records! These records included such sensitive data as names, dates of birth and addresses and by any reasonable measure, it was serious – that’s almost 7% of the country’s population!

Yet somehow, last week’s news that 55 million Filipino voters’ data was now out in the wild went largely unnoticed. Let’s put it down to a very western-centric tech media but move past that and look at this incident for what it is – a ginormous data breach with extremely sensitive information and at 55M individuals, that’s also more than half the country’s population.

Whilst there’s been limited press coverage on the issue, a public statement from the Filipino government has suggested that nothing sensitive was disclosed. As I discovered when I reached out to some of the people involved, this is blatantly wrong. Here’s how it all unfolded.

Background

A couple of weeks back, the COMELEC website (Commission on Elections) was defaced, allegedly by “Anonymous Philippines”:

Defaced comelec.gov.ph website

These are the usual hacktivist ramblings, nothing out of the ordinary for a defacement, but inevitably it led to them grabbing quite a bit of data. Just for context, here’s one of the search facilities on the COMELEC website (provided to me by a Filipino who assisted with data verification):

No HTTPS on search with personal sensitive data

It’s a search facility requesting name and sensitive personal information (namely birthdate) over an unencrypted connection. You could try and load the page securely, except that won’t work very well:

Cert is issued for localhost
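To make the certificate problem concrete, here is a minimal Python sketch of the kind of name check a browser performs before trusting an HTTPS connection. It is an illustration only: the certificate dictionary is a made-up sample shaped like the output of `ssl.SSLSocket.getpeercert()`, the helper names are hypothetical, and the wildcard handling is deliberately simplified compared with real TLS libraries.

```python
def names_on_cert(cert: dict) -> list[str]:
    """Collect DNS names from subjectAltName, falling back to commonName."""
    san = [value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"]
    if san:
        return san
    return [
        value
        for rdn in cert.get("subject", ())
        for key, value in rdn
        if key == "commonName"
    ]


def name_matches(pattern: str, hostname: str) -> bool:
    """Exact match, or a single leading '*.' wildcard covering one label."""
    pattern, hostname = pattern.lower(), hostname.lower()
    if pattern.startswith("*."):
        head, sep, tail = hostname.partition(".")
        return bool(head) and sep == "." and tail == pattern[2:]
    return pattern == hostname


def cert_covers_host(cert: dict, hostname: str) -> bool:
    """True if any name on the certificate covers the requested hostname."""
    return any(name_matches(name, hostname) for name in names_on_cert(cert))


# Hypothetical certificate: issued for "localhost", as in the screenshot.
localhost_cert = {"subject": ((("commonName", "localhost"),),)}
print(cert_covers_host(localhost_cert, "comelec.gov.ph"))  # False
```

A certificate issued for localhost can never cover a public hostname, which is why loading the page securely fails.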

Often when I’m verifying a breach, I’ll look at the site it allegedly came from and try to get a feel for how likely it is to be legitimate. Oversights like this combined with the W3C compliance GIF and hit counter at the bottom of the page just above the “Copyright © 2001” statement start to paint a pretty clear picture….

W3C logo, hit counter and 2001 copyright

Breach distribution and contents

Firstly, unlike the OPM data, the Filipino breach has been very broadly distributed. Not only has it been readily available for download from multiple locations on the clear web, it’s been quite extensively torrented too. The genie is well and truly out of the bottle and it won’t be going back in.

The data consists of 76GB worth of (usually) compressed files, most notably a MySQL backup that expands out to 338GB. There’s a raft of other .sql files in the breach as well, ranging from a few KB up to hundreds of MB. The breadth of data in these is quite significant. Trend Micro did a write-up on what they found so I won’t repeat it all here; I was more interested in verifying the legitimacy of the breach and conclusively reporting on its accuracy.

Amongst the huge volume of data is a total of 228,605 email addresses. This may sound like a small number out of the 55M records, but according to reports, a lot of the sensitive data such as passport numbers belongs to a “mere” 1.3M overseas voters. It’s entirely conceivable that records are not complete across all these individuals, but at least the email addresses gave me a verification avenue.
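For a sense of how a count like that can be produced, here is a rough Python sketch that pulls distinct email addresses out of SQL dump text. This is an assumption about approach, not a description of the tooling actually used: the regular expression is a pragmatic simplification of real email syntax, and the `voters` table in the sample is invented.

```python
import re
from typing import Iterable

# Simplified pattern: good enough for counting, not a full RFC 5322 parser.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def unique_emails(lines: Iterable[str]) -> set[str]:
    """Return the distinct, lower-cased email addresses found in the lines."""
    found = set()
    for line in lines:
        for match in EMAIL_RE.findall(line):
            found.add(match.lower())
    return found


# Made-up sample rows; the same address appears twice with different casing.
sample_dump = [
    "INSERT INTO voters VALUES (1,'Juan.Cruz@example.com','...');",
    "INSERT INTO voters VALUES (2,'juan.cruz@example.com','...'),(3,'maria@example.org','...');",
]
print(len(unique_emails(sample_dump)))  # 2
```

Because the function only iterates over lines, an open file handle can be passed in directly, so a very large dump never has to be loaded into memory at once.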

Breach verification with “Have I been pwned” subscribers

At the time of writing, I have 367k verified subscribers in Have I been pwned (HIBP), that is they’ve all gone to the notifications page, left their email address then received a confirmation and acknowledged it. I’ve used subscribers in the past to verify breaches where I’m not confident of the authenticity. It’s always worked out very well as I’ve got a large number of people interested in their exposure in data breaches and able to confirm whether information is accurate or not. I give them a small slice of what is allegedly their data and they then confirm the legitimacy.
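The matching step behind this kind of verification boils down to a set intersection. The sketch below assumes two plain lists of addresses; the function name and sample data are hypothetical and it says nothing about how HIBP itself stores subscribers.

```python
def subscribers_in_breach(breach_emails, subscriber_emails):
    """Return subscriber addresses that also appear in the breach, sorted."""
    def normalise(address: str) -> str:
        # Compare addresses case-insensitively and ignore stray whitespace.
        return address.strip().lower()

    breach = {normalise(e) for e in breach_emails}
    subscribers = {normalise(e) for e in subscriber_emails}
    return sorted(breach & subscribers)


# Made-up addresses for illustration only.
breach = ["Juan.Cruz@example.com", "maria@example.org"]
subscribers = ["maria@example.org ", "someone.else@example.net"]
print(subscribers_in_breach(breach, subscribers))  # ['maria@example.org']
```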

Part of the reason why I particularly wanted to do that with this breach is because of this statement by COMELEC officials (emphasis mine):

Again, I want to emphasize that the database in our website is accessible to the public. There is no sensitive information there.

Now this could just be misreporting or comments taken out of context, but I’ve seen numerous articles downplaying the severity of the data in a manner that’s just not consistent with what I’ve observed in the data breach. Let’s see what those in there have to say about their data.

Yesterday I emailed a number of HIBP subscribers and got back some pretty quick responses with everyone willing to assist. I found them spread out across two tables in the data breach, the first being a table called “irdoctable2014” which has the following fields:

# FORM_ID, APP_TYPE, REGISTRATION, LASTNAME, FIRSTNAME, MATERNALNAME, SEX, CIVILSTATUS, SPOUSENAME, RESSTREET, RESPRECINCT, RESPRECINCTCODE, RESREGION, RESBARANGAY, RESCITY, RESPROVINCE, MAILSTREET, MAILEMBASSY, MAILCOUNTRY, REGCOUNTRY, REGEMBASSY, REPSTREET, REPBARANGAY, REPCITY, REPPROVINCE, EMAIL, ABROADSTATUS, ABROADSTATUSSPECIF, FLASTNAME, FFIRSTNAME, FMATERNALNAME, MLASTNAME, MFIRSTNAME, MMATERNALNAME, REPLASTNAME, REPFIRSTNAME, REPMATERNALNAME, DOBYEAR, DOBMONTH, DOBDAY, BIRTHCITY, BIRTHPROVINCE, CITIZENSHIP, NATURALIZATIONDATE, CERTIFICATENB, COUNTRYRES, CITYRESYEAR, CITYRESMONTH, PROFESSION, SECTOR, HEIGHT, WEIGHT, MARKS, DISABLED, ASSISTEDBY, TIN, PASSPORTLOST, PASSPORTNB, PASSPORTPLACE, PASSYEAR, PASSMONTH, PASSDAY, REGBARANGAY, REGREGION, REGCITY, REGPROVINCE, REG_DATE, STATIONID, LOCAL_ID, CREATE_TIME, UPDATE_TIME, IS_EXTRACTED, IS_EXPIRED, IS_CANCELLED, CONTACTNUMBER, EXPIRATION_DATE, APPOINTMENT_DATE, APPOINTMENT_TIME, SCHED_TIME, COUNTER_CHANGES, REFERENCENUMBER, ERBDATE, USER_ID, EMAIL_ID, EXTRACTED_DATETIME, IS_DELETE, UPDATED_DATETIME, IS_FRONTPAGE, IS_REPRINT, IS_OV, IS_COUNTED

This is a very large amount of data and reading through those column names, clearly many of them would be considered sensitive personally identifiable data. However, some of the data is encrypted, namely the person’s name and their date of birth. Part of the irony here though is that the email addresses appear in the clear and often contain both the first and last name anyway! Not all the fields are populated but plenty of them are and they contain very personal info. Let me demonstrate by sharing one of the responses I got to my questions in full (responses in bold, data obfuscated by XXX):

Would it be feasible that you have a record in this database? Yes, as I am a registered voter for the coming elections

Is this likely to be legitimate data? Yes

There’s some very specific information about your height (XXX) and weight (XXX) which I assume is metres and kilograms – does that sound right? Yes, that's what I declared when they took my vital statistics and biometrics when I registered last year

Do you recognise the names “'XXX', 'XXX', 'XXX', 'XXX', 'XXX', 'XXX'” – I suspect they may refer to your mother and father, I’m just trying to confirm? Yes, those are my parents

Along with the email address (which in this case included the person’s full name) are their “vital statistics and biometrics” as well as their parents’ names, all of which appear in the clear. There’s also a physical address, gender, marital status, where they were born, where they’re now living, their profession and their phone number. This is very personal information!

Another subscriber provided this confirmation:

XXX and XXX are my parents' middle names. In the Philippine setting, your middle name is usually your mother's surname (prior to her marriage).

I then had two further people provide the same emphatic confirmation about their data in the same table. A fifth person who offered support was found in a table called “doctablepost” which was in the 338GB file and contained these fields:

# ID, APPLICATION_ID, FORM_ID, APP_TYPE, ABSENTEE, REGISTRATION, LASTNAME, FIRSTNAME, MATERNALNAME, SEX, CIVILSTATUS, SPOUSENAME, RESSTREET, RESPRECINCT, RESPRECINCTCODE, RESBARANGAY, RESCITY, RESPROVINCE, ABROADSTREET, ABROADZIP, ABSENTIA, ABROADCITY, ABROADCOUNTRY, ABROADPERIOD, ABROADRESCONT, REGCOUNTRY, REGEMBASSY, MAILSTREET, MAILZIP, MAILCITY, MAILCOUNTRY, MAILEMBASSY, REPSTREET, REPBARANGAY, REPCITY, REPPROVINCE, EMAIL, ABROADSTATUS, ABROADSTATUSSPECIF, LASTENTRYDATE, ABSREGISTERED, OLDPRECINCT, OLDREGBARANGAY, OLDREGCITY, OLDREGPROVINCE, OLDREGDATE, FLASTNAME, FFIRSTNAME, FMATERNALNAME, MLASTNAME, MFIRSTNAME, MMATERNALNAME, REPLASTNAME, REPFIRSTNAME, REPMATERNALNAME, DOBYEAR, DOBMONTH, DOBDAY, BIRTHCITY, BIRTHPROVINCE, CITIZENSHIP, NATURALIZATIONDATE, CERTIFICATENB, COUNTRYRES, CITYRESYEAR, CITYRESMONTH, PROFESSION, SECTOR, HEIGHT, WEIGHT, MARKS, DISABLED, ASSISTEDBY, OLD_VIN, VINP1, VINP2, VINP3, VINCONTROLCODE, TIN, PASSPORTLOST, PASSPORTNB, PASSPORTPLACE, PASSYEAR, PASSMONTH, PASSDAY, REGBARANGAY, REGCITY, REGPROVINCE, REG_DATE, INTERNAME, OFFICERNAME, OPERNAME, STATIONID, CDID, SETID, PRINT_FLAG, FINGER_INFO, FINGER_TOPO_COORD, QUALITY, MATCHING_FINGER, TRANSFER_STATUS, TRANSFER_UPDATE_TIME, PAGES_DESCR, LOCAL_ID, CREATE_TIME, UPDATE_TIME, LOCK_USER, LOCK_TIME, PROCESSING, IS_CURRENT, DOC_VERSION, CD_STAT_ENTY, DISAPPROVED, VOTING_HIST1, VOTING_HIST2, OP_CODE, OP_DATE

This is where we now start to get into passports too and indeed this individual's passport was in there: the number, the place and the date of issue. His name was encrypted but the passport data wasn’t, nor was his birthday, unlike the encrypted birthdates in the previous table. I asked him for confirmation of his data:

Would it be feasible that you have a record in this database? Yes

Is this likely to be legitimate data? Yes

The db record appears to be my overseas absentee voter record which I have registered into back in 2012. That also contains my current address, mothers maiden name and the Comelec officer who processed my registration.

Does your passport number end in “XXX” and was it issued in the month of XXX? Yes. Starting with XXX.  It was issued in XXX and will expire XXX XXX

Were you born in [month] XXX? Yes, I am born in XXX

That’s a very emphatic set of responses and he was able to not only confirm passport fragments that I gave him, but provide me with other fragments that lined up with the data in the breach. With five independent confirmations of the data, there’s no doubt in my mind that this is the real deal.
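The fragment exchange above can be sketched in a few lines of Python: reveal only a short suffix when asking the question, then check whether a fragment offered in reply lines up with the stored value. The function names and the passport number are invented for illustration, and this is one possible way to structure such a check rather than the process actually followed.

```python
def mask_all_but_suffix(value: str, visible: int = 3, mask: str = "XXX") -> str:
    """Return a fixed mask followed by the last `visible` characters."""
    if visible <= 0 or visible >= len(value):
        # Never reveal the whole value; fall back to the bare mask.
        return mask
    return mask + value[-visible:]


def fragment_lines_up(full_value: str, fragment: str, position: str) -> bool:
    """Check a volunteered fragment against the start or end of the record."""
    full, frag = full_value.strip().upper(), fragment.strip().upper()
    if not frag:
        return False
    if position == "start":
        return full.startswith(frag)
    if position == "end":
        return full.endswith(frag)
    raise ValueError("position must be 'start' or 'end'")


passport_number = "AB1234567"  # made-up value, not from the breach
print(mask_all_but_suffix(passport_number))               # XXX567
print(fragment_lines_up(passport_number, "ab1", "start"))  # True
```

Using a fixed-width mask means the question gives away neither the hidden characters nor the length of the full value.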

Other data attributes

As serious as the info above is, it’s only scratching the surface. Per the reports linked to earlier, there’s also biometric data relating to fingerprints in the system. This contains column names such as these:

PRINT_FLAG, FINGER_INFO, FINGER_TOPO_COORD, QUALITY, MATCHING_FINGER

The values within there can be quite detailed and I’ve no reason to think that this isn’t indeed legitimate print data uniquely and biologically identifying the owner. You don’t get to reset that stuff once it’s been released into the wild!

There’s what appears to be a CMS back end for the website. There’s voting history against names (it appears to just be dates rather than the candidate voted for). There’s information about embassies and polling locations and data on what appears to be electoral candidates as well. Clearly some of this should be public, but here you have a whole heap of very sensitive, poorly protected data somehow grouped in with public domain info. Given that most of this all sits in the one database as well, it’s highly likely it was all running behind the public website, which may explain how such a broad set of data was obtained.

It’s just an absolute mess of huge volumes of data, tables with suffixes which appear to indicate copies or duplication, draft or temporary data and inconsistent (and frequently insufficient) cryptographic storage of sensitive data. This feels like so many other large, legacy corporate databases I’ve seen which have had numerous developers applying various practices to it over a long period of time. Only difference is, it’s got a heap of highly sensitive information in it and it’s now all public.

HIBP and summary

The 228,605 email addresses in the breach are now searchable in HIBP. I actually had to create five new data classes when loading this breach, that is, classes of information I’d never seen in a breach before, among them: Marital statuses, Biometric data, Physical attributes, Family members' names

The Philippines is not exactly high on the news radar for Western media but this is a breach we should be paying attention to. There’s the potential to do serious damage to those involved and we need to remember that the same classes of data are held by all our governments in our respective corners of the world. It’s too late for those in this breach – a huge amount of personal data is now perpetually out there in the public domain – but let’s see if we can avoid those same mistakes in other parts of the world.


Hi, I'm Troy Hunt, I write this blog, create courses for Pluralsight and am a Microsoft Regional Director and MVP who travels the world speaking at events and training technology professionals