Troy Hunt: When children are breached – inside the massive VTech hack

I suspect we’re all getting a little bit too conditioned to data breaches lately. They’re in the mainstream news on what seems like a daily basis to the point where this is the new normal. Certainly the Ashley Madison debacle took that to a whole new level, but when it comes to our identities being leaked all over the place, it’s just another day on the web.

Unless it’s our children’s identities, that’s a whole new level.

When it’s hundreds of thousands of children, including their names, genders and birthdates, that’s off the charts. When it includes their parents as well – along with their home address – and you can link the two and emphatically say “Here is 9-year-old Mary, I know where she lives and I have other personally identifiable information about her parents (including their password and security question)”, I start to run out of superlatives to even describe how bad that is.

This is the background on how this little device and other online assets created by VTech requested deeply personal info from parents about their families which they then lost in a massive data breach:

InnoTab 3 created by VTech

Breach source, verification and (attempted) disclosure

Let me set some context first because this is clearly a very serious incident and it all began when I was contacted by Lorenzo Bicchierai earlier this week. Lorenzo writes for Motherboard and has often approached me for comments on security incidents in the past. This time, he wanted some help verifying a data breach that had allegedly come from VTech and contained millions of customer records. Someone had gotten in touch with him (I assume as they thought it might make a good story) and he was doing his journo due diligence thing.

Lorenzo passed on the data and I checked it out. I found 4.8 million unique customer email addresses in one of the files and it “smelled” good; that is, it didn’t have the typical hallmarks that often accompany a fabricated breach. However, it wasn’t quite clear where the data had come from. I mean, it’s not like you can just go to vtech.com and find a login box that tells you whether or not the account exists (incidentally, I did later discover an API that confirms the presence of an email address at login time). I needed further verification so I invoked the help of some Have I been pwned? (HIBP) subscribers.

Those subscribers come from one of the features I added to HIBP very early on, the ability to subscribe to notifications:

This is a free service that sends you an email if your account pops up in a data breach. To date, 290k people have signed up and verified their email address (they need to receive an email at that address and click a unique link). I’d always intended for this to be a feature that simply notifies people of breaches as appropriate, but what I’ve realised lately is that it also gives me an excellent source of individuals supporting the project who can help me verify future data breaches as well.
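The “receive an email and click a unique link” step is a standard double opt-in pattern. Here’s a minimal sketch of how such a flow might work, assuming in-memory storage and a hypothetical `https://example.com/verify` URL; `start_subscription` and `confirm` are illustrative names, not HIBP’s actual implementation.

```python
import secrets

# Hypothetical in-memory stores; a real service would persist these.
pending: dict[str, str] = {}   # token -> address awaiting confirmation
verified: set[str] = set()     # addresses whose owners clicked the link

def start_subscription(email: str) -> str:
    """Issue an unguessable, single-use token and build the link to email to that address."""
    token = secrets.token_urlsafe(32)
    pending[token] = email.strip().lower()
    return f"https://example.com/verify?token={token}"  # hypothetical URL

def confirm(token: str) -> bool:
    """Verify the address only if we issued this token; pop() makes it single-use."""
    email = pending.pop(token, None)
    if email is None:
        return False
    verified.add(email)
    return True
```

Only someone who can read mail at the address ever sees the token, so a successful `confirm` is reasonable evidence they own it.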

I took the email addresses from the alleged VTech breach and found 18 recent HIBP subscribers who had a comprehensive set of data in the dump. I emailed them asking for support, essentially saying that I’d been passed a data breach that included their details and, if they were willing to assist, I’d send them some non-sensitive data attributes to verify. This was usually their month of birth, the city they live in and the name of their ISP based on their IP address. All of these attributes were in the data breach, and if the HIBP subscriber could confirm them and acknowledge they had a VTech account, I’d be confident it was legitimate.

I received six responses within 24 hours, every one of them confirming their data:

Yes. That's accurate. I did register at vtech so I could download addons for a toy laptop.

Yes. That's my data.
no doubt about that, I registered a vtech account within the last few months .

Yes, That looks like legitimate information. The service would be VTech’s Learning Lodge.

Yes, that looks like me. I lived near [redacted city] at the time and my daughter had one of their pads. I believe we logged in so that we could download apps from their app store and possibly for firmware updates etc.

Yes that is correct. It's an old address, I was with [redacted ISP] at the time so can verify this info ! I would have used the VTECH website for my daughter around that time too !

Yes I did access the VTech learning lodge in 2014 after purchasing a "Cora Cub" for my child. In order to personalise it's voice activated feature, you had to join the learning lodge.
I was with the broadband provider [redacted ISP] at that time. I have since changed services, unfortunately to TALKTALK!

Can’t help but feel sorry for the last person!

This was more than enough to give me complete confidence in the legitimacy of the data. But before loading it into HIBP, it was essential that VTech be aware of the incident too, so I pushed Lorenzo on what steps he’d taken. He’s detailed his attempts to get in touch with VTech in the article he’s just published, titled One of the Largest Hacks Yet Exposes Data on Hundreds of Thousands of Kids. For many days, he simply couldn’t get anyone to talk with him despite the fact they did actually respond (and redirect him) multiple times. As we discussed this incident in the days following his initial contact, we repeatedly talked about ways of getting in touch with them and he reached out via various channels time and time again. It was reminiscent of my trials with 000webhost last month and frankly, I’m both staggered and appalled by the negligence these organisations are showing. Data breaches like this can be enormously damaging for both the customers and the online business, but whilst I’m deeply sympathetic to the former, when the latter actively ignores multiple attempts at private disclosure even when they know it relates to a serious security incident, it’s hard to feel too sorry for them.

But to their credit, VTech did eventually respond to Lorenzo and acknowledged that prior to his contact they had not been aware of a data breach, but had since identified an incident on November 14. This roughly corresponds with the dates in the files, although as I’ll show shortly, there are records allegedly created many days after this. In their response, VTech explains the following:

Upon discovering the breach, we immediately made modifications to the security settings on the site to defend against any further attacks.

Unfortunately, this is insufficient and I’ll explain why shortly. They go on to reassure Lorenzo that financial data is just fine:

It is important to note that no payment card or banking information was obtained. Our database does not contain any credit card information and VTech does not process or store any customer credit card data on the Learning Lodge website. To complete the payment or check-out process of any downloads made on the Learning Lodge website, our customers are directed to a secure, third party payment gateway.

Frankly, I couldn’t care less about credit cards and as I’ve explained before, these statements are designed to appease the likes of PCI and are of little consequence to consumers when genuinely sensitive things – irreplaceable things – are lost by a company that suffers a data breach. Let’s take a look at just what they lost.

Understanding the data breach

Here’s what was originally provided to Lorenzo:

The contents of the VTech data breach

The file that immediately jumps out is the big guy at the top – parent.csv. This file has 4,862,625 rows and column headings as follows:

id
email
encrypted_password
first_name
last_name
password_hint
secret_question
secret_answer
email_promotion
active
first_login
last_login
login_count
free_order_count
pay_order_count
client_ip
client_location
registration_url
country
address
city
state
zip
updated_datetime

One of the first things I look at in a breach like this is how many unique email addresses there are, as it helps establish whether there are duplicate records. In this case there were 4,833,678 occurrences matching the pattern I use to extract email addresses. That’s a few fewer than the total row count, which is normal due to either duplicates, missing addresses or strings that don’t conform to what I believe an email address looks like (I have a pretty liberal regular expression pattern I use).
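The row-versus-address comparison above can be sketched roughly as follows. The regular expression is an assumption (a deliberately liberal “something@something.tld” pattern, not the exact one used for HIBP), and `count_emails` is a hypothetical helper that reads the `email` column from the parent.csv headings listed earlier.

```python
import csv
import io
import re

# Deliberately liberal: non-space, non-@, non-comma runs around an @ and a dot.
# This is an illustrative assumption, not HIBP's actual pattern.
EMAIL_RE = re.compile(r"[^\s@,]+@[^\s@,]+\.[^\s@,]+")

def count_emails(csv_text: str, column: str = "email") -> tuple[int, int, int]:
    """Return (total rows, rows whose value matches the pattern, unique matching addresses)."""
    rows = 0
    matches = 0
    unique: set[str] = set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        rows += 1
        value = (row.get(column) or "").strip()
        if EMAIL_RE.fullmatch(value):
            matches += 1
            unique.add(value.lower())  # treat addresses case-insensitively
    return rows, matches, len(unique)

sample = "id,email\n1,foo@bar.com\n2,FOO@bar.com\n3,\n4,not-an-email\n5,baz@qux.org\n"
print(count_emails(sample))  # (5, 3, 2)
```

Missing values, non-conforming strings and case-only duplicates each push the unique count below the row count, which is the gap described above.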

The next thing I checked was the passwords and whilst the column heading implies they’re encrypted, they’re not. The easiest way to check what’s going on with password storage is just to Google a few of the values stored in the database. For example, let’s take the very first one in the dump: 835af17f41292ba8ea3270f6859757ab

And here it is:

Their password is “welcome81”; it’s that simple. It’s just a straight MD5 hash, not even an attempt at salting or using a decent hashing algorithm. The vast majority of these passwords would be cracked in next to no time; it’s about the next worst thing you can do after applying no cryptographic protection at all. Speaking of which…

All secret questions and answers are in plain text. The questions are typical (albeit poor) examples such as your favourite colour, where you were born and your first school. In fact, you can see them in context in this screen from a video I’ll show a bit later:

Registration form asking for parent's info

This aligns with the columns from the parent.csv file I referenced earlier and gives me a high degree of confidence it’s at least one of the locations where parents would have entered data.
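Because the parent passwords are unsalted MD5, reversing a common one is just a lookup, whether via Google or a precomputed table. A minimal sketch of that lookup follows; the word list and the helper names are illustrative assumptions, not a particular cracking tool. It also shows why the missing salt matters: every account using the same password has the identical hash, so one lookup exposes them all.

```python
import hashlib

def md5_hex(password: str) -> str:
    """Unsalted MD5, which is how the parent.csv passwords appear to be stored."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()

def build_lookup(candidates: list[str]) -> dict[str, str]:
    """Precompute hash -> plaintext once; each leaked hash is then a dictionary lookup."""
    return {md5_hex(word): word for word in candidates}

def crack(leaked_hashes: list[str], lookup: dict[str, str]) -> dict[str, str]:
    """Return every leaked hash whose plaintext is in the precomputed table."""
    return {h: lookup[h] for h in leaked_hashes if h in lookup}

# Illustrative word list; real attacks use lists of millions of common passwords.
lookup = build_lookup(["password", "abc", "letmein", "welcome81"])
print(crack(["900150983cd24fb0d6963f7d28e17f72"], lookup))  # RFC 1321 test vector for "abc"
```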

Normally this would be the end of the story when it comes to processing a data breach. I’d make the data searchable on HIBP, notify impacted subscribers and that would be it. But it’s a different story this time and it’s because of those member CSV files. Let’s take a closer look.

Children’s data

This will all make more sense if you watch the VTech Kid Connect: Getting Started Tutorial first. Just take a few minutes to understand the workflow when you first set up one of their tablets:

As you watch the video, you’ll see the Learning Lodge appear at the 1 minute mark. You would have seen this mentioned earlier on in the feedback I got from impacted customers, and you can also see a browser-based version of it.

At about the 1 minute 40 mark you’ll see where a child can be set up with a Kid Connect ID:

It then goes on to show how to create a parent account per the earlier image, then a little bit later shows how to create an avatar by taking a photo of your child. This is all consistent with the data that then appears in the master_account.sql file, which has the following columns:

id
username
domain
ll_child_id
ll_parent_id
parent_id
country_lang
create_datetime
expired_datetime

This is a self-referencing table; that is, it has keys that refer back to itself. What this means is that you get one record like this for the parent, including their email address with a URL-encoded @ symbol (%40):

215836, 'foo%40bar.com', 'kc-im2.vtechda.com', 0, 2700413, 2700413, 'USeng', '2013-12-25 13:55:21', NULL

And then a child record which references it like so:

215841, 'LittleJohnny', 'kc-im2.vtechda.com', 3974296, 0, 2700413, 'USeng', '2013-12-25 13:55:23', NULL

You can see how the parent_id of the child – 2700413 – relates back to the parent’s record, which carries that same value in its ll_parent_id and parent_id columns. This is not just a relational parent / child structure; it’s a literal, real-world biological implementation of one. The parent’s record can then be pulled from the parent CSV file I mentioned earlier on and contains all the values such as address, password and security questions.
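To make that relationship concrete, here’s a rough sketch that parses the two sample master_account rows quoted above (using the column order from the heading list), decodes the %40 in the parent’s username and groups each child under the record sharing its parent_id. The naive parser and the rule that a parent row has ll_child_id of 0 are assumptions inferred from these two rows only; so is the idea that 2700413 would be the key into parent.csv.

```python
from typing import Optional
from urllib.parse import unquote

# Column order taken from the master_account heading list above.
COLUMNS = ["id", "username", "domain", "ll_child_id", "ll_parent_id",
           "parent_id", "country_lang", "create_datetime", "expired_datetime"]

# The two example rows quoted above.
ROWS = [
    "215836, 'foo%40bar.com', 'kc-im2.vtechda.com', 0, 2700413, 2700413, 'USeng', '2013-12-25 13:55:21', NULL",
    "215841, 'LittleJohnny', 'kc-im2.vtechda.com', 3974296, 0, 2700413, 'USeng', '2013-12-25 13:55:23', NULL",
]

def parse_row(line: str) -> dict[str, Optional[str]]:
    """Naively split a SQL VALUES tuple; fine for these rows, not for commas inside strings."""
    values = []
    for raw in line.split(","):
        raw = raw.strip()
        values.append(None if raw == "NULL" else raw.strip("'"))
    return dict(zip(COLUMNS, values))

def link_children(records: list[dict[str, Optional[str]]]) -> dict[str, list[str]]:
    """Group child usernames under their parent's decoded email via the shared parent_id."""
    # Assumption inferred from the sample rows: parent records have ll_child_id 0.
    parents = {r["parent_id"]: unquote(r["username"])
               for r in records if r["ll_child_id"] == "0"}
    families: dict[str, list[str]] = {email: [] for email in parents.values()}
    for r in records:
        if r["ll_child_id"] != "0" and r["parent_id"] in parents:
            families[parents[r["parent_id"]]].append(r["username"])
    return families

records = [parse_row(line) for line in ROWS]
print(link_children(records))  # {'foo@bar.com': ['LittleJohnny']}
```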

So that establishes parents and kids, but the latter doesn’t have much data here, only a parent-created username which (fortunately) doesn’t tell you much about them. However, those member CSV files are altogether more worrying. Here’s what’s in them:

id
created_datetime
updated_datetime
parent_id
login_name
password
first_name
dob
product_code
is_avatar_created
account_level
gender
expiry_date
registration_url

There are 227,622 records in those five CSV files and yes, those columns are exactly what they look like – names, birth dates and genders, among other things. But where did they come from? I mean they’re not in the earlier Learning Lodge video, what other assets might VTech have that could collect this sort of data? The answer lies in the registration_url column and it includes these values:

www.planetvtech.com
www.lumibeauxreves.com
www.planetvtech.fr
www.vsmilelink.com
www.planetvtech.de
www.planetvtech.co.uk
www.planetvtech.es
www.proyectorvtech.es
www.sleepybearlullabytime.com
de.vsmilelink.com
fr.vsmilelink.com
uk.vsmilelink.com
es.vsmilelink.com

That’s ordered from the most popular down, with 77k instances of www.planetvtech.com down to only 41 examples of es.vsmilelink.com. Each of these is hosted on the same IP address and has the same fundamental implementation: a single Flash file embedded in an ASP.NET page. Clearly they cater for different demographics in terms of kids’ interests and language preferences. Here’s how the most popular one looks (incidentally, all entries in the data breach for this domain were in a file called memberuk.csv which appears to tie back to the United Kingdom):

No, I didn’t make up the cyber spies bit! Each of the implementations above provides the ability for parents to register on the site:

After registering, there’s a confirmation email and then the ability to start adding kids:

Now we see name, birth date and gender, the sensitive personally identifiable fields consistent with what was disclosed in the leak. It goes on to request an optional “Citizen ID Registration”, which I don’t have, but the pattern matches the product_code column in the members file from the data dump (60% of the records have a null value in this column). A bit of Googling suggests that this is available for purchasers of VTech products:

This is an important thing to note; products such as the Cyber Rocket are physical products (like the InnoTab at the start of this blog post), which then encourage the creation of digital accounts. This creates a relationship between physical and digital assets by virtue of the Citizen ID.

Further verification that the kids’ data in the form above is what goes into the member tables comes by looking at the ID the new record is assigned. Here’s the response after logging in:

I’m returned an internal identifier of 130,460. The last ID in the dumped member data is 130,446 and it was created on Nov 16, about ten days before the time of writing. The sequence checks out and remember, that table has the registration_url column stating that most of the records came from www.planetvtech.com. The point is that it establishes a high degree of confidence as to where the data in the breach may have originally been entered.

Now here’s where I need to be intentionally vague because despite their assurances that their system is now secure, they still have gaping holes that allow every kid to be matched with every parent. The details of this have been passed on to VTech and I’ll say this much here: there’s no simple fix. The flaws are fundamental and the recommendation I’ve passed on is to take it offline ASAP until they can fix it properly. You just can’t take chances with other people’s data in this way, especially not when they’re kids.

The average age of kids when their account was created is just 5 years old. They have the sorts of login names you’d expect a parent to give their children; affectionate “pet names” in many cases. The kids are almost precisely split between girls and boys and not only has their data already been leaked in this breach, it remains at serious risk due to the implementation of the site.
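Figures like the average age at account creation, the gender split, the per-site counts and the share of null product_code values are simple aggregations over the member columns listed earlier. A sketch follows, assuming dob is stored as YYYY-MM-DD and created_datetime as YYYY-MM-DD HH:MM:SS (the dump’s real formats aren’t shown here); the helper names and the test rows are hypothetical, not real data.

```python
from collections import Counter
from datetime import date, datetime
from statistics import mean

def age_at(dob: date, when: date) -> int:
    """Completed years between a birth date and a later date."""
    had_birthday = (when.month, when.day) >= (dob.month, dob.day)
    return when.year - dob.year - (0 if had_birthday else 1)

def summarise(members: list[dict[str, str]]) -> dict:
    """Aggregate the member-file columns discussed above."""
    ages = [
        age_at(date.fromisoformat(m["dob"]),
               # Assumed timestamp format; adjust to the real created_datetime values.
               datetime.strptime(m["created_datetime"], "%Y-%m-%d %H:%M:%S").date())
        for m in members if m.get("dob")
    ]
    nulls = sum(1 for m in members if not m.get("product_code"))
    return {
        "average_age": mean(ages) if ages else None,
        "genders": Counter(m.get("gender") for m in members),
        "sites": Counter(m.get("registration_url") for m in members).most_common(),
        "null_product_code_share": nulls / len(members) if members else None,
    }
```

Counting completed years (rather than subtracting years) avoids overstating the age of a child whose birthday hadn’t yet come round when the account was created.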

Major security failings on VTech’s behalf

Let me caveat what follows by saying that everything I’m about to share is publicly observable when the systems are used in their intended way. On the one hand, that means it’s information anyone can easily obtain; on the other, it means VTech could have readily identified a whole raft of flaws themselves if only they’d looked.

For example, there is no SSL anywhere. All communications are over unencrypted connections, including when passwords, parents’ details and sensitive information about kids are transmitted. These days, we’re well beyond the point of arguing this is ok – it’s not. Those passwords will match many of the parents’ other accounts and they deserve to be properly protected in transit.

Of course, once the passwords hit the database we know they’re protected with nothing more than a straight MD5 hash, which is so close to useless for anything but very strong passwords (which people rarely create) that they may as well have not even bothered. The kids’ passwords are just plain text, but you could almost argue this is ok insofar as it’s not exactly going to open their eBay account or get attackers into their Gmail. I’m sure some people will vehemently disagree with me on that and insist they should be strongly hashed as well; my point is simply that whilst they would be easy to protect properly, they don’t present the same risk profile as the passwords of their parents.
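By way of contrast, here’s a sketch of salted, deliberately slow password storage using PBKDF2 from Python’s standard library. PBKDF2 is a swapped-in choice for illustration (bcrypt, scrypt and Argon2 are common alternatives), the storage format is a hypothetical one, and the iteration count is an assumption you’d tune for your hardware. With a random per-user salt, two parents who both chose “welcome81” no longer share a hash, and a precomputed lookup of common passwords stops working.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 600_000  # assumption: tune so a single hash takes a noticeable fraction of a second

def hash_password(password: str) -> str:
    """Return 'iterations$salt_hex$hash_hex' using PBKDF2-HMAC-SHA256 with a random salt."""
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return f"{ITERATIONS}${salt.hex()}${digest.hex()}"

def verify_password(password: str, stored: str) -> bool:
    """Recompute with the stored salt and iteration count, then compare in constant time."""
    iterations, salt_hex, hash_hex = stored.split("$")
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                                 bytes.fromhex(salt_hex), int(iterations))
    return hmac.compare_digest(digest.hex(), hash_hex)
```

Storing the iteration count alongside the salt lets it be raised later without invalidating existing records.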

Lack of cryptographic protection for sensitive data is yet another example of where it’s all gone wrong. Those security question and answer pairs are irrevocable pieces of personal information used to establish identity in all sorts of different places. By comparison, have a look at how Patreon handled personal data in my recent post on extortion; addresses, birth dates and other personal info were all encrypted so that when the worst-case scenario did eventuate, it wasn’t as bad as it could have been.

Then there are some genuinely alarming practices. For example, here’s the response after I got the password wrong when logging into the LittleMary account I created earlier on planetvtech.com:

Why they’re returning a SQL statement is absolutely beyond me. Lorenzo was told by the person that provided him with the data that the initial point of compromise was due to a SQL injection attack and even without seeing the behaviour above, I would have agreed that was the most likely attack vector. On seeing the haphazard way that internal database objects and queries are returned to the user, I’ve no doubt in my mind that SQL injection flaws would be rampant.
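To illustrate why that behaviour points so strongly at SQL injection, here’s a minimal sketch contrasting a login query built by string formatting with a parameterised one, using an in-memory SQLite database. The table, columns and credentials are hypothetical rather than VTech’s schema, and the password is stored in plain text here purely to keep the sketch short (much as the kids’ passwords were).

```python
import sqlite3

def make_db() -> sqlite3.Connection:
    """Hypothetical members table with one account."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE members (login_name TEXT, password TEXT)")
    conn.execute("INSERT INTO members VALUES ('LittleMary', 'secret')")
    return conn

def login_vulnerable(conn: sqlite3.Connection, name: str, password: str) -> bool:
    # DON'T: input is pasted into the SQL text, so a quote in it rewrites the query.
    sql = f"SELECT 1 FROM members WHERE login_name = '{name}' AND password = '{password}'"
    return conn.execute(sql).fetchone() is not None

def login_safe(conn: sqlite3.Connection, name: str, password: str) -> bool:
    # Placeholders keep input as data; it is never parsed as SQL.
    sql = "SELECT 1 FROM members WHERE login_name = ? AND password = ?"
    return conn.execute(sql, (name, password)).fetchone() is not None

conn = make_db()
print(login_vulnerable(conn, "LittleMary", "' OR '1'='1"))  # True: logged in without the password
print(login_safe(conn, "LittleMary", "' OR '1'='1"))        # False
```

A failed login should return a generic result like this `False`, never the query text itself.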

The other rampant practice that’s increasingly frowned upon in the security space is the extensive use of Flash. It appears consistently throughout their online assets and whilst it would have made sense many years ago, the continuous stream of security vulnerabilities it’s presented, coupled with the lack of mobile support and the ready availability of alternative rich UIs via HTML5, makes such a dependency an increasingly rare thing to see. There’s a sense of “systems from a bygone era” throughout their assets, not just with the Flash dependency but things like the site still reporting ASP.NET 2.0, which was superseded almost six years ago now (.NET 3 or 3.5 will still report as 2.0 based on the X-AspNet-Version header), and the extensive use of WCF and SOAP services. That’s not to say all of this is security gone bad, more that you get the distinct sense VTech’s assets were created a long time ago and then just… left there.

Summary

I’ve got two little kids and as a father, this really made me think about the footprints I’ll make for them online. I personally have a mixed reaction to this event; I’m upset that someone would seek to take this class of data from a system, yet on the other hand, the data seems to have been very closely held and I hope it stays that way. But what really disappoints me is the total lack of care shown by VTech in securing this data. It’s taken me not much more than a cursory review of publicly observable behaviours to identify serious shortcomings that not only appear as though they could be easily exploited but evidently have been. Despite the frequency of these incidents, companies are just not getting the message; taking security seriously is something you need to do before a data breach, not something you say afterwards to placate people.

All 4.8M parents are now searchable in HIBP. The children aren’t, but I suspect this will be the first of many times their data will be breached, dumped and traded online.

Update 1: It now appears as though head shots of kids and private chat messages were also exposed. Lorenzo has a full update in his story titled Hacker Obtained Childrens' Headshots and Chatlogs From Toymaker VTech.

Update 2: VTech have issued a subsequent statement and advised that the true number of children exposed in the incident is over 6.3 million. Wow.
