Troy Hunt: The Tesco hack – here’s how it (probably) happened

As prophesied, it has happened – Tesco has had a serious security incident. The prophecy, for new readers, was my piece on Lessons in website security anti-patterns by Tesco from a couple of years back. The catalyst for that post was this now infamous tweet in response to my pointing out that they had mixed content on an otherwise secure page:

At the time, they had a whole raft of issues ranging from insecure cookies to security misconfiguration leaking internal implementation info to very dated frameworks to a whole bunch of password security craziness. They were also very quiet when it came to actually talking about and addressing the issues, much to the ire of customers who wanted to see some proactivity. It all ended up with them in front of the ICO and whilst no fine was issued, it was clear the information watchdog in the UK wasn’t real impressed with the situation.

Fast forward a year and a half and now we have this – 2,239 Tesco usernames, passwords and loyalty card balances on public display on Pastebin. Now I’m the first person to be sceptical about anything that appears on Pastebin, especially when the “leak” has no attribution and none of the usual song and dance we see from the hacktivists that are usually the ones to publicly leak this sort of data. But then a number of tweets started popping up from people who tested and verified the credentials on Tesco’s website. They worked.

The final verification came via the BBC story on Thousands hit in Tesco.com attack:

All those contacted said their login details were correct and one added the attackers had used them to steal store vouchers.

The story goes on to explain how Tesco is now contacting impacted customers and it looks like they may be compensating them where there has been a loss of reward points or credits or whatever it is the loyalty card gives them. This now becomes a rather serious issue as there’s a financial impact.

The point is that these are indeed genuine Tesco accounts and as such, they’re now searchable on Have I been pwned?

But of course it also raises a very pertinent question – how did they get pwned? There’s no attribution or disclosure of attack vector, so what was it? And perhaps more tellingly, why are there only a couple of thousand accounts when inevitably Tesco has hundreds of thousands if not many millions of accounts?

I can see multiple ways in which this could have happened due to their lax security, so let’s take a look at some of the ways Tesco accounts could have very easily been compromised due to ongoing security weaknesses.

Were they really hacked?

As this story unfolds, you’ll see lots of contention due to this:

It is thought the list was drawn up by attackers who combed through data stolen in other high-profile security breaches.
Password and email combinations seen in those large breaches were then tried on the Tesco site and resulted in 2,239 hits where the same credentials were used.

As you’ll read towards the end of this post, this is unlikely based on the analysis I’ve done with other “high-profile security breaches”. Assuming this was the case and the couple of thousand accounts we’re seeing weren’t so much pulled from Tesco directly (i.e. by a SQL injection attack) but rather they were verified with Tesco by the attackers simply checking if the credentials worked, this puts them in the clear, right? I mean it’s an attack on an unrelated website which then exploited the fact that customers reused their credentials, that can’t be Tesco’s fault, can it?

The problem is that Tesco’s security profile makes this sort of attack simple. Their approach to security provides numerous avenues for attackers to easily verify the existence of accounts and then establish their passwords. Let me walk you through the issues and do some educated speculation on how this breach may have occurred.

Account disclosure and enumeration

Assuming the attackers needed to pull accounts one by one and they didn’t simply grab the lot in one fell swoop, one of the first things they’d want to do is establish if an account exists on the site. For example, they may collate a list of email addresses from various sources (including previous breaches) and then check for their existence on Tesco’s website.

One way of doing this would be to attempt to login with an account and any random password then see if the site discloses the existence of the account by saying something like “The password for that account is invalid” or “That account does not exist” as opposed to a more generic response that leaks nothing. Of course this is a process that can (usually) be automated so it’s by no means a laborious one.

Fortunately, Tesco got this right:

Generic message saying username and / or password is incorrect

Note the wording: “the email and/or password”. This is good as it doesn’t disclose the existence of the account. That wording doesn’t happen by accident either, clearly it has been phrased so as not to leak the existence (or non-existence) of the account.

But then this happens:

Message on the password reset saying the account does not exist

And there’s your disclosure. The good efforts invested in the login message are entirely undone by the message on the password reset page. I talk about this exact problem in my post on Everything you ever wanted to know about building a secure password reset feature and describe how this message really needs to be generic and an email sent regardless of the existence of the account. Easy.
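To make the fix concrete, here's a minimal sketch of a reset handler that gives the same on-screen answer whether or not the account exists and queues an email either way. Everything in it is hypothetical: `ResetService`, the in-memory `registered` set and the `outbox` list stand in for whatever account store and mail system a real site would use, and none of it reflects how Tesco's site is actually built.

```python
from dataclasses import dataclass, field

# One message for every request, so the page never reveals account existence.
GENERIC_MESSAGE = (
    "If that address is registered, we've sent it an email with reset instructions."
)


@dataclass
class ResetService:
    """Hypothetical in-memory stand-in for an account store and a mail queue."""

    registered: set[str]  # assumed to hold lower-cased email addresses
    outbox: list[tuple[str, str]] = field(default_factory=list)

    def request_reset(self, email: str) -> str:
        address = email.strip().lower()
        if address in self.registered:
            # Known account: queue the real reset email.
            self.outbox.append((address, "reset-link"))
        else:
            # Unknown account: still queue an email, telling the owner of the
            # address that no account exists for it.
            self.outbox.append((address, "no-account-notice"))
        # The web response is identical in both branches.
        return GENERIC_MESSAGE
```

The only party who learns whether the account exists is whoever controls the mailbox, which is exactly who should know.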

So that’s the first piece done – establishing the existence of an account is made easy, now let’s look at discovering the password.

Lack of brute-force protection

One of the simplest ways to “extract” the password from a system is simply to ask it what the password is. What I mean by this is that you can simply say “The account I’m interested in is foo@bar.com, is the password P@ssw0rd?” to which the system will either say “yes” or “no”. It does this implicitly by either providing the message we saw earlier on when I tested an invalid account (except by now the attacker knows it’s valid because of the test via the password reset feature), or by logging the person in.

Now, an attacker does not sit there and simply test a password one by one, rather they “brute force” it by testing many passwords in quick succession using automation. If they can test enough passwords fast enough, eventually one of them works. Back in November we saw an attack of this style against GitHub which was quite sophisticated and allegedly employed 40,000 machines in a botnet to hit the service from all sorts of different directions. Fortunately GitHub was smart enough to have defences in place to protect against a brute force attack so the damage was limited but even so, they strengthened their password policy as a precaution (more on that shortly).

The problem with Tesco is that their brute force protection is “lacking”. I’d be more inclined to say “missing altogether” if it weren’t for this statement on their login page:

For security reasons there is a limit to the number of times incorrect details can be entered before your account is locked

Oh good, so we’re ok then? Let’s test that theory with my own account:

I don’t know what their “limit” is, but there’s 20 consecutive failed login attempts against my account with no lockout. I’m still able to successfully log in immediately after all those failed attempts. Put that in a hacker context and someone has just been able to hammer the service with random passwords until one finally works. This is a very well-known means of attack and much has been written about it, notably on sources such as OWASP’s Blocking Brute Force Attacks. They talk about mitigations such as account lockout (which has its own issues), using CAPTCHAs and a variety of other more intelligent countermeasures.

Personally, I’d favour something along the lines of what I talk about in my Hack Yourself First Pluralsight Course – gradually degrade the performance of the login on each subsequent failed attempt. Certainly don’t just allow the service to be continually hammered until there’s a hit.
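As a rough illustration of that idea, here's a minimal sketch that doubles the wait before each new attempt against an account once failures start piling up. The doubling schedule, the `LoginThrottle` name and the `base_delay` and `max_delay` values are all assumptions chosen for the example; a real deployment would tune them and keep the counters somewhere more durable than a dictionary.

```python
from collections import defaultdict


class LoginThrottle:
    """Hypothetical per-account throttle: each failure doubles the next delay."""

    def __init__(self, base_delay: float = 0.5, max_delay: float = 60.0) -> None:
        self.base_delay = base_delay  # seconds after the first failure (assumed)
        self.max_delay = max_delay    # cap so the delay never grows unbounded
        self.failures: dict[str, int] = defaultdict(int)

    def delay_for(self, account: str) -> float:
        """Seconds the server should wait before processing the next attempt."""
        count = self.failures.get(account, 0)
        if count == 0:
            return 0.0
        return min(self.base_delay * 2 ** (count - 1), self.max_delay)

    def record_failure(self, account: str) -> None:
        self.failures[account] += 1

    def record_success(self, account: str) -> None:
        # A successful login clears the penalty for that account.
        self.failures.pop(account, None)
```

With these numbers, the 20 failed attempts I made would have hit the 60 second cap well before the end, which turns a quick automated run into a very slow one without locking the legitimate owner out for good.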

I don’t know what brute force protection Tesco has in place (if there’s really any), but clearly it’s lacking and this is a prime suspect for how accounts might have been breached. But there’s another factor that significantly increases the risk and that’s a lousy password strength policy. Let me show you what I mean.

Enforced weak credentials

If indeed the credentials for the leaked accounts were brute forced (and that does appear to be a likely scenario), the success of the attack depends heavily on the breadth of passwords that need to be tested for each account. The longer the password can be and the wider the range of characters it can draw on, the more possible passwords there are and the more an attacker needs to test.

I’ve written much on passwords in the past so I won’t rehash it here (password cryptography pun intended), but clearly limits on length and character range are extremely detrimental. The length should allow practically whatever a user can dream up (cap the field at a few hundred chars if you really like) and definitely not prohibit any characters. Unfortunately, this is how the password rules on Tesco still stand today:

This was one of the key points I called them on originally:

Oh dear, 10 characters. That’s it. Conventional wisdom – any wisdom – states that passwords should be long, random and unique, the more of each, the better (and please, don’t post that f****** XKCD horse battery comic as a counter-argument). So what’s going on at Tesco? Only 10? I’ll tell you what this says to me and it goes back to the password storage point earlier on: someone’s got themselves a varchar(10) under there somewhere and it’s all sitting in plain text. Of course I can only speculate, but the evidence does seem to suggest this on numerous fronts.

Keep in mind also that the guidance I wrote about at the time ignores the case of passwords:

Passwords must be 6 to 10 chars and not case sensitive

And that six character minimum which allows only a single character type? Absolutely ridiculous. All of this dramatically decreases the character space of passwords which in turn dramatically increases the likelihood of an account being brute forced. This practice almost certainly played a part in the breach if brute force was indeed involved.
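To put some numbers on that, here's a small sketch that counts the candidate passwords under each policy. The alphabets are assumptions: I'm treating the case-insensitive policy as 26 letters plus 10 digits, and comparing it against the 94 printable ASCII characters over the same 6 to 10 character range. The exact set of characters Tesco accepts isn't stated above, so take the figures as indicative only.

```python
import string


def keyspace(alphabet_size: int, min_len: int, max_len: int) -> int:
    """Count every possible password between min_len and max_len characters."""
    return sum(alphabet_size ** length for length in range(min_len, max_len + 1))


# Assumed alphabets (not confirmed by the site's own guidance).
CASE_INSENSITIVE = len(string.ascii_lowercase + string.digits)               # 36
PRINTABLE = len(string.ascii_letters + string.digits + string.punctuation)   # 94

restricted = keyspace(CASE_INSENSITIVE, 6, 10)   # 6-10 chars, case ignored
unrestricted = keyspace(PRINTABLE, 6, 10)        # same lengths, full alphabet
six_digits = keyspace(10, 6, 6)                  # a six-digit, numbers-only password

if __name__ == "__main__":
    print(f"case-insensitive, 6-10 chars: {restricted:,}")
    print(f"printable ASCII, 6-10 chars:  {unrestricted:,}")
    print(f"six digits only:              {six_digits:,}")
```

Under those assumptions, simply honouring case and symbols makes the search space more than ten thousand times larger before the length cap is even touched, while a six-digit password has only a million candidates.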

Remember how I mentioned GitHub strengthening their password approach? Try this on for size:

GitHub password recommendations

Whilst I’m not a fan of pass phrases and strongly believe in the role of a password manager, it’s a hell of a long way in front of Tesco’s approach. This is a very clearly laid out set of recommendations encouraging length and character variety. Consider this against some of the passwords from the Tesco breach:

190564
abcd1234
password
pickle

Seriously, how long do you think it’s going to take to brute force those?!

Attacking password reuse

The other very likely possibility in terms of how the accounts were exposed is that the attackers simply took any one of numerous publicly disclosed breaches where the passwords appeared in plain text and simply tested them on Tesco. As I’ve written before, password reuse is rampant (59% of accounts in that particular test) so indeed this would come as no surprise.

The thing is though, we (as software developers) know this – people are habitual creatures who reuse creds for all sorts of convenience reasons. But this is precisely why we need to protect against this risk and we need to do so by implementing basic practices such as enforcing a workable minimum criteria and allowing for a decent range of length and characters in the password plus protecting against brute force attacks not just as I’ve described above, but even against a known list of credentials from another breach.

Look at it this way: Tesco is running a loyalty program that has a financial upside for attackers. It has already been attacked in the past and clearly there is something of financial value to evildoers. Under these circumstances, what possible reason would there be for allowing an attacker to login to the system thousands of times over and over again? Something is clearly missing.
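One piece of what's missing might look something like the sketch below: counting failed logins per source across all accounts within a sliding window, so that a client working through a list of credentials gets challenged or blocked. `SourceMonitor`, the thresholds and the idea of keying on a source address are all assumptions for illustration. As the GitHub botnet example shows, a distributed attack spreads itself over many sources, so this would only ever be one layer among several.

```python
from collections import defaultdict, deque


class SourceMonitor:
    """Hypothetical sliding-window counter of failed logins per source address."""

    def __init__(self, max_failures: int = 20, window_seconds: float = 300.0) -> None:
        self.max_failures = max_failures      # failures tolerated per window (assumed)
        self.window_seconds = window_seconds  # window length in seconds (assumed)
        self.events: dict[str, deque] = defaultdict(deque)

    def record_failure(self, source: str, now: float) -> bool:
        """Record a failed login and report whether the source looks abusive."""
        timestamps = self.events[source]
        timestamps.append(now)
        # Discard failures that have aged out of the window.
        while timestamps and now - timestamps[0] > self.window_seconds:
            timestamps.popleft()
        return len(timestamps) > self.max_failures
```

A `True` result doesn't have to mean an outright block; it could just as easily trigger a CAPTCHA or a slower response for that source.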

Other potential vectors

The attacks I’ve outlined above are the most likely vectors, but there are a number of other contenders that I wouldn’t rule out. For example, they had XSS issues I later wrote about which opens up a whole range of possibilities. If an attacker can execute arbitrary script in the browser by having a victim follow a malicious link then scraping credentials can be a real cinch.

Of course the other possibility you can’t rule out is that the passwords have indeed been pulled directly from their database. One of the most significant points in my original post was that credentials weren’t stored with the appropriate cryptography and indeed at the time, Tesco would actually email them to you. The bottom line is that there are means of retrieving plain text passwords which is alluded to by that terribly ambiguous tweet in the opening of this post. Since then they’ve stopped emailing passwords and moved on to resets which is good news, but have they actually fixed the underlying data storage? Are all the passwords now secured with a strong hashing algorithm? Or are they still either plain text or asymmetrically encrypted and just one tiny little SQL injection attack away from being exposed publicly? One can only speculate…
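For anyone wondering what "a strong hashing algorithm" looks like in practice, here's a minimal sketch using PBKDF2 from Python's standard library with a random per-user salt. The choice of PBKDF2-HMAC-SHA256, the 16 byte salt and the iteration count are assumptions for the example rather than a recommendation specific to Tesco; bcrypt, scrypt and Argon2 are other commonly used options.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

DEFAULT_ITERATIONS = 600_000  # assumed work factor; tune to your own hardware


def hash_password(
    password: str,
    salt: Optional[bytes] = None,
    iterations: int = DEFAULT_ITERATIONS,
) -> Tuple[bytes, bytes]:
    """Return (salt, digest); only these are stored, never the password itself."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt for each stored password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, digest


def verify_password(
    password: str,
    salt: bytes,
    expected: bytes,
    iterations: int = DEFAULT_ITERATIONS,
) -> bool:
    """Re-derive the digest from the attempt and compare in constant time."""
    _, digest = hash_password(password, salt, iterations)
    return hmac.compare_digest(digest, expected)
```

Note that the stored digest is always 32 bytes regardless of what the user typed, so there's no storage reason to cap a password at 10 characters, and there's nothing in the database that can be emailed back to the customer.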

Then there’s the web server – it’s still IIS 6. As I previously wrote in the original Tesco post, IIS 6 is well and truly out-dated, in fact it’s only a couple of months short of celebrating its 11th birthday along with the launch of Windows Server 2003. The OS is now in its twilight years with mainstream support having ended nearly four years ago and extended support dropping off next year. The web server has now been superseded four times and is really not where you want to be putting your valuable web software in the year 2014.

To that effect, the web server is still reporting .NET 2.0 as the underlying app version which now goes back to 2005, but of course it could actually be running on anything between 2.0 and 3.5 (released in 2007) given those later releases still report as 2.0 via the server headers. But it’s not reporting as 4.0 and that now goes back four years so they’re still running on software somewhere between seven and nine years old.
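The mapping from a reported version to the possible framework releases can be sketched as a simple lookup. I'm assuming here that the version is read from the `X-AspNet-Version` response header, which by default reports the CLR version (for example `2.0.50727`) rather than the framework release; `frameworks_for` and the table below are illustrative names, not part of any library.

```python
# CLR major.minor as reported in the header -> framework releases that run on it.
CLR_TO_FRAMEWORKS = {
    "2.0": ".NET Framework 2.0, 3.0 or 3.5",
    "4.0": ".NET Framework 4.0 or later",
}


def frameworks_for(headers: dict) -> str:
    """Interpret the ASP.NET version header from a dict of response headers."""
    # Header names are case-insensitive, so normalise before looking up.
    lowered = {name.lower(): value for name, value in headers.items()}
    version = lowered.get("x-aspnet-version")
    if version is None:
        return "not disclosed"
    major_minor = ".".join(version.split(".")[:2])
    return CLR_TO_FRAMEWORKS.get(major_minor, f"unrecognised ({version})")
```

The "not disclosed" branch is the one you actually want to see on a production site: these headers can be switched off, which is the security misconfiguration point from the original post.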

Why are IIS and ASP.NET versions important? OWASP frequently talks about keeping frameworks current and indeed newer versions of the technology address newer attack patterns and provide various improvements in the security of web applications both implicitly through improvements in the product and by making new facilities available to the developer. But regardless, when you see software running on a web server that’s so old it’s outlasted most family pets, you begin to question how much love and attention the whole thing gets.

Account reuse in Have I been pwned?

It’s always interesting to see how many of the accounts in a new breach are already in HIBP when I import the data. Here’s what I found when I pulled in the Tesco breach today:

Import result for HIBP showing 332 of 2,239 rows were "updated"

That’s about 15% of accounts “updated”, in other words, those email addresses already appeared in other breaches I had in the system. This is actually quite consistent with the figures I normally see in that between about 10% and 20% of accounts were re-pwned. Bad news for them, but it also tells us something else…

There’s plenty of speculation that these accounts were already breached in a system that wasn’t Tesco’s and that the attackers simply brute forced Tesco with those logins. If that’s the case, then these accounts didn’t come from any of the sources behind the more than 160 million records I already have in the system. Of course HIBP is not exhaustive and there are numerous breaches both known and unknown that could have been the original source of this incident, but what I can say for sure is that the data didn’t come from any of the biggies I’ve loaded in, otherwise that 15% would be a hell of a lot closer to 100%.

Moving forward…

2,239 accounts have been breached and are now floating around in the wild, so that’s it, right? Probably not.

What would concern me if I was in Tesco’s shoes is that clearly someone has a workable attack vector that’s exploiting their accounts. Whether they’re brute forcing accounts one by one or simply testing for reused credentials from other breaches, the fact remains that accounts have been compromised en masse. I would not for a moment assume that the extent of the damage is only a couple of thousand accounts, that’s almost certainly only the tip of the iceberg.

Many of the serious security problems that Tesco had in mid-2012 remain both in terms of discrete risks I called out (such as password strength), and as a cultural approach to security in general. There are still numerous easily observable risks discoverable simply by browsing the website; who knows what might lie beneath that and be readily discoverable with a little probing?
