Why No HTTPS? Here's the World's Largest Websites Not Redirecting Insecure Requests to HTTPS

As of today, Google begins shipping Chrome 68 which flags all sites served over the HTTP scheme as being "not secure". This is because the connection is, well, not secure so it seems like a fairly reasonable thing to say! We've known this has been coming for a long time now both through observing the changes in the industry and Google specifically saying "this is coming". Yet somehow, we've arrived at today with a sizable chunk of the web still serving traffic insecurely:

Who are these people?! After all the advance warnings combined with all we know to be bad about serving even static sites over HTTP, what sort of sites are left that are neglecting such a fundamental security and privacy basic? I wanted to find out which is why today, in conjunction with Scott Helme, we're launching Why No HTTPS? You can find it over at WhyNoHTTPS.com (served over HTTPS, of course), and it's a who's who of the world's biggest websites not redirecting insecure traffic to the secure scheme:

The World's Most Popular Websites Loaded Insecurely

And just to keep it interesting for local players, we've also broken it down by country:

Australia - The Most Popular Websites Loaded Insecurely

The site is live and you can go and check it out now. But stick around and read more if you're interested in how we put this together because what first seemed like a good way to spend an afternoon just wiped out the better part of my last week!

Data Source

For the last few years, Scott has been continuously crawling the Alexa Top Million websites and publishing 6-monthly reports on their security things. One of those things is their use of HTTPS or more specifically, whether the site redirects any insecure requests to the secure scheme. Back in Feb when he last collated the figures, just over 38% of the world's largest sites did so, and that number was almost a third larger than it had been just 6 months earlier. Continuing that growth rate would take the number to something similar to Cloudflare's above as of now.
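To make "redirects insecure requests to the secure scheme" concrete, here's a minimal Python sketch of that kind of check. This is not Scott's crawler; the function names (`fetch_once`, `redirects_to_https`), the 10-hop limit and the choice of a plain GET are all assumptions made purely for illustration.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}


def fetch_once(url, timeout=10):
    """Make one GET request without following redirects.

    Returns (status_code, Location header or None).
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def redirects_to_https(domain, fetch=fetch_once, max_hops=10):
    """Start at http://<domain>/ and follow redirects.

    True as soon as a redirect points at an https:// URL; False if a
    non-redirect response is served while still on plain HTTP, or if
    the hop limit (an arbitrary assumption) runs out.
    """
    url = f"http://{domain}/"
    for _ in range(max_hops):
        status, location = fetch(url)
        if status not in REDIRECT_CODES or not location:
            return False  # content was served over the insecure scheme
        url = urljoin(url, location)  # Location may be relative
        if urlsplit(url).scheme == "https":
            return True
    return False
```

Calling `redirects_to_https("example.com")` hits the network from wherever you happen to be sitting, which matters a great deal for the results later in this post. The `fetch` parameter is only there so the logic can be exercised with canned responses.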

Because he's the sharing kind of guy, Scott also just released all the data from his crawler via a new site at crawler.ninja:

crawler.ninja

If people want to go and pull the same data as we've used here, go and check out what Scott's published. So this should all be easy, right? All I had to do was stand a curated version of Scott's list up on a new website and slice it up by country...

Resolving Domain Names to Countries is Hard

In my naivety, the plan was simple: get Scott's list, get a list of which country each domain belongs to, join it all up and it's job done. Ha! No.

I started by setting the domain's country per Alexa's definition. Obviously Baidu is China and Fedex is the US, no problems there. But what about Cambridge University Press? Apparently, it's Chinese:

cambridge.org in Alexa

How about the Daily Mail? American!

dailymail.co.uk in Alexa

The domain literally has the UK TLD in it yet somehow, it's American. Mind you, when I browse to dailymail.co.uk I end up with a very localised Australian version:

Daily Mail in Australia

And over here, their domain resolves to 104.117.181.234 which is an Akamai IP in the US but if I VPN into London, it begins resolving to 23.54.158.17 which is an Akamai IP in the UK. In short, I totally get how attributing a domain name back to a country of origin can go wrong.

Supplementing the Alexa results, I also pulled in a heap of other info about domains, IPs and the countries they resolve to (big thanks to "the crowd" that assisted with this). I used that to help fill in a bunch of the gaps left after the Alexa data was pulled in.

The reason for these odd attributions starts to make more sense when you consider how Alexa defines the country:

The rank by country is calculated using a combination of average daily visitors to this site and pageviews on this site from users from that country over the past month. The site with the highest combination of visitors and pageviews is ranked #1 in that country.

So maybe there's just lots of Chinese folks interested in reading Cambridge uni content and more people from America reading the Daily Mail on the .uk TLD than there are Brits. Mind you, Have I Been Pwned is flagged as American and if I'm honest, I don't know what country it should be flagged as. Australian because I run it? American because it's hosted in the US? Or American because that's the single largest country in terms of audience volumes? Other? This is far from an exact science and there are many ambiguous cases, but certainly the Cambridge and Daily Mail ones seem clear to me.

Then there are also TLDs, and the Daily Mail situation above is a perfect example of where the last part of the domain name makes the origin pretty clear. Shouldn't .uk domains always "belong" to the UK? I'm sure there are cases where they actually shouldn't, but in terms of attributing websites to countries it seemed like a pretty fair idea. It sounds like it makes sense, until you get to a country like Tuvalu. What's so special about this little island nation with a population of only 10k people? Their country TLD is .tv which means you get sites like cima4u.tv in Egypt, 33sk.tv in Saudi Arabia and zimuzu.tv in China all using the little Polynesian island's TLD because hey, ".tv" sounds kinda cool, right?

Ultimately, I decided to manually fix a few cases which were obviously wrong and towards the top of the list, but then default primarily to Alexa's country definition followed by country TLDs where there wasn't any other data available. There will be many wrong entries, mostly for little countries like Tuvalu with cool domains, and I welcome any improvements people can help with on that front (ideally a list of domains mapped to 2-char country codes!).
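That order of precedence (manual fixes first, then Alexa, then the country TLD) can be sketched roughly as below. The function name, the dictionary shapes and every sample value are hypothetical and only there to illustrate the fallback order; the supplementary crowd-sourced data mentioned earlier is left out for brevity.

```python
def attribute_country(domain, manual_fixes, alexa_country, cctld_country):
    """Pick a 2-char country code for a domain.

    Precedence: manual fix, then Alexa's country, then the country TLD.
    Returns None when none of the three sources has an answer.
    """
    domain = domain.lower()
    if domain in manual_fixes:
        return manual_fixes[domain]
    if domain in alexa_country:
        return alexa_country[domain]
    tld = domain.rsplit(".", 1)[-1]
    return cctld_country.get(tld)


# Illustrative sample data only, not the real data set.
manual_fixes = {"dailymail.co.uk": "GB"}
alexa_country = {"dailymail.co.uk": "US", "baidu.com": "CN"}
cctld_country = {"uk": "GB", "au": "AU", "tv": "TV"}

print(attribute_country("dailymail.co.uk", manual_fixes, alexa_country, cctld_country))
print(attribute_country("zimuzu.tv", manual_fixes, alexa_country, cctld_country))
```

With this sample data the second call falls all the way through to the TLD and lands on Tuvalu, which is exactly the kind of wrong entry described above.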

But hey, so long as the site is counted, does it really matter too much which country it's counted against? I mean at least it's clear that it's enabling insecure communications, right? Uh, no, this is where things get really weird...

Varying Results Across the Globe

Depending on where you are in the world, visiting a website might get you different results. I'm sure most of you know that already, but we weren't prepared for just how weird things can get and believe me, we've seen some very weird stuff before!

It all started when I was reviewing the Aussie list and saw the Australian Government Department of Home Affairs towards the top of the pile. I'm thinking "crikey, this'll get them some headlines", and then I loaded the site:

Australian Home Affairs Website Loaded Over HTTPS

So clearly Scott just isn't computering very well, right? But when I called him on it he presented the evidence that caused his crawler to identify the site as not redirecting to HTTPS:

Home Affairs over HTTP

The title tag says it all - he got a maintenance page which didn't redirect to a secure connection. Now keep in mind that we were both testing this at precisely the same time so what we're seeing here is a vastly different experience depending on where in the world you are.

Now here's the challenge - in a case like this, should Home Affairs be flagged as not secure and included on our website? It might redirect for me but if it doesn't for some people in other parts of the world, has HTTPS been done "properly"? In the case of Home Affairs, they don't have HSTS enabled either which means people could genuinely be MitM'd when going to the site in that situation.

We went back and forth on this and in the end, we decided the most useful thing to do was to re-scan every site in this report from my end and if I see it redirecting to HTTPS, drop it from the list. Problem solved, right? No, things just got weirder.

For example, here's Scott loading the NVIDIA site from his home in the UK:

NVIDIA - Scott

And here's me loading it from Australia:

NVIDIA - Troy

And here's a government site for teachers in Bangladesh:

teachers.gov.bd

And here it is again:

www.teachers.gov.bd

The difference? Other than the obvious HTTPS situation, the "naked" domain doesn't redirect to HTTPS but the www version does. Let's get weirder still and I'll share Scott's words exactly:

Scott talking about teachers.gov.bd

OMFG WTF indeed. That site actually has an HSTS header too but without it being pre-loaded (which it's not), you need to actually get through to the site in the first place for it to stick.
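To illustrate why the header alone doesn't help on a first visit, here's a rough Python sketch of how a client might treat a Strict-Transport-Security header. The function names are made up for this example and the parsing is deliberately simplified; the one rule that matters is that a policy is only honoured when it arrives over HTTPS, so a visitor who never gets past plain HTTP never picks it up.

```python
def parse_hsts(header_value):
    """Parse a Strict-Transport-Security header value (simplified)."""
    policy = {"max_age": None, "include_subdomains": False, "preload": False}
    for directive in header_value.split(";"):
        name, _, value = directive.strip().partition("=")
        name = name.strip().lower()
        value = value.strip().strip('"')
        if name == "max-age" and value.isdigit():
            policy["max_age"] = int(value)
        elif name == "includesubdomains":
            policy["include_subdomains"] = True
        elif name == "preload":
            # Only signals consent to be submitted for preloading; it does
            # not mean the site is actually on a browser's preload list.
            policy["preload"] = True
    return policy


def hsts_policy_from_response(scheme, headers):
    """Return the HSTS policy a client would store, or None.

    The header is ignored when received over plain HTTP, and when it
    lacks a usable max-age.
    """
    if scheme != "https":
        return None
    lowered = {key.lower(): value for key, value in headers.items()}
    value = lowered.get("strict-transport-security")
    if value is None:
        return None
    policy = parse_hsts(value)
    return policy if policy["max_age"] is not None else None


headers = {"Strict-Transport-Security": "max-age=31536000; includeSubDomains"}
print(hsts_policy_from_response("http", headers))   # ignored over HTTP
print(hsts_policy_from_response("https", headers))  # stored over HTTPS
```

A real client does more than this (a max-age of 0 removes a stored policy, for example), but the sketch covers the point made above: without preloading, the protection only kicks in after one successful HTTPS response.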

So I took the top 100 sites for each country I'd identified and re-scanned them all from my end. In total, this meant 12,363 separate domains and only 56 of them redirected to HTTPS. Those domains are as follows:

9now.com.au, ajhackett.com, and6.com, asarar.com, bmcargo.com, bmoinvestorline.com, businessinsider.com, camelia.lt, canaldigital.no, discountsqatar.com, esf.edu.hk, expressodasilhas.cv, fb-killa.pro, fileagram.com, funsaber.net, gorila.sk, hfiles.xyz, homeaffairs.gov.au, homebook.pl, immi.gov.au, jobkorea.co.kr, jobomas.com, jobtome.com, just-eat.ca, just-eat.dk, just-eat.ie, justeat.it, just-eat.no, k-citymarket.fi, kongregate.com, lu.se, lundi.am, mm1ink.net, motc.gov.mm, musictri.be, newsghana.com.gh, nvidia.com, oi.com.br, patch.com, pennsy.cm, piluli.kharkov.ua, pis.sk, pro24h.info, rabodirect.co.nz, registronacional.com, so.ch, softbank.jp, teachers.gov.bd, tiscali.it, tvplusnewtabsearch.com, uio.no, vodafone.it, williamhill.com, williamhill.es, windguru.cz, zeris-nuclis.pt

So here's the bottom line: these sites are all on WhyNoHTTPS.com because even though under some circumstances they're redirecting, under others they're not. Now mind you, that list of 56 sites represents 0.45% of the 12,363 I scanned so even whilst they're anomalous, they have very little bearing on the overall objective of the project. And just to be clear, there'll also be sites not on the list which still operate over non-secure connections, they just weren't doing it when Scott's crawler hit them. This is not an exact science, folks.
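For anyone wanting to reproduce the slicing, picking the top 100 sites per country and working out that 0.45% figure might look something like the sketch below. The tuple layout and function names are assumptions for illustration, not how the site is actually built.

```python
from collections import defaultdict


def top_n_per_country(sites, n=100):
    """Group (rank, domain, country) tuples by country and keep the n
    best-ranked (lowest rank number) domains for each."""
    by_country = defaultdict(list)
    for rank, domain, country in sites:
        by_country[country].append((rank, domain))
    return {
        country: [domain for _, domain in sorted(entries)[:n]]
        for country, entries in by_country.items()
    }


def share_percent(part, total):
    """Percentage of total represented by part, to 2 decimal places."""
    return round(100 * part / total, 2)


# The two figures quoted above: 56 redirecting domains out of 12,363 scanned.
print(share_percent(56, 12363))
```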

I was honestly on the fence about how to handle these and I felt, for example, that the Home Affairs website shouldn't be there because the response seems exceptional, but the NVIDIA site definitely should as the entire browsing experience Scott has in the UK is over an insecure connection. I don't even know what I think about the Bangladeshi teachers site beyond echoing Scott's OMFG WTF sentiment. Eventually, we decided that the fairest and most manageable thing to do was to depend solely on Scott's crawler. If a site isn't doing HTTPS consistently right, then it may end up on the list and if you're responsible for one of those, the best way to get it off the list is to always redirect insecure requests to secure ones under all circumstances.
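If you're responsible for one of those sites, "under all circumstances" includes every hostname people might type in. A small sketch of checking both the naked and www forms follows; the helper names are hypothetical, and the `redirects_to_https` argument is whatever check you already have that returns True or False for a hostname.

```python
def hostname_variants(domain):
    """Return the naked and www forms of a domain, naked first."""
    domain = domain.lower()
    bare = domain[4:] if domain.startswith("www.") else domain
    return [bare, "www." + bare]


def variants_not_redirecting(domain, redirects_to_https):
    """Return the hostname variants for which the supplied check fails."""
    return [
        host for host in hostname_variants(domain)
        if not redirects_to_https(host)
    ]


# Canned results mirroring the teachers.gov.bd behaviour described in
# this post (naked domain doesn't redirect, www does); not a live test.
observed = {"teachers.gov.bd": False, "www.teachers.gov.bd": True}
print(variants_not_redirecting("teachers.gov.bd", observed.__getitem__))
```

An empty list is what you're aiming for. Anything else is a hostname that can still land a site on the list.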

Future

The original intention had been to keep this list updating on a regular cadence and that's still the plan, but it's not happening yet. A heap of time disappeared when I wasn't expecting it to and I had to pull out all stops to complete this in time and meet other commitments as well. But I do intend to properly automate those updates because I think this is a really interesting list to track. My hope from this exercise is that the list is used to petition the various sites to get their HTTPS things in order and I really want to see the "reformed" sites drop off and the Alexa rank numbers of those remaining getting continually higher (in other words, only progressively less popular sites being left on the list).

One of the things we didn't end up doing due to a combination of time and lack of reliable data was to categorise sites and show reports on that basis. For example, what are the largest airlines, banks, shopping sites etc that still aren't properly HTTPS'ing their things? If someone is willing to contribute a reliable list, I'll happily update the site. I did have some people make some great suggestions of sources for the data, but frankly I really need someone to grab Scott's 1M list and put a category next to each one. I'm just too tight on time at present to try and track it all down myself.

I know the data we've collated on this website will cause questions to be asked, suggestions to be made and inevitably, amendments to be requested. Use the comments section below; I'm sure many people will have good ideas around how we can make this data more useful and ultimately, help accelerate the push to a secure by default web. For now, a site is only going to drop off the list if it does HTTPS correctly when the crawler next comes by and I update the site, but I'll manually amend any incorrect country definitions if people spot obvious faults and have the correct values available for me to load.

I hope this little site helps people give organisations the push they need to go HTTPS. Sometimes, they just need a little bit of help:

Edit: If you're wondering why you test a site on the list and find it successfully redirecting to HTTPS, please do read the post above before commenting. There are many different circumstances where the same site will or will not redirect. Of course there's also the possibility the site has rectified its HTTPS implementation since the scan, in which case it'll drop off after the next set of data is published. Regardless, we can't feasibly investigate and reproduce each finding counter to Scott's crawler and as it is, there are a lot of reports coming through which simply aren't consistent with tests from other browsers in other parts of the world right now. The bottom line remains this: when it's done properly, everyone gets redirected to the secure scheme under all circumstances and that's what'll cause the site to drop off this list.

Hi, I'm Troy Hunt, I write this blog, create courses for Pluralsight and am a Microsoft Regional Director and MVP who travels the world speaking at events and training technology professionals