Troy Hunt: To route or not to route, that is the question

When I wrote about Building a safer web with ASafaWeb earlier in the week, I talked about using the process to share some experiences. This one made me go a bit cross-eyed and it’s a combination of an idiosyncrasy within ASP.NET routing and a more philosophical question about the semantic intent of a route.

The situation was that I needed to construct a URL on the ASafaWeb website which contained the address of the site to be scanned and could be accessed via an HTTP GET request. I wanted to tackle it this way so that the URL can be passed around in the fashion of “Hey, look at the scan result I just got”, with all the information required to execute an identical scan encapsulated within the address.

Plan “A”

Plan “A” was to include the URL in a route parameter so that a scan for troyhunt.com would look like this:

http://asafaweb.com/scan/troyhunt.com

That’s all well and good and it’s easily achievable by registering a route such as this:

routes.MapRoute("Scan", "Scan/{*url}",
  new { controller = "Scan", action = "Index" });

This happily passes off to the ScanController and executes the Index action which performs the scan like so:

//
// GET: /Scan/asafaweb.com

public ActionResult Index(string url)
{
  // Start scanning stuff...

But what if the URL to be scanned was beneath the root of the domain? What if it looked like this:

http://asafaweb.com/scan/troyhunt.com/Search

Well that’s fine because the URL pattern includes a wildcard for everything after the “Scan/” path; that’s why there’s a star in “{*url}”. But it begins to introduce a semantic problem in that the forward slash in the URL implies structure when in fact it’s just part of the parameter. Let’s hold off on the philosophy for now and we’ll come back to it in a bit.

How about this pattern:

http://asafaweb.com/scan/troyhunt.com/Search#result1

Now we’ve got a real problem because the hash denotes a fragment identifier and is treated as a special character in the URL. The browser never sends the fragment to the server, so when the controller reads in that “url” parameter, the hash and everything after it are missing. Clearly the route parameter needs to be URL encoded so instead of the one you see above, it ends up looking more like this:

http://asafaweb.com/scan/troyhunt.com%2fSearch%23result1
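To make the difference concrete, here’s a minimal sketch that uses System.Uri from the base class library rather than anything MVC-specific. The class and method names are made up for illustration and aren’t part of ASafaWeb. One thing to watch: Uri.EscapeDataString emits upper-case hex digits (%2F, %23), which are equivalent to the lower-case forms shown in this post.

```csharp
using System;

public static class FragmentSketch
{
    // Returns the fragment ("#...") that System.Uri splits off an address,
    // or an empty string when the address has none.
    public static string FragmentOf(string address)
    {
        return new Uri(address).Fragment;
    }

    // Returns the path that remains once the fragment has been split off.
    public static string PathOf(string address)
    {
        return new Uri(address).AbsolutePath;
    }

    // Percent-encodes a value so that "/" and "#" lose their special meaning.
    public static string Encode(string value)
    {
        return Uri.EscapeDataString(value);
    }
}
```

For the unencoded address above, PathOf gives “/scan/troyhunt.com/Search” and FragmentOf gives “#result1”, so the fragment is already separated from the path before any routing happens. Encode("troyhunt.com/Search#result1") gives “troyhunt.com%2FSearch%23result1”, which keeps the whole value in one piece.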

Right, so that gets over that hurdle. Up until now, I’ve been cheating a little and not including the protocol in the URL to be scanned. Dropping off the “http://” keeps the whole thing more succinct and it’s almost always implied where not specifically stated anyway. So what happens when it’s an HTTPS address? Applying the same URL encoding principle we’ll get this:

http://asafaweb.com/scan/https%3a%2f%2ftroyhunt.com%2fSearch%23result1

And this is where it all starts to go really wrong:

[Screenshot: encoded route causing an HTTP 400 in Cassini]

Ah crap. Now we’ve got a whole new problem because the “:” character (encoded as %3a) is another special character and its appearance is causing an HTTP 400 “Bad Request” even though it’s encoded. But it only happens when running directly from Visual Studio via F5 (Cassini obviously playing a role here).

But I often test directly against my local instance of IIS so let’s see what’s going on there:

[Screenshot: encoded route causing an invalid URL in IIS]

Ok, potentially some validation missing from further upstream so I trace out the URL value that’s passed into the scan controller:

https:/troyhunt.com/Search#result1

See the problem? The double forward slash before the domain has been treated as an escape pattern and resolved to just a single slash. Ugh.

But wait – it gets worse! So far I’ve only shown what happens when the URL is entered directly into the address bar. Obviously that’s not going to happen in the live system and instead it will involve entering a URL into a text box which then requests the scan page by passing the address in an HTTP GET request. For now, this text box just sits on the home page which is attached to a view model with a property called “ScanUrl” and an action like this:

[HttpPost]
public ActionResult Index(ScanRequestViewModel model)
{
  if (ModelState.IsValid)
  {
    return RedirectToRoute("Scan", new { url = model.ScanUrl });

This looks ok and the route name maps neatly to the route I created earlier on. Let’s now try passing that last URL – the HTTPS one with the hash – to the action above:

[Screenshot: RedirectToRoute causing an HTTP 400]

See anything odd? Other than the big freakin’ error message? The URL has encoded the colon and hash symbols but not the forward slashes. Actually, the two slashes before the domain have been unescaped into one again but the one after the domain is still alive and kicking. How can this be? Isn’t RedirectToRoute meant to URL encode the route values?

I turned to Stack Overflow and asked why RedirectToRoute is double-encoding the "/" in parameters that are URLs in the routeValues. I got a great answer from David Duffett which boils down to the fact that the redirect isn’t encoding, it’s escaping. Now this is both odd and important because the end result is that a route value with a forward slash simply can’t be correctly encoded in a fashion which is suitable for passing around via a URL. And don’t even think about trying to encode before hitting RedirectToRoute because whilst your forward slash will be successfully encoded into %2f, the redirect will then encode the percent sign so you end up with double-encoding. Not pretty.
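The double-encoding effect is easy to reproduce outside of MVC. This is only a sketch of the general problem using Uri.EscapeDataString, not a reconstruction of what RedirectToRoute does internally, and the class and method names are hypothetical:

```csharp
using System;

public static class DoubleEncodingSketch
{
    // Encodes a value once: "/" becomes "%2F".
    public static string EncodeOnce(string value)
    {
        return Uri.EscapeDataString(value);
    }

    // Encodes an already encoded value: the "%" itself becomes "%25",
    // so "%2F" turns into "%252F".
    public static string EncodeTwice(string value)
    {
        return Uri.EscapeDataString(Uri.EscapeDataString(value));
    }

    // Decodes a single level of percent-encoding.
    public static string DecodeOnce(string value)
    {
        return Uri.UnescapeDataString(value);
    }
}
```

EncodeTwice("troyhunt.com/Search") gives “troyhunt.com%252FSearch”, and decoding that once only gets you back to “troyhunt.com%2FSearch”. Whatever reads the value at the other end sees a literal “%2F” rather than a forward slash.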

Plan “B”

Without implying that I simply gave up on the route pattern (that’s only partially true!), I decided to run with plan “B” – the trusty old query string. This turns that last address with the lot into something more like this:

http://asafaweb.com/scan?url=https%3a%2f%2ftroyhunt.com%2fSearch%23result1

Complexity-wise within MVC, this is a slightly simpler proposition as there’s no custom route but more importantly, it just works! That last code snippet that was redirecting to a route is now redirecting to an action:

return RedirectToAction("Index", "Scan", new { Url = model.ScanUrl });

It doesn’t miss forward slash encoding, it doesn’t escape or double-encode, it just does what you’d expect.
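As a rough illustration of why the query string is the easier home for this value, here’s a sketch of the round trip. In the real site MVC handles both the redirect and the model binding, so the helpers below are hypothetical stand-ins built on Uri.EscapeDataString and Uri.UnescapeDataString, and the hex digits come out in upper case rather than the lower case shown above:

```csharp
using System;

public static class QueryStringSketch
{
    private const string Marker = "?url=";

    // Builds the scan address with the target URL carried in the "url"
    // query string parameter, fully percent-encoded.
    public static string BuildScanAddress(string targetUrl)
    {
        return "http://asafaweb.com/scan" + Marker + Uri.EscapeDataString(targetUrl);
    }

    // Recovers the target URL from an address produced by BuildScanAddress.
    // Assumes "url" is the only query string parameter.
    public static string ReadTargetUrl(string scanAddress)
    {
        int start = scanAddress.IndexOf(Marker, StringComparison.Ordinal);
        if (start < 0)
        {
            throw new ArgumentException("No url parameter found.", "scanAddress");
        }

        return Uri.UnescapeDataString(scanAddress.Substring(start + Marker.Length));
    }
}
```

BuildScanAddress("https://troyhunt.com/Search#result1") gives “http://asafaweb.com/scan?url=https%3A%2F%2Ftroyhunt.com%2FSearch%23result1” and ReadTargetUrl turns that back into the original address, double slash and hash included.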

Admittedly I’m writing this tired and like I said earlier, the whole encoding thing has made me a bit cross-eyed over the last day and a bit so I could well be missing something obvious. But from where I’m ~~sitting~~ slouched into a near slumber, URL parameters just don’t play nice with routes.

Is a URL a route parameter anti-pattern?

Let me get a bit philosophical for a moment – is a URL a route anti-pattern? I mean should you ever have a route which includes a parameter which is another URL? The whole idea of routing is to present a user friendly address which implies some semantics about the resource it’s accessing or the action it’s performing; does a potentially lengthy route parameter full of escape characters really do this?

Regardless of the implementation challenges faced above, there’s something about a URL like this which just feels, well, a little bit wrong:

http://asafaweb.com/scan/https%3a%2f%2ftroyhunt.com%2fSearch%23result1

Is it just me? Or is a URL totally out of place in a route?

