Troy Hunt: Everything you wanted to know about SQL injection (but were afraid to ask)

This content is now available in the Pluralsight course "Ethical Hacking: SQL Injection"
Pluralsight Logo

Put on your black hats folks, it’s time to learn some genuinely interesting things about SQL injection. Now remember – y’all play nice with the bits and pieces you’re about to read, ok?

SQL injection is a particularly interesting risk for a few different reasons:

It’s getting increasingly harder to write vulnerable code due to frameworks that automatically parameterise inputs – yet we still write bad code.
You’re not necessarily in the clear just because you use stored procedures or a shiny ORM (you’re aware that SQLi can still get through these, right?) – we still build vulnerable apps around these mitigations.
It’s easily detected remotely by automated tools which can be orchestrated to crawl the web searching for vulnerable sites – yet we’re still putting them out there.

It remains number one on the OWASP Top 10 for a very good reason – it’s common, it’s very easy to exploit and the impact of doing so is severe. One little injection risk in one little feature is often all it takes to disclose every piece of data in the whole system – and I’m going to show you how to do this yourself using a raft of different techniques.

I demonstrated how to protect against SQLi a couple of years back when I wrote about the OWASP Top 10 for .NET developers so I’m not going to focus on mitigation here, this is all about exploiting. But enough of the boring defending stuff, let’s go break things!

All your datas are belong to us (if we can break into the query context)

Let’s do a quick recap on what it is that makes SQLi possible. In a nutshell, it’s about breaking out of the data context and entering the query context. Let me visualise this for you; say you have URL that includes a query string parameter such as “id=1” and that parameter makes its way down into a SQL query such as this:

SELECT * FROM Widget WHERE ID = 1

The entire URL probably looked something like this:

Pretty basic stuff, where it starts to get interesting is when you can manipulate the data in the URL such that it changes the value passed to the query. Ok, changing “1” to “2” will give you a different widget and that’s to be expected, but what if you did this:

http://widgetshop.com/widget/?id=1 or 1=1

That might then persist through to the database server like so:

SELECT * FROM Widget WHERE ID = 1 OR 1=1

What this tells us is that the data is not being sanitised – in the examples above the ID should clearly be an integer yet the value “1 OR 1=1” has been accepted. More importantly though, because this data has simply been appended to the query it has been able to change the function of the statement. Rather than just selecting a single record, this query will now select all records as the “1=1” statement will always be true. Alternatively, we could force the page to return no records by changing “or 1=1” to “and 1=2” as it will always be false hence no results. Between these two alternatives we can easily assess if the app is at risk of an injection attack.

This is the essence of SQL injection – manipulating query execution with untrusted data – and it happens when developers do things like this:

query = "SELECT * FROM Widget WHERE ID = "+ Request.QueryString["ID"];
// Execute the query...

Of course what they should be doing is parameterising the untrusted data but I’m not going to go into that here (refer back to part one of my OWASP series for more info on mitigation), instead I want to talk more about exploiting SQLi.

Ok, so that background covers how to demonstrate that a risk is present, but what can you now do with it? Let’s start exploring some common injection patterns.

Joining the dots: Union query-based injection

Let’s take an example where we expect a set of records to be returned to the page, in this case it’s a list of widgets of “TypeId” 1 on a URL like this:

http://widgetshop.com/Widgets/?TypeId=1

The result on the page then looks like so:

3 widgets returned to the page

We’d expect that query to look something like this once it hits the database:

SELECT Name FROM Widget WHERE TypeId = 1

But if we can apply what I’ve outlined above, namely that we might be able to just append SQL to the data in the query string, we might be able to do something like this:

http://widgetshop.com/Widgets/?TypeId=1 union all select name from sysobjects where xtype='u'

Which would then create a SQL query like so:

SELECT Name FROM Widget WHERE TypeId = 1 union all select name from sysobjects where xtype='u'

Now keep in mind that the sysobjects table is the one that lists all the objects in the database and in this case we’re filtering that list by xtype “u” or in other words, user tables. When an injection risk is present that would mean the following output:

3 widgets returned to the page followed by 2 internal table names

This is what’s referred to as a union query-based injection attack as we’ve simply appended an additional result set to the original and its made its way out directly into the HTML output – easy! Now that we know there’s a table called “User” we could do something like this:

http://widgetshop.com/Widgets/?TypeId=1 union all select password from [user]

SQL Server gets a bit uppity if the table name of “user” is not enclosed in square brackets given the word has other meanings in the DB sense. Regardless, here’s what that gives us:

3 widgets returned to the page followed by a password

Of course the UNION ALL statement only works when the first SELECT statement has the same number of columns as the second. That’s easily discoverable though, you just try going with a bit of ”union all select ‘a’” then if that fails “union all select ‘a’, ‘b’” and so on. Basically you’re just guessing the number of columns until things work.

We could go on and on down this path and pull back all sorts of other data, let’s move on to the next attack though. There are times when a union-based attack isn’t going to play ball either due to sanitisation of the input or how the data is appended to the query or even how the result set is displayed to the page. To get around that we’re going to need to get a bit more creative.

Making the app squeal: Error-based injection

Let’s try another pattern – what if we did this:

http://widgetshop.com/widget/?id=1 or x=1

Hang on, that’s not valid SQL syntax, the “x=1” piece won’t compute, at least not unless there’s a column called “x” so won’t it just throw an exception? Precisely, in fact it means you’ll see an exception like this:

Invalid column name 'x'

This an ASP.NET error and other frameworks have similar paradigms but the important thing is that the error message is disclosing information about the internal implementation, namely that there is no column called “x”. Why is this important? It’s fundamentally important because once you establish that an app is leaking SQL exceptions, you can do things like this:

http://widgetshop.com/widget/?id=convert(int,(select top 1 name from sysobjects where id=(select top 1 id from (select top 1 id from sysobjects where xtype='u' order by id) sq order by id DESC)))

That’s a lot to absorb and I’ll come back to it in more detail, the important thing is though that it will yield this result in the browser:

Conversion failed when converting the varchar value 'Widget' to data type int.

And there we have it, we’ve now discovered that there is a table in the database called “Widget”. You’ll often see this referred to as “Error-based SQL injection” due to the dependency on internal errors. Let’s deconstruct the query from the URL:

convert(int, (
    select top 1 name from sysobjects where id=(
      select top 1 id from (
        select top 1 id from sysobjects where xtype='u' order by id
      ) sq order by id DESC
    )
  )
)

Working from the deepest nesting up, get the first record ID from the sysobjects table after ordering by ID. From that collection, get the last ID (this is why it orders in descending) and pass that into the top select statement. That top statement is then only going to take the table name and try to convert it to an integer. The conversion to integer will almost certainly fail (please people, don’t name your tables “1” or “2” or any other integer for that matter!) and that exception then discloses the table name in the UI.

Why three select statements? Because it means we can go into that innermost one and change “top 1” to “top 2” which then gives us this result:

Conversion failed when converting the varchar value 'User' to data type int.

Now we know that there’s a table called “User” in the database. Using this approach we can discover all the column names of each table (just apply the same logic to the syscolumns table). We can then extend that logic even further to select data from table columns:

Conversion failed when converting the varchar value 'P@ssw0rd' to data type int.

In the screen above, I’d already been able to discover that there was a table called “User” and a column called “Password”, all I needed to do was select out of that table (and again, you can enumerate through all records one by one with nested select statements), and cause an exception by attempting to convert the string to an int (you can always append an alpha char to the data if it really is an int then attempt to convert the whole lot to an int which will cause an exception). If you want to get a sense of just how easy this can be, I recorded a little video last year where I teach my 3 year old to automate this with Havij which uses the technique.

But there’s a problem with all this – it was only possible because the app was a bit naughty and exposed internal error messages to the general public. In fact the app quite literally told us the names of the tables and columns and then disclosed the data when we asked the right questions, but what happens when it doesn’t? I mean what happens when the app is correctly configured so as not to leak the details of internal exceptions?

This is where we get into “blind” SQL injection which is the genuinely interesting stuff.

Hacking blind

In the examples above (and indeed in many precedents of successful injection attacks), the attacks are dependent on the vulnerable app explicitly disclosing internal details either by joining tables and returning the data to the UI or by raising exceptions that bubble up to the browser. Leaking of internal implementations is always a bad thing and as you saw earlier, security misconfigurations such as this can be leveraged to disclose more than just the application structure, you can actually pull data out through this channel as well.

A correctly configured app should return a message more akin to this one here when an unhandled exception occurs:

Error. An error occurred while processing your request.

This is the default error page from a brand new ASP.NET app with custom errors configured but again, similar paradigms exist in other technology stacks. Now this page is exactly the same as the earlier ones that showed the internal SQL exceptions but rather than letting them bubble up to the UI they’re being hidden and a friendly error message shown instead. Assuming we also couldn’t exploit a union-based attack, the SQLi risk is entirely gone, right? Not quite…

Blind SQLi relies on us getting a lot more implicit or in other words, drawing our conclusions based on other observations we can make about the behaviour of the app that aren’t quite as direct as telling us table names or showing column data directly in the browser by way of unions or unhandled exceptions. Of course this now begs the question – how can we make the app behave in an observable fashion such that it discloses the information we had earlier without explicitly telling us?

We’re going to look at two approaches here: boolean-based and time-based.

Ask, and you shall be told: Boolean-based injection

This all comes down to asking the right questions of the app. Earlier on, we could explicitly ask questions such as “What tables do you have” or “What columns do you have in each table” and the database would explicitly tell us. Now we need to ask a little bit differently, for example like this:

http://widgetshop.com/widget/?id=1 and 1=2

Clearly this equivalency test can never be true – one will never be equal to two. How an app at risk of injection responds to this request is the cornerstone of blind SQLi and it can happen in one of two different ways.

Firstly, it might just throw an exception if no record is returned. Often developers will assume that a record referred to in a query string exists because it’s usually the app itself that has provided the link based on pulling it out of the database on another page. When there’s no record returned, things break. Secondly, the app might not throw an exception but then it also won’t display a record either because the equivalency is false. Either way, the app is implicitly telling us that no records were returned from the database.

Now let’s try this:

1 and
(
  select top 1 substring(name, 1, 1) from sysobjects where id=(
    select top 1 id from (
      select top 1 id from sysobjects where xtype='u' order by id
    ) sq order by id desc
  )
) = 'a'

Keeping in mind that this entire block replaces just the query string value so instead of “?id=1” it becomes “?id=1 and…”, it’s actually only a minor variation on the earlier requests intended to retrieve table names. In fact the main different is that rather than attempting to cause an exception by converting a string to an integer, it’s now an equivalency test to see if the first character of the table name is an “a” (we’re assuming a case-insensitive collation here). If this request gives us the same result as “?id=1” then it confirms that the first table in sysobjects does indeed begin with an “a” as the equivalency has held true. If it gives us one of the earlier mentioned two scenarios (an error or shows no record), then we know that the table doesn’t begin with an “a” as no record has been returned.

Now all of that only gives us the first character of the table name from sysobjects, when you want the second character then the substring statement needs to progress to the next position:

select top 1 substring(name, 2, 1) from sysobjects where id=(

You can see it now starts at position 2 rather than position 1. Of course this is laborious; as well as enumerating through all the tables in sysobjects you end up enumerating through all the possible letters of the alphabet until you get a hit then you have to repeat the process for each character of the table name. There is, however, a little shortcut that looks like this:

1 and
(
  select top 1 ascii(lower(substring(name, 1, 1))) from sysobjects where id=(
    select top 1 id from (
      select top 1 id from sysobjects where xtype='u' order by id
    ) sq order by id desc
  )
) > 109

There’s a subtle but important difference here in that what’s it doing is rather than checking for an individual character match, it’s looking for where that character falls in the ASCII table. Actually, it’s first lowercasing the table name to ensure we’re only dealing with 26 characters (assuming alpha-only naming, of course), then it’s taking the ASCII value of that character. In the example above, it then checks to see if the character is further down the table than the letter “m” (ASCII 109) and then of course the same potential outcomes as described earlier apply (either a record comes back or it doesn’t). The main difference is that rather than potentially making 26 attempts at guessing the character (and consequently making 26 HTTP requests), it’s now going to exhaust all possibilities in only 5 – you just keep halving the possible ASCII character range until there’s only one possibility remaining.

For example, if greater than 109 then it must be between “n” and “z” so you split that (roughly) in half and go greater than 115. If that’s false then it must be between “n” and “s” so you split that bang in half and go greater than 112. That’s true so there’s only three chars left which you can narrow down to one in a max of two guesses. Bottom line is that the max of 26 guesses (call it average of 13) is now done in only 5 as you simply just keep halving the result set.

By constructing the right requests the app will still tell you everything it previously did in that very explicit, rich error message way, it’s just that it’s now being a little coy and you have to coax the answers out of it. This is frequently referred to as “Boolean-based” SQL injection and it works well where the previously demonstrated “Union-based” and “Error-based” approaches won’t fly. But it’s also not fool proof; let’s take a look at one more approach and this time we’re going to need to be a little more patient.

Disclosure through patience: Time-based blind injection

Everything to date has worked on the presumption that the app will disclose information via the HTML output. In the earlier examples the union-based and error-based approaches gave us data in the browser that explicitly told us object names and disclosed internal data. In the blind boolean-based examples we were implicitly told the same information by virtue of the HTML response being different based on a true versus a false equivalency test. But what happens when this information can’t be leaked via the HTML either explicitly or implicitly?

Let’s imagine another attack vector using this URL:

http://widgetshop.com/Widgets/?OrderBy=Name

In this case it’s pretty fair to assume that the query will translate through to something like this:

SELECT * FROM Widget ORDER BY Name

Clearly we can’t just starting adding conditions directly into the ORDER BY clause (although there are other angles from which you could mount a boolean-based attack), so we need to try another approach. A common technique with SQLi is to terminate a statement and then append a subsequent one, for example like this:

http://widgetshop.com/Widgets/?OrderBy=Name;SELECT DB_NAME()

That’s a pretty innocuous one (although certainly discovering the database name can be useful), a more destructive approach would be to do something like “DROP TABLE Widget”. Of course the account the web app is connecting to the database with needs the rights to be able to do this, the point is that once you can start chaining together queries then the potential really starts to open up.

Getting back to blind SQLi though, what we need to do now is find another way to do the earlier boolean-based tests using a subsequent statement and the way we can do that is to introduce is a delay using the WAITFOR DELAY syntax. Try this on for size:

Name;
IF(EXISTS(
  select top 1 * from sysobjects where id=(
    select top 1 id from (
      select top 1 id from sysobjects where xtype='u' order by id
    ) sq order by id desc
  ) and ascii(lower(substring(name, 1, 1))) > 109
)) 
WAITFOR DELAY '0:0:5'

This is only really a slight variation of the earlier examples in that rather than changing the number of records returned by manipulating the WHERE clause, it’s now just a totally new statement that looks for the presence of a table at the end of sysobjects beginning with a letter greater than “m” and if it exists, the query then takes a little nap for 5 seconds. We’d still need to narrow down the ASCII character range and we’d still need to move through each character of the table name and we’d still need to look at other tables in sysobjects (plus of course then look at syscolumns and then actually pull data out), but all of that is entirely possible with a bit of time. 5 seconds may be longer than needed or it may not be long enough, it all comes down to how consistent the response times from the app are because ultimately this is all designed to manipulate the observable behaviour which is how long it takes between making a request and receiving a response.

This attack – as with all the previous ones – could, of course, be entirely automated as it’s nothing more than simple enumerations and conditional logic. Of course it could end up taking a while but that’s a relative term; if a normal request takes 1 second and half of the 5 attempts required to find the right character return true then you’re looking at 17.5 seconds per character, say 10 chars in an average table name is about 3 minutes a table then maybe 20 tables in a DB so call it one hour and you’ve discovered every table name in the system. And that’s if you’re doing all this in a single-threaded fashion.

It doesn’t end there…

This is one of those topics with a heap of different angles, not least of which is because there are so many different combinations of database, app framework and web server not to mention a whole gamut of defences such as web application firewalls. An example of where things can get tricky is if you need to resort to a time-based attack yet the database doesn’t support a delay feature, for example an Access database (yes, some people actually do put these behind websites!) One approach here is to use what’s referred to as heavy queries or in other words, queries which by their very nature will cause a response to be slow.

The other thing worth mentioning about SQLi is that two really significant factors play a role in the success an attacker has exploiting the risk: The first is input sanitisation in terms of what characters the app will actually accept and pass through to the database. Often we’ll see very piecemeal approaches where, for example, angle brackets and quotes are stripped but everything else is allowed. When this starts happening the attacker needs to get creative in terms of how they structure the query so that these roadblocks are avoided. And that’s kind of the second point – the attacker’s SQL prowess is vitally important. This goes well beyond just your average TSQL skills of SELECT FROM, the proficient SQL injector understands numerous tricks to both bypass the input sanitisation and select data from the system in such a way that it can be retrieved via the web UI. For example, little tricks like discovering a column type by using a query such as this:

http://widgetshop.com/Widget/?id=1 union select sum(instock) from widget

In this case, error-based injection will give tell you exactly what type the “InStock” column is when the error bubbles up to the UI (and no error will mean it’s numeric):

Operand data type bit is invalid for sum operator.

Or once you’re totally fed up with the very presence of that damned vulnerable site still being up there on the web, a bit of this:

http://widgetshop.com/Widget/?id=1;shutdown

But injection goes a lot further than just pulling data out via HTTP, for example there are vectors that will grant the attacker shell on the machine. Or take another tangent – why bother trying to suck stuff out through HTML when you might be able to just create a local SQL user and remotely connect using SQL Server Management Studio over port 1433? But hang on – you’d need the account the web app is connecting under to have the privileges to actually create users in the database, right? Yep, and plenty of them do, in fact you can find some of these just by searching Google (of course there is no need for SQLi in these cases, assuming the SQL servers are publicly accessible).

Lastly, if there’s any remaining doubt as to both the prevalence and impact of SQLi flaws in today’s software, just last week there was news of what is arguably one of the largest hacking schemes to date which (allegedly) resulted in losses of $300 million:

The indictment also suggest that the hackers, in most cases, did not employ particularly sophisticated methods to gain initial entry into the corporate networks. The papers show that in most cases, the breach was made via SQL injection flaws -- a threat that has been thoroughly documented and understood for well over than a decade.

Perhaps SQLi is not quite as well understood as some people think.

Security SQL Injection

Everything you wanted to know about SQL injection (but were afraid to ask)