Wednesday, August 10, 2011

Overcoming SQL 08’s globally insensitive time zones using .NET

Wednesday, August 10, 2011

I seem to spend a lot of time involved with web apps which end up having a lot of geographical diversity. Either they sit in a server in one country then get used by folks somewhere else or more often than not, they face audiences of a global nature spread out across varying time zones. And even if they do end up co-located, chances are it won’t always stay that way so there’s always a desire to add in a little future-proofing.

When SQL 08 came along there seemed to be some new hope for making this process a little easier through the introduction of a few new date and time related data types, particularly the datetimeoffset type. Unfortunately all that glitters is not gold in this case and the new data type can be a real “gotcha”. Here’s how to build in that geo-awareness from the ground up using the new datetime2 data type and without getting caught with your metaphorical pants down.

What’s a datetime2 and how is it different to a datetime?

The classic datetime data type was pretty rudimentary and had some serious limitations. The one that regularly caused headaches for me was that the range was limited to 1st January 1753 through 31st December 9999. Now that might sound like a pretty liberal range but start consuming data from an Oracle based system which allows a minimum date value of 1st January 4712 BC and you can see the impedance mismatch. Now, if you consume from that upstream system and someone hasn’t done a particularly good job of their data validation (or they have a legitimate need for dates less than 1753), then someone else’s lack of due diligence is now your problem.

A datetime2 data type in SQL 08 will let us go down to 01/01/01 so the old datetime range problem is pretty much abolished (there aren’t many use cases out there for BC-based dates). The other thing you get on top of range is precision so instead of being limited to 3.33 milliseconds you can now go all the way down to 100 nanoseconds. That’s probably not important for your average web app but you can now also define the precision in the data type just like you typically would with a decimal so you can always save a couple of bytes by being less specific.

So what now with the humble datetime type that served us so well for so many years? According to the TechNet documentation:

Use the time, date, datetime2 and datetimeoffset data types for new work.

So in short, datetime is dead.

What’s a SQL 08 datetimeoffset?

For all intents and purposes, a datetimeoffset is a datetime2 with time zone awareness in the range of plus or minus 14 hours from GMT plus a couple of bytes storage penalty to track it. What this means is that we can store the date plus time and the time zone offset within one field.

Let’s see it in practice: say I want to store today’s date and time at midday from two different sources where one of them relates to midday in Sydney and the other one relates to midday in New York. Note the offset component of the following date declarations:

DECLARE @SydneyMidday DATETIMEOFFSET
SET @SydneyMidday = '2010-08-10 12:00:00 +10:00'

DECLARE @NewYorkMidday DATETIMEOFFSET
SET @NewYorkMidday = '2010-08-10 12:00:00 -5:00'

If we stored these values as datetime or datetime2 data types (without the offset component, of course), they’d be considered identical times. However, if we now run the following:

SELECT DATEDIFF(hh, @SydneyMidday, @NewYorkMidday)

We get a result of 15 so the datediff function has been able to successfully identify there’s actually more than half a day between these dates even though they’re both strictly the same date and time, they just belong to different time zones. But the problem is that regardless of how many hours Sydney is actually ahead of New York (it’s not always the same), you’ll always get 15. This is a problem.

The fly in the daylight saving ointment

Now that we’ve established what the datetimeoffset can do, let’s look at what it can’t do, namely cater for daylight saving time (DST). Let’s run the above scenario again but this time we’ll fast forward two months to 10 October. Here’s how those dates look:

SET @SydneyMidday = '2010-10-10 12:00:00 +10:00'
SET @NewYorkMidday = '2010-10-10 12:00:00 -5:00'

When we run the same date diff as before, we get 15 again but the problem here is that on October 10, Sydney has already gone into DST whilst New York remains in DST so the result is wrong by one hour. Here’s how the time zones look:

August September October November December
No DST No DST DST starts 2nd Oct DST DST
DST DST DST DST ends 9th Nov No DST

Because we wind our clocks forward one hour in Sydney on October 2nd we effectively become UTC +11 and the correct result from that last datediff statement should be 16. It was off by an hour.

For many people this may seem like a bit of an edge-case but take the situation where we want to calculate the hours in Sydney between, say, 20:00 on October 1st and 06:00 on October 2nd. The datetimeoffset approach will give us 10 but the correct answer is 9 because we lose an hour overnight.

Bottom line: You can’t trust the datetimeoffset data type for any sort of calculation which could involve DST as it simply doesn’t have the fidelity to know when it stops and starts at different locations even though we know the UTC offset. The problem is described quite well over on one of the MSDN blogs about Using time zone data in SQL Server 2008 where they end up importing time zone data into bespoke tables and manually calculating the differences.

The problem with daylight saving time

The problem is simply this: we don’t consistently know when it’s going to start and we don’t consistently know when it’s going to end. For example, next year – and in the foreseeable years – Sydney will start DST on the first Sunday in October. That’s an easy algorithm but it hasn’t always been that way. Until 2008 it would start on the last weekend in October. Want to make it really hard? In 2000 we started on August 27 in time for the Olympic games.

So what all of this means is:

  1. There is no stable algorithm that you can apply to calculate when DST started (or finished, for that matter).
  2. You don’t know what dates DST will take in the future beyond the presently accepted algorithm of the day – and that could change or an arbitrary date could be introduced.

In short, DST calculations are erratic, unpredictable and fraught with exceptions so it’s not the sort of data you really want to take responsibility for maintaining.

The problem with time zones

Time zones make the whole problem even worse because they’re unpredictable and unstable too! Take a look at the following from my Windows 7 time zone settings:

Windows 7 Time Zone Settings

The above image shows six different time zones for Australia, half of which fall bang on the same longitude and even have the same UTC offset. In fact quite clearly, our four most eastern states (and our landlocked capital), all line up pretty neatly under each other and would logically be the same time zone:

Australian states

So why do we have three separate time zones? Well firstly, Queensland doesn’t observe DST so they need their own (it’s “Brisbane” in the previous list). Then we have Tasmania which until 2008 had their own start date (albeit the same end date as New South Wales), so they get their own entry under “Hobart” and then we have New South Wales, Victoria and Canberra all grouped together because those guys get on well enough to agree on DST.

But we don’t have a monopoly on odd DST behaviour down under, take a look at this one:

The "Indiana (East)" time zone

If – like me – Indiana over in the US of A is a bit of a foreign place to you, here’s how it looks:

Indiana in the USA showing a split time zone

So here we have a case where one state – and a relatively small one at that – is broken down into separate time zones. The change means the western part observes the time zone known as “Central Time (US & Canada)” whilst the eastern part observes “Eastern Time (US & Canada)”. So where is “Indiana (East)”? It was effectively deprecated back in 2006 yet its memory clearly remains by virtue of the time zone still existing on our PCs.

The role of the Windows registry

All the bits and pieces we saw above in the time zone section are driven out of the Windows registry under the “HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Time Zones” key. In fact if you take a look in the Registry Editor, you’ll see them all sitting down there:

The Windows registry time zones

Here’s the interesting bit: see how some time zones have a little expando icon next to them? Take a look inside the “AUS Eastern Standard Time” entry:

Historical data in the Windows registry

Here you can see there are a couple of binary entries with names relating to years. This is what Microsoft is referring to as “Dynamic Daylight Saving Time (Dynamic DST)” in that it is able to store the effective dates of DST over time. Based on what you’ve just read above, this is obviously pretty important as the only way we can accurately track DST is by having historical data about the date changes. However, the catch is that the data only goes back a few years (more on that shortly).

Here’s the really important bit: this data is all maintained via cumulative updates from Microsoft – just take a look at the rate of change in that link for something you probably thought was pretty static. Note also that the dates the cumulative updates were issued only go back to August 08 which means there are some pretty big historical gaps. The Sydney example from earlier on where we moved the DST from the end of October to the start is missing so obviously this just missed the cut.

But the bottom line is that so long as you’re actively patching your server or taking them via Windows Update for your desktop, you’ll be kept up to date when governments make screwy decisions in the future. The beauty of this is that all you need to do is keep doing what you’ve (hopefully) always been doing your machine will simply know what’s going on in the world of DST and time zones.

Working with time zones in .NET

Now that you understand the idiosyncrasies around DST and time zones, let’s get back to code. Back in .NET 3.5 we got our hands on the TimeZoneInfo class which gives us a bunch of neat features which leverage the time zone data stored in the registry. Because of this, we can write DST aware code such as what you see below. What we’re going to do here is create two dates in the Sydney time zone – one before DST began last year and one after – and then convert them back to UTC. The one before should show a 10 hour offset from UTC and the one after an 11 hour offset.

const string ausEst = "AUS Eastern Standard Time";
var estZone = TimeZoneInfo.FindSystemTimeZoneById(ausEst);

// Midday the day before DST begins
var preDst = new DateTime(2010, 10, 2, 12, 00, 00);
      
// Midday the day DST has already begun
var postDst = new DateTime(2010, 10, 3, 12, 00, 00);

var preDstUtc = TimeZoneInfo.ConvertTimeToUtc(preDst, estZone);
var postDstUtc = TimeZoneInfo.ConvertTimeToUtc(postDst, estZone);

Console.WriteLine("The pre-DST time is {0} UTC.", preDstUtc);
Console.WriteLine("The post-DST time is {0} UTC.", postDstUtc);

And we get the following (keep in mind my localisation settings will show dd-mm-yyyy):

.NET being aware of DST in 2010

Take a look at those time zones – exactly what we’d expect! Remember also that UTC has no DST so it’s a constant value regardless of the time of year. But what if we try rolling back a few years to 2008? Remember, this was when DST started at the end of October instead of at the start (the 26th was the rollover date), but it wasn’t captured in the cumulative updates:

.NET not being aware of DST in 2008

This is clearly wrong; back in 2008, October 26 was the first day of DST. And we can see why it’s wrong if we try applying the test to the first Sunday of the month:

.NET applying the wrong logic to DST in 2008

See the problem? The registry doesn’t have the information about how DST was previously calculated so has defaulted to the current mechanism being the first Sunday of October.

Just to show that the registry is aware of now defunct DST settings, back in late 06 through early 09, our western most state (Western Australia), had a play with DST then decided to ditch it. The dates shuffled a bit but when they kicked off in 08 it was on the last Sunday of October (the 26th). Here’s what we see when running the above script against the “W. Australia Standard Time” time zone:

.NET being aware of DST in Western Australia in 2008

Western Australia is +8 UTC so the times appear later but the important thing is that the registry is aware of the oddball case they had back a few years ago. Post 2008 it doesn’t recognise any DST over in the west.

SQL 08’s role in all this

Unfortunately SQL 08 simply doesn't have the ability to natively store either a time zone (note – not a static offset – an actual time zone that’s DST aware) or convert between different time zones, including UTC (again – actual DST aware time zones). So what do people do?

There’s always the angle of pumping all this data out into a couple of tables in which can then make it readily available in SQL Server. Of course the problem is you’ve got to then maintain this and as we saw from the cumulative updates page, the data changes pretty frequently. It would be reasonably manageable with a small number of time zones but if you want to be “globally ready”, things are going to start getting a bit trickier.

The other problem is that you still don’t have any native way of converting between time zones. What you end up doing is what was done in the previous link and manually joining up data and assumedly doing your own conversions at some point.

But you could always get a bit more fancy and leverage the .NET CLR as was suggested to me today by the author of Expert SQL Server 2005 Development:

Recommendation to use the CLR in SQL by Adam Machanic

This comment in particular says to me there’s an opportunity going wanting in SQL Server at the moment. Converting between time zones just seems like such a fundamental function to perform against data, even despite the idiosyncrasies of their nature.

Applying the UTC common denominator

Let’s get to the pointy end of the discussion; all of the above simply leads us down to two issues to solve when we have a .NET layer to do some of the heavy lifting:

  1. We need a common denominator to store all datetime2 values in. Storing them in local time causes a bunch of other challenges because they’re all on different time zones.
  2. We need a means of converting from and to a local time, to and from UTC. There are times where we still need to receive or show dates and times in local time.

The first item is easy – store UTC dates in a datetime2 field. UTC is the one common denominator on which we base all our other time zones (i.e. UTC –5, UTC +10, etc.) and products like SQL Server and .NET are very well aware of it through functions such as GETUTCDATE and DateTime.UtcNow respectively. But one thing we can do in .NET which we can’t natively do in SQL Server is convert the data backwards and forwards between the local time at a location, not just a time zone offset and we’ve now seen this in action:

var preDstUtc = TimeZoneInfo.ConvertTimeToUtc(preDst, estZone);

Of course it’s just as easy going back the other way too:

var preDstLocal = TimeZoneInfo.ConvertTimeFromUtc(preDstUtc, estZone);

So after a very long preamble, the actual work to be done is very easy, I just wanted to illustrate the importance of the native time zone functionality in .NET given the complexity of what it’s actually working with.

Use cases

Now that we can easily go backwards and forwards between UTC and other time zones, there are a bunch of different use cases where we need to handle dates. But we don’t always need to apply time zone conversions; here are the scenarios that immediately come to mind and remember, the idea is to persist the data in UTC and do any conversions back on the .NET layer:

  1. Time stamps: Practices such as automatically stamping a create date or a modified date can just immediately grab the current UTC time either via .NET or SQL and store it directly in the DB.
  2. User entered dates: Dates that are just that – a date – can go directly into a SQL 08 date type field. For example, a birthdate or an annual leave day have no time component so there’s not usually a need to do any conversions.
  3. User entered dates with time components: In a scenario where, for example, a user is scheduling an event to happen at a particular time, they’re going to logically be working in their local time zone so conversion to UTC needs to happen before sending it to the data layer.
  4. Displaying dates with time components: Where data from the previous points needs to be retrieved and presented back to the user, the conversion from UTC to local time needs to happen.

Of course “local” can also be a bit variable and depending on the application, it may be system wide, just apply to a sub-component of the system (i.e. just the “Australia” instance), or be user-specific (i.e. configuring your time zone in a forum so you see everything in local time).

Ultimately, the litmus test is simply this: can you take the app and put it on a different server in a different location running a different time zone and everything still works identically? If not, you’ve missed a conversion somewhere and you’re going to be locked in to the current model in terms of both hosting and audience.

Edge cases

There are a few problems with the approach outlined above – it’s certainly not perfect:

  1. Incomplete time zone data: As we saw with Sydney in 08, the DST date is out by three weeks so converting a UTC date to local during that period (or vice versa if there was a need), will lead to an incorrect time off by one hour.
  2. Unknown future changes: If someone enters a local date and time for the middle of October in five years from now then we convert it to UTC based on the current logic, it could ultimately be proven wrong and there’s nothing we could do about it.
  3. The need for local time on the data layer: Of course none of this helps if you actually need local time on the data layer. Perhaps there’s some reporting happening, perhaps the data is being passed off to a BI layer but either way, you’re stuck with UTC.

The edge cases can be mitigated by storing local user time in the database but then you’ve got all the problems associated with a vast amount of data spread across time zones and you’ve got a real problem if someone wants to actually change their time zone. Certainly there are cases where this is a better balance, but it’s also a bit of a future-proofing anti-pattern.

Summary

The frustration of this post is that there simply isn’t a clean, fool proof way to store dates in a global system without the risk of time zone conflict. But on balance, most requirements will happily accept the idiosyncrasies of the registry not being quite up to date on some historical data. After all, a one hour conversion error against a small window of time which occurs infrequently isn’t too bad.

Tags:

comments powered by Disqus

Leaving comments is awesome, please do. All I ask is that you be nice and if in doubt, read Comments on troyhunt.com for guidance.