For a while now, I’ve been putting off a task to configure a sync process for a particular piece of enterprise data. This data is populated into a single table in a production environment on a nightly basis but also needed to be synced down into the test and development environments every now and then. Without going into too much detail about the nature of the data, it consists of about 700,000 records which change either via updates or insertions. Normally I don’t like taking production data down into other environment (there are simply better ways), but the nature of this data called for keeping the environments in sync so the developers could do their job.
I’d been sitting on this requirement for a while as I wasn’t relishing what could become a time consuming, laborious task. This was a table with over 40 columns including lots of foreign keys (one of them self-referential), and a calculated field so as well as being careful not to create any referential integrity problems, I needed it to happen fast so as not to play havoc with a production environment. In my mind, it was going to mean manually writing some form of ETL either directly in TSQL or via SSIS or even going down a SQL replication route. It was possibly even a SQL MERGE task but these particular environments were still stuck on SQL 05 so that route was out.
One quick caveat: I’m not a DBA, I’m a developer who works with databases. There may have been other angles to come at this from, but the solution I arrived at is fast, simple and easy to monitor. The fact that it ended up being a 15 minute job on my weekend and I didn’t mind giving up a little of my valuable Sunday on this particular task was a very nice result!
