Monday, 19 July 2010

Rocking your SQL Source Control world with Red Gate

Monday, 19 July 2010

I knew it was going to be good before even seeing it. After all, SQL Source Control is from the guys who brought us SQL Compare and Data compare, two of my all-time favourite tools in the “stuff that would be a real pain to do without” category. They’re tools I tend to berate developers for not having and have regularly waxed lyrical about in the past, albeit it within 140 characters; until now.

Versioning database objects very much fits into the same realm in that it’s a nightmare to do without a dedicated tool. The simplicity with which we version other applications files – HTML, images, classes, etc – hasn’t been readily achievable in the database world. Sure, there are various mechanisms out there to script objects out into the file system and version those but it’s a real pain to actually synchronise back into other environments.

To make database versioning practical it needs to integrate seamlessly into the development process, that is it needs to align to the tools and practices developers use. In the database world that means it has to play nice with SQL Server Management Studio (henceforth known only as SSMS), just as tools like VisualSVN or AnkSVN play nice with Visual Studio.

Given my previous positive experiences with Red Gate products I thought I’d write this blog a little differently. Rather than try and learn it inside out then come across as all knowledgeable, I‘m going to write about the process as I experience it for the first time. I think it will give a little more real world context to how other people approach the tool, let’s see how it pans out.

Full disclosure

Uh, there isn’t any. I wasn’t prompted or incentivised by anyone to write this. Red Gate just makes damn good products worth sharing :)

Getting set up

The first run of SSMS (I’m using 08 but SQL Source Control is also compatible with 05), after installation gives you some info about the current state of your versioned database plus a bit of “DB versioning 101” explaining the two different models developers typically work in:

image

Selecting a database in the Object Explorer will clear the red warning text and replace it with the DB name whilst selecting the “Create new link to source control…” link prompts the following dialogue:

image

It’s going to be Subversion all the way today so I’ve plugged in the trunk path of my repository which I’ve created in a local version of VisualSVN Server. I’ve already checked in an empty ASP.NET web app to the root and here’s where I hit the first snag:

image

Let’s try that again after creating a dedicated “Data” folder in the trunk and we get quite a different result:

image

With the database now associated to a path in source control we can begin actually committing some objects:

image

What has actually happened?

imageAfter running the process above, I took a look back into the repository. Sure enough there were a bunch of new folders in the “Data” path and a new commit message stating:

Database project file added by Red Gate SQL Source Control

Doing an update on the working directory pulled down all the folders you’ll see to the right. It’s an extensive list but I think it’s worth sharing because quite frankly, I want to know what’s going on in my repository and so should you. It also gives you a pretty clear idea of the sort of database objects the tool can version. It’s an impressive list!

The process also created a RedGate.scc file in the root of Data folder. At this stage it’s zero bytes but I’m guessing we’ll see the flesh out a little as we start creating some objects in the DB.

One thing I would have liked to have seen is some confirmation that a commit had actually been made. From a communication perspective, knowing the current revision number is useful information.

I also would have preferred to have some control over what the commit message said. Yes, it’s descriptive and well-formed and all those nice source-controlly sort of things but it’s also pretty vanilla. Adding multiple databases would result in multiple identical commit messages which is a bit too generic for my liking.

Finally – and this is admittedly a bit nit-picky – the last screen grab above talks about “To get the database into source control, commit the objects” whereas in reality the commit has already been made (albeit only a folder structure). A little more clarity would be preferable.

To be honest though, these are really minor grievances and in no way diminish the value of the product. They’re just little things which I think could be tidied up in later versions (hey, this is only v1!).

Maintaining a healthy relationship

One of the strengths of the product is that there are no server dependencies. Get yourself a plain old SQL database and a plain old SVN repository and you’re good to go. But how is the relationship retained? How does the client know of – and persist – the association?

I took a look through the “C:\Users\[my username]\AppData\Local\Red Gate\SQL Source Control 1” folder and came across LinkedDatabases.xml. Here’s what’s inside:

<?xml version="1.0" encoding="utf-16" standalone="yes"?>
<LinkedDatabaseStore version="1" type="LinkedDatabaseStore">
  <LinkedDatabaseList type="LinkedDatabaseList" version="2">
    <value version="3" type="LinkedDatabase">
      <DatabaseId version="2" type="DatabaseId">
        <ServerAndInstanceName>(local)</ServerAndInstanceName>
        <DatabaseName>SqlSourceControlTest</DatabaseName>
      </DatabaseId>
      <ISrcCLocation version="1" type="SvnLocation">
        <RepositoryUrl>http://localhost:8080/svn/SqlSourceControlTest
/trunk/Data/</RepositoryUrl>
      </ISrcCLocation>
      <IWorkspaceId version="1" type="SvnWorkspaceId">
        <RootPath>C:\Users\[my username]\AppData\Local\Red Gate\SQL Source
Control 1\WorkingBases\xhptrc3o.gih</RootPath>
      </IWorkspaceId>
    </value>
  </LinkedDatabaseList>
</LinkedDatabaseStore>

imageSo we have the database server name in ServerAndInstanceName then the database name itself in DatabaseName followed up by the SVN repository path in RepositoryUrl.

Browsing through to the path defined in the RootPath node - it’s a folder named xhptrc3o.gih, not a file - shows something intriguing; each of the objects we saw represented by a folder in SVN earlier on is contained in this path (see image to the right). So obviously Red Gate is simply outputting each DB change to the local WorkingBase folder, treating this as the working directory then syncing from there. Looking at the folder properties on xhptrc3o.gih shows all the usual repository information in the TortoiseSVN tab.

There’s your association done and dusted, albeit it in your local app data so a new machine means all the associations need to be configured again. Likewise, a new developer on a project will also need to create their own associations.

Of course if you’re to have a low-friction, server installation free product then you’re going to have to persist the relationships in the client anyway. Consider it part of the setup just as checking out the app and configuring it locally is.

Cleaning up the trunk

The problem we have now is that I created the repository and added my Visual Studio solution to the trunk, then also added the Data folder to the trunk. Once I updated my working directory on the file system from the trunk I got the entire database set of folders you see above.

We don’t need this on the file system because it serves absolutely no purpose to us there. I ended up just creating an “App” folder in the trunk and moving the Visual Studio .sln and project folder into there via the TortoiseSVN Repository Browser, deleting the Data folder from the working directory then doing a “switch”.

Creating some objects

image Back in SSMS, the database we linked to source control now goes from the familiar yellow to green, indicating at a glance which DBs have been linked. Beyond this one small change, there is no other indication on any of the other nodes beneath this (tables, procedures, etc).

imageTo get us started, I’ve created a basic “Customer” table and then a couple of stored procedures in the typical get-all and get-single pattern. As soon as this is done we start getting visual indications as to pending changes (see image on the right).

Firstly, the DB itself now gets a little blue circle as do the “Tables” and “Programmability” nodes beneath it and the nested “Stored Procedures” node. This is obviously the equivalent of the red circle and exclamation mark TortoiseSVN will show you for pending changes on the file system.

Heading back to the SQL Source Control tab, let’s take a look at the “Commit Changes” tab beneath that:

image [ click to enlarge ]

Right, so now we’re seeing all the objects we just created with a “New” icon next to them and a diff window at the bottom. Being new objects, there’s obviously nothing to diff them against so the bottom right panel stays empty. This is where we also start to see some of the Red Gate genetics come through; I’ve seen this window in SQL Compare many times before!

Towards the top of the window we have a text box with a prompt for “Comment to add on commit”. Let’s leave something descriptive then hit the “Commit” button:

image

Giving this prompt the OK takes us back to the “Commit Changes” window which is, of course, now empty. The little blue circle on the green database icon (remember, the one which used to be yellow?), has now gone and we’re back to where we were before we started creating objects.

Switching back to the Repository Browser and looking at the log, we can see four files have been committed:

image

The three .sql files are pretty obvious; they simply contain the objects we just created and committed via SSMS. The odd one out though is the RedGateDatabaseInfo.xml file. Let’s take a look inside:

<?xml version="1.0" encoding="utf-16"?>
<DatabaseInformation>
  <ScriptFileEncoding>UTF8</ScriptFileEncoding>
  <DefaultCollation>Latin1_General_CI_AS</DefaultCollation>
  <DefaultSchema>dbo</DefaultSchema>
  <DefaultUser>dbo</DefaultUser>
  <DefaultFilegroup>PRIMARY</DefaultFilegroup>
  <DatabaseVersion>10</DatabaseVersion>
  <MaxDataFileSize>10485760</MaxDataFileSize>
  <WriteToFileOptions>
    <Prefixes>
      <None />
      <Table>Tables</Table>
      <StoredProcedure>Stored Procedures</StoredProcedure>
      <View>Views</View>
      <Default>Defaults</Default>
      <FullTextCatalog>Storage\Full Text Catalogs</FullTextCatalog>
      <Function>Functions</Function>
      <Role>Security\Roles</Role>
      <Rule>Rules</Rule>
      <User>Security\Users</User>
      <UserDefinedType>Types\User-defined Data Types</UserDefinedType>
      <Trigger />
      <DdlTrigger>Database Triggers</DdlTrigger>
      <Assembly>Assemblies</Assembly>
      <Synonym>Synonyms</Synonym>
      <XmlSchemaCollection>Types\XML Schema Collections</XmlSchemaCollection>
      <MessageType>Service Broker\Message Types</MessageType>
      <Contract>Service Broker\Contracts</Contract>
      <Queue>Service Broker\Queues</Queue>
      <Service>Service Broker\Services</Service>
      <Route>Service Broker\Routes</Route>
      <EventNotification>Service Broker\Event
Notifications</EventNotification>
      <PartitionScheme>Storage\Partition Schemes</PartitionScheme>
      <PartitionFunction>Storage\Partition Functions</PartitionFunction>
      <Field />
      <Index />
      <Schema>Security\Schemas</Schema>
      <ServiceBinding>Service Broker\Remote Service Bindings</ServiceBinding>
      <Certificate>Security\Certificates</Certificate>
      <SymmetricKey>Security\Symmetric Keys</SymmetricKey>
      <AsymmetricKey>Security\Asymmetric Keys</AsymmetricKey>
      <CheckConstraint />
      <FullTextStoplist>Storage\Full Text Stoplists</FullTextStoplist>
      <Data>Data</Data>
    </Prefixes>
    <DataWriteAllFilesInOneDirectory>true</DataWriteAllFilesInOneDirectory>
  </WriteToFileOptions>
  <DataFileSet>
    <Count>0</Count>
  </DataFileSet>
</DatabaseInformation>

Ok, it’s a little wordy but once again, I want to know what’s going into my repository. So it looks like we have a bunch of basic DB information such as the collation and version plus we have the WriteToFileOptions node which appears to reflect the names of the folders created in source control. I’m not sure how configurable this is meant to be but let’s keep an eye on subsequent commits and see if it changes at all.

One thing I was a little wary about when setting out to investigate this tool is product lock-in. I don’t want to get into a situation where because I’ve versioned my database with a particular tool I’m forever committed to using it, or worse still, anyone else that touches the project has to be committed to the same tool.

Fortunately this isn’t the case with SQL Source Control because every versioned object is simply a standalone SQL script. Sure, you need the software to generate and sync them but you don’t need anything special to read them and you could conceivably create and run them without the Red Gate tool. Yes, it would be clunky, but the point is that you retain your autonomy.

I’ll pause for a moment before moving on because this is a significant point in the process. We now have all the database objects we’ve created under source control. This, in itself, is a big step forward from where a lot of developers will be right now.

Changing objects

Moving on to changes, I’ve changed a data type in the Customer table and added a new column then navigated back to the SQL Source Control tab. This is pretty similar to the original “Commit Changes” window we saw above except now the change type is “Edit” and we’ve actually got a proper diff happening at the bottom of the screen:

image [ click to enlarge ]

Looking back at the Repository Browser again, there’s only the single file changed this time and a quick diff shows exactly the same results as we see above. It doesn’t get much easier than that!

Playing nice with others

So far the experience has been pretty positive and just the fact that it’s so easy to get the database under source control is a very positive thing indeed. But of course a lot of the value of source control is in the value it brings to collaboration. Being able to regularly share code changes with a broader team is core to the value proposition of any source control management system.

To demonstrate this, I’ve created a second database called SqlSourceControlTest_Developer2. I’ve gone through the same process as the first database in terms of selecting it in the Object Explorer then entering the path of the Data folder in Subversion. This time though, the result is a little different:

image

Obviously it’s figured out we already have a database under source control in this location. The window title “Cannot Create A New Link” is possibly a bit misleading but the last sentence tells the real story. After I clicked OK I did go back and check the SVN log just to be sure nothing had changed but we were still at the last revision number from the change commit earlier on so all good on that front.

Onto new things, let’s take a look at the “Get Latest” tab:

image [ click to enlarge ]

Predictably, the three objects we originally created and committed to source control are all sitting down there as “New” change types. Let’s “Get Latest”:

image

imageCool, very straight forward. Expanding the database in the Object Explorer expectedly shows that everything we created earlier in the first database now exists in the second database.

This is another good place to pause for a moment. Many of the source control management practices out there involve scripting objects - sometimes automatically - but then usually running them manually against the target database. In this regard, the ability to apply the full change set from source control at the click of a button is a very powerful thing.

Thinking back again to what Red Gate do well, this is really just a SQL Compare process with the Subversion repository as the source and the linked database as the target followed up by a “Synchronise” process. Neat.

Dealing with conflict

Another common scenario for source control to deal with is conflicts. What happens when the same code is changed by different parties in a non-merge friendly fashion? Let’s see how SQL Source Control deals with it.

Firstly, I’ve changed the Postcode column data type in the Developer2 database to an int. I’m going to commit it using the right-click short cut on the database just to illustrate something a little different (and probably more familiar to regular users of SVN and Visual Studio):

image

This brings us back around to the familiar “Commit Changes” tab. I’ve just left a brief message and then pushed it through to source control.

Now let’s go back to the original database and change the same column but this time we’ll make the data type a varchar(10). After doing this I’ve loaded the SQL Source Control tab back up and we’ve got something a little new:

image [ click to enlarge ]

What I find interesting about this is that unlike TortoiseSVN or many of the other SVN clients, we’re getting the conflict message before we attempt to commit. The only way this can happen is if the client connects back to the repository and loads the current state before giving you the opportunity to commit. As a result, I’m regularly seeing this when switching between source control tabs:

image

SQL Source Control does appear to be a little eager in going back to the repository. Immediately flicking between the “Commit Changes” and “Get Latest” tabs fires this message up on every click. It’s a brief delay – only a couple of seconds – but that’s to a locally hosted SVN instance. I suspect it’s just comparing revision numbers rather than actually pulling down every object but it will be interesting to see how this behaves over the WAN.

Getting back to the issue at hand, behind the “Resolving conflicts” window above we have the option to either “Keep mine” or “Take theirs”. There is no merge available. This is important to understand so here’s a repeat of the last paragraph in the window above:

Note that merging is not currently supported.

To merge changes from two conflicting versions, choose to keep or take one version, and then manually modify it to include the changes from the other version. This is simpler if you copy the other object's creation script from the Object Differences pane.

It’s a little bit clunky but again, it’s reminiscent of the native behaviour of SQL Compare and to be honest, I’ve never had the need to merge with that and I’ve run it hundreds of times in the past. Having said that, the message above does appear to be leaving the door open for future enhancements and this is version 1 so there may well be something a little fancier in the pipeline.

In this case I decided to “Take theirs” which greyed out my local version on the left of screen:

image

After this I still needed to switch back to the “Get latest” tab but after doing this and taking the update the column was successfully changed to an int so obviously everything ran just fine.

To be a little bit antagonistic, I changed the Suburb data type to nvarchar(240) then loaded up the “Commit Changes” tab and entered a message but did not send the change to SVN. I then checked out the Data folder to the local file system and changed the data type to nvarchar(260), committed it then flicked back to SSMS and tried to commit, forcing a conflict on the server side:

image

That’s a pretty reasonable response!

Help! Get me out of here (aka “revert”)

One of the great things about source control is that it allows you to explore taking your code in a radical direction with the confidence that if you really, really screw it up you can always revert back to a known good copy. SQL Source Control supports this functionality directly within the context of a changed object:

image

This action presents a list of changed objects beneath the current node (there’s only one in this case because I selected the sproc directly), after which the rollback is swift:

image

Doesn’t get much easier than that! Of course you do need to remember that we’re versioning objects and not content so the undo is not going to save you if you delete all your records.

Semi-atomic commits

One of the great features of Subversion is the enforcement of atomic commits. Change a dozen files, try to commit and if any one file can’t go all the way through to the repository, the commit fails and the transaction rolls back. The beauty of this model for developers is that it means the repository shouldn’t get into a state which could potentially “break the build” because of an incomplete commit.

The problem we’ve got when attempting to holistically version an application directly from the IDEs is that the data layer is being committed autonomously from the app layer. Assuming this happens in sequence (perhaps from SSMS then Visual Studio), there is a window where someone else may pull down the data layer only and if that includes a dropped column the app layer has a dependency on, things start to break.

In this regard, commits do become kind of semi-atomic. Sure, the actual SVN transaction is atomic but the fact you need two of them to holistically commit the application does leave a window where problems can occur.

This actually got me thinking; who’s the product pitched at? Is it application developers who have traditionally treated the database as second class source control citizen? Or is it the database specialist who has typically not versioned their work at all? I wonder this simply because the practices and dependencies are a little different and it’s interesting to think of how their working styles might be unified.

There is never a reason to use source control to version your data

This will be painfully obvious to most people, but I’ve seen it done before, and more than once too. Source control management exists to version, um, source code, it’s not there to version content and backing up .mdf, .ldf or backup files is never ok.

The list of problems this creates is endless; continually changing versions with no functional change resulting in endless pointless updates, huge redundancy from binary files which can’t be delta-ed, inability to diff between versions and so on and so forth.

This is the realm of disaster recovery and retention strategies, not of source control management. Don’t do it!

What’s missing

There are a few features missing which you might be used to with other SVN clients. For example, there’s no way to view the commit history of the repository, you’ll need to use another client like TortoiseSVN for that. By extension, it also means you’ve got to get out of SSMS to see what objects were changed in a particular commit, what the log message was or to perform a blame on an object.

There’s also no “Update to revision” so if you want to roll back to a previous version (beyond the scope of just a revert), you’ll need look up the previous version in another clients then copy and paste the content back over in SSMS. There’s also no spell check on your commit messages so there’s a bit more margin for error there.

Perhaps these are not entirely fair comparisons because TortoiseSVN is a pretty richly featured client intended for a broad range of purposes (at least in terms of target technologies), whilst SQL Source Control is obviously a very targeted tool. Regardless, the bottom line is that you’ll probably end up using the both of them for your database versioning.

Gotchas

There are a few use cases specific to databases which I can see posing some challenges to developers. Keep in mind though these aren’t so much SQL Source Control as a product problems, rather they’re issues related to the nature of versioning database objects.

The big difference to versioning files directly on the file system is that the success of object changes can depend on the underlying data. Adding a unique constraint, for example, won’t be successfully received if the guy doing the update has data in their local copy which breaks that constraint. For all intents and purposes, this activity again “breaks the build”, as the change can’t be run. Ok, it’s not breaking the compilation of a Visual Studio project in the physical sense, but it may mean that the project won’t run because a dependency is incomplete.

Another problem is that it’s easy to inadvertently drop tables containing data which was not intended to be lost. Say both Developer A and Developer B have an “Orders” table in their local working copy, complete with a bunch of data. If Developer A drops the table and commits the change then Developer B blindly takes the update (it could be mixed in with dozens of other changes), his Orders table and any data in it is going to be dropped. I wonder, in this case, if SQL Source Control couldn’t give a little warning message first. Sure, development databases should only contain development data but I can still envisage scenarios where this could cause some pain.

There’s also still a gap in so far as data required to establish the application in the first place goes. For example, an application requiring reference data to run is still going to need to have it separately scripted and stored. This is not a problem SQL Source Control has set out to address but still, it might be a good feature for the future.

Wrap-up

I had high hopes for this product and on the whole I’m pretty impressed. The Red Gate lineage is clear – which is a very good thing – and the tool is super easy to use. There are a few areas I think could do with a touch up in future releases but they’re pretty inconsequential in the entire scheme of things.

To be honest, the real story here is that database source control management is made just as easy as versioning the other layers of the app. Just the act of having the database versioned is nine tenths of the battle and would be a huge step forward for many developers. Certainly this could well form an important component in my work and kudos should go to Red Gate for making the whole thing so easy!

comments powered by Disqus

Leaving comments is awesome, please do. All I ask is that you be nice and if in doubt, read Comments on troyhunt.com for guidance.