Troy Hunt: Azure WebJobs are awesome and you should start using them right now!

These real world experiences with Azure are now available in the Pluralsight course "Modernizing Your Websites with Azure Platform as a Service"

No really, they’re totally awesome! I used Azure WebJobs in the very early days and whilst they served a purpose, I wasn’t blown away with them at the time. In fact I went on to use Worker Roles for the back end paste processing behind Have I been pwned? (HIBP) and whilst they served me well, there were also aspects that weren’t as slick as the rest of the Azure ecosystem.

Recently I had cause to build another back end process for HIBP (one I’ll talk more about in detail in a later post) so I thought I’d come back and visit WebJobs again. The extent of what I was able to do, the ease with which it all happened and the time it took just totally blew me away. There were a few things in particular though that really struck me while building out this new feature using WebJobs and I wanted to capture and share those here.

What I ended up deciding to do is to rebuild a part of HIBP using a WebJob, namely the part that looks for new pastes in a queue then goes and retrieves them from Pastebin and sends out notification emails to those impacted. Converting this from a Worker Role really highlighted where WebJobs shine.

1. They’re free

No really, they don’t cost a cent. So long as you have an Azure website to throw them into, you can keep rolling out WebJobs to your heart’s content and never see it hit your bottom line. Now of course they will consume resources on your website and if you make enough of them that work hard enough you’re going to need to have some discussions about scale, but most of the time websites on even the smallest scale infrastructure are sitting around with single digit CPU utilisation figures, at least they should be if they’re well-optimised.

A Worker Role (or a Web Role, for that matter), comes with a cost and while it’s not much for what it is, you’re still getting an entire virtual machine there and that’s going to cost dollars. They can be great when you want grunt, isolation and no dependencies on a website, but none of those things mattered for me while cost really mattered.

2. They work automagically with message queues

In my old Worker Role implementation, I wrote all the plumbing to retrieve a message from the queue, delete it if everything processed successfully and then have a little nap before repeating. It was all bundled up inside my worker role and constituted a fair whack of code plus error handling and logging.

Now, my code looks like this:

public static void ProcessQueueMessage([QueueTrigger("pasteurl")] string pasteUrl, TextWriter log)

This is using a continuously running WebJob with the method above fired off via a queue trigger. In other words, when a new message appears in the “pasteurl” queue it’s automatically passed to this method whereupon I can read that “pasteUrl” parameter (just the URL of the paste to process) and do whatever I’d like with it. After the process runs the message will be deleted from the queue. If the process fails, well, that’s the other neat thing…
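
For anyone who hasn’t seen the SDK before, here’s a minimal sketch of what a whole continuous WebJob around that signature can look like. It assumes the WebJobs SDK 1.x NuGet package (Microsoft.Azure.WebJobs) and connection strings named AzureWebJobsStorage and AzureWebJobsDashboard being available to the job. The method body is a placeholder, not the actual HIBP code.

```csharp
using System.IO;
using Microsoft.Azure.WebJobs;

public class Program
{
    public static void Main()
    {
        // Assumes the AzureWebJobsStorage and AzureWebJobsDashboard
        // connection strings are available to the job.
        var host = new JobHost();

        // Keeps the process alive and listening for triggers until it is stopped.
        host.RunAndBlock();
    }
}

public class Functions
{
    // The SDK invokes this once for each message that appears in "pasteurl".
    public static void ProcessQueueMessage([QueueTrigger("pasteurl")] string pasteUrl, TextWriter log)
    {
        // Placeholder body: the real job retrieves the paste and sends emails.
        log.WriteLine("Processing paste URL: {0}", pasteUrl);
    }
}
```

There’s no loop, no sleep and no explicit delete in there. The host does the polling and removes the message once the method returns without throwing.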

3. They’re resilient to errors

Imagine this – you screw up and have an exception somewhere (it’s not just me that does this, right???) but all of this is happening in a background process that needs to somehow be “resilient” to this. What “resilient” means will differ case by case but right out of the box with WebJobs you get a really neat feature called poison messages.

It works like this: your WebJob is invoked by a new message appearing in the queue. That message is sucked into the job and “something” goes wrong. Doesn’t matter what it is, it just goes wrong and an unhandled exception is raised. So it tries again. And again. And then twice more again. If it fails five times in a row (and this is configurable) then the message is deemed “poisoned” and automatically placed into a queue of the same name but appended with “-poison”. It looks like this:

Poisoned message queues

You can see two queues here (“callback” and “pasteurl”) and each has their own poison queue where failed messages go. Of course I still need to implement the logic to deal with the poisoned messages, but they’re now shoved off to the side and not continuously failing my WebJob forever and a day.
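
One way to deal with them is to point a second function at the poison queue. This is only a sketch under the same SDK 1.x assumption: the method name and the log message are made up for illustration, and what you actually do with a poisoned message (alert, park it, retry later) is up to you.

```csharp
using System.IO;
using Microsoft.Azure.WebJobs;

public class Functions
{
    // Normal path: an unhandled exception here causes the message to be retried.
    public static void ProcessQueueMessage([QueueTrigger("pasteurl")] string pasteUrl, TextWriter log)
    {
        log.WriteLine("Processing paste URL: {0}", pasteUrl);
    }

    // Hypothetical handler: runs for messages the SDK has moved to "pasteurl-poison".
    public static void ProcessPoisonMessage([QueueTrigger("pasteurl-poison")] string pasteUrl, TextWriter log)
    {
        log.WriteLine("Gave up on paste URL after repeated failures: {0}", pasteUrl);
    }
}
```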

4. They deserialise your messages without you even asking!

The paste retriever web job is a simple one insofar as the message in the queue is nothing more than a URL passed as a string. How about this one though:

public static void ProcessQueueMessage([QueueTrigger("callback")] CallbackInstruction callbackInstruction, TextWriter log)

This is related to a request to implement callbacks when a breach or paste occurs and as you can see, it’s passing a “CallbackInstruction” type into the WebJob. This is a type I’ve defined so that it contains a whole bunch of info about what I want the WebJob to do and when I insert it into the queue via another process, it gets serialised into JSON. I can then pass that JSON into the WebJob via the queue trigger and it’s automatically deserialised for me!

This does great things for your app composition as you can start to break things down into smaller more discrete services and easily orchestrate different parts of your app via queue messages. Of course you could always manually deserialise things, but every instance where manual code is taken away is a very good thing IMHO.
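
To make both sides of that hand-off concrete, here’s a rough sketch. The properties on CallbackInstruction are hypothetical (the real type isn’t shown in this post), and the enqueueing code assumes the classic WindowsAzure.Storage client library plus Json.NET, which may not be how the real HIBP process does it.

```csharp
using System.IO;
using Microsoft.Azure.WebJobs;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Queue;
using Newtonsoft.Json;

// Hypothetical shape: these properties are illustrative only.
public class CallbackInstruction
{
    public string CallbackUrl { get; set; }
    public string BreachName { get; set; }
}

public static class CallbackQueue
{
    // Producer side: serialise the instruction to JSON and drop it in the queue.
    public static void Enqueue(string storageConnectionString, CallbackInstruction instruction)
    {
        CloudStorageAccount account = CloudStorageAccount.Parse(storageConnectionString);
        CloudQueue queue = account.CreateCloudQueueClient().GetQueueReference("callback");
        queue.CreateIfNotExists();
        queue.AddMessage(new CloudQueueMessage(JsonConvert.SerializeObject(instruction)));
    }
}

public class Functions
{
    // Consumer side: the SDK turns the JSON message back into a CallbackInstruction.
    public static void ProcessQueueMessage(
        [QueueTrigger("callback")] CallbackInstruction callbackInstruction, TextWriter log)
    {
        log.WriteLine("Calling back {0} for {1}",
            callbackInstruction.CallbackUrl, callbackInstruction.BreachName);
    }
}
```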

5. They auto-deploy from GitHub with the website

Here’s how you deploy a WebJob:

GitHub for Windows Sync button

That is all. This is GitHub for Windows which is great for simple stuff and awesome for releasing a WebJob if you’re publishing from Git. It just gets automatically deployed with the website (refer to Get Started with the Azure WebJobs SDK for how it’s added to the site in Visual Studio). Of course like deploying a website, you can always publish direct from Visual Studio or employ other manual processes, but there’s nothing like auto-deployment from source control for speed, repeatability and downright ease of use.

6. You can easily track them via the management portal

It looks like this:

List of existing WebJobs

This sits directly inside the website in the management portal so it’s very easy to reach. You can then drill down into a WebJob:

Functions invoked in the WebJob

And then drill down even further into an individual execution of the WebJob:

Individual invocation of the WebJob

The little lightning bolt tells us that the WebJob was invoked due to the appearance of a new message in the “pasteurl” queue (we’ll see this symbol again a bit later on). The output at the bottom comes from log entries in the code:

log.WriteLine("Processing paste URL: {0}", pasteUrl);

So the bottom line is that you get all this output that helps enormously in monitoring the service and it’s automatically there in the portal. Oh – that includes full stack traces of unhandled exceptions too, naturally all secured within the portal. Neat.

7. Config is a breeze

Here’s what my App.config in the Visual Studio project looks like:

Connection strings in the app

And here’s what my config settings in the Azure web site look like:

Connection strings in the Azure website

Because the job runs in the web site, all those connection strings and app settings are picked up from the place I already had them defined. I don’t need to put them under source control or manually apply them, it’s just set up correctly on deployment. Can’t get easier than that!
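
In code, that means the job can read its settings the same way the website does. Here’s a small sketch assuming the values come through ConfigurationManager as described above. The Settings helper is a hypothetical name of mine, and AzureWebJobsStorage is the SDK’s conventional connection string name rather than necessarily one of the names in the screenshots.

```csharp
using System.Configuration;

public static class Settings
{
    // Hypothetical helper: looks up a named connection string and fails loudly if it's missing.
    public static string GetConnectionString(string name)
    {
        ConnectionStringSettings setting = ConfigurationManager.ConnectionStrings[name];
        if (setting == null)
        {
            throw new ConfigurationErrorsException("Missing connection string: " + name);
        }
        return setting.ConnectionString;
    }
}
```

Calling Settings.GetConnectionString("AzureWebJobsStorage") would then return whichever value applies in the environment the job is running in, without the code caring where it came from.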

8. You can chuck as many as you want on a single website

This is another neat thing about WebJobs; you can just keep firing more into the solution:

Adding a WebJob to the solution

I’ve looked and so far I’ve not seen a practical limit for the number of WebJobs in a single site (do leave a comment if you have info to the contrary), so I’ve just continued to add them as required. Of course you may hit service scale limits at some point, but that’s another issue altogether.

9. They’re ridiculously easy to test

Testing background processes can get a bit painful when you have no direct user interface. Of course I could just dump stuff in the message queue, but this makes it even easier:

Test running the WebJob

This will take that URL, pass it into my WebJob and make the magic happen:

Log of the test run

You can see the “Ran from Dashboard” lightning bolt just below the success message so I know it’s been fired off manually. And that’s it – dead simple!

10. They’re parallelised and they auto scale-out with your site

One of the things about running a Worker Role is that it’s just that – a Worker Role. You can always scale it out to more instances, but of course you then pay for those. Alternatively, you can async or multi-thread your code, but that’s code you need to write. None of this is to beat up on Worker Roles because they have a valuable function, but ultimately they weren’t as good a fit for my purposes as WebJobs.

When you have a WebJob, you can do stuff like this:

Multiple simultaneous functions running

These all started running at the same time when I dumped 14 test URLs into the queue. They all ran simultaneously just like you’d expect, well, a website to do! This is because they’re triggered by the message queue which means this:

By default, the SDK gets a batch of 16 queue messages at a time and executes the function that processes them in parallel.

You can always tweak the limit and while you’re there, define how many times the message can dequeue before being deemed “poisoned” as well as how frequently the queue will be polled. As a sanity check, here are the times the WebJob inserted the records into the database which shows how the WebJob was running simultaneously on the one instance of a web site (remember they take ~20s each to run):

2015-01-27 19:45:31.67
2015-01-27 19:45:31.62
2015-01-27 19:45:15.37
2015-01-27 19:45:13.58
2015-01-27 19:45:13.54
2015-01-27 19:45:12.75
2015-01-27 19:45:12.35
2015-01-27 19:45:11.25
2015-01-27 19:45:09.77
2015-01-27 19:45:09.29
2015-01-27 19:45:09.27
2015-01-27 19:45:09.27
2015-01-27 19:45:09.26
2015-01-27 19:45:09.26
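
As for tweaking those limits, in SDK 1.x it’s done on the host configuration before the host starts. This is a sketch only, and the numbers are arbitrary examples rather than what HIBP uses or anything I’d recommend:

```csharp
using System;
using Microsoft.Azure.WebJobs;

public class Program
{
    public static void Main()
    {
        var config = new JobHostConfiguration();

        // How many messages are fetched (and processed in parallel) per batch; the default is 16.
        config.Queues.BatchSize = 8;

        // How many attempts a message gets before it lands in the "-poison" queue; the default is 5.
        config.Queues.MaxDequeueCount = 3;

        // The longest the SDK will wait between polls when the queue is empty.
        config.Queues.MaxPollingInterval = TimeSpan.FromSeconds(15);

        var host = new JobHost(config);
        host.RunAndBlock();
    }
}
```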

But wait, there’s more…

These guys may all start to chew up resources on the website and, if you’ve got your autoscale configured correctly and they chew up enough of them, your website will scale out to more instances. So now you can scale out to a (default) maximum of 10 instances of the website processing 24 simultaneous queue messages each (I know, that’s 50% more than 16, it’ll pop another 16 off after processing half of them) or in other words, 240 simultaneous messages. Or you can up the limit of 16 and, well, you get the idea.
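
If you want to play with that arithmetic for other batch sizes, here’s a back-of-envelope sketch. QueueCapacity is a hypothetical helper of mine, not part of the SDK, and it assumes the default behaviour of fetching the next batch once the number in flight drops to half the batch size, with one queue-triggered function per instance.

```csharp
public static class QueueCapacity
{
    // Peak messages in flight on one instance: the half batch still running
    // plus the full new batch that gets fetched at that point.
    public static int MaxConcurrentPerInstance(int batchSize)
    {
        int stillRunning = batchSize / 2;
        return stillRunning + batchSize;
    }

    // Peak messages in flight across every instance the site has scaled out to.
    public static int MaxConcurrentAcrossSite(int batchSize, int instances)
    {
        return MaxConcurrentPerInstance(batchSize) * instances;
    }
}
```

With the defaults, QueueCapacity.MaxConcurrentPerInstance(16) gives 24 and QueueCapacity.MaxConcurrentAcrossSite(16, 10) gives 240.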

Summary

WebJobs are awesome and they give you some really neat options around the composition of your app. Stuff like this makes it dead easy to, say, immediately return a response to the user in the browser and tell them “Hey, it’s underway” while the WebJob goes off and processes the order you just put in the queue or does other magic. Of course queues themselves are nothing new, but this does make things particularly easy.

Keep in mind also that WebJobs can be fired off by the appearance of a blob or on a schedule orchestrated by the Azure Scheduler which has a price starting at free. They can also be invoked by hitting an endpoint and if you’d like, you can even attach a debugger to them.
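
For a flavour of the blob option, a blob-triggered function in SDK 1.x can look something like this. It’s a sketch only: the “input” and “output” container names are made up, and all it does is copy each new blob’s text across.

```csharp
using System.IO;
using Microsoft.Azure.WebJobs;

public class Functions
{
    // Runs when a blob appears in the hypothetical "input" container and
    // writes its text to a blob of the same name in "output".
    public static void CopyBlob(
        [BlobTrigger("input/{name}")] TextReader input,
        [Blob("output/{name}")] out string output)
    {
        output = input.ReadToEnd();
    }
}
```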

I’ll be using them a lot more in HIBP for all sorts of things, mainly long-running or background processes. The fact that you can churn them out so quickly, parallelise them, scale them out, monitor them easily and they’re right at my favourite price point – the free one – makes them totally awesome.
