I was checking the NewRelic stats on “Have I been pwned?” the other day (you do have the free NewRelic service on your Azure websites, right?!) and I came across a most unsavoury blip on my otherwise very impressive (if I do say so myself) app response time graph:
Nooo – my beautiful stats! Suddenly I’ve gone from small millisecond numbers to this massive 40 second blight on my graph – what gives?! My problem, of course, is that I did a deployment. Just a tiny deployment with some text changes to a single .cshtml file but it meant that my web transactions ended up looking like this:
Kudu, for the uninitiated, is the secret sauce that enables Azure to deploy so efficiently from GitHub and it just so happens that it spent an awful lot of time working its magic on this very small change. Now keep in mind that the chart above is the total duration of those processes across the report period so when you’re seeing just over 9 seconds on NotifyMe.Post, that’s a lot of separate calls to that resource. So how many calls were made to Kudu? Let’s check:
Let me go slightly off topic for a sec – this data is awesome. The ability to monitor this information in real time and at such a granular level totally rocks and goes a long way to solving elusive performance problems. I mentioned you can get it for free on Azure websites, didn’t I?
Moving on, what the image above tells us is that there was only one call to the Kudu service (see the “Avg calls” columns) so we can rule out the process being hit multiple times. No, something made that Kudu.Services.FetchHandler.ProcessRequest step hang for a hell of a long time.
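To make the difference between "lots of quick calls" and "one very slow call" concrete, here's a minimal sketch in Python. It isn't anything NewRelic exposes; the `TransactionStat` type, the call counts and the 40 second Kudu figure are assumptions made up purely for illustration, and only the transaction names come from the reports above.

```python
from dataclasses import dataclass


@dataclass
class TransactionStat:
    """One row of a hypothetical transaction report."""
    name: str
    total_seconds: float  # summed duration across the report period
    call_count: int       # number of calls in the same period

    @property
    def avg_seconds(self) -> float:
        # Average time per call: total duration divided by call count.
        return self.total_seconds / self.call_count


def slow_per_call(stats, threshold_seconds):
    """Return names of transactions whose average call exceeds the threshold."""
    return [s.name for s in stats if s.avg_seconds > threshold_seconds]


# Illustrative numbers only: the call counts and the Kudu total are invented.
stats = [
    TransactionStat("NotifyMe.Post", total_seconds=9.0, call_count=300),
    TransactionStat("Kudu.Services.FetchHandler.ProcessRequest",
                    total_seconds=40.0, call_count=1),
]

slow = slow_per_call(stats, threshold_seconds=1.0)
```

With numbers like these, NotifyMe.Post has a large total but a tiny per-call average, while the single Kudu call is the only thing that is genuinely slow, which is the same conclusion the "Avg calls" column leads to.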
So what went wrong? GitHub slow to respond? Network misbehaving? Azure have an internal problem? Who knows, but what I do know is that I don’t particularly want Kudu messing up my graphs with a process totally outside my control. I’ve looked at my app response time on many occasions in the past and seen Kudu stealing the limelight when what I really want to see is how the site is performing after that change I just pushed. We need the sucker gone from NewRelic’s reports.
And we can’t do that. This was originally going to be one of those “Here’s something that annoyed me and how I fixed it” posts; it has now unfortunately become one of those “Here’s something that annoyed me and there’s nothing I can do about it” posts. I tried looking for a configuration to ignore this transaction in the .NET agent but whilst you can ignore things like particular error codes, you can’t do anything about the request path. Likewise in the NewRelic portal – you can’t just say “Hey, ignore anything that pops up in this path / transaction / namespace / etc”.
The only option that came even close was the ability to call IgnoreApdex inside the process being called, but not only is that not very intelligent, I can’t drop that within the Kudu service running on the platform. This is a process outside the bounds of my control so what I really need to do is ignore it at either the data collection level or at the point it’s reported on. This just wasn’t going to cut it.
So I took to the Twitters:
Nothing back of substance, unfortunately, although if you read through the responses you’ll see a NewRelic engineer getting involved and a couple of follow-ups on my part but then the trail goes cold.
So there we are with a very inconclusive result. To my mind it would make sense to make this configurable within the config file but then again, at least in this case it’s a pretty minor blight. Do I really care if my stats get skewed every time I deploy to Azure? Not much, but I can also envisage use cases where processes consistently throw out stats and the issue becomes more pressing. Of course if anything changes or I get some feedback on how to resolve this, I’ll update the post for those who come later.
All that said, definitely go and get yourself some NewRelic, it totally rocks and to be able to pull it into your Azure website for free poses a damn good value proposition!