Thursday, July 21, 2011

No rest for the Wicked

Oh my Gods. And I thought I didn't have enough time before. There literally aren't enough hours in the day. And my attempts to create extra hours (mostly squeezed in just after midnight) backfired rather badly.

You see, we went live last week. The space shuttle launched, and then so did we. ("the last shuttle is away... aw.") And I didn't think we'd told anyone about it, but somehow we've already got two customers. Two paying members! How did they even find out?

There are two people running around inside my application and I don't know where they are. Well, I do. Sort of. There are logs. But I miss being able to just lean over their shoulder and ask "why are you doing that?" Analytics is nice, but not enough. I want screenshots.

What's terrifying is how many things are still broken. Well, I say 'broken'... I mean they work, but not all the time. From the end user's perspective this means occasionally having to press 'delete' twice, or unjam a queue by restarting a server, which I'm told isn't that big a deal.

For me, however, every system jam is a failure in my logic. I'm just Vulcan enough that it personally annoys me every damn time, deep down.

My system must be a perfect system, with all bugs and errors banished to the land of wind and ghosts.

In practice, there's a point of diminishing returns. No matter how good I make my code, I'm still subject to the workings of the internet, various phone companies, and often Microsoft. So long as my error rate is lower than that lot, I'm doing well.

And there's also a point beyond which you make things worse. You have one line of code that does something, and it occasionally fails. So you write another line of code to catch the exception and perform the fallback... but that could fail too. So you add another line. Soon 90% of your code is exception handlers that almost never run (and then all at once!) and so are full of bugs due to lack of testing. And the module doesn't fail cleanly anymore; it always half succeeds, leading to some of the most subtle and bizarre bugs I've ever seen.

Fail fast, fail hard. These are actually the cornerstones of a reliable system, because you can always retry. Just keep pounding 'reload' until it works, and trust in Idempotency.
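
Here's roughly what I mean, as a minimal Python sketch rather than anything lifted from the real system. Every name in it (TransientError, retry, Store) is made up for illustration. The operation either raises before touching any state or completes fully, and the delete is idempotent, so pressing 'delete' twice is harmless and the caller can simply retry.

```python
import time


class TransientError(Exception):
    """Hypothetical error standing in for a timeout or a jammed queue."""


def retry(operation, attempts=5, delay=0.0):
    """Call operation until it succeeds; re-raise once attempts run out."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == attempts:
                raise  # fail hard: no fallback chain, let the caller see it
            time.sleep(delay)


class Store:
    """Toy in-memory store; 'failures' simulates that many flaky calls."""

    def __init__(self, items, failures=0):
        self.items = set(items)
        self.failures = failures

    def delete(self, key):
        # Fail fast: raise before any state has been changed,
        # so a failed call never leaves a half-finished delete behind.
        if self.failures > 0:
            self.failures -= 1
            raise TransientError("simulated jam")
        # Idempotent: removing a key that is already gone is a no-op.
        self.items.discard(key)


store = Store({"a", "b"}, failures=2)
retry(lambda: store.delete("a"))  # fails twice, succeeds on the third try
retry(lambda: store.delete("a"))  # pressing 'delete' again changes nothing
```

The only exception handler is the one inside retry, and it gets exercised every time something hiccups, which is rather the point.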
