Death Sheep from Hell (fenton) wrote,

Bus, how do I hate thee? Let me count the ways...

So. I currently write software for Qwest IT, the in-house software development company for Qwest-the-holding-company. Specifically, I'm working on a project which involves writing a sort of "one stop shopping network" virtual database - a place where you can go for all your data needs if you're writing customer-facing software (like, say, the engine that drives the Qwest.com website), without having to worry about what system the data *really* lives on.

This system, obviously, has to communicate with the other systems around it that want to ask it questions, and which it has to ask questions of. This communication takes place over "the Bus", which is basically the equivalent of a giant old-style "party line" where everyone can talk all at once (though it at least allows you to narrow down what you want to listen to). The key point here is that the application has to build a connection to the Bus in order to get anything whatsoever useful done.

We have old, crappy software for talking to the Bus; if it drops the connection for some reason, it can't build a new one - you have to restart the entire application. Restarting with this same crappy software can take up to half an hour. A fix is underway (in fact, I'm the person primarily responsible for getting it completed); the new stuff comes up in more like a minute, and can (in theory) cope with needing to rebuild a connection when it vanishes. This is important, because somewhere between twice a month and twice a week, we do get disconnected from the Bus, and the on-call person (a duty that rotates weekly) has to log in and kick the thing to make it reset and talk to the Bus again.
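For the curious, the "rebuild the connection when it vanishes" idea looks roughly like the sketch below. This is not our actual code: `BusConnectionError`, `connect_with_retry`, and the `connect` callable are all made-up names for illustration, and the retry counts and delays are arbitrary assumptions.

```python
import time


class BusConnectionError(Exception):
    """Hypothetical error raised when a connection to the Bus can't be built."""


def connect_with_retry(connect, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call connect() until it succeeds, doubling the wait after each failure.

    `connect` is any zero-argument callable that returns a connection or
    raises BusConnectionError. `sleep` is injectable so the wait can be
    skipped in tests.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except BusConnectionError:
            if attempt == max_attempts:
                raise  # out of attempts; let the caller (or the pager) deal with it
            sleep(delay)
            delay *= 2
```

The point is just that the application retries on its own instead of waiting for a human to restart it.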

The team responsible for the Bus patched it last weekend, to address this problem. Since then, we have been getting disconnected something like a dozen times a day. Now, let's do the math; at 30 minutes per cycle, 12 times a day, that's 6 hours out of 24 that we're not able to talk to other applications. Applications that are, you know, sort of critical to being able to do little things like sell products to customers. And, of course, the on-call pager goes off for each of those dozen times, and that person has to log in and beat on the system.
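If you want to play with the numbers yourself, here is the same back-of-the-envelope math as a tiny function (the function name is mine, and it assumes the worst case where every disconnect costs a full restart):

```python
def daily_downtime_hours(disconnects_per_day, restart_minutes):
    """Hours per day spent restarting, assuming each disconnect costs a full restart."""
    return disconnects_per_day * restart_minutes / 60


# Old software: a dozen disconnects a day at up to 30 minutes each.
print(daily_downtime_hours(12, 30))  # 6.0

# New software: same disconnect rate, but about a minute to come back up.
print(daily_downtime_hours(12, 1))  # 0.2
```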

As of last Monday morning, that on-call person was me. I forget how many incidents I had Monday evening, and I think I've lost track of some last night, but I know it was at least four, and possibly up to six, occurrences. Many of them at times like 2 AM, when I am, in theory, asleep. Or wish I was, the past two nights.

My director, bless his non-pointy-haired head, pointed out to my manager that when the pager is going off quite this often, it really is sort of necessary to rotate it more often than weekly, and spell the folks who are otherwise going to quickly go psychotic dealing with being unable to sleep. (Note: my manager isn't actually all that bad; he is, in fact, a personal friend of the entire household. He was just being dense and, as far as I can tell, thinking "Well, Joel's got it covered.")

Oh, and ysabel, who just so happens to work in the same group with me, was having trouble sleeping last night, so *she* also got woken up every damn time.

Anyway. Off to bed, for a night of blissfully pager-less sleep.
