“I don’t believe in intermittent issues.”
“Okay Jason, I’ll bite. Why not?”
“If you search enough, there is always an explanation.”
Jason runs the network team for a large telecommunications company. He’s earned his stripes in the trenches. He’s very good at what he does, and he knows his field.
In this case, he’s wrong.
I mean MOST of the time he’s completely accurate. IT people are always balancing getting the systems back online against searching for the root cause of the outage. However, we don’t get paid to do research. Rebooting a server clears everything that’s in memory and reloads the programs from the hard disk. If your server is having an issue because something in memory is corrupt, rebooting it will often clear the problem and, at the same time, destroy your best evidence for figuring out what went wrong.
Sure, there are still log files and some other bits of data, but often rebooting is an admission that “I’m not going to work on figuring out the issue anymore. I’m going to try to get my system back up and running.”
And that is what IT engineers get paid to do: keep the system online. If you reboot and the problem goes away, but you never figured out what the problem was, you have no way of knowing if the problem will come back. But customers can once again access your company’s webpage.
That’s what Jason was referring to. If you are willing to take the time and check enough variables, you can explain every outage. . .but not always.
Back when I was supporting WordPerfect Office 3.0, we talked to the programmer one time about how files got named. Every email was a separate file and had long names like
7F851B41
“Well, we have an algorithm that runs on the client. It looks at the clock, message length and the information about the Post Office and Domain and then builds it from that.”
“So, if two people happened to send email at exactly the same time, there’s a remote possibility we could end up with duplicate file names?”
“Sure. . .but you’d never replicate it in testing.”
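To see why a name built only from the clock, the message length and the Post Office and Domain can collide, here is a minimal Python sketch. It is not the real WordPerfect Office algorithm, which the programmer never spelled out for us. The function name, the one-second clock resolution, the CRC-32 step and the sample values are all my assumptions, chosen only to show the shape of the problem.

```python
import zlib


def message_file_name(timestamp_s: int, message_length: int,
                      post_office: str, domain: str) -> str:
    """Hypothetical stand-in: derive an 8-hex-digit file name from the inputs.

    CRC-32 is an assumption used for illustration; the original product's
    real method is not known from the conversation.
    """
    key = f"{timestamp_s}|{message_length}|{domain}.{post_office}".encode("ascii")
    return f"{zlib.crc32(key):08X}"


# Two different users on the same Post Office send messages of the same
# length during the same clock tick: every input matches, so the names match.
name_a = message_file_name(700000000, 1024, "SALES", "CORP")
name_b = message_file_name(700000000, 1024, "SALES", "CORP")

# One tick later, the clock input differs and so does the name.
name_c = message_file_name(700000001, 1024, "SALES", "CORP")

print(name_a, name_b, name_c)
print("collision:", name_a == name_b)
```

Nothing in the inputs identifies the individual sender or message, so the only thing keeping names unique is timing. That is exactly the kind of bug that shows up once in production and never in the test lab.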
A good engineer and a great engineer are both good at getting the system back online and keeping it running smoothly. What separates them is that the great engineer can think outside the box and spot the root cause of a problem while still maintaining system integrity.
But they both still have to reboot the server on occasion.
Rodney M Bliss is an author, columnist and IT Consultant. He lives in Pleasant Grove, UT with his lovely wife and thirteen children.
Follow him on
Twitter (@rodneymbliss)
Facebook (www.facebook.com/rbliss)
LinkedIn (www.LinkedIn.com/in/rbliss)
or email him at rbliss at msn dot com