5 sysadmin horror stories

Image by:

Opensource.com

Happy System Administration Appreciation Day!

The job ain't easy. There are constantly systems to update, bugs to fix, users to please, and on and on. A sysadmin's job might even entail fixing the printer (sorry). To celebrate the hard work our sysadmins do for us, keeping our machines up and running, we've collected five horror stories that prove just how scary / difficult it can be.

Do you have your own sysadmin horror story? Let us know in the comments below.

Screech! Crash! Boom.

from David Both

Back in the late 1970s I was working for IBM as a customer engineer in a small town in northwest Ohio. There were a lot of what were, even then, very old unit record devices like keypunches, card sorters and other, similar devices that I worked on quite frequently. There were also some more modern mid-range and mainframe computers that we serviced. On one late summer evening, I was working on one of the keypunches and was more or less a witness to the worst thing that can happen to a business.

It seems that this company had hired a new night operator who had been on the job for only a couple weeks. He was following the instructions in the run book to the letter while running payroll and loaded the payroll diskpack on one of the large IBM disk drives, probably an IBM 3350, and started it up. At that point the newly minted operator heard a very loud screeching sound and the disk failed to come on line.

As a more experienced operator would have known, the drive had suffered a head crash, or what IBM called Head-Disk Interference (HDI). This meant that both the heads and the disk itself were damaged.

The new operator then placed the same disk pack on a different drive unit with exactly the same result. He knew that was not good, but he had been told where the backup payroll disk pack was located, so he proceeded to load that onto the first, already damaged drive unit. When he tried to load it, that combination also resulted in the same bone-chilling screech. He now figured that he should call the lead operator who immediately rushed on-site and, after hearing what had happened, fired the poor newbie on the spot.

It only took the IBM field engineer a few hours to rebuild the two damaged drive units, but it took the company weeks to recover all of the lost data by hand. Sometimes a single backup is not enough, and complete operator training is of paramount importance.

The accidental spammer

An anonymous story

It's a pretty common story that new sys admins have to tell: They set up an email server and don't restrict access as a relay, and months later they discover they've been sending millions of spam email across the world. That's not what happened to me.

I set up a Postfix and Dovecot email server, it was running fine, it had all the right permissions and all the right restrictions. It worked brilliantly for years. Then one morning, I was given a file of a few hundred email addresses. I was told it was an art organization list, and there was an urgent announcement that must be made to the list as soon as possible. So, I got right on it. I set up an email list, I wrote a quick sed command to pull out the addresses from the file, and I imported all the addresses. Then, I activated everything.

Within ten minutes, my server nearly falls over. It turns out I had been asked to set up a mailing list for people we'd never met, never contacted before, and who had no idea they were being added to a mailing list. I had unknowingly set up a way for for us to spam hundreds of people at arts organizations and universities. Our address got blacklisted by a few places, and it took a week for the angry emails to stop. Lesson: Ask for more information, especially if someone is asking you to import hundreds of addresses.

The rogue server

from Don Watkins

I am a liberal arts person who wound up being a technology director. With the exception of 15 credit hours earned on my way to a Cisco Certified Network Associate credential, all of the rest of my learning came on the job. I believe that learning what not to do from real experiences is often the best teacher. However, those experiences can frequently come at the expense of emotional pain. Prior to my Cisco experience, I had very little experience with TCP/IP networking and the kinds of havoc I could create albeit innocently due to my lack of understanding of the nuances of routing and DHCP.

At the time our school network was an active directory domain with DHCP and DNS provided by a Windows 2000 server. All of our staff access to the email, Internet, and network shares were served this way. I had been researching the use of the K12 Linux Terminal Server (K12LTSP) project and had built a Fedora Core box with a single network card in it. I wanted to see how well my new project worked so without talking to my network support specialists I connected it to our main LAN segment. In a very short period of time our help desk phones were ringing with principals, teachers, and other staff who could no longer access their email, printers, shared directories, and more. I had no idea that the Windows clients would see another DHCP server on our network which was my test computer and pick up an IP address and DNS information from it.

I had unwittingly created a "rogue" DHCP server and was oblivious to the havoc that it would create. I shared with the support specialist what had happened and I can still see him making a bee-line for that rogue computer, disconnecting it from the network. All of our client computers had to be rebooted along with many of our switches which resulted in a lot of confusion and lost time due to my ignorance. That's when I learned that it is best to test new products on their own subnet.

Licensing woes

Another anonymous story

Working at a small non-profit organisation, the CEO of the company would only pay for software by the company he owned stock in; everything else, he had the IT department use illegally (purchase one copy, distribute many). He did this by making it a requirement of the job that certain software was on every computer, but he never authorised the purchase of a a site license or more licenses than what we had to begin with.

I was new to IT and had a grand scheme of how I'd convince people to use free and open source versions of the software, but when the company's CEO and culture explicitly permits illegal use of software, open source can be a tough sell (aside from when it fills in the gaps that the closed source software can't do anyway, but then it's not replacing anything, so the problem remains).

I left the job after it became clear that management truly understood what they were doing and why it was wrong, and had no intention of ever rectifying it. I did this partly because I didn't approve of the ethics (if you're going to use software that requires a license, then pay the licensing fee; that's part of the deal), and partly because I was pretty sure that if the lawyers came knocking, the organization was not going to indemnify the IT department (more likely, they'd throw us under the bus).

Sure enough, about a year after I'd left, they got hit with a lawsuit from one of the companies they were using illegally. I moved on to a company that uses about 90% open source software (some of it paid, some of it $0).

Cover the hole!

from Don Watkins

It was early 2004 and I had recently attended Red Hat system administration training. I was looking for ways to apply my new found knowledge when the Western New York Regional Information Center began looking for a pilot school to try Lotus Notes on a Linux server. I volunteered our school district for the pilot.

Working with a Linux experienced microcomputer support specialist supplied by the regional information center, we used a spare rackmount server and installed Red Hat Enterprise Linux on it. As part of our installation, we configured the server to use an included DDS3 tape drive to backup the email server once a day. Each day I would simply insert a tape marked for each of the five days of the week for a two week cycle we used ten tapes. Everything worked well for a period of time until our tape drive ceased to work properly. Email is mission critical. What were we going to do with a non-functioning tape drive?

Necessity is frequently the mother of invention. I knew very little about BASH scripting but that was about to change rapidly. Working with the existing script and using online help forums, search engines, and some printed documentation, I setup Linux network attached storage computer running on Fedora Core. I learned how to create an SSH keypair and configure that along with rsync to move the backup file from the email server to the storage server. That worked well for a few days until I noticed that the storage servers disk space was rapidly disappearing. What was I going to do?

That’s when I learned more about Bash scripting. I modified my rsync command to delete backed up files older than ten days. In both cases I learned that a little knowledge can be a dangerous thing but in each case my experience and confidence as Linux user and system administrator grew and due to that I functioned as a resource for other. On the plus side, we soon realized that the disk to disk backup system was superior to tape when it came to restoring email files. In the long run it was a win but there was a lot of uncertainty and anxiety along the way.