What does it mean to be a sysadmin hero?

Two stories from the community on sysadmin heroics: What does it mean to you?
Image: Open Force superhero characters, by Jason van Gumster via GitLab, CC BY-SA 4.0 International

Sysadmins spend a lot of time preventing and fixing problems. There are certainly times when a sysadmin becomes a hero, whether to their team, department, company, or the general public, though the people they "saved" from trouble may never even know.

Enjoy these two stories from the community on sysadmin heroics. What does it mean to you?

I worked as a system administrator for a contract with the Army National Guard in the early 2000s. I was involved in a project to pilot a new content management system. This system would enable distribution of online educational materials to classrooms across the country. The state of Montana was chosen for the initial pilot and test phase. I traveled to Montana and deployed several servers running Red Hat Linux plus the content management software in their data centers.

A few days later, I received a call at my regular office from a worried guardsman with urgent news: one of the servers would not boot. Given the circumstances, there was no easy way to troubleshoot it remotely.

The testing was scheduled to commence the following week, which left little time to get the server back online. A delay in the project would be costly. I needed to solve the problem fast. Fortunately, we had several identical servers in our local data center. I used one of them to reinstall the operating system and applications, and then configured it identically to the problem server back in Montana. I then pulled the hard drive, packed it safely, and overnighted it to the Montana National Guard Armory.

The guardsman called me the next morning to say he had the replacement drive. I walked him through removing the old boot drive and installing the new one. When he was done, he pressed the power button. We waited through several silent seconds before he told me he could see lights and hear drive activity. I began pinging the server, and after a few successful replies, I was able to SSH into it. This was a great sign!

Everyone was relieved that the server was online again so that testing could get underway.

If you want to know more about this program, there is an article here. (It takes a really long time to load for some reason, so I saved it as a PDF just in case.)

Alan Formy-Duval

Humans love good stories. In IT, stories about heroic feats of coding and cabling go back to the first computers and the bugs that lived inside them. They're all loved, and they're loved even more when the audience wasn't caught in the fallout that created the story.

Sysadmins tend to be left holding the bag when events turn sour. That fact affords us the often unwanted honor of being cast as the protagonist in these stories. Antagonists can be anything from bad weather or dug-up cables to mistyped commands or simple human error. Because we operate in an industry built around generated conflict and drama, the legendary epics in our field usually involve sysadmins battling the thoughtless developer. It's an old trope, but a good one that gets lots of laughs and amazed stares when the stories are told.

I've always loved sharing these stories with my peers and friends. The camaraderie and the laughs are important to me. These stories are icebreakers and scene closers, whether on stage, in a conference room, or over a beer with friends after a hard day. But this year, I've begun to think about our storytelling tradition a little differently. The heroes we should be talking about around the water cooler aren't the sysadmins who fix the problem with a flourish at 3am on Sunday. The true heroes in the industry are the sysadmins who prevent the problem from ever happening at 3pm on a Tuesday.

When I talk to my customers about building effective solutions, I focus the conversation on two core principles. First, I implore them not to go down the rabbit hole chasing shiny objects, and instead to base their solution on proven, supportable technology. Yes, shiny new tech can provide value in some use cases. But more often it just adds complexity that drives down stability and maintainability, and that loss of stability and maintainability ultimately slows adoption by end users.

Platforms that don’t grow are platforms that don’t last. I don’t want to work on a platform that won’t be around for its first upgrade. Violating this principle creates systems that require Herculean efforts to keep alive. No one benefits from that situation, even if you do get a good story out of it.

The second principle I drive home every chance I get is to focus on fundamental knowledge and understand how the technology we're implementing actually works. For a long time, our industry has marketed its products as fast to deploy and easy to manage, but that's almost always a thin veneer. Every IT system designed by humans will break at some point in its lifecycle. If you don't understand what's happening when that system goes sideways, you don't have a chance of recovering without writing a new saga to talk about at lunch for weeks to come.

It took me much longer than I'm comfortable admitting to figure out that the hero stories we all enjoy are the result of not sticking to the fundamental principles I value most in any solution I have a hand in creating. So, when Sysadmin Day rolls around this year, I won't be lifting my glass to the heroes who built a bad system and kept it alive through extreme circumstances and too much caffeine. I'll tip my hat and share a drink with the boring people in the middle of our industry who specialize in preventing the hero moments. A boring weekend with the on-call phone is the most heroic thing I'm ever going to ask of my sysadmin brethren from now on.

—Jamie Duncan

Opensource.com publishes stories about creating, adopting, and sharing open source solutions. Follow us on Twitter @opensourceway.


Second story, Jamie Duncan: right on! I made a similar comment on another thread here on Opensource.com. As the head of an IT department, I value things running quietly and predictably far more. I know I have good people who will put in the heroic effort when needed, and all of us certainly have those "all-nighter" stories to remember.

One thing that pains many sysadmins is the lack of recognition they get for well-running systems. People get used to good environments and don't understand what it takes to keep them running. I find that more junior sysadmins get disheartened over this. That makes it doubly important for the manager or leader of the department to interact with, groom, train, mentor, and guide the team at every opportunity. I truly appreciate the people working for me, and keeping morale up is important.

Not that I'm complaining, but I find that very few people go out of their way to thank sysadmins for their constant attention to the important details. For anyone reading this: it goes a long way to offer simple praise or thanks for an effort well done, or simply to express gratitude for not being subjected to frequent system outages.

It also boosts sysadmin morale when you actually read their email notices and respond. Bonus points if you are helpful! How many of us sysadmins send multiple important notices about issues or maintenance, only to be barraged with panicked escalations when our carefully communicated, pre-planned work begins?

To me, a sysadmin hero is someone who is technically competent, keeps system outages to a minimum, is genuinely helpful to end users, and has the work ethic to do the right thing when the need arises.

Every time Arch gets updated, I feel like a hero sysadmin!

This work is licensed under a Creative Commons Attribution-Share Alike 4.0 International License.