Ops: It's everyone's job now

The last decade was all about teaching sysadmins to write code. The next challenge will be teaching operations to software developers.


Today is Sysadmin Appreciation Day. Turn to your nearest and dearest systems administrator and be sure to thank them for the work they do.

"Ops is over."
"Sysadmins? That's so old school."
"All the good engineering teams are automating operations out of existence."

Do you hear this a lot? I do. People love to say that ops is dead. And sure, you can define "ops" to mean all kinds of unpleasant things, many of which should die. But that would be neither accurate nor particularly helpful.


Here's my definition of operations: "Operations is the constellation of your org's technical skills, practices, and cultural values around designing, building, scaling, and maintaining systems." Ops is the process of delivering value to users. Ops is where beautiful theory meets stubborn reality.

In other words, ops is how you get stuff done. It's not optional. You ship software, you do ops. If business is the "why" and dev is the "what," ops is the "how." We are all interwoven and we all participate in each other's mandates.


Twenty years ago ops engineers were called "sysadmins," and we spent our time tenderly caring for a few precious servers. And then DevOps came along. DevOps means lots of things to lots of people, but one thing it unquestionably meant to lots and lots of people was this: "Dear Ops: learn to write code."


It was a hard transition for many, but it was an unequivocally good thing. We needed those skills! Complexity was skyrocketing. We could no longer do our jobs without automation, so we needed to learn to write code. It was non-optional. 


It's been 10-15 years since the dawn of the automation age, and we're already well into the early years of its replacement: the era of distributed systems.

Consider the prevailing trends in infrastructure: containers, schedulers, orchestrators. Microservices. Distributed data stores, polyglot persistence. Infrastructure is becoming ever more ephemeral and composable, loosely coupled over lossy networks.  Components are shrinking in size while multiplying in count, by orders of magnitude in both directions.  


And then on the client side: take mobile, for heaven's sake.  The combinatorial explosion of (device types * firmwares * operating systems * apps) is a quantum leap in complexity on its own. Mix that in with distributed cache strategy, eventual consistency, datastores that split their brain between client and server, IoT, and the outsourcing of critical components to third-party vendors (which are effectively black boxes), and you start to see why we are all distributed systems engineers in the near and present future.
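To make the multiplication in that parenthesis concrete, here is a minimal sketch in Python. The counts are made up purely for illustration (they are not measurements of any real platform), and the `configurations` helper is a hypothetical name.

```python
from math import prod

# Hypothetical counts, chosen only to illustrate the multiplication.
dimensions = {
    "device_types": 50,
    "firmwares": 10,
    "operating_systems": 5,
    "app_versions": 20,
}


def configurations(dims):
    """Number of distinct client configurations: the product of all dimensions."""
    return prod(dims.values())


total = configurations(dimensions)
print(total)  # 50 * 10 * 5 * 20 = 50000
```

Even with modest numbers in each dimension, the product quickly outgrows anything a team could enumerate or test by hand.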

All this change demands another fundamental shift in thought and approach. You aren't just writing code: you're building systems. Distributed systems require dramatically more focus on operability and resiliency. Compared to the old monoliths that we could manage using monitoring and automation, the new systems require new assumptions:

  • Distributed systems are never "up"; they exist in a constant state of partially degraded service. Accept failure, design for resiliency, protect and shrink the critical path.
  • You can't hold the entire system in your head or reason about it; you will live or die by the thoroughness of your instrumentation and observability tooling.
  • You need robust service registration and discovery, load balancing, and backpressure between every combination of components.
  • You need to learn to integrate third-party services; many core functions will be outsourced to teams or companies that you have no direct visibility into or influence upon.
  • You have to test in production, and you have to do so safely; you cannot spin up a staging copy of a large distributed system.
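As one illustration of "accept failure, design for resiliency," here is a minimal sketch of a circuit breaker wrapped around a call to a dependency you don't control. It is a teaching example under stated assumptions, not a production library: the `CircuitBreaker` and `CircuitOpenError` names, the thresholds, and the injectable `clock` parameter are all hypothetical choices made for this sketch.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is skipped."""


class CircuitBreaker:
    """Hypothetical minimal circuit breaker (illustrative only)."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # cool-down in seconds
        self.clock = clock                # injectable so tests can control time
        self.failures = 0
        self.opened_at = None             # None means the breaker is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Fail fast instead of piling load onto a sick dependency.
                raise CircuitOpenError("dependency marked unhealthy")
            # Cool-down elapsed: allow one trial call ("half-open").
            # A single further failure will reopen the breaker.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # any success resets the failure count
        return result
```

A caller would wrap each outbound request, for example `breaker.call(fetch_profile, user_id)` where `fetch_profile` is whatever function talks to the third-party service, and treat `CircuitOpenError` as a signal to serve a degraded response rather than wait on a dependency that is already struggling.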

What do all of these have in common? They're all hallmarks of great operations engineering. And they're no longer optional either. In other words: "Dear software engineers: time to learn ops."


Ops: it's everyone's job now

If the first wave of DevOps transformation focused on leveling up ops teams at writing code, the second wave flips the script. You simply can't develop quality software for distributed systems without constant attention to its operability, maintainability, and debuggability. You can't build modern software without a grounding in ops.

This transformation is well underway, and the evidence is everywhere—venture dollars pouring into "ops for devs" tooling, the maturing consensus that devs must share the on-call rotation, software engineers popping up at traditionally ops-minded conferences, etc. Ops for devs is officially here.

This is a good thing! It was good for ops to learn to write code, and it is good for devs to learn to own their own services. All of these changes lead to better software, tighter feedback loops, and more robust practices in the face of still-exploding complexity.

So no, ops isn't going anywhere. It just doesn't look like it used to. Soon it might even look like a software engineer.

Engineer and cofounder/CEO of Honeycomb, a nextgen tool for helping software engineers understand their containers/schedulers/microservicified distributed systems and polyglot persistence layers. Likes: databases, operations under pressure, expensive whiskey. Hates: databases, flappy pages, cheap whiskey. Probably swears more than you.


I like your article, especially the list of assumptions about new systems. However, I am a developer new to the ops field, so I am looking for good, professional, real-world references and documentation.
Would you have any recommendations?
Thank you.


This work is licensed under a Creative Commons Attribution-Share Alike 4.0 International License.