Revisiting the Unix philosophy in 2018

The old strategy of building small, focused applications is new again in the modern microservices environment.

Opensource.com

In 1984, Rob Pike and Brian W. Kernighan published an article called "Program Design in the Unix Environment" in the AT&T Bell Laboratories Technical Journal, in which they argued for the Unix philosophy, using the example of BSD's cat -v implementation. In a nutshell, that philosophy is: Build small, focused programs (in whatever language) that do only one thing but do this thing well, communicate via stdin/stdout, and are connected through pipes.

Sound familiar?

Yeah, I thought so. That's pretty much the definition of microservices offered by James Lewis and Martin Fowler:

In short, the microservice architectural style is an approach to developing a single application as a suite of small services, each running in its own process and communicating with lightweight mechanisms, often an HTTP resource API.

While one *nix program or one microservice may be very limited or not even very interesting on its own, it's the combination of such independently working units that reveals their true benefit and, therefore, their power.

*nix vs. microservices

The following table compares programs (such as cat or lsof) in a *nix environment against programs in a microservices environment.

| | *nix | Microservices |
|---|---|---|
| Unit of execution | program using stdin/stdout | service with HTTP or gRPC API |
| Data flow | pipes | ? |
| Configuration & parameterization | command-line arguments, environment variables, config files | JSON/YAML docs |
| Discovery | package manager, man, make | DNS, environment variables, OpenAPI |

Let's explore each line in slightly greater detail.

Unit of execution

The unit of execution in *nix (such as Linux) is an executable file (binary or interpreted script) that, ideally, reads input from stdin and writes output to stdout. A microservices setup deals with a service that exposes one or more communication interfaces, such as HTTP or gRPC APIs. In both cases, you'll find stateless examples (essentially a purely functional behavior) and stateful examples, where, in addition to the input, some internal (persisted) state decides what happens.
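To make the stateless/stateful distinction concrete, here is a minimal Python sketch of two filters written in the stdin/stdout style. The names `upper_filter`, `dedupe_filter`, `run`, and `run_on_text` are hypothetical illustrations: the first filter is stateless (each output line depends only on the current input line), while the second keeps internal state (the set of lines it has already seen) that decides what it emits.

```python
import io
import sys
from typing import Callable, Iterable, Iterator, TextIO

Filter = Callable[[Iterable[str]], Iterator[str]]


def upper_filter(lines: Iterable[str]) -> Iterator[str]:
    """Stateless: each output line depends only on the current input line."""
    for line in lines:
        yield line.upper()


def dedupe_filter(lines: Iterable[str]) -> Iterator[str]:
    """Stateful: remembers every line seen so far and drops repeats."""
    seen: set = set()
    for line in lines:
        if line not in seen:
            seen.add(line)
            yield line


def run(filter_fn: Filter, stdin: TextIO, stdout: TextIO) -> None:
    """Read lines from stdin, apply the filter, write results to stdout."""
    for line in filter_fn(stdin):
        stdout.write(line)


def run_on_text(filter_fn: Filter, text: str) -> str:
    """Convenience wrapper for trying a filter without a real pipe."""
    out = io.StringIO()
    run(filter_fn, io.StringIO(text), out)
    return out.getvalue()


if __name__ == "__main__":
    # e.g. printf 'a\nb\na\n' | python filters.py   prints the lines a and b
    run(dedupe_filter, sys.stdin, sys.stdout)
```

Because both filters only consume and produce lines, either one could sit anywhere in a shell pipeline; the stateful one simply needs memory that grows with the number of distinct lines.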

Data flow

Traditionally, *nix programs communicate via pipes. Thanks to Doug McIlroy, who came up with the idea of pipes, you don't need to create temporary files to pass data around, and programs can process virtually endless streams of data between processes. To my knowledge, there is nothing comparable to a pipe standardized in microservices, besides my little Apache Kafka-based experiment from 2017.
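Python generators offer a rough in-process analogy to what a pipe buys you: each stage pulls one item at a time from the stage before it, so an unbounded producer like `yes` can feed a pipeline without its output ever being stored in full. This is only a sketch of the streaming idea; real pipes connect separate processes through a kernel buffer, which these hypothetical `yes`, `tr`, and `head` stand-ins do not.

```python
import itertools
from typing import Iterable, Iterator


def yes(word: str = "y") -> Iterator[str]:
    """Like `yes`: an endless stream of the same line."""
    while True:
        yield word + "\n"


def tr(lines: Iterable[str], old: str, new: str) -> Iterator[str]:
    """Like `tr old new`, applied one line at a time."""
    for line in lines:
        yield line.replace(old, new)


def head(lines: Iterable[str], n: int) -> Iterator[str]:
    """Like `head -n N`: stop pulling from upstream after n lines."""
    return itertools.islice(lines, n)


# Roughly `yes | tr y n | head -n 3`. It terminates because head stops
# pulling, so the infinite producer is only ever asked for three lines.
result = list(head(tr(yes(), "y", "n"), 3))
print("".join(result), end="")  # prints three lines, each containing "n"
```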

Configuration and parameterization

How do you configure a program or service, either permanently or on a per-call basis? Well, with *nix programs you essentially have three options: command-line arguments, environment variables, or full-blown config files. In microservices, you typically deal with YAML (or even worse, JSON) documents, defining the layout and configuration of a single microservice as well as dependencies and communication, storage, and runtime settings. Examples include Kubernetes resource definitions, Nomad job specifications, or Docker Compose files. These may or may not be parameterized; that is, either you have some templating language, such as Helm for Kubernetes, or you find yourself running an awful lot of sed -i commands.
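One common convention for combining the three *nix options is to layer them: command-line arguments override environment variables, which in turn override a config file, which overrides a built-in default. The sketch below assumes exactly that precedence, a JSON config file, a hypothetical `MYAPP_PORT` variable, and a hypothetical `--port` flag; none of these come from a particular tool.

```python
import argparse
import json
import os
import sys
from typing import Mapping, Optional, Sequence

DEFAULT_PORT = 8080  # hypothetical built-in default


def resolve_port(
    argv: Sequence[str],
    env: Mapping[str, str],
    config_text: Optional[str] = None,
) -> int:
    """Resolve the port: --port flag > MYAPP_PORT env var > config file > default."""
    port = DEFAULT_PORT

    # Lowest explicit layer: a JSON config file such as {"port": 9000}.
    if config_text:
        port = int(json.loads(config_text).get("port", port))

    # Middle layer: an environment variable.
    if "MYAPP_PORT" in env:
        port = int(env["MYAPP_PORT"])

    # Highest layer: a command-line argument.
    parser = argparse.ArgumentParser()
    parser.add_argument("--port", type=int)
    args = parser.parse_args(argv)
    if args.port is not None:
        port = args.port

    return port


if __name__ == "__main__":
    print(resolve_port(sys.argv[1:], os.environ))
```

Passing `argv` and `env` in explicitly, rather than reading globals, keeps the precedence rules easy to test in isolation.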

Discovery

How do you know what programs or services are available and how they are supposed to be used? Well, in *nix, you typically have a package manager as well as good old man; between them, they should be able to answer all the questions you might have. In a microservices setup, there's a bit more automation in finding a service. In addition to bespoke approaches like Airbnb's SmartStack or Netflix's Eureka, there are usually environment variable-based or DNS-based approaches that allow you to discover services dynamically. Equally important, OpenAPI provides a de facto standard for HTTP API documentation and design, and gRPC does the same for more tightly coupled, high-performance cases. Last but not least, take developer experience (DX) into account, starting with writing good Makefiles and ending with writing your docs with (or in?) style.
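As an illustration of the environment variable-based and DNS-based approaches, the following sketch follows the Kubernetes conventions: for a Service named `redis-primary`, Kubernetes injects `REDIS_PRIMARY_SERVICE_HOST` and `REDIS_PRIMARY_SERVICE_PORT` into Pods started after the Service exists, and cluster DNS resolves names of the form `<service>.<namespace>.svc.cluster.local` (assuming the default `cluster.local` domain). The function `discover` and its fallback port are assumptions for the example; it only builds the address and leaves the actual DNS lookup to whatever client opens the connection.

```python
from typing import Mapping, Tuple


def discover(
    service: str,
    env: Mapping[str, str],
    namespace: str = "default",
    default_port: int = 80,
) -> Tuple[str, int]:
    """Return (host, port) for a service, preferring injected env vars."""
    # Kubernetes upper-cases the Service name and turns dashes into underscores.
    prefix = service.upper().replace("-", "_")
    host = env.get(f"{prefix}_SERVICE_HOST")
    port = env.get(f"{prefix}_SERVICE_PORT")
    if host and port:
        return host, int(port)
    # DNS-based fallback, assuming the default cluster domain.
    return f"{service}.{namespace}.svc.cluster.local", default_port
```

Environment variables are cheap to read but frozen at Pod start, while DNS names keep working when a Service is created later; that trade-off is why the sketch tries one and falls back to the other.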

Pros and cons

Both *nix and microservices offer a number of challenges and opportunities.

Composability

It's hard to design something that has a clear, sharp focus and can also play well with others. It's even harder to get it right across different versions and to add proper error handling. In microservices, this could mean retry logic and timeouts; maybe it's a better option to outsource these features into a service mesh? It's hard, but if you get it right, the payoff in reusability can be enormous.
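Here is a minimal sketch of what "retry logic and timeouts" can mean when written by hand in application code, which is exactly the kind of boilerplate a service mesh can take off your hands. The helper `call_with_retries` is hypothetical: it retries a callable with exponential backoff and assumes the callable enforces its own per-attempt timeout, raising `TimeoutError` or `ConnectionError` on failure. The `sleep` parameter exists only so the backoff can be observed without actually waiting.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")

# Errors we treat as transient; anything else propagates immediately.
RETRYABLE = (TimeoutError, ConnectionError)


def call_with_retries(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying transient errors with exponential backoff.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... between attempts
    and re-raises the last error once all attempts are used up.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except RETRYABLE:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")


# Usage: a hypothetical dependency that times out twice, then answers.
calls = {"n": 0}
delays: List[float] = []


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("upstream did not answer in time")
    return "ok"


print(call_with_retries(flaky, sleep=delays.append))  # prints "ok"
print(delays)  # prints [0.1, 0.2]
```

Every service that talks to `flaky` would need the same wrapper, with consistent limits, in every language it is written in; that duplication is the argument for moving the policy into the infrastructure instead.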

Observability

In a monolith (in 2018) or a big program that tries to do it all (in 1984), it's rather straightforward to find the culprit when things go south. But in a pipeline such as

yes | tr \\n x | head -c 450m | grep n

or a request path in a microservices setup that involves, say, 20 services, how do you even start to figure out which one is behaving badly? Luckily we have standards, notably OpenCensus and OpenTracing. Observability still might be the biggest single blocker if you are looking to move to microservices.
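The shared idea behind tracing standards such as OpenTracing and OpenCensus is that every request carries a trace ID, and every service records a span (name, parent, duration) under that ID, so the request path can be reconstructed afterwards to see where the time went. The sketch below hand-rolls that idea inside a single process; `Tracer`, `Span`, `culprit`, and the service names are hypothetical, and this is not the API of either project.

```python
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Span:
    trace_id: str          # shared by every span of one request
    name: str              # which service (or operation) did the work
    parent: Optional[str]  # name of the calling span, None for the entry point
    duration: float = 0.0  # wall-clock seconds


@dataclass
class Tracer:
    spans: List[Span] = field(default_factory=list)
    stack: List[str] = field(default_factory=list)

    @contextmanager
    def span(self, trace_id: str, name: str) -> Iterator[Span]:
        """Time the enclosed block and record it under the current parent span."""
        current = Span(trace_id, name, self.stack[-1] if self.stack else None)
        self.stack.append(name)
        start = time.perf_counter()
        try:
            yield current
        finally:
            current.duration = time.perf_counter() - start
            self.stack.pop()
            self.spans.append(current)

    def self_time(self, s: Span) -> float:
        """Span duration minus the time spent in its direct children."""
        children = [c for c in self.spans
                    if c.trace_id == s.trace_id and c.parent == s.name]
        return s.duration - sum(c.duration for c in children)

    def culprit(self, trace_id: str) -> Span:
        """The span in this trace that spent the most time on its own work."""
        return max((s for s in self.spans if s.trace_id == trace_id),
                   key=self.self_time)


# Usage: one request passing through three hypothetical services.
tracer = Tracer()
trace_id = uuid.uuid4().hex  # real systems typically propagate this in request headers

with tracer.span(trace_id, "frontend"):
    with tracer.span(trace_id, "auth"):
        time.sleep(0.005)
    with tracer.span(trace_id, "inventory"):
        time.sleep(0.05)  # the service that is behaving badly

print(tracer.culprit(trace_id).name)  # prints "inventory"
```

Subtracting child time matters: the `frontend` span has the longest total duration, but almost all of it is spent waiting on `inventory`, which is the service you actually want to look at.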

Global state

While it may not be such a big issue for *nix programs, global state in microservices remains a topic of ongoing debate: namely, how to make sure local (persistent) state is managed effectively, and how to make global state consistent with as little effort as possible.

Wrapping up

In the end, the question remains: Are you using the right tool for a given task? That is, in the same way that a larger *nix program implementing a range of functions might be the better choice for certain use cases or phases, it might be that a monolith is the best option for your organization or workload. Regardless, I hope this article helps you see the many strong parallels between the Unix philosophy and microservices; maybe we can learn something from the former to benefit the latter.

Michael is a Developer Advocate for Kubernetes and OpenShift at Red Hat where he helps appops to build and operate apps. His background is in large-scale data processing and container orchestration and he's experienced in advocacy and standardization at W3C and IETF. Before Red Hat, Michael worked at Mesosphere, MapR and in two research institutions in Ireland and Austria.

5 Comments

I don't think the real point of the Unix philosophy is processes and pipes. If you look at any of the examples they use, they aren't building tools and then connecting them; they take existing, powerful tools and compose them. The real point is to avoid doing too much work by taking advantage of existing work. Most microservice pedagogy doesn't work this way: you build all your services yourself and then compose them. Unix tools use pipes to make the tools easy to compose. If everything operates on text, then it's very likely that two tools will be able to work together. If you are building the tools yourself, you have complete control over how they interface: you can use JSON or binary IPC, or your language's calling convention. The major similarity might be that both methods split units of work into different processes, but I think the reasons for doing so might differ. Unix does it this way to save on memory. If the second process in a pipe can't operate until the first one is done, then you have to save all of the first one's output in memory. By operating in parallel, you only have to store a single line at a time. Microservices, on the other hand, are interested in using parallelism to split up work across CPU cores and across machines.

That all said, you can use microservices to this end. I recently investigated using a tool called stunnel on my server. The premise is simple: stunnel accepts TLS requests on multicast, then forwards the raw TCP to something running on localhost. Rather than having to deal with TLS in every application you write, you can let the stunnel devs deal with it for you. This is composition à la Unix, with the interface being TCP sockets (which were modeled on pipes, after all).

Thanks for your comment, syrrim. Believe it or not, Doug McIlroy himself commented on this via email. Below is the verbatim message I received from him today:

> One commenter on your "Revisiting Unix philosphy" note missed the mark on why pipes.
> I wrote the following in reply, but balked when I was asked to register. I don't hide, least
> of all behind the walls of social media. You are welcome to pass it on.
>
> "Unix does it this way to save memory." I've never heard that one before.
> Piping certainly avoids the aggravation of allocating memory, cluttering programs with its name,
> and freeing it. Piping may also save memory bandwidth and latency. But saving memory per se
> has rarely, if ever, been the point.
>
> Nevertheless, your observation, "By operating in parallel, you only have to store a single line at a time,"
> is crucial. It allows one to interact with a pipeline in real time. And it allows a pipeline to provide nonstop
> service. Neither is possible in the no-IPC intermediate-file model.

In reply to syrrim

I guess that's why Linux is being taken over by a single process that does just about everything. So much for the Unix philosophy.

I'm not sure what you meant by "Linux is being taken over by a single process that does just about everything". Would you like to elaborate?

In reply to Jay Sanders

This work is licensed under a Creative Commons Attribution-Share Alike 4.0 International License.