I think this post sounds logical but reaches an incorrect conclusion. It’s true that the first things people think of when discussing the “need” for root access are port numbers under 1024 and modifying kernel settings. But the post uncritically accepts the notion that “users should have the minimum access rights needed to do their job.” This idea, or best practice, is widely accepted today, but its implications are rarely discussed. The spread of this thinking can and does significantly increase the risk of production failures and security incidents.
Here’s why: when a production failure is thoroughly investigated, it’s common to find many contributory root causes - often four, five or more. My experience is that human ignorance will often be part of at least two or three of those factors. When “system administrator” became an established career path it became less common for programmers to have root privileges - all for very logical reasons and with good intentions. The unintended consequence of this is that developers, as a group, have less understanding of how their software actually behaves in production. Linux has evolved and become a much more observable operating system, with sophisticated tracing tools - many of which require root. At the same time, today’s multisocket, multicore hardware is more powerful, and more complex. Yet even as hardware becomes more powerful, performance and security seem to remain stationary. For application software to make full use of the hardware, developers need to understand how their code interacts with the machine - something that the popularity of VMs, containers, and Java (three huge positive developments) all obscure. Unfortunately, following security best practices can lead us to less secure environments where no one person understands the entire system.
Just to join the dots further... Here’s what I mean by a developer understanding how a system uses resources:
“Application X has over 100 threads. Three are hot threads that spin on a core - the market data event handler and the two worker threads. Most of the remaining threads are thread-per-connection threads that are cold most of the time. Then there are four warm threads - the logger, the persister... We want the market data event handler to always run on socket one because the market data NIC is on the second PCI-X slot and that thread can stay NUMA local. This app is one where latency is more important than throughput so we don’t use the default NIC interrupt coalescing settings, ...”
The best way to validate that these preconditions are true is to use ethtool and perf-test (both require root). So the question shouldn’t be “do we need root access?” but rather “does it make sense to have root access?” I have seen far more damage caused by developers and SAs not understanding how their systems behave than by the occasional human errors made at a root shell. I think that root access should be audited - one of the best learning experiences about a host can be simply to execute:
history | more
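
To make the Application X example a bit more concrete, here is a minimal sketch of how one of those preconditions (the market data thread staying NUMA local to its NIC) might be checked. It is only an illustration, not a replacement for ethtool or perf-test: it reads the standard Linux /proc and /sys files, the function names are my own, and the PID and interface name are hypothetical inputs you would supply. Depending on how the host is configured, reading another user's process may itself need elevated privileges.

```python
import sys
from pathlib import Path


def parse_cpulist(text):
    """Turn a kernel CPU list such as '0-3,8,10-11' into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        lo, _, hi = part.partition("-")
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus


def thread_affinities(pid):
    """Yield (tid, thread name, allowed CPUs) for each thread of a process."""
    tasks = sorted(Path(f"/proc/{pid}/task").iterdir(), key=lambda p: int(p.name))
    for task in tasks:
        try:
            name = (task / "comm").read_text().strip()
            status = (task / "status").read_text()
        except OSError:
            continue  # the thread exited while we were looking
        for line in status.splitlines():
            if line.startswith("Cpus_allowed_list:"):
                yield int(task.name), name, parse_cpulist(line.split(":", 1)[1])
                break


def nic_local_cpus(iface):
    """Return the CPUs on the NIC's NUMA node, or None if it cannot be determined."""
    try:
        node = int(Path(f"/sys/class/net/{iface}/device/numa_node").read_text())
        if node < 0:  # the kernel reports -1 when there is no NUMA information
            return None
        cpulist = Path(f"/sys/devices/system/node/node{node}/cpulist").read_text()
    except (OSError, ValueError):
        return None
    return parse_cpulist(cpulist)


def non_local_threads(threads, local_cpus):
    """List (tid, name) for threads allowed to run on CPUs outside local_cpus."""
    return [(tid, name) for tid, name, allowed in threads if not allowed <= local_cpus]


if __name__ == "__main__":
    # Hypothetical usage: python3 check_affinity.py <pid> <interface>
    pid, iface = int(sys.argv[1]), sys.argv[2]
    local = nic_local_cpus(iface)
    if local is None:
        sys.exit(f"could not determine the NUMA node for {iface}")
    for tid, name in non_local_threads(thread_affinities(pid), local):
        print(f"thread {tid} ({name}) may run off the NIC's NUMA node")
```

Most threads in an app like this will show up in the output, and that is fine. The ones to care about are the hot threads: if the market data event handler is listed, the pinning you think you have is not the pinning you actually have.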