A debugger is one of those pieces of software that most, if not every, developer uses at least once during their software engineering career, but how many of you know how they actually work? During my talk at linux.conf.au 2018 in Sydney, I will be talking about writing a debugger from scratch... in Rust!
In this article, the terms debugger and tracer are used interchangeably. "Tracee" refers to the process being traced by the tracer.
The ptrace system call
Most debuggers heavily rely on a system call known as
ptrace(2), which has the prototype:
long ptrace(enum __ptrace_request request, pid_t pid, void *addr, void *data);
This is a system call that can manipulate almost all aspects of a process; however, before the debugger can attach to a process, the "tracee" has to call ptrace with the request PTRACE_TRACEME. This tells Linux that it is legitimate for the parent to attach via ptrace to this process. But how do we coerce a process into calling ptrace? The fork/execve pair provides an easy way: the child calls ptrace after fork but before it uses execve to start the actual program. Conveniently, fork also returns the pid of the tracee to the parent, which is required for using ptrace later.
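The fork/PTRACE_TRACEME/execve sequence described above can be sketched in C as follows. This is a minimal illustration, not a complete debugger: the helper name launch_tracee is hypothetical, and it uses execvp (a PATH-searching wrapper around execve) for convenience.

```c
#include <assert.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: start argv[0] as a tracee. Returns the tracee's pid
 * once it has stopped at the SIGTRAP caused by execve, or -1 on failure. */
pid_t launch_tracee(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: volunteer to be traced by the parent, then become the
         * target program. */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) < 0)
            _exit(126);
        execvp(argv[0], argv);
        _exit(127); /* only reached if execvp failed */
    }

    /* Parent: wait for the stop that the successful execve triggers. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (!WIFSTOPPED(status) || WSTOPSIG(status) != SIGTRAP)
        return -1; /* the child exited or stopped for another reason */
    return pid;
}
```

The pid returned here is the one every later ptrace request takes as its second argument.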
Now that the tracee can be traced by the debugger, important changes take place:
- Every time a signal is delivered to the tracee, it stops and a wait-event is delivered to the tracer that can be captured by the wait family of system calls.
- Each execve system call causes a SIGTRAP to be delivered to the tracee. (Combined with the previous item, this means the tracee is stopped before an execve can fully take place.)
This means that, once we issue the
PTRACE_TRACEME request and call the
execve system call to actually start the program in the tracee, the tracee will immediately stop, since
execve delivers a
SIGTRAP, and that is caught by a wait-event in the tracer. How do we continue? As one would expect,
ptrace has a number of requests that can be used for telling the tracee it's fine to continue:
- PTRACE_CONT: This is the simplest. The tracee runs until it receives a signal, at which point a wait-event is delivered to the tracer. This is most commonly used to implement the "continue-until-breakpoint" and "continue-forever" options of a real-world debugger. Breakpoints will be covered below.
- PTRACE_SYSCALL: Very similar to PTRACE_CONT, but stops before a system call is entered and also before a system call returns to userspace. It can be used in combination with other requests (which we will cover later in this article) to monitor and modify a system call's arguments or return value. strace, the system call tracer, uses this request heavily to figure out what system calls are made by a process.
- PTRACE_SINGLESTEP: This one is pretty self-explanatory if you have used a debugger before: it executes the next instruction, then stops immediately afterward.
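As a rough illustration of the resume requests above, the following C sketch drives an already-stopped tracee with PTRACE_SYSCALL until it exits and counts the system call stops along the way. The function name count_syscall_stops is hypothetical. The sketch also sets the PTRACE_O_TRACESYSGOOD option, which the text above does not cover: it makes system call stops report SIGTRAP | 0x80 so they can be told apart from ordinary signals.

```c
#include <assert.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: resume a stopped tracee with PTRACE_SYSCALL until it
 * terminates. Returns the number of system call stops seen (entry and exit
 * stops are counted separately), or -1 on error. */
long count_syscall_stops(pid_t pid)
{
    assert(pid > 0);

    long stops = 0;
    int status = 0;
    int sig = 0; /* signal to hand to the tracee on the next resume */

    /* Assumption: mark system call stops as SIGTRAP | 0x80. */
    if (ptrace(PTRACE_SETOPTIONS, pid, NULL,
               (void *)(long)PTRACE_O_TRACESYSGOOD) < 0)
        return -1;

    for (;;) {
        if (ptrace(PTRACE_SYSCALL, pid, NULL, (void *)(long)sig) < 0)
            return -1;
        if (waitpid(pid, &status, 0) < 0)
            return -1;
        if (WIFEXITED(status) || WIFSIGNALED(status))
            return stops; /* the tracee is gone */

        sig = 0;
        if (WIFSTOPPED(status)) {
            if (WSTOPSIG(status) == (SIGTRAP | 0x80))
                stops++;                /* system call entry or exit */
            else
                sig = WSTOPSIG(status); /* ordinary signal: pass it on */
        }
    }
}
```

Swapping PTRACE_SYSCALL for PTRACE_SINGLESTEP in the loop would stop after every instruction instead; in that case each stop reports a plain SIGTRAP.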
We can stop the process with a variety of requests, but how do we get the state of the tracee? The state of a process is mostly captured by its registers, so of course
ptrace has a request to get (or modify!) the registers:
- PTRACE_GETREGS: This request returns the registers' state as it was when the tracee was stopped.
- PTRACE_SETREGS: If the tracer has the values of registers from a previous call to PTRACE_GETREGS, it can modify the values in that structure and set the registers to the new values via this request.
- PTRACE_PEEKUSER and PTRACE_POKEUSER: These allow reading from and writing to the tracee's USER area, which holds the registers and other useful information. This can be used to modify a single register without the more heavyweight PTRACE_GETREGS and PTRACE_SETREGS.
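The register requests above might be used as in the following C sketch, which reads and rewrites the instruction pointer of a stopped tracee. It assumes x86-64 Linux with glibc, where struct user_regs_struct has a rip field; the two function names are hypothetical, and on other architectures the sketch simply reports ENOSYS.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fetch the stopped tracee's instruction pointer. */
int read_instruction_pointer(pid_t pid, unsigned long long *ip)
{
    assert(ip != NULL);
#if defined(__x86_64__)
    struct user_regs_struct regs;
    if (ptrace(PTRACE_GETREGS, pid, NULL, &regs) < 0)
        return -1;
    *ip = regs.rip;
    return 0;
#else
    (void)pid;
    errno = ENOSYS; /* this sketch only covers x86-64 */
    return -1;
#endif
}

/* Hypothetical helper: read all registers, change one, write them back. */
int write_instruction_pointer(pid_t pid, unsigned long long ip)
{
#if defined(__x86_64__)
    struct user_regs_struct regs;
    if (ptrace(PTRACE_GETREGS, pid, NULL, &regs) < 0)
        return -1;
    regs.rip = ip;
    return ptrace(PTRACE_SETREGS, pid, NULL, &regs) < 0 ? -1 : 0;
#else
    (void)pid;
    (void)ip;
    errno = ENOSYS;
    return -1;
#endif
}
```

The read-modify-write pattern in write_instruction_pointer is what makes PTRACE_SETREGS heavyweight for a single register: every register is copied in both directions.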
Modifying the registers isn't always sufficient in a debugger. A debugger will sometimes need to read some parts of the memory or even modify it. The GNU Project Debugger (GDB), for example, can print the value of a memory location or a variable. ptrace has the functionality to implement this:
- PTRACE_PEEKTEXT and PTRACE_POKETEXT: These allow reading and writing a word in the address space of the tracee. Of course, the tracee has to be stopped for this to work.
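A small C sketch of word-sized memory access in a stopped tracee follows. The names peek_word and poke_word are hypothetical. One detail worth showing: PTRACE_PEEKTEXT returns the word itself, so a result of -1 is ambiguous and errno has to be cleared beforehand and checked afterward.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: read one word from the tracee's memory at addr. */
int peek_word(pid_t pid, void *addr, long *out)
{
    assert(out != NULL);

    errno = 0;
    long word = ptrace(PTRACE_PEEKTEXT, pid, addr, NULL);
    if (word == -1 && errno != 0)
        return -1; /* a real error, not a word that happens to be -1 */
    *out = word;
    return 0;
}

/* Hypothetical helper: overwrite one word in the tracee's memory at addr. */
int poke_word(pid_t pid, void *addr, long word)
{
    return ptrace(PTRACE_POKETEXT, pid, addr, (void *)word) < 0 ? -1 : 0;
}
```

Reading a larger buffer this way means one ptrace call per word, which is slow but works for the small reads a simple debugger needs.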
Real-world debuggers also have features like breakpoints and watchpoints. In the next section, I'll dive into the architectural details of debugging support. For the purposes of clarity and conciseness, this article will consider x86 only.
ptrace is all cool, but how does it work? In the previous section, we've seen that
ptrace has quite a bit to do with signals:
SIGTRAP can be delivered during single-stepping, before
execve and before or after system calls. Signals can be generated in a number of ways, but we will look at two specific examples that can be used by debuggers to stop a program (effectively creating a breakpoint!) at a given location:
- Undefined instructions: When a process tries to execute an undefined instruction, an exception is raised by the CPU. This exception is handled via a CPU interrupt, and a handler corresponding to the interrupt in the kernel is called. This results in a SIGILL being sent to the process. This, in turn, causes the process to stop, and the tracer is notified via a wait-event. It can then decide what to do. On x86, the instruction ud2 is guaranteed to always be undefined.
- Debugging interrupt: The problem with the previous approach is that the ud2 instruction takes two bytes of machine code, so patching it over a one-byte instruction would also clobber the start of the instruction that follows. A special instruction exists that takes one byte and raises an interrupt. It's int $3 and the machine code is 0xCC. When this interrupt is raised, the kernel sends a SIGTRAP to the process and, just as before, the tracer is notified.
This is fine, but how do we coerce the tracee to execute these instructions? Easy: PTRACE_POKETEXT, which can overwrite a word at a memory location. A debugger would read the original word at the location using PTRACE_PEEKTEXT and replace the first byte of that word with 0xCC, remembering the original byte and the fact that it is a breakpoint in its internal state. The next time the tracee executes at the location, it is automatically stopped by virtue of a SIGTRAP. The debugger's end user can then decide how to continue (for instance, inspect the registers).
Okay, we've covered breakpoints, but what about watchpoints? How does a debugger stop a program when a certain memory location is read or written? Surely you wouldn't just overwrite every instruction with
int $3 that could read or write some memory location. Meet debug registers, a set of registers designed to fulfill this goal more efficiently:
- DR0 to DR3: Each of these registers contains an address (a memory location) where the debugger wants the tracee to stop for some reason. The reason is specified as a bitmask in DR7.
- DR4 and DR5: These are obsolete aliases for DR6 and DR7, respectively.
- DR6: Debug status. Contains information about which of DR0 to DR3 caused the debugging exception to be raised. This is used by Linux to figure out the information passed along with the SIGTRAP to the tracee.
- DR7: Debug control. Using the bits in this register, the debugger can control how the addresses specified in DR0 to DR3 are interpreted. A bitmask controls the size of the watchpoint (whether 1, 2, 4, or 8 bytes are monitored) and whether to raise an exception on execution, on writes, or on either reads or writes.
Because the debug registers form part of the
USER area of a process, the debugger can use
PTRACE_POKEUSER to write values into the debug registers. The debug registers are only relevant to a specific process and are thus restored to the value at preemption before the process regains control of the CPU.
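To make the DR7 bitmask concrete, here is a C sketch that computes a DR7 value for one watchpoint slot and, on x86 only, writes it with PTRACE_POKEUSER. The layout used is the one documented for x86: a local-enable bit at position 2n for slot n, and a four-bit field at position 16 + 4n holding the access type (low two bits) and the length (high two bits). The names watch_kind, dr7_with_watchpoint, and install_watchpoint are hypothetical, and the u_debugreg offset assumes glibc's struct user.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>

/* Access types as encoded in DR7 (the value 0x2 means I/O and is unused). */
enum watch_kind { WATCH_EXECUTE = 0x0, WATCH_WRITE = 0x1, WATCH_READWRITE = 0x3 };

/* Return dr7 with slot (0 to 3) enabled for the given access type and size. */
unsigned long dr7_with_watchpoint(unsigned long dr7, int slot,
                                  enum watch_kind kind, int bytes)
{
    assert(slot >= 0 && slot <= 3);
    assert(bytes == 1 || bytes == 2 || bytes == 4 || bytes == 8);

    /* Length encoding: 1 byte = 00, 2 bytes = 01, 8 bytes = 10, 4 bytes = 11.
     * Execution breakpoints are expected to use a length of 1. */
    unsigned long len = bytes == 1 ? 0x0 : bytes == 2 ? 0x1 : bytes == 8 ? 0x2 : 0x3;
    unsigned long field = (unsigned long)kind | (len << 2);
    int shift = 16 + 4 * slot;

    dr7 &= ~(0xFUL << shift); /* clear the slot's old type and length */
    dr7 |= field << shift;    /* install the new ones */
    dr7 |= 1UL << (2 * slot); /* local enable bit for this slot */
    return dr7;
}

#if defined(__x86_64__) || defined(__i386__)
/* Hypothetical helper: arm a hardware watchpoint in a stopped tracee. */
int install_watchpoint(pid_t pid, int slot, void *addr,
                       enum watch_kind kind, int bytes)
{
    size_t base = offsetof(struct user, u_debugreg);
    size_t width = sizeof(((struct user *)0)->u_debugreg[0]);

    errno = 0;
    long dr7 = ptrace(PTRACE_PEEKUSER, pid, (void *)(base + 7 * width), NULL);
    if (dr7 == -1 && errno != 0)
        return -1;
    /* The address goes into DRn, the control bits into DR7. */
    if (ptrace(PTRACE_POKEUSER, pid, (void *)(base + slot * width), addr) < 0)
        return -1;
    dr7 = (long)dr7_with_watchpoint((unsigned long)dr7, slot, kind, bytes);
    return ptrace(PTRACE_POKEUSER, pid, (void *)(base + 7 * width),
                  (void *)dr7) < 0 ? -1 : 0;
}
#endif
```

For instance, watching 4 bytes for writes in slot 0 gives a DR7 value of 0xD0001: the enable bit 0x1, plus the write type 01 and the 4-byte length 11 packed into bits 16 to 19.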
Tip of the iceberg
We've only glanced at the iceberg that a debugger is: we've covered ptrace, gone over some of its functionality, and then looked at how ptrace is implemented. Some parts of ptrace can be implemented in software, but other parts have to be implemented in hardware; otherwise they'd be very expensive or even impossible.
There's plenty that we didn't cover, of course. Questions like "how does a debugger know where a variable is in memory?" remain open due to space and time constraints, but I hope you've learned something from this article; if it piqued your interest, there are plenty of resources available online to learn more.