How the GDB debugger and other tools use call frame information to determine the active function calls

Get the active function call from your debugger.

In my previous article, I showed how debuginfo is used to map between the current instruction pointer (IP) and the function or line containing it. That information is valuable in showing what code the processor is currently executing. However, having more context for the calls that lead up to the current function and line being executed is also extremely helpful.

For example, suppose a function in a library has an illegal memory access due to a null pointer being passed as a parameter into the function. Just looking at the current function and line shows that the fault was triggered by attempted access through a null pointer. However, what you really want to know is the full context of the active function calls leading up to that null pointer access, so you can determine how that null pointer was initially passed into the library function. This context information is provided by a backtrace, and allows you to determine which functions could be responsible for the bogus parameter.

One thing’s certain: Determining the currently active function calls is a non-trivial operation.

Function activation records

Modern programming languages have local variables and allow for recursion where a function can call itself. Also, concurrent programs have multiple threads that may have the same function running at the same time. The local variables cannot be stored in global locations in these situations. The locations of the local variables must be unique for each invocation of the function. Here’s how it works:

  1. The compiler produces a function activation record each time a function is called to store local variables in a unique location.
  2. For efficiency, the processor stack is used to store the function activation records.
  3. A new function activation record is created at the top of the processor stack for the function when it’s called.
  4. If that function calls another function, then a new function activation record is placed above the existing function activation record.
  5. Each time there is a return from a function, its function activation record is removed from the stack.
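The effect of these steps can be observed from C itself. The following is a minimal sketch, not taken from the original example: the names record_locals, local_addr, and DEPTH are made up for illustration. Each active invocation of the recursive function records the address of its own local variable, and no two invocations share a location.

```c
#include <stdint.h>

#define DEPTH 4

/* Address of the local variable seen at each recursion depth. */
static uintptr_t local_addr[DEPTH];

/* Hypothetical helper: recurses DEPTH levels deep and records where
 * each invocation's local variable lives. */
static int record_locals(int depth)
{
    volatile int local = depth;          /* lives in this invocation's record */

    local_addr[depth] = (uintptr_t)&local;
    if (depth + 1 < DEPTH)
        record_locals(depth + 1);        /* a new activation record is created */
    return local;                        /* local is still live after the call */
}
```

Calling record_locals(0) and then comparing the entries of local_addr shows four different addresses, one per activation record.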

The function activation record is created by code in the function called the prologue. The removal of the function activation record is handled by the function epilogue. The body of the function can use the memory set aside for it on the stack for temporary values and local variables.

Function activation records can be variable size. For some functions, there’s no need for space to store local variables. Ideally, the function activation record only needs to store the return address of the function that called this function. For other functions, significant space may be required to store local data structures for the function in addition to the return address. This variation in frame sizes leads to compilers using frame pointers to track the start of the function’s activation frame. Now the function prologue code has the additional task of storing the old frame pointer before creating a new frame pointer for the current function, and the epilogue has to restore the old frame pointer value.

The way that the function activation record is laid out, the return address and old frame pointer of the calling function are constant offsets from the current frame pointer. With the old frame pointer, the next function’s activation frame on the stack can be located. This process is repeated until all the function activation records have been examined.
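The walk described above can be sketched in C over a simulated chain of frames. This is an illustration under assumptions, not code from a real unwinder: the frame_record structure and the walk_frames function are hypothetical names, and the layout (saved frame pointer first, return address in the next slot) mirrors the conventional x86_64 arrangement.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: the frame pointer points at the saved old frame
 * pointer, and the return address sits in the slot right above it. */
struct frame_record {
    const struct frame_record *caller_fp;  /* old frame pointer */
    uint64_t return_address;               /* where the caller resumes */
};

/* Follow the saved frame pointers, collecting return addresses.
 * Returns the number of frames visited. */
static size_t walk_frames(const struct frame_record *fp,
                          uint64_t *out, size_t max)
{
    size_t n = 0;

    while (fp != NULL && n < max) {
        out[n++] = fp->return_address;
        fp = fp->caller_fp;                /* step to the caller's frame */
    }
    return n;
}
```

A chain of three records linked through caller_fp yields three return addresses, innermost first, which is exactly the order a backtrace prints them in.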

Optimization complications

There are a couple of disadvantages to having explicit frame pointers in code. On some processors, there are relatively few registers available. Because the frame pointer must occupy one of those registers, fewer are left for the function’s own values, so more memory operations are needed and the resulting code is slower. Having explicit frame pointers may also constrain the code that the compiler can generate, because the compiler may not intermix the function prologue and epilogue code with the body of the function.

The compiler’s goal is to generate fast code where possible, so compilers typically omit frame pointers from generated code. Keeping frame pointers can significantly lower performance, as shown by Phoronix’s benchmarking. The downside of omitting frame pointers is that the previous calling function’s activation frame and return address are no longer at simple offsets from the frame pointer.

Call Frame Information

To aid in the generation of function backtraces, the compiler includes DWARF Call Frame Information (CFI) to reconstruct frame pointers and to find return addresses. This supplemental information is stored in the .eh_frame section of the executable. Unlike traditional debuginfo for function and line location information, the .eh_frame section is in the executable even when the executable is generated without debug information, or when the debug information has been stripped from the file. The call frame information is essential for the operation of language constructs like throw-catch in C++.

The CFI has a Frame Description Entry (FDE) for each function. As one of its steps, the backtrace generation process finds the appropriate FDE for the current activation frame being examined. Think of the FDE as a table, with each row representing one or more instructions, with these columns:

  • Canonical Frame Address (CFA), the location the frame pointer would point to
  • The return address
  • Information about other registers

The encoding of the FDE is designed to minimize the amount of space required. The FDE describes the changes between rows rather than fully specifying each row. To further compress the data, starting information common to multiple FDEs is factored out and placed in Common Information Entries (CIE). This makes the FDE more compact, but it also requires more work to compute the actual CFA and find the return address location. The tool must start from the uninitialized state. It steps through the entries in the CIE to get the initial state on function entry, then it moves on to process the FDE by starting at the FDE’s first entry, and processes operations until it gets to the row that covers the instruction pointer currently being analyzed.
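Before any rows can be computed, the tool has to pick the FDE whose address range contains the instruction pointer. The following is a simplified sketch of that lookup. The fde_range structure and find_fde function are hypothetical names for illustration, and a plain linear search is assumed here rather than whatever index a real tool uses.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of an FDE: only the range it covers. */
struct fde_range {
    uint64_t pc_begin;  /* first address covered */
    uint64_t pc_end;    /* first address NOT covered */
};

/* Return the index of the FDE covering pc, or -1 if none does. */
static int find_fde(const struct fde_range *fdes, size_t count, uint64_t pc)
{
    for (size_t i = 0; i < count; i++) {
        if (pc >= fdes[i].pc_begin && pc < fdes[i].pc_end)
            return (int)i;
    }
    return -1;
}
```

The end of each range is exclusive, so an address equal to pc_end belongs to the next FDE, if there is one.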

Example use of Call Frame Information

Start with a simple example with a function that converts Fahrenheit to Celsius. Inlined functions do not have entries in the CFI, so the __attribute__((noinline)) for the f2c function ensures the compiler keeps f2c as a real function.

#include <stdio.h>

int __attribute__ ((noinline)) f2c(int f)
{
    int c;
    c = (f-32.0) * 5.0 /9.0;
    return c;
}

int main (int argc, char *argv[])
{
    int f;
    scanf("%d", &f);
    printf ("%d Fahrenheit = %d Celsius\n",
            f, f2c(f));
    return 0;
}

Compile the code with:

$ gcc -O2 -g -o f2c f2c.c

The .eh_frame is there as expected:

$ eu-readelf -S f2c |grep eh_frame
[17] .eh_frame_hdr  PROGBITS   0000000000402058 00002058 00000034  0 A  0   0  4
[18] .eh_frame      PROGBITS   0000000000402090 00002090 000000a0  0 A  0   0  8

We can get the CFI information in human readable form with:

$ readelf --debug-dump=frames  f2c > f2c.cfi

Generate a disassembly file of the f2c binary so you can look up the addresses of the f2c and main functions:

$ objdump -d f2c > f2c.dis

Find the following lines in f2c.dis to see the start of f2c and main:

0000000000401060 <main>:
0000000000401190 <f2c>:

In many cases, all the functions in the binary use the same CIE to define the initial conditions before a function’s first instruction is executed. In this example, both f2c and main use the following CIE:

00000000 0000000000000014 00000000 CIE
  Version:                   1
  Augmentation:              "zR"
  Code alignment factor: 1
  Data alignment factor: -8
  Return address column: 16
  Augmentation data:         1b
  DW_CFA_def_cfa: r7 (rsp) ofs 8
  DW_CFA_offset: r16 (rip) at cfa-8

For this example, don’t worry about the Augmentation or Augmentation data entries. Because x86_64 processors have variable length instructions from 1 to 15 bytes in size, the “Code alignment factor” is set to 1. On a processor that only has 32-bit (4-byte) instructions, this would be set to 4 and would allow more compact encoding of how many bytes a row of state information applies to. In a similar fashion, the “Data alignment factor” makes the encoding of adjustments to where the CFA is located more compact. On x86_64, the stack slots are 8 bytes in size.
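The two factors act as multipliers on the small numbers stored in the encoded CFI. The sketch below shows the arithmetic only. The function names advance_bytes and saved_reg_offset are made up for illustration, and readelf already prints the multiplied-out values, so the raw operands are not visible in the dumps in this article.

```c
#include <stdint.h>

/* A DW_CFA_advance_loc delta is stored in units of the code alignment
 * factor; multiply to get the number of bytes of code covered. */
static uint64_t advance_bytes(uint64_t delta, uint64_t code_align)
{
    return delta * code_align;
}

/* A DW_CFA_offset operand is stored in units of the data alignment
 * factor; multiply to get the real offset from the CFA. */
static int64_t saved_reg_offset(uint64_t operand, int64_t data_align)
{
    return (int64_t)operand * data_align;
}
```

With the x86_64 data alignment factor of -8, an operand of 1 means cfa-8 and an operand of 2 means cfa-16. With a code alignment factor of 4, a stored delta of 3 would cover 12 bytes of code.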

The column in the virtual table that holds the return address is 16. This is used in the instructions at the tail end of the CIE. There are four DW_CFA instructions; the listing above shows the two that do real work. The first instruction, DW_CFA_def_cfa, describes how to compute the Canonical Frame Address (CFA) that a frame pointer would point at if the code had a frame pointer. In this case, the CFA is computed from r7 (rsp) as CFA=rsp+8.

The second instruction, DW_CFA_offset, defines where to obtain the return address: at CFA-8. In this case, the return address is currently pointed to by the stack pointer, because (rsp+8)-8 is rsp. The CFA starts right above the return address on the stack.

The DW_CFA_nop instructions at the end of the CIE, not shown in the listing above, are padding to keep alignment in the DWARF information. The FDE can also have padding at the end for alignment.

Find the FDE for main in f2c.cfi, which covers the main function from 0x401060 up to, but not including, 0x401097:

00000084 0000000000000014 00000088 FDE cie=00000000 pc=0000000000401060..0000000000401097
  DW_CFA_advance_loc: 4 to 0000000000401064
  DW_CFA_def_cfa_offset: 32
  DW_CFA_advance_loc: 50 to 0000000000401096
  DW_CFA_def_cfa_offset: 8

Before executing the first instruction in the function, the CIE describes the call frame state. However, as the processor executes instructions in the function, the details will change. The first DW_CFA_advance_loc and DW_CFA_def_cfa_offset pair matches up with the first instruction in main at 401060, which adjusts the stack pointer down by 0x18 (24 bytes). The CFA has not changed location but the stack pointer has, so the correct computation for CFA at 401064 is rsp+32. That’s the extent of the prologue in this code. Here are the first couple of instructions in main:

0000000000401060 <main>:
  401060:    48 83 ec 18      sub        $0x18,%rsp
  401064:    bf 1b 20 40 00   mov        $0x40201b,%edi

The DW_CFA_advance_loc makes the current row apply to the next 50 bytes of code in the function, until 401096. The CFA is at rsp+32 until the stack adjustment instruction at 401092 completes execution. The DW_CFA_def_cfa_offset then sets the calculation of the CFA back to what it was on entry into the function. This is expected, because the next instruction at 401096 is the return instruction (ret), which pops the return address off the stack.

  401090:    31 c0        xor        %eax,%eax
  401092:    48 83 c4 18  add        $0x18,%rsp
  401096:    c3           ret
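The way a tool replays main’s FDE can be sketched as a small interpreter. This is an illustration only, under the assumption that just the two operation kinds seen in this FDE need handling. The cfa_op structure and the cfa_offset_at function are hypothetical names, while the operands and addresses are copied from the readelf dump above.

```c
#include <stddef.h>
#include <stdint.h>

/* The only two operation kinds that appear in main's FDE. */
enum cfa_op_kind { ADVANCE_LOC, DEF_CFA_OFFSET };

struct cfa_op {
    enum cfa_op_kind kind;
    uint64_t operand;   /* byte delta, or new CFA offset from rsp */
};

/* main's FDE, as dumped by readelf. */
static const struct cfa_op main_fde[] = {
    { ADVANCE_LOC, 4 },       /* to 0x401064 */
    { DEF_CFA_OFFSET, 32 },
    { ADVANCE_LOC, 50 },      /* to 0x401096 */
    { DEF_CFA_OFFSET, 8 },
};

/* Replay the operations and return the CFA offset from rsp that is in
 * effect at pc. The caller must pass a pc inside main's range. */
static uint64_t cfa_offset_at(uint64_t pc)
{
    uint64_t loc = 0x401060;  /* start of the FDE's range */
    uint64_t offset = 8;      /* initial state from the CIE: CFA=rsp+8 */
    size_t n = sizeof main_fde / sizeof main_fde[0];

    for (size_t i = 0; i < n; i++) {
        const struct cfa_op *op = &main_fde[i];

        if (op->kind == ADVANCE_LOC) {
            if (pc < loc + op->operand)
                break;        /* pc falls in the current row */
            loc += op->operand;
        } else {
            offset = op->operand;
        }
    }
    return offset;
}
```

For an address on the sub instruction the result is 8, for any address from 0x401064 through the add instruction it is 32, and at the ret it is 8 again.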

The FDE for the f2c function uses the same CIE as the main function, and covers the range of 0x401190 to 0x4011c3:

00000068 0000000000000018 0000006c FDE cie=00000000 pc=0000000000401190..00000000004011c3
  DW_CFA_advance_loc: 1 to 0000000000401191
  DW_CFA_def_cfa_offset: 16
  DW_CFA_offset: r3 (rbx) at cfa-16
  DW_CFA_advance_loc: 29 to 00000000004011ae
  DW_CFA_def_cfa_offset: 8

The objdump output for the f2c function in the binary:

0000000000401190 <f2c>:
  401190:	53                   	push   %rbx
  401191:	89 fb                	mov    %edi,%ebx
  401193:	bf 10 20 40 00       	mov    $0x402010,%edi
  401198:	e8 93 fe ff ff       	call   401030 <puts@plt>
  40119d:	66 0f ef c0          	pxor   %xmm0,%xmm0
  4011a1:	f2 0f 2a c3          	cvtsi2sd %ebx,%xmm0
  4011a5:	f2 0f 5c 05 93 0e 00 	subsd  0xe93(%rip),%xmm0        # 402040 <__dso_handle+0x38>
  4011ac:	00 
  4011ad:	5b                   	pop    %rbx
  4011ae:	f2 0f 59 05 92 0e 00 	mulsd  0xe92(%rip),%xmm0        # 402048 <__dso_handle+0x40>
  4011b5:	00 
  4011b6:	f2 0f 5e 05 92 0e 00 	divsd  0xe92(%rip),%xmm0        # 402050 <__dso_handle+0x48>
  4011bd:	00 
  4011be:	f2 0f 2c c0          	cvttsd2si %xmm0,%eax
  4011c2:	c3                   	ret

In the FDE for f2c, there’s a single byte instruction at the beginning of the function with the DW_CFA_advance_loc. Following the advance operation, there are two additional operations. A DW_CFA_def_cfa_offset changes the CFA to %rsp+16 and a DW_CFA_offset indicates that the initial value in %rbx is now at CFA-16 (the top of the stack).

Looking at this f2c disassembly code, you can see that a push is used to save %rbx onto the stack. One of the advantages of omitting the frame pointer in the code generation is that compact instructions like push and pop can be used to store and retrieve values from the stack. In this case, %rbx is saved because f2c uses it to hold the value of f across the call to the printf function (actually converted to a puts call): %edi, which carried f into the function, is reused to pass the argument to puts, but the initial value of f is still needed for the later computation. Because %rbx is a callee-saved register, f2c must preserve the caller’s value of it. The DW_CFA_advance_loc 29 bytes to 4011ae shows the next state change just after pop %rbx, which recovers the original value of %rbx. The DW_CFA_def_cfa_offset notes the pop changed CFA to be %rsp+8.

GDB using the Call Frame Information

Having the CFI information allows GNU Debugger (GDB) and other tools to generate accurate backtraces. Without CFI information, GDB would have a difficult time finding the return address. You can see GDB making use of this information if you set a breakpoint at line 7 of f2c.c. GDB puts the breakpoint before the pop %rbx in the f2c function has executed, so the return address is not yet at the top of the stack.

GDB is able to unwind the stack, and as a bonus is also able to fetch the argument f that was currently saved on the stack:

$ gdb f2c
(gdb) break f2c.c:7
Breakpoint 1 at 0x40119d: file f2c.c, line 7.
(gdb) run
Starting program: /home/wcohen/present/202207youarehere/f2c
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/".

Breakpoint 1, f2c (f=98) at f2c.c:8
8            return c;
(gdb) where
#0  f2c (f=98) at f2c.c:8
#1  0x000000000040107e in main (argc=<optimized out>, argv=<optimized out>)
        at f2c.c:15
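The step from frame #0 to frame #1 can be sketched by applying the f2c row in effect at the breakpoint (CFA=rsp+16, return address at cfa-8, %rbx at cfa-16) to a simulated stack. This is a rough illustration, not how GDB is implemented. The names caller_state, load64, and unwind_f2c are hypothetical, and the stack contents are supplied by the caller of the sketch rather than read from a live process.

```c
#include <stdint.h>

/* What unwinding one frame recovers about the caller. */
struct caller_state {
    uint64_t cfa;             /* Canonical Frame Address */
    uint64_t return_address;  /* read from cfa-8 */
    uint64_t rbx;             /* caller's %rbx, read from cfa-16 */
};

/* Simulated memory: stack[i] is the 8-byte slot at stack_base + 8*i. */
static uint64_t load64(const uint64_t *stack, uint64_t stack_base,
                       uint64_t addr)
{
    return stack[(addr - stack_base) / 8];
}

/* Apply the row that covers 0x401191 up to 0x4011ae in f2c. */
static struct caller_state unwind_f2c(const uint64_t *stack,
                                      uint64_t stack_base, uint64_t rsp)
{
    struct caller_state s;

    s.cfa = rsp + 16;
    s.return_address = load64(stack, stack_base, s.cfa - 8);
    s.rbx = load64(stack, stack_base, s.cfa - 16);
    return s;
}
```

With a stack whose lowest slot holds the saved %rbx and whose next slot holds 0x40107e, the sketch recovers the same return address that GDB prints for frame #1.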

Call Frame Information

The DWARF Call Frame Information provides a flexible way for a compiler to include information for accurate unwinding of the stack. This makes it possible to determine the currently active function calls. I’ve provided a brief introduction in this article, but for more details on how DWARF implements this mechanism, see the DWARF specification.

William Cohen has been a developer of performance tools at Red Hat for over a decade and has worked on a number of the performance tools in Red Hat Enterprise Linux and Fedora such as OProfile, PAPI, SystemTap, and Dyninst.


This work is licensed under a Creative Commons Attribution-Share Alike 4.0 International License.