# Instruction referencing for debug info

This document explains how LLVM uses value tracking, or instruction
referencing, to determine variable locations for debug info in the code
generation stage of compilation. This content is aimed at those working on code
generation targets and optimisation passes. It may also be of interest to anyone
curious about low-level debug info handling.

# Problem statement

At the end of compilation, LLVM must produce a DWARF location list (or similar)
describing what register or stack location a variable can be found in, for each
instruction in that variable's lexical scope. We could track the virtual
register that the variable resides in through compilation; however, this is
vulnerable to register optimisations during regalloc, and to instruction
movements.

# Solution: instruction referencing

Rather than identifying the virtual register that a variable value resides in,
in instruction referencing mode LLVM refers to the machine instruction and
operand position that the value is defined in. Consider the LLVM IR way of
referring to instruction values:

```llvm
%2 = add i32 %0, %1
call void @llvm.dbg.value(metadata i32 %2,
```

In LLVM IR, the IR Value is synonymous with the instruction that computes the
value, to the extent that in memory a Value is a pointer to the computing
instruction. Instruction referencing implements this relationship in the
codegen backend of LLVM, after instruction selection. Consider the X86 machine
code below, with instruction referencing debug info, corresponding to the
earlier LLVM IR:

```text
%2:gr32 = ADD32rr %0, %1, implicit-def $eflags, debug-instr-number 1
DBG_INSTR_REF 1, 0, !123, !456, debug-location !789
```

While the function remains in SSA form, virtual register `%2` is sufficient to
identify the value computed by the instruction. However, the function
eventually leaves SSA form, and register optimisations will obscure which
register the desired value is in. A more consistent way of identifying the
instruction's value is to refer to the `MachineOperand` where the value is
defined, independently of which register that `MachineOperand` defines. In
the code above, the `DBG_INSTR_REF` instruction refers to instruction number
one, operand zero, while the `ADD32rr` has a `debug-instr-number` attribute
attached indicating that it is instruction number one.
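
The idea of naming a value by its defining instruction and operand, rather than
by a register, can be illustrated with a small self-contained model. This is a
sketch only: `ValueID`, `LocationMap` and `locate` are hypothetical names
invented for illustration and are not LLVM APIs.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical model: a value is named by (instruction number, operand index),
// mirroring the two numbers carried by DBG_INSTR_REF.
using ValueID = std::pair<unsigned, unsigned>;

// Hypothetical table recording which register currently holds each value.
using LocationMap = std::map<ValueID, std::string>;

// Return the register holding the value, or nullptr if it is not resident
// anywhere (which a debugger would present as "optimised out").
const std::string *locate(const LocationMap &Locs, ValueID V) {
  auto It = Locs.find(V);
  if (It == Locs.end())
    return nullptr;
  return &It->second;
}
```

In this model, moving the value to a different register only updates the table
entry for `ValueID(1, 0)`; the reference held by the debug instruction does not
change.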

De-coupling variable locations from registers avoids difficulties involving
register allocation and optimisation, but requires additional instrumentation
when the instructions are optimised instead. Optimisations that replace
instructions with optimised versions that compute the same value must either
preserve the instruction number, or record a substitution from the old
instruction / operand number pair to the new instruction / operand pair -- see
`MachineFunction::substituteDebugValuesForInst`. If debug info maintenance is
not performed, or an instruction is eliminated as dead code, the variable
location is safely dropped and marked "optimised out". The exception is
instructions that are mutated rather than replaced, which always need debug info
maintenance.

# Register allocator considerations

When the register allocator runs, debugging instructions do not directly refer
to any virtual registers, and thus there is no need for expensive location
maintenance during regalloc (i.e. `LiveDebugVariables`). Debug instructions are
unlinked from the function, then linked back in after register allocation
completes.

The exception is `PHI` instructions: these become implicit definitions at
control flow merges once regalloc finishes, and any debug numbers attached to
`PHI` instructions are lost. To circumvent this, debug numbers of `PHI`s are
recorded at the start of register allocation (`phi-node-elimination`), then
`DBG_PHI` instructions are inserted after regalloc finishes. This requires some
maintenance of which register a variable is located in during regalloc, but at
single positions (block entry points) rather than ranges of instructions.

An example, before regalloc:

```text
bb.2:
  %2 = PHI %1, %bb.0, %2, %bb.1, debug-instr-number 1
```

After:

```text
bb.2:
  DBG_PHI $rax, 1
```
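
The record-then-reinsert scheme described above can be sketched as follows.
This is a simplified model, not the register allocator's implementation:
`PHIRecord`, `emitDbgPhis` and the `EntryLoc` map are hypothetical, and the
choice to emit nothing when no location is known is an assumption of the
sketch.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical record taken before PHIs are eliminated.
struct PHIRecord {
  std::string Block; // block headed by the PHI, e.g. "bb.2"
  unsigned VReg;     // virtual register the PHI defined
  unsigned InstrNum; // its debug-instr-number
};

// After regalloc: turn each record into a textual DBG_PHI, using the location
// assigned to the virtual register at block entry (EntryLoc).
std::vector<std::string>
emitDbgPhis(const std::vector<PHIRecord> &Records,
            const std::map<unsigned, std::string> &EntryLoc) {
  std::vector<std::string> Out;
  for (const PHIRecord &R : Records) {
    auto It = EntryLoc.find(R.VReg);
    if (It == EntryLoc.end())
      continue; // assumption of this sketch: no known location, emit nothing
    Out.push_back(R.Block + ": DBG_PHI " + It->second + ", " +
                  std::to_string(R.InstrNum));
  }
  return Out;
}
```

Only one location per recorded `PHI` is needed, at the block entry, which is
why this is cheaper than tracking a location across a range of instructions.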

# `LiveDebugValues`

After optimisations and code layout complete, information about variable
values must be translated into variable locations, i.e. registers and stack
slots. This is performed in the [LiveDebugValues pass][LiveDebugValues], where
the debug instructions and machine code are separated out into two independent
functions:
 * One that assigns values to variable names,
 * One that assigns values to machine registers and stack slots.

LLVM's existing SSA tools are used to place `PHI`s for each function, between
variable values and the values contained in machine locations, with value
propagation eliminating any unnecessary `PHI`s. The two can then be joined up
to map variables to values, then values to locations, for each instruction in
the function.

Key to this process is being able to identify the movement of values between
registers and stack locations, so that the location of values can be preserved
for the full time that they are resident in the machine.
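
The final joining step, at a single program point, might look like the sketch
below. It deliberately ignores `PHI` placement and dataflow; `joinLocations`
and both maps are hypothetical stand-ins rather than the data structures
`LiveDebugValues` actually uses.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical value name: (instruction number, operand index).
using ValueID = std::pair<unsigned, unsigned>;

// Join variable -> value with location -> value to obtain
// variable -> location at one program point.
std::map<std::string, std::string>
joinLocations(const std::map<std::string, ValueID> &VarToValue,
              const std::map<std::string, ValueID> &LocToValue) {
  // Invert location -> value. emplace keeps the first location seen for a
  // value, so ties are broken by the map's ordering of location names.
  std::map<ValueID, std::string> ValueToLoc;
  for (const auto &KV : LocToValue)
    ValueToLoc.emplace(KV.second, KV.first);

  std::map<std::string, std::string> Result;
  for (const auto &KV : VarToValue) {
    auto It = ValueToLoc.find(KV.second);
    Result[KV.first] =
        It == ValueToLoc.end() ? "<optimised out>" : It->second;
  }
  return Result;
}
```

A variable whose value is not held in any location is reported as optimised
out, which matches the behaviour described for dropped values elsewhere in this
document.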

# Required target support and transition guide

Instruction referencing will work on any target, but likely with poor coverage.
Supporting instruction referencing well requires:
 * Target hooks to be implemented to allow `LiveDebugValues` to follow values
   through the machine,
 * Target-specific optimisations to be instrumented, to preserve instruction
   numbers.

## Target hooks

`TargetInstrInfo::isCopyInstrImpl` must be implemented to recognise any
instructions that are copy-like -- `LiveDebugValues` uses this to identify when
values move between registers.

`TargetInstrInfo::isLoadFromStackSlotPostFE` and
`TargetInstrInfo::isStoreToStackSlotPostFE` are needed to identify spill and
restore instructions. Each should return the destination or source register
respectively. `LiveDebugValues` will track the movement of a value from / to
the stack slot. In addition, any instruction that writes to a stack spill
should have a `MachineMemOperand` attached, so that `LiveDebugValues` can
recognise that a slot has been clobbered.
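
To see why these hooks matter, consider a toy tracker that follows a value as
it is copied, spilt, restored and clobbered. `LocTracker` and its methods are
hypothetical; in this sketch, `move` stands for any instruction the hooks
above identify as a copy, spill or restore, and `clobber` for a write that a
memory operand reveals.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical tracker: location name -> number of the value it holds.
class LocTracker {
  std::map<std::string, unsigned> Values;

public:
  // An instruction defines a new value in Loc.
  void def(const std::string &Loc, unsigned Value) { Values[Loc] = Value; }

  // A copy, spill or restore: Dst now holds whatever Src held.
  void move(const std::string &Src, const std::string &Dst) {
    auto It = Values.find(Src);
    if (It == Values.end()) {
      Values.erase(Dst); // simplification: unknown source, forget Dst
      return;
    }
    unsigned V = It->second;
    Values[Dst] = V;
  }

  // Loc is overwritten by something that is not tracked.
  void clobber(const std::string &Loc) { Values.erase(Loc); }

  // Every location currently holding Value.
  std::vector<std::string> locationsOf(unsigned Value) const {
    std::vector<std::string> Out;
    for (const auto &KV : Values)
      if (KV.second == Value)
        Out.push_back(KV.first);
    return Out;
  }
};
```

If a spill were not recognised as a `move`, the value would appear lost as soon
as its register was clobbered, even though it still lives in the stack slot.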

## Target-specific optimisation instrumentation

Optimisations come in two flavours: those that mutate a `MachineInstr` to make
it do something different, and those that create a new instruction to replace
the operation of the old.

The former _must_ be instrumented -- the relevant question is whether any
register def in any operand will produce a different value, as a result of the
mutation. If the answer is yes, then there is a risk that a `DBG_INSTR_REF`
instruction referring to that operand will end up assigning the different
value to a variable, presenting the debugging developer with an unexpected
variable value. In such scenarios, call `MachineInstr::dropDebugNumber()` on the
mutated instruction to erase its instruction number. Any `DBG_INSTR_REF`
referring to it will produce an empty variable location instead, that appears
as "optimised out" in the debugger.
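
The effect of dropping the number can be modelled in a few lines. The `Instr`
struct, `findByNumber` and `mutateTo` are hypothetical, and using zero to mean
"no instruction number" is a convention of this sketch; the assignment to
`DebugNum` stands in for a call to `MachineInstr::dropDebugNumber()`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a machine instruction.
struct Instr {
  std::string Opcode;
  unsigned DebugNum; // 0 means "no instruction number" in this sketch
};

// Find the instruction that a reference to number Num would resolve to.
const Instr *findByNumber(const std::vector<Instr> &Fn, unsigned Num) {
  if (Num == 0)
    return nullptr;
  for (const Instr &MI : Fn)
    if (MI.DebugNum == Num)
      return &MI;
  return nullptr;
}

// Mutate MI in place so that it computes a different value, and drop its
// number so that no debug reference can observe the new value.
void mutateTo(Instr &MI, const std::string &NewOpcode) {
  MI.Opcode = NewOpcode;
  MI.DebugNum = 0; // stands in for MachineInstr::dropDebugNumber()
}
```

After the mutation, a reference to the old number resolves to nothing, so the
variable is reported as optimised out rather than shown with a wrong value.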

For the latter flavour of optimisation, to increase coverage you should record
an instruction number substitution: a mapping from the old instruction number /
operand pair to the new instruction number / operand pair. Consider replacing
a three-address add instruction with a two-address add:

```text
%2:gr32 = ADD32rr %0, %1, debug-instr-number 1
```

becomes

```text
%2:gr32 = ADD32rr %0(tied-def 0), %1, debug-instr-number 2
```

With a substitution from "instruction number 1 operand 0" to "instruction number
2 operand 0" recorded in the `MachineFunction`. In `LiveDebugValues`,
`DBG_INSTR_REF`s will be mapped through the substitution table to find the most
recent instruction number / operand number of the value they refer to.
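
Mapping a reference through the substitution table amounts to following a
chain of entries until none applies. The sketch below assumes a plain ordered
map as the table; `SubstTable` and `resolve` are hypothetical names, and the
iteration bound is a defensive choice of the sketch, not something the source
describes.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>

// Hypothetical value name: (instruction number, operand index).
using ValueID = std::pair<unsigned, unsigned>;
// Hypothetical substitution table: old value -> newer value.
using SubstTable = std::map<ValueID, ValueID>;

// Follow substitutions from V until reaching a value with no newer version.
ValueID resolve(const SubstTable &Subs, ValueID V) {
  // Bounded by the table size so a malformed, cyclic table cannot loop forever.
  for (std::size_t I = 0; I <= Subs.size(); ++I) {
    auto It = Subs.find(V);
    if (It == Subs.end())
      return V;
    V = It->second;
  }
  return V;
}
```

For example, with entries (1, 0) to (2, 0) and (2, 0) to (3, 1), a reference to
instruction 1 operand 0 resolves to instruction 3 operand 1, while a value with
no entry resolves to itself.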

Use `MachineFunction::substituteDebugValuesForInst` to automatically produce
substitutions between an old and new instruction. It assumes that any operand
that is a def in the old instruction is a def in the new instruction at the
same operand position. This works most of the time, as in the example above.

If operand numbers do not line up between the old and new instruction, use
`MachineInstr::getDebugInstrNum` to acquire the instruction number for the new
instruction, and `MachineFunction::makeDebugValueSubstitution` to record the
mapping between register definitions in the old and new instructions. If some
values computed by the old instruction are no longer computed by the new
instruction, record no substitution -- `LiveDebugValues` will safely drop the
now unavailable variable value.
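
The difference between the automatic and manual approaches can be sketched
with a toy model. Everything here is hypothetical: `substituteAllDefs` only
imitates the position-matching assumption described for
`MachineFunction::substituteDebugValuesForInst`, and `substituteOne` imitates
recording a single mapping by hand; neither reproduces the real signatures.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-ins for machine operands and instructions.
struct Operand {
  bool IsDef;
};
struct Instr {
  unsigned DebugNum;
  std::vector<Operand> Ops;
};

using ValueID = std::pair<unsigned, unsigned>;
using SubstTable = std::map<ValueID, ValueID>;

// Automatic: pair up defs that sit at the same operand index in both
// instructions. Positions where only one side is a def are skipped here.
void substituteAllDefs(SubstTable &Subs, const Instr &Old, const Instr &New) {
  for (unsigned I = 0; I < Old.Ops.size() && I < New.Ops.size(); ++I)
    if (Old.Ops[I].IsDef && New.Ops[I].IsDef)
      Subs[ValueID(Old.DebugNum, I)] = ValueID(New.DebugNum, I);
}

// Manual: record one mapping when operand positions do not line up.
void substituteOne(SubstTable &Subs, ValueID From, ValueID To) {
  assert(From != To && "a substitution must point at a different value");
  Subs[From] = To;
}
```

Values that the new instruction no longer computes simply get no entry, so
references to them cannot be resolved and are dropped.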

Should your target clone instructions, much the same as the `TailDuplicator`
optimisation pass, do not attempt to preserve the instruction numbers or
record any substitutions. `MachineFunction::CloneMachineInstr` should drop the
instruction number of any cloned instruction, to avoid duplicate numbers
appearing to `LiveDebugValues`. Dealing with duplicated instructions is a
natural extension to instruction referencing that's currently unimplemented.
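
A small model shows why cloned instructions must not keep their numbers. The
`Instr` struct, `cloneInstr` and `hasDuplicateNumbers` are hypothetical, and
zero again means "no instruction number" purely as a convention of the sketch.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a machine instruction.
struct Instr {
  std::string Opcode;
  unsigned DebugNum; // 0 means "no instruction number" in this sketch
};

// Clone an instruction, dropping its number so the copy is never referenced.
Instr cloneInstr(const Instr &MI) {
  Instr Copy = MI;
  Copy.DebugNum = 0;
  return Copy;
}

// Check whether any instruction number is carried by two instructions.
bool hasDuplicateNumbers(const std::vector<Instr> &Fn) {
  std::set<unsigned> Seen;
  for (const Instr &MI : Fn)
    if (MI.DebugNum != 0 && !Seen.insert(MI.DebugNum).second)
      return true;
  return false;
}
```

A plain copy that kept the number would make a reference to it ambiguous,
since two instructions would claim to define the same value.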

[LiveDebugValues]: SourceLevelDebugging.html#livedebugvalues-expansion-of-variable-locations