15.2. Control Flow Analysis
Dragon Book Section 8.4


Basic blocks and flowgraphs

A basic block is a sequence of instructions that has no branches out of it or into it. It is straight-line code.

For example, program fragment

  for i from 1 to 10 do
    for j from 1 to 10 do
      a[i,j] = 0.0;
  for i from 1 to 10 do
    a[i,i] = 1.0;

sets array a to an n×n identity matrix. It might be compiled into the following sequence of intermediate code instructions.

  i = 1
  label 1
  j = 1
  label 2
  t1 = 10*i
  t2 = t1 + j
  t3 = 8 * t2
  t4 = t3 - 88
  a[t4] = 0.0
  j = j + 1
  if j ≤ 10 goto 2
  i = i + 1
  if i ≤ 10 goto 1
  i = 1
  label 3
  t5 = i - 1
  t6 = 88 * t5
  a[t6] = 1.0
  i = i + 1
  if i ≤ 10 goto 3 

The unexpected multiplications by 10 and 8 compute byte addresses, assuming that matrix a is stored by rows and that a real number occupies 8 bytes.

Some optimization has already been done. The first 88 is present because the array indices start at 1 instead of 0. The second 88 is the number of bytes in 11 real numbers, which allows computing the address of a[i,i] efficiently.

The basic blocks can be found easily by breaking the code at label and branch instructions.

Block Instructions
B1
i = 1
Continue to B2
B2
j = 1
Continue to B3
B3
t1 = 10*i
t2 = t1 + j
t3 = 8 * t2
t4 = t3 - 88
a[t4] = 0.0
j = j + 1
if j ≤ 10 goto B3
Continue to B4
B4
i = i + 1
if i ≤ 10 goto B2
Continue to B5
B5
i = 1
Continue to B5
B6
t5 = i - 1
t6 = 88 * t5
a[t6] = 1.0
i = i + 1
if i ≤ 10 goto B6
Continue to exit

The basic blocks make up the vertices of a directed graph. For each basic block that ends on a conditional branch instruction, there are two edges, one from the conditional branch instruction and the other from the 'continue' line.

A basic block that ends of an unconditional branch (goto) instruction, or that does not end on a branch instruction at all, has one edge out of it.


Optimization within a basic block

Within a basic block, the compiler does not need to be concerned with control flow. The instructions are done one after another.

Within a basic block, a compiler can perform some common subexpression elimination by converting the code to a directed graph that is like an abstract syntax tree except that subtrees can be shared.

Each vertex of the directed graph represents a computed value or a variable. It is labeled by an operator and by variables that take on that expression as their values.

Since a variable can change, a variable can have more than one name. Variable c0, for example, is the initial value of c.

Below is a directed graph for basic block

  a = b + c
  b = a - d
  c = b + c
  d = a - d

All arrows point downward.

The directed graph allows the optimizer to compute b and d once, while recognizing that it cannot compute a and c once. The new code is

  a = b + c
  b = a - d
  c = b + c
  d = b

Optimizations based on the flow graph

A program can be sped up considerably by storing variables in registers. But there are a small number of registers, so it is important to decide which variables are critical.

Graph algorithms can be employed to recognize inner loops, which do not have any smaller loops withing them.

Variables that are used in inner loops are good candidates for storing in registers.