14.1. Intermediate Code


Translation and the back end

Most compilers translate a program from one language (the source language) to another (the target language).

The target language can vary greatly. Samples are:

But a compiler does not restrict itself to a single target language.

Instead, the compiler translates to intermediate code, a simplified machine language. Multiple back ends are written, each converting from the intermediate code into a particular target language.

Retargeting the compiler is just a matter of selecting (or writing) a particular back end.

Compilers with multiple back ends can also be used as cross-compilers; a cross compiler runs on one kind of machine and produces code for a different kind of machine.


3-address codes

3-address intermediate code is a very simple artificial machine language.

Real processors have instructions of more than one size. 3-address code instructions are all the same size.

Each 3-address instruction has 4 parts:

  1. an instruction number telling what kind of instruction it is;

  2. up to three "addresses" that are usually numbers or variable/function names.

The Dragon Book describes 3-address codes in section 6.2. Here are some of the instructions that the book describes.

  1. Binary operator instructions x = y op z, where x is a variable name and y and z are variable names or numbers. Operator op might be + or *, for example. It is the instruction kind.

  2. Unary operator instructions x = op y, where x is a variable name and and y is a variable name or number.

  3. Copy instructions x = y.

  4. Label instructions Label L. (The Dragon Book does not use these, but puts the label on the side.)

  5. Branch instructions

    where L is a label.

  6. Comparison instructions if x op y goto L, where op is a comparison such as = or <.

  7. Procedure calls. The following sequence of instructions calls p(x, y, z).

      param x
      param y
      param z
      call p, 3
    
  8. Array instructions x[i] = y and x = y[i].

  9. Address and pointer instructions


A sample 3-address translation

Code fragment

  do 
    i = i + 1;
  while(a[i] < v);

might be translated to the following sequence of 3-address codes. Variables t1, etc. are temporaries.

  Label L
  t1 = i + 1
  i = t1
  t2 = i*8
  t3 = a[t2]
  if t3 < v goto L

Note the second and third instructions, which together perform i = i + 1. Typical intermediate code generators create code that does a lot of unnecessary copies. Those are fixed up in a later phase of the compiler, so they don't worry us here.


Representing 3-address codes

A single 3-address code instruction is typically represented as a structure with 4 fields: the instruction kind and 3 integers or strings.

Instruction "if x < y goto L" would be represented as follows.

< x y L

But it is more convenient to write them in symbolic form.