Creating a translator
CSCI 4627
Spring 2002

Due: April 1. Please turn in as assignment 3 using handin. Be sure to turn in all source files, including header files.

An abstract machine is an artifical machine language that is usually implemented by an interpreter. Abstract machines often have richer data structures than real computers, and the instructions tend to be more pleasant for a compiler to generate. The language of an abstract machine is often called a byte-code because each instruction begins with a byte that tells the instruction kind, and some instructions are followed by an integer number of bytes that add extra information. Java and Prolog are languages that are typically implemented using abstract machines and byte codes. See machine.html for a description of a byte code that is similar in spirit to the Java byte code, but is much smaller and simpler. File machinedefs.h contains definitions for the abstract machine.

Modify your parser for the C- language so that it translates the program into the byte code described in machine.html. This involves the following steps.

Tables

You will be adding semantic actions, and those actions will need to have some information about variables. The semantic actions will need to know each variable's type, and where each variable is stored.

Add a table to your parser that keeps track of information about each variable and each function. For each variable or function, you will need to remember the following information.

What is the type of the variable? It can be one of six types: integer, real, integer array, real array, function that returns integer or function that returns real. Do not try to keep track of the types of function parameters yet. We will deal with that later. Just have a numbering convention for the types. Thinking ahead, it would be a good idea if you do not hard-code into your program that the type is represented by an integer. Instead, try to use abstract notions whose implementations can easily be changed later.
Where is each variable stored? The abstract machine distinguishes three different kinds of variables: global variables, parameters and local variables. The table will need to tell which of these three kinds each variable is. It will also need to store a location for each variable. For example, a particular global variable might be the one at offset 3 among the global variables.

Just store the table as an array of structures. Each row of the table will store a name, a type, a kind and a location. (The kind and location will not be used for functions.) Use linear search to look up a variable or function in the table.

Modify the parser to put information into the table. Each variable declaration will make an entry in the table. Each function definition will also make an entry for the function.

When you end a function, all of the information that was added for the parameters and local variables of that function need to be removed from the table before starting the next function. Just keep a marker index in the array where the information about the current function starts. Also keep an index of the logical end of the table. When you exit a function, move the logical end back to the marker, effectively removing the local information. Should you do that before or after you add information about the function itself?

Translation

Think about how to translate each kind of expression and statement. What will the code look like? Then write semantic actions for the expression and statement nonterminals. Each expression nonterminal should produce a sequence of instructions that compute the value of that expression and leave that value on the top of the stack. Each statement nonterminal should produce a sequence of instructions that perform that statement.

Do not write the byte code directly to a file. Instead, put it into an array. Just allocate an array that is of reasonable size, and do not try to make the array grow. As you generate each instruction, add it to the end of the array. When you are done generating, copy the array to a file.

Each instruction has either no additional bytes, one additional byte or an additional byte and a string. Make your instruction array an array of structures, where each member can hold an instruction number, a byte parameter and a string parameter. That way, each instruction will be one row of the array. Write a function that reads the array and writes out the bytes sequentially in a form suitable for the byte code.

Library functions

Normally the compiler needs to arrange for linkage between a program and libraries that contain standard functions. We will take a simpler approach here. There are only a few library functions. Just copy them to the front of the byte code file, so that they will be present in every file. Here are symbolic forms of the input and output functions.

MS_FUNCTION input
M_READ_INTEGER
M_RETURN_INTEGER
MS_END

MS_FUNCTION output
M_FETCH_PARAM_INTEGER 0
M_WRITE_INTEGER
M_PUSH_INTEGER 10
M_WRITE_CHAR
M_RETURN
MS_END

Implementation of the abstract machine

An implemenation for the abstract machine is available. Get the following files and compile the interpreter and disassembler, and the assembler if you want it. You will need to use the definitions in machinedefs.h in your translator. You may also feel free to use any of the definitions in machinetypes.h or instinfo.c.

If you compile the machine implementation and call the executable program machine, then

   machine test.m

will run the abstract machine on byte code file test.m. You can run the machine in trace mode using

   machine -t test.m

for a partial trace and

   machine -T test.m

for a fuller trace. To disassemble a file test.m and write a symbolic form of it in test.i, use

   disassemble test.m test.i

Using the disassembler is the recommended way of looking at byte code programs.

Sample translation

The text shows the following C- program.

  int gcd(int u, int v)
  {
    if(v == 0) return u;
    else return gcd(v, u - u/v*v);
  }

  void main(void)
  {
    int x; 
    int y;
    x = input();
    y = input();
    output(gcd(x,y));
  }

This might be compiled as follows. What is shown here is the disassembled form, not the binary form. The compiler should produce the binary form.