Programming assignment 3
CSCI 5220
Spring 2002

Due: TBA

This assignment is to make your compiler into a translator. It should produce code for an abstract machine that is described in the abstract machine. You will want to include file machinedefs.h to get definitions of instruction codes.

This will involve writing a few functions for translating trees. Here are some suggestions.

  1. Build your translator up gradually. Reduce the language to a small subset, and only handle that subset initially. Get things working for that subset. Then add one feature at a time.

  2. The byte code is broken into sections. Each section starts with an instruction that begins MS_. For example, a function definition begins with MS_FUNCTION. None of the MS_ instructions is executable. They only influence how the loader reads the byte code.

  3. Variables are stored in three areas. Each is either global, or a parameter to the current function or a local variable of the current function. Allocate global variables using MS_INTEGER_GLOBAL for an integer variable (one per variable) and MS_REAL_GLOBAL for a real variable. As you read global declarations, just generate one of these for each variable. The variables are stored in a table in the order encountered. An integer variable occupies one word and a real variable occupies two words. If a program declares

        var x,y: integer;
        var z: real;
        var n: integer;
    
    then it might produce the following declarations for x, y, z and n.
        MS_INTEGER_GLOBAL
        MS_INTEGER_GLOBAL
        MS_REAL_GLOBAL
        MS_INTEGER_GLOBAL
    
    It would also record that x is at offset 0, y is at offset 1, z is at offset 2 and n is at offset 4. (Remember that z occupies two words.)

    To create a global array, you need to declare the size as an integer constant and declare the array using MS_INTEGER_ARRAY_GLOBAL or MS_REAL_ARRAY_GLOBAL. The constants are stored in the order in which they are written. However, integer constants are stored in one table, real constants in another one. So the first integer constant is number 0, the next integer constant number 1, etc. Similarly, the first real constant is real constant 0, the next is real constant 1, etc. Do not take into account the number of words occupied for the constants.

    To create an array of 12 integers, use the following.

         MS_INTEGER_CONSTANT
         '1'
         '2'
         '\0'
         MS_INTEGER_ARRAY_GLOBAL
         0
    
    where the number following MS_INTEGER_ARRAY_GLOBAL is the number of the integer constant (here presumed to be 0). To create an array of 15 real numbers use
         MS_INTEGER_CONSTANT
         '1'
         '5'
         '\0'
         MS_REAL_ARRAY_GLOBAL
         1
    
    where 1 is the constant number. Note that '1' is the Ascii code for the digit 1 (which is 49), but 1 is the binary number 1.

  4. When you declare local variables in a function, you need to allocate them. The simplest thing is to allocate them one at a time. For example, if you have local variables

        var x,y: integer;
        var a: array [2..11] of real;
            z: real
    
    then you would generate the following instructions, allocating each thing as it is encountered. Notice that 2 words are allocated for real variable z.
        M_ALLOC
        1
        M_ALLOC
        1
        M_PUSH_INTEGER
        10
        M_MAKE_REAL_ARRAY
        M_ALLOC
        2
    
    Do not allocate memory for parameters. They are pushed onto the stack by the calling function, and stay where they are. You do not need to store them in the function that is called.

  5. Do not treat compile read and write as function calls. Use M_READ_INTEGER, M_READ_REAL, M_WRITE_INTEGER and M_WRITE_REAL instructions for them. After a write call, print a newline character (Ascii 10).

  6. When you start a function, allocate a variable (say, number 0) to hold the function result. The function result is called the same name as the function in Pascal.

Implementation of the abstract machine

An implemenation for the abstract machine is available. Get the following files and compile the interpreter and disassembler, and the assembler if you want it. You will need to use the definitions in machinedefs.h in your translator. You may also feel free to use any of the definitions in machinetypes.h or instinfo.c.

If you compile the machine implementation and call the executable program machine, then
   machine test.m
will run the abstract machine on byte code file test.m. You can run the machine in trace mode using
   machine -t test.m
for a partial trace and
   machine -T test.m
for a fuller trace. To disassemble a file test.m and write a symbolic form of it in test.i, use
   disassemble test.m test.i
Using the disassembler is the recommended way of looking at byte code programs.

Sample translation

The text shows the following Pascal program.

  program example(input,output);
  var x,y:integer;
  function gcd(a,b: integer): integer;
  begin
    if b = 0 then gcd := a
    else gcd := gcd(b, a mod b)
  end;

  begin
    read(x,y);
    write(gcd(x,y))
  end.
This might be compiled as follows. What is shown here is the disassembled form, not the binary form. Comments follow semicolons. The compiler should produce the binary form, not the symbolic form.
MS_START _main
MS_INTEGER_GLOBAL             ;allocate x
MS_INTEGER_GLOBAL             ;allocate y

MS_FUNCTION gcd  
M_ALLOC 1                     ;allocate gcd (location to store result)
M_FETCH_PARAM_INTEGER 1       ;fetch b
M_PUSH_INTEGER 0              ;push 0
M_COMPARE
M_GOTO_IF_ZERO 1              ;jump to label 0 if b is not 0
M_FETCH_PARAM_INTEGER 0       
M_STORE_LOCAL_INTEGER 0       ;gcd := a
M_GOTO 0                      ;jump to label 1
M_LABEL 1
                              ;begin code for gcd(b, a mod b)
M_FETCH_PARAM_INTEGER 1       ;put the first parameter, b on the stack
M_FETCH_PARAM_INTEGER 0       ;push a
M_FETCH_PARAM_INTEGER 0       ;push a again
M_FETCH_PARAM_INTEGER 1       ;push b
M_INTEGER_MOD                 ;stack now holds parameters a, a mod b
M_CALL 2 gcd                  ;call gcd with 2 words of parameters
M_STORE_LOCAL_INTEGER 0       ;store result from call into local 0 (gcd)
M_LABEL 0
M_FETCH_LOCAL_INTEGER 0       ;fetch result variable (gcd)
M_RETURN_INTEGER              ;return the result
MS_END

MS_FUNCTION _main             ;begin the main program
M_READ_INTEGER                
M_STORE_GLOBAL_INTEGER 0      ;read x
M_READ_INTEGER
M_STORE_GLOBAL_INTEGER 1      ;read y
M_FETCH_GLOBAL_INTEGER 0      ;fetch first parameter x
M_FETCH_GLOBAL_INTEGER 1      ;fetch second parameter y
M_CALL 2 gcd                  ;compute gcd(x,y)
M_WRITE_INTEGER               ;write x
M_PUSH_INTEGER 10
M_WRITE_CHAR                  ;write a newline character
M_RETURN                      ;return from _main (no result)
MS_END