Due: April 1. Please turn in as assignment 3 using handin. Be sure to turn in all source files, including header files.
An abstract machine is an artifical machine language that is usually implemented by an interpreter. Abstract machines often have richer data structures than real computers, and the instructions tend to be more pleasant for a compiler to generate. The language of an abstract machine is often called a byte-code because each instruction begins with a byte that tells the instruction kind, and some instructions are followed by an integer number of bytes that add extra information. Java and Prolog are languages that are typically implemented using abstract machines and byte codes. See machine.html for a description of a byte code that is similar in spirit to the Java byte code, but is much smaller and simpler. File machinedefs.h contains definitions for the abstract machine.
Modify your parser for the C- language so that it translates the program into the byte code described in machine.html. This involves the following steps.
You will be adding semantic actions, and those actions will need to have some information about variables. The semantic actions will need to know each variable's type, and where each variable is stored.
Add a table to your parser that keeps track of information about each variable and each function. For each variable or function, you will need to remember the following information.
What is the type of the variable? It can be one of six types: integer, real, integer array, real array, function that returns integer or function that returns real. Do not try to keep track of the types of function parameters yet. We will deal with that later. Just have a numbering convention for the types. Thinking ahead, it would be a good idea if you do not hard-code into your program that the type is represented by an integer. Instead, try to use abstract notions whose implementations can easily be changed later.
Where is each variable stored? The abstract machine distinguishes three different kinds of variables: global variables, parameters and local variables. The table will need to tell which of these three kinds each variable is. It will also need to store a location for each variable. For example, a particular global variable might be the one at offset 3 among the global variables.
Modify the parser to put information into the table. Each variable declaration will make an entry in the table. Each function definition will also make an entry for the function.
When you end a function, all of the information that was added for the parameters and local variables of that function need to be removed from the table before starting the next function. Just keep a marker index in the array where the information about the current function starts. Also keep an index of the logical end of the table. When you exit a function, move the logical end back to the marker, effectively removing the local information. Should you do that before or after you add information about the function itself?
Think about how to translate each kind of expression and statement. What will the code look like? Then write semantic actions for the expression and statement nonterminals. Each expression nonterminal should produce a sequence of instructions that compute the value of that expression and leave that value on the top of the stack. Each statement nonterminal should produce a sequence of instructions that perform that statement.
Do not write the byte code directly to a file. Instead, put it into an array. Just allocate an array that is of reasonable size, and do not try to make the array grow. As you generate each instruction, add it to the end of the array. When you are done generating, copy the array to a file.
Each instruction has either no additional bytes, one additional byte or an additional byte and a string. Make your instruction array an array of structures, where each member can hold an instruction number, a byte parameter and a string parameter. That way, each instruction will be one row of the array. Write a function that reads the array and writes out the bytes sequentially in a form suitable for the byte code.
Normally the compiler needs to arrange for linkage between a program and libraries that contain standard functions. We will take a simpler approach here. There are only a few library functions. Just copy them to the front of the byte code file, so that they will be present in every file. Here are symbolic forms of the input and output functions.
MS_FUNCTION input M_READ_INTEGER M_RETURN_INTEGER MS_END MS_FUNCTION output M_FETCH_PARAM_INTEGER 0 M_WRITE_INTEGER M_PUSH_INTEGER 10 M_WRITE_CHAR M_RETURN MS_END
An implemenation for the abstract machine is available. Get the following files and compile the interpreter and disassembler, and the assembler if you want it. You will need to use the definitions in machinedefs.h in your translator. You may also feel free to use any of the definitions in machinetypes.h or instinfo.c.
machine test.mwill run the abstract machine on byte code file test.m. You can run the machine in trace mode using
machine -t test.mfor a partial trace and
machine -T test.mfor a fuller trace. To disassemble a file test.m and write a symbolic form of it in test.i, use
disassemble test.m test.iUsing the disassembler is the recommended way of looking at byte code programs.
The text shows the following C- program.
int gcd(int u, int v) { if(v == 0) return u; else return gcd(v, u - u/v*v); } void main(void) { int x; int y; x = input(); y = input(); output(gcd(x,y)); }This might be compiled as follows. What is shown here is the disassembled form, not the binary form. The compiler should produce the binary form.
MS_START main MS_FUNCTION input M_READ_INTEGER M_RETURN_INTEGER MS_END MS_FUNCTION output M_FETCH_PARAM_INTEGER 0 M_WRITE_INTEGER M_PUSH_INTEGER 10 M_WRITE_CHAR M_RETURN MS_END MS_FUNCTION gcd M_FETCH_PARAM_INTEGER 1 M_GOTO_IF_NOT_ZERO 0 M_FETCH_PARAM_INTEGER 0 M_RETURN_INTEGER M_LABEL 0 M_FETCH_PARAM_INTEGER 1 M_FETCH_PARAM_INTEGER 0 M_FETCH_PARAM_INTEGER 0 M_FETCH_PARAM_INTEGER 1 M_INTEGER_DIVIDE M_FETCH_PARAM_INTEGER 1 M_INTEGER_MULTIPLY M_INTEGER_SUBTRACT M_CALL 2 gcd M_RETURN_INTEGER MS_END MS_FUNCTION main M_ALLOC 2 M_CALL 0 input M_STORE_LOCAL_INTEGER 0 M_CALL 0 input M_STORE_LOCAL_INTEGER 1 M_FETCH_LOCAL_INTEGER 0 M_FETCH_LOCAL_INTEGER 1 M_CALL 2 gcd M_CALL 1 output M_RETURN MS_END