Computer Science 4627
Translation

If you approach this in a disciplined manner, working through it a step at a time, you will probably find that the steps are not difficult, and it takes less work than you initially expect it to. If you have questions, please ask.

Modify your compiler by adding code generation for the abstract machine.

General issues

Write functions to generate instructions. For example, write a function that generates a one-byte instruction, another function to generate a two-byte instruction, a a function to generate a call instruction, and a function to generate a string.

A good idea is to pass the instruction number to the one-byte and two-byte generate functions. For example,

  gen1(M_INTEGER_ADD);

would generate an integer add instruction. (Notice that M_INTEGER_ADD is a number, since it is defined to be an integer.) Now gen1 can either generate a single byte containing the number M_INTEGER_ADD, or it can write a line of text containing "M_INTEGER_ADD". All that you need to generate text is an array instructionNames so that instructionNames[i] is the name (a string) of instruction i. So you can have a flag that tells the compiler whether to generate a text or binary file. You should only need to check the flag in the gen... functions.

Write your gen... functions to write information into a file.

Remark on testing

PAY ATTENTION:

You will need to perform tests. I strongly recommend that you make your tests run automatically. Create a script or makefile that runs not just one test, but all of your tests. Each time you create a test, add it to your test suite. You should be able to run all of the tests easily, without nursing it along.

Do not presume that, once a test works, it should be thrown away. Later modifications that you make can cause tests that used to work to fail. You want to be able to run them all after you have made some modifications.

A good idea is to write the results of your tests in a file. Then keep that file. The next time you run the tests, you will want to compare the results to previous tests. The Unix diff utility is useful for that.

Self-checking tests are the best. You might find them difficult to do, but if you can figure out how to make a test check itself, do it.

Code generation, part 1

Think about adding the following to your compiler.

Expressions involving integer constants. Handle operators such as addition and multiplication. Do not handle relational operators yet.
Input and output function calls. (These are handled by instructions in the machine that do input and output.) Do not try to implement general function calls yet.
Compound statements (without any local declarations).
Expression statements.
You should be able to compile a definition of main, without any local variables. So handle a function without parameters, and with a void return type.

Before you start on the changes, write some tests that stay within this subset. Try to test all of the kinds of things that are part of the subset.

Now modify some of your parsing functions to generate code for this subset.

Run your tests. Run the interpreter to see whether it correctly runs your programs. If a test does not work, look at the code that is being generated, and fix the problem.

Note
You are only generating code for a restricted part of the language. Do not remove handling of other parts of the language from your compiler. For example, you are not handling local declarations inside compound statements. But leave processing of local declarations in place. Just do not try to use local declarations in your tests, since your are not ready to handle them.

Note
Be careful about different sizes of constants. The M_PUSH_INTEGER instruction can only handle integers from 0 to 255. Your program should be able to handle larger constants. To do that, create a table of constants. When you need a constant that is larger than 255, check if it is already in the table. If not, add it. At the end, just write out constant definitions for all of the constants in your constant table.

Note

Be careful about different sizes of constants. The M_PUSH_INTEGER instruction can only handle integers from 0 to 255. Your program should be able to handle larger constants. To do that, create a table of constants. When you need a constant that is larger than 255, check if it is already in the table. If not, add it. At the end, just write out constant definitions for all of the constants in your constant table.

Note
An expression statement means to evaluate the expression and then to ignore its result. An expression leaves its result on the stack. To ignore the result, you want to pop it from the stack. So generate a M_POP_INTEGER instruction. But this means that every expression must leave its result on the stack, including output expressions. Since the M_WRITE_INTEGER and M_WRITE_CHAR instructions do not push a value on the stack, you can artificially push a 0 as their value.

Note

An expression statement means to evaluate the expression and then to ignore its result. An expression leaves its result on the stack. To ignore the result, you want to pop it from the stack. So generate a M_POP_INTEGER instruction.

But this means that every expression must leave its result on the stack, including output expressions. Since the M_WRITE_INTEGER and M_WRITE_CHAR instructions do not push a value on the stack, you can artificially push a 0 as their value.

Note
Be sure that main ends by performing an M_RETURN instruction.

Note
Remember that you need to write out the name of the start function using MS_START.

Code generation, part 2

Now add code generation for the following.

Global variable declarations for integer variables (not arrays).
Expressions involving integer variables.
Assignment expressions to simple variables (not arrays).

Before you start to make the changes, write some tests that employ this larger subset. Then make the modifications, and run the tests.

Note
Remember to generate MS_INTEGER_GLOBAL instructions for the global variables, to allocate them.

Note
An assignment expression produces a value, namely, the value that was just assigned. You can handle that by duplicating the value before doing an M_STORE... instruction, so that the value will be on the stack afterwards. That way, an assignment expression, like every other expression, produces a value.

Code generation, part 3

Add code generation for the following.

selection (if) statements.
iteration (while) statements.
relational operators

Before you start, write some tests. Try everything that is being added. Consider asking a friend for additional tests. Feel free to swap tests with classmates.

Make the modifications and run the tests. Fix any problems.

Code generation, part 4

Now add the following.

Function definitions.
Function calls.
Return statements.
Local variables in functions. Do not handle arrays. You will need to generate code to allocate space for the variables when you enter a compound statement and to deallocate the variables when you are at the end of a compound statement.

Before you start create tests that will test the things that are to be added. Then make the changes and run the tests. Fix any problems.

Note
A function whose return type is void does not produce a value. But if you are assuming that every expression leaves its value on the top of the stack, you will want calls to void functions also to leave a value on top of the stack. You can just push a 0 after calling them.

Code generation, part 5

Now add handling of arrays. That includes the following.

Declaration of global arrays. You will need to generate code to create them.
Declaration of local arrays. You will need to do two things to create a local array. You need to allocate space to hold a pointer to the array, and you need to perform an instruction that builds the actual array and stores a pointer to it into the variable.
Fetching of a value from an array. That is, handle a[i] as an expression.
Assigning a value to an array member. That is, handle a[i] = E.

Before you start create some tests. Then make the modifications and run the tests.

Code generation, part 6 (extra credit)

Modify your gen.. functions so that they write the instructions into an array, one instruction per array index. The instructions should be stored in the array in a form, such as a structure, where all of the information is available.

At the end of a declaration, you can write the contents of the array out in either binary or assembler-language form.

But before writing it out, perform a simple peephole optimization stage. You compiler will tend to generate

  M_PUSH_INTEGER 0
  M_POP_INTEGER

which is useless and can be removed, and

  M_DUP_INTEGER
  M_STORE_LOCAL_INTEGER k
  M_POP_INTEGER

which is equivalent to

  M_STORE_LOCAL_INTEGER k

So scan through the code array looking for those and replacing them. It is useful to have an instruction form in your array that represents a null instruction, which is skipped when writing out the code.

Test this. Run all of your tests to see if they are still working.

Computer Science 4627 Translation