CSCI 3310
Spring 2008
Large Programming Assignment 1

Assigned: February 8
Due: Monday, February 25, 11:59pm.

Read the entire assignment before you begin to implement it. You will need to reread parts while writing your program.

This assignment will take time to write. Do not wait to start on it. Start right away, and try to finish early so that you have time if things do not go according to plan. You will encounter difficulties. If you do not understand something, ask about it.


Infix, prefix and postfix notation

Arithmetic expressions are commonly written in infix notation, where an operator is written between its operands. For example, expression 4+5 says to perform the addition operation (+) on operands 4 and 5.

Infix notation requires notions of precedence (indicating, for example, that multiplication has higher precedence than addition, so that 3+4*5 is 23, not 35) and associativity (indicating, for example, that subtractions are to be done from left to right, so that 10-3-2 is 5, not 9). It also normally requires a way to override the precedence rules by using parentheses. For example, expression 3+4*5 has value 23, but (3+4)*5 has value 35.

An alternative notation is postfix notation, where you write the operator after both of its operands. For example, you write 3 4 + to indicate the result of adding 3 and 4. This is the form used by stack machines, where, to add 3 and 4, you push 3 onto a stack, then push 4 onto the stack, then perform an addition instruction (which pops two numbers from the stack, adds them, and pushes their sum). Postfix notation does not require any notions of precedence or associativity, and does not require parentheses. For example, infix expression 3+(4*5) is equivalent to postfix expression 3 4 5 * +, but infix expression (3+4)*5 is equivalent to postfix expression 3 4 + 5 *. (Try them both using a stack by hand. When you encounter a number, push it onto the stack. Each operation pops two numbers from the stack, calculates its answer, and pushes the result onto the stack.)

Prefix notation has you write an operator before its operands. For example, infix expression 3 + 4 is written + 3 4 in prefix, and infix expression 3 + (4 * 5) is written + 3 * 4 5.

One disadvantage of prefix and postfix notation is the need for a way to know where one number ends and the next begins. We have used spaces for that.


The assignment

Your assignment is to write a program that converts an expression from infix notation to either prefix or postfix notation, the form chosen by the program's user.

The infix expressions that your program needs to handle involve (decimal) integer constants, such as 3 and 72, single-letter variable names, binary operators +, -, * and /, and parentheses. Operators + and - have the same precedence, and are performed from right to left. Operators * and / have the same precedence, which is higher than the precedence of + and -, and they are also done from right to left. Spaces in the input are optional, and are ignored. The output is printed with one space between each part (including operators).

The input begins either with prefix or postfix, indicating the desired output form. For example, if the input is

   postfix (21+x) * (1+y)
then the program should print
  21 x + 1 y + *

Where to find the input

The string to be converted should be on the command line. Your program will be called Convert.java. Command

 java Convert "postfix 2+3*45"
should print
  2 3 45 * +
The command line can have several parts, separated by spaces, and your program needs to consider them all. For example,
  java Convert postfix 2 + 3 * 45
should also print
  2 3 45 * +
The command line arguments are given to the main program as the array parameter, args. For example, command
  java Convert prefix 2 + 3*4
will call main with args[0] = "prefix", args[1] = "2", args[2] = "+" and args[3] = "3*4". The command is normally broken at white space. Quotes override that. So command
 java Convert "postfix 2+3*45"
calls main with args[0] = "postfix 2+3*45".

Where to write the output

This program always writes the result on the standard output (System.out).


Requirements

Call the class that contains the main function Convert, and call the file that contains it Convert.java. Please not that Convert starts with an upper case C. If the shift key on your keyboard is broken, please get a new keyboard.

There are many ways in which this program could be written. But this is intended to be an exercise in object-oriented programming. Subsequent sections describe how to implement this as an object-oriented translator. You are required to write this program using the approach described here, which employs classes, inheritance and abstract methods. Programs written using other approaches will receive no credit.


Semantic trees

Expressions can be diagrammed as trees, where each subexpression is a subtree. For example, expression (3+4)*5 is described by the following tree.

Each node of a semantic tree should be an object. Create an abstract class Expression, with a subclass for each kind of node that can occur in the tree.

You will want a semantic tree to be able to print itself out in either prefix or postfix notation. So add abstract methods printPrefix and printPostfix to class Expression, and implement those methods in each of the subclasses of Expression.


Lexers

You will want a tool that reads components of a string. A component can be a single characters, such as ( and x, or it can be several characters, such as 354. Write a class, Lexer, the contains three methods, skip() and current() and currentInt(). The idea is that current() returns a character describing the current component. (For x, it could just return 'x'. For a number, it might return '0'. Method currentInt() should only be called when current() indicate the the current thing is a number, and it returns the numbers. Method skip() skips ahead to the next token, so that, after calling skip(), you can call current() to get that token.


Grammars and parse trees

A semantic tree is short and clear, and is useful for processing, but it is not very good for syntactic processing, where you are reading in the input understanding what it means. Two difficulties with semantic trees (1) some of the symbols, such as + and *, are at internal nodes, while others are at leaves, and (2) some of the input symbols, such as parentheses, are not in the semantic tree at all.

Another way to show an expression as a tree is to use a parse tree, where all of the symbols that occur in the expression (including parentheses) appear in leaves of the tree. You can read the expression from a parse tree by reading the leaves from left to right. Think of E as standing for expression. The following is a conceivable parse tree for expression (3+4)*5.

Each subtree represents a subexpression. Subtree

indicates an expression that is just the number 5. Subtree

shows an expression that has the form of an expression followed by + followed by another expression.

If you look in the tree at how a node can be connected to its children, you see the following possibilities. Each has a rule with it (in red) that has the root label on the left-hand side, and the sequence of its children on the right-hand side.

The rules that describe the children that each node is allowed to have are collected together, and called a grammar. Taking the above rules, we get the following grammar.

  E -> number
  E -> E + E
  E -> E * E
  E -> ( E )
You can build any tree that follows those rules. Expression 3+4*5 can actually be converted to a tree in two ways.

Notice that each internal node has children that are allowed by one of the rules in the grammar.

Having two different trees for the same expression is unpleasant, because it means that the grammar does not provide information about precedence and associativity. But it is possible to create a different grammar that allows the same expressions (but different parse trees), and that enforces rules of precedence and associativity. The new grammar is more complicated, but that is to be expected, since it gives you more information.

In addition to indicating precedence and associativity, we will make the new grammar handle subtraction and division as well as addition and multiplication.

In the new grammar, there are three different kinds of expressions: an E (expression) expression can be any expresion; a T (term) expression can only involve multiplications and divisions, except inside parentheses; and an F (factor) expression does not allow any operations, except inside parentheses. Phrase number indicates any integer constant (a sequence of one or more decimal digits), and letter indicates any single letter.

  E -> T
  E -> T + E
  E -> T - E
  T -> F
  T -> F * T
  T -> F / T
  F -> number
  F -> letter
  F -> ( E )
Here is a tree for expression 3+4*5 using the new grammar. There is only one way to build a tree for this, or any other, expression, using the new grammar.

For this assignment, use this (unambiguous) grammar for expressions.


Parsing

Your parser will convert a string into a semantic tree. Write a class Parser with a method for each of E, T and F. The result of each method should be the tree that the string describes, or null if there is an error.


Error handling

It is possible that the input is not an allowed expression. If the program encounters a situation that makes no sense, it should just print that the expression is poorly formed, and should not produce any other output.


Testing and debugging

Do not try to write the entire program, then test it. If you do, you probably will end up making the same mistake several times, and need to fix it several times. If there is an error, you will find it difficult to know where the error is.

Instead, implement the program in stages. Here is a suggestion of how to implement this program.

  1. Create the semantic tree classes. Write the printPostfix and printPrefix functions. Add another method, just for debugging, that prints a tree in a form so that you can see the structure of the tree. A good idea is a function printtree(t,n) that prints tree t, indented n spaces. Print the first line with that indent, but print the subtrees indented a little further. For example, tree

    might be printed as follows for debugging.

      *
        +
          3
          4
        5
    
    Build some trees explicitly, by calling the constructor. Print them using debug prints. Print each in prefix and postfix notation. Is everything working?

  2. Write a lexer class. Make the constructor take a string to read, and be sure to make the constructor initialize the lexer to read that string. Do not make the lexer handle the initial word prefix or postfix. Test it by giving the lexer a string constant to work on. Successively get components from the string and print them. Were they obtained correctly? Do not be satisfied with one easy test. Try using numbers with a few digits in them. Be sure to use all of the allowed operators, and be sure to use a few variable names. Try an input with spaces and one without.

  3. Implement the grammar. Make the parsing functions produce trees. Write a main program that parses a string and prints the tree using the debug printer. Does the tree look right? Try several examples. You will want to store the lexer in the Parser class. That does not mean to write the lexer class inside the parser class, and it does not mean that either the parser is a subclass of the lexer or the other way around. Just put a variable in the parser class that holds the lexer. When you construct the parser, give it a lexer to use.

  4. Add an initial stage that puts the command line together into a single string. Test that.

  5. Get rid of the debug print, and handle the initial word prefix or postfix. Get the initial word before sending the expression to the lexer. Make the program do the job it is intended to do. Do not perform major surgery on your program at this point. You should be making minor modifications here.


Submitting your work

You can put all classes in one file or put each class in a separate file. (If you use separate files, just compile them all together in a single compilation.) If your files start with a package line, please comment out that line. Log into one of the Unix computers in the lab, and change your directory to the directory that contains your assignment. For example, use command

 cd Assignment3
if that is where your files are. Now use the following command to submit all .java files in the current directory.
 ~karl/3310/bin/submit L1 *.java