CSCI 3310
Fall 2007
Large Programming Assignment 1

Assigned: September 13
Due: Wednesday, October 3, 11:59pm

Read the entire assignment before you begin to implement it. You will need to come back to it while writing your program.

This assignment will take time to write. Do not wait to start on it. Start right away, and try to finish early so that you have time if things do not go according to plan. You will encounter difficulties. If you do not understand something, ask about it.


Infix and postfix notation

Arithmetic expressions are commonly written in infix notation, where an operator is written between its operands. For example, expression 4+5 says to perform the addition operation (+) on operands 4 and 5.

Infix notation requires notions of precedence (indicating, for example, that multiplication has higher precedence than addition, so that 3+4*5 is 23, not 35) and associativity (indicating, for example, that subtractions are to be done from left to right, so that 10-3-2 is 5, not 9). It also normally requires a way to override the precedence rules by using parentheses. For example, expression 3+4*5 has value 23, but (3+4)*5 has value 35.

An alternative notation is postfix notation, where you write the operator after both of its operands. For example, you write 3 4 + to indicate the result of adding 3 and 4. This is the form used by stack machines, where, to add 3 and 4, you push 3 onto a stack, then push 4 onto the stack, then perform an addition instruction (which pops two numbers from the stack, adds them, and pushes their sum). Postfix notation does not require any notions of precedence or associativity, and does not require parentheses. For example, infix expression 3+(4*5) is equivalent to postfix expression 3 4 5 * +, but infix expression (3+4)*5 is equivalent to postfix expression 3 4 + 5 *. (Try them both using a stack by hand. When you encounter a number, push it onto the stack. Each operation pops two numbers from the stack, calculates its answer, and pushes the result onto the stack.)

One disadvantage of postfix notation is the need for a way to know where one number ends and the next begins. We have used spaces for that.


The assignment

Your assignment is to write a program that converts an expression from infix notation to postfix notation.

The infix expressions that your program needs to handle involve (decimal) integer constants, such as 3 and 72, single-letter variable names, binary operators +, -, * and /, and parentheses. Operators + and - have the same precedence, and are performed from right to left. Operators * and / have the same precedence, which is higher than the precedence of + and -, and they are also done from right to left. Spaces in the input are optional, and are ignored. The output is printed with one space between each part (including operators).

For example, if the input is

   (21+x) * (1+y)
then the program should print
  21 x + 1 y + *

Where to find the input

The string to be converted should be on the command line. Your program will be called Postfix.java. Command

 java Postfix "2+3*45"
should print
  2 3 45 * +

If the first command line argument after the command begins with a dash (minus sign), then your program should skip over that argument. For example, command

 java Postfix -v "2+3*4"
should also print
  2 3 45 * +

Where to write the output

This program always writes the result on the standard output (System.out).


Requirements

Call the class that contains the main function Postfix, and call the file that contains it Postfix.java. Please not that Postfix starts with an upper case P. If the shift key on your keyboard is broken, please get a new keyboard.

There are many ways in which this program could be written. But this is intended to be an exercise in object-oriented programming. Subsequent sections describe how to implement this as an object-oriented translator. You are required to write this program using the approach described here, which employs classes, inheritance and abstract methods. Programs written using other approaches will receive no credit.


Grammars and parse trees

Expressions can be diagrammed as trees, where each subexpression is a subtree. For example, expression (3+4)*5 is described by the following tree.

That kind of tree is short and clear, but it has the difficulties that (1) some of the symbols, such as + and *, are at internal nodes instead of at leaves, and (2) some of the input symbols, such as parentheses, are not in the tree at all.

Another way to show an expression as a tree that is closer to what we need here is to make all of the symbols that occur in the expression (including parentheses) be leaves of the tree, so that you can read the expression by reading the leaves from left to right. Think of E as standing for expression. The following tree is an alternative way to diagram expression (3+4)*5.

Each subtree represents a subexpression. Subtree

indicates an expression that is just the number 5. Subtree

shows an expression that has the form of an expression followed by + followed by another expression.

If you look in the tree at how a node can be connected to its children, you see the following possibilities. Each has a rule with it (in red) that has the root label on the left-hand side, and the sequence of its children on the right-hand side.

The rules that describe the children that each node is allowed to have are collected together, and called a grammar. Taking the above rules, we get the following grammar.

  E -> number
  E -> E + E
  E -> E * E
  E -> ( E )
You can build any tree that follows those rules. Expression 3+4*5 can actually be converted to a tree in two ways.

Notice that each internal node has children that are allowed by one of the rules in the grammar.

Having two different trees for the same expression is unpleasant, because it means that the grammar does not provide information about precedence and associativity. But it is possible to create a different grammar that allows the same expressions (but different trees), and that enforces rules of precedence and associativity. The new grammar is more complicated, but that is to be expected, since it gives you more information.

In addition to indicating precedence and associativity, we will make the new grammar handle subtraction and division as well as addition and multiplication.

In the new grammar, there are three different kinds of expressions: an E (expression) expression can be any expresion; a T (term) expression can only involve multiplications and divisions, except inside parentheses; and an F (factor) expression does not allow any operations, except inside parentheses. Phrase number indicates any integer constant (a sequence of one or more decimal digits), and letter indicates any single letter.

  E -> T
  E -> T + E
  E -> T - E
  T -> F
  T -> F * T
  T -> F / T
  F -> number
  F -> letter
  F -> ( E )
Here is a tree for expression 3+4*5 using the new grammar. There is only one way to build a tree for this, or any other, expression, using the new grammar.

For this assignment, use the above grammar for expressions.


Representing the tree

Each node of a tree should be an object. Create a class for each kind of expression (E, T or F). Call the classes Expression, Term and Factor.

Notice that a node labeled E can have three different forms, depending on which rule is used to create it. The three forms are as follows.

To handle this in the program, create three subclasses of Expression, one for each of the three rules for an E node. For example, suppose that the class that corresponds to rule E -> T + E is called ExpressionPlus. Then an ExpressionPlus object needs to hold two pointers, one to a Term object and one to an Expression object, as shown in the tree diagram. It does not need to remember the + symbol, since that is implicit in the fact that rule E -> T + E was used.

Classes Expression, Term and Factor are abstract classes. You will never create an object that is just an Expression, but instead will create objects that belong to subclasses of Expression.


Writing the answer

The problem of constructing the tree is dealt with below. For now, suppose that you already have the tree, given by the object that represents the root of the tree. The goal is to write the expression out in postfix form. Every object that represents a node of the tree should have a method that prints the subtree rooted at that node, in postfix notation. Suppose that the method is called WritePostfix. How would an ExpressionPlus object write its subtree? It writes its left subtree, then its right subtree, then a + sign. So the implementation for WritePostfix for class ExpressionPlus is something like the following, where the two children are called leftChild (of class Term) and rightChild (of class Expression).

  public void WritePostfix()
  {
    leftChild.WritePostfix();
    System.out.println(" ");
    rightChild.WritePostfix();
    System.out.println(" + ");
  }
You need to do say how to do WritePostfix for every kind of object that can be a node in a tree.

Notice that rightChild.WritePostfix( ) uses WritePostfix for class Expression. But WritePostfix is not implemented for class Expression, it is implemented for the subclasses of Expression, such as ExpressionPlus. That means that WritePostfix is an abstract method for class Expression.


Reading the input

You will find that you need to look at a single input character several times. Create an object, which we will call the lexer, that is responsible for reading the input (a string). When such an object is created, it should be given the string to read, and it should prime the pump with the first thing. A "thing" in the input can be either a number (one or more characters) or a single letter or an operator character. A simple way to handle that is to remember two things about the current "thing": a character (the character read, or a particular digit to stand for a number) and, in the case of a number, the value of the number. Before getting the next thing, the lexer should skip over blanks. If your lexer is called lex, then

  char c = lex.current();
should make c be the current character, and
  long v = lex.currentValue();
should make v be the value of an integer, when the current thing is an integer. (Make currentValue produce a long integer, to allow fairly large integers.) Getting the current value several times in a row should produce the same thing every time. To advance to the next thing, you should do statement
  lex.getNext()


Building the tree

You will want a class method (not an instance method or constructor), for each of the classes Expression, Term and Factor, that reads a string and returns a tree (as the object at the root of the tree). Suppose that the method is called Parse. Then Expression.Parse(lex) should read an expression and return a pointer to a tree that represents the expression. Parse gets characters from the lexer. Similarly, Term.Parse(lex) reads something that could be the sequence of leaves in a T tree, and returns the root of the tree.

Whenever Parse(lex) begins, lexcurrent( ) is the first thing that Parse wants to see. Remember that you might look at a given character several times before throwing it away and getting the next character. There is a simple rule for when to use lex.getNext( ).

When you are processing a character or number that occurs in one of the rules, call lex.getNext( ) to skip that character. Never try to skip a character that is part of a different rule.

The Parse method needs to make decisions. For example, there are three rules for E,

  E -> T
  E -> T + E
  E -> T - E
so the Parse method needs to decide which one is appropriate, based on what it sees in the input. Notice that all of the rules with E on the left-hand side start with a T on their right-hand side. So Expression.Parse(lex) should start by calling Term.Parse(lex), to read a T. The character that comes after the T tells you what to do next. If it is a '+', then the rule must be E -> T + E. If the next character is a '-', then rule E -> T - E is appropriate. And if the next character is anything else, then use rule E -> T; that is, don't read any more in this call to Parse.

Here is an implementation of Expression.Parse. Let's assume that the lexer has class Lex. I have made other presumptions about class names and constructors that should be clear.

  public static Expression Parse(Lex lex)
  {
    Term tree1 = Term.Parse(lex);

    // E -> T + E

    if(lex.current() == '+') 
    {
      lex.getNext();            // skip the '+'
      Expression tree2 = Expression.Parse(lex);
      return new ExpressionPlus(tree1, tree2);
    }

    // E -> T - E

    else if(lex.current() == '-') 
    {
      lex.getNext();           // skip the '-'
      Expression tree2 = Expression.Parse(lex);
      return new ExpressionMinus(tree1, tree2);
    }

    // E -> T

    else 
    {
      return new ExpressionSimple(tree1);
    }
  }
Notice that Expression.Parse returns an object of class Expression, according to its heading. In fact, it always returns an object that is an instance of a subclass of Expression.


Error handling

It is possible that the input is not an allowed expression. If the program encounters a situation that makes no sense, it should just print that the expression is poorly formed, and should not produce any other output.


Testing and debugging

Do not try to write the entire program, then test it. If you do, you probably will end up making the same mistake several times, and need to fix it several times. If there is an error, you will find it difficult to know where the error is.

Instead, implement the program in stages. Here is a suggestion of how to implement this program.

  1. Implement the class for the lexer. Write a small tester that uses a lexer to read a few things, and print them. Be sure to test handling the end of the string and also test skipping spaces.

  2. Implement a small grammar, such as the following.

     E -> letter
     E -> letter + E
     E -> letter - E
    
    Create the subclasses. Implement the method that reads an expression and the method that prints the expression in postfix notation for this small grammar. Test this, and make sure that it works before moving on.

  3. Implement the full grammar above, with Expression, Term and Factor. By this time, you should understand what is going on well enough to do this. Test the program.


Extra credit

For extra credit, add the ability to evaluate an expression instead of printing it in postfix form. If the first command-line argument is -v, then read the expression, evaluate it and print the expression value. For example, if the command is

 java Postfix -v "(5+21)/2"
then the program should print
  13

For this option, there should be no letters in the expression, only digits, operators and parentheses. Make / be integer division so that the results are always integers.

Do this by adding another method to each class that produces the value of the expression represented by a tree. Use that instead of WritePostfix when you see argument -v. Do not do a radical redesign of your program to make it able to evaluate expressions. Just add another capability to each class.

Only do this after the entire program is working, and you have time to add another feature. If the program does not handle conversion to postfix correctly, then you will not receive credit for this extra credit part.


Submitting your work

You can put all classes in one file or put each class in a separate file. (If you use separate files, just compile them all together in a single compilation.) Log into one of the Unix computers in the lab, and change your directory to the directory that contains your assignment. For example, use command

 cd Assignment3
Now use the following command to submit all .java files in the current directory.
 ~karl/3310/bin/submit L1 *.java