CSCI 3310
Summer 2006
Large Programming Assignment 1

Assigned: May 22
Due: May 28, 11:59pm

Read the entire assignment before you begin to implement it. You will almost certainly want to read it at least twice, and you will need to come back to it while writing your program.

This assignment will take time to write. Do not wait to start on it. Start right away, and try to finish early so that you have time if things do not go according to plan. You will encounter difficulties. If you do not understand something, ask about it.


Infix and postfix notation

Arithmetic expressions are commonly written in infix notation, where an operator is written between its operands. For example, expression 4+5 says to perform the addition operation (+) on operands 4 and 5.

Infix notation requires notions of precedence (indicating, for example, that multiplication has higher precedence than addition, so that 3+4*5 is 23, not 35) and associativity (indicating, for example, that subtractions are to be done from left to right, so that 10-3-2 is 5, not 9). It also normally requires a way to override the precedence rules by using parentheses. For example, expression 3+4*5 has value 23, but (3+4)*5 has value 35.

An alternative notation is postfix notation, where you write the operator after both of its operands. For example, you write 3 4 + to indicate the result of adding 3 and 4. This is the form used by stack machines, where, to add 3 and 4, you push 3 onto a stack, then push 4 onto the stack, then perform an addition instruction (which pops two numbers from the stack, adds them, and pushes their sum). It turns out that postfix notation does not require any notions of precedence or associativity, and does not require parentheses. For example, infix expression 3+(4*5) is equivalent to postfix expression 3 4 5 * +. (Try it using a stack by hand. When you encounter a number, push it onto the stack. Each operation pops two numbers from the stack, calculates its answer, and pushes the result onto the stack.) Infix expression (3+4)*5 is equivalent to postfix expression 3 4 + 5 *.

One disadvantage of postfix notation is the need for a way to know where one number ends and the next begins. We have used spaces for that. If you only use single-digit numbers, that is not a problem.


The assignment

Your assignment is to write a program that reads an expression in infix notation and prints an equivalent expression in postfix notation. The input expression occupies one line, and is ended by the end-of-line character ('\n').

The infix expressions that your program needs to handle involve single-digit integer constants, such as 3 and 7, single-letter variable names, binary operators +, -, * and /, and parentheses. Operators + and - have the same precedence, and are performed from right to left. Operators * and / have the same precedence, which is higher than the precedence of + and -, and they are also done from right to left. Spaces in the input are optional, and are ignored. The output is printed without spaces.

For example, if the input is

   (2+x) * (1+y)
then the program should print
  2x+1y+*

Where to find the input

The program should optionally take the name of a file on the command line. If no file name is given, then the program reads its expression from the standard input. If a file name is given, then the program opens the file by that name and reads the expression from that file. If the file does not exist, the program must print a sensible message indicating that the file could not be opened.

For example, if your program is called expression, then command

 expression
should read an expression from the standard input, but
 expression myfile
should read a program from file myfile.

If the first command line argument after the command begins with a dash (minus sign), then your program should skip over that argument. For example, command

 expression -v
should read from the standard input, and
 expression -v myfile
should read from file myfile.

Where to write the output

This program always writes the result on the standard output.


Requirements

There are many ways in which this program could be written. But this is intended to be an exercise in object-oriented programming. Subsequent sections describe how to implement this as an object-oriented translator. You are required to write this program using the approach described here, which employs classes, inheritance and virtual functions. Programs written using other approaches will receive no credit.


Grammars and parse trees

Expressions can be shown as trees, where each subexpression is a subtree. For example, expression (3+4)*5 is described by the following tree.

That kind of tree is short and clear, but it has the difficulties that (1) some of the symbols, such as + and *, are at internal nodes instead of at leaves, and (2) some of the input symbols, such as parentheses, are not in the tree at all.

Another way to show an expression as a tree that is closer to what we need here is to make all of the symbols that occur in the expression (including parentheses) be leaves of the tree, so that you can read the expression by reading the leaves from left to right. Think of E as standing for expression. The following tree is an alternative way to diagram expression (3+4)*5.

Each subtree represents a subexpression. Subtree

indicates an expression that is just the digit 5. Subtree

shows an expression that has the form of an expression followed by + followed by another expression.

If you look in the tree at how a node can be connected to its children, you see the following possibilities. Each has a rule with it (in red) that has the root label on the left-hand side, and the sequence of its children on the right-hand side.

The rules that describe the children that each node is allowed to have are collected together, and called a grammar. Taking the above rules, we get the following grammar.

  E -> digit
  E -> E + E
  E -> E * E
  E -> ( E )
Now you can build any tree that follows these rules. Expression 3+4*5 can actually be converted to a tree in two ways.

Notice that each internal node has children that are allowed by one of the rules in the grammar.

Having two different trees for the same expression is unpleasant, because it means that the grammar does not provide information about precedence and associativity. But it is possible to create a different grammar that allows the same expressions (but different trees), and that enforces rule of precedence and associativity. The new grammar is more complicated, but that is to be expected, since it gives you more information.

In addition to indicating precedence and associativity, we will make the new grammar handle subtraction and division as well as addition and multiplication.

In the new grammar, there are three different kinds of expressions: an E (expression) expression can be any expresion; a T (term) expression can only involve multiplications and divisions, except inside parentheses; and an F (factor) expression does not allow any operations, except inside parentheses. Phrase digit indicates any single digit, and letter indicates any single letter.

  E -> T
  E -> T + E
  E -> T - E
  T -> F
  T -> F * T
  T -> F / T
  F -> digit
  F -> letter
  F -> ( E )
Here is a tree for expression 3+4*5 using the new grammar. There is only one way to build a tree for this, or any other, expression, using the new grammar.

You should use the above grammar for expressions.


Representing the tree

Each node of a tree should be an object. Create a class for each kind (E, T or F) of expression. Call the classes Expression, Term and Factor.

Notice that a node labeled E can have three different forms, depending on which rule is used to create it. The three forms are as follows.

To handle this in the program, create three subclasses of Expression, one for each of the three rules for an E node. For example, suppose that the class that corresponds to rule E -> T + E is called ExpressionPlus. Then an ExpressionPlus object needs to hold two pointers, one to a Term object and one to an Expression object, as shown in the tree diagram. It does not need to remember the + symbol, since that is implied by the fact that rule E -> T + E was used.

Classes Expression, Term and Factor are abstract classes. You will never create an object that is just an Expression, but instead will create objects that belong to subclasses of Expression.


Writing the result

The problem of constructing the tree is dealt with below. For now, suppose that you already have the tree, given by a pointer to the object that represents the root of the tree. The goal is to write the expression out in postfix form. Every object that represents a node of the tree should have a method that prints the subtree rooted at that node, in postfix notation. Suppose that the method is called WritePostfix. How would an ExpressionPlus object write its subtree? It writes its left subtree, then its right subtree, then a + sign. So the implementation for WritePostfix for class ExpressionPlus is something like the following, where the two children are called leftChild (of type Term*) and rightChild (of type Expression*).

  void ExpressionPlus::WritePostfix()
  {
    leftChild->WritePostfix();
    rightChild->WritePostfix();
    cout << '+';
  }
You need to do say how to do WritePostfix for every kind of node object.

Notice that rightChild->WritePostfix( ) uses WritePostfix for class Expression. But WritePostfix is not implemented for class Expression, it is implemented for the subclasses of Expression, such as ExpressionPlus. That means that WritePostfix is a (pure) virtual function for class Expression.


Reading the input

You will find that you need to look at a single input character several times. Create an object that is responsible for reading the input. When such an object is created, it should be given an open file to read from (which it needs to remember), and it should read the first nonblank character, and remember that. (So there are two things to remember, the file and the last nonblank character read.) If your object is called reader, then

  reader.current()
should return the current character, without changing anything, and
  reader.getNext()
should read the next nonblank character, and remember that as the current character. (getNext will not return any result to its caller.)


Building the tree

You will want a class method (not an instance method or constructor), for each of the classes Expression, Term and Factor, that reads a string and returns a tree (as a pointer to the object at the root of the tree). Suppose that the method is called Parse. Then Expression::Parse(reader) should read an expression and return a pointer to a tree that represents the expression. Parse gets characters from the reader. Similarly, Term::Parse(reader) reads something that could be the sequence of leaves in a T tree, and returns a pointer to the root of the tree.

Whenever a Parse method begins, reader.current( ) is the first character of what Parse wants to read. Remember that you might look at a given character several times before throwing it away and getting the next character. There is a simple rule for when to use reader.getNext( ).

When you are processing an input character in one of the rules, call reader.getNext( ) to skip that character. Never try to skip a character that is part of another rule.

The Parse method needs to make decisions. For example, there are three rules for E,

  E -> T
  E -> T + E
  E -> T - E
so the Parse method needs to decide which one is appropriate, based on what it sees in the input. Notice that all of the rules with E on the left-hand side start with a T on their right-hand side. So Expression::Parse(reader) should start by calling Term::Parse(reader), to read a T. The character that comes after the T tells you what to do next. If it is a '+', then the rule must be E -> T + E. If the next character is a '-', then rule E -> T - E is appropriate. And if the next character is anything else, then use rule E -> T; that is, don't read any more in this call to Parse.

Here is an implementation of Expression::Parse. It takes the reader as a parameter. Let's assume that the reader has class Reader. I have made other presumptions about class names and constructors that should be clear.

  Expression* Expression::Parse(Reader& reader)
  {
    Term* tree1 = Term::Parse(reader);

    // E -> T + E

    if(reader.current() == '+') 
    {
      reader.getNext();            // skip the '+'
      Expression* tree2 = Expression::Parse(reader);
      return new ExpressionPlus(tree1, tree2);
    }

    // E -> T - E

    else if(reader.current() == '-') 
    {
      reader.getNext();           // skip the '-'
      Expression* tree2 = Expression::Parse(reader);
      return new ExpressionMinus(tree1, tree2);
    }

    // E -> T

    else 
    {
      return new ExpressionSimple(tree1);
    }
  }
Notice that Expression::Parse returns a pointer of type Expression*, according to its heading. In fact, it always returns a pointer to an object that belongs to a subclass of Expression.


Error handling

It is possible that the input is not an allowed expression. If you encounter a situation that makes no sense, just print that the expression is poorly formed, and do not produce any other output.

Normally, you are not allowed to use global variables for this course. I will, however, allow you to use one (and only one) global variable for this assignment. You can have a variable of type bool that tells whether there has been an error. When an error occurs, record it in this variable. If, after the parse, you see that there was an error, print an error message instead of trying to print the erroneous expression in postfix notation.


Modules

Break your program into modules, with a module for each of Expression, Term and Factor, another module for the Reader class, and a module for the main program. Each module should have two files, a header file and an implementation file. So there should be a total of ten files. The Expression module should describe and implement class Expression and all of its subclasses. Design the Term and Factor modules similarly. Ensure that modules are organized sanely. For example, do not put an implementation of a Reader method in the Expression module.

Using classes out of order

Classes Expression, Term and Factor need to refer to one another. For example, class ExpressionPlus needs a variable that is a pointer to an object of class Term. To avoid problems, briefly mention each class in the header file for the main program, and include that header file in all of the other header files. For example,

  class Expression;
does not define class Expression, but does allow you to use type Expression* before Expression is defined.

File Expression.h only needs to import the header file of the main program, since it only needs to know that class Term exists. File Expression.cpp needs to know more, since it needs to use terms. It should include Term.h.

Avoiding reading a file more than once

It is easy to find that your program reads a header file more than once, and that can cause problems. To avoid that, add some lines to the header files to cause them to be ignored the second and subsequent times they are read. For example, file Expression.h might look as follows.

#ifndef EXPRESSION_H
#define EXPRESSION_H
.... (rest of header file)
#endif
Use a different name for each header file. For example, in Term.h, use TERM_H.

Handling a global variable

Declare your error flag variable in just one file, such as the main program module. If the variable is called errorOccurred, then the header file for the main program should say

  extern bool errorOccurred;
to allow other modules to use this variable. The main program implementation module will say
  bool errorOccurred;
to create the variable. Do not write the variable in a header file without using the word extern, since that can cause multiple copies of the variable to be created.


Testing and debugging

Do not try to write the entire program, then test it. If you do, you probably will end up making the same mistake several times, and need to fix it several times. If there is an error, you will find it difficult to know where the error is.

Instead, implement the program in stages. Here is a suggestion of how to implement this program.

  1. Implement the reader class. Write a small tester that uses the reader class to read a few characters, and print them. Be sure to test reading an end-of-line character, and also test skipping spaces.

  2. Implement a small grammar, such as the following.

     E -> letter
     E -> letter + E
     E -> letter - E
    
    Create the subclasses. Implement the method that reads an expression and the method that prints the expression in postfix notation for this small grammar. Test this, and make sure that it works before moving on.

  3. Implement the full grammar above, with Expression, Term and Factor. By this time, you should understand what is going on well enough to do this. Test the program.


Extra credit

For extra credit, add the ability to evaluate an expression instead of printing it in postfix form. If the first command-line argument after the command name is -v, then read the expression, evaluate it and print the expression value. For example, if the command is

 expression -v 
and the input is
  (5+21)/2
then the program should print
  13

For this option, there should be no letters in the expression, only digits and operators. Use integer results. Make / be integer division. The result can be a multi-digit integer, and can be negative.

Do this by adding another method to each class that produces the value of the expression represented by a tree. Use that instead of WritePostfix when you see argument -v. Do not do a radical redesign of your program to make it able to evaluate expressions.

Only do this after the entire program is working, and you have time to add another feature.


Implementation notes

Input

You can use either the stdio or the iostream approach to input. If you use iostream, then store an object of class istream in the reader object. (The cin object has class istream. The class ifstream of an input file is a subclass of istream. That will allow you to build a reader that reads from cin or from a disk file.) You will need to use the get method of class istream; If object in has class istream (or one of its subclasses), then in.get(ch) reads the next character, and stores it into ch. Skip blanks by getting more characters until you see a nonblank character. If you use in >> ch, you will encounter problems, since that skips over all white-space characters, including newlines, and you do not want to ignore the end of a line.

Checking for letters and digits

If you include the cctype header file

#include <cctype>
using namespace std;
then you can use functions isalpha(c) and isdigit(c) to test whether character c is a letter or a digit, respectively.

The command line

To use the command line, declare your main program as follows.

 int main(int argc, char** argv)
 {
   ...
 }
Then argv will be an array of strings (where each string is given as a null-terminated array of characters) and argc is the number of strings in the argv array. If you invoke your program using command
  expression -v myfile
then argc will be 3, argv[0] will be "expression", argv[1] will be "-v" and argv[2] will be "myfile".

A Makefile

To build the program in Unix, you will want a Makefile. You can use this one. Put it in the same directory as the rest of your files. (Call it Makefile.) Then, to compile your program, use command

  make
That will recompile only what needs to be recompiled, based on the dependencies indicated in the makefile. Sometimes you want to rebuild everything from the ground up. To do that, do
  make clean
  make


Submitting your work

Put all of your files in a directory for this assignment. Before submitting, move any extraneous .h or .cpp files to a different directory. Log into one of the Unix computers in the lab, and change your directory to the directory that contains your assignment. For example, use command

 cd Assignment1
Now use the following command to submit the assignment.
 ~karl/3310/bin/submit L1 *.h *.cpp