Computer Science 4627
Parser

Due: Feb 28, end of day. Please email me your assignment. See below for what to turn in.

  1. Convert the C- grammar of appendix A to a form that can be parsed top-down. Note. The grammar treats mulop, addop and relop as nonterminals. You should treat them as tokens instead. Your lexical analyzer should return the same token for + and -, with an attribute that says which one it is.

  2. Find the First and Follow sets for the nonterminals in the modified grammar.

  3. Modify your lexical analyzer to keep track of the current line number. It should store the current line in a global variable. (Just add one to that variable when you see a newline character.)

  4. Write a recursive descent parser for the modified grammar. The parser should use the lexer that you have created to read tokens. It should print the productions used in a leftmost derivation of the input.


Requirement

Write the parser in C. Use recursive descent. That means there must be one function per nonterminal.

Just before each function, in a comment, list the productions for the nonterminal to which the function corresponds. Also show the First and Follow sets for that nonterminal.

Ensure that the function definition corresponds closely to the productions.

The parser should not need to look at the attributes of any tokens in order to carry out the parse. The attributes are strictly for the semantic phase, which comes later.

The parser and lexer should be linked. Do not include the source code of the lexer in the parser. Instead, create a header file that describes what the lexer provides, and include that. You can put this into file tokens.h. For example, in addition to listing the token numbers, include

  int yylex(void);

  typedef union 
  {
    ...
  }
  ATTRIBUTE;

  extern ATTRIBUTE attr;
in tokens.h


Error reporting

Keep error reporting to a minimum. If the input is not syntactically correct, your program should print a message telling the line number where the error occurred, and then stop the parse.


What to turn in

Turn in your parser and your lexer, in the following format.

  1. The parser should start by showing the entire modified grammar, with the productions numbered.

  2. Create a global variable to hold the current lookahead token, and a function match that checks whether the lookahead is equal to a given token, and gets the next token.

  3. Next should be the functions that correspond to the nonterminals, with a comment before each one showing the relevant productions and the First and Follow sets.

  4. Last, include a main program. It should start by setting the lookahead to yylex(), and then should call the function that corresponds to the start nonterminal in the grammar.