Computer Science 4627
Lexical Analyzer

We will implement a compiler for the C- language described in Appendix A of the text. For this assignment, use Flex to create a lexical analyzer for C-.

Woe to the student who does not follow the instructions.


The assignment

  1. Create a file called tokens.h that defines the tokens. Identify the tokens, and create a name for each one. Then define each to be an integer, using a #define line. Your file might start something like this.

      #define T_INT 1  /* int */
      #define T_IF  2  /* if */
      ...
      
    Use a positive integer for each token number. Give a single token to each group of operators that share the same precedence and associativity. So, for example, define one T_MULOP token to handle both * and /, not two separate tokens. Use an attribute to tell the difference between the two.

  2. Some tokens have attributes. Decide what the attribute of each token that has an attribute will be. Add information about attributes to tokens.h, in comments, so that it is not buried in the program.

    Your attributes do not all need to have the same type. Create a union type for the attributes. Your type might be something like this.

      typedef char* STRING;
      typedef union
      {
        STRING string_attr;
        int    int_attr;
      }
      ATTRIBUTE;
      
    Add a line for each type of attribute that you use.

  3. Now create the lexical analyzer using Flex. It should use a global variable (of type ATTRIBUTE) to hold the attribute. Make yylex() return the token number. Use token names in the lexical analyzer, not token numbers. So you might see

        return T_IF;
      
    but not
        return 2;
      

  4. Test your lexical analyzer. Write a C main program that just calls yylex() repeatedly until yylex() returns 0. For each token, the main program should print the lexeme, the token number and the attribute. (To print the attribute, look at the token to decide what kind of attribute, if any, each token has.) Run the tester on an example file. Check the results. Is it working?
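
The steps above can be sketched as follows. This is only an illustration under stated assumptions, not the required solution: the token numbers are hypothetical, the global is assumed to be named attribute, and yylex and yytext are stand-ins that replay the tokens of "int x * 42" so that the sketch compiles without Flex. In a real tester, Flex supplies yylex and yytext.

```c
#include <stdio.h>

/* Hypothetical token numbers; a real tokens.h lists every C- token. */
#define T_INT   1  /* int                                      */
#define T_ID    2  /* identifier; attribute: string_attr       */
#define T_NUM   3  /* number;     attribute: int_attr (value)  */
#define T_MULOP 4  /* * or /;     attribute: int_attr (char)   */

typedef char* STRING;
typedef union
{
  STRING string_attr;
  int    int_attr;
}
ATTRIBUTE;

ATTRIBUTE attribute;      /* assumed name of the global attribute variable */
const char* yytext = "";  /* stand-in; Flex declares its own yytext        */

/* Stand-in for the Flex-generated scanner: replays "int x * 42". */
int yylex(void)
{
  static int next = 0;
  static char lexemes[][4] = { "int", "x", "*", "42" };
  static const int tokens[] = { T_INT, T_ID, T_MULOP, T_NUM };

  if (next >= 4)
    return 0;                                   /* 0 means end of input */
  yytext = lexemes[next];
  switch (tokens[next]) {
    case T_ID:    attribute.string_attr = lexemes[next]; break;
    case T_MULOP: attribute.int_attr = '*';              break;
    case T_NUM:   attribute.int_attr = 42;               break;
  }
  return tokens[next++];
}

/* Writes "lexeme token attribute" into buf, using the token number
   to decide which member of the union (if any) is meaningful. */
void format_token(char* buf, size_t size, int token)
{
  switch (token) {
    case T_ID:
      snprintf(buf, size, "%s %d %s", yytext, token, attribute.string_attr);
      break;
    case T_NUM:
      snprintf(buf, size, "%s %d %d", yytext, token, attribute.int_attr);
      break;
    case T_MULOP:
      snprintf(buf, size, "%s %d %c", yytext, token, attribute.int_attr);
      break;
    default:                                    /* token has no attribute */
      snprintf(buf, size, "%s %d", yytext, token);
      break;
  }
}

/* The loop a tester's main would run: call yylex() until it returns 0.
   Returns the number of tokens printed. */
int run_tester(FILE* out)
{
  char line[128];
  int token, count = 0;

  while ((token = yylex()) != 0) {
    format_token(line, sizeof line, token);
    fprintf(out, "%s\n", line);
    count++;
  }
  return count;
}
```

A main program for this sketch would only need to call run_tester(stdout). Keeping the choice of union member in one function means a new token with an attribute needs just one more case.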


Turning in the assignment

I will give you instructions for turning in this assignment shortly.


Printing in C

You cannot use the C++ object cout, or the stream classes, in C. You will find printf convenient for printing things. The form is

  printf(format, arg1, arg2,...);
where format is a string that describes what the result should be. The format contains conversion specifications, each starting with a %, that call for inserting the corresponding arguments. Some conversion specifications are as follows.

  %d    A decimal integer
  %8d   A decimal integer in 8 characters (padding with blanks on the left as necessary)
  %s    A string (type char*)

The statement

  printf("There are %d dogs in my %s.\n", 14, "neighborhood");
prints
  There are 14 dogs in my neighborhood.
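
To see the effect of a field width such as %8d, here is a minimal sketch. The helper name format_row is hypothetical, and it uses snprintf (a standard relative of printf that writes into a buffer instead of the screen) only so that the result can be inspected; printf accepts the same format string.

```c
#include <stdio.h>

/* Hypothetical helper: formats a token name and its number as one row.
   Returns the number of characters written, as snprintf does. */
int format_row(char* buf, size_t size, const char* name, int number)
{
  /* %s inserts the string; %8d right-aligns the integer in 8 columns. */
  return snprintf(buf, size, "%s|%8d|", name, number);
}
```

Calling format_row(buf, sizeof buf, "T_IF", 2) leaves T_IF, a bar, seven blanks, the digit 2 and a closing bar in buf.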


Allocating memory in C

You cannot use the C++ new operator in C. You can use the malloc function (declared in <stdlib.h>) and the strdup function (declared in <string.h>) to allocate memory.

  char* s = (char*) malloc(n);
allocates exactly n bytes of memory and makes pointer variable s point to that memory.
  char* s = strdup(t);
allocates new memory, copies string t into that memory, and makes pointer variable s point to the copy. If you intend to copy yytext, you might want to use this. (Never return yytext itself as an attribute. Its contents will be overwritten as soon as you read the next token.)

The strdup function is available in most implementations of C (it is part of POSIX), but it was not part of the C standard before C23. If your implementation lacks it, an equivalent definition is as follows.

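  #include <stdlib.h>  /* malloc */
  #include <string.h>  /* strlen, strcpy */

  char* strdup(const char* s)
  {
    /* One extra byte holds the terminating null character. */
    char* result = (char*) malloc(strlen(s) + 1);
    if (result != NULL)  /* malloc returns NULL when out of memory */
      strcpy(result, s);
    return result;
  }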
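
The following sketch illustrates why the copy matters. It does not use Flex: scan_buffer is a hypothetical stand-in for the storage behind yytext, and save_lexeme is an assumed helper name that does the same work as strdup. The point is only that a pointer into the buffer changes when the buffer is reused, while a copy does not.

```c
#include <stdlib.h>
#include <string.h>

/* Stand-in for the scanner's buffer, which is reused for every token. */
static char scan_buffer[32];

/* Hypothetical helper: returns a private copy of the current lexeme,
   or NULL if memory cannot be allocated. */
char* save_lexeme(const char* lexeme)
{
  char* copy = (char*) malloc(strlen(lexeme) + 1);
  if (copy != NULL)
    strcpy(copy, lexeme);
  return copy;
}

/* Returns 1 if the saved copy still reads "count" after the buffer
   has been reused, while the alias now reads the next lexeme. */
int copy_survives_next_token(void)
{
  char* alias;
  char* copy;
  int ok;

  strcpy(scan_buffer, "count");      /* scanner matches the lexeme "count" */
  alias = scan_buffer;               /* wrong: attribute shares the buffer */
  copy  = save_lexeme(scan_buffer);  /* right: attribute owns its memory   */
  strcpy(scan_buffer, "while");      /* scanner moves on to the next token */

  ok = copy != NULL
    && strcmp(copy, "count") == 0
    && strcmp(alias, "while") == 0;
  free(copy);
  return ok;
}
```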


Remark on overloading

C++ allows you to overload names. For example, you can create more than one function by the same name, as long as the parameters have different types, or there are a different number of parameters. C, on the other hand, does not allow overloading. So choose a different name for each function.
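
As a small illustration, a C++ program could give both of the functions below the same name and let the parameter type select one. In C each needs its own name. The names format_int_attr and format_string_attr are hypothetical, chosen only for this sketch.

```c
#include <stdio.h>

/* Formats an integer attribute into buf. */
int format_int_attr(char* buf, size_t size, int value)
{
  return snprintf(buf, size, "%d", value);
}

/* Formats a string attribute into buf. In C this cannot share
   a name with format_int_attr, so the type is part of the name. */
int format_string_attr(char* buf, size_t size, const char* value)
{
  return snprintf(buf, size, "%s", value);
}
```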