The Lexical Analyzer

Files

Create files src/lexer.lex and src/lexer.h. The lexer will create a function yylex. It also creates variables yyin and yylval. Put the following into lexer.h.

#include <stdio.h>

int yylex();

extern FILE*   yyin;
extern YYSTYPE yylval;

Identifying lexemes

Now look at the language description.

  1. Identify all of the reserved words. Put them into lexer.lex, one per line.

  2. Identify all of the symbols, such as parentheses. Add them to lexer.lex, one per line.

  3. Look for other kinds of lexemes that a program can contain, including identifiers. Add them to lexer.lex.

Choosing tokens

Now you have a list of most of the lexemes of SFL. Decide on a token name for each kind.

File token.h

Copy the list of lexemes into another file called src/token.h. Don't copy single-character lexemes, such as '('.

Replace each lexeme in token.h by a definition of its token as an integer greater than or equal to 256. For example, you might write

#define TOK_ARROW	256
#define TOK_DEF		257
The numbers that you choose are not important, but be sure to choose a different number for each token, and keep the token numbers reasonably small.

YYSTYPE

Include a definition of type YYSTYPE in token.h, as follows.

#define YYSTYPE_DEFINED
typedef union {
  char* str;
  int   ival;
} 
YYSTYPE;
YYSTYPE is the type of a token attribute. This assumes that a token attribute is either a string or an integer.

lexer.lex

Now return to file lexer.lex. Change it into a lexical analyzer using Flex. See the short description of Flex or this Flex manual or this Flex manual. The lexer returns a token number. Here is a start.

%{
#include "lexer.h"
#include "token.h"
#include "stringtable.h"

YYSTYPE yylval;
%}

letter  [a-zA-Z]
digit   [0-9]
id      {letter}+

%%
[ \t]   {}

"\n"    {line_number++;
        }

def     {return TOK_DEF;
        }

"("     {return '(';
        }

"->"    {return TOK_ARROW;
        }

{id}    {yylval.str = intern(yytext);
         return TOK_ID;
        }
%%

/*=========================================*
 * yylex calls yywrap() when it encounters
 * the end of the file.  Returned value 1 
 * tells yylex t0 return 0, which is yylex's
 * way of saying 'end of file'.
 *=========================================*/

int yywrap()
{
  return 1;
}

You will need to handle all of the lexemes and tokens. You will also need to include handling of spaces and comments. The sample lexer contains cases to handle spaces, tabs and newlines. Create a case to handle comments by doing nothing. (Doing nothing implicitly restarts yylex to try again.)

To compile your flex lexer, use command

  flex lexer.lex
Flex will create a file called lex.yy.c containing the lexer function, called yylex, written in C. Each time you call yylex( ), it returns the next token. yylex( ) returns 0 at the end of the file.

A tester

Create a file src/testlexer.c. Here is a suggestion for it.

#include "lexer.h"

/*======================================*
 * This main function tests the lexer
 * function, yylex().  It assumes that 
 * the command line contains the name of
 * a file to read.
 *======================================*/

int main(int argc, char** argv)
{
  if(argc == 2) {
    /*----------------------------------*
     * Flex reads from open file yyin.  *
     *----------------------------------*/

    yyin = fopen(argv[1], "r");
    if(yyin == NULL) return 1;

    /*-----------------------------*
     * yylex() reads a lexeme and  *
     * returns a token.            *
     *-----------------------------*/

    while(1) {
      int tok = yylex();

      if(tok == 0) {
        break;
      }
      else if(tok == TOK_ID) {
        printf("ID, name = \"%s\"\n",
               yylval.str);
      }
      else {
        printf("Token %d (\"%s\")\n",
               tok, yytext);
      }
    }
    return 0;
  }
}

Here is a modified Makefile that includes the lexer. To build the lexer and the tester, use command

  make bin/testlexer
In the future, you will need to modify the Makefile on your own.

CC     = gcc
LINK   = gcc
CFLAGS = -Wall
RM     = rm

CFILES = src/stringtable.c\
         src/teststringtable.c\
         src/lex.yy.c\
         src/testlexer.c

OFILES = $(patsubst src/%.c,obj/%.o,$(CFILES))

EFILES = bin/teststringtable\
         bin/testlexer

STRINGTABLE_OFILES = obj/stringtable.o\
                     obj/teststringtable.o

TESTLEXER_OFILES = obj/stringtable.o\
                   obj/lex.yy.o\
                   obj/testlexer.o

bin/testlexer: $(TESTLEXER_OFILES)
	$(LINK) -o bin/testlexer $(TESTLEXER_OFILES)

bin/teststringtable: $(STRINGTABLE_OFILES)
	$(LINK) -o bin/teststringtable $(STRINGTABLE_OFILES)

src/lex.yy.c: src/lexer.lex
	cd src; flex lexer.lex

obj/%.o: src/%.c
	$(CC) $(CFLAGS) -c $< -o $@

clean:
	$(RM) $(OFILES) $(EFILES)

Now test the lexer. Write a small test SFL program. Include some comments. Run the lexer on it. Does it work?

When you have the lexer working, submit it using the following command, if your lexer is called lexer.lex.

  cd src
  ~abrahamsonk/5220/bin/submit lexer lexer.lex lexer.h token.h allocate.h stringtable.h stringtable.c