Abstract syntax trees

←

↑

→

Abstract Syntax Trees

Create a type of abstract syntax trees that describe an expression. (You will use abstract syntax trees both for describing expressions and for describing values of expressions.) Create file src/ast.h holding definitions of abstract syntax tree node kinds and holding a definition of type AST. Here is a start.

typedef struct astnode
{
  int kind;   /* This tells which kind of expression this is. */
  int mark;   /* Used to mark nodes */
  int extra;

  union {
    int         intval;
    const char* strval;
    struct {
      struct astnode* s1;
      struct astnode* s2;
      struct astnode* s3;
    } subtrees;
  } fields;
}
ASTNODE;

typedef ASTNODE* AST;

#define NUMBER_NK	1
#define APPLY_NK	2
…

Now, if t has type AST* then

t->kind is the kind of t's root (one of the _NK values),
t->fields.intval is the integer value stored in the root of t,
t->fields.subtrees.s1 is the first subtree of t.

In file src/ast.c, create a collection of functions for building particular forms of trees. For example, you might have a constructor numberNode(n) that builds a tree representing the integer n, and another applyNode(A, B) that builds a tree representing an application of A to B. Expect to have one constructor for each kind of expression. Here are definitions of numberNode and applyNode to get you started.

AST numberNode(int n)
{
  AST t = NEW(ASTNODE);

  t->kind          = NUMBER_NK;
  t->fields.intval = n;
  return t;
}

AST applyNode(AST a, AST b)
{
  AST t = NEW(ASTNODE);

  t->kind               = APPLY_NK;
  t->fields.subtrees.s1 = a;
  t->fields.subtrees.s2 = b;
  return t;
}

Please do not pack your functions together. Leave a blank line between functions. Provide documentation for any functions whose meaning is not obvious.

Kinds of nodes

Look at the kinds of expressions that you need to model, and decide what the trees will look like. That should be a matter of reading the expressions section.

Keep the number of different kinds of nodes as small as you reasonably can. Instead of building a separate kind of node for each basic function, just have one that has a function number stored in it. (Use the intval fields for the number of the function.)

Basic functions and operators

For example, suppose you want a tree that stands for expression isNull A. Start by creating a node kind for a basic function and a bacic function kind for isNull.

  #define BASIC_FUN_NK	20
  …

  #define IS_NULL_FK    1
  …

Then basicFunNode(BASICFUN_ISNULL, A) creates a node whose kind is BASIC_FUN_NK, whose extra value if IS_NULL_FK and whose fields.subtrees.s1 field points to A. You might draw the tree as follows.

   BASIC_FUN(IS_NULL)
         |
         A

For a binary operator, such as +, use another kind of node such as

      BASIC_OP(+)
     /           \
    A             B

Lists

For the ':' operator, create two different kinds of nodes.

One kind represents expression A:B where A and B are expressions that need to be evaluated. You can fold that in with other kinds of operators if you like.

The other kind, which I will call a CONS node, represents a:b where a and b have already been simplified. You don't need to create a linked list type. The tree itself will handle that, by using CONS nodes. For example, expression [1+2, 5, 10−3] is represented by the following tree.

          BASIC_OP(COLON)
         /              \
     BASIC_OP(+)        BASIC_OP(COLON)
    /           \      /               \
NUM(1)        NUM(2) NUM(5)     BASIC_OP(COLON)
                               /               \
                           BASIC_OP(-)    EMPTYLIST
                          /           \
                        NUM(10)      NUM(3)

Simplifying (or evaluating) that tree leads to the following tree.

        CONS
       /    \
  NUM(3)    CONS
           /    \
      NUM(5)    CONS
               /    \
          NUM(7)    EMPTYLIST

which represents list [3, 5, 7].

Reducing the number of node kinds

Some kinds of expressions do not need separate kinds of nodes. For example, expession let x = A in B end is equivalent to ((x -> B) A). Just use a function and apply node to build it. There is no need for a LET node.

Similarly, expression A ; B is equivalent to A ~> (x -> B) where x is a new name. So there is no need for a ';' node. Just make a '~>' node.

(Creating a new name that is surely not used elsewhere in the program is easy. You are not bound by the rules of the language when you build trees. So you might start all new names with an underscore, for example.)

Handling the case construct

Instead of a 'case' node with a child for every case, just use an 'if ' node with three children (a condition (subtree s1), a 'then' expression (subtree s2) and an 'else' expression (subtree s3). Translate a case expression into 'if ' nodes. For example,

  case 
    A => B
  | C => D
  | else     => E
  end

becomes tree

           IFNODE
          /  |  \
         /   |   \
        A    B  IFNODE
               /  |  \
              /   |   \
             C    D    E

Printing abstract syntax trees

Write a function that prints a syntax tree for debugging purposes. Then test your trees, by themselves. Write a main program that builds and prints a tree. Did you get what you expected?

Do not try to print a picture of a tree. That is too difficult. To show tree

               CONS
              /    \
         NUM(3)    CONS
                  /    \
             NUM(5)    CONS
                      /    \
                 NUM(7)    EMPTYLIST

write

 CONS
   3
   CONS
     5
     CONS
       7
       []

To show

           BASIC_OP(COLON)
          /              \
      BASIC_OP(+)         BASIC_OP(COLON)
     /           \       /               \
 NUM(1)        NUM(2)  NUM(5)             BASIC_OP(COLON)
                                         /               \
                                    BASIC_OP(-)          EMPTYLIST
                                   /           \
                                 NUM(10)      NUM(3)

write something like the following.

  COLON
    PLUS
      1
      2
    COLON
      5
      COLON
        MINUS
          10
          3
        []

You can use :, + etc. instead of COLON, PLUS, etc.

Functions and capture

A function is represented as an abstract syntax tree (with kind FUNCTION_NK, for example.) A function contains its body as a subtree.

A bound variable is one that refers to a function parameter, and a free variable is a variable that is not bound. For example, in SFL expression

  x -> x y

variable x is bound (because it refers to the parameter x of the function) and variable y is free.

Functions are a little tricky because of the problem of capture. Capture can occur when you pass a function as a parameter to another function. Consider the following SFL expression.

  (x -> (y -> x + y)) (y)

That is an application of a function to something called y. Notice that y is used for two different purposes here. One occurrence of y is bound, while the other is free.

Since the function calls its parameter x, and it is being applied to y, we replace each bound occurrence of x in the function body by y, yielding

  (y -> y + y)

But that is not correct. The free occurrence of y has been substituted into a context where it has become bound. We say that the free occurrence of y has been captured.

Representing functions to avoid capture

We can eliminate the possibility of capture by avoiding using parameter names. A FUNCTION node has only a subtree, its body. Within the body are PARAMETER nodes. Each FUNCTION node has an integer that identifies it, and each PARAMETER holds the identity number of the function whose parameter it is. For example, function

  (x -> y -> x + y)

is represented by a tree something like the following. (I apologize for the ASCII art.)

         FUN(1)
          | 
         FUN(2)
          |
          +
         / \
        /   \ 
    PARAM   PARAM 
      (1)    (2)

Applying that function to identifier y (a free variable) yields

           FUN(3)
            |
            +
           / \
          /   \
      ID(y)   PARAM
                (3)

where a new FUN node with identity 3 has been created. No capture has occurred.

You will need a constructor that builds a function node. Make sure that it converts each occurrence of the function's parameter (which will start out as an ID node) into a PARAMETER node.

Submitting your work

When you have the tree module working, submit it using the following command,

  ~abrahamsonk/5220/bin/submit ast allocate.h ast.c ast.h stringtable.h stringtable.c

Include any other files that are needed for testing abstract syntax trees.