Computer Science 6220
Fall 2001
Programming Assignment 3

Due: TBA

This exercise has you perform type inference on a very small and simple language. The program should be written in Scheme.

The language

The language allows the following forms.
  1. An identifier is a Scheme symbol.

  2. Some identifiers stand for predefined things. They are

    nil
    the empty list.
    cons
    the list constructor, as in Scheme.
    head
    the list head function.
    tail
    the list tail function.
    left
    the function that takes the left-hand member of an ordered pair.
    right
    the function that takes the right-hand member of an ordered pair.
    plus
    the addition function.
    minus
    the subtraction function.
    equal
    The equality testing function.

  3. A definition has the form (def sym expr) where sym is an identifier (symbol) and expr is an expression.

  4. An expression has one of the following forms

    (fun sym expr)
    (fun x E) is the function f defined by f(x) = E.
    (apply expr expr)
    (apply f a) is the result of applying function f to argument a.
    (pair expr expr)
    (pair A B) is computes the ordered pair (A, B).
    (list E1 ... En)
    This is the list [E1, ..., En].
    (let (sym expr) expr)
    (let (x A) E) is the result of evaluating expression E after first binding x = A. Inside E you can use x as an expression.
    (if expr expr expr)
    Expression (if A B C) has the value B when A is true, and value C when A is false.
    24
    A numeric constant stands for itself (a number).
    #\a
    A character constant stands for itself.
    #t, #f
    Boolean constants stand for themselves.

Types

A system for evaluating expressions and definitions might first check the definitions to see if they are well-typed. Your exercise is ONLY to do type inference. You do not need to evaluate the expressions.

Types have the following forms.

Num
All numbers have type Num.
Bool
All boolean values have type Bool.
Char
All characters have type Char.
(Var n)
This is type variable number n, where n is an integer.
(Fun A B)
This is a function type with domain A and codomain B.
(Pair A B)
This is a cartesian product (or "ordered pair") type, (A,B).
(List A)
This is the type of a list whose members have type A.

You will need to be able to make a copy of a type, by copying all of the variables in the type. You will also need to be able to unify two types.

Each expression induces some equations on types. To express those equations, I will write TE for the type of expression E. I will use X and Y for type variables.

nil
Tnil = (List X).
cons
Tcons = (Fun (Pair X (List X)) (List X)).
head
Thead = (Fun (List X) X).
tail
Ttail = (Fun (List X) (List X)).
left
Tleft = (Fun (Pair X Y) X).
right
Tright = (Fun (Pair X Y) Y).
plus
Tplus = (Fun (Pair Num Num) Num).
minus
Tminus = (Fun (Pair Num Num) Num).
equal
Tequal = (Fun (Pair X X) Bool).
(fun sym expr)
If A is expression (fun x E) then TA = (Fun Tx TE).
(apply expr expr)
If R is expression (apply f a) then choose two new variables X and Y. The equations are Tf = (Fun X Y), Ta = X, TR = Y.
(pair expr expr)
If R is expression (pair A B) then TR = (Pair TA TB)
(list E1 ... En)
Suppose that R is this list expression. Choose a new variable X. Equations are TR = (List X), TE1 = X, ..., TEn = X.
(let (sym expr) expr)
Suppose that R is expression (let (x A) E). Perform all equations for A, completing the unifications. Then make Tx = TA. Call Tx by then name Z in what follows. Go through expression E, finding each occurrence of identifier x. For each one, make Tx = a copy of Z. Then continue to perform type inference on E.
(if A B C)
If R is this expression then TR = TB = TC, and TA = Bool.
24
A numeric constant has type Num.
#\a
A characger constant has type Char
#t, #f
Boolean constants have type Bool.

The assignment

Write a Scheme function called infer, so that (infer def) returns the most general polymorphic type of the identifier defined in def. Infer can also work on an expression. (infer E) returns the most general polymorphic type of expression E.

A definition can be recursive. Process a definition by attaching a type variable to the symbol that is being defined. Then be sure that the SAME type variable is attached to every occurrence of that symbol. Do not copy the type.

Keep your functions as short and simple as possible. Here are some suggested functions. You can have others as well.

newvar

(newvar) produces a new type variable. You can have a global variable holding the next number to use. Just add 1 to it, using (set! next-num (+ next-num 1)).

bypass

(bypass X tbl) produces the value to which variable X is bound by table tbl. The table can be a list of pairs, such as ((3 (Var 5)) (7 (Fun Num Num))) indicating that variable 3 is bound to variable 5, and variable 7 is bound to type (Fun Num Num). Use this function whenever you look at a variable, to be sure not to forget the binding of the variable. You can use built-in Scheme function assoc to find a variable in the table. Be careful when the answer is a variable. That variable might also be bound. Bypass it.

occurs

(occurs n A tbl) should return true if variable number n occurs in type A, and false otherwise. Table tbl gives bindings of variables.

copy-type

(copy-type A tbl) returns a copy of type A, by replacing each unbound variable in A by a new variable.

unify

(unify A B tbl) unifies types A and B. It assumes that tbl gives bindings of variables that are known before starting this unification, and it produces a new table of all bindings in tbl plus those that are required to perform this unification. Be sure to perform the occur-check.

attach-types

(attach-types E) should attach a different type variable to every subexpression of E. It can do that by adding the type variable to the front of each list. For example, (fun A B) becomes ((Var n) fun A' B') where A' is the result of attaching types to the subexpressions in A and B' is the result of attaching types to the subexpressions in B. Notice that each expression is a subexpression of itself, so be sure to attach a type at the top level.

To attach a type to a simple expression such as a number, make a list. For example, 25 becomes ((var 50) 25). If you prefer, it is acceptable to do a little bit of type inference here, and replace 25 by (Num 25).

unify-identifier-type

(unify-identifier-type x T E) unifies the type of each occurrence of identifier x that does not occur within a let that defines the same identifier to type T. You will want two versions of this, one that copies T each time it does a unification and one that does not.

You will probably find it convenient to break the infer function down into subfunctions, one for doing fun expression, one for let expressions, etc. Then infer just needs to look at the kind of expression and call the appropriate helper function.

Be sure to keep in mind that the expressions have a type up front when you are doing infer. Look for that type.

I cannot stress enough how important it is to keep your function definitions clear, readable and obvious. Do not fall into the trap of writing opaque definitions.

Testing your function

Run your function on some representative definitions.