Computer Science 6220
Fall 2001
Programming Assignment 3

Due: TBA

This exercise has you perform type inference on a very small and simple language. The program should be written in Scheme.

The language

The language allows the following forms.

An identifier is a Scheme symbol.
Some identifiers stand for predefined things. They are

nil

the empty list.

cons

the list constructor, as in Scheme.

head

the list head function.

tail

the list tail function.

left

the function that takes the left-hand member of an ordered pair.

right

the function that takes the right-hand member of an ordered pair.

plus

the addition function.

minus

the subtraction function.

equal

The equality testing function.
A definition has the form (def sym expr) where sym is an identifier (symbol) and expr is an expression.
An expression has one of the following forms

(fun sym expr)

(fun x E) is the function f defined by f(x) = E.

(apply expr expr)
(apply f a) is the result of applying function f to argument a.

(pair expr expr)
(pair A B) is computes the ordered pair (A, B).

(list E₁ ... E_n)
This is the list [E₁, ..., E_n].

(let (sym expr) expr)

(let (x A) E) is the result of evaluating expression E after first binding x = A. Inside E you can use x as an expression.

(if expr expr expr)

Expression (if A B C) has the value B when A is true, and value C when A is false.

24

A numeric constant stands for itself (a number).

#\a

A character constant stands for itself.

#t, #f

Boolean constants stand for themselves.

Types

A system for evaluating expressions and definitions might first check the definitions to see if they are well-typed. Your exercise is ONLY to do type inference. You do not need to evaluate the expressions.

Types have the following forms.

Num: All numbers have type Num.
Bool: All boolean values have type Bool.
Char: All characters have type Char.
(Var n): This is type variable number n, where n is an integer.
(Fun A B): This is a function type with domain A and codomain B.
(Pair A B): This is a cartesian product (or "ordered pair") type, (A,B).
(List A): This is the type of a list whose members have type A.

You will need to be able to make a copy of a type, by copying all of the variables in the type. You will also need to be able to unify two types.

Each expression induces some equations on types. To express those equations, I will write T_E for the type of expression E. I will use X and Y for type variables.

nil: T_nil = (List X).
cons: T_cons = (Fun (Pair X (List X)) (List X)).
head: T_head = (Fun (List X) X).
tail: T_tail = (Fun (List X) (List X)).
left: T_left = (Fun (Pair X Y) X).
right: T_right = (Fun (Pair X Y) Y).
plus: T_plus = (Fun (Pair Num Num) Num).
minus: T_minus = (Fun (Pair Num Num) Num).
equal: T_equal = (Fun (Pair X X) Bool).
(fun sym expr): If A is expression (fun x E) then T_A = (Fun T_x T_E).
(apply expr expr): If R is expression (apply f a) then choose two new variables X and Y. The equations are T_f = (Fun X Y), T_a = X, T_R = Y.
(pair expr expr): If R is expression (pair A B) then T_R = (Pair T_A T_B)
(list E₁ ... E_n): Suppose that R is this list expression. Choose a new variable X. Equations are T_R = (List X), T_E₁ = X, ..., T_{E_n} = X.
(let (sym expr) expr): Suppose that R is expression (let (x A) E). Perform all equations for A, completing the unifications. Then make T_x = T_A. Call T_x by then name Z in what follows. Go through expression E, finding each occurrence of identifier x. For each one, make T_x = a copy of Z. Then continue to perform type inference on E.
(if A B C): If R is this expression then T_R = T_B = T_C, and T_A = Bool.
24: A numeric constant has type Num.
#\a: A characger constant has type Char
#t, #f: Boolean constants have type Bool.

The assignment

Write a Scheme function called infer, so that (infer def) returns the most general polymorphic type of the identifier defined in def. Infer can also work on an expression. (infer E) returns the most general polymorphic type of expression E.

A definition can be recursive. Process a definition by attaching a type variable to the symbol that is being defined. Then be sure that the SAME type variable is attached to every occurrence of that symbol. Do not copy the type.

Keep your functions as short and simple as possible. Here are some suggested functions. You can have others as well.

newvar

(newvar) produces a new type variable. You can have a global variable holding the next number to use. Just add 1 to it, using (set! next-num (+ next-num 1)).

bypass

(bypass X tbl) produces the value to which variable X is bound by table tbl. The table can be a list of pairs, such as ((3 (Var 5)) (7 (Fun Num Num))) indicating that variable 3 is bound to variable 5, and variable 7 is bound to type (Fun Num Num). Use this function whenever you look at a variable, to be sure not to forget the binding of the variable. You can use built-in Scheme function assoc to find a variable in the table. Be careful when the answer is a variable. That variable might also be bound. Bypass it.

occurs

(occurs n A tbl) should return true if variable number n occurs in type A, and false otherwise. Table tbl gives bindings of variables.

copy-type

(copy-type A tbl) returns a copy of type A, by replacing each unbound variable in A by a new variable.

unify

(unify A B tbl) unifies types A and B. It assumes that tbl gives bindings of variables that are known before starting this unification, and it produces a new table of all bindings in tbl plus those that are required to perform this unification. Be sure to perform the occur-check.

attach-types

(attach-types E) should attach a different type variable to every subexpression of E. It can do that by adding the type variable to the front of each list. For example, (fun A B) becomes ((Var n) fun A' B') where A' is the result of attaching types to the subexpressions in A and B' is the result of attaching types to the subexpressions in B. Notice that each expression is a subexpression of itself, so be sure to attach a type at the top level.

To attach a type to a simple expression such as a number, make a list. For example, 25 becomes ((var 50) 25). If you prefer, it is acceptable to do a little bit of type inference here, and replace 25 by (Num 25).

unify-identifier-type

(unify-identifier-type x T E) unifies the type of each occurrence of identifier x that does not occur within a let that defines the same identifier to type T. You will want two versions of this, one that copies T each time it does a unification and one that does not.

You will probably find it convenient to break the infer function down into subfunctions, one for doing fun expression, one for let expressions, etc. Then infer just needs to look at the kind of expression and call the appropriate helper function.

Be sure to keep in mind that the expressions have a type up front when you are doing infer. Look for that type.

I cannot stress enough how important it is to keep your function definitions clear, readable and obvious. Do not fall into the trap of writing opaque definitions.

Testing your function

Run your function on some representative definitions.

Computer Science 6220 Fall 2001 Programming Assignment 3

The language

Types

The assignment

Testing your function

Computer Science 6220
Fall 2001
Programming Assignment 3