This exercise has you perform type inference on a very small and simple language. The program should be written in Scheme.
An identifier is a Scheme symbol.
Some identifiers stand for predefined things. They are
A definition has the form (def sym expr) where sym is an identifier (symbol) and expr is an expression.
An expression has one of the following forms
A system for evaluating expressions and definitions might first check the definitions to see if they are well-typed. Your exercise is ONLY to do type inference. You do not need to evaluate the expressions.
Types have the following forms.
You will need to be able to make a copy of a type, by copying all of the variables in the type. You will also need to be able to unify two types.
Each expression induces some equations on types. To express those equations, I will write TE for the type of expression E. I will use X and Y for type variables.
Write a Scheme function called infer, so that (infer def) returns the most general polymorphic type of the identifier defined in def. Infer can also work on an expression. (infer E) returns the most general polymorphic type of expression E.
A definition can be recursive. Process a definition by attaching a type variable to the symbol that is being defined. Then be sure that the SAME type variable is attached to every occurrence of that symbol. Do not copy the type.
Keep your functions as short and simple as possible. Here are some suggested functions. You can have others as well.
(newvar) produces a new type variable. You can have a global variable holding the next number to use. Just add 1 to it, using (set! next-num (+ next-num 1)).
(bypass X tbl) produces the value to which variable X is bound by table tbl. The table can be a list of pairs, such as ((3 (Var 5)) (7 (Fun Num Num))) indicating that variable 3 is bound to variable 5, and variable 7 is bound to type (Fun Num Num). Use this function whenever you look at a variable, to be sure not to forget the binding of the variable. You can use built-in Scheme function assoc to find a variable in the table. Be careful when the answer is a variable. That variable might also be bound. Bypass it.
(occurs n A tbl) should return true if variable number n occurs in type A, and false otherwise. Table tbl gives bindings of variables.
(copy-type A tbl) returns a copy of type A, by replacing each unbound variable in A by a new variable.
(unify A B tbl) unifies types A and B. It assumes that tbl gives bindings of variables that are known before starting this unification, and it produces a new table of all bindings in tbl plus those that are required to perform this unification. Be sure to perform the occur-check.
(attach-types E) should attach a different type variable to every subexpression of E. It can do that by adding the type variable to the front of each list. For example, (fun A B) becomes ((Var n) fun A' B') where A' is the result of attaching types to the subexpressions in A and B' is the result of attaching types to the subexpressions in B. Notice that each expression is a subexpression of itself, so be sure to attach a type at the top level.
To attach a type to a simple expression such as a number, make a list. For example, 25 becomes ((var 50) 25). If you prefer, it is acceptable to do a little bit of type inference here, and replace 25 by (Num 25).
(unify-identifier-type x T E) unifies the type of each occurrence of identifier x that does not occur within a let that defines the same identifier to type T. You will want two versions of this, one that copies T each time it does a unification and one that does not.
You will probably find it convenient to break the infer function down into subfunctions, one for doing fun expression, one for let expressions, etc. Then infer just needs to look at the kind of expression and call the appropriate helper function.
Be sure to keep in mind that the expressions have a type up front when you are doing infer. Look for that type.
I cannot stress enough how important it is to keep your function definitions clear, readable and obvious. Do not fall into the trap of writing opaque definitions.
Run your function on some representative definitions.