CSCI 5220
Spring 2002
Projects

As part of your software development work for CSCI 5220 you will make a modification to the Astarte compiler. This page discusses choosing what modification to make, and also provides some pointers on how to navigate around the compiler.

Background

Astarte is a programming language that I have defined and implemented. It is intended to support several paradigms within a single framework. The implementation has two major parts: a compiler that translates from Astarte into a byte code; and an interpreter that executes the byte code.

Documentation on the language is available. It is not necessary to read it all, but it is a good idea to familiarize yourself with a few of the examples before choosing what to do. You will need to read some more that is closely related to what you want to do as you go. You will probably want to ask me some questions.

Choosing what to do

You can choose any reasonable project that involves making a change to the compiler. The change can involve the front end or the back end. If you have any ideas, I will be happy to hear them. Below are some suggestions. If done well, whatever you do might be incorporated into the implementation.

I do not recommend that you choose a project that involves substantial modification of the interpreter (as oppossed to the compiler). You will probably find that kind of project too ambitious. If what you want to do involves changes to the interpreter, I will need to help make the necessary modifications there, and you will need to coordinate with me.

Here are suggestions for projects.


  1. Astarte allows you to override one definition with another. Sometimes, when writing a new definition of a function, you want to be able to use the prior definition that you are replacing.

    Add expression prior x, where x is an identifier. The idea is that this will occur inside a definition of function x that overrides a prior definition. prior x is the prior (overridden) definition.

    This involves creating a {copy} definition of x and referring to the copy definition. For example, if function f is written

        Define f(?x) =
          Let y = (prior f)(x).
          If y == 0 then x else y %If
        %Define
    
    then this definition is processed as if it were written as follows.
        Team
          Define{copy} oldf = f.
          Define f(?x) =
            Let y = oldf(x).
            If y == 0 then x else y %If
          %Define
        %Team
    
    Of course, you cannot use the fixed name oldf, but must choose a new name. The name should involve the name of the current package, the name f and a number in case several of these occurs in a single package.

    Modifications will involve the following files.

    Documentation files of interest include the following.


  2. Provide expression After a do b %After. The idea is to perform statement b after doing expression a, but to have the same value as expression a. Statement b should be performed even if evaluation of a fails. After a do b %After should behave like the following.

       Try catchingTermination catchingEachThread
         Let res = a.
       then
         b 
         res
       else
         b
         PropagateFailure.
       %Try
    
    This says to evaluate a. If evaluation of a succeeds then the then part is executed. Statement b is performed and the result is res, the value of a. But if evaluation of a fails then the else part is executed. Statement b is performed, and the failure is propagated outwards.

    Modifications will involve the following files.

    Documentation files of interest include the following.


  3. A class (object-oriented programming) can be thought of as a collection of variables, constants and functions. It is common practice to define a function that gets the value of a given variable and another that sets the value of the variable. For example, if your class has a variable called amount, then you might write a function SetAmount(x) that sets amount to x, and another function getAmount that returns the value of amount. Writing those functions is just busywork, and can easily be automated.

    Add options (modes) {set} and {get} to var declarations so that a var declaration in a class that has the get option will cause a get method to be defined automatically, and a var declaration with the set option will cause a set method to be defined automatically. For example, declaration

        Var{get,set} size holding Integer.
    
    would cause the following definitions to be made.
        Define my-getSize = \my-size.
        Define My-SetSize ?x. = 
          Assign my-size := x.
        %Define
    
    In Astarte, notation my-getSize stands for getSize(self) (or getSize(?self) in pattern match context), my-size stands for size(self), etc. Also, my-size := x abbreviates (my-size, x). So, in less abbreviated form, the definitions would be as follows.
        Define getSize(?self) = \(size self).
        Define SetSize ?self, ?x. = 
          Assign (size self, x).
        %Define
    

    This involves the following modifications.

    Documentation pages that will be of interest include the following.


  4. Sometimes a program needs to manage resources. For example, if an object gains control of some resource (such as a semaphore) it will later want to release control of the resource. Sometimes an object wants to hold a resource for as long as the object is alive. The object releases the resource only when the object is destroyed. The problem is how an object realizes that it is about to be destroyed.

    Astarte currently has a simple resource management facility called finalizer that allows you to attach a function to a box so that, when the box is recovered by the garbage collector, the function is automatically run. The function is supposed to release resources. But finalizer can only attach a function to a box, not to a more general object.

    Make it possible to define a finalizer in a class, so that attaching the finalizer to a box related to an object is automatic. Whenever a new object is created, a finalizer for it is created as well. The finalizer should be attached to a box that is part of the object. Every object contains all of the characteristics of type Object, so a box stored there will do the job.

    Add a reasonable syntax indicating that a particular function is a finalizer for the class. Think carefully about the semantics of this feature. For example, should the finalizer work for all objects in subclasses of C, or just for objects that belong only to class C?

    This will involve the following modifications.

    Documentation of interest includes the following.


  5. Astarte supports definition by cases. (See defining values by cases and defining functions by cases.) Loop and Choose expressions allow a case to have subcases. (See subcases for choose expressions.), but currently definitions by cases do not allow subcases. Modify the compiler to allow subcases in definition by cases. For example,

        Define f by
          ...
          case f(p) when c
            Subcases
              case f(q) = a
              case f(r) = b
              ...
            %Subcases
          ...
        %Define
    
    indicates that, when this case is selected, further cases should be tried in the subcases construct.

    This will involve modifications to the following parts of the compiler.


  6. A copy statement has the form

         Let y = x.
    
    where y is defined to be the same as x. It is common to get copy statements, particularly after the early stages of semantic processing of a definition, even when processing a definition that has no obvious copy statements. There is usually no need for them. Statement Let y = x. can typically be eliminated by simply using x whereever y occurs. That improves efficiency.

    Add a semantic processing stage that eliminates copy definitions. For example,

        Let x = 0.
        Let y = x.
        Let z = x + y.
    
    is replaced by
        Let x = 0.
        Let z = x + x.
    
    or possibly
        Let x = 0.
        ()
        Let z = x + x.
    
    where the copy definition has been replaced by a "do nothing" operation (). (The do-nothing operation will not have any code generated for it, so it is not a problem.) Notice that all occurrences of y are replaced by x. You should only eliminate a definition Let y = x if there are no relets to either x or y, so that neither can be changed. You need to check for that.

    This does not involve any syntactic changes, but involves adding a phase to the semantic processing. Perform this phase just before the second phase of identifier processing. See Compiler.txt. Files of interest include the following.


  7. Allow a package to contain an embedded package, where things can be hidden. This involves lengthening the names of packages to include subpackage names, as well as some syntactic and semantic processing to handle the embedded packages. For example,

        Package Test
          ...
          Package Embed
             ...
          %Package
          ...
        %Package
    
    would use a private name test.embed.x for a private thing called x inside the Embed package. Names like this are already in use, so it should not be to hard just to change the name. It will also be necessary to update which names are visible. Inside Embed, both Test and Test.Embed are visible. Outside the embedded package, only Test is visible.

    This will involve modification to or use of the following files.

    Packages are described in packages and the package hierarchy.


  8. (This is probably the most involved of the suggestions.) Astarte supports inline expansions (expand declarations) and pattern matching (pattern declarations). Pattern matching is handled by performing inline expansions.

    Currently pattern and expand declarations are only allowed at the outer level. They cannot be embedded inside other declarations. Modify the compiler to allow pattern and expand declarations to be local to a function definition. For example, it should be possible to write

        Define f(?x) =
          Expand g(?y) => x + y.
          ...
          Let z = g(n).
          ...
        %Define
    
    which is translated to
        Define f(?x) =
          ...
          Let z = x + n.
          ...
        %Define
    
    Information about pattern and expand definitions is stored in a table. This modification involves adding pattern and expand declarations to a local table for each declaration, and inspecting that local table when looking for a pattern or expand definition. The semantic processing is fairly complex, and the expression tree needs to be scanned more than once. Each time, during scanning, local tables need to be regenerated based on what is in the expression. The pattern and expand definitions will need to be stored explicitly in the EXPR data structure.

    This involves changes in expression processing (probably in several modules, but especially in src/exprs/t_exprutil.c) and in pattern match and expand substitution (src/patmatch/t_translt.c). Local tables are handled in src/tables/src/t_loctbl.c.


Navigating around the compiler

The compiler and interpreter are available as source code. Get file astarte-0-11-3.tar.gz. Put it into a subdirectory in your account, and pull it apart:

  gunzip ast*.gz
  tar xvf ast*.tar
You should get directories called ast, htdoc, messages, src and test. Directory ast/ contains the library. Directory messages/ contains strings that the compiler and interpreter print, in a collection of files. Directory src/ contains the C source code. Directory test/ contains some test files. You can run them by going to directory test and doing a make. To rerun them, do make clean and make again.

After unpacking, edit the following files.

  1. src/misc/config.h: Define BASE_DIR to be the directory where you have put the source files.

  2. src/misc/options.h: Comment out the line #include "../misc/prod.h" and uncomment the line #include "../misc/debug.h".

  3. src/exec/Makefile: Define TOPDIR to be the directory where the source is put and define BINDIR to be the directory where you want the binaries put. Change the definition of CFLAGS to be
        CFLAGS = -g -W -Wshadow -Wcomment -Wformat -Wimplicit -Wreturn-type
    
After making these changes, build the compiler by setting your current directory to src/exec and performing command
    make

Modify your PATH variable to contain the directory where you have put the binaries. (You might, for example, choose to put them in a directory called bin that you create beside the src, ast, test, etc. directories.)

The source files should initially be read only. Change a file's permission to read/write if you need to edit it. When you are done you will be able to see which files you modified simply by seeing which ones have write permission.

Files Compiler.txt and Interpreter.txt describe the anatomy of the compiler and interpreter, respectively. You should be able to find the files that you need, and avoid reading things that are irrelevant or uninteresting to you.

Ask questions when you need to. I expect to get questions.

Reference counts

The compiler uses reference counts for memory management. For example, each EXPR node has a ref_cnt variable that tells how many pointers are pointing to it. You need to handle reference counts. Thre is a simple recipe for that.

  1. When you create a pointer p to an EXPR node, call bump_expr(p) to add 1 to the reference count of that node.

  2. When you lose a pointer p to an EXPR node for any reason at all, call drop_expr(p). The pointer might, for example, be a variable that is going out of scope. Function drop_expr decreases the reference count of a node and frees that node if the reference count goes to 0. Function drop_expr also knows to do recursive drops for all pointers contained in any node that is being destroyed.

  3. If you want to change the value of a pointer variable that points to an EXPR node, use SET_EXPR instead of an assignment. SET_EXPR(p,x) does the following.

         bump_expr(x);
         drop_expr(p);
         p = x;
    

Handling strings

Strings are kept in hash tables. That way, there is one permanent copy of each string that the compiler has seen. That makes it easy to put strings in data structures, without worrying about ever deallocating the strings.

You will probably want to use the identifier table, which contains strings that are identifiers. Function id_tb(s) takes any string s and returns a pointer to the official copy of s in the identifier table. (If there is not an entry yet, it creates one.) You can then use that pointer without worrying about whether it will ever change or be deallocated. Function id_tb is defined in file src/tables/t_strtbl.c.

If you create an EXPR node with the name of an identifier in it, then be sure that the identifier is the official copy, as returned by id_tb. All pointers in the str_tb array are official, so you do not need to call id_tb on them. The str_tb array is defined in src/standard/s_stdids.c.

Never change a string that is in the identifier table. They are constants. Do not even change them temporarily.