Abbreviated Description of Astarte Abstract Machine

Only some of the instructions are shown here. See full language for a more complete (but still not quite finished) description.

 

Architecture

The abstract machine maintains a collection of data structures that describe the program and its state. This section describes those data structures.

The abstract machine supports multiple threads. Some structures are global, shared by all threads, while others are local to a particular thread. Some data structures are local to a particular frame of the call stack, representing one function call.

Those whose names are shown in blue are generally more important, and those in dark red less important to know about.

Major Thread-Local Data Structures
(Each thread has its own copy of these)
Structure Description
Data stack

The data stack is a stack where data items are pushed and popped during computation. Most of the machine instructions get their inputs from the top of this stack, and push their results onto this stack. Parameters to functions are pushed here, and function results are left on this stack.

Type stack

Programs can produce complex types. The type is a stack where types are pushed and popped while building types and taking them apart.

Type store

This is an array where types are stored by index. (You get a type by specifying an index, such as 0 or 1.) Some instructions store types into this array, and others fetch types out of this array. The type store also keeps track of the index of the first unoccupied spot in the array, so an instruction can ask for a type to be put into this first unoccupied index.

This array is not used for long-term storage, but is for temporary storage during the construction of types. It allows you to build a type and use that type in several places. The values of type variables are typically stored here.

Call stack

The call stack holds information about functions that have called another function and are waiting for it to return. This is the usual run-time stack. Each node in the call stack remembers information such as an environment that can be restored when the function is resumed.

Frame-Local Data Structures
(Each frame of the call stack has its own copy of these)
Structure Description
Environment

A function can have local variables, and the bindings of those variables are held in the environment.

The environment is chain of arrays. The first array in the chain is the most local environment, storing bindings of variables in the most local scope. The next array in the chain is for the scope that statically encloses the most local scope. As you go deeper into the chain, you reach more and more outward in lexical scope.

The last array in this chain is called the static environment, which is a cache where globally defined items that are needed by a particular function are stored. Some types can also be stored here. When a function is created, it grabs some globally defined data items and types that it will need and stores them in the static environment so that it does not need to ask for them repeatedly from the global tables (which are more expensive to use than the static environment), or does not need to rebuild the types each time they are needed.

Each individual environment in chain is an array indexed from 0. So the data items are referred to as item 0, item 1, etc. Each identifier has an associated index. For example, identifier x might be found at index 1 in the environment. Instructions are available for fetching and storing items by index.

Type variable binding stack

Occassionally a thread needs to bind type variables. Each member of the type variable binding stack is a table giving bindings of type variables for this thread. The top table is the active one, and other tables in the stack can be restored to active status later.

Each table also holds a collection of constraints that indicate relationships among variables and species. The constraints limit the bindings that variables can take on. A constraint has the form A >= B, where A and B are (polymorphic) species.

Instruction counter

The instruction counter tells the address of the next instruction that the current frame will perform.

Name

The name indicates which function is running in this frame.

Global Data Structures
Structure Description
Polymorphic global definition table

This is a table where definitions are kept. Only definitions done by declarations are stored in this table. Each definition is a triple (name, type, how-to-compute) indicating a binding of a given name with a given polymorphic type, and a method of computing the item that is defined.

There can be several definitions for each name. There can even be several that have the same name and the same type. The bindings of a given name are kept as an ordered list. A binding is looked up by providing a (name, type) pair. The first binding for this name that matches the desired type is selected.

Monomorphic global definition table

The monomorphic global table is a cache for information found from the polymorphic global table. When a binding of a name with a specific type is found from the polymorphic global table, that binding is is stored, with its type, in the monomorphic global table. Later, when it is needed again with the same species, it is looked up in this table.

 

 

General Issues

There are a few issues that you need to be aware of in order to understand how the abstract machine works, and what its instructions do.

Important Issues
Numbers and Characters

Numbers are stored in three representations: as integers, as rational numbers (a quotient of two integers) and in floating point form. Numbers can have arbitrary precision. The precision of floating point numbers is controlled by box precision.

The integer representation is universal; all instructions that need numbers, of any kind, can understand integer representation. The rational representation can also be used where a real number is needed.

There are four standard numeric species. Species Natural (nonnegative integers) and Integer (all integers) use exactly the same representation for nonnegative integers. As a consequence, any instruction that works on an integer will also work on a natural number. In some places, the documentation indicates that a natural number is required. That just means that the number must be nonnegative.

Characters are integers from 0 to 255. (The Ascii alphabet is currently used.) Instructions that work on integers or produce integers in this range can be used for characters as well.

Numbers are stored internally in base 216. This is important to a few of the instrutions, where it is called the internal base.

Strings

A string is a list of characters, and has species List(Char). Any instruction that works on general list or lists of integers also works on strings. Any instruction that produces a list of integers in the required range for characters can be used to produce a string.

Tagged values

Some values are tagged with a species. They are indicated in the form (X:T) where X is the value and T is the species. These are represented as structures holding both the value and the species.

Operations that work on numbers can handle tagged numbers. Operations that require characters or lists of characters require that the characters not be tagged.

Some values are tagged with a nonnegative integer. They are indicated in the form (X:n). An item that is tagged with an integer tag can get another integer or species tag attached to it. For example, ((X:2):5) is acceptable.

Promises and lazy evaluation

A data item can be stored as a promise, indicating a deferred computation that will be done when its result is needed. Some instructions need to evaluate those promises, either fully or partially. As a result, evaluating an instruction might cause a side computation to be performed before it returns. In effect, instructions can do implicit calls to functions. So a short and simple instruction like an add instruction does not always complete quickly, and could perform an arbitrarily long side computation in order to evaluate the numbers to be added.

A promise remembers the state and other thread-local data structures that were in effect when the promise was created. When the promise is evaluated, those data structures are restored. In effect, creating a promise also creates a thread that will (if necessary) evaluate the promise. So the contents of nonshared boxes will be the same as they would be if the eager evaluation had been used. The values of shared boxes, however, are part of the global information. The contents that a promise sees for them can depend on when the promise is evaluated.

Computations can fail, and it is possible for a deferred computation to fail to produce a value. In that case, the fact that the computation failed is remembered, and each subsequent attempt to evaluate the promise will immediately lead to the same kind of failure.

Types

The abstract machine is partly typed, but partly untyped. Many of the instructions work in an untyped manner. But some of the instructions handle types explicitly, so that this abstract machine provides support for run-time type checking.

A type is called a species. A type constructor is called a parameterized species or a family.

See here for details about species in the abstract machine.

 

 

Program Format

Program formats

There are three formats in which you might see an abstract machine program.

  1. There is an assembler that converts a symbolic program into a binary program. The assembler format is the form recognized by the assembler. When you look at an instruction (using one of the links below) you will see the assembler form.

    The assembler is line oriented, so each instruction occupies one line. A % sign on a line indicates that the rest of that line is a comment.

  2. The assembler writes the program into a binary file. The byte-code format is the form of the program in that binary file.

    Each instruction in the byte code has a number. When the byte code format is described, instruction numbers are given by their names, such as LET_I, instead of by their numbers, such as 72. The values of the instruction codes are defined in file instruc.h. It is not a good idea to depend on them, though, since they can change when the machine version changes.

    The byte code form of an instruction is described on a separate page from the assembler form, and is accessible via a link from the description of the assembler form.

  3. When the interpreter reads in a byte code file, it stores the program into the program area. It makes some changes to the format, storing a program in internal format. This format is only of interest if you look at the interpreter source code.

Package heading

A package has an interface part and an implementation part. You can select a name for each part, or use the same name for both parts.

A package must begin with a heading that provides the name(s) of the package and the abstract machine version that it uses.

  1. In assembler format, the package heading has the following form

        .package name1 name2
        .version v
      
    name1 is the name of the interface package, and name2 is the name of the implementation package. If there is only one package, then name2 is omitted. v is a decimal integer that is the byte code version. For example, the first two lines might be
         .package Widget WidgetImp
         .version 41
      

  2. In the byte-code version, the heading has the form

        @(#)Astarte byte code version v0name10name20
      
    where v is the version (in decimal) and 0 indicates a null byte. If name2 is omitted, then there should be two null bytes in a row. So name2 is an empty string in that case.

The rest of the package

The remainder of the package, after the heading, consists of a sequence of declarations, as described in the declaration section below. The package must end on an @END-PACKAGE instruction. Any bytes after the @END-PACKAGE are ignored.

Numbers in the byte code

Some of the instructions contain integers as part of the instruction. Numbers in a program are represented in one of three forms: (1) one byte, (2) three bytes or (3) multiple bytes. Single byte integers are unsigned, from 0 to 255. Three byte integers can be either signed or unsigned, depending on the instruction, and can range from 0 to 2^24-1 (unsigned) or from -223 to 223-1 (signed). Multibyte integers can be arbitrarily large, and signed or unsigned.

In the assembler format, all numbers are written in decimal.

In the byte-code format, three byte numbers are stored low order byte first in biased form. To store n, the three byte integer is 0x800000 + n as an unsigned three byte integer.

 

 

Declarations

Labels and Names
Declaration Purpose
@LABEL n Make it possible to refer to what is defined in the next declaration by a numeric label.
@ID name Define a name for a data item
@NAME name Define a name that does not refer to a data item

Defining Constants
Declaration Purpose
@STRING "str" Define a string constant
@INT n Define an integer constant

Making Definitions and Performing Actions
Declaration Purpose
@EXECUTE Evaluate some instructions.
@DEFINE Define an named data item.

Miscellaneous
Declaration Purpose
@IMPORT path Import file path.
@BEGIN-IMPLEMENTATION End the interface part of the package and begin the implementation.

 

 

Instructions for Building and Taking Apart (Structured) Types

Building Types
Instruction Purpose
STD-SPECIES name Push a built-in primary species or primary parameterized species onto the type stack.
STD-SPECIES-VAR name Push a new variable that ranges over the genus or parameterized genus name.
FUNCTION-SPECIES Create a function species.
PAIR-SPECIES Create a cartesian product species.
LIST-SPECIES Create a list species.
FAM-MEM-SPECIES Apply a parameterized species as a species constructor.

Taking Types Apart
Instruction Purpose
UNPAIR-SPECIES Take apart a function, cartesian product or constructed type.
HEAD-SPECIES Get part of a function, cartesian product or constructed type.
TAIL-SPECIES Get part of a function, cartesian product or constructed type.
POP-SPECIES Pop the top thing off the type stack.
SWAP-SPECIES Swap the top two things on the type stack.

Using the Type Store and Static Environment
Instruction Purpose
STORE-SPECIES Pop the top thing from the type stack and store it at the next available place in the type store.
STORE-AND-LEAVE-SPECIES Store the top thing from the type stack into the next available place in the type store, but leave it on the type stack.
GET-SPECIES n Push the thing that was stored at index n in the type store onto the type stack.
GLOBAL-FETCH-SPECIES n Push the thing that was stored at index n in the current function's static environment onto the type stack.

 

 

Building the Static Environment

These instructions should only be used in parts where a static environment is being built.

Building the Static Environment
Instruction Purpose
GET-GLOBAL-ID L Get the binding of an identifier in the global definition table and store it into the static environment.
SPECIES-ONLY Pop the type stack and store the popped type into the static environment.

 

 

Computation

Data Stack Manipulation
Instruction Purpose
DUP Duplicate the top of the data stack.
POP Pop the data stack.
SWAP Swap the top two items on the top of the data stack.

Constants
Instruction Purpose
SMALL-INT n Push a very small (one byte) integer onto the data stack.
CONST L Push the constant defined at label L onto the data stack.
ZERO Push 0 onto the data stack. The number pushed has type Natural or Integer.

Numbers and Arithmetic
Instruction Purpose
ZERO? Test whether a number is zero.
PLUS Pop two numbers and push their sum.
MINUS Pop two numbers and push their difference.
NEGATE Take the negation of a number.
TIMES Pop two numbers and push their product.
INT-DIVIDE Perform an integer division, obtaining a quotient and a remainder.
STRING-TO-INTEGER Convert a decimal number as a string to an integer.

Conversion to string
Instruction Purpose
TO-STRING Convert a number, box or function to a string.
TO-STRING-FORMATTED Convert a number, box or function to a string.

Comparisons
Instruction Purpose
COMPARE Compare two numbers, telling whether one is <, > or = to the other.
EQ Determine whether two numbers are equal.
NE Determine whether two numbers are not equal.
LE Determine whether one number is <= another.
LT Determine whether one number is < another.
GE Determine whether one number is >= another.
GT Determine whether one number is > another.

Booleans
Instruction Purpose
FALSE Push false onto the data stack.
TRUE Push true onto the data stack.
NOT Take the logical negation of the top of the data stack.
And/Or See AND-SKIP and OR-SKIP.

Lists, Strings and Ordered Pairs
Instruction Purpose
NIL Push [] onto the data stack.
NIL? Test whether the top of the data stack is [].
PAIR Pop y and x from the type stack and push an item that is either x::y or (x,y). Its type can be either a list or a product type.
HEAD Get the head of a list or the left-hand member of an ordered pair.
TAIL Get the tail of a list or the right-hand member of an ordered pair.
SPLIT Get the head and tail of a list, or the left-hand and right-hand members of an ordered pair.
LENGTH Compute the length of a list.
REVERSE Compute the reversal of a list.
APPEND Compute the concatenation of two lists.
SUBSCRIPT Select a member from a list by position.
SUBLIST Get a sublist of a list, given by a start position and length.
UPTO Pop two integers n and m and push the list [m,m+1,...,n].
DOWNTO Pop two integers n and m and push the list [m,m-1,...,n].

Tagging and Untagging Items
Instruction Purpose
ATTACH-ITAG n Tag an item with an integer tag.
TEST-ITAG n Check for a certain integer tag on a tagged item.
REMOVE-ITAG Remove the tag from an item that is tagged with an integer tag.

Branches and local labels
Instruction Purpose
LOC-LABEL n Label the next instruction by local label n.
GOTO L Jump to the instruction just after local label L.
GOTO-IF-FALSE L Pop the data stack, and jump to the instruction just after local label L if the popped value is false or 0 or []. Only a Natural or Integer 0 is recognized as 0.
GOTO-IF-FALSE-LEAVE L This is similar to GOTO-IF-FALSE L, but it leaves the value that was just examined on the stack.
AND-SKIP L Look at the top of the data stack. If it is false, leave that value on the data stack and jump to local label L. If the stack top is true, then pop the stack and do not do the jump.
OR-SKIP L Look at the top of the data stack. If it is true, leave that value on the data stack and jump to local label L. If the stack top is false, then pop the stack and do not do the jump.

Functions and Function Application
Instruction Purpose
FUNCTION L Build a new function.
APPLY Apply a function to a parameter.
TAIL-APPLY Apply a function to a parameter. But pop the calling function from the run-time stack before pushing information about this new function.
RETURN Return from a function application.

Lazy Evaluation
Instruction Purpose
LAZY L Build a promise to evaluate an expression later.
TOP-FORCE Force evaluation of an item until it is at least partially known.

Using the Environment
Instruction Purpose
FETCH n Fetch a binding at index n in the closest local environment.
LONG-FETCH n k Fetch a binding at index n in the frame that is k frames offset from the closest frame in the local environment.
BIND n (name) Pop the data stack and store the popped value at index n in the closest environment frame.
BIND-AND-LEAVE n (name) Same as a STORE, but do not pop the stack. Leave the item that was stored in the environment on the stack.
DEFINE L n (name) Similar to a let, but create a promise, and store the promise in the environment.
EXIT-SCOPE n Remove all bindings after the first n indices from the closest environment frame.

Using the Static Environment and Fetching Globals
Instruction Purpose
FETCH-GLOBAL n Fetch a binding from the static environment of the current thread.
DYNAMIC-FETCH-GLOBAL Fetch a definition directly from the global definition table. The name is given by a label where the name is defined.

System Calls and Related Functions
Instruction Purpose
OPEN-OUTFILE Create a SysOutfile object that is tied to a file.
CLOSE-FILE Close a SysOutfile.
PRINT Print a list of strings on a file.
FLUSH-FILE Flush the output buffer of a SysOutfile.
OUTFILE-TO-STRING Convert a SysOutfile value to a string that describes it, by describing the file to which it is tied.
READ-FILE Get the content of a file.

Miscellaneous
Instruction Purpose
LINE n The following section of code is compiled from source line n.
NAME L Tell the interpreter what to call this section of code.
SNAME Tell the interpreter what to call this section of code.
PRIM-TRACE Set the abstract machine to trace its actions.