Abbreviated Astarte abstract machine description

Only some of the instructions are shown here. See full language for a more complete (but still not quite finished) description.

The abstract machine maintains a collection of data structures that describe the program and its state. This section describes those data structures.

The abstract machine supports multiple threads. Some structures are global, shared by all threads, while others are local to a particular thread. Some data structures are local to a particular frame of the call stack, representing one function call.

Those whose names are shown in blue are generally more important, and those in dark red less important to know about.

Major Thread-Local Data Structures (Each thread has its own copy of these)
Structure	Description
Data stack	The data stack is a stack where data items are pushed and popped during computation. Most of the machine instructions get their inputs from the top of this stack, and push their results onto this stack. Parameters to functions are pushed here, and function results are left on this stack.
Type stack	Programs can produce complex types. The type is a stack where types are pushed and popped while building types and taking them apart.
Type store	This is an array where types are stored by index. (You get a type by specifying an index, such as 0 or 1.) Some instructions store types into this array, and others fetch types out of this array. The type store also keeps track of the index of the first unoccupied spot in the array, so an instruction can ask for a type to be put into this first unoccupied index. This array is not used for long-term storage, but is for temporary storage during the construction of types. It allows you to build a type and use that type in several places. The values of type variables are typically stored here.
Call stack	The call stack holds information about functions that have called another function and are waiting for it to return. This is the usual run-time stack. Each node in the call stack remembers information such as an environment that can be restored when the function is resumed.

Frame-Local Data Structures (Each frame of the call stack has its own copy of these)
Structure	Description
Environment	A function can have local variables, and the bindings of those variables are held in the environment. The environment is chain of arrays. The first array in the chain is the most local environment, storing bindings of variables in the most local scope. The next array in the chain is for the scope that statically encloses the most local scope. As you go deeper into the chain, you reach more and more outward in lexical scope. The last array in this chain is called the static environment, which is a cache where globally defined items that are needed by a particular function are stored. Some types can also be stored here. When a function is created, it grabs some globally defined data items and types that it will need and stores them in the static environment so that it does not need to ask for them repeatedly from the global tables (which are more expensive to use than the static environment), or does not need to rebuild the types each time they are needed. Each individual environment in chain is an array indexed from 0. So the data items are referred to as item 0, item 1, etc. Each identifier has an associated index. For example, identifier x might be found at index 1 in the environment. Instructions are available for fetching and storing items by index.
Type variable binding stack	Occassionally a thread needs to bind type variables. Each member of the type variable binding stack is a table giving bindings of type variables for this thread. The top table is the active one, and other tables in the stack can be restored to active status later. Each table also holds a collection of constraints that indicate relationships among variables and species. The constraints limit the bindings that variables can take on. A constraint has the form A `>=` B, where A and B are (polymorphic) species.
Instruction counter	The instruction counter tells the address of the next instruction that the current frame will perform.
Name	The name indicates which function is running in this frame.

Global Data Structures

Structure

Description

Polymorphic global definition table

This is a table where definitions are kept. Only definitions done by declarations are stored in this table. Each definition is a triple (name, type, how-to-compute) indicating a binding of a given name with a given polymorphic type, and a method of computing the item that is defined.

There can be several definitions for each name. There can even be several that have the same name and the same type. The bindings of a given name are kept as an ordered list. A binding is looked up by providing a (name, type) pair. The first binding for this name that matches the desired type is selected.

Monomorphic global definition table

The monomorphic global table is a cache for information found from the polymorphic global table. When a binding of a name with a specific type is found from the polymorphic global table, that binding is is stored, with its type, in the monomorphic global table. Later, when it is needed again with the same species, it is looked up in this table.

There are a few issues that you need to be aware of in order to understand how the abstract machine works, and what its instructions do.

Important Issues
Numbers and Characters	Numbers are stored in three representations: as integers, as rational numbers (a quotient of two integers) and in floating point form. Numbers can have arbitrary precision. The precision of floating point numbers is controlled by box precision. The integer representation is universal; all instructions that need numbers, of any kind, can understand integer representation. The rational representation can also be used where a real number is needed. There are four standard numeric species. Species Natural (nonnegative integers) and Integer (all integers) use exactly the same representation for nonnegative integers. As a consequence, any instruction that works on an integer will also work on a natural number. In some places, the documentation indicates that a natural number is required. That just means that the number must be nonnegative. Characters are integers from 0 to 255. (The Ascii alphabet is currently used.) Instructions that work on integers or produce integers in this range can be used for characters as well. Numbers are stored internally in base 2¹⁶. This is important to a few of the instrutions, where it is called the internal base.
Strings	A string is a list of characters, and has species List(Char). Any instruction that works on general list or lists of integers also works on strings. Any instruction that produces a list of integers in the required range for characters can be used to produce a string.
Tagged values	Some values are tagged with a species. They are indicated in the form (X:T) where X is the value and T is the species. These are represented as structures holding both the value and the species. Operations that work on numbers can handle tagged numbers. Operations that require characters or lists of characters require that the characters not be tagged. Some values are tagged with a nonnegative integer. They are indicated in the form (X:n). An item that is tagged with an integer tag can get another integer or species tag attached to it. For example, ((X:2):5) is acceptable.
Promises and lazy evaluation	A data item can be stored as a promise, indicating a deferred computation that will be done when its result is needed. Some instructions need to evaluate those promises, either fully or partially. As a result, evaluating an instruction might cause a side computation to be performed before it returns. In effect, instructions can do implicit calls to functions. So a short and simple instruction like an add instruction does not always complete quickly, and could perform an arbitrarily long side computation in order to evaluate the numbers to be added. A promise remembers the state and other thread-local data structures that were in effect when the promise was created. When the promise is evaluated, those data structures are restored. In effect, creating a promise also creates a thread that will (if necessary) evaluate the promise. So the contents of nonshared boxes will be the same as they would be if the eager evaluation had been used. The values of shared boxes, however, are part of the global information. The contents that a promise sees for them can depend on when the promise is evaluated. Computations can fail, and it is possible for a deferred computation to fail to produce a value. In that case, the fact that the computation failed is remembered, and each subsequent attempt to evaluate the promise will immediately lead to the same kind of failure.
Types	The abstract machine is partly *typed, but partly untyped. Many of the instructions work in an untyped manner. But some of the instructions handle types explicitly, so that this abstract machine provides support for run-time type checking. A type is called a species. A type constructor is called a parameterized species* or a family. See here for details about species in the abstract machine.

Program formats

Package heading

A package has an interface part and an implementation part. You can select a name for each part, or use the same name for both parts.

A package must begin with a heading that provides the name(s) of the package and the abstract machine version that it uses.

The rest of the package

The remainder of the package, after the heading, consists of a sequence of declarations, as described in the declaration section below. The package must end on an @END-PACKAGE instruction. Any bytes after the @END-PACKAGE are ignored.

Numbers in the byte code

Some of the instructions contain integers as part of the instruction. Numbers in a program are represented in one of three forms: (1) one byte, (2) three bytes or (3) multiple bytes. Single byte integers are unsigned, from 0 to 255. Three byte integers can be either signed or unsigned, depending on the instruction, and can range from 0 to 2^24-1 (unsigned) or from -2²³ to 2²³-1 (signed). Multibyte integers can be arbitrarily large, and signed or unsigned.

In the byte-code format, three byte numbers are stored low order byte first in biased form. To store n, the three byte integer is 0x800000 + n as an unsigned three byte integer.

Labels and Names
Declaration	Purpose
@LABEL n	Make it possible to refer to what is defined in the next declaration by a numeric label.
@ID name	Define a name for a data item
@NAME name	Define a name that does not refer to a data item

Defining Constants
Declaration	Purpose
@STRING "str"	Define a string constant
@INT n	Define an integer constant

Making Definitions and Performing Actions
Declaration	Purpose
@EXECUTE	Evaluate some instructions.
@DEFINE	Define an named data item.

Miscellaneous
Declaration	Purpose
@IMPORT path	Import file path.
@BEGIN-IMPLEMENTATION	End the interface part of the package and begin the implementation.

Building Types
Instruction	Purpose
STD-SPECIES name	Push a built-in primary species or primary parameterized species onto the type stack.
STD-SPECIES-VAR name	Push a new variable that ranges over the genus or parameterized genus name.
FUNCTION-SPECIES	Create a function species.
PAIR-SPECIES	Create a cartesian product species.
LIST-SPECIES	Create a list species.
FAM-MEM-SPECIES	Apply a parameterized species as a species constructor.

Taking Types Apart
Instruction	Purpose
UNPAIR-SPECIES	Take apart a function, cartesian product or constructed type.
HEAD-SPECIES	Get part of a function, cartesian product or constructed type.
TAIL-SPECIES	Get part of a function, cartesian product or constructed type.
POP-SPECIES	Pop the top thing off the type stack.
SWAP-SPECIES	Swap the top two things on the type stack.

Using the Type Store and Static Environment
Instruction	Purpose
STORE-SPECIES	Pop the top thing from the type stack and store it at the next available place in the type store.
STORE-AND-LEAVE-SPECIES	Store the top thing from the type stack into the next available place in the type store, but leave it on the type stack.
GET-SPECIES n	Push the thing that was stored at index n in the type store onto the type stack.
GLOBAL-FETCH-SPECIES n	Push the thing that was stored at index n in the current function's static environment onto the type stack.

These instructions should only be used in parts where a static environment is being built.

Building the Static Environment
Instruction	Purpose
GET-GLOBAL-ID L	Get the binding of an identifier in the global definition table and store it into the static environment.
SPECIES-ONLY	Pop the type stack and store the popped type into the static environment.

Data Stack Manipulation
Instruction	Purpose
DUP	Duplicate the top of the data stack.
POP	Pop the data stack.
SWAP	Swap the top two items on the top of the data stack.

Constants
Instruction	Purpose
SMALL-INT n	Push a very small (one byte) integer onto the data stack.
CONST L	Push the constant defined at label L onto the data stack.
ZERO	Push 0 onto the data stack. The number pushed has type Natural or Integer.

Numbers and Arithmetic
Instruction	Purpose
ZERO?	Test whether a number is zero.
PLUS	Pop two numbers and push their sum.
MINUS	Pop two numbers and push their difference.
NEGATE	Take the negation of a number.
TIMES	Pop two numbers and push their product.
INT-DIVIDE	Perform an integer division, obtaining a quotient and a remainder.
STRING-TO-INTEGER	Convert a decimal number as a string to an integer.

Conversion to string
Instruction	Purpose
TO-STRING	Convert a number, box or function to a string.
TO-STRING-FORMATTED	Convert a number, box or function to a string.

Comparisons
Instruction	Purpose
COMPARE	Compare two numbers, telling whether one is <, > or = to the other.
EQ	Determine whether two numbers are equal.
NE	Determine whether two numbers are not equal.
LE	Determine whether one number is <= another.
LT	Determine whether one number is < another.
GE	Determine whether one number is >= another.
GT	Determine whether one number is > another.

Booleans
Instruction	Purpose
FALSE	Push false onto the data stack.
TRUE	Push true onto the data stack.
NOT	Take the logical negation of the top of the data stack.
And/Or	See AND-SKIP and OR-SKIP.

Lists, Strings and Ordered Pairs
Instruction	Purpose
NIL	Push [] onto the data stack.
NIL?	Test whether the top of the data stack is [].
PAIR	Pop y and x from the type stack and push an item that is either x::y or (x,y). Its type can be either a list or a product type.
HEAD	Get the head of a list or the left-hand member of an ordered pair.
TAIL	Get the tail of a list or the right-hand member of an ordered pair.
SPLIT	Get the head and tail of a list, or the left-hand and right-hand members of an ordered pair.
LENGTH	Compute the length of a list.
REVERSE	Compute the reversal of a list.
APPEND	Compute the concatenation of two lists.
SUBSCRIPT	Select a member from a list by position.
SUBLIST	Get a sublist of a list, given by a start position and length.
UPTO	Pop two integers n and m and push the list [m,m+1,...,n].
DOWNTO	Pop two integers n and m and push the list [m,m-1,...,n].

Tagging and Untagging Items
Instruction	Purpose
ATTACH-ITAG n	Tag an item with an integer tag.
TEST-ITAG n	Check for a certain integer tag on a tagged item.
REMOVE-ITAG	Remove the tag from an item that is tagged with an integer tag.

Branches and local labels
Instruction	Purpose
LOC-LABEL n	Label the next instruction by local label n.
GOTO L	Jump to the instruction just after local label L.
GOTO-IF-FALSE L	Pop the data stack, and jump to the instruction just after local label L if the popped value is false or 0 or []. Only a Natural or Integer 0 is recognized as 0.
GOTO-IF-FALSE-LEAVE L	This is similar to GOTO-IF-FALSE L, but it leaves the value that was just examined on the stack.
AND-SKIP L	Look at the top of the data stack. If it is false, leave that value on the data stack and jump to local label L. If the stack top is true, then pop the stack and do not do the jump.
OR-SKIP L	Look at the top of the data stack. If it is true, leave that value on the data stack and jump to local label L. If the stack top is false, then pop the stack and do not do the jump.

Functions and Function Application
Instruction	Purpose
FUNCTION L	Build a new function.
APPLY	Apply a function to a parameter.
TAIL-APPLY	Apply a function to a parameter. But pop the calling function from the run-time stack before pushing information about this new function.
RETURN	Return from a function application.

Lazy Evaluation
Instruction	Purpose
LAZY L	Build a promise to evaluate an expression later.
TOP-FORCE	Force evaluation of an item until it is at least partially known.

Using the Environment
Instruction	Purpose
FETCH n	Fetch a binding at index n in the closest local environment.
LONG-FETCH n k	Fetch a binding at index n in the frame that is k frames offset from the closest frame in the local environment.
BIND n (name)	Pop the data stack and store the popped value at index n in the closest environment frame.
BIND-AND-LEAVE n (name)	Same as a STORE, but do not pop the stack. Leave the item that was stored in the environment on the stack.
DEFINE L n (name)	Similar to a let, but create a promise, and store the promise in the environment.
EXIT-SCOPE n	Remove all bindings after the first n indices from the closest environment frame.

Using the Static Environment and Fetching Globals
Instruction	Purpose
FETCH-GLOBAL n	Fetch a binding from the static environment of the current thread.
DYNAMIC-FETCH-GLOBAL	Fetch a definition directly from the global definition table. The name is given by a label where the name is defined.

System Calls and Related Functions
Instruction	Purpose
OPEN-OUTFILE	Create a SysOutfile object that is tied to a file.
CLOSE-FILE	Close a SysOutfile.
PRINT	Print a list of strings on a file.
FLUSH-FILE	Flush the output buffer of a SysOutfile.
OUTFILE-TO-STRING	Convert a SysOutfile value to a string that describes it, by describing the file to which it is tied.
READ-FILE	Get the content of a file.

Miscellaneous
Instruction	Purpose
LINE n	The following section of code is compiled from source line n.
NAME L	Tell the interpreter what to call this section of code.
SNAME	Tell the interpreter what to call this section of code.
PRIM-TRACE	Set the abstract machine to trace its actions.

Abbreviated Description of Astarte Abstract Machine

Program formats

Package heading

The rest of the package

Numbers in the byte code