Prev Next

Data and Data Representation [Chapter 5]

A program needs to store information. Values are pieces of information that are fixed, and have no characteristics that vary over time. Nonvalues have time-varying characteristics.

Simple values have no visible internal structure. They are atomic. An integer is an example of a simple value.

Complex values have visible internal structure. An ordered pair (3,4) is an example of a complex value. It has a left-hand member and a right-hand member.

Symbols

A symbol is a simple value that is completely characterized by its name. For example, kangaroo might be a symbol.

Tuples

A tuple is an ordered pair, triple, etc.

  (true,"abc")
  (1,2,3)
  (56,true,false,91)
Tuples have the following characteristics.

  1. Different types of things can usually be mixed.
  2. The size of a tuple is usually known at compile time.

Lists

A list is a sequence of zero or more things, usually of the same type. I will use [...] to indicate a list. Examples:

  []
  [true]
  [2,3,4]

Representing lists

Sequential representation

You can represent a list by putting things in successive memory locations. The implicit sequencing in the memory is used to implement the sequencing of a list.

If you represent a tuple, you know how large it is. But a list needs to store its size within its representation. Somehow, you need to know where the list ends.

End marker

Initial size

Cost of operations
Efficient Inefficient
Get the k-th member Add a new member to the beginning
Compute the length (w/ initial size) Concatenate two lists

Linked representation

Cost of operations
Efficient Inefficient
Add a new member to the beginning Get the k-th member
  Compute the length
Concatenate two lists (first one short) Concatenate two lists (first one long)

Tables and records

Some programming languages provide tables with keys and values as primitive things. Python calls them dictionaries. Pascal calls them records. Usually, records require that the keys be names, and the compiler knows just which names will be present.

Trees

A tree has a root and zero or more subtrees.

The subtrees of this tree are

To represent a tree, it suffices to represent a pair consisting of the root label and a list of subtrees. You can use any representation for the pair, and any representation for the list of subtrees.

Representing lists as trees

If you have a way to represent trees, you can use that as yet another representation of lists, in more than one way.

Linked representation as a tree

Preorder traversal

Tree

can be thought of as a representation of list [3,6,4,7,9,5], the values obtained by traversing the tree in preorder. With this general idea, you can devise a way to represent lists so that concatenation takes constant time.

Tagged items

Often, you do not know at compile time what type of value you will be working with at run time. To handle that, you tag each item so that the run-time system can determine, from the tag, what kind of data is present.

Mutability

You can think of a variable as part of the control mechanism in a program. But it is often better to think of a variable as a data item.

Think of a variable as a box. You can pass the box around, put it inside another box, etc. You can also put a new thing inside the box. The box behaves like a data item that has a mutable characteristic: its content.

A variable is a first class item if it can be passed to a function, returned from a function, stored in another variable, etc.

Pointers

A pointer is a memory address. Since it refers to a place in memory, and that place can have its content changed, a pointer is similar to a box. In fact, boxes are usually representated as pointers.

Some languages (C++, for example) provide pointers as basic things in the language. Other languages (Java, for example) use pointer in their implementations, but do not provide them as things in the language itself. In Java, a variable is not a first class item.


Prev Next