2.3. Alphabets, Symbols and Strings


Alphabets and symbols

We write using an alphabet such as the English alphabet of 26 letters. We also communicate with a computer using an alphabet, such as the ASCII or Unicode alphabet. The members of an alphabet are called symbols.

For this course, we want to consider a variety of alphabets. So we allow the notion of a symbol to be quite general. In fact,

An alphabet is a finite, nonempty set.
No matter what alphabet we are working with at any time, we call its members symbols. It is reasonable that we must have at least one symbol (so an alphabet is not allowed to be empty), and we do not have infinitely many symbols.


Strings

Strings are defined as follows.

A string over alphabet A is a finite sequence of zero or more symbols that are members of A.

For example, suppose that our alphabet is A = {a,b,c}. Then abaa is a string over alphabet A. (It is convenient not to write quotes around strings.)

It is important to notice that

strings can only be finitely long.
We will use strings as inputs and outputs of algorithms. An infinitely long string presents a problem.
If it is the input, the algorithm cannot look at the entire input.

If it is the output, the algorithm can never finish writing down the output.

Empty strings

A string can have no symbols in it. To make an empty string easier to talk about, we use symbol ε for the empty string.

Do not confuse the empty string with the empty set. One is a string, the other is a set.

Technically, we always talk about a string over a given alphabet. But in cases where the alphabet is clear from context, or where the alphabet is not important, we just talk about a string.


Repetition

If s is a string and n is a nonnegative integer, then

sn is a string consisting of n copies of s in a row.

For example, (ab)3 = ababab.

String s0 is defined to be the empty string, for every string s.


Sets of strings

If A is an alphabet, then

A* is the set of all strings over alphabet A.

For example, {a,b}* = {ε, a, b, aa, ab, ba, bb, aaa, …}.