Characters and Strings


Characters

It is important to be able to communicate with a program. A simple way to do that is for you to type keys on a keyboard, and for the program to write information on the screen.

Letters, digits, punctuation symbols such as periods and commas, and other special characters like $ and @ are called characters. Programs treat characters as values, just like numbers and boolean values. To write a character in a program, enclose it in single-quote marks. For example, 'a' is the lower-case a character, and '3' is a digit. The single-quote key is typically next to the enter key, and it shares that key with the double-quote mark.

There is a space character (what you get when you press the space bar). Write it as a space between single-quote marks, as ' '.

There is also a newline character, which is used to indicate the end of a line. When you are typing text, you end a line by typing the enter key. But to talk about the newline character as a character value in a program, write '\n'. It is important to type it exactly like that. Use the backslash key, not the forward slash key. The backslash key is typically just above the enter key. The newline character is '\n', not '/n'.

You can give a name to a character, just like any other value. For example,

  Let space = ' '.
says that name space refers to the space character.

You can define functions that work on characters. For example, let's define a function that takes a character and tells you whether that character is a vowel. It produces a boolean value. For example, isVowel('a') is true, but isVowel('b') is false.

  isVowel(c) = (c == 'a') or (c == 'e') or (c == 'i') 
               or (c == 'o') or (c == 'u') or (c == 'y')
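
For comparison, here is a rough sketch of the same test in Java, which also writes characters in single-quote marks. The class name VowelCheck is only chosen for this illustration; Java writes "or" as ||.

```java
class VowelCheck {
    // Mirrors the isVowel definition above: true when c is one of
    // the characters a, e, i, o, u or y.
    static boolean isVowel(char c) {
        return c == 'a' || c == 'e' || c == 'i'
            || c == 'o' || c == 'u' || c == 'y';
    }

    public static void main(String[] args) {
        System.out.println(isVowel('a')); // prints true
        System.out.println(isVowel('b')); // prints false
    }
}
```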


Strings

A string is a sequence of characters. You write a string in double-quote marks instead of single-quote marks. For example, "rodeo" is a string of five characters.

You use words, phrases and sentences every day to communicate. Strings play a similar role in computer programs. They are tremendously useful for communicating between a program and a user, and are also used for storing information in a program and for communicating between different parts of a program.

Strings can contain any characters at all. For example, they can contain spaces, punctuation marks, special characters and even the newline character. Some examples of strings are "The fox and the hound", "Well, that will be $10." and "Two\n lines\n". Be sure to write double-quote marks around the string. They are required.

A string can contain any number of characters, even one or zero. String "", with no characters between the quote marks, is a string of no characters called the empty string. People do not use the empty string in everyday life, but it turns out to be very useful in computer programs.

String "A" is a string that only has one character in it. It looks like it should be the same as 'A', the upper-case A character. But in both Cinnameg and Java, "A" and 'A' are considered different values. (The first is a string that happens to contain only one character, while the second is a character, not a string.)
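
A small Java sketch can make the difference concrete. The variable names letter and word are just for illustration.

```java
class CharVersusString {
    public static void main(String[] args) {
        char letter = 'A';   // a single character value
        String word = "A";   // a string that happens to hold one character

        // A string is never equal to a character, even when the string
        // contains only that character.
        System.out.println(word.equals(letter));                  // prints false

        // The first character of the string is the character 'A'.
        System.out.println(word.charAt(0) == letter);             // prints true

        // Converting the character to a string gives an equal string.
        System.out.println(word.equals(String.valueOf(letter)));  // prints true
    }
}
```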

A string is a value. You can give it a name. For example,

  Let complaint = "I am hungry".
gives a name to a string.


Operations on strings

You are familiar with operations such as + and - on numbers, and you have seen operations on boolean values. There are also operations and functions that work on characters and strings, and an expression can have a string or character as its value. The ones described here work in Cinnameg.

Operation Description
s ++ t

Operator ++ concatenates strings, which means it glues them together into one string. For example, "abc" ++ "def" = "abcdef". No additional characters are added. For example, "the" ++ "fox" = "thefox". If you want a space, be sure to include it. For example, "the " ++ "fox" = "the fox".

You can concatenate several strings together. Expression "the" ++ " three " ++ "bears" has value "the three bears".

length(s)

Function length(s) gives you the length of string s. For example, length("happy") = 5 and length("for sure") = 8.

s _ n

You can get the first, second, third, etc. character from a string. Expression s_n produces the n-th character in string s. For example, "rabbit"_1 = 'r' and "rabbit"_5 = 'i'. Notice that the answer is a character.

(If there is no n-th character in string s, then computing s_n is an error.)

s _* [a,...,b]

Cinnameg uses a special notation to get a substring of a string. Expression s_*[a,...,b] produces the substring of string s from the a-th character to the b-th character. For example, "rabbit"_*[3,...,5] = "bbi", the third through the fifth characters of "rabbit".

(If there are not enough characters in the string to get the requested substring, nonexistent characters are just ignored. For example, "abcdef" _* [3,...,10] = "cdef", the third to sixth characters of the string, since that is all that there is.)

tail(s)

The tail of a string is the substring that contains all but the first character. For example, tail("rabbit") = "abbit".

(If s is an empty string then computing tail(s) is an error.)

$(x)

There is a special function with the peculiar name $ that converts a value to a string. For example, $(31) = "31", $(845) = "845", $(true) = "true" and $(false) = "false". When c is a character, $(c) is the string that contains only character c. For example, $('r') = "r".
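
If you also work in Java, the sketch below shows one way to get roughly the same results there. It is only an approximation under some assumptions: the helper names nth, slice and tail are made up for this illustration, and Java counts character positions from 0, so the helpers subtract 1 to match the counting-from-1 convention used above. How slice treats a starting position below 1 is a guess, since the text does not say.

```java
class StringOps {
    // s_n: the n-th character of s, counting from 1.
    // Throws an exception if there is no n-th character.
    static char nth(String s, int n) {
        return s.charAt(n - 1);
    }

    // s _* [a,...,b]: the a-th through b-th characters of s, counting from 1.
    // Positions past the end of s are ignored. Treating a < 1 as 1 is an
    // assumption of this sketch.
    static String slice(String s, int a, int b) {
        int from = Math.max(a, 1) - 1;
        int to = Math.min(b, s.length());
        return from < to ? s.substring(from, to) : "";
    }

    // tail(s): all but the first character. Throws an exception for "".
    static String tail(String s) {
        return s.substring(1);
    }

    public static void main(String[] args) {
        System.out.println("the " + "fox");          // s ++ t    prints the fox
        System.out.println("for sure".length());     // length(s) prints 8
        System.out.println(nth("rabbit", 5));        // s_n       prints i
        System.out.println(slice("abcdef", 3, 10));  // s_*[...]  prints cdef
        System.out.println(tail("rabbit"));          // tail(s)   prints abbit
        System.out.println(String.valueOf(845));     // $(x)      prints 845
    }
}
```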


Defining new functions on strings

Suppose that you want a function lastChar(s) that produces the last character in string s. Just define it in a way similar to how you would define a function that works on numbers. If string s has 5 characters, then the last character is s_5. In general,

  lastChar(s) = s _ length(s)
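
A Java version of the same idea might look like the sketch below. Because Java counts positions from 0, the last character sits at position length minus 1. The class name is just for illustration.

```java
class LastChar {
    // The last character of s. Java positions start at 0, so the last one
    // is at index s.length() - 1. Throws an exception for the empty string.
    static char lastChar(String s) {
        return s.charAt(s.length() - 1);
    }

    public static void main(String[] args) {
        System.out.println(lastChar("rabbit")); // prints t
    }
}
```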


Case study: Checking whether a given character occurs in a given string

Suppose that you want to tell whether a character occurs in a string. For example, occurs('a', "rabbit") should be true, since "rabbit" contains character 'a'. Just think about cases. A good starting point is to think about questions where the answer is obviously no or obviously yes.

First, no character occurs in an empty string. So the first case is

  case occurs(c, s) = false   when s == ""

Second, character c certainly does occur in string s if c is the first character in s. For example, occurs('r', "rabbit") is true. In general,

  case occurs(c, s) = true    when s_1 == c

At this point it might be tempting to start adding more cases such as

  case occurs(c, s) = true    when s_2 == c
  case occurs(c, s) = true    when s_3 == c
  case occurs(c, s) = true    when s_4 == c
  case occurs(c, s) = true    when s_5 == c
  ...
But there are two problems with that. First, how many cases do you list? The function is supposed to work no matter what string s is, and you do not know right now how long s will be when the function is used. Second, since this function should work for any string, it has to work for short strings. What about occurs('w', "ab")? Then s = "ab", and there is no such thing as s_3. (Trying to compute expression "ab"_3 will cause an error.) It is important not to charge ahead trying to look at characters that are not there.

So what can we do about cases like occurs('a', "rabbit"), where the character occurs in the string, but not at the beginning? It is just a matter of looking in the tail of the string. Notice that occurs('a', "rabbit") = occurs('a', "abbit") = true. That also works when the character does not occur in the string. For example, occurs('z', "rabbit") = occurs('z', "abbit") = false. So the last case (which is only used when character c is not equal to s_1) is

  case occurs(c, s) = occurs(c, tail(s))
Putting the cases together gives the entire definition of occurs.
  case occurs(c, s) = false               when s == ""
  case occurs(c, s) = true                when s_1 == c
  case occurs(c, s) = occurs(c, tail(s))
That is short and simple. Let's try a full step-by-step evaluation of occurs('b', "rabbit").
  occurs('b', "rabbit")
    = occurs('b', tail("rabbit"))   (by the third case)
    = occurs('b', "abbit")
    = occurs('b', tail("abbit"))    (by the third case)
    = occurs('b', "bbit")
    = true                          (by the second case)
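
The three cases carry over almost line for line to other languages. Here is a sketch in Java, where the cases become tests that are tried in order. The class name is made up for this illustration, and s.substring(1) plays the role of tail(s).

```java
class Occurs {
    // True if character c occurs somewhere in string s.
    static boolean occurs(char c, String s) {
        if (s.isEmpty()) {
            return false;                  // first case: nothing occurs in ""
        }
        if (s.charAt(0) == c) {
            return true;                   // second case: c is the first character
        }
        return occurs(c, s.substring(1));  // third case: look in the tail
    }

    public static void main(String[] args) {
        System.out.println(occurs('b', "rabbit")); // prints true
        System.out.println(occurs('z', "rabbit")); // prints false
        System.out.println(occurs('w', "ab"));     // prints false
    }
}
```

Notice that the empty-string test comes first, so the function never tries to look at a character that is not there.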


Defining new functions that produce strings

A function can produce a string as its answer. As a simple example, suppose that you want to repeat a string twice in a row. Say, for example, repeat("abc") = "abcabc". It is just a matter of concatenating "abc" with itself.

  repeat(s) = s ++ s

What if you want to take two strings and glue them together, but to put a space between them? Let's call the function glue, with glue("the", "horse") = "the horse". The concatenation operator ++ will do the job, as long as the space is added explicitly.

  glue(s,t) = s ++ " " ++ t
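
For comparison, the same two functions can be sketched in Java, where + does the concatenating. The class name Gluing is only for this illustration.

```java
class Gluing {
    // repeat(s): s followed by itself.
    static String repeat(String s) {
        return s + s;
    }

    // glue(s, t): s and t with one space between them.
    static String glue(String s, String t) {
        return s + " " + t;
    }

    public static void main(String[] args) {
        System.out.println(repeat("abc"));        // prints abcabc
        System.out.println(glue("the", "horse")); // prints the horse
    }
}
```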


Problems

  1. Show how to shorten the definition of isVowel by using occurs. Write the shorter definition of isVowel. (A character is a vowel if it occurs in string "aeiouy".)

  2. Write a definition of function isPrefix(s, t), which produces true if string s is a prefix of string t. For example, isPrefix("rab", "rabbit") should be true. According to our definition, a string is considered to be a prefix of itself. So isPrefix("rabbit", "rabbit") should also be true.

    (Hint: Just get a substring from the beginning of t that has the same length as s, and see if it is equal to s. Use _*.)

  3. Define function numBs(s) that produces the number of b's in string s. For example, numBs("bob") = 2. Only count lower case b's.

    (Hint. (1) An empty string has no b's in it. (2) A string that begins with a b has one more b than its tail. (3) A string that does not begin with a b has the same number of b's as its tail.)

  4. Say that string s is a substring of string t if s is found inside t, without any gaps. For example, "rack" is a substring of "cracker".

    Write a definition of a function isSubstring(s, t) that produces true if s is a substring of t. So isSubstring("zoo", "zoology") = true, but isSubstring("zoy", "zoology") = false, since the letters of "zoy" occur in "zoology" only with gaps between them.

    (Hint. Use recursion.)

    Note. You will want to use isPrefix. Just write the definition of isSubstring. For this problem, the definition of isPrefix is automatically supplied for you.

  5. Write a function capitalize(s) that produces a capitalized version of string s. For example, capitalize("dog") = "Dog".

    If string s does not begin with a lower case letter then capitalize(s) should just produce the same string s as its answer. For example, capitalize("Dog") = "Dog".

    (Hint. There is a predefined function toUpper(c) that takes a character c and produces an upper case version of c (or produces c itself if c is not a lower case letter). For example, toUpper('d') = 'D' and toUpper('D') = 'D'. Also, you will want to use $ to convert a character to a string. For example, $('D') = "D".)


Summary

A character is what you get when you type one key. Write a character in single quotes, such as 'Q'.

A string is a sequence of zero or more characters. Write a string in double quotes, such as "dragon".

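There are several predefined operations for working on strings. Some of them are illustrated as follows.

  "abc" ++ "def" = "abcdef"
  length("happy") = 5
  "rabbit"_1 = 'r'
  "rabbit"_*[3,...,5] = "bbi"
  tail("rabbit") = "abbit"
  $(31) = "31"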


Review

To develop a function definition that involves recursion, think about cases. Try to exploit solutions to smaller versions of the same kind of problem. There will always be at least one special case where you do not use the function that you are defining. Those cases are for very small values, and are usually easy to deal with.

Expect to make some errors. Everybody does. When you make them, try to find out what is wrong, and fix it.

Avoid random experimentation to fix an error. Not only is it a waste of your time, it is usually worse than doing nothing.