5.12.1. Null-Terminated Strings


Null-terminated strings

You can store a string in an array of characters. But remember that, in general, you cannot find out how large an array is by looking at the array, since an array is passed around as just a pointer to the first thing in a chunk of memory.

To avoid that unpleasant issue, a null-terminated string is an array of characters that includes a null character ('\0') as an end marker. For example, an array s containing five characters

  s[0] = 'g'
  s[1] = 'o'
  s[2] = 'a'
  s[3] = 't'
  s[4] = '\0'
represents the string "goat". The null character is not part of the string, but is only a marker letting you know where the string ends.

You can pass a null-terminated string to a function without passing a separate size, because the function can find out how long the string is by looking for the null character.
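As a small sketch (the names goat1, goat2 and lastChar are illustrative, not library names), here are two ways to build the string "goat" from the example above, and a function that works on a string without being told its size:

```cpp
#include <cstddef>

// Two ways to build the same null-terminated string "goat".
// Both arrays hold five characters: four letters plus '\0'.
char goat1[] = {'g', 'o', 'a', 't', '\0'};
char goat2[] = "goat";   // the compiler adds the '\0' for you

// Return the last character of null-terminated string s.
// Assumes s is nonempty. No size is passed in: the loop
// finds the end by looking for the null character.
char lastChar(const char* s)
{
  std::size_t k = 0;
  while(s[k + 1] != '\0')
  {
    k++;
  }
  return s[k];
}
```

lastChar(goat1) and lastChar(goat2) both yield 't'.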


String constants

A string constant such as "some text" is a null-terminated string. So it is an array of characters, with a null character at the end.

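Strictly speaking, a string constant is an array of constant characters (for example, "lotus" has type const char[6], counting the null character), but it is automatically converted to type const char* wherever a pointer is needed. For example,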

  const char* flower = "lotus";
makes flower point into the static area of memory, where null-terminated string "lotus" is stored.
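To contrast a pointer to a string constant with an array that holds its own copy of the characters, here is a minimal sketch (flowerCopy and capitalizeCopy are hypothetical names):

```cpp
// flower points to a string constant, which must not be modified.
const char* flower = "lotus";

// An array initialized from a string constant gets its own
// copy of the characters (six of them, counting '\0'),
// so those characters can be changed.
char flowerCopy[] = "lotus";

void capitalizeCopy()
{
  // flower[0] = 'L';   // would not compile: flower points to const characters
  flowerCopy[0] = 'L';  // fine: flowerCopy is an ordinary array
}
```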

If you want to include an end-of-line character in a string constant, use \n. A string constant is not allowed to have an actual line break in it. But if you write two or more string constants in a row, they are automatically combined into a single string constant. For example,

  const char* message = "This is a multiline\n"
                        "message for you\n";
creates a string constant with two lines.

Note. Combination of consecutive strings only works for string constants, not for other expressions that represent strings.
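The combination happens when the program is compiled, so the result is one ordinary string constant. A sketch (sameMessage and messagesMatch are illustrative names):

```cpp
#include <cstring>

// The compiler joins these two string constants into one,
// so message points to a single null-terminated string.
const char* message = "This is a multiline\n"
                      "message for you\n";

// Equivalent to writing one long constant.
const char* sameMessage = "This is a multiline\nmessage for you\n";

// Juxtaposing variables does not work:
//   const char* a = "snow";
//   const char* b = "ball";
//   const char* c = a b;   // error: a and b are not string constants

// Returns true: both pointers refer to strings with the same characters.
bool messagesMatch()
{
  return std::strcmp(message, sameMessage) == 0;
}
```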


Operations on null-terminated strings

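The following are available if you #include <cstring>. Type size_t is an unsigned integer type used for sizes and lengths. It is not necessarily the same as unsigned int; on many 64-bit systems it is unsigned long.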

size_t strlen(const char* s)

strlen(s) returns the length of null-terminated string s. The length does not count the null character. For example, strlen("rabbit") = 6.

Note. strlen finds the length by scanning through the array looking for the null character. So it takes time that is proportional to the length of the string. Avoid computing strlen(s) over and over for the same string s in the same function.
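The difference between the length of a string and the size of the array that holds it shows up in a short sketch (buf and appendBang are illustrative names, not library functions):

```cpp
#include <cstring>

// A 20-byte array that currently holds the 2-character string "hi".
// std::strlen(buf) is 2: only characters before '\0' are counted.
// sizeof(buf) is 20: the size of the whole array.
char buf[20] = "hi";

// Append '!' to the string in buf. This changes the length of the
// string, not the size of the array. Assumes buf has room for one
// more character plus the null character.
void appendBang()
{
  std::size_t len = std::strlen(buf);  // computed once, used twice
  buf[len] = '!';
  buf[len + 1] = '\0';
}
```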


int strcmp(const char* s, const char* t)

Do not compare strings using ==. If s and t have type char*, then expression s == t is true only if s and t are the same pointer, that is, if they hold the same memory address. It does not look at the characters in the strings.

Function strcmp compares strings s and t for alphabetical ordering or for equality. strcmp(s,t) returns an integer r with the following properties.

r < 0 if s comes before t
r = 0 if s and t are equal
r > 0 if s comes after t
For example, strcmp("cat", "cab") > 0 since "cat" comes after "cab" in alphabetical order.

Alphabetical ordering is determined by character codes. In ASCII, 'Z' is 90 and 'a' is 97, so 'Z' comes before 'a' in the alphabetical ordering used by strcmp. In fact, every uppercase letter comes before every lowercase letter.
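Here is a small sketch contrasting == with strcmp (a, b, samePointer, sameText and comesBefore are illustrative names). The arrays hold the same characters but live at different addresses:

```cpp
#include <cstring>

// Two separate arrays holding the same characters.
char a[] = "cat";
char b[] = "cat";

// Compares addresses only, never the characters.
bool samePointer(const char* s, const char* t)
{
  return s == t;
}

// Compares the characters themselves.
bool sameText(const char* s, const char* t)
{
  return std::strcmp(s, t) == 0;
}

// True if s comes before t in strcmp's (character-code) ordering.
bool comesBefore(const char* s, const char* t)
{
  return std::strcmp(s, t) < 0;
}
```

samePointer(a, b) is false even though sameText(a, b) is true, and comesBefore("Zebra", "apple") is true because 'Z' has a smaller code than 'a'.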


int strcasecmp(const char* s, const char* t)

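Like strcmp, but ignores the case of letters, so 'r' and 'R' are treated as the same character. Note that strcasecmp is not part of the C++ standard library; it is a POSIX function declared in <strings.h>, available on Linux and similar systems.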
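A minimal sketch, assuming a POSIX system where <strings.h> provides strcasecmp (equalIgnoringCase is a hypothetical helper name):

```cpp
#include <strings.h>  // POSIX header for strcasecmp (not standard C++)

// True if s and t are equal when the case of letters is ignored.
bool equalIgnoringCase(const char* s, const char* t)
{
  return strcasecmp(s, t) == 0;
}
```

With case ignored, "apple" comes before "Zebra", the reverse of what strcmp reports.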

char* strcpy(char* dest, const char* src);

strcpy(dest, src) copies null-terminated string src into array dest and null-terminates dest. The caller must ensure that there is enough room in array dest for the entire string plus the null character at the end.

The return value of strcpy(dest, src) is dest.
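One way to respect the "enough room" rule is to check the length first, remembering to count the null character. This is a sketch; dest and copyIfFits are illustrative names:

```cpp
#include <cstring>

// dest has room for 15 characters plus the null character.
char dest[16];

// Copy src into dest only if it fits, counting the null character.
// Returns dest on success, or nullptr if src is too long.
char* copyIfFits(const char* src)
{
  if(std::strlen(src) + 1 > sizeof(dest))
  {
    return nullptr;                // copying would overflow dest
  }
  return std::strcpy(dest, src);   // strcpy returns dest
}
```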


char* strncpy(char* dest, const char* src, size_t n);

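Like strcpy(dest, src), but at most n characters are copied, where n is normally the size of array dest. If src has fewer than n characters, the rest of dest is filled with null characters. If there is not room (src has n or more characters), no null character is stored in dest, so dest is not null-terminated.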
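Because strncpy can leave dest without a null character, a common pattern is to copy at most size - 1 characters and store the null character yourself. A sketch (safeCopy is a hypothetical name):

```cpp
#include <cstring>

// Copy at most size - 1 characters of src into dest and always
// null-terminate, since strncpy does not do that when src is too long.
// Assumes size >= 1 is the size of array dest.
void safeCopy(char* dest, const char* src, std::size_t size)
{
  std::strncpy(dest, src, size - 1);
  dest[size - 1] = '\0';
}
```

For example, copying "rabbit" into a 4-character array leaves "rab".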

char* strcat(char* dest, const char* src);

Copy string src to the end of the string in array dest, adding a null character to the end. Then return dest.

Be careful. strcat is not a concatenation function. For example,

  char* c = strcat(a,b);
does not just set c to the concatenation of string a followed by b. It adds b to the end of the string in array a, so the contents of array a change. Also, there must be enough room in a to add b; strcat will not allocate more room.

C++ is not Java.
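To see that strcat modifies its first argument rather than building a new string, consider this sketch (a, b and joinInPlace are illustrative names):

```cpp
#include <cstring>

// a must be big enough for the result:
// "snow" + "ball" + '\0' needs 9 characters, and a has 16.
char a[16] = "snow";
const char* b = "ball";

// strcat changes array a; c is just another pointer to a.
char* joinInPlace()
{
  char* c = std::strcat(a, b);
  return c;
}
```

After the call, a itself holds "snowball", and the returned pointer is a.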


char* strncat(char* dest, const char* src, size_t n);

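Like strcat(dest, src), but at most n characters are copied from src. A null character is always stored after them, so array dest needs room for strlen(dest) + n + 1 characters.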
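Since strncat always adds a null character, a bounded append must leave room for it. A sketch, where size is assumed to be the size of array dest and appendSome is a hypothetical name:

```cpp
#include <cstring>

// Append as much of src as fits onto the string in dest, where
// size is the size of array dest. strncat always adds '\0', so
// at most size - strlen(dest) - 1 characters can be copied.
// Assumes dest already holds a null-terminated string that fits.
void appendSome(char* dest, const char* src, std::size_t size)
{
  std::size_t used = std::strlen(dest);
  std::strncat(dest, src, size - used - 1);
}
```

With an 8-character array holding "cat", appending "erpillar" gives "caterpi": four characters fit, plus the null character.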

char* strchr(const char* s, int c)

Return a pointer to the first occurrence of character c in null-terminated string s. If there is no such character, strchr(s, c) returns NULL.

The type of strchr is a poor one. Since parameter s has type const char*, strchr should not be able to return a non-const pointer into s, since that gives you a back-door way to modify a constant string. However, the library designers really wanted to provide two different functions:

  const char* strchr(const char* s, int c);
  char* strchr(char* s, int c);
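For a C program, that would require two different function names, which would likely cause problems for legacy programs. C++ allows overloading, so the C++ version of <cstring> declares both of the functions above in place of the single C declaration.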
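A common use of strchr is to find where a character occurs, using pointer subtraction to turn the result into an index. In C++, passing a const char* selects the overload that returns const char*. A sketch (dotPosition is a hypothetical name):

```cpp
#include <cstddef>
#include <cstring>

// Return the index of the first '.' in s, or -1 if there is none.
std::ptrdiff_t dotPosition(const char* s)
{
  const char* p = std::strchr(s, '.');
  if(p == nullptr)
  {
    return -1;
  }
  return p - s;  // pointer difference = index of the '.'
}
```

dotPosition("notes.txt") is 5, and dotPosition("README") is -1.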


char* strstr(const char* haystack, const char* needle)

Return a pointer to the first occurrence of substring needle in string haystack, or return NULL if there is none. Both haystack and needle are null-terminated strings.
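strstr is used the same way: the returned pointer, minus haystack, is the index where needle starts. A sketch (findSubstring is a hypothetical name). Note that an empty needle matches at the very beginning:

```cpp
#include <cstddef>
#include <cstring>

// Return the index where needle first occurs in haystack, or -1.
std::ptrdiff_t findSubstring(const char* haystack, const char* needle)
{
  const char* p = std::strstr(haystack, needle);
  return (p == nullptr) ? -1 : p - haystack;
}
```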


Examples

For illustration, here are implementations of strlen, strcmp and strncat. The implementation of strncat is from the manual page for strncat. (You can see it with the command man strncat in Linux.)

  size_t strlen(const char* s)
  {
    size_t k;
    for(k = 0; s[k] != '\0'; k++) {}
    return k;
  }

  int strcmp(const char* s, const char* t)
  {
    int k = 0;
    while(s[k] != '\0' && s[k] == t[k])
    {
      k++;
    }

    // We want to return s[k] - t[k], but be
    // sure to treat the characters as numbers
    // from 0 to 255, not as numbers from
    // -128 to 127.

    int sk = (unsigned char) s[k];
    int tk = (unsigned char) t[k];
    return sk - tk;
  }

  char* strncat(char *dest, const char *src, size_t n)
  {
    size_t dest_len = strlen(dest);
    size_t i;

    for (i = 0 ; i < n && src[i] != '\0' ; i++)
    {
      dest[dest_len + i] = src[i];
    }
    dest[dest_len + i] = '\0';
    return dest;
  }
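In the same style, here is a sketch of how strcpy could be written. It is named myStrcpy (a hypothetical name) so that it does not clash with the library's strcpy, and it is not claimed to be how any particular library does it:

```cpp
#include <cstddef>

// Copy null-terminated string src, including its null character,
// into dest. The caller must make sure dest is large enough.
char* myStrcpy(char* dest, const char* src)
{
  std::size_t k = 0;
  while(src[k] != '\0')
  {
    dest[k] = src[k];
    k++;
  }
  dest[k] = '\0';   // copy the end marker too
  return dest;
}
```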

Exercises

  1. What is strlen("frog" + 1)?

  2. The following function is intended to return a copy of a null-terminated string.

      char* copyString(const char* s)
      {
        char* t = new char[strlen(s)];
        strcpy(t, s);
        return t;
      }
    
    There is a serious problem with it. Why doesn't it work?

  3. Write a function that returns the number of occurrences of character 'a' in a given null-terminated string.

  4. Function numNonblanks(s) is intended to return the number of nonblank characters in null-terminated string s. Look at the following definition of numNonblanks.

    int numNonblanks(const char* s)
    {
      int count = 0;
    
      for(int i = 0; i < strlen(s); i++)
      {
        if(s[i] != ' ')
        {
          count++;
        }
      }
      return count;
    }
    
    That definition is not a very good one. Why not?

  5. Write a function removeBlanks(s) that allocates space for a null-terminated string in the heap, copies all nonblank characters in null-terminated string s into that space, null-terminates the new array and returns a pointer to that array. Do not allocate more room than is needed.