26C. Algorithms on Null-Terminated Strings


Example function definitions

For illustration, here are implementations of strlen, strcmp, strcat and strncat. The implementation of strncat is taken from the manual page for strncat. (To see it, run the command man strncat in Linux.)

  //----------------------------------------------------------

  size_t strlen(const char* s)
  {
    size_t k;
    for(k = 0; s[k] != '\0'; k++) {}
    return k;
  }

  //----------------------------------------------------------

  int strcmp(const char* s, const char* t)
  {
    size_t k = 0;
    while(s[k] != '\0' && s[k] == t[k])
    {
      k++;
    }

    // We want to return s[k] - t[k], but be
    // sure to treat the characters as numbers
    // from 0 to 255, not as numbers from
    // -128 to 127.

    int sk = (unsigned char) s[k];
    int tk = (unsigned char) t[k];
    return sk - tk;
  }

  //----------------------------------------------------------

  char* strcat(char* dest, const char* src)
  {
    size_t dest_len = strlen(dest);
    size_t i;

    for (i = 0; src[i] != '\0'; i++)
    {
      dest[dest_len + i] = src[i];
    }
    dest[dest_len + i] = '\0';
    return dest;
  }

  //----------------------------------------------------------

  char* strncat(char* dest, const char* src, size_t n)
  {
    size_t dest_len = strlen(dest);
    size_t i;

    for (i = 0 ; i < n && src[i] != '\0' ; i++)
    {
      dest[dest_len + i] = src[i];
    }
    dest[dest_len + i] = '\0';
    return dest;
  }
  //----------------------------------------------------------
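
Note that the parameter n of strncat limits how many characters are taken from src. It is not the size of dest. As the definition above shows, strncat can store up to n characters followed by a null character, so dest needs room for strlen(dest) + n + 1 characters. Here is a minimal sketch of how a caller might choose n. The helper appendBounded and its parameter destSize are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>
#include <cstring>
using namespace std;

// Hypothetical helper, not part of the standard library.
// Appends as much of src to dest as fits in an array of destSize
// characters, always leaving dest null-terminated.
// Assumes that dest already holds a null-terminated string.
char* appendBounded(char* dest, size_t destSize, const char* src)
{
  size_t destLen = strlen(dest);
  assert(destLen < destSize);   // dest and its null character must fit

  // Room left for new characters, not counting the null character
  // that strncat stores after them.
  size_t room = destSize - destLen - 1;
  return strncat(dest, src, room);
}
```

If buf is an array of 8 characters holding "abc", then appendBounded(buf, 8, "defghij") leaves "abcdefg" in buf: four characters fit, and the eighth slot holds the null character.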

Looping over a null-terminated string

The above definition of strcat shows a standard way to look at each character in null-terminated string s:

  for(i = 0; s[i] != '\0'; i++)
  {
    …
  }

Sometimes students write the loop as follows instead.

  for(i = 0; i < strlen(s); i++)
  {
    …
  }

That works, but it is very slow for long strings. Notice that strlen(s) is recomputed each time the loop condition is tested, which is once for every character in s, and each call scans the entire string. That requires time proportional to n² when s has length n. If n is 1000, n² is 1,000,000.
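
If a loop really does need the length, one remedy is to compute it once, before the loop starts. The following sketch illustrates that. Function blanksToUnderscores is a hypothetical example made up for this purpose.

```cpp
#include <cassert>
#include <cstring>
using namespace std;

// Hypothetical example: replaces each blank in s by an underscore.
// strlen is called only once, so the total time is proportional
// to the length of s.
void blanksToUnderscores(char* s)
{
  assert(s != nullptr);
  size_t n = strlen(s);   // computed once, outside the loop

  for(size_t i = 0; i < n; i++)
  {
    if(s[i] == ' ')
    {
      s[i] = '_';
    }
  }
}
```

Testing s[i] != '\0' in the loop condition works equally well here and avoids the call to strlen altogether.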


Storing the null character

A null character is stored for a string constant automatically, and library functions such as strcpy and strcat store null characters. But that is as far as it goes. Looking at the definition of strcat, you can see that the null character needs to be stored explicitly. Don't forget to do that.
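
As a small sketch of that point, here is a function that builds a string one character at a time. The name fillWith is hypothetical, and the caller is assumed to supply an array with room for at least n + 1 characters.

```cpp
#include <cassert>
#include <cstring>
using namespace std;

// Hypothetical function: stores n copies of character c into dest
// and null-terminates dest.
// Assumes that dest has room for at least n + 1 characters.
void fillWith(char* dest, char c, size_t n)
{
  assert(dest != nullptr);

  for(size_t i = 0; i < n; i++)
  {
    dest[i] = c;
  }
  dest[n] = '\0';   // Without this line, dest is not a null-terminated string.
}
```

If the last assignment is left out, functions such as strlen keep reading past the n characters until they happen to find a null character somewhere in memory.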


Copying null-terminated strings

Copying a null-terminated string s involves two steps.

  1. Allocate an array (let's call it cpy) large enough to store the copy. Recall that strlen does not count the null character, but you need to store a null character into cpy. So the size of cpy needs to be strlen(s) + 1.

  2. Copy s into cpy. Function strcpy will do that job.

But you don't want to do two steps every time you need to copy a string. Make a function that is an expert on copying null-terminated strings, such as the following.

  char* copystr(const char* s)
  {
    char* cpy = new char[strlen(s) + 1];
    strcpy(cpy, s);
    return cpy;
  }
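
Because copystr allocates the copy with new[], whoever calls it is responsible for releasing the copy with delete[] when it is no longer needed. The sketch below repeats copystr so that it stands alone, and adds a hypothetical function copyIsIndependent that shows the copy has the same contents as the original but lives in different memory.

```cpp
#include <cassert>
#include <cstring>
using namespace std;

// copystr, as defined above.
char* copystr(const char* s)
{
  char* cpy = new char[strlen(s) + 1];
  strcpy(cpy, s);
  return cpy;
}

// Hypothetical demonstration: returns true if copystr(s) yields a
// string equal to s that is stored in a different array.
bool copyIsIndependent(const char* s)
{
  assert(s != nullptr);
  char* c = copystr(s);
  bool sameContents    = (strcmp(c, s) == 0);
  bool differentMemory = (c != s);
  delete[] c;   // The caller of copystr must release the copy.
  return sameContents && differentMemory;
}
```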

A slightly more involved example

Let's define a function copyLetters(dest, src) that copies all of the letters in string src into array dest, and null-terminates dest. For example, if src is "I'm happy, not sad" then string "Imhappynotsad" is stored into dest.

An important thing to notice is that a character in dest is not necessarily at the same index as the corresponding character in src. For example, the 'm' in src is at index 2, but it is at index 1 in dest.

  void copyLetters(char* dest, const char* src)
  {
    int di = 0;  // dest index

    for(int si = 0; src[si] != '\0'; si++)
    {
      if(isalpha((unsigned char)(src[si])))
      {
        dest[di] = src[si];
        di++;
      }
    }
    dest[di] = '\0';
  }
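
The caller of copyLetters must make dest large enough. Since copyLetters never stores more characters than src contains, an array of strlen(src) + 1 characters always suffices, although it can be larger than necessary. The sketch below follows the definition of copyLetters above and adds a hypothetical wrapper, lettersOnly, that allocates dest that way.

```cpp
#include <cassert>
#include <cctype>
#include <cstring>
using namespace std;

// copyLetters, following the definition above. The cast to
// unsigned char keeps the argument of isalpha in the range
// that isalpha requires.
void copyLetters(char* dest, const char* src)
{
  int di = 0;  // dest index

  for(int si = 0; src[si] != '\0'; si++)
  {
    if(isalpha((unsigned char)(src[si])))
    {
      dest[di] = src[si];
      di++;
    }
  }
  dest[di] = '\0';
}

// Hypothetical wrapper: returns a newly allocated string holding
// the letters of src. The array has room for strlen(src) + 1
// characters, which is enough even if every character is a letter.
// The caller must release the result with delete[].
char* lettersOnly(const char* src)
{
  assert(src != nullptr);
  char* dest = new char[strlen(src) + 1];
  copyLetters(dest, src);
  return dest;
}
```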

Exercises

  1. The following function is intended to return a copy of a null-terminated string.

      char* copyString(const char* s)
      {
        char* t = new char[strlen(s)];
        strcpy(t, s);
        return t;
      }
    
    There is a serious problem with it. Why doesn't it work? Answer

  2. Write a function that returns the number of occurrences of character 'a' in a given null-terminated string. Answer

  3. Suppose that numNonblanks(s) is supposed to return the number of nonblank characters in null-terminated string s. Look at the following definition of numNonblanks.

    int numNonblanks(const char* s)
    {
      int count = 0;
    
      for(int i = 0; i < strlen(s); i++)
      {
        if(s[i] != ' ')
        {
          count++;
        }
      }
      return count;
    }
    
    That definition is not a very good one. Why not? Answer

  4. Write a function removeBlanks(s) that allocates space for a null-terminated string in the heap, copies all nonblank characters in null-terminated string s into that space, null-terminates the new array and returns a pointer to that array. Do not allocate more room than is needed. Answer

  5. Does the following definition of strlen work?

      size_t strlen(const char* s)
      {
        for(const char* r = s; *r!= '\0'; r++) {}
        return r - s;
      }
    
    Answer

  6. Write a function concat(A, B) that returns the concatenation of two null-terminated strings A and B. Its parameters must be const arrays. Answer

  7. After working out the solution yourself, look at the answer to the preceding question. That is a short and clear way to write it. But notice that the strcat line starts back at the beginning of the string, and needs to search for the end, which was already known at the end of the strcpy line. Can you see how to avoid rescanning the copy of A? Answer

  8. We use positional notation to write numbers. In base 10, there is a position for 1's, a position for 10's, a position for 100's, etc., with a position for each power of 10.

    Computers also use positional notation, but they use base 2 instead of 10 (binary notation). There is a position for 1's, a position for 2's, a position for 4's, etc., with a position for each power of 2.

    Write a C++ program that reads a binary number from the standard input and writes the equivalent decimal (base 10) number on the standard output. Assume that the binary number has no more than 50 digits.

    Hints.

    1. Statement

        scanf("%s", binary);
      
      reads a string and stores the string into character array binary as a null-terminated string. (You don't add & to binary because it is already a pointer to the array where you want scanf to store the string.)

    2. To convert the binary string to an integer, loop over the null-terminated string, from beginning to end. For each bit, multiply your current number by 2 and add the next digit. That is based on the following ideas. Suppose that num(str) is the function that converts a binary string to an integer.

        2*num("1")   + 1 = num("11").
        2*num("11")  + 0 = num("110").
        2*num("110") + 0 = num("1100").
      

    3. Use type long for numbers so that you can handle 50-bit integers (on a 64-bit Linux machine, where long has 64 bits).

    Answer