5.15. The Preprocessor



Compilation of a C++ program goes through an initial phase, called preprocessing, where some replacements are made in the text of the program. The preprocessor makes those changes, and preprocessor directives tell the preprocessor what to do.

Strictly speaking, preprocessor directives are not C++ statements. They are instructions to the preprocessor, which runs before the rest of the compiler sees the program. They tend to look very different from the rest of C++. For example, although C++ is free form, a preprocessor directive occupies one line and begins with a # sign. Where a C++ statement, type definition, etc. typically ends with a semicolon, a directive ends at the end of its line.

It is still possible to break up long lines in preprocessor directives. If the last character in a line is \ (a backslash), then both the backslash and the newline character that follows it are ignored. So just put a \ at the end of each line but the last, and it is all read by the preprocessor as a single line.
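As a small illustration of line continuation, here is a sketch of a definition split over three lines. The names SQUARE_SUM, squareSum and checkSquareSum are made up for this example.

```cpp
#include <cassert>

// SQUARE_SUM is a hypothetical macro written across three physical lines.
// Each backslash must be the very last character on its line.
#define SQUARE_SUM(a, b)   \
    (((a) * (a)) +         \
     ((b) * (b)))

// Hypothetical helper that uses the macro like any one-line definition.
int squareSum(int a, int b)
{
    return SQUARE_SUM(a, b);
}

// Small self-check: 3*3 + 4*4 is 25.
void checkSquareSum()
{
    assert(squareSum(3, 4) == 25);
}
```

A space or tab after a backslash would stop it from being the last character, which is a common cause of puzzling errors in multi-line directives.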


Includes

Directives

  #include <file>
and
  #include "file"
tell the preprocessor to insert the contents of the named file into the program at this point. The file's text is treated as if it had been typed here (except that the preprocessor also records where the text came from, so that the compiler can report the right file and line number if there is an error).

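The first form, with <file>, tells the preprocessor to search the directories that hold system header files, including those for the standard library (by default). The second form, with "file", tells the preprocessor to look first in a local directory, typically the one that holds the file containing the #include (by default). If the file is not found there, the search continues as for the first form.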


Symbol definitions

You can tell the preprocessor to perform substitutions. For example,

  #define SHORT_MAX 0x7fff
tells the preprocessor to replace SHORT_MAX by 0x7fff, the largest value that fits in a 16-bit short. Notice that there is no = sign and no semicolon. It is just a textual substitution. For example,
  short n = SHORT_MAX;
is replaced by
  short n = 0x7fff;
The preprocessor only substitutes whole words, so a symbol that merely contains a defined name is left alone. For example, the directive
  #define FOG (backup * 3)
causes the line
  int i = FOG + DEFOG;
to be replaced by
  int i = (backup * 3) + DEFOG;
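
The whole-word rule can be checked with a short sketch. FOG comes from the example above. The functions fogPlusDefog, label and checkWholeWord are hypothetical, and backup and DEFOG are ordinary int parameters here.

```cpp
#include <cassert>

#define FOG (backup * 3)

// The line below expands to: return (backup * 3) + DEFOG;
// The letters FOG inside DEFOG are not a separate word, so they stay.
int fogPlusDefog(int backup, int DEFOG)
{
    return FOG + DEFOG;
}

// Text inside a string literal is not substituted either.
const char* label()
{
    return "FOG";
}

// Small self-check: (2 * 3) + 10 is 16.
void checkWholeWord()
{
    assert(fogPlusDefog(2, 10) == 16);
    assert(label()[0] == 'F');
}
```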

By convention, symbols defined in preprocessor directives are written in all upper case letters, so that someone reading the program can recognize them. The convention is not always followed, however, and you can define any symbol that would be an allowed C++ variable name.

If you want to remove a preprocessor definition, use #undef. For example,

  #undef FOG
removes the definition of FOG.
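
A minimal sketch of the effect of #undef follows. The value 3 and the function names are chosen only for illustration.

```cpp
#include <cassert>

#define FOG 3

// FOG is still defined here, so this returns 3.
int beforeUndef()
{
    return FOG;
}

#undef FOG

// From here on FOG is an ordinary identifier again,
// so it can even be used as a variable name.
int afterUndef()
{
    int FOG = 7;
    return FOG;
}

void checkUndef()
{
    assert(beforeUndef() == 3);
    assert(afterUndef() == 7);
}
```

The definition is removed only for the text that follows the #undef. Code that was already expanded above it is unaffected.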


Symbol definitions with parameters

You can define preprocessor symbols with parameters. For example,

  #define MUNCH(x,y) x = crunch(y)
causes
  MUNCH(fork, 2*w+1);
to be replaced by
  fork = crunch(2*w+1);
Similarly,
  #define MAX(a,b) (((a) > (b)) ? (a) : (b))
causes
  int n = MAX(y*z, x);
to be replaced by
  int n = (((y*z) > (x)) ? (y*z) : (x));
Notice the parentheses. The preprocessor does a textual substitution and knows nothing about operator precedence, so without parentheses the substituted text can combine with the surrounding expression in a way you did not intend. For example, if you define
  #define SUM(x,y) x + y
then the expression
  ostrich = sparrow * SUM(2, trex);
is replaced by
  ostrich = sparrow * 2 + trex;
which calls for multiplying sparrow by 2 and then adding trex, rather than multiplying sparrow by the sum. A correct definition of SUM is
  #define SUM(x,y) ((x) + (y))
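
The difference between the two definitions can be seen by running both. In this sketch the unparenthesised version is renamed BAD_SUM (a made-up name) so that both can coexist, and the function names are hypothetical.

```cpp
#include <cassert>

#define BAD_SUM(x,y) x + y          // the faulty definition from above
#define SUM(x,y) ((x) + (y))        // the corrected definition

// Expands to: sparrow * 2 + trex
int badResult(int sparrow, int trex)
{
    return sparrow * BAD_SUM(2, trex);
}

// Expands to: sparrow * ((2) + (trex))
int goodResult(int sparrow, int trex)
{
    return sparrow * SUM(2, trex);
}

// With sparrow = 3 and trex = 4: 3*2 + 4 is 10, but 3*(2 + 4) is 18.
void checkSum()
{
    assert(badResult(3, 4) == 10);
    assert(goodResult(3, 4) == 18);
}
```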


The preprocessor processes everything, including substituted text

The value of a symbol defined by the preprocessor can refer to other symbols that are also defined by the preprocessor. For example,

  #define MAX_NUM_VERTICES 1000
  #define MAX_NUM_NODES    MAX_NUM_VERTICES
causes both MAX_NUM_VERTICES and MAX_NUM_NODES to be replaced by 1000.
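
The substituted text is processed where the symbol is used, not where it is defined. The following sketch shows the consequence. NODE_LIMIT and VERTEX_LIMIT are hypothetical names that play the roles of MAX_NUM_NODES and MAX_NUM_VERTICES.

```cpp
#include <cassert>

// VERTEX_LIMIT is not defined yet on the next line, and that is fine:
// the replacement text of NODE_LIMIT is simply the word VERTEX_LIMIT.
#define NODE_LIMIT   VERTEX_LIMIT
#define VERTEX_LIMIT 1000

// NODE_LIMIT becomes VERTEX_LIMIT, which in turn becomes 1000.
int nodeLimitFirst()
{
    return NODE_LIMIT;
}

#undef  VERTEX_LIMIT
#define VERTEX_LIMIT 500

// The same NODE_LIMIT now ends up as 500, because its replacement
// text is processed again each time NODE_LIMIT is used.
int nodeLimitSecond()
{
    return NODE_LIMIT;
}

void checkLimits()
{
    assert(nodeLimitFirst() == 1000);
    assert(nodeLimitSecond() == 500);
}
```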


Conditional compilation

Sometimes you want part of a program to be compiled only in certain circumstances. You can do that using #if or #ifdef or #ifndef. For example,

  #ifdef DEBUG
    printf("debugging print");
  #endif
tells the preprocessor to put the printf line into the program if preprocessor symbol DEBUG is defined, and to leave it out if DEBUG is not defined.
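
Here is a sketch of how that might look in a complete function. The function doubled is hypothetical. DEBUG is defined in the source here, but many compilers also accept a command-line option such as -DDEBUG for the same purpose.

```cpp
#include <cassert>
#include <cstdio>

// Remove this line (or comment it out) to compile without the print.
#define DEBUG

// Hypothetical function with an optional debugging print.
int doubled(int x)
{
#ifdef DEBUG
    std::printf("debugging print: x = %d\n", x);
#endif
    return 2 * x;
}

// The result is the same whether or not DEBUG is defined.
void checkDoubled()
{
    assert(doubled(21) == 42);
}
```

A symbol only needs to be defined for #ifdef to succeed. It does not need a value.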

#ifndef asks if a preprocessor symbol is not defined, and #if asks if an expression is a nonzero integer. For #if, the expression must yield an integer constant after the replacements done by the preprocessor. You can also use operators +, -, *, /, ==, !=, >, <, >=, <=, &&, || and ! in those expressions, and you can parenthesise. For example,

  #if MAX_NUM_VERTICES < 1000
tests if MAX_NUM_VERTICES is less than 1000. The preprocessor must be able to answer the question on its own, so the expression cannot depend on C++ variables or on anything else that is only known later in compilation.
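
The following sketch combines #ifndef with a compound #if expression. MAX_NUM_EDGES, GRAPH_IS_LARGE and graphIsLarge are hypothetical names, and the numbers are chosen only for illustration.

```cpp
#include <cassert>

// Supply a default only if nothing has defined the symbol already.
#ifndef MAX_NUM_VERTICES
#define MAX_NUM_VERTICES 500
#endif

#define MAX_NUM_EDGES (MAX_NUM_VERTICES * 4)

// With the default, 500 * 4 >= 2000 holds and 500 < 100 does not,
// so the whole expression is nonzero and GRAPH_IS_LARGE gets defined.
#if (MAX_NUM_EDGES >= 2000) && !(MAX_NUM_VERTICES < 100)
#define GRAPH_IS_LARGE 1
#endif

// Fall back to 0 if the test above did not succeed.
#ifndef GRAPH_IS_LARGE
#define GRAPH_IS_LARGE 0
#endif

bool graphIsLarge()
{
    return GRAPH_IS_LARGE != 0;
}

void checkGraphSize()
{
    assert(graphIsLarge());
}
```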

You can have an else-part for any of these. For example,

  #if MAX_NUM_VERTICES < 1000
    Vertex verts[MAX_NUM_VERTICES];
  #else
    Vertex* verts = new Vertex[MAX_NUM_VERTICES];
  #endif
puts one of the two lines into the program, depending on the value of preprocessor symbol MAX_NUM_VERTICES.
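
A self-contained version of this example might look as follows. The Vertex type is a stand-in invented for the sketch, as are VERTS_ON_HEAP and checkVerts. The value 100 is arbitrary.

```cpp
#include <cassert>

// Hypothetical stand-in for whatever Vertex really is.
struct Vertex
{
    int x;
    int y;
};

#define MAX_NUM_VERTICES 100

#if MAX_NUM_VERTICES < 1000
    // Kept: 100 < 1000, so the compiler sees a fixed-size array.
    Vertex verts[MAX_NUM_VERTICES];
    #define VERTS_ON_HEAP 0
#else
    // Discarded by the preprocessor; the compiler never sees these lines.
    Vertex* verts = new Vertex[MAX_NUM_VERTICES];
    #define VERTS_ON_HEAP 1
#endif

// Confirms that the array branch was the one compiled.
void checkVerts()
{
    assert(VERTS_ON_HEAP == 0);
    assert(sizeof(verts) / sizeof(verts[0]) == 100);
}
```

Because the unused branch is removed before compilation, it is never checked by the compiler and can hide mistakes until the condition changes.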