Context-free grammars are the fundamental tool used to describe the syntax of a programming language. They are much more powerful than regular expressions, but, partly because of that power, require significantly more careful care and feeding than regular expressions.
Having chosen a collection of tokens, the next step is to choose a collection of nonterminals. A nonterminal represents a syntactic idea, such as an expression, a list of expressions, a statement, etc.
For brevity in this description, I will use upper-case letters for nonterminals and lower-case letters for tokens. I will also use special symbols such as +, * and comma as tokens.
A greek letter stands for a string of zero or more tokens and nonterminals.
The next step in creating a context-free grammar is to write a collection of productions. A production has the form
N → αwhere N is a nonterminal and α is a string of zero or more tokens and nonterminals.
A production is a rewrite rule. Production N → α says that N can be replaced by α.
For example, suppose that nonterminal E is intended to stand for an expression. Production
E → E + Esays that an expression can have the form of an expression, then token +, then another expression.
In addition to a list of tokens and nontermials, a context-free grammar has a set of productions and a start nonterminal.
For example, here is a small context-free grammar for expressions. There is only one nonterminal, E, so it is obviously the start nonterminal. Token n stands for a number.
E | → | n |
E | → | E + E |
E | → | E * E |
E | → | ( E ) |
In the last of those productions, ( and ) are tokens.
Other kinds of grammars, beyond context-free, exist, but in this course we will only use context-free grammars. So I drop the context-free designation and just call them grammars.
By convention, a group of productions that have the same left-hand side are shown with the left-hand side written only once. In all but the first production in the group, write | instead of →. For example, the above grammar for expressions is usually written as follows.
E | → | n |
| | E + E | |
| | E * E | |
| | ( E ) |