6.2. Derivations

A derivation uses the productions of a grammar to replace nonterminals until there are only tokens left.


Rewriting a string

The starting point is a single rewrite, or replacement.

Imagine that you have a grammar G. Everything below is with the understanding that we are talking about this fixed grammar G.

Given a string of tokens and nonterminals σ, locate a nonterminal N in σ. Then σ = γNδ where γ and δ are two strings of tokens and nontermals.

Now select a production of the form N → α. Using that rule to rewrite N as α, we say that

γNδ ⇒ γαδ
Symbol ⇒ is a relation between strings of tokens and nonterminals; α ⇒ β means that string α can be rewritten into β.

Be sure not to confuse relation ⇒ with the arrow, →, used in productions. A production part of the definition of a grammar. Relation ⇒ is defined in terms of the productions in the grammar.

When talking about a fixed grammar, we use symbol ⇒ for the rewrite relation of that grammar. But, to make the grammar explicit, we use ⇒G for the rewrite relation of grammar G.


Examples of rewrites

Suppose that G is the following expression grammar.

E n
  | E + E
  | E * E
  | ( E )

All of the following are correct rewrites for that grammar. In each left-hand side, the nonterminal that is being rewritten is shown in red, and its replacement is shown in blue on the right-hand side.

E G n
n + E G n + n
n + E G n + E * E
E + E + E G ( E ) + E + E
E + E + E G E + E + E * E

Derivations

You use a grammar by starting with the start nonterminal and performing a sequence of rewrites until all of the nonterminals are gone.

Each sequence of tokens derivable that way is an allowed sequence of tokens in the language.

For example, using the expression grammar G above, here is a derivation of expression n + n.

E G E + E
  G n + E
  G n + n

The language of a grammar

Say that α ⇒*G β if it is possible to rewrite string α into β by doing a sequence of zero or more rewrite steps using the productions of G. For example, we saw above that, using the expression grammar G above,

E ⇒*G n + n

A grammar describes a language. If G is a grammar with start nonterminal S, then the language of G is the set of all sequences of tokens α such that S ⇒*G α.


Leftmost and rightmost derivations

When doing rewrites, you are allowed to select any nonterminal in a string to rewrite. But sometimes it is useful to choose the first nonterminal, other times to choose the last nonterminal, to rewrite.

A leftmost derivation is a derivation that always selects the leftmost nonterminal to rewrite. For example,

E E + E
  n + E
  n + E * E
  n + ( E ) * E
  n + ( n ) * E
  n + ( n ) * n

is a leftmost derivation of n + ( n ) * n.

A rightmost derivation is a derivation that always selects the rightmost nonterminal to rewrite. For example,

E E + E
  E + E * E
  E + E * n
  E + ( E ) * n
  E + ( n ) * n
  n + ( n ) * n

is a rightmost derivation of n + ( n ) * n.