On the previous page, I constructed the parsing table by hand. But we need an algorithm that constructs it. That is the subject of this and the next page.
Fix a grammar G to talk about. The tokens, nonterminals and productions in what follows belong to G. S is the start symbol.
Let α be a string of tokens and nonterminals.
Definition. FIRST(α) is the set of all tokens t so that α ⇒* tβ for some string β.
Additionally, FIRST(α) contains ε if α ⇒* ε.
That is, FIRST(α) contains all tokens that can begin α, plus ε if α can be erased.
For example, let's see what FIRST(N) is for each nonterminal N in the following grammar.
E | → | T R |
R | → | ε |
R | → | + E |
T | → | F S |
S | → | ε |
S | → | * T |
F | → | n |
F | → | ( E ) |
N | E | T | F | R | S |
---|---|---|---|---|---|
FIRST(N) | {n, (} | {n, (} | {n, (} | {ε, +} | {ε, *} |
There is a simple and natural algorithm to compute FIRST(α). To begin with, we compute FIRST(α) where α is exactly one symbol long.
For each nonterminal N, start with FIRST(N) = {} and add members as necessary until no more members can be added.
To compute first(t) and first (N)If t is a token then FIRST(t) = {t}.
If N is a nonterminal, find all productions with N on the left-hand side. For each such production N → Y1Y2…Yn (n ≥ 0), do the following.
Add all tokens in FIRST(Y1) to FIRST(N).
For k = 2, … n−1
if ε is in every one of FIRST(Yj),
for j = 1, …, k−1,
then add all tokens in
FIRST(Yj) to FIRST(N).
If ε is in FIRST(Yj) for j = 1, …, n, then add ε to FIRST(N).
The next step is to define FIRST(α) for every string of tokens and nonterminals α.
To compute FIRST(α) for strings αFIRST(ε) = {ε}
If α begins with token t then FIRST(α) = {t}.
If α = Nβ starts with nonterminal N, then
If FIRST(N) does not contain ε then FIRST(α) = FIRST(N).
If FIRST(N) contains ε then FIRST(α) = FIRST(N) ∪ FIRST(β).
The interesting part of the computation is FIRST(N) for a nonterminal N. Here is the computation for the expression grammar above.
The computation goes in phases. Start with FIRST(N) = {} for every nonterminal N. For the sample grammar above, that looks like this.
N | E | T | F | R | S |
---|---|---|---|---|---|
FIRST(N) | {} | {} | {} | {} | {} |
First, the productions whose right-hand side begin with a token and the erasing productions come into play. Rule 2 above tell you that
FIRST(R) contains + and ε, due to productions R → + E and R → ε.
FIRST(S) contains * and ε, due to productions S → * T and S → ε.
FIRST(F) contains n and ( due to productions F → n and F → ( E ).
N | E | T | F | R | S |
---|---|---|---|---|---|
FIRST(N) | {} | {} | {n, (} | {ε, +} | {ε, *} |
Now, since T → F S is a production, all tokens in FIRST(F) are added to FIRST(T).
N | E | T | F | R | S |
---|---|---|---|---|---|
FIRST(N) | {} | {n, (} | {n, (} | {ε, +} | {ε, *} |
Then, since E → T R is a production, all tokens in FIRST(T) are added to FIRST(E).
N | E | T | F | R | S |
---|---|---|---|---|---|
FIRST(N) | {n, (} | {n, (} | {n, (} | {ε, +} | {ε, *} |
Finally, no new members can be added to any of the sets, so the algorithm is done.
The FIRST sets tell how to make entries in the LL(1) parsing table. Look at each production. If a production is
then add production number i to the table at row N, column t for every token t in FIRST(α).