Let G be a grammar with start nonterminal S.
A sentential form of G is a string α where S ⇒* α. That is, it is a string that can be derived from the start nonterminal using the productions of G.
Let N be a nonterminal of G. FOLLOW(N) is a set of tokens, possibly additionally including special symbol $ indicating end-of-input.
Definition. FOLLOW(N) contains all tokens t so that there exists a sentential form that contains substring N t.
That is, t is in FOLLOW(N) provided t can follow N in a derivation.
Additionally, FOLLOW(N) contains $ if N there exists a sentential form that ends on N.
Let's use the left-factored expression grammar, with start nonterminal E.
E | → | T R |
R | → | ε |
R | → | + E |
T | → | F S |
S | → | ε |
S | → | * T |
F | → | n |
F | → | ( E ) |
Here are the FOLLOW sets.
N | FOLLOW(N) |
---|---|
E | {), $} |
R | {), $} |
T | {+, ), $} |
S | {+, ), $} |
F | {+, *, ), $} |
FOLLOW(E) must contain $ because E is the start symbol, so E occurs at the end of sentential form E.
FOLLOW(E) also contains ) because the derivation
yields a sentential form where E is followed by ).
You can see from the derivation above that FOLLOW(R) contains $ (the sentential form T R shows that).
FOLLOW(R) contains ) because the derivation above can be continued as follows.
Derivation
shows that FOLLOW(T) contains $ and derivation
shows that FOLLOW(T) contains +.
You should be able to derive the other FOLLOW sets. Note that FOLLOW(T) does not contain *. Nonterminal S is the only one that derives a *. But S can only be derived with F in front of it, and F cannot be erased.
Start with FOLLOW(N) = {} for every nonterminal N. Then perform the following steps until none of the FOLLOW sets can be enlarged any more.
Add $ to FOLLOW(S), where S is the start nonterminal.
If there is a production A → αBβ, then add every token that is in FIRST(β) to FOLLOW(B). (Do not add ε to FOLLOW(B).
If there is a production A → αB, then add all members of FOLLOW(A) to FOLLOW(B).
(If t can follow A, then there must be a sentential form β A t γ Using production A → αB gives sentential form β α B t γ, where B is followed by t.
If there is a production A → αBβ where FIRST(β) contains ε, then add all members of FOLLOW(A) to FOLLOW(B).
(Reasoning is like rule 3. Just erase β.)
Let's use the algorithm to compute FOLLOW sets for the left-factored expression grammar. Like the algorithm for FIRST sets, this algorithm goes in phases, where the FOLLOW sets are enlarged until they cannot be enlarged any more.
Starting with empty sets and using rule (1) gives
N | E | T | F | R | S |
---|---|---|---|---|---|
FOLLOW(N) | {$} | {} | {} | {} | {} |
Now let's employ rule (2).
Production F → ( E ) tells us to add ) to FOLLOW(E), by rule (2).
Since FIRST(R) contains +, production E → T R tells us to add + to FOLLOW(T)
Since FIRST(S) contains *, production T → F S tells us to add * to FOLLOW(F).
N | E | T | F | R | S |
---|---|---|---|---|---|
FOLLOW(N) | {), $} | {+} | {*} | {} | {} |
Production E → T R tells us, by rules (3) and (4), to add all members of FOLLOW(E) to FOLLOW(R) and FOLLOW(T) (since FIRST(R) contains ε).
N | E | T | F | R | S |
---|---|---|---|---|---|
FOLLOW(N) | {), $} | {+, ), $} | {*} | {), $} | {} |
Now production T → F S tells us to add all members of FOLLOW(T) to FOLLOW(S) and FOLLOW(F) (since S can be erased).
N | E | T | F | R | S |
---|---|---|---|---|---|
FOLLOW(N) | {), $} | {+, ), $} | {+, *, ), $} | {), $} | {+, ), $} |
The FOLLOW sets cause additional members to be added to the top-down parsing table.
For each production N → α,
if FIRST(α) contains ε,
for every b in FOLLOW(N)
add production N → α to D(N,b).
The reasoning for that goes like this. Since production N → α can erase N, the lookahead might be a token that comes after N. So use N → α when the lookahead is a token that can follow N.