LL(1) parsers

←	• HOME • UP •	→
	Top/Top-down parsing/LL(1)

8.1. LL(1) Parsers

A top-down parser that uses a one-token lookahead is called an LL(1) parser.

The first L indicates that the input is read from left to right.
The second L says that it produces a left-to-right derivation.
And the 1 says that it uses one lookahead token. (Some parsers look ahead at the next 2 tokens, or even more than that.)

The LL(1) parsing table

The parser needs to find a production to use for nonterminal N when it sees lookahead token t.

To select which production to use, it suffices to have a table that has, as a key, a pair (N, t) and gives the number of a production to use.

Let's illustrate with an LL(1) parsing table for the expression grammar that we used earlier, which looks like this.

1. E	→	T R
2. R	→	ε
3. R	→	+ E
4. T	→	F S
5. S	→	ε
6. S	→	* T
7. F	→	n
8. F	→	( E )

Parsing table D below does the job. Each row is labeled by a nonterminal and each column by a lookahead token, or the special symbol $ that indicates the end of the input.

D(N, t) is the production to use to expand N when the lookahead is t. Blank entries mean syntax error.

Table D
	n	+	*	(	)	$
E	1			1
R		3	2		2	2
T	4			4
S		5	6		5	5
F	7			8

Now it is easy to use the table to control a top-down parse. Parsing n * n goes as follows. Start with E.

D(E, n) = 1, so expand E using production 1.

          E
         / \
        T   R

Since D(T, n) = 4, we continue by expanding T to F S.

Now D(F, n) = 7, and production 7 is F → n.

The lookahead changes to * and D(S, *) = 6. Since production 5 is S → * T, the table tells us to replace S by * T.

Now the lookahead is n, and D(T, n) = 4. After using production 4 (T → F S), the table will tell us to use production 7 (F → n), giving

The parse is almost finished. Since there are no more tokens, the lookahead is $. D(S, $) = 5 and D(R, $) = 2, which says to replace S and R by ε.

A stack-based approach

Instead of building a parse tree, it can be preferable to construct a derivation.

We use a two stacks, called Match and Todo. Stack Matched only holds tokens.

At any given point, the string that has been derived is m t where m is the contents of the Matched stack (from bottom to top) and t is the contents of the Todo stack (from top to bottom).

The action tells the production that is used, or, when a token is moved to the Matched stack and removed from the input, a match action.

Here is a parse of n + n * n using the same expression grammar.

Matched	Todo	Input	Action
	E $	*n + n n $**
	T R $	*n + n n $**	E → T R
	F S R $	*n + n n $**	T → F S
	n S R $	*n + n n $**	F → n
n	S R $	*+ n n $**	match n
n	R $	*+ n n $**	S → ε
n	+ E $	*+ n n $**	R → + E
n +	E $	*n n $**	match +
n +	T R $	*n n $**	E → T R
n +	F S R $	*n n $**	T → F S
n +	n S R $	*n n $**	F → n
n + n	S R $	* n $	match n
n + n	* T R $	* n $	S → * T
n + n *	T R $	n $	match *
n + n *	F S R $	n $	T → F S
n + n *	n S R $	n $	F → n
*n + n n**	S R $	$	match n
*n + n n**	R $	$	S → ε
*n + n n**	$	$	R → ε

←	• HOME • UP •	→
	Top/Top-down parsing/LL(1)