Prev | Next |
Syntax refers to the form, or appearance, of a program. It is not concerned with meaning. Syntax can show structure, indicating that certain parts of a program are components of other parts, but it does not tell you the semantic significance of that structure.
Syntax design | ||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Languages use a variety of different syntactic rules. Some are intended to look familiar, while others choose simplicity over familiarity. Here is the same thing written in three different languages.
|
Lexical rules | ||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
Parse trees and program structure |
---|
A parse tree describes the structure of a part of a program, such as a statement or an expression. The leaves of a parse tree are tokens. The nonleaves are labeled by what kind of thing the tree describes, such as expression or statement. The following parse tree describes expression 4 + 9 + 2. It assumes that 4, 9 and 2 are lexemes that describe the token number. This tree shows structure: it indicates that 9+2 is done first, because that is a subtree. |
Describing syntax and structure: Grammars | ||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
You can describe the syntax of a language by describing the parse trees that are allowed. Anyone creating a parse tree must follow the rules that you give. The rules describe what is allowed locally in a tree. The example tree contains a node labeled expression with three children, labeled expression, + and expression, in that order. A rule that indicates this is allowed is
In rules, labels of nonleaf nodes are usually written in angle brackets. A collection of rules is called a grammar. A grammar that allows the parse tree shown above is
If you have several rules in a row with the same thing on the left of ::=, you normally write | in each but the first rule. So the above grammar can be written
|
Repetition | ||||||||
---|---|---|---|---|---|---|---|---|
You can use recursion to simulate repetition in a grammar. Here is a definition of a C compound statement, which consists of braces surrounding one or more statements.
Here, (empty) indicates an empty sequence of tokens. It is only written so that you understand that it is deliberately empty. |
Ambiguity | ||
---|---|---|
A grammar should, ideally, not only specify the order in which tokens can appear, but also the structure of a parse tree. So (ideally) any syntactically correct program or program part, such as an expression, should have just one possible parse tree. Unfortunately, not all grammars are ideal.
The expression grammar shown above is ambiguous. Here are two different parse trees for number + number * number
|
Designing unambiguous grammars | |||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
An alternative expression grammar introduces asymmetry to avoid ambiguity. Here is an unambiguous expression grammar that allows binary operators +, - and *, numbers, and parentheses.
The following is the only parse tree for expression 2+3*4 using this grammar. The following is a parse tree for expression 2+(4*5) using this grammar. |
Precedence and associativity |
---|
|
Extended BNF notation | |||||
---|---|---|---|---|---|
There are some convenient abbreviations that are not part of standard BNF notation, but are frequently used in what is called extended BNF notation.
Here is a grammar that uses EBNF to define some C statements. Note that the tokens { and } are quoted, to avoid confusion with braces used to indicate repetition.
|
Syntax diagrams |
---|
Another common notation for describing syntax is the syntax diagram, a graphical form that is equivalent to BNF, but is often easier to understand. The following is a syntax diagram for a kind of expression. |
Terminology |
---|
|
Prev | Next |