Chapter 4.
Syntax Analysis (1)
Application of a production A → ω in a derivation step i ⇒ i+1
[Figure: the nonterminal A in sentential form i is replaced to give sentential form i+1.]
Formal grammars (1/3)
Example: Let G1 have N = {A, B, C}, T = {a, b, c} and the set of productions
∑ → A       CB → BC
A → aABC    bB → bb
A → abC     bC → bc
            cC → cc
The reader should verify that the word a^k b^k c^k is in L(G1) for all k ≥ 1 and that only these words are in L(G1). That is,
L(G1) = { a^k b^k c^k | k ≥ 1 }.
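The claim about L(G1) can be checked mechanically for small k. The sketch below is an illustration, not part of the original slides: it assumes the productions of G1 read ∑ → A, A → aABC, A → abC, CB → BC, bB → bb, bC → bc, cC → cc, writes the start symbol ∑ as "S", and uses a hypothetical helper named generate. Because no production of G1 shortens a sentential form, every form longer than the bound can be discarded.

```python
from collections import deque

# Productions of G1 as (left side, right side) pairs; "S" stands in for the start symbol.
G1 = [("S", "A"), ("A", "aABC"), ("A", "abC"),
      ("CB", "BC"), ("bB", "bb"), ("bC", "bc"), ("cC", "cc")]

def generate(productions, start, max_len):
    """Return every terminal string of length <= max_len derivable from start."""
    seen, words = {start}, set()
    queue = deque([start])
    while queue:
        form = queue.popleft()
        if form.islower():                 # only terminals remain
            words.add(form)
            continue
        for lhs, rhs in productions:
            pos = form.find(lhs)
            while pos != -1:               # try every occurrence of the left side
                new = form[:pos] + rhs + form[pos + len(lhs):]
                # G1 is noncontracting, so longer forms can never shrink back.
                if len(new) <= max_len and new not in seen:
                    seen.add(new)
                    queue.append(new)
                pos = form.find(lhs, pos + 1)
    return words

print(sorted(generate(G1, "S", 9), key=len))
```

With a bound of 9 the search yields abc, aabbcc and aaabbbccc, that is, the words for k = 1, 2, 3. The length pruning is not valid for G2, whose production bC → b is contracting.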
Formal grammars (2/3)
Example: Grammar G2 is a modification of G1:
G2: ∑ → A       CB → BC
    A → aABC    bB → bb
    A → abC     bC → b
The reader may verify that L(G2) = { a^k b^k | k ≥ 1 }. Note that the last rule, bC → b, erases all the C's from the derivation, and that only this production removes the nonterminal C from sentential forms.
Formal grammars (3/3)
Example: A simpler grammar that generates { a^k b^k | k ≥ 1 } is the grammar G3:
G3: ∑ → S
    S → aSb
    S → ab
A derivation of a^3 b^3 is
∑ ⇒ S ⇒ aSb ⇒ aaSbb ⇒ aaabbb
The reader may verify that L(G3) = { a^k b^k | k ≥ 1 }.
Type   Format of Productions          Remarks
0      φAψ → φωψ                      Unrestricted substitution rules (contracting)
1      φAψ → φωψ, ω ≠ λ; ∑ → λ        Context sensitive (noncontracting)
2      A → ω, ω ≠ λ; ∑ → λ            Context free (noncontracting)
3      A → aB, A → a; ∑ → λ           Right linear (regular, noncontracting)
       A → Ba, A → a; ∑ → λ           Left linear (regular, noncontracting)
The four types of formal grammars
Context-Sensitive Grammars (Type 1)
Definition: A context-sensitive grammar G = (N, T, P, ∑) is a formal grammar in which all productions are of the form
φAψ → φωψ, ω ≠ λ
The grammar may also contain the production ∑ → λ. If G is a context-sensitive (type 1) grammar, then L(G) is a context-sensitive (type 1) language.
Unrestricted Grammars (Type 0)
Context-Free Grammars (Type 2)
Definition: A context-free grammar G = (N, T, P, ∑) is a formal grammar in which all productions are of the form
A → ω
where A ∈ N ∪ {∑} and ω ∈ (N ∪ T)* − {λ}.
The grammar may also contain the production ∑ → λ. If G is a context-free (type 2) grammar, then L(G) is a context-free (type 2) language.
Regular Grammars (Type 3) (1/2)
Definition: A production of the form
A → aB or A → a
is called a right linear production. A production of the form
A → Ba or A → a
is a left linear production. In both forms, A ∈ N ∪ {∑}, B ∈ N, and a ∈ T. A formal grammar is right linear if it contains only right linear productions, and is left linear if it contains only left linear productions. Either kind of grammar may also contain the production ∑ → λ. Left and right linear grammars are also known as regular grammars. If G is a regular (type 3) grammar, then L(G) is a regular (type 3) language.
Regular Grammars (Type 3) (2/2)
Example: A left linear grammar G1 and a right linear grammar G2 have productions as follows:
G1:         G2:
∑ → B1      ∑ → 1B
∑ → 1       ∑ → 1
A → B1      A → 1B
B → A0      B → 0A
A → 1       A → 1
The reader may verify that
L(G1) = (10)*1 = 1(01)* = L(G2)
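The equality (10)*1 = 1(01)* = L(G2) can be spot-checked by treating the right linear grammar as a nondeterministic automaton whose states are the nonterminals. The sketch below is an illustration only: it assumes the right linear productions are ∑ → 1B, ∑ → 1, A → 1B, B → 0A, A → 1, writes ∑ as "S", and the function names are hypothetical.

```python
import re
from itertools import product

# Right linear grammar G2; each right side is (terminal, next nonterminal or None).
G2 = {"S": [("1", "B"), ("1", None)],
      "A": [("1", "B"), ("1", None)],
      "B": [("0", "A")]}

def accepts(grammar, word, start="S"):
    """Simulate the grammar as an NFA whose states are the nonterminals."""
    states = {start}
    for i, ch in enumerate(word):
        last = i == len(word) - 1
        nxt = set()
        for state in states:
            for terminal, target in grammar.get(state, []):
                if terminal != ch:
                    continue
                if target is None:
                    if last:               # A -> a may only finish the word
                        nxt.add(None)
                else:
                    nxt.add(target)
        states = nxt
    return None in states                  # None marks a completed derivation

def agrees_with_regexes(max_len):
    """Compare G2 with both regular expressions on all binary words up to max_len."""
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            w = "".join(bits)
            expected = re.fullmatch(r"(10)*1", w) is not None
            if (re.fullmatch(r"1(01)*", w) is not None) != expected:
                return False
            if accepts(G2, w) != expected:
                return False
    return True

print(agrees_with_regexes(8))
```

An exhaustive comparison on short words is evidence, not a proof. The left linear grammar could be checked the same way by reading each word from right to left.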
Ambiguity (1/2)
Example: Consider the context-free grammar
G: ∑ → S
   S → SS
   S → ab
The sentence ababab has derivations that correspond to different tree diagrams: one first splits it as ab and abab, the other as abab and ab. The grammar G is therefore ambiguous with respect to the sentence ababab: if the tree diagrams were used as the basis for assigning meaning to the derived string, mistaken interpretation could result.
Ambiguity (2/2)
Definition: A context-free grammar is ambiguous if and only if it generates some sentence by two or more distinct leftmost derivations.
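For the grammar S → SS | ab, ambiguity can be made concrete by counting parse trees, since each parse tree corresponds to exactly one leftmost derivation. The following sketch is an illustration; the function name count_trees is hypothetical and the code is specific to this one grammar.

```python
from functools import lru_cache

def count_trees(word):
    """Count the parse trees of word under S -> SS | ab."""
    @lru_cache(maxsize=None)
    def count(i, j):
        # S -> ab derives word[i:j] directly in exactly one way.
        total = 1 if word[i:j] == "ab" else 0
        # S -> SS: try every split point between the two subtrees.
        for k in range(i + 1, j):
            total += count(i, k) * count(k, j)
        return total
    return count(0, len(word))

for w in ["ab", "abab", "ababab", "abababab"]:
    print(w, count_trees(w))
```

The counts are 1, 1, 2 and 5: ab and abab are unambiguous, while ababab already has two distinct trees, which is enough to make the grammar ambiguous by the definition above.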
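Fig. 4.1. Position of parser in compiler model.
[Diagram: source program → lexical analyzer → parser → rest of front end → intermediate representation. The parser sends "get next token" to the lexical analyzer and receives a token; it passes a parse tree to the rest of the front end. A symbol table is shared by these components.]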
Syntax Error Handling (1/2)
Probable errors:
– lexical, such as misspelling an identifier, keyword, or operator
– syntactic, such as an arithmetic expression with unbalanced parentheses
– semantic, such as an operator applied to an incompatible operand
– logical, such as an infinitely recursive call
Syntax Error Handling (2/2)
The error handler in a parser has simple-to-state goals:
– It should report the presence of errors clearly and accurately.
– It should recover from each error quickly enough to be able to detect subsequent errors.
– It should not significantly slow down the processing of correct programs.
Error-Recovery Strategies
– panic mode
– phrase level
– error productions
– global correction
Example 4.2
The grammar with the following productions defines simple arithmetic expressions.
expr → expr op expr
expr → ( expr )
expr → - expr
expr → id
op → +
op → -
op → *
op → /
op → ↑
Notational Conventions (1/2)
1. These symbols are terminals:
i) Lower-case letters early in the alphabet such as a, b, c.
ii) Operator symbols such as +, -, etc.
iii) Punctuation symbols such as parentheses, comma, etc.
iv) The digits 0, 1, . . . , 9.
v) Boldface strings such as id or if.
2. These symbols are nonterminals:
i) Upper-case letters early in the alphabet such as A, B, C.
ii) The letter S, which, when it appears, is usually the start symbol.
iii) Lower-case italic names such as expr or stmt.
3. Upper-case letters late in the alphabet, such as X, Y, Z, represent grammar symbols, that is, either nonterminals or terminals.
Notational Conventions (2/2)
4. Lower-case letters late in the alphabet, chiefly u, v, . . . , z, represent strings of terminals.
5. Lower-case Greek letters, α, β, γ, for example, represent strings of grammar symbols. Thus, a generic production could be written as A → α, indicating that there is a single nonterminal A on the left of the arrow (the left side of the production) and a string of grammar symbols α to the right of the arrow (the right side of the production).
6. If A → α1, A → α2, . . . , A → αk are all productions with A on the left (we call them A-productions), we may write A → α1 | α2 | . . . | αk. We call α1, α2, . . . , αk the alternatives for A.
7. Unless otherwise stated, the left side of the first production is the start symbol.
Derivations
We say that αAβ ⇒ αγβ if A → γ is a production and α and β are arbitrary strings of grammar symbols. If α1 ⇒ α2 ⇒ . . . ⇒ αn, we say α1 derives αn. The symbol ⇒ means “derives in one step”. Often we wish to say “derives in zero or more steps”. For this purpose we can use the symbol ⇒*. Thus,
1. α ⇒* α for any string α, and
2. If α ⇒* β and β ⇒ γ, then α ⇒* γ.
Fig. 4.3. Building the parse tree from derivation (4.4)
[Figure: a sequence of parse trees, one per derivation step, growing from the single node E to the full tree for -(id+id).]
Derivation (4.4): E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)
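Derivation (4.4) can be replayed step by step by always expanding the leftmost nonterminal. The sketch below is an illustration; it assumes the productions E → -E, E → (E), E → E+E and E → id, which are inferred from the steps of the derivation, and the helper name leftmost_step is hypothetical.

```python
NONTERMINALS = {"E"}

def leftmost_step(form, head, body):
    """Replace the leftmost nonterminal of form, which must be head, by body."""
    for i, symbol in enumerate(form):
        if symbol in NONTERMINALS:
            if symbol != head:
                raise ValueError(f"leftmost nonterminal is {symbol}, not {head}")
            return form[:i] + body + form[i + 1:]
    raise ValueError("no nonterminal left to expand")

# Productions applied in derivation (4.4), in order.
steps = [("E", ["-", "E"]), ("E", ["(", "E", ")"]), ("E", ["E", "+", "E"]),
         ("E", ["id"]), ("E", ["id"])]

form = ["E"]
history = [form]
for head, body in steps:
    form = leftmost_step(form, head, body)
    history.append(form)

print(" => ".join("".join(f) for f in history))
```

Each entry of history is one sentential form, so the printed line reproduces the chain from E to -(id+id); every step also corresponds to adding the children of one interior node to the parse tree.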
Eliminating Ambiguity
Ambiguous grammar:
stmt → if expr then stmt
     | if expr then stmt else stmt
     | other

Unambiguous grammar:
stmt → matched_stmt
     | unmatched_stmt
matched_stmt → if expr then matched_stmt else matched_stmt
     | other
unmatched_stmt → if expr then stmt
     | if expr then matched_stmt else unmatched_stmt
Elimination of Left Recursion
No matter how many A-productions there are, we can eliminate immediate left recursion from them by the following technique. First, we group the A-productions as
A → Aα1 | Aα2 | . . . | Aαm | β1 | β2 | . . . | βn
where no βi begins with an A. Then, we replace the A-productions by
A → β1A' | β2A' | . . . | βnA'
A' → α1A' | α2A' | . . . | αmA' | ε
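The transformation just described is mechanical enough to sketch in a few lines. The code below is an illustration, not a definitive implementation: alternatives are tuples of symbols, the empty tuple stands for ε, the new nonterminal is named by appending a prime, and the function name is hypothetical. It assumes no αi is empty.

```python
def eliminate_immediate_left_recursion(head, alternatives):
    """Rewrite A -> A a1 | ... | A am | b1 | ... | bn without immediate left recursion.

    alternatives is a list of tuples of symbols; () stands for the empty string.
    Returns a dict mapping each nonterminal to its new list of alternatives.
    """
    new_head = head + "'"
    # Left-recursive alternatives A alpha: keep only the alpha part.
    alphas = [alt[1:] for alt in alternatives if alt and alt[0] == head]
    # Remaining alternatives beta, none of which begins with A.
    betas = [alt for alt in alternatives if not alt or alt[0] != head]
    if not alphas:
        return {head: list(alternatives)}          # nothing to eliminate
    return {
        head: [beta + (new_head,) for beta in betas],
        new_head: [alpha + (new_head,) for alpha in alphas] + [()],
    }

print(eliminate_immediate_left_recursion("E", [("E", "+", "T"), ("T",)]))
```

Applied to E → E + T | T, this gives E → T E' and E' → + T E' | ε, the shape of the productions for E and E' in grammar (4.11).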
Left Factoring
In general, if A → αβ1 | αβ2 are two A-productions, and the input begins with a nonempty string derived from α, we do not know whether to expand A to αβ1 or to αβ2. However, we may defer the decision by expanding A to αA'. Then, after seeing the input derived from α, we expand A' to β1 or to β2. That is, left-factored, the original productions become
A → αA'
A' → β1 | β2
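Left factoring two alternatives amounts to pulling out their longest common prefix α. The sketch below is an illustration under simple assumptions: exactly two alternatives, each a tuple of symbols, with the empty tuple standing for ε; the function name left_factor is hypothetical.

```python
def left_factor(head, alt1, alt2):
    """Left-factor A -> alt1 | alt2 on their longest common prefix.

    Returns a dict mapping each nonterminal to its list of alternatives.
    """
    n = 0
    while n < min(len(alt1), len(alt2)) and alt1[n] == alt2[n]:
        n += 1                                     # length of the common prefix alpha
    if n == 0:
        return {head: [alt1, alt2]}                # nothing to factor
    new_head = head + "'"
    return {
        head: [alt1[:n] + (new_head,)],            # A  -> alpha A'
        new_head: [alt1[n:], alt2[n:]],            # A' -> beta1 | beta2
    }

print(left_factor("S", ("i", "E", "t", "S"), ("i", "E", "t", "S", "e", "S")))
```

For S → iEtS | iEtSeS the common prefix is iEtS, so the result is S → iEtSS' and S' → ε | eS.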
Example 4.12.
The language L2 = { a^n b^m c^n d^m | n ≥ 1 and m ≥ 1 }
Fig. 4.9. Steps in top-down parse.
[Figure: three parse trees. (a) S with children c, A, d. (b) The same tree with A expanded to a b. (c) The same tree with A expanded to a.]
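Fig. 4.10. Transition diagrams for grammar (4.11).
[Figure: one transition diagram per nonterminal, over states 0 to 17. E: states 0, 1, 2 with edges T, E'. E': states 3, 4, 5, 6 with edges +, T, E'. T: states 7, 8, 9 with edges F, T'. T': states 10, 11, 12, 13 with edges *, F, T'. F: states 14, 15, 16, 17 with edges (, E, ) and id.]
Grammar (4.11):
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id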
Fig. 4.11. Simplified transition diagrams.
[Figure: four panels. (a) and (b) show successive simplifications of the diagram for E'; (c) and (d) show the result substituted into the diagram for E and simplified further.]
Fig. 4.12. Simplified transition diagrams for arithmetic expressions.
[Figure: simplified diagrams for E (states 0, 3, 6; edges T and +), T (states 7, 8, 13; edges F and *), and F (states 14 to 17; edges (, E, ) and id).]
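Fig. 4.13. Model of a nonrecursive predictive parser.
[Diagram: an INPUT buffer holding a + b $, a STACK holding X Y Z $ with X on top, the Predictive Parsing Program, which reads both and consults the Parsing Table M, and the OUTPUT it produces.]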
Nonrecursive Predictive Parsing
Let X be the symbol on top of the stack and a the current input symbol. These two symbols determine the action of the parser:
1. If X = a = $, the parser halts and announces successful completion of parsing.
2. If X = a ≠ $, the parser pops X off the stack and advances the input pointer to the next input symbol.
3. If X is a nonterminal, the program consults entry M[X, a] of the parsing table M. This entry will be either an X-production of the grammar or an error entry. If, for example, M[X, a] = {X → UVW}, the parser replaces X on top of the stack by WVU (with U on top). As output, we shall assume that the parser just prints the production used; any other code could be executed here. If M[X, a] = error, the parser calls an error recovery routine.
Fig. 4.15. Parsing table M for grammar (4.11).
NONTERMINAL   INPUT SYMBOL
              id         +            *            (          )         $
E             E → TE'                              E → TE'
E'                       E' → +TE'                            E' → ε    E' → ε
T             T → FT'                              T → FT'
T'                       T' → ε       T' → *FT'               T' → ε    T' → ε
F             F → id                               F → (E)
Fig. 4.16. Moves made by predictive parser on input id + id * id.
STACK        INPUT            OUTPUT
$E           id + id * id$
$E' T        id + id * id$    E → TE'
$E' T' F     id + id * id$    T → FT'
$E' T' id    id + id * id$    F → id
$E' T'       + id * id$
$E'          + id * id$       T' → ε
$E' T +      + id * id$       E' → +TE'
$E' T        id * id$
$E' T' F     id * id$         T → FT'
$E' T' id    id * id$         F → id
$E' T'       * id$
$E' T' F *   * id$            T' → *FT'
$E' T' F     id$
$E' T' id    id$              F → id
$E' T'       $
$E'          $                T' → ε
$            $                E' → ε
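The three rules of the nonrecursive predictive parser, driven by the table of Fig. 4.15, fit in a short program. The sketch below is an illustration, not the textbook's code: the table is transcribed as a dictionary, an empty tuple stands for an ε right side, tokens are plain strings, and the names TABLE and parse are hypothetical.

```python
# Parsing table of Fig. 4.15; missing keys are error entries, () is an ε right side.
TABLE = {
    ("E", "id"): ("T", "E'"), ("E", "("): ("T", "E'"),
    ("E'", "+"): ("+", "T", "E'"), ("E'", ")"): (), ("E'", "$"): (),
    ("T", "id"): ("F", "T'"), ("T", "("): ("F", "T'"),
    ("T'", "+"): (), ("T'", "*"): ("*", "F", "T'"),
    ("T'", ")"): (), ("T'", "$"): (),
    ("F", "id"): ("id",), ("F", "("): ("(", "E", ")"),
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

def parse(tokens, start="E"):
    """Return the productions used, in order, or raise SyntaxError."""
    tokens = list(tokens) + ["$"]
    stack = ["$", start]                   # top of the stack is the end of the list
    pos, output = 0, []
    while True:
        X, a = stack[-1], tokens[pos]
        if X == "$" and a == "$":          # rule 1: successful completion
            return output
        if X not in NONTERMINALS:          # rule 2: terminal on top must match
            if X != a:
                raise SyntaxError(f"expected {X!r}, saw {a!r}")
            stack.pop()
            pos += 1
        else:                              # rule 3: consult M[X, a]
            body = TABLE.get((X, a))
            if body is None:
                raise SyntaxError(f"no entry M[{X}, {a}]")
            stack.pop()
            stack.extend(reversed(body))   # push the body so its first symbol is on top
            output.append((X, body))

for head, body in parse(["id", "+", "id", "*", "id"]):
    print(head, "->", " ".join(body) or "ε")
```

On the input id + id * id the productions are emitted in the same order as the OUTPUT column of Fig. 4.16. Here an error entry simply raises an exception; a real parser would call an error recovery routine instead.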
Fig. 4.17. Parsing table M for grammar (4.13).
NONTERMINAL   INPUT SYMBOL
              a        b        e         i             t   $
S             S → a                       S → iEtSS'
S'                              S' → ε                      S' → ε
                                S' → eS
E                      E → b
Grammar (4.13):
S → iEtS | iEtSeS | a
E → b
Fig. 4.18. Synchronizing tokens added to parsing table of Fig. 4.15.
NONTERMINAL   INPUT SYMBOL
              id         +            *            (          )         $
E             E → TE'                              E → TE'    synch     synch
E'                       E' → +TE'                            E' → ε    E' → ε
T             T → FT'    synch                     T → FT'    synch     synch
T'                       T' → ε       T' → *FT'               T' → ε    T' → ε
F             F → id     synch        synch        F → (E)    synch     synch
Fig. 4.19. Parsing and error recovery moves made by predictive parser.
STACK        INPUT            OUTPUT
$E           ) id * + id$     error, skip )
$E           id * + id$       id is in FIRST(E)
$E' T        id * + id$
$E' T' F     id * + id$
$E' T' id    id * + id$
$E' T'       * + id$
$E' T' F *   * + id$
$E' T' F     + id$            error, M[F, +] = synch
$E' T'       + id$            F has been popped
$E'          + id$
$E' T +      + id$
$E' T        id$
$E' T' F     id$
$E' T' id    id$
$E' T'       $
$E'          $
$            $
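The recovery moves of Fig. 4.19 can be reproduced with a small extension of the table-driven parser. The sketch below is an illustration built on assumptions that the slides do not spell out: a blank entry skips the input symbol, a synch entry pops the nonterminal, and an unmatched terminal on the stack is popped. It also assumes that the start symbol is never popped on a synch entry while input remains, which is what makes the first move skip ")" as in the figure. All names are hypothetical.

```python
# Table of Fig. 4.15; () is an ε right side.
TABLE = {
    ("E", "id"): ("T", "E'"), ("E", "("): ("T", "E'"),
    ("E'", "+"): ("+", "T", "E'"), ("E'", ")"): (), ("E'", "$"): (),
    ("T", "id"): ("F", "T'"), ("T", "("): ("F", "T'"),
    ("T'", "+"): (), ("T'", "*"): ("*", "F", "T'"),
    ("T'", ")"): (), ("T'", "$"): (),
    ("F", "id"): ("id",), ("F", "("): ("(", "E", ")"),
}
# synch entries of Fig. 4.18.
SYNCH = {("E", ")"), ("E", "$"), ("T", "+"), ("T", ")"), ("T", "$"),
         ("F", "+"), ("F", "*"), ("F", ")"), ("F", "$")}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

def parse_with_recovery(tokens, start="E"):
    """Parse tokens in panic mode and return the list of recovery actions taken."""
    tokens = list(tokens) + ["$"]
    stack, pos, actions = ["$", start], 0, []
    while not (stack[-1] == "$" and tokens[pos] == "$"):
        X, a = stack[-1], tokens[pos]
        if X == "$":                                  # stack exhausted, input remains
            actions.append(f"skip {a}")
            pos += 1
        elif X not in NONTERMINALS:
            if X == a:
                pos += 1                              # normal match
            else:
                actions.append(f"pop unmatched {X}")
            stack.pop()
        elif (X, a) in TABLE:
            stack.pop()
            stack.extend(reversed(TABLE[(X, a)]))
        elif (X, a) in SYNCH and len(stack) > 2:      # assumption: keep the start symbol
            actions.append(f"M[{X}, {a}] = synch, pop {X}")
            stack.pop()
        elif a == "$":                                # nothing left to skip
            actions.append(f"end of input, pop {X}")
            stack.pop()
        else:                                         # blank entry: skip the input symbol
            actions.append(f"skip {a}")
            pos += 1
    return actions

print(parse_with_recovery([")", "id", "*", "+", "id"]))
```

On the input ) id * + id the two recorded actions are the skip of ")" and the pop of F on the synch entry M[F, +], matching the two errors shown in Fig. 4.19. Every loop iteration either consumes input or changes the stack according to the table, so the loop terminates for this grammar.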