c
Chuen-Liang Chen, NTUCS&IE / 1
CONTEXT-FREE GRAMMARS
Chuen-Liang Chen
Department of Computer Science
and Information Engineering
National Taiwan University
Taipei, TAIWAN
Parsing
function: checking the syntactic validity of the input string; producing the structure of the corresponding parse tree
callee: scanner (when a token is needed); semantic routine (when a production rule is matched)
theoretical basis: context-free grammar
executor: parser, syntax analyzer
top-down parsing
– beginning at the start symbol, expanding nonterminals in depth-first manner (predictive in nature)
– left-most derivation
– pre-order traversal of parse tree
– e.g. LL(k) [read from Left; Left-most derivation; k lookaheads], recursive descent parsing
bottom-up parsing
– beginning from terminal string, determining the production used to generate leaves
– right-most derivation in reverse order
– post-order traversal of parse tree
– e.g. LR(k) [read from Left; Right-most derivation; k lookaheads]
Definitions about context-free grammar (1/2)
context-free grammar -- G = (Vt, Vn, S, P)
Vt -- set of terminal symbols
Vn -- set of nonterminal symbols
– a, b, c, ... ∈ Vt
– A, B, C, ... ∈ Vn
– U, V, W, ... ∈ V = Vt ∪ Vn
– u, v, w, ... ∈ Vt*
– α, β, γ, ... ∈ V*
S -- start symbol, goal symbol; S ∈ Vn
P -- set of production rules of the form A → α
derivation by production rule A → γ
– one step derivation : α A β ⇒ α γ β
– left-most derivation : u A β ⇒lm u γ β
– right-most derivation : α A v ⇒rm α γ v
– one or more steps derivation : ⇒+, ⇒+lm, ⇒+rm
– zero or more steps derivation : ⇒*, ⇒*lm, ⇒*rm
Definitions about context-free grammar (2/2)
set of sentential forms -- SF(G) = { α | S ⇒* α }
left-most sentential form -- the α so that S ⇒*lm α
right-most sentential form -- the α so that S ⇒*rm α
context-free language -- L(G) = SF(G) ∩ Vt*
parse tree, derivation tree -- graphic representation of derivations
– root -- start symbol
– leaf nodes -- grammar symbols or λ
– interior nodes -- nonterminals
– offspring of a nonterminal -- a production
for a given sentential form --
– phrase -- a sequence of symbols derived from a single nonterminal
– simple phrase, prime phrase -- minimal phrase
– handle -- left-most simple phrase
Example of context-free grammar
grammar G0 --
E → Prefix ( E ) | V Tail
Prefix → F | λ
Tail → + E | λ
left-most derivation --
E ⇒lm Prefix ( E )
  ⇒lm F ( E )
  ⇒lm F ( V Tail )
  ⇒lm F ( V + E )
  ⇒lm F ( V + V Tail )
  ⇒lm F ( V + V )
right-most derivation --
E ⇒rm Prefix ( E )
  ⇒rm Prefix ( V Tail )
  ⇒rm Prefix ( V + E )
  ⇒rm Prefix ( V + V Tail )
  ⇒rm Prefix ( V + V )
  ⇒rm F ( V + V )
right-most sentential forms --
1. E
2. Prefix ( E )
3. Prefix ( V Tail )
4. Prefix ( V + E )
5. Prefix ( V + V Tail )
6. Prefix ( V + V )
7. F ( V + V )
8. and so on
L(G0) ⊇ { F ( V + V ) }
Example of left-most derivation
[Figure: parse trees of the left-most derivation of F ( V + V ), one tree per derivation step; blue symbols mark the left-most sentential forms]
Example of top-down parsing
[Figure: trace of top-down parsing (left-most derivation) as a sequence of partial parse trees. Legend: orange = just derived (predicted); blue = just read (matched); black = derived or read; green = un-processed (parse stack)]
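The top-down strategy can be sketched as a recursive descent recognizer for grammar G0 (E → Prefix ( E ) | V Tail; Prefix → F | λ; Tail → + E | λ). This is only an illustrative sketch under stated assumptions: each token is a single character ('F', 'V', '+', '(', ')'), the input contains no whitespace, and the names `input`, `match`, `parse_E`, `parse_Prefix`, `parse_Tail` and `accepts` are hypothetical, not taken from the slides.

```c
#include <assert.h>
#include <stddef.h>

/* Remaining input; *input is the single lookahead token (assumed encoding). */
static const char *input;

/* Consume the lookahead if it equals t; report whether it matched. */
static int match(char t)
{
    if (*input == t) { input++; return 1; }
    return 0;
}

static int parse_E(void);

/* Prefix -> F | lambda */
static int parse_Prefix(void)
{
    if (*input == 'F')
        return match('F');
    return 1;                       /* predict lambda: consume nothing */
}

/* Tail -> + E | lambda */
static int parse_Tail(void)
{
    if (*input == '+')
        return match('+') && parse_E();
    return 1;                       /* predict lambda: consume nothing */
}

/* E -> Prefix ( E ) | V Tail, chosen by one token of lookahead */
static int parse_E(void)
{
    if (*input == 'F' || *input == '(')
        return parse_Prefix() && match('(') && parse_E() && match(')');
    if (*input == 'V')
        return match('V') && parse_Tail();
    return 0;                       /* no production can start here */
}

/* Returns 1 if s is a sentence of G0, 0 otherwise. */
int accepts(const char *s)
{
    assert(s != NULL);
    input = s;
    return parse_E() && *input == '\0';
}
```

Each procedure predicts a production from the lookahead before reading further, so the order of calls follows the left-most derivation: `accepts("F(V+V)")` succeeds, while `accepts("F(V+")` fails once the input runs out inside E.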
Example of right-most derivation (1/2)
[Figure: parse trees of right-most derivations with the corresponding sentential form, phrases, simple phrases, and handle marked in each tree; blue symbols: sentential form]
sentential forms shown: E, Prefix ( E ), Prefix ( V Tail ), Prefix ( V + E )
Example of right-most derivation (2/2)
[Figure: parse trees of the remaining right-most derivation steps, marked in the same way]
sentential forms shown: Prefix ( V + V Tail ), Prefix ( V + V ), F ( V + V )
Example of bottom-up parsing
[Figure: trace of bottom-up parsing (inverse order of right-most derivation) of F ( V + V ). Legend: blue = just read (shifted); orange = just derived (reduced to); pink = not read; green = derived or read (parse stack)]
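Examples - pork chop noodle special (排骨麵特餐)
example 1
pork-chop-noodle-special → iced-black-tea pork-chop-noodles sliced-orange
pork-chop-noodles → fried-pork-chop noodle-soup
lookahead is unnecessary
example 2
pork-chop-noodle-special → service pork-chop-noodles
pork-chop-noodles → fried-pork-chop noodle-soup | noodle-soup fried-pork-chop
service → taro-ice-cream | forget-it (λ)
lookahead is required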
Ambiguity of grammar
a string with two different parse trees (i.e., two different structures)
example :
<exp> → <exp> - <exp>
<exp> → id
for an unambiguous grammar, parse trees of left-most derivation and right-most derivation are the same
[Figure: two different parse trees for the string id - id - id, one grouping it as (id - id) - id and the other as id - (id - id)]
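To make the ambiguity concrete, the sketch below counts the parse trees that the grammar <exp> → <exp> - <exp> | id admits for a string of n ids, and evaluates the two structures of a three-operand subtraction. The function names and the bound `MAX_IDS` are assumptions introduced for illustration only.

```c
#include <assert.h>

#define MAX_IDS 16   /* illustrative bound so the table fits in a local array */

/*
 * Number of distinct parse trees for  id - id - ... - id  (n ids) under
 *   <exp> -> <exp> - <exp> | id
 * A tree for len ids splits at its root into a left part with k ids
 * and a right part with len - k ids.
 */
unsigned long count_parse_trees(int n)
{
    unsigned long trees[MAX_IDS + 1] = { 0 };
    int len, k;

    assert(n >= 1 && n <= MAX_IDS);
    trees[1] = 1;                          /* a single id: <exp> -> id */
    for (len = 2; len <= n; len++)
        for (k = 1; k < len; k++)
            trees[len] += trees[k] * trees[len - k];
    return trees[n];
}

/* The two structures of x - y - z can denote different values. */
int eval_left_tree(int x, int y, int z)  { return (x - y) - z; }
int eval_right_tree(int x, int y, int z) { return x - (y - z); }
```

For three ids the count is 2, matching the two trees of the example; with the operands 7, 3, 2 the left-grouped tree evaluates to 2 and the right-grouped tree to 6, which is why a compiler needs a grammar that admits only one structure.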
First set and Follow set (1/2)
First(α) = { a ∈ Vt | α ⇒* a β } ∪ ( if α ⇒* λ then {λ} else ∅ )
set of all terminals that can begin a sentential form derived from α
Firstk(α) -- set of k-symbol terminal strings that can begin a sentential form derived from α
QUIZ: for what?
Follow(A) = { a ∈ Vt | S ⇒+ α A a β } ∪ ( if S ⇒+ α A then {λ} else ∅ )
set of all terminals that may follow A in some sentential form
Followk(A) -- set of k-symbol terminal strings that may follow A in some sentential form
QUIZ: for what?
First set and Follow set (2/2)
example 1 --
E → Prefix ( E )
E → V Tail
Prefix → F | λ
Tail → + E | λ
example 2 --
S → a S e | B
B → b B e | C
C → c C e | d
example 3 --
S → A B c
A → a | λ
B → b | λ
Algorithms for First & Follow sets (1/6)
typedef int symbol;  /* a symbol in the grammar */

/*
 * The symbolic constants used below, NUM_TERMINALS,
 * NUM_NONTERMINALS, and NUM_PRODUCTIONS are
 * determined by the grammar.
 * MAX_RHS_LENGTH should simply be "big enough."
 */
#define VOCABULARY (NUM_NONTERMINALS + NUM_TERMINALS)

typedef struct gram {
    symbol terminals[NUM_TERMINALS];
    symbol nonterminals[NUM_NONTERMINALS];
    symbol start_symbol;
    int num_productions;
    struct prod {
        symbol lhs;
        int rhs_length;
        symbol rhs[MAX_RHS_LENGTH];
    } productions[NUM_PRODUCTIONS];
    symbol vocabulary[VOCABULARY];
} grammar;

typedef struct prod production;
typedef symbol terminal;
typedef symbol nonterminal;
Algorithms for First & Follow sets (2/6)
typedef short boolean;
#define FALSE 0
#define TRUE  1
typedef boolean marked_vocabulary[VOCABULARY];

/*
 * Mark those vocabulary symbols found to derive λ (directly or indirectly).
 * A C function cannot return an array type, so a pointer to the
 * static array is returned.
 */
boolean *mark_lambda(const grammar g)
{
    static marked_vocabulary derives_lambda;
    boolean changes;             /* any changes during last iteration? */
    boolean rhs_derives_lambda;  /* does the RHS derive λ? */
    symbol v;                    /* a word in the vocabulary */
    production p;                /* a production in the grammar */
    int i, j;                    /* loop variables */

    for (v = 0; v < VOCABULARY; v++)
        derives_lambda[v] = FALSE;  /* initially, nothing is marked */
Algorithms for First & Follow sets (3/6)
    do {
        changes = FALSE;
        for (i = 0; i < g.num_productions; i++) {
            p = g.productions[i];
            if (! derives_lambda[p.lhs]) {
                if (p.rhs_length == 0) {
                    /* derives λ directly */
                    changes = derives_lambda[p.lhs] = TRUE;
                    continue;
                }
                /* does each part of RHS derive λ? */
                rhs_derives_lambda = derives_lambda[p.rhs[0]];
                for (j = 1; j < p.rhs_length; j++)
                    rhs_derives_lambda = rhs_derives_lambda && derives_lambda[p.rhs[j]];
                if (rhs_derives_lambda)
                    changes = derives_lambda[p.lhs] = TRUE;
            }
        }
    } while (changes);
    return derives_lambda;
}
Algorithms for First & Follow sets (4/6)
/* Set operations (SET_OF, ∪, ∈, set difference "-") are pseudocode for an abstract set type. */
typedef set_of_terminal_or_lambda termset;
termset follow_set[NUM_NONTERMINALS];
termset first_set[VOCABULARY];
boolean *derives_lambda = mark_lambda(g);  /* mark_lambda(g) as defined above */

termset compute_first(string_of_symbols alpha)
{
    int i, k;
    termset result;

    k = length(alpha);
    if (k == 0)
        result = SET_OF( λ );
    else {
        result = first_set[alpha[0]] - SET_OF( λ );
        for (i = 1; i < k && λ ∈ first_set[alpha[i-1]]; i++)
            result = result ∪ ( first_set[alpha[i]] - SET_OF( λ ) );
        if (i == k && λ ∈ first_set[alpha[k - 1]])
            result = result ∪ SET_OF( λ );
    }
    return result;
}
Algorithms for First & Follow sets (5/6)
extern grammar g;

void fill_first_set(void)
{
    nonterminal A;
    terminal a;
    production p;
    boolean changes;
    int i, j;

    for (i = 0; i < NUM_NONTERMINALS; i++) {
        A = g.nonterminals[i];
        if (derives_lambda[A])
            first_set[A] = SET_OF( λ );
        else
            first_set[A] = ∅;
    }
    for (i = 0; i < NUM_TERMINALS; i++) {
        a = g.terminals[i];
        first_set[a] = SET_OF( a );
        for (j = 0; j < NUM_NONTERMINALS; j++) {
            A = g.nonterminals[j];
            if (there exists a production A → a β)
                first_set[A] = first_set[A] ∪ SET_OF( a );
        }
    }
    do {
        changes = FALSE;
        for (i = 0; i < g.num_productions; i++) {
            p = g.productions[i];
            first_set[p.lhs] = first_set[p.lhs] ∪ compute_first(p.rhs);
            if ( first_set changed )
                changes = TRUE;
        }
    } while (changes);
}
QUIZ: termination?
QUIZ: correctness?
Algorithms for First & Follow sets (6/6)
void fill_follow_set(void)
{
    nonterminal A, B;
    int i;
    boolean changes;

    for (i = 0; i < NUM_NONTERMINALS; i++) {
        A = g.nonterminals[i];
        follow_set[A] = ∅;
    }
    follow_set[g.start_symbol] = SET_OF( λ );
    do {
        changes = FALSE;
        for (each production A → α B β) {
            /*
             * I.e. for each production and each
             * occurrence of a nonterminal in its
             * right-hand side.
             */
            follow_set[B] = follow_set[B] ∪ (compute_first(β) - SET_OF( λ ));
            if ( λ ∈ compute_first(β) )
                follow_set[B] = follow_set[B] ∪ follow_set[A];
            if ( follow_set[B] changed )
                changes = TRUE;
        }
    } while (changes);
}
QUIZ: termination?
QUIZ: correctness?
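The fixed-point computations above use an abstract set type. As a hedged, runnable sketch, they can be instantiated for example 3 (S → A B c; A → a | λ; B → b | λ) by encoding each set as a bitmask. The symbol numbering, the `LAMBDA` bit, and the names `first_of` and `compute_sets` are assumptions made for this illustration; they are not part of the slides' code.

```c
#include <assert.h>

/* Symbols of example 3: terminals are numbered first, then nonterminals. */
enum { T_a, T_b, T_c, N_S, N_A, N_B, NUM_SYMBOLS };
#define NUM_TERMS 3
#define LAMBDA (1u << NUM_TERMS)   /* extra bit standing for lambda */
#define MAX_RHS 3

typedef unsigned termset;          /* bit i set <=> terminal i is in the set */

struct prod { int lhs; int len; int rhs[MAX_RHS]; };

/* S -> A B c ;  A -> a | lambda ;  B -> b | lambda */
static const struct prod prods[] = {
    { N_S, 3, { N_A, N_B, T_c } },
    { N_A, 1, { T_a } }, { N_A, 0, { 0 } },
    { N_B, 1, { T_b } }, { N_B, 0, { 0 } },
};
enum { NUM_PRODS = sizeof prods / sizeof prods[0] };

termset first_set[NUM_SYMBOLS], follow_set[NUM_SYMBOLS];

/* First of the symbol string s[0..n-1], based on the current first_set[]. */
static termset first_of(const int *s, int n)
{
    termset result = 0;
    int i;

    assert(n >= 0 && n <= MAX_RHS);
    for (i = 0; i < n; i++) {
        result |= first_set[s[i]] & ~LAMBDA;
        if (!(first_set[s[i]] & LAMBDA))
            return result;         /* s[i] cannot derive lambda: stop */
    }
    return result | LAMBDA;        /* every symbol can derive lambda */
}

void compute_sets(void)
{
    int changed, i, j;

    for (i = 0; i < NUM_SYMBOLS; i++) {
        first_set[i] = (i < NUM_TERMS) ? (1u << i) : 0;  /* First(a) = {a} */
        follow_set[i] = 0;
    }
    do {                           /* First sets: iterate to a fixed point */
        changed = 0;
        for (i = 0; i < NUM_PRODS; i++) {
            termset old = first_set[prods[i].lhs];
            first_set[prods[i].lhs] |= first_of(prods[i].rhs, prods[i].len);
            changed |= (first_set[prods[i].lhs] != old);
        }
    } while (changed);

    follow_set[N_S] = LAMBDA;      /* as in fill_follow_set: start symbol gets lambda */
    do {                           /* Follow sets: same fixed-point scheme */
        changed = 0;
        for (i = 0; i < NUM_PRODS; i++)
            for (j = 0; j < prods[i].len; j++) {
                int B = prods[i].rhs[j];
                termset rest, old;
                if (B < NUM_TERMS)
                    continue;      /* Follow is kept for nonterminals only */
                rest = first_of(prods[i].rhs + j + 1, prods[i].len - j - 1);
                old = follow_set[B];
                follow_set[B] |= rest & ~LAMBDA;
                if (rest & LAMBDA) /* the rest can vanish: inherit Follow(lhs) */
                    follow_set[B] |= follow_set[prods[i].lhs];
                changed |= (follow_set[B] != old);
            }
    } while (changed);
}
```

After `compute_sets()`, the masks correspond to First(S) = {a, b, c}, First(A) = {a, λ}, First(B) = {b, λ}, Follow(S) = {λ}, Follow(A) = {b, c} and Follow(B) = {c}. Both loops terminate because each pass can only add bits to finitely many finite sets, which is one way to approach the termination quiz.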
Tracing examples
example 1 --
E → Prefix ( E )
E → V Tail
Prefix → F | λ
Tail → + E | λ
example 2 --
S → a S e | B
B → b B e | C
C → c C e | d
example 3 --
S → A B c
A → a | λ
B → b | λ
[Figure: step markers attached to the grammar symbols trace the computation of the First and Follow sets]
From extended BNF to CFG
extended BNF :
<statement list> → <statement> { <statement> }
equivalent CFG :
<statement list> → <statement> <statement tail>
<statement tail> → <statement> <statement tail>
<statement tail> → λ
QUIZ: how, systematically?
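The correspondence between the EBNF repetition and the recursive <statement tail> can be illustrated with two small recognizers, one per form. This is a sketch under simplifying assumptions: every statement is the single character 's', and the function names are hypothetical. Each function returns the number of statements, or -1 if the input is not a statement list.

```c
#include <assert.h>
#include <stddef.h>

/* EBNF form: <statement list> -> <statement> { <statement> } */
int parse_list_ebnf(const char *p)
{
    int n = 0;

    assert(p != NULL);
    if (*p != 's')
        return -1;                   /* the mandatory first <statement> */
    p++; n++;
    while (*p == 's') {              /* { <statement> }: zero or more times */
        p++; n++;
    }
    return *p == '\0' ? n : -1;
}

/* <statement tail> -> <statement> <statement tail> | lambda */
static int parse_tail(const char **pp)
{
    if (**pp == 's') {
        (*pp)++;
        return 1 + parse_tail(pp);   /* one statement, then the rest of the tail */
    }
    return 0;                        /* lambda: the tail is empty */
}

/* CFG form: <statement list> -> <statement> <statement tail> */
int parse_list_cfg(const char *p)
{
    int n;

    assert(p != NULL);
    if (*p != 's')
        return -1;
    p++;
    n = 1 + parse_tail(&p);
    return *p == '\0' ? n : -1;
}
```

Both functions agree on every input, for example 3 for "sss" and -1 for the empty string. The loop of the EBNF version plays the role of the right-recursive <statement tail> nonterminal.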
Other types of grammars
regular grammar -- A → a B or C → λ
QUIZ: how?
context-free grammar -- A → α
context-sensitive grammar -- α A β → α γ β
type 0 grammar -- α → β
regular grammar : too simple, e.g., it cannot specify { [ⁱ ]ⁱ | i ≥ 1 }
QUIZ: how to specify { [ⁱ ]ⁱ | i ≥ 1 } by context-free grammar?
context-sensitive, type 0 : lacking efficient parsers
context-free grammar : a balance between generality and practicality
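One possible answer to the last quiz, offered as a sketch rather than as the intended solution, is the grammar S → [ S ] | [ ], which generates i opening brackets followed by i closing brackets for i ≥ 1. A recognizer for it needs recursion (or a counter), which is what a regular grammar cannot express. The names `parse_S` and `in_language` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * S -> [ S ] | [ ]
 * Returns a pointer just past the text derived from S,
 * or NULL if no prefix of p can be derived from S.
 */
static const char *parse_S(const char *p)
{
    if (*p != '[')
        return NULL;
    p++;
    if (*p == '[') {                 /* lookahead '[' selects S -> [ S ] */
        p = parse_S(p);
        if (p == NULL)
            return NULL;
    }
    if (*p != ']')                   /* both alternatives end with ']' */
        return NULL;
    return p + 1;
}

/* Returns 1 if s consists of i '[' followed by i ']' with i >= 1. */
int in_language(const char *s)
{
    const char *end;

    assert(s != NULL);
    end = parse_S(s);
    return end != NULL && *end == '\0';
}
```

With this sketch, "[]" and "[[]]" are accepted, while "", "[[]", "[]]" and "[][]" are rejected.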