CONTEXT-FREE GRAMMARS

Chuen-Liang Chen
Department of Computer Science and Information Engineering
National Taiwan University
Taipei, TAIWAN


Parsing

function:
– checking the syntactic validity of the input string
– producing the structure of the corresponding parse tree
callee:
– scanner (when a token is needed)
– semantic routine (when a production rule is matched)
theoretical basis: context-free grammar
executor: parser, syntax analyzer

top-down parsing
– beginning at the start symbol, expanding nonterminals in depth-first manner (predictive in nature)
– left-most derivation
– pre-order traversal of parse tree
– e.g. LL(k) [read from Left; Left-most derivation; k lookaheads], recursive descent parsing

bottom-up parsing
– beginning from the terminal string, determining the production used to generate leaves
– right-most derivation in reverse order
– post-order traversal of parse tree
– e.g. LR(k) [read from Left; Right-most derivation; k lookaheads]


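Definitions about context-free grammar (1/2)

context-free grammar -- $G = (V_t, V_n, S, P)$
$V_t$ -- set of terminal symbols
$V_n$ -- set of nonterminal symbols
– $a, b, c, \ldots \in V_t$
– $A, B, C, \ldots \in V_n$
– $U, V, W, \ldots \in V = V_t \cup V_n$
– $u, v, w, \ldots \in V_t^*$
– $\alpha, \beta, \gamma, \ldots \in V^*$
$S$ -- start symbol, goal symbol; $S \in V_n$
$P$ -- set of production rules of the form $A \to \alpha$

derivation by production rule $A \to \gamma$
– one-step derivation : $\alpha A \beta \Rightarrow \alpha \gamma \beta$
– left-most derivation : $u A \beta \Rightarrow_{lm} u \gamma \beta$
– right-most derivation : $\alpha A v \Rightarrow_{rm} \alpha \gamma v$
– one or more steps derivation : $\Rightarrow^+$, $\Rightarrow^+_{lm}$, $\Rightarrow^+_{rm}$
– zero or more steps derivation : $\Rightarrow^*$, $\Rightarrow^*_{lm}$, $\Rightarrow^*_{rm}$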


Definitions about context-free grammar (2/2)

set of sentential forms -- $SF(G) = \{\, \alpha \mid S \Rightarrow^* \alpha \,\}$
left-most sentential form -- the $\alpha$ so that $S \Rightarrow^*_{lm} \alpha$
right-most sentential form -- the $\alpha$ so that $S \Rightarrow^*_{rm} \alpha$
context-free language -- $L(G) = SF(G) \cap V_t^*$

parse tree, derivation tree -- graphic representation of derivations
– root -- start symbol
– leaf nodes -- grammar symbols or $\lambda$
– interior nodes -- nonterminals
– offspring of a nonterminal -- a production

for a given sentential form --
– phrase -- a sequence of symbols derived from a single nonterminal
– simple phrase, prime phrase -- minimal phrase
– handle -- left-most simple phrase


Example of context-free grammar

grammar G0 --
E → Prefix ( E ) | V Tail
Prefix → F | λ
Tail → + E | λ

left-most derivation --
E $\Rightarrow_{lm}$ Prefix ( E )
$\Rightarrow_{lm}$ F ( E )
$\Rightarrow_{lm}$ F ( V Tail )
$\Rightarrow_{lm}$ F ( V + E )
$\Rightarrow_{lm}$ F ( V + V Tail )
$\Rightarrow_{lm}$ F ( V + V )

right-most derivation --
E $\Rightarrow_{rm}$ Prefix ( E )
$\Rightarrow_{rm}$ Prefix ( V Tail )
$\Rightarrow_{rm}$ Prefix ( V + E )
$\Rightarrow_{rm}$ Prefix ( V + V Tail )
$\Rightarrow_{rm}$ Prefix ( V + V )
$\Rightarrow_{rm}$ F ( V + V )

right-most sentential forms --
1. E
2. Prefix ( E )
3. Prefix ( V Tail )
4. Prefix ( V + E )
5. Prefix ( V + V Tail )
6. Prefix ( V + V )
7. F ( V + V )
8. and so on

$L(G_0) \supseteq \{\, F ( V + V ) \,\}$


Example of left-most derivation

[Figure: parse trees of the left-most derivation of F ( V + V ) in G0, one tree per derivation step; blue symbols mark the left-most sentential forms]


Example of top-down parsing

[Figure: trace of top-down parsing (left-most derivation) of F ( V + V ), one parse tree per step]
color legend -- orange : just derived (predicted); blue : just read (matched); black : derived or read; green : un-processed (parse stack)
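The top-down strategy can be sketched for grammar G0 as a recursive-descent recognizer: one function per nonterminal, with a single lookahead character choosing the production. This is a minimal illustration, not the lecture's own code. It assumes every token is one character (F, V, (, ), +), and the names `g0_accepts`, `match`, and `parse_*` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Cursor into the input; each character is one token: F V ( ) + */
static const char *cursor;

static bool parse_E(void);

/* Consume the expected terminal, or fail without consuming. */
static bool match(char expected)
{
    if (*cursor != expected)
        return false;
    cursor++;
    return true;
}

/* Prefix -> F | lambda   (lambda is predicted when the lookahead is not F) */
static bool parse_Prefix(void)
{
    if (*cursor == 'F')
        return match('F');
    return true;                 /* lambda: consume nothing */
}

/* Tail -> + E | lambda   (lambda is predicted when the lookahead is not +) */
static bool parse_Tail(void)
{
    if (*cursor == '+')
        return match('+') && parse_E();
    return true;                 /* lambda: consume nothing */
}

/* E -> Prefix ( E ) | V Tail */
static bool parse_E(void)
{
    if (*cursor == 'V')
        return match('V') && parse_Tail();
    if (*cursor == 'F' || *cursor == '(')
        return parse_Prefix() && match('(') && parse_E() && match(')');
    return false;                /* no production is predicted */
}

/* Hypothetical entry point: true iff the whole string is in L(G0). */
bool g0_accepts(const char *input)
{
    assert(input != NULL);
    cursor = input;
    return parse_E() && *cursor == '\0';
}
```

Calling `g0_accepts("F(V+V)")` follows the same predict and match steps as the left-most derivation of F ( V + V ): each function call corresponds to expanding a nonterminal, and each `match` to reading a token.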




Example of right-most derivation (1/2)

[Figure: parse trees of right-most derivations with the corresponding sentential form, phrases, simple phrases, and handle marked; blue symbols : sentential form]
sentential forms shown -- E; Prefix ( E ); Prefix ( V Tail ); Prefix ( V + E )


Example of right-most derivation (2/2)

[Figure: continuation of the parse trees of right-most derivations]
sentential forms shown -- Prefix ( V + V Tail ); Prefix ( V + V ); F ( V + V )


Example of bottom-up parsing

[Figure: trace of bottom-up parsing (inverse order of right-most derivation) of F ( V + V ), one partial parse tree per step]
color legend -- blue : just read (shifted); orange : just derived (reduced to); pink : not read; green : derived or read (parse stack)


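Examples - pork chop noodle special (排骨麵特餐)

example 1
<pork chop noodle special> → "iced black tea" <pork chop noodles> "orange slices"
<pork chop noodles> → "fried pork chop" "noodle soup"
lookahead is unnecessary

example 2
<pork chop noodle special> → <service> <pork chop noodles>
<pork chop noodles> → "fried pork chop" "noodle soup" | "noodle soup" "fried pork chop"
<service> → "taro ice cream" | "forget it" (λ)
lookahead is required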


Ambiguity of grammar

a string with two different parse trees (i.e., two different structures)

example :
<exp> → <exp> - <exp>
<exp> → id

for an unambiguous grammar, the parse trees of the left-most derivation and the right-most derivation are the same

[Figure: two parse trees for id - id - id, one grouping it as ( id - id ) - id and the other as id - ( id - id )]
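Why the two structures matter can be seen by evaluating both parse trees of id - id - id. The sketch below is illustrative only: the node type `exp_node` and the function names are hypothetical, and integer values stand in for the id tokens.

```c
#include <assert.h>
#include <stddef.h>

/* A parse-tree node for <exp>: a leaf id (left == right == NULL)
 * or an interior node for <exp> - <exp>. */
typedef struct exp_node {
    const struct exp_node *left, *right;
    int value;                   /* used only by leaves */
} exp_node;

/* Evaluate bottom-up; the shape of the tree fixes the grouping. */
int eval(const exp_node *n)
{
    assert(n != NULL);
    if (n->left == NULL)
        return n->value;
    return eval(n->left) - eval(n->right);
}

/* Tree whose left child is itself a subtraction: (a - b) - c */
int eval_left_grouped(int a, int b, int c)
{
    exp_node x = {NULL, NULL, a}, y = {NULL, NULL, b}, z = {NULL, NULL, c};
    exp_node inner = {&x, &y, 0};
    exp_node root = {&inner, &z, 0};
    return eval(&root);
}

/* Tree whose right child is itself a subtraction: a - (b - c) */
int eval_right_grouped(int a, int b, int c)
{
    exp_node x = {NULL, NULL, a}, y = {NULL, NULL, b}, z = {NULL, NULL, c};
    exp_node inner = {&y, &z, 0};
    exp_node root = {&x, &inner, 0};
    return eval(&root);
}
```

For the values 8, 3, 2 the first tree yields (8 - 3) - 2 = 3 and the second yields 8 - (3 - 2) = 7, so the same token string has two different meanings under this grammar.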


First set and Follow set (1/2)

$\mathrm{First}(\alpha) = \{\, a \in V_t \mid \alpha \Rightarrow^* a\beta \,\} \cup (\text{if } \alpha \Rightarrow^* \lambda \text{ then } \{\lambda\} \text{ else } \emptyset)$
– set of all terminals that can begin a sentential form derived from $\alpha$
– $\mathrm{First}_k(\alpha)$ -- set of k-symbol terminal strings that can begin a sentential form derived from $\alpha$
QUIZ: for what?

$\mathrm{Follow}(A) = \{\, a \in V_t \mid S \Rightarrow^+ \cdots A a \cdots \,\} \cup (\text{if } S \Rightarrow^+ \alpha A \text{ then } \{\lambda\} \text{ else } \emptyset)$
– set of all terminals that may follow $A$ in some sentential form
– $\mathrm{Follow}_k(A)$ -- set of k-symbol terminal strings that may follow $A$ in some sentential form
QUIZ: for what?


First set and Follow set (2/2)

example 1 --
E → Prefix ( E )
E → V Tail
Prefix → F | λ
Tail → + E | λ

example 2 --
S → a S e | B
B → b B e | C
C → c C e | d

example 3 --
S → A B c
A → a | λ
B → b | λ


Algorithms for First & Follow sets (1/6)

    typedef int symbol;     /* a symbol in the grammar */

    /*
     * The symbolic constants used below, NUM_TERMINALS,
     * NUM_NONTERMINALS, and NUM_PRODUCTIONS are
     * determined by the grammar.
     * MAX_RHS_LENGTH should simply be "big enough."
     */
    #define VOCABULARY (NUM_NONTERMINALS + NUM_TERMINALS)

    typedef struct gram {
        symbol terminals[NUM_TERMINALS];
        symbol nonterminals[NUM_NONTERMINALS];
        symbol start_symbol;
        int num_productions;
        struct prod {
            symbol lhs;
            int rhs_length;
            symbol rhs[MAX_RHS_LENGTH];
        } productions[NUM_PRODUCTIONS];
        symbol vocabulary[VOCABULARY];
    } grammar;

    typedef struct prod production;

    typedef symbol terminal;
    typedef symbol nonterminal;



    typedef short boolean;
    typedef boolean marked_vocabulary[VOCABULARY];

    /*
     * Mark those vocabulary symbols found to derive λ
     * (directly or indirectly).
     * C cannot return an array, so a pointer to the static
     * array is returned.
     */
    boolean *mark_lambda(const grammar g)
    {
        static marked_vocabulary derives_lambda;
        boolean changes;             /* any changes during last iteration? */
        boolean rhs_derives_lambda;  /* does the RHS derive λ? */
        symbol v;                    /* a word in the vocabulary */
        production p;                /* a production in the grammar */
        int i, j;                    /* loop variables */

        for (v = 0; v < VOCABULARY; v++)
            derives_lambda[v] = FALSE;   /* initially, nothing is marked */

        do {
            changes = FALSE;
            for (i = 0; i < g.num_productions; i++) {
                p = g.productions[i];
                if (! derives_lambda[p.lhs]) {
                    if (p.rhs_length == 0) {
                        /* derives λ directly */
                        changes = derives_lambda[p.lhs] = TRUE;
                        continue;
                    }
                    /* does each part of RHS derive λ? */
                    rhs_derives_lambda = derives_lambda[p.rhs[0]];
                    for (j = 1; j < p.rhs_length; j++)
                        rhs_derives_lambda = rhs_derives_lambda
                                             && derives_lambda[p.rhs[j]];
                    if (rhs_derives_lambda)
                        changes = derives_lambda[p.lhs] = TRUE;
                }
            }
        } while (changes);
        return derives_lambda;
    }



    /* Pseudo-C: SET_OF, ∪, ∈, and set difference (-) are set operations. */
    typedef set_of_terminal_or_lambda termset;
    termset follow_set[NUM_NONTERMINALS];
    termset first_set[VOCABULARY];
    boolean *derives_lambda;   /* set to mark_lambda(g), as defined above */

    termset compute_first(string_of_symbols alpha)
    {
        int i, k;
        termset result;

        k = length(alpha);
        if (k == 0)
            result = SET_OF( λ );
        else {
            result = first_set[alpha[0]] - SET_OF( λ );
            for (i = 1; i < k && λ ∈ first_set[alpha[i-1]]; i++)
                result = result ∪ ( first_set[alpha[i]] - SET_OF( λ ) );
            if (i == k && λ ∈ first_set[alpha[k - 1]])
                result = result ∪ SET_OF( λ );
        }
        return result;
    }



    extern grammar g;

    void fill_first_set(void)
    {
        nonterminal A;
        terminal a;
        production p;
        boolean changes;
        int i, j;

        for (i = 0; i < NUM_NONTERMINALS; i++) {
            A = g.nonterminals[i];
            if (derives_lambda[A])
                first_set[A] = SET_OF( λ );
            else
                first_set[A] = ∅;
        }

        for (i = 0; i < NUM_TERMINALS; i++) {
            a = g.terminals[i];
            first_set[a] = SET_OF( a );
            for (j = 0; j < NUM_NONTERMINALS; j++) {
                A = g.nonterminals[j];
                if (there exists a production A → aβ)
                    first_set[A] = first_set[A] ∪ SET_OF( a );
            }
        }

        do {
            changes = FALSE;
            for (i = 0; i < g.num_productions; i++) {
                p = g.productions[i];
                first_set[p.lhs] = first_set[p.lhs] ∪ compute_first(p.rhs);
                if ( first_set changed )
                    changes = TRUE;
            }
        } while (changes);
    }

QUIZ: termination?
QUIZ: correctness?



    void fill_follow_set(void)
    {
        nonterminal A, B;
        int i;
        boolean changes;

        for (i = 0; i < NUM_NONTERMINALS; i++) {
            A = g.nonterminals[i];
            follow_set[A] = ∅;
        }
        follow_set[g.start_symbol] = SET_OF( λ );

        do {
            changes = FALSE;
            for (each production A → αBβ) {
                /*
                 * I.e. for each production and each
                 * occurrence of a nonterminal in its
                 * right-hand side.
                 */
                follow_set[B] = follow_set[B] ∪ (compute_first(β) - SET_OF( λ ));
                if ( λ ∈ compute_first(β) )
                    follow_set[B] = follow_set[B] ∪ follow_set[A];
                if ( follow_set[B] changed )
                    changes = TRUE;
            }
        } while (changes);
    }

QUIZ: termination?
QUIZ: correctness?
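The First and Follow fixed-point iterations can be made runnable by representing each set as a bitmask. The sketch below is one possible concretization, not the code from the slides, and rests on several assumptions: the grammar is hard-coded to S → A B c, A → a | λ, B → b | λ; symbols are numbered with terminals first; one extra bit (`LAMBDA_BIT`) stands for λ; and the λ marking is folded into the First iteration instead of a separate `mark_lambda` pass.

```c
#include <assert.h>
#include <stdbool.h>

/* Symbol numbering (an assumption of this sketch): terminals first. */
enum { T_a, T_b, T_c, NUM_TERMINALS,
       N_S = NUM_TERMINALS, N_A, N_B, VOCABULARY };
#define LAMBDA_BIT (1u << NUM_TERMINALS)  /* set element standing for lambda */
#define MAX_RHS_LENGTH 3

typedef unsigned termset;                 /* one bit per terminal, plus lambda */

struct prod { int lhs, rhs_length, rhs[MAX_RHS_LENGTH]; };

/* S -> A B c    A -> a | lambda    B -> b | lambda */
static const struct prod productions[] = {
    { N_S, 3, { N_A, N_B, T_c } },
    { N_A, 1, { T_a } }, { N_A, 0, { 0 } },
    { N_B, 1, { T_b } }, { N_B, 0, { 0 } },
};
enum { NUM_PRODUCTIONS = sizeof productions / sizeof productions[0] };

termset first_set[VOCABULARY];
termset follow_set[VOCABULARY];           /* meaningful for nonterminals only */

/* First of the symbol string alpha[0..k-1], using the current first_set. */
static termset compute_first(const int *alpha, int k)
{
    termset result = 0;
    int i;
    assert(k >= 0);
    for (i = 0; i < k; i++) {
        result |= first_set[alpha[i]] & ~LAMBDA_BIT;
        if (!(first_set[alpha[i]] & LAMBDA_BIT))
            return result;                /* alpha[i] cannot vanish */
    }
    return result | LAMBDA_BIT;           /* every symbol (or none) derives lambda */
}

void fill_first_set(void)
{
    bool changes;
    int i;
    for (i = 0; i < VOCABULARY; i++)      /* First(a) = {a}; nonterminals start empty */
        first_set[i] = (i < NUM_TERMINALS) ? (1u << i) : 0;
    do {
        changes = false;
        for (i = 0; i < NUM_PRODUCTIONS; i++) {
            const struct prod *p = &productions[i];
            termset updated = first_set[p->lhs]
                            | compute_first(p->rhs, p->rhs_length);
            if (updated != first_set[p->lhs]) {
                first_set[p->lhs] = updated;
                changes = true;
            }
        }
    } while (changes);
}

/* Requires fill_first_set() to have run first. */
void fill_follow_set(void)
{
    bool changes;
    int i, j;
    for (i = 0; i < VOCABULARY; i++)
        follow_set[i] = 0;
    follow_set[N_S] = LAMBDA_BIT;         /* lambda marks the end of input here */
    do {
        changes = false;
        for (i = 0; i < NUM_PRODUCTIONS; i++) {
            const struct prod *p = &productions[i];
            for (j = 0; j < p->rhs_length; j++) {
                int B = p->rhs[j];
                termset beta, updated;
                if (B < NUM_TERMINALS)
                    continue;             /* only nonterminals have Follow sets */
                beta = compute_first(p->rhs + j + 1, p->rhs_length - j - 1);
                updated = follow_set[B] | (beta & ~LAMBDA_BIT);
                if (beta & LAMBDA_BIT)    /* the rest of the RHS can vanish */
                    updated |= follow_set[p->lhs];
                if (updated != follow_set[B]) {
                    follow_set[B] = updated;
                    changes = true;
                }
            }
        }
    } while (changes);
}
```

Both loops stop because every pass either adds at least one bit to some set or leaves all sets unchanged, and each set has only finitely many bits. For this grammar the result is First(S) = {a, b, c}, First(A) = {a, λ}, First(B) = {b, λ}, Follow(S) = {λ}, Follow(A) = {b, c}, Follow(B) = {c}.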


Tracing examples

example 1 --
E → Prefix ( E )
E → V Tail
Prefix → F | λ
Tail → + E | λ

example 2 --
S → a S e | B
B → b B e | C
C → c C e | d

example 3 --
S → A B c
A → a | λ
B → b | λ

[Figure: circled step numbers on the productions above, tracing the order in which the First and Follow sets are filled]


From extended BNF to CFG

extended BNF --
<statement list> → <statement> { <statement> }

CFG --
<statement list> → <statement> <statement tail>
<statement tail> → <statement> <statement tail>
<statement tail> → λ

QUIZ: how, systematically?
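The correspondence between the two forms can be illustrated with two small recognizers: the EBNF braces become a loop, while the CFG tail nonterminal becomes a recursive function. This is only a sketch under the assumption that each 's' character stands for one <statement> token; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* EBNF form: <statement list> -> <statement> { <statement> }
 * The braces become a loop.
 * Returns the number of statements read, or 0 if there is none. */
int count_ebnf(const char *in)
{
    int n = 0;
    assert(in != NULL);
    if (*in != 's')
        return 0;                /* at least one statement is required */
    in++;
    n++;
    while (*in == 's') {         /* { <statement> } */
        in++;
        n++;
    }
    return n;
}

/* CFG form: <statement tail> -> <statement> <statement tail> | lambda */
static int statement_tail(const char *in)
{
    if (*in == 's')
        return 1 + statement_tail(in + 1);
    return 0;                    /* lambda alternative */
}

/* CFG form: <statement list> -> <statement> <statement tail> */
int count_cfg(const char *in)
{
    assert(in != NULL);
    if (*in != 's')
        return 0;
    return 1 + statement_tail(in + 1);
}
```

Both functions accept the same inputs and return the same count, which reflects that the repetition and the right-recursive tail describe the same language.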


Other types of grammars

regular grammar -- $A \to a B$ or $C \to \lambda$
QUIZ: how?
context-free grammar -- $A \to \alpha$
context-sensitive grammar -- $\alpha A \beta \to \alpha \gamma \beta$
type 0 grammar -- $\alpha \to \beta$

regular grammar : too simple, e.g., it cannot specify $\{\, [^i \, ]^i \mid i \geq 1 \,\}$
QUIZ: how to specify $\{\, [^i \, ]^i \mid i \geq 1 \,\}$ by context-free grammar?
context-sensitive, type 0 : no efficient parser
context-free grammar : a balance between generality and practicality
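
One context-free grammar for the nested-bracket language is S → [ S ] | [ ], and it maps directly onto a recursive recognizer. The sketch below is a possible illustration, assuming this particular grammar; the names `parse_S` and `nested_brackets` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* S -> [ S ] | [ ]
 * Returns a pointer just past the recognized S, or NULL on failure. */
static const char *parse_S(const char *p)
{
    if (*p != '[')
        return NULL;
    p++;
    if (*p == '[') {             /* alternative [ S ] */
        p = parse_S(p);
        if (p == NULL)
            return NULL;
    }
    if (*p != ']')
        return NULL;
    return p + 1;
}

/* True iff input consists of i opening brackets followed by
 * i closing brackets, for some i >= 1. */
bool nested_brackets(const char *input)
{
    const char *end;
    assert(input != NULL);
    end = parse_S(input);
    return end != NULL && *end == '\0';
}
```

The recursion depth counts the unmatched opening brackets, which is the unbounded memory that a regular grammar lacks.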
