Lexical and Syntax Analysis (of Programming Languages)

Top-Down Parsing

[Figure: String of characters (easy for humans to write and understand) → String of tokens (lexemes identified) → Data structure (easy for programs to transform)]

PART 1: SYNTAX OF LANGUAGES

• Context-Free Grammars

• Derivations

• Parse Trees

• Ambiguity

• Precedence and Associativity

Syntax

The syntax is a set of rules defining valid strings of a language, often specified by a context-free grammar.

For example, a grammar E for arithmetic expressions:

e → x
  | y
  | e + e
  | e – e
  | e * e
  | ( e )

Context-free grammars

Have four components:

1. A set of terminal symbols.

2. A set of non-terminal symbols.

3. A set of productions (or rules) of the form:

n → X1 ⋯ Xn

where n is a non-terminal and X1 ⋯ Xn is any sequence of terminals, non-terminals, and 𝜀.

4. The start symbol (one of the non-terminals).

Notation

Non-terminals are underlined.

Rather than writing

e → x
e → e + e

we may write:

e → x
  | e + e

(Also, symbols → and ::= will be used interchangeably.)

Why context-free?

[Figure: nested classes of grammars, from innermost to outermost: Regular, Context-Free, Context-Sensitive, Unrestricted]

Nice balance between expressive power and efficiency of parsing.

Derivations

A derivation is a proof that some string conforms to a grammar.

For example:

e ⇒ e + e
  ⇒ x + e
  ⇒ x + ( e )
  ⇒ x + ( e * e )
  ⇒ x + ( y * e )
  ⇒ x + ( y * x )

Derivations

Leftmost derivation: always expand the leftmost non-terminal when applying the grammar rules.

Rightmost derivation: always expand the rightmost non-terminal, e.g.

e ⇒ e + e
  ⇒ e + ( e )
  ⇒ e + ( x )
  ⇒ x + ( x )

Parse tree: motivation

Like a derivation: a proof that a given input is valid according to the grammar. But a parse tree:

• is more concise: we don’t write out the sentence every time a non-terminal is expanded.

• abstracts over the order in which rules are applied.

Parse tree: intuition

If non-terminal n has a production

n → X Y Z

where X, Y, and Z are terminals or non-terminals, then a parse tree may have an interior node labelled n with three children labelled X, Y, and Z.

    n
  / | \
 X  Y  Z

Parse tree: definition

A parse tree is a tree in which:

• the root is labelled by the start symbol;

• each leaf is labelled by a terminal symbol, or 𝜀;

• each interior node is labelled by a non-terminal;

• if n is a non-terminal labelling an interior node whose children are X1, X2, ⋯, Xn then there must exist a production n → X1 X2 ⋯ Xn.

Example 1

Example input string:

x + y * x

Resulting parse tree according to grammar E:

        e
      / | \
     e  +  e
     |   / | \
     x  e  *  e
        |     |
        y     x

Example 2

The following is not a parse tree according to grammar E.

     e
   / | \
  x  +  e
      / | \
     e  *  e
     |     |
     y     x

Why? Because e → x + e is not a production in grammar E.

Syntax Analysis

[Figure: String of symbols → Parse tree]

A parse tree is:

1. A proof that a given input is valid according to the grammar;

2. A structure-rich representation of the input that can be stored in a data structure that is convenient to process.

(Syntax analysis may also report that the input string is invalid.)

Ambiguity

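If there exists more than one parse tree for some string then the grammar is ambiguous. For example, the string x+y*x has two parse trees.

The first groups the input as x+(y*x):

        e
      / | \
     e  +  e
     |   / | \
     x  e  *  e
        |     |
        y     x

The second groups it as (x+y)*x:

           e
         / | \
        e  *  e
      / | \   |
     e  +  e  x
     |     |
     x     y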

Operator precedence

Different parse trees often have different meanings, so we usually want unambiguous grammars.

Conventionally, * has a higher precedence (binds tighter) than +, so there is only one interpretation of x+y*x, namely x+(y*x).

Operator associativity

Even with operator precedence rules, ambiguity remains, e.g. x-x-x.

Binary operators are either:

• left-associative;

• right-associative;

• non-associative.

Conventionally, - is left-associative, so there is only one interpretation of x-x-x, namely (x-x)-x.

Exercise 1

Give an unambiguous grammar for expressions, using these rules of associativity and precedence.

Let all operators be left associative, and let * bind tighter than + and –.

Recall grammar E:

e → x
  | y
  | e + e
  | e – e
  | e * e
  | ( e )

Answer: step-by-step

Given a non-terminal e which involves operators at n levels of precedence:

Step 1: introduce n+1 new non-terminals, e0 ⋯ en.

Let op denote an operator with precedence i.

Step 2: replace each production

e → e op e

with

e_i → e_i op e_(i+1)
    | e_(i+1)

if op is left-associative, or

e_i → e_(i+1) op e_i
    | e_(i+1)

if op is right-associative.


Construct the precedence table:

Operator   Precedence
+, -       0
*          1

Grammar E after step 2 becomes:

e0 → e0 + e1
   | e0 – e1
   | e1

e1 → e1 * e2
   | e2

e → ( e )
  | x
  | y


Step 3: replace each remaining production

e → ⋯

with

en → ⋯

After step 3:

e0 → e0 + e1
   | e0 – e1
   | e1

e1 → e1 * e2
   | e2

e2 → ( e )
   | x
   | y


Step 4: replace all occurrences of e0 with e.

After step 4:

e → e + e1
  | e – e1
  | e1

e1 → e1 * e2
   | e2

e2 → ( e )
   | x
   | y

Exercise 2

Consider the following ambiguous grammar for logical propositions.

p → 0        (Zero)
  | 1        (One)
  | ~ p      (Negation)
  | p + p    (Disjunction)
  | p * p    (Conjunction)

Now let + and * be right associative and the operators in increasing order of binding strength be: +, *, ~.

Give an unambiguous grammar for logical propositions.

Exercise 3

Which of the following grammars are ambiguous?

s → if b then s
  | if b then s else s
  | skip

e → + e e
  | – e e
  | x

b → 0 b 1
  | 0 1

Summary of Part 1

• Syntax of a language is often specified by a context-free grammar.

• Derivations and parse trees are proofs that a string is accepted by a grammar.

• Construction of unambiguous grammars using rules of precedence and associativity.

PART 2: TOP-DOWN PARSING

• Recursive-Descent

• Backtracking

• Left-Factoring

• Predictive Parsing

• Left-Recursion Removal

• First and Follow Sets

• Parsing tables and LL(1)

Top-down parsing

Top-down: begin with the start symbol and expand non-terminals, succeeding when the input string is matched.

A good strategy for writing parsers:

1. Implement a syntax checker to accept or reject input strings.

2. Modify the checker to construct a parse tree (straightforward).

RECURSIVE DESCENT

A popular top-down parsing technique.

Recursive descent

A recursive descent parser consists of a set of functions, one for each non-terminal.

The function for non-terminal n returns true if some prefix of the input string can be derived from n, and false otherwise.

Consuming the input

We assume a global variable next points to the input string.

char* next;

Consume c from input if possible.

int eat(char c) {
  if (*next == c) {
    next++;
    return 1;
  }
  return 0;
}
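
As a minimal sketch of how eat is meant to be used, the fragment below chains two calls to recognise the fixed string "xy". The helper is_xy is hypothetical and exists only for illustration; next is declared const here (an assumption) so that it can point at string literals.

```c
/* Cursor into the input, as assumed in the text (const added for string literals). */
static const char *next;

/* Consume c from the input if it is the next character. */
static int eat(char c) {
    if (*next == c) {
        next++;
        return 1;
    }
    return 0;
}

/* Hypothetical helper: accept exactly the two-character input "xy". */
int is_xy(const char *input) {
    next = input;
    return eat('x') && eat('y') && *next == '\0';
}
```

Because && short-circuits, the second eat is attempted only when the first succeeds, and a failed eat leaves next where it was.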

Recursive descent

For each non-terminal N, introduce:

int N() {
  char* save = next;

  for each N → X1 X2 ⋯ Xn
    if (parser(X1) &&
        parser(X2) &&
        ⋯ &&
        parser(Xn)) return 1;
    else next = save;    /* Backtrack */

  return 0;
}

Let parser(X) denote

X() if X is a non-terminal

eat(X) if X is a terminal

Exercise 4

Consider the following grammar G with start symbol e.

e → ( e + e )
  | ( e * e )
  | v

v → x
  | y

Using recursive descent, write a syntax checker for grammar G.

Answer (part 1)

int e() {
  char* save = next;

  if (eat('(') && e() && eat('+') &&
      e() && eat(')')) return 1;
  else next = save;

  if (eat('(') && e() && eat('*') &&
      e() && eat(')')) return 1;
  else next = save;

  if (v()) return 1;
  else next = save;

  return 0;
}

Answer (part 2)

int v() {
  char* save = next;

  if (eat('x')) return 1;
  else next = save;

  if (eat('y')) return 1;
  else next = save;

  return 0;
}
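
The two answer parts can be assembled into one compilable unit, roughly as sketched below. The driver check is a hypothetical addition, not part of the exercise: it additionally insists that the whole input is consumed, whereas e() on its own accepts any valid prefix. The forward declaration of v and the const qualifier on next are also assumptions made so that the file compiles cleanly.

```c
static const char *next;            /* cursor into the input string */

static int eat(char c) {
    if (*next == c) { next++; return 1; }
    return 0;
}

static int v(void);                 /* forward declaration: e() calls v() */

/* e -> ( e + e ) | ( e * e ) | v */
static int e(void) {
    const char *save = next;

    if (eat('(') && e() && eat('+') && e() && eat(')')) return 1;
    else next = save;               /* backtrack */

    if (eat('(') && e() && eat('*') && e() && eat(')')) return 1;
    else next = save;               /* backtrack */

    if (v()) return 1;
    else next = save;

    return 0;
}

/* v -> x | y */
static int v(void) {
    const char *save = next;

    if (eat('x')) return 1;
    else next = save;

    if (eat('y')) return 1;
    else next = save;

    return 0;
}

/* Hypothetical driver: accept only if e() matches the entire input. */
int check(const char *input) {
    next = input;
    return e() && *next == '\0';
}
```

For instance, check("((x*x)+y)") succeeds, while check("x+y") fails because grammar G requires parentheses around every binary operation.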

Exercise 5

How many function calls are made by the recursive descent parser to parse the following strings?

(x*x)

((x*x)*x)

(((x*x)*x)*x)

(See animation of backtracking.)

Answer

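Input string     Length   Calls
(x*x)            5        21
((x*x)*x)        9        53
(((x*x)*x)*x)    13       117

Number of calls roughly doubles with each extra level of nesting, because the inner expression is parsed once by the failing + alternative and again by the * alternative. For these inputs the growth is therefore exponential, not just quadratic, in the length of the input string.

Lesson: backtracking is expensive!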
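
The counts in the table can be reproduced by instrumenting the backtracking checker for grammar G. The sketch below assumes that every call to e, v and eat counts as one function call; the counter calls and the driver count_calls are hypothetical names introduced for this illustration.

```c
static const char *next;   /* cursor into the input string */
static long calls;         /* hypothetical counter: calls to e, v and eat */

static int eat(char c) {
    calls++;
    if (*next == c) { next++; return 1; }
    return 0;
}

/* v -> x | y */
static int v(void) {
    const char *save = next;
    calls++;
    if (eat('x')) return 1;
    else next = save;
    if (eat('y')) return 1;
    else next = save;
    return 0;
}

/* e -> ( e + e ) | ( e * e ) | v */
static int e(void) {
    const char *save = next;
    calls++;
    if (eat('(') && e() && eat('+') && e() && eat(')')) return 1;
    else next = save;      /* backtrack: the inner e is parsed again below */
    if (eat('(') && e() && eat('*') && e() && eat(')')) return 1;
    else next = save;
    if (v()) return 1;
    else next = save;
    return 0;
}

/* Hypothetical driver: parse input and report how many calls were made. */
long count_calls(const char *input) {
    next = input;
    calls = 0;
    e();
    return calls;
}
```

Under this counting convention, an input nested d levels deep costs T(d) = 2·T(d−1) + 11 calls with T(0) = 5, which gives 21, 53 and 117 for the three strings above.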

LEFT FACTORING

Reducing backtracking!

Left factoring

When two productions for a non-terminal share a common prefix, expensive backtracking can be avoided by left-factoring the grammar.

Idea: Introduce a new non-terminal that accepts each of the different suffixes.

Example 3

Left-factoring grammar G by introducing non-terminal r:

e → ( e r        (common prefix "( e" is parsed once)
  | v

r → + e )        (r accepts the different suffixes)
  | * e )

v → x
  | y
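
A recursive descent checker for the left-factored grammar might look like the sketch below, written in the same save-and-restore style as the earlier checker. The driver check_factored is a hypothetical addition that also requires the whole input to be consumed.

```c
static const char *next;            /* cursor into the input string */

static int eat(char c) {
    if (*next == c) { next++; return 1; }
    return 0;
}

static int e(void);                 /* forward declaration: r() calls e() */

/* v -> x | y */
static int v(void) {
    const char *save = next;
    if (eat('x')) return 1;
    else next = save;
    if (eat('y')) return 1;
    else next = save;
    return 0;
}

/* r -> + e ) | * e )   (the two different suffixes) */
static int r(void) {
    const char *save = next;
    if (eat('+') && e() && eat(')')) return 1;
    else next = save;
    if (eat('*') && e() && eat(')')) return 1;
    else next = save;
    return 0;
}

/* e -> ( e r | v   (the common prefix "( e" is parsed only once) */
static int e(void) {
    const char *save = next;
    if (eat('(') && e() && r()) return 1;
    else next = save;
    if (v()) return 1;
    else next = save;
    return 0;
}

/* Hypothetical driver: accept only if e() matches the entire input. */
int check_factored(const char *input) {
    next = input;
    return e() && *next == '\0';
}
```

The choice between + and * is now made in r, after the first operand has been parsed, so a wrong guess no longer forces that operand to be parsed a second time.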

Exercise 6

How many function calls are made by the recursive descent parser (after left-factoring) to parse the following strings?

(x*x)

((x*x)*x)

(((x*x)*x)*x)

Answer


Input string     Length   Calls
(x*x)            5        13
((x*x)*x)        9        22
(((x*x)*x)*x)    13       31

Number of calls is now linear in the length of the input string.

Lesson: left-factoring a grammar reduces backtracking.

PREDICTIVE PARSING

Eliminating backtracking!

Predictive parsing

Idea: know which production of a non-terminal to choose based solely on the next input symbol.

Advantage: very efficient since it eliminates all backtracking.

Disadvantage: not all grammars can be parsed in this way. (But many useful ones can.)

Running example

The following grammar H will be used as a running example to demonstrate predictive parsing.

e → e + e
  | e * e
  | ( e )
  | x
  | y

Example:

x+y*(y+x)

Removing ambiguity

Since + and * are left-associative and * binds tighter than +, we can derive an unambiguous variant of H.

e → e + t
  | t

t → t * f
  | f

f → ( e )
  | x
  | y

Left recursion

Problem: left-recursive grammars cause recursive descent parsers to loop forever.

int e() {
  char* save = next;

  if (e() && eat('+') && t()) return 1;    /* Call to self without consuming any input */
  next = save;

  if (t()) return 1;
  next = save;

  return 0;
}

Eliminating left recursion

Let 𝛼 denote any sequence of grammar symbols.

Rule 1:   n → 𝛼     ⟹   n → 𝛼 n'      where 𝛼 does not begin with n

Rule 2:   n → n 𝛼   ⟹   n' → 𝛼 n'

Rule 3:   Introduce new production   n' → 𝜀

Example 4

Running example, after eliminating left-recursion.

e → t e'

e' → + t e'
   | 𝜀

t → f t'

t' → * f t'
   | 𝜀

f → ( e )
  | x
  | y

First and follow sets

Predictive parsers are built using the first and follow sets of each non-terminal in a grammar.

The first set of a non-terminal n is the set of symbols that can begin a string derived from n.

The follow set of a non-terminal n is the set of symbols that can immediately follow n in any step of a derivation.

Definition of first sets


Let 𝛼 denote any sequence of grammar symbols.

If 𝛼 can derive a string beginning with terminal a then a ∊ first(𝛼).

If 𝛼 can derive 𝜀 then 𝜀 ∊ first(𝛼).

Computing first sets

If a is a terminal then a ∊ first(a).

If there exists a production

n → X1 X2 ⋯ Xn

and ∃i · a ∊ first(Xi)

and ∀j < i · 𝜀 ∊ first(Xj)

then a ∊ first(n).

If n → 𝜀 then 𝜀 ∊ first(n).

Exercise 7

What are the first sets for each non-terminal in the following grammar?

e → t e'

e' → + t e'
   | 𝜀

t → f t'

t' → * f t'
   | 𝜀

f → ( e )
  | x
  | y

Answer

first( f )  = { '(', 'x', 'y' }

first( t' ) = { '*', 𝜀 }

first( t )  = { '(', 'x', 'y' }

first( e' ) = { '+', 𝜀 }

first( e )  = { '(', 'x', 'y' }
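
These sets can also be computed mechanically by applying the rules for first sets repeatedly until nothing changes. The sketch below is one possible encoding, not a definitive implementation: each grammar symbol is a single character (E stands for e' and T for t', which is an assumption of this encoding), an empty right-hand side stands for 𝜀, and sets are bit masks. The names compute_first and in_first are hypothetical. The 𝜀 rule is applied in its slightly more general form: 𝜀 is added whenever every symbol of the right-hand side can derive 𝜀.

```c
#include <string.h>

/* E stands for e', T for t'; an empty right-hand side is the ε-production. */
struct prod { char lhs; const char *rhs; };
static const struct prod G[] = {
    {'e', "tE"}, {'E', "+tE"}, {'E', ""},
    {'t', "fT"}, {'T', "*fT"}, {'T', ""},
    {'f', "(e)"}, {'f', "x"}, {'f', "y"},
};
enum { NPROD = 9 };

static const char NONTERMS[] = "eEtTf";
static const char TERMS[] = "+*()xy";
#define EPS_BIT (1u << 6)            /* bits 0..5 index TERMS, bit 6 marks ε */

static unsigned first_nt[5];         /* first set of each non-terminal */

static int is_nonterm(char c) { return c != '\0' && strchr(NONTERMS, c) != NULL; }
static int nt_index(char c)   { return (int)(strchr(NONTERMS, c) - NONTERMS); }
static unsigned term_bit(char c) { return 1u << (int)(strchr(TERMS, c) - TERMS); }

/* first set of a single grammar symbol */
static unsigned first_sym(char c) {
    return is_nonterm(c) ? first_nt[nt_index(c)] : term_bit(c);
}

/* Apply the rules to every production until no set grows any more. */
void compute_first(void) {
    int changed = 1;
    memset(first_nt, 0, sizeof first_nt);
    while (changed) {
        changed = 0;
        for (int p = 0; p < NPROD; p++) {
            unsigned *target = &first_nt[nt_index(G[p].lhs)];
            unsigned before = *target;
            const char *s = G[p].rhs;
            /* Walk X1 X2 ... while every earlier symbol can derive ε. */
            for (; *s; s++) {
                unsigned f = first_sym(*s);
                *target |= f & ~EPS_BIT;
                if (!(f & EPS_BIT)) break;
            }
            if (*s == '\0') *target |= EPS_BIT;   /* whole right-hand side can vanish */
            if (*target != before) changed = 1;
        }
    }
}

/* Membership test; pass sym == '\0' to ask about ε. */
int in_first(char nt, char sym) {
    unsigned bit = sym ? term_bit(sym) : EPS_BIT;
    return (first_nt[nt_index(nt)] & bit) != 0;
}
```

After compute_first(), in_first('E', '+') and in_first('E', '\0') both hold, matching first( e' ) = { '+', 𝜀 }.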

Definition of follow sets

Let 𝛼 and 𝛽 denote any sequence of grammar symbols.

Terminal a ∊ follow(n) if the start symbol of the grammar can derive a string of grammar symbols in which a immediately follows n.

The set follow(n) never contains 𝜀.

End markers

In predictive parsing, it is useful to mark the end of the input string with a $ symbol.

If the start symbol can derive a string of grammar symbols in which n is the rightmost symbol then $ is in follow(n).

Computing follow sets

If s is the start symbol of the grammar then $ ∊ follow(s).

If n → 𝛼 x 𝛽 then everything in first(𝛽) except 𝜀 is in follow(x).

If n → 𝛼 x

or n → 𝛼 x 𝛽 and 𝜀 ∊ first(𝛽)

then everything in follow(n) is in follow(x).

Exercise 8

What are the follow sets for each non-terminal in the following grammar?

e → t e'

e' → + t e'
   | 𝜀

t → f t'

t' → * f t'
   | 𝜀

f → ( e )
  | x
  | y

Answer

follow( e' ) = { $, ')' }

follow( e )  = { $, ')' }

follow( t' ) = { '+', $, ')' }

follow( t )  = { '+', $, ')' }

follow( f )  = { '*', '+', ')', $ }
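
The follow sets can be checked in the same mechanical way, by iterating the three rules to a fixed point. In the sketch below the first sets are hard-coded from the answer to Exercise 7 rather than recomputed, which is a simplifying assumption; E again stands for e' and T for t', and compute_follow and in_follow are hypothetical names.

```c
#include <string.h>

/* E stands for e', T for t'; an empty right-hand side is the ε-production. */
struct prod { char lhs; const char *rhs; };
static const struct prod G[] = {
    {'e', "tE"}, {'E', "+tE"}, {'E', ""},
    {'t', "fT"}, {'T', "*fT"}, {'T', ""},
    {'f', "(e)"}, {'f', "x"}, {'f', "y"},
};
enum { NPROD = 9 };

static const char NONTERMS[] = "eEtTf";
static const char TERMS[] = "+*()xy$";       /* '$' is the end marker */
#define EPS (1u << 7)                         /* bit 7 marks ε */

static int is_nonterm(char c) { return c != '\0' && strchr(NONTERMS, c) != NULL; }
static int nt_index(char c)   { return (int)(strchr(NONTERMS, c) - NONTERMS); }
static unsigned bit(char c)   { return 1u << (int)(strchr(TERMS, c) - TERMS); }

/* first sets of single symbols, hard-coded from the Exercise 7 answer */
static unsigned first_of(char c) {
    switch (c) {
    case 'e': case 't': case 'f': return bit('(') | bit('x') | bit('y');
    case 'E': return bit('+') | EPS;
    case 'T': return bit('*') | EPS;
    default:  return bit(c);                  /* a terminal */
    }
}

static unsigned follow_nt[5];                 /* follow set of each non-terminal */

void compute_follow(void) {
    int changed = 1;
    memset(follow_nt, 0, sizeof follow_nt);
    follow_nt[nt_index('e')] = bit('$');      /* rule 1: e is the start symbol */
    while (changed) {
        changed = 0;
        for (int p = 0; p < NPROD; p++) {
            const char *rhs = G[p].rhs;
            for (size_t i = 0; rhs[i]; i++) {
                if (!is_nonterm(rhs[i])) continue;
                unsigned *target = &follow_nt[nt_index(rhs[i])];
                unsigned before = *target;
                size_t j = i + 1;
                /* rule 2: add first(β) minus ε, where β is the rest of the rhs */
                for (; rhs[j]; j++) {
                    unsigned f = first_of(rhs[j]);
                    *target |= f & ~EPS;
                    if (!(f & EPS)) break;
                }
                /* rule 3: β is empty or can derive ε, so inherit follow(lhs) */
                if (rhs[j] == '\0') *target |= follow_nt[nt_index(G[p].lhs)];
                if (*target != before) changed = 1;
            }
        }
    }
}

int in_follow(char nt, char sym) {
    return (follow_nt[nt_index(nt)] & bit(sym)) != 0;
}
```

After compute_follow(), in_follow('f', '*') holds but in_follow('T', '*') does not, in line with the sets listed above.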

Predictive parsing table

For each non-terminal n, a parse table T defines which production of n should be chosen, based on the next input symbol.

for each production n → 𝛼
  for each terminal a ∊ first(𝛼)
    add n → 𝛼 to T[n , a]
  if 𝜀 ∊ first(𝛼) then
    for each b ∊ follow(n)
      add n → 𝛼 to T[n , b]
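
The table-filling loop can be sketched in C for the running grammar as follows. To keep the example short, the first and follow sets are hard-coded from the answers to Exercises 7 and 8 (an assumption of this sketch); E stands for e' and T for t'. The names build_table and cell are hypothetical, and a cell stores the index of the chosen production in G, or -1 when it is empty.

```c
#include <string.h>

/* E stands for e', T for t'; an empty right-hand side is the ε-production. */
struct prod { char lhs; const char *rhs; };
static const struct prod G[] = {
    {'e', "tE"}, {'E', "+tE"}, {'E', ""},     /* productions 0, 1, 2 */
    {'t', "fT"}, {'T', "*fT"}, {'T', ""},     /* productions 3, 4, 5 */
    {'f', "(e)"}, {'f', "x"}, {'f', "y"},     /* productions 6, 7, 8 */
};
enum { NPROD = 9, NROWS = 5, NCOLS = 7 };

static const char NONTERMS[] = "eEtTf";
static const char TERMS[] = "+*()xy$";        /* '$' is the end marker */
#define EPS (1u << 7)                          /* bit 7 marks ε */

static int row_of(char nt) { return (int)(strchr(NONTERMS, nt) - NONTERMS); }
static int col_of(char a)  { return (int)(strchr(TERMS, a) - TERMS); }
static unsigned bit(char a) { return 1u << col_of(a); }

/* first sets of single symbols, hard-coded from the Exercise 7 answer */
static unsigned first_of(char c) {
    switch (c) {
    case 'e': case 't': case 'f': return bit('(') | bit('x') | bit('y');
    case 'E': return bit('+') | EPS;
    case 'T': return bit('*') | EPS;
    default:  return bit(c);                   /* a terminal */
    }
}

/* follow sets, hard-coded from the Exercise 8 answer */
static unsigned follow_of(char nt) {
    switch (nt) {
    case 'e': case 'E': return bit('$') | bit(')');
    case 't': case 'T': return bit('+') | bit('$') | bit(')');
    default:            return bit('*') | bit('+') | bit(')') | bit('$');  /* f */
    }
}

/* first(α) for a sequence α of grammar symbols */
static unsigned first_seq(const char *s) {
    unsigned out = 0;
    for (; *s; s++) {
        unsigned f = first_of(*s);
        out |= f & ~EPS;
        if (!(f & EPS)) return out;
    }
    return out | EPS;                          /* every symbol can derive ε */
}

static int table[NROWS][NCOLS];

/* Fill the table; return 1 if no cell received two productions. */
int build_table(void) {
    int single = 1;
    for (int r = 0; r < NROWS; r++)
        for (int c = 0; c < NCOLS; c++) table[r][c] = -1;
    for (int p = 0; p < NPROD; p++) {
        unsigned f = first_seq(G[p].rhs);
        unsigned cols = f & ~EPS;
        if (f & EPS) cols |= follow_of(G[p].lhs);
        for (int c = 0; c < NCOLS; c++) {
            if (!(cols & (1u << c))) continue;
            int *slot = &table[row_of(G[p].lhs)][c];
            if (*slot != -1) single = 0;       /* conflict: two entries in one cell */
            *slot = p;
        }
    }
    return single;
}

/* Production index stored in T[nt, a], or -1 for an empty cell. */
int cell(char nt, char a) { return table[row_of(nt)][col_of(a)]; }
```

For this grammar build_table() reports that every cell holds at most one production, so a single symbol of look-ahead is enough to pick a production.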

Exercise 9

Construct a predictive parsing table for the following grammar.

e → t e'

e' → + t e'
   | 𝜀

t → f t'

t' → * f t'
   | 𝜀

f → ( e )
  | x
  | y

LL(1) grammars

If each cell in the parse table contains at most one entry then a non-backtracking parser can be constructed and the grammar is said to be LL(1).

First L: left-to-right scanning of the input.

Second L: a leftmost derivation is constructed.

The (1): using one input symbol of look-ahead to decide which grammar production to choose.

Exercise 10

Write a syntax checker for the grammar of Exercise 9, utilising the predictive parsing table.

int e() {
  ...
}

It should return a non-zero value if some prefix of the string pointed to by next conforms to the grammar, otherwise it should return zero.

Answer (part 1)

int e() {
  if (*next == 'x') return t() && e1();
  if (*next == 'y') return t() && e1();
  if (*next == '(') return t() && e1();
  return 0;
}

int e1() {
  if (*next == '+')
    return eat('+') && t() && e1();
  if (*next == ')') return 1;
  if (*next == '\0') return 1;
  return 0;
}

Answer (part 2)

int t() {
  if (*next == 'x') return f() && t1();
  if (*next == 'y') return f() && t1();
  if (*next == '(') return f() && t1();
  return 0;
}

int t1() {
  if (*next == '+') return 1;
  if (*next == '*')
    return eat('*') && f() && t1();
  if (*next == ')') return 1;
  if (*next == '\0') return 1;
  return 0;
}

Answer (part 3)

int f() {
  if (*next == 'x') return eat('x');
  if (*next == 'y') return eat('y');
  if (*next == '(')
    return eat('(') && e() && eat(')');
  return 0;
}

(Notice how backtracking is not required.)

Predictive parsing algorithm

Let s be a stack, initially containing the end marker $ with the start symbol of the grammar on top of it, and let next point to the input string.

while (top(s) != $)
  if (top(s) is a terminal) {
    if (top(s) == *next) { pop(s); next++; }
    else error();
  }
  else if (T[top(s), *next] == X → Y1 ⋯ Yn) {
    pop(s);
    push(s, Yn ⋯ Y1)    /* Y1 on top */
  }
  else error();         /* empty table cell */
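
A compact C rendering of this stack-based loop for the running grammar is sketched below. Several details are assumptions made for the illustration: the parse table is written out by hand as the function table (it is what the table-construction rules give for this grammar), E stands for e' and T for t', the end of the input is treated as the look-ahead symbol $, and the hypothetical function parse accepts only when the whole input has been consumed.

```c
#include <string.h>

/* Membership test that never matches the string terminator. */
static int in(const char *set, char a) {
    return a != '\0' && strchr(set, a) != NULL;
}

/* T[nt, a]: the right-hand side to push, "" for ε, or NULL for an empty cell.
   E stands for e' and T for t'. */
static const char *table(char nt, char a) {
    switch (nt) {
    case 'e': return in("xy(", a) ? "tE" : NULL;
    case 'E': if (a == '+') return "+tE";
              return in(")$", a) ? "" : NULL;
    case 't': return in("xy(", a) ? "fT" : NULL;
    case 'T': if (a == '*') return "*fT";
              return in("+)$", a) ? "" : NULL;
    case 'f': if (a == '(') return "(e)";
              if (a == 'x') return "x";
              if (a == 'y') return "y";
              return NULL;
    }
    return NULL;
}

/* Return 1 if the whole input conforms to the grammar, 0 otherwise. */
int parse(const char *input) {
    char stack[256];
    size_t top = 0;
    const char *next = input;

    stack[top++] = '$';                    /* end marker at the bottom */
    stack[top++] = 'e';                    /* start symbol on top */

    while (stack[top - 1] != '$') {
        char X = stack[top - 1];
        char a = *next ? *next : '$';      /* end of input acts as $ */

        if (!in("eEtTf", X)) {             /* X is a terminal */
            if (X != a) return 0;          /* error: mismatch */
            top--;
            next++;
        } else {                           /* X is a non-terminal */
            const char *rhs = table(X, a);
            size_t n;
            if (rhs == NULL) return 0;     /* error: empty table cell */
            top--;
            n = strlen(rhs);
            if (top + n > sizeof stack) return 0;   /* out of stack space */
            while (n > 0) stack[top++] = rhs[--n];  /* push Yn ... Y1, Y1 on top */
        }
    }
    return *next == '\0';                  /* accept only if all input was used */
}
```

For example, parse("x+x*y") succeeds, while parse("x+") fails because the cell for t on look-ahead $ is empty.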

Exercise 11

Give the steps that a predictive parser takes to parse the following input.

x + x * y

For each step (loop iteration), show the input stream, the stack, and the parser action.

Acknowledgements

Plus Stanford University lecture notes by Maggie Johnson and Julie Zelenski.

APPENDIX

Chomsky hierarchy

Grammar             Valid productions
Unrestricted        𝛼 → 𝛽
Context-Sensitive   𝛼 x γ → 𝛼 𝛽 γ
Context-Free        x → 𝛽
Regular             x → t,  x → t z,  x → 𝜀

Let t range over terminals, x and z over non-terminals, and 𝛼, 𝛽 and γ over sequences of terminals, non-terminals, and 𝜀.

Backus-Naur Form

BNF is a standard ASCII notation for specification of context-free grammars whose terminals are ASCII characters. For example:

<exp> ::= <exp> "+" <exp>
        | <exp> "-" <exp>
        | <var>

<var> ::= "x" | "y"

The BNF notation can itself be specified in BNF.
