Lexical and Syntax Analysis (of Programming Languages) Top-Down Parsing


[Figure: a pipeline from a string of characters (easy for humans to write and understand), to a string of tokens (lexemes identified), to a data structure (easy for programs to transform).]


PART 1: SYNTAX OF LANGUAGES

• Context-Free Grammars

• Derivations

• Parse Trees

• Ambiguity

• Precedence and Associativity


Syntax

The syntax is a set of rules defining valid strings of a language, often specified by a context-free grammar.

For example, a grammar E for arithmetic expressions:

    e → x
      | y
      | e + e
      | e - e
      | e * e
      | ( e )


Context-free grammars

Have four components:

1. A set of terminal symbols.

2. A set of non-terminal symbols.

3. A set of productions (or rules) of the form:

       n → X1 ⋯ Xn

   where n is a non-terminal and X1 ⋯ Xn is any sequence of terminals, non-terminals, and 𝜀.

4. The start symbol (one of the non-terminals).
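The four components can be written down directly as a data structure. The following is a minimal sketch in C, assuming every symbol is a single character; the type and variable names (`production`, `grammar`, `GRAMMAR_E`) are hypothetical and chosen only for this illustration.

```c
#include <stddef.h>

/* A production n -> X1 ... Xk: a left-hand non-terminal and a right-hand
   string of symbols (an empty string would stand for epsilon). */
struct production {
    char lhs;
    const char *rhs;
};

/* The four components of a context-free grammar. Single-character
   symbols are a simplification made for this sketch. */
struct grammar {
    const char *terminals;
    const char *nonterminals;
    const struct production *productions;
    size_t n_productions;
    char start;
};

/* Grammar E for arithmetic expressions. */
static const struct production E_PRODUCTIONS[] = {
    {'e', "x"},   {'e', "y"},   {'e', "e+e"},
    {'e', "e-e"}, {'e', "e*e"}, {'e', "(e)"},
};

static const struct grammar GRAMMAR_E = {
    "xy+-*()",                                      /* terminals     */
    "e",                                            /* non-terminals */
    E_PRODUCTIONS,                                  /* productions   */
    sizeof E_PRODUCTIONS / sizeof E_PRODUCTIONS[0], /* how many      */
    'e',                                            /* start symbol  */
};
```

Here the start symbol is stored as one of the non-terminals, matching component 4.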


Notation

Non-terminals are underlined.

Rather than writing

    e → x
    e → e + e

we may write:

    e → x
      | e + e

(Also, symbols → and ::= will be used interchangeably.)


Why context-free?

[Figure: nested language classes, from innermost to outermost: Regular, Context-Free, Context-Sensitive, Unrestricted.]

Nice balance between expressive power and efficiency of parsing.


Derivations

A derivation is a proof that some string conforms to a grammar.

For example:

    e ⇒ e + e
      ⇒ x + e
      ⇒ x + ( e )
      ⇒ x + ( e * e )
      ⇒ x + ( y * e )
      ⇒ x + ( y * x )
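Each step of the derivation above rewrites one occurrence of the non-terminal e. As a rough sketch in C, the steps can be replayed by string rewriting; the helper names `expand_leftmost` and `derive_example` are hypothetical, and the sketch assumes that 'e' is the only non-terminal and that spaces are omitted.

```c
#include <stdio.h>
#include <string.h>

/* Replace the leftmost non-terminal 'e' in `form` with `rhs`, in place.
   Returns 1 on success, 0 if there is no 'e' or the buffer is too small. */
int expand_leftmost(char *form, size_t cap, const char *rhs) {
    char *pos = strchr(form, 'e');
    if (pos == NULL) return 0;
    size_t tail = strlen(pos + 1);
    size_t rlen = strlen(rhs);
    if ((size_t)(pos - form) + rlen + tail + 1 > cap) return 0;
    memmove(pos + rlen, pos + 1, tail + 1); /* shift the tail, including '\0' */
    memcpy(pos, rhs, rlen);
    return 1;
}

/* Replay the derivation of x+(y*x), printing each sentential form. */
int derive_example(char *form, size_t cap) {
    /* Right-hand sides of the productions applied, in order. */
    static const char *steps[] = {"e+e", "x", "(e)", "e*e", "y", "x"};
    if (cap < 2) return 0;
    strcpy(form, "e");
    puts(form);
    for (size_t i = 0; i < sizeof steps / sizeof steps[0]; i++) {
        if (!expand_leftmost(form, cap, steps[i])) return 0;
        puts(form);
    }
    return strchr(form, 'e') == NULL; /* finished: no non-terminal remains */
}
```

Because the example always happens to expand the leftmost e, replaying it with `expand_leftmost` ends with the string x+(y*x).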


Derivations

Leftmost derivation: always expand the leftmost non-terminal when applying the grammar rules.

Rightmost derivation: always expand the rightmost non-terminal, e.g.

    e ⇒ e + e
      ⇒ e + ( e )
      ⇒ e + ( x )
      ⇒ x + ( x )


Parse tree: motivation

Like a derivation: a proof that a given input is valid according to the grammar. But a parse tree:

• is more concise: we don’t write out the sentence every time a non-terminal is expanded.

• abstracts over the order in which rules are applied.


Parse tree: intuition

If non-terminal n has a production

    n → X Y Z

where X, Y, and Z are terminals or non-terminals, then a parse tree may have an interior node labelled n with three children labelled X, Y, and Z.

        n
      / | \
     X  Y  Z


Parse tree: definition

A parse tree is a tree in which:

• the root is labelled by the start symbol;

• each leaf is labelled by a terminal symbol, or 𝜀;

• each interior node is labelled by a non-terminal;

• if n is a non-terminal labelling an interior node whose children are X1, X2, ⋯, Xn then there must exist a production n → X1 X2 ⋯ Xn.


Example 1

Example input string:

    x + y * x

Resulting parse tree according to grammar E:

            e
          / | \
         e  +  e
         |   / | \
         x  e  *  e
            |     |
            y     x
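A parse tree of this kind maps naturally onto a recursive data structure. The sketch below, in C, assumes single-character labels and at most three children per node (enough for grammar E); the names `node`, `yield` and `ROOT` are hypothetical. Reading the leaves from left to right recovers the input string.

```c
#include <stddef.h>

/* A parse-tree node: interior nodes carry a non-terminal and children,
   leaves carry a terminal and have n_children == 0. */
struct node {
    char label;
    size_t n_children;
    const struct node *children[3]; /* grammar E needs at most three */
};

/* Write the leaves of `t`, left to right, into `out`.
   Returns a pointer just past the last character written. */
char *yield(const struct node *t, char *out) {
    if (t->n_children == 0) {
        *out++ = t->label;
        return out;
    }
    for (size_t i = 0; i < t->n_children; i++)
        out = yield(t->children[i], out);
    return out;
}

/* Parse tree for x + y * x with the product nested under the sum. */
static const struct node X1 = {'x', 0, {0}};
static const struct node Y = {'y', 0, {0}};
static const struct node X2 = {'x', 0, {0}};
static const struct node PLUS = {'+', 0, {0}};
static const struct node STAR = {'*', 0, {0}};
static const struct node EX1 = {'e', 1, {&X1}};
static const struct node EY = {'e', 1, {&Y}};
static const struct node EX2 = {'e', 1, {&X2}};
static const struct node PROD = {'e', 3, {&EY, &STAR, &EX2}};
static const struct node ROOT = {'e', 3, {&EX1, &PLUS, &PROD}};
```

Every interior node here corresponds to a production of grammar E (e → x, e → y, e → e * e, e → e + e), which is exactly the condition in the definition of a parse tree.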


Example 2

The following is not a parse tree according to grammar E.

            e
          / | \
         x  +  e
             / | \
            e  *  e
            |     |
            y     x

Why? Because e → x + e is not a production in grammar E.


Syntax Analysis

String of symbols → Parse tree

A parse tree is:

1. A proof that a given input is valid according to the grammar;

2. A structure-rich representation of the input that can be stored in a data structure that is convenient to process.

(Syntax analysis may also report that the input string is invalid.)


Ambiguity

If there exists more than one parse tree for some string then the grammar is ambiguous. For example, under grammar E the string x+y*x has two parse trees:

            e                        e
          / | \                    / | \
         e  +  e                  e  *  e
         |   / | \              / | \   |
         x  e  *  e            e  +  e  x
            |     |            |     |
            y     x            x     y


Operator precedence

Different parse trees often have different meanings, so we usually want unambiguous grammars.

Conventionally, * has a higher precedence (binds tighter) than +, so there is only one interpretation of x+y*x, namely x+(y*x).


Operator associativity

Even with operator precedence rules, ambiguity remains, e.g. x-x-x.

Binary operators are either:

• left-associative;

• right-associative;

• non-associative.

Conventionally, - is left-associative, so there is only one interpretation of x-x-x, namely (x-x)-x.
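To see why the choice of parse tree matters, the competing readings of x+y*x and x-x-x can be written out as explicitly parenthesised C expressions. This is only an illustrative sketch; the function names are hypothetical.

```c
/* x+y*x: the two parse trees, written as the expressions they denote. */
int times_binds_tighter(int x, int y) { return x + (y * x); }
int plus_binds_tighter(int x, int y)  { return (x + y) * x; }

/* x-x-x: grouping to the left versus grouping to the right. */
int minus_left_assoc(int x)  { return (x - x) - x; }
int minus_right_assoc(int x) { return x - (x - x); }
```

With x = 2 and y = 3 the two readings of x+y*x give 8 and 10, and with x = 5 the two readings of x-x-x give -5 and 5, so the parse tree chosen changes the meaning.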


Exercise 1

Recall grammar E:

    e → x
      | y
      | e + e
      | e - e
      | e * e
      | ( e )

Give an unambiguous grammar for expressions, using these rules of associativity and precedence.

Let all operators be left associative, and let * bind tighter than + and -.


Answer: step-by-step

Given a non-terminal e which involves operators at n levels of precedence:

Step 1: introduce n+1 new non-terminals, e_0 ⋯ e_n.


Let op denote an operator with precedence i.

Step 2: replace each production

    e → e op e

with

    e_i → e_i op e_(i+1)
        | e_(i+1)

if op is left-associative, or

    e_i → e_(i+1) op e_i
        | e_(i+1)

if op is right-associative.


Construct the precedence table:

    Operator    Precedence
    +, -        0
    *           1

Grammar E after step 2 becomes:

    e0 → e0 + e1
       | e0 - e1
       | e1

    e1 → e1 * e2
       | e2

    e  → ( e )
       | x
       | y


Step 3: replace each production

    e → ⋯

with

    e_n → ⋯

After step 3:

    e0 → e0 + e1
       | e0 - e1
       | e1

    e1 → e1 * e2
       | e2

    e2 → ( e )
       | x
       | y


Step 4: replace all occurrences of e0 with e.

After step 4:

    e  → e + e1
       | e - e1
       | e1

    e1 → e1 * e2
       | e2

    e2 → ( e )
       | x
       | y
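One way to check that the grammar after step 4 encodes the intended precedence and associativity is to evaluate expressions by following its structure. The sketch below is in C, with one function per non-terminal. As an assumption made for this example, the left-recursive rules are written as loops, since a function for e → e + e1 that called itself first would recurse without consuming input. The names `eval`, `vx` and `vy` are hypothetical, and the input is assumed to contain no spaces.

```c
static const char *p; /* cursor into the input */
static int vx, vy;    /* values bound to x and y */
static int ok;        /* cleared on a syntax error */

static int e(void);

/* e2 -> ( e ) | x | y */
static int e2(void) {
    if (*p == '(') {
        p++;
        int v = e();
        if (*p == ')') p++; else ok = 0;
        return v;
    }
    if (*p == 'x') { p++; return vx; }
    if (*p == 'y') { p++; return vy; }
    ok = 0;
    return 0;
}

/* e1 -> e1 * e2 | e2, with the left recursion unrolled into a loop */
static int e1(void) {
    int v = e2();
    while (ok && *p == '*') { p++; v *= e2(); }
    return v;
}

/* e -> e + e1 | e - e1 | e1, again as a loop: operands fold left to right */
static int e(void) {
    int v = e1();
    while (ok && (*p == '+' || *p == '-')) {
        char op = *p++;
        int rhs = e1();
        v = (op == '+') ? v + rhs : v - rhs;
    }
    return v;
}

/* Evaluate `s` with the given x and y.
   Returns 1 and stores the value in *out on success, 0 on a syntax error. */
int eval(const char *s, int x, int y, int *out) {
    p = s; vx = x; vy = y; ok = 1;
    int v = e();
    if (!ok || *p != '\0') return 0;
    *out = v;
    return 1;
}
```

With x = 2 and y = 3, evaluating "x+y*x" yields 8, the value of x+(y*x); with x = 5, evaluating "x-x-x" yields -5, the value of (x-x)-x.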


Exercise 2

Consider the following ambiguous grammar for logical propositions.

    p → 0        (Zero)
      | 1        (One)
      | ~ p      (Negation)
      | p + p    (Disjunction)
      | p * p    (Conjunction)

Now let + and * be right associative and the operators in increasing order of binding strength be: +, *, ~.

Give an unambiguous grammar for logical propositions.


Exercise 3

Which of the following grammars are ambiguous?

    s → if b then s
      | if b then s else s
      | skip

    e → + e e
      | - e e
      | x

    b → 0 b 1
      | 0 1


Summary of Part 1

• Syntax of a language is often specified by a context-free grammar.

• Derivations and parse trees are proofs that a string is accepted by a grammar.

• Construction of unambiguous grammars using rules of precedence and associativity.


PART 2: TOP-DOWN PARSING

• Recursive-Descent

• Backtracking

• Left-Factoring

• Predictive Parsing

• Left-Recursion Removal

• First and Follow Sets

• Parsing tables and LL(1)


Top-down parsing

Top-down: begin with the start symbol and expand non-terminals, succeeding when the input string is matched.

A good strategy for writing parsers:

1. Implement a syntax checker to accept or refute input strings.

2. Modify the checker to construct a parse tree (straightforward).


RECURSIVE DESCENT

A popular top-down parsing technique.


Recursive descent

A recursive descent parser consists of a set of functions, one for each non-terminal.

The function for non-terminal n returns true if some prefix of the input string can be derived from n, and false otherwise.


Consuming the input

We assume a global variable next points to the input string.

    char* next;

Consume c from input if possible.

    int eat(char c) {
      if (*next == c) {
        next++;
        return 1;
      }
      return 0;
    }


Recursive descent

For each non-terminal N, introduce:

    int N() {
      char* save = next;

      for each N → X1 X2 ⋯ Xn
        if (parser(X1) && parser(X2) && ⋯ && parser(Xn)) return 1;
        else next = save;    /* backtrack */

      return 0;
    }

Let parser(X) denote

    X()     if X is a non-terminal
    eat(X)  if X is a terminal


Exercise 4

Consider the following grammar G with start symbol e.

    e → ( e + e )
      | ( e * e )
      | v

    v → x
      | y

Using recursive descent, write a syntax checker for grammar G.


Answer (part 1)

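    int v();    /* defined in part 2; declared here so e() can call it */

    int e() {
      char* save = next;

      /* e → ( e + e ) */
      if (eat('(') && e() && eat('+') && e() && eat(')')) return 1;
      else next = save;

      /* e → ( e * e ) */
      if (eat('(') && e() && eat('*') && e() && eat(')')) return 1;
      else next = save;

      /* e → v */
      if (v()) return 1;
      else next = save;

      return 0;
    }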


Answer (part 2)

    int v() {
      char* save = next;

      if (eat('x')) return 1;
      else next = save;

      if (eat('y')) return 1;
      else next = save;

      return 0;
    }


Exercise 5

How many function calls are made by the recursive descent parser to parse the following strings?

(x*x)

((x*x)*x)

(((x*x)*x)*x)

(See animation of backtracking.)


Answer

    Input string     Length   Calls
    (x*x)            5        21
    ((x*x)*x)        9        53
    (((x*x)*x)*x)    13       117

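The number of calls more than doubles with each extra level of nesting (53 = 2·21 + 11, 117 = 2·53 + 11), while the length grows by only 4. So for these inputs the number of calls is exponential in the length of the input string.

Lesson: backtracking is expensive!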
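The counts in the table can be reproduced with a small sketch in C, assuming that every call to e(), v() and eat() is counted. The counter `calls` and the helper `count_calls` are hypothetical names added for this illustration; the parser itself follows the answer to Exercise 4.

```c
static const char *next; /* cursor into the input */
static long calls;       /* counts every call to e(), v() and eat() */

static int eat(char c) {
    calls++;
    if (*next == c) { next++; return 1; }
    return 0;
}

/* v -> x | y */
static int v(void) {
    const char *save = next;
    calls++;
    if (eat('x')) return 1; else next = save;
    if (eat('y')) return 1; else next = save;
    return 0;
}

/* e -> ( e + e ) | ( e * e ) | v, trying the alternatives in order */
static int e(void) {
    const char *save = next;
    calls++;
    if (eat('(') && e() && eat('+') && e() && eat(')')) return 1;
    else next = save; /* backtrack */
    if (eat('(') && e() && eat('*') && e() && eat(')')) return 1;
    else next = save; /* backtrack */
    if (v()) return 1; else next = save;
    return 0;
}

/* Parse `s` from the start symbol and return the number of calls made. */
long count_calls(const char *s) {
    next = s;
    calls = 0;
    e();
    return calls;
}
```

Under that counting convention, `count_calls("(x*x)")` returns 21, and the other two inputs give 53 and 117. The inner expression is parsed twice, once for the failed + alternative and once for the * alternative, which is where the repeated work comes from.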


LEFT FACTORING

Reducing backtracking!


Left factoring

When two productions for a non-terminal share a common prefix, expensive backtracking can be avoided by left-factoring the grammar.

Idea: Introduce a new non-terminal that accepts each of the different suffixes.


Example 3

Left-factoring grammar G by introducing non-terminal r:

    e → ( e r        ("( e" is the common prefix)
      | v

    r → + e )        (the different suffixes)
      | * e )

    v → x
      | y
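A recursive descent checker for the left-factored grammar might look as follows. This is a sketch in C rather than a definitive implementation: the wrapper `accepts` is a hypothetical helper, and v() is written without a saved position on the assumption that eat() never consumes input when it fails.

```c
static const char *next; /* cursor into the input */

static int eat(char c) {
    if (*next == c) { next++; return 1; }
    return 0;
}

static int e(void);

/* v -> x | y */
static int v(void) { return eat('x') || eat('y'); }

/* r -> + e ) | * e ) */
static int r(void) {
    const char *save = next;
    if (eat('+') && e() && eat(')')) return 1;
    next = save;
    if (eat('*') && e() && eat(')')) return 1;
    next = save;
    return 0;
}

/* e -> ( e r | v */
static int e(void) {
    const char *save = next;
    if (eat('(') && e() && r()) return 1;
    next = save;
    if (v()) return 1;
    next = save;
    return 0;
}

/* Accept `s` only if e derives the whole string, not just a prefix. */
int accepts(const char *s) {
    next = s;
    return e() && *next == '\0';
}
```

The inner expression after "(" is now parsed once, and only the single operator character decides which suffix in r applies. For example, `accepts("((x*x)*x)")` returns 1, while `accepts("(x*x")` returns 0.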


Exercise 6

How many function calls are made by the recursive descent parser (after left-factoring) to parse the following strings?

(x*x)

((x*x)*x)

(((x*x)*x)*x)

Answer

Input string     Length   Calls
(x*x)            5        13
((x*x)*x)        9        22
(((x*x)*x)*x)    13       31

Number of calls is now linear in the length of the input string.

Lesson: left-factoring a grammar reduces backtracking.
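The linear growth can be checked with a sketch of a backtracking parser for the left-factored grammar e → ( e r | v, r → + e ) | * e ), v → x | y. It counts every call to e, r, v and eat; the names `calls` and `count_calls` are illustrative assumptions, and the exact totals depend on how the parser is written and on what is counted, so they need not match the table.

```c
static const char *next;   /* current position in the input */
static long calls;         /* calls to e, r, v and eat so far */

static int eat(char c) {
    calls++;
    if (*next == c) { next++; return 1; }
    return 0;
}

static int e(void);

/* v -> x | y */
static int v(void) {
    calls++;
    const char *save = next;
    if (eat('x')) return 1;
    next = save;
    if (eat('y')) return 1;
    next = save;
    return 0;
}

/* r -> + e ) | * e ) */
static int r(void) {
    calls++;
    const char *save = next;
    if (eat('+') && e() && eat(')')) return 1;
    next = save;
    if (eat('*') && e() && eat(')')) return 1;
    next = save;
    return 0;
}

/* e -> ( e r | v : the shared prefix "( e" is now parsed only once */
static int e(void) {
    calls++;
    const char *save = next;
    if (eat('(') && e() && r()) return 1;
    next = save;
    if (v()) return 1;
    next = save;
    return 0;
}

/* Parse input from the start and report how many calls were made. */
long count_calls(const char *input) {
    next = input;
    calls = 0;
    (void)e();
    return calls;
}
```

In this sketch each extra level of nesting adds the same number of calls, because the only backtracking left is the choice between '+' and '*' inside r, which costs a single failed eat.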

PREDICTIVE PARSING

Eliminating backtracking!

Predictive parsing

Idea: know which production of a non-terminal to choose based solely on the next input symbol.

Advantage: very efficient since it eliminates all backtracking.

Disadvantage: not all grammars can be parsed in this way. (But many useful ones can.)
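As a minimal sketch of the idea, the left-factored grammar e → ( e r | v, r → + e ) | * e ), v → x | y can be parsed by looking only at `*next` before committing to a production. No position is ever saved or restored. The wrapper name `accepts` is an illustrative assumption.

```c
static const char *next;   /* current position in the input */

static int eat(char c) {
    if (*next == c) { next++; return 1; }
    return 0;
}

static int e(void);

/* v -> x | y : the next symbol says which alternative applies */
static int v(void) {
    switch (*next) {
    case 'x': return eat('x');
    case 'y': return eat('y');
    default:  return 0;
    }
}

/* r -> + e ) | * e ) */
static int r(void) {
    switch (*next) {
    case '+': return eat('+') && e() && eat(')');
    case '*': return eat('*') && e() && eat(')');
    default:  return 0;
    }
}

/* e -> ( e r | v : '(' selects the first production, anything else the second */
static int e(void) {
    if (*next == '(') return eat('(') && e() && r();
    return v();
}

/* Non-zero if the whole input conforms to the grammar. */
int accepts(const char *input) {
    next = input;
    return e() && *next == '\0';
}
```

This works here because the alternatives of each non-terminal start with different symbols; when a failure occurs, it is a genuine syntax error and there is nothing to retry.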

Running example

The following grammar H will be used as a running example to demonstrate predictive parsing.

e → e + e
  | e * e
  | ( e )
  | x
  | y

Example string: x+y*(y+x)

Removing ambiguity

Since + and * are left-associative and * binds tighter than +, we can derive an unambiguous variant of H.

e → e + t
  | t

t → t * f
  | f

f → ( e )
  | x
  | y
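To see how the layered grammar forces * to bind tighter than +, here is a leftmost derivation of the example string x+y*(y+x), worked out as an illustration. At each step the leftmost non-terminal is expanded.

```latex
\begin{align*}
e &\Rightarrow e + t \Rightarrow t + t \Rightarrow f + t \Rightarrow x + t \\
  &\Rightarrow x + t * f \Rightarrow x + f * f \Rightarrow x + y * f \\
  &\Rightarrow x + y * ( e ) \Rightarrow x + y * ( e + t ) \Rightarrow x + y * ( t + t ) \\
  &\Rightarrow x + y * ( f + t ) \Rightarrow x + y * ( y + t ) \Rightarrow x + y * ( y + f ) \\
  &\Rightarrow x + y * ( y + x )
\end{align*}
```

The product y*(y+x) is derived from a single t on the right of the +, so it can only be grouped as one operand of the addition.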

Left recursion

Problem: left-recursive grammars cause recursive descent parsers to loop forever (in practice, to recurse until the stack overflows).

int e() {
    char* save = next;

    /* Call to self without consuming any input */
    if (e() && eat('+') && t()) return 1;
    next = save;

    if (t()) return 1;
    next = save;

    return 0;
}

Eliminating left recursion

Let 𝛼 denote any sequence of grammar symbols.

Rule 1:  n → 𝛼     ⟹   n → 𝛼 n'     (where 𝛼 does not begin with n)

Rule 2:  n → n 𝛼   ⟹   n' → 𝛼 n'

Rule 3:  introduce the new production n' → 𝜀
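As a worked illustration, the three rules can be applied to the productions e → e + t | t of the unambiguous grammar, one production at a time:

```latex
\begin{align*}
e \to t     &\;\Longrightarrow\; e \to t\, e'      && \text{(Rule 1 with } \alpha = t\text{)} \\
e \to e + t &\;\Longrightarrow\; e' \to +\, t\, e' && \text{(Rule 2 with } \alpha = +\, t\text{)} \\
            &\phantom{\;\Longrightarrow\;}\; e' \to \varepsilon && \text{(Rule 3)}
\end{align*}
```

Both versions generate a t followed by zero or more repetitions of + t; the new grammar simply produces the repetitions from left to right through e' instead of through a recursive call at the left end.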

Example 4

Running example, after eliminating left-recursion.

e  → t e'

e' → + t e'
   | 𝜀

t  → f t'

t' → * f t'
   | 𝜀

f  → ( e )
   | x
   | y

first and follow sets

Predictive parsers are built using the first and follow sets of each non-terminal in a grammar.

The first set of a non-terminal n is the set of symbols that can begin a string derived from n.

The follow set of a non-terminal n is the set of symbols that can immediately follow n in any step of a derivation.

Definition of first sets

Let 𝛼 denote any sequence of grammar symbols.

If 𝛼 can derive a string beginning with terminal a then a ∊ first(𝛼).

If 𝛼 can derive 𝜀 then 𝜀 ∊ first(𝛼).

Computing first sets

If a is a terminal then a ∊ first(a).

If there exists a production

n → X1 X2 ⋯ Xk

and a terminal a such that

∃i · a ∊ first(Xi)

and ∀j < i · 𝜀 ∊ first(Xj)

then a ∊ first(n).

If n → 𝜀, or n → X1 X2 ⋯ Xk with 𝜀 ∊ first(Xi) for every i, then 𝜀 ∊ first(n).
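These rules can be applied repeatedly until no first set changes. The sketch below does this for the running grammar after left-recursion elimination. The encoding is an assumption made for illustration: non-terminals are single letters (E = e, A = e', T = t, B = t', F = f), each production is a string whose first character is the left-hand side, an empty right-hand side stands for 𝜀, and sets are bitmasks.

```c
#include <string.h>

/* One bit per terminal, in the order of TERMS, plus a bit for epsilon. */
enum { PLUS = 1, STAR = 2, LPAR = 4, RPAR = 8, X = 16, Y = 32, EPS = 128 };
static const char *TERMS = "+*()xy";
static const char *NTS   = "EATBF";   /* E = e, A = e', T = t, B = t', F = f */
static const char *PRODS[] = {
    "ETA", "A+TA", "A", "TFB", "B*FB", "B", "F(E)", "Fx", "Fy"
};
enum { NPRODS = sizeof PRODS / sizeof PRODS[0] };

unsigned first_nt[5];   /* first set of each non-terminal, indexed as in NTS */

/* first set of a single grammar symbol */
static unsigned first_sym(char c) {
    const char *p = strchr(NTS, c);
    if (p) return first_nt[p - NTS];
    return 1u << (strchr(TERMS, c) - TERMS);   /* terminal: first(a) = {a} */
}

/* first set of a sequence X1 X2 ... Xk (the empty sequence yields {epsilon}) */
static unsigned first_seq(const char *s) {
    unsigned out = 0;
    for (; *s; s++) {
        unsigned f = first_sym(*s);
        out |= f & ~(unsigned)EPS;
        if (!(f & EPS)) return out;   /* Xi cannot vanish: stop here */
    }
    return out | EPS;                 /* every Xi can derive epsilon */
}

/* Iterate over all productions until nothing changes (a fixed point). */
void compute_first(void) {
    int changed = 1;
    memset(first_nt, 0, sizeof first_nt);
    while (changed) {
        changed = 0;
        for (int i = 0; i < NPRODS; i++) {
            int n = (int)(strchr(NTS, PRODS[i][0]) - NTS);
            unsigned f = first_nt[n] | first_seq(PRODS[i] + 1);
            if (f != first_nt[n]) { first_nt[n] = f; changed = 1; }
        }
    }
}
```

After compute_first(), first_nt[0] holds first(e) and first_nt[1] holds first(e'), and so on in the order of NTS. Iterating to a fixed point is needed because first(e) depends on first(t), which depends on first(f).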

Exercise 7

What are the first sets for each non-terminal in the following grammar?

e  → t e'

e' → + t e'
   | 𝜀

t  → f t'

t' → * f t'
   | 𝜀

f  → ( e )
   | x
   | y

Answer

first( f )  = { '(', 'x', 'y' }

first( t' ) = { '*', 𝜀 }

first( t )  = { '(', 'x', 'y' }

first( e' ) = { '+', 𝜀 }

first( e )  = { '(', 'x', 'y' }

Definition of follow sets

Let 𝛼 and 𝛽 denote any sequence of grammar symbols.

Terminal a ∊ follow(n) if the start symbol of the grammar can derive a string of grammar symbols in which a immediately follows n, i.e. a string of the form 𝛼 n a 𝛽.

The set follow(n) never contains 𝜀.

End markers

In predictive parsing, it is useful to mark the end of the input string with a $ symbol.

If the start symbol can derive a string of grammar symbols in which n is the rightmost symbol then $ is in follow(n).

Computing follow sets

If s is the start symbol of the grammar then $ ∊ follow(s).

If n → 𝛼 x 𝛽 then everything in first(𝛽) except 𝜀 is in follow(x).

If n → 𝛼 x, or n → 𝛼 x 𝛽 and 𝜀 ∊ first(𝛽), then everything in follow(n) is in follow(x).
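The three rules can again be iterated until nothing changes. The sketch below does so for the running grammar after left-recursion elimination. As an illustrative encoding (an assumption, not part of the slides), non-terminals are single letters (E = e, A = e', T = t, B = t', F = f), productions are strings whose first character is the left-hand side, sets are bitmasks, and the first sets of the non-terminals are written in by hand rather than computed.

```c
#include <string.h>

/* One bit per terminal, in the order of TERMS, plus a bit for epsilon. */
enum { PLUS = 1, STAR = 2, LPAR = 4, RPAR = 8, X = 16, Y = 32, END = 64, EPS = 128 };
static const char *TERMS = "+*()xy$";
static const char *NTS   = "EATBF";   /* E = e, A = e', T = t, B = t', F = f */
static const char *PRODS[] = {
    "ETA", "A+TA", "A", "TFB", "B*FB", "B", "F(E)", "Fx", "Fy"
};
enum { NPRODS = sizeof PRODS / sizeof PRODS[0] };

/* first sets of e, e', t, t', f, written in by hand */
static const unsigned FIRST[5] = {
    LPAR | X | Y, PLUS | EPS, LPAR | X | Y, STAR | EPS, LPAR | X | Y
};

unsigned follow_nt[5];   /* follow set of each non-terminal, indexed as in NTS */

/* first set of a sequence of symbols (the empty sequence yields {epsilon}) */
static unsigned first_seq(const char *s) {
    unsigned out = 0;
    for (; *s; s++) {
        const char *p = strchr(NTS, *s);
        unsigned f = p ? FIRST[p - NTS] : 1u << (strchr(TERMS, *s) - TERMS);
        out |= f & ~(unsigned)EPS;
        if (!(f & EPS)) return out;
    }
    return out | EPS;
}

void compute_follow(void) {
    int changed = 1;
    memset(follow_nt, 0, sizeof follow_nt);
    follow_nt[0] = END;                       /* rule 1: $ follows the start symbol */
    while (changed) {
        changed = 0;
        for (int i = 0; i < NPRODS; i++) {
            int n = (int)(strchr(NTS, PRODS[i][0]) - NTS);
            for (const char *p = PRODS[i] + 1; *p; p++) {
                const char *q = strchr(NTS, *p);
                if (!q) continue;             /* follow sets are only for non-terminals */
                unsigned beta = first_seq(p + 1);
                unsigned add = beta & ~(unsigned)EPS;       /* rule 2 */
                if (beta & EPS) add |= follow_nt[n];        /* rule 3 */
                if ((follow_nt[q - NTS] | add) != follow_nt[q - NTS]) {
                    follow_nt[q - NTS] |= add;
                    changed = 1;
                }
            }
        }
    }
}
```

Because first_seq returns {𝜀} for an empty 𝛽, the case n → 𝛼 x needs no special handling: it is covered by the same test as a 𝛽 that can derive 𝜀.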

Exercise 8

What are the follow sets for each non-terminal in the following grammar?

e  → t e'

e' → + t e'
   | 𝜀

t  → f t'

t' → * f t'
   | 𝜀

f  → ( e )
   | x
   | y

Answer

follow( e' ) = { $, ')' }

follow( e )  = { $, ')' }

follow( t' ) = { '+', $, ')' }

follow( t )  = { '+', $, ')' }

follow( f )  = { '*', '+', ')', $ }

Predictive parsing table

For each non-terminal n, a parse table T defines which production of n should be chosen, based on the next input symbol.

for each production n → 𝛼
    for each terminal a ∊ first(𝛼)
        add n → 𝛼 to T[n, a]
    if 𝜀 ∊ first(𝛼) then
        for each b ∊ follow(n)
            add n → 𝛼 to T[n, b]
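The construction can be sketched in C for the running grammar after left-recursion elimination. The encoding is an illustrative assumption: non-terminals are single letters (E = e, A = e', T = t, B = t', F = f), productions are strings whose first character is the left-hand side, sets are bitmasks, and the first and follow sets are written in by hand. A cell holds the index of a production in PRODS, or -1 if it is empty.

```c
#include <string.h>

/* One bit per terminal, in the order of TERMS, plus a bit for epsilon. */
enum { PLUS = 1, STAR = 2, LPAR = 4, RPAR = 8, X = 16, Y = 32, END = 64, EPS = 128 };
static const char *TERMS = "+*()xy$";
static const char *NTS   = "EATBF";   /* E = e, A = e', T = t, B = t', F = f */
static const char *PRODS[] = {
    "ETA", "A+TA", "A", "TFB", "B*FB", "B", "F(E)", "Fx", "Fy"
};
enum { NPRODS = sizeof PRODS / sizeof PRODS[0], NNTS = 5, NTERMS = 7 };

/* first and follow sets of e, e', t, t', f, written in by hand */
static const unsigned FIRST[NNTS] = {
    LPAR | X | Y, PLUS | EPS, LPAR | X | Y, STAR | EPS, LPAR | X | Y
};
static const unsigned FOLLOW[NNTS] = {
    END | RPAR, END | RPAR, PLUS | END | RPAR, PLUS | END | RPAR,
    STAR | PLUS | RPAR | END
};

int T[NNTS][NTERMS];   /* T[n][a]: production index, or -1 for an empty cell */

/* column of terminal c in the table */
int col(char c) { return (int)(strchr(TERMS, c) - TERMS); }

/* first set of a sequence of symbols (the empty sequence yields {epsilon}) */
static unsigned first_seq(const char *s) {
    unsigned out = 0;
    for (; *s; s++) {
        const char *p = strchr(NTS, *s);
        unsigned f = p ? FIRST[p - NTS] : 1u << col(*s);
        out |= f & ~(unsigned)EPS;
        if (!(f & EPS)) return out;
    }
    return out | EPS;
}

/* Fill T and return the number of cells that received a second entry. */
int build_table(void) {
    int conflicts = 0;
    for (int n = 0; n < NNTS; n++)
        for (int a = 0; a < NTERMS; a++) T[n][a] = -1;
    for (int i = 0; i < NPRODS; i++) {
        int n = (int)(strchr(NTS, PRODS[i][0]) - NTS);
        unsigned f = first_seq(PRODS[i] + 1);
        unsigned cols = f & ~(unsigned)EPS;     /* each a in first(alpha) */
        if (f & EPS) cols |= FOLLOW[n];         /* each b in follow(n) */
        for (int a = 0; a < NTERMS; a++) {
            if (!(cols & (1u << a))) continue;
            if (T[n][a] != -1) conflicts++;
            T[n][a] = i;
        }
    }
    return conflicts;
}
```

For example, after build_table(), T[1][col('+')] is 1 (e' → + t e') while T[1][col(')')] and T[1][col('$')] are 2 (e' → 𝜀). The return value counts cells that received more than one production.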

Exercise 9

Construct a predictive parsing table for the following grammar.

e  → t e'

e' → + t e'
   | 𝜀

t  → f t'

t' → * f t'
   | 𝜀

f  → ( e )
   | x
   | y

LL(1) grammars

If each cell in the parse table contains at most one entry then a non-backtracking parser can be constructed and the grammar is said to be LL(1).

First L: left-to-right scanning of the input.

Second L: a leftmost derivation is constructed.

The (1): using one input symbol of look-ahead to decide which grammar production to choose.

Exercise 10

Write a syntax checker for the grammar of Exercise 9, utilising the predictive parsing table.

int e() {
    ...
}

It should return a non-zero value if some prefix of the string pointed to by next conforms to the grammar, otherwise it should return zero.

Answer (part 1)

/* Forward declarations: the functions below call each other. */
int e(), e1(), t(), t1(), f();

int e() {
    if (*next == 'x') return t() && e1();
    if (*next == 'y') return t() && e1();
    if (*next == '(') return t() && e1();
    return 0;
}

int e1() {
    if (*next == '+')
        return eat('+') && t() && e1();
    if (*next == ')') return 1;
    if (*next == '\0') return 1;
    return 0;
}

Answer (part 2)

int t() {
    if (*next == 'x') return f() && t1();
    if (*next == 'y') return f() && t1();
    if (*next == '(') return f() && t1();
    return 0;
}

int t1() {
    if (*next == '+') return 1;
    if (*next == '*')
        return eat('*') && f() && t1();
    if (*next == ')') return 1;
    if (*next == '\0') return 1;
    return 0;
}

Answer (part 3)

int f() {
    if (*next == 'x') return eat('x');
    if (*next == 'y') return eat('y');
    if (*next == '(')
        return eat('(') && e() && eat(')');
    return 0;
}

(Notice how backtracking is not required.)
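As a design variation, and not the answer given above, the tail-recursive functions for e' and t' can be replaced by loops, since e' → + t e' | 𝜀 simply means "zero or more repetitions of + t". The sketch below is self-contained: its definitions of `next` and `eat`, and the wrapper `check`, are assumptions made for illustration. Unlike e1 and t1 above, the loops do not test the follow set themselves; an unexpected symbol is instead caught by the caller (for example by eat(')') or by the end-of-input test in check).

```c
static const char *next;   /* current position in the input */

static int eat(char c) {
    if (*next == c) { next++; return 1; }
    return 0;
}

static int e(void);

/* f -> ( e ) | x | y */
static int f(void) {
    if (*next == 'x' || *next == 'y') return eat(*next);
    if (*next == '(') return eat('(') && e() && eat(')');
    return 0;
}

/* t -> f t'  with  t' -> * f t' | epsilon  written as a loop */
static int t(void) {
    if (!f()) return 0;
    while (*next == '*') {
        next++;
        if (!f()) return 0;
    }
    return 1;
}

/* e -> t e'  with  e' -> + t e' | epsilon  written as a loop */
static int e(void) {
    if (!t()) return 0;
    while (*next == '+') {
        next++;
        if (!t()) return 0;
    }
    return 1;
}

/* Non-zero if the whole string conforms to the grammar. */
int check(const char *s) {
    next = s;
    return e() && *next == '\0';
}
```

Note that check demands that the entire string is consumed, which is slightly stricter than the "some prefix" requirement in the exercise.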

Predictive parsing algorithm

Let s be a stack, initially containing the start symbol of the grammar on top of the end marker $, and let next point to the input string, which ends with $.

while (top(s) != $) {
    if (top(s) is a terminal) {
        if (top(s) == *next) { pop(s); next++; }
        else error();
    }
    else if (T[top(s), *next] == X → Y1 ⋯ Yn) {
        pop(s);
        push(s, Yn ⋯ Y1);   /* Y1 on top */
    }
    else error();   /* empty table cell */
}
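A minimal C sketch of this algorithm for the running grammar follows. Several details are assumptions made for illustration: non-terminals are encoded as single letters (E = e, A = e', T = t, B = t', F = f), the parse table is hard-coded in the function `table` from the first and follow sets given in the earlier answers, the input contains no spaces, and `parse` appends the $ end marker itself.

```c
#include <string.h>

/* Non-zero if c occurs in set (the terminating '\0' never counts). */
static int in(const char *set, char c) {
    return c != '\0' && strchr(set, c) != NULL;
}

/* T[n, a]: the right-hand side to expand n with, "" for epsilon,
   or NULL for an empty cell (syntax error). */
static const char *table(char n, char a) {
    switch (n) {
    case 'E': return in("(xy", a) ? "TA" : NULL;
    case 'A': return a == '+' ? "+TA" : in(")$", a) ? "" : NULL;
    case 'T': return in("(xy", a) ? "FB" : NULL;
    case 'B': return a == '*' ? "*FB" : in("+)$", a) ? "" : NULL;
    case 'F': return a == '(' ? "(E)" : a == 'x' ? "x" : a == 'y' ? "y" : NULL;
    default:  return NULL;
    }
}

/* Non-zero if the whole input conforms to the grammar. */
int parse(const char *input) {
    char buf[256], stack[1024];
    size_t len = strlen(input), sp = 0;
    if (len + 2 > sizeof buf) return 0;      /* input too long for this sketch */
    memcpy(buf, input, len);
    buf[len] = '$';                          /* append the end marker */
    buf[len + 1] = '\0';
    const char *next = buf;

    stack[sp++] = '$';                       /* end marker at the bottom */
    stack[sp++] = 'E';                       /* start symbol on top */
    while (stack[sp - 1] != '$') {
        char top = stack[sp - 1];
        if (!in("EATBF", top)) {             /* terminal: must match the input */
            if (top != *next) return 0;
            sp--;
            next++;
        } else {                             /* non-terminal: consult the table */
            const char *rhs = table(top, *next);
            if (rhs == NULL) return 0;
            size_t n = strlen(rhs);
            sp--;
            if (sp + n > sizeof stack) return 0;
            while (n > 0) stack[sp++] = rhs[--n];   /* push reversed: Y1 on top */
        }
    }
    return next == buf + len;                /* all input consumed up to the marker */
}
```

For instance, parse("x+x*y") succeeds: the stack goes from $E to $AT to $ABF to $ABx, the x is matched, and so on until only $ remains with next at the end marker.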

Exercise 11

Give the steps that a predictive parser takes to parse the following input.

x + x * y

For each step (loop iteration), show the input stream, the stack, and the parser action.

Acknowledgements

Plus Stanford University lecture notes by Maggie Johnson and Julie Zelenski.

APPENDIX

Chomsky hierarchy

Grammar             Valid productions
Unrestricted        𝛼 → 𝛽
Context-Sensitive   𝛼 x γ → 𝛼 𝛽 γ
Context-Free        x → 𝛽
Regular             x → t,  x → t z,  x → 𝜀

Let t range over terminals, x and z over non-terminals, and 𝛼, 𝛽 and γ over sequences of terminals, non-terminals, and 𝜀.

Backus-Naur Form

BNF is a standard ASCII notation for specification of context-free grammars whose terminals are ASCII characters. For example:

<exp> ::= <exp> "+" <exp>
        | <exp> "-" <exp>
        | <var>

<var> ::= "x" | "y"

The BNF notation can itself be specified in BNF.