The meaning of it all! The role of finite automata and grammars in compiler design


Compiler-Compilers!

There are many applications of automata and grammars inside and outside computer science; the main applications in computer science are in the area of compiler design.

Suppose we have designed (on paper) a new programming language with nice features. We have worked out their syntax and the way they should work. Now, all we need is a compiler for this language.

It’s too complex to write a compiler from scratch!

What we could do is to make use of theoretical tools like automata and grammars that recognize/generate strings of symbols of various kinds* and formally specify the syntax of the new computer programming language. Such a formal specification (plus other details) can be used by “magical programs” known as compiler-compilers to automatically generate a compiler for the programming language!

Without these theoretical tools, we would have to spell out the syntax of the language in, say, plain English, which is not precise enough: a program would find it hard to “understand” such a description well enough to generate a compiler on its own.

* A program can be viewed as a (very long!) string that adheres to certain rules dictated by the programming language.


Admiral Grace Hopper, Pioneer of compiler design


Lexical Analysis

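[Figure: the Lexical Analyzer converts a raw stream of characters into a stream of tokens and records entries in the Symbol Table.]

Raw stream of characters: for(i=0;i<=10;i++)

Stream of tokens: for ( i = 0 ; i <= 10 ; i ++ )

Token classes shown: keyword (for), identifier (i), constant (0, 10)

Symbol Table entries shown: i, 10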
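The step from raw characters to tokens can be sketched in a few lines of Python. This is only an illustration, not the analyzer from the slide: the names `TOKEN_SPEC`, `KEYWORDS` and `tokenize` are made up for the example, the keyword list is an assumption, and the symbol table is reduced to a plain list of the identifiers and constants seen.

```python
import re

KEYWORDS = {"for", "while", "if", "else"}   # assumed keyword list

# Longer operators ("<=", "++") are listed before their one-character prefixes.
TOKEN_SPEC = [
    ("constant", r"\d+"),
    ("word", r"[A-Za-z][A-Za-z0-9]*"),
    ("operator", r"<=|\+\+|[=<+]"),
    ("punct", r"[();]"),
    ("skip", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Return (tokens, symbol_table) for a raw stream of characters."""
    tokens, symbol_table = [], []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        kind, lexeme = m.lastgroup, m.group()
        pos = m.end()
        if kind == "skip":
            continue
        if kind == "word":
            kind = "keyword" if lexeme in KEYWORDS else "identifier"
        # Identifiers and constants are entered once in the symbol table.
        if kind in ("identifier", "constant") and lexeme not in symbol_table:
            symbol_table.append(lexeme)
        tokens.append((kind, lexeme))
    return tokens, symbol_table

tokens, symbol_table = tokenize("for(i=0;i<=10;i++)")
```

Here `tokens` starts with `("keyword", "for")` and `symbol_table` holds each identifier and constant once, in order of first appearance.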


Parsing

‘Parse’: to relate

[Figure: parse tree for the input for ( i = 0 ; i <= 10 ; i ++ ).]

Statement
    FOR-statement: for ( exp ; exp ; exp ) statement
        first exp: assignment (Assign_stmt) of the form id = exp, with id i and exp const 0
        second exp: id i <= const 10
        third exp: id i ++


Finite state automata as lexical analysers

Automaton for recognizing keywords

[Figure: one chain of states per keyword (WHILE, FOR, IF, ELSE), with states numbered 0 to 18. Each arc within a chain is labelled with the next letter of the keyword, and the last arc of every chain is labelled “other than letter/digit”.]

Automaton for recognizing identifiers

[Figure: three states, 0, 1 and 2.]

0 → 1 on letter
1 → 1 on letter, digit
1 → 2 on other than letter / digit
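As a rough sketch, a keyword automaton can be stored as a transition table. The code below is an assumption-laden illustration: `build_keyword_automaton` and `match_keyword` are hypothetical names, the state numbers it generates need not match the diagram, all chains start from a single state 0, and end of input is treated as “other than letter/digit”.

```python
def build_keyword_automaton(keywords):
    """Return (transitions, accepting) with one chain of states per keyword."""
    transitions = {}    # (state, character) -> next state
    accepting = {}      # state reached after the last letter -> keyword
    next_state = 1      # state 0 is the common start state
    for word in keywords:
        state = 0
        for ch in word:
            if (state, ch) not in transitions:
                transitions[(state, ch)] = next_state
                next_state += 1
            state = transitions[(state, ch)]
        accepting[state] = word
    return transitions, accepting

def match_keyword(text, transitions, accepting):
    """Return the keyword at the start of text, or None."""
    state, i = 0, 0
    while i < len(text) and (state, text[i]) in transitions:
        state = transitions[(state, text[i])]
        i += 1
    # Final arc: the keyword must be followed by something other than a letter or digit.
    followed_by_other = i == len(text) or not text[i].isalnum()
    if state in accepting and followed_by_other:
        return accepting[state]
    return None

TRANSITIONS, ACCEPTING = build_keyword_automaton(["WHILE", "FOR", "IF", "ELSE"])
```

With this table, `match_keyword("FOR(", TRANSITIONS, ACCEPTING)` recognizes FOR, while `"FORK"` is rejected because the letter K follows the keyword.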


Converting a Finite state automaton into a computer program

Automaton for recognizing identifiers

[Figure: three states, A, B and C.]

A → B on letter
B → B on letter, digit
B → C on other than letter / digit

A: Read next_char
   If next_char is a letter goto B
   else FAIL( )

B: Read next_char
   If next_char is either a letter or a digit goto B
   else goto C

Note: Instead of using “A” and “B” as labels for GOTO statements, one could use them as names of individual functions/procedures that can be invoked.

FAIL( ) is a function that “puts back” the character just read and starts up the next transition diagram.
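The note about using functions instead of GOTO labels might look as follows in Python. This is a sketch under stated assumptions: the `Scanner` class and the names `state_a`, `state_b` and `state_c` are invented for the example, FAIL( ) is reduced to putting the character back and returning `None` (it does not start another transition diagram), and state C putting the delimiter back is an assumption not spelled out on the slide.

```python
class Scanner:
    """Minimal character source with a one-step 'put back'."""
    def __init__(self, text):
        self.text = text
        self.pos = 0

    def read(self):
        """Return the next character ('' at end of input) and advance."""
        ch = self.text[self.pos] if self.pos < len(self.text) else ""
        self.pos += 1
        return ch

    def put_back(self):
        self.pos -= 1

def state_a(sc):
    """State A: the first character must be a letter."""
    ch = sc.read()
    if ch.isalpha():
        return state_b(sc, ch)
    sc.put_back()           # FAIL( ): put back the character just read
    return None

def state_b(sc, lexeme):
    """State B: loop on letters and digits; anything else leads to state C."""
    while True:
        ch = sc.read()
        if ch.isalpha() or ch.isdigit():
            lexeme += ch
        else:
            return state_c(sc, lexeme)

def state_c(sc, lexeme):
    """State C: accept; the delimiter is not part of the identifier."""
    sc.put_back()
    return lexeme

scanner = Scanner("count1+2")
identifier = state_a(scanner)
```

After the call, `identifier` is the lexeme read so far and `scanner.pos` points at the delimiter, ready for the next token.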


Grammars as syntax specification tools

Finite state automata are used to describe tokens. Grammars are much more “expressive” than finite state automata and can be used to describe more complicated syntactic structures in a program, for instance the syntax of a FOR statement in the C language.

Grammars only describe/generate strings. We need a process which, given an input string (a statement in a program, say), pronounces whether it is derivable from a given grammar or not. Such a process is known as parsing.


Types of Parsing

Grammar:
S → aAcBe
A → Ab | b
B → d

Input: a b b c d e

[Figure: parse tree for the input, with S at the root and a A c B e as its children.]

(i) Top down: “expanding” the start symbol down to the input string

We EXPAND the start symbol (according to the production rules of the given grammar), and subsequently every non-terminal symbol that occurs in the “expansion”*, till we arrive at the input string. (* Technically, it is called a sentential form.)

(ii) Bottom up: reducing the input string to the start symbol

We take a “chunk” of the input string and REDUCE it to (replace it with) the symbol on the LHS of a production rule. In other words, the parse tree is constructed by beginning at the leaves and working up towards the root.


Shift-Reduce: a bottom up parsing technique

Grammar:
S → aAcBe
A → Ab | b
B → d

Input: a b b c d e

We shift symbols from the input string (from left to right), usually onto a stack, so that the “chunk” of symbols (matching the RHS of a production) which is to be reduced to the corresponding LHS will eventually appear on top of the stack. (The chunk getting reduced is referred to as the “handle”.)

Stack contents at selected stages ($ marks the bottom of the stack):

$
$ a
$ a b
$ a A b
$ a A c d
$ a A c B e
$ S
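The complete sequence of stack contents can be replayed with a short Python sketch. It does not decide anything by itself: the list `ACTIONS` is chosen by hand, exactly as one would do on paper, and the names `GRAMMAR`, `replay` and `ACTIONS` are hypothetical.

```python
GRAMMAR = [("S", "aAcBe"), ("A", "Ab"), ("A", "b"), ("B", "d")]

def replay(input_string, actions):
    """Apply hand-chosen shift/reduce actions; return every stack snapshot."""
    stack, rest = "$", input_string
    trace = [stack]
    for action in actions:
        if action == "shift":
            stack, rest = stack + rest[0], rest[1:]
        else:
            lhs, rhs = GRAMMAR[action]      # a number selects a production to reduce by
            if not stack.endswith(rhs):
                raise ValueError(f"{rhs} is not on top of the stack {stack}")
            stack = stack[: len(stack) - len(rhs)] + lhs
        trace.append(stack)
    return trace

# shift a, shift b, reduce A -> b, shift b, reduce A -> Ab,
# shift c, shift d, reduce B -> d, shift e, reduce S -> aAcBe
ACTIONS = ["shift", "shift", 2, "shift", 1, "shift", "shift", 3, "shift", 0]
trace = replay("abbcde", ACTIONS)
```

Each reduction in `ACTIONS` is applied only when the right-hand side really is on top of the stack, so a wrong hand-made sequence fails loudly instead of producing a bogus trace.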


What is a handle?

A substring of the string being reduced (initially the input string) that matches the RHS of a production, and whose replacement by the corresponding LHS would eventually lead to a reduction to the start symbol, is called a handle.

Grammar:
S → aAcBe
A → Ab | b
B → d

A Right Most Derivation (RMD):

S ⇒ aAcBe ⇒ aAcde ⇒ aAbcde ⇒ abbcde

Non-terminal symbols on the right get expanded before those on the left. When we do this in reverse, though (now reducing symbols, not expanding them), pieces of the string on the left get reduced before those on the right.

Bottom up parsing can be viewed as “RMD in reverse direction”.
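A small Python sketch can make the “in reverse” view concrete for the grammar S → aAcBe, A → Ab | b, B → d. The helper names `rightmost_step` and `derive` are invented here, and non-terminals are assumed to be single uppercase letters.

```python
def rightmost_step(form, lhs, rhs):
    """Expand the rightmost non-terminal of a sentential form using lhs -> rhs."""
    idx = max(i for i, ch in enumerate(form) if ch.isupper())
    if form[idx] != lhs:
        raise ValueError(f"rightmost non-terminal of {form} is {form[idx]}, not {lhs}")
    return form[:idx] + rhs + form[idx + 1:]

def derive(start, steps):
    """Return every sentential form of the rightmost derivation."""
    forms = [start]
    for lhs, rhs in steps:
        forms.append(rightmost_step(forms[-1], lhs, rhs))
    return forms

RMD = derive("S", [("S", "aAcBe"), ("B", "d"), ("A", "Ab"), ("A", "b")])
BOTTOM_UP = list(reversed(RMD))   # the order in which a bottom-up parser sees the forms
```

Reading `BOTTOM_UP` from left to right gives the strings a bottom-up parser passes through: the input first, then each result of a reduction, ending in S.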


The problem with discovering handles

Discovering the handle may not always be easy!

There may be more than one substring emerging on top-of-stack that matches the RHS of a production.

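Grammar:
S → aAcBe
A → Ab | b
B → d

Input: a b b c d e

Stack so far: $ a b, then (after reducing b to A) $ a A, then (after shifting the second b) $ a A b

With $ a A b on the stack, two reductions are possible:

Reduce A b to A (using A → Ab), giving $ a A
Reduce b to A (using A → b), giving $ a A A ?

There’s no way aAAcde can be reduced to S. (When we make an incorrect choice of handle we get stuck half-way through, before we can arrive at the start symbol.)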
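The dead end can be checked mechanically. The sketch below is a brute-force search, not a shift-reduce parser: it tries every possible reduction anywhere in the string, ignoring the stack discipline, and reports whether any sequence of reductions reaches S. The names `GRAMMAR` and `reducible_to_start` are hypothetical.

```python
from functools import lru_cache

GRAMMAR = [("S", "aAcBe"), ("A", "Ab"), ("A", "b"), ("B", "d")]

@lru_cache(maxsize=None)
def reducible_to_start(form):
    """True if some sequence of reductions turns form into the start symbol S."""
    if form == "S":
        return True
    for lhs, rhs in GRAMMAR:
        start = form.find(rhs)
        while start != -1:
            # Replace this occurrence of the right-hand side by the left-hand side.
            reduced = form[:start] + lhs + form[start + len(rhs):]
            if reducible_to_start(reduced):
                return True
            start = form.find(rhs, start + 1)
    return False

good_choice = reducible_to_start("aAcde")    # after reducing A b to A
bad_choice = reducible_to_start("aAAcde")    # after reducing b to A
```

The search terminates because every reduction either shortens the string or replaces a terminal by a non-terminal. Exhaustive search like this is far too slow for real parsing, which is why the choice of handle has to be guided by the grammar.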


The problem with discovering handles

In the exercises we did, we decided for ourselves (using our own cleverness!) when to shift and when to reduce symbols.

However, these decisions can (and must) be made automatically by the parser program, in tune with the given grammar.

The well-known LR parser can do this, but it is beyond our present scope.


Top down parsing

Formal:

Construct parse tree (for the input) by beginning at the root and creating the nodes (of the tree) in preorder. In other words, it’s an attempt to find a Left Most Derivation for an input string.

Informal:

Instead of starting to work on the input string and reduce it to the start symbol (by replacing “chunks” of it with non-terminal symbols), we begin with the start symbol itself and ask: “How can I expand this in order to arrive (eventually) at the input string?” We ask the same question for every non-terminal symbol occurring in the resulting expansions. We choose an appropriate expansion of a certain non-terminal by glancing at the input string, i.e. by taking cues from the symbol being scanned (and also the next few symbols) in the input.


Top down parsing: an example

Grammar:
S → cAd
A → ab | a

Input: c a d

(i) Start with S. Only one expansion is possible (S → cAd). OK! Its first symbol, c, matches the first symbol in the input.

(ii) Now, how to expand A? Try every expansion one by one! With A → ab, a matches, but b does not match d: mismatch! (So, try another expansion.)

(iii) With A → a, a matches the second symbol in the input: match! (So, move on!)

(iv) The d in the expansion of S matches the last symbol in the input: match! (We’re done!)


Top down parsing: an example

function S( )
{
    if input_symbol = 'c' then {
        ADVANCE( );
        if A( ) then {
            if input_symbol = 'd' then {
                ADVANCE( );
                return TRUE;
            }
        }
    }
    return FALSE;
}

function A( )
{
    isave = input_pointer;
    if input_symbol = 'a' then {
        ADVANCE( );
        if input_symbol = 'b' then {
            ADVANCE( );
            return TRUE;
        }
    }
    input_pointer = isave;    /* Try second expansion */
    if input_symbol = 'a' then {
        ADVANCE( );
        return TRUE;
    }
    return FALSE;
}

A program to do top-down parsing might use a separate procedure for every non-terminal
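For readers who want to run it, here is one possible Python rendering of the two procedures for the grammar S → cAd, A → ab | a. The class and method names are illustrative, and the final check that all input has been consumed is an addition not present in the pseudocode.

```python
class Parser:
    def __init__(self, text):
        self.text = text
        self.pos = 0                    # plays the role of input_pointer

    def symbol(self):
        """Current input symbol, or '' at end of input."""
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def advance(self):
        self.pos += 1

    def parse_S(self):                  # S -> c A d
        if self.symbol() == "c":
            self.advance()
            if self.parse_A() and self.symbol() == "d":
                self.advance()
                return True
        return False

    def parse_A(self):                  # A -> a b | a
        saved = self.pos
        if self.symbol() == "a":
            self.advance()
            if self.symbol() == "b":
                self.advance()
                return True
        self.pos = saved                # backtrack, then try the second expansion
        if self.symbol() == "a":
            self.advance()
            return True
        return False

def accepts(text):
    """True if text is derivable from S (assumed extra check: no input left over)."""
    parser = Parser(text)
    return parser.parse_S() and parser.pos == len(text)
```

For example, `accepts("cad")` and `accepts("cabd")` both succeed, while `accepts("cd")` fails because A cannot derive the empty string.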


Problems with this approach

(i) Order in which the expansions are tried

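Grammar:
S → cAd
A → a | ab

Input: c a b d

[Figure: S is expanded to c A d and A to a. The symbols c and a match the input, but d does not match b: mismatch!]

The mismatching d is part of the expansion of S, not of A, and hence a new expansion for S will be tried (in vain!); the parser never returns to A to try A → ab. Hence, cabd will be rejected as invalid (but is actually valid).

Remedy: Rewrite the grammar so that no non-terminal has more than ONE expansion beginning with the same “prefix”; use “left factoring” to achieve this.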
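The effect of the ordering, and of the remedy, can be tried out with a small Python sketch. The helper `make_parser` is hypothetical and hard-wires the rule S → cAd; it commits to the first expansion of A that matches, which is the behaviour described above. The left-factored grammar used at the end, A → aA’ with A’ → b | ε, is the standard result of left factoring A → a | ab.

```python
def make_parser(a_alternatives):
    """Recognizer for S -> c A d that tries A's expansions in the given order."""
    def parse_A(text, pos):
        for alt in a_alternatives:
            if text.startswith(alt, pos):
                return pos + len(alt)   # commit to the first expansion that matches
        return None

    def accepts(text):
        if not text.startswith("c"):
            return False
        pos = parse_A(text, 1)
        return pos is not None and text[pos:] == "d"

    return accepts

short_first = make_parser(["a", "ab"])  # A -> a | ab
long_first = make_parser(["ab", "a"])   # A -> ab | a

def accepts_left_factored(text):
    """S -> c A d, A -> a A', A' -> b | ε: the next symbol alone decides A'."""
    pos = 0
    for expected in "ca":
        if text[pos:pos + 1] != expected:
            return False
        pos += 1
    if text[pos:pos + 1] == "b":        # A' -> b; otherwise A' -> ε
        pos += 1
    return text[pos:] == "d"
```

With these definitions `short_first("cabd")` is rejected although the string is valid, whereas `long_first("cabd")` and `accepts_left_factored("cabd")` accept it.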


Problems with this approach

(ii) Left recursion

A → Aα

A production rule of this form exhibits (immediate) left recursion. More precisely, a grammar has left recursion if, at some point, A “yields” Aα, i.e. if Aα can be derived from A in one or more steps.

Why is left recursion dangerous? It’s because the function A( ) (corresponding to non-terminal A) will be forced to invoke itself repeatedly and endlessly, without consuming any input.

Remedy: Eliminate left recursion from the grammar!
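The danger is easy to reproduce. The sketch below writes a naive procedure for the left-recursive rule A → Aa | b (a concrete instance chosen for the example, with α = a) and shows that it recurses without ever reading input. In Python the runaway recursion ends in a `RecursionError` rather than running forever; the function names are made up.

```python
def parse_A(text, pos):
    """Naive procedure for A -> A a | b; returns the position after A, or None."""
    end = parse_A(text, pos)            # A -> A a: calls itself before consuming input
    if end is not None and text.startswith("a", end):
        return end + 1
    if text.startswith("b", pos):       # A -> b: never reached
        return pos + 1
    return None

def blows_up(text):
    """True if parsing text exhausts the recursion limit."""
    try:
        parse_A(text, 0)
    except RecursionError:
        return True
    return False
```

Calling `blows_up("baa")` confirms the problem even though baa is a perfectly valid string of the grammar.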


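Eliminating (immediate) left recursion

The rule

A → Aα | β

generates β followed by any number of α’s:

A ⇒ Aα ⇒ Aαα ⇒ Aααα ⇒ βααα

It can therefore be replaced by

A → βA’
A’ → αA’ | ε

e.g.

E → E + T | T
T → T * F | F
F → ( E ) | id

becomes

E → TE’
E’ → +TE’ | ε
T → FT’
T’ → *FT’ | ε
F → ( E ) | id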
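The transformation is mechanical enough to be written as a short function. The following Python sketch handles immediate left recursion only; the representation (each right-hand side as a tuple of symbols, the empty tuple for ε) and the name `eliminate_immediate_left_recursion` are choices made for this example.

```python
EPSILON = ()    # the empty right-hand side

def eliminate_immediate_left_recursion(nonterminal, alternatives):
    """Rewrite A -> A α | β as A -> β A', A' -> α A' | ε.

    alternatives is a list of right-hand sides, each a tuple of symbols.
    Returns a dict mapping each non-terminal to its new alternatives.
    """
    alphas = [alt[1:] for alt in alternatives if alt[:1] == (nonterminal,)]
    betas = [alt for alt in alternatives if alt[:1] != (nonterminal,)]
    if not alphas:                       # no left recursion: leave the rule alone
        return {nonterminal: list(alternatives)}
    new = nonterminal + "'"
    return {
        nonterminal: [beta + (new,) for beta in betas],
        new: [alpha + (new,) for alpha in alphas] + [EPSILON],
    }

rules_E = eliminate_immediate_left_recursion("E", [("E", "+", "T"), ("T",)])
rules_T = eliminate_immediate_left_recursion("T", [("T", "*", "F"), ("F",)])
rules_F = eliminate_immediate_left_recursion("F", [("(", "E", ")"), ("id",)])
```

Applied to the expression grammar, `rules_E` and `rules_T` contain the primed rules, and `rules_F` is returned unchanged because F is not left recursive.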


Left factoring

A → αβ | αγ

becomes

A → αA’
A’ → β | γ

e.g.

S → iCtS | iCtSeS | a
C → b

becomes

S → iCtSS’ | a
S’ → eS | ε
C → b
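One round of left factoring can be sketched in Python as follows. The function groups the expansions of a non-terminal by their first symbol and pulls out the longest common prefix of each group. The names `common_prefix` and `left_factor` are invented, right-hand sides are tuples of symbols with the empty tuple standing for ε, and repeated application (needed when the new rules still share prefixes) is left out.

```python
def common_prefix(seqs):
    """Longest common prefix of a non-empty list of tuples."""
    prefix = seqs[0]
    for seq in seqs[1:]:
        n = 0
        while n < min(len(prefix), len(seq)) and prefix[n] == seq[n]:
            n += 1
        prefix = prefix[:n]
    return prefix

def left_factor(nonterminal, alternatives):
    """One round of left factoring: A -> αβ | αγ becomes A -> αA', A' -> β | γ."""
    groups = {}
    for alt in alternatives:
        groups.setdefault(alt[:1], []).append(alt)   # group by first symbol
    result = {nonterminal: []}
    new = nonterminal
    for group in groups.values():
        if len(group) == 1:
            result[nonterminal].extend(group)        # nothing to factor
            continue
        prefix = common_prefix(group)
        new += "'"                                   # fresh name: A', A'', ...
        result[nonterminal].append(prefix + (new,))
        result[new] = [alt[len(prefix):] for alt in group]
    return result

factored = left_factor("S", [("i", "C", "t", "S"), ("i", "C", "t", "S", "e", "S"), ("a",)])
```

For the if-then-else grammar, `factored` maps S to iCtSS’ | a and S’ to ε | eS, the same rules as in the example above up to the order of the alternatives.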