Compiler Design – 15CS301

Instructor : Mr. R. Rajkumar, Assistant Professor | CSE

Venue : TP-606, Tech park,

SRM Institute of Science and Technology,

Kattankulathur, India.

UNIT 1 – Introduction to Compiler and Automata

R. Rajkumar AP | CSE

Q: Why Compiler Design?

Programming languages are the primary tools of all computer programmers.

While many software engineers may claim to know a number of languages well enough to work with them, in practice they tend to stay within their comfort zones.

A number of very sophisticated features of programming languages remain out of reach for a majority of programmers.

Q: Why Compiler Design? (2)

Compilers provide you with the theoretical and practical knowledge that is needed to implement a programming language. Once you have learnt to build a compiler, you pretty much know the innards of many programming languages.

Compilers have a plethora of sophisticated algorithms and data structures implemented within. So, if you are fascinated with algorithms and data structures, you will find several of them at work in a compiler.

Q: Why Compiler Design? (3)

Compilers are complex software systems. If you can truthfully claim that you have written a compiler with your own hands, it is likely that there will be no questions asked after that in any interview. A person who has made a compiler can do anything.

Q: Why Compiler Design? (4)

The software architecture of a compiler is quite general. A large variety of applications can be modelled after a compiler (or some part thereof). Simulators, debuggers, program analysis tools, editors, IDEs, RDBMSs, browsers, OS shells, … have some significant elements of language processing (read: compiling) in them.

Let us start with a little history.

Overview and History (1)

Cause: software for early computers was written in assembly language. The benefits of reusing software on different CPUs started to become significantly greater than the cost of writing a compiler.

The first real compilers: the FORTRAN compilers of the late 1950s, which took 18 person-years to build.

Overview and History (2)

Compiler technology is more broadly applicable and has been employed in rather unexpected areas:
  Text-formatting languages like nroff and troff; preprocessor packages like eqn, tbl, pic
  Silicon compilers for the creation of VLSI circuits
  Command languages of operating systems
  Query languages of database systems

What Do Compilers Do (1)

A compiler acts as a translator, transforming human-oriented programming languages into computer-oriented machine languages. This lets the programmer ignore machine-dependent details.

[Figure: Programming Language (Source) -> Compiler -> Machine Language (Target)]

What Do Compilers Do (2)

Compilers may generate three types of code:

Pure machine code
  Machine instructions that do not assume the existence of any operating system or library.
  Used mostly for operating systems and embedded applications.

Augmented machine code
  Code that relies on OS routines and runtime support routines.
  This is the more common case.

Virtual machine code
  Virtual instructions that can be run on any architecture with a virtual machine interpreter or a just-in-time compiler.
  Ex. Java

What Do Compilers Do (3)

Another way that compilers differ from one another is in the format of the target machine code they generate:

Assembly or other source format

Relocatable binary
  Relative addresses
  A linkage step is required

Absolute binary
  Absolute addresses
  Can be executed directly

The Structure of a Compiler (1)

Any compiler must perform two major tasks:
  Analysis of the source program
  Synthesis of a machine-language program

[Figure: Compiler = Analysis + Synthesis]

The Structure of a Compiler (2)

[Figure: Source Program (character stream) -> Scanner -> Tokens -> Parser -> Syntactic Structure -> Semantic Routines -> Intermediate Representation -> Optimizer -> Code Generator -> Target machine code. The symbol and attribute tables are used by all phases of the compiler.]

The Structure of a Compiler (3)

[Figure: the compiler pipeline diagram is repeated on this slide.]

Scanner

The scanner begins the analysis of the source program by reading the input, character by character, and grouping characters into individual words and symbols (tokens).

Related topics:
  RE (regular expression)
  NFA (nondeterministic finite automaton)
  DFA (deterministic finite automaton)
  LEX

The Structure of a Compiler (4)

[Figure: the compiler pipeline diagram is repeated on this slide.]

Parser

Given a formal syntax specification (typically a context-free grammar [CFG]), the parser reads tokens and groups them into units as specified by the productions of the CFG being used.

As syntactic structure is recognized, the parser either calls the corresponding semantic routines directly or builds a syntax tree.

Related topics:
  CFG (Context-Free Grammar)
  BNF (Backus-Naur Form)
  GAA (Grammar Analysis Algorithms)
  LL, LR, SLR, LALR parsers
  YACC

The Structure of a Compiler (5)

[Figure: the compiler pipeline diagram is repeated on this slide.]

Semantic Routines

Perform two functions:
  Check the static semantics of each construct
  Do the actual translation

The heart of a compiler.

Related topics:
  Syntax Directed Translation
  Semantic Processing Techniques
  IR (Intermediate Representation)

The Structure of a Compiler (6)

[Figure: the compiler pipeline diagram is repeated on this slide.]

Optimizer

The IR code generated by the semantic routines is analyzed and transformed into functionally equivalent but improved IR code.

This phase can be very complex and slow.

Techniques: peephole optimization, loop optimization, register allocation, code scheduling

Related topics:
  Register and Temporary Management
  Peephole Optimization

The Structure of a Compiler (7)

[Figure: Source Program (character stream) -> Scanner -> Tokens -> Parser -> Syntactic Structure -> Semantic Routines -> Intermediate Representation -> Optimizer -> Code Generator -> Target machine code]

Code Generator

Related topics:
  Interpretive Code Generation
  Generating Code from Tree/DAG
  Grammar-Based Code Generator

The Structure of a Compiler (8)

Scanner [Lexical Analyzer]
  -> Tokens
Parser [Syntax Analyzer]
  -> Parse tree
Semantic Process [Semantic Analyzer]
  -> Abstract Syntax Tree w/ Attributes
Code Generator [Intermediate Code Generator]
  -> Non-optimized Intermediate Code
Code Optimizer
  -> Optimized Intermediate Code
Code Generator
  -> Target machine code

The Structure of a Compiler (9)

Compiler writing tools
  Compiler generators or compiler-compilers
  E.g. scanner and parser generators
  Examples: Yacc, Lex

The Syntax and Semantics of Programming Language (1)

A programming language definition must include the specification of syntax (structure) and semantics (meaning).

Syntax typically means the context-free syntax because of the almost universal use of context-free grammars (CFGs).

Ex.
  a = b + c is syntactically legal
  b + c = a is illegal

The Syntax and Semantics of Programming Language (2)

The semantics of a programming language are commonly divided into two classes:

Static semantics
  Semantic rules that can be checked at compile time.
  Ex. the type and number of a function's arguments

Runtime semantics
  Semantic rules that can be checked only at run time

Compiler Design and Programming Language Design

An interesting aspect is how programming language design and compiler design influence one another.

Programming languages that are easy to compile have many advantages.

Computer Architecture and Compiler Design

Compilers should exploit hardware-specific features and computing capability to optimize code.

The problems encountered in modern computing platforms:
  Instruction sets for some popular architectures are highly nonuniform.
  High-level programming language operations are not always easy to support.
    Ex. exceptions, threads, dynamic heap access …
  Exploiting architectural features such as cache, distributed processors and memory
  Effective use of a large number of processors

Compiler Design Considerations

Debugging Compilers
  Designed to aid in the development and debugging of programs.

Optimizing Compilers
  Designed to produce efficient target code.

Retargetable Compilers
  A compiler whose target architecture can be changed without its machine-independent components having to be rewritten.

Compiler Construction Tools

Contents

Defining Compiler Construction Tools (aka CCTs)
Uses for CCTs
CCTs in the Compiler Structure
  Lexical Analyzer
  Syntax Analyzer
  Semantic Analyzer
  Intermediate Code Generator
  Code Optimizer
  Code Generator

Defining CCTs

Programs or environments that assist in the creation of an entire compiler or its parts.

Uses for CCTs

Generate lexical analyzers, syntax analyzers, semantic analyzers, intermediate code, and optimized target code.

CCTs in the Compiler Structure

Lexical Analyzer

scanner generators
  input: token specification (regular expressions)
  output: lexical analyzer

The lexical analyzer has the task of reading characters from the source program and recognizing tokens or basic syntactic components.

It maintains a list of reserved words.

Lexical Analyzer

Flex (fast lexical analyzer generator)

Example specifying a scanner that replaces the string "username" with the user's login name:

%%
username    printf("%s", getlogin());

Syntax Analyzer

parser generators
  input: context-free grammar
  output: syntax analyzer

The task of the syntax analyzer is to produce a representation of the source program in a form directly representing its syntax structure. This representation is usually in the form of a binary tree or similar data structure.

Semantic Analyzer

syntax-directed translators
  input: parse tree
  output: routines to generate I-code

"The role of the semantic analyzer is to derive methods by which the structures constructed by the syntax analyzer may be evaluated or executed."

type checker

two common tactics:
  ~ flatten the semantic analyzer's parse tree
  ~ embed the semantic analyzer within the syntax analyzer (syntax-driven translation)

Intermediate Code Generator

Automatic code generators
  input: I-code rules
  output: crude target machine program

"The task of the code generator is to traverse this tree, producing functionally equivalent object code." [3]

Three-address code is one type of intermediate code.

Intermediate Code Generator

Example: 7 + (8 * y) / 2

a := 8
b := y
c := a * b
a := c
b := 2
c := a / b
a := 7
b := c
c := a + b

[Figure: parse tree. expr -> 7 + expr; expr -> expr / 2; expr -> ( expr ); expr -> 8 * y]
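The translation of 7 + (8 * y) / 2 into three-address code can be sketched as a post-order walk over the expression tree. This is a minimal illustration, not the slide's own algorithm: the `Expr` type, the `gen` function and the fresh temporaries `t1, t2, ...` are assumptions made for the example (the slide reuses the names a, b and c instead).

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical expression-tree node: a leaf holds a name or literal,
   an interior node holds an operator and two children. */
typedef struct Expr {
    char op;                      /* '+', '-', '*', '/', or 0 for a leaf */
    const char *leaf;             /* text of the leaf when op == 0 */
    const struct Expr *lhs, *rhs; /* children when op != 0 */
} Expr;

static int next_temp = 1;         /* counter for fresh temporaries t1, t2, ... */

/* Append three-address code for e to out, and store in result the name
   (leaf text or temporary) that holds the value of e. */
static void gen(const Expr *e, char *out, size_t cap,
                char *result, size_t rcap)
{
    if (e->op == 0) {             /* leaf: no code needed */
        snprintf(result, rcap, "%s", e->leaf);
        return;
    }
    char l[16], r[16];
    gen(e->lhs, out, cap, l, sizeof l);   /* operands first (post-order) */
    gen(e->rhs, out, cap, r, sizeof r);
    snprintf(result, rcap, "t%d", next_temp++);
    size_t used = strlen(out);
    snprintf(out + used, cap - used, "%s := %s %c %s\n", result, l, e->op, r);
}
```

For the tree of the example expression this emits one instruction per operator, innermost first. Using a new temporary for each result keeps the generator simple and leaves the reuse of names to a later optimization pass.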

Code Optimizer

Data flow engines
  input: I-code
  output: transformed code

"This improvement is achieved by program transformations that are traditionally called optimizations, although the term 'optimization' is a misnomer because there is rarely a guarantee that the resulting code is the best possible."

Code Optimizer

Peephole Optimization

Machine or assembly code is used, along with knowledge of the target machine's instruction set, to replace I-code instructions with shorter or more quickly executed instructions. This is repeated as often as necessary.

Code Optimizer

Common Optimizing Transformations

Optim. Name          Required Analysis           Transformation
constant folding     simulated exec.             elimination
dead code elim.      simulated exec.             elimination
loop unrolling       loop struct., stat.s        motion (replic.)
linearizing arrays   loop structure              elimination
load/store optim.    DFA (data-flow analysis)    motion
branch chaining      statistics                  selection (dec)
math identities      none                        selection, elimination
common subexp.       simulated exec.             elimination
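As one concrete instance of the table, constant folding can be sketched as a bottom-up pass that evaluates any operator whose operands are both constants. The `Node` layout and the `fold` function below are hypothetical names chosen for this illustration, and integer arithmetic is assumed.

```c
#include <stddef.h>

/* Hypothetical tree node for the sketch. */
typedef struct Node {
    char op;                  /* '+', '-', '*', '/', 'k' = constant, 'v' = variable */
    long value;               /* meaningful only when op == 'k' */
    struct Node *lhs, *rhs;   /* children of an operator node */
} Node;

/* Fold constant subtrees in place, bottom-up.
   Returns 1 if n is a constant after folding, 0 otherwise. */
static int fold(Node *n)
{
    if (n->op == 'k') return 1;
    if (n->op == 'v') return 0;

    int lc = fold(n->lhs);    /* fold both sides even if one is not constant */
    int rc = fold(n->rhs);
    if (!lc || !rc) return 0;

    long a = n->lhs->value, b = n->rhs->value;
    switch (n->op) {
    case '+': n->value = a + b; break;
    case '-': n->value = a - b; break;
    case '*': n->value = a * b; break;
    case '/':
        if (b == 0) return 0; /* leave division by zero for run time */
        n->value = a / b;
        break;
    default:
        return 0;             /* unknown operator: do not touch it */
    }
    n->op = 'k';              /* the operator node becomes a constant */
    n->lhs = n->rhs = NULL;
    return 1;
}
```

In (2 * 3) + y only the multiplication is folded, because y is not known at compile time; this is the "simulated execution" the table lists as the required analysis.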

Code Optimizer

Example: 7 + (8 * y) / 2

a := y
a := a * 8
a := a / 2
a := a + 7

[Figure: parse tree. expr -> 7 + expr; expr -> expr / 2; expr -> ( expr ); expr -> 8 * y]

Code Generator (Assembly Level)

Automatic code generators
  input: optimized (transformed) I-code
  output: target machine program

Example: 7 + (8 * y) / 2

Load a, y
Mult a, 8
Div a, 2
Add a, 7

Review: Compiler Phases

[Figure: Source program -> Lexical analyzer -> Syntax analyzer -> Semantic analyzer -> Intermediate code generator -> Code optimizer -> Code generator, with the symbol table manager and the error handler shown beside the phases. The phases are grouped into a Front End and a Back End.]

The role of lexical analyzer

[Figure: Source program -> Lexical Analyzer -> token -> Parser -> to semantic analysis. The parser requests each token with getNextToken, and both the lexical analyzer and the parser use the symbol table.]

Lexical Analysis

Lexical analyzer: reads input characters and produces a sequence of tokens as output (nexttoken()). It tries to understand each element in a program.

Token: a group of characters having a collective meaning.

const pi = 3.14159;

Token 1: (const, -)
Token 2: (identifier, 'pi')
Token 3: (=, -)
Token 4: (realnumber, 3.14159)
Token 5: (;, -)
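The token sequence for `const pi = 3.14159;` can be reproduced with a small hand-written scanner. The sketch below is only an illustration: the `Token` type, the `T_...` names and `next_token` are hypothetical, and it handles just the token kinds that occur in this one statement.

```c
#include <ctype.h>
#include <string.h>

typedef enum { T_CONST, T_ID, T_ASSIGN, T_NUMBER, T_SEMI, T_EOF, T_ERROR } TokenType;

typedef struct {
    TokenType type;
    char lexeme[32];          /* the matched characters */
} Token;

/* Scan one token starting at *p and advance *p past it. */
static Token next_token(const char **p)
{
    Token t = { T_EOF, "" };
    const char *s = *p;

    while (isspace((unsigned char)*s)) s++;       /* skip whitespace */
    const char *start = s;

    if (*s == '\0') {
        t.type = T_EOF;
    } else if (isalpha((unsigned char)*s)) {      /* identifier or keyword */
        while (isalnum((unsigned char)*s)) s++;
        t.type = T_ID;
    } else if (isdigit((unsigned char)*s)) {      /* digits with optional fraction */
        while (isdigit((unsigned char)*s)) s++;
        if (*s == '.' && isdigit((unsigned char)s[1])) {
            s++;
            while (isdigit((unsigned char)*s)) s++;
        }
        t.type = T_NUMBER;
    } else if (*s == '=') { s++; t.type = T_ASSIGN; }
    else if (*s == ';') { s++; t.type = T_SEMI; }
    else { s++; t.type = T_ERROR; }               /* no pattern matches */

    size_t len = (size_t)(s - start);
    if (len >= sizeof t.lexeme) len = sizeof t.lexeme - 1;
    memcpy(t.lexeme, start, len);
    t.lexeme[len] = '\0';

    /* Reserved words are recognized by checking the identifier lexeme. */
    if (t.type == T_ID && strcmp(t.lexeme, "const") == 0) t.type = T_CONST;

    *p = s;
    return t;
}
```

Calling `next_token` repeatedly on the example statement yields the five tokens listed above, followed by `T_EOF`.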

Some terminology:

Token: a group of characters having a collective meaning.

A lexeme is a particular instance of a token.
  E.g. token: identifier, lexeme: pi, etc.

Pattern: the rule describing how a token can be formed.
  E.g. identifier: ([a-z]|[A-Z]) ([a-z]|[A-Z]|[0-9])*

The lexical analyzer does not have to be a separate phase, but having a separate phase simplifies the design and improves efficiency and portability.

Two issues in lexical analysis:
  How to specify tokens (patterns)?
  How to recognize the tokens given a token specification (how to implement the nexttoken() routine)?

How to specify tokens:
  All the basic elements in a language must be tokens so that they can be recognized.
  Token types: constant, identifier, reserved word, operator and misc. symbol.
  Tokens are specified by regular expressions.

main() {
    int i, j;
    for (i = 0; i < 50; i++) {
        printf("i = %d", i);
    }
}

Why separate lexical analysis and parsing?

1. Simplicity of design
2. Improving compiler efficiency
3. Enhancing compiler portability

Tokens, Patterns and Lexemes

A token is a pair: a token name and an optional token value.

A pattern is a description of the form that the lexemes of a token may take.

A lexeme is a sequence of characters in the source program that matches the pattern for a token.

Example

Token        Informal description                    Sample lexemes
if           Characters i, f                         if
else         Characters e, l, s, e                   else
comparison   < or > or <= or >= or == or !=          <=, !=
id           Letter followed by letters and digits   pi, score, D2
number       Any numeric constant                    3.14159, 0, 6.02e23
literal      Anything but ", surrounded by "s        "core dumped"

Example statement: printf("total = %d\n", score);

Lexical errors

Some errors are beyond the power of the lexical analyzer to recognize:
  fi (a == f(x)) …
Here fi is a valid identifier, so the scanner cannot tell that if was intended.

However, it may be able to recognize errors like:
  d = 2r

Such errors are recognized when no token pattern matches a character sequence.

Error recovery

Panic mode: successive characters are ignored until we reach a well-formed token.

Other possible recovery actions:
  Delete one character from the remaining input
  Insert a missing character into the remaining input
  Replace a character by another character
  Transpose two adjacent characters
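Panic-mode recovery can be sketched as a loop that discards characters until one is found that could begin a token. The set of accepted characters in `can_start_token` is a made-up example for this sketch, not a rule from the slides; a real scanner would derive it from its own token patterns.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical rule: characters that may begin a token or separate tokens. */
static int can_start_token(unsigned char c)
{
    return c != '\0' &&
           (isalnum(c) || isspace(c) || c == '_' ||
            strchr("<>=+-*/();", c) != NULL);
}

/* Panic mode: skip input until a character that can start a token.
   Returns a pointer to that character, or to the end of the string. */
const char *panic_skip(const char *p)
{
    while (*p != '\0' && !can_start_token((unsigned char)*p))
        p++;
    return p;
}
```

The scanner would call `panic_skip` after reporting an error and then resume normal scanning from the returned position.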

Input buffering

Sometimes the lexical analyzer needs to look ahead some symbols to decide which token to return.
  In C: we need to look at the character after -, = or < to decide which token to return.
  In Fortran: DO 5 I = 1.25 is an assignment to the variable DO5I, whereas DO 5 I = 1,25 begins a loop, so the scanner cannot decide until it sees the dot or the comma.

We need to introduce a two-buffer scheme to handle large lookaheads safely.

[Figure: input buffer holding E = M * C * * 2 eof]

Sentinels

    switch (*forward++) {
    case eof:
        if (forward is at end of first buffer) {
            reload second buffer;
            forward = beginning of second buffer;
        }
        else if (forward is at end of second buffer) {
            reload first buffer;
            forward = beginning of first buffer;
        }
        else /* eof within a buffer marks the end of input */
            terminate lexical analysis;
        break;
    cases for the other characters;
    }

[Figure: the two buffer halves, each ending in an eof sentinel: E = M * eof | C * * 2 eof … eof]
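A runnable sketch of the sentinel scheme follows. Several details are assumptions made only for the illustration: the input comes from an in-memory string rather than a file, the NUL character stands in for eof (so the input must not contain NUL bytes), and the half-buffer size `HALF` is tiny so that reloads actually happen. The names `Scanner`, `reload`, `scanner_init` and `next_char` are hypothetical.

```c
#include <string.h>

#define HALF 4          /* tiny half-buffer size, chosen only for illustration */
#define SENTINEL '\0'   /* stands in for eof; assumes the input has no NUL bytes */

typedef struct {
    char buf[2 * (HALF + 1)];  /* two halves, each with one extra sentinel slot */
    char *forward;             /* next character to read */
    const char *src;           /* unread input (hypothetical in-memory source) */
} Scanner;

/* Copy up to HALF characters into one half and place a sentinel after them. */
static void reload(Scanner *sc, char *half)
{
    size_t n = strlen(sc->src);
    if (n > (size_t)HALF) n = HALF;
    memcpy(half, sc->src, n);
    half[n] = SENTINEL;
    sc->src += n;
}

void scanner_init(Scanner *sc, const char *input)
{
    sc->src = input;
    reload(sc, sc->buf);
    sc->forward = sc->buf;
}

/* Return the next input character, or -1 at end of input. */
int next_char(Scanner *sc)
{
    for (;;) {
        char *pos = sc->forward++;
        if (*pos != SENTINEL)                         /* the only test on the common path */
            return (unsigned char)*pos;
        if (pos == sc->buf + HALF) {                  /* end of first half */
            reload(sc, sc->buf + HALF + 1);
            sc->forward = sc->buf + HALF + 1;
        } else if (pos == sc->buf + 2 * HALF + 1) {   /* end of second half */
            reload(sc, sc->buf);
            sc->forward = sc->buf;
        } else {                                      /* sentinel inside a half: end of input */
            sc->forward = pos;
            return -1;
        }
    }
}
```

The point of the sentinel is visible in `next_char`: for an ordinary character there is a single comparison, and the checks for the end of a buffer half run only when a sentinel is actually seen.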

Specification of tokens

In the theory of compilation, regular expressions are used to formalize the specification of tokens.

Regular expressions are a means of specifying regular languages.

Example:
  letter_ (letter_ | digit)*

Each regular expression is a pattern specifying the form of strings.

Regular expressions

Ɛ is a regular expression, L(Ɛ) = {Ɛ}
If a is a symbol in Σ, then a is a regular expression, L(a) = {a}
(r) | (s) is a regular expression denoting the language L(r) ∪ L(s)
(r)(s) is a regular expression denoting the language L(r)L(s)
(r)* is a regular expression denoting (L(r))*
(r) is a regular expression denoting L(r)

Regular definitions

d1 -> r1
d2 -> r2
…
dn -> rn

Example:

letter_ -> A | B | … | Z | a | b | … | z | _
digit   -> 0 | 1 | … | 9
id      -> letter_ (letter_ | digit)*

Extensions

One or more instances: (r)+
Zero or one instance: r?
Character classes: [abc]

Example:

letter_ -> [A-Za-z_]
digit   -> [0-9]
id      -> letter_ (letter_ | digit)*

Recognition of tokens

The starting point is the language grammar, which tells us what the tokens are:

stmt -> if expr then stmt
      | if expr then stmt else stmt
      | Ɛ
expr -> term relop term
      | term
term -> id
      | number

Recognition of tokens (cont.)

The next step is to formalize the patterns:

digit  -> [0-9]
digits -> digit+
number -> digits (. digits)? (E [+-]? digits)?
letter -> [A-Za-z_]
id     -> letter (letter | digit)*
if     -> if
then   -> then
else   -> else
relop  -> < | > | <= | >= | = | <>

We also need to handle whitespace:

ws -> (blank | tab | newline)+
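The number pattern, digits (. digits)? (E [+-]? digits)?, can be checked by hand with a longest-prefix matcher. The sketch below is one possible reading of that pattern; `skip_digits` and `match_number` are hypothetical helper names, and only an uppercase E is accepted because that is what the pattern states.

```c
#include <ctype.h>
#include <stddef.h>

/* Advance i past any run of decimal digits in s and return the new index. */
static size_t skip_digits(const char *s, size_t i)
{
    while (isdigit((unsigned char)s[i])) i++;
    return i;
}

/* Length of the longest prefix of s that matches
   digits (. digits)? (E [+-]? digits)?, or 0 if no prefix matches. */
size_t match_number(const char *s)
{
    size_t i = skip_digits(s, 0);
    if (i == 0) return 0;                 /* at least one digit is required */

    if (s[i] == '.') {                    /* optional fraction */
        size_t j = skip_digits(s, i + 1);
        if (j > i + 1) i = j;             /* accept only if digits follow the dot */
    }
    if (s[i] == 'E') {                    /* optional exponent */
        size_t k = i + 1;
        if (s[k] == '+' || s[k] == '-') k++;
        size_t j = skip_digits(s, k);
        if (j > k) i = j;                 /* accept only if digits follow */
    }
    return i;
}
```

Each optional part is accepted only when it is complete, so for an input such as `12.` or `1E+` the matcher falls back to the digits it has already recognized, which corresponds to retracting the input pointer.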

Architecture of a transition-diagram-based lexical analyzer

    TOKEN getRelop()
    {
        TOKEN retToken = new(RELOP);
        while (1) {  /* repeat character processing until a
                        return or failure occurs */
            switch (state) {
            case 0: c = nextchar();
                    if (c == '<') state = 1;
                    else if (c == '=') state = 5;
                    else if (c == '>') state = 6;
                    else fail();  /* lexeme is not a relop */
                    break;
            case 1: …
            case 8: retract();
                    retToken.attribute = GT;
                    return(retToken);
            }
        }
    }
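The slide elides states 1 to 7. A complete, runnable version might look like the sketch below, under the assumption that those states follow the usual transition diagram for the relop pattern < | > | <= | >= | = | <>. The state numbering beyond states 0 and 8, the `Relop` type and the `get_relop` signature are guesses made for this illustration, not taken from the slide.

```c
typedef enum { LT, LE, EQ, NE, GT, GE, NOT_RELOP } Relop;

/* Recognize a relational operator at the start of s.
   On success, store the lexeme length in *len and return its kind.
   On failure, store 0 in *len and return NOT_RELOP. */
Relop get_relop(const char *s, int *len)
{
    int state = 0;
    int i = 0;                              /* index of the next character */

    for (;;) {
        switch (state) {
        case 0: {
            char c = s[i++];
            if (c == '<') state = 1;
            else if (c == '=') state = 5;
            else if (c == '>') state = 6;
            else { *len = 0; return NOT_RELOP; }   /* plays the role of fail() */
            break;
        }
        case 1: {                           /* seen '<' */
            char c = s[i++];
            if (c == '=') state = 2;
            else if (c == '>') state = 3;
            else state = 4;
            break;
        }
        case 2: *len = i; return LE;
        case 3: *len = i; return NE;
        case 4: i--; *len = i; return LT;   /* retract one character */
        case 5: *len = i; return EQ;
        case 6: {                           /* seen '>' */
            char c = s[i++];
            state = (c == '=') ? 7 : 8;
            break;
        }
        case 7: *len = i; return GE;
        case 8: i--; *len = i; return GT;   /* retract one character */
        }
    }
}
```

States 4 and 8 are the ones that read one character too many, so they step back before returning, which is what `retract()` does in the slide's code.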

Lexical Analyzer Generator - Lex

[Figure: Lex source program (lex.l) -> Lex compiler -> lex.yy.c; lex.yy.c -> C compiler -> a.out; input stream -> a.out -> sequence of tokens]

Page 62: Compiler Design 15CS301gameofcompilers.weebly.com/uploads/8/5/4/8/8548812/cd_unit_1.pdf · The Structure of a Compiler (3) 14 Scanner Parser Semantic Routines Code Generator Optimizer

Structure of Lex programs

declarations
%%
translation rules
%%
auxiliary functions

Each translation rule has the form: Pattern {Action}


Example

%{
/* definitions of manifest constants
   LT, LE, EQ, NE, GT, GE,
   IF, THEN, ELSE, ID, NUMBER, RELOP */
%}

/* regular definitions */
delim    [ \t\n]
ws       {delim}+
letter   [A-Za-z]
digit    [0-9]
id       {letter}({letter}|{digit})*
number   {digit}+(\.{digit}+)?(E[+-]?{digit}+)?

%%

{ws}       {/* no action and no return */}
if         {return(IF);}
then       {return(THEN);}
else       {return(ELSE);}
{id}       {yylval = (int) installID(); return(ID);}
{number}   {yylval = (int) installNum(); return(NUMBER);}

%%

int installID() {
    /* function to install the lexeme, whose first character is pointed to by yytext, and whose length is yyleng, into the symbol table and return a pointer thereto */
}

int installNum() {
    /* similar to installID, but puts numerical constants into a separate table */
}


Finite Automata


Finite Automata

Regular expressions = specification

Finite automata = implementation

A finite automaton consists of
An input alphabet Σ
A set of states S
A start state n
A set of accepting states F ⊆ S
A set of transitions: state --input--> state


Finite Automata

Transition
s1 --a--> s2
is read
In state s1 on input “a” go to state s2

If end of input
If in accepting state => accept, otherwise => reject
If no transition possible => reject


Finite Automata State Graphs

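• A state (drawn as a circle)
• The start state (a circle with an incoming arrow)
• An accepting state (a double circle)
• A transition (an arrow labeled with its input symbol, e.g. a)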


A Simple Example

A finite automaton that accepts only “1”

A finite automaton accepts a string if we can follow

transitions labeled with the characters in the string from

the start to some accepting state

[Figure: start state with a single transition labeled 1 to an accepting state]


Another Simple Example

A finite automaton accepting any number of 1’s followed

by a single 0

Alphabet: {0,1}

Check that “1110” is accepted but “110…” is not

[Figure: start state with a self-loop labeled 1 and a transition labeled 0 to an accepting state]
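The acceptance rules can be checked mechanically. Below is a minimal sketch of a DFA runner applied to this example; the state names "A" (start) and "B" (accepting) and the dictionary encoding of the transitions are assumptions of the sketch, not taken from the slide.

```python
def dfa_accepts(transitions, start, accepting, text):
    """Run a DFA given as a {(state, symbol): next_state} dictionary."""
    state = start
    for ch in text:
        key = (state, ch)
        if key not in transitions:
            return False            # no transition possible => reject
        state = transitions[key]
    return state in accepting       # end of input: accept iff in an accepting state


# Any number of 1's followed by a single 0.
ONES_THEN_ZERO = {("A", "1"): "A", ("A", "0"): "B"}
```

With this table, dfa_accepts(ONES_THEN_ZERO, "A", {"B"}, "1110") holds, while any input that continues after the 0 is rejected because state B has no outgoing transition.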


And Another Example

Alphabet {0,1}

What language does this recognize?

[Figure: state graph with transitions labeled 0 and 1]


And Another Example

Alphabet still { 0, 1 }

The operation of the automaton is not completely

defined by the input

On input “11” the automaton could be in either state

[Figure: state graph with two transitions labeled 1]


Epsilon Moves

Another kind of transition: ε-moves

• Machine can move from state A to state B without reading input

A --ε--> B


Deterministic and Nondeterministic Automata

Deterministic Finite Automata (DFA)
One transition per input per state
No ε-moves

Nondeterministic Finite Automata (NFA)
Can have multiple transitions for one input in a given state
Can have ε-moves

Finite automata have finite memory

Need only to encode the current state


Execution of Finite Automata

A DFA can take only one path through the state graph

Completely determined by input

NFAs can choose

Whether to make ε-moves

Which of multiple transitions for a single input to take


Acceptance of NFAs

An NFA can get into multiple states

[Figure: NFA state graph with transitions labeled 0 and 1]

• Input: 1 0 1
• Rule: NFA accepts if it can get into a final state
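One way to apply this rule is to track the whole set of states the NFA could be in after each input symbol. The sketch below ignores ε-moves for brevity and uses a hypothetical NFA (states q0 and q1, accepting strings that end in 1), since the automaton drawn on the slide is not recoverable from the text.

```python
def nfa_accepts(delta, start, accepting, text):
    """delta maps (state, symbol) to the set of possible next states."""
    current = {start}
    for ch in text:
        # Follow every transition on ch from every state we might be in.
        current = {t for s in current for t in delta.get((s, ch), ())}
    return bool(current & accepting)    # accept if some reachable state is final


# Hypothetical NFA: loop on 0 and 1 in q0, and guess on a 1 that it is the last symbol.
ENDS_IN_ONE = {("q0", "0"): {"q0"}, ("q0", "1"): {"q0", "q1"}}
```

On input 1 0 1 the set of possible states evolves as {q0}, {q0, q1}, {q0}, {q0, q1}; the last set contains the final state q1, so the input is accepted.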


NFA vs. DFA (1)

NFAs and DFAs recognize the same set of languages

(regular languages)

DFAs are easier to implement

There are no choices to consider


NFA vs. DFA (2)

For a given language the NFA can be simpler than the DFA

[Figure: an NFA and an equivalent DFA for the same language, with transitions labeled 0 and 1]

• DFA can be exponentially larger than NFA


Regular Expressions to Finite Automata

High-level sketch

Lexical Specification -> Regular expressions -> NFA -> DFA -> Table-driven Implementation of DFA


Regular Expressions to NFA (1)

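Thompson Construction
For each kind of rexp, define an NFA
Notation: NFA for rexp A
[Figure: box labeled A]

• For ε
[Figure: a single ε-transition from the start state to the accepting state]

• For input a
[Figure: a single transition labeled a from the start state to the accepting state]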


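Regular Expressions to NFA (2)

• For AB
[Figure: the NFA for A followed by the NFA for B]

• For A | B
[Figure: the NFAs for A and B placed in parallel between a common start state and a common accepting state]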


Regular Expressions to NFA (3)

For A*

[Figure: the NFA for A with additional moves allowing zero or more repetitions]


Relationship between NFAs and DFAs

DFA is a special case of an NFA

DFA has no ε-transitions

DFA’s transition function is single-valued

Same rules will work

DFA can be simulated with an NFA

Obviously

NFA can be simulated with a DFA (less obvious)

Simulate sets of possible states

Possible exponential blowup in the state space

Still, one state per character in the input stream

Rabin & Scott, 1959


Automating Scanner Construction

To convert a specification into code:

1 Write down the RE for the input language

2 Build a big NFA

3 Build the DFA that simulates the NFA

4 Systematically shrink the DFA

5 Turn it into code

Scanner generators

Lex and Flex work along these lines

Algorithms are well-known and well-understood

Key issue is interface to parser (define all parts of speech)

You could build one in a weekend!


Where are we? Why are we doing this?

RE -> NFA (Thompson’s construction)
Build an NFA for each term
Combine them with ε-moves

NFA -> DFA (Subset construction)
Build the simulation

DFA -> Minimal DFA
Hopcroft’s algorithm

DFA -> RE
All pairs, all paths problem
Union together paths from s0 to a final state

The Cycle of Constructions: RE -> NFA -> DFA -> minimal DFA, with DFA -> RE closing the cycle


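RE -> NFA using Thompson’s Construction

Key idea
NFA pattern for each symbol & each operator
Join them with ε-moves in precedence order

[Figure: NFA for a: S0 --a--> S1]
[Figure: NFA for ab: S0 --a--> S1 and S3 --b--> S4, joined by an ε-move]
[Figure: NFA for a | b: states S0 to S5, with S1 --a--> S2 and S3 --b--> S4 joined by ε-moves]
[Figure: NFA for a*: states S0, S1, S3 and S4, with one transition labeled a and the rest ε-moves]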

Ken Thompson, CACM, 1968
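The four patterns can be sketched as a small builder in which every operator returns a (start, final) pair of state numbers. This is a minimal illustration, not the slide's own code: the class name Builder, the use of None as the ε label, and the state numbering (which differs from S0, S1, ... in the figures) are all assumptions of the sketch.

```python
from itertools import count

EPS = None  # label used for ε-moves in this sketch


class Builder:
    def __init__(self):
        self.ids = count()
        self.edges = []                      # (source, label, target) triples

    def new_state(self):
        return next(self.ids)

    def symbol(self, a):                     # NFA for a single symbol
        s, t = self.new_state(), self.new_state()
        self.edges.append((s, a, t))
        return s, t

    def concat(self, x, y):                  # NFA for xy
        self.edges.append((x[1], EPS, y[0]))
        return x[0], y[1]

    def union(self, x, y):                   # NFA for x | y
        s, t = self.new_state(), self.new_state()
        self.edges += [(s, EPS, x[0]), (s, EPS, y[0]), (x[1], EPS, t), (y[1], EPS, t)]
        return s, t

    def star(self, x):                       # NFA for x*
        s, t = self.new_state(), self.new_state()
        self.edges += [(s, EPS, x[0]), (x[1], EPS, t), (x[1], EPS, x[0]), (s, EPS, t)]
        return s, t


def accepts(builder, nfa, text):
    """Simulate the built NFA by tracking sets of states."""
    start, final = nfa

    def closure(states):
        stack, seen = list(states), set(states)
        while stack:
            s = stack.pop()
            for u, label, v in builder.edges:
                if u == s and label is EPS and v not in seen:
                    seen.add(v)
                    stack.append(v)
        return seen

    current = closure({start})
    for ch in text:
        current = closure({v for u, label, v in builder.edges
                           if u in current and label == ch})
    return final in current


# a ( b | c )*
builder = Builder()
nfa = builder.concat(
    builder.symbol("a"),
    builder.star(builder.union(builder.symbol("b"), builder.symbol("c"))),
)
```

For a ( b | c )* the builder allocates ten states, two per symbol and two per union or star operator.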


Example of Thompson’s Construction

Let’s try a ( b | c )*

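1. a, b, & c
[Figure: three two-state NFAs: S0 --a--> S1, S0 --b--> S1, S0 --c--> S1]

2. b | c
[Figure: states S0 to S5, with S1 --b--> S2 and S3 --c--> S4 joined by ε-moves]

3. ( b | c )*
[Figure: states S0 to S7, with S2 --b--> S3 and S4 --c--> S5 joined by ε-moves]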


Example of Thompson’s Construction (cont’d)

4. a ( b | c )*
[Figure: states S0 to S9, with S0 --a--> S1, S4 --b--> S5 and S6 --c--> S7 joined by ε-moves]

Of course, a human would design something simpler ...
[Figure: S0 --a--> S1, with a loop on S1 labeled b | c]

But, we can automate production of the more complex NFA version ...



Example of RegExp -> NFA conversion

Consider the regular expression

(1 | 0)*1

The NFA is

[Figure: NFA with states A to J, with C --1--> E, D --0--> F and I --1--> J joined by ε-moves]


Next

Lexical Specification -> Regular expressions -> NFA -> DFA -> Table-driven Implementation of DFA


Constructing Efficient Finite Automata

First we’ll see how to transform an NFA into a DFA. Then we’ll see how to transform a DFA into a minimum-state DFA.

Transforming an NFA into a DFA

The λ-closure of a state s, denoted λ(s), is the set consisting of s together with all states that can be reached from s by traversing Λ-edges. The λ-closure of a set S of states, denoted λ(S), is the union of the λ-closures of the states in S.

Example. Given the following NFA as a graph and as a transition table.

[Figure: NFA graph with start state 0 and states 1 and 2, with edges labeled a, b and Λ]

Some sample λ-closures for the NFA are as follows:

λ(0) = {0, 1, 2}
λ(1) = {1, 2}
λ(2) = {2}
λ(∅) = ∅
λ({1, 2}) = {1, 2}
λ({0, 1, 2}) = {0, 1, 2}.
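The closure computation is a plain graph reachability search over the Λ-edges. In the sketch below, the dictionary LAMBDA_EDGES holds only the Λ column of the example's table (0 to 1 and 1 to 2), which is what the sample closures above imply; the function and variable names are illustrative.

```python
def lambda_closure(lambda_edges, states):
    """Return all states reachable from `states` using only Λ-edges."""
    closure, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for t in lambda_edges.get(s, ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure


# Λ-edges of the example NFA: 0 --Λ--> 1 and 1 --Λ--> 2.
LAMBDA_EDGES = {0: {1}, 1: {2}}
```

Because the closure of a set is computed by seeding the search with every member, it equals the union of the members' closures, as in the definition.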

S

F

TN a b Λ

0 {1, 2} {1}

1 {1, 2} {2}

2


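NFA transition table TN for the example on this slide (S = start state, F = final state):

       TN   a     b        Λ
S      0    ∅     {0, 1}   {3}
       1    {2}   ∅        ∅
F      2    ∅     {2}      ∅
F      3    {3}   ∅        ∅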

Algorithm: Transform an NFA into a DFA

Construct a DFA table TD from an NFA table TN as follows:

1. The start state of the DFA is λ(s), where s is the start state of the NFA.
2. If {s1, …, sn} is a DFA state and a ∈ A, then
   TD({s1, …, sn}, a) = λ(TN(s1, a) ∪ … ∪ TN(sn, a)).
3. A DFA state is final if one of its elements is an NFA final state.
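The three steps can be sketched as a worklist algorithm over sets of NFA states. The encoding below is an assumption of the sketch: TN is a dictionary from (state, symbol) to a set of states, Λ-edges use the symbol "Λ", and EXAMPLE_TN is a reading of the example NFA's table on this slide (start state 0, final states 2 and 3).

```python
LAMBDA = "Λ"


def lambda_closure(tn, states):
    closure, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for t in tn.get((s, LAMBDA), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure


def nfa_to_dfa(tn, start, nfa_finals, alphabet):
    """Return (start_state, TD, finals); DFA states are frozensets of NFA states."""
    start_state = frozenset(lambda_closure(tn, {start}))      # step 1
    td, finals = {}, set()
    seen, worklist = {start_state}, [start_state]
    while worklist:
        current = worklist.pop()
        if current & nfa_finals:                              # step 3
            finals.add(current)
        for a in alphabet:                                    # step 2
            moved = set()
            for s in current:
                moved |= tn.get((s, a), set())
            target = frozenset(lambda_closure(tn, moved))
            td[(current, a)] = target
            if target not in seen:
                seen.add(target)
                worklist.append(target)
    return start_state, td, finals


EXAMPLE_TN = {
    (0, "b"): {0, 1}, (0, LAMBDA): {3},
    (1, "a"): {2},
    (2, "b"): {2},
    (3, "a"): {3},
}
dfa_start, dfa_td, dfa_finals = nfa_to_dfa(EXAMPLE_TN, 0, {2, 3}, "ab")
```

The empty set shows up as an ordinary DFA state (the dead state), which is why the renumbered table in the example has six rows.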

Example. Given the following NFA.

[Figure: NFA graph with start state 0 and states 1, 2 and 3, with edges labeled a, b and Λ]

The algorithm constructs the following DFA transition table TD, where it is also written in simplified form after a renumbering of the states.

        TD          a        b
S, F    {0, 3}      {3}      {0, 1, 3}
F       {3}         {3}      ∅
F       {0, 1, 3}   {2, 3}   {0, 1, 3}
F       {2, 3}      {3}      {2}
F       {2}         ∅        {2}

        TD   a   b
S, F    0    1   2
F       1    1   5
F       2    3   2
F       3    1   4
F       4    5   4
        5    5   5


Quiz. Use the algorithm to transform the following NFA into a DFA.

[Figure: NFA graph with start state 0 and states 1, 2 and 3, with edges labeled a, b and Λ]

     TN   a        b     Λ
S    0    {1}      ∅     {3}
     1    ∅        {2}   ∅
F    2    ∅        ∅     ∅
     3    {2, 3}   ∅     ∅

Solution: The algorithm constructs the following DFA transition table TD, where it is also written in simplified form after a renumbering of the states.

     TD          a           b
S    {0, 3}      {1, 2, 3}   ∅
F    {1, 2, 3}   {2, 3}      {2}
F    {2, 3}      {2, 3}      ∅
F    {2}         ∅           ∅

     TD   a   b
S    0    1   4
F    1    2   3
F    2    2   4
F    3    4   4
     4    4   4


Transforming a DFA into a minimum-state DFA

Let S be the set of states that can be reached from the start state of a DFA over A. For states s, t ∈ S let s ~ t mean that for all strings w ∈ A* either T(s, w) and T(t, w) are both final or both nonfinal. Observe that ~ is an equivalence relation on S. So it partitions S into equivalence classes.

Observe also that the number of equivalence classes is the minimum number of states needed by a DFA to recognize the language of the given DFA.

Algorithm: Transform a DFA to a minimum-state DFA

1. Construct the following sequence of sets of possible equivalent pairs of distinct states:
   E0 ⊇ E1 ⊇ … ⊇ Ek = Ek+1,
   where
   E0 = {{s, t} | s and t are either both final or both nonfinal}
   and
   Ei+1 = {{s, t} ∈ Ei | {T(s, a), T(t, a)} ∈ Ei or T(s, a) = T(t, a) for every a ∈ A}.
   Ek represents the distinct pairs of equivalent states from which ~ can be generated.
2. The equivalence classes form the states of the minimum state DFA with transition table Tmin defined by
   Tmin([s], a) = [T(s, a)].
3. The start state is the class containing the start state of the given DFA.
4. A final state is any class containing a final state of the given DFA.
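Step 1 of the algorithm can be sketched as a loop that keeps shrinking the set of candidate pairs until it stops changing. The encoding is an assumption of the sketch: T is a dictionary from (state, symbol) to state, pairs are frozensets, and EXAMPLE_T is the five-state DFA of the example that follows (start state 0, final states 1, 2 and 3).

```python
from itertools import combinations


def equivalent_pairs(states, alphabet, t, finals):
    """Compute Ek, the set of pairs {s, t} of distinct equivalent states."""
    e = {frozenset(p) for p in combinations(states, 2)
         if (p[0] in finals) == (p[1] in finals)}               # E0
    while True:
        nxt = set()
        for pair in e:
            images = [frozenset(t[(s, a)] for s in pair) for a in alphabet]
            # Keep the pair if every symbol leads to the same state or to a pair still in e.
            if all(len(img) == 1 or img in e for img in images):
                nxt.add(pair)
        if nxt == e:
            return e
        e = nxt


def classes(states, pairs):
    """Group states into equivalence classes using the equivalent pairs."""
    parts = []
    for s in states:
        for part in parts:
            if frozenset({s, next(iter(part))}) in pairs:
                part.add(s)
                break
        else:
            parts.append({s})
    return parts


EXAMPLE_T = {
    (0, "a"): 1, (0, "b"): 4,
    (1, "a"): 2, (1, "b"): 3,
    (2, "a"): 3, (2, "b"): 3,
    (3, "a"): 3, (3, "b"): 3,
    (4, "a"): 4, (4, "b"): 4,
}
pairs = equivalent_pairs(range(5), "ab", EXAMPLE_T, {1, 2, 3})
partition = classes(range(5), pairs)
```

Comparing each state against a single representative of a class is enough in classes because ~ is an equivalence relation.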


Example. Use the algorithm to transform the following DFA into a minimum-state DFA.

[Figure: DFA graph with start state 0 and states 1, 2, 3 and 4, with edges labeled a and b]

     T   a   b
S    0   1   4
F    1   2   3
F    2   3   3
F    3   3   3
     4   4   4

Solution: The set of states is S = {0, 1, 2, 3, 4}. To find the equivalent states calculate:

E0 = {{0, 4}, {1, 2}, {1, 3}, {2, 3}}
E1 = {{1, 2}, {1, 3}, {2, 3}}
E2 = {{1, 2}, {1, 3}, {2, 3}} = E1.

So 1 ~ 2, 1 ~ 3, 2 ~ 3. This tells us that S is partitioned by {0}, {1, 2, 3}, {4}, which we name [0], [1], [4], respectively. So the minimum-state DFA has three states.

Min-state Table

     TMin   a     b
S    [0]    [1]   [4]
F    [1]    [1]   [1]
     [4]    [4]   [4]

Renamed Table

     TMin   a   b
S    0      1   2
F    1      1   1
     2      2   2

[Figure: Min-state DFA graph with start state 0, an edge labeled a to state 1, an edge labeled b to state 2, and loops labeled a, b on states 1 and 2]

Quiz: What regular expression equality arises from the two DFAs?

Answer: a + aa + (aaa + aab + ab)(a + b)* = a(a + b)*.


DFA Minimization


DFA

Deterministic Finite Automata (DFSA)

(Q, Σ, δ, q0, F)

Q – (finite) set of states

Σ – alphabet – (finite) set of input symbols

δ – transition function

q0 – start state

F – set of final / accepting states


DFA

Often represented as a diagram:

[Figure: DFA state diagram]


DFA Minimization

Some states can be redundant:

The following DFA accepts (a|b)+

State s1 is not necessary


DFA Minimization

So these two DFAs are equivalent:


DFA Minimization

This is a state-minimized (or just minimized) DFA

Every remaining state is necessary


DFA Minimization

The task of DFA minimization, then, is to automatically

transform a given DFA into a state-minimized DFA

Several algorithms and variants are known

Note that this can also, in effect, minimize an NFA (since we know an algorithm to convert an NFA to a DFA)


DFA Minimization Algorithm

Recall that a DFA M=(Q, Σ, δ, q0, F)

Two states p and q are distinct if

p in F and q not in F or vice versa, or

for some α in Σ, δ(p, α) and δ(q, α) are distinct

Using this inductive definition, we can calculate which

states are distinct


DFA Minimization Algorithm

Create lower-triangular table DISTINCT, initially blank

For every pair of states (p,q):
    If p is final and q is not, or vice versa
        DISTINCT(p,q) = ε

Loop until no change for an iteration:
    For every pair of states (p,q) and each symbol α
        If DISTINCT(p,q) is blank and DISTINCT( δ(p,α), δ(q,α) ) is not blank
            DISTINCT(p,q) = α

Combine all states that are not distinct
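The table-filling steps can be sketched with a dictionary standing in for the lower-triangular table, where a missing key means a blank cell. The three-state DFA below is hypothetical: it is one possible DFA for (a|b)+ with a redundant state, chosen for illustration because the transitions of the example drawn in these slides are not recoverable from the text.

```python
from itertools import combinations


def distinct_table(states, alphabet, delta, finals):
    """Return {pair: witness}, where the witness is "ε" or the distinguishing symbol."""
    pairs = [frozenset(p) for p in combinations(states, 2)]
    distinct = {}                                   # blank cell = key absent
    for pair in pairs:
        p, q = tuple(pair)
        if (p in finals) != (q in finals):
            distinct[pair] = "ε"
    changed = True
    while changed:                                  # loop until no change for an iteration
        changed = False
        for pair in pairs:
            if pair in distinct:
                continue
            p, q = tuple(pair)
            for a in alphabet:
                successors = frozenset({delta[(p, a)], delta[(q, a)]})
                if successors in distinct:          # the successor pair is already distinct
                    distinct[pair] = a
                    changed = True
                    break
    return distinct


# Hypothetical DFA for (a|b)+: s0 is the start state, s1 and s2 are both final.
DELTA = {
    ("s0", "a"): "s1", ("s0", "b"): "s2",
    ("s1", "a"): "s1", ("s1", "b"): "s2",
    ("s2", "a"): "s1", ("s2", "b"): "s2",
}
table = distinct_table(["s0", "s1", "s2"], "ab", DELTA, {"s1", "s2"})
```

Every pair that is still blank at the end, here (s1, s2), is a pair of equivalent states that can be merged.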


Very Simple Example

s0

s1

s2

s0 s1 s2


Very Simple Example

s0

s1 ε

s2 ε

s0 s1 s2

Label pairs with ε where one is a final state and the other is not


Very Simple Example

s0

s1 ε

s2 ε

s0 s1 s2

Main loop (no changes occur)


Very Simple Example

s0

s1 ε

s2 ε

s0 s1 s2

DISTINCT(s1, s2) is empty, so s1 and s2 are equivalent states


Very Simple Example

Merge s1 and s2


More Complex Example


More Complex Example

Check for pairs with one state final and one not:


More Complex Example

First iteration of main loop:


More Complex Example

Second iteration of main loop:


More Complex Example

Third iteration makes no changes

Blank cells are equivalent pairs of states


More Complex Example

Combine equivalent states for minimized DFA:


Conclusion

DFA Minimization is a fairly understandable process, and

is useful in several areas

Regular expression matching implementation

Very similar algorithm is used for compiler optimization to

eliminate duplicate computations

The algorithm described is O(kn²)

John Hopcroft describes another, more complex algorithm that is O(k n log n)


Parse Trees

Definitions

Relationship to Left- and Rightmost Derivations

Ambiguity in Grammars


Parse Trees

Parse trees are trees labeled by symbols of a particular

CFG.

Leaves: labeled by a terminal or ε.

Interior nodes: labeled by a variable.

Children are labeled by the right side of a production for

the parent.

Root: must be labeled by the start symbol.


Example: Parse Tree

S -> SS | (S) | ()

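[Figure: parse tree whose root S has two children labeled S; the left child expands to ( S ) with the inner S expanding to ( ), and the right child expands to ( )]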


Yield of a Parse Tree

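The concatenation of the labels of the leaves in left-to-right order (that is, in the order of a preorder traversal) is called the yield of the parse tree.

Example: the yield of the following parse tree is (())()

[Figure: parse tree whose root S has two children labeled S; the left child expands to ( S ) with the inner S expanding to ( ), and the right child expands to ( )]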
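The yield can be computed with a short recursive traversal. The Node class and helper names below are assumptions made for illustration; the tree built at the end is the example tree for the grammar S -> SS | (S) | ().

```python
class Node:
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)


def tree_yield(node):
    """Concatenate the leaf labels in left-to-right order; ε contributes nothing."""
    if not node.children:
        return "" if node.label == "ε" else node.label
    return "".join(tree_yield(child) for child in node.children)


def unit():
    """Subtree for the production S -> ()."""
    return Node("S", [Node("("), Node(")")])


# Root S -> S S; left S -> ( S ) with the inner S -> ( ); right S -> ( ).
TREE = Node("S", [Node("S", [Node("("), unit(), Node(")")]), unit()])
```

Visiting children from left to right is what makes the leaves come out in the order of a preorder traversal.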


Parse Trees, Left- and Rightmost Derivations

For every parse tree, there is a unique leftmost, and a unique rightmost derivation.

We’ll prove:

1. If there is a parse tree with root labeled A and yield w, then A =>*lm w.
2. If A =>*lm w, then there is a parse tree with root A and yield w.


Proof: Part 2

Given a leftmost derivation of a terminal string, we need

to prove the existence of a parse tree.

The proof is an induction on the length of the derivation.


Part 2 – Basis

If A =>*lm a1…an by a one-step derivation, then there

must be a parse tree

[Figure: parse tree with root A and children a1 . . . an]


Part 2 – Induction

Assume (2) for derivations of fewer than k > 1 steps,

and let A =>*lm w be a k-step derivation.

First step is A =>lm X1…Xn.

Key point: w can be divided so the first portion is

derived from X1, the next is derived from X2, and so

on.

If Xi is a terminal, then wi = Xi.


Induction – (2)

That is, Xi =>*lm wi for all i such that Xi is a variable.

And the derivation takes fewer than k steps.

By the IH, if Xi is a variable, then there is a parse tree

with root Xi and yield wi.

Thus, there is a parse tree

[Figure: parse tree with root A and children X1 . . . Xn, where each Xi is the root of a subtree with yield wi]


Parse Trees and Rightmost Derivations

The ideas are essentially the mirror image of the

proof for leftmost derivations.

Left to the imagination.


Parse Trees and Any Derivation

The proof that you can obtain a parse tree from a

leftmost derivation doesn’t really depend on

“leftmost.”

First step still has to be A => X1…Xn.

And w still can be divided so the first portion is

derived from X1, the next is derived from X2, and so

on.


Ambiguous Grammars

A CFG is ambiguous if there is a string in the language

that is the yield of two or more parse trees.

Example: S -> SS | (S) | ()

Two parse trees for ()()() on next slide.
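Ambiguity of this grammar can be checked by counting parse trees directly. The sketch below is specific to the grammar S -> SS | (S) | () and is not a general ambiguity test; the function name count_parse_trees is illustrative.

```python
from functools import lru_cache


def count_parse_trees(text):
    """Count the parse trees of text under S -> SS | (S) | ()."""

    @lru_cache(maxsize=None)
    def count(i, j):                  # number of trees deriving text[i:j] from S
        n = 0
        if text[i:j] == "()":
            n += 1                                        # S -> ()
        if j - i > 2 and text[i] == "(" and text[j - 1] == ")":
            n += count(i + 1, j - 1)                      # S -> (S)
        for k in range(i + 1, j):
            n += count(i, k) * count(k, j)                # S -> SS, split at k
        return n

    return count(0, len(text))
```

A string with more than one parse tree, such as ()()(), witnesses the ambiguity, while (())() has exactly one tree under this grammar.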


Example – Continued

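[Figure: two parse trees for ()()(). In the first, the root's left child derives ()() and its right child derives (). In the second, the root's left child derives () and its right child derives ()().]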


Ambiguity, Left- and Rightmost Derivations

If there are two different parse trees, they must

produce two different leftmost derivations by the

construction given in the proof.

Conversely, two different leftmost derivations produce

different parse trees by the other part of the proof.

Likewise for rightmost derivations.


Ambiguity, etc. – (2)

Thus, equivalent definitions of “ambiguous grammar” are:

1. There is a string in the language that has two different

leftmost derivations.

2. There is a string in the language that has two different

rightmost derivations.


Ambiguity is a Property of Grammars, not Languages

For the balanced-parentheses language, here is another CFG, which is unambiguous.

B -> (RB | ε
R -> ) | (RR

B, the start symbol, derives balanced strings.
R generates strings that have one more right paren than left.


Example: Unambiguous Grammar

B -> (RB | ε
R -> ) | (RR

Construct a unique leftmost derivation for a given balanced string of parentheses by scanning the string from left to right.

If we need to expand B, then use B -> (RB if the next symbol is “(” and ε if at the end.
If we need to expand R, use R -> ) if the next symbol is “)” and (RR if it is “(”.
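The two expansion rules can be sketched as a loop that always expands the leftmost variable and looks only at the next input symbol. The function name and the list-of-strings result are assumptions of the sketch. Each sentential form is the consumed terminals followed by the pending variables, which works here because every production starts with its terminal.

```python
def leftmost_derivation(text):
    """Return the sentential forms of the leftmost derivation of text from B."""
    done, pending = "", ["B"]       # consumed terminals, variables still to expand
    steps, pos = ["B"], 0
    while pending:
        var = pending.pop(0)        # leftmost variable
        nxt = text[pos] if pos < len(text) else None
        if var == "B" and nxt == "(":
            done += "("
            pos += 1
            pending[:0] = ["R", "B"]                 # B -> (RB
        elif var == "B" and nxt is None:
            pass                                     # B -> ε at the end of the input
        elif var == "R" and nxt == ")":
            done += ")"
            pos += 1                                 # R -> )
        elif var == "R" and nxt == "(":
            done += "("
            pos += 1
            pending[:0] = ["R", "R"]                 # R -> (RR
        else:
            raise ValueError(f"input is not balanced at position {pos}")
        steps.append(done + "".join(pending))
    return steps
```

Since the next symbol always selects exactly one production, each balanced string gets exactly one leftmost derivation, which is why the grammar is unambiguous.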


The Parsing Process

B -> (RB | ε

R -> ) | (RR

Input string: (())()

Each row shows the input still to be read, the step of the leftmost derivation reached so far, and the production selected by looking at the next symbol.

Remaining input    Leftmost derivation step    Production used next
(())()             B                           B -> (RB
())()              (RB                         R -> (RR
))()               ((RRB                       R -> )
)()                (()RB                       R -> )
()                 (())B                       B -> (RB
)                  (())(RB                     R -> )
(empty)            (())()B                     B -> ε
(empty)            (())()                      (done)


LL(1) Grammars

As an aside, a grammar such as

B -> (RB | ε

R -> ) | (RR

where you can always determine the production to use in a leftmost derivation by scanning the given string left-to-right and looking only at the next symbol, is called LL(1).

“Leftmost derivation, left-to-right scan, one symbol of

lookahead.”
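One standard way to test this property is to build the predictive parsing table from FIRST and FOLLOW sets and look for a cell that would receive two productions. The sketch below assumes that textbook construction; the names `first_of`, `compute_first`, `compute_follow`, `ll1_table`, and the end marker "$" are choices made for this example.

```python
EPS = ""    # stands for ε
END = "$"   # end-of-input marker

GRAMMAR = {
    "B": [["(", "R", "B"], []],       # B -> (RB | ε
    "R": [[")"], ["(", "R", "R"]],    # R -> ) | (RR
}


def first_of(seq, first):
    """FIRST of a symbol sequence; includes EPS if all of it can vanish."""
    out = set()
    for sym in seq:
        f = first[sym] if sym in first else {sym}  # a terminal is its own FIRST
        out |= f - {EPS}
        if EPS not in f:
            return out
    out.add(EPS)
    return out


def compute_first(grammar):
    first = {v: set() for v in grammar}
    changed = True
    while changed:                     # iterate until nothing grows
        changed = False
        for v, bodies in grammar.items():
            for body in bodies:
                new = first_of(body, first)
                if not new <= first[v]:
                    first[v] |= new
                    changed = True
    return first


def compute_follow(grammar, start, first):
    follow = {v: set() for v in grammar}
    follow[start].add(END)
    changed = True
    while changed:
        changed = False
        for v, bodies in grammar.items():
            for body in bodies:
                for i, sym in enumerate(body):
                    if sym not in grammar:
                        continue       # only variables have FOLLOW sets
                    tail = first_of(body[i + 1:], first)
                    new = tail - {EPS}
                    if EPS in tail:
                        new |= follow[v]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


def ll1_table(grammar, start):
    """Return {(variable, lookahead): body}, or None on a conflict."""
    first = compute_first(grammar)
    follow = compute_follow(grammar, start, first)
    table = {}
    for v, bodies in grammar.items():
        for body in bodies:
            f = first_of(body, first)
            lookaheads = f - {EPS}
            if EPS in f:
                lookaheads |= follow[v]
            for a in lookaheads:
                if (v, a) in table:
                    return None        # two productions for one cell: not LL(1)
                table[(v, a)] = body
    return table
```

For `GRAMMAR`, `ll1_table(GRAMMAR, "B")` fills four cells without conflict, and the entry for ("B", "$") is the ε-production, matching the "ε if at the end" rule used earlier.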


LL(1) Grammars – (2)

Many programming languages have grammars that are LL(1), or that can be rewritten into LL(1) form.

LL(1) grammars are never ambiguous.


References

Aho, Alfred V., Sethi, Ravi and Ullman, Jeffrey D. Compilers: principles, techniques, and tools. (1986). Reading: Addison-Wesley.

Peters, James, Pittman, Thomas. The art of compiler design: theory and practice. (1992). Englewood Cliffs: Prentice Hall.

Watson, Des. High-level languages and their compilers. (1989). Wokingham: Addison-Wesley.


Aho, A. V., Hopcroft, J. E. and Ullman, J. D. (1974) The Design and Analysis of Computer Algorithms. Addison-Wesley.

Hopcroft, J. (1971) An N Log N Algorithm for Minimizing States in a Finite Automaton. Stanford University.

Parthasarathy, M. and Fleck, M. (2007) DFA Minimization. University of Illinois at Urbana-Champaign. http://www.cs.uiuc.edu/class/fa07/cs273/Handouts/minimization/minimization.pdf


Heng, Christopher. Free Compiler Construction Tools. http://www.thefreecountry.com/programming/compilercontructiontools

The Lex & Yacc Page. http://dinosaur.compilertools.net

Compiler Construction Kits. http://catalog.compilertools.net

The Cocktail Compiler Toolbox. http://www.first.gmd.de/cocktail/


Prepared by

www.gameofcompilers.weebly.com


Instructor : Mr. R. Rajkumar, Assistant Professor | CSE

Staff room: TP-612, Tech park,

SRM Institute of Science and Technology,

Kattankulathur, India.