24
Compiler Design Lexical Analysis Design of a Lexical - Analyzer Generator conf. dr. ing. Ciprian-Bogdan Chirila [email protected] http://www.cs.upt.ro/~chirila

Compiler Design Lexical Analysis - staff.cs.upt.rostaff.cs.upt.ro/~chirila/teaching/upt/mse11-cd/lectures/cd0308.pdf · Lexical Analyzer fixed program that simulates an automaton

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Compiler Design Lexical Analysis - staff.cs.upt.rostaff.cs.upt.ro/~chirila/teaching/upt/mse11-cd/lectures/cd0308.pdf · Lexical Analyzer fixed program that simulates an automaton

Compiler Design

Lexical Analysis

Design of a Lexical-Analyzer

Generator

conf. dr. ing. Ciprian-Bogdan Chirila

[email protected]

http://www.cs.upt.ro/~chirila

Page 2: Compiler Design Lexical Analysis - staff.cs.upt.rostaff.cs.upt.ro/~chirila/teaching/upt/mse11-cd/lectures/cd0308.pdf · Lexical Analyzer fixed program that simulates an automaton

Outline

The Structure of the Generated Analyzer

Pattern Matching Based on NFA’s

DFA’s for Lexical Analyzers

Implementing the Lookahead Operator

Page 3: Compiler Design Lexical Analysis - staff.cs.upt.rostaff.cs.upt.ro/~chirila/teaching/upt/mse11-cd/lectures/cd0308.pdf · Lexical Analyzer fixed program that simulates an automaton

Objectives

to present the architecture of Lex

to discuss two approaches

◦ NFA based

◦ DFA based

implementation of Lex

Page 4: Compiler Design Lexical Analysis - staff.cs.upt.rostaff.cs.upt.ro/~chirila/teaching/upt/mse11-cd/lectures/cd0308.pdf · Lexical Analyzer fixed program that simulates an automaton

The Structure of the Generated

Lexical Analyzer fixed program that simulates an automaton

◦ deterministic

◦ nondeterministic

transition table for the automaton

functions that are passed directly through Lex to the output (we will see next)

actions from the input program

◦ as fragments of code

◦ to be invoked at the appropriate time by the automaton simulator

Page 5: Compiler Design Lexical Analysis - staff.cs.upt.rostaff.cs.upt.ro/~chirila/teaching/upt/mse11-cd/lectures/cd0308.pdf · Lexical Analyzer fixed program that simulates an automaton

Architecture of a Lexical Analyzer

Generated by Lex

Page 6: Compiler Design Lexical Analysis - staff.cs.upt.rostaff.cs.upt.ro/~chirila/teaching/upt/mse11-cd/lectures/cd0308.pdf · Lexical Analyzer fixed program that simulates an automaton

The Generation Process

each regular expression pattern is

transformed into NFA

all NFAs are combined into one

◦ new ε-transitions are added to NFAs Ni for

pattern pi

Page 7: Compiler Design Lexical Analysis - staff.cs.upt.rostaff.cs.upt.ro/~chirila/teaching/upt/mse11-cd/lectures/cd0308.pdf · Lexical Analyzer fixed program that simulates an automaton

Example

Page 8: Compiler Design Lexical Analysis - staff.cs.upt.rostaff.cs.upt.ro/~chirila/teaching/upt/mse11-cd/lectures/cd0308.pdf · Lexical Analyzer fixed program that simulates an automaton

Example

patterns

◦ a {action A1 for pattern p1}

◦ abb {action A2 for pattern p2}

◦ a*b+ {action A3 for pattern p3}

when several prefixes on the input matches

multiple patterns

◦ always prefer a longer prefix to a shorter prefix

◦ if the longest possible prefix matches multiple

patterns choose the pattern listed first

the lexeme “abb” is taken by the second rule

Page 9: Compiler Design Lexical Analysis - staff.cs.upt.rostaff.cs.upt.ro/~chirila/teaching/upt/mse11-cd/lectures/cd0308.pdf · Lexical Analyzer fixed program that simulates an automaton

Conflict Resolution

the three patterns present some conflicts

abb matches p2 and p3

◦ we consider it a lexeme for p2

◦ p2 is listed above p3

aabbbb…

◦ we take the longest lexeme until another a is

reached

◦ we will report the lexeme from the initial a

followed by as many b as there are

Page 10: Compiler Design Lexical Analysis - staff.cs.upt.rostaff.cs.upt.ro/~chirila/teaching/upt/mse11-cd/lectures/cd0308.pdf · Lexical Analyzer fixed program that simulates an automaton

Example

Page 11: Compiler Design Lexical Analysis - staff.cs.upt.rostaff.cs.upt.ro/~chirila/teaching/upt/mse11-cd/lectures/cd0308.pdf · Lexical Analyzer fixed program that simulates an automaton

Pattern Matching Based on NFA’s

NFA simulation algorithmS=ε-closure(s0);

c=nextChar();

while(c!=eof)

{

S=ε-enclosure(move(S,c));

c=nextChar();

}

if(S∩F!=ø) return “yes”;

else return “no”;

Page 12: Compiler Design Lexical Analysis - staff.cs.upt.rostaff.cs.upt.ro/~chirila/teaching/upt/mse11-cd/lectures/cd0308.pdf · Lexical Analyzer fixed program that simulates an automaton

Example input a a b a

Page 13: Compiler Design Lexical Analysis - staff.cs.upt.rostaff.cs.upt.ro/~chirila/teaching/upt/mse11-cd/lectures/cd0308.pdf · Lexical Analyzer fixed program that simulates an automaton

Example input a a b a

Page 14: Compiler Design Lexical Analysis - staff.cs.upt.rostaff.cs.upt.ro/~chirila/teaching/upt/mse11-cd/lectures/cd0308.pdf · Lexical Analyzer fixed program that simulates an automaton

Example input a a b a

Page 15: Compiler Design Lexical Analysis - staff.cs.upt.rostaff.cs.upt.ro/~chirila/teaching/upt/mse11-cd/lectures/cd0308.pdf · Lexical Analyzer fixed program that simulates an automaton

Example input a a b a

pattern a*b+ was found !!!

Page 16: Compiler Design Lexical Analysis - staff.cs.upt.rostaff.cs.upt.ro/~chirila/teaching/upt/mse11-cd/lectures/cd0308.pdf · Lexical Analyzer fixed program that simulates an automaton

DFAs Architecture for Lexical

Analyzers to convert NFA for all patterns into DFA

◦ by using the subset construction algorithm

within each DFA state having one or

more NFA accepting states

◦ to determine the first pattern whose

accepting state is represented

◦ to make that pattern the output of the DFA

state

Page 17: Compiler Design Lexical Analysis - staff.cs.upt.rostaff.cs.upt.ro/~chirila/teaching/upt/mse11-cd/lectures/cd0308.pdf · Lexical Analyzer fixed program that simulates an automaton

The Subset Construction Algorithm

while(there is an unmarked state T in Dstates)

{

mark T;

for(each input symbol a)

{

U=ε-closure(move(T,a));

if (U is not in Dstates)

add U as unmarked state to Dstates;

Dtran[T,a]=U;

}

}

Page 18: Compiler Design Lexical Analysis - staff.cs.upt.rostaff.cs.upt.ro/~chirila/teaching/upt/mse11-cd/lectures/cd0308.pdf · Lexical Analyzer fixed program that simulates an automaton

NFA Example

Page 19: Compiler Design Lexical Analysis - staff.cs.upt.rostaff.cs.upt.ro/~chirila/teaching/upt/mse11-cd/lectures/cd0308.pdf · Lexical Analyzer fixed program that simulates an automaton

NFA to DFA Example

Page 20: Compiler Design Lexical Analysis - staff.cs.upt.rostaff.cs.upt.ro/~chirila/teaching/upt/mse11-cd/lectures/cd0308.pdf · Lexical Analyzer fixed program that simulates an automaton

DFA Simulation Example a b b a

Page 21: Compiler Design Lexical Analysis - staff.cs.upt.rostaff.cs.upt.ro/~chirila/teaching/upt/mse11-cd/lectures/cd0308.pdf · Lexical Analyzer fixed program that simulates an automaton

DFA Simulation Example a b b a

Page 22: Compiler Design Lexical Analysis - staff.cs.upt.rostaff.cs.upt.ro/~chirila/teaching/upt/mse11-cd/lectures/cd0308.pdf · Lexical Analyzer fixed program that simulates an automaton

DFA Simulation Example a b b a

Page 23: Compiler Design Lexical Analysis - staff.cs.upt.rostaff.cs.upt.ro/~chirila/teaching/upt/mse11-cd/lectures/cd0308.pdf · Lexical Analyzer fixed program that simulates an automaton

Dead States in DFA’s

the automaton not quite a DFA

◦ no transitions on every state x every input

we have omitted

◦ transitions to the dead state Ø

◦ from the dead state Ø to itself

Page 24: Compiler Design Lexical Analysis - staff.cs.upt.rostaff.cs.upt.ro/~chirila/teaching/upt/mse11-cd/lectures/cd0308.pdf · Lexical Analyzer fixed program that simulates an automaton

Bibliography

Alfred V. Aho, Monica S. Lam, Ravi Sethi,

Jeffrey D. Ullman – Compilers, Principles,

Techniques and Tools, Second Edition,

2007