Lexical Analysis (2 Lectures)

Lexical Analysis (2 Lectures). CSE244 Compilers 2 Overview Basic Concepts Regular Expressions –Language Lexical analysis by hand Regular Languages Tools



Overview

• Basic Concepts

• Regular Expressions
  – Language

• Lexical analysis by hand

• Regular Languages Tools
  – NFA
  – DFA

• Scanning tools
  – Lex / Flex / JFlex / ANTLR


Scanning Perspective

• Purpose
  – Transform a stream of symbols
  – into a stream of tokens


Lexical Analyzer Responsibilities

• Lexical analyzer [Scanner]
  – Scan input
  – Remove white space
  – Remove comments
  – Manufacture tokens
  – Generate lexical errors
  – Pass tokens to the parser


Modular design

• Rationale
  – Separate the two analyses (lexical and syntactic)
    • High cohesion / Low coupling
  – Improve efficiency
  – Improve portability / maintainability
  – Enable integration of third-party lexers
    • [lexer = lexical analysis tool]


Terminology

• Token
  – A classification for a common set of strings
  – Examples: Identifier, Integer, Float, Assign, LeftParen, RightParen, ...

• Pattern
  – The rules that characterize the set of strings for a token
  – Example: [0-9]+

• Lexeme
  – Actual sequence of characters that matches a pattern and has a given token class
  – Examples:
    • Identifier: Name, Data, x
    • Integer: 345, 2, 0, 629, ...
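To make the token / pattern / lexeme distinction concrete, here is a minimal Java sketch that pairs two token classes with java.util.regex patterns and classifies a lexeme. The class name TokenClassifier and the exact patterns are illustrative assumptions, not part of the lecture.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical classifier: each token class is paired with a pattern, and a
// lexeme is a concrete string that matches one of those patterns.
final class TokenClassifier {
    // Token class -> pattern, checked in insertion order.
    private static final Map<String, Pattern> PATTERNS = new LinkedHashMap<>();
    static {
        PATTERNS.put("Integer", Pattern.compile("[0-9]+"));
        PATTERNS.put("Identifier", Pattern.compile("[A-Za-z][A-Za-z0-9]*"));
    }

    // Returns the token class whose pattern matches the whole lexeme,
    // or null if no pattern matches.
    static String classify(String lexeme) {
        for (Map.Entry<String, Pattern> e : PATTERNS.entrySet()) {
            if (e.getValue().matcher(lexeme).matches()) {
                return e.getKey();
            }
        }
        return null;
    }
}
```

For example, classify("345") yields "Integer", classify("Data") yields "Identifier", and classify("3x") yields null because no pattern matches the whole string. Checking patterns in a fixed order starts to matter once two patterns can match the same lexeme.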



Lexical Errors

• Error handling is very localized with respect to the input source

• Example: fi(a==f(x)) … generates no lexical error in C
  – fi is a valid identifier, so the scanner cannot tell that it is a misspelled if; any error is left to later phases

• In what situations do errors occur?
  – A prefix of the remaining input does not match any defined token

• Possible error recovery actions:
  – Deleting or inserting input characters
  – Replacing or transposing characters
  – Or, skipping over to the next separator to ignore the problem
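The last recovery action, skipping to the next separator, can be sketched as follows. The separator set and the class name ErrorRecovery are hypothetical choices for illustration; a real scanner would derive the separators from its language's token definitions.

```java
// Hypothetical "skip to the next separator" recovery. Which characters count
// as separators is an assumption made for this sketch.
final class ErrorRecovery {
    private static final String SEPARATORS = " \t\n;,()";

    // Starting at the offending position, skip characters until a separator
    // (or the end of input) is reached; scanning resumes at the returned index.
    static int skipToSeparator(String input, int errorPos) {
        int i = errorPos;
        while (i < input.length() && SEPARATORS.indexOf(input.charAt(i)) < 0) {
            i++;
        }
        return i;
    }
}
```

For instance, skipToSeparator("x = 3#@ y;", 5) returns 7, the index of the blank after "#@", so the garbage characters are dropped as one unit.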


Basic Scanning technique

• Use 1 character of look-ahead
  – Obtain char with getc()

• Do a case analysis
  – Based on lookahead char
  – Based on current lexeme

• Outcome
  – If char can extend lexeme, all is well, go on.
  – If char cannot extend lexeme:
    • Figure out what the complete lexeme is and return its token
    • Put the lookahead back into the symbol stream
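A minimal Java sketch of this loop, assuming java.io.PushbackReader as the symbol stream: read() stands in for getc() and unread() puts the lookahead back. To keep it short, the hypothetical LookaheadScanner only groups digits into integer lexemes; every other non-blank character is its own lexeme.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical one-character-lookahead scanner for integers and single chars.
final class LookaheadScanner {
    private final PushbackReader in;

    LookaheadScanner(String text) {
        this.in = new PushbackReader(new StringReader(text));
    }

    // Returns the next lexeme, or null at end of input. Blanks are skipped.
    String nextLexeme() {
        try {
            int c = in.read();                       // getc()
            while (c == ' ') c = in.read();          // skip blanks
            if (c == -1) return null;                // end of input
            StringBuilder lexeme = new StringBuilder().append((char) c);
            if (Character.isDigit(c)) {
                int la = in.read();                  // one character of lookahead
                while (la != -1 && Character.isDigit(la)) {
                    lexeme.append((char) la);        // lookahead extends the lexeme
                    la = in.read();
                }
                if (la != -1) in.unread(la);         // cannot extend: put it back
            }
            return lexeme.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

On "12+345", successive calls return "12", "+", "345", then null: the '+' that stops the integer is pushed back and becomes the next lexeme.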


Language Concepts

Alphabet                          Language
{0,1}                             {0,10,100,1000,10000,…}
                                  {0,1,100,000,111,…}
{a,b,c}                           {abc,aabbcc,aaabbbccc,…}
{A…Z}                             {TEE,FORE,BALL,…}
                                  {FOR,WHILE,GOTO,…}
{A…Z,a…z,0…9,+,-,…,<,>,…}         {All legal PASCAL progs}
                                  {All grammatically correct English sentences}

Special languages:
  ∅ – the empty language
  {ε} – contains the empty string ε only

• A language L is simply any set of strings over a fixed alphabet.


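Formal Language Operations

• Union: L ∪ M = { s | s is in L or s is in M }
• Concatenation: LM = { st | s is in L and t is in M }
• Kleene closure: L* = L⁰ ∪ L¹ ∪ L² ∪ … (with L⁰ = {ε})
• Positive closure: L⁺ = L¹ ∪ L² ∪ L³ ∪ …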


Examples

L = {A, B, C, D}    D = {1, 2, 3}

L ∪ D = {A, B, C, D, 1, 2, 3}

LD = {A1, A2, A3, B1, B2, B3, C1, C2, C3, D1, D2, D3}

L² = {AA, AB, AC, AD, BA, BB, BC, BD, CA, …, DD}

L⁴ = L²L² = ??

L* = {all strings that can be formed from L, plus ε}

L⁺ = L* − {ε}   (since ε is not in L)

L(L ∪ D) = ??

L(L ∪ D)* = ??
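The finite cases of these operations can be computed directly. This sketch (the class name Languages is hypothetical) represents a finite language as a Set<String>; L* and L⁺ are infinite, so only the bounded power Lⁿ is computed.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper: language operations on finite sets of strings.
final class Languages {
    // L ∪ M
    static Set<String> union(Set<String> l, Set<String> m) {
        Set<String> result = new LinkedHashSet<>(l);
        result.addAll(m);
        return result;
    }

    // LM = { st | s in L, t in M }
    static Set<String> concat(Set<String> l, Set<String> m) {
        Set<String> result = new LinkedHashSet<>();
        for (String s : l) {
            for (String t : m) {
                result.add(s + t);
            }
        }
        return result;
    }

    // L^n, with L^0 = { ε } (the empty string)
    static Set<String> power(Set<String> l, int n) {
        Set<String> result = new LinkedHashSet<>();
        result.add("");
        for (int i = 0; i < n; i++) {
            result = concat(result, l);
        }
        return result;
    }
}
```

With L and D as above, concat(L, D) has 12 strings, power(L, 2) has 16, power(L, 4) has 4⁴ = 256, and concat(L, union(L, D)) has 4 × 7 = 28.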


Regular Languages

• All examples above are
  – Quite expressive
  – Simple languages

• But also...
  – Belong to a special class: regular languages

• A regular expression is a set of rules for constructing sequences of symbols (strings) from an alphabet.

• Let Σ be an alphabet and r a regular expression. Then L(r) is the language characterized by the rules of r.


Rules

• Fix an alphabet Σ

• ε is a regular expression denoting {ε}

• If a is in Σ, a is a regular expression that denotes {a}

• Let r and s be regular expressions denoting L(r) and L(s). Then
  • (a) (r) | (s) is a regular expression denoting L(r) ∪ L(s)
  • (b) (r)(s) is a regular expression denoting L(r)L(s)
  • (c) (r)* is a regular expression denoting (L(r))*
  • (d) (r) is a regular expression denoting L(r)

• All operators are left-associative.

• Precedence, from highest to lowest: *, concatenation, |

• Parentheses are dropped as allowed by precedence.


Example revisited

L = {A, B, C, D}    D = {1, 2, 3}

A | B | C | D = L

(A | B | C | D)(A | B | C | D) = L²

(A | B | C | D)* = L*

(A | B | C | D)((A | B | C | D) | (1 | 2 | 3)) = L(L ∪ D)


Algebraic Properties

AXIOM                              DESCRIPTION
r | s = s | r                      | is commutative
r | (s | t) = (r | s) | t          | is associative
(r s) t = r (s t)                  concatenation is associative
r (s | t) = r s | r t              concatenation distributes over |
(s | t) r = s r | t r
εr = rε = r                        ε is the identity element for concatenation
r* = (r | ε)*                      relation between * and ε
r** = r*                           * is idempotent


More Examples

• All strings that start with “tab” or end with “bat”:

  tab{A,…,Z,a,…,z}* | {A,…,Z,a,…,z}*bat

• All strings in which 1, 2, 3 occur in ascending order:

  {A,…,Z}*1{A,…,Z}*2{A,…,Z}*3{A,…,Z}*
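Both expressions translate directly into java.util.regex syntax, with character classes such as [A-Za-z] standing in for the {A,…,Z,a,…,z} shorthand. The class name RegexExamples is hypothetical; matcher(...).matches() tests whether the whole string belongs to the language, which is the L(r) membership question.

```java
import java.util.regex.Pattern;

// Hypothetical class holding the two example expressions as Java regexes.
final class RegexExamples {
    // Strings that start with "tab" or end with "bat".
    static final Pattern TAB_OR_BAT = Pattern.compile("tab[A-Za-z]*|[A-Za-z]*bat");
    // Strings over A-Z in which 1, 2, 3 occur in ascending order.
    static final Pattern ASCENDING_123 = Pattern.compile("[A-Z]*1[A-Z]*2[A-Z]*3[A-Z]*");

    // Whole-string match, i.e. "is s in L(p)?"
    static boolean matches(Pattern p, String s) {
        return p.matcher(s).matches();
    }
}
```

"table" and "acrobat" are in the first language, "batman" is not; "AB1C2D3" is in the second, "A3B2C1" is not.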


Tokens as R.E.



Tokens as Patterns

• Patterns are ???
• Tokens are ???

Assume the following tokens:
  if, then, else, relop, id, num

What language construct are they used for?

Given the tokens, what are the patterns?

Token    Pattern
if       if
then     then
else     else
relop    < | <= | > | >= | = | <>
id       letter ( letter | digit )*
num      digit+ ( . digit+ )? ( E ( + | - )? digit+ )?

What does this represent? What does ? mean?
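The id, num and relop patterns can be written in java.util.regex syntax as a quick check of what they accept. This sketch assumes letter = [A-Za-z] and digit = [0-9]; the literal "." must be escaped in Java's syntax, and ? means "zero or one occurrence". The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical translation of the token patterns table into Java regexes.
final class TokenPatterns {
    static final Pattern ID    = Pattern.compile("[A-Za-z][A-Za-z0-9]*");
    static final Pattern NUM   = Pattern.compile("[0-9]+(\\.[0-9]+)?(E[+-]?[0-9]+)?");
    static final Pattern RELOP = Pattern.compile("<=|>=|<>|<|>|=");

    static boolean isId(String s)    { return ID.matcher(s).matches(); }
    static boolean isNum(String s)   { return NUM.matcher(s).matches(); }
    static boolean isRelop(String s) { return RELOP.matcher(s).matches(); }
}
```

So num describes unsigned numbers with an optional fraction and an optional exponent: it accepts 6336, 3.14, 6.02E23 and 1E-5, but rejects "3." and ".5" because digit+ is required on both sides of the decimal point.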


Throw-Away Tokens

• Fact
  – Some languages define tokens as useless
  – Example: C
    • whitespace, tabulations, carriage returns, and comments can be discarded without affecting the program's meaning.

blank      ' ' (space)
tab        '\t' (^I)
newline    '\n' (^J)
delim      blank | tab | newline
ws         delim+
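The ws definition maps onto a single Java regex. In this sketch (class and method names are hypothetical) the scanner matches ws at the current position and simply moves past it without emitting a token.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: ws = delim+ with delim = blank | tab | newline.
final class WhitespaceSkipper {
    static final Pattern WS = Pattern.compile("[ \\t\\n]+");

    // Returns the index of the first character after any ws run starting at
    // pos; returns pos unchanged if no whitespace starts there.
    static int skipWs(String input, int pos) {
        Matcher m = WS.matcher(input);
        m.region(pos, input.length());
        return m.lookingAt() ? m.end() : pos;
    }
}
```

For example, skipWs("a \t\n b", 1) returns 5, the index of "b".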


Automaton

• A tool to specify a token

Transition diagram for > and >=:
  0 (start) --'>'--> 6
  6 --'='--> 7: RTN(GE)
  6 --other--> 8 *: RTN(G)

• The * on state 8: we’ve accepted “>” and have read one more character that must be unread.


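A More Complex Automaton

Transition diagram for relop:
  0 (start) --'<'--> 1
    1 --'='--> 2: return(relop, LE)
    1 --'>'--> 3: return(relop, NE)
    1 --other--> 4 *: return(relop, LT)
  0 --'='--> 5: return(relop, EQ)
  0 --'>'--> 6
    6 --'='--> 7: return(relop, GE)
    6 --other--> 8 *: return(relop, GT)

(* = retract: the lookahead character is pushed back)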


Two More...

• id:
  – 9 (start) --letter--> 10
  – 10 --letter or digit--> 10
  – 10 --other--> 11 *

• delim:
  – 28 (start) --delim--> 29
  – 29 --delim--> 29
  – 29 --other--> 30 *


What about keywords?

• Easy!
  – Use the “Identifier” token
  – After a match, look up the keyword table
    • If found, return a token for the matched keyword
    • If not, return a token for the true identifier
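A minimal sketch of the keyword-table approach: recognize an identifier with the id automaton (states 9, 10, 11), then look the lexeme up in a table. The class name KeywordScanner and the token names "IF", "THEN", "ELSE", "ID" are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical scanner fragment: id automaton followed by a keyword lookup.
final class KeywordScanner {
    private static final Map<String, String> KEYWORDS = new HashMap<>();
    static {
        KEYWORDS.put("if", "IF");
        KEYWORDS.put("then", "THEN");
        KEYWORDS.put("else", "ELSE");
    }

    // Scans an identifier starting at pos and returns its token name:
    // a keyword token if the lexeme is in the table, otherwise "ID".
    static String scanWord(String input, int pos) {
        if (!Character.isLetter(input.charAt(pos))) {
            throw new IllegalArgumentException("identifier must start with a letter");
        }
        int end = pos + 1;  // state 10: the first letter has been consumed
        while (end < input.length() && Character.isLetterOrDigit(input.charAt(end))) {
            end++;          // letter or digit: stay in state 10
        }
        // state 11 (*): the character at 'end' is not part of the lexeme
        String lexeme = input.substring(pos, end);
        String keyword = KEYWORDS.get(lexeme);
        return keyword != null ? keyword : "ID";
    }
}
```

A single automaton thus serves both cases: "if" becomes IF, while "x" and "thenx" become ID because the table lookup uses the complete lexeme.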


Yes... But how to scan?

• Remember the algorithm?
  – Acquire 1 character of lookahead
  – Case analysis based
    • On lookahead
    • On state of automaton


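Scanner code

    import java.io.IOException;
    import java.io.PushbackReader;
    import java.io.Reader;

    class Scanner {
        enum Kind { LT, LE, NE, EQ, GT, GE }      // relop attribute values

        static final class Token {
            final Kind kind;
            Token(Kind kind) { this.kind = kind; }
        }

        private final PushbackReader _in;         // input with 1 char of pushback
        private int _la;                          // the lookahead character (-1 at end of input)
        private int _state;                       // current state of the automaton

        Scanner(Reader in) { _in = new PushbackReader(in); }

        private int getChar() throws IOException { return _in.read(); }

        private void pushBack(int c) throws IOException {
            if (c != -1) _in.unread(c);           // nothing to push back at end of input
        }

        private Token failure(int state) {
            throw new IllegalStateException("relop automaton failed in state " + state);
        }

        // One case per state of the relop transition diagram.
        Token nextToken() throws IOException {
            _state = 0;                           // every token starts in state 0
            while (true) {
                switch (_state) {
                    case 0:
                        _la = getChar();
                        if (_la == '<') _state = 1;
                        else if (_la == '=') _state = 5;
                        else if (_la == '>') _state = 6;
                        else return failure(_state);
                        break;
                    case 1:
                        _la = getChar();
                        if (_la == '=') _state = 2;
                        else if (_la == '>') _state = 3;
                        else _state = 4;
                        break;
                    case 2: return new Token(Kind.LE);
                    case 3: return new Token(Kind.NE);
                    case 4: pushBack(_la); return new Token(Kind.LT);  // * state: retract
                    case 5: return new Token(Kind.EQ);
                    case 6:
                        _la = getChar();
                        if (_la == '=') _state = 7;
                        else _state = 8;
                        break;
                    case 7: return new Token(Kind.GE);
                    case 8: pushBack(_la); return new Token(Kind.GT);  // * state: retract
                    default: return failure(_state);
                }
            }
        }
    }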



Handling Failures

• Meaning
  – The automaton for this token failed

• Solution
  – If another automaton is available
    • “Rewind” the input to the beginning of the last lexeme
    • Jump to the start state of the next automaton
    • Start recognizing again
  – If no other automaton
    • This is a true lexical error.
    • Discard the lexeme (or at least its first char)
    • Start from state 0 again
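The failure-handling strategy can be sketched as a loop over automata tried in a fixed order, each starting from the same input position (the "rewind"). If every automaton fails, the first character is discarded as a lexical error. The Recognizer interface, the class FailoverScanner and the two sample recognizers are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical driver that tries one automaton after another.
final class FailoverScanner {
    interface Recognizer {
        int match(String input, int start); // end index of the lexeme, or -1 on failure
    }

    private final Map<String, Recognizer> automata = new LinkedHashMap<>();
    final List<String> errors = new ArrayList<>();

    void add(String tokenName, Recognizer r) { automata.put(tokenName, r); }

    List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            boolean matched = false;
            for (Map.Entry<String, Recognizer> e : automata.entrySet()) {
                int end = e.getValue().match(input, pos);   // every try restarts at pos
                if (end > pos) {
                    tokens.add(e.getKey());
                    pos = end;
                    matched = true;
                    break;
                }
            }
            if (!matched) {                                 // no automaton left: true lexical error
                errors.add("unexpected '" + input.charAt(pos) + "' at " + pos);
                pos++;                                      // discard first char, back to state 0
            }
        }
        return tokens;
    }

    // Sample recognizer: one or more digits.
    static int digits(String s, int i) {
        int j = i;
        while (j < s.length() && Character.isDigit(s.charAt(j))) j++;
        return j > i ? j : -1;
    }

    // Sample recognizer: one or more letters.
    static int letters(String s, int i) {
        int j = i;
        while (j < s.length() && Character.isLetter(s.charAt(j))) j++;
        return j > i ? j : -1;
    }
}
```

With sc.add("NUM", FailoverScanner::digits) and sc.add("ID", FailoverScanner::letters), tokenize("abc12#x") returns [ID, NUM, ID] and records one error for '#'.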


Overview

• Basic Concepts

• Regular Expressions
  – Language

• Lexical analysis by hand

• Regular Languages Tools
  – NFA / DFA

• Scanning with DFAs

• Scanning tools
  – Lex / Flex / JFlex


Automata & Language Theory

• Terminology
  – FSA (finite-state automaton)
    • A recognizer that takes an input string and determines whether it is a valid string of the language.
  – Non-Deterministic FSA (NFA)
    • Has several alternative actions for the same input symbol
  – Deterministic FSA (DFA)
    • Has at most 1 action for any given input symbol

• Bottom Line
  – expressive power(NFA) == expressive power(DFA)
  – Conversion can be automated


NFA

An NFA is a mathematical model that consists of:

• S, a set of states
• Σ, the symbols of the input alphabet
• move, a transition function
  – move(state, symbol) → set of states
  – move : S × (Σ ∪ {ε}) → Pow(S)
• A state s0 ∈ S, the start state
• F ⊆ S, a set of final or accepting states.


Representing NFA

• Transition Diagrams:
  – Number states (circles), arcs, final states, …

• Transition Tables:
  – More suitable to representation within a computer

We’ll see examples of both!


Example NFA

S = { 0, 1, 2, 3 }
s0 = 0
F = { 3 }
Σ = { a, b }

Transition diagram:
  start → 0;  0 --a, b--> 0;  0 --a--> 1;  1 --b--> 2;  2 --b--> 3 (final)

What language is defined?

What is the transition table?

state    a          b
0        { 0, 1 }   { 0 }
1        --         { 2 }
2        --         { 3 }

• ε (null) moves are possible: an edge i --ε--> j switches state but does not use any input symbol.


Epsilon-Transitions

• Given the regular expression: (a (b*c)) | (a (b | c+)?)

• Find a transition diagram NFA that recognizes it.

• Solution?


NFA Construction

• Automatic construction example
  – a(b*c)
  – a(b|c+)?
  – Build a disjunction


Resulting NFA


Working NFA

NFA: 0 --a, b--> 0,  0 --a--> 1 --b--> 2 --b--> 3 (start state 0, final state 3)

• Given an input string, we trace moves
• If no more input & in final state, ACCEPT

EXAMPLE: Input: ababb

move(0, a) = 1
move(1, b) = 2
move(2, a) = ? (undefined)
REJECT!

-OR-

move(0, a) = 0
move(0, b) = 0
move(0, a) = 1
move(1, b) = 2
move(2, b) = 3
ACCEPT!
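The two traces differ only in which path was guessed. A simulator can avoid guessing by tracking the set of all states the NFA could be in after each symbol. This sketch hard-codes the example NFA's transition table; the class name NfaSimulator and its methods are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical set-of-states simulation of the example NFA for (a|b)*abb.
final class NfaSimulator {
    // MOVE.get(state).get(symbol) is the set of next states.
    private static final Map<Integer, Map<Character, Set<Integer>>> MOVE = new HashMap<>();
    private static final int START = 0;
    private static final Set<Integer> FINAL = Collections.singleton(3);

    private static void edge(int from, char symbol, int to) {
        MOVE.computeIfAbsent(from, k -> new HashMap<>())
            .computeIfAbsent(symbol, k -> new HashSet<>())
            .add(to);
    }

    static {
        edge(0, 'a', 0); edge(0, 'a', 1); edge(0, 'b', 0);
        edge(1, 'b', 2);
        edge(2, 'b', 3);
    }

    // move(T, a): union of move(t, a) over t in T; undefined moves add nothing.
    static Set<Integer> move(Set<Integer> states, char symbol) {
        Set<Integer> next = new HashSet<>();
        for (int s : states) {
            next.addAll(MOVE.getOrDefault(s, Collections.emptyMap())
                            .getOrDefault(symbol, Collections.emptySet()));
        }
        return next;
    }

    static boolean accepts(String input) {
        Set<Integer> current = new HashSet<>(Collections.singleton(START));
        for (char c : input.toCharArray()) {
            current = move(current, c);
        }
        return !Collections.disjoint(current, FINAL);  // some reachable state is final
    }
}
```

On ababb the state sets are {0}, {0,1}, {0,2}, {0,1}, {0,2}, {0,3}; the last set contains 3, so the input is accepted without ever choosing a single path.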


Handling Undefined Transitions

• We can handle undefined transitions by defining one more state, a “death” state, and redirecting every previously undefined transition to this death state.

Example NFA with death state 4:
  0 --a, b--> 0;  0 --a--> 1 --b--> 2 --b--> 3
  1 --a--> 4;  2 --a--> 4;  4 --a, b--> 4


Worse still...

• Not all paths result in acceptance!

NFA: 0 --a, b--> 0,  0 --a--> 1 --b--> 2 --b--> 3

aabb is accepted along the path:
  0 → 0 → 1 → 2 → 3

BUT… it is not accepted along the valid path:
  0 → 0 → 0 → 0 → 0


The NFA “Problem”

• Two problems
  – Valid input may not be accepted
  – Non-deterministic behavior from run to run...

• Solution?


The DFA Saves the Day

• A DFA is an NFA with a few restrictions
  – No epsilon transitions
  – For every state s and every symbol x in Σ, there is at most one transition (s, x) (exactly one once a death state is added)

• Corollaries
  – Easy to implement a DFA with an algorithm!
  – Deterministic behavior
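The first corollary shows up clearly in a table-driven sketch: one array lookup per input symbol and no backtracking. The table below is a DFA for (a|b)*abb, the language of the example NFA, written out by hand for illustration; the class name DfaSimulator and the state numbering are assumptions.

```java
// Hypothetical table-driven DFA for (a|b)*abb. States 0-3; 3 is accepting.
final class DfaSimulator {
    // DELTA[state][symbol], where symbol index 0 = 'a' and 1 = 'b'.
    private static final int[][] DELTA = {
        {1, 0},   // 0: no useful suffix seen yet
        {1, 2},   // 1: input ends in "a"
        {1, 3},   // 2: input ends in "ab"
        {1, 0},   // 3: input ends in "abb" (accepting)
    };
    private static final int START = 0;
    private static final int ACCEPT = 3;

    static boolean accepts(String input) {
        int state = START;
        for (char c : input.toCharArray()) {
            if (c != 'a' && c != 'b') return false;   // symbol outside Σ
            state = DELTA[state][c == 'a' ? 0 : 1];   // exactly one move per symbol
        }
        return state == ACCEPT;
    }
}
```

Each run on the same input follows the same single path, so the NFA's "wrong guess" problem cannot occur.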


NFA vs. DFA

• NFA
  – Smaller number of states |Qnfa|
  – Simulating it requires work proportional to |Qnfa| for each input symbol.

• DFA
  – Larger number of states |Qdfa|
  – Simulating it requires a constant amount of work for each input symbol.

• Caveat: with the generic NFA ⇒ DFA construction, |Qdfa| can be on the order of 2^|Qnfa| in the worst case

• But DFAs are perfectly optimizable! (i.e., you can find the smallest possible |Qdfa|)


One catch...

• NFA-DFA comparison


NFA to DFA Conversion

• Idea
  – Look at the states reachable without consuming any input
  – Aggregate them in macro states


Final Result

• A (macro) state is final
  – IFF one of its NFA states was final


Preliminary Definitions

• NFA N = (S, Σ, s0, F, MOVE)

• ε-closure(s), s ∈ S
  – The set of states in S that are reachable from s via ε-moves of N (including s itself)

• ε-closure(T), T ⊆ S
  – The NFA states reachable from some t ∈ T on ε-moves only

• move(T, a), T ⊆ S, a ∈ Σ
  – The set of states to which there is a transition on input a from some t ∈ T


Algorithm: computing the ε-closure

    push all states t of T onto stack;
    initialize ε-closure(T) to T;
    while stack is not empty do begin
        t = pop();
        for each u ∈ S with an edge t → u labeled ε do
            if u is not in ε-closure(T) then begin
                add u to ε-closure(T);
                push u onto stack
            end
    end
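The ε-closure algorithm above translates almost line for line into Java. This sketch assumes the NFA's ε-edges are given as a map from a state to the states reachable by one ε-move; that representation and the class name EpsilonClosure are assumptions.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical implementation of the stack-based ε-closure algorithm.
final class EpsilonClosure {
    static Set<Integer> closure(Set<Integer> t, Map<Integer, List<Integer>> epsEdges) {
        Deque<Integer> stack = new ArrayDeque<>(t);   // push all states of T
        Set<Integer> result = new HashSet<>(t);        // ε-closure(T) starts as T
        while (!stack.isEmpty()) {
            int s = stack.pop();
            for (int u : epsEdges.getOrDefault(s, Collections.emptyList())) {
                if (result.add(u)) {                   // u was not yet in the closure
                    stack.push(u);
                }
            }
        }
        return result;
    }
}
```

Using Set.add's return value folds the "if u is not in ε-closure(T)" test and the insertion into one step, and it also guarantees termination when ε-edges form a cycle.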


DFA construction: computing the set of states and the transitions

    let Q = ε-closure(s0);
    D = { Q };
    enQueue(Q);
    while queue not empty do
        X = deQueue();
        for each a ∈ Σ do
            Y := ε-closure(move(X, a));
            T[X, a] := Y;
            if Y is not in D then
                D = D ∪ { Y };
                enQueue(Y);
            end
        end
    end
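A Java sketch of this subset construction, following the pseudocode step by step. NFA edges are stored as "from → (symbol → targets)" and the character EPS ('\0') stands for ε; the representation and the class name SubsetConstruction are assumptions. It repeats a compact ε-closure so that it stands on its own.

```java
import java.util.*;

// Hypothetical subset construction: NFA -> DFA transition table T[X][a].
final class SubsetConstruction {
    static final char EPS = '\0';

    final Map<Integer, Map<Character, Set<Integer>>> nfa = new HashMap<>();

    void edge(int from, char symbol, int to) {
        nfa.computeIfAbsent(from, k -> new HashMap<>())
           .computeIfAbsent(symbol, k -> new HashSet<>())
           .add(to);
    }

    private Set<Integer> targets(int s, char symbol) {
        return nfa.getOrDefault(s, Collections.emptyMap())
                  .getOrDefault(symbol, Collections.emptySet());
    }

    Set<Integer> epsClosure(Set<Integer> t) {
        Deque<Integer> stack = new ArrayDeque<>(t);
        Set<Integer> result = new HashSet<>(t);
        while (!stack.isEmpty()) {
            int s = stack.pop();
            for (int u : targets(s, EPS)) {
                if (result.add(u)) stack.push(u);
            }
        }
        return result;
    }

    Set<Integer> move(Set<Integer> t, char a) {
        Set<Integer> result = new HashSet<>();
        for (int s : t) result.addAll(targets(s, a));
        return result;
    }

    // Returns T; its key set is D, the set of DFA states (macro states).
    Map<Set<Integer>, Map<Character, Set<Integer>>> build(int s0, List<Character> sigma) {
        Map<Set<Integer>, Map<Character, Set<Integer>>> table = new LinkedHashMap<>();
        Set<Integer> q = epsClosure(Collections.singleton(s0));   // Q = ε-closure(s0)
        Deque<Set<Integer>> queue = new ArrayDeque<>();
        table.put(q, new HashMap<>());                             // D = { Q }
        queue.add(q);
        while (!queue.isEmpty()) {
            Set<Integer> x = queue.remove();
            for (char a : sigma) {
                Set<Integer> y = epsClosure(move(x, a));           // Y = ε-closure(move(X, a))
                table.get(x).put(a, y);                            // T[X, a] = Y
                if (!table.containsKey(y)) {                       // Y not in D
                    table.put(y, new HashMap<>());
                    queue.add(y);
                }
            }
        }
        return table;
    }
}
```

For the example NFA of (a|b)*abb this yields four DFA states, {0}, {0,1}, {0,2} and {0,3}; the last is final because it contains NFA state 3. If some move(X, a) is empty, the empty set itself becomes a DFA state, which plays the role of the death state.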


Summary

• We can
  – Specify tokens with R.E.
  – Use a DFA to scan an input and recognize tokens
  – Transform an NFA into a DFA automatically

• What we are missing
  – A way to transform an R.E. into an NFA

• Then, we will have a complete solution
  – Build a big R.E.
  – Turn the R.E. into an NFA
  – Turn the NFA into a DFA
  – Scan with the obtained DFA


R.E. To NFA

• Process
  – Inductive definition
    • Use the structure of the R.E.
    • Use atomic automata for atomic R.E.
    • Use composition rules for each R.E. expression

• Recall
  – RE ::= ε
        ::= s in Σ
        ::= rs
        ::= r | s
        ::= r*


Epsilon Construction

• RE ::= ε


Symbol Construction

• RE ::= x in Σ


Chaining Construction

• RE ::= rs


Branching Construction

• RE ::= r | s


Kleene-Closure Construction

• RE ::= r*
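The five constructions can be sketched in Java as functions that each build an NFA fragment with one start and one final state and glue sub-fragments together with ε-edges (a Thompson-style construction). The class Thompson, the Frag type, the EPS marker and the choice to link concatenated fragments with an ε-edge rather than merging states are assumptions, not a transcription of the slides' figures. A small simulator is included so the result can be checked.

```java
import java.util.*;

// Hypothetical inductive R.E. -> NFA construction with a simulator.
final class Thompson {
    static final char EPS = '\0';
    final List<Map<Character, List<Integer>>> edges = new ArrayList<>();

    static final class Frag {
        final int start, end;
        Frag(int start, int end) { this.start = start; this.end = end; }
    }

    int newState() { edges.add(new HashMap<>()); return edges.size() - 1; }

    void edge(int from, char c, int to) {
        edges.get(from).computeIfAbsent(c, k -> new ArrayList<>()).add(to);
    }

    Frag epsilon() { return symbol(EPS); }                // RE ::= ε

    Frag symbol(char x) {                                 // RE ::= x in Σ
        int i = newState(), f = newState();
        edge(i, x, f);
        return new Frag(i, f);
    }

    Frag concat(Frag r, Frag s) {                         // RE ::= rs
        edge(r.end, EPS, s.start);
        return new Frag(r.start, s.end);
    }

    Frag or(Frag r, Frag s) {                             // RE ::= r | s
        int i = newState(), f = newState();
        edge(i, EPS, r.start); edge(i, EPS, s.start);
        edge(r.end, EPS, f);   edge(s.end, EPS, f);
        return new Frag(i, f);
    }

    Frag star(Frag r) {                                   // RE ::= r*
        int i = newState(), f = newState();
        edge(i, EPS, r.start); edge(i, EPS, f);            // zero occurrences
        edge(r.end, EPS, r.start); edge(r.end, EPS, f);    // repeat or leave
        return new Frag(i, f);
    }

    private Set<Integer> closure(Set<Integer> states) {
        Deque<Integer> stack = new ArrayDeque<>(states);
        Set<Integer> result = new HashSet<>(states);
        while (!stack.isEmpty()) {
            int s = stack.pop();
            for (int u : edges.get(s).getOrDefault(EPS, Collections.emptyList())) {
                if (result.add(u)) stack.push(u);
            }
        }
        return result;
    }

    // Simulates the NFA of fragment n on the input (set-of-states method).
    boolean accepts(Frag n, String input) {
        Set<Integer> current = closure(Collections.singleton(n.start));
        for (char c : input.toCharArray()) {
            Set<Integer> next = new HashSet<>();
            for (int s : current) {
                next.addAll(edges.get(s).getOrDefault(c, Collections.emptyList()));
            }
            current = closure(next);
        }
        return current.contains(n.end);
    }
}
```

For instance, the example expression (ab*c) | (a(b|c*)) is built as t.or(t.concat(t.symbol('a'), t.concat(t.star(t.symbol('b')), t.symbol('c'))), t.concat(t.symbol('a'), t.or(t.symbol('b'), t.star(t.symbol('c'))))); the resulting NFA accepts a, ab, ac, acc and abbbc, but not abcb.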


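NFA Construction Example

• R.E.
  – (ab*c) | (a(b|c*))

• Parse tree (subexpressions numbered r0 … r13):
  r13 = r5 | r12
    r5  = r3 r4       r3 = a
    r4  = r1 r2       r1 = r0*,  r0 = b,  r2 = c
    r12 = r11 r10     r11 = a
    r10 = ( r9 )
    r9  = r7 | r8     r7 = b,  r8 = r6*,  r6 = c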


NFA Construction Example 2

r3: a
r0: b
r2: c
r1: r0*  (NFA for b*, with ε-edges around b)
r4: r1 r2  (NFA for b*c)
r5: r3 r4  (NFA for ab*c)


NFA Construction Example 3

r11: a
r7: b
r6: c
r8: r6*  (NFA for c*, with ε-edges around c)
r9: r7 | r8  (NFA for b | c*)
r10: (r9)
r12: r11 r10  (NFA for a(b | c*))


NFA Construction Example 4

r13: r5 | r12

[Figure: the complete NFA for (ab*c) | (a(b|c*)), with states numbered 1–17]


Overall Summary

• How does this all fit together?
  – Reg. Expr. → NFA construction
  – NFA → DFA conversion
  – DFA simulation for lexical analyzer

• Recall Lex structure
  – Pattern   Action
  – Pattern   Action
  – …         …

• Each pattern recognizes lexemes

• Each pattern is described by a regular expression, e.g.
  – (abc)*ab
  – (a | b)*abb
  – etc.

• → Recognizer!


Moral?

• All of this can be automated with a tool!
  – LEX: the first lexical analyzer tool for C
  – FLEX: a newer/faster implementation, C / C++ friendly
  – JFLEX: a lexer for Java, based on the same principles
  – JavaCC
  – ANTLR


Ahead...

• Grammars

• Parsing
  – Bottom Up
  – Top Down