Recognition of Tokens

Lecture 3, Section 3.4

Robb T. Koether

Hampden-Sydney College

Mon, Jan 19, 2015

1 A Class of Tokens

2 The Input Buffer

3 Transition Diagrams

4 Writing the Lexer

5 Assignment

A Class of Tokens

We will explore and demonstrate the concepts of a lexer by using a simple class of tokens.

digit → [0-9]

digits → digit+

number → digits (. digits)? (E [+-]? digits)?

letter → [A-Za-z]

id → letter (letter | digit)*

if → if

then → then

else → else

relop → < | > | <= | >= | = | <>
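As a rough illustration, the definitions above can be transcribed into regular expressions and used to classify a complete lexeme. This is only a sketch: the names `kNumber`, `kId`, `kRelop`, and `classify` are assumptions made for this example, and `std::regex` is used here for convenience rather than because the lecture prescribes it.

```cpp
#include <regex>
#include <string>

// Regular expressions transcribed from the token definitions above.
// The names kNumber, kId, kRelop, and classify are illustrative assumptions.
const std::regex kNumber(R"([0-9]+(\.[0-9]+)?(E[+-]?[0-9]+)?)");
const std::regex kId(R"([A-Za-z][A-Za-z0-9]*)");
const std::regex kRelop(R"(<=|>=|<>|<|>|=)");

// Classify one complete lexeme. Keywords are checked before identifiers
// because "if", "then", and "else" also match the id pattern.
std::string classify(const std::string& lexeme) {
    if (lexeme == "if" || lexeme == "then" || lexeme == "else") return lexeme;
    if (std::regex_match(lexeme, kId)) return "id";
    if (std::regex_match(lexeme, kNumber)) return "number";
    if (std::regex_match(lexeme, kRelop)) return "relop";
    return "error";
}
```

For instance, `classify("count")` yields `"id"` and `classify("6.02E+23")` yields `"number"`. A real lexer scans a character stream rather than testing isolated strings, so this only checks that the patterns say what we expect.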

Whitespace

In addition to recognizing tokens, the lexer must strip whitespace from the input. Whitespace is not a token, but it must be recognized by the lexer.

ws → (blank | tab | newline)+
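A minimal sketch of how a lexer might recognize ws without producing a token is shown here. The function name `skipWhitespace` and the choice to index into a `std::string` are assumptions for illustration only.

```cpp
#include <cstddef>
#include <string>

// Return the index of the first non-whitespace character at or after pos.
// This consumes ws -> (blank | tab | newline)+ and produces no token.
// The function name and signature are assumptions for this sketch.
std::size_t skipWhitespace(const std::string& input, std::size_t pos) {
    while (pos < input.size() &&
           (input[pos] == ' ' || input[pos] == '\t' || input[pos] == '\n')) {
        ++pos;
    }
    return pos;
}
```

The lexer would call this before attempting to match each token, so that every token recognizer starts on a non-whitespace symbol.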

The Input Buffer

The input to the lexer is a stream of characters. We may consider the characters to be residing in a buffer. We mark two positions in the buffer: lexemeBegin and forward.

The pointer lexemeBegin holds the starting position of the current token. The pointer forward points to the current symbol.
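The two positions can be sketched as a small class. The member names `lexemeBegin_` and `forward_` mirror the pointers named in the text; the class name `InputBuffer` and the methods `nextChar`, `retract`, and `accept` are assumptions made for this example, as is the use of `'\0'` to signal end of input.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>

// A minimal two-position buffer. The class and method names are
// assumptions for this sketch; only the two positions come from the text.
class InputBuffer {
public:
    explicit InputBuffer(std::string text) : text_(std::move(text)) {}

    // Return the current symbol and advance forward.
    // '\0' stands in for end of input (an assumption of this sketch).
    char nextChar() {
        char c = forward_ < text_.size() ? text_[forward_] : '\0';
        ++forward_;
        return c;
    }

    // Move forward back one symbol, but never before lexemeBegin.
    void retract() {
        if (forward_ > lexemeBegin_) --forward_;
    }

    // Return the text between lexemeBegin and forward, then start the
    // next lexeme at forward.
    std::string accept() {
        std::size_t end = std::min(forward_, text_.size());
        std::string lexeme = text_.substr(lexemeBegin_, end - lexemeBegin_);
        lexemeBegin_ = forward_ = end;
        return lexeme;
    }

private:
    std::string text_;
    std::size_t lexemeBegin_ = 0;  // start of the current token
    std::size_t forward_ = 0;      // current symbol
};
```

Reading `i`, `n`, `t`, and a blank from `"int x"`, then calling `retract()` and `accept()`, would return the lexeme `int` and leave both positions on the blank.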

The lexer begins in the start state with the current symbol (pointed to by both lexemeBegin and forward). The process moves from state to state by following the transitions whose labels match the current symbol (forward). This continues until no further moves are possible.

Example (Processing a Statement)

The input buffer holds the statement

    int count = 0;

The figure steps the pointers lexemeBegin and forward through the buffer, with these captions:

1. Advance one symbol.
2. Could be an identifier; could be a keyword.
3. It is the keyword int.
4. Skip whitespace.
5. Could be an identifier; could be a keyword.
6. It is an identifier.
7. This is an operator, but which one?
8. It is the assignment operator.

Transition Diagrams

Definition (Transition Diagram)

A transition diagram is a directed graph. It consists of a finite set of nodes, called states. One state is designated the start state. The directed edges between states represent transitions. Each transition is labeled with a symbol (or possibly a regular expression). A subset of the set of states is designated the accepting states. The remaining states are rejecting states.
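The definition maps fairly directly onto a data structure. The sketch below is one possible encoding, not the lecture's: the struct name `TransitionDiagram`, its members, and the helper `makeLessDiagram` are all assumptions. It simplifies the definition by labeling each edge with a single character and by checking a whole string, with no "other" edges or retraction.

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>

// A transition diagram as data: states are ints and each edge is labeled
// with one character. All names here are illustrative assumptions.
struct TransitionDiagram {
    int start = 0;
    std::set<int> accepting;
    std::map<std::pair<int, char>, int> edges;  // (state, symbol) -> next state

    // Follow edges from the start state over the whole string and report
    // whether the state reached is an accepting state.
    bool accepts(const std::string& input) const {
        int state = start;
        for (char c : input) {
            auto it = edges.find(std::make_pair(state, c));
            if (it == edges.end()) return false;  // no matching transition
            state = it->second;
        }
        return accepting.count(state) > 0;
    }
};

// Hypothetical example diagram that accepts exactly <, <=, and <>.
TransitionDiagram makeLessDiagram() {
    TransitionDiagram d;
    d.start = 0;
    d.edges[std::make_pair(0, '<')] = 1;
    d.edges[std::make_pair(1, '=')] = 2;
    d.edges[std::make_pair(1, '>')] = 3;
    d.accepting = {1, 2, 3};
    return d;
}
```

With this encoding, `makeLessDiagram().accepts("<=")` is true, while `accepts("=")` is false because state 0 has no edge labeled `=`.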

Example (Relational Operators)

Consider the relational operators <, >, <=, >=, =, and <>. The first symbol may be <, >, =, or something else. If the first symbol is <, then the next symbol may be =, >, or something else. If the first symbol is >, then the next symbol may be = or something else. If the first symbol is =, then the next symbol is something else.

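[Figure: transition diagram for the relational operators. The states are numbered 0 through 8, with state 0 the start state. The edges are labeled <, =, >, and other. The accepting states return LE, NE, LT (and retract), EQ, GE, and GT (and retract).]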

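Example (Identifiers)

[Figure: transition diagram for identifiers. From start state 9, an edge labeled letter leads to state 10. State 10 has a loop labeled letter | digit and an edge labeled other to state 11, which returns identifier and retracts.]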
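The identifier diagram (states 9, 10, and 11) can be coded as a small state machine. This is a sketch under assumptions: the function name `getId`, the helper predicates, the `std::string` input with an index `pos`, and the convention of returning an empty string on failure are all choices made for this example.

```cpp
#include <cstddef>
#include <string>

// letter -> [A-Za-z] and digit -> [0-9], as in the token definitions.
bool isLetter(char c) { return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'); }
bool isDigit(char c) { return c >= '0' && c <= '9'; }

// Recognize an identifier starting at pos, following states 9, 10, and 11.
// On success, return the lexeme and move pos just past it (the "other"
// symbol is retracted). On failure, return "" and leave pos unchanged.
// The name and signature are assumptions for this sketch.
std::string getId(const std::string& input, std::size_t& pos) {
    std::size_t forward = pos;
    int state = 9;
    while (true) {
        // '\0' stands in for end of input and counts as "other".
        char c = forward < input.size() ? input[forward] : '\0';
        ++forward;
        switch (state) {
            case 9:
                if (isLetter(c)) state = 10;
                else return "";  // not an identifier
                break;
            case 10:
                if (isLetter(c) || isDigit(c)) break;  // loop on letter | digit
                // State 11: "other" was read, so retract and accept.
                --forward;
                std::string lexeme = input.substr(pos, forward - pos);
                pos = forward;
                return lexeme;
        }
    }
}
```

Called on `"count = 0"` with `pos` at 0, this would return `count` and leave `pos` at 5, on the blank that ended the identifier.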

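Example (Keywords)

[Figure: transition diagrams for the keywords. Each diagram follows the letters of its keyword (i f, t h e n, or e l s e) and then an edge labeled other to an accepting state that returns keyword and retracts.]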
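The final "other" edge in each keyword diagram matters: it keeps a keyword from matching the front of a longer identifier. A minimal sketch of that check follows; the function name `matchKeyword`, the helper predicate, and the signature are assumptions for illustration.

```cpp
#include <cstddef>
#include <string>

bool isLetterOrDigit(char c) {
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9');
}

// Report whether keyword occurs at pos and is followed by "other", that is,
// by end of input or a symbol that is neither a letter nor a digit.
// The name and signature are assumptions for this sketch.
bool matchKeyword(const std::string& input, std::size_t pos,
                  const std::string& keyword) {
    if (pos > input.size()) return false;
    // Follow the chain of letter transitions.
    if (input.compare(pos, keyword.size(), keyword) != 0) return false;
    // Take the "other" edge; the lookahead symbol is not consumed (retract).
    std::size_t next = pos + keyword.size();
    return next >= input.size() || !isLetterOrDigit(input[next]);
}
```

So `matchKeyword("if x", 0, "if")` is true, while `matchKeyword("iffy", 0, "if")` is false because the symbol after `if` is a letter.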

Draw the transition diagram for numbers

digit → [0-9]

digits → digit+

number → digits (. digits)? (E [+-]? digits)?
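One way to check a diagram for numbers is to compare it against a directly coded recognizer. The sketch below is not a transition diagram and is not offered as the intended answer; it simply finds the longest prefix that matches the number pattern. The names `matchNumber` and `isDigit` are assumptions for this example.

```cpp
#include <cstddef>
#include <string>

bool isDigit(char c) { return c >= '0' && c <= '9'; }

// Return the length of the longest prefix of input, starting at pos, that
// matches number -> digits (. digits)? (E [+-]? digits)?, or 0 if none.
// The name and signature are assumptions for this sketch.
std::size_t matchNumber(const std::string& input, std::size_t pos) {
    // Consume digit+ starting at from; returning from means no digits.
    auto digits = [&](std::size_t from) {
        std::size_t j = from;
        while (j < input.size() && isDigit(input[j])) ++j;
        return j;
    };

    std::size_t end = digits(pos);
    if (end == pos) return 0;  // a number must start with digits

    // Optional fraction: '.' must be followed by at least one digit.
    if (end < input.size() && input[end] == '.') {
        std::size_t frac = digits(end + 1);
        if (frac > end + 1) end = frac;
    }

    // Optional exponent: 'E', an optional sign, then at least one digit.
    if (end < input.size() && input[end] == 'E') {
        std::size_t k = end + 1;
        if (k < input.size() && (input[k] == '+' || input[k] == '-')) ++k;
        std::size_t exp = digits(k);
        if (exp > k) end = exp;
    }
    return end - pos;
}
```

Note how an incomplete fraction or exponent is abandoned: on `"12.x"` only `12` is matched, which corresponds to retracting past the symbols that did not lead to an accepting state.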

Writing the Lexer

The lexer is the program that implements the transition diagram. We could use a switch statement and/or an if-else structure.

The Lexer for Relational Operators

    Token getRelop()
    {
        int state = 0;
        char c = get_next_symbol();
        while (c == '<' || c == '=' || c == '>')
        {
            switch (state)
            {
                case 0:
                    if (c == '<') state = 1;
                    else if (c == '=') state = 2;
                    else if (c == '>') state = 3;
                    else fail();
                    break;
                ...
                case 8:
                    retract();
                    return Token(GT);
            }
        }
    }
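For reference, a complete version of this function might look like the following sketch. States 0 through 3 and state 8 follow the partial listing above; the numbering of states 4 through 7 is an assumption, since those cases are elided. The `Relop` enum, the `std::string` input with an index `pos`, and the local `nextChar` helper are also assumptions that stand in for `Token`, `get_next_symbol()`, `retract()`, and `fail()`. Unlike the listing, this sketch reads a new symbol in each non-accepting state.

```cpp
#include <cstddef>
#include <string>

enum class Relop { LT, LE, EQ, NE, GT, GE, None };

// Recognize a relational operator starting at pos. On success, move pos
// past the operator; on failure, return Relop::None and leave pos unchanged.
// States 4-7 are numbered by assumption for this sketch.
Relop getRelop(const std::string& input, std::size_t& pos) {
    std::size_t forward = pos;
    // '\0' stands in for end of input and counts as "other".
    auto nextChar = [&]() {
        char c = forward < input.size() ? input[forward] : '\0';
        ++forward;
        return c;
    };
    int state = 0;
    while (true) {
        switch (state) {
            case 0: {
                char c = nextChar();
                if (c == '<') state = 1;
                else if (c == '=') state = 2;
                else if (c == '>') state = 3;
                else return Relop::None;  // plays the role of fail()
                break;
            }
            case 1: {
                char c = nextChar();
                if (c == '=') state = 4;
                else if (c == '>') state = 5;
                else state = 6;
                break;
            }
            case 2: pos = forward; return Relop::EQ;
            case 3: {
                char c = nextChar();
                state = (c == '=') ? 7 : 8;
                break;
            }
            case 4: pos = forward; return Relop::LE;
            case 5: pos = forward; return Relop::NE;
            case 6: --forward; pos = forward; return Relop::LT;  // retract
            case 7: pos = forward; return Relop::GE;
            case 8: --forward; pos = forward; return Relop::GT;  // retract
        }
    }
}
```

On the input `"<5"`, the function reads `<`, then reads `5` as "other", retracts, and returns `Relop::LT` with `pos` at 1, so the `5` is still available to the next token.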

Assignment

Read Section 3.4.
Exercises 1, 2(c)(i).
