241-437 Compilers: lex/3 1

Compiler Structures

• Objectives
  – describe lex
  – give many examples of lex's use

241-437, Semester 1, 2011-2012

3. Lex

Overview

1. What is lex (and flex)?

2. Lex Program Format

3. Removing Whitespace (white.l)

4. Printing Line Numbers (linenos.l)

5. Counting (counter.l)

6. Counting IDs (ids.l)

7. Matching Rules

8. More Information.

1. What is lex (and flex)?

• lex is a lexical analyzer generator
  – flex is a fast version of lex, which we'll be using

• lex translates REs (regular expressions) into C code

• The generated code is easy to integrate into C compilers (and other applications).

Uses for Lex

• Convert input from one form to another.

• Extract information from text files.

• Extract tokens for a syntax analyzer.

Using Lex

[Diagram: the lex source program (lex.l) is passed to lex (flex), which produces lex.yy.c. The C compiler turns lex.yy.c into a.out. When run, a.out reads an input stream of chars and produces a sequence of tokens.]

Running Flex

• With UNIX:

> flex foo.l
> gcc -Wall -o foo lex.yy.c
> ./foo < inputfile.txt

• You may need to include -ll (for lex) or -lfl (for flex) in the gcc call.
  – it links in the lex library

• You may get "warning" messages from gcc.

How Lex Works

• The lex-generated program (e.g. foo) reads characters from stdin and tries to match a sequence of them against its REs.

• Once it matches a sequence, it executes the action for that RE, then reads in more characters for the next RE match.
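The match-then-continue loop described here can be sketched in plain C. This is only an illustration, not the code that flex generates: the function name scan_kinds and its three hard-coded "rules" (a run of digits, a run of letters, any other single char) are assumptions made for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stand-in for a lex-generated scanner. Instead of REs
 * compiled from a .l file, it uses three hard-coded "rules":
 *   'N' = run of digits, 'W' = run of letters, 'O' = any other single char
 * One kind code is stored per match; the return value is the match count. */
size_t scan_kinds(const char *input, char *kinds, size_t max_kinds)
{
    assert(input != NULL && kinds != NULL);
    size_t n = 0;
    const char *p = input;

    while (*p != '\0' && n < max_kinds) {
        char kind;

        if (isdigit((unsigned char)*p)) {
            while (isdigit((unsigned char)*p)) p++;   /* consume the whole run */
            kind = 'N';
        } else if (isalpha((unsigned char)*p)) {
            while (isalpha((unsigned char)*p)) p++;
            kind = 'W';
        } else {
            p++;                                      /* one char only */
            kind = 'O';
        }
        kinds[n++] = kind;   /* the "action"; then loop for the next match */
    }
    return n;
}
```

For the input "ab 12" this records three matches, W, O, N: the scanner resumes at the first unread character after each match.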

2. Lex Program Format

• A lex program has three sections:

REs and/or C code
%%
RE/action rules
%%
C functions

A Lex Program

1) C Code, REs

%{
int charCount=0, wordCount=0, lineCount=0;
%}
word [^ \t\n]*

2) RE/Action rules

%%
{word}  { wordCount++; charCount += yyleng; }
[\n]    { charCount++; lineCount++; }
.       { charCount++; }
%%

3) C functions

int main(void)
{
  yylex();
  printf("Chars %d, Words: %d, Lines: %d\n",
         charCount, wordCount, lineCount);
  return 0;
}

Section 1: Defining a RE

• Format:

name RE

• Examples:

digit  [0-9]
letter [A-Za-z]
id     {letter}({letter}|{digit})*
word   [^ \t\n]*

Regular Expressions in Lex

x          match the char x
\.         match the char .
"string"   match contents of string of chars
.          match any char except \n
^          match beginning of a line
$          match the end of a line
[xyz]      match one char x, y, or z
[^xyz]     match any char except x, y, and z
[a-z]      match one of a to z
r*         closure (match 0 or more r's)
r+         positive closure (match 1 or more r's)
r?         optional (match 0 or 1 r)
r1r2       match r1 then r2 (concatenation)
r1|r2      match r1 or r2 (union)
(r)        grouping
r1/r2      match r1 when followed by r2
{name}     match the RE defined by name

Example REs (Again)

[0-9]
    A single digit.

[0-9]+
    An integer.

[0-9]+(\.[0-9]+)?
    An integer or floating point number.

[+-]?[0-9]+(\.[0-9]+)?([eE][+-]?[0-9]+)?
    Integer, floating point, or scientific notation.
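To make the last RE concrete, here is a hand-written C sketch of a matcher for it. lex would build its own matcher from the RE, so this is only an illustration: match_number and skip_digits are hypothetical names, not part of lex.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Consume a run of digits starting at p; return the position after it. */
static const char *skip_digits(const char *p)
{
    while (isdigit((unsigned char)*p)) p++;
    return p;
}

/* Hypothetical helper: length of the longest prefix of s that matches
 *   [+-]?[0-9]+(\.[0-9]+)?([eE][+-]?[0-9]+)?
 * or 0 if no prefix matches. */
size_t match_number(const char *s)
{
    assert(s != NULL);
    const char *p = s;

    if (*p == '+' || *p == '-') p++;        /* [+-]?                      */

    const char *q = skip_digits(p);         /* [0-9]+                     */
    if (q == p) return 0;                   /* at least one digit needed  */
    p = q;

    if (*p == '.') {                        /* (\.[0-9]+)?                */
        q = skip_digits(p + 1);
        if (q > p + 1) p = q;               /* keep only if digits follow */
    }

    if (*p == 'e' || *p == 'E') {           /* ([eE][+-]?[0-9]+)?         */
        const char *e = p + 1;
        if (*e == '+' || *e == '-') e++;
        q = skip_digits(e);
        if (q > e) p = q;                   /* keep only if digits follow */
    }
    return (size_t)(p - s);
}
```

With this sketch, "3.14" matches 4 chars and "-1.5e+10" matches 8, while "12." matches only 2 because the optional fraction needs at least one digit after the dot.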

Section 2: RE/Action Rule

• A rule has the form:

RE { action }

  – the RE can be written out directly (e.g. \n) or can use a {name} defined in section 1
  – the action is any C code

• If the RE matches an input character sequence, then the C code is executed.

Section 3: C Functions

• Added to the lexical analyzer

• Depending on the lex/flex version, you may need to add the function:

int yywrap(void)
{ return 1; }

  – returning 1 signals that there is no more input after the end of the input file, so the lexer can terminate

3. Removing Whitespace (white.l)

whitespace [ \t\n]
%%
{whitespace}  ;           /* empty action */
.             { ECHO; }   /* ECHO macro: prints the matched text */
%%

int yywrap(void)
{ return 1; }

int main(void)
{
  yylex();   // the lexical analyzer
  return 0;
}

• In the definition line, whitespace is the name and [ \t\n] is the RE.

Usage

> flex white.l
> gcc -Wall -o white lex.yy.c
> ./white < white.l
/*white.l*//*AndrewDavison,May...
>

• lex.yy.c is the flex output file.
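For comparison, the behaviour of white.l can be written by hand in a few lines of C. This is a sketch rather than what flex produces: strip_whitespace is a hypothetical name, and it works on strings in memory instead of stdin/stdout so that it is easy to try out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hand-written equivalent of white.l: copy src to dst,
 * dropping every space, tab and newline. dst must have room for
 * strlen(src) + 1 bytes. Returns the number of chars kept. */
size_t strip_whitespace(const char *src, char *dst)
{
    assert(src != NULL && dst != NULL);
    size_t kept = 0;

    for (; *src != '\0'; src++) {
        if (*src == ' ' || *src == '\t' || *src == '\n')
            continue;              /* {whitespace} ;    (empty action) */
        dst[kept++] = *src;        /* .  { ECHO; }                     */
    }
    dst[kept] = '\0';
    return kept;
}
```

The two branches of the loop correspond to the two rules of white.l; lex saves you from writing the loop and the character tests yourself.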

4. Printing Linenos (linenos.l)

%{
int lineno = 1;
%}

%%
^(.*)\n   { printf("%4d\t%s", lineno, yytext); lineno++; }
%%

int yywrap(void)
{ return 1; }

int main(int argc, char *argv[])
{
  if (argc > 1) {
    FILE *file = fopen(argv[1], "r");
    if (file == NULL) {
      printf("Error opening %s\n", argv[1]);
      exit(1);
    }
    yyin = file;
  }
  yylex();
  fclose(yyin);
  return 0;
}

Built-in Variables

• yytext holds the matched string.

• yyin is the input stream.

• yyleng holds the length of the matched string.

• There are several other built-in variables in lex.

Usage

> flex linenos.l

> gcc -Wall -o linenos lex.yy.c

> ./linenos textfile.txt

> ./linenos < textfile.txt

> ./linenos < linenos.l
   1
   2    /* linenos.l */
   3    /* Andrew Davison, March 2005 */
   4
   5    %{
   6    int lineno = 1;
   7    %}
   8
   9    %%
   :
   :

5. Counting (counter.l)

%{
int charCount = 0, wordCount = 0, lineCount = 0;
%}
word [^ \t\n]*
%%
{word}  { wordCount++; charCount += yyleng; }
\n      { charCount++; lineCount++; }
.       { charCount++; }
%%

int yywrap(void)
{ return 1; }

int main(void)
{
  yylex();
  printf("Characters %d, Words: %d, Lines: %d\n",
         charCount, wordCount, lineCount);
  return 0;
}

Usage

> flex counter.l

> gcc -Wall -o counter lex.yy.c

> ./counter < counter.l

Characters 496, Words: 78, Lines: 29
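The three rules of counter.l can be mirrored in a small hand-written C function, which may help when checking the counts by hand. This is a sketch under assumptions: struct counts, is_ws and count_text are hypothetical names, and the function scans a string rather than stdin.

```c
#include <assert.h>
#include <stddef.h>

struct counts { int chars, words, lines; };

/* The same three characters that the word RE excludes. */
static int is_ws(char c) { return c == ' ' || c == '\t' || c == '\n'; }

/* Hypothetical hand-written equivalent of counter.l. */
struct counts count_text(const char *s)
{
    assert(s != NULL);
    struct counts c = {0, 0, 0};

    while (*s != '\0') {
        if (!is_ws(*s)) {
            /* {word}: the longest run of non-whitespace chars */
            c.words++;
            while (*s != '\0' && !is_ws(*s)) { c.chars++; s++; }
        } else {
            /* \n and . rules: one char at a time */
            if (*s == '\n') c.lines++;
            c.chars++;
            s++;
        }
    }
    return c;
}
```

For the string "hello world\nfoo\n" this gives 16 characters, 3 words and 2 lines.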

6. Counting IDs (ids.l)

%{
int count = 0;
%}

digit  [0-9]
letter [A-Za-z]
id     {letter}({letter}|{digit})*
%%
{id}  { count++; }
.     ;   /* ignore other things */
\n    ;
%%

int yywrap(void)
{ return 1; }

int main(void)
{
  yylex();
  printf("No. of Idents: %d\n", count);
  return 0;
}

Usage

> flex ids.l
> gcc -Wall -o ids lex.yy.c
> ./ids < test1.txt
No. of Idents: 6
> l test1.txt
this is a test177 23 bing2*((() this5
>

7. Matching Rules

1. A rule is chosen that matches the biggest amount of input.

beg    {…}
begin  {…}

Both rules can match the start of the input string "beginning", but the second rule is chosen because it matches more.

2. If two rules can match the same amount of input, then the first rule is used.

begin   {… }
[a-z]+  {…}

Both rules can match the input string "begin", so the first rule is chosen.
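Both matching rules can be captured in one small selection loop. The C sketch below is an illustration only, not how lex is implemented (lex compiles all the REs into a single automaton). The names rule_fn, choose_rule, rule_beg, rule_begin and rule_lower are hypothetical, and rule_lower assumes an ASCII-style character set where a to z are contiguous.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A "rule" here is a function that returns how many leading chars of the
 * input it matches (0 = no match). */
typedef size_t (*rule_fn)(const char *input);

static size_t match_literal(const char *input, const char *lit)
{
    size_t n = strlen(lit);
    return strncmp(input, lit, n) == 0 ? n : 0;
}

size_t rule_beg(const char *in)   { return match_literal(in, "beg"); }
size_t rule_begin(const char *in) { return match_literal(in, "begin"); }

size_t rule_lower(const char *in)             /* [a-z]+ */
{
    size_t n = 0;
    while (in[n] >= 'a' && in[n] <= 'z') n++;
    return n;
}

/* Returns the index of the chosen rule (or -1 if none matches) and
 * stores the length of its match in *len. */
int choose_rule(const rule_fn rules[], int nrules,
                const char *input, size_t *len)
{
    assert(rules != NULL && input != NULL && len != NULL);
    int best = -1;
    *len = 0;

    for (int i = 0; i < nrules; i++) {
        size_t n = rules[i](input);
        if (n > *len) {    /* strictly longer wins; a tie keeps the earlier rule */
            *len = n;
            best = i;
        }
    }
    return best;
}
```

With the rules ordered {rule_beg, rule_begin} and the input "beginning", choose_rule picks index 1 with a match length of 5. With {rule_begin, rule_lower} and the input "begin", both match 5 chars and index 0 is picked.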

8. More Information

• Lex and Yacc
  by Levine, Mason, and Brown
  O'Reilly; 2nd edition
  (in our library)

• On UNIX:
  – man lex
  – info lex

• A Compact Guide to Lex & Yacc
  by Tom Niemann
  http://epaperpress.com/lexandyacc/
  – with several calculator examples, which I'll be discussing when we get to yacc
  – it's also on the course website in the "Niemann Tutorial" subdirectory of "Useful Info"
    • http://fivedots.coe.psu.ac.th/Software.coe/Compilers/