JLex Lecture 4 Mon, Jan 24, 2005

Page 1: JLex

JLex

Lecture 4

Mon, Jan 24, 2005

Page 2: JLex

JLex

JLex is a lexical analyzer generator in Java.

It is based on the well-known lex, which is a lexical analyzer generator in C.

The GNU lexical analyzer generator flex is also based on lex.

JLex reads a description of a set of tokens and outputs a Java program that will process those tokens.

Page 3: JLex

The JLex Input File

The input file to JLex uses the extension .lex.

The file is divided into three parts: user code, JLex directives, and regular expression rules.

These three sections are separated by %%.
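
As a rough illustration of this layout, the following Java sketch holds a tiny specification in a string and splits it at the %% lines. The specification text, the class name SpecLayout, and the method sections are made up for this example, and the specification has not been run through JLex; JLex's own parser does far more than a simple split.

```java
import java.util.ArrayList;
import java.util.List;

class SpecLayout {
    // A made-up, minimal specification used only to show the three parts.
    static final String SPEC =
        "import java.io.*;\n" +        // part 1: user code
        "%%\n" +
        "%integer\n" +                 // part 2: JLex directives
        "%%\n" +
        "[0-9]+ { return 1; }\n" +     // part 3: regular expression rules
        "[a-z]+ { return 2; }\n";

    // Split the text into sections at lines that consist solely of "%%".
    static List<String> sections(String spec) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String line : spec.split("\n")) {
            if (line.equals("%%")) {
                parts.add(current.toString());
                current.setLength(0);
            } else {
                current.append(line).append('\n');
            }
        }
        parts.add(current.toString());
        return parts;
    }

    public static void main(String[] args) {
        String[] names = {"User code", "JLex directives", "Regular expression rules"};
        List<String> parts = sections(SPEC);
        for (int i = 0; i < parts.size(); i++) {
            System.out.println("--- " + names[i] + " ---");
            System.out.print(parts.get(i));
        }
    }
}
```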

Page 4: JLex

JLex User Code

See Section 2.1 of the JLex User’s Manual.

Any code written in the user-code section is copied directly into the Java source file created by JLex.

JLex creates a class named Yylex, which is at the heart of the lexer. The user code is not incorporated into this class.

Page 5: JLex

JLex Directives

See Section 2.2 of the JLex User’s Manual.

Any code bracketed within %{ and %} is copied directly into the Yylex class, at the beginning.

Although this code is incorporated into the Yylex class, it is not incorporated into any Yylex member function.

Thus, we may define Yylex class variables or additional member functions.

Page 6: JLex

The init Directive

Code bracketed within %init{ and %init} is copied into the Yylex default constructor, which is called by the other constructors.

%init{
  System.out.println("In the constructor");
%init}

Page 7: JLex

The eof Directive

Code bracketed within %eof{ and %eof} is copied into the Yylex function yy_do_eof(), which is called once upon end of file.

%eof{
  System.out.println("In yy_do_eof()");
%eof}

Page 8: JLex

JLex Token Types

Unless we specify otherwise, the data type of the returned tokens is Yytoken.

This class is not created automatically.

We may change the return type to int by typing the directive %integer.

We may change the return type to Integer by typing the directive %intwrap.

We may set the return type to any other type by using the directive %type.

Page 9: JLex

JLex Token Types

If the return type is Yytoken or Integer, then the EOF token is null.

If the return type is int, then the EOF token is -1.

For any other type, we need to specify the EOF value.

Page 10: JLex

JLex EOF Value

By using the %eofval directive, we may indicate what value to return upon EOF.

We write

%eofval{
  return new type(value);
%eofval}

Page 11: JLex

JLex Regular Expression Rules

Each regular expression rule consists of a regular expression followed by an associated action.

The associated action is a segment of Java code, enclosed in braces { }.

Typically, the action will be to return the appropriate token.

Page 12: JLex

JLex Regular Expressions

Regular expressions are expressed using ASCII characters (0 – 127).

The following characters are metacharacters.

? * + | ( ) ^ $ . [ ] { } " \

Metacharacters have special meaning; they do not represent themselves.

All other characters represent themselves.

Page 13: JLex

JLex Regular Expressions

Let r and s be regular expressions.

r? matches zero or one occurrence of r.

r* matches zero or more occurrences of r.

r+ matches one or more occurrences of r.

r|s matches r or s.

rs matches r concatenated with s.
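
To make these five operators concrete, here is a small Java sketch that tries them on sample strings. It uses java.util.regex as a stand-in, which is a different engine from JLex; the assumption is only that these particular operators behave the same way when a whole string is matched. The class name OperatorDemo and the helper matches are invented for the example.

```java
import java.util.regex.Pattern;

class OperatorDemo {
    // True if the whole input matches the pattern (java.util.regex, not JLex).
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("ab?", "a"));    // true:  zero occurrences of b
        System.out.println(matches("ab?", "abb"));  // false: more than one b
        System.out.println(matches("ab*", "abbb")); // true:  zero or more b
        System.out.println(matches("ab+", "a"));    // false: at least one b is required
        System.out.println(matches("a|b", "b"));    // true:  either alternative
        System.out.println(matches("a|b", "ab"));   // false: only one alternative is matched
        System.out.println(matches("ab", "ab"));    // true:  a concatenated with b
    }
}
```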

Page 14: JLex

JLex Regular Expressions

Parentheses are used for grouping. For example, ("+"|"-")? matches an optional sign.

If a regular expression begins with ^, then it is matched only at the beginning of a line.

If a regular expression ends with $, then it is matched only at the end of a line.

The dot . matches any non-newline character.
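
The line anchors and the dot can be tried out with the Java sketch below. It again relies on java.util.regex (with the MULTILINE flag) rather than on JLex, so it approximates the behavior described above and is not a statement about JLex's exact rules, which may differ at the very end of the input. The sample text deliberately ends with a newline to stay clear of that case. The names AnchorDemo and count are made up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnchorDemo {
    // Count non-overlapping matches, with ^ and $ applying at every line.
    static int count(String regex, String text) {
        Matcher m = Pattern.compile(regex, Pattern.MULTILINE).matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String text = "a\nba\na\n";             // three lines: "a", "ba", "a"
        System.out.println(count("^a", text));  // 2: lines that begin with a
        System.out.println(count("a$", text));  // 3: lines that end with a
        System.out.println(count(".", text));   // 4: the dot skips the newlines
    }
}
```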

Page 15: JLex

JLex Regular Expressions

Brackets [ ] match any single character listed within the brackets.

[abc] matches a or b or c.

[A-Za-z] matches any letter.

If the first character after [ is ^, then the brackets match any character except those listed.

[^A-Za-z] matches any nonletter.
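
The bracket forms above can be checked one character at a time, as in this Java sketch. It uses java.util.regex character classes as a stand-in, on the assumption that they agree with JLex for these simple ASCII cases. The class CharClassDemo and the method inClass are hypothetical names.

```java
import java.util.regex.Pattern;

class CharClassDemo {
    // True if the single character c belongs to the bracket expression.
    static boolean inClass(String bracketExpr, char c) {
        return Pattern.matches(bracketExpr, String.valueOf(c));
    }

    public static void main(String[] args) {
        System.out.println(inClass("[abc]", 'b'));      // true:  b is listed
        System.out.println(inClass("[abc]", 'd'));      // false: d is not listed
        System.out.println(inClass("[A-Za-z]", 'Q'));   // true:  Q is a letter
        System.out.println(inClass("[^A-Za-z]", 'Q'));  // false: letters are excluded
        System.out.println(inClass("[^A-Za-z]", '7'));  // true:  7 is a nonletter
    }
}
```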

Page 16: JLex

JLex Regular Expressions

A single character within double quotes " " or after \ represents itself.

Metacharacters lose their special meaning and represent themselves when they stand alone within double quotes or follow \.

"?" and \? match ?.

Page 17: JLex

JLex Escape Sequences

Some escape sequences:

\n matches newline.

\b matches backspace.

\r matches carriage return.

\t matches tab.

\f matches formfeed.

If c is not a special escape-sequence character, then \c matches c.

Page 18: JLex

Running JLex

The lexical analyzer generator is the Main class in the JLex folder.

To create a lexical analyzer from the file filename.lex, type

java JLex.Main filename.lex

This produces a file filename.lex.java, which must be compiled to create the lexical analyzer.

Page 19: JLex

Running the Lexical Analyzer

To run the lexical analyzer, a Yylex object must first be created.

The Yylex constructor has one parameter specifying an input stream.

For example,

Yylex lexer = new Yylex(System.in);

Then, calls to the yylex() member function will return tokens.

token = lexer.yylex();
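
A typical driver calls yylex() repeatedly until the EOF value comes back. The Java sketch below shows such a loop under the Integer return type, where EOF is null. Because a generated Yylex class is not available here, the sketch uses StubYylex, a hand-written stand-in that is not produced by JLex; its token codes (NUMBER, WORD, OTHER) and the LexerDriver class are likewise invented. With a real generated lexer, only the loop in tokens would carry over.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class LexerDriver {
    // Collect every token until yylex() signals end of file with null.
    static List<Integer> tokens(InputStream stream) {
        StubYylex lexer = new StubYylex(stream);
        List<Integer> result = new ArrayList<>();
        try {
            Integer token;
            while ((token = lexer.yylex()) != null) {
                result.add(token);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] bytes = "x 42 + y".getBytes(StandardCharsets.US_ASCII);
        System.out.println(tokens(new ByteArrayInputStream(bytes)));  // prints [2, 1, 0, 2]
    }
}

// Hand-written stand-in for a generated lexer; NOT produced by JLex.
class StubYylex {
    static final int OTHER = 0, NUMBER = 1, WORD = 2;  // hypothetical token codes
    private final PushbackReader in;

    StubYylex(InputStream stream) {
        in = new PushbackReader(new InputStreamReader(stream, StandardCharsets.US_ASCII));
    }

    // Returns the next token code, or null at end of file.
    Integer yylex() throws IOException {
        int c = in.read();
        while (c != -1 && Character.isWhitespace(c)) {
            c = in.read();                       // skip white space
        }
        if (c == -1) {
            return null;                         // EOF value for an Integer return type
        }
        if (Character.isDigit(c)) {
            while (c != -1 && Character.isDigit(c)) {
                c = in.read();                   // consume the run of digits
            }
            if (c != -1) {
                in.unread(c);                    // give back the lookahead character
            }
            return NUMBER;
        }
        if (Character.isLetter(c)) {
            while (c != -1 && Character.isLetter(c)) {
                c = in.read();                   // consume the run of letters
            }
            if (c != -1) {
                in.unread(c);
            }
            return WORD;
        }
        return OTHER;                            // any other single character
    }
}
```

If the lexer were generated with %integer instead, the loop condition would compare the token against -1 rather than null.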