CSC 3315 Lexical and Syntax Analysis

Hamid Harroud, School of Science and Engineering, Akhawayn University
CSC3315 (Spring 2009)
http://www.aui.ma/~H.Harroud/csc3315/


Page 1: CSC 3315 Lexical and Syntax Analysis

CSC3315 (Spring 2009)

CSC 3315
Lexical and Syntax Analysis

Hamid Harroud
School of Science and Engineering, Akhawayn University
http://www.aui.ma/~H.Harroud/csc3315/

Page 2: CSC 3315 Lexical and Syntax Analysis

Constructing a Lexical Analyzer

state = S // S is the start state

repeat {
    k = next character from the input

    if k == EOF { // the end of input
        if state is a final state then accept
        else reject
    }

    state = T[state, k]

    if state == empty then reject // got stuck
}
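The loop above can be sketched in Java as follows. This is a minimal illustration, not the course's implementation: the class name, the state numbering, the character classes, and the particular DFA (identifiers made of a letter followed by letters or digits) are all assumptions chosen for the example.

```java
class TableDrivenDfa {
    // Hypothetical DFA: a letter followed by any number of letters or digits.
    static final int START = 0;   // S, the start state
    static final int IN_ID = 1;   // the only final state
    static final int EMPTY = -1;  // "empty" table entry: no transition

    // IS_FINAL[state] tells whether a state is a final state.
    static final boolean[] IS_FINAL = { false, true };

    // T[state][characterClass]; classes: 0 = letter, 1 = digit, 2 = other.
    static final int[][] T = {
        { IN_ID, EMPTY, EMPTY },  // transitions out of START
        { IN_ID, IN_ID, EMPTY },  // transitions out of IN_ID
    };

    // Maps an input character to its column in T.
    static int charClass(char k) {
        if (Character.isLetter(k)) return 0;
        if (Character.isDigit(k)) return 1;
        return 2;
    }

    // Runs the DFA over the whole input; reaching the end of the
    // string plays the role of EOF in the pseudocode.
    static boolean accepts(String input) {
        int state = START;
        for (int i = 0; i < input.length(); i++) {
            state = T[state][charClass(input.charAt(i))];
            if (state == EMPTY) return false;  // got stuck: reject
        }
        return IS_FINAL[state];                // EOF: accept only in a final state
    }

    public static void main(String[] args) {
        System.out.println(accepts("x1"));  // true
        System.out.println(accepts("1x"));  // false
        System.out.println(accepts(""));    // false
    }
}
```

Grouping characters into classes keeps the table small; a table indexed directly by character code would behave the same way.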

Page 3: CSC 3315 Lexical and Syntax Analysis

Constructing a Lexical Analyzer

Page 4: CSC 3315 Lexical and Syntax Analysis

Constructing a Lexical Analyzer

int LexAnalyzer() {
    getChar();
    if (isLetter(nextChar)) {
        addChar();
        getChar();
        while (isLetter(nextChar) || isDigit(nextChar)) {
            addChar();
            getChar();
        }
        return lookup(lexeme);
    }
    . . .

Page 5: CSC 3315 Lexical and Syntax Analysis

Constructing a Lexical Analyzer

int LexAnalyzer() {
    getChar();
    if (isLetter(nextChar)) {
        . . .
    }
    else if (isDigit(nextChar)) {
        // integer literal: one digit followed by any number of digits
        addChar();
        getChar();
        while (isDigit(nextChar)) {
            addChar();
            getChar();
        }
        return INT_LIT;
    }
}
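For reference, the two branches above can be combined into one small runnable scanner. This is a sketch under stated assumptions: the class name HandLexer, the token codes, the reserved-word table, and the whitespace skipping are hypothetical and do not come from the slides. Unlike the slide version, it reads the first lookahead character once in the constructor, so the character that ends one token is still available when the next call starts.

```java
import java.util.Map;

class HandLexer {
    // Hypothetical token codes.
    static final int IDENT = 1, INT_LIT = 2, KEYWORD = 3, UNKNOWN = 4, EOF = -1;
    // Hypothetical reserved words consulted by lookup().
    static final Map<String, Integer> RESERVED = Map.of("if", KEYWORD, "while", KEYWORD);

    private final String input;
    private int pos = 0;
    private int nextChar;  // lookahead character, or -1 at end of input
    private final StringBuilder lexeme = new StringBuilder();

    HandLexer(String input) {
        this.input = input;
        getChar();  // prime the lookahead
    }

    private void getChar() { nextChar = pos < input.length() ? input.charAt(pos++) : -1; }
    private void addChar() { lexeme.append((char) nextChar); }
    private int lookup(String s) { return RESERVED.getOrDefault(s, IDENT); }

    // Text of the most recently scanned token.
    String lexeme() { return lexeme.toString(); }

    // Scans and returns the code of the next token.
    int lex() {
        lexeme.setLength(0);
        while (nextChar != -1 && Character.isWhitespace(nextChar)) getChar();
        if (nextChar == -1) return EOF;
        if (Character.isLetter(nextChar)) {
            // identifier or reserved word: letter (letter | digit)*
            addChar();
            getChar();
            while (Character.isLetterOrDigit(nextChar)) { addChar(); getChar(); }
            return lookup(lexeme.toString());
        } else if (Character.isDigit(nextChar)) {
            // integer literal: digit+
            addChar();
            getChar();
            while (Character.isDigit(nextChar)) { addChar(); getChar(); }
            return INT_LIT;
        }
        addChar();  // any other single character
        getChar();
        return UNKNOWN;
    }

    public static void main(String[] args) {
        HandLexer lx = new HandLexer("count 42 if");
        for (int t = lx.lex(); t != EOF; t = lx.lex()) {
            System.out.println(t + " " + lx.lexeme());
        }
    }
}
```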

Page 6: CSC 3315 Lexical and Syntax Analysis

Lexical Errors

Consider the following two programs:

Page 7: CSC 3315 Lexical and Syntax Analysis

Lexical Errors

Page 8: CSC 3315 Lexical and Syntax Analysis

JLex: a scanner generator

jlex specification (xxx.jlex) → JLex.Main (java) → generated scanner (xxx.jlex.java)
generated scanner (xxx.jlex.java) → javac → Yylex.class
input program (test.sim) + Yylex.class → P.main (java) → output of P.main

Page 9: CSC 3315 Lexical and Syntax Analysis

JLex: a scanner generator

import java.io.*;
import java_cup.runtime.Symbol; // assumption: Symbol is the CUP runtime token class

public class P {
    public static void main(String[] args) throws IOException {
        FileReader inFile = new FileReader(args[0]);
        Yylex scanner = new Yylex(inFile);

        // Print one line per token until the end of the input.
        Symbol token = scanner.next_token();
        while (token.sym != sym.EOF) {
            switch (token.sym) {
            case sym.INTLITERAL:
                System.out.println("INTLITERAL ("
                    + ((IntLitTokenVal)token.value).intVal + ")");
                break;
            …
            }
            token = scanner.next_token();
        }
    }
}

Page 10: CSC 3315 Lexical and Syntax Analysis

Regular expression rules

regular-expression { action }

regular-expression: pattern to be matched
action: code to be executed when the pattern is matched

When the next_token() method is called, it repeats:

  • Find the longest sequence of characters in the input (starting with the current character) that matches a pattern.
  • Perform the associated action

until a return in an action is executed.

Page 11: CSC 3315 Lexical and Syntax Analysis

Matching rules

If several patterns match at the current position, the one that matches the longest sequence of characters is considered to be matched.

If several patterns match the same (longest) sequence of characters, then the first such pattern is considered to be matched, so the order of the patterns can be important!

If an input character is not matched by any pattern, the scanner throws an exception.
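The next_token() loop and the matching rules can be imitated with java.util.regex, as in the sketch below. It only illustrates the rules and is not how JLex is implemented (JLex generates a DFA). The class names, the rule list, and the token names are hypothetical, and an action without a return is modelled by a null token name. For patterns this simple, Java's greedy matching finds the longest match for each pattern, which is not guaranteed for arbitrary regular expressions.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MiniScanner {
    // A rule pairs a pattern with the token name its action returns;
    // a null token models an action that does not return (e.g. whitespace).
    static final class Rule {
        final Pattern pattern;
        final String token;
        Rule(String regex, String token) {
            this.pattern = Pattern.compile(regex);
            this.token = token;
        }
    }

    // Hypothetical rules; their order decides ties.
    static final List<Rule> RULES = List.of(
        new Rule("if", "IF"),
        new Rule("[a-zA-Z][a-zA-Z0-9]*", "ID"),
        new Rule("[0-9]+", "INT"),
        new Rule("=", "ASSIGN"),
        new Rule("==", "EQUALS"),
        new Rule("[ \\t\\n]+", null));

    private final String input;
    private int pos = 0;

    MiniScanner(String input) { this.input = input; }

    // Repeats "find longest match, perform action" until an action returns.
    String nextToken() {
        while (pos < input.length()) {
            Rule best = null;
            int bestLen = 0;
            for (Rule r : RULES) {
                Matcher m = r.pattern.matcher(input).region(pos, input.length());
                // Strictly longer only: on equal length the earlier rule wins.
                if (m.lookingAt() && m.end() - pos > bestLen) {
                    best = r;
                    bestLen = m.end() - pos;
                }
            }
            if (best == null) {
                throw new IllegalStateException(
                    "unmatched character '" + input.charAt(pos) + "' at offset " + pos);
            }
            pos += bestLen;
            if (best.token != null) return best.token;  // the action's "return"
        }
        return "EOF";
    }

    public static void main(String[] args) {
        MiniScanner s = new MiniScanner("if iffy = == 42");
        for (String t = s.nextToken(); !t.equals("EOF"); t = s.nextToken()) {
            System.out.println(t);  // prints IF, ID, ASSIGN, EQUALS, INT on separate lines
        }
    }
}
```

Here "if" matches both the IF and ID rules with the same length, so the earlier IF rule wins; "iffy" is an ID because the ID rule matches more characters; "==" is EQUALS rather than two ASSIGN tokens for the same reason.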

Page 12: CSC 3315 Lexical and Syntax Analysis

An Example

%%

DIGIT= [0-9]
LETTER= [a-zA-Z]
WHITESPACE= [ \t\n] // space, tab, newline

%%

{LETTER}({LETTER}|{DIGIT})* {System.out.println(yyline+1 + ": ID " + yytext());}
{DIGIT}+ {System.out.println(yyline+1 + ": INT");}
"=" {System.out.println(yyline+1 + ": ASSIGN");}
"==" {System.out.println(yyline+1 + ": EQUALS");}
{WHITESPACE}* { }
. {System.out.println(yyline+1 + ": bad char");}