by Neng-Fa Zhou
Lexical Analysis
Why separate lexical and syntax analyses?
– Simpler design
– Efficiency
– Portability
Tokens, Patterns, Lexemes
– Tokens
• Terminal symbols in the grammar
– Patterns
• Description of a class of tokens
– Lexemes
• Words in the source program
Languages
– Fixed and finite alphabet (vocabulary)
– Finite length sentences
– Possibly infinite number of sentences
Examples
– Natural numbers {1,2,3,...,10,11,...}
– Strings over {a,b}: a^n b a^n
Terms on parts of a string
– prefix, suffix, substring, proper ....
Operations on Languages
Examples
L = {A,B,...,Z,a,b,...,z}
D = {0,1,...,9}
L ∪ D : the set of letters and digits
LD : a letter followed by a digit
L^4 : four-letter strings
L* : all strings of letters, including ε
L(L ∪ D)* : strings of letters and digits beginning with a letter
D+ : strings of one or more digits
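The language operations in these examples map directly onto regular-expression syntax. The following is a minimal sketch using java.util.regex, assuming L is the ASCII letters and D the ASCII digits; the class and method names are illustrative and do not come from the slides.

```java
import java.util.regex.Pattern;

class LanguageOps {
    // Assumption: L = ASCII letters, D = ASCII digits, as in the definitions above.
    static final String L = "[A-Za-z]";
    static final String D = "[0-9]";

    // L(L ∪ D)*: letters and digits, beginning with a letter
    static boolean isLetterThenLettersOrDigits(String s) {
        return Pattern.matches(L + "(" + L + "|" + D + ")*", s);
    }

    // L^4: exactly four letters
    static boolean isFourLetters(String s) {
        return Pattern.matches(L + "{4}", s);
    }

    // D+: one or more digits
    static boolean isDigits(String s) {
        return Pattern.matches(D + "+", s);
    }

    public static void main(String[] args) {
        System.out.println(isLetterThenLettersOrDigits("x25"));    // true
        System.out.println(isLetterThenLettersOrDigits("9lives")); // false
        System.out.println(isFourLetters("abcd"));                 // true
        System.out.println(isDigits(""));                          // false
    }
}
```

Pattern.matches requires the whole string to match, which is what membership in a language means here.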
Regular Expression (RE)
ε is a RE
a symbol in Σ is a RE
Let r and s be REs.
– (r) | (s) : or
– (r)(s) : concatenation
– (r)* : zero or more instances
– (r)+ : one or more instances
– (r)? : zero or one instance
Precedence of Operators
From high to low:
– r* r+ r?
– rs
– r|s
All operators are left associative.
Examples
Σ = {a,b}
1. a|b
2. (a|b)(a|b)
3. a*
4. (a|b)*
5. a|a*b
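The effect of the precedence rules on example 5 can be checked with java.util.regex, whose operators follow the same ordering. This is a small sketch; the class and method names are hypothetical, and the parenthesized variant is added only for contrast.

```java
import java.util.regex.Pattern;

class PrecedenceDemo {
    // Example 5: a|a*b is read as a | ((a*)b), because * binds tighter than
    // concatenation, and concatenation binds tighter than |.
    static boolean matchesExample5(String s) {
        return Pattern.matches("a|a*b", s);
    }

    // Contrast: parentheses force the alternation first, giving (a|a*) followed by b.
    static boolean matchesGrouped(String s) {
        return Pattern.matches("(a|a*)b", s);
    }

    public static void main(String[] args) {
        System.out.println(matchesExample5("a"));   // true: the left alternative
        System.out.println(matchesExample5("aab")); // true: a*b
        System.out.println(matchesExample5("aa"));  // false
        System.out.println(matchesGrouped("a"));    // false: a trailing b is required
    }
}
```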
Algebraic Properties of RE
Regular Definitions
d1 → r1
d2 → r2
....
dn → rn
– Each ri is a RE over Σ ∪ {d1,d2,...,di-1}
– Not recursive
Example-1
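%{
int num_lines = 0, num_chars = 0;   /* running counts */
%}
%%
\n      { ++num_lines; ++num_chars; }
.       { ++num_chars; }
%%
int main(void) {
    yylex();
    printf("# of lines = %d, # of chars = %d\n", num_lines, num_chars);
    return 0;
}
int yywrap(void) { return 1; }   /* nonzero: no further input, so the scanner stops at EOF */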
Example-2
D       [0-9]
INT     {D}{D}*
%%
{INT}("."{INT}((e|E)("+"|-)?{INT})?)?   { printf("valid %s\n", yytext); }
.                                       { printf("unrecognized %s\n", yytext); }
%%
int main(int argc, char *argv[]) {
    ++argv, --argc;   /* skip over the program name */
    if (argc > 0) yyin = fopen(argv[0], "r");
    else yyin = stdin;
    yylex();
    return 0;
}
int yywrap(void) { return 1; }   /* nonzero: no further input, so the scanner stops at EOF */
java.util.regex
import java.util.regex.*;

class Number {
    public static void main(String[] args) {
        String regExNum = "\\d+(\\.\\d+((e|E)(\\+|-)?\\d+)?)?";
        if (Pattern.matches(regExNum, args[0]))
            System.out.println("valid");
        else
            System.out.println("invalid");
    }
}
String Pattern Matching in Perl
print "Input a string :";
$_ = <STDIN>;
chomp($_);
if (/^[0-9]+(\.[0-9]+((e|E)(\+|-)?[0-9]+)?)?$/) {
    print "valid\n";
} else {
    print "invalid\n";
}
Finite Automata
Nondeterministic finite automaton (NFA)
NFA = (S,T,s0,F)
– S: a set of states
– T: a transition mapping
– s0: the start state
– F: final states or accepting states
Example
Deterministic Finite Automata (DFA)
– T: a transition function
– There is only one arc going out from each node on each symbol.
Simulating a DFA
s = s0;
c = nextchar;
while (c != eof) {
    s = move(s,c);
    c = nextchar;
}
if (s is in F)
    return "yes";
else
    return "no";
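The simulation loop can be made concrete with a transition table. Below is a minimal sketch, assuming a two-state DFA for (a|b)*a; the table, class name, and method name are illustrative and not part of the slides.

```java
class DfaSimulator {
    // Hypothetical DFA for (a|b)*a over {a,b}.
    // MOVE[s][0] is the next state on a, MOVE[s][1] the next state on b.
    static final int[][] MOVE = {
        {1, 0},   // state 0: last symbol was not a
        {1, 0},   // state 1: last symbol was a
    };
    static final int START = 0;
    static final boolean[] ACCEPTING = {false, true};

    // Mirrors the pseudocode: follow move(s,c) per character, then test s against F.
    static String run(String input) {
        int s = START;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c != 'a' && c != 'b') return "no";   // symbol outside the alphabet
            s = MOVE[s][c - 'a'];
        }
        return ACCEPTING[s] ? "yes" : "no";
    }

    public static void main(String[] args) {
        for (String w : new String[] {"a", "aba", "ab", ""})
            System.out.println("\"" + w + "\" -> " + run(w));
    }
}
```

Each input character costs one array lookup, which is why a DFA is the preferred form for the final scanner.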
From RE to NFA
– a in Σ
– s|t
From RE to NFA (cont.)
– st
– s*
Example
(a|b)*a
Building Lexical Analyzer
RE → NFA → DFA
Emulator
– RE to NFA: Algorithm 3.23 (Thompson's construction)
– NFA to DFA: Algorithm 3.32 (Subset construction)
Conversion of an NFA into a DFA
Intuition
– move(s,a) is a function in a DFA
– move(s,a) is a mapping (to a set of states) in an NFA
NFA DFA
A state reachable from s0 in the DFA on an input string corresponds to a set of states in NFA that are reachable on the same string.
Computation of ε-Closure
ε-Closure(T): Set of NFA states reachable from some NFA state s in T by ε-transitions alone.
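The definition suggests a simple worklist computation. The following is a sketch only: the ε-edges in sampleEpsMoves are a hypothetical NFA fragment chosen for illustration, and the class and method names are not from the slides.

```java
import java.util.*;

class EpsilonClosure {
    // Hypothetical ε-edges: state -> states reachable by a single ε-transition.
    static Map<Integer, List<Integer>> sampleEpsMoves() {
        Map<Integer, List<Integer>> eps = new HashMap<>();
        eps.put(0, Arrays.asList(1, 7));
        eps.put(1, Arrays.asList(2, 4));
        eps.put(3, Arrays.asList(6));
        eps.put(5, Arrays.asList(6));
        eps.put(6, Arrays.asList(1, 7));
        return eps;
    }

    // ε-Closure(T): start from every state in T and follow ε-edges
    // until no new state is found.
    static Set<Integer> closure(Set<Integer> T, Map<Integer, List<Integer>> epsMoves) {
        Set<Integer> result = new TreeSet<>(T);
        Deque<Integer> stack = new ArrayDeque<>(T);
        while (!stack.isEmpty()) {
            int s = stack.pop();
            for (int t : epsMoves.getOrDefault(s, Collections.<Integer>emptyList())) {
                if (result.add(t)) stack.push(t);   // first visit: explore t later
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<Integer> start = new TreeSet<>(Arrays.asList(0));
        System.out.println(closure(start, sampleEpsMoves()));   // prints [0, 1, 2, 4, 7]
    }
}
```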
From an NFA to a DFA (The subset construction)
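A compact sketch of the subset construction follows. It assumes a hand-written Thompson-style NFA for (a|b)*a with states 0 to 8; that NFA, the edge encoding, and all names are assumptions made for this illustration rather than the figures from the slides.

```java
import java.util.*;

class SubsetConstruction {
    static final char EPS = 0;   // marker for ε-edges (an encoding chosen for this sketch)
    // Hypothetical NFA for (a|b)*a; each row is {from, symbol, to}.
    static final int[][] EDGES = {
        {0, EPS, 1}, {0, EPS, 7}, {1, EPS, 2}, {1, EPS, 4},
        {2, 'a', 3}, {4, 'b', 5}, {3, EPS, 6}, {5, EPS, 6},
        {6, EPS, 1}, {6, EPS, 7}, {7, 'a', 8}
    };
    static final int NFA_START = 0, NFA_FINAL = 8;
    static final char[] ALPHABET = {'a', 'b'};

    // States reachable from any state in `from` by one edge labelled `symbol`.
    static Set<Integer> move(Set<Integer> from, char symbol) {
        Set<Integer> out = new TreeSet<>();
        for (int[] e : EDGES)
            if (e[1] == symbol && from.contains(e[0])) out.add(e[2]);
        return out;
    }

    // ε-closure: keep adding ε-successors until nothing new appears.
    static Set<Integer> closure(Set<Integer> states) {
        Set<Integer> result = new TreeSet<>(states);
        Set<Integer> frontier = result;
        while (true) {
            Set<Integer> next = move(frontier, EPS);
            next.removeAll(result);
            if (next.isEmpty()) return result;
            result.addAll(next);
            frontier = next;
        }
    }

    // Builds the DFA transition table; every DFA state is a set of NFA states.
    static Map<Set<Integer>, Map<Character, Set<Integer>>> build() {
        Map<Set<Integer>, Map<Character, Set<Integer>>> dfa = new LinkedHashMap<>();
        Deque<Set<Integer>> unmarked = new ArrayDeque<>();
        Set<Integer> start = closure(Collections.singleton(NFA_START));
        dfa.put(start, new TreeMap<Character, Set<Integer>>());
        unmarked.add(start);
        while (!unmarked.isEmpty()) {
            Set<Integer> T = unmarked.remove();
            for (char a : ALPHABET) {
                Set<Integer> U = closure(move(T, a));
                if (U.isEmpty()) continue;              // no transition on a
                if (!dfa.containsKey(U)) {              // a DFA state not seen before
                    dfa.put(U, new TreeMap<Character, Set<Integer>>());
                    unmarked.add(U);
                }
                dfa.get(T).put(a, U);
            }
        }
        return dfa;
    }

    public static void main(String[] args) {
        for (Map.Entry<Set<Integer>, Map<Character, Set<Integer>>> e : build().entrySet()) {
            boolean accepting = e.getKey().contains(NFA_FINAL);
            System.out.println(e.getKey() + (accepting ? " (accepting)" : "") + " -> " + e.getValue());
        }
    }
}
```

A DFA state is accepting when its set contains the NFA's final state. For this NFA the construction yields three DFA states.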
Example
NFA
DFA
Algorithm 3.39
Π = {F, S-F};
do begin
    for each group G in Π do begin
        partition G into subgroups such that two states s and t of G are in the same subgroup iff for all input symbols a, s and t have transitions on a to states in the same group of Π;
        replace G in Πnew by the set of all subgroups formed;
    end
    if (Πnew == Π) return;
    Π = Πnew;
end;
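The partition refinement can be sketched in Java as below. This is a compact variant, not a line-by-line transcription: it refines all groups in one pass by giving each state a signature (its own group plus the groups of its targets). The five-state DFA for (a|b)*abb is an assumption made for illustration, as are all names.

```java
import java.util.*;

class DfaMinimizer {
    // Hypothetical DFA for (a|b)*abb with states A..E numbered 0..4; E is accepting.
    // MOVE[s][0] is the target on a, MOVE[s][1] the target on b.
    static final int[][] MOVE = {
        {1, 2},   // A
        {1, 3},   // B
        {1, 2},   // C
        {1, 4},   // D
        {1, 2},   // E
    };
    static final boolean[] ACCEPTING = {false, false, false, false, true};

    // Returns group[s], the index of the group that state s ends up in.
    static int[] minimize() {
        int n = MOVE.length;
        int[] group = new int[n];
        for (int s = 0; s < n; s++) group[s] = ACCEPTING[s] ? 0 : 1;   // initial partition {F, S-F}
        while (true) {
            // States stay together iff they share a group and, for every symbol,
            // their targets lie in the same group.
            Map<List<Integer>, Integer> signatureToGroup = new LinkedHashMap<>();
            int[] next = new int[n];
            for (int s = 0; s < n; s++) {
                List<Integer> sig = new ArrayList<>();
                sig.add(group[s]);
                for (int target : MOVE[s]) sig.add(group[target]);
                Integer g = signatureToGroup.get(sig);
                if (g == null) {
                    g = signatureToGroup.size();   // new groups are numbered in order of first use
                    signatureToGroup.put(sig, g);
                }
                next[s] = g;
            }
            if (Arrays.equals(next, group)) return group;   // partition unchanged: done
            group = next;
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(minimize()));   // prints [0, 1, 0, 2, 3]
    }
}
```

States A and C receive the same group number, so they can be merged into a single state; the other three states remain distinct.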
Example
State   a   b
AC      B   AC
B       B   D
D       B   E
E       B   AC
Construct a DFA Directly from a Regular Expression
Implementation Issues
Input buffering
– Read in characters one by one
• Unable to look ahead
• Inefficient
– Read in a whole string and store it in memory
• Requires a big buffer
– Buffer pairs
Buffer Pairs
Use Sentinels
Lexical Analyzer
Lex
A tool for automatically generating lexical analyzers
Lex Specifications
declarations
%%
translation rules
%%
auxiliary procedures

Translation rules:
p1 {action1}
p2 {action2}
...
pn {actionn}
Lex Regular Expressions
yylex()
yylex() {
    switch (pattern_match()) {
        case 1: {action1}
        case 2: {action2}
        ...
        case n: {actionn}
    }
}
Example
DIGIT   [0-9]
ID      [a-z][a-z0-9]*
%%
{DIGIT}+                                { printf("An integer:%s(%d)\n", yytext, atoi(yytext)); }
{DIGIT}+"."{DIGIT}*                     { printf("A float: %s (%g)\n", yytext, atof(yytext)); }
if|then|begin|end|procedure|function    { printf("A keyword: %s\n", yytext); }
{ID}                                    { printf("An identifier %s\n", yytext); }
"+"|"-"|"*"|"/"                         { printf("An operator %s\n", yytext); }
"{"[^}\n]*"}"                           { /* eat up one-line comments */ }
[ \t\n]+                                { /* eat up white space */ }
.                                       { printf("Unrecognized character: %s\n", yytext); }
%%
int main(int argc, char *argv[]) {
    ++argv, --argc;   /* skip over the program name */
    if (argc > 0) yyin = fopen(argv[0], "r");
    else yyin = stdin;
    yylex();
}