8/14/2019 Final Presentation (Minor)
06/19/09 Copyright Ankita
Source program -> Lexical analyzer -> Syntax analyzer -> Semantic analyzer -> Intermediate code generator -> Code optimizer -> Code generator -> Target program
Lexical Analyzer
- Groups the sequence of characters into lexemes, the smallest meaningful entities in a language (keywords, identifiers, constants)
- Characters read from a file are buffered, which helps decrease latency due to I/O; the lexical analyzer manages the buffer
- Makes use of the theory of regular languages and finite state machines
Parser
- Converts a linear structure (a sequence of tokens) into a hierarchical, tree-like structure (an AST)
- The parser imposes the syntax rules of the language
- Work should be linear in the size of the input (else unusable)
- Type consistency cannot be checked in this phase
- Deterministic context-free languages and pushdown automata form the basis
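As a rough illustration of turning a linear token sequence into a tree, here is a minimal recursive-descent sketch in Python. The small expression grammar, the nested-tuple AST, and the name `parse` are assumptions made for this example; the slides do not prescribe a grammar or a parsing method.

```python
# Hypothetical grammar (an assumption, not taken from the slides):
#   expr   -> term ('+' term)*
#   term   -> factor ('*' factor)*
#   factor -> NUM | ID
def parse(tokens):
    """Turn a flat list of (kind, value) tokens into a nested-tuple AST."""
    pos = 0

    def peek():
        # Kind of the next unread token, or "EOF" when the input is exhausted.
        return tokens[pos][0] if pos < len(tokens) else "EOF"

    def factor():
        nonlocal pos
        kind, value = tokens[pos] if pos < len(tokens) else ("EOF", None)
        if kind not in ("NUM", "ID"):
            raise SyntaxError(f"unexpected token {kind!r} at position {pos}")
        pos += 1
        return (kind, value)

    def term():
        nonlocal pos
        node = factor()
        while peek() == "*":
            pos += 1
            node = ("*", node, factor())
        return node

    def expr():
        nonlocal pos
        node = term()
        while peek() == "+":
            pos += 1
            node = ("+", node, term())
        return node

    tree = expr()
    if peek() != "EOF":
        raise SyntaxError(f"unexpected token {peek()!r} at position {pos}")
    return tree


tokens = [("NUM", 31), ("+", None), ("NUM", 28), ("*", None), ("ID", "x")]
print(parse(tokens))
```

Each token is consumed exactly once, so the work in this sketch grows linearly with the input, and nothing in it looks at types.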
Semantic Analysis
- Calculates the program's meaning
- Rules of the language are checked (variable declaration, type checking)
- Type checking is also needed for code generation (code gen for a + b depends on the types of a and b)
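To make the a + b remark concrete, the following sketch checks operand types and then picks an instruction based on the result. The type names and the mnemonics `IADD` and `FADD` are invented for illustration and do not come from the slides or from any particular target machine.

```python
def check_add(type_a, type_b):
    """Return the result type of a + b, or raise TypeError if the check fails."""
    if type_a != type_b:
        raise TypeError(f"cannot add {type_a} and {type_b}")
    if type_a not in ("int", "float"):
        raise TypeError(f"'+' is not defined for {type_a}")
    return type_a


def gen_add(type_a, type_b):
    """Pick a (hypothetical) instruction mnemonic from the checked operand type."""
    result_type = check_add(type_a, type_b)
    return {"int": "IADD", "float": "FADD"}[result_type]


print(gen_add("int", "int"))
print(gen_add("float", "float"))
```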
Intermediate Code Generation
- Makes it easy to port the compiler to other architectures
- Can also be the basis for interpreters
- Enables optimizations that are not machine specific
Intermediate Code Optimization
- Constant propagation, dead code elimination, common sub-expression elimination, strength reduction, etc.
- Based on dataflow analysis: properties that are independent of execution paths
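A small sketch of constant propagation with folding may help here. It is restricted to straight-line three-address code (a single basic block), so it sidesteps the dataflow analysis across branches that a real optimizer needs. The instruction format `(dest, op, arg1, arg2)` and the function name are assumptions for this example.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}


def propagate_constants(code):
    """Constant propagation and folding over straight-line three-address code.

    Each instruction is (dest, op, arg1, arg2); op "copy" ignores arg2.
    Arguments are ints (constants) or strings (variable names).
    """
    known = {}  # variable name -> constant value known at this point
    out = []
    for dest, op, a, b in code:
        # Replace variables whose value is a known constant.
        a = known.get(a, a) if isinstance(a, str) else a
        b = known.get(b, b) if isinstance(b, str) else b
        if op == "copy" and isinstance(a, int):
            known[dest] = a
            out.append((dest, "copy", a, None))
        elif op in OPS and isinstance(a, int) and isinstance(b, int):
            known[dest] = OPS[op](a, b)  # fold the operation at compile time
            out.append((dest, "copy", known[dest], None))
        else:
            known.pop(dest, None)  # dest is no longer a known constant
            out.append((dest, op, a, b))
    return out


code = [("x", "copy", 4, None), ("y", "*", "x", 2), ("z", "+", "y", "w")]
print(propagate_constants(code))
```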
Native Code Generation
- Intermediate code is translated into native code
- Register allocation, instruction selection
Native Code Optimization
- Peephole optimizations: a small window is optimized at a time
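The peephole idea can be sketched with a two-instruction window. The three-field instruction tuples and the mnemonics `load`, `store`, and `add` form a made-up mini instruction set for this example, and the rewrite rules assume straight-line code with no jump targets between the instructions in the window.

```python
def peephole(instrs):
    """Slide a small window over the code and drop redundant instructions."""
    out = []
    for ins in instrs:
        if out:
            prev = out[-1]
            # "store r, x" followed by "load r, x": the register already holds x.
            if prev[0] == "store" and ins[0] == "load" and prev[1:] == ins[1:]:
                continue
        # "add r, 0" leaves the register unchanged.
        if ins[0] == "add" and ins[2] == 0:
            continue
        out.append(ins)
    return out


code = [
    ("load", "r1", "a"),
    ("add", "r1", 0),
    ("store", "r1", "x"),
    ("load", "r1", "x"),
    ("add", "r1", 5),
]
print(peephole(code))
```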
- Informal sketch of lexical analysis
  - Identifies tokens in the input string
- Issues in lexical analysis
  - Lookahead
  - Ambiguities
- Specifying lexers
  - Regular expressions
- Simplifies the design of the compiler
  - In LL(1) or LR(1) parsing, the 1 indicates that one lookahead symbol (the next input symbol) is used to decide the next parsing action
- Provides efficient implementation
  - Systematic techniques to implement lexical analyzers by hand or automatically from specifications
  - Stream buffering methods to scan input
- Improves portability
  - Non-standard symbols and alternate character encodings can be normalized
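[Figure: the lexical analyzer reads the source program and, each time the parser asks it to "get next token", returns a (token, tokenval) pair; the symbol table and error reporting are connected to both the lexical analyzer and the parser]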
[Figure: the lexical analyzer reads the input y := 31 + 28*x and passes each token, together with its tokenval (token attribute), to the parser]
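One way to picture the (token, tokenval) stream for this input is the sketch below, which uses Python's `re` module. The token names `id`, `num`, `assign`, and `op`, and the choice to carry the operator text as the attribute, are assumptions for this example rather than the figure's exact encoding.

```python
import re

# Hypothetical token classes for the sample input; order matters for the alternation.
TOKEN_SPEC = [
    ("num", r"\d+"),
    ("id", r"[A-Za-z][A-Za-z0-9]*"),
    ("assign", r":="),
    ("op", r"[+*]"),
    ("ws", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Yield (token, tokenval) pairs; whitespace is skipped."""
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"illegal character {text[pos]!r} at offset {pos}")
        pos = m.end()
        kind = m.lastgroup  # name of the alternative that matched
        if kind == "ws":
            continue
        value = int(m.group()) if kind == "num" else m.group()
        yield (kind, value)


print(list(tokenize("y := 31 + 28*x")))
```

A parser would normally pull these pairs one at a time (the "get next token" request) instead of building the whole list first; the generator form allows either.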
- A token is a classification of lexical units
  - For example: id and num
- Lexemes are the specific character strings that make up a token
  - For example: abc and 123
- Patterns are rules describing the set of lexemes belonging to a token
  - For example: "letter followed by letters and digits" and "non-empty sequence of digits"
- A token is a syntactic category
  - In English: noun, verb, adjective, ...
  - In a programming language: Identifier, Integer, Single-Float, Double-Float, operator (perhaps single or multiple character), Comment, Keyword, Whitespace, string constant, ...
- This is the job of the Lexer, the first pass over the source code. The Lexer classifies program substrings according to role.
- We also tack on to each token the location (line and column) of where it ends, so we can report errors in the source code by location.
- We read tokens until we get to an end-of-file.
Output of lexical analysis is a stream of tokens . . .
This stream is the input to the next pass, the parser.
Parser makes use of token distinctions.
- Define a finite set of tokens
  - Tokens describe all items of interest
  - Choice of tokens depends on the language and the design of the parser
- Useful tokens for this expression: Integer, Keyword, operator, Identifier
- Describe which strings belong to each token category
- Recall:
  - Identifier: strings of letters or digits, starting with a letter
  - Integer: a non-empty string of digits
  - Keyword: "else" or "if" or "begin" or ...
  - Whitespace: a non-empty sequence of blanks, newlines, and tabs
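The four descriptions above translate fairly directly into patterns. The sketch below classifies a complete lexeme; the function name `classify` is hypothetical, and the keyword set holds only the three keywords the slide names, since its list is open-ended.

```python
import re

# Only the keywords named on the slide; the real list is language specific.
KEYWORDS = {"else", "if", "begin"}

PATTERNS = [
    ("Integer", re.compile(r"[0-9]+")),                  # non-empty string of digits
    ("Identifier", re.compile(r"[A-Za-z][A-Za-z0-9]*")),  # letters or digits, starting with a letter
    ("Whitespace", re.compile(r"[ \n\t]+")),              # blanks, newlines, and tabs
]


def classify(lexeme):
    """Return the token category of a complete lexeme, or None if nothing matches."""
    if lexeme in KEYWORDS:  # keywords also look like identifiers, so test them first
        return "Keyword"
    for name, pattern in PATTERNS:
        if pattern.fullmatch(lexeme):
            return name
    return None


for lexeme in ["if", "x1", "123", " \t\n", "1x"]:
    print(repr(lexeme), classify(lexeme))
```

Checking keywords before identifiers is one simple way to resolve the overlap between the two categories; a lexer could equally match an identifier first and then look it up in a keyword table.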
Is it as easy as it sounds?
Not quite!
Look at some history . . .
- FORTRAN rule: whitespace is insignificant
  - E.g., VAR1 is the same as VA R1
- Footnote: the FORTRAN whitespace rule was motivated by the inaccuracy of punch card operators
- The goal of lexical analysis is to
  - Partition the input string into lexemes
  - Identify the token type, and perhaps the value, of each lexeme
- Left-to-right scan => lookahead sometimes required
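A minimal case where a left-to-right scan needs lookahead is telling `=` apart from `==`. In the sketch below, the operator pair, the token names `assign` and `eq`, and the function name are assumptions chosen for illustration; the slides do not name a specific language.

```python
def scan_operator(text, pos):
    """Return (token, new_pos) for '=' or '==' starting at text[pos].

    One character of lookahead decides between the two lexemes.
    """
    if text[pos] != "=":
        raise ValueError(f"expected '=' at offset {pos}")
    if pos + 1 < len(text) and text[pos + 1] == "=":  # peek at the next character
        return ("eq", pos + 2)
    return ("assign", pos + 1)


print(scan_operator("a==b", 1))
print(scan_operator("a=b", 1))
```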
- There are several formalisms for specifying tokens
- Regular languages are the most popular
  - Simple and useful theory
  - Easy to understand
  - Efficient implementations are possible
  - Almost powerful enough (can't do nested comments)
  - Popular, almost automatic tools exist to write the programs
- The standard notation for regular languages is regular expressions
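- A single character denotes a set of one string: $'c' = \{ \text{"c"} \}$
- The epsilon character denotes a set of one 0-length string: $\varepsilon = \{ \text{""} \}$
- The empty set is $\{\} = \emptyset$, not the same as $\varepsilon$: $\mathrm{Size}(\emptyset) = 0$, $\mathrm{Size}(\varepsilon) = 1$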
- Union: if A and B are REs then $A + B = \{ s \mid s \in A \text{ or } s \in B \}$
- Concatenation of sets (concatenation of strings): $AB = \{ ab \mid a \in A \text{ and } b \in B \}$
- Iteration (Kleene closure): $A^* = \bigcup_{i \ge 0} A^i$, where $A^i = A \ldots A$ ($i$ times)
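These three operations can be tried out on small finite languages represented as Python sets of strings. Because $A^*$ is infinite whenever A contains a non-empty string, the sketch only builds the union of $A^0$ through $A^n$; that bound and all function names are assumptions for this example.

```python
def union(A, B):
    """A + B: every string that is in A or in B."""
    return A | B


def concat(A, B):
    """AB: every string ab with a in A and b in B."""
    return {a + b for a in A for b in B}


def power(A, i):
    """A^i: A concatenated with itself i times; A^0 is {""}."""
    result = {""}
    for _ in range(i):
        result = concat(result, A)
    return result


def star_up_to(A, n):
    """Finite approximation of A*: the union of A^0 .. A^n."""
    result = set()
    for i in range(n + 1):
        result |= power(A, i)
    return result


print(sorted(star_up_to({"a", "b"}, 2)))
print(concat(set(), {"a"}), concat({""}, {"a"}))
```

The last line shows the difference between the empty set and epsilon: concatenating with the empty set yields the empty set, while concatenating with {""} leaves the other language unchanged.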
- Translate regular expressions to NFA
- Translate NFA to an efficient DFA
[Figure: regular expressions -> NFA -> DFA; tokens are recognized by simulating either the NFA or the DFA, and the NFA-to-DFA step is marked optional]
An NFA is a 5-tuple $(S, \Sigma, \delta, s_0, F)$ where
- $S$ is a finite set of states
- $\Sigma$ is a finite set of symbols, the alphabet
- $\delta$ is a mapping from $S \times \Sigma$ to a set of states
- $s_0 \in S$ is the start state
- $F \subseteq S$ is the set of accepting (or final) states
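An NFA can be represented diagrammatically by a labeled directed graph called a transition graph
[Figure: transition graph with start state 0; 0 -a-> 1 -b-> 2 -b-> 3, with self-loops on state 0 labeled a and b]
$S = \{0, 1, 2, 3\}$
$\Sigma = \{a, b\}$
$s_0 = 0$
$F = \{3\}$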
The mapping $\delta$ of an NFA can be represented in a transition table

State | Input a | Input b
0     | {0, 1}  | {0}
1     |         | {2}
2     |         | {3}

$\delta(0, a) = \{0, 1\}$
$\delta(0, b) = \{0\}$
$\delta(1, b) = \{2\}$
$\delta(2, b) = \{3\}$
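The transition table can be simulated directly by tracking the set of states the NFA might be in. The sketch below assumes start state 0 and accepting set {3}, and treats missing table entries as "no move". It omits epsilon-closure because this table has no epsilon-transitions. The dictionary encoding and function name are choices made for this example.

```python
# Transition table from the slide; missing entries mean "no move".
DELTA = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
    (2, "b"): {3},
}
START, ACCEPTING = 0, {3}


def nfa_accepts(word):
    """Track every state the NFA could be in after each input symbol."""
    current = {START}
    for symbol in word:
        current = {t for s in current for t in DELTA.get((s, symbol), set())}
    return bool(current & ACCEPTING)


for word in ["abb", "babb", "ab", ""]:
    print(repr(word), nfa_accepts(word))
```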
A deterministic finite automaton is a special case of an NFA
- No state has an $\varepsilon$-transition
- For each state s and input symbol a there is at most one edge labeled a leaving s
- Each entry in the transition table is a single state
- At most one path exists to accept a string
- Simulation algorithm is simple
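State  | Input a | Input b
{0}    | {0,1}   | {0}
{0,1}  | {0,1}   | {0,2}
{0,2}  | {0,1}   | {0,3}
{0,3}  | {0,1}   | {0}

$\delta(\{0\}, a) = \{0,1\}$
$\delta(\{0\}, b) = \{0\}$
$\delta(\{0,1\}, a) = \{0,1\}$
$\delta(\{0,1\}, b) = \{0,2\}$
$\delta(\{0,2\}, a) = \{0,1\}$
$\delta(\{0,2\}, b) = \{0,3\}$
$\delta(\{0,3\}, a) = \{0,1\}$
$\delta(\{0,3\}, b) = \{0\}$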
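A table like this one can be produced mechanically by the subset construction, where each DFA state is a set of NFA states. The sketch below hard-codes the four-state NFA with transitions 0 -a-> {0,1}, 0 -b-> {0}, 1 -b-> {2}, 2 -b-> {3}. It skips epsilon-closure, which that NFA does not need. The names and the worklist structure are assumptions for this example.

```python
from collections import deque

# The example NFA (no epsilon-transitions); missing entries mean "no move".
NFA_DELTA = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}, (2, "b"): {3}}
ALPHABET = ("a", "b")


def subset_construction(start=0):
    """Build a DFA transition table whose states are frozensets of NFA states."""
    table = {}
    queue = deque([frozenset({start})])
    while queue:
        states = queue.popleft()
        if states in table:
            continue
        table[states] = {}
        for symbol in ALPHABET:
            # Union of the NFA moves from every member of the current subset.
            target = frozenset(
                t for s in states for t in NFA_DELTA.get((s, symbol), ())
            )
            table[states][symbol] = target
            if target not in table:
                queue.append(target)
    return table


for states, row in subset_construction().items():
    print(sorted(states), {sym: sorted(tgt) for sym, tgt in row.items()})
```

Only subsets reachable from the start state are generated, which is why four DFA states appear here rather than all sixteen subsets of {0,1,2,3}.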
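[Figure: transition graph of the DFA with start state 0 and accepting state 3; edges: 0 -a-> 1, 0 -b-> 0, 1 -a-> 1, 1 -b-> 2, 2 -a-> 1, 2 -b-> 3, 3 -a-> 1, 3 -b-> 0]
A DFA that accepts (a|b)*abb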
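Simulating a DFA takes one table lookup per input symbol. In the sketch below, states 0 to 3 stand for the subsets {0}, {0,1}, {0,2}, and {0,3}; that renumbering, the dictionary layout, and the function name are assumptions for this example.

```python
# DFA states 0..3 stand for the subsets {0}, {0,1}, {0,2}, {0,3} (assumed renumbering).
DFA = {
    0: {"a": 1, "b": 0},
    1: {"a": 1, "b": 2},
    2: {"a": 1, "b": 3},
    3: {"a": 1, "b": 0},
}


def dfa_accepts(word, start=0, accepting=frozenset({3})):
    """Follow exactly one transition per symbol; reject symbols outside the alphabet."""
    state = start
    for symbol in word:
        if symbol not in DFA[state]:
            return False
        state = DFA[state][symbol]
    return state in accepting


for word in ["abb", "babb", "ab", "abba"]:
    print(repr(word), dfa_accepts(word))
```

Unlike the NFA case there is no set of candidate states to maintain, which is what makes the DFA simulation the simpler of the two.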