Final Presentation (Minor)

8/14/2019 Final Presentation (Minor)

1/32

06/19/09 1Copyright Ankita


2/32

Lexical analyzer

Syntax analyzer

Semantic analyzer

Intermediatecode generator

Code optimizer

Code generator

Source program

Target program

Lexical Analyzer

Group sequence of characters into lexemes smallest meaningful entity in a language(keywords, identifiers, constants)

Characters read from a file are buffered

helps decrease latency due to i/o. Lexicalanalyzer manages the buffer

Makes use of the theory of regular languagesand finite state machines



3/32

Lexical analyzer

Syntax analyzer

Semantic analyzer


Code optimizer

Code generator

Source program

Target program

Parser

Convert a linear structure sequence oftokens to a hierarchical tree-like structure an AST

The parser imposes the syntax rules of the

language Work should be linear in the size of theinput (else unusable) type consistencycannot be checked in this phase

Deterministic context free languages andpushdown automata for the basis



4/32

Lexical analyzer

Syntax analyzer

Semantic analyzer


Code optimizer

Code generator

Source program

Target program

Semantic Analysis

Calculates the programs meaning

Rules of the language are checked (variabledeclaration, type checking)

Type checking also needed for codegeneration (code gen for a + b depends on thetype of a and b)



5/32

Lexical analyzer

Syntax analyzer

Semantic analyzer


Code optimizer

Code generator

Source program

Target program

Intermediate Code Generation

Makes it easy to port compiler to otherarchitectures

Can also be the basis for interpreters

Enables optimizations that are not machinespecific



6/32

Lexical analyzer

Syntax analyzer

Semantic analyzer


Code optimizer

Code generator

Source program

Target program

Intermediate Code Optimization

Constant propagation, dead codeelimination, common sub-expressionelimination, strength reduction, etc.

Based on dataflow analysis properties that

are independent of execution paths



7/32

Lexical analyzer

Syntax analyzer

Semantic analyzer


Code optimizer

Code generator

Source program

Target program

Native Code Generation

Intermediate code is translated into nativecode

Register allocation, instruction selection

Native Code Optimization

Peephole optimizations small window is

optimized at a time



8/32

Informal sketch of lexical analysis Identifies tokens in input string

Issues in lexical analysis Lookahead

Ambiguities

Specifying lexers Regular expression



9/32

Simplifies the design of the compiler LL(1) or LR(1) parsing with 1 indicates that the next

input symbol is used to decide the next parsingprocess.

Provides efficient implementation Systematic techniques to implement lexical analyzers

by hand or automatically from specifications Stream buffering methods to scan input

Improves portability Non-standard symbols and alternate character

encodings can be normalized.



10/32

Lexical

AnalyzerParser

SourceProgram

Token,tokenval

Symbol Table

Get nexttoken

error error



11/32

Lexical analyzer

y := 31 + 28*x

Parser

token

tokenval

(token attribute)



12/3206/19/09 12Copyright Ankita


13/32

A token is a classification of lexical units For example: id and num

Lexemes are the specific character strings that

make up a token For example: abc and 123

Patterns are rules describing the set of lexemes

belonging to a token For example: letter followed by letters and digits andnon-empty sequence of digits



14/32

A syntactic category In English:

noun, verb, adjective,

In a programming language:

Identifier, Integer, Single-Float, Double-Float, operator(perhaps single or multiple character), Comment,

Keyword, Whitespace, string constant,



15/32

This is the job of the Lexer, the first pass overthe source code. The Lexer classifies programsubstrings according to role.

We also tack on to each token the location (lineand column) of where it ends, so we can reporterrors in the source code by location.

We read tokens until we get to an end-of-file.



16/32

Output of lexical analysis is a stream of tokens . . .

This stream is the input to the next pass, the parser.

Parser makes use of token distinctions.



17/32

Define a finite set of tokens Tokens describe all items of interest

Choice of tokens depends on language, design of

parser\

Useful tokens for this expression:Integer, Keyword, operator, Identifier



18/32

Describe which strings belong to each tokencategory

Recall: Identifier: strings of letters or digits, starting with a

letter Integer: a non-empty string of digits

Keyword: else or if or begin or Whitespace: a non-empty sequence of blanks, newlines,

and tabs



19/32

Is it as easy as it sounds?

Not quite!

Look at some history . . .



20/32

FORTRAN rule: Whitespace is insignificant

E.g.,VAR1is the same asVA R1Footnote: FORTRAN whitespace rulemotivated by inaccuracy of punch card

operators



21/32

The goal of lexical analysis is to Partition the input string into lexemes

Identify the token type, and perhaps the value of

each lexeme .

Left-to-right scan => lookahead sometimes

required.



22/32

There are several formalisms for specifying tokens.

Regular languages are the most popular

Simple and useful theory Easy to understand

Efficient implementations are possible

Almost powerful enough (Cant do nested comments)

Popular almost automatic tools to write programs.

The standard notation for regular languages is regularexpressions.



23/32

Single character denotes a set of one string

Epsilon character denotes a set of one 0-lengthstring

Empty set is {}= not the same as . Size()=0. Size(e) =1

{ }' ' " "c c=

{ }"" =



24/32

Union: If A and B are REs then

Concatenation of Sets Concatenation ofstrings

Iteration (Kleene closure)

{ }| orA B s s A s B+ =

{ }| and AB ab a A b B=

*

0where ... times ...i i

i A A A A i A

= =U



25/32

Translate regular expressions to NFA

Translate NFA to an efficient DFA

regular

expressions NFA DFA

Simulate NFAto recognize

tokens

Simulate DFAto recognize

tokens

Optional



26/32

An NFA is a 5-tuple (S, , , s0, F) where

S is a finite set of states

is a finite set of symbols, the alphabet is a mapping from S to a set of statess0S is the start stateFS is the set of accepting (or final) states



27/32

An NFA can be diagrammatically representedby a labeled directed graph called a transition

graph

0start a 1 32

b b

a

b

S = {0,1,2,3}

= {a,b}s0 = 0

F= {3}



28/32

The mapping of an NFA can be representedin a transition table

StateInput

a

Input

b

0 {0, 1} {0}1 {2}

2 {3}

(0,a) = {0,1}

(0,b) = {0}(1,b) = {2}(2,b) = {3}



29/32

A deterministic finite automaton is a special caseof an NFA No state has an -transition

For each state s and input symbol a there is at mostone edge labeled a leaving s

Each entry in the transition table is a singlestate At most one path exists to accept a string Simulation algorithm is simple



30/32

State

Input

a

Input

b

0 {0, 1} {0}

{0,1} {0,1} {0,2}

{0,2} {0,1} {0,3}

{0,3} {0,1} {0}

(0,a) = {0,1}(0,b) = {0}({0,1},a) = {0,1}({0,1},b) = {0,2}({0,2},a)= {0,1}

({0,2},b)= {0,3}({0,3},a)= {0,1}({0,3},b)= {0}



31/32

0start a 1 32

b b

bb

a

a

a

A DFA that accepts (ab)*abb



32/32

Documents

Final Presentation (Minor)