Final Presentation (Minor)

  • Upload
    ankita

  • View
    213

  • Download
    0

Embed Size (px)

Citation preview

  • 8/14/2019 Final Presentation (Minor)

    1/32

    06/19/09 1Copyright Ankita

  • 8/14/2019 Final Presentation (Minor)

    2/32

    Lexical analyzer

    Syntax analyzer

    Semantic analyzer

    Intermediatecode generator

    Code optimizer

    Code generator

    Source program

    Target program

    Lexical Analyzer

    Group sequence of characters into lexemes smallest meaningful entity in a language(keywords, identifiers, constants)

    Characters read from a file are buffered

    helps decrease latency due to i/o. Lexicalanalyzer manages the buffer

    Makes use of the theory of regular languagesand finite state machines

    06/19/09 2Copyright Ankita

  • 8/14/2019 Final Presentation (Minor)

    3/32

    Lexical analyzer

    Syntax analyzer

    Semantic analyzer

    Intermediatecode generator

    Code optimizer

    Code generator

    Source program

    Target program

    Parser

    Convert a linear structure sequence oftokens to a hierarchical tree-like structure an AST

    The parser imposes the syntax rules of the

    language Work should be linear in the size of theinput (else unusable) type consistencycannot be checked in this phase

    Deterministic context free languages andpushdown automata for the basis

    06/19/09 3Copyright Ankita

  • 8/14/2019 Final Presentation (Minor)

    4/32

    Lexical analyzer

    Syntax analyzer

    Semantic analyzer

    Intermediatecode generator

    Code optimizer

    Code generator

    Source program

    Target program

    Semantic Analysis

    Calculates the programs meaning

    Rules of the language are checked (variabledeclaration, type checking)

    Type checking also needed for codegeneration (code gen for a + b depends on thetype of a and b)

    06/19/09 4Copyright Ankita

  • 8/14/2019 Final Presentation (Minor)

    5/32

    Lexical analyzer

    Syntax analyzer

    Semantic analyzer

    Intermediatecode generator

    Code optimizer

    Code generator

    Source program

    Target program

    Intermediate Code Generation

    Makes it easy to port compiler to otherarchitectures

    Can also be the basis for interpreters

    Enables optimizations that are not machinespecific

    06/19/09 5Copyright Ankita

  • 8/14/2019 Final Presentation (Minor)

    6/32

    Lexical analyzer

    Syntax analyzer

    Semantic analyzer

    Intermediatecode generator

    Code optimizer

    Code generator

    Source program

    Target program

    Intermediate Code Optimization

    Constant propagation, dead codeelimination, common sub-expressionelimination, strength reduction, etc.

    Based on dataflow analysis properties that

    are independent of execution paths

    06/19/09 6Copyright Ankita

  • 8/14/2019 Final Presentation (Minor)

    7/32

    Lexical analyzer

    Syntax analyzer

    Semantic analyzer

    Intermediatecode generator

    Code optimizer

    Code generator

    Source program

    Target program

    Native Code Generation

    Intermediate code is translated into nativecode

    Register allocation, instruction selection

    Native Code Optimization

    Peephole optimizations small window is

    optimized at a time

    06/19/09 7Copyright Ankita

  • 8/14/2019 Final Presentation (Minor)

    8/32

    Informal sketch of lexical analysis Identifies tokens in input string

    Issues in lexical analysis Lookahead

    Ambiguities

    Specifying lexers Regular expression

    06/19/09 8Copyright Ankita

  • 8/14/2019 Final Presentation (Minor)

    9/32

    Simplifies the design of the compiler LL(1) or LR(1) parsing with 1 indicates that the next

    input symbol is used to decide the next parsingprocess.

    Provides efficient implementation Systematic techniques to implement lexical analyzers

    by hand or automatically from specifications Stream buffering methods to scan input

    Improves portability Non-standard symbols and alternate character

    encodings can be normalized.

    06/19/09 9Copyright Ankita

  • 8/14/2019 Final Presentation (Minor)

    10/32

    Lexical

    AnalyzerParser

    SourceProgram

    Token,tokenval

    Symbol Table

    Get nexttoken

    error error

    06/19/09 10Copyright Ankita

  • 8/14/2019 Final Presentation (Minor)

    11/32

    Lexical analyzer

    y := 31 + 28*x

    Parser

    token

    tokenval

    (token attribute)

    06/19/09 11Copyright Ankita

  • 8/14/2019 Final Presentation (Minor)

    12/3206/19/09 12Copyright Ankita

  • 8/14/2019 Final Presentation (Minor)

    13/32

    A token is a classification of lexical units For example: id and num

    Lexemes are the specific character strings that

    make up a token For example: abc and 123

    Patterns are rules describing the set of lexemes

    belonging to a token For example: letter followed by letters and digits andnon-empty sequence of digits

    06/19/09 13Copyright Ankita

  • 8/14/2019 Final Presentation (Minor)

    14/32

    A syntactic category In English:

    noun, verb, adjective,

    In a programming language:

    Identifier, Integer, Single-Float, Double-Float, operator(perhaps single or multiple character), Comment,

    Keyword, Whitespace, string constant,

    06/19/09 14Copyright Ankita

  • 8/14/2019 Final Presentation (Minor)

    15/32

    This is the job of the Lexer, the first pass overthe source code. The Lexer classifies programsubstrings according to role.

    We also tack on to each token the location (lineand column) of where it ends, so we can reporterrors in the source code by location.

    We read tokens until we get to an end-of-file.

    06/19/09 15Copyright Ankita

  • 8/14/2019 Final Presentation (Minor)

    16/32

    Output of lexical analysis is a stream of tokens . . .

    This stream is the input to the next pass, the parser.

    Parser makes use of token distinctions.

    06/19/09 16Copyright Ankita

  • 8/14/2019 Final Presentation (Minor)

    17/32

    Define a finite set of tokens Tokens describe all items of interest

    Choice of tokens depends on language, design of

    parser\

    Useful tokens for this expression:Integer, Keyword, operator, Identifier

    06/19/09 17Copyright Ankita

  • 8/14/2019 Final Presentation (Minor)

    18/32

    Describe which strings belong to each tokencategory

    Recall: Identifier: strings of letters or digits, starting with a

    letter Integer: a non-empty string of digits

    Keyword: else or if or begin or Whitespace: a non-empty sequence of blanks, newlines,

    and tabs

    06/19/09 18Copyright Ankita

  • 8/14/2019 Final Presentation (Minor)

    19/32

    Is it as easy as it sounds?

    Not quite!

    Look at some history . . .

    06/19/09 19Copyright Ankita

  • 8/14/2019 Final Presentation (Minor)

    20/32

    FORTRAN rule: Whitespace is insignificant

    E.g.,VAR1is the same asVA R1Footnote: FORTRAN whitespace rulemotivated by inaccuracy of punch card

    operators

    06/19/09 20Copyright Ankita

  • 8/14/2019 Final Presentation (Minor)

    21/32

    The goal of lexical analysis is to Partition the input string into lexemes

    Identify the token type, and perhaps the value of

    each lexeme .

    Left-to-right scan => lookahead sometimes

    required.

    06/19/09 21Copyright Ankita

  • 8/14/2019 Final Presentation (Minor)

    22/32

    There are several formalisms for specifying tokens.

    Regular languages are the most popular

    Simple and useful theory Easy to understand

    Efficient implementations are possible

    Almost powerful enough (Cant do nested comments)

    Popular almost automatic tools to write programs.

    The standard notation for regular languages is regularexpressions.

    06/19/09 22Copyright Ankita

  • 8/14/2019 Final Presentation (Minor)

    23/32

    Single character denotes a set of one string

    Epsilon character denotes a set of one 0-lengthstring

    Empty set is {}= not the same as . Size()=0. Size(e) =1

    { }' ' " "c c=

    { }"" =

    06/19/09 23Copyright Ankita

  • 8/14/2019 Final Presentation (Minor)

    24/32

    Union: If A and B are REs then

    Concatenation of Sets Concatenation ofstrings

    Iteration (Kleene closure)

    { }| orA B s s A s B+ =

    { }| and AB ab a A b B=

    *

    0where ... times ...i i

    i A A A A i A

    = =U

    06/19/09 24Copyright Ankita

  • 8/14/2019 Final Presentation (Minor)

    25/32

    Translate regular expressions to NFA

    Translate NFA to an efficient DFA

    regular

    expressions NFA DFA

    Simulate NFAto recognize

    tokens

    Simulate DFAto recognize

    tokens

    Optional

    06/19/09 25Copyright Ankita

  • 8/14/2019 Final Presentation (Minor)

    26/32

    An NFA is a 5-tuple (S, , , s0, F) where

    S is a finite set of states

    is a finite set of symbols, the alphabet is a mapping from S to a set of statess0S is the start stateFS is the set of accepting (or final) states

    06/19/09 26Copyright Ankita

  • 8/14/2019 Final Presentation (Minor)

    27/32

    An NFA can be diagrammatically representedby a labeled directed graph called a transition

    graph

    0start a 1 32

    b b

    a

    b

    S = {0,1,2,3}

    = {a,b}s0 = 0

    F= {3}

    06/19/09 27Copyright Ankita

  • 8/14/2019 Final Presentation (Minor)

    28/32

    The mapping of an NFA can be representedin a transition table

    StateInput

    a

    Input

    b

    0 {0, 1} {0}1 {2}

    2 {3}

    (0,a) = {0,1}

    (0,b) = {0}(1,b) = {2}(2,b) = {3}

    06/19/09 28Copyright Ankita

  • 8/14/2019 Final Presentation (Minor)

    29/32

    A deterministic finite automaton is a special caseof an NFA No state has an -transition

    For each state s and input symbol a there is at mostone edge labeled a leaving s

    Each entry in the transition table is a singlestate At most one path exists to accept a string Simulation algorithm is simple

    06/19/09 29Copyright Ankita

  • 8/14/2019 Final Presentation (Minor)

    30/32

    State

    Input

    a

    Input

    b

    0 {0, 1} {0}

    {0,1} {0,1} {0,2}

    {0,2} {0,1} {0,3}

    {0,3} {0,1} {0}

    (0,a) = {0,1}(0,b) = {0}({0,1},a) = {0,1}({0,1},b) = {0,2}({0,2},a)= {0,1}

    ({0,2},b)= {0,3}({0,3},a)= {0,1}({0,3},b)= {0}

    06/19/09 30Copyright Ankita

  • 8/14/2019 Final Presentation (Minor)

    31/32

    0start a 1 32

    b b

    bb

    a

    a

    a

    A DFA that accepts (ab)*abb

    06/19/09 31Copyright Ankita

  • 8/14/2019 Final Presentation (Minor)

    32/32