System Programming Unit-3

Embed Size (px)

Citation preview

  • 8/2/2019 System Programming Unit-3

    1/27

    Debre Berhan University School Of Computing

    PRINCIPLES OF COMPILER DESIGN

    1. Introduction to compilers:-A compiler is a program that reads a program written in one language (source language

    (or) high level language) and translates it into an equivalent program in another language.(Target language (or) low level language)

    Source program Target Program( High Level Language) (Low Level Language)

    Compiler:- It converts the high level language into an equivalent low level language program.

    Assembler:- It converts an assembly language(low level language) into machine code.(binaryrepresentation)

    PHASES OF COMPILER

    There are two parts to compilation. They are

    (i) Analysis Phase(ii) Synthesis Phase

    Source Program

    Target Program

    Analysis Phase:-The analysis phase breaks up the source program into constituent pieces. The analysis

    phase of a compiler performs,1. Lexical analysis2. Syntax Analysis3. Semantic Analysis

    Principles of Compiler Design- Rajasekhar24

    COMPILER

    Syntax Analyzer

    Lexical Analyzer

    Semantic Analyzer

    Intermediate Code Generator

    Code Optimizer

    Code Generator

    Error HandlerSymbol Table Manager

  • 8/2/2019 System Programming Unit-3

    2/27

    Debre Berhan University School Of Computing

    1.Lexical Analysis (or) Linear Analysis (or) Scanning:-

    The lexical analysis phase reads the characters in the program and groups them into

    tokens that are sequence of characters having a collective meaning.

    Such as an Identifier, a Keyboard, a Punctuation, character or a multi character operator like

    ++.

    The character sequence forming a token is called lexeme

    For Eg. Pos = init + rate * 60

    Lexeme Token Attribute value

    rate

    +

    60

    init

    ID

    ADD

    num

    ID

    Pointer to symbol table

    60

    Pointer to symbol table

    2. Syntax Analysis (or) Hierarchical Analysis :-

    Syntax analysis processes the string of descriptors (tokens), synthesized by the lexical

    analyzer, to determine the syntactic structure of an input statement. This process is known

    as parsing.

    ie, Output of the parsing step is a representation of the syntactic structure of a statement.

    Example:-

    pos = init + rate * 60

    =

    pos +

    init *

    rate 603. Semantic Analysis:-

    The semantic analysis phase checks the source program forsemantic errors.

    Processing performed by the semantic analysis step can classified intoa. Processing of declarative statementsb. Processing of executable statements

    During semantic processing of declarative statements items of information are added to thelexical tables.Example:- (symbol table or lexical table)

    real a, b;

    Principles of Compiler Design- Rajasekhar25

  • 8/2/2019 System Programming Unit-3

    3/27

    Debre Berhan University School Of Computing

    id A real length

    id B real length ..

    Synthesis Phase:-1. Intermediate code generation2. Code optimization3. Code Generator

    1. Intermediate code generation:-After syntax and semantic analysis some compilers generate an explicit intermediate

    representation of the source program. This intermediate representation should have twoimportant properties.

    a. It should be easy to produceb. It should be easy to translate into the target program.

    We consider the intermediate form called Three Address Code. It consists of sequence ofinstructions, each of which has atmost three operands.

    Example:-pos = init + rate * 60pos = init + rate * int to real (60)

    Might appear in three address code as,

    temp1 = int to real (60)temp2 = id3 * temp1temp3 = id2 + temp2id1 = temp3

    =

    id1 +

    id2 *id3 60

    2. Code Optimization:-The code optimization phase attempts to improve the intermediate code, so that faster

    running machine code will result.

    3. Code Generation:-The final phase of the compiler is the generation of the target code or machine code or

    assembly code.Memory locations are selected for each of the variables used by the program. Thenintermediate instructions are translated into a sequence of machine instructions that performthe same task.

    Example:-MOV F id3, R2MUL F #60.0, R2

    Principles of Compiler Design- Rajasekhar26

  • 8/2/2019 System Programming Unit-3

    4/27

    Debre Berhan University School Of Computing

    MOV F id2, R1ADD F R2, R1MOV F R1,id1

    Translation of a statement pos = init + rate * 60

    id1 = id2 + id3 * 60

    =

    id1 +

    id2 *id3 60

    =

    id1 +

    id2 *

    id3 int to real60

    temp1 = int to real (60)temp2 = id3 * temp1temp3 = id2 + temp2id1 = temp3

    temp1 = id3 * 60.0id1 = id2 + temp1

    Principles of Compiler Design- Rajasekhar27

    Lexical Analyzer

    Syntax Analyzer

    Semantic Analyzer

    Intermediate Code Generator

    Code Optimizer

    Code Generator

  • 8/2/2019 System Programming Unit-3

    5/27

    Debre Berhan University School Of Computing

    MOV F id3, R2MUL F #60.0, R2MOV F id2, R1ADD F R2, R1MOV F R1,id1

    Role of Lexical Analyzer:-

    The main task is to read the input characters and produce as output a sequence oftokens that the parser uses for syntax analysis.

    Source TokenProgram

    Get next Token

    After receiving a get next token command from the parser, the lexicalanalyzer reads input characters until it can identify a next token.

    Token:-Token is a sequence of characters that can be treated as a single logical entity.

    Typical tokens are,(a) Identifiers

    (b) Keywords(c) Operators(d) Special symbols(e) Constants

    Pattern:-A set of strings in the input for which the same token is produced as output, this set

    of strings is called pattern.

    Lexeme:-A lexeme is a sequence of characters in the source program that is matched by the

    pattern for a token.

    Finite Automata

    Definition:-A recognizer for a language is a program that takes as input a string x and answers

    yes if x is a sentence of the language and no otherwise.A better way to convert a regular expression to a recognizer is to construct a

    generalized transition diagram from the expression. This diagram is called finite automation.

    Principles of Compiler Design- Rajasekhar28

    Lexical Analyzer Parser

    Symbol Table

  • 8/2/2019 System Programming Unit-3

    6/27

    Debre Berhan University School Of Computing

    A finite automation can be,1. Deterministic finite automata2. Non-Deterministic finite automata

    Principles of Compiler Design- Rajasekhar29

  • 8/2/2019 System Programming Unit-3

    7/27

    Debre Berhan University School Of Computing

    1. Non deterministic Finite Automata:- [ NFA]A NFA is a mathematical model that consists of,

    1. a set of states S2. a set of input symbol 3. a transition function 4. a state S0 that is distinguished as start state5. a set of states F distinguished as accepting state. It is indicated by double

    circle.

    Example:-The transition graph for an NFA that recognizes the language (a/b)* a

    a

    start a

    b

    The transition table is,

    StateInput Symbol

    a b

    0 0,1 0

    1 - -

    2. Deterministic Finite Automata:- [DFA]A DFA is a special case of non deterministic finite automata in which,

    1. No state has an transition2. For each state S and input symbol there is almost one edge labeled a leaving

    S.

    PROBLEM:-

    1. Construct a non deterministic finite automata for a regular expression (a/b)*Solution;-

    r = (a/b)*Decomposition of (a/b)* (parse tree)

    r5

    r4 *

    ( r3 )

    r1 / r2

    Principles of Compiler Design- Rajasekhar30

    0

    1

  • 8/2/2019 System Programming Unit-3

    8/27

    Debre Berhan University School Of Computing

    a b

    Principles of Compiler Design- Rajasekhar31

  • 8/2/2019 System Programming Unit-3

    9/27

    Debre Berhan University School Of Computing

    For r1 construct NFA start a

    For r1 construct NFA start b

    a

    NFA for r3 = r1/r2 start

    b

    NFA for r4, that is (r3) is the same as that for r3.

    NFA for r5 = (r3)* a

    start

    b

    Principles of Compiler Design- Rajasekhar32

    2

    3

    2

    3

    1

    2

    4

    3

    5

    6

    5

    1

    2

    4

    3

    70

    6

  • 8/2/2019 System Programming Unit-3

    10/27

    Debre Berhan University School Of Computing

    2. Construct a non deterministic finite automata for a regular expression (a/b)*abbSolution;-

    r = (a/b)*Decomposition of (a/b)* abb (parse tree)

    r11

    r9 r10

    r7 r8 b

    r5 r6 b

    r4 * a

    ( r3 )

    r1 / r2

    a b

    For r1 construct NFA start a

    For r1 construct NFA start b

    a

    NFA for r3 = r1/r2 start

    b

    NFA for r4, that is (r3) is the same as that for r3.

    NFA for r5 = (r3)* a

    start

    b

    Principles of Compiler Design- Rajasekhar33

    2

    3

    2

    3

    1

    2

    4

    3

    5

    6

    1

    2

    4

    3

    5

    70

    6

  • 8/2/2019 System Programming Unit-3

    11/27

    Debre Berhan University School Of Computing

    Principles of Compiler Design- Rajasekhar34

  • 8/2/2019 System Programming Unit-3

    12/27

    Debre Berhan University School Of Computing

    NFA for r6=a start a

    NFA for r7= r5.r6

    a

    start a

    b

    NFA for r8 = bstart b

    NFA for r9 = r7. r8

    a

    start a b

    b

    NFA for r10 =b start b

    NFA for r11 = r9.r10 = (a/b)* abb

    a

    start a b b

    b

    Principles of Compiler Design- Rajasekhar35

    7 8

    1

    2

    4

    3

    5

    80

    6

    7

    8 9

    1

    2

    4

    3

    5

    90

    6 87

    9 10

    1

    2

    4

    3

    5

    1

    0

    0

    6 87 9

  • 8/2/2019 System Programming Unit-3

    13/27

    Debre Berhan University School Of Computing

    Principles of Compiler Design- Rajasekhar36

  • 8/2/2019 System Programming Unit-3

    14/27

    Debre Berhan University School Of Computing

    CONVERSION OF NFA INTO DFA

    1. Convert the NFA (a/b)* into DFA?Solution:

    The NFA for (a/b)* is, a

    start

    b

    closure {0} = { 0,1,2,4,7} -------------- ATransition of input symbol a on A = { 3 }Transition of input symbol b on A = { 5 }

    closure {3} = {3,6,1,2,4,7} ------------ B

    Transition of input symbol a on B = { 3 }Transition of input symbol b on B = { 5 }

    closure {5} = {5,6,1,2,4,7} ------------ CTransition of input symbol a on C = { 3 }Transition of input symbol b on C = { 5 }

    Since A is the start state and state C is the only accepting state then, the transition table is,

    StateInput symbol

    a b

    ABC

    BBB

    CCC

    The DFA is,

    a a b

    Start a b

    b

    Principles of Compiler Design- Rajasekhar37

    5

    1

    2

    4

    3

    70 6

    A B C

  • 8/2/2019 System Programming Unit-3

    15/27

    Debre Berhan University School Of Computing

    2. Convert the NFA (a/b)*abb into DFA?Solution:

    The NFA for (a/b)*abb is,

    a

    start a b b

    b

    closure {0} = { 0,1,2,4,7} -------------- ATransition of input symbol a on A = { 3,8 }Transition of input symbol b on A = { 5 }

    closure {3,8} = { 3,6,7,1,2,4,8} -------------- BTransition of input symbol a on B = { 8,3 }Transition of input symbol b on B = { 5,9 }

    closure {5} = { 5,6,7,1,2,4} -------------- CTransition of input symbol a on C = { 8,3 }Transition of input symbol b on C = { 5 }

    closure {5,9} = { 5,6,7,1,2,4,9} -------------- DTransition of input symbol a on D = { 8,3 }Transition of input symbol b on D = { 5,10 }

    closure {5,10} = { 5,6,7,1,2,4,10} -------------- ETransition of input symbol a on E = { 8,3 }Transition of input symbol b on E = { 5 }

    Since A is the start state and state E is the only accepting state then, the transition table is,

    StateInput symbol

    a b

    AB

    CDE

    BB

    BBB

    CD

    CEC

    Principles of Compiler Design- Rajasekhar38

    1

    2

    4

    3

    5

    10

    0

    6 87 9

  • 8/2/2019 System Programming Unit-3

    16/27

    Debre Berhan University School Of Computing

    Principles of Compiler Design- Rajasekhar39

  • 8/2/2019 System Programming Unit-3

    17/27

    Debre Berhan University School Of Computing

    b

    bb a

    astart a c b

    aa

    MINIMIZATION OF STATES

    Problem 1: Construct a minimum state DFA for a regular expression (a/b)* abb

    Solution:-1. The NFA of (a/b)*abb is

    a

    start a b b

    b

    2. Construct a DFA:

    closure {0} = { 0,1,2,4,7} -------------- A

    Transition of input symbol a on A = { 3,8 }Transition of input symbol b on A = { 5 }

    closure {3,8} = { 3,6,7,1,2,4,8} -------------- BTransition of input symbol a on B = { 8,3 }Transition of input symbol b on B = { 5,9 }

    closure {5} = { 5,6,7,1,2,4} -------------- C

    Principles of Compiler Design- Rajasekhar40

    EA B D

    C

    1

    2

    4

    3

    5

    10

    0

    6 87 9

  • 8/2/2019 System Programming Unit-3

    18/27

    Debre Berhan University School Of Computing

    Transition of input symbol a on C = { 8,3 }Transition of input symbol b on C = { 5 }

    closure {5,9} = { 5,6,7,1,2,4,9} -------------- DTransition of input symbol a on D = { 8,3 }Transition of input symbol b on D = { 5,10 }

    closure {5,10} = { 5,6,7,1,2,4,10} -------------- ETransition of input symbol a on E = { 8,3 }Transition of input symbol b on E = { 5 }

    Since A is the start state and state E is the only accepting state then, the transition table is,

    StateInput symbol

    a b

    ABCDE

    BBBBB

    CDCEC

    3. Minimizing the DFA

    Let = ABCDEThe initial partition consists of two groups.1 = ABCD ( that is the non accepting states)2 = E ( that is the accepting state)

    So, (ABCD) (E)

    AB

    a aA B B B

    b b

    A C B D

    AC

    a aA B C B

    b b

    A C C C

    Principles of Compiler Design- Rajasekhar41

  • 8/2/2019 System Programming Unit-3

    19/27

    Debre Berhan University School Of Computing

    AD

    a aA B D B

    b b

    A C D E

    On input a each of these states has a transition to B, so they could all remain in one groupas far as input a is concerned.On input b A,B,C go to members of the group 1 (ABCD) while D goes to 2 (E) . Thus1 group is split into two new groups.

    1 = ABC 2 = D , 3 = ESo, (ABC) (D) (E)

    AB

    a aA B B B

    b b

    A C B D

    Here B goes to 2. Thus 1 group is again split into two new groups. The new groups are,

    1 = AC 2 = B , 3 = D, 4 = ESo, (AC) (B) (D) (E)

    Here we cannot split any of the groups consisting of the single state. The only possibility istry to split only (AC)

    For AC

    a aA B C B

    b b

    A C C C

    But A and C go the same state B on input a, and they go to the same state C on input b.Hence after this,(AC) (B) (D) (E)

    Principles of Compiler Design- Rajasekhar42

  • 8/2/2019 System Programming Unit-3

    20/27

    Debre Berhan University School Of Computing

    Here we choose A as the representative for the group AC.Thus A is the start state and state E is the only accepting state.

    Principles of Compiler Design- Rajasekhar43

  • 8/2/2019 System Programming Unit-3

    21/27

    Debre Berhan University School Of Computing

    So the minimized transition table is,

    StateInput symbol

    a b

    ABDE

    BBBB

    ADEA

    Thus the minimized DFA is,

    bb

    astart a b b

    aa

    ______________________________________________________________________________

    SYNTAX ANALYSIS

    Definition of Context free Grammar:- [CFG]A CFG has four components.1. a set of Tokens known as Terminal symbols.2. a set of non-terminals3. start symbol4. production.

    Notational Conventions:-

    a) These symbols are terminals. (Ts)(i) Lower case letters early in the alphabet such as a,b,c(ii) Operator symbols such as +, -, etc.(iii) Punctuation symbols such as parenthesis, comma etc.

    (iv) The digits 0, 1, 2, 3, , 9(v) Bold face Strings.

    b) These symbols are Non-Terminals (NTs)(i) Upper case letters early in the alphabet such as A, B, C(ii) The letter S, which is the start symbol.(iii) Lower case italic names such as expr, stmt.

    c) Uppercase letters such as X, Y, Z represent grammar symbols either NTs or Ts.

    Principles of Compiler Design- Rajasekhar44

    EA B D

  • 8/2/2019 System Programming Unit-3

    22/27

    Debre Berhan University School Of Computing

    PARSER:

    A parser for a grammar G is a program that takes a string W as input and produces eithera parse tree for W, if W is a sentence of G or an error message indicating that W is not asentence of G as output.

    There are two basic types of parsers for CFG.1. Bottom up Parser2. Top down Parser

    1. Bottom up Parser:-The bottom up parser build parse trees from bottom (leaves) to the top (root). The

    input to the parser is being scanned from left to right, one symbol at a time. This is also called

    as Shift Reduce Parsing because it consist of shifting input symbols onto a stack until the

    right side of a production appears on top of the stack.

    There are two kinds of shift reduce parser (Bottom up Parser)1. Operator Precedence Parser2. LR Parser ( move general type)

    Operator Precedence Parsing;-

    In operator precedence parsing we use three disjoint relations.

    < if a < b means a yield precedence to b

    = if a = b means a has same precedence as b

    > if a > b means a takes precedence over bThere are two common ways of determiningprecedence relation hold between a pair of

    terminals.

    1. Based on associativity and precedence of operators

    2. Using operator precedence relation.

    For Ex, * have higher precedence than +. We make + < * and * > +

    Problem 1:- Create an operator precedence relation for id+id*id$

    id + * $Id - > > >

    + < > < >

    * < > > >

    $ < < < -

    Problem 2: Tabulate the operator precedence relation for the grammar

    EE+E | E-E | E*E | E/E | E E | (E) | -E | id

    Principles of Compiler Design- Rajasekhar45

  • 8/2/2019 System Programming Unit-3

    23/27

    Debre Berhan University School Of Computing

    Solution:-Assuming 1. has highest precedence and right associative

    2. * and / have next higher precedence and left associative3. + and have lowest precedence and left associative

    + - * / id ( ) $

    + > > < < < < < > >

    - > > < < < < < > >

    * > > > > < < < > >

    / > > > > < < < > >

    > > > > < < < > >

    id > > > > > - - > >

    ( < < < < < < < = -

    ) > > > > > - - > >

    $ < < < < < < < - -

    Derivations:-

    The central idea is that a production is treated as a rewriting rule in which the non-

    terminal in the left side is replaced by the string on the right side of the production.

    For Ex, consider the following grammar for arithmetic expression,

    EE+E | E*E | (E) | -E |id

    That is we can replace a single E by E. we describe this action by writing

    E => -E , which is read E derives E

    E(E) tells us that we could also replace by (E).

    So, E*E => (E) * E or E*E => E* (E)

    We can take a single E and repeatedly apply production in any order to obtain sequence of

    replacements.

    E => -E

    E => -(E)

    E => -(id)

    We call such sequence of replacements is called derivation.

    Parse trees & Derivations:-

    A parse tree is a graphical representation for a derivation that filters out the choice

    regarding replacement order.

    For a given CFG a parse tree is a tree with the following properties.

    1. The root is labeled by the start symbol

    Principles of Compiler Design- Rajasekhar46

  • 8/2/2019 System Programming Unit-3

    24/27

    Debre Berhan University School Of Computing

    2. Each leaf is labeled by a token or

    3. Each interior node is labeled by a NT

    Ex.

    E => -E E

    - E

    E => -(E) E

    - E

    ( E )E => -(E+E) E

    - E

    ( E )

    E + E

    E => -(id+E) E

    - E

    ( E )

    E + E

    id

    E => -(id+E) E

    - E

    ( E )

    E + E

    id id

    INTERMEDIATE CODE GENERATION

    A compiler while translating a source program into a functionally equivalent object

    code representation may first generate an intermediate representation.

    Principles of Compiler Design- Rajasekhar47

  • 8/2/2019 System Programming Unit-3

    25/27

    Debre Berhan University School Of Computing

    Advantages of generating intermediate representation

    1. Ease of conversion from the source program to the intermediate code

    2. Ease with which subsequent processing can be performed from the intermediate code

    Parse Tree Intermediate Code

    INTERMEDIATE LANGUAGES:There are three kinds of Intermediate representation. They are,

    1. Syntax Trees2. Postfix Notation3. Three address code

    1. Syntax Tree:-

    A syntax tree depicts the natural hierarchical structure of a source program. A DAG

    (Direct Acyclic Graph) gives the same information but in a more compact way becausecommon sub expressions are identified.

    A syntax tree and dag for the assignment statement a:= b* -c + b* -c

    assign

    a +

    Syntax Tree* *

    b uminus b uminus

    c c

    assign

    a +

    DAG

    *

    b uminus

    c2. Postfix notation:-

    Post fix notation is a linearized representation of a syntax tree. It is a list of nodes of

    the tree in which a node appears immediately after its children.

    Principles of Compiler Design- Rajasekhar48

    Parser Intermediate codeGenerator

    CodeGenerator

  • 8/2/2019 System Programming Unit-3

    26/27

    Debre Berhan University School Of Computing

    The postfix notation for the syntax tree is,

    a b c uminus * b c uminus * + assign

    3. Three Address Code:-

    Three Address code is a sequence of statements of the general form

    x := y op z

    where x,y and z are names, constants or compiler generated temporaries.

    op stands for any operator such as a fixed or floating point arithmetic operator or a logical

    operator on a Boolean valued data.

    The Three Address Code for the source language expression like x+y*z is,

    t1:= y * z

    t2 := x + t1

    Where t1 and t2 are compiler generated temporary names

    So, three address code is a linearized representation of a syntax tree or a dag in which explicit

    names correspond to the interior nodes of the graph.

    Three Address Code Corresponding to the syntax tree and DAG is,

    Code for Syntax Tree

    t1 := -c

    t2 := b * t1

    t3 := -c

    t4 := b * t3

    t5 := t2 + t4

    a := t5

    Code for DAG

    t1 := -c

    t2 := b * t1

    t5 := t2 + t2

    a := t5

    Types of Three Address Statements:-

    1. Assignment statement of the form x := y op z

    2. Assignment instructions of the form x := op z

    where op is a unary operation.

    Principles of Compiler Design- Rajasekhar49

  • 8/2/2019 System Programming Unit-3

    27/27

    Debre Berhan University School Of Computing

    3. Copy statements of the form x := y

    where, the value of y is assigned to x.

    4. The Unconditional Jump GOTO L

    5. Conditional Jumps such as if x relop y goto l

    6. param x and call p, n for procedure calls and return y.

    7. Indexed assignments of the form x := y[i] and x[i] := y

    8. Address and pointer assignments, x :=&y, x := *y and *x := y

    Principles of Compiler Design- Rajasekhar50