01_OverviewCompilers

  • 8/13/2019 01_OverviewCompilers

    Compiler Design

    1. Overview

    CIS 631, CSE 691, CIS 400, CSE 400

    Kanat Bolazar

    January 19, 2010

    Compilers

    Compilers translate from a source language (typically a high-level language) to a functionally equivalent target language (typically the machine code of a particular machine or of a machine-independent virtual machine).

    Compilers for high-level programming languages are among the larger and more complex pieces of software.

    Original languages included Fortran and Cobol.

    They often used multi-pass compilers (to facilitate memory reuse).

    Compiler development contributed to better programming language design.

    Early development focused on syntactic analysis and optimization.

    Commercially, compilers are developed by very large software groups.

    The current focus is on optimization and smart use of resources for modern RISC (reduced instruction set computer) architectures.

    Why Study Compilers?

    General background knowledge for a good software engineer

    Increases understanding of language semantics

    Seeing the machine code generated for language constructs helps in understanding performance issues for languages

    Teaches good language design

    New devices may need device-specific languages

    New business fields may need domain-specific languages

    Applications of Compiler Technology & Tools

    Processing XML and other formats to generate documents, code, etc.

    Processing domain-specific and device-specific languages

    Implementing a server that uses a protocol such as HTTP or IMAP

    Natural language processing, for example spam filtering, search, document comprehension, summary generation

    Translating from a hardware description language to the schematic of a circuit

    Automatic graph layout (Graphviz, for example)

    Extending an existing programming language

    Program analysis and improvement tools

    Dynamic Structure of a Compiler

    character stream: val = 10 * val + i

    lexical analysis (scanning)

    token stream:

        token number    token value
        1 (ident)       "val"
        3 (assign)      -
        2 (number)      10
        4 (times)       -
        1 (ident)       "val"
        5 (plus)        -
        1 (ident)       "i"

    syntax analysis (parsing)

    syntax tree: a Statement at the root, containing an Expression, which contains a Term, over the tokens ident = number * ident + ident

    Front end (analysis)
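
The step from token stream to syntax tree can be sketched with a small recursive-descent parser. This is only an illustration: the grammar (Statement = ident "=" Expression, Expression = Term {"+" Term}, Term = Factor {"*" Factor}) and the tuple-based tree shape are assumptions made for this sketch, and the token kinds are written as names rather than the token numbers on the slide.

```python
# Token kinds follow the slide; the grammar and tree shape are assumptions.
def parse_statement(tokens):
    """Parse Statement = ident "=" Expression from (kind, value) tokens."""
    pos = 0

    def peek():
        return tokens[pos][0] if pos < len(tokens) else None

    def take(kind):
        nonlocal pos
        if peek() != kind:
            raise SyntaxError(f"expected {kind}, found {peek()}")
        value = tokens[pos][1]
        pos += 1
        return value

    def factor():
        # Factor = number | ident
        if peek() == "number":
            return ("number", take("number"))
        return ("ident", take("ident"))

    def term():
        # Term = Factor {"*" Factor}
        node = factor()
        while peek() == "times":
            take("times")
            node = ("times", node, factor())
        return node

    def expression():
        # Expression = Term {"+" Term}
        node = term()
        while peek() == "plus":
            take("plus")
            node = ("plus", node, term())
        return node

    target = take("ident")
    take("assign")
    tree = ("assign", target, expression())
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()}")
    return tree


tokens = [("ident", "val"), ("assign", None), ("number", 10), ("times", None),
          ("ident", "val"), ("plus", None), ("ident", "i")]
tree = parse_statement(tokens)
print(tree)
```

Because term() is called from expression(), the multiplication ends up deeper in the tree than the addition, which is how the usual operator precedence falls out of the grammar.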

    Dynamic Structure of a Compiler

    semantic analysis (type checking, ...)

    syntax tree: Statement, Expression, Term over ident = number * ident + ident

    intermediate representation: syntax tree, symbol table, or three address code (TAC) ...

    optimization

    code generation

    machine code:

        const 10
        load 1
        mul
        ...

    Front end

    Back end (synthesis)

    Compiler versus Interpreter

    Compiler translates to machine code

        source code -> scanner -> parser -> ... -> code generator -> machine code -> loader

    Variant: interpretation of intermediate code

        source code -> ... compiler ... -> intermediate code (e.g. Java bytecode) -> VM

    Source code is translated into the code of a virtual machine (VM)

    The VM interprets the code, simulating the physical machine

    Interpreter executes source code "directly"

        source code -> scanner -> parser -> interpretation

    Statements in a loop are scanned and parsed again and again

    Static Structure of a Compiler

    parser & sem. analysis: "main program", directs the whole compilation

    scanner: provides tokens from the source code

    symbol table: maintains information about declared names and types

    code generation: generates machine code

    Arrows in the diagram: uses, data flow

    Lexical Analysis

    Stream of characters is grouped into tokens

    Examples of tokens are identifiers, reserved words, integers, doubles or floats, delimiters, operators and special symbols

        int a;
        a = a + 2;

        int    reserved word
        a      identifier
        ;      special symbol
        a      identifier
        =      operator
        a      identifier
        +      operator
        2      integer constant
        ;      special symbol
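
The grouping of characters into tokens can be sketched with regular expressions. This is a minimal illustration, not a full scanner: the token categories come from the slide, while the set of reserved words, the operator set, and the function name scan are assumptions chosen to cover the two example statements.

```python
import re

RESERVED = {"int"}  # assumed: only the reserved word used in the example

# (category, pattern) pairs; order matters when patterns could overlap.
TOKEN_SPEC = [
    ("integer constant", r"\d+"),
    ("identifier", r"[A-Za-z_]\w*"),
    ("operator", r"[=+\-*/]"),
    ("special symbol", r";"),
    ("skip", r"\s+"),
]
MASTER = re.compile(
    "|".join(f"(?P<g{i}>{pat})" for i, (_, pat) in enumerate(TOKEN_SPEC))
)


def scan(source):
    """Group a character stream into (lexeme, category) tokens."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise ValueError(f"invalid character {source[pos]!r} at offset {pos}")
        category = TOKEN_SPEC[int(match.lastgroup[1:])][0]
        lexeme = match.group()
        pos = match.end()
        if category == "skip":
            continue  # whitespace separates tokens but is not a token itself
        if category == "identifier" and lexeme in RESERVED:
            category = "reserved word"
        tokens.append((lexeme, category))
    return tokens


for lexeme, category in scan("int a;\na = a + 2;"):
    print(f"{lexeme:<6}{category}")
```

Reserved words are matched as identifiers first and reclassified afterwards, a common way to keep the patterns simple.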

    Intermediate Code Generation

    An intermediate code representation often helps contain the complexity of the compiler and discover code optimizations.

    Typical choices include:

    Annotated parse trees

    Three Address Code (TAC) and abstract machine language

    Bytecode, as in Java bytecode

    Example statements:

        if (a <= b)
            a = a - c;
        c = b * c;

    Three address code:

            _t1 = a > b
            if _t1 goto L0
            _t2 = a - c
            a = _t2
        L0: _t3 = b * c
            c = _t3
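
How three address code falls out of an expression tree can be sketched as follows. This is a minimal illustration under assumptions: the tuple tree shape (op, left, right), the helper names make_tac_generator and assign, and the hand-written handling of the if statement are all invented for this sketch. The statements translated are if (a <= b) a = a - c; c = b * c;, which matches the Java bytecode listing given for this example.

```python
import itertools


def make_tac_generator():
    """Return (gen, code); gen(node) appends TAC to code and returns an operand name."""
    code = []
    counter = itertools.count(1)

    def gen(node):
        if isinstance(node, str):      # a variable is already a valid operand
            return node
        op, left, right = node         # binary operation: (op, left, right)
        left_name = gen(left)
        right_name = gen(right)
        temp = f"_t{next(counter)}"    # one fresh temporary per operation
        code.append(f"{temp} = {left_name} {op} {right_name}")
        return temp

    return gen, code


def assign(gen, code, target, expr):
    """Emit code for 'target = expr'."""
    code.append(f"{target} = {gen(expr)}")


gen, code = make_tac_generator()
cond = gen((">", "a", "b"))            # negation of a <= b: skip the body if a > b
code.append(f"if {cond} goto L0")
assign(gen, code, "a", ("-", "a", "c"))
code.append("L0:")
assign(gen, code, "c", ("*", "b", "c"))
print("\n".join(code))
```

Each instruction has at most one operator on its right-hand side, which is what makes later optimization and instruction selection simpler than working on nested expressions.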

    Intermediate Code Generation (cont'd)

    Example statements:

        if (a <= b)
            a = a - c;
        c = b * c;

    Postfix form of the two assignments, with v1, v2, v3 standing for a, b, c:

        v1 v3 - store(v1)
        v2 v3 * store(v3)

    Java bytecode (javap -c):

        55: iload_1
        56: iload_2
        57: if_icmpgt 64
        60: iload_1
        61: iload_3
        62: isub
        63: istore_1
        64: iload_2
        65: iload_3
        66: imul
        67: istore_3
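
To see what a virtual machine does with such a listing, the sketch below interprets the eleven instructions above on an operand stack. It is a simplified illustration, not the JVM: only the opcodes in the listing are handled, the instructions are written as (opcode, operand) pairs keyed by their offsets, Python integers are used instead of 32-bit ints, and the names run and PROGRAM are assumptions of this sketch.

```python
def run(program, local_vars):
    """Interpret a tiny subset of JVM-style integer instructions.

    program maps a bytecode offset to (opcode, operand); local_vars maps a
    slot number to its integer value and is updated in place.
    """
    offsets = sorted(program)
    stack = []
    index = 0
    while index < len(offsets):
        opcode, operand = program[offsets[index]]
        index += 1
        if opcode == "iload":
            stack.append(local_vars[operand])
        elif opcode == "istore":
            local_vars[operand] = stack.pop()
        elif opcode in ("isub", "imul"):
            right = stack.pop()
            left = stack.pop()
            stack.append(left - right if opcode == "isub" else left * right)
        elif opcode == "if_icmpgt":
            right = stack.pop()
            left = stack.pop()
            if left > right:
                index = offsets.index(operand)   # jump to the target offset
        else:
            raise ValueError(f"unsupported opcode {opcode}")
    return local_vars


PROGRAM = {
    55: ("iload", 1), 56: ("iload", 2), 57: ("if_icmpgt", 64),
    60: ("iload", 1), 61: ("iload", 3), 62: ("isub", None), 63: ("istore", 1),
    64: ("iload", 2), 65: ("iload", 3), 66: ("imul", None), 67: ("istore", 3),
}

# Slots 1, 2, 3 hold a, b, c.
print(run(PROGRAM, {1: 2, 2: 5, 3: 3}))   # a <= b: both assignments execute
print(run(PROGRAM, {1: 7, 2: 5, 3: 3}))   # a > b: the first assignment is skipped
```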

    Code Optimization

    The compiler converts the intermediate representation into another one that is intended to be smaller and faster.

    Typical optimizations:

    Suppressing code generation for unreachable segments

    Removing unused variables

    Eliminating multiplication by 1 and addition of 0

    Loop optimization: e.g. moving computations whose operands are not modified in the loop out of the loop

    Common sub-expression elimination

    . . .
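
One of the listed optimizations, eliminating multiplication by 1 and addition of 0, can be sketched as a bottom-up rewrite of an expression tree. The tree encoding (ints for constants, strings for variables, (op, left, right) tuples for operations) and the function name simplify are assumptions of this sketch; a real optimizer would typically work on the compiler's own intermediate representation.

```python
def simplify(node):
    """Remove '* 1' and '+ 0' from an expression tree, working bottom-up.

    Leaves are ints (constants) or strs (variables); inner nodes are
    (op, left, right) tuples with op "+" or "*".
    """
    if not isinstance(node, tuple):
        return node
    op, left, right = node
    left, right = simplify(left), simplify(right)   # simplify operands first
    if op == "+":
        if left == 0:
            return right
        if right == 0:
            return left
    if op == "*":
        if left == 1:
            return right
        if right == 1:
            return left
    return (op, left, right)


print(simplify(("+", ("*", "x", 1), 0)))
print(simplify(("+", "a", ("*", 1, "b"))))
```

Simplifying the operands before the node itself lets one rewrite expose another, as in the first call, where removing "* 1" leaves "x + 0".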

    Object Code Generation

    The target program is generated in the machine language of the target architecture.

    Memory locations are selected for each variable

    Instructions are chosen for each operation

    Individual tree nodes or TAC instructions are translated into a sequence of machine language instructions that perform the same task

    Typical machine language instructions include things like

    Load register

    Add register to memory location

    Store register to memory

    . . .
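
The translation of TAC into load, operate, and store instructions can be sketched for a made-up single-register machine. Everything about the target is hypothetical: the mnemonics LOAD, ADD, SUB, MUL, STORE, the register R0, and the use of variable names as symbolic memory locations are assumptions chosen only to illustrate instruction selection.

```python
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL"}   # hypothetical mnemonics


def select_instructions(tac_lines):
    """Translate 'x = y op z' or 'x = y' TAC lines into single-register code."""
    output = []
    for line in tac_lines:
        target, expr = [part.strip() for part in line.split("=", 1)]
        parts = expr.split()
        if len(parts) not in (1, 3):
            raise ValueError(f"unsupported TAC line: {line}")
        output.append(f"LOAD R0, {parts[0]}")                   # first operand into the register
        if len(parts) == 3:
            output.append(f"{OPCODES[parts[1]]} R0, {parts[2]}")  # combine with the second operand
        output.append(f"STORE R0, {target}")                    # result back to memory
    return output


for instruction in select_instructions(["_t2 = a - c", "a = _t2"]):
    print(instruction)
```

The output stores _t2 and immediately reloads it, which is the kind of redundancy a later optimization pass or a smarter register allocator would remove.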

    Symbol Table

    Symbol table management is a part of the compiler that interacts with several of the phases

    Identifiers are found in lexical analysis and placed in the symbol table

    During syntactic and semantic analysis, type and scope information is added

    During code generation, type information is used to determine which instructions to use

    During optimization, the results of liveness analysis may be kept in the symbol table
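
A symbol table that records type and scope information can be sketched as a stack of dictionaries. The class and method names (SymbolTable, open_scope, close_scope, declare, lookup) and the stored attributes are assumptions of this sketch, not a prescribed design.

```python
class SymbolTable:
    """Stack of scopes, each mapping a declared name to its attributes."""

    def __init__(self):
        self.scopes = [{}]            # the outermost scope holds globals

    def open_scope(self):
        self.scopes.append({})

    def close_scope(self):
        self.scopes.pop()

    def declare(self, name, type_name):
        scope = self.scopes[-1]
        if name in scope:
            raise KeyError(f"{name} already declared in this scope")
        scope[name] = {"type": type_name, "level": len(self.scopes) - 1}

    def lookup(self, name):
        for scope in reversed(self.scopes):   # innermost declaration wins
            if name in scope:
                return scope[name]
        raise KeyError(f"{name} is not declared")


table = SymbolTable()
table.declare("val", "Table")     # a global variable
table.open_scope()                # entering a method body
table.declare("x", "int")         # a local variable
print(table.lookup("x"))
print(table.lookup("val"))
table.close_scope()
```

Later phases can attach further attributes to the same entries, for example a memory address chosen during code generation.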

    Error Handling

    Error handling and reporting also occur across many phases

    The lexical analyzer reports invalid character sequences

    The syntactic analyzer reports invalid token sequences

    The semantic analyzer reports type and scope errors, and the like

    The compiler may be able to continue after some errors, but other errors may stop the process
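
Continuing after an error can be sketched at the lexical level: the scanner below records each invalid character with its line number, skips it, and keeps going, so that several errors can be reported in one run. The accepted character set, the message format, and the function name scan_with_recovery are assumptions of this sketch.

```python
def scan_with_recovery(source):
    """Return (tokens, errors); invalid characters are reported and skipped."""
    tokens, errors = [], []
    line = 1
    pos = 0
    while pos < len(source):
        ch = source[pos]
        if ch == "\n":
            line += 1
            pos += 1
        elif ch.isspace():
            pos += 1
        elif ch.isalnum() or ch == "_":
            start = pos
            while pos < len(source) and (source[pos].isalnum() or source[pos] == "_"):
                pos += 1
            tokens.append(source[start:pos])   # identifier or number
        elif ch in "=+-*/;(){}<>!":
            tokens.append(ch)
            pos += 1
        else:
            errors.append(f"line {line}: invalid character {ch!r}")
            pos += 1                           # skip it and keep scanning
    return tokens, errors


tokens, errors = scan_with_recovery("a = a @ 2;\nb = $;")
print(tokens)
print(errors)
```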

    Example MicroJava Program

        program P
          final int size = 10;

          class Table {
            int[] pos;
            int[] neg;
          }

          Table val;
        {
          void main()
            int x, i;
          { //---------- initialize val ----------
            val = new Table;
            val.pos = new int[size];
            val.neg = new int[size];
            i = 0;
            while (i < size) {
              val.pos[i] = 0; val.neg[i] = 0; i = i + 1;
            }
            //---------- read values ----------
            read(x);
            while (x != 0) {
              if (x > 0) val.pos[x] = val.pos[x] + 1;
              else if (x < 0) val.neg[-x] = val.neg[-x] + 1;
              read(x);
            }
          }
        }

    program P: main program; no separate compilation

    class Table: classes (without methods)

    Table val: global variables

    int x, i: local variables

    References

    Original slides: Nancy McCracken.

    Niklaus Wirth, Compiler Construction, chapters 1 and 2

    Course notes from H. Mössenböck, System Specification and Compiler Construction, http://www.ssw.uni-linz.ac.at/Misc/CC/ (also notes on MicroJava)

    Course notes from Jerry Cain, Compilers, http://www.stanford.edu/class/cs143/

    General references:

    Aho, A., Lam, M., Sethi, R., Ullman, J., Compilers: Principles, Techniques and Tools, 2nd Edition, Addison-Wesley, 2006.

    Steven Muchnick, Advanced Compiler Design and Implementation, Morgan Kaufmann, 1997.

    Keith Cooper and Linda Torczon, Engineering a Compiler, Morgan Kaufmann, 2003.
