8/13/2019 01_OverviewCompilers
1/21
Compiler Design
1. Overview
CIS 631, CSE 691, CIS400, CSE 400
Kanat Bolazar, January 19, 2010
Compilers
Compilers translate from a source language (typically a high-level language) to a functionally equivalent target language
(typically the machine code of a particular machine or a
machine-independent virtual machine).
Compilers for high-level programming languages are among the larger and more complex pieces of software
Early compiled languages included Fortran and COBOL
Early compilers were often multi-pass (to facilitate memory reuse)
Compiler development contributed to better programming language design
Early development focused on syntactic analysis and optimization
Commercially, compilers are developed by very large software groups
Current focus is on optimization and smart use of resources for
modern RISC (reduced instruction set computer) architectures.
Why Study Compilers?
General background knowledge for a good software engineer
Increases understanding of language semantics
Seeing the machine code generated for language constructs helps in understanding performance issues of languages
Teaches good language design
New devices may need device-specific languages
New business fields may need domain-specific languages
Applications of Compiler Technology & Tools
Processing XML or other formats to generate documents, code, etc.
Processing domain-specific and device-specific languages
Implementing a server that uses a protocol such as HTTP or IMAP
Natural language processing, for example spam filtering, search, document comprehension, and summary generation
Translating from a hardware description language to the schematic of a circuit
Automatic graph layout (Graphviz, for example)
Extending an existing programming language
Program analysis and improvement tools
Dynamic Structure of a Compiler
Character stream: v a l = 1 0 * v a l + i
Lexical analysis (scanning) produces the token stream:
  token number   token value
  1 (ident)      "val"
  3 (assign)     -
  2 (number)     10
  4 (times)      -
  1 (ident)      "val"
  5 (plus)       -
  1 (ident)      "i"
Syntax analysis (parsing) produces the syntax tree for ident = number * ident + ident:
  Statement:  ident = Expression
  Expression: Term + ident
  Term:       number * ident
These phases belong to the front end (analysis).
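As a rough illustration of the front end, the sketch below is a recursive-descent parser for the statement in the figure. It assumes the grammar Statement = ident "=" Expression, Expression = Term {"+" Term}, Term = Factor {"*" Factor}, Factor = ident | number, which is modeled on the figure rather than taken from the course. The class name MiniParser is hypothetical, and tokens are assumed to be separated by blanks so that splitting the string stands in for the scanner. Each grammar rule becomes one method, and the syntax tree is returned as a fully parenthesized string.

```java
import java.util.List;

public class MiniParser {
    private final List<String> tokens;
    private int pos = 0;

    private MiniParser(List<String> tokens) { this.tokens = tokens; }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : ""; }

    private String next() {
        if (pos >= tokens.size()) throw new IllegalStateException("unexpected end of input");
        return tokens.get(pos++);
    }

    // Statement = ident "=" Expression
    private String statement() {
        String target = next();
        if (!Character.isLetter(target.charAt(0))) throw new IllegalStateException("identifier expected");
        if (!next().equals("=")) throw new IllegalStateException("'=' expected");
        return "(" + target + " = " + expression() + ")";
    }

    // Expression = Term {"+" Term}
    private String expression() {
        String left = term();
        while (peek().equals("+")) {
            next();
            left = "(" + left + " + " + term() + ")";
        }
        return left;
    }

    // Term = Factor {"*" Factor}
    private String term() {
        String left = factor();
        while (peek().equals("*")) {
            next();
            left = "(" + left + " * " + factor() + ")";
        }
        return left;
    }

    // Factor = ident | number
    private String factor() {
        String t = next();
        if (!t.chars().allMatch(Character::isLetterOrDigit)) {
            throw new IllegalStateException("operand expected, found " + t);
        }
        return t;
    }

    // Parses one statement whose tokens are separated by blanks (a stand-in for the scanner).
    static String parse(String src) {
        MiniParser p = new MiniParser(List.of(src.trim().split("\\s+")));
        String tree = p.statement();
        if (p.pos < p.tokens.size()) throw new IllegalStateException("unexpected token " + p.peek());
        return tree;
    }

    public static void main(String[] args) {
        System.out.println(parse("val = 10 * val + i"));  // prints (val = ((10 * val) + i))
    }
}
```

Because Term is parsed inside Expression, `*` binds tighter than `+`, which is why the tree groups `10 * val` before adding `i`, as in the figure.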
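Dynamic Structure of a Compiler (cont'd)
Semantic analysis (type checking, ...) works on the syntax tree
  (Statement: ident = Expression; Expression: Term + ident; Term: number * ident)
and yields an intermediate representation: syntax tree, symbol table, or three-address code (TAC) ...
Optimization
Code generation produces machine code, for example:
  const 10
  load 1
  mul
  ...
Front end (analysis): the phases up to and including semantic analysis
Back end (synthesis): optimization and code generation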
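The code generation step can be sketched as a post-order walk over the syntax tree that emits instructions for a stack machine, producing a sequence that starts like the one above (const 10, load 1, mul, ...). The node types, the class name StackCodeGen, and the instruction meanings (const pushes a constant, load/store move a local-variable slot, mul/add pop two operands and push the result) are assumptions for illustration, as is placing val in slot 1 and i in slot 2. It needs Java 16 or later for records and pattern matching.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class StackCodeGen {
    // Minimal syntax-tree node types (hypothetical, for illustration).
    interface Node {}
    record Num(int value) implements Node {}
    record Var(String name) implements Node {}
    record BinOp(char op, Node left, Node right) implements Node {}
    record Assign(String target, Node value) implements Node {}

    private final Map<String, Integer> slots;   // variable name -> local-variable slot
    private final List<String> code = new ArrayList<>();

    private StackCodeGen(Map<String, Integer> slots) { this.slots = slots; }

    private int slot(String name) {
        Integer s = slots.get(name);
        if (s == null) throw new IllegalArgumentException("undeclared variable " + name);
        return s;
    }

    // Post-order walk: operands are emitted before the operator that consumes them.
    private void gen(Node n) {
        if (n instanceof Num num) {
            code.add("const " + num.value());
        } else if (n instanceof Var v) {
            code.add("load " + slot(v.name()));
        } else if (n instanceof BinOp b) {
            gen(b.left());
            gen(b.right());
            code.add(switch (b.op()) {
                case '*' -> "mul";
                case '+' -> "add";
                default -> throw new IllegalArgumentException("unsupported operator " + b.op());
            });
        } else if (n instanceof Assign a) {
            gen(a.value());
            code.add("store " + slot(a.target()));
        } else {
            throw new IllegalArgumentException("unknown node " + n);
        }
    }

    static List<String> generate(Node tree, Map<String, Integer> slots) {
        StackCodeGen g = new StackCodeGen(slots);
        g.gen(tree);
        return g.code;
    }

    public static void main(String[] args) {
        // val = 10 * val + i
        Node tree = new Assign("val",
                new BinOp('+', new BinOp('*', new Num(10), new Var("val")), new Var("i")));
        System.out.println(generate(tree, Map.of("val", 1, "i", 2)));
        // prints [const 10, load 1, mul, load 2, add, store 1]
    }
}
```

The slot numbers would normally come from the symbol table, which is why the generator takes them as a map instead of inventing them.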
Compiler versus Interpreter
Compiler translates to machine code
  source code -> scanner -> parser -> ... -> code generator -> machine code -> loader
Variant: interpretation of intermediate code
  source code -> ... compiler ... -> intermediate code (e.g. Java bytecode) -> VM
  Source code is translated into the code of a virtual machine (VM)
  The VM interprets the code, simulating the physical machine
Interpreter executes source code "directly"
  source code -> scanner -> parser -> interpretation
  Statements in a loop are scanned and parsed again and again
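To make the "VM interprets the code" variant concrete, here is a toy virtual machine that executes the stack instructions const, load, store, add, and mul. This instruction set is hypothetical (it is neither Java bytecode nor the MicroJava VM), and ToyVM is an illustrative name; the sketch only shows the core of such an interpreter: an operand stack, an array of local-variable slots, and a loop that dispatches on each instruction.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

public class ToyVM {
    // Executes the program on a copy of the given local-variable slots and returns the final slots.
    static int[] run(List<String> program, int[] locals) {
        int[] vars = locals.clone();
        Deque<Integer> stack = new ArrayDeque<>();       // operand stack
        for (String instr : program) {
            String[] parts = instr.split(" ");
            switch (parts[0]) {
                case "const" -> stack.push(Integer.parseInt(parts[1]));
                case "load"  -> stack.push(vars[Integer.parseInt(parts[1])]);
                case "store" -> vars[Integer.parseInt(parts[1])] = stack.pop();
                case "add"   -> { int r = stack.pop(); stack.push(stack.pop() + r); }
                case "mul"   -> { int r = stack.pop(); stack.push(stack.pop() * r); }
                default -> throw new IllegalArgumentException("unknown instruction: " + instr);
            }
        }
        return vars;
    }

    public static void main(String[] args) {
        // val = 10 * val + i, with val in slot 1 and i in slot 2 (slot 0 unused)
        List<String> program = List.of("const 10", "load 1", "mul", "load 2", "add", "store 1");
        int[] result = run(program, new int[] {0, 3, 4});
        System.out.println("val = " + result[1]);  // prints val = 34
    }
}
```

For simplicity this VM decodes textual instructions on every step; a real VM decodes compact binary opcodes, but either way the source statements are not re-scanned and re-parsed on each loop iteration.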
Static Structure of a Compiler
Parser & semantic analysis: the "main program"; directs the whole compilation and uses the other components
Scanner: provides tokens from the source code
Symbol table: maintains information about declared names and types
Code generation: generates machine code
Lexical Analysis
The stream of characters is grouped into tokens
Examples of tokens are identifiers, reserved words, integers, doubles or floats, delimiters, operators, and special symbols
int a;
a = a + 2;
int   reserved word
a     identifier
;     special symbol
a     identifier
=     operator
a     identifier
+     operator
2     integer constant
;     special symbol
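A hand-written scanner that reproduces the classification above might look like the following sketch. The reserved-word set (only int), the operators (= and +), and the special symbol (;) are assumptions that cover just this example, and TokenScanner, Kind, and Token are hypothetical names. At each position the scanner consumes the longest run of characters that forms one token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class TokenScanner {
    enum Kind { RESERVED_WORD, IDENTIFIER, INTEGER_CONSTANT, OPERATOR, SPECIAL_SYMBOL }

    record Token(Kind kind, String text) {
        @Override
        public String toString() { return text + "  " + kind; }
    }

    private static final Set<String> RESERVED = Set.of("int");

    static List<Token> scan(String src) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char ch = src.charAt(i);
            if (Character.isWhitespace(ch)) { i++; continue; }
            int start = i;
            Kind kind;
            if (Character.isLetter(ch)) {                  // letter {letter | digit}
                while (i < src.length() && Character.isLetterOrDigit(src.charAt(i))) i++;
                kind = RESERVED.contains(src.substring(start, i)) ? Kind.RESERVED_WORD : Kind.IDENTIFIER;
            } else if (Character.isDigit(ch)) {            // digit {digit}
                while (i < src.length() && Character.isDigit(src.charAt(i))) i++;
                kind = Kind.INTEGER_CONSTANT;
            } else if (ch == '=' || ch == '+') {
                i++;
                kind = Kind.OPERATOR;
            } else if (ch == ';') {
                i++;
                kind = Kind.SPECIAL_SYMBOL;
            } else {                                        // report an invalid character sequence
                throw new IllegalArgumentException("invalid character '" + ch + "' at offset " + i);
            }
            tokens.add(new Token(kind, src.substring(start, i)));
        }
        return tokens;
    }

    public static void main(String[] args) {
        scan("int a;\na = a + 2;").forEach(System.out::println);
    }
}
```

Reserved words are scanned exactly like identifiers and only then looked up in the reserved-word set, which keeps the character-level logic in one place.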
Intermediate Code Generation
An intermediate code representation often helps contain the complexity of the compiler and makes code optimizations easier to discover.
Typical choices include:
Annotated parse trees
Three-address code (TAC) and abstract machine language
Bytecode, as in Java bytecode
Example statements:
if (a <= b) a = a - c;
c = b * c;
Three-address code:
_t1 = a > b
if _t1 goto L0
_t2 = a - c
a = _t2
L0: _t3 = b * c
c = _t3
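Three-address code can be generated by walking an expression tree and assigning each operator's result to a fresh temporary, in the spirit of the _tN names in the example. The minimal sketch below handles only assignments whose right-hand side is built from names and binary operators; conditional jumps are omitted, and TacGen, Name, and Bin are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class TacGen {
    interface Expr {}
    record Name(String id) implements Expr {}
    record Bin(String op, Expr left, Expr right) implements Expr {}

    private final List<String> code = new ArrayList<>();
    private int temps = 0;

    // Returns the name holding e's value, emitting TAC for the subexpressions first.
    private String gen(Expr e) {
        if (e instanceof Name n) return n.id();
        if (!(e instanceof Bin b)) throw new IllegalArgumentException("unknown expression " + e);
        String left = gen(b.left());
        String right = gen(b.right());
        String t = "_t" + (++temps);                     // fresh temporary
        code.add(t + " = " + left + " " + b.op() + " " + right);
        return t;
    }

    // Translates  target = e  into a list of TAC instructions.
    static List<String> assign(String target, Expr e) {
        TacGen g = new TacGen();
        String value = g.gen(e);
        g.code.add(target + " = " + value);
        return g.code;
    }

    public static void main(String[] args) {
        // c = b * c
        System.out.println(assign("c", new Bin("*", new Name("b"), new Name("c"))));
        // x = a * b + c
        Expr sum = new Bin("+", new Bin("*", new Name("a"), new Name("b")), new Name("c"));
        assign("x", sum).forEach(System.out::println);
    }
}
```

Temporaries are numbered from _t1 on each call, so `c = b * c` yields `_t1 = b * c` followed by `c = _t1`, rather than continuing the numbering of a larger program.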
Intermediate Code Generation (cont'd)
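Example statements:
if (a <= b) a = a - c;
c = b * c;
Postfix (stack machine) code for the two assignments, with v1 = a, v2 = b, v3 = c:
v1 v3 - store(v1)
v2 v3 * store(v3)
Java bytecode (javap -c), with a, b, c in local variables 1, 2, 3:
55: iload_1        // push a
56: iload_2        // push b
57: if_icmpgt 64   // if a > b, skip the assignment to a
60: iload_1        // push a
61: iload_3        // push c
62: isub           // a - c
63: istore_1       // a = a - c
64: iload_2        // push b
65: iload_3        // push c
66: imul           // b * c
67: istore_3       // c = b * c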
Code Optimization
The optimizer converts the intermediate representation into an equivalent one that is intended to be smaller and faster.
Typical optimizations:
Suppressing code generation for unreachable code
Getting rid of unused variables
Eliminating multiplication by 1 and addition of 0
Loop optimization: e.g. moving computations whose values do not change inside the loop out of the loop
Common subexpression elimination . . .
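Eliminating multiplication by 1 and addition of 0 can be written as a bottom-up rewrite of an expression tree: simplify the children first, then drop the neutral operand. The node types and the class name AlgebraicSimplifier are hypothetical; a real optimizer would usually apply such rules to its intermediate representation rather than to source-level trees.

```java
public class AlgebraicSimplifier {
    interface Expr {}
    record Num(int value) implements Expr {}
    record Var(String name) implements Expr {}
    record Bin(char op, Expr left, Expr right) implements Expr {}

    static boolean isConst(Expr e, int v) { return e instanceof Num n && n.value() == v; }

    // Simplifies the children first, then drops a neutral operand: x*1, 1*x, x+0, 0+x become x.
    static Expr simplify(Expr e) {
        if (!(e instanceof Bin b)) return e;
        Expr l = simplify(b.left());
        Expr r = simplify(b.right());
        if (b.op() == '*' && isConst(r, 1)) return l;
        if (b.op() == '*' && isConst(l, 1)) return r;
        if (b.op() == '+' && isConst(r, 0)) return l;
        if (b.op() == '+' && isConst(l, 0)) return r;
        return new Bin(b.op(), l, r);
    }

    // Renders a tree fully parenthesized, e.g. (x + y).
    static String show(Expr e) {
        if (e instanceof Num n) return Integer.toString(n.value());
        if (e instanceof Var v) return v.name();
        Bin b = (Bin) e;
        return "(" + show(b.left()) + " " + b.op() + " " + show(b.right()) + ")";
    }

    public static void main(String[] args) {
        // (x * 1) + (0 + y)
        Expr e = new Bin('+', new Bin('*', new Var("x"), new Num(1)),
                              new Bin('+', new Num(0), new Var("y")));
        System.out.println(show(e) + "  =>  " + show(simplify(e)));
    }
}
```

These rewrites are safe for integer arithmetic; for IEEE floating point, x + 0 is not an identity when x is -0.0, so a compiler has to be more careful there.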
Object Code Generation
The target program is generated in the machine language of
the target architecture.
Memory locations are selected for each variable
Instructions are chosen for each operation
Individual tree nodes or TAC instructions are translated into sequences of machine language instructions that perform the same task
Typical machine language instructions include things like:
Load register
Add register to memory location
Store register to memory
. . .
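The naive, one-instruction-at-a-time translation described above can be sketched as follows. The target is a hypothetical register machine with LOAD, ADD, SUB, MUL, and STORE instructions and a single register R1; NaiveCodeGen is a hypothetical name, labels and jumps are not handled, and the sketch omits the register allocation and instruction selection that real code generators perform.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class NaiveCodeGen {
    // TAC operator -> opcode of the hypothetical target machine
    private static final Map<String, String> OPCODES = Map.of("+", "ADD", "-", "SUB", "*", "MUL");

    // Translates one TAC instruction of the form "x = y" or "x = y op z".
    static List<String> translate(String tac) {
        String[] p = tac.trim().split("\\s+");
        if ((p.length != 3 && p.length != 5) || !p[1].equals("=")) {
            throw new IllegalArgumentException("unsupported TAC: " + tac);
        }
        List<String> out = new ArrayList<>();
        out.add("LOAD R1, " + p[2]);                      // load the first operand into the register
        if (p.length == 5) {
            String opcode = OPCODES.get(p[3]);
            if (opcode == null) throw new IllegalArgumentException("unsupported operator: " + p[3]);
            out.add(opcode + " R1, " + p[4]);             // combine with the second operand
        }
        out.add("STORE R1, " + p[0]);                     // write the result back to memory
        return out;
    }

    public static void main(String[] args) {
        for (String tac : List.of("_t2 = a - c", "a = _t2")) {
            translate(tac).forEach(System.out::println);
        }
    }
}
```

Translating `_t2 = a - c` and then `a = _t2` produces a STORE of _t2 immediately followed by a LOAD of it, the kind of redundancy an optimizer (for example a peephole pass over the generated code) could remove.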
Symbol Table
Symbol table management is a part of the compiler that
interacts with several of the phases
Identifiers are found in lexical analysis and placed in the symbol
table
During syntactic and semantic analysis, type and scope information is added
During code generation, type information is used to determine what instructions to use
During optimization, the results of liveness analysis may be kept in the symbol table
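A common way to support the scope information mentioned above is a stack of hash maps, one map per open scope. In this sketch (SymbolTable, Symbol, and the method names are hypothetical, and types are plain strings), lookups search from the innermost scope outward and redeclaring a name in the same scope is reported as an error. The names in main are borrowed from the MicroJava example program.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

public class SymbolTable {
    record Symbol(String name, String type, int level) {}

    private final Deque<Map<String, Symbol>> scopes = new ArrayDeque<>();

    SymbolTable() { openScope(); }                       // the global scope (level 0)

    void openScope()  { scopes.push(new HashMap<>()); }
    void closeScope() { scopes.pop(); }

    // Adds a name to the innermost scope; redeclaring it in the same scope is a semantic error.
    Symbol declare(String name, String type) {
        Map<String, Symbol> current = scopes.peek();
        if (current.containsKey(name)) throw new IllegalStateException(name + " already declared");
        Symbol s = new Symbol(name, type, scopes.size() - 1);
        current.put(name, s);
        return s;
    }

    // Searches from the innermost scope outward; returns null if the name is undeclared.
    Symbol lookup(String name) {
        for (Map<String, Symbol> scope : scopes) {       // ArrayDeque iterates from the top of the stack
            Symbol s = scope.get(name);
            if (s != null) return s;
        }
        return null;
    }

    public static void main(String[] args) {
        SymbolTable tab = new SymbolTable();
        tab.declare("size", "int");
        tab.declare("val", "Table");
        tab.openScope();                                 // entering main()
        tab.declare("x", "int");
        System.out.println(tab.lookup("val"));           // found in the global scope
        tab.closeScope();
        System.out.println(tab.lookup("x"));             // null: x is no longer visible
    }
}
```

Storing the nesting level in each Symbol lets later phases tell globals from locals, for example when choosing between global and local load instructions.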
Error Handling
Error handling and reporting also occur across many phases
Lexical analyzer reports invalid character sequences
Syntactic analyzer reports invalid token sequences
Semantic analyzer reports type and scope errors, and the like
The compiler may be able to continue after some errors, but other errors may stop the process
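One common way for a parser to continue after a syntax error is panic-mode recovery: report the error, skip tokens up to a synchronizing symbol such as ";", and resume with the next statement. The slides do not say which technique the course uses, so this is only an illustration; the statement form ident "=" operand ";" and the class name PanicModeRecovery are hypothetical, and the input is assumed to be already split into tokens.

```java
import java.util.ArrayList;
import java.util.List;

public class PanicModeRecovery {
    static boolean isIdent(String t) { return Character.isLetter(t.charAt(0)); }

    static boolean isOperand(String t) { return t.chars().allMatch(Character::isLetterOrDigit); }

    // Checks statements of the form  ident "=" operand ";"  and returns all errors found.
    static List<String> check(List<String> tokens) {
        List<String> errors = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            String error = null;
            if (!isIdent(tokens.get(i))) error = "identifier expected";
            else if (i + 1 >= tokens.size() || !tokens.get(i + 1).equals("=")) error = "'=' expected";
            else if (i + 2 >= tokens.size() || !isOperand(tokens.get(i + 2))) error = "operand expected";
            else if (i + 3 >= tokens.size() || !tokens.get(i + 3).equals(";")) error = "';' expected";
            if (error == null) {
                i += 4;                                   // well-formed statement
                continue;
            }
            errors.add("statement at token " + i + ": " + error);
            // Panic mode: skip to the next ';' (the synchronizing token) and resume after it.
            while (i < tokens.size() && !tokens.get(i).equals(";")) i++;
            i++;
        }
        return errors;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("a = 1 ; b 2 ; c = d ;".split(" "));
        System.out.println(check(tokens));  // prints [statement at token 4: '=' expected]
    }
}
```

The third statement is still checked after the error in the second one, which is what lets a compiler report several errors in a single run.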
Example MicroJava Program
program P
  final int size = 10;
  class Table {
    int[] pos;
    int[] neg;
  }
  Table val;
{
  void main()
    int x, i;
  { //---------- initialize val ----------
    val = new Table;
    val.pos = new int[size];
    val.neg = new int[size];
    i = 0;
    while (i < size) {
      val.pos[i] = 0; val.neg[i] = 0; i = i + 1;
    }
    //---------- read values ----------
    read(x);
    while (x != 0) {
      if (x > 0) val.pos[x] = val.pos[x] + 1;
      else if (x < 0) val.neg[-x] = val.neg[-x] + 1;
      read(x);
    }
  }
}
program P: main program; no separate compilation
class Table: classes (without methods)
Table val: global variables
int x, i: local variables
References
Original slides: Nancy McCracken.
Niklaus Wirth, Compiler Construction, chapters 1 and 2.
H. Mössenböck, System Specification and Compiler Construction, course notes, http://www.ssw.uni-linz.ac.at/Misc/CC/ (also notes on MicroJava).
Jerry Cain, Compilers, course notes, http://www.stanford.edu/class/cs143/
General references:
A. Aho, M. Lam, R. Sethi, J. Ullman, Compilers: Principles, Techniques and Tools, 2nd Edition, Addison-Wesley, 2006.
Steven Muchnick, Advanced Compiler Design and Implementation, Morgan Kaufmann, 1997.
Keith Cooper and Linda Torczon, Engineering a Compiler, Morgan Kaufmann, 2003.