View
221
Download
1
Tags:
Embed Size (px)
Citation preview
CSE4100Compiler
(a.k.a. Programming Language Translation)
CSE4100Compiler
(a.k.a. Programming Language Translation)
Notes credit go to
Me CSE
Laurent Michel CSE
Aggelos Kiayias CSE
Steven Demurjian CSE
Robert La Barre UTRC
2CSE244 Compilers
Overview• Objectives
• Structure of the course
• Evaluation
• Compiler Introduction– A compiler bird’s eye view
• Lexical analysis
• Parsing (Syntax analysis)
• Semantic analysis
• Code generation
• Optimization
3CSE244 Compilers
Information• Course web page
– Will be on HuskyCT
– Currently• http://www.engr.uconn.edu/~weiwei/
• Instructor– Wei Wei
• Office hour– TuTh 3:30pm ~ 4:30pm
– Wednesday 1:30pm ~ 4:30pm
4CSE244 Compilers
Objectives• Compilers are….
– Ubiquitous in Computer Science
– Central to symbolic processing
– Relates to• Theory of computing
• Practice of computing
• Programming languages
• Operating Systems
• Computer Architecture
6CSE244 Compilers
Translate… Why?• Languages offer
– Abstractions
– At different levels• From low
– Good for machines….
• To high– Good for humans….
Let the computerDo the heavy lifting.
8CSE244 Compilers
Interpreter• Motivation…
– Easiest to implement!
• Upside ?
• Downside ?
• Phases– Lexical analysis
– Parsing
– Semantic checking
– Interpretation
9CSE244 Compilers
Compiler• Motivation
– It is natural!
• Upside?
• Downside?
• Phases– [Interpreter]
– Code Generation
– Code Optimization
– Link & load
11
CSE244 Compilers
Objectives• Learn about compilers because…
– It helps to better program– Understand the tools of the trade– Many languages are compiled
• Programming Languages [C,C#,ML,LISP,…]• Communication Languages [XML,HTML,…]• Presentation Languages [CSS,SGML,…]• Hardware Languages [VHDL,…]• Formatting Languages [Postscript,troff,LaTeX,…]• Query Languages [SQL & friends]
– Many assistive tools use compiler technology…• Syntax highlighting• Type assist - type completion
– You may write/use compiler technology!
12
CSE244 Compilers
Overview• Objectives
• Structure of the course
• Evaluation
• Compiler Introduction– A compiler bird’s eye view
• Lexical analysis
• Parsing
• Semantic analysis
• Code generation
• Optimization
13
CSE244 Compilers
Course structure• A reflection of the topic
– Lexical analysis
– Parsing
– Semantic analysis
– Midterm
– Runtime structures
– Intermediate code generation
– Machine code generation
– Optimization
– Final
14
CSE244 Compilers
Evaluation• Course evaluation
– Six homeworks (50%)• 5 mandatory
• 1 extra credit
– One midterm (20%)
– One final (30%)
• Exams– Open book
15
CSE244 Compilers
Six Homeworks• One project to rule them all...
• Purpose– First Hand Experience with compiler technology
• Six Homeworks are connected– Scanning
– Parsing
– Analysis
– IR Code Generation
– IR Optimization
– Machine Code Generation
And in the Darkness Bind You...
17
CSE244 Compilers
C--• The Object Oriented Language For Dummies
– C-- supports• Classes
• Inheritance
• Polymorphism
• 2 basic types
• Arrays
• Simple control primitives– if-then-else
– while
18
CSE244 Compilers
C-- Exampleclass Foo {
int fact(int n) {return 0;
}int fib(int x) {
return 1;}
};
class Main extends Foo {Main() {
int x;x = fact(5);}
int fact(int n) {if (n==0)
return 1;else
return n * fact(n-1);}
};
class Foo {int fact(int n) {
return 0;}int fib(int x) {
return 1;}
};
class Main extends Foo {Main() {
int x;x = fact(5);}
int fact(int n) {if (n==0)
return 1;else
return n * fact(n-1);}
};
19
CSE244 Compilers
The Target Language• Something realistic...
• Something challenging...
• Something useful...
It’s True!We will generate code for a Pentium Class
Machine Running EithterWindows or Linux
20
CSE244 Compilers
But...... • Won’t that be hard?
• No!– My C-- compiler
• 6000 lines of code
• Written in < 10 days
– Your C-- compiler• Will use some of my code...
21
CSE244 Compilers
Really ?global mainextern printfextern mallocsection .data
D_0: dd block_0,block_1D_1: dd block_3,block_1,block_2
section .textmain:push 4 ; push constantcall malloc ; call to C functionadd esp,4 ; pop argumentmov [eax], dword D_1 ; move source into destpush eax ; push variablemov eax, dword [eax] ; get the VTBLmov eax, dword [eax+8] ; get the k=2 methodcall eax ; call the methodadd esp,4 ; pop argsret
block_0:mov [esp-4], dword ebp ; save old BPmov ebp, dword esp ; set BP to SPmov [ebp-8], dword esp ; save old SPsub esp,8 ; reserve space mov eax, dword 0 ; write result in output registermov esp, dword [ebp-8] ; restore old SPmov ebp, dword [ebp-4] ; restore old BPret ; return from functionblock_1:mov [esp-4], dword ebp ; save old BPmov ebp, dword esp ; set BP to SPmov [ebp-8], dword esp ; save old SPsub esp,8 ; reserve space formov eax, dword 1 ; write result in output registermov esp, dword [ebp-8] ; restore old SPmov ebp, dword [ebp-4] ; restore old BPret ; return from function
block_3:mov [esp-4], dword ebp ; save old BPmov ebp, dword esp ; set BP to SPmov [ebp-8], dword esp ; save old SPsub esp,20 ; reserve spacemov eax, dword [ebp+8] ; move o2
cmp eax,0 ; do the operationmov eax,0sete ahcmp eax,0 ; compare to 0 to set CCjz block_5 ; transfer to true block
block_4:mov eax, dword 1 ; write result in output registerjmp block_6 ; transfer controlblock_5:mov eax, dword [ebp+8] ; move o2sub eax,1 ; do the operationpush eax ; push variablemov ebx, dword [ebp+4] ; get argument in registerpush ebx ; push variablemov ebx, dword [ebx] ; get the VTBLmov ebx, dword [ebx] ; get the k=0 methodcall ebx ; call the methodadd esp,8 ; pop argsmov ecx, dword [ebp+8] ; move o2imul ecx,eax ; do the operationmov eax, dword ecx ; write result in output register
block_6:mov esp, dword [ebp-8] ; restore old SPmov ebp, dword [ebp-4] ; restore old BPret ; return from function
block_2:mov [esp-4], dword ebp ; save old BPmov ebp, dword esp ; set BP to SPmov [ebp-8], dword esp ; save old SPsub esp,12 ; reserve spacepush 5 ; push constantmov eax, dword [ebp+4] ; get argument in registerpush eax ; push variablemov eax, dword [eax] ; get the VTBLmov eax, dword [eax] ; get the k=0 methodcall eax ; call the methodadd esp,8 ; pop argsmov esp, dword [ebp-8] ; restore old SPmov ebp, dword [ebp-4] ; restore old BPret ; return from function
This is NASM assembly This is NASM assembly generated for the generated for the C--C--
example shown earlier.example shown earlier.
22
CSE244 Compilers
Overview• Objectives
• Structure of the course
• Evaluation
• Compiler Introduction– A compiler bird’s eye view
• Lexical analysis
• Parsing
• Semantic analysis
• Code generation
• Optimization
23
CSE244 Compilers
Compiler Classes• Compilers Viewed from Many Perspectives
• However, All utilize same basic tasks to accomplish their actions
Single PassMultiple PassLoad & Go
Construction
DebuggingOptimizing
Functional
24
CSE244 Compilers
Compiler Structure• Two fundamental sets of issues
• Our Focus:– Both
AnalysisText analysisSyntactic analysisStructural analysis
Synthesis Program generation
Program optimization
25
CSE244 Compilers
Some Good news!• Tools do exist for
– Lexical and syntactic analysis
– Note: it was not always the case.• Structure / Syntax directed editors:
– Force “syntactically” correct code to be entered
• Pretty Printers: – Standardized version for program structure (i.e.,indenting)
• Static Checkers: – A “quick” compilation to detect rudimentary errors
• Interpreters: – “real” time execution of code a “line-at-a-time”
26
CSE244 Compilers
Phases of compilationSource Program
Lexical Analyzer1
Syntax Analyzer2
Semantic Analyzer3
Intermediate Code Generator
4
Code Optimizer5
Code Generator6
Target Program
Symbol-table Manager
Error Handler
1, 2, 3 : Analysis 4, 5, 6 : Synthesis
27
CSE244 Compilers
RelocatableSource Program
Pre-Processor1
Compiler2
Assembler3
RelocatableMachine Code
4
Loader Link/Editor5
Executable
Library,relocatable object files
28
CSE244 Compilers
AnalysisSource Program
Lexical Analyzer1
Syntax Analyzer2
Semantic Analyzer3
Intermediate Code Generator
4
Code Optimizer5
Code Generator6
Target Program
Error Handler
Language Analysis Phases
29
CSE244 Compilers
Lexical analysis• Purpose
– Slice the sequence of symbols into tokens
Date x := new Date ( System.today( ) + 30 ) ;
Id Date
Id x
Symbol :=
Keyword new
Symbol (
Id Date Id System
Symbol .
Id Today
Symbol (
Symbol )
Symbol +
Integer 30
Symbol )
Symbol ;
Id Date
Id x
Symbol :=
Keyword new
Symbol (
Id Date Id System
Symbol .
Id Today
Symbol (
Symbol )
Symbol +
Integer 30
Symbol )
Symbol ;
30
CSE244 Compilers
Syntax Analysis (parsing)• Purpose
– Organize tokens in sentences based on grammar
Symbol :=
Keyword new
Symbol (
Symbol .
Symbol (
Symbol )
Symbol +
Symbol )
Symbol ;
31
CSE244 Compilers
What is a grammar ?• Grammar is a Set of Rules Which Govern
– The Interdependencies &
– Structure Among the Tokens
statement is anassignment statement, or while statement, or if statement, or ...
assignment statement
is an identifier := expression ;
expression is an
(expression), or expression + expression, or expression * expression, or number, or identifier, or ...
33
CSE244 Compilers
Semantic Analysis• Purpose
– Determine Unique / Unambiguous Interpretation
– Catch errors related to program meaning
• Examples– Natural Language
• “John Took Picture of Mary Out on the Patio”
– Programming Language• Wrong types• Missing declaration• Missing methods• Ambiguous statements…
34
CSE244 Compilers
Semantic Analysis• Main task
– Type checking– Many Different Situations
– Primary tool• Symbol table
Real := int + char ;
A[int] := A[real] + int ;
while char <> int do
…. Etc.
36
CSE244 Compilers
AnalysisSource Program
Lexical Analyzer1
Syntax Analyzer2
Semantic Analyzer3
Intermediate Code Generator
4
Code Optimizer5
Code Generator6
Target Program
Error Handler
Synthesis Phases
37
CSE244 Compilers
Intermediate Code• What is intermediate code ?
– A low level representation
– A simple representation
– Easy to reason about
– Easy to manipulate
– Programming Language independent
– Hardware independent
38
CSE244 Compilers
IR Code example• Three-address code
– Three operands per instruction
– Each assignment instruction has at most one operator on the right side
– Instructions fix the order in which operations are to be done
– Generate a temporal name to hold value computed
• ExamplePosition = initial + rate *60
t1 = inttofloat(60)t2 = id3 * t1t3 = id2 + t2id1 = t3
t1 = inttofloat(60)t2 = id3 * t1t3 = id2 + t2id1 = t3id1 id2 id3
39
CSE244 Compilers
Code Optimization• Purpose
– Improve the intermediate code• Get rid of redundant / dead code
• Get rid of redundant computation
• Reorganize code
• Schedule instructions
• Factor out code
– Improve the machine code• Register allocation
• Instruction scheduling [for specific hardware]
40
CSE244 Compilers
Machine Code Generation• Purpose
– Generate code for the target architecture
– Code is relocatable
– Accounts for platforms specific• Registers
– IA32: 6 GP registers (int)
– MIPS: 32 GP registers (int)
• Calling convention– IA32: stack-based
– MIPS: register based
– Sparc: register sliding window based
41
CSE244 Compilers
Machine code generation• Pragmatics
– Generate assembly
– Let an assembler produced the binary code.
42
CSE244 Compilers
Assemblers• Assembly code: names are used for instructions,
and names are used for memory addresses.
• Two-pass Assembly:– First Pass: all identifiers are assigned to memory
addresses (0-offset)– e.g. substitute 0 for a, and 4 for b– Second Pass: produce relocatable machine code:
MOV a, R1
ADD #2, R1MOV R1, b
0001 01 00 00000000 *
0011 01 10 000000100010 01 00 00000100 *
relocationbit
43
CSE244 Compilers
Linker & Loader• Loader
– Input• Relocatable code in a file
– Output• Bring executable into virtual address space. Relocate
“shared” libs.
• Linker– Input
• Many relocatable machine code files (object files)
– Output• A single executable file with all the object file glued
together.– Need to relocate each object file as needed
44
CSE244 Compilers
Pre-processors• Purpose
– Macro processing– Performs text replacement (editing)– Performs file concatenation (#include)
• Example– In C, #define does not exist– In C, #define is handled by a preprocessor
#define X 3
#define Y A*B+C
#define Z getchar()
45
CSE244 Compilers
Modern Compilers• Motto
– Re-targetable compilers
• Organization– Front-end
• The analysis• The generation of intermediate code
– Flexible IR
– Back-end• The generation of machine code• The architecture-specific optimization
46
CSE244 Compilers
Modern Compiler Example• Gcc
– Front-ends for• C,C++,FORTRAN,Objective-C,Java
– IR• Static Single Assignment (SSA)
– Back-ends for• IA32, PPC, Ultra, MIPS, ARM, Itanium, DEC, Alpha, VMS,
…