112
Compilation 0368-3133 Lecture 1: Introduction Noam Rinetzky 1

0368-3133 Lecture 1: Introduction Noam Rinetzkymaon/teaching/2016-2017/compilation/compilation... · High Level Programming Languages • Imperative Algol, PL1, Fortran, Pascal, Ada,

  • Upload
    lamcong

  • View
    218

  • Download
    2

Embed Size (px)

Citation preview

Compilation0368-3133

Lecture1:Introduction

NoamRinetzky

1

2

Admin

• Lecturer:NoamRinetzky– [email protected]– http://www.cs.tau.ac.il/~maon

• T.A.:OrenIsh Shalom

• Textbooks:– ModernCompilerDesign– Compilers:principles,techniquesandtools

3

Admin

• CompilerProject50%– 4-4.5practicalexercises– Groupsof3

• Finalexam50%– mustpass

4

CourseGoals

• Whatisacompiler• Howdoesitwork• (Reusable)techniques&tools

5

CourseGoals

• Whatisacompiler• Howdoesitwork• (Reusable)techniques&tools

• Programminglanguageimplementation– runtimesystems

• Executionenvironments– Assembly,linkers,loaders,OS

6

Introduction

Compilers:principles,techniquesandtools

7

WhatisaCompiler?

8

WhatisaCompiler?

“Acompilerisacomputerprogramthattransforms sourcecodewritteninaprogramminglanguage(sourcelanguage)intoanotherlanguage(targetlanguage).Themostcommonreasonforwantingtotransformsourcecodeistocreateanexecutableprogram.”

--Wikipedia

9

WhatisaCompiler?source language target language

Compiler

Executable code

exe

Sourcetext

txt

10

WhatisaCompiler?

Executable code

exe

Sourcetext

txt

Compiler

int a, b;a = 2;b = a*2 + 1;

MOV R1,2SAL R1INC R1MOV R2,R1

11

WhatisaCompiler?source language target language

CC++

PascalJava

PostscriptTeX

PerlJavaScript

PythonRuby

Prolog

LispScheme

MLOCaml

IA32IA64

SPARC

CC++

PascalJava

Java Bytecode…

Compiler

12

HighLevelProgrammingLanguages• Imperative Algol,PL1,Fortran,Pascal,Ada,Modula,C

– Closelyrelatedto“vonNeumann”Computers

• Object-orientedSimula,Smalltalk,Modula3,C++,Java,C#,Python– Dataabstractionand‘evolutionary’formofprogramdevelopment• Classanimplementationofanabstractdatatype(data+code)• ObjectsInstancesofaclass• Inheritance+generics

• Functional Lisp,Scheme,ML,Miranda,Hope,Haskel,OCaml,F#

• LogicProgramming Prolog 13

MoreLanguages• HardwaredescriptionlanguagesVHDL

– TheprogramdescribesHardwarecomponents– Thecompilergenerateshardwarelayouts

• GraphicsandTextprocessingTeX,LaTeX,postscript– Thecompilergeneratespagelayouts

• ScriptinglanguagesShell,C-shell,Perl– Includeprimitivesconstructsfromthecurrentsoftwareenvironment

• Web/Internet HTML,Telescript,JAVA,Javascript• Intermediate-languagesJavabytecode,IDL

14

HighLevelProg. Lang.,Why?

15

HighLevelProg. Lang.,Why?

16

Compilervs.Interpreter

17

Compiler

• Aprogramwhichtransforms programs• Inputaprogram(P)• Outputanobjectprogram(O)

– Foranyx,“O(x)”“=““P(x)”Compiler

Sourcetext

txt

Executable code

exe

P O 18

CompilingCtoAssembly

Compilerint x;scanf(“%d”, &x);x = x + 1 ;printf(“%d”, x);

add %fp,-8, %l1mov %l1, %o1call scanfld [%fp-8],%l0add %l0,1,%l0st %l0,[%fp-8]ld [%fp-8], %l1mov %l1, %o1call printf

5

6

19

Interpreter

• Aprogramwhichexecutes aprogram• Inputaprogram(P)+itsinput(x)• Outputthecomputedoutput(P(x))

Interpreter

Sourcetext

txt

Input

Output

20

Interpreting(running).py programs

• Aprogramwhichexecutes aprogram• Inputaprogram(P)+itsinput(x)• Outputthecomputedoutput(“P(x)”)

Interpreter

5

int x;scanf(“%d”, &x);x = x + 1 ;printf(“%d”, x);

6

21

Compilervs.InterpreterSource

Code

Executable

Code Machine

Source

Code

Intermediate

Code Interpreter

preprocessing

processingpreprocessing

processing

22

Compiledprogramsareusuallymoreefficientthaninterpreted

scanf(“%d”,&x);y = 5 ;z = 7 ;x = x + y * z;printf(“%d”,x);

add %fp,-8, %l1mov %l1, %o1call scanfmov 5, %l0st %l0,[%fp-12]mov 7,%l0st %l0,[%fp-16]ld [%fp-8], %l0ld [%fp-8],%l0add %l0, 35 ,%l0st %l0,[%fp-8]ld [%fp-8], %l1mov %l1, %o1 call printf

Compiler

23

Compilersreportinput-independentpossible errors• Input-program

• Compiler-Output– “line 88: x may be used before set''

scanf(“%d”, &y);if (y < 0)

x = 5;...if (y <= 0)

z = x + 1;

24

Interpretersreportinput-specificdefinite errors

• Input-program

• Inputdata– y=-1– y=0

scanf(“%d”, &y);if (y < 0)

x = 5;...if (y <= 0)

z = x + 1;

25

Interpretervs.Compiler

• Conceptuallysimpler– “define”theprog.lang.

• Canprovidemorespecificerrorreport

• Easiertoport

• Fasterresponsetime

• [Moresecure]

• Howdoweknowthetranslationiscorrect?

• Canreporterrorsbeforeinputisgiven

• Moreefficientcode– Compilationcanbeexpensive– movecomputationsto

compile-time• compile-time+execution-time

<interpretation-timeispossible

26

ConcludingRemarks

• Bothcompilersandinterpretersareprogramswritteninhighlevellanguage

• Compilersandinterpreterssharefunctionality

• Inthiscoursewefocusoncompilers

27

Everythingshouldbebuilttop-down,exceptthefirsttime

-- AlanJ.Perlis

28

AlanJayPerlis (1922– 1990)apioneerinthefieldofprogramminglanguageandcompilers,firstrecipientoftheTuringAward

Ex0:ASimpleInterpreter

29

Toycompiler/interpreter

• Trivialprogramminglanguage• Stackmachine• Compiler/interpreterwritteninC• Demonstratethebasicsteps

• Textbook:ModernCompilerDesign1.2

30

ConceptualStructureofaCompiler

Executable code

exe

Sourcetext

txt

SemanticRepresentation

Backend(synthesis)

Compiler

Frontend(analysis)

LexicalAnalysis

Syntax Analysis

Parsing

Semantic Analysis

IntermediateRepresentation

(IR)

Code

Generation

31

StructureoftoyCompiler/interpreter

Executable code

exe

Sourcetext

txt

SemanticRepresentation

Backend (synthesis)Frontend

(analysis)

LexicalAnalysis

Syntax Analysis

Parsing

Semantic Analysis

(NOP)

IntermediateRepresentation

(AST)

Code

Generation

Execution

Engine

Execution Engine Output*

* Programs in our PL do not take input 32

SourceLanguage

• Fullyparameterizedexpressions• Argumentscanbeasingledigit

ü (4+(3*9))✗3+4+5✗(12+3)

expression® digit|‘(‘ expressionoperatorexpression‘)’operator® ‘+’ |‘*’digit® ‘0’ |‘1’ |‘2’ |‘3’ |‘4’ |‘5’ |‘6’ |‘7’ |‘8’ |‘9’

33

Theabstractsyntaxtree(AST)

• Intermediateprogramrepresentation• Definesatree

– Preservesprogramhierarchy• Generatedbytheparser• Keywordsandpunctuationsymbolsarenotstored– Notrelevantoncethetreeexists

34

Concretesyntaxtree# for5*(a+b)

expression

number expression‘*’

identifier

expression‘(’ ‘)’

‘+’ identifier

‘a’ ‘b’

‘5’

#Parse tree 35

AbstractSyntaxtreefor5*(a+b)

‘*’

‘+’

‘a’ ‘b’

‘5’

36

AnnotatedAbstractSyntaxtree

‘*’

‘+’

‘a’ ‘b’

‘5’

type:real

loc: reg1

type:real

loc: reg2

type:real

loc: sp+8

type:real

loc: sp+24

type:integer

37

Driverforthetoycompiler/interpreter

#include "parser.h" /* for type AST_node */#include "backend.h" /* for Process() */#include "error.h" /* for Error() */

int main(void) {AST_node *icode;

if (!Parse_program(&icode)) Error("No top-level expression");Process(icode);

return 0;}

38

StructureoftoyCompiler/interpreter

Executable code

exe

Sourcetext

txt

SemanticRepresentation

Backend (synthesis)Frontend

(analysis)

LexicalAnalysis

Syntax Analysis

Parsing

Semantic Analysis

(NOP)

IntermediateRepresentation

(AST)

Code

Generation

Execution

Engine

Execution Engine Output*

* Programs in our PL do not take input 39

LexicalAnalysis

• Partitionstheinputsintotokens– DIGIT– EOF– ‘*’– ‘+’– ‘(‘– ‘)’

• Eachtokenhasitsrepresentation• Ignoreswhitespaces 40

lex.h:HeaderFileforLexicalAnalysis

/* Define class constants */

/* Values 0-255 are reserved for ASCII characters */

#define EoF 256

#define DIGIT 257

typedef struct {

int class;

char repr;} Token_type;

extern Token_type Token;

extern void get_next_token(void);

41

#include "lex.h" token_type Token; // Global variable

void get_next_token(void) {int ch;do {

ch = getchar();if (ch < 0) {

Token.class = EoF; Token.repr = '#';return;

}} while (Layout_char(ch));if ('0' <= ch && ch <= '9') {Token.class = DIGIT;}else {Token.class = ch;}Token.repr = ch;

}

static int Layout_char(int ch) {switch (ch) {

case ' ': case '\t': case '\n': return 1;default: return 0;

}}

LexicalAnalyzer

42

StructureoftoyCompiler/interpreter

Executable code

exe

Sourcetext

txt

SemanticRepresentation

Backend (synthesis)Frontend

(analysis)

LexicalAnalysis

Syntax Analysis

Parsing

Semantic Analysis

(NOP)

IntermediateRepresentation

(AST)

Code

Generation

Execution

Engine

Execution Engine Output*

* Programs in our PL do not take input 43

Parser

• Invokeslexicalanalyzer• Reportssyntaxerrors• ConstructsAST

44

ParserHeaderFile

typedef int Operator;

typedef struct _expression {

char type; /* 'D' or 'P' */

int value; /* for 'D' type expression */

struct _expression *left, *right; /* for 'P' type expression */

Operator oper; /* for 'P' type expression */

} Expression;

typedef Expression AST_node; /* the top node is an Expression */

extern int Parse_program(AST_node **);

45

ASTfor(2*((3*4)+9))P

*oper

typeleft right

P+

P

*

D

2

D

9

D

4D

346

ASTfor(2*((3*4)+9))P

*oper

typeleft right

P+

P

*

D

2

D

9

D

4D

347

DriverfortheToyCompiler

#include "parser.h" /* for type AST_node */#include "backend.h" /* for Process() */#include "error.h" /* for Error() */

int main(void) {AST_node *icode;

if (!Parse_program(&icode)) Error("No top-level expression");Process(icode);

return 0;}

48

SourceLanguage

• Fullyparenthesizedexpressions• Argumentscanbeasingledigit

ü (4+(3*9))✗3+4+5✗(12+3)

expression® digit|‘(‘ expressionoperatorexpression‘)’operator® ‘+’ |‘*’digit® ‘0’ |‘1’ |‘2’ |‘3’ |‘4’ |‘5’ |‘6’ |‘7’ |‘8’ |‘9’

49

lex.h:HeaderFileforLexicalAnalysis

/* Define class constants */

/* Integers are used to encode characters + special codes */

/* Values 0-255 are reserved for ASCII characters */

#define EoF 256

#define DIGIT 257

typedef struct {

int class;

char repr;} Token_type;

extern Token_type Token;

extern void get_next_token(void);

50

#include "lex.h" token_type Token; // Global variable

void get_next_token(void) {int ch;do {

ch = getchar();if (ch < 0) {

Token.class = EoF; Token.repr = '#’; return;}} while (Layout_char(ch));

if ('0' <= ch && ch <= '9') Token.class = DIGIT;

else Token.class = ch;

Token.repr = ch;}

static int Layout_char(int ch) {switch (ch) {

case ' ': case '\t': case '\n': return 1;default: return 0;

}}

LexicalAnalyzer

51

ASTfor(2*((3*4)+9))P

*oper

typeleft right

P+

P

*

D

2

D

9

D

4D

352

DriverfortheToyCompiler

#include "parser.h" /* for type AST_node */#include "backend.h" /* for Process() */#include "error.h" /* for Error() */

int main(void) {AST_node *icode;

if (!Parse_program(&icode)) Error("No top-level expression");Process(icode);

return 0;}

53

ParserEnvironment#include "lex.h”, "error.h”, "parser.h"

static Expression *new_expression(void) {return (Expression *)malloc(sizeof (Expression));

}

static int Parse_operator(Operator *oper_p);static int Parse_expression(Expression **expr_p);int Parse_program(AST_node **icode_p) {

Expression *expr;get_next_token(); /* start the lexical analyzer */if (Parse_expression(&expr)) {

if (Token.class != EoF) {Error("Garbage after end of program");

}*icode_p = expr;return 1;

}return 0;

} 54

Top-DownParsing• Optimisticallybuildthetreefromtheroottoleaves• ForeveryP® A1A2…An|B1B2…Bm

– IfA1succeeds• IfA2succeeds&A3succeeds&…• Elsefail

– ElseifB1succeeds• IfB2succeeds&B3succeeds&..• Elsefail

– Elsefail

• Recursivedescentparsing– Simplified:nobacktracking

• Canbeappliedforcertaingrammars55

static int Parse_expression(Expression **expr_p) {Expression *expr = *expr_p = new_expression();if (Token.class == DIGIT) {

expr->type = 'D'; expr->value = Token.repr - '0';get_next_token(); return 1;

}if (Token.class == '(') {

expr->type = 'P'; get_next_token();if (!Parse_expression(&expr->left)) { Error("Missing expression"); }if (!Parse_operator(&expr->oper)) { Error("Missing operator"); }if (!Parse_expression(&expr->right)) { Error("Missing expression"); }if (Token.class != ')') { Error("Missing )"); }get_next_token();return 1;

}/* failed on both attempts */free_expression(expr); return 0;

}

Parser

static int Parse_operator(Operator *oper) {if (Token.class == '+') {

*oper = '+'; get_next_token(); return 1;}if (Token.class == '*') {

*oper = '*'; get_next_token(); return 1;}return 0;

}56

ASTfor(2*((3*4)+9))

P

*oper

typeleft right

P+

P

*

D

2

D

9

D

4D

3 57

StructureoftoyCompiler/interpreter

Executable code

exe

Sourcetext

txt

SemanticRepresentation

Backend (synthesis)Frontend

(analysis)

LexicalAnalysis

Syntax Analysis

Parsing

Semantic Analysis

(NOP)

IntermediateRepresentation

(AST)

Code

Generation

Execution

Engine

Execution Engine Output*

* Programs in our PL do not take input 58

SemanticAnalysis

• Trivialinourcase• Noidentifiers• Noprocedure/functions• Asingletypeforallexpressions

59

StructureoftoyCompiler/interpreter

Executable code

exe

Sourcetext

txt

SemanticRepresentation

Backend (synthesis)Frontend

(analysis)

LexicalAnalysis

Syntax Analysis

Parsing

Semantic Analysis

(NOP)

IntermediateRepresentation

(AST)

Code

Generation

Execution

Engine

Execution Engine Output*

* Programs in our PL do not take input 60

IntermediateRepresentation

P

*oper

typeleft right

P+

P

*

D

2

D

9

D

4D

3 61

AlternativeIR:3-AddressCode

L1:_t0=a_t1=b_t2=_t0*_t1_t3=d_t4=_t2-_t3GOTO L1

“Simple Basic-like programming language”62

StructureoftoyCompiler/interpreter

Executable code

exe

Sourcetext

txt

SemanticRepresentation

Backend (synthesis)Frontend

(analysis)

LexicalAnalysis

Syntax Analysis

Parsing

Semantic Analysis

(NOP)

IntermediateRepresentation

(AST)

Code

Generation

Execution

Engine

Execution Engine Output*

* Programs in our PL do not take input 63

Codegeneration

• Stackbasedmachine• Fourinstructions

– PUSHn– ADD– MULT– PRINT

64

Codegeneration#include "parser.h" #include "backend.h" static void Code_gen_expression(Expression *expr) {

switch (expr->type) {case 'D':

printf("PUSH %d\n", expr->value);break;

case 'P':Code_gen_expression(expr->left);Code_gen_expression(expr->right);switch (expr->oper) {case '+': printf("ADD\n"); break;case '*': printf("MULT\n"); break;}break;

}}void Process(AST_node *icode) {

Code_gen_expression(icode); printf("PRINT\n");} 65

Compiling(2*((3*4)+9))

PUSH 2

PUSH 3

PUSH 4

MULT

PUSH 9

ADD

MULT

PRINT

P

*oper

typeleft right

P+

P

*

D

2

D

9

D

4D

3 66

ExecutingCompiledProgram

Executable code

exe

Sourcetext

txt

SemanticRepresentation

Backend (synthesis)Frontend

(analysis)

LexicalAnalysis

Syntax Analysis

Parsing

Semantic Analysis

(NOP)

IntermediateRepresentation

(AST)

Code

Generation

Execution

Engine

Execution Engine Output*

* Programs in our PL do not take input 67

GeneratedCodeExecution

PUSH 2

PUSH 3

PUSH 4

MULT

PUSH 9

ADD

MULT

PRINT

Stack Stack’

2

68

GeneratedCodeExecution

PUSH 2

PUSH 3

PUSH 4

MULT

PUSH 9

ADD

MULT

PRINT

Stack’

3

2

Stack

2

69

GeneratedCodeExecution

PUSH 2

PUSH 3

PUSH 4

MULT

PUSH 9

ADD

MULT

PRINT

Stack’

4

3

2

Stack

3

2

70

GeneratedCodeExecution

PUSH 2

PUSH 3

PUSH 4

MULT

PUSH 9

ADD

MULT

PRINT

Stack’

12

2

Stack

4

3

2

71

GeneratedCodeExecution

PUSH 2

PUSH 3

PUSH 4

MULT

PUSH 9

ADD

MULT

PRINT

Stack’

9

12

2

Stack

12

2

72

GeneratedCodeExecution

PUSH 2

PUSH 3

PUSH 4

MULT

PUSH 9

ADD

MULT

PRINT

Stack’

21

2

Stack

9

12

2

73

GeneratedCodeExecution

PUSH 2

PUSH 3

PUSH 4

MULT

PUSH 9

ADD

MULT

PRINT

Stack’

42

Stack

21

2

74

GeneratedCodeExecution

PUSH 2

PUSH 3

PUSH 4

MULT

PUSH 9

ADD

MULT

PRINT

Stack’Stack

42

75

Shortcuts

• Avoidgeneratingmachinecode– Useassembler

• GenerateCcode

76

StructureoftoyCompiler/interpreter

Executable code

exe

Sourcetext

txt

SemanticRepresentation

Backend (synthesis)Frontend

(analysis)

LexicalAnalysis

Syntax Analysis

Parsing

Semantic Analysis

(NOP)

IntermediateRepresentation

(AST)

Code

Generation

Execution

Engine

Execution Engine Output*

* Programs in our PL do not take input 77

Interpretation

• Bottom-upevaluationofexpressions• Thesameinterfaceofthecompiler

78

#include "parser.h" #include "backend.h”

static int Interpret_expression(Expression *expr) {switch (expr->type) {case 'D':

return expr->value;break;

case 'P': int e_left = Interpret_expression(expr->left);int e_right = Interpret_expression(expr->right);switch (expr->oper) {case '+': return e_left + e_right;case '*': return e_left * e_right;break;

}}

void Process(AST_node *icode) {printf("%d\n", Interpret_expression(icode));

}79

Interpreting(2*((3*4)+9))

P

*oper

typeleft right

P+

P

*

D

2

D

9

D

4D

3 80

Summary:Journeyinsideacompiler

LexicalAnalysis

Syntax Analysis

Sem.Analysis

Inter.Rep.

Code Gen.

x = b*b – 4*a*c

txt

<ID,”x”> <EQ> <ID,”b”> <MULT> <ID,”b”> <MINUS> <INT,4> <MULT> <ID,”a”> <MULT> <ID,”c”>

TokenStream

81

LexicalAnalysis

Syntax Analysis

Sem.Analysis

Inter.Rep.

Code Gen.

<ID,”x”><EQ><ID,”b”><MULT><ID,”b”><MINUS><INT,4><MULT><ID,”a”><MULT><ID,”c”>

‘b’ ‘4’

‘b’‘a’

‘c’

ID

ID

ID

ID

ID

factor

term factorMULT

term

expression

expression

factor

term factorMULT

term

expression

term

MULT factor

MINUS

SyntaxTree

Summary:Journeyinsideacompiler

82

Statement

‘x’

ID EQ

LexicalAnalysis

Syntax Analysis

Sem.Analysis

Inter.Rep.

Code Gen.

‘b’ ‘4’

‘b’‘a’

‘c’

ID

ID

ID

ID

ID

factor

term factorMULT

term

expression

expression

factor

term factorMULT

term

expression

term

MULT factor

MINUS

SyntaxTree

Summary:Journeyinsideacompiler

83

<ID,”x”><EQ><ID,”b”><MULT><ID,”b”><MINUS><INT,4><MULT><ID,”a”><MULT><ID,”c”>

Sem.Analysis

Inter.Rep.

Code Gen.

‘b’

‘4’

‘b’

‘a’

‘c’

MULT

MULT

MULT

MINUS

LexicalAnalysis

Syntax Analysis

AbstractSyntaxTree

Summary:Journeyinsideacompiler

84

LexicalAnalysis

Syntax Analysis

Sem.Analysis

Inter.Rep.

Code Gen.

‘b’

‘4’

‘b’

‘a’

‘c’

MULT

MULT

MULT

MINUS

type:intloc:sp+8

type:intloc:const

type:intloc:sp+16

type:intloc:sp+16

type:intloc:sp+24

type:intloc:R2

type:intloc:R2

type:intloc:R1

type:intloc:R1

AnnotatedAbstractSyntaxTree

Summary:Journeyinsideacompiler

85

Journeyinsideacompiler

LexicalAnalysis

Syntax Analysis

Sem.Analysis

Inter.Rep.

Code Gen.

‘b’

‘4’

‘b’

‘a’

‘c’

MULT

MULT

MULT

MINUS

type:intloc:sp+8

type:intloc:const

type:intloc:sp+16

type:intloc:sp+16

type:intloc:sp+24

type:intloc:R2

type:intloc:R2

type:intloc:R1

type:intloc:R1

R2=4*aR1=b*bR2=R2*cR1=R1-R2

IntermediateRepresentation

86

Journeyinsideacompiler

Inter.Rep.

Code Gen.

‘b’

‘4’

‘b’

‘a’

‘c’

MULT

MULT

MULT

MINUS

type:intloc:sp+8

type:intloc:const

type:intloc:sp+16

type:intloc:sp+16

type:intloc:sp+24

type:intloc:R2

type:intloc:R2

type:intloc:R1

type:intloc:R1

R2=4*aR1=b*bR2=R2*cR1=R1-R2

MOVR2,(sp+8)SALR2,2MOVR1,(sp+16)MULR1,(sp+16)MULR2,(sp+24)SUBR1,R2

LexicalAnalysis

Syntax Analysis

Sem.Analysis

IntermediateRepresentation

AssemblyCode

87

ErrorChecking

• Ineverystage…

• Lexicalanalysis:illegaltokens• Syntaxanalysis:illegalsyntax• Semanticanalysis:incompatibletypes,undefinedvariables,…

• Everyphasetriestorecoverandproceedwithcompilation(why?)– Divergenceisachallenge

88

TheRealAnatomyofaCompiler

Executable code

exe

Sourcetext

txtLexicalAnalysis

Sem.Analysis

Process text input

characters SyntaxAnalysistokens AST

Intermediate code

generation

Annotated AST

Intermediate code

optimizationIR Code

generationIR

Target code optimization

Symbolic Instructions

SI Machine code generation

Write executable

output

MI

89

Optimizations• “Optimalcode”isoutofreach

– manyproblemsareundecidable ortooexpensive(NP-complete)– Useapproximationand/orheuristics

• Loopoptimizations:hoisting,unrolling,…• Peepholeoptimizations• Constantpropagation

– Leveragecompile-timeinformationtosaveworkatruntime(pre-computation)

• Deadcodeelimination– space

• …

90

Machinecodegeneration

• Registerallocation– OptimalregisterassignmentisNP-Complete– Inpractice,knownheuristicsperformwell

• assignvariablestomemorylocations• Instructionselection

– ConvertIRtoactualmachineinstructions

• Modernarchitectures– Multicores– Challengingmemoryhierarchies

91

AndonaMoreGeneralNote

92

CourseGoals

• Whatisacompiler• Howdoesitwork• (Reusable)techniques&tools

• Programminglanguageimplementation– runtimesystems

• Executionenvironments– Assembly,linkers,loaders,OS

93

ToCompilers,andBeyond…

• Compilerconstructionissuccessful– Clearproblem– Properstructureofthesolution– Judicioususeofformalisms

• Widerapplication– Manyconversionscanbeviewedascompilation

• Usefulalgorithms94

ConceptualStructureofaCompiler

Executable code

exe

Sourcetext

txt

SemanticRepresentation

Backend(synthesis)

Compiler

Frontend(analysis)

95

ConceptualStructureofaCompiler

Executable code

exe

Sourcetext

txt

SemanticRepresentation

Backend(synthesis)

Compiler

Frontend(analysis)

LexicalAnalysis

Syntax Analysis

Parsing

Semantic Analysis

IntermediateRepresentation

(IR)

Code

Generation

96

Judicioususeofformalisms

• Regularexpressions(lexicalanalysis)• Context-freegrammars(syntacticanalysis)• Attributegrammars(contextanalysis)• Codegeneratorgenerators(dynamicprogramming)

• Butalsosomenitty-grittyprogramming

LexicalAnalysis

Syntax Analysis

Parsing

Semantic Analysis

IntermediateRepresentation

(IR)

Code

Generation

97

Useofprogram-generatingtools

• Partsofthecompilerareautomaticallygeneratedfromspecification

Streamoftokens

Jlex

regularexpressions

inputprogram scanner

98

Useofprogram-generatingtools

• Partsofthecompilerareautomaticallygeneratedfromspecification

Jcup

Contextfreegrammar

Streamoftokens parser Syntaxtree

99

Useofprogram-generatingtools• Simplercompilerconstruction

– Lesserrorprone– Moreflexible

• Useofpre-cannedtailoredcode– Useofdirtyprogramtricks

• Reuseofspecification

toolspecification

input (generated) code

output100

CompilerConstructionToolset

• Lexicalanalysisgenerators– Lex,JLex

• Parsergenerators– Yacc,Jcup

• Syntax-directedtranslators• Dataflowanalysisengines

101

Wideapplicability

• Structureddatacanbeexpressedusingcontextfreegrammars– HTMLfiles– Postscript– Tex/dvifiles– …

102

Generallyusefulalgorithms

• Parsergenerators• Garbagecollection• Dynamicprogramming• Graphcoloring

103

Howtowriteacompiler?

104

Howtowriteacompiler?

L1CompilerExecutablecompiler

exe

L2Compilersource

txtL1

105

Howtowriteacompiler?

L1CompilerExecutablecompiler

exe

L2Compilersource

txtL1

L2CompilerExecutableprogram

exe

Programsource

txtL2

=

106

Howtowriteacompiler?

L1CompilerExecutablecompiler

exe

L2Compilersource

txtL1

L2CompilerExecutableprogram

exe

Programsource

txtL2

=

107

ProgramOutput

Y

Input

X

=

107

Bootstrappingacompiler

108

Bootstrappingacompiler

L1Compilersimple

L2executablecompiler

exeSimple

L2compilersource

txtL1

L2s CompilerInefficientadv.

L2executablecompiler

exeadvanced

L2compilersource

txtL2

L2CompilerEfficientadv.

L2executablecompiler

Yadvanced

L2compilersource

X

=

=

109

ProperDesign

• Simplifythecompilationphase– Portabilityofthecompilerfrontend– Reusabilityofthecompilerbackend

• Professionalcompilersareintegrated

Java

C

Pascal

C++

ML

Pentium

MIPS

Sparc

Java

C

Pascal

C++

ML

Pentium

MIPS

Sparc

IR

110

Modularity

SourceLanguage 1

txt

SemanticRepresentation

BackendTL2

FrontendSL2

int a, b;a = 2;b = a*2 + 1;

MOV R1,2SAL R1INC R1MOV R2,R1

FrontendSL3

FrontendSL1

BackendTL1

BackendTL3

SourceLanguage 1

txt

SourceLanguage 1

txt

Executabletarget 1

exe

Executabletarget 1

exe

Executabletarget 1

exe

SET R1,2STORE #0,R1SHIFT R1,1STORE #1,R1ADD R1,1STORE #2,R1

111

112