41
Chapter 14: Building a Runnable Program

Chapter 14: Building a Runnable Program. - 1 - Chapter 14: Building a runnable program 14.1 Back-End Compiler Structure 14.2 Intermediate Forms 14.3 Code

Embed Size (px)

Citation preview

Page 1: Chapter 14: Building a Runnable Program. - 1 - Chapter 14: Building a runnable program 14.1 Back-End Compiler Structure 14.2 Intermediate Forms 14.3 Code

Chapter 14: Building a Runnable Program

Page 2: Chapter 14: Building a Runnable Program. - 1 - Chapter 14: Building a runnable program 14.1 Back-End Compiler Structure 14.2 Intermediate Forms 14.3 Code

- 2 -

Chapter 14: Building a runnable program

14.1 Back-End Compiler Structure

14.2 Intermediate Forms

14.3 Code Generation

14.4 Address Space Organization

Page 3: Chapter 14: Building a Runnable Program. - 1 - Chapter 14: Building a runnable program 14.1 Back-End Compiler Structure 14.2 Intermediate Forms 14.3 Code

- 3 -

Where We Are...

Lexical Analysis

Syntax Analysis

Semantic Analysis

Intermediate Code Gen

Source code(character stream)

token stream

abstract syntaxtree

abstract syntaxtree + symboltables, types

Intermediate code

regular expressions

grammars

static semantics

Page 4: Chapter 14: Building a Runnable Program. - 1 - Chapter 14: Building a runnable program 14.1 Back-End Compiler Structure 14.2 Intermediate Forms 14.3 Code

- 4 -

Chapter 14: Building a runnable program

14.1 Back-End Compiler Structure

14.2 Intermediate Forms

14.3 Code Generation

14.4 Address Space Organization

Page 5: Chapter 14: Building a Runnable Program. - 1 - Chapter 14: Building a runnable program 14.1 Back-End Compiler Structure 14.2 Intermediate Forms 14.3 Code

- 5 -

Intermediate Representation (aka IR)

The compilers internal representation» Is language-independent and machine-

independent

AST IR

Pentium

Java bytecode

Itanium

TI C5x

ARM

optimize

Enables machine independentand machine dependent optis

Page 6: Chapter 14: Building a Runnable Program. - 1 - Chapter 14: Building a runnable program 14.1 Back-End Compiler Structure 14.2 Intermediate Forms 14.3 Code

- 6 -

What Makes a Good IR?

Captures high-level language constructs» Easy to translate from AST» Supports high-level optimizations

Captures low-level machine features» Easy to translate to assembly» Supports machine-dependent optimizations

Narrow interface: small number of node types (instructions)» Easy to optimize» Easy to retarget

Page 7: Chapter 14: Building a Runnable Program. - 1 - Chapter 14: Building a runnable program 14.1 Back-End Compiler Structure 14.2 Intermediate Forms 14.3 Code

- 7 -

Multiple IRs

Most compilers use 2 IRs:» High-level IR (HIR): Language independent but

closer to the language

» Low-level IR (LIR): Machine independent but closer to the machine

» A significant part of the compiler is both language and machine independent!

AST HIR

Pentium

Java bytecodeItaniumTI C5xARM

optimize

LIR

optimizeoptimize

C++C

Fortran

Page 8: Chapter 14: Building a Runnable Program. - 1 - Chapter 14: Building a runnable program 14.1 Back-End Compiler Structure 14.2 Intermediate Forms 14.3 Code

- 8 -

High-Level IR

HIR is essentially the AST» Must be expressive for all input languages

Preserves high-level language constructs» Structured control flow: if, while, for, switch» Variables, expressions, statements, functions

Allows high-level optimizations based on properties of source language» Function inlining, memory dependence

analysis, loop transformations

Page 9: Chapter 14: Building a Runnable Program. - 1 - Chapter 14: Building a runnable program 14.1 Back-End Compiler Structure 14.2 Intermediate Forms 14.3 Code

- 9 -

Low-Level IR

A set of instructions which emulates an abstract machine (typically RISC)

Has low-level constructs» Unstructured jumps, registers, memory

locations Types of instructions

» Arithmetic/logic (a = b OP c), unary operations, data movement (move, load, store), function call/return, branches

Page 10: Chapter 14: Building a Runnable Program. - 1 - Chapter 14: Building a runnable program 14.1 Back-End Compiler Structure 14.2 Intermediate Forms 14.3 Code

- 10 -

Alternatives for LIR

3 general alternatives» Three-address code or quadruples

a = b OP c Advantage: Makes compiler analysis/opti easier

» Tree representation Was popular for CISC architectures Advantage: Easier to generate machine code

» Stack machine Like Java bytecode Advantage: Easier to generate from AST

Page 11: Chapter 14: Building a Runnable Program. - 1 - Chapter 14: Building a runnable program 14.1 Back-End Compiler Structure 14.2 Intermediate Forms 14.3 Code

- 11 -

Three-Address Code

a = b OP c» Originally, because instruction had at most 3

addresses or operands This is not enforced today, ie MAC: a = b * c + d

» May have fewer operands Also called quadruples: (a,b,c,OP) Example

a = (b+c) * (-e) t1 = b + ct2 = -ea = t1 * t2

Compiler-generatedtemporary variable

Page 12: Chapter 14: Building a Runnable Program. - 1 - Chapter 14: Building a runnable program 14.1 Back-End Compiler Structure 14.2 Intermediate Forms 14.3 Code

- 12 -

IR Instructions Assignment instructions

» a = b OP C (binary op) arithmetic: ADD, SUB,

MUL, DIV, MOD logic: AND, OR, XOR comparisons: EQ, NEQ,

LT, GT, LEQ, GEQ

» a = OP b (unary op) arithmetic MINUS,

logical NEG

» a = b : copy instruction

» a = [b] : load instruction

» [a] = b : store instruction

» a = addr b: symbolic address

Flow of control» label L: label instruction

» jump L: unconditional jump

» cjump a L : conditional jump

Function call» call f(a1, ..., an)

» a = call f(a1, ..., an)

IR describes the instruction set of an abstract machine

Page 13: Chapter 14: Building a Runnable Program. - 1 - Chapter 14: Building a runnable program 14.1 Back-End Compiler Structure 14.2 Intermediate Forms 14.3 Code

- 13 -

IR Operands

The operands in 3-address code can be:» Program variables» Constants or literals» Temporary variables

Temporary variables = new locations» Used to store intermediate values» Needed because 3-address code not as

expressive as high-level languages

Page 14: Chapter 14: Building a Runnable Program. - 1 - Chapter 14: Building a runnable program 14.1 Back-End Compiler Structure 14.2 Intermediate Forms 14.3 Code

- 14 -

Chapter 14: Building a runnable program

14.1 Back-End Compiler Structure

14.2 Intermediate Forms

14.3 Code Generation

14.4 Address Space Organization

Page 15: Chapter 14: Building a Runnable Program. - 1 - Chapter 14: Building a runnable program 14.1 Back-End Compiler Structure 14.2 Intermediate Forms 14.3 Code

- 15 -

Translating High IR to Low IR

May have nested language constructs» E.g., while nested within an if statement

Need an algorithmic way to translate» Strategy for each high IR construct» High IR construct sequence of low IR

instructions Solution

» Start from the high IR (AST like) representation» Define translation for each node in high IR» Recursively translate nodes

Page 16: Chapter 14: Building a Runnable Program. - 1 - Chapter 14: Building a runnable program 14.1 Back-End Compiler Structure 14.2 Intermediate Forms 14.3 Code

- 16 -

Notation

Use the following notation:» [[e]] = the low IR representation of high IR construct

e

[[e]] is a sequence of low IR instructions If e is an expression (or statement expression), it

represents a value» Denoted as: t = [[e]]

» Low IR representation of e whose result value is stored in t

For variable v: t = [[v]] is the copy instruction» t = v

Page 17: Chapter 14: Building a Runnable Program. - 1 - Chapter 14: Building a runnable program 14.1 Back-End Compiler Structure 14.2 Intermediate Forms 14.3 Code

- 17 -

Translating Expressions

Binary operations: t = [[e1 OP e2]]» (arithmetic, logical operations and comparisons)

Unary operations: t = [[OP e]]

OP

e1 e2

t1 = [[e1]]t2 = [[e2]]t1 = t1 OP t2

OP

e1

t1 = [[e1]]t = OP t1

Page 18: Chapter 14: Building a Runnable Program. - 1 - Chapter 14: Building a runnable program 14.1 Back-End Compiler Structure 14.2 Intermediate Forms 14.3 Code

- 18 -

Translating Array Accesses

Array access: t = [[ v[e] ]]» (type of e is array [T] and S = size of T)

t1 = addr vt2 = [[e]]t3 = t2 * St4 = t1 + t3t = [t4] /* ie load */

array

v e

Page 19: Chapter 14: Building a Runnable Program. - 1 - Chapter 14: Building a runnable program 14.1 Back-End Compiler Structure 14.2 Intermediate Forms 14.3 Code

- 19 -

Translating Structure Accesses

Structure access: t = [[ v.f ]]» (v is of type T, S = offset of f in T)

t1 = addr vt2 = t1 + St = [t2] /* ie load */

struct

v f

Page 20: Chapter 14: Building a Runnable Program. - 1 - Chapter 14: Building a runnable program 14.1 Back-End Compiler Structure 14.2 Intermediate Forms 14.3 Code

- 20 -

Translating Short-Circuit OR

Short-circuit OR: t = [[e1 SC-OR e2]]» e.g., || operator in C/C++

t = [[e1]]cjump t Lendt = [[e2]]Lend:

semantics:1. evaluate e12. if e1 is true, then done3. else evaluate e2

SC-OR

e1 e2

Page 21: Chapter 14: Building a Runnable Program. - 1 - Chapter 14: Building a runnable program 14.1 Back-End Compiler Structure 14.2 Intermediate Forms 14.3 Code

- 21 -

Class Problem

Short-circuit AND: t = [[e1 SC-AND e2]]» e.g., && operator in C/C++

Semantics:1. Evaluate e12. if e1 is true, then evaluate e23. else done

Page 22: Chapter 14: Building a Runnable Program. - 1 - Chapter 14: Building a runnable program 14.1 Back-End Compiler Structure 14.2 Intermediate Forms 14.3 Code

- 22 -

Translating Statements

Statement sequence: [[s1; s2; ...; sN]]

IR instructions of a statement sequence = concatenation of IR instructions of statements

[[ s1 ]][[ s2 ]]...[[ sN ]]

seq

s1 s2 sN...

Page 23: Chapter 14: Building a Runnable Program. - 1 - Chapter 14: Building a runnable program 14.1 Back-End Compiler Structure 14.2 Intermediate Forms 14.3 Code

- 23 -

Assignment Statements

Variable assignment: [[ v = e ]]

Array assignment: [[ v[e1] = e2 ]]

v = [[ e ]]

t1 = addr vt2 = [[e1]]t3 = t2 * St4 = t1 + t3t5 = [[e2][t4] = t5 /* ie store */

recall S = sizeof(T)where v is array(T)

Page 24: Chapter 14: Building a Runnable Program. - 1 - Chapter 14: Building a runnable program 14.1 Back-End Compiler Structure 14.2 Intermediate Forms 14.3 Code

- 24 -

Translating If-Then [-Else]

[[ if (e) then s ]] [[ if (e) then s1 else s2 ]]

t1 = [[ e ]]t2 = not t1cjump t2 LelseLthen: [[ s1 ]]jump LendLelse: [[ s2 ]]Lend:

t1 = [[ e ]]t2 = not t1cjump t2 Lend[[ s ]]Lend:

How could I do this moreefficiently??

Page 25: Chapter 14: Building a Runnable Program. - 1 - Chapter 14: Building a runnable program 14.1 Back-End Compiler Structure 14.2 Intermediate Forms 14.3 Code

- 25 -

While Statements

[[ while (e) s ]]

Lloop: t1 = [[ e ]]t2 = NOT t1cjump t2 Lend[[ s ]]jump LloopLend:

or

while-do translation do-while translation

t1 = [[ e ]]t2 = NOT t1cjump t2 LendLloop: [[ s ]]t3 = [[ e ]]cjump t3 LloopLend:

Which is better and why?

Page 26: Chapter 14: Building a Runnable Program. - 1 - Chapter 14: Building a runnable program 14.1 Back-End Compiler Structure 14.2 Intermediate Forms 14.3 Code

- 26 -

Class Problem

n = 0;while (n < 10) {

n = n+1;}

Convert the following code segment to IR

Page 27: Chapter 14: Building a Runnable Program. - 1 - Chapter 14: Building a runnable program 14.1 Back-End Compiler Structure 14.2 Intermediate Forms 14.3 Code

- 27 -

Switch Statements

[[ switch (e) case v1:s1, ..., case vN:sN ]]

t = [[ e ]]L1: c = t != v1cjump c L2[[ s1 ]]jump Lend /* if there is a break */L2: c = t != v2cjump c L3[[ s2 ]]jump Lend /* if there is a break */...Lend:

Can also implementswitch as table lookup.Table contains targetlabels, ie L1, L2, L3.‘t’ is used to index table.

Benefit: k branchesreduced to 1. Negative: target of branchhard to figure out inhardware

Page 28: Chapter 14: Building a Runnable Program. - 1 - Chapter 14: Building a runnable program 14.1 Back-End Compiler Structure 14.2 Intermediate Forms 14.3 Code

- 28 -

Call and Return Statements

[[ call f(e1, e2, ..., eN) ]]

[[ return e ]]

t1 = [[ e1 ]]t2 = [[ e2 ]]...tN = [[ eN ]]call f(t1, t2, ..., tN)

t = [[ e ]]return t

Page 29: Chapter 14: Building a Runnable Program. - 1 - Chapter 14: Building a runnable program 14.1 Back-End Compiler Structure 14.2 Intermediate Forms 14.3 Code

- 29 -

Statement Expressions

So far: statements which do not return values

Easy extensions for statement expressions:» Block statements» If-then-else» Assignment statements

t = [[ s ]] is the sequence of low IR code for statement s, whose result is stored in t

Page 30: Chapter 14: Building a Runnable Program. - 1 - Chapter 14: Building a runnable program 14.1 Back-End Compiler Structure 14.2 Intermediate Forms 14.3 Code

- 30 -

Statement Expressions

t = [[ if (e) then s1 else s2 ]]

t = [[ s1; s2; .. sN ]]

Result value of a block statement = value of last stmt in the sequence

t1 = [[ e ]]cjump t1 Lthent = [[ s2 ]]jump LendLthen: t = [[ s1 ]]Lend:

[[ s1 ]][[ s2 ]]...t = [[ sN ]]

Page 31: Chapter 14: Building a Runnable Program. - 1 - Chapter 14: Building a runnable program 14.1 Back-End Compiler Structure 14.2 Intermediate Forms 14.3 Code

- 31 -

Assignment Statements

t = [[ v = e ]]

Result value of an assignment statement = value of the assigned expression

v = [[ e ]]t = v

Page 32: Chapter 14: Building a Runnable Program. - 1 - Chapter 14: Building a runnable program 14.1 Back-End Compiler Structure 14.2 Intermediate Forms 14.3 Code

- 32 -

Nested Expressions

Translation recurses on the expression structure

Example: t = [[ (a – b) * (c + d) ]]

t1 = at2 = bt3 = t1 – t2t4 = ct5 = dt5 = t4 + t5t = t3 * t5

[[ (a – b) ]]

[[ (c + d) ]]

[[ (a-b) * (c+d) ]]

Page 33: Chapter 14: Building a Runnable Program. - 1 - Chapter 14: Building a runnable program 14.1 Back-End Compiler Structure 14.2 Intermediate Forms 14.3 Code

- 33 -

Nested Statements

Same for statements: recursive translation Example: t = [[ if c then if d then a = b ]]

t1 = ct2 = NOT t1cjump t2 Lend1t3 = dt4 = NOT t3cjump t4 Lend2t3 = ba = t3Lend2:Lend1:

[[ if c ... ]][[ a = b ]]

[[ if d ... ]]

Page 34: Chapter 14: Building a Runnable Program. - 1 - Chapter 14: Building a runnable program 14.1 Back-End Compiler Structure 14.2 Intermediate Forms 14.3 Code

- 34 -

Class Problem

for (i=0; i<100; i++) { A[i] = 0;}

if ((a > 0) && (b > 0)) c = 2;else c = 3;

Translate the following to the generic assembly code discussed

Page 35: Chapter 14: Building a Runnable Program. - 1 - Chapter 14: Building a runnable program 14.1 Back-End Compiler Structure 14.2 Intermediate Forms 14.3 Code

- 35 -

Chapter 14: Building a runnable program

14.1 Back-End Compiler Structure

14.2 Intermediate Forms

14.3 Code Generation

14.4 Address Space Organization

Page 36: Chapter 14: Building a Runnable Program. - 1 - Chapter 14: Building a runnable program 14.1 Back-End Compiler Structure 14.2 Intermediate Forms 14.3 Code

- 36 -

Issues

These translations are straightforward But, inefficient:

» Lots of temporaries» Lots of labels» Lots of instructions

Can we do this more intelligently?» Should we worry about it?

Page 37: Chapter 14: Building a Runnable Program. - 1 - Chapter 14: Building a runnable program 14.1 Back-End Compiler Structure 14.2 Intermediate Forms 14.3 Code

- 37 -

2 Classes of Storage in Processor

Registers» Fast access, but only a few of them» Address space not visible to programmer

Doesn’t support pointer access!

Memory» Slow access, but large» Supports pointers

Storage class for each variable generally determined when map HIR to LIR

Page 38: Chapter 14: Building a Runnable Program. - 1 - Chapter 14: Building a runnable program 14.1 Back-End Compiler Structure 14.2 Intermediate Forms 14.3 Code

- 38 -

Storage Class Selection

Standard (simple) approach» Globals/statics – memory

» Locals Composite types (structs, arrays, etc.) – memory Scalars

Accessed via ‘&’ operator? – memory Rest – Virtual register, later we will map virtual registers to true

machine registers. Note, as a result, some local scalars may be “spilled to memory”

All memory approach» Put all variables into memory

» Register allocation relocates some mem vars to registers

Page 39: Chapter 14: Building a Runnable Program. - 1 - Chapter 14: Building a runnable program 14.1 Back-End Compiler Structure 14.2 Intermediate Forms 14.3 Code

- 39 -

4 Distinct Regions of Memory

Code space – Instructions to be executed» Best if read-only

Static (or Global) – Variables that retain their value over the lifetime of the program

Stack – Variables that is only as long as the block within which they are defined (local)

Heap – Variables that are defined by calls to the system storage allocator (malloc, new)

Page 40: Chapter 14: Building a Runnable Program. - 1 - Chapter 14: Building a runnable program 14.1 Back-End Compiler Structure 14.2 Intermediate Forms 14.3 Code

- 40 -

Memory Organization

Code

Static Data

Stack

Heap

. . .

Code and staticdata sizes determinedby the compiler

Stack and heap sizesvary at run-time Stack grows downward Heap grows upward

Some ABI’s have stack/heap switched

Page 41: Chapter 14: Building a Runnable Program. - 1 - Chapter 14: Building a runnable program 14.1 Back-End Compiler Structure 14.2 Intermediate Forms 14.3 Code

- 41 -

Class Problem

Specify whether each variable is stored in register or memory.For memory which area of the memory?

int a;void foo(int b, double c){ int d; struct { int e; char f;} g; int h[10]; char i = 5; float j;}