CSC 8505 Compiler Construction Intermediate Representations

Page 1: CSC 8505 Compiler Construction

CSC 8505 Compiler Construction

Intermediate Representations

Page 2: CSC 8505 Compiler Construction

The Role of Intermediate Code

[Figure: source code → lexical analysis → tokens → syntax analysis and static checking → intermediate code generation → intermediate code → final code generation → final code]

Intermediate Reps.

Page 3: CSC 8505 Compiler Construction

Why Intermediate Code?

• Closer to target language.
  – simplifies code generation.
• Machine-independent.
  – simplifies retargeting of the compiler.
  – allows a variety of optimizations to be implemented in a machine-independent way.
• Many compilers use several different intermediate representations.

Page 4: CSC 8505 Compiler Construction

Different Kinds of IRs

• Graphical IRs: the program structure is represented as a graph (or tree) structure.
  Example: parse trees, syntax trees, DAGs.
• Linear IRs: the program is represented as a list of instructions for some virtual machine.
  Example: three-address code.
• Hybrid IRs: combine elements of graphical and linear IRs.
  Example: control flow graphs with 3-address code.

Page 5: CSC 8505 Compiler Construction

Types of Intermediate Languages

• Graphical Representations.
  – Consider the assignment a:=b*-c+b*-c:

Syntax tree: assign(a, +(*(b, uminus(c)), *(b, uminus(c))))
DAG: assign(a, +(n, n)), where the single node n = *(b, uminus(c)) is shared by both operands of +

Page 6: CSC 8505 Compiler Construction

Graphical IRs 1: Parse Trees

• A parse tree is a tree representation of a derivation during parsing.
• Constructing a parse tree:
  – The root is the start symbol S of the grammar.
  – Given a parse tree for αXβ, if the next derivation step is αXβ ⇒ αγ1…γnβ, then the new parse tree is obtained by adding γ1, …, γn as the children of the node X.

Page 7: CSC 8505 Compiler Construction

Graphical IRs 2: Abstract Syntax Trees (AST)

A syntax tree shows the structure of a program by abstracting away irrelevant details from a parse tree.
  – Each node represents a computation to be performed;
  – The children of the node represent what that computation is performed on.

Page 8: CSC 8505 Compiler Construction

Abstract Syntax Trees: Example

Grammar:
    E → E + T | T
    T → T * F | F
    F → ( E ) | id

Input: id + id * id

Parse tree: E(E(T(F(id))), +, T(T(F(id)), *, F(id)))

Syntax tree: +(id, *(id, id))

Page 9: CSC 8505 Compiler Construction

Syntax Trees: Structure

• Expressions:
  – leaves: identifiers or constants;
  – internal nodes are labeled with operators;
  – the children of a node are its operands.
• Statements:
  – a node’s label indicates what kind of statement it is;
  – the children correspond to the components of the statement.
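The structure described above can be sketched with one small node type. This is only an illustration: the names `Node` and `show` are hypothetical, and the concrete identifiers `a`, `b`, `c`, `x` stand in for the `id` tokens of the earlier example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # For expressions: an operator, identifier, or constant.
    # For statements: the kind of statement (e.g. "if", "assign").
    label: str
    children: list = field(default_factory=list)


def show(node):
    """Render a tree in prefix form, e.g. +(a, *(b, c))."""
    if not node.children:
        return node.label
    return f"{node.label}({', '.join(show(c) for c in node.children)})"


# Expression tree with the shape of id + id * id.
expr = Node("+", [Node("a"), Node("*", [Node("b"), Node("c")])])

# Statement tree: the label names the statement kind,
# the children are its components (condition and body).
stmt = Node("if", [Node(">", [Node("a"), Node("0")]),
                   Node("assign", [Node("x"), expr])])

print(show(expr))
print(show(stmt))
```

The same node type serves both expressions and statements; only the meaning of the label differs.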

Page 10: CSC 8505 Compiler Construction

Graphical IRs 3: Directed Acyclic Graphs (DAGs)

A DAG is a contraction of an AST that avoids duplication of nodes.
• reduces compiler memory requirements;
• exposes redundancies.

E.g.: for the expression (x+y)*(x+y), we have:

AST: *(+(x, y), +(x, y))
DAG: *(n, n), where the single node n = +(x, y) is used for both operands
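One common way to build such a DAG is to look up each (label, children) combination in a table before creating a node, so that a repeated subexpression maps to the node already built. The sketch below assumes this table-based approach; the class `DagBuilder` and its fields are hypothetical names, not part of the slides.

```python
class DagBuilder:
    def __init__(self):
        self.nodes = []   # node id -> (label, tuple of child ids)
        self.index = {}   # (label, tuple of child ids) -> node id

    def node(self, label, *children):
        """Return the id of the node with this label and children,
        creating it only if no identical node exists yet."""
        key = (label, children)
        if key not in self.index:
            self.index[key] = len(self.nodes)
            self.nodes.append(key)
        return self.index[key]


# (x+y)*(x+y): the second x+y finds the node built for the first one.
b = DagBuilder()
left = b.node("+", b.node("x"), b.node("y"))
right = b.node("+", b.node("x"), b.node("y"))
root = b.node("*", left, right)

print(len(b.nodes))   # the AST for the same expression has 7 nodes
```

Because `left` and `right` are the same node id, the redundancy in the expression is visible directly in the graph.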

Page 11: CSC 8505 Compiler Construction

Linear IRs

• A linear IR consists of a sequence of instructions that execute in order.
  – “machine-independent assembly code”
• Instructions may contain multiple operations, which (if present) execute in parallel.
• They often form a starting point for hybrid representations (e.g., control flow graphs).

Page 12: CSC 8505 Compiler Construction

Linear IR 1: Three Address Code

• Instructions are of the form ‘x = y op z’, where x, y, z are variables, constants, or “temporaries”.
• At most one operator is allowed on the RHS, so there are no “built-up” expressions.
  Instead, expressions are computed using temporaries (compiler-generated variables).
• The specific set of operators represented, and their level of abstraction, can vary widely.

Page 13: CSC 8505 Compiler Construction

Three Address Code: Example

• Source:
    if ( x + y*z > x*y + z) a = 0;
• Three Address Code:
    t1 = y*z
    t2 = x+t1      // x + y*z
    t3 = x*y
    t4 = t3+z      // x*y + z
    if (t2 <= t4) goto L
    a = 0
  L:

Page 14: CSC 8505 Compiler Construction

Three Address Code

• Statements of general form x:=y op z
• No built-up arithmetic expressions are allowed.
• As a result, x:=y + z * w should be represented as
    t1:=z * w
    t2:=y + t1
    x:=t2

Page 15: CSC 8505 Compiler Construction

Three Address Code

• Observe that, given the syntax tree or the DAG of the graphical representation, we can easily derive three-address code for assignments as above.
• In fact, three-address code is a linearization of the tree.
• Three-address code is useful: it is close to machine language, simple, and amenable to optimization.

Page 16: CSC 8505 Compiler Construction

Example of 3-address code

For the assignment a:=b*-c+b*-c:

Code for the DAG:
    t1:=- c
    t2:=b * t1
    t5:=t2 + t2
    a:=t5

Code for the syntax tree:
    t1:=- c
    t2:=b * t1
    t3:=- c
    t4:=b * t3
    t5:=t2 + t4
    a:=t5
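The linearization can be sketched as a post-order walk that emits one instruction per operator node and returns the name holding the result. This is an illustrative sketch, not the course's code generator: the function `assign_tac`, the tuple encoding of trees, and the `share` flag are all assumptions made for the example.

```python
import itertools


def assign_tac(target, expr, share=False):
    """Emit three-address code for `target := expr`.

    expr is a leaf name (str) or a tuple (op, operand) or (op, left, right).
    With share=True, identical subtrees reuse one temporary, which mimics
    generating code from the DAG instead of the syntax tree.
    """
    code = []
    fresh = itertools.count(1)   # source of temporary numbers
    memo = {}                    # subtree -> temporary that holds its value

    def walk(t):
        if isinstance(t, str):
            return t
        if share and t in memo:
            return memo[t]
        op, *kids = t
        args = [walk(k) for k in kids]       # operands first (post-order)
        tmp = f"t{next(fresh)}"
        if len(args) == 1:
            code.append(f"{tmp} := {op} {args[0]}")
        else:
            code.append(f"{tmp} := {args[0]} {op} {args[1]}")
        memo[t] = tmp
        return tmp

    result = walk(expr)
    code.append(f"{target} := {result}")
    return code


# a := b * -c + b * -c
term = ("*", "b", ("-", "c"))
tree_code = assign_tac("a", ("+", term, term))
dag_code = assign_tac("a", ("+", term, term), share=True)
print("\n".join(tree_code))
print("\n".join(dag_code))
```

With `share=True` the temporaries are numbered consecutively, so the sum lands in `t3` rather than the `t5` shown on the slide; the instructions are otherwise the same.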

Page 17: CSC 8505 Compiler Construction

Types of Three-Address Statements

Assignment Statement: x:=y op z
Assignment Statement: x:=op z
Copy Statement: x:=z
Unconditional Jump: goto L
Conditional Jump: if x relop y goto L
Stack Operations: Push/pop

Page 18: CSC 8505 Compiler Construction

Types of Three-Address Statements

Procedure:
    param x1
    param x2
    …
    param xn
    call p,n

Index Assignments:
    x:=y[i]
    x[i]:=y

Address and Pointer Assignments:
    x:=&y
    x:=*y
    *x:=y

Page 19: CSC 8505 Compiler Construction

An Example Intermediate Instruction Set

• Assignment:
  – x = y op z (op binary)
  – x = op y (op unary)
  – x = y
• Jumps:
  – if ( x op y ) goto L (L a label)
  – goto L
• Pointer and indexed assignments:
  – x = y[ z ]
  – y[ z ] = x
  – x = &y
  – x = *y
  – *y = x
• Procedure call/return:
  – param x, k (x is the kth param)
  – retval x
  – call p
  – enter p
  – leave p
  – return
  – retrieve x
• Type Conversion:
  – x = cvt_A_to_B y (A, B base types), e.g.: cvt_int_to_float
• Miscellaneous:
  – label L

Page 20: CSC 8505 Compiler Construction

Three Address Code: Representation

• Each instruction is represented as a structure called a quadruple (or “quad”):
  – contains info about the operation and up to 3 operands.
  – for operands: use a bit to indicate whether it is a constant or a symbol table pointer.

E.g.: x = y + z
      if ( x relop y ) goto L

Page 21: CSC 8505 Compiler Construction

Implementations of 3-address statements

• Quadruples

    t1:=- c
    t2:=b * t1
    t3:=- c
    t4:=b * t3
    t5:=t2 + t4
    a:=t5

         op      arg1  arg2  result
    (0)  uminus  c           t1
    (1)  *       b     t1    t2
    (2)  uminus  c           t3
    (3)  *       b     t3    t4
    (4)  +       t2    t4    t5
    (5)  :=      t5          a

Temporary names must be entered into the symbol table as they are created.

Page 22: CSC 8505 Compiler Construction

Implementations of 3-address statements, II

• Triples

    t1:=- c
    t2:=b * t1
    t3:=- c
    t4:=b * t3
    t5:=t2 + t4
    a:=t5

         op      arg1  arg2
    (0)  uminus  c
    (1)  *       b     (0)
    (2)  uminus  c
    (3)  *       b     (2)
    (4)  +       (1)   (3)
    (5)  assign  a     (4)

Temporary names are not entered into the symbol table.
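The relationship between the two tables can be sketched by converting quadruples into triples: each reference to a temporary is replaced by the position of the triple that computes it. The names `Quad` and `to_triples` are hypothetical, and the sketch assumes that every non-copy quadruple writes its result to a compiler-generated temporary, as in the example above.

```python
from typing import NamedTuple, Optional


class Quad(NamedTuple):
    op: str
    arg1: str
    arg2: Optional[str]
    result: str


quads = [
    Quad("uminus", "c", None, "t1"),
    Quad("*", "b", "t1", "t2"),
    Quad("uminus", "c", None, "t3"),
    Quad("*", "b", "t3", "t4"),
    Quad("+", "t2", "t4", "t5"),
    Quad(":=", "t5", None, "a"),
]


def to_triples(quads):
    """Return (op, arg1, arg2) triples; an int argument refers to the
    triple at that position, a str argument is a program name."""
    defined_at = {}   # temporary name -> position of the triple computing it
    triples = []

    def ref(arg):
        return defined_at.get(arg, arg)

    for i, q in enumerate(quads):
        if q.op == ":=":
            # A copy becomes an explicit assign triple: (assign, target, value).
            triples.append(("assign", q.result, ref(q.arg1)))
        else:
            triples.append((q.op, ref(q.arg1), ref(q.arg2)))
            defined_at[q.result] = i   # assumed to be a temporary
    return triples


triples = to_triples(quads)
for i, t in enumerate(triples):
    print(i, t)
```

The temporaries t1 to t5 disappear from the result, which is why triples need no symbol table entries for them.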

Page 23: CSC 8505 Compiler Construction

Other types of 3-address statements

• e.g. ternary operations like x[i]:=y and x:=y[i]
• require two or more entries, e.g.

x[i]:=y
         op      arg1  arg2
    (0)  [ ]=    x     i
    (1)  assign  (0)   y

x:=y[i]
         op      arg1  arg2
    (0)  =[ ]    y     i
    (1)  assign  x     (0)

Page 24: CSC 8505 Compiler Construction

Implementations of 3-address statements, III

• Indirect Triples

          op      arg1  arg2
    (14)  uminus  c
    (15)  *       b     (14)
    (16)  uminus  c
    (17)  *       b     (16)
    (18)  +       (15)  (17)
    (19)  assign  a     (18)

A separate list of pointers to triples gives the order of the statements:

         statement
    (0)  (14)
    (1)  (15)
    (2)  (16)
    (3)  (17)
    (4)  (18)
    (5)  (19)
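The pointer list matters when instructions are reordered: only the list changes, while the triples and the references between them stay put. The following is a minimal sketch under assumed conventions (0-based triple positions instead of 14 to 19, int arguments as triple references); `run` is a hypothetical toy evaluator for just the operators in this example.

```python
triples = [
    ("uminus", "c", None),   # 0
    ("*", "b", 0),           # 1
    ("uminus", "c", None),   # 2
    ("*", "b", 2),           # 3
    ("+", 1, 3),             # 4
    ("assign", "a", 4),      # 5
]


def run(triples, order, env):
    """Execute triples in the order given by the pointer list `order`."""
    env = dict(env)
    values = {}   # triple position -> value it computed

    def val(arg):
        return values[arg] if isinstance(arg, int) else env[arg]

    for i in order:
        op, a1, a2 = triples[i]
        if op == "uminus":
            values[i] = -val(a1)
        elif op == "*":
            values[i] = val(a1) * val(a2)
        elif op == "+":
            values[i] = val(a1) + val(a2)
        elif op == "assign":
            env[a1] = val(a2)
    return env


original = [0, 1, 2, 3, 4, 5]
reordered = [0, 2, 1, 3, 4, 5]   # both negations first; triples untouched

print(run(triples, original, {"b": 3, "c": 2})["a"])
print(run(triples, reordered, {"b": 3, "c": 2})["a"])
```

With plain triples, moving an instruction would change its position and force every reference to it to be rewritten; here the references remain valid.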

Page 25: CSC 8505 Compiler Construction

Linear IRs 2: Stack Machine Code

• Sometimes called “One-address code.”
• Assumes the presence of an operand stack.
  – Most operations take (pop) their operands from the stack and push the result on the stack.
• Example: code for “x*y + z”

Stack machine code:
    push x
    push y
    mult
    push z
    add

Three Address Code:
    tmp1 = x
    tmp2 = y
    tmp3 = tmp1 * tmp2
    tmp4 = z
    tmp5 = tmp3 + tmp4
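Both generating and executing stack code are short enough to sketch. The code below is an illustration under assumed conventions: expressions are nested tuples, instructions are tuples such as `("push", "x")` and `("mult",)`, and the function names are hypothetical.

```python
import operator

OPS = {"mult": operator.mul, "add": operator.add}


def gen_stack_code(tree):
    """Post-order walk: code for both operands, then the operator."""
    if isinstance(tree, str):
        return [("push", tree)]
    op, left, right = tree
    return gen_stack_code(left) + gen_stack_code(right) + [(op,)]


def run_stack_code(code, env):
    """Interpret the code with an explicit operand stack."""
    stack = []
    for instr in code:
        if instr[0] == "push":
            stack.append(env[instr[1]])
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[instr[0]](left, right))
    return stack.pop()


# x*y + z
code = gen_stack_code(("add", ("mult", "x", "y"), "z"))
for instr in code:
    print(*instr)
print(run_stack_code(code, {"x": 2, "y": 3, "z": 4}))
```

Note that no temporaries are named anywhere: intermediate results live only on the stack.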

Page 26: CSC 8505 Compiler Construction

Stack Machine Code: Features

• Compact
  – the stack creates an implicit name space, so many operands don’t have to be named explicitly in instructions.
  – this shrinks the size of the IR.
• Necessitates new operations for manipulating the stack, e.g., “swap top two values”, “duplicate value on top.”
• Simple to generate and execute.
• Interpreted stack machine codes are easy to port.

Page 27: CSC 8505 Compiler Construction

Linear IRs 3: Register Transfer Lang. (GNU RTL)

• Inspired by (and has syntax resembling) Lisp lists.
• Expressions are not “flattened” as in three-address code, but may be nested.
  – gives them a tree structure.
• Incorporates a variety of machine-level information.

Page 28: CSC 8505 Compiler Construction

RTLs (cont’d)

Low-level information associated with an RTL expression includes:
• “machine modes” – give the size of a data object;
• information about access to registers and memory;
• information relating to instruction scheduling and delay slots;
• whether a memory reference is “volatile.”

Page 29: CSC 8505 Compiler Construction

RTLs: Examples

Example operations:
  – (plus:m x y), (minus:m x y), (compare:m x y), etc., where m is a machine mode.
  – (cond [test1 value1 test2 value2 …] default)
  – (set lval x) (assigns x to the place denoted by lval).
  – (call func argsz), (return)
  – (parallel [x0 x1 …]) (simultaneous side effects).
  – (sequence [ins1 ins2 … ])

Page 30: CSC 8505 Compiler Construction

RTL Examples (cont’d)

• A call to a function at address a passing n bytes of arguments, where the return value is in a (“hard”) register r:
    (set (reg:m r) (call (mem:fm a) n))
  – here m and fm are machine modes.
• A division operation where the result is truncated to a smaller size:
    (truncate:m1 (div:m2 x (sign_extend:m2 y)))

Page 31: CSC 8505 Compiler Construction

Hybrid IRs

• Combine features of graphical and linear IRs:
  – linear IR aspects capture a lower-level program representation;
  – graphical IR aspects make control flow behavior explicit.
• Examples:
  – control flow graphs
  – static single assignment form (SSA).

Page 32: CSC 8505 Compiler Construction

Hybrid IRs 1: Control Flow Graphs

Example:
    L1: if x > y goto L0
        t1 = x+1
        x = t1
    L0: y = 0
        goto L1

Definition: A control flow graph for a function is a directed graph G = (V, E) such that:
  – each v ∈ V is a straight-line code sequence (“basic block”); and
  – there is an edge a → b ∈ E iff control can go directly from a to b.
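The edge rule in the definition can be sketched for the example above once the code is split into blocks. The block encoding here (a dict with `label`, `code`, and a `jump` field describing the last instruction) and the function `cfg_edges` are assumptions made for the illustration.

```python
# The example, already divided into straight-line blocks.
# "jump" summarizes the final instruction: ("cond", target),
# ("goto", target), or None when the block just falls through.
blocks = [
    {"label": "L1", "code": ["if x > y goto L0"], "jump": ("cond", "L0")},
    {"label": None, "code": ["t1 = x+1", "x = t1"], "jump": None},
    {"label": "L0", "code": ["y = 0", "goto L1"], "jump": ("goto", "L1")},
]


def cfg_edges(blocks):
    """Return the set of edges (a, b) such that control can go
    directly from block a to block b."""
    by_label = {b["label"]: i for i, b in enumerate(blocks) if b["label"]}
    edges = set()
    for i, block in enumerate(blocks):
        jump = block["jump"]
        if jump is not None:
            edges.add((i, by_label[jump[1]]))        # branch taken
        falls_through = jump is None or jump[0] == "cond"
        if falls_through and i + 1 < len(blocks):
            edges.add((i, i + 1))                    # next block in sequence
    return edges


edges = cfg_edges(blocks)
print(sorted(edges))
```

A conditional branch contributes two edges (taken and not taken), an unconditional one contributes only the edge to its target.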

Page 33: CSC 8505 Compiler Construction

Basic Blocks

• Definition: A basic block B is a sequence of consecutive instructions such that:
  1. control enters B only at its beginning; and
  2. control leaves B only at its end (under normal execution).
• This implies that if any instruction in a basic block B is executed, then all instructions in B are executed.
  ⇒ for program analysis purposes, we can treat a basic block as a single entity.

Page 34: CSC 8505 Compiler Construction

Identifying Basic Blocks

1. Determine the set of leaders, i.e., the first instruction of each basic block:
  – the entry point of the function is a leader;
  – any instruction that is the target of a branch is a leader;
  – any instruction following a (conditional or unconditional) branch is a leader.
2. For each leader, its basic block consists of:
  – the leader itself;
  – all subsequent instructions up to, but not including, the next leader.
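The two steps above translate almost directly into code. In this sketch an instruction is a hypothetical pair (text, target), where target is the 0-based position a branch jumps to, or None for non-branches; the instruction texts follow the dot-product example used in these slides, with the loop test assumed to be `i <= N`.

```python
def find_leaders(instrs):
    """Step 1: positions of the first instruction of each basic block."""
    if not instrs:
        return []
    leaders = {0}                           # entry point of the function
    for i, (_, target) in enumerate(instrs):
        if target is not None:
            leaders.add(target)             # target of a branch
            if i + 1 < len(instrs):
                leaders.add(i + 1)          # instruction following a branch
    return sorted(leaders)


def basic_blocks(instrs):
    """Step 2: each block runs from a leader up to the next leader."""
    bounds = find_leaders(instrs) + [len(instrs)]
    return [instrs[start:end] for start, end in zip(bounds, bounds[1:])]


prog = [
    ("enter dotprod", None), ("prod = 0", None), ("i = 1", None),
    ("t1 = 4*i", None), ("t2 = a[t1]", None), ("t3 = 4*i", None),
    ("t4 = b[t3]", None), ("t5 = t2*t4", None), ("t6 = prod+t5", None),
    ("prod = t6", None), ("t7 = i+1", None), ("i = t7", None),
    ("if i <= N goto 4", 3),                # jumps to instruction no. 4
    ("retval prod", None), ("leave dotprod", None), ("return", None),
]

print([pos + 1 for pos in find_leaders(prog)])   # 1-based instruction numbers
print([len(block) for block in basic_blocks(prog)])
```

The leaders found are instructions 1, 4, and 14 in 1-based numbering, giving three blocks.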

Page 35: CSC 8505 Compiler Construction

Example

int dotprod(int a[], int b[], int N)
{
    int i, prod = 0;
    for (i = 1; i <= N; i++) {
        prod += a[i]*b[i];
    }
    return prod;
}

No.  Instruction        leader?  Block No.
1    enter dotprod      Y        1
2    prod = 0                    1
3    i = 1                       1
4    t1 = 4*i           Y        2
5    t2 = a[t1]                  2
6    t3 = 4*i                    2
7    t4 = b[t3]                  2
8    t5 = t2*t4                  2
9    t6 = prod+t5                2
10   prod = t6                   2
11   t7 = i+1                    2
12   i = t7                      2
13   if i <= N goto 4            2
14   retval prod        Y        3
15   leave dotprod               3
16   return                      3

Page 36: CSC 8505 Compiler Construction

Hybrid IRs 2: Static Single Assignment Form

• The Static Single Assignment (SSA) form of a program makes information about variable definitions and uses explicit.
  – This can simplify program analysis.
• A program is in SSA form if it satisfies:
  – each definition has a distinct name; and
  – each use refers to a single definition.
• To make this work, the compiler inserts special operations, called φ-functions, at points where control flow paths join.
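For straight-line code (no joins, so no φ-functions are needed) the two conditions can be met by simple renaming. The sketch below is deliberately limited to that case; `to_ssa` is a hypothetical helper, right-hand sides are plain strings, and it assumes the input names do not already end in version-like digits.

```python
import re


def to_ssa(code):
    """Rename straight-line code so every definition gets a distinct name.

    code: list of (dest, expr) pairs, with expr a string such as "x * 2".
    Names that are never assigned in `code` (inputs) are left unchanged.
    """
    version = {}   # variable -> number of its most recent definition
    out = []

    def current(name):
        return f"{name}{version[name]}" if name in version else name

    for dest, expr in code:
        # Each use refers to the latest definition seen so far.
        new_expr = re.sub(r"[A-Za-z_]\w*", lambda m: current(m.group(0)), expr)
        version[dest] = version.get(dest, 0) + 1
        out.append((f"{dest}{version[dest]}", new_expr))
    return out


ssa = to_ssa([("x", "a + b"), ("x", "x * 2"), ("y", "x + 1")])
for dest, expr in ssa:
    print(dest, "=", expr)
```

Once control flow paths join, a use may be reached by more than one definition, and renaming alone no longer suffices.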

Page 37: CSC 8505 Compiler Construction

SSA Form: φ-Functions

• A φ-function behaves as follows:

    x1 = …        x2 = …
         x3 = φ(x1, x2)

  This assigns to x3 the value of x1, if control comes from the left, and that of x2 if control comes from the right.
• On entry to a basic block, all the φ-functions in the block execute (conceptually) in parallel.
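The behavior described above, including the parallel execution, can be sketched with a small evaluator. The encoding is an assumption made for the example: each φ-function is a pair (dest, {predecessor: source name}), and `run_phis` is a hypothetical name.

```python
def run_phis(phis, pred, env):
    """Evaluate all φ-functions of a block on entry from predecessor `pred`.

    Every selected argument is read from the old environment before any
    destination is written, which models parallel execution.
    """
    new_values = {dest: env[args[pred]] for dest, args in phis}
    updated = dict(env)
    updated.update(new_values)
    return updated


# x3 = φ(x1, x2): pick x1 when coming from the left, x2 from the right.
phi_x3 = [("x3", {"left": "x1", "right": "x2"})]
env = {"x1": 10, "x2": 20}
print(run_phis(phi_x3, "left", env)["x3"])
print(run_phis(phi_x3, "right", env)["x3"])

# Two φ-functions in a loop header that exchange values on the back edge.
# Evaluating them one after the other would lose one of the two values.
swap = [("a2", {"entry": "a1", "back": "b2"}),
        ("b2", {"entry": "b1", "back": "a2"})]
after = run_phis(swap, "back", {"a2": 1, "b2": 2})
print(after["a2"], after["b2"])
```

The second example is the reason for the "in parallel" wording: the result must not depend on the order in which the φ-functions are listed.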

Page 38: CSC 8505 Compiler Construction

SSA Form: Example

[Figure: original code (left) and the same code in SSA form (right)]