Transcript
Page 1: CSC 8505 Compiler Construction

CSC 8505 Compiler Construction

Intermediate Representations

Page 2: CSC 8505 Compiler Construction

The Role of Intermediate Code

[Figure: compiler pipeline]
source code → lexical analysis → tokens → syntax analysis → static checking → intermediate code generation → intermediate code → final code generation → final code

Intermediate Reps.

Page 3: CSC 8505 Compiler Construction

Why Intermediate Code?

• Closer to target language.
– Simplifies code generation.
• Machine-independent.
– Simplifies retargeting of the compiler.
– Allows a variety of optimizations to be implemented in a machine-independent way.
• Many compilers use several different intermediate representations.

Page 4: CSC 8505 Compiler Construction

Different Kinds of IRs

• Graphical IRs: the program structure is represented as a graph (or tree) structure.
  Example: parse trees, syntax trees, DAGs.
• Linear IRs: the program is represented as a list of instructions for some virtual machine.
  Example: three-address code.
• Hybrid IRs: combine elements of graphical and linear IRs.
  Example: control flow graphs with 3-address code.

Page 5: CSC 8505 Compiler Construction

Types of Intermediate Languages

• Graphical Representations.
– Consider the assignment a := b*-c + b*-c:

Syntax tree: assign(a, +(*(b, uminus(c)), *(b, uminus(c))))

DAG: assign(a, +(t, t)), where t = *(b, uminus(c)) is a single shared node

Page 6: CSC 8505 Compiler Construction

Graphical IRs 1: Parse Trees

• A parse tree is a tree representation of a derivation during parsing.
• Constructing a parse tree:
– The root is the start symbol S of the grammar.
– Given a parse tree for αXβ, if the next derivation step is αXβ ⇒ αγ1…γnβ, then the new parse tree is obtained by giving the leaf X the children γ1, …, γn.

Page 7: CSC 8505 Compiler Construction

Graphical IRs 2: Abstract Syntax Trees (AST)

A syntax tree shows the structure of a program by abstracting away irrelevant details from a parse tree.
– Each node represents a computation to be performed.
– The children of the node represent what that computation is performed on.

Page 8: CSC 8505 Compiler Construction

Abstract Syntax Trees: Example

Grammar:
E → E + T | T
T → T * F | F
F → ( E ) | id

Input: id + id * id

Parse tree: E(E(T(F(id))), +, T(T(F(id)), *, F(id)))

Syntax tree: +(id, *(id, id))
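As a minimal sketch of how a syntax tree for this grammar might be built, the following recursive-descent parser returns nested tuples of the form (operator, left, right). The function names `tokenize` and `parse_expression` and the tuple encoding are assumptions made for illustration; the left-recursive rules are handled with loops, which keeps + and * left-associative.

```python
def tokenize(text):
    """Split the input into the tokens 'id', '+', '*', '(' and ')'."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse_expression(tokens):
    """Parse E -> E + T | T, T -> T * F | F, F -> ( E ) | id into a syntax tree."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(token):
        nonlocal pos
        if peek() != token:
            raise SyntaxError(f"expected {token!r}, found {peek()!r}")
        pos += 1

    def parse_e():
        node = parse_t()
        while peek() == "+":          # E -> E + T, written as a loop
            expect("+")
            node = ("+", node, parse_t())
        return node

    def parse_t():
        node = parse_f()
        while peek() == "*":          # T -> T * F, written as a loop
            expect("*")
            node = ("*", node, parse_f())
        return node

    def parse_f():
        if peek() == "(":             # F -> ( E ): parentheses leave no node behind
            expect("(")
            node = parse_e()
            expect(")")
            return node
        expect("id")                  # F -> id
        return "id"

    tree = parse_e()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


tree = parse_expression(tokenize("id + id * id"))
print(tree)  # ('+', 'id', ('*', 'id', 'id'))
```

The chain nodes E, T, F and the parentheses of the parse tree never appear in the result, which is the sense in which the syntax tree abstracts away details.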

Page 9: CSC 8505 Compiler Construction

Syntax Trees: Structure

• Expressions:
– leaves: identifiers or constants;
– internal nodes are labeled with operators;
– the children of a node are its operands.
• Statements:
– a node’s label indicates what kind of statement it is;
– the children correspond to the components of the statement.

Page 10: CSC 8505 Compiler Construction

Graphical IRs 3: Directed Acyclic Graphs (DAGs)

A DAG is a contraction of an AST that avoids duplication of nodes.
• reduces compiler memory requirements;
• exposes redundancies.

E.g.: for the expression (x+y)*(x+y), we have:

AST: *(+(x, y), +(x, y))

DAG: *(s, s), where s = +(x, y) is a single shared node
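One common way to obtain such a DAG is to look every node up in a table before creating it, so that structurally identical subtrees end up as one node. The sketch below assumes trees encoded as nested tuples (operator, operand, ...) with strings as leaves; the class `DagBuilder` and the helper names are hypothetical.

```python
class DagBuilder:
    """Creates each distinct node only once (a simple form of hash-consing)."""

    def __init__(self):
        self.nodes = []   # node id -> node key
        self.index = {}   # node key -> node id

    def _intern(self, key):
        # Reuse the existing node if an identical one was already built.
        if key not in self.index:
            self.index[key] = len(self.nodes)
            self.nodes.append(key)
        return self.index[key]

    def leaf(self, name):
        return self._intern(("leaf", name))

    def op(self, operator, *children):
        return self._intern((operator, *children))


def build(builder, tree):
    """Translate a nested-tuple syntax tree into DAG node ids, bottom-up."""
    if isinstance(tree, str):
        return builder.leaf(tree)
    operator, *operands = tree
    return builder.op(operator, *(build(builder, t) for t in operands))


def count_nodes(tree):
    """Number of nodes in the original tree, for comparison."""
    if isinstance(tree, str):
        return 1
    return 1 + sum(count_nodes(t) for t in tree[1:])


expr = ("*", ("+", "x", "y"), ("+", "x", "y"))
dag = DagBuilder()
root = build(dag, expr)
print(count_nodes(expr), len(dag.nodes))  # 7 4
print(dag.nodes[root])                    # ('*', 2, 2)
```

Both operands of the root refer to node 2, the single x+y node, which is the redundancy the DAG exposes.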

Page 11: CSC 8505 Compiler Construction

Linear IRs

• A linear IR consists of a sequence of instructions that execute in order.
– “machine-independent assembly code”
• Instructions may contain multiple operations, which (if present) execute in parallel.
• They often form a starting point for hybrid representations (e.g., control flow graphs).

Page 12: CSC 8505 Compiler Construction

Linear IR 1: Three Address Code

• Instructions are of the form “x = y op z,” where x, y, z are variables, constants, or “temporaries”.
• At most one operator is allowed on the RHS, so there are no “built-up” expressions.
  Instead, expressions are computed using temporaries (compiler-generated variables).
• The specific set of operators represented, and their level of abstraction, can vary widely.

Page 13: CSC 8505 Compiler Construction

Three Address Code: Example

• Source:
  if ( x + y*z > x*y + z) a = 0;

• Three Address Code:
  t1 = y*z
  t2 = x+t1    // x + y*z
  t3 = x*y
  t4 = t3+z    // x*y + z
  if (t2 <= t4) goto L
  a = 0
L:

Page 14: CSC 8505 Compiler Construction

Three Address Code

• Statements of general form x := y op z
• No built-up arithmetic expressions are allowed.
• As a result, x := y + z * w should be represented as
  t1 := z * w
  t2 := y + t1
  x := t2

Page 15: CSC 8505 Compiler Construction

Three Address Code

• Observe that, given the syntax tree or the DAG of the graphical representation, we can easily derive three-address code for assignments as above.
• In fact, three-address code is a linearization of the tree.
• Three-address code is useful because it is close to machine language, simple, and easy to optimize.
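The linearization can be sketched as a post-order walk that emits one instruction per interior node and returns the name holding the node's value. The tuple encoding of the tree, the function name `gen_tac`, and the t1, t2, … naming scheme are assumptions for illustration.

```python
import itertools


def gen_tac(target, tree):
    """Emit three-address code that assigns the value of `tree` to `target`."""
    code = []
    counter = itertools.count(1)

    def walk(node):
        # Leaves (variable names) need no code; they are their own "place".
        if isinstance(node, str):
            return node
        op, *operands = node
        places = [walk(child) for child in operands]   # operands first (post-order)
        temp = f"t{next(counter)}"                     # fresh compiler temporary
        if len(places) == 1:
            code.append(f"{temp} := {op} {places[0]}")
        else:
            code.append(f"{temp} := {places[0]} {op} {places[1]}")
        return temp

    result = walk(tree)
    code.append(f"{target} := {result}")
    return code


# a := b * -c + b * -c, with unary minus written as ("-", "c")
tree = ("+", ("*", "b", ("-", "c")), ("*", "b", ("-", "c")))
for line in gen_tac("a", tree):
    print(line)
```

Because the walk works on a tree, the common subexpression b * -c is computed twice; walking a DAG and remembering the temporary of each shared node would compute it once.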

Page 16: CSC 8505 Compiler Construction

Example of 3-address code

Code for the DAG:
t1 := - c
t2 := b * t1
t5 := t2 + t2
a := t5

Code for the syntax tree:
t1 := - c
t2 := b * t1
t3 := - c
t4 := b * t3
t5 := t2 + t4
a := t5

Page 17: CSC 8505 Compiler Construction

Types of Three-Address Statements

Assignment Statement (binary): x := y op z
Assignment Statement (unary): x := op z
Copy Statement: x := z
Unconditional Jump: goto L
Conditional Jump: if x relop y goto L
Stack Operations: push/pop

Page 18: CSC 8505 Compiler Construction

Types of Three-Address Statements

Procedure:
  param x1
  param x2
  …
  param xn
  call p, n

Index Assignments:
  x := y[i]
  x[i] := y

Address and Pointer Assignments:
  x := &y
  x := *y
  *x := y

Page 19: CSC 8505 Compiler Construction

An Example Intermediate Instruction Set

• Assignment:
– x = y op z (op binary)
– x = op y (op unary)
– x = y
• Jumps:
– if ( x op y ) goto L (L a label)
– goto L
• Pointer and indexed assignments:
– x = y[ z ]
– y[ z ] = x
– x = &y
– x = *y
– *y = x
• Procedure call/return:
– param x, k (x is the kth param)
– retval x
– call p
– enter p
– leave p
– return
– retrieve x
• Type Conversion:
– x = cvt_A_to_B y (A, B base types), e.g.: cvt_int_to_float
• Miscellaneous:
– label L

Page 20: CSC 8505 Compiler Construction

Three Address Code: Representation

• Each instruction is represented as a structure called a quadruple (or “quad”):
– contains info about the operation and up to 3 operands;
– for operands: use a bit to indicate whether the operand is a constant or a symbol table pointer.

E.g.: x = y + z if ( x y ) goto L


Page 21: CSC 8505 Compiler Construction

Implementations of 3-address statements

• Quadruples

t1 := - c
t2 := b * t1
t3 := - c
t4 := b * t3
t5 := t2 + t4
a := t5

     op      arg1  arg2  result
(0)  uminus  c           t1
(1)  *       b     t1    t2
(2)  uminus  c           t3
(3)  *       b     t3    t4
(4)  +       t2    t4    t5
(5)  :=      t5          a

Temporary names must be entered into the symbol table as they are created.

Page 22: CSC 8505 Compiler Construction

Implementations of 3-address statements, II

• Triples

t1 := - c
t2 := b * t1
t3 := - c
t4 := b * t3
t5 := t2 + t4
a := t5

     op      arg1  arg2
(0)  uminus  c
(1)  *       b     (0)
(2)  uminus  c
(3)  *       b     (2)
(4)  +       (1)   (3)
(5)  assign  a     (4)

Temporary names are not entered into the symbol table.
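The relation between the two encodings can be sketched in a few lines: a triple has no result field, so every use of a temporary is replaced by the index of the triple that computes it. The `Quad` type and the function `quads_to_triples` are hypothetical names, and the sketch assumes that each temporary is assigned exactly once and that only ":=" quads write to program variables.

```python
from typing import NamedTuple, Optional


class Quad(NamedTuple):
    op: str
    arg1: str
    arg2: Optional[str]
    result: str


def quads_to_triples(quads):
    """Rewrite quadruples as triples; integers in a triple are triple indices."""
    defined_at = {}   # temporary name -> index of the triple computing it
    triples = []

    def ref(arg):
        # A temporary becomes a reference to its defining triple;
        # program variables (and None) pass through unchanged.
        return defined_at.get(arg, arg)

    for quad in quads:
        if quad.op == ":=":
            triples.append(("assign", quad.result, ref(quad.arg1)))
        else:
            triples.append((quad.op, ref(quad.arg1), ref(quad.arg2)))
            defined_at[quad.result] = len(triples) - 1
    return triples


quads = [
    Quad("uminus", "c", None, "t1"),
    Quad("*", "b", "t1", "t2"),
    Quad("uminus", "c", None, "t3"),
    Quad("*", "b", "t3", "t4"),
    Quad("+", "t2", "t4", "t5"),
    Quad(":=", "t5", None, "a"),
]
for index, triple in enumerate(quads_to_triples(quads)):
    print(index, triple)
```

Since the temporaries disappear from the triples, nothing has to be entered into the symbol table for them.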

Page 23: CSC 8505 Compiler Construction

Other types of 3-address statements

• e.g. ternary operations like x[i] := y and x := y[i]
• require two or more entries, e.g.

x[i] := y

     op      arg1  arg2
(0)  []=     x     i
(1)  assign  (0)   y

x := y[i]

     op      arg1  arg2
(0)  =[]     y     i
(1)  assign  x     (0)

Page 24: CSC 8505 Compiler Construction

Implementations of 3-address statements, III

• Indirect Triples

Statement list (pointers to triples):

     statement
(0)  (14)
(1)  (15)
(2)  (16)
(3)  (17)
(4)  (18)
(5)  (19)

Triples:

      op      arg1  arg2
(14)  uminus  c
(15)  *       b     (14)
(16)  uminus  c
(17)  *       b     (16)
(18)  +       (15)  (17)
(19)  assign  a     (18)
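A practical consequence of the extra level of indirection is that statements can be reordered by permuting the statement list alone, while the triples and the references between them stay untouched. The sketch below illustrates this with a small evaluator; the dictionary layout and the function `run` are assumptions for illustration, and the reordering shown is valid only because triple (16) does not depend on triple (15).

```python
# Triples stored under their own indices; integers are references to other triples.
triples = {
    14: ("uminus", "c", None),
    15: ("*", "b", 14),
    16: ("uminus", "c", None),
    17: ("*", "b", 16),
    18: ("+", 15, 17),
    19: ("assign", "a", 18),
}
statements = [14, 15, 16, 17, 18, 19]   # execution order


def run(statements, triples, env):
    """Execute the triples in statement-list order and return the final variables."""
    env = dict(env)
    values = {}   # triple index -> computed value

    def val(arg):
        return values[arg] if isinstance(arg, int) else env[arg]

    for index in statements:
        op, arg1, arg2 = triples[index]
        if op == "uminus":
            values[index] = -val(arg1)
        elif op == "*":
            values[index] = val(arg1) * val(arg2)
        elif op == "+":
            values[index] = val(arg1) + val(arg2)
        elif op == "assign":
            env[arg1] = val(arg2)
        else:
            raise ValueError(f"unknown operator {op!r}")
    return env


# Move triple 16 ahead of triple 15: only the statement list changes.
reordered = [14, 16, 15, 17, 18, 19]
print(run(statements, triples, {"b": 3, "c": 2})["a"])  # -12
print(run(reordered, triples, {"b": 3, "c": 2})["a"])   # -12
```

With plain triples the same move would renumber the entries, so every reference to a moved triple would have to be rewritten.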

Page 25: CSC 8505 Compiler Construction

Linear IRs 2: Stack Machine Code

• Sometimes called “One-address code.”
• Assumes the presence of an operand stack.
– Most operations take (pop) their operands from the stack and push the result on the stack.
• Example: code for “x*y + z”

Stack machine code:
  push x
  push y
  mult
  push z
  add

Three Address Code:
  tmp1 = x
  tmp2 = y
  tmp3 = tmp1 * tmp2
  tmp4 = z
  tmp5 = tmp3 + tmp4

Page 26: CSC 8505 Compiler Construction

Stack Machine Code: Features

• Compact
– the stack creates an implicit name space, so many operands don’t have to be named explicitly in instructions.
– this shrinks the size of the IR.
• Necessitates new operations for manipulating the stack, e.g., “swap top two values”, “duplicate value on top.”
• Simple to generate and execute.
• Interpreted stack machine codes are easy to port.
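How simple generation and execution can be is shown by the sketch below: a post-order walk of the expression tree yields the stack code, and a short loop interprets it. The tuple encoding of trees and instructions and the names `gen_stack_code` and `run_stack_code` are assumptions; only the `push`, `mult`, and `add` operations from the example are supported.

```python
OPCODES = {"+": "add", "*": "mult"}


def gen_stack_code(tree):
    """Post-order walk: code for the operands first, then the operator."""
    if isinstance(tree, str):
        return [("push", tree)]
    op, left, right = tree
    return gen_stack_code(left) + gen_stack_code(right) + [(OPCODES[op],)]


def run_stack_code(code, env):
    """Interpret the code with an explicit operand stack."""
    stack = []
    for instr in code:
        if instr[0] == "push":
            stack.append(env[instr[1]])
        elif instr[0] == "add":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif instr[0] == "mult":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        else:
            raise ValueError(f"unknown instruction {instr!r}")
    return stack.pop()


code = gen_stack_code(("+", ("*", "x", "y"), "z"))   # x*y + z
print(code)
# [('push', 'x'), ('push', 'y'), ('mult',), ('push', 'z'), ('add',)]
print(run_stack_code(code, {"x": 2, "y": 3, "z": 4}))  # 10
```

No temporaries are named anywhere: intermediate results live only on the stack, which is where the compactness comes from.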

Page 27: CSC 8505 Compiler Construction


Linear IRs 3: Register Transfer Lang. (GNU RTL)

• Inspired by (and has syntax resembling) Lisp lists.

• Expressions are not “flattened” as in three-address code, but may be nested.– gives them a tree structure.

• Incorporates a variety of machine-level information.


Page 28: CSC 8505 Compiler Construction

RTLs (cont’d)

Low-level information associated with an RTL expression includes:

• “machine modes”, which give the size of a data object;
• information about access to registers and memory;
• information relating to instruction scheduling and delay slots;
• whether a memory reference is “volatile.”

Page 29: CSC 8505 Compiler Construction

RTLs: Examples

Example operations:

– (plus:m x y), (minus:m x y), (compare:m x y), etc., where m is a machine mode.
– (cond [test1 value1 test2 value2 …] default)
– (set lval x) (assigns x to the place denoted by lval).
– (call func argsz), (return)
– (parallel [x0 x1 …]) (simultaneous side effects).
– (sequence [ins1 ins2 … ])

Page 30: CSC 8505 Compiler Construction

RTL Examples (cont’d)

• A call to a function at address a passing n bytes of arguments, where the return value is in a (“hard”) register r:
  (set (reg:m r) (call (mem:fm a) n))
– here m and fm are machine modes.
• A division operation where the result is truncated to a smaller size:
  (truncate:m1 (div:m2 x (sign_extend:m2 y)))

Page 31: CSC 8505 Compiler Construction

Hybrid IRs

• Combine features of graphical and linear IRs:
– linear IR aspects capture a lower-level program representation;
– graphical IR aspects make control flow behavior explicit.
• Examples:
– control flow graphs
– static single assignment form (SSA).

Page 32: CSC 8505 Compiler Construction

Hybrid IRs 1: Control Flow Graphs

Example:
L1: if x > y goto L0
    t1 = x+1
    x = t1
L0: y = 0
    goto L1

Definition: A control flow graph for a function is a directed graph G = (V, E) such that:
– each v ∈ V is a straight-line code sequence (“basic block”); and
– there is an edge a → b ∈ E iff control can go directly from a to b.
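For the example above, the edge set E can be sketched as follows. The partition into three blocks (B1: the conditional jump, B2: the two assignments to t1 and x, B3: the code at L0) is an assumption, since the slide's figure is not part of the transcript, and the dictionary fields and the function `cfg_edges` are hypothetical names.

```python
# One entry per basic block, in program order.
# "jump" is the label targeted by the block's last instruction (if any);
# "falls_through" says whether control can continue into the next block.
blocks = [
    {"name": "B1", "label": "L1", "jump": "L0", "falls_through": True},    # if x > y goto L0
    {"name": "B2", "label": None, "jump": None, "falls_through": True},    # t1 = x+1; x = t1
    {"name": "B3", "label": "L0", "jump": "L1", "falls_through": False},   # y = 0; goto L1
]


def cfg_edges(blocks):
    """Return the set of edges (a, b) such that control can go directly from a to b."""
    by_label = {b["label"]: b["name"] for b in blocks if b["label"]}
    edges = set()
    for i, block in enumerate(blocks):
        if block["jump"] is not None:                       # taken branch
            edges.add((block["name"], by_label[block["jump"]]))
        if block["falls_through"] and i + 1 < len(blocks):  # fall-through
            edges.add((block["name"], blocks[i + 1]["name"]))
    return edges


print(sorted(cfg_edges(blocks)))
# [('B1', 'B2'), ('B1', 'B3'), ('B2', 'B3'), ('B3', 'B1')]
```

The conditional jump contributes two outgoing edges, the unconditional jump exactly one, and a block without a jump only its fall-through edge.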

Page 33: CSC 8505 Compiler Construction

Basic Blocks

• Definition: A basic block B is a sequence of consecutive instructions such that:
1. control enters B only at its beginning; and
2. control leaves B only at its end (under normal execution).
• This implies that if any instruction in a basic block B is executed, then all instructions in B are executed.
⇒ For program analysis purposes, we can treat a basic block as a single entity.

Page 34: CSC 8505 Compiler Construction

Identifying Basic Blocks

1. Determine the set of leaders, i.e., the first instruction of each basic block:
– the entry point of the function is a leader;
– any instruction that is the target of a branch is a leader;
– any instruction following a (conditional or unconditional) branch is a leader.
2. For each leader, its basic block consists of:
– the leader itself;
– all subsequent instructions up to, but not including, the next leader.

Page 35: CSC 8505 Compiler Construction

Example

int dotprod(int a[], int b[], int N)
{
    int i, prod = 0;
    for (i = 1; i <= N; i++) {
        prod += a[i]*b[i];
    }
    return prod;
}

No.  Instruction        Leader?  Block No.
1    enter dotprod      Y        1
2    prod = 0                    1
3    i = 1                       1
4    t1 = 4*i           Y        2
5    t2 = a[t1]                  2
6    t3 = 4*i                    2
7    t4 = b[t3]                  2
8    t5 = t2*t4                  2
9    t6 = prod+t5                2
10   prod = t6                   2
11   t7 = i+1                    2
12   i = t7                      2
13   if i <= N goto 4            2
14   retval prod        Y        3
15   leave dotprod               3
16   return                      3
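The leader rules can be checked against this table with a short sketch. Each instruction is modeled as a pair (text, branch target), where the target is an instruction number or None; this encoding and the function names are assumptions, and the instruction text (including the comparison operator in instruction 13) is carried along only for readability, since the algorithm looks at branch targets alone.

```python
instructions = [
    ("enter dotprod", None),
    ("prod = 0", None),
    ("i = 1", None),
    ("t1 = 4*i", None),
    ("t2 = a[t1]", None),
    ("t3 = 4*i", None),
    ("t4 = b[t3]", None),
    ("t5 = t2*t4", None),
    ("t6 = prod+t5", None),
    ("prod = t6", None),
    ("t7 = i+1", None),
    ("i = t7", None),
    ("if i <= N goto 4", 4),   # the only branch; its target is instruction 4
    ("retval prod", None),
    ("leave dotprod", None),
    ("return", None),
]


def find_leaders(instructions):
    """Return the sorted 1-based instruction numbers that start a basic block."""
    leaders = {1}                                  # the entry point
    for number, (_, target) in enumerate(instructions, start=1):
        if target is not None:
            leaders.add(target)                    # target of a branch
            if number < len(instructions):
                leaders.add(number + 1)            # instruction after a branch
    return sorted(leaders)


def basic_blocks(instructions):
    """Each block runs from a leader up to, but not including, the next leader."""
    leaders = find_leaders(instructions)
    bounds = leaders + [len(instructions) + 1]
    return [list(range(start, end)) for start, end in zip(bounds, bounds[1:])]


print(find_leaders(instructions))   # [1, 4, 14]
for block_no, block in enumerate(basic_blocks(instructions), start=1):
    print(block_no, block)
```

The three blocks cover instructions 1 to 3, 4 to 13, and 14 to 16, matching the Block No. column.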

Page 36: CSC 8505 Compiler Construction

Hybrid IRs 2: Static Single Assignment Form

• The Static Single Assignment (SSA) form of a program makes information about variable definitions and uses explicit.
– This can simplify program analysis.
• A program is in SSA form if it satisfies:
– each definition has a distinct name; and
– each use refers to a single definition.
• To make this work, the compiler inserts special operations, called φ-functions, at points where control flow paths join.
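Within a single basic block no control flow paths join, so the two conditions can be met by renaming alone. The following sketch numbers each definition and rewrites each use to the latest version; the instruction encoding (dest, op, args) and the function `to_ssa` are assumptions, names not defined in the block are treated as inputs and left unchanged, and the sketch assumes the generated names do not clash with existing ones.

```python
def to_ssa(instructions):
    """Rename straight-line code so that every definition gets a distinct name."""
    version = {}   # variable -> number of its most recent definition

    def use(name):
        # A use refers to the latest definition, if the block has made one.
        return f"{name}{version[name]}" if name in version else name

    result = []
    for dest, op, args in instructions:
        new_args = [use(arg) for arg in args]        # rename uses first
        version[dest] = version.get(dest, 0) + 1     # then create a new definition
        result.append((f"{dest}{version[dest]}", op, new_args))
    return result


block = [
    ("x", "+", ["a", "b"]),   # x = a + b
    ("x", "*", ["x", "2"]),   # x = x * 2
    ("y", "+", ["x", "a"]),   # y = x + a
]
for dest, op, args in to_ssa(block):
    print(f"{dest} = {args[0]} {op} {args[1]}")
# x1 = a + b
# x2 = x1 * 2
# y1 = x2 + a
```

Renaming the uses before recording the new definition matters for instructions such as x = x * 2, where the right-hand side must still see the old x.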

Page 37: CSC 8505 Compiler Construction

SSA Form: φ-Functions

• A φ-function behaves as follows:

  x1 = …        x2 = …
       x3 = φ(x1, x2)

This assigns to x3 the value of x1 if control comes from the left, and that of x2 if control comes from the right.

• On entry to a basic block, all the φ-functions in the block execute (conceptually) in parallel.
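Both points can be illustrated with a small sketch in which each φ-function is a pair (target, {predecessor block: source name}). The function `run_phis` and the block names `left`, `right`, `entry`, and `loop` are hypothetical. All selected values are read before any target is written, which models the parallel execution.

```python
def run_phis(phis, came_from, values):
    """Execute all φ-functions of a block, given the predecessor control came from."""
    # Read every selected argument first ...
    selected = {target: values[sources[came_from]] for target, sources in phis}
    # ... then write all targets, so no φ sees another φ's new value.
    values.update(selected)
    return values


# x3 = φ(x1, x2) at a join of a left and a right path
join_phis = [("x3", {"left": "x1", "right": "x2"})]
print(run_phis(join_phis, "left", {"x1": 10})["x3"])    # 10
print(run_phis(join_phis, "right", {"x2": 20})["x3"])   # 20

# Two φ-functions at a loop header that exchange a and b on every iteration
swap_phis = [
    ("a2", {"entry": "a1", "loop": "b2"}),
    ("b2", {"entry": "b1", "loop": "a2"}),
]
state = run_phis(swap_phis, "entry", {"a1": 1, "b1": 2})
state = run_phis(swap_phis, "loop", state)
print(state["a2"], state["b2"])   # 2 1
```

Executing the two loop-header φ-functions one after the other would overwrite a2 before the second φ reads it and leave both names holding 2, which is why the parallel reading is specified.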

Page 38: CSC 8505 Compiler Construction

SSA Form: Example

[Figure: original code alongside the same code in SSA form]

