Upload
jorden-oxford
View
224
Download
3
Embed Size (px)
Citation preview
Intermediate Representations
Saumya DebrayDept. of Computer ScienceThe University of Arizona
Tucson, AZ 85721
The Role of Intermediate Code
Intermediate Code 2
lexical analysissyntax analysisstatic checking
intermediate code
generation
final code generation
source code
finalcode
tokens intermediatecode
Intermediate Code 3
Why Intermediate Code?
• Closer to target language. – simplifies code generation.
• Machine-independent.– simplifies retargeting of the compiler.– Allows a variety of optimizations to be implemented in
a machine-independent way.
• Many compilers use several different intermediate representations.
Intermediate Code 4
Different Kinds of IRs
• Graphical IRs: the program structure is represented as a graph (or tree) structure.Example: parse trees, syntax trees, DAGs.
• Linear IRs: the program is represented as a list of instructions for some virtual machine.Example: three-address code.
• Hybrid IRs: combines elements of graphical and linear IRs.Example: control flow graphs with 3-address code.
Intermediate Code 5
Graphical IRs 1: Parse Trees
• A parse tree is a tree representation of a derivation during parsing.
• Constructing a parse tree:– The root is the start symbol S of the grammar.– Given a parse tree for X , if the next derivation step is
X 1…n then the parse tree is obtained as:
Intermediate Code 6
Graphical IRs 2: Abstract Syntax Trees (AST)
A syntax tree shows the structure of a program by abstracting away irrelevant details from a parse tree.– Each node represents a computation to be performed;– The children of the node represents what that
computation is performed on.
Intermediate Code 7
Abstract Syntax Trees: Example
Grammar :E E + T | T
T T * F | F
F ( E ) | id
Input: id + id * id
Parse tree:
Syntax tree:
Intermediate Code 8
Syntax Trees: Structure
• Expressions:– leaves: identifiers or constants;– internal nodes are labeled with
operators;– the children of a node are its operands.
• Statements:– a node’s label indicates what kind of
statement it is;– the children correspond to the
components of the statement.
Intermediate Code 9
Graphical IRs 3: Directed Acyclic Graphs (DAGs)A DAG is a contraction of an AST that avoids
duplication of nodes.• reduces compiler memory requirements;• exposes redundancies.
E.g.: for the expression (x+y)*(x+y), we have:
AST: DAG:
Intermediate Code 10
Linear IRs
• A linear IR consists of a sequence of instructions that execute in order.– “machine-independent assembly code”
• Instructions may contain multiple operations, which (if present) execute in parallel.
• They often form a starting point for hybrid representations (e.g., control flow graphs).
Intermediate Code 11
Linear IR 1: Three Address Code
• Instructions are of the form ‘x = y op z,’ where x, y, z are variables, constants, or “temporaries”.
• At most one operator allowed on RHS, so no ‘built-up” expressions.Instead, expressions are computed using temporaries
(compiler-generated variables).
• The specific set of operators represented, and their level of abstraction, can vary widely.
Intermediate Code 12
Three Address Code: Example
• Source: if ( x + y*z > x*y + z) a = 0;
• Three Address Code:t1 = y*zt2 = x+t1 // x + y*zt3 = x*yt4 = t3+z // x*y + zif (t2 t4) goto La = 0
L:
Intermediate Code 13
An Example Intermediate Instruction Set• Assignment:
– x = y op z (op binary)– x = op y (op unary); – x = y
• Jumps:– if ( x op y ) goto L (L a label); – goto L
• Pointer and indexed assignments:– x = y[ z ]– y[ z ] = x– x = &y– x = *y– *y = x.
• Procedure call/return:– param x, k (x is the kth
param)– retval x– call p– enter p– leave p– return– retrieve x
• Type Conversion:– x = cvt_A_to_B y (A, B base
types) e.g.: cvt_int_to_float• Miscellaneous
– label L
Intermediate Code 14
Three Address Code: Representation• Each instruction represented as a structure called a
quadruple (or “quad”):– contains info about the operation, up to 3 operands.– for operands: use a bit to indicate whether constant or Symbol
Table pointer.
E.g.: x = y + z if ( x y ) goto L
Intermediate Code 15
Linear IRs 2: Stack Machine Code
• Sometimes called “One-address code.”• Assumes the presence of an operand stack.
– Most operations take (pop) their operands from the stack and push the result on the stack.
• Example: code for “x*y + z”
Stack machine code push x
push y
mult
push z
add
Three Address Code tmp1 = x
tmp2 = y
tmp3 = tmp1 * tmp2
tmp4 = z
tmp5 = tmp3 + tmp4
Intermediate Code 16
Stack Machine Code: Features
• Compact– the stack creates an implicit name space, so many
operands don’t have to be named explicitly in instructions.
– this shrinks the size of the IR.
• Necessitates new operations for manipulating the stack, e.g., “swap top two values”, “duplicate value on top.”
• Simple to generate and execute. • Interpreted stack machine codes easy to port.
Intermediate Code 17
Linear IRs 3: Register Transfer Language (GNU RTL)
• Inspired by (and has syntax resembling) Lisp lists.
• Expressions are not “flattened” as in three-address code, but may be nested.– gives them a tree structure.
• Incorporates a variety of machine-level information.
Intermediate Code 18
RTLs (cont’d)
Low-level information associated with an RTL expression include:
• “machine modes” – gives the size of a data object;
• information about access to registers and memory;
• information relating to instruction scheduling and delay slots;
• whether a memory reference is “volatile.”
Intermediate Code 19
RTLs: Examples
Example operations:– (plus:m x y), (minus:m x y), (compare:m x y), etc.,
where m is a machine mode.
– (cond [test1 value1 test2 value2 …] default)
– (set lval x) (assigns x to the place denoted by lval).
– (call func argsz), (return)
– (parallel [x0 x1 …]) (simultaneous side effects).
– (sequence [ins1 ins2 … ])
Intermediate Code 20
RTL Examples (cont’d)
• A call to a function at address a passing n bytes of arguments, where the return value is in a (“hard”) register r:
(set (reg:m r) (call (mem:fm a) n))
– here m and fm are machine modes.
• A division operation where the result is truncated to a smaller size:
(truncate:m1 (div:m2 x (sign_extend:m2 y)))
Intermediate Code 21
Hybrid IRs
• Combine features of graphical and linear IRs:– linear IR aspects capture a lower-level program
representation;– graphical IR aspects make control flow behavior
explicit.
• Examples:– control flow graphs– static single assignment form (SSA).
Intermediate Code 22
Hybrid IRs 1: Control Flow Graphs
Example: L1: if x > y goto L0 t1 = x+1 x = t1 L0: y = 0 goto L1
Definition: A control flow graph for a function is a directed graph G = (V, E) such that:– each v V is a straight-line code sequence (“basic block”); and– there is an edge a b E iff control can go directly from a to b.
Intermediate Code 23
Basic Blocks
• Definition: A basic block B is a sequence of consecutive instructions such that:1. control enters B only at its beginning; and
2. control leaves B only at its end (under normal execution); and
• This implies that if any instruction in a basic block B is executed, then all instructions in B are executed.Þ for program analysis purposes, we can treat a
basic block as a single entity.
Intermediate Code 24
Identifying Basic Blocks
1. Determine the set of leaders, i.e., the first instruction of each basic block:
– the entry point of the function is a leader;– any instruction that is the target of a branch is a
leader;– any instruction following a (conditional or
unconditional) branch is a leader.
2. For each leader, its basic block consists of:– the leader itself;– all subsequent instructions upto, but not including,
the next leader.
Intermediate Code 25
Example
int dotprod(int a[], int b[], int N)
{
int i, prod = 0;
for (i = 1; i N; i++) {
prod += a[i]b[i];
}
return prod;
}
No. Instruction leader? Block No.
1 enter dotprod Y 1
2 prod = 0 1
3 i = 1 1
4 t1 = 4*i Y 2
5 t2 = a[t1] 2
6 t3 = 4*i 2
7 t4 = b[t3] 2
8 t5 = t2*t4 2
9 t6 = prod+t5 2
10 prod = t6 2
11 t7 = i+i 2
12 i = t7 2
13 if i N goto 4 2
14 retval prod Y 3
15 leave dotprod 3
16 return 3
Intermediate Code 26
Hybrid IRs 2: Static Single Assignment Form
• The Static Single Assignment (SSA) form of a program makes information about variable definitions and uses explicit.– This can simplify program analysis.
• A program is in SSA form if it satisfies:– each definition has a distinct name; and– each use refers to a single definition.
• To make this work, the compiler inserts special operations, called -functions, at points where control flow paths join.
Intermediate Code 27
SSA Form: - Functions
• A -function behaves as follows:
x1 = … x2 = …
x3 = (x1, x2)
This assigns to x3 the value of x1, if control comes from the left, and that of x2 if control comes from the right.
• On entry to a basic block, all the -functions in the block execute (conceptually) in parallel.