Spring 2014 Jim Hogg - UW - CSE - P501 P-1
CSE P501 – Compiler Construction
Register allocation constraints
Local allocation: fast, but poorer code
Global allocation: register coloring
k
IR typically assumes infinite number of virtual registers
Real machine has k physical registers available
  x86: k = 8
  x64: k = 16
Goals
  Produce correct code that uses k or fewer registers
  Minimize added loads and stores
  Minimize space needed for spilled values
  Do this efficiently – O(n), O(n log n), maybe O(n^2)
Register Allocation
At each point in the code, pick which values to keep in registers
Insert code to move values between registers and memory
No additional transformations – scheduling already ran
But will usually re-run scheduling afterwards (to include the effects of spilling)
Minimize inserted save/restores ("spill" code)
Allocation vs Assignment
Allocation: which values to keep in registers?
Assignment: which specific registers?
Compiler must do both
Local Register Allocation
Applied to basic blocks (ie: "local")
Produces good register usage inside a block
  But can have inefficiencies at boundaries between blocks
Two variations: top-down, bottom-up
"Top-down" Local Allocation
We assume a RISC target machine ("load/store architecture")
  load operands into two registers (if not already there)
  perform instruction (eg: add, lshift)
  store result back to memory (if required)
  ie: operands for instructions MUST be in registers, rather than memory (note, by contrast, that x86 allows operands directly from memory)
Principle: keep most heavily used values in registers
  Priority = # of times virtual register is used (destination and/or source) in block
If more virtual registers than physical (the common case)
  Reserve some registers for values allocated to memory
    Need enough to address and load two operands and store result (R1 = R2 + R3)
  Other registers dedicated to "hot" values
    But are tied up for entire block with particular value, even if only needed for part of the block
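The top-down scheme can be sketched in a few lines of Python. This is a minimal illustration, not the course's implementation: the function name `top_down_allocate`, the tuple encoding of instructions, and the default of three reserved registers are all assumptions made for the example.

```python
from collections import Counter

def top_down_allocate(block, k, reserved=3):
    """Give the most-used virtual registers of one block a physical register.

    block: list of tuples (dest, source, ...) of virtual-register names
           (constants are simply left out).
    k: number of physical registers.
    reserved: registers held back for loading operands and storing results
              of memory-resident values (3 is an assumption).
    """
    uses = Counter(v for instr in block for v in instr)
    # Only hold registers back when the block needs more than k of them.
    dedicated = max(0, k - reserved) if len(uses) > k else k
    # Priority = number of uses; ties broken by name so the result is stable.
    ranked = sorted(uses, key=lambda v: (-uses[v], v))
    in_regs = {v: f"R{i + 1}" for i, v in enumerate(ranked[:dedicated])}
    in_memory = ranked[dedicated:]
    return in_regs, in_memory

# A small block with 11 virtual registers, allocated for a hypothetical k = 6.
block = [("v1", "v2", "v3"), ("v4", "v5", "v6"), ("v7", "v2"),
         ("v8", "v9"), ("v10", "v6", "v4"), ("v11", "v10", "v3")]
in_regs, in_memory = top_down_allocate(block, k=6)
```

With k = 6 and three registers reserved, only three values get a dedicated register for the whole block; the other eight live in memory and pass through the reserved registers.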
"Bottom-up" Local Allocation, 1
Keep a list of free physical registers
  initially all physical registers at beginning of block
  except frame-pointer, stack-pointer - pre-allocated
Scan code, instruction-by-instruction
Allocate a register when one is needed
  if one is free, allocate it
  otherwise, pick a victim - eg: VR3 using R7
    save contents of R7
    allocate R7 to requesting virtual register
Emit instructions to use physical register
Free physical register as soon as possible
  In x = y op z, free y and z if no longer needed before allocating x
Bottom-up Local Allocation, 2
How to pick a victim?
Scan ahead thru code
  Look for next use of each currently-allocated-to virtual register
  Pick the virtual register whose next use is furthest off
Why? Its register is then not needed again for the longest time
Save the victim register's contents to memory (restore later)
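The bottom-up scan with a furthest-next-use victim can be sketched as follows. This is a simplified illustration under stated assumptions: `bottom_up_allocate`, the `(dest, sources)` encoding and the generic `op(...)` output are hypothetical, every evicted value is merely recorded (a real allocator would store only dirty values), and k is assumed to be at least 2. Its register choices need not match a hand-worked trace register for register.

```python
import math

def bottom_up_allocate(block, k):
    """Allocate physical registers for one basic block.

    block: list of (dest, sources) pairs of virtual-register names.
    Returns (code, evicted): rewritten instructions and evicted values.
    """
    def next_use(v, after):
        # Index of the first instruction after `after` that reads v.
        for j in range(after + 1, len(block)):
            if v in block[j][1]:
                return j
        return math.inf

    free = [f"R{n}" for n in range(k, 0, -1)]  # pop() hands out R1 first
    holds = {}                                 # physical reg -> virtual reg
    code, evicted = [], []

    def allocate(v, after, pinned):
        if free:
            r = free.pop()
        else:
            # Victim: the resident value whose next use is furthest off.
            r = max((p for p in holds if p not in pinned),
                    key=lambda p: next_use(holds[p], after))
            evicted.append(holds[r])
        holds[r] = v
        return r

    for i, (dest, sources) in enumerate(block):
        regs = []
        for s in sources:
            r = next((p for p, v in holds.items() if v == s), None)
            if r is None:                      # operand is not in a register
                r = allocate(s, i - 1, regs)
                code.append(f"{r} = load {s}")
            regs.append(r)
        # Free source registers whose value is never read again.
        for r in dict.fromkeys(regs):
            if next_use(holds[r], i) == math.inf:
                del holds[r]
                free.append(r)
        rd = allocate(dest, i, [])
        code.append(f"{rd} = op({', '.join(regs)})")
    return code, evicted

block = [("v1", ["v2", "v3"]), ("v4", ["v5", "v6"]), ("v7", ["v2"]),
         ("v8", ["v9"]), ("v10", ["v6", "v4"]), ("v11", ["v10", "v3"])]
code, evicted = bottom_up_allocate(block, k=4)
```

On this six-instruction block with four registers, the sketch evicts v1 and then v7, the two values that are never read again inside the block.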
Example: CPU with 4 available registers
[Figure: table mapping each physical register to the virtual register (VR) it holds and the instruction of that VR's next use]
Local "bottom-up" Register Allocation, -1
1. ; load v2 from memory
2. ; load v3 from memory
3. v1 = v2 + v3
4. ; load v5, v6 from memory
5. v4 = v5 - v6
6. v7 = v2 - 29
7. ; load v9 from memory
8. v8 = - v9
9. v10 = v6 * v4
10. v11 = v10 - v3

• Still in LIR. So lots of (too many!) virtual registers are required (v2, etc).
• Grey instructions (1,2,4,7) load operands from memory into virtual registers.
• We will ignore these going forward. Focus on mapping virtual to physical.
Local "bottom-up" Register Allocation, 0
1. v1 = v2 + v3
2. v4 = v5 - v6
3. v7 = v2 - 29
4. v8 = - v9
5. v10 = v6 * v4
6. v11 = v10 - v3

vReg  NextRef
v1    1
v2    1
v3    1
v4    2
v5    2
v6    2
v7    3
v8    4
v9    4
v10   5
v11   6

pReg  vReg
R1    -
R2    -
R3    -
R4    -
Local "bottom-up" Register Allocation, 1
1. v1 = v2 + v3
2. v4 = v5 - v6
3. v7 = v2 - 29
4. v8 = - v9
5. v10 = v6 * v4
6. v11 = v10 - v3

vReg  NextRef
v1    1 → -
v2    1 → 3
v3    1 → 6
v4    2
v5    2
v6    2
v7    3
v8    4
v9    4
v10   5
v11   6

pReg  vReg
R1    v2
R2    v3
R3    v1
R4    -

Generated code:
R3 = R1 + R2
Local "bottom-up" Register Allocation, 2
1. v1 = v2 + v3
2. v4 = v5 - v6
3. v7 = v2 - 29
4. v8 = - v9
5. v10 = v6 * v4
6. v11 = v10 - v3

vReg  NextRef
v1    -
v2    3
v3    6
v4    2 → 5
v5    2 → -
v6    2 → 5
v7    3
v8    4
v9    4
v10   5
v11   6

pReg  vReg
R1    v2
R2    v3 → v4
R3    v1 → v6
R4    v5

Generated code:
R3 = R1 + R2
; spill R3
; spill R2? - no - still clean
R2 = R4 - R3
Local "bottom-up" Register Allocation, 3
1. v1 = v2 + v3
2. v4 = v5 - v6
3. v7 = v2 - 29
4. v8 = - v9
5. v10 = v6 * v4
6. v11 = v10 - v3

vReg  NextRef
v1    -
v2    3 → -
v3    6
v4    5
v5    -
v6    5
v7    3 → -
v8    4
v9    4
v10   5
v11   6

pReg  vReg
R1    v2
R2    v4
R3    v6
R4    v5 → v7

Generated code:
R3 = R1 + R2
; spill R3
; spill R2? - no!
R2 = R4 - R3
; spill R4? - no!
R4 = R1 - 29

And so on . . .
Bottom-Up Allocator
Invented about once per decade
  Sheldon Best, 1955, for Fortran I
  Laszlo Belady, 1965, for analyzing paging algorithms
  William Harrison, 1975, ECS compiler work
  Chris Fraser, 1989, LCC compiler
  Vincenzo Liberatore, 1997, Rutgers
Will be reinvented again, no doubt
Many arguments for optimality of this
Global Register Allocation
Standard technique is graph coloring
Use control-flow graph and dataflow graph to derive interference graph
  Nodes are live ranges (not registers!)
  Edge between nodes when live at same time
  Connected nodes cannot use same register
Then color the nodes in the graph
  Two nodes connected by an edge may not have same color (ie: be allocated to same physical register)
  If more than k colors are needed, insert spill code
Graph Coloring, 1
[Figure: graph with five nodes, numbered 1 to 5]
• Assign a color to each node, such that
• No adjacent nodes have same color
Graph Coloring, 2
[Figure: the five-node graph, colored with two colors]
• 2 colors are enough for this graph
• called a 2-coloring
• this graph's chromatic number = 2
Graph Coloring, 3
[Figure: the five-node graph before and after a small change]
A small change forces a 3rd color
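The effect of one extra edge on the chromatic number can be checked by brute force on a toy graph. The edge lists below are hypothetical (the slide's figure is not reproduced here); they were chosen only so that the first graph needs two colors and adding a single edge creates a triangle.

```python
from itertools import product

def chromatic_number(nodes, edges):
    """Smallest number of colors such that no edge joins two nodes of the
    same color. Brute force over all assignments: toy graphs only."""
    for k in range(1, len(nodes) + 1):
        for colors in product(range(k), repeat=len(nodes)):
            coloring = dict(zip(nodes, colors))
            if all(coloring[u] != coloring[v] for u, v in edges):
                return k
    return 0  # only reached for an empty node list

nodes = [1, 2, 3, 4, 5]
# Hypothetical edges: nodes 1 and 5 each touch 2, 3 and 4.
edges = [(1, 2), (1, 3), (1, 4), (2, 5), (3, 5), (4, 5)]
two = chromatic_number(nodes, edges)               # 2-colorable
three = chromatic_number(nodes, edges + [(2, 3)])  # triangle 1-2-3
```

Here `two` is 2 and `three` is 3: the added edge 2-3 closes an odd cycle, which no 2-coloring can satisfy.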
Live Ranges
 1 a = ...
 2 b = ...
 3 c = ...
 4 a = a + b + c
 5 if a < 10 then
 6   d = c + 9
 7   print(c)
 8 elseif a < 20 then
 9   e = 10
10   d = e + a
11   print(e)
12 else
13   f = 12
14   d = f + a
15   print(f)
16 endif
17 print(d)

• Live Range of a variable is between a def and all subsequent possible uses
  • if a < 10, then a is live lines 1..5
  • if a < 20, then a is live lines 1..10
  • otherwise, a is live lines 1..14
• Mmm - clearly nonsense! We need a flowgraph for a correct specification
• Note: line 14: a goes dead on RHS; d comes live on LHS. So death and birth happen 'half-way thru' the instruction
• This example is simplified: variables are never re-def'd. And there are no temps
Live Ranges in Flowgraphs
a = ...
b = ...
c = ...
a = a + b + c
if a < 10 then
  d = c + 9
  print(c)
elseif a < 20 then
  e = 10
  d = e + a
  print(e)
else
  f = 12
  d = f + a
  print(f)
endif
print(d)

[Flowgraph with one node for each of the following; the figure is annotated with a, b, c]
  a = ...; b = ...; c = ...; a = a + b + c
  a < 10
  a < 20
  d = c + 9; print(c)
  e = 10; d = e + a; print(e)
  f = 12; d = f + a; print(f)
  print(d)
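Liveness on a flowgraph is usually computed as a backward dataflow problem. The sketch below is a minimal illustration, assuming the standard equations live_out(b) = union of live_in over successors and live_in(b) = use(b) ∪ (live_out(b) − def(b)); the block names (`entry`, `test2`, `then`, `elif`, `else`, `exit`) and the instruction encoding are hypothetical.

```python
def liveness(blocks, succ):
    """Iterate the liveness equations to a fixed point.

    blocks: name -> list of (defined_var_or_None, [used_vars]) in order.
    succ:   name -> list of successor block names.
    """
    use, defs = {}, {}
    for b, instrs in blocks.items():
        use[b], defs[b] = set(), set()
        for d, used in instrs:
            # A use counts only if the block has not already defined it.
            use[b] |= {u for u in used if u not in defs[b]}
            if d is not None:
                defs[b].add(d)
    live_in = {b: set() for b in blocks}
    live_out = {b: set() for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            out = set().union(*(live_in[s] for s in succ[b]))
            inn = use[b] | (out - defs[b])
            if out != live_out[b] or inn != live_in[b]:
                live_out[b], live_in[b] = out, inn
                changed = True
    return live_in, live_out

blocks = {
    "entry": [("a", []), ("b", []), ("c", []),
              ("a", ["a", "b", "c"]), (None, ["a"])],   # ends with a < 10
    "test2": [(None, ["a"])],                           # a < 20
    "then":  [("d", ["c"]), (None, ["c"])],
    "elif":  [("e", []), ("d", ["e", "a"]), (None, ["e"])],
    "else":  [("f", []), ("d", ["f", "a"]), (None, ["f"])],
    "exit":  [(None, ["d"])],
}
succ = {"entry": ["then", "test2"], "test2": ["elif", "else"],
        "then": ["exit"], "elif": ["exit"], "else": ["exit"], "exit": []}
live_in, live_out = liveness(blocks, succ)
```

For this program the sketch reports a and c live out of the first block, only a live into the `a < 20` test, and d live into the final print, which is the per-path picture the line-number ranges could not express.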
Register Interference Graph
[Figure: interference graph with nodes a, b, c, d, e, f]
• If the live ranges of 2 variables overlap (interfere), they cannot use the same register
  • Eg: a and b cannot both be stored in register R5
  • Eg: c and e can be stored in register R5 (at different, non-overlapping times)
• How many registers will we need? - Guesses?
Interference Graph - Coloring
[Figure: interference graph with nodes a, b, c, d, e, f]
• Problem: color this interference graph, where each color represents a different physical register
• Solution: simplify graph: recursively remove easiest nodes (lowest degree)

Node  Degree
a     3
b     2
c     3
d     3
e     2
f     2
Interference Graph - Remove b
[Figure: interference graph with b removed]

Node  Degree
a     2
b     2
c     2
d     3
e     2
f     2
Removed: b
Interference Graph - Remove a
[Figure: interference graph with b and a removed]

Node  Degree
a     2
b     2
c     1
d     3
e     1
f     1
Removed: b a
Interference Graph - Remove c
[Figure: interference graph with b, a and c removed]

Node  Degree
a     2
b     2
c     1
d     2
e     1
f     1
Removed: b a c
Interference Graph - Remove e
[Figure: interference graph with b, a, c and e removed]

Node  Degree
a     2
b     2
c     1
d     2
e     1
f     1
Removed: b a c e
Interference Graph - Color!
[Figure: remaining nodes of the interference graph, colored]
Removed: b a c e
Interference Graph - Add e
[Figure: interference graph with e added back and colored]
Removed: b a c
Interference Graph - Add c
[Figure: interference graph with c added back and colored]
Removed: b a
Interference Graph - Add a
[Figure: interference graph with a added back and colored]
Removed: b
Interference Graph - Add b
[Figure: complete interference graph, colored]
• 3 colors are enough
• eg: red = R1, green = R2, blue = R3
• If fewer registers are available, we need to spill
Live Ranges: More Realistic Example

1. varp = ...
2. vw = [varp + 0]
3. v2 = 2
4. vx = [varp + @x]
5. vy = [varp + @y]
6. vz = [varp + @z]
7. vw = vw * v2
8. vw = vw * vx
9. vw = vw * vy
10. vw = vw * vz
11. [varp + 0] = vw

Register  Interval
varp      1..11
vw        2..7
vw        7..8
vw        8..9
vw        9..10
vw        10..11
v2        3..7
vx        4..8
vy        5..9
vz        6..10

In this example, virtual registers are re-def'd.
  Eg: vw participates in 5 live ranges
  Eg: vw [7..8] interferes with vx [4..8] but not vw [9..10]
But no control flow!
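With no control flow, interference reduces to interval overlap. The sketch below tests every pair of intervals from the table; the function name `interferes` and the tuple encoding are assumptions. It treats a range that ends at an instruction and one that starts there as non-overlapping, since a value dies on the right-hand side before the result is born on the left.

```python
from itertools import combinations

# (register, first instruction, last instruction), copied from the table.
ranges = [("varp", 1, 11), ("vw", 2, 7), ("vw", 7, 8), ("vw", 8, 9),
          ("vw", 9, 10), ("vw", 10, 11), ("v2", 3, 7), ("vx", 4, 8),
          ("vy", 5, 9), ("vz", 6, 10)]

def interferes(r1, r2):
    """True if two live ranges are live at the same time.
    Ranges that merely touch at one instruction do not interfere."""
    return max(r1[1], r2[1]) < min(r1[2], r2[2])

# One edge of the interference graph per overlapping pair.
edges = [(r1, r2) for r1, r2 in combinations(ranges, 2) if interferes(r1, r2)]
```

This agrees with the examples in the text: vw [7..8] overlaps vx [4..8], but not vw [9..10], and consecutive vw ranges never interfere with each other.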
Build Live Ranges, 1
Use dataflow information to build interference graph
  Nodes = live ranges
  Add an edge for each pair of live ranges that overlap
Beware copy instructions: rx = ry does not create interference between rx and ry: they can use the same register if the ranges do not otherwise interfere
Build Live Ranges, 2
Construct the interference graph
Find live ranges – SSA
  Build SSA form of IR
  Live range initially includes single SSA name
  At a φ-function, form union of live ranges
  Either rewrite code to use live range names or keep a mapping between SSA names and live-range names
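Forming the union of live ranges at each φ-function is a natural fit for a union-find structure, which also provides the mapping from SSA names to live-range names. This is a sketch under that assumption; `build_live_ranges` and the SSA names in the example are hypothetical.

```python
def build_live_ranges(phis):
    """phis: list of (result, [operands]) SSA names joined by φ-functions.
    Returns a function mapping any SSA name to its live-range name."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)            # each name starts as its own range
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for result, operands in phis:
        for op in operands:
            parent[find(op)] = find(result)  # union operand into result
    return find

# x3 = φ(x1, x2) and y2 = φ(y0, y1); z0 takes part in no φ-function.
live_range = build_live_ranges([("x3", ["x1", "x2"]), ("y2", ["y0", "y1"])])
```

After this, `live_range("x1")`, `live_range("x2")` and `live_range("x3")` all return the same representative, while `z0` remains a live range of its own.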
Simplify & Color Algorithm
Simplify
  while graph cannot be colored
    remove easiest node (fewest neighbors = lowest degree)
    if degree of easiest node >= k
      insert spill code
    push that node onto stack
    recalculate degree of each remaining node
Color
  color the sole remaining node
  while stack is non-empty
    pop node from stack
    color newly expanded graph
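A compact version of simplify-and-color is sketched below. It is an optimistic variant, which is an assumption on my part rather than the algorithm exactly as stated: every node is pushed, and a node is marked for spilling only if no color is left when it is popped. The edge list is also an assumption, derived mechanically from the a..f example program (it gives node a four neighbours: b, c, e, f), and ties are broken by name, so the removal order can differ from a hand-worked one.

```python
def simplify_and_color(graph, k):
    """graph: node -> set of neighbours (symmetric).
    Returns (colors, spilled) for k available colors."""
    work = {n: set(adj) for n, adj in graph.items()}
    stack = []
    # Simplify: repeatedly remove the node with the fewest neighbours.
    while work:
        n = min(work, key=lambda x: (len(work[x]), x))
        stack.append(n)
        for m in work.pop(n):
            work[m].discard(n)
    # Color: rebuild the graph in reverse order of removal.
    colors, spilled = {}, []
    while stack:
        n = stack.pop()
        taken = {colors[m] for m in graph[n] if m in colors}
        available = [c for c in range(k) if c not in taken]
        if available:
            colors[n] = available[0]
        else:
            spilled.append(n)  # would need spill code and another pass
    return colors, spilled

graph = {"a": {"b", "c", "e", "f"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
         "d": {"c", "e", "f"}, "e": {"a", "d"}, "f": {"a", "d"}}
colors3, spilled3 = simplify_and_color(graph, k=3)
colors2, spilled2 = simplify_and_color(graph, k=2)
```

With k = 3 every node receives a color and nothing is spilled; with k = 2 the sketch marks b for spilling, because a, b and c form a triangle.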
Coalescing Live Ranges
Idea: if two live ranges are connected by a copy operation (rx = ry) and do not otherwise interfere, then coalesce
Rewrite all references to rx to use ry
Remove the copy instruction
Then fix up interference graph
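The graph fix-up after coalescing can be sketched as a node merge. This is a minimal illustration with hypothetical names (`coalesce`, and the nodes `t` and `u`); it only updates the interference graph and returns the renaming, leaving the rewrite of the instructions themselves to the caller.

```python
def coalesce(graph, copies):
    """graph: node -> set of interfering nodes (symmetric), updated in place.
    copies: list of (rx, ry) pairs, one per copy instruction rx = ry.
    Returns a dict renaming each coalesced rx to the range it merged into."""
    rename = {}

    def rep(x):
        # Follow earlier merges to the surviving live range.
        while x in rename:
            x = rename[x]
        return x

    for rx, ry in copies:
        x, y = rep(rx), rep(ry)
        if x == y or y in graph[x]:
            continue                 # already merged, or they interfere
        for n in graph.pop(x):       # y inherits x's interferences
            graph[n].discard(x)
            graph[n].add(y)
            graph[y].add(n)
        rename[x] = y
    return rename

# t interferes with both rx and ry; u interferes with ry only.
graph = {"rx": {"t"}, "ry": {"t", "u"}, "t": {"rx", "ry"}, "u": {"ry"}}
rename = coalesce(graph, [("rx", "ry")])
```

After the merge, rx is gone from the graph and the degree of t drops from 2 to 1, since its two neighbours have become one.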
Advantages of Coalescing?
Makes the code smaller, faster (no copy operation)
Shrinks number of live ranges
Reduces the degree of any live range that interfered with both live ranges rx and ry
But: coalescing two live ranges can prevent coalescing of others, so ordering matters
coalesce most frequently executed ranges first (eg: inner loops)
Can have a substantial payoff – do it!
Overall Structure
[Flowchart: Find live ranges → Build interference graph → Coalesce → Spill Costs → Find Coloring → Insert Spills; edges labeled "More Coalescing Possible", "No Spills" and "Spills"]
Real-Life Complications
Need to deal with irregularities in the register set
Some operations require dedicated registers
  Eg: idiv in x86
  Eg: split address/data registers in M68k
Register conventions like function results
  Eg: use of registers across calls (return in EAX)
Different register classes
  Eg: integer, floating-point, SIMD
Overlapping registers
  Eg: x86 AL, AH, AX, EAX, RAX
Model by precoloring nodes, adding constraints in the graph, etc.
Graph Representation
Recall: Register allocation is one of the most important optimizations
  Finding optimum solution for typical interference graph will take a million years
  So we use heuristics to find a good solution a million times each day
Interference graph representation drives the time & space requirements for the allocator (may even dominate the entire compiler)
Not unknown to have ~5k nodes and ~1M edges