40
Spring 2014 Jim Hogg - UW - CSE - P501 P-1 CSE P501 – Compiler Construction Register allocation constraints Local allocation Fast, but poorer code Global allocation Register coloring

CSE P501 – Compiler Construction

  • Upload
    alesia

  • View
    20

  • Download
    0

Embed Size (px)

DESCRIPTION

CSE P501 – Compiler Construction. Register allocation constraints Local allocation Fast, but poorer code Global allocation Register coloring. k. IR typically assumes infinite number of virtual registers Real machine has k physical registers available x86: k = 8 x64: k = 16 Goals - PowerPoint PPT Presentation

Citation preview

Page 1: CSE P501 – Compiler Construction

Spring 2014 Jim Hogg - UW - CSE - P501 P-1

CSE P501 – Compiler Construction

Register allocation constraints

Local allocation Fast, but poorer code

Global allocation Register coloring

Page 2: CSE P501 – Compiler Construction

Spring 2014 Jim Hogg - UW - CSE - P501 P-2

k

IR typically assumes infinite number of virtual registers

Real machine has k physical registers available x86: k = 8 x64: k = 16

Goals Produce correct code that uses k or less registers Minimize added loads and stores Minimize space needed for spilled values Do this efficiently – O(n), O(n log n), maybe O(n2)

Page 3: CSE P501 – Compiler Construction

Spring 2014 Jim Hogg - UW - CSE - P501 P-3

Register Allocation

At each point in the code, pick which values to keep in registers

Insert code to move values between registers and memory

No additional transformations – scheduling already ran

But will usually re-run scheduling afterwards (to include the effects of spilling)

Minimize inserted save/restores ("spill" code)

Page 4: CSE P501 – Compiler Construction

Spring 2014 Jim Hogg - UW - CSE - P501 P-4

Allocation vs Assignment

Allocation: which values to keep in registers?

Assignment: which specific registers?

Compiler must do both

Page 5: CSE P501 – Compiler Construction

Spring 2014 Jim Hogg - UW - CSE - P501 P-5

Applied to basic blocks (ie: "local")

Produces good register usage inside a block But can have inefficiencies at boundaries between

blocks

Two variations: top-down, bottom-up

Local Register Allocation

Page 6: CSE P501 – Compiler Construction

"Top-down" Local Allocation

We assume a RISC target machine ("load/store architecture") load operands into two registers (if not already there) perform instruction (eg: add, lshift) store result back to memory (if required) ie: operands for instructions MUST be in registers, rather than memory (note, by contrast, that x86 allows operands directly from memory)

Principle: keep most heavily used values in registers Priority = # of times virtual-register used (destination and/or source) in block

If more virtual registers than physical (the common case) Reserve some registers for values allocated to memory

Need enough to address and load two operands and store result (R1 = R2 + R3) Other registers dedicated to “hot” values

But are tied up for entire block with particular value, even if only needed for part of the block

Spring 2014 Jim Hogg - UW - CSE - P501 P-6

Page 7: CSE P501 – Compiler Construction

"Bottom-up" Local Allocation, 1

Keep a list of free physical registers initially all physical registers at beginning of block except frame-pointer, stack-pointer - pre-allocated

Scan code, instruction-by-instruction Allocate a register when one is needed

if one is free, allocate it otherwise, pick a victim - eg: VR3 using R7 save contents of R7 allocate R7 to requesting virtual register

Emit instructions to use physical register Free physical register as soon as possible

In x = y op z, free y and z if no longer needed before allocating x

Spring 2014 Jim Hogg - UW - CSE - P501 P-7

Page 8: CSE P501 – Compiler Construction

Bottom-up Local Allocation, 2

How to pick a victim? Scan ahead thru code Look for next use of each currently-allocated-to virtual

register Pick the virtual register whose next use is furthest off.

Why? Save the victim register's to memory (restore later)

Spring 2014 Jim Hogg - UW - CSE - P501 P-8

Virtual RegisterVR

Physical Register VR Next Use

Instruction14 3 523 2 82 4 55 1

Example: CPU with 4 available registers

Page 9: CSE P501 – Compiler Construction

Local "bottom-up" Register Allocation, -1

1. ; load v2 from memory2. ; load v3 from memory3. v1 = v2 + v34. ; load v5, v6 from

memory5. v4 = v5 - v66. v7 = v2 - 297. ; load v9 from memory8. v8 = - v99. v10 = v6 * v4 10. v11 = v10 - v3

Spring 2014 Jim Hogg - UW - CSE - P501 P-9

• Still in LIR. So lots (too many!) virtual registers required (v2, etc).• Grey instructions (1,2,4,7) load operands from memory into virtual

registers.• We will ignore these going forward. Focus on mapping virtual to

physical.

Page 10: CSE P501 – Compiler Construction

Local "bottom-up" Register Allocation, 0

1. v1 = v2 + v32. v4 = v5 - v63. v7 = v2 - 294. v8 = - v95. v10 = v6 * v4 6. v11 = v10 - v3

Spring 2014 Jim Hogg - UW - CSE - P501 P-10

v1 1v2 1v3 1v4 2v5 2v6 2v7 3v8 4v9 4v10 5v11 6

vReg NextRefR1 -

R2 -R3 -R4 -

pReg vReg

Page 11: CSE P501 – Compiler Construction

Local "bottom-up" Register Allocation, 1

1. v1 = v2 + v32. v4 = v5 - v63. v7 = v2 - 294. v8 = - v95. v10 = v6 * v4 6. v11 = v10 - v3

Spring 2014 Jim Hogg - UW - CSE - P501 P-11

v1 1 v2 1 3v3 1 6v4 2v5 2v6 2v7 3v8 4v9 4v10 5v11 6

vReg NextRefR1 v2

R2 v3R3 v1R4 -

pReg vReg

R3 = R1 + R2

Page 12: CSE P501 – Compiler Construction

Local "bottom-up" Register Allocation, 2

1. v1 = v2 + v32. v4 = v5 - v63. v7 = v2 - 294. v8 = - v95. v10 = v6 * v4 6. v11 = v10 - v3

Spring 2014 Jim Hogg - UW - CSE - P501 P-12

v1 v2 3v3 6v4 2 5v5 2 v6 2 5v7 3v8 4v9 4v10 5v11 6

vReg NextRefR1 v2

R2 v3 v4R3 v1 v6R4 v5

pReg vReg

R3 = R1 + R2; spill R3; spill R2? - no - still cleanR2 = R4 - R3

Page 13: CSE P501 – Compiler Construction

Local "bottom-up" Register Allocation, 3

1. v1 = v2 + v32. v4 = v5 - v63. v7 = v2 - 294. v8 = - v95. v10 = v6 * v4 6. v11 = v10 - v3

Spring 2014 Jim Hogg - UW - CSE - P501 P-13

v1 v2 3 v3 6v4 5v5 v6 5v7 3 v8 4v9 4v10 5v11 6

vReg NextRefR1 v2

R2 v4R3 v6R4 v5 v7

pReg vReg

And so on . . .

R3 = R1 + R2; spill R3; spill R2? - no!R2 = R4 - R3; spill R4? - no!R4 = R1 - 29

Page 14: CSE P501 – Compiler Construction

Spring 2014 Jim Hogg - UW - CSE - P501 P-14

Bottom-Up Allocator

Invented about once per decade Sheldon Best, 1955, for Fortran I Laslo Belady, 1965, for analyzing paging algorithms William Harrison, 1975, ECS compiler work Chris Fraser, 1989, LCC compiler Vincenzo Liberatore, 1997, Rutgers

Will be reinvented again, no doubt

Many arguments for optimality of this

Page 15: CSE P501 – Compiler Construction

Standard technique is graph coloring

Use control-graph and dataflow-graph to derive interference graph

Nodes are live ranges (not registers!) Edge between nodes when live at same time Connected nodes cannot use same register

Then color the nodes in the graph Two nodes connected by an edge may not have same color (ie:

be allocated to same physical register) If more than k colors are needed, insert spill code

Spring 2014 Jim Hogg - UW - CSE - P501 P-15

Global Register Allocation

Page 16: CSE P501 – Compiler Construction

Graph Coloring, 1

Spring 2014 Jim Hogg - UW - CSE - P501 P-16

1

32 4

5

• Assign a color to each node, such that

• No adjacent nodes have same color

Page 17: CSE P501 – Compiler Construction

Graph Coloring, 2

Spring 2014 Jim Hogg - UW - CSE - P501 P-17

1

32 4

5

• 2 colors are enough for this graph• called a 2-coloring• this graph's chromatic number = 2

Page 18: CSE P501 – Compiler Construction

Graph Coloring, 3

Spring 2014 Jim Hogg - UW - CSE - P501 P-18

1

32 4

5

1

32 4

5

small changes forces a 3rd color

Page 19: CSE P501 – Compiler Construction

Live Ranges1 a = ...2 b = ...3 c = ...4 a = a + b + c5 if a < 10 then6 d = c + 97 print(c)8 elseif a < 20 then9 e = 1010 d = e + a11 print(e)12 else13 f = 1214 d = f + a15 print(f)16 endif17 print(d)

Spring 2014 Jim Hogg - UW - CSE - P501 P-19

• Live Range of a variable is between a def and all subsequent possible uses

• if a < 10, then a is live lines 1..5• if a < 20, then a is live lines 1..10• otherwise, a is live lines 1..14

• Mmm - clearly nonsense! We need a flowgraph for a correct specification

• Note: line 14: a goes dead on RHS; d comes live on LHS. So death and birth happen 'half-way thru' the instruction

• This example is simplified: variables are never re-def'd. And there are no temps

Page 20: CSE P501 – Compiler Construction

Live Ranges in Flowgraphsa = ...b = ...c = ...a = a + b + cif a < 10 then d = c + 9 print(c)elseif a < 20 then e = 10 d = e + a print(e)else f = 12 d = f + a print(f)endifprint(d)

Spring 2014 Jim Hogg - UW - CSE - P501 P-20

a = ...b = ...c = ...a = a + b + c

a < 10

a < 20d = c + 8print(c)

e = 10d = e + aprint(e)

f = 12d = f + aprint(f)

print(d)

ab

c

Page 21: CSE P501 – Compiler Construction

Register Interference Graph

Spring 2014 Jim Hogg - UW - CSE - P501 P-21

f

e

ba

d c

• If live range of 2 variables overlap, or interfere, they cannot use same register

• Eg: a and b cannot both be stored in register R5• Eg: c and e can be stored in register R5 (at different, non-overlapping

times)

• How many registers will we need? - Guesses?

Page 22: CSE P501 – Compiler Construction

Interference Graph - Coloring

Spring 2014 Jim Hogg - UW - CSE - P501 P-22

f

e

ba

d c

• Problem: color this interference graph, where each color represents a different physical register

• Solution: simplify graph: recursively remove easiest nodes (lowest degree)

NodeDegreea 3

b 2c 3d 3e 2f 2

Page 23: CSE P501 – Compiler Construction

Interference Graph - Remove b

Spring 2014 Jim Hogg - UW - CSE - P501 P-23

f

e

ba

d c

NodeDegreea 2

b 2c 2d 3e 2f 2

Removed: b

Page 24: CSE P501 – Compiler Construction

Interference Graph - Remove a

Spring 2014 Jim Hogg - UW - CSE - P501 P-24

f

e

ba

d c

NodeDegreea 2

b 2c 1d 3e 1f 1

Removed: b a

Page 25: CSE P501 – Compiler Construction

Interference Graph - Remove c

Spring 2014 Jim Hogg - UW - CSE - P501 P-25

f

e

ba

d c

NodeDegreea 2

b 2c 1d 2e 1f 1

Removed: b a c

Page 26: CSE P501 – Compiler Construction

Interference Graph - Remove e

Spring 2014 Jim Hogg - UW - CSE - P501 P-26

f

e

ba

d c

NodeDegreea 2

b 2c 1d 2e 1f 1

Removed: b a c e

Page 27: CSE P501 – Compiler Construction

Interference Graph - Color!

Spring 2014 Jim Hogg - UW - CSE - P501 P-27

f

e

ba

d c

Removed: b a c e

Page 28: CSE P501 – Compiler Construction

Interference Graph - Add e

Spring 2014 Jim Hogg - UW - CSE - P501 P-28

f

e

ba

d c

Removed: b a c

Page 29: CSE P501 – Compiler Construction

Interference Graph - Add c

Spring 2014 Jim Hogg - UW - CSE - P501 P-29

f

e

ba

d c

Removed: b a

Page 30: CSE P501 – Compiler Construction

Interference Graph - Add a

Spring 2014 Jim Hogg - UW - CSE - P501 P-30

f

e

ba

d c

Removed: b

Page 31: CSE P501 – Compiler Construction

Interference Graph - Add b

Spring 2014 Jim Hogg - UW - CSE - P501 P-31

f

e

ba

d c

• 3 colors are enough• eg: red = R1, green = R2, blue = R3

• If less registers available - need to spill

Page 32: CSE P501 – Compiler Construction

Live Ranges: More Realistic Example1. varp = ...2. vw = [varp + 0]3. v2 = 24. vx = [varp + @x]5. vy = [varp + @y]6. vz = [varp + @z]7. vw = vw * v2

8. vw = vw * vx

9. vw = vw * vy

10. vw = vw * vz

11. [varp + 0] = vw

varp 1..11vw 2..7vw 7..8vw 8..9vw 9..10vw 10..11v2 3..7vx 4..8vy 5..9vz 6..10

P-32

RegisterInterval

In this example, virtual registers are re-def'd. Eg: vw participates in 5 live ranges Eg: vw [7..8] interferes with vx [4..8] but not vw [9..10]But no control flow!

Page 33: CSE P501 – Compiler Construction

Build Live Ranges, 1

Use dataflow information to build interference graph

Nodes = live ranges Add an edge for each pair of live ranges that overlap

Beware copy instructions: rx = ry does not create interference between rx and ry : can use same register if the ranges do not otherwise

interfere

Spring 2014 Jim Hogg - UW - CSE - P501 P-33

Page 34: CSE P501 – Compiler Construction

Build Live Ranges, 2

Construct the interference graph

Find live ranges – SSA

Build SSA form of IR Live range initially includes single SSA name' At a -function, form union of live ranges Either rewrite code to use live range names or keep a mapping

between SSA names and live-range names

Spring 2014 Jim Hogg - UW - CSE - P501 P-34

Page 35: CSE P501 – Compiler Construction

Simplify & Color Algorithm

Spring 2014 Jim Hogg - UW - CSE - P501 P-35

while graph cannot be colored remove easiest node (fewest neighbors = lowest degree) if degree of easiest node > k insert spill code remember that node into stack recalculate degree of each remaining node

Simplify

color the sole remaining nodewhile stack is non-empty pop node from stack color newly expanded graph

Color

Page 36: CSE P501 – Compiler Construction

Coalescing Live Ranges

Idea: if two live ranges are connected by a copy operation (rx = ry) do not otherwise interfere, then coalesce

Rewrite all references to rx to use ry Remove the copy instruction

Then fix up interference graph

Spring 2014 Jim Hogg - UW - CSE - P501 P-36

Page 37: CSE P501 – Compiler Construction

Advantages of Coalescing?

Makes the code smaller, faster (no copy operation)

Shrinks number of live ranges

Reduces the degree of any live range that interfered with both live ranges rx and ry

But: coalescing two live ranges can prevent coalescing of others, so ordering matters

coalesce most frequently executed ranges first (eg: inner loops)

Can have a substantial payoff – do it!

Spring 2014 Jim Hogg - UW - CSE - P501 P-37

Page 38: CSE P501 – Compiler Construction

Overall Structure

Spring 2014 Jim Hogg - UW - CSE - P501 P-38

Find live

ranges

Build int.

graphCoalesc

eSpill

CostsFind

Coloring

Insert Spills

No Spills

More Coalescing Possible

Spills

Page 39: CSE P501 – Compiler Construction

Real-Life Complications

Need to deal with irregularities in the register set

Some operations require dedicated registers Eg: idiv in x86 Eg: split address/data registers in M68k

Register conventions like function results Eg: use of registers across calls (return in EAX)

Different register classes Eg: integer, floating-point, SIMD

Overlapping registers Eg: x86 AL, AH, AX, EAX, RAX

Model by precoloring nodes, adding constraints in the graph, etc.

Spring 2014 Jim Hogg - UW - CSE - P501 P-39

Page 40: CSE P501 – Compiler Construction

Graph Representation

Recall: Register allocation is one of the most important optimizations Finding optimum solution for typical interference graph will take a

million years So we use heuristics to find a good solution a million times each day

Interference graph representation drives the time & space requirements for the allocator (may even dominate the entire compiler)

Not unknown to have ~5k nodes and ~1M edges

Spring 2014 Jim Hogg - UW - CSE - P501 P-40