Page 1: CMPUT680 - Winter 2001

CMPUT 680 - Compiler Design and Optimization

CMPUT680 - Winter 2001

Register Minimization X Register Saturation

José Nelson Amaral
http://www.cs.ualberta.ca/~amaral/courses/680

Page 2: CMPUT680 - Winter 2001

Reading List

Touati, Sid Ahmed Ali, “Register Saturation in Superscalar and VLIW Codes,” 10th International Conference on Compiler Construction, Genova, Italy, April 2001, pp. 213-228.

Touati, S.-A.-A., Thomasset, F., “Register Saturation in Data Dependence Graphs,” Research Report RR-3978, INRIA, July 2000.

Touati, S.-A.-A., “Optimal Register Saturation in Acyclic Superscalar and VLIW Codes,” Research Report, INRIA, Nov. 2000.

Page 3: CMPUT680 - Winter 2001

Minimum Register Instruction Sequence (MRIS)

Problem

Given the Data Dependence Graph G for a basic block, derive an instruction sequence S for G that is optimal in the sense that its register requirement is minimum.

Page 4: CMPUT680 - Winter 2001

Intuition for Our Solution

Our intuition is to find subsets of nodes that can definitely share a register, and to use them to inform the instruction sequencing algorithm.

[Figure: Data Dependence Graph with node levels a; b, c, d, e; f, g; h; i]

Page 5: CMPUT680 - Winter 2001

Instruction Lineages

An instruction lineage is a sequence of instructions in which a single register is passed from instruction to instruction (except for the last).

How can we ensure that instructions a, b, f, and h will be able to share the same register?

L1 = [a, b, f, h, i)

[Figure: Data Dependence Graph with the nodes a, b, f, h of lineage L1 marked]

Page 6: CMPUT680 - Winter 2001

Sequencing Edges

The lineage formation imposes a scheduling restriction on the DDG: the selected heir of a node must be the last of its siblings to be listed in the sequence.

L1 = [a, b, f, h, i)

Thus the lineage formation inserts sequencing edges in the DDG.

[Figure: Augmented Data Dependence Graph]

Page 7: CMPUT680 - Winter 2001

Node Height

L1 = [a, b, f, h, i)

If the introduction of sequencing edges were to produce a cycle in the DDG, it would be impossible to find a legal instruction sequence.

Thus we use the height of the nodes, recomputed after each lineage formation, to select the heir. Ties are broken arbitrarily.

[Figure: Augmented Data Dependence Graph]
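The height computation can be sketched as a longest-path pass over the DAG. This is a minimal Python sketch, not the authors' implementation. It assumes the example DDG has the edges a → b, c, d, e; b, c → f; d, e → g; f, g → h; h → i (read off the code listing for this example later in the slides), counts height in nodes (so the sink i has height 1), and uses the hypothetical names `node_heights`, `ddg`, and `augmented`.

```python
from functools import lru_cache

def node_heights(succ):
    """Height of a node = number of nodes on its longest path to a sink."""
    @lru_cache(maxsize=None)
    def height(u):
        return 1 + max((height(v) for v in succ[u]), default=0)
    return {u: height(u) for u in succ}

# Example DDG (edges are an assumption, see the lead-in).
ddg = {
    "a": ["b", "c", "d", "e"],
    "b": ["f"], "c": ["f"], "d": ["g"], "e": ["g"],
    "f": ["h"], "g": ["h"], "h": ["i"], "i": [],
}

# After forming L1 = [a, b, f, h, i), the heir b must come after its
# siblings c, d, e, which adds the sequencing edges c->b, d->b, e->b.
augmented = {u: list(vs) for u, vs in ddg.items()}
for sibling in ("c", "d", "e"):
    augmented[sibling].append("b")

print(node_heights(ddg))
print(node_heights(augmented))
```

Under these assumptions the recomputed heights of c, d, and e rise above that of b once the sequencing edges are in place.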

Page 8: CMPUT680 - Winter 2001

Lineage Formation

L1 = [a, b, f, h, i)

For the next lineage, the highest nodes not in a lineage are c, d, and e, all with a height of 5.

L2 = [c, f)
L3 = [e, g, h)
L4 = [d, g)

[Figure: Augmented Data Dependence Graph]
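The lineage formation loop can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the published algorithm: the DDG edges are the ones implied by the example code listing later in the slides; a lineage starts at the highest node that is not yet in a lineage; the heir is taken to be the successor with the lowest current height (an assumption, chosen because a sibling-to-heir edge can only close a cycle if the heir already reaches that sibling, which would require the heir to be strictly higher); ties are broken alphabetically. The names `heights` and `form_lineages` are hypothetical.

```python
def heights(succ):
    """Number of nodes on the longest path from each node to a sink."""
    memo = {}
    def h(u):
        if u not in memo:
            memo[u] = 1 + max((h(v) for v in succ[u]), default=0)
        return memo[u]
    return {u: h(u) for u in succ}

def form_lineages(data_succ):
    """Return (lineages, augmented graph with sequencing edges)."""
    aug = {u: set(vs) for u, vs in data_succ.items()}
    owned = set()      # nodes whose value already belongs to a lineage
    lineages = []
    while True:
        h = heights(aug)
        free = sorted(u for u in data_succ if data_succ[u] and u not in owned)
        if not free:
            return lineages, aug
        node = max(free, key=lambda u: h[u])   # highest free node
        lineage = [node]
        while True:
            owned.add(node)
            h = heights(aug)                   # heights change as edges are added
            heir = min(sorted(data_succ[node]), key=lambda v: h[v])
            for sibling in data_succ[node]:
                if sibling != heir:
                    aug[sibling].add(heir)     # siblings must precede the heir
            lineage.append(heir)
            # Stop when the heir already belongs to a lineage or defines no value.
            if heir in owned or not data_succ[heir]:
                break
            node = heir
        lineages.append(lineage)

ddg = {
    "a": ["b", "c", "d", "e"],
    "b": ["f"], "c": ["f"], "d": ["g"], "e": ["g"],
    "f": ["h"], "g": ["h"], "h": ["i"], "i": [],
}
lineages, aug = form_lineages(ddg)
for lin in lineages:
    print("[" + ", ".join(lin) + ")")
```

With alphabetical tie-breaking the sketch picks d before e, so it produces [d, g, h) and [e, g) where the slides show [e, g, h) and [d, g). The two outcomes are symmetric, since ties are broken arbitrarily.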

Page 9: CMPUT680 - Winter 2001

Lineage Interference

L1 = [a, b, f, h, i)
L2 = [c, f)
L3 = [e, g, h)
L4 = [d, g)

Two lineages Lu = [u1, u2, …, um) and Lv = [v1, v2, …, vn) definitely overlap if:

(i) u1 reaches vn, and
(ii) v1 reaches um.

[Figure: Augmented Data Dependence Graph]

Page 10: CMPUT680 - Winter 2001

Lineage Interference Graph

L1 = [a, b, f, h, i)
L2 = [c, f)
L3 = [e, g, h)
L4 = [d, g)

[Figure: Augmented Data Dependence Graph and Lineage Interference Graph over L1, L2, L3, L4]

Which lineages does lineage L1 definitely overlap with?

How about lineages L2 and L4?

Page 11: CMPUT680 - Winter 2001

Lineage Fusion Condition

Two lineages Lu = [u1, u2, …, um) and Lv = [v1, v2, …, vn) can be fused into a single lineage if:

(i) u1 reaches vn, and
(ii) v1 does not reach um.

Lineages:
L1 = [a, b, f, h, i)
L2 = [c, f)
L3 = [e, g, h)
L4 = [d, g)

[Figure: Augmented Data Dependence Graph and Lineage Interference Graph over L1, L2, L3, L4]

Page 12: CMPUT680 - Winter 2001

Lineage Fusion Condition

Lineages:
L1 = [a, b, f, h, i)
L2 = [c, f)
L3 = [e, g, h)
L4 = [d, g)

[Figure: Augmented Data Dependence Graph and Lineage Interference Graph over L1, L2, L3, L4]

Which lineages can be fused in the example?

d reaches f, and c does not reach g.

Thus L4 can be fused with L2 to form L5 = [d, g) [c, f)
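Both the overlap test and the fusion test reduce to reachability queries on the augmented DDG. The sketch below is a hedged Python illustration: it assumes the augmented graph consists of the example DDG edges plus the sequencing edges c → b, d → b, e → b introduced by L1, and it represents a lineage [x1, …, xn) as a list whose last element is the excluded end node. The names `reaches`, `definitely_overlap`, and `can_fuse` are hypothetical.

```python
def reaches(succ, src, dst):
    """True if dst can be reached from src (src reaches itself)."""
    stack, seen = [src], set()
    while stack:
        u = stack.pop()
        if u == dst:
            return True
        if u not in seen:
            seen.add(u)
            stack.extend(succ[u])
    return False

def definitely_overlap(succ, lu, lv):
    # (i) u1 reaches vn, and (ii) v1 reaches um
    return reaches(succ, lu[0], lv[-1]) and reaches(succ, lv[0], lu[-1])

def can_fuse(succ, lu, lv):
    # (i) u1 reaches vn, and (ii) v1 does not reach um
    return reaches(succ, lu[0], lv[-1]) and not reaches(succ, lv[0], lu[-1])

# Augmented DDG: data edges plus sequencing edges c->b, d->b, e->b (assumed).
aug = {
    "a": ["b", "c", "d", "e"],
    "b": ["f"], "c": ["f", "b"], "d": ["g", "b"], "e": ["g", "b"],
    "f": ["h"], "g": ["h"], "h": ["i"], "i": [],
}
L1, L2 = ["a", "b", "f", "h", "i"], ["c", "f"]
L3, L4 = ["e", "g", "h"], ["d", "g"]

print(definitely_overlap(aug, L1, L3))  # L1 against L3
print(definitely_overlap(aug, L2, L4))  # L2 against L4
print(can_fuse(aug, L4, L2))            # can L4 be followed by L2?
```

Under these assumptions L2 and L4 are the only pair that does not definitely overlap, which is why they are the fusion candidates.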

Page 13: CMPUT680 - Winter 2001

Lineage Fusion

Lineages:
L1 = [a, b, f, h, i)
L2 = [c, f)
L3 = [e, g, h)
L4 = [d, g)

[Figure: Augmented Data Dependence Graph and Lineage Interference Graph over L1, L2, L3, L4]

When Lu = [u1, u2, …, um) and Lv = [v1, v2, …, vn) are fused:

(1) a scheduling edge from um to v1 is introduced in the augmented DDG
(2) Lu and Lv are removed from the LIG
(3) a new lineage Lw = Lu Lv is inserted in the LIG

Page 14: CMPUT680 - Winter 2001

Lineage Fusion Condition

Lineages:
L1 = [a, b, f, h, i)
L3 = [e, g, h)
L5 = [d, g) [c, f)

[Figure: Augmented Data Dependence Graph and Lineage Interference Graph over L1, L3, L5]

Thus the fusion of L4 with L2 forms L5 = [d, g) [c, f)

How many colors do we need to color the LIG?

Page 15: CMPUT680 - Winter 2001

Lineage Fusion Condition

Lineages:
L1 = [a, b, f, h, i)
L3 = [e, g, h)
L5 = [d, g) [c, f)

[Figure: Augmented Data Dependence Graph and Lineage Interference Graph over L1, L3, L5]

We need three colors.

Can we find an instruction sequence?

Page 16: CMPUT680 - Winter 2001

Sequencing by List Scheduling

Lineages:
L1 = [a, b, f, h, i)
L3 = [e, g, h)
L5 = [d, g) [c, f)

Registers: RA, RB, RC

[Figure: Lineage Interference Graph over L1, L3, L5, and Augmented Data Dependence Graph]

Sequence: (empty)

Page 17: CMPUT680 - Winter 2001

Sequencing by List Scheduling

Lineages:
L1 = [a, b, f, h, i)
L3 = [e, g, h)
L5 = [d, g) [c, f)

Registers: RA, RB, RC

[Figure: Lineage Interference Graph over L1, L3, L5, and Augmented Data Dependence Graph]

Sequence: a

Page 18: CMPUT680 - Winter 2001

Sequencing by List Scheduling

Lineages:
L1 = [a, b, f, h, i)
L3 = [e, g, h)
L5 = [d, g) [c, f)

Registers: RA, RB, RC

[Figure: Lineage Interference Graph over L1, L3, L5, and Augmented Data Dependence Graph]

Sequence: a d

Page 19: CMPUT680 - Winter 2001

Sequencing by List Scheduling

Lineages:
L1 = [a, b, f, h, i)
L3 = [e, g, h)
L5 = [d, g) [c, f)

Registers: RA, RB, RC

[Figure: Lineage Interference Graph over L1, L3, L5, and Augmented Data Dependence Graph]

Sequence: a d e

Page 20: CMPUT680 - Winter 2001

Sequencing by List Scheduling

Lineages:
L1 = [a, b, f, h, i)
L3 = [e, g, h)
L5 = [d, g) [c, f)

Registers: RA, RB, RC

[Figure: Lineage Interference Graph over L1, L3, L5, and Augmented Data Dependence Graph]

Sequence: a d e g

Page 21: CMPUT680 - Winter 2001

Sequencing by List Scheduling

Lineages:
L1 = [a, b, f, h, i)
L3 = [e, g, h)
L5 = [d, g) [c, f)

Registers: RA, RB, RC

[Figure: Lineage Interference Graph over L1, L3, L5, and Augmented Data Dependence Graph]

Sequence: a d e g c

Page 22: CMPUT680 - Winter 2001

Sequencing by List Scheduling

Lineages:
L1 = [a, b, f, h, i)
L3 = [e, g, h)
L5 = [d, g) [c, f)

Registers: RA, RB, RC

[Figure: Lineage Interference Graph over L1, L3, L5, and Augmented Data Dependence Graph]

Sequence: a d e g c b

Page 23: CMPUT680 - Winter 2001

Sequencing by List Scheduling

Lineages:
L1 = [a, b, f, h, i)
L3 = [e, g, h)
L5 = [d, g) [c, f)

Registers: RA, RB, RC

[Figure: Lineage Interference Graph over L1, L3, L5, and Augmented Data Dependence Graph]

Sequence: a d e g c b f

Page 24: CMPUT680 - Winter 2001

Sequencing by List Scheduling

Lineages:
L1 = [a, b, f, h, i)
L3 = [e, g, h)
L5 = [d, g) [c, f)

Registers: RA, RB, RC

[Figure: Lineage Interference Graph over L1, L3, L5, and Augmented Data Dependence Graph]

Sequence: a d e g c b f h

Page 25: CMPUT680 - Winter 2001

Sequencing by List Scheduling

Lineages:
L1 = [a, b, f, h, i)
L3 = [e, g, h)
L5 = [d, g) [c, f)

Registers: RA, RB, RC

[Figure: Lineage Interference Graph over L1, L3, L5, and Augmented Data Dependence Graph]

Sequence: a d e g c b f h i
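The register requirement of a finished sequence can be checked by simulating value lifetimes. The following is a minimal Python sketch under two assumptions that are not stated on the slides: a value dies when its last consumer is issued, and the result of an instruction may reuse a register freed by an operand that dies at that same instruction (which is how a lineage passes one register along). The DDG edges are the ones implied by the example code listing later in the slides, and `register_need` is a hypothetical name.

```python
def register_need(succ, sequence):
    """Peak number of simultaneously live values for a given issue order."""
    pending = {u: len(vs) for u, vs in succ.items()}   # unissued consumers
    pred = {u: [] for u in succ}
    for u, vs in succ.items():
        for v in vs:
            pred[v].append(u)
    live = peak = 0
    for node in sequence:
        for p in pred[node]:
            pending[p] -= 1
            if pending[p] == 0:      # last consumer issued: the value dies
                live -= 1
        if succ[node]:               # node defines a value that is consumed
            live += 1
        peak = max(peak, live)
    return peak

ddg = {
    "a": ["b", "c", "d", "e"],
    "b": ["f"], "c": ["f"], "d": ["g"], "e": ["g"],
    "f": ["h"], "g": ["h"], "h": ["i"], "i": [],
}

print(register_need(ddg, "adegcbfhi"))   # the lineage-guided sequence
print(register_need(ddg, "abcdefghi"))   # plain program order
```

Under these assumptions the lineage-guided order peaks at three live values, while plain program order peaks at four.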

Page 26: CMPUT680 - Winter 2001

Summary of Our Solution Method

A “good” construction algorithm for the LIG (dynamic)

An effective heuristic method to calculate the HRB

An efficient scheduling method (does not backtrack)

[Figure: flow chart: DDG → Form Lineage Interference Graph (LIG) → Derive HRB → Extended list-scheduling guided by HRB → A good instruction sequence]

Page 27: CMPUT680 - Winter 2001

Register Saturation (Touati)

Given a data dependence graph G, the register saturation (RS) of G is the maximal register need over all schedules of G.

Touati’s strategy is to compute the RS of G and, if the RS exceeds the number of available registers, to reduce the RS by introducing new arcs in G.

The intuition is that by using either (1) all available registers or (2) the maximal number of registers that G can use, instruction-level parallelism is maximized.

Page 28: CMPUT680 - Winter 2001

The HRB and the RS

Govind, Gao, Yang, Amaral, and Zhang had earlier proposed an alternative method: find a heuristic register bound (HRB) and use it to guide a modified list scheduling. Their goal is to find a schedule that uses a minimum number of registers.

To compare both methods we will apply Touati’s method to Govind et al.’s example, and Govind et al.’s method to Touati’s example.

Page 29: CMPUT680 - Winter 2001

Potential Killers

To find RS(G), we need to know which operation must kill each value generated. Touati defines the set of operations that are potential killers of the value generated by an operation $u \in G$:

$pkill_G(u) = \{\, v \in Cons(u) \mid {\downarrow}v \cap Cons(u) = \{v\} \,\}$

where ${\downarrow}v$ is the set of all descendants of v, including v, and $w \in Cons(u)$ iff $(u, w) \in G$.

Thus a node v is a potential killer of the value generated by a node u if and only if v consumes u and no descendant of v consumes u.
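The definition translates directly into a descendant-set test. The sketch below is a hedged Python illustration: it assumes every DDG edge is a flow edge (so Cons(u) is simply the successor list of u), uses the example DDG edges implied by the code listing later in the slides, and adds a hypothetical extra edge a → f in a second graph only to show a case where a consumer is not a potential killer. The names `descendants` and `potential_killers` are hypothetical.

```python
def descendants(succ, v):
    """All nodes reachable from v, including v itself."""
    seen, stack = set(), [v]
    while stack:
        u = stack.pop()
        if u not in seen:
            seen.add(u)
            stack.extend(succ[u])
    return seen

def potential_killers(succ, u):
    """Consumers v of u such that no other consumer of u descends from v."""
    cons = set(succ[u])
    return {v for v in cons if descendants(succ, v) & cons == {v}}

ddg = {
    "a": ["b", "c", "d", "e"],
    "b": ["f"], "c": ["f"], "d": ["g"], "e": ["g"],
    "f": ["h"], "g": ["h"], "h": ["i"], "i": [],
}
print(sorted(potential_killers(ddg, "a")))

# Hypothetical variant: f also consumes a, so b and c can no longer be
# the last readers of a (their descendant f reads it too).
variant = dict(ddg, a=["b", "c", "d", "e", "f"])
print(sorted(potential_killers(variant, "a")))
```

In the variant graph the potential killing graph is no longer identical to the DDG, because the edges a → b and a → c drop out.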

Page 30: CMPUT680 - Winter 2001

Potential Killing Graph

The edges of the Potential Killing Graph of a DDG G, PK(G) = (V, EPK), are defined as follows:

$E_{PK} = \{(u,v) \mid u \in V_R \wedge v \in pkill_G(u)\}$

VR is the set of operations that define a value, i.e., operations that need a register.

Page 31: CMPUT680 - Winter 2001

Govind’s Example: Data Dependency Graph

(a) t1 := ld(x);
(b) t2 := t1 + 4;
(c) t3 := t1 * 8;
(d) t4 := t1 - 4;
(e) t5 := t1 / 2;
(f) t6 := t2 * t3;
(g) t7 := t4 - t5;
(h) t8 := t6 * t7;
(i) st(y,t8);

[Figure: DDG G with node levels a; b, c, d, e; f, g; h; i]

Page 32: CMPUT680 - Winter 2001

Govind’s Example: Potential Kill Graph

pkillG(a) = {b, c, d, e}
pkillG(b) = {f}
pkillG(c) = {f}
pkillG(d) = {g}
pkillG(e) = {g}
pkillG(f) = {h}
pkillG(g) = {h}
pkillG(h) = {i}

[Figure: DDG G]

Page 33: CMPUT680 - Winter 2001

Govind’s Example: Potential Kill Graph

[Figure: DDG G and PK(G), side by side]

* In this example the DDG G and the potential kill graph PK(G) are identical. In general that is not the case.

Page 34: CMPUT680 - Winter 2001

Choosing the Killer

If a node u has more than one potential killer, Touati defines a killing function, k(u), that specifies which one among the potential killers of u will actually kill u.

A killing function imposes a scheduling order on the DDG: all other consumers of u, Cons(u), must be scheduled before k(u) is scheduled.

To represent these scheduling constraints, Touati defines an extended DAG, Gk, induced by the killing function k.

Page 35: CMPUT680 - Winter 2001

Govind’s Example: Killing Function

pkillG(a) = {b, c, d, e}
pkillG(b) = {f}
pkillG(c) = {f}
pkillG(d) = {g}
pkillG(e) = {g}
pkillG(f) = {h}
pkillG(g) = {h}
pkillG(h) = {i}

[Figure: PK(G)]

In this example, node a is the only node with multiple potential killers.

Page 36: CMPUT680 - Winter 2001

Govind’s Example: Killing Function

pkillG(a) = {b, c, d, e}
pkillG(b) = {f}
pkillG(c) = {f}
pkillG(d) = {g}
pkillG(e) = {g}
pkillG(f) = {h}
pkillG(g) = {h}
pkillG(h) = {i}

If we choose k(a) = b, we obtain the Gk shown in the figure.

[Figure: Gk]

Page 37: CMPUT680 - Winter 2001

Selecting a Good Set of Killers...

If the killing function is chosen arbitrarily when several nodes have multiple potential killers, it might induce cycles in Gk.

A valid killing function is one that does not induce cycles in Gk.
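Building Gk and testing validity can be sketched in a few lines. This Python sketch is illustrative only: it assumes Cons(u) is the successor list of u, adds an arc from every other consumer of u to k(u) as described on the "Choosing the Killer" slide, and checks for cycles with Kahn's topological-sort algorithm (a standard technique swapped in here, not something the slides prescribe). The four-node graph `tiny` is a hypothetical example; `extended_dag` and `is_acyclic` are hypothetical names.

```python
def extended_dag(succ, k):
    """Add an arc from every other consumer of u to the chosen killer k(u)."""
    gk = {u: set(vs) for u, vs in succ.items()}
    for u, killer in k.items():
        for v in succ[u]:
            if v != killer:
                gk[v].add(killer)
    return gk

def is_acyclic(succ):
    """Kahn's algorithm: a graph is acyclic iff every node can be peeled off."""
    indeg = {u: 0 for u in succ}
    for vs in succ.values():
        for v in vs:
            indeg[v] += 1
    ready = [u for u, d in indeg.items() if d == 0]
    peeled = 0
    while ready:
        u = ready.pop()
        peeled += 1
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    return peeled == len(succ)

ddg = {
    "a": ["b", "c", "d", "e"],
    "b": ["f"], "c": ["f"], "d": ["g"], "e": ["g"],
    "f": ["h"], "g": ["h"], "h": ["i"], "i": [],
}
k = {"a": "b", "b": "f", "c": "f", "d": "g", "e": "g",
     "f": "h", "g": "h", "h": "i"}
gk = extended_dag(ddg, k)
print(is_acyclic(gk))          # k(a) = b is a valid choice

# Hypothetical graph: values p and q are both consumed by x and y.
tiny = {"p": ["x", "y"], "q": ["x", "y"], "x": [], "y": []}
bad_k = {"p": "x", "q": "y"}   # forces y before x AND x before y
print(is_acyclic(extended_dag(tiny, bad_k)))
```

The second killing function is invalid: it asks for x to be scheduled last among the consumers of p and for y to be scheduled last among the consumers of q, which no schedule can satisfy.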

Page 38: CMPUT680 - Winter 2001

Avoiding Vengeance...

A killer must kill before it has children, thus...

The descendants of k(u) cannot be simultaneously alive with u. Touati defines the Disjoint Value Graph, DVk(G) = (VR, EDV), by:

$E_{DV} = \{(u,v) \mid u, v \in V_R \wedge v \in {\downarrow_R} k(u)\}$

where ${\downarrow_R} k(u)$ denotes the descendants of k(u) that belong to VR.

An edge (u,v) in DVk(G) means that the live interval of u is always before the live interval of v in any schedule of Gk.

Page 39: CMPUT680 - Winter 2001

Govind’s Example: Disjoint Value Graph

k(a) = {b}
k(b) = {f}
k(c) = {f}
k(d) = {g}
k(e) = {g}
k(f) = {h}
k(g) = {h}
k(h) = {i}

[Figure: Gk and DVk(G), side by side]

* DVk(G) is simplified by transitive reduction

Page 40: CMPUT680 - Winter 2001

Register Need and Maximal Antichains

The register need of any schedule of Gk is always less than or equal to the size of a maximal antichain in DVk(G).

An antichain in a graph G = (V, E) is a set of nodes A such that there are no paths between the nodes in A:

$A = \{\, u, v \in V \mid (u,v) \notin E^c \wedge (v,u) \notin E^c \,\}$

where $E^c$ is the edge set of the transitive closure of G: $(u,v) \in E^c$ iff there exists a path $p = (u, \ldots, v)$ in G.

Page 41: CMPUT680 - Winter 2001

Govind’s Example: Maximal Antichain

[Figure: DVk(G)]

The maximal antichain in this example is:

AMk = {a, c, d, e}

Thus this graph, with this killing function, can use at most 4 registers.
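For a graph this small, the maximal antichain can be found by exhaustive search. The Python sketch below is only an illustration: it assumes Gk is the example DDG plus the arcs c → b, d → b, e → b induced by k(a) = b, takes VR to be every node except the store i, reads the DV edge rule as "v is a value-defining descendant of k(u), including k(u) itself", and uses a brute-force subset search that is exponential and is not the method used in the cited papers. All function names are hypothetical.

```python
from itertools import combinations

def descendants(succ, v):
    """All nodes reachable from v, including v itself."""
    seen, stack = set(), [v]
    while stack:
        u = stack.pop()
        if u not in seen:
            seen.add(u)
            stack.extend(succ[u])
    return seen

def disjoint_value_graph(gk, k, values):
    """Edge (u, v) when v is a value-defining descendant of the killer k(u)."""
    return {u: {v for v in descendants(gk, k[u]) if v in values}
            for u in values}

def maximal_antichain(dv):
    """Largest set of pairwise unreachable nodes (brute force, tiny graphs only)."""
    nodes = sorted(dv)
    below = {u: descendants(dv, u) - {u} for u in nodes}
    for size in range(len(nodes), 0, -1):
        for cand in combinations(nodes, size):
            if all(v not in below[u] and u not in below[v]
                   for u, v in combinations(cand, 2)):
                return set(cand)
    return set()

gk = {
    "a": {"b", "c", "d", "e"},
    "b": {"f"}, "c": {"f", "b"}, "d": {"g", "b"}, "e": {"g", "b"},
    "f": {"h"}, "g": {"h"}, "h": {"i"}, "i": set(),
}
k = {"a": "b", "b": "f", "c": "f", "d": "g", "e": "g",
     "f": "h", "g": "h", "h": "i"}
dv = disjoint_value_graph(gk, k, set(k))
am = maximal_antichain(dv)
print(sorted(am), len(am))
```

Several antichains of the same size may exist; the search returns the first one in lexicographic order, so only the size should be relied on in general.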

Page 42: CMPUT680 - Winter 2001

Register Saturating Scheduling

Touati proves that:

For every valid killing function k, there is always a schedule that makes all the values in the maximal antichain of the disjoint value DAG DVk(G) simultaneously alive.

Page 43: CMPUT680 - Winter 2001

Saturating Killing Function

To find the register saturation of a DDG, we need to find a killing function that maximizes the maximal antichain in DVk(G).

In other words, we need to find a killing function that maximizes the number of nodes that are not connected by a path in DVk(G).

Touati calls this the maximizing maximal antichain (MMA) problem. A solution to the MMA problem is a saturating killing function. MMA is NP-complete.

Page 44: CMPUT680 - Winter 2001

Heuristic to Compute Register Saturation

To compute the register saturation, Touati starts by decomposing the potential kill graph PK(G) into connected bipartite components.

A bipartite component, cb = (Scb, Tcb, Ecb), is a graph with a set of source nodes Scb, a set of target nodes Tcb, and a set of edges Ecb. cb must obey the following conditions:

If $e \in E_{PK}$, $e' \in E_{cb}$, and e, e’ share an endpoint, then $e \in E_{cb}$.

$\nexists\, e, e' \in E_{cb}$ such that $target(e) = source(e')$.

Page 45: CMPUT680 - Winter 2001

Bipartite Decomposition of PK(G)

A bipartite decomposition of the potential killing graph PK(G) is a set of bipartite components such that for every edge $e \in PK(G)$, there is a bipartite component cb in the decomposition such that $e \in E_{cb}$.

Touati proves that given a DDG G, there is only one bipartite decomposition of PK(G).

Page 46: CMPUT680 - Winter 2001

Govind’s Example: Bipartite Decomposition

[Figure: PK(G) and its bipartite decomposition]

Page 47: CMPUT680 - Winter 2001

Saturating Killing Set

Touati defines the Saturating Killing Set of a connected bipartite component cb, SKS(cb), as a subset of the target nodes, $T'_{cb} \subseteq T_{cb}$, such that:

(1) All the source nodes, Scb, are contained in the union of all predecessors of the nodes in T’cb.

(2) T’cb contains a minimum number of nodes.

Computing the SKS is an NP-complete problem.
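Conditions (1) and (2) describe a minimum set cover of the sources by the predecessor sets of the targets, so a standard greedy set-cover heuristic gives a quick approximation. The Python sketch below is an assumption-laden illustration and not the heuristic from the cited papers: `greedy_killing_set` is a hypothetical name, ties are broken alphabetically, and the second component is derived from the pkillG sets listed for the larger example later in the slides.

```python
def greedy_killing_set(sources, preds_of_target):
    """Greedy set cover: repeatedly pick the target covering most uncovered sources."""
    uncovered = set(sources)
    chosen = []
    while uncovered:
        best = max(sorted(preds_of_target),
                   key=lambda t: len(preds_of_target[t] & uncovered))
        if not preds_of_target[best] & uncovered:
            raise ValueError("some source has no target that covers it")
        chosen.append(best)
        uncovered -= preds_of_target[best]
    return chosen

# Top component of Govind's example: S = {a}, T = {b, c, d, e}.
top = {"b": {"a"}, "c": {"a"}, "d": {"a"}, "e": {"a"}}
print(greedy_killing_set({"a"}, top))

# Component with S = {b, c, g} and T = {d, e, j, k}, built from
# pkill(b) = {d, e}, pkill(c) = {e, j, k}, pkill(g) = {d, j, k}.
mid = {"d": {"b", "g"}, "e": {"b", "c"}, "j": {"c", "g"}, "k": {"c", "g"}}
print(greedy_killing_set({"b", "c", "g"}, mid))
```

Greedy set cover is not guaranteed to return a minimum-size set in general; in the second component no single target covers all three sources, so a two-node answer is optimal there.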

Page 48: CMPUT680 - Winter 2001

Govind’s Example: Saturating Killing Set

[Figure: bipartite decomposition of PK(G)]

In this example the computation of the SKS is trivial. The only component with a non-unitary target set is the top one.

The selection of any single node in the set Tcb = {b, c, d, e} covers the set Scb = {a}. Thus the selection can be arbitrary.

Page 49: CMPUT680 - Winter 2001

Govind’s Example

As we saw earlier, with k(a) = b the register saturation in Govind’s example is 4, and a schedule that has four values alive at the same time can be found.

Using the lineage method, Govind et al. found a schedule for their example that uses three registers. What does Touati’s method do if only three registers are available?

Page 50: CMPUT680 - Winter 2001

Reducing RS

Touati proposes an algorithm to reduce the register saturation while trying not to increase the length of the critical path.

The algorithm starts by computing the maximal antichain AMk. Then it starts an iterative process in which the first step is to construct the set Uk of all admissible serializations between the saturating values in AMk, together with their costs.

Page 51: CMPUT680 - Winter 2001

Admissible Serializations

A serialization $u \to v$ means that the kill of u must always be carried out before the definition of v.

If v is one of the potential killers of u, then to produce the serialization $u \to v$ we must add arcs from all other potential killers of u to v. This way we ensure that the live ranges of u and v will not overlap.

If v is not a potential killer of u, then to produce the serialization $u \to v$ we must add arcs from all nodes $u' \in pkill_G(u)$ to v, as long as there is no path from v to u’.

Page 52: CMPUT680 - Winter 2001

Cost of Serializations

The cost function of a serialization is defined as

$\omega(u \to v) = (\omega_1, \omega_2)$

$\omega_1$ predicts the reduction in the saturation value produced by the serialization. It is computed by:

$\omega_1 = \mu_1 - \mu_2$

$\mu_1$ is the number of saturating values serialized after u if this serialization is carried out.

$\mu_2$ is the number of descendants of u that can become simultaneously alive with u.

$\omega_2$ is the increase in the critical path.

Page 53: CMPUT680 - Winter 2001

Govind’s Example: Reducing RS

With the killing function k(a) = {b}, the saturating values are:

AMk = {a, c, d, e}

pkillG(a) = {b, c, d, e}

[Figure: Gk]

For a serialization $u \to v$ to be admissible, the following condition must be true:

$\forall v' \in pkill(u):\ \neg (v < v')$

i.e., there are no paths from v to any potential killer of u.

Page 54: CMPUT680 - Winter 2001

Govind’s Example: Reducing RS

With the killing function k(a) = {b}, the saturating values are:

AMk = {a, c, d, e}

pkillG(a) = {b, c, d, e}

[Figure: Gk]

Thus, there is no admissible serialization from a to any of the other saturating values, because $b \in pkill_G(a)$ and there are paths from c, d, and e to b in Gk.

Page 55: CMPUT680 - Winter 2001

Govind’s Example: Reducing RS

With the killing function k(a) = {b}, the saturating values are:

AMk = {a, c, d, e}

pkillG(a) = {b, c, d, e}

[Figure: Gk]

$c \to d$ and $c \to e$ are not admissible serializations either, because $f \in pkill_G(c)$ and d < f, e < f.

Page 56: CMPUT680 - Winter 2001

Govind’s Example: Reducing RS

With the killing function k(a) = {b}, the saturating values are:

AMk = {a, c, d, e}

pkillG(a) = {b, c, d, e}

[Figure: Gk]

$d \to e$ is not admissible because $g \in pkill_G(d)$ and e < g.

$e \to d$ is not admissible because $g \in pkill_G(e)$ and d < g.

Page 57: CMPUT680 - Winter 2001

Govind’s Example: Reducing RS

With the killing function k(a) = {b}, the saturating values are:

AMk = {a, c, d, e}

pkillG(a) = {b, c, d, e}

[Figure: Gk]

Thus the admissible serializations in this example are: $d \to c$, $e \to c$
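The admissibility test and the arcs each serialization adds can be enumerated mechanically. This is a hedged Python sketch: it assumes Gk is the example DDG plus the arcs c → b, d → b, e → b induced by k(a) = b, reads v < v’ as "there is a path of at least one edge from v to v’", and merges the two arc-insertion cases into one rule (arcs from every potential killer of u other than v itself). The names `strict_descendants`, `is_admissible`, and `arcs_for` are hypothetical.

```python
def strict_descendants(succ, v):
    """Nodes reachable from v through at least one edge."""
    seen, stack = set(), list(succ[v])
    while stack:
        u = stack.pop()
        if u not in seen:
            seen.add(u)
            stack.extend(succ[u])
    return seen

def is_admissible(gk, pkill, u, v):
    """u -> v is admissible when v reaches no potential killer of u."""
    below_v = strict_descendants(gk, v)
    return not any(killer in below_v for killer in pkill[u])

def arcs_for(pkill, u, v):
    """Arcs added by u -> v: from each potential killer of u (except v) to v."""
    return {(w, v) for w in pkill[u] if w != v}

gk = {
    "a": {"b", "c", "d", "e"},
    "b": {"f"}, "c": {"f", "b"}, "d": {"g", "b"}, "e": {"g", "b"},
    "f": {"h"}, "g": {"h"}, "h": {"i"}, "i": set(),
}
pkill = {"a": {"b", "c", "d", "e"}, "b": {"f"}, "c": {"f"}, "d": {"g"},
         "e": {"g"}, "f": {"h"}, "g": {"h"}, "h": {"i"}}
saturating = ["a", "c", "d", "e"]

admissible = sorted((u, v) for u in saturating for v in saturating
                    if u != v and is_admissible(gk, pkill, u, v))
print(admissible)
for u, v in admissible:
    print(u, "->", v, "adds", sorted(arcs_for(pkill, u, v)))
```

Under these assumptions both surviving serializations insert the same single arc, from g to c.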

Page 58: CMPUT680 - Winter 2001

Govind’s Example: Reducing RS

[Figure: Gk]

In this example both serializations will cause the scheduling edge (g,c) to be added to the graph.

Thus their cost is equivalent.

Note that, for this example, reducing RS is equivalent to the lineage fusion technique in the approach of Govind et al.

Page 59: CMPUT680 - Winter 2001

Govind’s Algorithm in Touati’s Example

Now we will apply the lineage-based method proposed by Govind et al. to the DDG presented by Touati.

In the next slide we transcribe the code and the DDG as presented by Touati.

Page 60: CMPUT680 - Winter 2001

A Trivial Example

pkillG(x) = {k}
pkillG(y) = {z}
pkillG(z) = {k}
pkillG(k) = {z}

[Figure: DDG and PKG over the nodes x, y, k, t, z]

Page 61: CMPUT680 - Winter 2001

A Trivial Example (cont.)

pkillG(x) = {k}
pkillG(y) = {z}
pkillG(z) = {k}
pkillG(k) = {z}

[Figure: DDG and PKG over the nodes x, y, k, t, z]

There are no choices to be made as each node has only one potential killer.

Page 62: CMPUT680 - Winter 2001

A Trivial Example (cont.)

[Figure: DDG and DV graph over the nodes x, y, k, t, z]

The DV graph is identical to the PKG in this case, and the solution is trivial: the maximal antichain in the DV graph is {x, y, z}.

Page 63: CMPUT680 - Winter 2001

A Non-Trivial Example

pkillG(a) = {f}
pkillG(b) = {d,e}
pkillG(c) = {d,e}
pkillG(d) = {g}
pkillG(e) = {f}
pkillG(f) = {g}

[Figure: DDG over the nodes a, b, c, d, e, f, g]

Page 64: CMPUT680 - Winter 2001

A Non-Trivial Example

[Figure: the DDG and the four disjoint value graphs DVk1, DVk2, DVk3, DVk4 over the nodes a, b, c, d, e, f, g]

k1 = {(b,d),(c,d)}
k2 = {(b,d),(c,e)}
k3 = {(b,e),(c,d)}
k4 = {(b,e),(c,e)}

Page 66: CMPUT680 - Winter 2001

There are eight killing functions (DV Graphs)

[Figure: eight DV graphs over the nodes a, b, c, d, e, f, one per killing function]

k={(a,b),(b,d),(c,d)}
k={(a,b),(b,d),(c,e)}
k={(a,b),(b,e),(c,d)}
k={(a,b),(b,e),(c,e)}
k={(a,c),(b,d),(c,d)}
k={(a,c),(b,d),(c,e)}
k={(a,c),(b,e),(c,d)}
k={(a,c),(b,e),(c,e)}

Page 67: CMPUT680 - Winter 2001

Maximal antichains

[Figure: the eight DV graphs over the nodes a, b, c, d, e, f, with their maximal antichains marked]

k={(a,b),(b,d),(c,d)}
k={(a,b),(b,d),(c,e)}
k={(a,b),(b,e),(c,d)}
k={(a,b),(b,e),(c,e)}
k={(a,c),(b,d),(c,d)}
k={(a,c),(b,d),(c,e)}
k={(a,c),(b,e),(c,d)}
k={(a,c),(b,e),(c,e)}

Page 68: CMPUT680 - Winter 2001

A More Non-Trivial Example

[Figure: DDG with node levels a; b, c, g; d, e, j, k; f, m; n]

pkillG(a) = {b,c,g}
pkillG(b) = {d,e}
pkillG(c) = {e,j,k}
pkillG(d) = {f}
pkillG(e) = {m}
pkillG(f) = {n}
pkillG(g) = {d,j,k}
pkillG(j) = {f}
pkillG(k) = {m}

There are 3 × 2 × 3 × 3 = 54 killing functions
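The number of candidate killing functions is the product of the sizes of the potential-killer sets, since each value independently picks one killer. The short Python sketch below recomputes that product from the pkillG sets listed on this slide and enumerates the candidates; the variable names are hypothetical, and the count includes candidates that may turn out to be invalid (cyclic).

```python
from itertools import product
from math import prod

pkill = {
    "a": ["b", "c", "g"], "b": ["d", "e"], "c": ["e", "j", "k"],
    "d": ["f"], "e": ["m"], "f": ["n"],
    "g": ["d", "j", "k"], "j": ["f"], "k": ["m"],
}

# One independent choice per value: multiply the set sizes.
count = prod(len(killers) for killers in pkill.values())

# Enumerate every candidate killing function as a dict value -> killer.
values = sorted(pkill)
candidates = [dict(zip(values, choice))
              for choice in product(*(pkill[v] for v in values))]

print(count, len(candidates))
```

For these sets the product 3 × 2 × 3 × 3 evaluates to 54; only a, b, c, and g contribute factors larger than one.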

Page 69: CMPUT680 - Winter 2001

Govind’s Algorithm in Touati’s Example

(a) fload [i1], fRa
(b) fload [i2], fRb
(c) fload [i3], fRc
(d) fmult fRa, fRb, fRd
(e) imultadd fRa, fRb, fRc, iRe
(g) ftoint fRc, iRg
(i) iadd iRg, 4, iRi
(f) fmultadd_setz fRb, iRi, fRc, fRf, gf
(h) fdiv fRd, iRe, fRh
(j) gf ? fadd_setbnz fRj, 1 , fRj, gj
(k) gf | gj ? fsub fRk, 1 , fRk

[Figure: DDG over the nodes a, b, c, d, e, f, g, h, i, j, k, with edges labeled fRc, iRg, iRi, gf, gj, iRe, fRd]

Touati concentrates on the blue edges, which represent the flow of floating point values.

Page 70: CMPUT680 - Winter 2001

Govind’s Algorithm in Touati’s Example

We will also concentrate on the floating point value flow. Thus the simplified DDG is shown in the figure.

Although the modified list scheduling requires a source and a sink node, the lineage formation process does not consider the source and the sink node.

[Figure: simplified DDG over the nodes a, b, c, d, e, f, g, h, i, j, k]

Page 71: CMPUT680 - Winter 2001

Govind’s Algorithm in Touati’s Example

Step 1: Compute the heights

[Figure: simplified DDG annotated with node heights]

Page 72: CMPUT680 - Winter 2001

Govind’s Algorithm in Touati’s Example

Step 1: Compute the heights
Step 2: First lineage formation: L1 = [a, e)

[Figure: simplified DDG annotated with node heights]

Page 73: CMPUT680 - Winter 2001

Govind’s Algorithm in Touati’s Example

Step 1: Compute the heights
Step 2: First lineage formation: L1 = [a, e)
Step 3: Second lineage formation: L2 = [b, f)

[Figure: simplified DDG annotated with node heights]

Page 74: CMPUT680 - Winter 2001

Govind’s Algorithm in Touati’s Example

Step 1: Compute the heights
Step 2: First lineage formation: L1 = [a, e)
Step 3: Second lineage formation: L2 = [b, f)
Recompute heights

[Figure: simplified DDG annotated with the recomputed node heights]

Page 75: CMPUT680 - Winter 2001

Govind’s Algorithm in Touati’s Example

Step 1: Compute the heights
Step 2: First lineage formation: L1 = [a, e)
Step 3: Second lineage formation: L2 = [b, f)
Recompute heights
Step 4: Third lineage formation: L3 = [c, f)

[Figure: simplified DDG annotated with the recomputed node heights]

Page 76: CMPUT680 - Winter 2001

Govind’s Algorithm in Touati’s Example

Step 1: Compute the heights
Step 2: First lineage formation: L1 = [a, e)
Step 3: Second lineage formation: L2 = [b, f)
Recompute heights
Step 4: Third lineage formation: L3 = [c, f)
Recompute heights

[Figure: simplified DDG annotated with the recomputed node heights]

Page 77: CMPUT680 - Winter 2001

Govind’s Algorithm in Touati’s Example

Step 1: Compute the heights
Step 2: First lineage formation: L1 = [a, e)
Step 3: Second lineage formation: L2 = [b, f)
Recompute heights
Step 4: Third lineage formation: L3 = [c, f)
Recompute heights
Step 5: Fourth lineage formation: L4 = [d, h)

[Figure: simplified DDG annotated with the recomputed node heights]

Page 79: CMPUT680 - Winter 2001

Govind’s Algorithm in Touati’s Example

L1 = [a, e)
L2 = [b, f)
L3 = [c, f)
L4 = [d, h)

Lineage Source Nodes: S = {a, b, c, d}

Lineage End Nodes: S = {e, f, h}

Reach Relation:

     e  f  h
a    1  1  1
b    1  1  1
c    1  1  0
d    1  1  1

[Figure: simplified DDG annotated with node heights]

Page 80: CMPUT680 - Winter 2001

Govind’s Algorithm in Touati’s Example

L1 = [a, e)
L2 = [b, f)
L3 = [c, f)
L4 = [d, h)

Reach Relation:

     e  f  h
a    1  1  1
b    1  1  1
c    1  1  0
d    1  1  1

Because d can reach f, but c cannot reach h, we can fuse lineages L4 and L3 to create a new lineage L5 = [d, h) [c, f). This fusion requires a sequencing edge from h to c.

[Figure: simplified DDG annotated with node heights]

Page 81: CMPUT680 - Winter 2001

Govind’s Algorithm in Touati’s Example

L1 = [a, e)
L2 = [b, f)
L5 = [d, h) [c, f)

Reach Relation:

     e  f  h
a    1  1  1
b    1  1  1
c    1  1  0
d    1  1  1

Because there are no more 0’s in the Reach relation matrix, no more lineage fusion is possible.

[Figure: simplified DDG annotated with node heights]

Page 82: CMPUT680 - Winter 2001

Govind’s Algorithm in Touati’s Example

L1 = [a, e)
L2 = [b, f)
L5 = [d, h) [c, f)

Reach Relation:

     e  f  h
a    1  1  1
b    1  1  1
c    1  1  0
d    1  1  1

[Figure: simplified DDG annotated with node heights, and Lineage Interference Graph over L1, L2, L5]

We need three colors:
L1 = RA
L2 = RB
L5 = RC

Page 83: CMPUT680 - Winter 2001

Govind’s Algorithm in Touati’s Example

L1 = [a, e)
L2 = [b, f)
L5 = [d, h) [c, f)

Registers: RA, RB, RC

[Figure: simplified DDG and Lineage Interference Graph over L1, L2, L5]

Sequence: (empty)

Page 84: CMPUT680 - Winter 2001

Govind’s Algorithm in Touati’s Example

L1 = [a, e)
L2 = [b, f)
L5 = [d, h) [c, f)

Registers: RA, RB, RC

[Figure: simplified DDG and Lineage Interference Graph over L1, L2, L5]

Sequence: a

Page 85: CMPUT680 - Winter 2001

Govind’s Algorithm in Touati’s Example

L1 = [a, e)
L2 = [b, f)
L5 = [d, h) [c, f)

Registers: RA, RB, RC

[Figure: simplified DDG and Lineage Interference Graph over L1, L2, L5]

Sequence: a b

Page 86: CMPUT680 - Winter 2001

Govind’s Algorithm in Touati’s Example

L1 = [a, e)
L2 = [b, f)
L5 = [d, h) [c, f)

Registers: RA, RB, RC

[Figure: simplified DDG and Lineage Interference Graph over L1, L2, L5]

Sequence: a b d

Page 87: CMPUT680 - Winter 2001

Govind’s Algorithm in Touati’s Example

L1 = [a, e)
L2 = [b, f)
L5 = [d, h) [c, f)

Registers: RA, RB, RC

[Figure: simplified DDG and Lineage Interference Graph over L1, L2, L5]

Sequence: a b d h

Page 88: CMPUT680 - Winter 2001

Govind’s Algorithm in Touati’s Example

L1 = [a, e)
L2 = [b, f)
L5 = [d, h) [c, f)

Registers: RA, RB, RC

[Figure: simplified DDG and Lineage Interference Graph over L1, L2, L5]

Sequence: a b d h c

Page 89: CMPUT680 - Winter 2001

Govind’s Algorithm in Touati’s Example

L1 = [a, e)
L2 = [b, f)
L5 = [d, h) [c, f)

Registers: RA, RB, RC

[Figure: simplified DDG and Lineage Interference Graph over L1, L2, L5]

Sequence: a b d h c e

Page 90: CMPUT680 - Winter 2001

Govind’s Algorithm in Touati’s Example

L1 = [a, e)
L2 = [b, f)
L5 = [d, h) [c, f)

Registers: RA, RB, RC

[Figure: simplified DDG and Lineage Interference Graph over L1, L2, L5]

Sequence: a b d h c e g

Page 91: CMPUT680 - Winter 2001

Govind’s Algorithm in Touati’s Example

L1 = [a, e)
L2 = [b, f)
L5 = [d, h) [c, f)

Registers: RA, RB, RC

[Figure: simplified DDG and Lineage Interference Graph over L1, L2, L5]

Sequence: a b d h c e g f

Page 92: CMPUT680 - Winter 2001

Comparing the Methods

Touati’s method allows the creation of schedules that use from 7 down to 3 registers (in his CC2001 paper he reduced from 7 to 4), according to the number of registers available for the basic block.

Govind et al.’s method will always create a schedule using three registers for this basic block, regardless of the number of registers available for the basic block.

Page 93: CMPUT680 - Winter 2001

Conjecture

If the scheduler in an out-of-order instruction issue processor is optimal and the register renaming has an infinite number of hidden registers, both methods should be equivalent, and the lineage-based one is simpler.

With a limited number of hidden registers for renaming, and a sub-optimal runtime scheduler, Touati’s method is likely to produce better results because it makes better use of the available registers.

Page 94: CMPUT680 - Winter 2001

Research Questions

How well do the two methods compare in an actual superscalar processor such as the MIPS R12K?

Touati claims that his method will work well in VLIW machines too. How would it compare with the lineage method in the IA-64?

The allocation of registers to basic blocks by the global register scheduler might affect Touati’s method significantly. How can his LRA be integrated with a GRA?

Page 95: CMPUT680 - Winter 2001

Summary of Our Solution Method

A “good” construction algorithm for the LIG (dynamic)

An effective heuristic method to calculate the HRB

An efficient scheduling method (does not backtrack)

[Figure: flow chart: DDG → Form Lineage Interference Graph (LIG) → Derive HRB → Extended list-scheduling guided by HRB → A good instruction sequence]