CMPUT680 - Winter 2001

CMPUT 680 - Compiler Design and Optimization

Register Minimization vs. Register Saturation

José Nelson Amaral
http://www.cs.ualberta.ca/~amaral/courses/680


Reading List

Touati, S.-A.-A., “Register Saturation in Superscalar and VLIW Codes,” 10th International Conference on Compiler Construction (CC 2001), Genova, Italy, April 2001, pp. 213-228.

Touati, S.-A.-A., Thomasset, F., “Register Saturation in Data Dependence Graphs,” Research Report RR-3978, INRIA, July 2000.

Touati, S.-A.-A., “Optimal Register Saturation in Acyclic Superscalar and VLIW Codes,” Research Report, INRIA, Nov. 2000.


Minimum Register Instruction Sequence (MRIS)

Problem

Given the Data Dependence Graph G for a basic block, derive an instruction sequence S for G that is optimal in the sense that its register requirement is minimum.


Intuition for Our Solution

[Figure: data dependence graph; a feeds b, c, d, and e; b and c feed f; d and e feed g; f and g feed h; h feeds i]

Our intuition is to find subsets of nodes that can definitely share a register, and to use them to inform the instruction sequencing algorithm.


Instruction Lineages

[Figure: data dependence graph with the lineage a, b, f, h highlighted]

An instruction lineage is a sequence of instructions in which a single register is passed from instruction to instruction (except for the last, which only reads the value).

How can we ensure that instructions a, b, f, and h will be able to share the same register?

L1 = [a, b, f, h, i)


Sequencing Edges

[Figure: augmented data dependence graph]

Lineage formation imposes a scheduling restriction on the DDG: the selected heir of a node must be scheduled after all of its siblings.

L1 = [a, b, f, h, i)

Thus lineage formation inserts sequencing edges into the DDG, from each sibling of the heir to the heir (here c → b, d → b, and e → b).


Node Height

[Figure: augmented data dependence graph]

L1 = [a, b, f, h, i)

If the introduction of sequencing edges were to produce a cycle in the DDG, it would be impossible to find a legal instruction sequence.

Thus we use the height of the nodes, recomputed after each lineage formation, to select the heir. Ties are broken arbitrarily.
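A minimal Python sketch of lineage formation on Govind et al.'s example, assuming details the slides leave open: the heir is the child with the smallest height (such a child cannot reach a sibling, so the sequencing edges cannot close a cycle), heights are refreshed before every heir choice, and all ties are broken by node name. With this tie-breaking, d and e trade places relative to the slides (L3 = [d, g, h) and L4 = [e, g)); the structure is otherwise the same. Each lineage is a list whose last element is its half-open end.

```python
# Sketch of lineage formation on Govind et al.'s example.
# Assumed details (not stated on the slides): the heir is the lowest child,
# heights are refreshed before each heir choice, ties go by node name.

DDG = {
    "a": ["b", "c", "d", "e"],
    "b": ["f"], "c": ["f"], "d": ["g"], "e": ["g"],
    "f": ["h"], "g": ["h"], "h": ["i"], "i": [],
}


def heights(succ):
    """Height = number of nodes on the longest path from a node to a sink."""
    memo = {}

    def h(n):
        if n not in memo:
            memo[n] = 1 + max((h(s) for s in succ[n]), default=0)
        return memo[n]

    return {n: h(n) for n in succ}


def form_lineages(ddg):
    """Return (lineages, augmented DDG); the last node of each lineage
    is its half-open end and does not receive the lineage's register."""
    succ = {n: set(cs) for n, cs in ddg.items()}  # data + sequencing edges
    covered, lineages = set(), []
    while len(covered) < len(ddg):
        ht = heights(succ)
        # Start at the highest node not yet in any lineage.
        node = min((n for n in ddg if n not in covered),
                   key=lambda n: (-ht[n], n))
        covered.add(node)
        lineage = [node]
        while ddg[node]:
            ht = heights(succ)
            # The lowest child cannot reach a sibling, so no cycle appears.
            heir = min(ddg[node], key=lambda c: (ht[c], c))
            for sib in ddg[node]:
                if sib != heir:
                    succ[sib].add(heir)  # sequencing edge: sibling before heir
            lineage.append(heir)
            if heir in covered:  # heir already in a lineage: this one ends
                break
            covered.add(heir)
            node = heir
        lineages.append(lineage)
    return lineages, succ


lineages, aug = form_lineages(DDG)
print(lineages)
# [['a', 'b', 'f', 'h', 'i'], ['c', 'f'], ['d', 'g', 'h'], ['e', 'g']]
```

The returned `aug` graph contains the sequencing edges c → b, d → b, and e → b, matching the augmented DDG on the slides.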



Lineage Formation

[Figure: augmented data dependence graph with lineages L1 to L4 highlighted]

L1 = [a, b, f, h, i)

For the next lineage, the highest nodes not in a lineage are c, d, and e, all with a height of 5.

L2 = [c, f)
L3 = [e, g, h)
L4 = [d, g)



Lineage Interference

L1 = [a, b, f, h, i)
L2 = [c, f)
L3 = [e, g, h)
L4 = [d, g)

Two lineages Lu = [u1, u2, …, um) and Lv = [v1, v2, …, vn) definitely overlap if:

(i) u1 reaches vn, and (ii) v1 reaches um.

[Figure: augmented data dependence graph]


Lineage Interference Graph

[Figure: augmented data dependence graph and the lineage interference graph over L1, L2, L3, and L4]

Which lineages does lineage L1 definitely overlap with?

How about lineages L2 and L4?
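The definite-overlap test and the resulting lineage interference graph (LIG) can be sketched in Python as follows. The augmented DDG below is an assumption reconstructed from the slides: Govind's DDG plus the sequencing edges c → b, d → b, and e → b introduced when L1 chose b as the heir of a. Only the first and last node of each lineage matter to the test.

```python
AUG = {  # assumed augmented DDG: data edges + c->b, d->b, e->b
    "a": {"b", "c", "d", "e"}, "b": {"f"},
    "c": {"f", "b"}, "d": {"g", "b"}, "e": {"g", "b"},
    "f": {"h"}, "g": {"h"}, "h": {"i"}, "i": set(),
}
LINEAGES = {
    "L1": ["a", "b", "f", "h", "i"], "L2": ["c", "f"],
    "L3": ["e", "g", "h"], "L4": ["d", "g"],
}


def reaches(g, u, v):
    """True if there is a (possibly empty) path from u to v in g."""
    stack, seen = [u], set()
    while stack:
        n = stack.pop()
        if n == v:
            return True
        if n not in seen:
            seen.add(n)
            stack.extend(g[n])
    return False


def definitely_overlap(g, lu, lv):
    """(i) u1 reaches vn and (ii) v1 reaches um."""
    return reaches(g, lu[0], lv[-1]) and reaches(g, lv[0], lu[-1])


def lineage_interference_graph(g, lineages):
    """One undirected edge (x, y), x < y, per definitely overlapping pair."""
    names = sorted(lineages)
    return {(x, y) for i, x in enumerate(names) for y in names[i + 1:]
            if definitely_overlap(g, lineages[x], lineages[y])}


print(sorted(lineage_interference_graph(AUG, LINEAGES)))
# [('L1', 'L2'), ('L1', 'L3'), ('L1', 'L4'), ('L2', 'L3'), ('L3', 'L4')]
```

L1 definitely overlaps every other lineage, while L2 and L4 do not, because c cannot reach g.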


Lineage Fusion Condition

[Figure: augmented data dependence graph and the lineage interference graph]

Two lineages Lu = [u1, u2, …, um) and Lv = [v1, v2, …, vn) can be fused into a single lineage if:

(i) u1 reaches vn, and (ii) v1 does not reach um.


L1 = [a, b, f, h, i)
L2 = [c, f)
L3 = [e, g, h)
L4 = [d, g)

[Figure: augmented data dependence graph and the lineage interference graph]

Which lineages can be fused in the example?

d reaches f (through the sequencing edge d → b), and c does not reach g.

Thus L4 can be fused with L2 to form L5 = [d, g) [c, f).
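The fusion condition is the overlap test with condition (ii) negated. This Python sketch checks every ordered pair of lineages on the same assumed augmented DDG (Govind's DDG plus the sequencing edges c → b, d → b, and e → b).

```python
AUG = {  # assumed augmented DDG: data edges + c->b, d->b, e->b
    "a": {"b", "c", "d", "e"}, "b": {"f"},
    "c": {"f", "b"}, "d": {"g", "b"}, "e": {"g", "b"},
    "f": {"h"}, "g": {"h"}, "h": {"i"}, "i": set(),
}
LINEAGES = {
    "L1": ["a", "b", "f", "h", "i"], "L2": ["c", "f"],
    "L3": ["e", "g", "h"], "L4": ["d", "g"],
}


def reaches(g, u, v):
    """True if there is a (possibly empty) path from u to v in g."""
    stack, seen = [u], set()
    while stack:
        n = stack.pop()
        if n == v:
            return True
        if n not in seen:
            seen.add(n)
            stack.extend(g[n])
    return False


def can_fuse(g, lu, lv):
    """Lu = [u1..um), Lv = [v1..vn): u1 reaches vn, v1 does not reach um."""
    return reaches(g, lu[0], lv[-1]) and not reaches(g, lv[0], lu[-1])


names = sorted(LINEAGES)
fusable = [(x, y) for x in names for y in names
           if x != y and can_fuse(AUG, LINEAGES[x], LINEAGES[y])]
print(fusable)  # [('L4', 'L2')]
```

The only candidate is L4 followed by L2, the fusion chosen on the slide.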


Lineage Fusion

L1 = [a, b, f, h, i)
L2 = [c, f)
L3 = [e, g, h)
L4 = [d, g)

[Figure: augmented data dependence graph and the lineage interference graph]

When Lu = [u1, u2, …, um) and Lv = [v1, v2, …, vn) are fused:

(1) a scheduling edge from um to v1 is introduced in the augmented DDG;
(2) Lu and Lv are removed from the LIG;
(3) a new lineage Lw = Lu Lv (Lu followed by Lv) is inserted in the LIG.
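The three fusion steps can be sketched in Python as below, again on the assumed augmented DDG. The hypothetical helper `fuse` stores Lw as the concatenation Lu + Lv; only its first node (u1) and last node (vn) are used by the overlap test, so that representation is enough to rebuild the LIG.

```python
AUG = {  # assumed augmented DDG: data edges + c->b, d->b, e->b
    "a": {"b", "c", "d", "e"}, "b": {"f"},
    "c": {"f", "b"}, "d": {"g", "b"}, "e": {"g", "b"},
    "f": {"h"}, "g": {"h"}, "h": {"i"}, "i": set(),
}
LINEAGES = {
    "L1": ["a", "b", "f", "h", "i"], "L2": ["c", "f"],
    "L3": ["e", "g", "h"], "L4": ["d", "g"],
}


def reaches(g, u, v):
    """True if there is a (possibly empty) path from u to v in g."""
    stack, seen = [u], set()
    while stack:
        n = stack.pop()
        if n == v:
            return True
        if n not in seen:
            seen.add(n)
            stack.extend(g[n])
    return False


def definitely_overlap(g, lu, lv):
    return reaches(g, lu[0], lv[-1]) and reaches(g, lv[0], lu[-1])


def lineage_interference_graph(g, lineages):
    names = sorted(lineages)
    return {(x, y) for i, x in enumerate(names) for y in names[i + 1:]
            if definitely_overlap(g, lineages[x], lineages[y])}


def fuse(g, lineages, x, y, new_name):
    """Fuse Lu = lineages[x] with Lv = lineages[y]; mutates g."""
    lu, lv = lineages[x], lineages[y]
    g[lu[-1]].add(lv[0])                      # (1) edge um -> v1
    fused = {n: l for n, l in lineages.items() if n not in (x, y)}  # (2)
    fused[new_name] = lu + lv                 # (3) Lw = Lu followed by Lv
    return fused


after = fuse(AUG, LINEAGES, "L4", "L2", "L5")
print(after["L5"], sorted(lineage_interference_graph(AUG, after)))
# ['d', 'g', 'c', 'f'] [('L1', 'L3'), ('L1', 'L5'), ('L3', 'L5')]
```

The new edge is g → c, and the rebuilt LIG is the triangle over L1, L3, and L5.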


L1 = [a, b, f, h, i)
L3 = [e, g, h)
L5 = [d, g) [c, f)

[Figure: augmented data dependence graph and the lineage interference graph over L1, L3, and L5]

How many colors do we need to color the LIG?

Thus the fusion of L4 with L2 forms L5 = [d, g) [c, f).


[Figure: augmented data dependence graph and the lineage interference graph over L1, L3, and L5]

We need three colors.
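For a graph this small, the color count can be checked exhaustively. The sketch below is a brute-force search for the minimum number of colors, not the heuristic register bound (HRB) procedure of Govind et al.; it only confirms the slide's answer. As a cross-check, the LIG before fusion also needs three colors, since L1, L2, and L3 form a triangle there.

```python
from itertools import product


def min_colors(nodes, edges):
    """Exhaustive search for the fewest colors with no edge monochromatic."""
    nodes = sorted(nodes)
    for k in range(1, len(nodes) + 1):
        for colors in product(range(k), repeat=len(nodes)):
            assign = dict(zip(nodes, colors))
            if all(assign[u] != assign[v] for u, v in edges):
                return k, assign
    return 0, {}


LIG_BEFORE = {("L1", "L2"), ("L1", "L3"), ("L1", "L4"),
              ("L2", "L3"), ("L3", "L4")}
LIG_AFTER = {("L1", "L3"), ("L1", "L5"), ("L3", "L5")}
print(min_colors({"L1", "L3", "L5"}, LIG_AFTER)[0])          # 3
print(min_colors({"L1", "L2", "L3", "L4"}, LIG_BEFORE)[0])   # 3
```

The search is exponential in the number of lineages, so it is only meant for examples of this size.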

Can we find an instruction sequence?


Sequencing by List Scheduling

[Figure: augmented data dependence graph, lineages L1, L3, and L5, and registers RA, RB, and RC; the sequence grows by one instruction per step]

Sequence:
a
a d
a d e
a d e g
a d e g c
a d e g c b
a d e g c b f
a d e g c b f h
a d e g c b f h i
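The final sequence can be checked mechanically. This Python sketch assumes that a value occupies a register from its definition until its last consumer issues, and that a consumer's result may reuse a register freed at that same instruction; the store i defines no value. The augmented DDG includes the sequencing edges c → b, d → b, and e → b and the fusion edge g → c. For contrast, the naive source order a b c d e f g h i needs four registers.

```python
DDG = {
    "a": ["b", "c", "d", "e"],
    "b": ["f"], "c": ["f"], "d": ["g"], "e": ["g"],
    "f": ["h"], "g": ["h"], "h": ["i"], "i": [],
}
AUG = {  # DDG + sequencing edges c->b, d->b, e->b + fusion edge g->c
    "a": {"b", "c", "d", "e"}, "b": {"f"},
    "c": {"f", "b"}, "d": {"g", "b"}, "e": {"g", "b"},
    "f": {"h"}, "g": {"h", "c"}, "h": {"i"}, "i": set(),
}


def respects(g, seq):
    """True if every edge u -> v of g has u earlier than v in seq."""
    pos = {n: t for t, n in enumerate(seq)}
    return all(pos[u] < pos[v] for u in g for v in g[u])


def register_need(ddg, seq):
    """Maximum number of values live after any instruction of seq."""
    pos = {n: t for t, n in enumerate(seq)}
    # A value is live from its definition until its last consumer issues.
    last_use = {v: max(pos[c] for c in cs) for v, cs in ddg.items() if cs}
    return max(sum(1 for v in last_use if pos[v] <= t < last_use[v])
               for t in range(len(seq)))


seq = "a d e g c b f h i".split()
print(respects(AUG, seq), register_need(DDG, seq))  # True 3
print(register_need(DDG, list("abcdefghi")))         # 4
```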


Summary of Our Solution Method

A “good” construction algorithm for the LIG (dynamic)

An effective heuristic method to calculate the heuristic register bound (HRB)

An efficient scheduling method (does not backtrack)

DDG → Form Lineage Interference Graph (LIG) → Derive HRB → Extended list scheduling guided by HRB → A good instruction sequence


Register Saturation (Touati)

Given a data dependence graph G, the register saturation (RS) of G is the maximal register need over all schedules of G.

Touati’s strategy is to compute the RS of G and, if the RS exceeds the number of available registers, to reduce the RS by introducing new arcs in G.

The intuition is that by using either (1) all available registers or (2) the maximal number of registers that G can use, instruction-level parallelism is maximized.


The HRB and the RS

Govind, Gao, Yang, Amaral, and Zhang had earlier proposed an alternative method: to find a heuristic register bound (HRB) to be used as guidance in a modified list scheduling. Their goal is to find a schedule that uses a minimum number of registers.

To compare both methods, we will apply Touati’s method to Govind et al.’s example, and Govind et al.’s method to Touati’s example.


Potential Killers

To find RS(G), we need to know which operation must kill each value generated. Touati defines the set of operations that are potential killers of the value generated by an operation u ∈ G:

$\mathrm{pkill}_G(u) = \{\, v \in \mathrm{Cons}(u) \mid {\downarrow} v \cap \mathrm{Cons}(u) = \{v\} \,\}$

where ${\downarrow} v$ is the set of all descendants of v, including v, and $w \in \mathrm{Cons}(u) \iff (u, w) \in E$, i.e., w reads the value defined by u.

Thus a node v is a potential killer of the value generated by a node u if and only if v consumes u and no other descendant of v consumes u.


Potential Killing Graph

The edges of the Potential Killing Graph of a DDG G, PK(G) = (V, EPK), are defined as follows:

$E_{PK} = \{(u, v) \mid u \in V_R \wedge v \in \mathrm{pkill}_G(u)\}$

VR is the set of operations that define a value, i.e., operations that need a register.
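Both definitions translate directly into code. In this Python sketch, Cons(u) is taken to be the set of successors of u in the DDG, and VR is assumed to be the set of nodes with at least one consumer (in Govind's example, every node except the store i). The graph `TOY` is hypothetical: u has consumers v and w, but v is an ancestor of w, so only w can kill u.

```python
DDG = {
    "a": ["b", "c", "d", "e"],
    "b": ["f"], "c": ["f"], "d": ["g"], "e": ["g"],
    "f": ["h"], "g": ["h"], "h": ["i"], "i": [],
}


def descendants(g, v):
    """The set of all descendants of v, including v itself."""
    seen, stack = set(), [v]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(g[n])
    return seen


def pkill(g, u):
    """Consumers of u with no other consumer of u among their descendants."""
    cons = set(g[u])
    return {v for v in cons if (descendants(g, v) & cons) == {v}}


VR = {u for u in DDG if DDG[u]}  # assumption: value-defining nodes
E_PK = {(u, v) for u in VR for v in pkill(DDG, u)}
DDG_EDGES = {(u, v) for u in DDG for v in DDG[u]}

TOY = {"u": ["v", "w"], "v": ["w"], "w": []}  # hypothetical graph
print(sorted(pkill(DDG, "a")), E_PK == DDG_EDGES, pkill(TOY, "u"))
# ['b', 'c', 'd', 'e'] True {'w'}
```

The `True` reproduces the observation that PK(G) and the DDG coincide in Govind's example.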


Govind’s Example: Data Dependence Graph

[Figure: DDG G, nodes a through i]

(a) t1 := ld(x);
(b) t2 := t1 + 4;
(c) t3 := t1 * 8;
(d) t4 := t1 - 4;
(e) t5 := t1 / 2;
(f) t6 := t2 * t3;
(g) t7 := t4 - t5;
(h) t8 := t6 * t7;
(i) st(y, t8);


Govind’s Example: Potential Kill Graph

[Figure: DDG G]

pkillG(a) = {b, c, d, e}
pkillG(b) = {f}
pkillG(c) = {f}
pkillG(d) = {g}
pkillG(e) = {g}
pkillG(f) = {h}
pkillG(g) = {h}
pkillG(h) = {i}


Govind’s Example: Potential Kill Graph

[Figure: DDG G and PK(G), side by side]

In this example, the DDG G and the potential kill graph PK(G) are identical. In general, that is not the case.


Choosing the Killer

If a node u has more than one potential killer, Touati defines a killing function, k(u), that specifies which one among the potential killers of u will actually kill u.

A killing function imposes a scheduling order on the DDG: all other consumers of u in Cons(u) must be scheduled before k(u).

To represent these scheduling constraints, Touati defines an extended DAG, Gk, induced by the killing function k.


Govind’s Example: Killing Function

[Figure: PK(G)]

In this example, node a is the only node with multiple potential killers.


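Govind’s Example: Killing Function

If we choose k(a) = b, we obtain the Gk shown in the figure, in which the added edges c → b, d → b, and e → b force c, d, and e to be scheduled before b.

[Figure: Gk for k(a) = b]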


Selecting a Good Set of Killers...

If the killers of several nodes that have multiple potential killers are chosen arbitrarily, the resulting killing function might induce cycles in Gk.

A valid killing function is one that does not induce cycles in Gk.
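A Python sketch of building Gk and checking validity with Kahn's topological sort. Following the slides, every other consumer of u gets an edge to k(u). The second graph, `TOY`, is hypothetical: two values u1 and u2 are both read by x and y, and choosing k(u1) = x but k(u2) = y forces x and y to each precede the other, so that killing function is invalid.

```python
DDG = {
    "a": ["b", "c", "d", "e"],
    "b": ["f"], "c": ["f"], "d": ["g"], "e": ["g"],
    "f": ["h"], "g": ["h"], "h": ["i"], "i": [],
}
K = {"a": "b", "b": "f", "c": "f", "d": "g",
     "e": "g", "f": "h", "g": "h", "h": "i"}


def build_gk(ddg, k):
    """Copy the DDG and add an edge from every other consumer of u to k(u)."""
    gk = {n: set(cs) for n, cs in ddg.items()}
    for u, killer in k.items():
        for v in ddg[u]:
            if v != killer:
                gk[v].add(killer)
    return gk


def is_acyclic(g):
    """Kahn's algorithm: the graph is a DAG iff every node gets removed."""
    indeg = {n: 0 for n in g}
    for n in g:
        for s in g[n]:
            indeg[s] += 1
    ready = [n for n in g if indeg[n] == 0]
    removed = 0
    while ready:
        n = ready.pop()
        removed += 1
        for s in g[n]:
            indeg[s] -= 1
            if indeg[s] == 0:
                ready.append(s)
    return removed == len(g)


TOY = {"u1": ["x", "y"], "u2": ["x", "y"], "x": [], "y": []}  # hypothetical
print(is_acyclic(build_gk(DDG, K)))                           # True
print(is_acyclic(build_gk(TOY, {"u1": "x", "u2": "y"})))      # False
```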


Avoiding Vengeance...

The descendants of k(u) cannot be simultaneously alive with u. Touati defines the Disjoint Value Graph, DVk(G) = (VR, EDV), by:

$E_{DV} = \{(u, v) \mid u, v \in V_R \wedge v \in {\downarrow} k(u)\}$

where ${\downarrow} k(u)$ is the set of descendants of k(u) in Gk, including k(u).

An edge (u, v) in DVk(G) means that the live interval of u is always before the live interval of v in any schedule of Gk.

A killer must kill before it has children, thus...


Govind’s Example: Disjoint Value Graph

k(a) = b, k(b) = f, k(c) = f, k(d) = g,
k(e) = g, k(f) = h, k(g) = h, k(h) = i

[Figure: Gk and DVk(G), side by side; DVk(G) is simplified by transitive reduction]
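The disjoint value graph and its transitive reduction can be computed as in this Python sketch. It hardcodes Gk for k(a) = b (Govind's DDG plus c → b, d → b, and e → b), includes k(u) itself among its descendants, and treats a through h as VR (the store i defines no value). The helper names are illustrative.

```python
GK = {  # Gk for k(a) = b: Govind's DDG plus c->b, d->b, e->b
    "a": {"b", "c", "d", "e"}, "b": {"f"},
    "c": {"f", "b"}, "d": {"g", "b"}, "e": {"g", "b"},
    "f": {"h"}, "g": {"h"}, "h": {"i"}, "i": set(),
}
K = {"a": "b", "b": "f", "c": "f", "d": "g",
     "e": "g", "f": "h", "g": "h", "h": "i"}
VR = set(K)  # a..h define values; the store i does not


def descendants(g, v):
    """v plus every node reachable from v (missing keys mean no successors)."""
    seen, stack = set(), [v]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(g.get(n, ()))
    return seen


def disjoint_value_edges(gk, k, vr):
    """E_DV: (u, v) for every value v among the descendants of k(u) in Gk."""
    return {(u, v) for u in vr if u in k
            for v in descendants(gk, k[u]) & vr}


def transitive_reduction(edges):
    """For a DAG: drop (u, v) when v is reachable via another successor of u."""
    succ = {}
    for u, v in edges:
        succ.setdefault(u, set()).add(v)
    return {(u, v) for u, v in edges
            if not any(v in descendants(succ, x) for x in succ[u] if x != v)}


print(sorted(transitive_reduction(disjoint_value_edges(GK, K, VR))))
# [('a', 'b'), ('b', 'f'), ('c', 'f'), ('d', 'g'), ('e', 'g'), ('f', 'h'), ('g', 'h')]
```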


Register Need and Maximal Antichains

The register need of any schedule of Gk is always less than or equal to the size of a maximal antichain in DVk(G).

An antichain in a graph G = (V, E) is a set of nodes A such that there are no paths between the nodes in A:

$A \subseteq V \text{ such that } \forall u, v \in A:\ (u, v) \notin E^c \wedge (v, u) \notin E^c$

where $E^c$ is the transitive closure of G: $(u, v) \in E^c$ if and only if there is a path $p = (u, \dots, v)$ in G.


Govind’s Example: Maximal Antichain

[Figure: DVk(G)]

The maximal antichain in this example is:

AMk = {a, c, d, e}

Thus this graph, with this killing function, can use at most 4 registers.
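For a fixed killing function, a maximum antichain can be found in polynomial time (via Dilworth's theorem and bipartite matching); for a graph with eight nodes, however, the brute-force Python sketch below is enough and easier to check. It takes the transitively reduced DVk(G) edges for k(a) = b and tests subsets from largest to smallest.

```python
from itertools import combinations

DV = {("a", "b"), ("b", "f"), ("c", "f"), ("d", "g"),
      ("e", "g"), ("f", "h"), ("g", "h")}
NODES = ["a", "b", "c", "d", "e", "f", "g", "h"]


def descendants(succ, v):
    seen, stack = set(), [v]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(succ.get(n, ()))
    return seen


def max_antichain(nodes, edges):
    """Largest set of nodes with no path between any two of them."""
    succ = {}
    for u, v in edges:
        succ.setdefault(u, set()).add(v)
    below = {n: descendants(succ, n) for n in nodes}
    for size in range(len(nodes), 0, -1):
        for cand in combinations(sorted(nodes), size):
            if all(y not in below[x] and x not in below[y]
                   for x, y in combinations(cand, 2)):
                return set(cand)
    return set()


print(sorted(max_antichain(NODES, DV)))  # ['a', 'c', 'd', 'e']
```

The chains a → b → f → h, d → g, c, and e cover all eight nodes, so no antichain can have more than four elements.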


Register Saturating Scheduling

Touati proves that:

For every valid killing function k, there is always a schedule that makes all the values in the maximal antichain of the disjoint value DAG DVk(G) simultaneously alive.


Saturating Killing Function

To find the register saturation of a DDG, we need to find a killing function that maximizes the size of the maximal antichain in DVk(G).

In other words, we need to find a killing function that maximizes the number of nodes in DVk(G) that are pairwise not connected by a path.

Touati calls this the maximizing maximal antichain (MMA) problem. A solution to the MMA problem is a saturating killing function. MMA is NP-complete.


Heuristic to Compute Register Saturation

To compute the register saturation, Touati starts by decomposing the potential kill graph PK(G) into connected bipartite components.

A bipartite component, cb = (Scb, Tcb, Ecb), is a graph with a set of source nodes Scb, a set of target nodes Tcb, and a set of edges Ecb. cb must obey the following conditions:

(1) $\forall e \in E_{PK},\ \forall e' \in E_{cb}$: if e and e' share an endpoint (the same source or the same target), then $e \in E_{cb}$;

(2) $\nexists\, e, e' \in E_{cb}$ such that $\mathrm{target}(e) = \mathrm{source}(e')$.


Bipartite Decomposition of PK(G)

A bipartite decomposition of the potential killing graph PK(G) is a set of bipartite components such that, for every edge $e \in E_{PK}$, there is a bipartite component cb in the decomposition such that $e \in E_{cb}$.

Touati proves that, given a DDG G, there is only one bipartite decomposition of PK(G).
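One way to compute the decomposition is to group PK edges with a union-find structure, merging two edges whenever they share a source or a target. This Python sketch reads condition (1) of the bipartite component definition that way, which is an assumption; it reproduces the components of Govind's example and of the non-trivial example presented later in these slides.

```python
def bipartite_components(pk_edges):
    """Group PK(G) edges that share a source or a target (union-find)."""
    edges = sorted(pk_edges)
    parent = {e: e for e in edges}

    def find(e):
        while parent[e] != e:
            parent[e] = parent[parent[e]]  # path halving
            e = parent[e]
        return e

    for i, e in enumerate(edges):
        for f in edges[i + 1:]:
            if e[0] == f[0] or e[1] == f[1]:
                parent[find(e)] = find(f)
    groups = {}
    for e in edges:
        groups.setdefault(find(e), []).append(e)
    # Each component is reported as (source nodes, target nodes).
    return [({u for u, _ in g}, {v for _, v in g}) for g in groups.values()]


GOVIND_PK = {("a", t) for t in "bcde"} | {
    ("b", "f"), ("c", "f"), ("d", "g"), ("e", "g"),
    ("f", "h"), ("g", "h"), ("h", "i")}
NON_TRIVIAL_PK = {("a", "f"), ("b", "d"), ("b", "e"), ("c", "d"),
                  ("c", "e"), ("d", "g"), ("e", "f"), ("f", "g")}

for sources, targets in bipartite_components(GOVIND_PK):
    print(sorted(sources), "->", sorted(targets))
print(len(bipartite_components(NON_TRIVIAL_PK)))
```

In the non-trivial example, node e is a target in one component ({b, c} → {d, e}) and a source in another ({a, e} → {f}), which condition (2) permits.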


Govind’s Example: Bipartite Decomposition

[Figure: PK(G) and its bipartite decomposition into the components {a} → {b, c, d, e}, {b, c} → {f}, {d, e} → {g}, {f, g} → {h}, and {h} → {i}]


Saturating Killing Set

Touati defines the Saturating Killing Set of a connected bipartite component cb, SKS(cb), as a subset of the target nodes, $T'_{cb} \subseteq T_{cb}$, such that:

(1) All the source nodes, Scb, are contained in the union of all predecessors of the nodes in Tcb’.

(2) Tcb’ contains a minimum number of nodes.

Computing the SKS is an NP-complete problem.
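Because the SKS is a minimum set cover, exhaustive search is the simplest exact method for small components. The Python sketch below tries subsets of targets in increasing size and returns the first one whose predecessors (within the component) cover every source; ties go to the alphabetically first subset, an arbitrary choice. The second call uses a hypothetical component that needs two killers.

```python
from itertools import combinations


def saturating_killing_set(sources, targets, edges):
    """Smallest subset of targets whose predecessors cover all sources."""
    preds = {t: {u for u, v in edges if v == t} for t in targets}
    for size in range(1, len(targets) + 1):
        for cand in combinations(sorted(targets), size):
            if set().union(*(preds[t] for t in cand)) >= set(sources):
                return set(cand)
    return set(targets)


top = ({"a"}, {"b", "c", "d", "e"}, {("a", t) for t in "bcde"})
print(saturating_killing_set(*top))  # {'b'}
two = ({"u1", "u2"}, {"x", "y"}, {("u1", "x"), ("u2", "y")})  # hypothetical
print(sorted(saturating_killing_set(*two)))  # ['x', 'y']
```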


Govind’s Example: Saturating Killing Set

[Figure: bipartite decomposition of PK(G)]

In this example, the computation of the SKS is trivial. The only component with a non-unitary target set is the top one.

The selection of any single node in the set Tcb = {b, c, d, e} covers the set Scb = {a}. Thus the selection can be arbitrary.


Govind’s Example

As we saw earlier, with k(a) = b the register saturation in Govind’s example is 4, and a schedule that has four values alive at the same time can be found.

Using the lineage method, Govind et al. found a schedule for their example that uses three registers. What does Touati’s method do if only three registers are available?


Reducing RS

Touati proposes an algorithm to reduce the register saturation while trying not to increase the length of the critical path.

The algorithm starts by computing the maximal antichain AMk. Then it starts an iterative process in which the first step is to construct the set Uk of all admissible serializations between the saturating values in AMk, together with their costs.


Admissible Serializations

A serialization u → v means that the kill of u must always be carried out before the definition of v.

If v is one of the potential killers of u, then to produce the serialization u → v we must add arcs from all other potential killers of u to v. This way we ensure that the live ranges of u and v will not overlap.

If v is not a potential killer of u, then to produce the serialization u → v we must add arcs from all nodes u’ ∈ pkillG(u) to v, as long as there is no path from v to u’.


Cost of Serializations

The cost function of a serialization is defined as

$\omega(u \to v) = (\mu_1, \mu_2)$

$\mu_1$ predicts the reduction in the saturation value produced by the serialization; it is computed by:

$\mu_1 = \omega_1 - \omega_2$

$\omega_1$ is the number of saturating values serialized after u if this serialization is carried out.

$\omega_2$ is the number of descendants of u that can become simultaneously alive with u.

$\mu_2$ is the increase in the critical path.


Govind’s Example: Reducing RS

With the killing function k(a) = b, the saturating values are:

AMk = {a, c, d, e}

pkillG(a) = {b, c, d, e}

[Figure: Gk]

For a serialization u → v to be admissible, the following condition must be true:

$\forall v' \in \mathrm{pkill}_G(u):\ \neg (v < v')$

i.e., there are no paths from v to any potential killer of u.


AMk = {a, c, d, e}

[Figure: Gk]

Thus, there is no admissible serialization from a to any of the other saturating values, because b ∈ pkillG(a) and there are paths from c, d, and e to b in Gk.


AMk = {a, c, d, e}

[Figure: Gk]

c → d and c → e are not admissible serializations, because f ∈ pkillG(c) and d < f, e < f.


AMk = {a, c, d, e}

[Figure: Gk]

d → e is not admissible because g ∈ pkillG(d) and e < g.

e → d is not admissible because g ∈ pkillG(e) and d < g.


AMk = {a, c, d, e}

[Figure: Gk]

Thus the admissible serializations in this example are: d → c and e → c.
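The admissibility test of the previous slides can be sketched in Python. Potential killers are computed on the original DDG and paths are checked in Gk for k(a) = b; v < w is read as a path of at least one edge. The hypothetical helper `serialization_arcs` returns the arcs from every potential killer of u other than v to v, which covers both cases described for admissible serializations.

```python
DDG = {
    "a": ["b", "c", "d", "e"],
    "b": ["f"], "c": ["f"], "d": ["g"], "e": ["g"],
    "f": ["h"], "g": ["h"], "h": ["i"], "i": [],
}
GK = {  # Gk for k(a) = b
    "a": {"b", "c", "d", "e"}, "b": {"f"},
    "c": {"f", "b"}, "d": {"g", "b"}, "e": {"g", "b"},
    "f": {"h"}, "g": {"h"}, "h": {"i"}, "i": set(),
}
AMK = ["a", "c", "d", "e"]


def descendants(g, v):
    seen, stack = set(), [v]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(g[n])
    return seen


def pkill(g, u):
    cons = set(g[u])
    return {v for v in cons if (descendants(g, v) & cons) == {v}}


def strictly_reaches(g, v, w):
    """v < w: there is a path of at least one edge from v to w."""
    return any(w in descendants(g, s) for s in g[v])


def admissible(gk, ddg, u, v):
    """u -> v is admissible if v reaches no potential killer of u in Gk."""
    return not any(strictly_reaches(gk, v, w) for w in pkill(ddg, u))


def serialization_arcs(ddg, u, v):
    """Arcs added for u -> v: from each potential killer of u (except v) to v."""
    return {(w, v) for w in pkill(ddg, u) if w != v}


pairs = [(u, v) for u in AMK for v in AMK
         if u != v and admissible(GK, DDG, u, v)]
print(pairs, serialization_arcs(DDG, "d", "c"))
# [('d', 'c'), ('e', 'c')] {('g', 'c')}
```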


[Figure: Gk]

In this example, both serializations cause the scheduling edge (g, c) to be added to the graph, because pkillG(d) = pkillG(e) = {g}.

Thus their costs are equivalent.

Note that, for this example, reducing RS is equivalent to the lineage fusion technique in Govind et al.’s approach.


Govind’s Algorithm in Touati’s Example

Now we will apply the lineage-based method proposed by Govind et al. to the DDG presented by Touati.

In the following slides, we transcribe the code and the DDGs as presented by Touati.


A Trivial Example

[Figure: DDG with nodes x, y, k, t, z]

pkillG(x) = {k}
pkillG(y) = {z}
pkillG(z) = {k}
pkillG(k) = {z}

[Figure: potential killing graph (PKG) of the DDG]


A Trivial Example (cont.)

[Figure: the DDG and PKG of the trivial example, with the same pkillG sets]

There are no choices to be made, as each node has only one potential killer.


A Trivial Example (cont.)

[Figure: the DDG and its DV graph]

The DV graph is identical to the PKG in this case, and the solution is trivial: the maximal antichain in the DV graph is {x, y, z}.


A Non-Trivial Example

[Figure: DDG with nodes a, b, c, d, e, f, g]

pkillG(a) = {f}
pkillG(b) = {d, e}
pkillG(c) = {d, e}
pkillG(d) = {g}
pkillG(e) = {f}
pkillG(f) = {g}


[Figure: the DDG and the four DV graphs DVk1 to DVk4, one per killing function]

DVk1: k1 = {(b,d), (c,d)}
DVk2: k2 = {(b,d), (c,e)}
DVk3: k3 = {(b,e), (c,d)}
DVk4: k4 = {(b,e), (c,e)}



There are eight killing functions (DV graphs)

[Figure: the eight DV graphs over nodes a, b, c, d, e, f]

k = {(a,b), (b,d), (c,d)}
k = {(a,b), (b,d), (c,e)}
k = {(a,b), (b,e), (c,d)}
k = {(a,b), (b,e), (c,e)}
k = {(a,c), (b,d), (c,d)}
k = {(a,c), (b,d), (c,e)}
k = {(a,c), (b,e), (c,d)}
k = {(a,c), (b,e), (c,e)}


Maximal antichains

[Figure: the maximal antichain of each of the eight DV graphs, for the same eight killing functions k]


A More Non-Trivial Example

[Figure: DDG with nodes a, b, c, d, e, f, g, j, k, m, n]

pkillG(a) = {b, c, g}
pkillG(b) = {d, e}
pkillG(c) = {e, j, k}
pkillG(d) = {f}
pkillG(e) = {m}
pkillG(f) = {n}
pkillG(g) = {d, j, k}
pkillG(j) = {f}
pkillG(k) = {m}

There are 3 × 2 × 3 × 3 = 54 killing functions.
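A killing function picks one potential killer per value, so the killing functions are the Cartesian product of the pkillG sets. This Python sketch enumerates them with `itertools.product` (without checking validity) for the example above and for the earlier non-trivial example, whose four combinations match the four DV graphs DVk1 to DVk4.

```python
from itertools import product
from math import prod

MORE = {  # pkillG sets of the more non-trivial example
    "a": {"b", "c", "g"}, "b": {"d", "e"}, "c": {"e", "j", "k"},
    "d": {"f"}, "e": {"m"}, "f": {"n"}, "g": {"d", "j", "k"},
    "j": {"f"}, "k": {"m"},
}
NON_TRIVIAL = {
    "a": {"f"}, "b": {"d", "e"}, "c": {"d", "e"},
    "d": {"g"}, "e": {"f"}, "f": {"g"},
}


def killing_functions(pkill):
    """Yield every map u -> k(u) with k(u) in pkill[u] (validity not checked)."""
    nodes = sorted(pkill)
    for choice in product(*(sorted(pkill[u]) for u in nodes)):
        yield dict(zip(nodes, choice))


for name, sets in (("MORE", MORE), ("NON_TRIVIAL", NON_TRIVIAL)):
    count = sum(1 for _ in killing_functions(sets))
    print(name, count, prod(len(s) for s in sets.values()))
# MORE 54 54
# NON_TRIVIAL 4 4
```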


(a) fload [i1], fRa
(b) fload [i2], fRb
(c) fload [i3], fRc
(d) fmult fRa, fRb, fRd
(e) imultadd fRa, fRb, fRc, iRe
(g) ftoint fRc, iRg
(i) iadd iRg, 4, iRi
(f) fmultadd_setz fRb, iRi, fRc, fRf, gf
(h) fdiv fRd, iRe, fRh
(j) gf ? fadd_setbnz fRj, 1, fRj, gj
(k) gf | gj ? fsub fRk, 1, fRk

[Figure: Touati’s DDG with nodes a through k; edges labeled with the values fRc, fRd, iRe, iRg, iRi, gf, and gj]

Touati concentrates on the blue edges, which represent the flow of floating-point values.


We will also concentrate on the floating-point value flow. Thus the simplified DDG is shown in the figure.

Although the modified list scheduling requires a source and a sink node, the lineage formation process does not consider the source and the sink node.

[Figure: simplified DDG with nodes a through k (floating-point flow only)]


[Figure: simplified DDG annotated with node heights]

Step 1: Compute the heights


[Figure: simplified DDG annotated with node heights]

Step 1: Compute the heights
Step 2: First lineage formation

L1 = [a, e)


[Figure: simplified DDG annotated with node heights]

L1 = [a, e)

Step 3: Second lineage formation

L2 = [b, f)


[Figure: simplified DDG annotated with the recomputed node heights]

L2 = [b, f)

Recompute heights


[Figure: simplified DDG annotated with the recomputed node heights]

L2 = [b, f)

Recompute heights

Step 4: Third lineage formation

L3 = [c, f)


[Figure: simplified DDG annotated with the recomputed node heights]

Step 2: First lineage formation: L1 = [a, e)
Step 3: Second lineage formation: L2 = [b, f); recompute heights
Step 4: Third lineage formation: L3 = [c, f); recompute heights


[Figure: simplified DDG annotated with node heights]

Step 5: Fourth lineage formation

L4 = [d, h)




[Figure: simplified DDG with lineages L1 to L4]

L1 = [a, e)
L2 = [b, f)
L3 = [c, f)
L4 = [d, h)

Lineage source nodes: S = {a, b, c, d}

Lineage end nodes: E = {e, f, h}

Reach relation (1 if the row node reaches the column node):

     e  f  h
a    1  1  1
b    1  1  1
c    1  1  0
d    1  1  1


[Figure: simplified DDG with lineages L1 to L4]

L1 = [a, e)
L2 = [b, f)
L3 = [c, f)
L4 = [d, h)

Reach relation: as on the previous slide.

Because d can reach f, but c cannot reach h, we can fuse lineages L4 and L3 to create a new lineage L5 = [d, h) [c, f). This fusion requires a sequencing edge from h to c.
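With the reach relation stored as a table, the fusion test needs no graph traversal. This Python sketch encodes the reach matrix of the lineage source and end nodes (rows are sources, columns are ends) and each lineage as its (start, end) pair, then lists the ordered pairs (Lu, Lv) that satisfy the fusion condition.

```python
REACH = {  # REACH[row][col] = 1 if the source node reaches the end node
    "a": {"e": 1, "f": 1, "h": 1},
    "b": {"e": 1, "f": 1, "h": 1},
    "c": {"e": 1, "f": 1, "h": 0},
    "d": {"e": 1, "f": 1, "h": 1},
}
LINEAGES = {"L1": ("a", "e"), "L2": ("b", "f"),
            "L3": ("c", "f"), "L4": ("d", "h")}


def fusable_pairs(reach, lineages):
    """Pairs (Lu, Lv) with u1 reaching vn and v1 not reaching um."""
    return [(x, y)
            for x, (u1, um) in lineages.items()
            for y, (v1, vn) in lineages.items()
            if x != y and reach[u1][vn] and not reach[v1][um]]


print(fusable_pairs(REACH, LINEAGES))  # [('L4', 'L3')]
```

The single 0 in the matrix (c does not reach h) is what makes L4 followed by L3 the only candidate.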


[Figure: simplified DDG with lineages L1, L2, and L5]

L1 = [a, e)
L2 = [b, f)
L5 = [d, h) [c, f)

Because there are no more 0’s in the reach relation among the remaining lineages, no further lineage fusion is possible.


[Figure: simplified DDG with lineages L1, L2, and L5]

L1 = [a, e)
L2 = [b, f)
L5 = [d, h) [c, f)

Lineage Interference Graph:

[Figure: lineage interference graph over L1, L2, and L5]

We need three colors: L1 = RA, L2 = RB, L5 = RC.


[Figure: simplified DDG, lineage interference graph over L1, L2, and L5, and registers RA, RB, and RC; the sequence grows by one instruction per step]

L1 = [a, e)
L2 = [b, f)
L5 = [d, h) [c, f)

Sequence:
a
a b
a b d
a b d h
a b d h c
a b d h c e
a b d h c e g
a b d h c e g f


Comparing the Methods

Touati’s method allows the creation of schedules that use from 7 down to 3 registers (in his CC2001 paper he reduced the RS from 7 to 4), according to the number of registers available for the basic block.

Govind et al.’s method will always create a schedule using three registers for this basic block, regardless of the number of registers available for the basic block.


Conjecture

If the scheduler in an out-of-order instruction issue processor is optimal, and register renaming has an infinite number of hidden registers, both methods should be equivalent, and the lineage-based one is simpler.

With a limited number of hidden registers for renaming and a sub-optimal runtime scheduler, Touati’s method is likely to produce better results because it makes better use of the available registers.


Research Questions

How well do the two methods compare in an actual superscalar processor such as the MIPS R12K?

Touati claims that his method will work well in VLIW machines too. How would it compare with the lineage method on the IA-64?

The allocation of registers to basic blocks by the global register allocator might affect Touati’s method significantly. How can his local register allocation (LRA) be integrated with a global register allocator (GRA)?

