CMPUT680 - Winter 2001. Register Minimization X Register Saturation. José Nelson Amaral, http://www.cs.ualberta.ca/~amaral/courses/680
CMPUT 680 - Compiler Design and Optimization
CMPUT680 - Winter 2001
Register Minimization X Register Saturation
José Nelson Amaral
http://www.cs.ualberta.ca/~amaral/courses/680

Reading List
Touati, S.-A.-A., “Register Saturation in Superscalar and VLIW Codes,” 10th International Conference on Compiler Construction, Genova, Italy, April 2001, pp. 213-228.
Touati, S.-A.-A., Thomasset, F., “Register Saturation in Data Dependence Graphs,” Research Report RR-3978, INRIA, July 2000.
Touati, S.-A.-A., “Optimal Register Saturation in Acyclic Superscalar and VLIW Codes,” Research Report, INRIA, Nov. 2000.

Minimum Register Instruction Sequence (MRIS) Problem
Given the Data Dependence Graph G for a basic block, derive an instruction sequence S for G that is optimal in the sense that its register requirement is minimum.
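The register requirement of a sequence can be made concrete with a small sketch. It uses the nine-instruction DDG that serves as the running example in these slides, and it rests on two assumptions that are not stated on this slide: a value dies when its last consumer is issued, and that consumer may reuse the freed register. The names `register_need` and `legal_sequences` are illustrative only.

```python
# Successor lists of the example DDG (a feeds b, c, d, e; and so on).
DDG = {"a": ["b", "c", "d", "e"], "b": ["f"], "c": ["f"], "d": ["g"],
       "e": ["g"], "f": ["h"], "g": ["h"], "h": ["i"], "i": []}
NO_VALUE = {"i"}  # assumption: the final store defines no register value


def register_need(ddg, seq, no_value=NO_VALUE):
    """Maximum number of simultaneously live values for the sequence seq."""
    remaining = {u: len(vs) for u, vs in ddg.items()}  # consumers not yet issued
    preds = {u: [] for u in ddg}
    for u, vs in ddg.items():
        for v in vs:
            preds[v].append(u)
    live, peak = set(), 0
    for v in seq:
        for u in preds[v]:
            remaining[u] -= 1
            if remaining[u] == 0:
                live.discard(u)   # last consumer issued: the register of u is free
        if v not in no_value:
            live.add(v)           # v may reuse a register freed just above
        peak = max(peak, len(live))
    return peak


def legal_sequences(ddg, prefix=()):
    """Yield every topological order of the DDG (only sensible for tiny graphs)."""
    if len(prefix) == len(ddg):
        yield prefix
        return
    done = set(prefix)
    for v in ddg:
        ready = all(v not in ddg[u] for u in ddg if u not in done)
        if v not in done and ready:
            yield from legal_sequences(ddg, prefix + (v,))


needs = [register_need(DDG, s) for s in legal_sequences(DDG)]
print(min(needs), max(needs))
```

Exhaustive enumeration is only a way to check small examples; the methods discussed in the slides exist precisely because it does not scale.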

Intuition for Our Solution
Our intuition is to find subsets of nodes that can definitely share a register, to inform the instruction sequencing algorithm.
[Figure: Data Dependence Graph with node levels a; b, c, d, e; f, g; h; i]

Instruction Lineages
An instruction lineage is a sequence of instructions in which a single register is passed from instruction to instruction (except for the last).
How can we ensure that instructions a, b, f, and h will be able to share the same register?
L1 = [a, b, f, h, i)
[Figure: Data Dependence Graph with node levels a; b, c, d, e; f, g; h; i; the nodes a, b, f, h of L1 are marked]

Sequencing Edges
The lineage formation imposed a scheduling restriction in the DDG: the selected heir of a node must be the last node listed among its siblings.
L1 = [a, b, f, h, i)
Thus the lineage formation inserts sequencing edges in the DDG.
[Figure: Augmented Data Dependence Graph with node levels a; b, c, d, e; f, g; h; i]

Node Height
L1 = [a, b, f, h, i)
If the introduction of sequencing edges were to produce a cycle in the DDG, it would be impossible to find a legal instruction sequence.
Thus we use the height of the nodes, recomputed after each lineage formation, to select the heir. Ties are broken arbitrarily.
[Figure: Augmented Data Dependence Graph with node levels a; b, c, d, e; f, g; h; i]

Lineage Formation
L1 = [a, b, f, h, i)
For the next lineage, the highest nodes not in a lineage are c, d, e, all with a height of 5.
L2 = [c, f)
L3 = [e, g, h)
L4 = [d, g)
[Figure: Augmented Data Dependence Graph with node levels a; b, c, d, e; f, g; h; i; the nodes of each lineage are marked]
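The height recomputation can be sketched as follows. Two assumptions are made here and are not spelled out on the slides: height counts the nodes on the longest path to a sink (so the sink i has height 1), and forming L1 adds sequencing edges from the siblings c, d, e to the heir b. Under those assumptions the sketch reproduces the height of 5 quoted for c, d, and e.

```python
from functools import lru_cache


def heights(ddg):
    """Height = number of nodes on the longest path from a node to a sink."""
    @lru_cache(maxsize=None)
    def h(u):
        return 1 + max((h(v) for v in ddg[u]), default=0)
    return {u: h(u) for u in ddg}


DDG = {"a": ["b", "c", "d", "e"], "b": ["f"], "c": ["f"], "d": ["g"],
       "e": ["g"], "f": ["h"], "g": ["h"], "h": ["i"], "i": []}

# Assumed sequencing edges for L1 = [a, b, f, h, i): the heir b of a
# must come after its siblings c, d, and e.
augmented = {u: list(vs) for u, vs in DDG.items()}
for sibling in ("c", "d", "e"):
    augmented[sibling].append("b")

base_h = heights(DDG)
aug_h = heights(augmented)
print(base_h["c"], aug_h["c"])
```

Before the sequencing edges are added, b, c, d, and e all have the same height; afterwards c, d, and e rise above b, which is why they become the candidates for the next lineage.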

Lineage Interference
L1 = [a, b, f, h, i)
L2 = [c, f)
L3 = [e, g, h)
L4 = [d, g)
Two lineages Lu = [u1, u2, …, um) and Lv = [v1, v2, …, vn) definitely overlap if:
(i) u1 reaches vn, and (ii) v1 reaches um.
[Figure: Augmented Data Dependence Graph with node levels a; b, c, d, e; f, g; h; i]

Lineage Interference Graph
L1 = [a, b, f, h, i)
L2 = [c, f)
L3 = [e, g, h)
L4 = [d, g)
Which lineages does lineage L1 definitely overlap with?
How about lineages L2 and L4?
[Figure: Augmented Data Dependence Graph and Lineage Interference Graph with nodes L1, L2, L3, L4]

Lineage Fusion Condition
Two lineages Lu = [u1, u2, …, um) and Lv = [v1, v2, …, vn) can be fused into a single lineage if:
(i) u1 reaches vn, and (ii) v1 does not reach um.
Lineages:
L1 = [a, b, f, h, i)
L2 = [c, f)
L3 = [e, g, h)
L4 = [d, g)
[Figure: Augmented Data Dependence Graph and Lineage Interference Graph with nodes L1, L2, L3, L4]
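The overlap and fusion conditions are both reachability tests, so they can be checked mechanically. The sketch below assumes that the augmented DDG is the original DDG plus the sequencing edges c→b, d→b, and e→b implied by L1; that edge set is an inference from the sequencing rule, and the function names are illustrative.

```python
def reaches(graph, src, dst):
    """True if dst can be reached from src (a node reaches itself)."""
    stack, seen = [src], set()
    while stack:
        u = stack.pop()
        if u == dst:
            return True
        if u not in seen:
            seen.add(u)
            stack.extend(graph[u])
    return False


# Assumed augmented DDG: data edges plus sequencing edges c->b, d->b, e->b.
AUG = {"a": ["b", "c", "d", "e"], "b": ["f"], "c": ["f", "b"],
       "d": ["g", "b"], "e": ["g", "b"], "f": ["h"], "g": ["h"],
       "h": ["i"], "i": []}
LINEAGES = {"L1": list("abfhi"), "L2": list("cf"),
            "L3": list("egh"), "L4": list("dg")}


def definitely_overlap(g, lu, lv):
    """(i) u1 reaches vn and (ii) v1 reaches um."""
    return reaches(g, lu[0], lv[-1]) and reaches(g, lv[0], lu[-1])


def can_fuse(g, lu, lv):
    """Lu followed by Lv: u1 reaches vn and v1 does not reach um."""
    return reaches(g, lu[0], lv[-1]) and not reaches(g, lv[0], lu[-1])


lig_edges = {(p, q) for p in LINEAGES for q in LINEAGES
             if p < q and definitely_overlap(AUG, LINEAGES[p], LINEAGES[q])}
fusable = [(p, q) for p in LINEAGES for q in LINEAGES
           if p != q and can_fuse(AUG, LINEAGES[p], LINEAGES[q])]
print(sorted(lig_edges), fusable)
```

Under these assumptions every pair of lineages definitely overlaps except L2 and L4, and the only fusable ordered pair is L4 followed by L2.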

Lineage Fusion Condition
Lineages:
L1 = [a, b, f, h, i)
L2 = [c, f)
L3 = [e, g, h)
L4 = [d, g)
Which lineages can be fused in the example?
d reaches f, and c does not reach g.
Thus L4 can be fused with L2 to form $L_5 = [d, g) \cup [c, f)$.
[Figure: Augmented Data Dependence Graph and Lineage Interference Graph with nodes L1, L2, L3, L4]

Lineage Fusion
Lineages:
L1 = [a, b, f, h, i)
L2 = [c, f)
L3 = [e, g, h)
L4 = [d, g)
When Lu = [u1, u2, …, um) and Lv = [v1, v2, …, vn) are fused:
(1) a scheduling edge from um to v1 is introduced in the augmented DDG
(2) Lu and Lv are removed from the LIG
(3) a new lineage $L_w = L_u \cup L_v$ is inserted in the LIG
[Figure: Augmented Data Dependence Graph and Lineage Interference Graph with nodes L1, L2, L3, L4]

Lineage Fusion Condition
Lineages:
L1 = [a, b, f, h, i)
L3 = [e, g, h)
$L_5 = [d, g) \cup [c, f)$
Thus the fusion of L4 with L2 forms $L_5 = [d, g) \cup [c, f)$.
How many colors do we need to color the LIG?
We need three colors.
Can we find an instruction sequence?
[Figure: Augmented Data Dependence Graph and Lineage Interference Graph with nodes L1, L3, L5]
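For a graph this small the number of colors can be confirmed by exhaustive search. The sketch below is a brute-force check, not the coloring heuristic of the lineage method. The edge list of the LIG after fusion is the triangle implied by the answer "three colors"; the edge list before fusion is an assumption derived from the overlap condition.

```python
from itertools import product


def chromatic_number(nodes, edges):
    """Smallest k for which a proper k-coloring exists (exhaustive search)."""
    nodes = list(nodes)
    for k in range(1, len(nodes) + 1):
        for colors in product(range(k), repeat=len(nodes)):
            color = dict(zip(nodes, colors))
            if all(color[p] != color[q] for p, q in edges):
                return k
    return 0  # only reached for an empty node list


# Assumed LIG before fusion: every pair overlaps except L2 and L4.
before = chromatic_number(
    ["L1", "L2", "L3", "L4"],
    [("L1", "L2"), ("L1", "L3"), ("L1", "L4"), ("L2", "L3"), ("L3", "L4")])
# LIG after fusing L4 and L2 into L5: a triangle.
after = chromatic_number(
    ["L1", "L3", "L5"],
    [("L1", "L3"), ("L1", "L5"), ("L3", "L5")])
print(before, after)
```

Both graphs need three colors in this example. The fusion does not lower the count here; it adds the sequencing edge that makes the sharing between the two fused lineages safe.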

Sequencing by List Scheduling
Lineages:
L1 = [a, b, f, h, i)
L3 = [e, g, h)
$L_5 = [d, g) \cup [c, f)$
Registers: RA, RB, RC
[Figure: Augmented Data Dependence Graph and Lineage Interference Graph with nodes L1, L3, L5]
Sequence (built one instruction per step): a d e g c b f h i

Summary of Our Solution Method
DDG
→ Form Lineage Interference Graph (LIG): a “good” construction algorithm for LIG (dynamic)
→ Derive HRB: an effective heuristic method to calculate the HRB
→ Extended list-scheduling guided by HRB: an efficient scheduling method (do not backtrack)
→ A good instruction sequence

Register Saturation (Touati)
Given a data dependence graph G, the register saturation (RS) of G is the maximal register need for any schedule of G.
Touati’s strategy is to compute the RS of G and, if RS exceeds the number of available registers, to reduce the RS by introducing new arcs in G.
The intuition is that by using either (1) all available registers or (2) the maximal number of registers that G can use, instruction level parallelism is maximized.

The HRB and the RS
Govind, Gao, Yang, Amaral, and Zhang had earlier proposed an alternative method: to find a heuristic register bound (HRB) to be used as guidance in a modified list scheduling. Their goal is to find a schedule that uses a minimum number of registers.
To compare both methods we will apply Touati’s method to Govind et al.’s example, and Govind et al.’s method to Touati’s example.

Potential Killers
To find RS(G), we need to know which operation must kill each value generated. Touati defines the set of operations that are potential killers of the value generated by an operation $u \in G$:
$pkill_G(u) = \{\, v \in Cons(u) \mid {\downarrow}v \cap Cons(u) = \{v\} \,\}$
${\downarrow}v$ is the set of all descendants of v, including v. $w \in Cons(u)$ iff $(u, w)$ is an arc of G.
Thus a node v is a potential killer of the value generated by a node u if and only if v consumes u and no other descendant of v consumes u.
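The definition translates almost literally into code. The sketch below evaluates it on the nine-node example DDG used in the lineage slides and on a tiny hypothetical graph (`TINY`, not from the slides) in which one consumer is not a potential killer because one of its descendants also consumes the value.

```python
def descendants(ddg, v):
    """All nodes reachable from v, including v itself."""
    seen, stack = set(), [v]
    while stack:
        u = stack.pop()
        if u not in seen:
            seen.add(u)
            stack.extend(ddg[u])
    return seen


def pkill(ddg, u):
    """Consumers v of u such that no other descendant of v consumes u."""
    cons = set(ddg[u])
    return {v for v in cons if descendants(ddg, v) & cons == {v}}


DDG = {"a": ["b", "c", "d", "e"], "b": ["f"], "c": ["f"], "d": ["g"],
       "e": ["g"], "f": ["h"], "g": ["h"], "h": ["i"], "i": []}
# Hypothetical graph: v consumes u, but its descendant w consumes u as well.
TINY = {"u": ["v", "w"], "v": ["w"], "w": []}

print(sorted(pkill(DDG, "a")), sorted(pkill(TINY, "u")))
```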

Potential Killing Graph
The edges of the Potential Killing Graph of a DDG G, PK(G) = (V, EPK), are defined as follows:
$E_{PK} = \{(u, v) \mid u \in V_R \wedge v \in pkill_G(u)\}$
VR is the set of operations that define a value, i.e., operations that need a register.

Govind’s Example: Data Dependency Graph
(a) t1 := ld(x);
(b) t2 := t1 + 4;
(c) t3 := t1 * 8;
(d) t4 := t1 - 4;
(e) t5 := t1 / 2;
(f) t6 := t2 * t3;
(g) t7 := t4 - t5;
(h) t8 := t6 * t7;
(i) st(y,t8);
[Figure: DDG G with node levels a; b, c, d, e; f, g; h; i]

Govind’s Example: Potential Kill Graph
pkillG(a) = {b, c, d, e}
pkillG(b) = {f}
pkillG(c) = {f}
pkillG(d) = {g}
pkillG(e) = {g}
pkillG(f) = {h}
pkillG(g) = {h}
pkillG(h) = {i}
[Figure: DDG G with node levels a; b, c, d, e; f, g; h; i]

Govind’s Example: Potential Kill Graph
In this example the DDG G and the potential kill graph PK(G) are identical. In general that is not the case.
[Figure: DDG G and PK(G), both with node levels a; b, c, d, e; f, g; h; i]

Choosing the Killer
If a node u has more than one potential killer, Touati defines a killing function, k(u), that specifies which one among the potential killers of u will actually kill u.
A killing function imposes a scheduling order in the DDG: all other consumers of u, Cons(u), must be scheduled before k(u) is scheduled.
To represent these scheduling constraints, Touati defines an extended DAG, Gk, induced by the killing function k.

Govind’s Example: Killing Function
pkillG(a) = {b, c, d, e}
In this example, node a is the only node with multiple potential killers.
[Figure: PK(G) with node levels a; b, c, d, e; f, g; h; i]

Govind’s Example: Killing Function
If we choose k(a) = b, we obtain the Gk shown in the figure.
[Figure: Gk with node levels a; b, c, d, e; f, g; h; i]

Selecting a Good Set of Killers...
If the killing function for multiple nodes with multiple potential killers is chosen arbitrarily, it might induce cycles in Gk.
A valid killing function is one that does not induce cycles in Gk.

Avoiding Vengeance...
A killer must kill before it has children, thus...
The descendants of k(u) cannot be simultaneously alive with u. Touati defines the Disjoint Value Graph, DVk(G) = (VR, EDV), by:
$E_{DV} = \{(u, v) \mid u, v \in V_R \wedge v \in {\downarrow_R}\, k(u)\}$
Here ${\downarrow_R}\, k(u)$ denotes the descendants of k(u), including k(u), that belong to VR.
An edge (u,v) in DVk(G) means that the live interval of u is always before the live interval of v in any schedule of Gk.

Govind’s Example: Disjoint Value Graph
k(a) = b
k(b) = f
k(c) = f
k(d) = g
k(e) = g
k(f) = h
k(g) = h
k(h) = i
[Figure: Gk and DVk(G), both with node levels a; b, c, d, e; f, g; h; i. DVk(G) is simplified by transitive reduction.]

Register Need and Maximal Antichains
The register need of any schedule of Gk is always less than or equal to the size of a maximal antichain in DVk(G).
An antichain in a graph G = (V, E) is a set of nodes A such that there are no paths between the nodes in A:
$A \subseteq V$ such that $\forall u, v \in A: (u, v) \notin E^c \wedge (v, u) \notin E^c$
where $E^c$ is the transitive closure of G: $(u, v) \in E^c$ iff there is a path $p = (u, \ldots, v)$ in G.

Govind’s Example: Maximal Antichain
The maximal antichain in this example is:
AMk = {a, c, d, e}
Thus this graph, with this killing function, can use at most 4 registers.
[Figure: DVk(G) with node levels a; b, c, d, e; f, g; h; i]

Register Saturating Scheduling
Touati proves that:
For every valid killing function k, there is always a schedule that makes all the values in the maximal antichain of the disjoint value DAG DVk(G) simultaneously alive.

Saturating Killing Function
To find the register saturation of a DDG, we need to find a killing function that maximizes the maximal antichain in DVk(G).
In other words, we need to find a killing function that maximizes the number of nodes that are not connected by a path in DVk(G).
Touati calls this the maximizing maximal antichain (MMA) problem. A solution to the MMA problem is a saturating killing function. MMA is NP-complete.

Heuristic to Compute Register Saturation
To compute the register saturation, Touati starts by decomposing the potential kill graph PK(G) into connected bipartite components.
A bipartite component, cb = (Scb, Tcb, Ecb), is a graph with a set of source nodes Scb, a set of target nodes Tcb, and a set of edges Ecb. cb must obey the following conditions:
If $e \in E_{PK}$, $e' \in E_{cb}$, and e, e' share an endpoint, then $e \in E_{cb}$.
There are no $e, e' \in E_{cb}$ such that $target(e) = source(e')$.

Bipartite Decomposition of PK(G)
A bipartite decomposition of the potential killing graph PK(G) is a set of bipartite components such that for every edge $e \in PK(G)$, there is a bipartite component cb in the decomposition such that $e \in E_{cb}$.
Touati proves that given a DDG G, there is only one bipartite decomposition of PK(G).

Govind’s Example: Bipartite Decomposition
[Figure: PK(G) with node levels a; b, c, d, e; f, g; h; i, and its bipartite decomposition]

Saturating Killing Set
Touati defines the Saturating Killing Set of a connected bipartite component cb, SKS(cb), as a subset of the target nodes, $T'_{cb} \subseteq T_{cb}$, such that:
(1) All the source nodes, Scb, are contained in the union of all predecessors of the nodes in $T'_{cb}$.
(2) $T'_{cb}$ contains a minimum number of nodes.
Computing the SKS is an NP-complete problem.
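Conditions (1) and (2) describe a minimum cover of the sources by the predecessor sets of the chosen targets. For small components it can be found by exhaustive search, as in the sketch below; this is not Touati's heuristic. The second component (`p`, `q`, `r`, `x`, `y`, `z`) is hypothetical and only shows a case where one target is not enough.

```python
from itertools import combinations


def saturating_killing_set(sources, targets, edges):
    """Smallest subset of targets whose predecessors cover every source."""
    preds = {t: {u for u, v in edges if v == t} for t in targets}
    for size in range(1, len(targets) + 1):
        for cand in combinations(sorted(targets), size):
            covered = set().union(*(preds[t] for t in cand))
            if set(sources) <= covered:
                return set(cand)
    return None  # some source has no consumer among the targets


# Top component of the nine-node example: any single target covers {a}.
top = saturating_killing_set(
    {"a"}, {"b", "c", "d", "e"},
    [("a", "b"), ("a", "c"), ("a", "d"), ("a", "e")])

# Hypothetical component in which two targets are needed.
ring = saturating_killing_set(
    {"p", "q", "r"}, {"x", "y", "z"},
    [("p", "x"), ("q", "x"), ("q", "y"), ("r", "y"), ("r", "z"), ("p", "z")])
print(top, ring)
```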

Govind’s Example: Saturating Killing Set
In this example the computation of SKS is trivial. The only component with a non-unitary target set is the top one.
The selection of any single node in the set Tcb = {b, c, d, e} covers the set Scb = {a}. Thus the selection can be arbitrary.
[Figure: bipartite decomposition of PK(G)]

Govind’s Example
As we saw earlier, with k(a) = b the register saturation in Govind’s example is 4, and a schedule that has four values alive at the same time can be found.
Using the lineage method, Govind et al. found a schedule for their example that uses three registers. What does Touati’s method do if only three registers are available?

Reducing RS
Touati proposes an algorithm to reduce the register saturation while trying not to increase the length of the critical path.
The algorithm starts by computing the maximal antichain AMk. Then it starts an iterative process in which the first step is to construct the set Uk of all admissible serializations between the saturating values in AMk, together with their costs.

Admissible Serializations
A serialization $u \to v$ means that the kill of u must always be carried out before the definition of v.
If v is one of the potential killers of u, then to produce the serialization $u \to v$ we must add arcs from all other potential killers of u to v. This way we ensure that the live ranges of u and v will not overlap.
If v is not a potential killer of u, then to produce the serialization $u \to v$ we must add arcs from all nodes $u' \in pkill_G(u)$ to v, as long as there is no path from v to u'.

Cost of Serializations
The cost function of a serialization is defined as
$\omega(u \to v) = (\omega_1, \omega_2)$
$\omega_1$ predicts the reduction in the saturation value produced by the serialization; it is computed by:
$\omega_1 = \mu_1 - \mu_2$
$\mu_1$ is the number of saturating values serialized after u if this serialization is carried out.
$\mu_2$ is the number of descendants of u that can become simultaneously alive with u.
$\omega_2$ is the increase in the critical path.

Govind’s Example: Reducing RS
With the killing function k(a) = b, the saturating values are:
AMk = {a, c, d, e}
pkillG(a) = {b, c, d, e}
For a serialization $u \to v$ to be admissible, the following condition must be true:
$\forall v' \in pkill_G(u): \neg (v < v')$
i.e., there are no paths from v to any potential killer of u.
Thus, there is no admissible serialization from a to any of the other saturating values, because $b \in pkill_G(a)$ and there are paths from c, d, and e to b in Gk.
$c \to d$ and $c \to e$ are not admissible serializations either, because $f \in pkill_G(c)$ and $d < f$, $e < f$.
$d \to e$ is not admissible because $g \in pkill_G(d)$ and $e < g$; $e \to d$ is not admissible because $g \in pkill_G(e)$ and $d < g$.
Thus the admissible serializations in this example are: $d \to c$, $e \to c$.
[Figure: Gk with node levels a; b, c, d, e; f, g; h; i]

Govind’s Example: Reducing RS
In this example both serializations will cause the scheduling edge (g,c) to be added to the graph. Thus their cost is equivalent.
Note that, for this example, reducing RS is equivalent to the lineage fusion technique in Govind et al.’s approach.
[Figure: Gk with node levels a; b, c, d, e; f, g; h; i]
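The admissibility test and the arcs each serialization requires can be replayed in a few lines. The sketch assumes that Gk for k(a) = b is the DDG plus the arcs c→b, d→b, and e→b, and it reads $v < v'$ as "there is a path with at least one arc from v to v'". The names `admissible` and `arcs_for` are illustrative.

```python
# Assumed G_k for k(a) = b: the DDG plus arcs c->b, d->b, e->b.
GK = {"a": ["b", "c", "d", "e"], "b": ["f"], "c": ["f", "b"],
      "d": ["g", "b"], "e": ["g", "b"], "f": ["h"], "g": ["h"],
      "h": ["i"], "i": []}
PKILL = {"a": {"b", "c", "d", "e"}, "b": {"f"}, "c": {"f"}, "d": {"g"},
         "e": {"g"}, "f": {"h"}, "g": {"h"}, "h": {"i"}}
SATURATING = ["a", "c", "d", "e"]


def has_path(graph, src, dst):
    """True if a path with at least one arc leads from src to dst."""
    stack, seen = list(graph[src]), set()
    while stack:
        u = stack.pop()
        if u == dst:
            return True
        if u not in seen:
            seen.add(u)
            stack.extend(graph[u])
    return False


def admissible(graph, pkill, u, v):
    """u -> v is admissible if v reaches no potential killer of u."""
    return not any(has_path(graph, v, killer) for killer in pkill[u])


def arcs_for(graph, pkill, u, v):
    """Arcs that enforce u -> v, following the two cases described earlier."""
    if v in pkill[u]:
        return {(w, v) for w in pkill[u] if w != v}
    return {(w, v) for w in pkill[u] if not has_path(graph, v, w)}


serializations = {(u, v) for u in SATURATING for v in SATURATING
                  if u != v and admissible(GK, PKILL, u, v)}
print(sorted(serializations), arcs_for(GK, PKILL, "d", "c"))
```

Both admissible serializations lead to the same added arc, which is why the slides treat their costs as equivalent.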

Govind’s Algorithm in Touati’s Example
Now we will apply the lineage-based method proposed by Govind et al. to the DDG presented by Touati.
In the next slide we transcribe the code and the DDG as presented by Touati.

A Trivial Example
pkillG(x) = {k}
pkillG(y) = {z}
pkillG(z) = {k}
pkillG(k) = {z}
There are no choices to be made, as each node has only one potential killer.
[Figure: DDG and PKG over nodes x, y, k, t, z]

A Trivial Example (cont.)
The DV graph is identical to the PKG in this case, and the solution is trivial: the maximal antichain in the DV graph is {x, y, z}.
[Figure: DDG and DV graph over nodes x, y, k, t, z]

A Non-Trivial Example
pkillG(a) = {f}
pkillG(b) = {d, e}
pkillG(c) = {d, e}
pkillG(d) = {g}
pkillG(e) = {f}
pkillG(f) = {g}
[Figure: DDG over nodes a, b, c, d, e, f, g]

A Non-Trivial Example
DVk1: k1 = {(b,d), (c,d)}
DVk2: k2 = {(b,d), (c,e)}
DVk3: k3 = {(b,e), (c,d)}
DVk4: k4 = {(b,e), (c,e)}
[Figure: the DDG and the four DV graphs DVk1 to DVk4, each over nodes a, b, c, d, e, f, g]

There are eight killing functions (DV Graphs)
k = {(a,b), (b,d), (c,d)}
k = {(a,b), (b,d), (c,e)}
k = {(a,b), (b,e), (c,d)}
k = {(a,b), (b,e), (c,e)}
k = {(a,c), (b,d), (c,d)}
k = {(a,c), (b,d), (c,e)}
k = {(a,c), (b,e), (c,d)}
k = {(a,c), (b,e), (c,e)}
Maximal antichains
[Figure: the eight DV graphs, each over nodes a, b, c, d, e, f]

A More Non-Trivial Example
pkillG(a) = {b, c, g}
pkillG(b) = {d, e}
pkillG(c) = {e, j, k}
pkillG(d) = {f}
pkillG(e) = {m}
pkillG(f) = {n}
pkillG(g) = {d, j, k}
pkillG(j) = {f}
pkillG(k) = {m}
There are 3 × 2 × 3 × 3 = 54 candidate killing functions.
[Figure: DDG over nodes a, b, c, d, e, f, g, j, k, m, n]
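A killing function picks one potential killer per value, so the number of candidates is the product of the sizes of the pkill sets. The sketch below enumerates them for the sets listed on this slide; for those sets the product evaluates to 54. It does not filter out candidates that would induce a cycle in Gk.

```python
from itertools import product
from math import prod

PKILL = {"a": ["b", "c", "g"], "b": ["d", "e"], "c": ["e", "j", "k"],
         "d": ["f"], "e": ["m"], "f": ["n"], "g": ["d", "j", "k"],
         "j": ["f"], "k": ["m"]}


def candidate_killing_functions(pkill):
    """Yield every way of choosing one potential killer per value."""
    nodes = sorted(pkill)
    for choice in product(*(pkill[u] for u in nodes)):
        yield dict(zip(nodes, choice))


count = sum(1 for _ in candidate_killing_functions(PKILL))
print(count, prod(len(v) for v in PKILL.values()))
```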

Govind’s Algorithm in Touati’s Example
(a) fload [i1], fRa
(b) fload [i2], fRb
(c) fload [i3], fRc
(d) fmult fRa, fRb, fRd
(e) imultadd fRa, fRb, fRc, iRe
(g) ftoint fRc, iRg
(i) iadd iRg, 4, iRi
(f) fmultadd_setz fRb, iRi, fRc, fRf, gf
(h) fdiv fRd, iRe, fRh
(j) gf ? fadd_setbnz fRj, 1 , fRj, gj
(k) gf | gj ? fsub fRk, 1 , fRk
Touati concentrates on the blue edges, which represent the flow of floating point values.
[Figure: DDG over nodes a to k; edges are labeled with the values they carry (fRc, fRd, iRe, iRg, iRi, gf, gj)]

Govind’s Algorithm in Touati’s Example
We will also concentrate on the floating point value flow. Thus the simplified DDG is shown in the figure.
Although the modified list scheduling requires a source and a sink node, the lineage formation process does not consider the source and the sink node.
[Figure: simplified DDG over nodes a, b, c, d, e, f, g, h, i, j, k]

Govind’s Algorithm in Touati’s Example
Step 1: Compute the heights.
Step 2: First lineage formation: L1 = [a, e)
Step 3: Second lineage formation: L2 = [b, f). Recompute heights.
Step 4: Third lineage formation: L3 = [c, f). Recompute heights.
Step 5: Fourth lineage formation: L4 = [d, h)
[Figure: simplified DDG over nodes a, b, c, d, e, f, g, h, i, j, k, annotated with the node heights at each step]

Govind’s Algorithm in Touati’s Example
L1 = [a, e)
L2 = [b, f)
L3 = [c, f)
L4 = [d, h)
Lineage source nodes: {a, b, c, d}
Lineage end nodes: {e, f, h}
Reach Relation (rows: lineage source nodes; columns: lineage end nodes):
    e  f  h
a   1  1  1
b   1  1  1
c   1  1  0
d   1  1  1
Because d can reach f, but c cannot reach h, we can fuse lineages L4 and L3 to create a new lineage $L_5 = [d, h) \cup [c, f)$. This fusion requires a sequencing edge from h to c.
After the fusion the lineages are L1 = [a, e), L2 = [b, f), and $L_5 = [d, h) \cup [c, f)$. Because there are no more 0’s in the Reach relation matrix for the remaining lineage source nodes a, b, and d, no more lineage fusion is possible.
Lineage Interference Graph: nodes L1, L2, L5.
We need three colors: L1 = RA, L2 = RB, L5 = RC
[Figure: simplified DDG annotated with node heights, and the Lineage Interference Graph]

Govind’s Algorithm in Touati’s Example
L1 = [a, e)
L2 = [b, f)
$L_5 = [d, h) \cup [c, f)$
Registers: RA, RB, RC
[Figure: simplified DDG and Lineage Interference Graph with nodes L1, L2, L5]
Sequence (built one instruction per step): a b d h c e g f

Comparing the Methods
Touati’s method allows the creation of schedules that use from 7 to 3 registers (in his CC2001 paper he reduced from 7 to 4) according to the number of registers available for the basic block.
Govind et al.’s method will always create a schedule using three registers for this basic block, regardless of the number of registers available for the basic block.

Conjecture
If the scheduler in an out-of-order instruction issue processor is optimal and the register renaming has an infinite number of hidden registers, both methods should be equivalent, and the lineage-based one is simpler.
With a limited number of hidden registers for renaming, and a sub-optimal runtime scheduler, Touati’s method is likely to produce better results because it makes better use of the available registers.

Research Questions
How well do the two methods compare in an actual superscalar processor such as the MIPS R12K?
Touati claims that his method will work well in VLIW machines too. How would it compare with the lineage method in the IA-64?
The allocation of registers to basic blocks by the global register scheduler might affect Touati’s method significantly. How can his LRA be integrated with a GRA?