CMPUT 680 - Compiler Design and Optimization
CMPUT680 - Fall 2003
Topic J: Wavefront Scheduling
José Nelson Amaral
http://www.cs.ualberta.ca/~amaral/courses/680



Reading Material

Bharadwaj, J., Menezes, K., McKinsey, C., “Wavefront Scheduling: Path Based Data Representation and Scheduling of Subgraphs,” Proceedings of 32nd International Symposium on Microarchitecture, Dec. 1996, pp. 100-113.

Bharadwaj, J., “Method and apparatus for instruction scheduling to reduce negative effects of compensation code,” Patent No. 5,894,576, April 3 1999.


New Concepts

Global Code Scheduler (GCS)

Region Formation

Wavefront Scheduling

Path Vectors

Deferred Compensation

P-ready Code Motion


Scheduling Regions

Similar to Mahlke’s definition, here a region is a subgraph of a control flow graph that has a unique entry node that dominates all the nodes in the region.

There is a further restriction that the regions must be acyclic.


JS-nodes

A Join-Split (JS) edge in a CFG goes from a split node to a join node.

A split node in a CFG is a node that has more than one immediate successor.

A join node in a CFG is a node that has more than one immediate predecessor.

[Figure: CFG fragment with nodes B, C, and D, showing the JS edge from B to D.]

Removal of JS-nodes


The application of the wavefront scheduling technique requires the removal of all JS edges.

A JS edge is removed by adding an empty block (called a JS block) between the split node and the join node.
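The definitions of split node, join node, and JS block can be sketched in a few lines of Python. This is only an illustration, not the scheduler's actual code: the function name `split_js_edges`, the generated block names `JS0`, `JS1`, ..., and the three-edge fragment (B branches to C and D, C falls through to D) are assumptions made for the example.

```python
from collections import defaultdict

def split_js_edges(edges):
    """Return (new_edges, js_blocks) where every edge from a split node to a
    join node is routed through a fresh, empty JS block."""
    succs = defaultdict(set)
    preds = defaultdict(set)
    for src, dst in edges:
        succs[src].add(dst)
        preds[dst].add(src)

    new_edges = []
    js_blocks = []
    for src, dst in edges:
        # A JS edge leaves a split node (>1 successor) and enters a join node (>1 predecessor).
        is_js_edge = len(succs[src]) > 1 and len(preds[dst]) > 1
        if is_js_edge:
            block = f"JS{len(js_blocks)}"  # hypothetical naming scheme
            js_blocks.append(block)
            new_edges += [(src, block), (block, dst)]
        else:
            new_edges.append((src, dst))
    return new_edges, js_blocks

# Assumed fragment: B is a split node, D is a join node, so B -> D is a JS edge.
edges = [("B", "C"), ("B", "D"), ("C", "D")]
new_edges, js_blocks = split_js_edges(edges)
print(new_edges)
print(js_blocks)
```

After the transformation no edge connects a split node directly to a join node, because each new JS block has exactly one predecessor and one successor.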

[Figure: the same fragment with JS block G inserted on the edge from B to D.]

Interface Blocks

A side entry node is a node in the region that has at least one immediate predecessor in the region, and at least one immediate predecessor outside the region.

[Figure: example region with nodes B, C, D, and E.]

Which nodes are side entry nodes in the example?

D

A side exit node is a node in the region that has at least one immediate successor in the region, and at least one immediate successor outside the region.

Which nodes are side exit nodes in the example?

C and D


When control enters or leaves the region, GCS may require a block to schedule compensation code in. Thus interface blocks are inserted between two nodes x and y iff:

(i) x is outside of the region, y is a side entry node, and there is an edge (x,y), or

(ii) y is outside the region, x is a side exit node, and there is an edge (x,y).
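A minimal sketch of rules (i) and (ii), assuming the CFG is given as a list of edges and the region as a set of node names. The edge list below is hypothetical: it was chosen so that D is the only side entry node and C and D are the side exit nodes, matching the answers given for the example, but the outside nodes X, Y, Z and the block names `IF0`, `IF1`, ... are invented for illustration.

```python
def insert_interface_blocks(edges, region):
    """Route every boundary edge that touches a side entry or side exit node
    through a fresh interface block. Returns (new_edges, interface_blocks)."""
    region = set(region)
    preds, succs = {}, {}
    for x, y in edges:
        succs.setdefault(x, set()).add(y)
        preds.setdefault(y, set()).add(x)

    def is_side_entry(n):
        p = preds.get(n, set())
        return n in region and bool(p & region) and bool(p - region)

    def is_side_exit(n):
        s = succs.get(n, set())
        return n in region and bool(s & region) and bool(s - region)

    new_edges, interface_blocks = [], []
    for x, y in edges:
        entering = x not in region and is_side_entry(y)  # rule (i)
        leaving = y not in region and is_side_exit(x)    # rule (ii)
        if entering or leaving:
            blk = f"IF{len(interface_blocks)}"  # hypothetical naming scheme
            interface_blocks.append(blk)
            new_edges += [(x, blk), (blk, y)]
        else:
            new_edges.append((x, y))
    return new_edges, interface_blocks

# Hypothetical CFG: region {B, C, D, E}; X, Y, Z lie outside the region.
edges = [("B", "C"), ("B", "D"), ("C", "E"), ("D", "E"),
         ("X", "D"), ("C", "Y"), ("D", "Z")]
new_edges, interface_blocks = insert_interface_blocks(edges, {"B", "C", "D", "E"})
print(interface_blocks)
```

With these assumed edges the sketch inserts three interface blocks: one on the side entry into D and one on each of the side exits from C and D.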


Where do we need interface blocks in the following example?


We need three interface blocks.

[Figure: the example region with interface blocks F, G, and H inserted.]

Hierarchical Regions

For the global code scheduler, regions are hierarchical:

(1) First the code of an innermost loop is selected and scheduled.

(2) Then a summary of the data flow and resource usage of the loop is computed, and the loop is converted into a single node in the graph.


Nested Regions

[Figure: a nested region with blocks A through E and F1 through F3, shown before and after blocks G, H, I, J, and K are inserted.]

G, J, and K are JS blocks; H and I are interface blocks.

Path Vectors

There is a finite number of control paths in an acyclic scheduling region.

A path vector is a bit vector in which each bit in the vector represents a unique path in a region.

A subset of paths can be represented by a path vector by writing 1 for the paths in the subset and writing 0 for the paths not in the subset.


Paths in our Example

[Figure: example CFG with blocks A through K.]

Paths:
P0: ABCDH
P1: ABCDJE
P2: ABGDH
P3: ABGDJE
P4: AFKE
P5: AFI

We can define the subset of all paths that include basic block G as BP(G) = {P2, P3}.

And we can represent this set by the block path vector:
BPV(G) = [ 0 0 1 1 0 0 ]


           P5 P4 P3 P2 P1 P0
BPV(A) = [ 1  1  1  1  1  1 ]
BPV(B) = [ 0  0  1  1  1  1 ]
BPV(C) = [ 0  0  0  0  1  1 ]
BPV(D) = [ 0  0  1  1  1  1 ]
BPV(E) = [ 0  1  1  0  1  0 ]
BPV(F) = [ 1  1  0  0  0  0 ]
BPV(G) = [ 0  0  1  1  0  0 ]
BPV(H) = [ 0  0  0  1  0  1 ]
BPV(I) = [ 1  0  0  0  0  0 ]
BPV(J) = [ 0  0  1  0  1  0 ]
BPV(K) = [ 0  1  0  0  0  0 ]
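The block path vectors above can be recomputed mechanically from the path list. The sketch below stores each vector as a Python integer whose bit i corresponds to path Pi; that encoding and the helper names `block_path_vectors` and `show` are choices made for this example, not something prescribed by the slides.

```python
# Paths copied from the example; the list index i is the path number Pi.
PATHS = ["ABCDH", "ABCDJE", "ABGDH", "ABGDJE", "AFKE", "AFI"]

def block_path_vectors(paths):
    """Map each block to an int whose bit i is set iff the block lies on path Pi."""
    bpv = {}
    for i, path in enumerate(paths):
        for block in path:
            bpv[block] = bpv.get(block, 0) | (1 << i)
    return bpv

def show(vector, n_paths):
    """Render a path vector with the highest-numbered path on the left and P0 on the right."""
    return "[ " + " ".join(format(vector, f"0{n_paths}b")) + " ]"

BPV = block_path_vectors(PATHS)
for block in sorted(BPV):
    print(f"BPV({block}) = {show(BPV[block], len(PATHS))}")
```

Storing vectors as integers keeps the later set operations (union, intersection, difference) down to single bitwise instructions, which is presumably part of the appeal of the representation.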


Control Flow Relations

We can compute control flow relations such as dominance, post-dominance, control equivalence, disjointness, etc., by performing bitwise operations on these path vectors.

If BPV(x) = BPV(y), then blocks x and y are control flow equivalent.

If BPV(x) is a superset of BPV(y), then block x either dominates or post-dominates block y.
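These relations reduce to a handful of bitwise tests. The sketch below reuses the six paths of the running example and an integer encoding (bit i stands for path Pi); the function names `equivalent`, `covers`, and `disjoint` are illustrative and do not come from the source.

```python
PATHS = ["ABCDH", "ABCDJE", "ABGDH", "ABGDJE", "AFKE", "AFI"]

def bpv(block):
    """Block path vector as an int: bit i is set iff the block lies on path Pi."""
    return sum(1 << i for i, path in enumerate(PATHS) if block in path)

def equivalent(x, y):
    """Control flow equivalence: both blocks lie on exactly the same paths."""
    return bpv(x) == bpv(y)

def covers(x, y):
    """BPV(x) is a superset of BPV(y): x dominates or post-dominates y."""
    return (bpv(x) & bpv(y)) == bpv(y)

def disjoint(x, y):
    """No path contains both blocks."""
    return (bpv(x) & bpv(y)) == 0

print(equivalent("B", "D"))  # B and D lie on the same four paths
print(covers("A", "E"))      # every path through E also goes through A
print(disjoint("C", "G"))    # C and G are on opposite arms of a branch
```

Note that the superset test alone does not say which of dominance or post-dominance holds; distinguishing the two needs the relative order of the blocks along the shared paths.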


Example 1: What is the relation between blocks B and D?

Blocks B and D are control flow equivalent because BPV(B) = BPV(D).


Example 2: What is the relation between blocks A and E?

Block A either dominates or post-dominates block E because BPV(A) is a superset of BPV(E).


Example 3: Likewise, block E either dominates or post-dominates block K because BPV(E) is a superset of BPV(K).


Problems with Cross-Block Scheduling

Most cross-block scheduling techniques are not judicious when scheduling compensation code.

Consider that the scheduling of an instruction M in block x requires compensation code in block y.

Most schedulers cannot evaluate how desirable it is to place the compensation code in y.

Some schedulers only allow M to be scheduled in x if y has not been scheduled yet.

Compensation code is code that needs to be scheduled somewhere else to compensate for the placement of an instruction M in a block x.


Wavefront

A scheduling region is an acyclic region with JS edges eliminated and interface blocks added.

A wavefront is a strongly independent cut set that partitions a scheduling region into three parts:

nodes above the wavefront
nodes on the wavefront
nodes below the wavefront

The wavefront is strongly independent in the sense that no control flow path flows through more than one node in the wavefront.
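With path vectors, both halves of this definition become simple bit tests. The sketch below assumes that "cut set" means every path of the region crosses the wavefront, and it uses the six paths of the example CFG with JS block G already inserted; `check_wavefront` is a hypothetical helper name.

```python
PATHS = ["ABCDH", "ABCDJE", "ABGDH", "ABGDJE", "AFKE", "AFI"]
ALL_PATHS = (1 << len(PATHS)) - 1

def bpv(block):
    """Block path vector as an int: bit i is set iff the block lies on path Pi."""
    return sum(1 << i for i, path in enumerate(PATHS) if block in path)

def check_wavefront(nodes):
    """Return (is_cut, is_strongly_independent) for a candidate wavefront.

    is_cut: every path passes through at least one wavefront node (assumed
            reading of "cut set").
    is_strongly_independent: no path passes through two wavefront nodes.
    """
    covered = 0
    independent = True
    for node in nodes:
        vector = bpv(node)
        if covered & vector:      # some path already hit an earlier wavefront node
            independent = False
        covered |= vector
    return covered == ALL_PATHS, independent

print(check_wavefront(["C", "G", "F"]))  # a valid wavefront
print(check_wavefront(["C", "D", "F"]))  # C and D share paths
print(check_wavefront(["C", "F"]))       # paths through G escape the cut
```

The three candidates show the possible outcomes: a valid wavefront, a cut that is not strongly independent, and an independent set that fails to cut the region.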


Wavefront Dominance Property

The wavefront nodes collectively dominate all the nodes below the wavefront, and collectively post-dominate all the nodes above the wavefront.

Consider two blocks in the region:
Block k is not in the wavefront
Block w is in the wavefront

This property guarantees that when an instruction originally in block k is scheduled in block w, compensation code can be inserted entirely into blocks in the wavefront.


JS-nodes and Strongly Independent Cuts

[Figure: the example CFG without block G, so that the edge from B to D is a JS edge.]

Can you build a wavefront that includes C and satisfies the conditions of dominance, post-dominance, and no control path including more than one node in the wavefront?

First try: {C, F}

This wavefront neither post-dominates A and B nor dominates D, H, J, and E.


Second try: {C, D, F}

The path ABCDH includes two nodes in the wavefront, therefore the wavefront is not a strongly independent cut set.

[Figure: the example CFG with JS block G inserted between B and D.]

When the proper JS block is inserted, we can easily find a wavefront that:
(1) post-dominates all predecessors,
(2) dominates all successors, and
(3) is a strongly independent cut set (no control path includes more than one node in the wavefront).


Wavefront Scheduling

In directional scheduling (either top-down or bottom-up) there is a region of code that is already scheduled, another region that is not yet scheduled, and a boundary.

In wavefront scheduling, the wavefront is this boundary. The wavefront moves up or down according to the direction of scheduling chosen.


Example of Wavefront Scheduling

[Figure: the example CFG (blocks A through K) with successive wavefronts W0 through W6 marked.]

Deferred Compensation

[Figure: CFG with blocks A through G.]

Consider that an instruction M is originally in block A. If we want to move M downward we have to schedule M in all paths that contain a use of the variable defined by M.

For instance, assume that there is a use of the variable defined by M in G.


Path Summary:
P0 = AFG
P1 = ABDEG
P2 = ABCEG

Thus a clone of M must appear in paths P0, P1, and P2.

The compensation path vector of an instruction M is the set of all paths that must contain a clone of M when M is not scheduled in its original basic block.

CPV(M) = [1 1 1]


Assume that at wavefront W1 we decide that it is desirable to schedule a clone of M, M’, in block F.

We update CPV(M) to:
CPV(M) = CPV(M) - BPV(F) = [1 1 1] - [0 0 1] = [1 1 0]


Assume that at W2 we decide to schedule a clone of M, M’’, in block C.

CPV(M) = CPV(M) - BPV(C) = [1 1 0] - [1 0 0] = [0 1 0]


CPV(M) = [0 1 0]

Now we cannot close block D unless we schedule M.

Because BPV(D) is a superset of CPV(M) we know that this is the last compensation copy of M to be scheduled.
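The bookkeeping in this walkthrough can be replayed with integer bit vectors. The sketch below is only an illustration of the CPV update rule, with bit i standing for path Pi and set difference written as `cpv & ~bpv`; the helper name `schedule_clone` is hypothetical.

```python
PATHS = ["AFG", "ABDEG", "ABCEG"]  # P0, P1, P2 from the path summary

def bpv(block):
    """Block path vector as an int: bit i is set iff the block lies on path Pi."""
    return sum(1 << i for i, path in enumerate(PATHS) if block in path)

def schedule_clone(cpv, block):
    """Place a clone of M in `block`.

    Returns the updated compensation path vector and a flag that is True when
    no path still needs a clone, i.e. this was the last compensation copy.
    """
    remaining = cpv & ~bpv(block)  # CPV(M) - BPV(block)
    return remaining, remaining == 0

cpv = 0b111  # M has left block A: all three paths still need a clone
history = []
for block in ["F", "C", "D"]:
    cpv, done = schedule_clone(cpv, block)
    history.append((block, cpv, done))
    print(f"clone in {block}: CPV(M) = {cpv:03b}, last copy: {done}")
```

The printed vectors follow the same [P2 P1 P0] order used in the walkthrough, so the three steps read 110, 010, and 000.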


When to Move Code?

Bharadwaj, Menezes and McKinsey define the usefulness of moving code from an origin block O to a target block T in terms of the likelihood that control will flow through both T and O given that control reaches T.

\[
\frac{\mathrm{Prob}\bigl(BPV(T) \cap BPV(O)\bigr)}{\mathrm{Prob}\bigl(BPV(T)\bigr)}
\]
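As a rough illustration, the ratio can be evaluated once each path has a probability. The sketch below reuses the six paths of the earlier example CFG; the per-path probabilities in `PATH_PROB` are invented for the example (the source gives none), and the names `prob` and `usefulness` are hypothetical.

```python
from fractions import Fraction

PATHS = ["ABCDH", "ABCDJE", "ABGDH", "ABGDJE", "AFKE", "AFI"]
# Hypothetical profile data: probability of each path P0..P5 (sums to 1).
PATH_PROB = [Fraction(3, 10), Fraction(2, 10), Fraction(1, 10),
             Fraction(1, 10), Fraction(2, 10), Fraction(1, 10)]

def bpv(block):
    """Block path vector as an int: bit i is set iff the block lies on path Pi."""
    return sum(1 << i for i, path in enumerate(PATHS) if block in path)

def prob(vector):
    """Total probability of the paths selected by a path vector."""
    return sum(p for i, p in enumerate(PATH_PROB) if (vector >> i) & 1)

def usefulness(origin, target):
    """Prob(BPV(T) & BPV(O)) / Prob(BPV(T)) for origin block O and target block T."""
    t = bpv(target)
    return prob(t & bpv(origin)) / prob(t)

print(usefulness("D", "B"))  # control equivalent blocks
print(usefulness("H", "D"))  # only some paths through D reach H
print(usefulness("E", "A"))  # fraction of all paths that reach E
```

Exact fractions are used instead of floats so the ratios can be compared without rounding concerns; a real scheduler would presumably derive the probabilities from profile data or static estimates.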
