Instruction Scheduling on VLIW Architectures Spring 2011 4541.775 Topics on Compilers

Source: aces.snu.ac.kr/.../4541.775.8.VLIW.Scheduling.pdf


Instruction Scheduling

● Limited ILP

● Trace Scheduling

● Superblock Scheduling

● Hyperblock Scheduling

● Modulo Scheduling


● Insufficient ILP

● “normal” code does not contain enough ILP

● ILP within basic blocks is limited for control-intensive programs

– the problem gets worse with longer latencies

unsigned int abs_sum = 0;
for (int i = 0; i < N; i++) {
    int abs = (A[i] >= 0 ? A[i] : -A[i]);
    abs_sum += abs;
}

b0:         mov r0 ← #0
            mov r1 ← #0
            mov r2 ← N shl #2
            mov r3 ← @A
b1: .loop   ld  r4 ← mem[r3 + r1]
            bge r4, #0, .skip
b2:         not r4 ← r4
            add r4 ← r4, #1
b3: .skip   add r0 ← r0, r4
            add r1 ← r1, #4
            blt r1, r2, .loop

[Figure: control-flow graph of b0, b1, b2, b3. Block b1 holds only "ld r4 ← …" and "bge r4, …"; with a ld latency of 4 cycles, the bge has to wait 4 cycles for the loaded value.]

● ILP within basic blocks is limited for control-intensive programs

 → optimizations across basic blocks are needed

– trace scheduling (J.Fisher, 1981)

– superblock scheduling (P.Chang, 1991)

– hyperblock scheduling (S.Mahlke, 1992)


● Trace Scheduling

J.A.Fisher: Trace Scheduling: A Technique for Global Microcode Compaction (IEEE Transactions on Computers, vol.30, no.7, 1981)

● basic idea: schedule the most frequently executed trace of basic blocks as one unit

● requires compensation code if the program takes another route than expected

[Figure: code motion of "add r4 ← r0, r1" across a branch with probabilities 0.9 / 0.1; a compensation copy of the instruction is placed on the other path.]

● A trace consists of a sequence of instructions

– including branches

– but not including loops

● example: assume B1, B3, B4, B5, B7 is the most frequently executed path

[Figure: control-flow graph with basic blocks B1 to B7]
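The edges of the example graph are not recoverable from the slide text, so the following is only a minimal sketch of profile-driven trace selection under assumed data: the table `prob`, its probabilities, and the function name `select_trace` are all hypothetical. It grows a trace from a seed block by always following the most probable successor, and stops before revisiting a block, so the trace never contains a loop.

```c
#include <assert.h>
#include <stddef.h>

#define NBLOCKS 7   /* B1..B7 are stored at indices 0..6 */

/* prob[i][j] = ASSUMED probability that control goes from block i to
 * block j (0 means "no edge"). The numbers are made up for illustration. */
static const double prob[NBLOCKS][NBLOCKS] = {
    /* B1 */ {0, 0.1, 0.9, 0,   0,   0,   0  },
    /* B2 */ {0, 0,   0,   1.0, 0,   0,   0  },
    /* B3 */ {0, 0,   0,   1.0, 0,   0,   0  },
    /* B4 */ {0, 0,   0,   0,   0.8, 0.2, 0  },
    /* B5 */ {0, 0,   0,   0,   0,   0,   1.0},
    /* B6 */ {0, 0,   0,   0,   0,   0,   1.0},
    /* B7 */ {0, 0,   0,   0,   0,   0,   0  },
};

/* Grow a trace from `seed` by repeatedly following the most probable
 * successor. Stops at a block without successors, or when the best
 * successor is already in the trace (a trace never contains a loop).
 * Writes block indices to `trace` and returns the trace length. */
size_t select_trace(int seed, int trace[NBLOCKS])
{
    int in_trace[NBLOCKS] = {0};
    size_t len = 0;
    int cur = seed;

    assert(seed >= 0 && seed < NBLOCKS);
    while (cur >= 0 && !in_trace[cur]) {
        trace[len++] = cur;
        in_trace[cur] = 1;

        int best = -1;   /* most probable successor of cur, if any */
        for (int next = 0; next < NBLOCKS; next++) {
            if (prob[cur][next] > 0.0 &&
                (best < 0 || prob[cur][next] > prob[cur][best])) {
                best = next;
            }
        }
        cur = best;
    }
    return len;
}
```

With the assumed probabilities, calling `select_trace(0, trace)` yields the indices 0, 2, 3, 4, 6, which correspond to B1, B3, B4, B5, B7. Real trace selectors typically also bound the trace length and use thresholds on the branch probabilities; those refinements are left out here.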

[Figure: the control-flow graph (B1 to B7) before and after scheduling the trace as one unit]

● add compensation code if necessary

● Compensation Code

– moving an instruction below a side exit

    original:   instr 1, instr 2, instr 3, instr 4, instr 5, instr 6
    scheduled:  instr 2, instr 3, instr 4, instr 5, instr 1, instr 6
    side exit:  instr 1

– moving an instruction above a side exit (speculative execution)

    original:   instr 1, instr 2, instr 3, instr 4, instr 5, instr 6
    scheduled:  instr 1, instr 5, instr 2, instr 3, instr 4, instr 6
    side exit:  [undo instr 5]

– moving an instruction below a side entrance

– moving an instruction above a side entrance

    original:   instr 1, instr 2, instr 3, instr 4, instr 5, instr 6
    scheduled:  instr 2, instr 3, instr 4, instr 5, instr 1, instr 6
    compensation code (from the figure):  instr 5, instr 4

● Superblock Scheduling

Wen-Mei Hwu et al.: The Superblock: An Effective Technique for VLIW and Superscalar Compilation (The Journal of Supercomputing, vol. 7, issue 1-2, 1993)

● tries to overcome some difficulties with trace scheduling

– complicated book-keeping when moving instructions above/below a side entrance/exit

– some compiler optimizations require additional book-keeping when side entrances are present

example: copy propagation

● a superblock is a trace with no side entrances

 → control may only enter from the top, but may leave at one or more exit points

● similar to extended basic blocks (Aho et al., 1986)

● superblock formation:

1. identify a trace using profile information

2. apply tail duplication until all side entrances have been eliminated

● tail duplication

1. copy the tail portion of the trace, from the first side entrance to the end

2. move all side entrances to the corresponding duplicated basic blocks


● example: superblock formation
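As a stand-in for the example, here is a minimal sketch of one tail-duplication step on an adjacency-matrix CFG. Everything concrete in it is an assumption made for illustration: the edge list in `run_example`, the block numbering (B1..B7 as indices 0..6), and the names `edge`, `tail_duplicate`, and `run_example`. A side entrance is taken to be any edge into a trace block that does not come from its predecessor in the trace.

```c
#include <assert.h>

#define MAXB 16

/* edge[p][s] != 0 means there is a control-flow edge p -> s. */
static int edge[MAXB][MAXB];

/* One tail-duplication step for `trace` (block indices, length `len`) in
 * a CFG with `n` blocks. Returns the new number of blocks. */
int tail_duplicate(int n, const int *trace, int len)
{
    int pos[MAXB], copy[MAXB];
    int first = -1;

    for (int b = 0; b < MAXB; b++) pos[b] = -1;
    for (int i = 0; i < len; i++) pos[trace[i]] = i;

    /* Find the first trace block that is entered from anywhere other
     * than its predecessor in the trace (a side entrance). */
    for (int i = 1; i < len && first < 0; i++)
        for (int p = 0; p < n; p++)
            if (edge[p][trace[i]] && p != trace[i - 1]) { first = i; break; }
    if (first < 0) return n;              /* already a superblock */

    /* Step 1: copy the tail trace[first..len-1]; copies get new indices.
     * Edges that stay inside the tail are redirected to the copies. */
    assert(n + (len - first) <= MAXB);
    for (int i = first; i < len; i++) copy[i] = n + (i - first);
    for (int i = first; i < len; i++)
        for (int s = 0; s < n; s++)
            if (edge[trace[i]][s]) {
                int target = (pos[s] >= first) ? copy[pos[s]] : s;
                edge[copy[i]][target] = 1;
            }

    /* Step 2: move every side entrance into the tail over to the copies. */
    for (int i = first; i < len; i++)
        for (int p = 0; p < n; p++)
            if (edge[p][trace[i]] && p != trace[i - 1]) {
                edge[p][trace[i]] = 0;
                edge[p][copy[i]] = 1;
            }

    return n + (len - first);
}

/* ASSUMED example CFG; the trace is B1, B3, B4, B5, B7. */
int run_example(void)
{
    static const int edges[][2] = {
        {0, 1}, {0, 2}, {1, 3}, {2, 3}, {3, 4}, {3, 5}, {4, 6}, {5, 6}
    };
    static const int trace[] = {0, 2, 3, 4, 6};

    for (int p = 0; p < MAXB; p++)
        for (int s = 0; s < MAXB; s++) edge[p][s] = 0;
    for (unsigned k = 0; k < sizeof edges / sizeof edges[0]; k++)
        edge[edges[k][0]][edges[k][1]] = 1;

    return tail_duplicate(7, trace, 5);
}
```

In the assumed graph the first side entrance is B2 → B4, so B4, B5 and B7 are copied to the new blocks 7, 8 and 9. The edges B2 → B4 and B6 → B7 then lead to the copies, and the original trace can only be entered at B1. The sketch ignores self-loops and back edges into the trace.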

● superblock ILP optimizations

optimizations that are performed on superblocks before scheduling, with the goal of enlarging the superblock and increasing ILP by removing dependences

● superblock enlarging optimizations

– branch target expansion

  ● expand the target of the likely taken control transfer that ends a superblock

  ● not applied to back edges

  ● stops when a predefined superblock size is reached or the branch does not favor one direction

– loop peeling

  ● applied to superblock loops (superblocks that end with a likely taken control transfer to themselves) that tend to iterate only a few (k) times

  ● peel the first k iterations and insert control flow to branch to the original loop body if the loop is not executed k times

  ● after loop peeling, the superblock may be extended both at the head and the tail of the superblock loop

– loop unrolling

  ● unroll the body of a superblock loop that tends to iterate many times
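Loop peeling can be illustrated at the source level. The following is a minimal sketch that assumes k = 2 and a simple summation loop; the function names `sum_loop` and `sum_peeled` are hypothetical. The peeled iterations become straight-line code that can be merged into the surrounding superblock, and the original loop is kept for the rare case of more than k iterations.

```c
#include <assert.h>
#include <stddef.h>

/* Original loop: sums the first n elements of a. */
long sum_loop(const int *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++) s += a[i];
    return s;
}

/* Same computation with the first k = 2 iterations peeled. */
long sum_peeled(const int *a, size_t n)
{
    long s = 0;
    size_t i = 0;

    assert(a != NULL || n == 0);
    if (i >= n) return s;     /* loop body runs 0 times */
    s += a[i++];              /* peeled iteration 1 */
    if (i >= n) return s;     /* loop body runs 1 time */
    s += a[i++];              /* peeled iteration 2 */
    if (i >= n) return s;     /* loop body runs 2 times: the expected case */

    /* Unlikely case: more than k iterations, continue in the original loop. */
    for (; i < n; i++) s += a[i];
    return s;
}
```

Both functions return the same value for every n; only the control flow differs.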

● superblock dependence removing optimizations

remove data dependences between instructions in a superblock

– register renaming

  e.g., in unrolled loop bodies

– operation migration

  ● move instructions whose result is not used within a superblock to a less frequently executed superblock

  ● decision based on a cost function

– induction variable expansion

  ● create a separate copy of the loop induction variable for each unrolled loop body

  ● requires additional patch code at the loop preheader and at exits

– accumulator variable expansion

  ● use a separate accumulator for each unrolled instance of loops accumulating a sum or product in every iteration

  ● additional patch code at the loop preheader needed

  ● additional patch code at the loop exits needed (summing up the individual accumulators)

– operation combining

  ● for certain classes of instructions, true dependences can be eliminated by pre-computing new immediate values at compile time

  ● example:

    add x ← x, #4
    add x ← x, #4

    add x ← x, #4
    add x' ← x, #8
    …
    mov x ← x'

● example: superblock dependence removing optimizations

[Figure: accumulator variable expansion and induction variable expansion]
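The two expansions can be sketched at the source level. This is an illustration under assumptions, not the slide's example: the loop is a plain summation unrolled twice, and the names `sum_plain`, `sum_expanded`, `s0`, `s1`, `i0`, `i1` are hypothetical. Each unrolled copy of the body gets its own accumulator and its own induction variable, with patch code in the preheader and at the exit.

```c
#include <assert.h>
#include <stddef.h>

/* Reference loop: every addition depends on the previous one. */
unsigned sum_plain(const unsigned *a, size_t n)
{
    unsigned sum = 0;
    for (size_t i = 0; i < n; i++) sum += a[i];
    return sum;
}

unsigned sum_expanded(const unsigned *a, size_t n)
{
    assert(a != NULL || n == 0);

    /* Preheader patch code: one accumulator and one induction variable
     * per unrolled copy of the loop body. */
    unsigned s0 = 0, s1 = 0;
    size_t i0 = 0, i1 = 1;

    /* Unrolled twice: the two copies share no variable, so neither the
     * additions nor the index updates depend on each other. */
    while (i1 < n) {
        s0 += a[i0];  i0 += 2;
        s1 += a[i1];  i1 += 2;
    }

    /* Exit patch code: handle a leftover element, then sum up the
     * individual accumulators. */
    if (i0 < n) s0 += a[i0];
    return s0 + s1;
}
```

Unsigned arithmetic is used so that reordering the additions cannot change the result; for floating-point sums the reassociation would generally alter rounding.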

● speculative execution

– occurs when moving an instruction up above a control transfer instruction B

– the instruction is executed in any case, even if the control transfer instruction would branch out of the superblock (such instructions are called speculative instructions)

– restrictions for an instruction I to be executed speculatively

1. the destination of I is not used before it is redefined when B is taken

2. I will never cause an exception that may terminate the program when B is taken

– instructions that may cause exceptions

  ● memory load

  ● memory store

  ● integer divide

  ● floating point operations

– exception models

  ● restricted percolation model

    no support for disregarding exceptions generated by speculatively executed instructions

    – limits performance in superblocks that contain many long-latency, potentially trap-causing instructions (e.g., memory loads) above branches

  ● general percolation model

    the architecture provides non-trapping versions of instructions that may cause exceptions

    – convert speculatively executed, potentially trapping instructions to their non-trapping counterparts

    – if detection of the exception is required, additional architecture and compiler support is needed
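The two restrictions and the two exception models can be combined into a small legality check. This is a minimal sketch under assumptions: the types `Instr`, `OpKind`, `ExceptionModel`, the 32-register liveness bit mask, and the function `can_hoist_above_branch` are hypothetical, and stores are conservatively never hoisted because their destination is memory rather than a register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { OP_ALU, OP_LOAD, OP_STORE, OP_INT_DIV, OP_FP } OpKind;
typedef enum { RESTRICTED_PERCOLATION, GENERAL_PERCOLATION } ExceptionModel;

typedef struct {
    OpKind kind;
    int    dest;   /* destination register number, 0..31 (assumed) */
} Instr;

/* The instruction kinds listed as potentially causing exceptions. */
bool may_trap(OpKind k) { return k != OP_ALU; }

/* live_when_taken: bit r is set if register r is used before it is
 * redefined on the path followed when the branch B is taken. */
bool can_hoist_above_branch(Instr in, uint32_t live_when_taken,
                            ExceptionModel model)
{
    assert(in.dest >= 0 && in.dest < 32);

    /* Simplification of this sketch: a store changes memory, which the
     * register mask does not track, so it is never hoisted. */
    if (in.kind == OP_STORE) return false;

    /* Restriction 1: the destination must not be used before it is
     * redefined when B is taken. */
    if (live_when_taken & (UINT32_C(1) << in.dest)) return false;

    /* Restriction 2: no program-terminating exception when B is taken.
     * Under general percolation the instruction is assumed to be
     * converted to its non-trapping version, which satisfies this. */
    if (may_trap(in.kind) && model == RESTRICTED_PERCOLATION) return false;

    return true;
}
```

For a load into r4 with r4 dead on the taken path, the check fails under restricted percolation and succeeds under general percolation, which is the performance difference between the two models.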

● Analysis

– implementation complexity in the IMPACT-I C compiler

total size: ~92K lines

– compilation time (IMPACT-I)

– performance improvement due to superblock ILP optimization

– effect of speculative execution support

– code size increase

● Hyperblock Scheduling

Scott Mahlke et al.: Effective Compiler Support for Predicated Execution Using the Hyperblock (MICRO-25, 1992)

● tries to overcome some difficulties with superblock scheduling

– superblocks end when both targets of a control flow instruction have a similar probability of being taken

● hyperblock scheduling

– combine basic blocks from multiple control paths (using if-conversion)

– for programs without heavily biased branches, hyperblocks provide a more flexible framework

● Predicated execution

– When the predicate is TRUE the instruction is executed normally

– When the predicate is FALSE the instruction is treated as a NOP

● Conditional branches can be eliminated with predicated execution (if-conversion)
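If-conversion can be imitated in C on the abs_sum loop from the beginning of the lecture. This is only a source-level sketch: C has no predicated instructions, so the predicate is modeled with a bit mask, and the function names `abs_sum_branch` and `abs_sum_predicated` are hypothetical. The "not" and "add #1" instructions of the assembly version always execute, but only take effect when the predicate is 1.

```c
#include <assert.h>
#include <stddef.h>

/* Branching version: the negate block is skipped when a[i] >= 0. */
unsigned abs_sum_branch(const int *a, size_t n)
{
    unsigned sum = 0;
    assert(a != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        unsigned v = (unsigned)a[i];
        if (a[i] < 0) v = 0u - v;   /* two's complement negate */
        sum += v;
    }
    return sum;
}

/* If-converted version: the compare sets a predicate p, and the two
 * negate instructions act as NOPs when p is 0. */
unsigned abs_sum_predicated(const int *a, size_t n)
{
    unsigned sum = 0;
    assert(a != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        unsigned v = (unsigned)a[i];
        unsigned p = (a[i] < 0);    /* p = 1 if negative, else 0      */
        unsigned mask = 0u - p;     /* all ones if p, all zeros if !p */
        v ^= mask;                  /* (p) not v <- v                 */
        v += p;                     /* (p) add v <- v, #1             */
        sum += v;
    }
    return sum;
}
```

The loop body of the second version contains no branch, so the former blocks b1, b2 and b3 form a single block that the scheduler can treat as one unit.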

● The Hyperblock

– set of predicated basic blocks in which control may only enter at the top, but several exits may exist

– very similar to superblock formation

● Building Hyperblocks

1. hyperblock block selection

  ● decide which basic blocks in a region should be included in the hyperblock

  ● three features of each block are examined

    – execution frequency

    – block size

    – instruction characteristics

  ● use heuristic functions

2. hyperblock formation

  ● tail duplication

  ● loop peeling

  ● node splitting

    – eliminate dependences created by control path merges

    – duplicate all blocks subsequent to the merge point for each path

  ● if-conversion


● Control Flow Information

– instructions within a hyperblock are not sequential → a more complex analysis is required

● Predicate Hierarchy Graph (PHG)

– determine if two instructions can ever be executed in a single path

– if they can, then there is a control flow path between these two instructions

● Predicate Hierarchy Graph (PHG) example

same path: AND

    p4 = c1 ∙ c2

multiple paths meet: OR

    p5 = ~c1 + c1 ∙ ~c2

ANDing p4 and p5:

    p4 ∙ p5 = (c1 ∙ c2) ∙ (~c1 + c1 ∙ ~c2) = 0

 → there is no viable path between p4 and p5
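The result p4 ∙ p5 = 0 can be checked mechanically by enumerating all values of the conditions c1 and c2. The following minimal sketch does this by brute force; the function names and the `Predicate` type are hypothetical, and a real PHG implementation would work on the graph rather than on truth tables.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef bool (*Predicate)(bool c1, bool c2);

/* The two predicates of the example. */
bool p4(bool c1, bool c2) { return c1 && c2; }
bool p5(bool c1, bool c2) { return !c1 || (c1 && !c2); }

/* True if some assignment of the conditions makes both predicates true,
 * i.e. the two predicated instructions can execute on the same path. */
bool can_execute_together(Predicate a, Predicate b)
{
    assert(a != NULL && b != NULL);
    for (int c1 = 0; c1 <= 1; c1++)
        for (int c2 = 0; c2 <= 1; c2++)
            if (a(c1, c2) && b(c1, c2)) return true;
    return false;
}
```

`can_execute_together(p4, p5)` is false for this example, so the scheduler does not need to honor dependences between instructions guarded by p4 and instructions guarded by p5.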

● Hyperblock Scheduling

● Hyperblock-Specific Optimizations

– similar to optimizations for superblocks

– instruction promotion

  ● removes the dependence between the predicated instruction and the instruction which sets the corresponding predicate value

– instruction merging

  ● combine two instructions in a hyperblock with complementary predicates into a single instruction


● Summary

● Trace Scheduling can increase ILP 

– side entrances are too complex to handle

● Superblock Scheduling removes the side entrances from the trace

– weak point: unbiased branches

● Hyperblock Scheduling

– for programs without heavily biased branches, hyperblocks provide a more flexible framework

● Modulo Scheduling → next class!