CMPUT 680 - Compiler Design and Optimization
CMPUT229 - Fall 2003
Topic G: IA-64 Highlights
José Nelson Amaral
http://www.cs.ualberta.ca/~amaral/courses/680
Some Highlights of the EPIC Architecture
Control Speculation
Data Speculation
Predication
Rotating Registers
Hardware-Supported Software Pipelining
Control Speculation
Before Control Speculation:
    br.cond.dptk L1
    ld8 r3=[r5]
    shr r7=r3,r87

After Control Speculation:
    ld8.s r3=[r5]
    br.cond.dptk L1
    chk.s r3, recovery
    shr r7=r3,r87
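A minimal Python sketch of the idea, not a full model of IA-64 semantics: the speculative load (ld8.s in IA-64) defers a fault by poisoning its result, and the check (chk.s) runs recovery only when the poisoned value is actually needed. The dictionary memory, the NAT sentinel, and the function names are assumptions made for illustration.

```python
# Hypothetical sentinel standing in for the NaT ("not a thing") bit on a register.
NAT = object()


def ld8_s(memory, addr):
    """Speculative load: return NAT instead of faulting on an unmapped address."""
    return memory.get(addr, NAT)


def chk_s(value):
    """Return True when the speculated value is poisoned and recovery must run."""
    return value is NAT


def run(memory, r5, r87, take_branch):
    r3 = ld8_s(memory, r5)      # load hoisted above the branch
    if take_branch:             # br.cond.dptk L1
        return None             # result never used, so no fault is ever raised
    if chk_s(r3):               # chk.s r3, recovery
        r3 = memory[r5]         # recovery: non-speculative reload (may fault here)
    return r3 >> r87            # shr r7=r3,r87
```

When the branch is taken, a bad address in r5 costs nothing; when it is not taken, the fault surfaces at the point where the original program would have raised it.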
Data Speculation
An advanced load allows a load to be moved above a store even if it is not known whether the load and the store may reference overlapping memory locations.
Before:
    st8 [r55]=r45        // r55 may or may not contain
    ld8 r3=[r5] ;;       // the same address as r5
    shr r7=r3,r87

After:
    ld8.a r3=[r5] ;;     // Advanced Load
    // other, unrelated instructions
    st8 [r55]=r45
    ld8.c r3=[r5] ;;
    shr r7=r3,r87
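The mechanism can be sketched in Python with a toy version of the table that tracks advanced loads (the ALAT in IA-64). This is an illustration under simplifying assumptions: the class and method names are hypothetical, and entries are matched on exact address equality, whereas real hardware compares address ranges.

```python
class Alat:
    """Toy advanced-load table: target register name -> address it was loaded from."""

    def __init__(self):
        self.entries = {}

    def advanced_load(self, regs, mem, dst, addr):
        """ld8.a: perform the load early and remember where it came from."""
        regs[dst] = mem[addr]
        self.entries[dst] = addr

    def store(self, mem, addr, value):
        """st8: write memory and drop any entry tracking the same address."""
        mem[addr] = value
        self.entries = {r: a for r, a in self.entries.items() if a != addr}

    def check_load(self, regs, mem, dst, addr):
        """ld8.c: reload only if the entry was invalidated. Returns True on reload."""
        if self.entries.get(dst) != addr:
            regs[dst] = mem[addr]
            return True
        return False


def speculative_sequence(mem, r5, r55, r45, r87):
    regs, alat = {}, Alat()
    alat.advanced_load(regs, mem, "r3", r5)      # ld8.a r3=[r5]
    alat.store(mem, r55, r45)                    # st8 [r55]=r45
    reloaded = alat.check_load(regs, mem, "r3", r5)  # ld8.c r3=[r5]
    return regs["r3"] >> r87, reloaded           # shr r7=r3,r87
```

If r55 and r5 differ, the early value is used as is; if they coincide, the check load fetches the value the store just wrote.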
Moving Up Loads + Uses: Recovery Code
Original Code:
    st8 [r4] = r12        // cycle 0: ambiguous store
    ld8 r6 = [r8] ;;      // cycle 0: load to advance
    add r5 = r6,r7        // cycle 2
    st8 [r18] = r5        // cycle 3

Speculative Code:
    ld8.a r6 = [r8] ;;    // cycle -3
    add r5 = r6,r7        // cycle -1; add that uses r6
    st8 [r4]=r12          // cycle 0
    chk.a r6, recover     // cycle 0: check
back:                     // Return point from jump to recover
    st8 [r18] = r5        // cycle 0

recover:
    ld8 r6 = [r8] ;;      // Reload r6 from [r8]
    add r5 = r6,r7        // Re-execute the add
    br back               // Jump back to main code
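A small Python sketch of why the recovery block must redo the add as well as the load. Both schedules are written as plain functions over a dictionary memory so their results can be compared; the function names and the single-address alias test are assumptions for illustration only.

```python
def original(mem, r4, r8, r12, r18, r7):
    """Program order: store, then load, add, store."""
    mem[r4] = r12          # st8 [r4] = r12
    r6 = mem[r8]           # ld8 r6 = [r8]
    r5 = r6 + r7           # add r5 = r6,r7
    mem[r18] = r5          # st8 [r18] = r5
    return mem


def speculative(mem, r4, r8, r12, r18, r7):
    """Load and its dependent add hoisted above the ambiguous store."""
    r6 = mem[r8]           # ld8.a r6 = [r8]
    tracked = r8           # address remembered for the advanced load
    r5 = r6 + r7           # add executed early on the speculated value
    mem[r4] = r12          # st8 [r4] = r12
    if r4 == tracked:      # chk.a r6, recover: the store hit the tracked address
        r6 = mem[r8]       # recover: reload r6
        r5 = r6 + r7       # recover: re-execute the add
    mem[r18] = r5          # back: st8 [r18] = r5
    return mem
```

Without the second add in the recovery path, r5 would still hold the sum computed from the stale r6.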
If-conversion
If-conversion uses predicates to transform conditional code into a single stream of control.
if (r4) {
    add r1 = r2, r3
    ld8 r6 = [r5]
}

        cmp.ne p1, p0 = r4, 0 ;;   // Set predicate reg
(p1)    add r1 = r2, r3
(p1)    ld8 r6 = [r5]

if (r1)
    r2 = r3 + r4
else
    r7 = r6 - r5

        cmp.ne p1, p2 = r1, 0 ;;   // Set predicate reg
(p1)    add r2 = r3, r4
(p2)    sub r7 = r6, r5
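The second example can be mimicked in Python to show that the predicated form gives the same results as the branch. This is only a sketch: the helper predicated() is a hypothetical stand-in for a qualifying predicate, the operands follow the assembly listing (r3 and r4 for the add), and None marks a register that was never written.

```python
def predicated(pred, new_value, old_value):
    """Commit new_value only when the qualifying predicate is true."""
    return new_value if pred else old_value


def branchy(r1, r3, r4, r5, r6):
    """The if/else as written, with a real control-flow split."""
    r2 = r7 = None
    if r1 != 0:
        r2 = r3 + r4
    else:
        r7 = r6 - r5
    return r2, r7


def if_converted(r1, r3, r4, r5, r6):
    """Straight-line version: both operations are evaluated, one is committed."""
    r2 = r7 = None
    p1 = r1 != 0                        # cmp.ne p1, p2 = r1, 0
    p2 = not p1                         # p2 receives the complement
    r2 = predicated(p1, r3 + r4, r2)    # (p1) add r2 = r3, r4
    r7 = predicated(p2, r6 - r5, r7)    # (p2) sub r7 = r6, r5
    return r2, r7
```

Both arithmetic results are computed on every call; the predicates decide only which one becomes visible.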
In the old days…
for (k=1; k<=5; k++) y[k] = x[k]+1;

MIPS Assembly:
# $a0 = x[]
# $a1 = y[]
# $t0 = k

        addi $t0, $zero, 1
        addi $t1, $zero, 5
Loop:   sll  $t2, $t0, 2
        add  $t3, $a0, $t2
        lw   $t4, 0($t3)
        addi $t4, $t4, 1
        add  $t5, $a1, $t2
        sw   $t4, 0($t5)
        addi $t0, $t0, 1
        ble  $t0, $t1, Loop
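For contrast with the loop above, here is a hedged Python sketch of the same computation software-pipelined by hand, as a compiler would have to do it without hardware support. The load, add, and store of three different iterations overlap in the kernel, and an explicit prolog and epilog fill and drain the pipeline. The function name and the 0-based indexing are assumptions for illustration.

```python
def pipelined_add_one(x):
    """Compute y[k] = x[k] + 1 with three overlapped stages (needs len(x) >= 2)."""
    n = len(x)
    y = [None] * n

    # Prolog: fill the pipeline.
    loaded = x[0]              # load(0)
    summed = loaded + 1        # add(0)
    loaded = x[1]              # load(1)

    # Kernel: each trip does work for three different iterations.
    # Order matters in this sequential model so no value is overwritten early.
    for k in range(2, n):
        y[k - 2] = summed      # store(k-2)
        summed = loaded + 1    # add(k-1)
        loaded = x[k]          # load(k)

    # Epilog: drain the pipeline.
    y[n - 2] = summed          # store(n-2)
    summed = loaded + 1        # add(n-1)
    y[n - 1] = summed          # store(n-1)
    return y
```

The prolog and epilog are extra code that exists only to start and stop the overlap, which is the overhead the hardware-supported scheme is meant to avoid.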
Software Pipelining Example in the IA-64
loop:
    (p16) ld1 r32 = [r12], 1
    (p17) add r34 = 1, r33
    (p18) st1 [r13] = r35, 1
          br.ctop loop
[Animation, slides 8-44: the loop above runs over memory holding x1..x5, starting with LC = 4, EC = 3, RRB = 0. Each frame shows the predicate registers p16-p18, LC, EC, RRB, memory, and the mapping of logical registers r32-r39 onto the physical rotating registers. The state after each br.ctop is summarized below.]

br.ctop   LC  EC  RRB   p16 p17 p18   Work done in the following pass
(start)    4   3    0     1   0   0   load x1
1          3   3   -1     1   1   0   load x2, compute y1
2          2   3   -2     1   1   1   load x3, compute y2, store y1
3          1   3   -3     1   1   1   load x4, compute y3, store y2
4          0   3   -4     1   1   1   load x5, compute y4, store y3
5          0   2   -5     0   1   1   compute y5, store y4
6          0   1   -6     0   0   1   store y5
7          0   0   -7     0   0   0   (fall through, loop exits)
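The register rotation in this example can be reproduced with a short Python simulation. It is a sketch under stated assumptions: eight rotating registers (r32-r39), a list standing in for memory, and a loop that always starts with EC >= 1. The function name and the trace format are hypothetical.

```python
def run_ctop_loop(x, stages=3, n_rot=8):
    """Run the load / add 1 / store loop over x (len(x) >= 1) with rotating registers."""
    regs = [None] * n_rot                 # physical rotating registers
    p16, p17, p18 = True, False, False    # stage predicates
    lc, ec, rrb = len(x) - 1, stages, 0   # loop count, epilog count, rotating base
    src = 0                               # stands in for the post-incremented r12
    y = []                                # stands in for memory written through r13
    trace = [(lc, ec, rrb)]

    def phys(logical):
        """Map a logical register number to a physical index under the current RRB."""
        return (logical - 32 + rrb) % n_rot

    while True:
        if p16:                           # (p16) ld1 r32 = [r12], 1
            regs[phys(32)] = x[src]
            src += 1
        if p17:                           # (p17) add r34 = 1, r33
            regs[phys(34)] = regs[phys(33)] + 1
        if p18:                           # (p18) st1 [r13] = r35, 1
            y.append(regs[phys(35)])

        # br.ctop: pick the next p16, update a counter, and decide whether to branch.
        if lc > 0:                        # prolog / kernel
            lc, new_p16, taken = lc - 1, True, True
        elif ec > 1:                      # epilog, still draining
            ec, new_p16, taken = ec - 1, False, True
        else:                             # ec == 1 (assumed): last epilog pass
            ec, new_p16, taken = ec - 1, False, False
        rrb -= 1                          # rotate: r32 becomes r33, p16 becomes p17
        p16, p17, p18 = new_p16, p16, p17
        trace.append((lc, ec, rrb))
        if not taken:
            return y, trace
```

Calling run_ctop_loop on a five-element list returns the incremented values together with one (LC, EC, RRB) tuple per br.ctop, which can be compared against the frames of the animation.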
The Software Pipelining Branch Instruction
LC?
  != 0 (prolog/kernel): RRB--, LC--, PR[16]=1, branch
  = 0 (epilog): EC?
    > 1: RRB--, EC--, PR[16]=0, branch
    = 1: RRB--, EC--, PR[16]=0, fall-thru
    = 0: EC unchanged, PR[16]=0, fall-thru

LC = Loop Counter
EC = Epilog Counter
RRB = Rotating Register Base
PR = Predicate Register
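The decision logic of the branch can be written as a small Python function. This is a sketch of the flowchart only: the function names and the tuple layout are assumptions, and PR[16] is reported as the value the predicate holds after rotation.

```python
def br_ctop(lc, ec, rrb):
    """Return (lc, ec, rrb, pr16, taken) after one br.ctop."""
    if lc != 0:                          # prolog / kernel
        return lc - 1, ec, rrb - 1, 1, True
    if ec > 1:                           # epilog, pipeline still draining
        return lc, ec - 1, rrb - 1, 0, True
    if ec == 1:                          # last epilog stage
        return lc, 0, rrb - 1, 0, False
    return lc, ec, rrb, 0, False         # ec == 0: no rotation, fall through


def trips(lc, ec):
    """Count how many times the loop body runs before br.ctop falls through."""
    rrb, count = 0, 0
    while True:
        count += 1                       # body executes, then br.ctop decides
        lc, ec, rrb, _, taken = br_ctop(lc, ec, rrb)
        if not taken:
            return count
```

With LC = 4 and EC = 3 the body runs seven times: five passes that start a new iteration plus two that only drain the pipeline.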