CMPUT 680 - Compiler Design and Optimization
CMPUT229 - Fall 2003
Topic G: IA-64 Highlights
José Nelson Amaral
http://www.cs.ualberta.ca/~amaral/courses/680


Some Highlights of the EPIC Architecture

  • Control Speculation
  • Data Speculation
  • Predication
  • Rotating Registers
  • Hardware-Supported Software Pipelining

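Control Speculation

Before Control Speculation

    br.cond.dptk L1        // branch that guards the load
    ld8 r3=[r5]            // load executes only on the fall-through path
    shr r7=r3,r87          // use of the loaded value

After Control Speculation

    ld8.s r3=[r5]          // speculative load hoisted above the branch; any fault is deferred
    br.cond.dptk L1
    chk.s r3, recovery     // branch to recovery code if the speculative load deferred a fault
    shr r7=r3,r87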
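The effect of hoisting the load can be modelled in a few lines of Python. This is only a toy sketch under stated assumptions: `NAT`, `ld_s`, `chk_s`, and `guarded_use` are hypothetical names, memory is a plain dict, and a missing key stands in for a faulting address.

```python
# Toy model of control speculation (all names are illustrative assumptions).
NAT = object()  # stands in for the deferred-fault marker on the target register


def ld_s(memory, addr):
    """Speculative load: return the value, or NAT instead of raising a fault."""
    return memory.get(addr, NAT)


def chk_s(value, recovery):
    """Check: if the speculative load deferred a fault, run the recovery code."""
    return recovery() if value is NAT else value


def guarded_use(memory, addr, branch_taken, shift):
    r3 = ld_s(memory, addr)          # load hoisted above the branch
    if branch_taken:                 # br.cond taken: the loaded value is never used
        return None
    # Recovery repeats the load non-speculatively, so a real fault surfaces here.
    r3 = chk_s(r3, lambda: memory[addr])
    return r3 >> shift               # shr r7 = r3, r87
```

In this model a bad address costs nothing when the branch is taken, and raises `KeyError` only on the path that actually needs the value.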

Data Speculation

An advanced load allows a load to be moved above a store even if it is not known whether the load and the store may reference overlapping memory locations.

Before Data Speculation

    st8 [r55]=r45          // r55 may or may not contain
    ld8 r3=[r5] ;;         // the same address as r5
    shr r7=r3,r87

After Data Speculation

    ld8.a r3=[r5] ;;       // Advanced Load
    // other, unrelated instructions
    st8 [r55]=r45
    ld8.c r3=[r5] ;;       // check load: reloads r3 only if the store overlapped [r5]
    shr r7=r3,r87
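The bookkeeping behind an advanced load and its check load can be sketched as follows. IA-64 tracks advanced loads in the Advanced Load Address Table (ALAT); the model below reduces it to a dict keyed by target register and ignores table size and partial address matching. `Machine` and `run` are hypothetical names introduced only for this illustration.

```python
# Toy model of data speculation; the ALAT is simplified to a plain dict.
class Machine:
    def __init__(self, memory):
        self.mem = dict(memory)
        self.reg = {}
        self.alat = {}      # register name -> address of its advanced load
        self.reloads = 0    # number of check loads that had to re-read memory

    def ld_a(self, dst, addr):
        """Advanced load: load early and record the address in the ALAT."""
        self.reg[dst] = self.mem[addr]
        self.alat[dst] = addr

    def st(self, addr, value):
        """Store: invalidate every ALAT entry that matches the stored address."""
        self.mem[addr] = value
        self.alat = {r: a for r, a in self.alat.items() if a != addr}

    def ld_c(self, dst, addr):
        """Check load: reload only if the ALAT entry is no longer valid."""
        if self.alat.get(dst) != addr:
            self.reg[dst] = self.mem[addr]
            self.reloads += 1


def run(store_addr):
    m = Machine({0x10: 7, 0x20: 0})
    m.ld_a("r3", 0x10)       # ld8.a r3=[r5]
    m.st(store_addr, 99)     # st8 [r55]=r45
    m.ld_c("r3", 0x10)       # ld8.c r3=[r5]
    return m.reg["r3"], m.reloads
```

Calling `run(0x20)` models a store to a different address, so the early value is kept without a reload; `run(0x10)` models an overlapping store, which forces the check load to fetch the stored value.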

Moving Up Loads + Uses: Recovery Code

Original Code

    st8 [r4] = r12         // cycle 0: ambiguous store
    ld8 r6 = [r8] ;;       // cycle 0: load to advance
    add r5 = r6,r7         // cycle 2
    st8 [r18] = r5         // cycle 3

Speculative Code

    ld8.a r6 = [r8] ;;     // cycle -3
    add r5 = r6,r7         // cycle -1; add that uses r6
    st8 [r4]=r12           // cycle 0
    chk.a r6, recover      // cycle 0: check
    back:                  // Return point from jump to recover
    st8 [r18] = r5         // cycle 0

    recover:
    ld8 r6 = [r8] ;;       // Reload r6 from [r8]
    add r5 = r6,r7         // Re-execute the add
    br back                // Jump back to main code
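When a use is hoisted together with the load, the check has to redo both. A minimal Python sketch of the speculative schedule above, assuming a dict for memory and a single boolean in place of the hardware's advanced-load tracking; `speculative_schedule` and its parameters are illustrative names only.

```python
def speculative_schedule(mem, a_store, a_load, r12, r7):
    """Mirror the speculative code: mem maps address -> value."""
    r6 = mem[a_load]                 # ld8.a r6 = [r8]
    entry_valid = True               # the advanced load allocated a tracking entry
    r5 = r6 + r7                     # add hoisted together with the load
    mem[a_store] = r12               # st8 [r4] = r12
    if a_store == a_load:            # an overlapping store invalidates the entry
        entry_valid = False
    recovered = False
    if not entry_valid:              # chk.a r6, recover
        r6 = mem[a_load]             # recover: reload r6 from memory
        r5 = r6 + r7                 #          re-execute the add
        recovered = True             #          br back
    return r5, recovered             # r5 is what st8 [r18] = r5 would store
```

With distinct addresses the early result is used as-is; with `a_store == a_load` the recovery path recomputes `r5` from the stored value, matching what the original, non-speculative order would have produced.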

If-conversion

If-conversion uses predicates to transform conditional code into a single control stream.

    if (r4) {
        add r1 = r2, r3
        ld8 r6 = [r5]
    }

becomes

    cmp.ne p1, p0 = r4, 0 ;;   // Set predicate reg
    (p1) add r1 = r2, r3
    (p1) ld8 r6 = [r5]

and

    if (r1)
        r2 = r3 + r4
    else
        r7 = r6 - r5

becomes

    cmp.ne p1, p2 = r1, 0 ;;   // Set predicate reg
    (p1) add r2 = r3, r4
    (p2) sub r7 = r6, r5
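Predicated execution can be mimicked in Python by making every operation conditional on a flag instead of branching around it. This is a sketch of the idea only; `predicated` and `if_converted` are hypothetical helpers, and the default values for `r2` and `r7` stand in for whatever the registers held before.

```python
def predicated(pred, op, old):
    """Return op() if the qualifying predicate is true, otherwise keep old."""
    return op() if pred else old


def if_converted(r1, r3, r4, r5, r6, r2=0, r7=0):
    p1 = r1 != 0                                 # cmp.ne p1, p2 = r1, 0
    p2 = not p1                                  # p2 receives the complement
    r2 = predicated(p1, lambda: r3 + r4, r2)     # (p1) add r2 = r3, r4
    r7 = predicated(p2, lambda: r6 - r5, r7)     # (p2) sub r7 = r6, r5
    return r2, r7
```

Both operations appear in one straight-line sequence; exactly one of them takes effect, which is the point of removing the branch.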

In the old days….

    for(k=1 ; k<=5 ; k++) y[k] = x[k]+1;

MIPS Assembly:

    # $a0 = x[]
    # $a1 = y[]
    # $t0 = k

            addi $t0, $zero, 1      # k = 1
            addi $t1, $zero, 5      # loop bound
    Loop:   sll  $t2, $t0, 2        # byte offset = k * 4
            add  $t3, $a0, $t2      # address of x[k]
            lw   $t4, 0($t3)        # load x[k]
            addi $t4, $t4, 1        # x[k] + 1
            add  $t5, $a1, $t2      # address of y[k]
            sw   $t4, 0($t5)        # store y[k]
            addi $t0, $t0, 1        # k++
            ble  $t0, $t1, Loop     # repeat while k <= 5

Software Pipelining Example in the IA-64

    loop:
    (p16) ld1 r32 = [r12], 1     // stage 1: load x[k], post-increment r12
    (p17) add r34 = 1, r33       // stage 2: add 1 to the value loaded one iteration earlier
    (p18) st1 [r13] = r35, 1     // stage 3: store the sum computed one iteration earlier
          br.ctop loop

The remaining slides step this loop through five elements. Memory holds x1..x5; the rotating general registers shown are r32-r39.

Initial state: LC = 4, EC = 3, RRB = 0; p16 = 1, p17 = 0, p18 = 0.
Iteration 1: only (p16) is on, so x1 is loaded into r32 (physical register 32).
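The renaming controlled by RRB amounts to modular arithmetic over the rotating region. A minimal sketch, assuming the eight-register region r32-r39 drawn on the slides; `ROT_BASE`, `ROT_SIZE`, `physical`, and `logical_name` are illustrative names, not architectural ones.

```python
ROT_BASE = 32   # first rotating general register
ROT_SIZE = 8    # size of the rotating region assumed from the slides (r32-r39)


def physical(logical, rrb):
    """Map a logical rotating register number to its physical register."""
    return ROT_BASE + (logical - ROT_BASE + rrb) % ROT_SIZE


def logical_name(phys, rrb):
    """Inverse mapping: the logical name under which a physical register is visible."""
    return ROT_BASE + (phys - ROT_BASE - rrb) % ROT_SIZE
```

For example, a value written to r32 while RRB = 0 sits in physical register 32; once RRB becomes -1, `logical_name(32, -1)` gives 33, so the same value is read as r33, and a new write to r32 goes to `physical(32, -1)`, which is 39.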

After the 1st br.ctop (LC ≠ 0): LC = 3, EC = 3, RRB = -1; p16 = 1, p17 = 1, p18 = 0. The rotation renames r32 to r33, so x1 is now read as r33.
Iteration 2: (p16) loads x2 into r32 (physical 39); (p17) computes y1 = x1 + 1 into r34 (physical 33); (p18) is off, so nothing is stored.

After the 2nd br.ctop: LC = 2, EC = 3, RRB = -2; p16 = p17 = p18 = 1 (kernel).
Iteration 3: loads x3 into r32 (physical 38); computes y2 = x2 + 1 into r34 (physical 32, overwriting x1); stores y1 from r35 (physical 33) to memory.

After the 3rd br.ctop: LC = 1, EC = 3, RRB = -3; p16 = p17 = p18 = 1.
Iteration 4: loads x4 into r32 (physical 37); computes y3 = x3 + 1 into r34 (physical 39, overwriting x2); stores y2 from r35 (physical 32) to memory.

After the 4th br.ctop: LC = 0, EC = 3, RRB = -4; p16 = p17 = p18 = 1.
Iteration 5: loads x5 into r32 (physical 36); computes y4 = x4 + 1 into r34 (physical 38, overwriting x3); stores y3 from r35 (physical 39) to memory.

After the 5th br.ctop (LC = 0, EC > 1, so the epilog begins): LC = 0, EC = 2, RRB = -5; p16 = 0, p17 = 1, p18 = 1.
Epilog 1: no load is issued; computes y5 = x5 + 1 into r34 (physical 37, overwriting x4); stores y4 from r35 (physical 38) to memory.

After the 6th br.ctop: LC = 0, EC = 1, RRB = -6; p16 = 0, p17 = 0, p18 = 1.
Epilog 2: only the store is on; y5 is stored from r35 (physical 37) to memory.

After the 7th br.ctop (EC = 1): LC = 0, EC = 0, RRB = -7; p16 = p17 = p18 = 0. The branch falls through and the loop exits with y1..y5 in memory.

The Software Pipelining Branch Instruction

br.ctop tests LC first, then EC:

  • LC ≠ 0 (prolog/kernel): LC--, RRB--, PR[16]=1, branch
  • LC = 0 (epilog), EC > 1: EC--, RRB--, PR[16]=0, branch
  • LC = 0 (epilog), EC = 1: EC--, RRB--, PR[16]=0, fall-thru
  • LC = 0 (epilog), EC = 0: PR[16]=0, fall-thru

LC = Loop Counter
EC = Epilog Counter
RRB = Rotating Register Base
PR = Predicate Register
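The four cases, together with the three-stage loop from the example, can be put into a small simulator. This is a sketch under stated assumptions rather than a faithful machine model: the rotating region is taken to be the eight registers r32-r39 from the slides, only p16-p18 are tracked, and `br_ctop` and `run_loop` are names made up for the illustration.

```python
ROT_BASE, ROT_SIZE = 32, 8   # rotating GR region assumed from the slides


def br_ctop(lc, ec, rrb):
    """Return (lc, ec, rrb, new_p16, taken) for the four br.ctop cases."""
    if lc != 0:                                  # prolog/kernel
        return lc - 1, ec, rrb - 1, 1, True
    if ec > 1:                                   # epilog, more stages to drain
        return lc, ec - 1, rrb - 1, 0, True
    if ec == 1:                                  # last epilog step
        return lc, ec - 1, rrb - 1, 0, False
    return lc, ec, rrb, 0, False                 # EC already 0: no rotation


def run_loop(x, lc, ec):
    """Simulate the three-stage loop y[k] = x[k] + 1 with rotating registers."""
    gr = [None] * ROT_SIZE           # physical r32..r39
    p16, p17, p18 = 1, 0, 0          # stage predicates before the first iteration
    rrb, src, y = 0, 0, []

    def phys(logical):
        return (logical - ROT_BASE + rrb) % ROT_SIZE

    while True:
        if p16:                                  # (p16) ld1 r32 = [r12], 1
            gr[phys(32)] = x[src]
            src += 1
        if p17:                                  # (p17) add r34 = 1, r33
            gr[phys(34)] = 1 + gr[phys(33)]
        if p18:                                  # (p18) st1 [r13] = r35, 1
            y.append(gr[phys(35)])
        lc, ec, rrb, new_p16, taken = br_ctop(lc, ec, rrb)
        # Predicate rotation: old p16 becomes p17, old p17 becomes p18.
        p16, p17, p18 = new_p16, p16, p17
        if not taken:
            return y, (lc, ec, rrb)
```

Running `run_loop([10, 20, 30, 40, 50], 4, 3)` executes seven passes over the loop body (five with LC counting down, two more to drain the pipeline) and ends with LC = 0, EC = 0, RRB = -7, the same final state as the slide walkthrough.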