
Resource Constraints


Page 1: Resource Constraints

Resource Constraints

Page 2: Resource Constraints

Ajay Patil

Chapter 2, Slide 2

Data Paths

Page 3: Resource Constraints

Overview of paths

  • Two general-purpose register files (A and B)
  • Eight functional units (.L1, .L2, .M1, .M2, .S1, .S2, .D1, .D2)
  • Two load-from-memory data paths (LD1 and LD2)
  • Two store-to-memory data paths (ST1 and ST2)
  • Two data address paths (DA1 and DA2)
  • Two register file data cross paths (1X and 2X)

Page 4: Resource Constraints

Implementation of Sum of Products (SOP)

$$Y = \sum_{n=1}^{N} a_n \times x_n = a_1 \times x_1 + a_2 \times x_2 + \dots + a_N \times x_N$$

Two basic operations are required for this algorithm:

(1) Multiplication
(2) Addition

Therefore two basic instructions are required.

So let’s implement the SOP algorithm! The implementation in this module will be done in assembly.
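As a point of reference before moving to assembly, the sum of products can be sketched in a high-level language. The Python below is only an illustration: the function name `sum_of_products` and the sample values are assumptions, not part of the slides.

```python
def sum_of_products(a, x):
    """Return a[0]*x[0] + a[1]*x[1] + ... for two equal-length sequences."""
    if len(a) != len(x):
        raise ValueError("a and x must have the same length")
    y = 0
    for a_n, x_n in zip(a, x):
        prod = a_n * x_n   # the multiplication step
        y = y + prod       # the addition (accumulation) step
    return y


print(sum_of_products([1, 2, 3], [4, 5, 6]))  # 1*4 + 2*5 + 3*6 = 32
```

The loop body contains exactly the two basic operations named on the slide, which is why two basic instructions are enough.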

Page 5: Resource Constraints

Multiply (MPY)

$$Y = \sum_{n=1}^{N} a_n \times x_n = a_1 \times x_1 + a_2 \times x_2 + \dots + a_N \times x_N$$

The multiplication of a1 by x1 is done in assembly by the following instruction:

    MPY        a1, x1, Y

This instruction is performed by a multiplier unit that is called “.M”.

Page 6: Resource Constraints

Multiply (.M unit)

$$Y = \sum_{n=1}^{40} a_n \times x_n$$

The .M unit performs multiplications in hardware:

    MPY   .M   a1, x1, Y

Note: a 16-bit by 16-bit multiplier provides a 32-bit result.

A 32-bit by 32-bit multiplier provides a 64-bit result.
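The note about result widths can be checked with a little arithmetic. The sketch below assumes signed two's-complement operands (the slide does not say whether the operands are signed) and tests whether the extreme 16-bit by 16-bit products fit in 32 bits.

```python
INT16_MIN, INT16_MAX = -(2**15), 2**15 - 1
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1

# Extreme products of two signed 16-bit operands.
extremes = [
    INT16_MIN * INT16_MIN,  # largest positive product, equal to 2**30
    INT16_MIN * INT16_MAX,  # most negative product
    INT16_MAX * INT16_MAX,
]


def fits_in_32_bits(value):
    """True if value is representable as a signed 32-bit integer."""
    return INT32_MIN <= value <= INT32_MAX


print(all(fits_in_32_bits(p) for p in extremes))  # True
```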

Page 7: Resource Constraints

Addition (.?)

    MPY   .M   a1, x1, prod
    ADD   .?   Y, prod, Y

Page 8: Resource Constraints

Add (.L unit)

    MPY   .M   a1, x1, prod
    ADD   .L   Y, prod, Y

RISC processors such as the C6000 use registers to hold the operands, so let’s change this code.

Page 9: Resource Constraints

Register File - A

    MPY   .M   a1, x1, prod
    ADD   .L   Y, prod, Y

[Figure: Register File A (registers A0 to A15, each 32 bits wide) connected to the .M and .L units, with a1, x1, prod and Y held in registers]

Let us correct this by replacing a1, x1, prod and Y by the registers shown in the figure.

Page 10: Resource Constraints

Specifying Register Names

    MPY   .M   A0, A1, A3
    ADD   .L   A4, A3, A4

The registers A0, A1, A3 and A4 contain the values to be used by the instructions.

Page 11: Resource Constraints

Specifying Register Names

Register File A contains 16 registers (A0 to A15), each 32 bits wide.

Page 12: Resource Constraints

Data loading

Q: How do we load the operands into the registers?

Page 13: Resource Constraints

Load Unit “.D”

Q: How do we load the operands into the registers?

A: The operands are loaded into the registers by loading them from memory using the .D unit.

[Figure: the .D unit connects Data Memory to Register File A]

Page 14: Resource Constraints

Load Unit “.D”

It is worth noting at this stage that the only way to access memory is through the .D unit.

Page 15: Resource Constraints

Load Instruction

Q: Which instruction(s) can be used for loading operands from the memory to the registers?

Page 16: Resource Constraints

Load Instructions (LDB, LDH, LDW, LDDW)

Q: Which instruction(s) can be used for loading operands from the memory to the registers?

A: The load instructions.

Page 17: Resource Constraints

Using the Load Instructions

Before using the load unit you have to be aware that this processor is byte addressable, which means that each byte is represented by a unique address.

Also, the addresses are 32 bits wide.

[Figure: data memory drawn 16 bits wide, with row addresses 00000000, 00000002, 00000004, 00000006, 00000008, ... up to FFFFFFFF]

Page 18: Resource Constraints

Using the Load Instructions

The syntax for the load instruction is:

    LD    *Rn, Rm

Where:

Rn is a register that contains the address of the operand to be loaded

and

Rm is the destination register.

Page 19: Resource Constraints

Using the Load Instructions

The syntax for the load instruction is:

    LD    *Rn, Rm

The question now is how many bytes are going to be loaded into the destination register?

Page 20: Resource Constraints

Using the Load Instructions

The syntax for the load instruction is:

    LD    *Rn, Rm

The answer is that it depends on the instruction you choose:

  • LDB: loads one byte (8-bit)
  • LDH: loads a half word (16-bit)
  • LDW: loads a word (32-bit)
  • LDDW: loads a double word (64-bit)

Note: LD on its own does not exist.

Page 21: Resource Constraints

Using the Load Instructions

The syntax for the load instruction is:

    LD    *Rn, Rm

[Figure: 16-bit-wide data memory with two byte columns labelled 0 and 1; the rows at addresses 00000000 and 00000002 hold the bytes 0xA, 0xB and 0xC, 0xD, and the rows from address 00000004 onward hold the bytes 0x1 to 0x8]

Example:

If we assume that A5 = 0x4 then:

    (1) LDB   *A5, A7       ; gives A7 = 0x00000001
    (2) LDH   *A5, A7       ; gives A7 = 0x00000201
    (3) LDW   *A5, A7       ; gives A7 = 0x04030201
    (4) LDDW  *A5, A7:A6    ; gives A7:A6 = 0x0807060504030201
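The four results in this example can be reproduced with a small simulation. This is a sketch under stated assumptions: the memory image (bytes 0x01 to 0x08 at byte addresses 0x4 to 0xB) is inferred from the results on the slide, the byte order is taken to be little-endian because that is what those results imply, and the helper names `load` and `as_hex` are hypothetical.

```python
# Assumed memory image: byte addresses 0x4 to 0xB hold 0x01 to 0x08.
MEMORY = {0x4 + i: 0x01 + i for i in range(8)}


def load(address, num_bytes):
    """Little-endian, sign-extending load of num_bytes bytes from address."""
    raw = bytes(MEMORY[address + i] for i in range(num_bytes))
    return int.from_bytes(raw, byteorder="little", signed=True)


def as_hex(value, bits):
    """Format value as a two's-complement hex string that is bits wide."""
    digits = bits // 4
    return f"0x{value & ((1 << bits) - 1):0{digits}X}"


A5 = 0x4
ldb = as_hex(load(A5, 1), 32)    # LDB  *A5, A7
ldh = as_hex(load(A5, 2), 32)    # LDH  *A5, A7
ldw = as_hex(load(A5, 4), 32)    # LDW  *A5, A7
lddw = as_hex(load(A5, 8), 64)   # LDDW *A5, A7:A6
print(ldb, ldh, ldw, lddw)
```

The byte at the lowest address ends up in the least significant position of the register, which is why LDH yields 0x0201 rather than 0x0102.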

Page 22: Resource Constraints

Using the Load Instructions

The syntax for the load instruction is:

    LD    *Rn, Rm

Question:

If data can only be accessed by the load instruction and the .D unit, how can we load the register pointer Rn in the first place?

Page 23: Resource Constraints

Loading the Pointer Rn

The instruction MVKL will allow a move of a 16-bit constant into a register as shown below:

    MVKL  .?   a, A5

(‘a’ is a constant or label)

How many bits represent a full address?

32 bits

So why does the instruction not allow a 32-bit move?

All instructions are 32 bits wide (see instruction opcode), so a 32-bit constant cannot fit in the instruction alongside the opcode and register fields.

Page 24: Resource Constraints

Loading the Pointer Rn

To solve this problem another instruction is available:

    MVKH

e.g.

    MVKH  .?   a, A5

(‘a’ is a constant or label)

[Figure: the constant a is split into an upper half ah and a lower half al; MVKH writes ah into the upper 16 bits of A5 and leaves the lower 16 bits (x) unchanged]

Finally, to move the 32-bit address to a register we can use:

    MVKL       a, A5
    MVKH       a, A5

Page 25: Resource Constraints

Loading the Pointer Rn

Always use MVKL first and then MVKH, as the following examples show. In both examples A5 initially holds 0x87654321.

Example 1

    MVKL       0x1234FABC, A5
    A5 = 0xFFFFFABC    (sign extension)
    MVKH       0x1234FABC, A5
    A5 = 0x1234FABC    ; OK

Example 2

    MVKH       0x1234FABC, A5
    A5 = 0x12344321
    MVKL       0x1234FABC, A5
    A5 = 0xFFFFFABC    ; Wrong
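The ordering rule follows directly from what each instruction does to the register. The sketch below models the two instructions as plain Python functions, based only on the behaviour shown in the examples on this slide; the function names and the idea of passing the old register value as an argument are illustrative assumptions.

```python
def mvkl(constant, reg):
    """Low 16 bits of constant, sign-extended to 32 bits; the old reg value is lost."""
    low = constant & 0xFFFF
    if low & 0x8000:            # bit 15 set: sign extension fills the upper half
        low |= 0xFFFF0000
    return low


def mvkh(constant, reg):
    """Upper 16 bits come from constant; the lower 16 bits of reg are kept."""
    return (constant & 0xFFFF0000) | (reg & 0x0000FFFF)


START = 0x87654321

# Example 1: MVKL first, then MVKH.
good = mvkh(0x1234FABC, mvkl(0x1234FABC, START))

# Example 2: MVKH first, then MVKL (MVKL overwrites the upper half again).
bad = mvkl(0x1234FABC, mvkh(0x1234FABC, START))

print(hex(good), hex(bad))
```

Because MVKL rewrites all 32 bits while MVKH touches only the upper half, MVKH has to come last.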

Page 26: Resource Constraints

LDH, MVKL and MVKH

            MVKL       pt1, A5
            MVKH       pt1, A5
            MVKL       pt2, A6
            MVKH       pt2, A6
            LDH   .D   *A5, A0
            LDH   .D   *A6, A1
            MPY   .M   A0, A1, A3
            ADD   .L   A4, A3, A4

Page 27: Resource Constraints

Creating a loop

            MVKL       pt1, A5
            MVKH       pt1, A5
            MVKL       pt2, A6
            MVKH       pt2, A6
            LDH   .D   *A5, A0
            LDH   .D   *A6, A1
            MPY   .M   A0, A1, A3
            ADD   .L   A4, A3, A4

So far we have implemented the SOP for one tap only, i.e.

$$Y = a_1 \times x_1$$

So let’s create a loop so that we can implement the SOP for N taps.

Page 28: Resource Constraints

Creating a loop

With the C6000 processors there are no dedicated instructions such as block repeat. The loop is created using the B instruction.

Page 29: Resource Constraints

What are the steps for creating a loop?

1. Create a label to branch to.
2. Add a branch instruction, B.
3. Create a loop counter.
4. Add an instruction to decrement the loop counter.
5. Make the branch conditional based on the value in the loop counter.
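The five steps map onto a count-down loop whose body always runs at least once. The Python below is only a sketch of that control flow: Python has no branch instruction, so the top of the `while` plays the role of the label, and the names `run_loop` and `b0` are hypothetical.

```python
def run_loop(count):
    """Count-down loop mirroring the five steps; returns the number of passes."""
    if count < 1:
        raise ValueError("count must be at least 1 in this sketch")
    b0 = count            # step 3: create the loop counter
    passes = 0
    while True:           # step 1: the top of the loop acts as the label
        passes += 1       # (the loop body would go here)
        b0 -= 1           # step 4: decrement the loop counter
        if b0 != 0:       # step 5: the branch is taken only while b0 is non-zero
            continue      # step 2: branch back to the label
        break
    return passes


print(run_loop(40))  # 40
```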

Page 30: Resource Constraints

1. Create a label to branch to

            MVKL       pt1, A5
            MVKH       pt1, A5
            MVKL       pt2, A6
            MVKH       pt2, A6
    loop    LDH   .D   *A5, A0
            LDH   .D   *A6, A1
            MPY   .M   A0, A1, A3
            ADD   .L   A4, A3, A4

Page 31: Resource Constraints

2. Add a branch instruction, B.

            MVKL       pt1, A5
            MVKH       pt1, A5
            MVKL       pt2, A6
            MVKH       pt2, A6
    loop    LDH   .D   *A5, A0
            LDH   .D   *A6, A1
            MPY   .M   A0, A1, A3
            ADD   .L   A4, A3, A4
            B     .?   loop

Page 32: Resource Constraints

Which unit is used by the B instruction?

            MVKL       pt1, A5
            MVKH       pt1, A5
            MVKL       pt2, A6
            MVKH       pt2, A6
    loop    LDH   .D   *A5, A0
            LDH   .D   *A6, A1
            MPY   .M   A0, A1, A3
            ADD   .L   A4, A3, A4
            B     .?   loop

Page 33: Resource Constraints

Which unit is used by the B instruction?

[Figure: a .S unit is added next to the .M, .L and .D units]

            MVKL  .S   pt1, A5
            MVKH  .S   pt1, A5
            MVKL  .S   pt2, A6
            MVKH  .S   pt2, A6
    loop    LDH   .D   *A5, A0
            LDH   .D   *A6, A1
            MPY   .M   A0, A1, A3
            ADD   .L   A4, A3, A4
            B     .S   loop

Page 34: Resource Constraints

3. Create a loop counter.

            MVKL  .S   pt1, A5
            MVKH  .S   pt1, A5
            MVKL  .S   pt2, A6
            MVKH  .S   pt2, A6
            MVKL  .S   count, B0
    loop    LDH   .D   *A5, A0
            LDH   .D   *A6, A1
            MPY   .M   A0, A1, A3
            ADD   .L   A4, A3, A4
            B     .S   loop

B registers will be introduced later.

Page 35: Resource Constraints

4. Decrement the loop counter

            MVKL  .S   pt1, A5
            MVKH  .S   pt1, A5
            MVKL  .S   pt2, A6
            MVKH  .S   pt2, A6
            MVKL  .S   count, B0
    loop    LDH   .D   *A5, A0
            LDH   .D   *A6, A1
            MPY   .M   A0, A1, A3
            ADD   .L   A4, A3, A4
            SUB   .S   B0, 1, B0
            B     .S   loop

Page 36: Resource Constraints

5. Make the branch conditional based on the value in the loop counter

What is the syntax for making an instruction conditional?

    [condition]   Instruction   Label

e.g.

    [B1]          B             loop

(1) The condition can be one of the following registers: A1, A2, B0, B1, B2.

(2) Any instruction can be conditional.

Page 37: Resource Constraints

5. Make the branch conditional based on the value in the loop counter

The condition can be inverted by adding the exclamation symbol “!” as follows:

    [!condition]  Instruction   Label

e.g.

    [!B0]         B             loop    ; branch if B0 = 0
    [B0]          B             loop    ; branch if B0 != 0

Page 38: Resource Constraints

5. Make the branch conditional

            MVKL  .S2  pt1, A5
            MVKH  .S2  pt1, A5
            MVKL  .S2  pt2, A6
            MVKH  .S2  pt2, A6
            MVKL  .S2  count, B0
    loop    LDH   .D   *A5, A0
            LDH   .D   *A6, A1
            MPY   .M   A0, A1, A3
            ADD   .L   A4, A3, A4
            SUB   .S   B0, 1, B0
    [B0]    B     .S   loop

Page 39: Resource Constraints

More on the Branch Instruction (1)

With this processor all the instructions are encoded in 32 bits.

Therefore the label must have a dynamic range of less than 32 bits, because the encoding of the B instruction itself takes up part of the 32-bit word.

Case 1: B .S1 label

Relative branch. Label limited to an offset of $\pm 2^{20}$.

[Instruction format: B opcode followed by a 21-bit relative address, 32 bits in total]
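The link between the 21-bit field and the stated offset limit is ordinary two's-complement arithmetic. The sketch below assumes the relative address is a signed two's-complement field, and it says nothing about the unit in which the offset is counted, since the slide does not state it.

```python
def signed_field_range(bits):
    """Return (minimum, maximum) of a two's-complement field of the given width."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


lowest, highest = signed_field_range(21)
print(lowest, highest)  # -1048576 1048575
```

One of the 21 bits carries the sign, which leaves 20 bits of magnitude in each direction.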

Page 40: Resource Constraints

More on the Branch Instruction (2)

By specifying a register as an operand instead of a label, it is possible to have an absolute branch.

This will allow a dynamic range of $2^{32}$.

Case 2: B .S2 register

Absolute branch. Operates on .S2 ONLY!

[Instruction format: B opcode followed by a 5-bit register code, 32 bits in total]

Page 41: Resource Constraints

Testing the code

This code performs the following operations:

$$a_0 \times x_0 + a_0 \times x_0 + a_0 \times x_0 + \dots + a_0 \times x_0$$

However, we would like to perform:

$$a_0 \times x_0 + a_1 \times x_1 + a_2 \times x_2 + \dots + a_N \times x_N$$

            MVKL  .S2  pt1, A5
            MVKH  .S2  pt1, A5
            MVKL  .S2  pt2, A6
            MVKH  .S2  pt2, A6
            MVKL  .S2  count, B0
    loop    LDH   .D   *A5, A0
            LDH   .D   *A6, A1
            MPY   .M   A0, A1, A3
            ADD   .L   A4, A3, A4
            SUB   .S   B0, 1, B0
    [B0]    B     .S   loop
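The behaviour described on this slide can be reproduced with a short simulation. This is a sketch, not a model of the processor: list indices stand in for the addresses in A5 and A6, the accumulator is assumed to start at zero, pipeline timing is ignored, and the function name is hypothetical.

```python
def sop_without_pointer_update(a, x, count):
    """Mimic the listing as written: the pointers are never advanced."""
    ptr_a, ptr_x = 0, 0      # stand-ins for A5 and A6
    acc = 0                  # stand-in for A4, assumed to start at zero
    b0 = count               # loop counter
    while True:
        a0 = a[ptr_a]        # LDH *A5, A0
        a1 = x[ptr_x]        # LDH *A6, A1
        acc += a0 * a1       # MPY followed by ADD
        b0 -= 1              # SUB B0, 1, B0
        if b0 == 0:          # [B0] B loop: leave when the counter reaches zero
            break
    return acc


print(sop_without_pointer_update([1, 2, 3], [4, 5, 6], 3))  # 12, not 32
```

Every pass reloads the same first pair of operands, so the result is `count` times the first product.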

Page 42: Resource Constraints

Modifying the pointers

The solution is to modify the pointers A5 and A6.

            MVKL  .S2  pt1, A5
            MVKH  .S2  pt1, A5
            MVKL  .S2  pt2, A6
            MVKH  .S2  pt2, A6
            MVKL  .S2  count, B0
    loop    LDH   .D   *A5, A0
            LDH   .D   *A6, A1
            MPY   .M   A0, A1, A3
            ADD   .L   A4, A3, A4
            SUB   .S   B0, 1, B0
    [B0]    B     .S   loop

Page 43: Resource Constraints

Indexing Pointers

    Syntax         Description       Pointer modified
    *R             Pointer           No

R can be any register.

In this case the pointers are used but not modified.

Page 44: Resource Constraints

Indexing Pointers

    Syntax         Description       Pointer modified
    *R             Pointer           No
    *+R[disp]      + Pre-offset      No
    *-R[disp]      - Pre-offset      No

[disp] specifies a number of elements; the element size is DW (64-bit), W (32-bit), H (16-bit) or B (8-bit).

disp = R or 5-bit constant. R can be any register.

In this case the pointers are modified BEFORE being used and RESTORED to their previous values.

Page 45: Resource Constraints

Indexing Pointers

    Syntax         Description       Pointer modified
    *R             Pointer           No
    *+R[disp]      + Pre-offset      No
    *-R[disp]      - Pre-offset      No
    *++R[disp]     Pre-increment     Yes
    *--R[disp]     Pre-decrement     Yes

In this case the pointers are modified BEFORE being used and NOT RESTORED to their previous values.

Page 46: Resource Constraints

Indexing Pointers

    Syntax         Description       Pointer modified
    *R             Pointer           No
    *+R[disp]      + Pre-offset      No
    *-R[disp]      - Pre-offset      No
    *++R[disp]     Pre-increment     Yes
    *--R[disp]     Pre-decrement     Yes
    *R++[disp]     Post-increment    Yes
    *R--[disp]     Post-decrement    Yes

In this case the pointers are modified AFTER being used and NOT RESTORED to their previous values.

Page 47: Resource Constraints

Indexing Pointers

    Syntax         Description       Pointer modified
    *R             Pointer           No
    *+R[disp]      + Pre-offset      No
    *-R[disp]      - Pre-offset      No
    *++R[disp]     Pre-increment     Yes
    *--R[disp]     Pre-decrement     Yes
    *R++[disp]     Post-increment    Yes
    *R--[disp]     Post-decrement    Yes

[disp] specifies a number of elements; the element size is DW, W, H or B.

disp = R or 5-bit constant. R can be any register.
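The seven addressing modes differ in only two respects: which address is used for the access, and what value the pointer holds afterwards. The sketch below captures both for each mode. The function name `resolve`, the string encoding of the modes and the default `disp=1` are assumptions made for illustration.

```python
ELEMENT_BYTES = {"B": 1, "H": 2, "W": 4, "DW": 8}


def resolve(mode, pointer, disp=1, size="H"):
    """Return (effective_address, new_pointer) for one addressing mode.

    mode is one of "*R", "*+R", "*-R", "*++R", "*--R", "*R++", "*R--".
    """
    offset = disp * ELEMENT_BYTES[size]   # disp counts elements, not bytes
    if mode == "*R":
        return pointer, pointer
    if mode == "*+R":                     # pre-offset: pointer left unchanged
        return pointer + offset, pointer
    if mode == "*-R":
        return pointer - offset, pointer
    if mode == "*++R":                    # pre-increment: pointer keeps the new value
        return pointer + offset, pointer + offset
    if mode == "*--R":
        return pointer - offset, pointer - offset
    if mode == "*R++":                    # post-increment: use first, then modify
        return pointer, pointer + offset
    if mode == "*R--":
        return pointer, pointer - offset
    raise ValueError(f"unknown addressing mode: {mode}")


print(resolve("*R++", 0x100, disp=1, size="H"))  # (256, 258)
```

For a half-word load, a displacement of one element moves the pointer by two bytes, which is what stepping through an array of 16-bit values requires.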

Page 48: Resource Constraints

Modifying and testing the code

This code now performs the following operations:

$$a_0 \times x_0 + a_1 \times x_1 + a_2 \times x_2 + \dots + a_N \times x_N$$

            MVKL  .S2  pt1, A5
            MVKH  .S2  pt1, A5
            MVKL  .S2  pt2, A6
            MVKH  .S2  pt2, A6
            MVKL  .S2  count, B0
    loop    LDH   .D   *A5++, A0
            LDH   .D   *A6++, A1
            MPY   .M   A0, A1, A3
            ADD   .L   A4, A3, A4
            SUB   .S   B0, 1, B0
    [B0]    B     .S   loop

Page 49: Resource Constraints

Store the final result

This code now performs the following operations:

$$a_0 \times x_0 + a_1 \times x_1 + a_2 \times x_2 + \dots + a_N \times x_N$$

            MVKL  .S2  pt1, A5
            MVKH  .S2  pt1, A5
            MVKL  .S2  pt2, A6
            MVKH  .S2  pt2, A6
            MVKL  .S2  count, B0
    loop    LDH   .D   *A5++, A0
            LDH   .D   *A6++, A1
            MPY   .M   A0, A1, A3
            ADD   .L   A4, A3, A4
            SUB   .S   B0, 1, B0
    [B0]    B     .S   loop
            STH   .D   A4, *A7

Page 50: Resource Constraints

Store the final result

The pointer A7 has not been initialised.

            MVKL  .S2  pt1, A5
            MVKH  .S2  pt1, A5
            MVKL  .S2  pt2, A6
            MVKH  .S2  pt2, A6
            MVKL  .S2  count, B0
    loop    LDH   .D   *A5++, A0
            LDH   .D   *A6++, A1
            MPY   .M   A0, A1, A3
            ADD   .L   A4, A3, A4
            SUB   .S   B0, 1, B0
    [B0]    B     .S   loop
            STH   .D   A4, *A7

Page 51: Resource Constraints

Store the final result

The pointer A7 is now initialised.

            MVKL  .S2  pt1, A5
            MVKH  .S2  pt1, A5
            MVKL  .S2  pt2, A6
            MVKH  .S2  pt2, A6
            MVKL  .S2  pt3, A7
            MVKH  .S2  pt3, A7
            MVKL  .S2  count, B0
    loop    LDH   .D   *A5++, A0
            LDH   .D   *A6++, A1
            MPY   .M   A0, A1, A3
            ADD   .L   A4, A3, A4
            SUB   .S   B0, 1, B0
    [B0]    B     .S   loop
            STH   .D   A4, *A7

Page 52: Resource Constraints

What is the initial value of A4?

A4 is used as an accumulator, so it needs to be reset to zero.

            MVKL  .S2  pt1, A5
            MVKH  .S2  pt1, A5
            MVKL  .S2  pt2, A6
            MVKH  .S2  pt2, A6
            MVKL  .S2  pt3, A7
            MVKH  .S2  pt3, A7
            MVKL  .S2  count, B0
            ZERO  .L   A4
    loop    LDH   .D   *A5++, A0
            LDH   .D   *A6++, A1
            MPY   .M   A0, A1, A3
            ADD   .L   A4, A3, A4
            SUB   .S   B0, 1, B0
    [B0]    B     .S   loop
            STH   .D   A4, *A7
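The complete listing can be traced with a small simulation. As before this is a sketch under stated assumptions: list indices stand in for the addresses held in A5 and A6, the stored result is returned instead of being written through A7, pipeline timing is ignored, and the function name is hypothetical. The final mask reflects that STH stores a half word, so only the low 16 bits of A4 reach memory.

```python
def sop_program(a, x, count):
    """Mimic the final listing; returns the 16-bit value that STH would store."""
    ptr_a, ptr_x = 0, 0                       # stand-ins for A5 and A6
    acc = 0                                   # ZERO A4
    b0 = count                                # MVKL count, B0
    while True:
        a0 = a[ptr_a]                         # LDH *A5++, A0
        ptr_a += 1
        a1 = x[ptr_x]                         # LDH *A6++, A1
        ptr_x += 1
        prod = a0 * a1                        # MPY A0, A1, A3
        acc = (acc + prod) & 0xFFFFFFFF       # ADD A4, A3, A4 (32-bit register)
        b0 -= 1                               # SUB B0, 1, B0
        if b0 == 0:                           # [B0] B loop
            break
    return acc & 0xFFFF                       # STH A4, *A7 keeps the low half word


print(sop_program([1, 2, 3], [4, 5, 6], 3))   # 32
print(sop_program([300, 300], [300, 300], 2)) # 48928: 180000 does not fit in 16 bits
```

The second call shows why the width of the store matters: the accumulator holds the full sum, but a half-word store keeps only its low 16 bits.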

Page 53: Resource Constraints

Increasing the processing power!

How can we add more processing power to this processor?

[Figure: units .S1, .M1, .L1 and .D1 with Register File A (A0 to A15, 32 bits wide) and Data Memory]

Page 54: Resource Constraints

Increasing the processing power!

(1) Increase the clock frequency.

(2) Increase the number of processing units.

Page 55: Resource Constraints

To increase the processing power, this processor has two sides (A and B or 1 and 2).

[Figure: side A (units .S1, .M1, .L1, .D1 with Register File A, A0 to A15) and side B (units .S2, .M2, .L2, .D2 with Register File B, B0 to B15), both connected to Data Memory; all registers are 32 bits wide]

Page 56: Resource Constraints

Can the two sides exchange operands in order to increase performance?

[Figure: side 1 has units .S1, .M1, .L1 and .D1 with Register File A (A0 to A15); side 2 has units .S2, .M2, .L2 and .D2 with Register File B (B0 to B15); all registers are 32 bits wide and both sides connect to Data Memory]

Page 57: Resource Constraints

The answer is YES, but there are limitations.

To exchange operands between the two sides, some cross paths or links are required.

What is a cross path?
    A cross path links one side of the CPU to the other.
    There are two types of cross paths:
        Data cross paths.
        Address cross paths.

Page 58: Resource Constraints

Constraints on Instructions Using the Same Functional Unit

No two instructions within the same execute packet can use the same resources. Also, no two instructions can write to the same register during the same cycle.

Two instructions using the same functional unit cannot be issued in the same execute packet.

Page 59: Resource Constraints

        ADD .S1 A0, A1, A2
     || SHR .S1 A3, 15, A4

The above execute packet is invalid.
Reason: both instructions use the same functional unit (.S1).

        ADD .L1 A0, A1, A2
     || SHR .S1 A3, 15, A4

The above execute packet is valid.
Reason: the two instructions use different functional units.
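The functional-unit rule can be checked mechanically. The following Python sketch is an illustration only: `functional_unit` and `duplicated_units` are hypothetical helper names, an execute packet is represented as a list of instruction strings, and nothing beyond the "one instruction per unit" rule is modelled.

```python
import re
from collections import Counter


def functional_unit(instruction):
    """Return the unit named in an instruction, e.g. '.S1' (a cross-path 'X' is dropped)."""
    match = re.search(r"\.([LSMD][12])", instruction.upper())
    if match is None:
        raise ValueError("no functional unit in: " + instruction)
    return "." + match.group(1)


def duplicated_units(packet):
    """List the functional units used by more than one instruction of the packet."""
    counts = Counter(functional_unit(instruction) for instruction in packet)
    return sorted(unit for unit, uses in counts.items() if uses > 1)
```

An empty result means the packet respects this rule; for the first packet above the result names `.S1`.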

Page 60: Resource Constraints

Constraints on Cross Paths (1X and 2X)

One unit (either a .S, .L, or .M unit) per data path, per execute packet, can read a source operand from its opposite register file via the cross paths (1X and 2X).

Two instructions using the same cross path between register files cannot be issued in the same execute packet.

Page 61: Resource Constraints

        ADD .L1X A0, B1, A1
     || MPY .M2X B4, A4, B2

The above execute packet is valid.
Reason: the two instructions use different cross paths (1X and 2X).

        ADD .L1X A0, B1, A1
     || MPY .M1X A4, B4, A5

The above execute packet is invalid.
Reason: both instructions use the same cross path (1X).
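A small Python sketch of the cross-path rule, under the assumption that a cross-path read is always marked by the X suffix on a .L, .S or .M unit as in the packets above. The function names are invented for this example.

```python
import re
from collections import Counter


def cross_paths_used(packet):
    """Count, per cross path ('1X' or '2X'), how many instructions request it."""
    used = Counter()
    for instruction in packet:
        match = re.search(r"\.[LSM]([12])X\b", instruction.upper())
        if match:
            used[match.group(1) + "X"] += 1
    return used


def cross_path_conflicts(packet):
    """Return the cross paths requested by more than one instruction of the packet."""
    return sorted(path for path, uses in cross_paths_used(packet).items() if uses > 1)
```

The first packet above uses 1X and 2X once each, so no conflict is reported; the second requests 1X twice.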

Page 62: Resource Constraints

Constraints on Loads and Stores

Load/store instructions can use an address pointer from one register file while loading to or storing from the other register file.

Two load/store instructions using a destination/source from the same register file cannot be issued in the same execute packet.

Page 63: Resource Constraints

        LDW .D1 *A0, A1
     || LDW .D2 *A2, B2

The above execute packet is invalid.
Reason: the .D2 unit must use an address register from the B register file.

        LDW .D1 *A4, A5
     || STW .D2 A6, *B4

The above execute packet is invalid.
Reason: loading to and storing from the same register file.

Page 64: Resource Constraints

        LDW .D1 *A4, B5
     || STW .D2 A6, *B4

The above execute packet is valid.
Reason: loading to and storing from different register files.

        LDW .D1 *A0, A1
     || LDW .D2 *B0, B2

The above execute packet is valid.
Reason: the address registers come from the correct register files.
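The two load/store rules illustrated by these four packets can be written down as a short Python check. This is a sketch under simplifying assumptions: every instruction in the packet is a load or a store with exactly two operands, the pointer is the operand that starts with `*`, and the helper names and message texts are invented here.

```python
REGISTER_FILE_OF_SIDE = {"1": "A", "2": "B"}


def parse_memory_instruction(instruction):
    """Split e.g. 'LDW .D1 *A4,B5' into (unit side, pointer file, data file)."""
    mnemonic, unit, operands = instruction.upper().split(None, 2)
    first, second = [operand.strip() for operand in operands.split(",")]
    pointer, data = (first, second) if first.startswith("*") else (second, first)
    return unit[2], pointer.lstrip("*+-")[0], data[0]


def check_memory_packet(packet):
    """Return a list of rule violations; an empty list means the packet passes."""
    problems = []
    data_files = []
    for instruction in packet:
        side, pointer_file, data_file = parse_memory_instruction(instruction)
        expected = REGISTER_FILE_OF_SIDE[side]
        if pointer_file != expected:
            problems.append(f".D{side} needs its address register in file {expected}")
        data_files.append(data_file)
    for register_file in "AB":
        if data_files.count(register_file) > 1:
            problems.append(f"two accesses use data registers of file {register_file}")
    return problems
```

Applied to the four packets on these two slides, the check reports one problem for each invalid packet and none for the valid ones.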

Page 65: Resource Constraints

Constraints on Long (40-Bit) Data

Because the .S and .L units share a read register port for long source operands and a write register port for long results, only one long result may be issued per register file in an execute packet.

Because the .L and .S units share their long read port with the store port, operations that read a long value cannot be issued on the .L and/or .S units in the same execute packet as a store.

Page 66: Resource Constraints

        ADD .L1 A5:A4, A1, A3:A2
     || SHL .S1 A8, A9, A7:A6

The above execute packet is invalid.
Reason: two long writes on register file A.

        ADD .L1 A5:A4, A1, A3:A2
     || SHL .S2 B8, B9, B7:B6

The above execute packet is valid.
Reason: one long write for each register file.

Page 67: Resource Constraints

        ADD .L1 A5:A4, A1, A3:A2
     || STW .D1 A8, *A9

The above execute packet is invalid.
Reason: a long read operation in parallel with a store.

        ADD .L1 A4, A1, A3:A2
     || STW .D1 A8, *A9

The above execute packet is valid.
Reason: no long read operation in parallel with a store.
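Both long-data rules can be sketched in Python. The assumptions are mine and are not spelled out on the slides: a long operand is recognised by the `:` in a register pair, the last operand of a non-store instruction is its destination, and a long read is taken to clash with a store when the stored data comes from the same register file. All function names are hypothetical.

```python
from collections import Counter


def split_instruction(instruction):
    """Return (mnemonic, [operands]) for e.g. 'ADD .L1 A5:A4,A1,A3:A2'."""
    mnemonic, unit, operands = instruction.upper().split(None, 2)
    return mnemonic, [operand.strip() for operand in operands.split(",")]


def check_long_packet(packet):
    """Return a list of long-data violations; an empty list means none found."""
    long_writes = Counter()   # register file -> number of 40-bit results
    long_reads = set()        # register files that supply a 40-bit source
    store_files = set()       # register files that supply the data of a store
    for instruction in packet:
        mnemonic, operands = split_instruction(instruction)
        if mnemonic.startswith("ST"):
            store_files.add(operands[0][0])
            continue
        *sources, destination = operands
        if ":" in destination:
            long_writes[destination[0]] += 1
        long_reads.update(source[0] for source in sources if ":" in source)
    problems = []
    for register_file, writes in sorted(long_writes.items()):
        if writes > 1:
            problems.append(f"{writes} long writes on register file {register_file}")
    for register_file in sorted(long_reads & store_files):
        problems.append(f"long read and store on register file {register_file}")
    return problems
```

The check flags the packet with two long results on file A and the packet that combines a long read with a store, and accepts the other two.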

Page 68: Resource Constraints

Constraints on Register Reads

More than four reads of the same register cannot occur on the same cycle. Conditional registers are not included in this count.

        MPY .M1 A1, A1, A4
     || ADD .L1 A1, A1, A5
     || SUB .D1 A1, A2, A3

The above code sequence is invalid: A1 is read five times in the same cycle.

Page 69: Resource Constraints

        MPY .M1 A1, A1, A4
     || [A1] ADD .L1 A0, A1, A5
     || SUB .D1 A1, A2, A3

This code sequence is valid: A1 is read four times as a source operand, and its use as the condition [A1] is not counted.
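Counting reads by hand is error-prone, so here is a minimal Python sketch of the counting rule. It assumes three-operand instructions whose last operand is the destination (so loads and stores are not handled), and the function names are made up for the example.

```python
import re
from collections import Counter


def register_reads(packet):
    """Count how often each register is read as a source in one execute packet."""
    reads = Counter()
    for instruction in packet:
        # Drop the parallel bars and any [condition]; condition registers are not counted.
        text = re.sub(r"\[[^\]]*\]", " ", instruction.upper()).replace("||", " ")
        mnemonic, unit, operands = text.split(None, 2)
        sources = operands.split(",")[:-1]      # last operand is taken as the destination
        for source in sources:
            reads.update(re.findall(r"[AB]\d+", source))
    return reads


def over_read_registers(packet, limit=4):
    """Return the registers that are read more than `limit` times."""
    return sorted(name for name, count in register_reads(packet).items() if count > limit)
```

For the invalid sequence (MPY, ADD and SUB all reading A1) the sketch counts five reads of A1; for the valid sequence with the [A1] condition it counts four.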

Page 70: Resource Constraints

Constraints on Register Writes

Two instructions cannot write to the same register on the same cycle.

Two instructions with the same destination can be scheduled in parallel as long as they do not write to the destination register on the same cycle.
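Whether two writes collide depends on when each result arrives, not on when the instructions are issued. The Python sketch below makes that concrete. The delay-slot values in `DELAY_SLOTS` are an assumption brought in for the example (one delay slot for MPY, four for loads, none for other instructions) and do not come from these slides; `write_conflicts` is a hypothetical helper.

```python
# Assumed delay slots: how many cycles after a single-cycle instruction
# issued at the same time the result reaches the destination register.
DELAY_SLOTS = {"MPY": 1, "LDB": 4, "LDH": 4, "LDW": 4}


def write_conflicts(schedule):
    """schedule: iterable of (issue_cycle, mnemonic, destination_register).

    Returns the (register, cycle) pairs that receive more than one write.
    """
    seen = set()
    conflicts = []
    for issue_cycle, mnemonic, destination in schedule:
        write = (destination, issue_cycle + DELAY_SLOTS.get(mnemonic, 0))
        if write in seen:
            conflicts.append(write)
        seen.add(write)
    return conflicts
```

Under these assumptions an MPY and an ADD issued in parallel with the same destination do not collide, while an ADD issued one cycle after the MPY does.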

Page 71: Resource Constraints

Data Cross Paths

Data cross paths can also be referred to as register file cross paths.

These cross paths allow operands from one side to be used by the other side.

There are only two cross paths:
    one path which conveys data from side B to side A, 1X.
    one path which conveys data from side A to side B, 2X.

Page 72: Resource Constraints

Data Cross Paths

Data cross paths only apply to the .L, .S and .M units.

The data cross paths are very useful; however, there are some limitations in their use.

Page 73: Resource Constraints

Data Cross Path Limitations

[Figure: register files A and B, the units .L1, .M1 and .S1 with two <src> inputs and one <dst> output, and the cross paths 1x and 2x]

(1) The destination register must be on the same side as the unit.

(2) Source registers: up to one cross path per execute packet per side.

Execute packet: a group of instructions that execute simultaneously.
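Limitation (1) and the question of which source operand needs a cross path can be read directly off an instruction. The Python sketch below does that for three-operand instructions; `analyse` and the dictionary keys are names invented for this illustration, and constants are simply treated as not crossing.

```python
SIDE_OF_REGISTER_FILE = {"A": "1", "B": "2"}


def analyse(instruction):
    """Report destination placement and cross-path sources for one instruction."""
    mnemonic, unit, operands = instruction.upper().split(None, 2)
    side = unit[2]                                   # '.L1X' -> '1'
    *sources, destination = [operand.strip() for operand in operands.split(",")]
    crossing = [source for source in sources
                if SIDE_OF_REGISTER_FILE.get(source[0], side) != side]
    return {
        "destination_ok": SIDE_OF_REGISTER_FILE[destination[0]] == side,
        "crossing_sources": crossing,
        "marked_x": unit.endswith("X"),
    }
```

For example, `analyse("MPY .M2X B4,A4,B2")` reports a destination on the unit's own side and A4 as the one operand that travels over a cross path.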

Page 74: Resource Constraints

Data Cross Path Limitations

[Figure: register files A and B, the units .L1, .M1 and .S1 with two <src> inputs and one <dst> output, and the cross paths 1x and 2x]

e.g.:
        ADD .L1x A0, A1, B2
        MPY .M1x A0, B6, A9
        SUB .S1x A8, B2, A8
     || ADD .L1x A0, B0, A2

|| means that the SUB and ADD belong to the same execute packet and therefore execute simultaneously.

Page 75: Resource Constraints

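Data Cross Path Limitations

[Figure: register files A and B, the units .L1, .M1 and .S1 with two <src> inputs and one <dst> output, and the cross paths 1x and 2x]

e.g.:
        ADD .L1x A0, A1, B2
        MPY .M1x A0, B6, A9
        SUB .S1x A8, B2, A8
     || ADD .L1x A0, B0, A2

NOT VALID! The SUB and the ADD in the last execute packet both need the 1x cross path.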

Page 76: Resource Constraints

Data Cross Paths for both sides

[Figure: side A units .L1, .M1 and .S1 and side B units .L2, .M2 and .S2, each with two <src> inputs and one <dst> output; register files A and B are linked by the cross paths 1x and 2x]

Page 77: Resource Constraints

Address cross paths

[Figure: unit .D1 on side A with an address (Addr) path and a Data path]

        LDW .D1T1 *A0, A5
        STW .D1T1 A5, *A0

(1) The pointer must be on the same side as the unit.

Page 78: Resource Constraints

Load or store to either side

[Figure: .D1 takes the pointer *A0 from register file A; Data1 goes to A5 and Data2 goes to B5]

DA1 = T1
DA2 = T2

        LDW .D1T1 *A0, A5
        LDW .D1T2 *A0, B5

Page 79: Resource Constraints

Standard Parallel Loads

[Figure: .D1 with pointer *A0 and data register A5 on side A; .D2 with pointer *B0 and data register B5 on side B]

DA1 = T1
DA2 = T2

        LDW .D1T1 *A0, A5
     || LDW .D2T2 *B0, B5

Page 80: Resource Constraints

Parallel Load/Store using address cross paths

[Figure: .D1 with pointer *A0 and .D2 with pointer *B0; data registers A5 on side A and B5 on side B]

DA1 = T1
DA2 = T2

        LDW .D1T2 *A0, B5
     || STW .D2T1 A5, *B0

Page 81: Resource Constraints

Fill the blanks ... Does this work?

[Figure: .D1 with pointer *A0 on side A and .D2 with pointer *B0 on side B]

DA1 = T1
DA2 = T2

        LDW .D1__ *A0, B5
     || STW .D2__ B6, *B0

Page 82: Resource Constraints

Not Allowed!

[Figure: .D1 with pointer *A0 and .D2 with pointer *B0; both data registers, B5 and B6, are on side B]

DA2 = T2

        LDW .D1T2 *A0, B5
     || STW .D2T2 B6, *B0

Page 83: Resource Constraints

Not Allowed!
Parallel accesses: both cross or neither cross

[Figure: .D1 with pointer *A0 and .D2 with pointer *B0; both data registers, B5 and B6, are on side B]

DA2 = T2

        LDW .D1T2 *A0, B5
     || STW .D2T2 B6, *B0
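The "both cross or neither cross" rule can be tested with a few lines of Python. This sketch assumes the T1/T2 suffix follows the register file of the data register (A gives T1, B gives T2), as in the examples on these slides, and that an access "crosses" when the data register is on the opposite side from the .D unit. `describe_access` and `parallel_access_ok` are hypothetical names.

```python
def describe_access(instruction):
    """Return (data path, crosses) for a load or store such as 'LDW .D1__ *A0,B5'."""
    mnemonic, unit, operands = instruction.upper().split(None, 2)
    first, second = [operand.strip() for operand in operands.split(",")]
    data = second if first.startswith("*") else first   # the operand that is not the pointer
    path = "T1" if data[0] == "A" else "T2"
    crosses = path[1] != unit[2]                         # data side differs from unit side
    return path, crosses


def parallel_access_ok(first, second):
    """Two parallel accesses need different data paths: both cross or neither does."""
    first_path, first_crosses = describe_access(first)
    second_path, second_crosses = describe_access(second)
    return first_path != second_path and first_crosses == second_crosses
```

For the fill-in-the-blanks packet both accesses resolve to T2 (one crossing, one not), so the check rejects it, while the standard parallel loads and the crossed load/store pair pass.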

Page 84: Resource Constraints

Conditions Don’t Use Cross Paths

If a conditional register comes from the opposite side, it does NOT use a data or address cross path.

Examples:

  [B2]  ADD .L1 A2, A0, A4
  [A1]  LDW .D2 *B0, B5

Page 85: Resource Constraints

Cross Paths - Summary

Data
    Destination register on same side as unit.
    Source registers: up to one cross path per execute packet per side.
    Use “x” to indicate cross path.

Address
    Pointer must be on same side as unit.
    Data can be transferred to/from either side.
    Parallel accesses: both cross or neither cross.

Conditionals don’t use cross paths.

Page 86: Resource Constraints

Code Review (using side A only)

$Y = \sum_{n=1}^{40} a_n * x_n$

        MVK   .S1   40, A2        ; A2 = 40, loop count
loop:   LDH   .D1   *A5++, A0     ; A0 = a(n)
        LDH   .D1   *A6++, A1     ; A1 = x(n)
        MPY   .M1   A0, A1, A3    ; A3 = a(n) * x(n)
        ADD   .L1   A3, A4, A4    ; Y = Y + A3
        SUB   .L1   A2, 1, A2     ; decrement loop count
  [A2]  B     .S1   loop          ; if A2 != 0, branch
        STH   .D1   A4, *A7       ; *A7 = Y

Note: Assume that A4 was previously cleared and the pointers are initialised.

Page 87: Resource Constraints

Let us have a look at the final details concerning the functional units.

Consider first the case of the .L and .S units.

Page 88: Resource Constraints

Operands - 32/40-bit Register, 5-bit Constant

Operands can be:
    5-bit constants (or 16-bit for MVKL and MVKH).
    32-bit registers.
    40-bit registers.

However, we have seen that registers are only 32-bit.

So where do the 40-bit registers come from?
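The constant sizes explain why the pointer set-up code uses MVKL and MVKH in pairs: each instruction carries only 16 bits of a 32-bit value. The Python sketch below models this. It assumes that MVKL writes the low halfword with sign extension, that MVKH replaces only the upper halfword, and that the 5-bit constant is signed (some instructions may instead take an unsigned 5-bit constant). The function names are invented for the illustration.

```python
def fits_signed(value, bits):
    """True if `value` fits a two's-complement field of `bits` bits."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def mvkl(constant):
    """Register contents after MVKL: low 16 bits of the constant, sign-extended."""
    low = constant & 0xFFFF
    return low | 0xFFFF0000 if low & 0x8000 else low


def mvkh(constant, register):
    """Register contents after MVKH: upper half replaced, lower half kept."""
    return (constant & 0xFFFF0000) | (register & 0x0000FFFF)


def load_32bit(constant):
    """MVKL followed by MVKH on the same register, as in the pointer set-up code."""
    return mvkh(constant, mvkl(constant))
```

The MVKL step alone can leave unwanted sign-extension bits in the upper half (0x8000 becomes 0xFFFF8000); the MVKH step overwrites them, which is why the order MVKL then MVKH matters.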

Page 89: Resource Constraints

Operands - 32/40-bit Register, 5-bit Constant

A 40-bit register can be obtained by concatenating two registers.

However, there are 3 conditions that need to be respected:
    The registers must be from the same side.
    One register must be even and the other odd; the pair is written odd:even (for example A1:A0).
    The registers must be consecutive.
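The three conditions translate into a short validity test. The Python sketch below assumes the odd:even notation used for register pairs in this deck (for example A5:A4) and sixteen registers per file (A0 to A15, B0 to B15); `is_valid_long_pair` and `ALL_PAIRS` are names made up for the example.

```python
import re


def is_valid_long_pair(pair):
    """Check a 40-bit register pair written as odd:even, e.g. 'A5:A4'."""
    match = re.fullmatch(r"([AB])(\d+):([AB])(\d+)", pair.strip().upper())
    if match is None:
        return False
    high_file, low_file = match.group(1), match.group(3)
    high, low = int(match.group(2)), int(match.group(4))
    return (high_file == low_file      # both registers from the same side
            and low % 2 == 0           # the second register written is even
            and high == low + 1        # consecutive, odd register written first
            and high <= 15)            # register files hold A0..A15 / B0..B15


# Every pair that satisfies the conditions: eight per register file.
ALL_PAIRS = [f"{name}{number + 1}:{name}{number}"
             for name in "AB" for number in range(0, 16, 2)]
```

`ALL_PAIRS` enumerates the sixteen legal pairs, from A1:A0 to B15:B14.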

Page 90: Resource Constraints

Operands - 32/40-bit Register, 5-bit Constant

All combinations of 40-bit registers are shown below:

40-bit Reg, written odd:even (8 bits : 32 bits)

Register file A    Register file B
A1:A0              B1:B0
A3:A2              B3:B2
A5:A4              B5:B4
A7:A6              B7:B6
A9:A8              B9:B8
A11:A10            B11:B10
A13:A12            B13:B12
A15:A14            B15:B14

Page 91: Resource Constraints

Operands - 32/40-bit Register, 5-bit Constant

        instr .unit <src>, <src>, <dst>

[Figure: a .L or .S unit; one <src> is a 32-bit register or a 5-bit constant, the other <src> is a 32-bit or 40-bit register, and <dst> is a 32-bit or 40-bit register]

Page 93: Resource Constraints

Operands - 32/40-bit Register, 5-bit Constant

        instr .unit <src>, <src>, <dst>

        OR  .L1 A0, A1, A2

[Figure: a .L or .S unit; one <src> is a 32-bit register or a 5-bit constant, the other <src> is a 32-bit or 40-bit register, and <dst> is a 32-bit or 40-bit register]

Page 94: Resource Constraints

Operands - 32/40-bit Register, 5-bit Constant

        instr .unit <src>, <src>, <dst>

        OR  .L1 A0, A1, A2
        ADD .L2 -5, B3, B4

[Figure: a .L or .S unit; one <src> is a 32-bit register or a 5-bit constant, the other <src> is a 32-bit or 40-bit register, and <dst> is a 32-bit or 40-bit register]

Page 95: Resource Constraints

Operands - 32/40-bit Register, 5-bit Constant

        instr .unit <src>, <src>, <dst>

        OR  .L1 A0, A1, A2
        ADD .L2 -5, B3, B4
        ADD .L1 A2, A3, A5:A4

[Figure: a .L or .S unit; one <src> is a 32-bit register or a 5-bit constant, the other <src> is a 32-bit or 40-bit register, and <dst> is a 32-bit or 40-bit register]

Page 96: Resource Constraints

Operands - 32/40-bit Register, 5-bit Constant

        instr .unit <src>, <src>, <dst>

        OR  .L1 A0, A1, A2
        ADD .L2 -5, B3, B4
        ADD .L1 A2, A3, A5:A4
        SUB .L1 A2, A5:A4, A5:A4

[Figure: a .L or .S unit; one <src> is a 32-bit register or a 5-bit constant, the other <src> is a 32-bit or 40-bit register, and <dst> is a 32-bit or 40-bit register]

Page 97: Resource Constraints

Operands - 32/40-bit Register, 5-bit Constant

        instr .unit <src>, <src>, <dst>

        OR  .L1 A0, A1, A2
        ADD .L2 -5, B3, B4
        ADD .L1 A2, A3, A5:A4
        SUB .L1 A2, A5:A4, A5:A4
        ADD .L2 3, B9:B8, B9:B8

[Figure: a .L or .S unit; one <src> is a 32-bit register or a 5-bit constant, the other <src> is a 32-bit or 40-bit register, and <dst> is a 32-bit or 40-bit register]