Processor Architectures and Program Mapping


  • Processor Architectures and Program Mapping (5kk10)

    Processor Architectures and Program Mapping H. Corporaal, J. van Meerbergen, and B. Mesman

  • Processor types

      • DSP
      • Programmable CPU
      • Programmable DSP
      • Application specific instruction set processor (ASIP)
      • Application specific processor

  • [Figure: flexibility versus efficiency, each rated low / medium / high, for ASIC, GP proc, FPGA, DSP, and ASIP]

  • Programmable CPU cores

      • introduction
      • architecture of the MIPS core discussed as an example
      • pipelining
      • application examples
      • software issues
      • comparison between different CPU cores
      • towards application specific architectures
      • discussion

  • Introduction

      • rationale: as high a multiplex factor R as possible
      • consequence: often manual, handcrafted design optimised for clock rate
      • problem: fast changes in the IC process technology
      • examples:
          • embedded: MIPS (first one; licensing the instruction set architecture), ARM (Advanced RISC Machines; telecom, low power, small code size, most popular one; licensing also the micro-architecture as hard or soft IP), Sparc
          • derivatives from general purpose CPUs: Intel, NEC, Hitachi, National, PowerPC

  • Introduction: instruction set architectures

      • implicit operands
      • explicit operands

  • Introduction: C = A + B

    Stack      Accumulator   Register-memory   Register-register
    Push A     Load A        Load R1,A         Load R1,A
    Push B     Add B         Add R1,B          Load R2,B
    Add        Store C       Store C,R1        Add R3,R1,R2
    Pop C                                      Store C,R3
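To make the implicit-operand (stack) column concrete, the sketch below interprets the four-instruction stack program for C = A + B. The opcode names, the `StackInstr` type, and the tiny word-addressed memory are assumptions made for illustration; they are not part of the slides, and the sketch does no bounds checking.

```c
/* Hypothetical three-opcode stack machine, for illustration only. */
enum StackOp { OP_PUSH, OP_ADD, OP_POP };

typedef struct {
    enum StackOp op;
    int addr;          /* memory index; ignored by OP_ADD */
} StackInstr;

/* Executes n instructions against mem[]. ADD takes both operands
 * implicitly from the top of the stack, as in the "Stack" column. */
void run_stack(const StackInstr *prog, int n, int *mem)
{
    int stack[16];
    int sp = 0;        /* number of values currently on the stack */

    for (int i = 0; i < n; i++) {
        switch (prog[i].op) {
        case OP_PUSH:
            stack[sp++] = mem[prog[i].addr];
            break;
        case OP_ADD:
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_POP:
            mem[prog[i].addr] = stack[--sp];
            break;
        }
    }
}
```

With A at index 0, B at index 1 and C at index 2, the program Push A, Push B, Add, Pop C leaves A + B in C. Only the push and pop instructions name a memory operand.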

  • Architecture of the MIPS core [Hennessy&Patterson]

  • MIPS instruction formats (32 bits) [Hennessy&Patterson]

      • op: operation of the instruction
      • rs, rt, rd: source and destination registers
      • shamt: shift amount
      • funct: operation of the instruction, part 2
      • imm: for program constants
      • addr: target address of a jump
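The field list above can be sketched as a decoder for the R-type format. The bit widths used here (6, 5, 5, 5, 5, 6 from the most significant bit down) are the standard MIPS32 R-type layout and are not spelled out on the slide; the type and function names are hypothetical.

```c
#include <stdint.h>

/* Fields of a 32-bit MIPS R-type instruction word. */
typedef struct {
    unsigned op;     /* bits 31..26 */
    unsigned rs;     /* bits 25..21 */
    unsigned rt;     /* bits 20..16 */
    unsigned rd;     /* bits 15..11 */
    unsigned shamt;  /* bits 10..6  */
    unsigned funct;  /* bits 5..0   */
} RType;

RType decode_rtype(uint32_t word)
{
    RType f;
    f.op    = (word >> 26) & 0x3F;
    f.rs    = (word >> 21) & 0x1F;
    f.rt    = (word >> 16) & 0x1F;
    f.rd    = (word >> 11) & 0x1F;
    f.shamt = (word >>  6) & 0x1F;
    f.funct =  word        & 0x3F;
    return f;
}
```

For example, the word 0x00221820 decodes to op = 0, rs = 1, rt = 2, rd = 3, shamt = 0, funct = 0x20, which is `add $3, $1, $2`.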

  • Example 1: R-type: add instruction [Hennessy&Patterson]

  • Critical path R-type operation [Hennessy&Patterson]

    [Figure: datapath with PC, instruction memory, register file (32 32-bit registers; ports Rw, Ra, Rb addressed by the 5-bit Rd, Rs, Rt fields), 16-bit Imm, 32-bit buses, data memory (data address, data in, data out), and Clk inputs]

  • Critical path R-type operation

    [Timing diagram, from one clock edge to the next: clock-to-Q of the PC, instruction memory access time (Rs, Rt, Rd, op, funct), register file access time (bus A, B), ALU delay (bus W), set-up + skew, then the write into the register file]

  • Example 2: I-type: load word [Hennessy&Patterson]

    lw rs, rt, imm16

      • mem[PC] (instruction fetch)
      • addr = R[rs] + ext(imm16)
      • R[rt] = mem[addr]
      • PC = PC + 4
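The register-transfer steps of lw can be sketched in C as follows. The register array, the word-indexed memory, and the function names are assumptions for illustration; the sketch also assumes an aligned address and does no bounds checking.

```c
#include <stdint.h>

/* Sign-extends a 16-bit immediate to 32 bits (the ext() of the slide). */
int32_t sign_extend16(uint16_t imm)
{
    return (imm & 0x8000u) ? (int32_t)imm - 0x10000 : (int32_t)imm;
}

/* Executes lw on a hypothetical machine state and returns the next PC.
 * mem_words is indexed in 32-bit words, so the byte address is divided by 4. */
uint32_t exec_lw(uint32_t pc, uint32_t regs[32], const uint32_t *mem_words,
                 unsigned rs, unsigned rt, uint16_t imm16)
{
    uint32_t addr = regs[rs] + (uint32_t)sign_extend16(imm16); /* addr = R[rs] + ext(imm16) */
    regs[rt] = mem_words[addr / 4];                            /* R[rt] = mem[addr] */
    return pc + 4;                                             /* PC = PC + 4 */
}
```

With R[1] = 8 and imm16 = 0xFFFC (that is, -4), the effective address is 4, so the second memory word is loaded.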

  • Critical path load operation

    [Timing diagram, from one clock edge to the next: clock-to-Q of the PC, instruction memory access time (Rs, Rt, Rd, op, funct), register file access time (bus A, B), ALU delay (address), data memory access time (bus W), then set-up + skew]

  • Example 3: I-type: branch [Hennessy&Patterson]

    beq rs, rt, imm16

      • mem[PC] (instruction fetch)
      • cond = R[rs] - R[rt]
      • if cond = 0: PC = PC + 4 + ext(imm16)*4
      • else: PC = PC + 4
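The next-PC rule for beq can be written out directly. This is a minimal sketch of the register-transfer description above; the function names are hypothetical and the register values are passed in by the caller.

```c
#include <stdint.h>

/* Sign-extends a 16-bit immediate to 32 bits (the ext() of the slide). */
int32_t sign_extend16(uint16_t imm)
{
    return (imm & 0x8000u) ? (int32_t)imm - 0x10000 : (int32_t)imm;
}

/* Returns the next PC for beq, given the two register values. */
uint32_t beq_next_pc(uint32_t pc, uint32_t rs_val, uint32_t rt_val, uint16_t imm16)
{
    uint32_t cond = rs_val - rt_val;          /* cond = R[rs] - R[rt] */
    if (cond == 0)
        return pc + 4 + (uint32_t)(sign_extend16(imm16) * 4);
    return pc + 4;
}
```

The offset counts instructions relative to PC + 4, so an immediate of -1 with equal registers branches back to the beq itself.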

  • Example 3: I-type: branch [Hennessy&Patterson]

    [Figures: branch datapath (register file with 32 32-bit registers, extender for the 16-bit immediate, ALU producing Zero; control signals RegWr, RegDst, ALUctr, ALUSrc, ExtOp, Branch) and two views of the next-address logic, in which the Branch and Zero signals select the address sent to the instruction memory]

  • Architecture of the MIPS core

      • problem: long critical path, defined by the slowest instruction (load)
      • solution: pipelining
          • break the instruction into smaller steps
          • all steps have about the same critical path

  • Pipelining lw instructions [Hennessy&Patterson]

    [Figure: three lw instructions, each passing through Ifetch, RF read, ALU, dmem, RF write, started in consecutive cycles and occupying cycles 1 to 7]

      • One instruction enters the pipeline every clock cycle
      • One instruction leaves the pipeline every clock cycle
      • => CPI = 1 (Cycles per Instruction)
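The cycle count in the figure follows a simple rule: the first instruction needs one cycle per stage, and every later instruction completes one cycle after its predecessor. The sketch below assumes an ideal pipeline with no stalls; the function names are hypothetical.

```c
/* Total cycles for n instructions in an ideal pipeline without stalls. */
unsigned long pipeline_cycles(unsigned stages, unsigned long n)
{
    return n == 0 ? 0 : stages + n - 1;
}

/* Average cycles per instruction; approaches 1 as n grows. */
double pipeline_cpi(unsigned stages, unsigned long n)
{
    if (n == 0)
        return 0.0;
    return (double)pipeline_cycles(stages, n) / (double)n;
}
```

Three lw instructions in five stages take 5 + 3 - 1 = 7 cycles, matching the figure; for long instruction sequences the fill time of the pipeline becomes negligible and the CPI tends to 1.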

  • Pipelining lw instructions

    [Figure: instructions versus current CPU cycle, with pipeline stages I, R, A, M, W; instruction and data accesses marked]

  • 4 stages of R-type instruction, e.g. ADD [Hennessy&Patterson]

    Ifetch, RF read, ALU, RF write (cycles 1 to 4)

  • Pipelining lw and R-type instructions [Hennessy&Patterson]

    [Figure: lw (Ifetch, RF read, ALU, dmem, RF write) followed by add (Ifetch, RF read, ALU, RF write) over cycles 1 to 7]

  • Solution: stretch R-type to 5 stages [Hennessy&Patterson]

    Ifetch, RF read, ALU, dmem, RF write, with a dummy op (noop) in the added stage

  • [Figure: five-stage pipelined datapath (Ifetch, Reg/dec, exec, mem, wr) with program memory, register file, ALU, data memory and next-PC logic; control signals RegDst, ALUSrc, ExtOp, ALUop, branch, MemWr, MemtoReg, RegWr] [Hennessy&Patterson]

  • Data dependencies: R-type instructions [Hennessy&Patterson]

        R1 = ...
        ... = R1 + ...
        ... = R1 + ...
        ... = R1 + ...
        ... = R1 + ...

    Solution: bypasses

  • Bypasses [Hennessy&Patterson]

    [Figure: pipelined datapath with bypasses]

  • Data dependencies: load instruction [Hennessy&Patterson]

        R1 = lw
        ... = R1 + ...
        ... = R1 - ...
        ... = R1 - ...

    Bypass is no solution for the + instruction.

    Solution: pipeline interlock = detects a data hazard and stalls the pipeline until the hazard is cleared.

    [Figure: pipeline diagram with stages IM, RF, DM, RF for the successive instructions]
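The detection half of a pipeline interlock can be sketched as a comparison between the destination of a load and the sources of the instruction that immediately follows it. The `Instr` record and the function name are assumptions for illustration, not the actual hardware description; the check on register 0 reflects that $0 is hard-wired to zero in MIPS.

```c
/* Minimal view of an instruction for hazard detection (hypothetical). */
typedef struct {
    int is_load;        /* non-zero for lw */
    unsigned dest;      /* destination register number */
    unsigned src1;      /* first source register number */
    unsigned src2;      /* second source register number */
} Instr;

/* Returns 1 if `consumer`, issued directly after `producer`, reads the
 * register that `producer` is still loading, so the pipeline must stall. */
int needs_load_use_stall(Instr producer, Instr consumer)
{
    return producer.is_load && producer.dest != 0 &&
           (consumer.src1 == producer.dest || consumer.src2 == producer.dest);
}
```

For a load into r10 followed directly by an add that reads r10, the function reports a hazard; an instruction that does not read r10 can proceed without a stall.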

  • [Figure: pipeline stages I, R (interlocked), A, M, W for instructions i1 and i2; marker: data available from data cache]

        i1) lw r10, r2, r0
        i2) add r8, r9, r10

  • [Figure: pipeline stages I, R (interlocked), A, M, W for instructions i1 and i2]

        i1) MULT r3, r2, r1
        i2) ADD r5, r4, r3

  • Control hazards [Hennessy&Patterson]

    [Figures: pipelined datapath (program memory, register file, ALU flags, data memory, next-PC logic) with the branch path to the next-PC logic]

  • Control hazards

        i1) beq r10, r2, 1b
        i2) nop/independent instructions
        i3) add r8, r9, r10

    [Figure: i1, i2, i3 in the pipeline; marker: address available for instr. fetch]

    Solution: compiler action, possibly filling the branch delay slot

  • PR3930 CPU

    caches              I$ 8K, 2-way; D$ 4K, 4-way
    process             0.35 µm, 5M
    voltage             2.7-3.6 V
    frequency           81/100 MHz (Tj = 125/90 °C; 2.7 V, wcp)
    area                20 mm^2
    power dissipation   4 mW/MHz

  • PR3930 + peripherals

      • TCP chip: TV controller
      • PR3930 with I$ and D$
      • Gfx, SDRAM controller, serial interconnect bus, I2C, UART, timers
      • PI bus architecture
      • 80 mm^2, 352 pins
      • 0.35 micron process
      • 48 MHz (96 for gfx)


  • Application examples (1)

    #define NTAPS 4

    int fir(int in)
    {
        int i;
        /* Arrays hold NTAPS + 1 elements because the code indexes 0..NTAPS. */
        static int state[NTAPS + 1];   /* delay line; state[NTAPS] is the newest sample */
        static int coeff[NTAPS + 1];   /* filter coefficients */
        int out[NTAPS + 1];            /* running partial sums */

        state[NTAPS] = in;
        out[0] = state[0] * coeff[0];
        for (i = 1; i < NTAPS + 1; i++) {
            out[i] = out[i-1] + state[i] * coeff[i];
            state[i-1] = state[i];     /* shift the delay line */
        }
        return out[NTAPS];
    }
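The slide version keeps its coefficients in a static array that is never filled in. A variant that can actually be exercised is sketched below, with the filter state and coefficients held in a caller-owned struct; the `Fir` type and the `fir_step` name are assumptions for illustration, and the arithmetic is otherwise the same multiply-accumulate and shift as on the slide.

```c
#include <stddef.h>

#define NTAPS 4

/* Caller-owned filter: NTAPS + 1 coefficients and delay-line entries. */
typedef struct {
    int coeff[NTAPS + 1];
    int state[NTAPS + 1];   /* state[NTAPS] holds the newest sample */
} Fir;

/* Feeds one input sample and returns one output sample. */
int fir_step(Fir *f, int in)
{
    int acc;
    size_t i;

    f->state[NTAPS] = in;
    acc = f->state[0] * f->coeff[0];
    for (i = 1; i <= NTAPS; i++) {
        acc += f->state[i] * f->coeff[i];
        f->state[i - 1] = f->state[i];   /* shift the delay line */
    }
    return acc;
}
```

With all coefficients set to 1 the filter computes a moving sum over the last five samples, so the inputs 1, 2, 3 produce 1, 3, 6.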

  • Application examples (1)

    19 instructions per tap!!

    .L100006:
        sll    $3, $2, 2          # R3=R2<<2             R3=i-1
        addu   $14, $15, $3       # R14=R15+R3
        lw     $24, 0($14)        # R24=load(*R14)       R24=coeff[i-1]
        addiu  $12, $6, -4        # R12=R6-4
        addu   $11, $12, $3       # R11=R12+R3
        lw     $13, 0($11)        # R13=load(*R11)       R13=state[i-1]
        nop
        mult   $24, $13           # LO=R24*R13
        addu   $25, $sp, $3       # R25=sp+R3
        lw     $9, -4($25)        # R9=load(R25-4)       R9=out[i-1]
        addiu  $2, $2, 1          # R2=R2+1              i=i+1
        mflo   $13                # R13=move from low mpy reg
        addu   $10, $9, $13       # R10=R9+R13           R10=out[i]
        sw     $10, 0($25)        # mem(*R25)=R10
        addu   $25, $7, $3        # R25=R7+R3
        sw     $24, 0($25)        # mem(*R25)=R24
        slti   $24, $2, 10        # R24=(R2<10)
        bne    $24, $0, .L100006  # loop while R24 != 0
        addiu  $15, $7, -4        # R15=R7-4 (branch delay slot)

  • Application examples (2)

    Bit level operations: finite field arithmetic

        temp1 = input

    10 instructions!! Very simple in hardware.

  • Application examples (2)

    Bit level operations: DES example

  • Application examples (2)

    Bit level operations: A5 example (GSM encryption)

  • Application examples (3)

    Video conferencing, H263

      • CIF format = 352 * 288 px, 2:1:1, 8 bits/sample
      • QCIF = 1/4 CIF
      • SQCIF = 96 * 128
      • 96 * 128 * 1.5 * 10 Hz = 180 KB/s
      • 20 Kb/s: compression factor 72
      • Compare: 852 * 576 * 2 B/p * 50 = 49 MB/s
      • Process = 0.25 micron
      • power consumption = 100 mW @ 10 Hz
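The data rates on this slide can be reproduced with two small helpers. The function names are hypothetical. The sketch assumes that K means 1024 for the SQCIF figures, which is the reading under which 180 KB/s and the factor 72 come out exactly, while the 49 MB/s figure for the larger format matches a decimal reading (about 49 million bytes per second).

```c
/* Uncompressed video rate in bytes per second. */
double raw_rate_bytes_per_s(unsigned width, unsigned height,
                            double bytes_per_pixel, double frames_per_s)
{
    return (double)width * (double)height * bytes_per_pixel * frames_per_s;
}

/* Ratio between the raw bit rate and the channel bit rate. */
double compression_factor(double raw_bytes_per_s, double channel_bits_per_s)
{
    return raw_bytes_per_s * 8.0 / channel_bits_per_s;
}
```

SQCIF at 1.5 bytes per pixel and 10 Hz gives 184320 bytes/s, which is 180 KB/s; squeezing that into a 20 Kb/s channel needs a compression factor of 72.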

  • Application examples (3)

    H.263 video encoder

  • Application examples (3)

    [Figure: PR3940 with I$, D$ and memory]

    10 Hz => 140 MHz CPU

    indicator      value
    Code size      249 kB
    Data size      189 kB
    I-frame        8.8 Mcc
    P-frame        13.8 Mcc
    Motion Est.    2.1 Mcc
    Bus load       18 %
    I$ misses      0.8 %

  • Application examples (3)

    In which process can the H263 video encoder be executed on a single MIPS processor?

    year                             95     97     99     01     03     06     09     12
    gate length (nm)                 350    250    180    150    130    100    70     50
    VDD (V)                          2.7    2.5    1.8    1.5    1.5    1.2    0.9    0.6
    s                                1      0.71   0.51   0.43   0.37   0.29   0.20   0.14
    p                                1      0.93   0.67   0.56   0.56   0.44   0.33   0.22
    area (scales as s^2)             20     10.2   5.3    3.67   2.76   1.63   0.8    0.41
    max. clock freq (MHz) (p/s^2)    81     147    204    245    326    441    675    882
    energy/ins (nJ) (s*p^2)          4      2.45   0.91   0.53   0.46   0.23   0.09   0.03

    Conclude: power consumption is limiting factor!!
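The question on this slide can be worked through with the table values. The sketch below assumes one instruction per clock cycle, so that energy per instruction (nJ) times clock frequency (MHz) gives power in mW; the 140 MHz requirement and the 100 mW budget are the figures quoted on the earlier H263 slides. Array and function names are hypothetical.

```c
#define N_NODES 8

/* Values copied from the scaling table. */
const int    gate_nm[N_NODES]   = {350, 250, 180, 150, 130, 100, 70, 50};
const double fmax_mhz[N_NODES]  = {81, 147, 204, 245, 326, 441, 675, 882};
const double energy_nj[N_NODES] = {4, 2.45, 0.91, 0.53, 0.46, 0.23, 0.09, 0.03};

/* Power in mW at f_mhz, assuming CPI = 1: nJ * MHz = mW. */
double power_mw(int node, double f_mhz)
{
    return energy_nj[node] * f_mhz;
}

/* Index of the first process that meets both the clock requirement
 * and the power budget, or -1 if none does. */
int first_feasible_node(double f_mhz, double budget_mw)
{
    for (int i = 0; i < N_NODES; i++) {
        if (fmax_mhz[i] >= f_mhz && power_mw(i, f_mhz) <= budget_mw)
            return i;
    }
    return -1;
}
```

Under these assumptions the 250 nm process is already fast enough (147 MHz), but at 140 MHz it dissipates about 343 mW. The 100 mW budget is first met at 150 nm (about 74 mW), which is why power rather than clock speed is the limiting factor.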

  • Application examples: conclusions

    CPUs offer flexibility, but they are

      • not efficient in performance
      • not efficient in code size
      • not efficient in power consumption

  • [Figure: flow from source code to a timing-annotated program]

    Source:

        func() {
            a = x.value & 0x3;
            if (a != 0) {
                b = a * c + d;
            } else {
                b = ;
            }
            y.post(b);
        }

    parser: splits the function into basic blocks BB1 to BB4 (a = x.value & 0x3; the a != 0 branch, b = a * c + d; the a == 0 branch, b = ; and y.post(b))

    compile each BB to instructions:

        ldi #0x3, R5
        and R4,R5,R6
        cmp R0,R6,R7
        br R7,true
        ba false

    Arch. model: ldi = 2 cycles, nop = 1 cycle, ...

    generate new C with delay counts:

        func() {
            a = x.value & 0x3;
            DelayCycles(7);
            if (a != 0) {
                b = a * c + d;
                DelayCycles(8);
            } else {
                b = ;
                DelayCycles(5);
            }
            y.post(b);
            DelayCycles(4);
        }

    compile and run

  • Comparison between different CPU cores

    processor      Size   Process   Inter-connect   Clock   Specint (Int)   Specint (FP)   Power (Watt)
    microsparc     225    0.8       3               50      23              18             4
    I486DX2        82     0.8       3               66      32              16             7
    Power PC 601   121    0.6       4               66      60              80             7
    Pentium        294    0.8       3               66      85              63             16
    R4200          76     0.6       3               80      55              30             1.5
    R4000SC        184    0.8       2               100     62              63             12
    R4400SC        184    0.6       2               150     88              97             15
    Alpha          238    0.75      3               200     130             184            30

  • Comparison between different CPU cores

    http://bwrc.eecs.berkeley.edu/cic

    [Embedded spreadsheet (Sheet1) with one row per processor: Intel i386SX, i486DX, P5, P6; Motorola 68040; Sparc Micro, Super, Ultra, turbo; Alpha 21064, 21164a, 21364; PowerPC 601, 604, 604e, 7400; TMPNX8500; ARM 610, 710, 810, SA-110, SA-1100, 940T. Columns: processor, clock, Specint-92, Specint-95, issue, VDD, tech, 1/tech (for plotting), power, size, Mops/W, mW/MHz, Specint per sq. mm, Specint per W. Charts: Specint92, MOPS/W, Specint92/mm^2 and Specint92/W, each versus technology in micron.]

  • Comparison between different CPU cores

    [Chart: Specint92 vs technology in micron]

  • Comparison between different CPU cores

    [Chart: Specint92/mm^2 vs technology in micron]

  • Comparison between different CPU cores

    [Chart: Specint92/W vs technology in micron]

  • Power Consumption in microprocessors

    Power consumption is (becoming) the limiting factor in processor design.

    Solution in direction of:

      • Hardware acceleration
      • Instruction Level Parallelism instead of clock speed
      • Code size efficiency

    source: ISSCC2001, Patrick Gelsinger, Intel

  • Towards application specific architectures: ConCISe [Bernardo Kastrup]

  • Towards application specific architectures

    Example equation for one output bit (12) is shown!

  • Towards application specific architectures

  • Towards application specific architectures

    [Figure: ConCISe integrated tool-set, with blocks: source code, profile data, core compiler, assembly code, hardware/software partitioning, translator, modified assembly with ASIs, assembler/linker, executable, simulator, hardware partition, HDL file, hardware compiler, hardware netlist, and a "Does it fit? Y/N" check]

  • Towards application specific architectures: ConCISe [Bernardo Kastrup]

    Advantages: faster execution, smaller code size, lower power

    The Configurable Functional Unit (CFU) can be:

      • Standard cell
      • Field-Programmable Logic (FPL)
          • Considerably bigger in silicon (4 to 5 mm^2 in C075)
          • But it's reconfigurable = reprogrammable for different application programs

  • Some benchmarks

  • Amdahl's law

    Impact of an improvement on the execution time of a program depends on 2 parameters:

      • f = fraction of the original computation time that is affected by the improvement
      • s = speedup factor (local)

    exec_time_new = exec_time_old * (1 - f) + exec_time_old * f / s
    speedup_overall = exec_time_old / exec_time_new = 1 / (1 - f + f / s)
    if s >> 1 then speedup_overall ≈ 1 / (1 - f)

    Example: 40 % of program can be executed 10 x faster
    speedup_overall = 1 / (0.6 + 0.4 / 10) = 1.56
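Amdahl's law is a one-line function. The sketch below uses f and s exactly as defined on the slide; the function name is hypothetical, and the caller is assumed to pass 0 <= f <= 1 and s > 0.

```c
/* Overall speedup when a fraction f of the original execution time
 * is accelerated by a local speedup factor s. */
double amdahl_speedup(double f, double s)
{
    return 1.0 / ((1.0 - f) + f / s);
}
```

For f = 0.4 and s = 10 this gives 1 / 0.64 = 1.5625, the 1.56 of the example. Even with an arbitrarily large s, the overall speedup for f = 0.4 stays below 1 / 0.6, or about 1.67.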

  • Towards application specific architectures

    www.tensilica.com

  • Conclusions

      • Programmable CPU cores are important for the control parts of the application.
      • They are well supported with tools to support the development of end-user software (vs. deeply embedded sw).
      • "Keep it simple" heuristic (RISC vs. CISC):
          • Make frequent cases fast and rare cases correct.
          • Regular (orthogonal) instruction set.
          • No special features that match a high level language construct.
          • At least 16 registers to ease register allocation.
      • Embedded cores are often light cores, which are a compromise between performance, area and power dissipation (vs. stand-alone CPU cores, which are optimised for performance).

  • Hands-on

    Implement a FIR filter in assembly and simulate

  • Hands-on

      • SPIM (MIPS assembly simulator): link from PAM website
      • Use appendix A (same site)
      • Example assembly file on PAM website
      • 1 or 2 page report in 2 weeks:
          • Engineering decisions (e.g. addressing of samples)
          • Verify that C-code and assembly match
          • Assembly in appendix
          • # instructions/tap? Conclusions?