Modular Multiplication Algorithms for FPGAs
Mustafa Parlak

koclab.cs.ucsb.edu/teaching/ca/docx/w08a/ModMulParlak.pdf


Outline
• What is an FPGA?
• FPGA vs. ASIC & Microprocessors
• FPGA Design Metrics
• FPGAs in Cryptography
• Adders: Basic operator of Modular Multiplications
• Modular Multiplications
  – Interleaved Modular Multiplications
  – Montgomery Modular Multiplications
• Comparison of Modular Multiplication algorithms

What is an FPGA
• FPGA = Field Programmable Gate Array
• A semiconductor IC that can be configured by the user (designer) after manufacturing
• Two-dimensional array of customizable logic blocks placed in an interconnect framework
• The user configures:
  1. The function of each logic block
  2. The interconnection between the logic blocks
• Can be programmed using a logic circuit diagram (schematic) or source code in VHDL or Verilog

What is an FPGA
• Logic blocks
  – to implement combinational and sequential logic
• Interconnect
  – wires to connect inputs and outputs to logic blocks
• I/O blocks
  – special logic blocks at periphery of device for external connections
• Key questions:
  – how to make logic blocks programmable?
  – how to connect the wires?
  – after the chip has been fabricated

FPGA Logic Blocks
• 4-input look-up table (LUT)
  – implements combinational logic functions
• Register
  – optionally stores output of LUT
[Figure: logic block. The INPUTS feed a 4-input look-up table (4-LUT) and a flip-flop (FF); a 1/0 multiplexer, controlled by a latch set by the configuration bit-stream, drives the OUTPUT.]

FPGA Interconnect
[Figure: FPGA interconnect]

LUTs (Look-Up Tables)
• A LUT contains memory cells to implement small logic functions
• Each cell holds ‘0’ or ‘1’
• Programmed with the outputs of a truth table
• Inputs select the content of one of the cells as output
[Figure: a 4-input LUT (inputs a, b, c, d), also usable as a 16x1 RAM or a 16-bit shift register (SR), feeding a flip-flop with clock, clock enable, and set/reset. A second panel draws the LUT as a multiplexer (MUX) selecting among static random access memory (SRAM) cells. Labels: 3-input LUT -> 8 memory cells; 3 – 6 inputs.]
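
The LUT behaviour described on this slide can be modelled in a few lines of Python. This is only an illustrative software sketch, not how a vendor tool represents a LUT; the helper name `make_lut` and the choice of treating the first input as the most significant address bit are assumptions made for the example.

```python
def make_lut(truth_table):
    """Model a k-input LUT (hypothetical helper, for illustration only).

    truth_table is a list of 2**k bits, one per memory cell.
    The inputs form the address that selects one cell as the output.
    """
    def lut(*inputs):
        address = 0
        for bit in inputs:  # first input is treated as the most significant address bit
            address = (address << 1) | (bit & 1)
        return truth_table[address]
    return lut

# A 3-input LUT has 2**3 = 8 memory cells.
# Here the cells are programmed with the truth table of a XOR b XOR c.
xor3 = make_lut([0, 1, 1, 0, 1, 0, 0, 1])
```

Reprogramming the same structure with a different list of eight bits yields any other 3-input function, which is the sense in which the logic block is configurable.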

Configuring FPGA
• Millions of SRAM cells hold the LUT contents and the interconnect routing
• Volatile memory: loses configuration when board power is turned off
• Keep the bit pattern describing the SRAM cells in non-volatile memory, e.g. Flash
• Configuration takes on the order of seconds
[Figure: configuration data in and configuration data out; legend: I/O pin/pad, SRAM cell.]

Generic FPGA Design Flow
• Design entry:
  – Create your design files using:
    • Schematic editor or
    • Hardware description language (Verilog, VHDL)
• Design “implementation” on FPGA:
  – Synthesis, partition, place, and route to create the bit-stream file
• Design verification:
  – Use a simulator to check function
  – Other software determines the maximum clock frequency
  – Load onto the FPGA device (a cable connects the PC to the development board)
    • Check operation at full speed in the real environment

FPGA vs. ASIC/Microprocessors
– ASIC gives high performance at the cost of inflexibility.
– Processor is very flexible but not tuned to the application.
– Reconfigurable hardware is a nice compromise.
[Figure labels: Microprocessor, Reconfigurable Hardware, ASIC; Software, Firmware, Hardware.]

FPGA vs. ASIC

FPGA
• Reconfigurable
• Low throughput
• Short design cycle
• Suitable for low-volume production
  – Low cost at small numbers
• High power
• High silicon area
  – Utilization problem
• No testing cost
• Already fabricated

ASIC
• No reconfiguration
• High throughput
• Long design cycle
• Suitable for high-volume production (> 1 million)
  – Low cost at large numbers
• Low power
• Low silicon area
  – Fully utilized
• High testing cost
• Needs to be fabricated

FPGA vs. Processors

FPGA
• Long design cycle
• Expensive
• High throughput
  – (more than 20~100x)

Processor
• Short design cycle
• Cheap
• Low throughput
  – Significantly slower

FPGA-Based Applications
• Cryptography
• Network processors
• Evolvable and biologically-inspired hardware
• Rapid ASIC prototyping
• Real-time systems
• Embedded applications
• Custom-computing hardware
• Reconfigurable computing
• Special-purpose computation engines
  – Hardware dedicated to solving one problem (or class of problems)
  – Accelerators attached to general-purpose computers

FPGA Design Metrics
• Time complexity
  – Throughput is the amount of data processed per unit time (bits/sec)
  – The higher the throughput of a design, the better its efficiency
• Area complexity
  – # of LUTs, FFs, RAMs, etc.
• Design metric combining time and area
  – Throughput/Area
  – The ratio is higher for high throughput and small area
• Another important design metric: power

Area-Speed optimization
In general there is a trade-off between
• Speed
• Area

• Speed boosters
  • Parallel execution
  • Loop unrolling and pipelining
• In all cases area increases with increasing speed
[Figure: loop unrolling & pipelining]

Why FPGA?
• Flexibility from general-purpose computing and speed from reconfigurable logic
• Due to the inherent fine-grained granularity, the parallelism tends to be very high
• Registers, latches, and even distributed RAM blocks can be created and distributed wherever needed by the datapath
• The lack of a fixed architecture allows designers to tailor the design's datapath and control flow arbitrarily
• Suits highly regular and iterative applications with non-standard word lengths

Why FPGAs Suit Cryptography Well
• Speed & real-time execution
  – Encryption/decryption data rate up to 1 Gb/sec for IPsec crypto devices
• RNG integrity
  – RO-based RNG
• COMSEC criteria
  – Red-Black separation
  – Harder to attack and break a cryptosystem running on an FPGA than one running on GPPs
• The effectiveness of the FPGA's cell structure for implementing the bit-wise logical operations typical of many cryptographic algorithms
• The large amount of memory inside the FPGA
  – Eases the implementation of memory-intensive substitution operations
  – Local storage working as a cache whenever needed
• Low power (as compared to GPP?)

Modular Multiplication Algorithms
• Why is modular multiplication important?
  – Most common operation of
    • RSA
    • Finite field arithmetic
    • DSA
    • Diffie-Hellman key exchange
    • ECC
• Modular multiplication algorithms in GF(p)
  – Multiply and divide
    • Naïve method
  – Interleaved modular multiplication
    • Multiplication and reduction are interleaved
  – Montgomery modular multiplication
    • Transformation and operations in residue domain
  – Other algorithms
    • Brickell's method
    • … etc

Adders: Basic Building Block of Multiplication
• A full adder (FA) is a combinational circuit with 3 inputs and 2 outputs
• Computes the sum (S_i) and the carry (C_(i+1)) for the next stage
• An FA is a one-bit adder. What happens if FAs are cascaded to make an n-bit adder?
  § The carry has to be propagated
  § Problem: propagation delay
  § Can we get rid of carry propagation or decrease it?
  § A number of methods have been proposed to implement addition efficiently
    • Ripple Carry (the obvious one)
    • Carry Look-Ahead
    • Carry Save
    • Delayed Carry
    • Brent-Kung
    • etc.
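
As a bit-level illustration of the carry chain discussed above, the following Python sketch models a full adder and cascades n of them into a ripple-carry adder. The function names are chosen for this example only; the model shows the data dependency (each stage waits for the previous carry), not the hardware timing.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum bit S_i, carry-out C_(i+1))."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout

def ripple_carry_add(x, y, n):
    """Add two n-bit integers by chaining n full adders, LSB first."""
    carry, result = 0, 0
    for i in range(n):
        # Stage i cannot start until the carry of stage i-1 is known.
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry  # n-bit sum and the final carry-out
```

For example, `ripple_carry_add(13, 11, 4)` yields the 4-bit sum 8 with carry-out 1, i.e. 24.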

Ripple Carry and Carry-Save Adders

Ripple Carry Adder
• Each FA receives Cin from the previous FA
• Advantages
  – Sign detection is easy
• Disadvantages
  – Delay is high
  – Let the delay of an FA be T(FA)
  – The delay of an n-bit adder is n*T(FA)

Carry-Save Adder
• Parallel ensemble of FAs
• Advantages
  – Delay is constant: one FA delay
• Disadvantages
  – Adds three numbers and produces two
  – Sign detection is hard
  – Needs a conventional adder to get the final result
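
The "adds three numbers and produces two" property can be sketched with Python integers standing in for bit vectors. The function name `carry_save_add` is an assumption for this example; every bit position is computed independently, which is why the hardware delay is a single full adder.

```python
def carry_save_add(x, y, z):
    """3:2 compression: returns (s, c) with s + c == x + y + z.

    Each bit position is an independent full adder; no carry ripples.
    """
    s = x ^ y ^ z                            # per-bit sum
    c = ((x & y) | (x & z) | (y & z)) << 1   # per-bit carry, weighted one position higher
    return s, c

s, c = carry_save_add(5, 6, 7)
total = s + c  # the conventional (carry-propagating) addition happens only here
```

The pair (s, c) is the redundant representation referred to on later slides; the true value is only known after the final conventional addition.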

Other Adders and Comparison

Carry Look-Ahead
• Improves speed by reducing carry propagation

Carry Delayed Adder
• Two-level carry-save adder

Modular Addition
• Given A, B < P, compute A + B (mod P)
  1. Find Sʹ = A + B
  2. If (Sʹ ≥ P)
  3.   S = Sʹ - P
  4. else S = Sʹ
• Omura's Method: an efficient method for computing the modular addition
  – Useful for multi-operand modular addition
  – Eliminates the need for subtraction
  – For n-bit operands, this method always keeps the intermediate results within n bits; they never grow beyond that
  – Whenever a result exceeds n bits, the carry-out is ignored and a correction is performed

Omura's Method
1. Compute the correction factor m = 2^n - P
2. First compute S' = A + B
3. If there is a carry-out (nth bit), then S = S' + m, else S = S'

Ex: Assume P = 39, so m = 2^6 - 39 = 25 = (011001)
We obtain the result S = 31, which is 70 (mod 39)
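
A minimal Python sketch of the three steps above, assuming A, B < P < 2^n. The operands of the slide's example were in a figure that did not survive, so A = B = 35 below are hypothetical inputs chosen only because they sum to 70.

```python
def omura_add(a, b, p, n):
    """Omura-style modular addition sketch for a, b < p < 2**n.

    The result is congruent to a + b (mod p) and fits in n bits.
    Without a carry-out it may still be >= p, so a final correction
    can be needed once at the end of a chain of additions.
    """
    m = (1 << n) - p           # correction factor m = 2^n - P
    mask = (1 << n) - 1
    s = a + b
    if s >> n:                 # carry-out of the n-bit adder
        s = (s & mask) + m     # ignore the carry, add the correction
    return s

result = omura_add(35, 35, 39, 6)  # 70 overflows 6 bits: 6 + 25 = 31
```

Dropping the carry subtracts 2^n and adding m gives back 2^n - P, so the net effect is subtracting P without a subtractor or a comparison against P.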

Interleaved Modular Multiplication

At most two subtractions are needed to reduce the partial product
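
The algorithm listing for this slide was a figure and is not in the text. The following Python sketch shows the standard textbook form of interleaved modular multiplication, which is assumed (not confirmed) to match the slide; it uses the recurrence R = A·B_i + 2R quoted on the following slides and requires A < P.

```python
def interleaved_modmul(a, b, p, n):
    """A*B mod P, scanning the n bits of B from most to least significant.

    Assumes a < p. Each iteration performs one shift, one conditional
    addition of A, and up to two subtractions of P.
    """
    r = 0
    for i in range(n - 1, -1, -1):
        r = 2 * r + a * ((b >> i) & 1)  # R = A*B_i + 2R
        if r >= p:                      # first comparison and subtraction
            r -= p
        if r >= p:                      # second comparison and subtraction
            r -= p
    return r
```

Two subtractions always suffice: with R < P and A < P at the start of an iteration, 2R + A is below 3P.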

Interleaved with Omura's Method

Observations on the standard interleaved method
• 3 additions (or subtractions) per iteration
• Two comparisons and two reductions per iteration
• The partial addition result goes beyond n bits
• Use Omura's method to get rid of subtractions and comparisons

Advantages
• Comparisons and subtractions eliminated
• Partial product R never grows beyond n bits

Disadvantages
• Pre-computation increases execution time
• Still 3 additions per iteration
• Extra memory for storing M
• One final correction subtraction may be required

Interleaved with Pre-computation
• A clever method to reduce 3 additions/reductions to 1 addition:
  – Idea: the reduction of the ith iteration can be calculated and made ready for the next, (i+1)th, iteration (correction step)
  – The correction can be added to the intermediate product of the next iteration
  – Instead of reducing with P, reduce with 2^n, which means selecting the n least significant bits
  – The possible correction values can be pre-computed before the multiplication starts and stored in a look-up table
• At the ith iteration, assume the partial product R = A·B_i + 2R has been calculated and is ready for the next iteration
• The partial product R may grow by only 2 more bits, from n to n+2, as R = (R_(n+1) R_n R_(n-1) … R_0)
• Assume that R grows by only 1 bit, R = (R_n R_(n-1) … R_0)
  – Now R is n+1 bits long

Interleaved with Pre-computation
• Instead of reducing R modulo P, reduce it modulo 2^n
  – Rʹ = R (mod 2^n) selects the n least significant bits of R. Then Rʹ = R – 2^n is ready for the (i+1)th iteration
  – Add a correction factor at the next, (i+1)th, iteration to restore the same partial product as in the original interleaved algorithm
• At the (i+1)th iteration, assume B_(i+1) = 0
  – The original interleaved algorithm finds Rʹʹ = A·B_(i+1) + 2R (mod P) = 0 + 2R (mod P) = 2R (mod P)
• Verification
  – Shift left (doubles the partial product): Rʹʹ = 2Rʹ = 2(R – 2^n) = 2R – 2^(n+1)
  – Restore the partial product by adding 2^(n+1) (mod P) (correction factor)
  – Rʹʹ = 2R – 2^(n+1) + 2^(n+1) (mod P) = 2R (mod P), which is the desired result for the (i+1)th iteration
• Only a few possible correction factors may occur
  – 0, A, 2^(n+1) (mod P), A + 2^(n+1) (mod P), 2^(n+2) (mod P), A + 2^(n+2) (mod P)
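
The truncate-then-correct idea can be sketched in Python as follows. This is a simplified model under stated assumptions (A < P < 2^n), not the slide's exact datapath: the table below is indexed by the overflow value k = R >> n and the current multiplier bit, so it holds eight entries rather than the six values listed above, and a plain `% p` stands in for the final comparison and subtraction.

```python
def interleaved_precomp_modmul(a, b, p, n):
    """Sketch: a*b mod p with a single table-driven addition per iteration."""
    mask = (1 << n) - 1
    # table[k][bit]: repay the k * 2**n removed by truncation (worth
    # k * 2**(n+1) after the left shift) and add A when the multiplier bit is 1.
    table = [[(k << (n + 1)) % p, (k << (n + 1)) % p + a] for k in range(4)]
    r = 0
    for i in range(n - 1, -1, -1):
        k = r >> n                           # bits above the low n bits (0..3)
        bit = (b >> i) & 1
        r = 2 * (r & mask) + table[k][bit]   # shift, then one addition
    return r % p                             # stands in for the final correction
```

The partial product stays below 2^(n+2) throughout, which is why four overflow values are enough for the table index.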

Interleaved with Pre-computation
• Advantages
  – One addition in each iteration
  – Almost 2x increase in speed
• Disadvantages
  – Requires pre-computation (breaks the regularity)
  – Requires one extra iteration
  – Requires extra local storage (4x operand bit length = 4x2n)
    • Ex: 2048-bit RSA modular multiplication (4 x 2048 = 8 kbit)
  – Comparison and subtraction at the end

Interleaved with Pre-computation Datapath
[Figure: datapath]

Interleaved with CSA Utilization
• The conventional adder is replaced with a CSA (redundant representation)
• Reduction modulo M in the interleaved algorithm is replaced with reduction modulo 2^n
• Afterwards, the value of k*2^n (mod M) is added in order to reconstruct the correct intermediate result at the next iteration
• At the end, S and C are added to find the correct result

Interleaved with CSA Utilization

Advantages
• Two additions per iteration (?)
• Additions in constant time (no carry propagation)

Disadvantages
• The result is in redundant form (C, S), which has to be resolved with a conventional adder (one more adder)
• Calculation of A is not straightforward and needs subtractions and comparisons
• Needs more storage to save S and C instead of one value
• Datapath requires more logic
• Complex FSM and address generation

Interleaved with CSA Utilization and Pre-computation
• The problems mentioned make the algorithm infeasible
• The same pre-computation idea is applied
  – The intermediate result I has only two possible values (0, Y)
  – In the correction phase, A also has only a few possible values
  – These two can be combined as 2A + I, pre-computed, and stored

Interleaved with CSA Utilization and Pre-computation

Advantages
– Only one addition per iteration, in constant time
– No comparison and reduction

Disadvantages
– Requires pre-computation (breaks the regularity)
– Requires one extra iteration
– Requires extra storage (6x operand bit length)
  • Ex: 2048-bit RSA modular multiplication
  • 6 x 2048 = 12 kbit local storage
– At the end of the iterations
  • Requires a conventional adder to calculate (C + S)
  • May require one extra reduction (subtraction)
– Requires 3-operand memory bandwidth per cycle

Montgomery Modular Multiplication
• In 1985, P. L. Montgomery introduced an efficient algorithm for computing A·B (mod P)
• It performs modulo reduction without division
• The algorithm replaces division by P with division by a power of 2
  – Well suited to computer systems because division by a power of 2 is simply a shift operation
• Define a P-residue to be a residue class modulo P
  – Given A, B as n-bit operands: Aʹ = A·R (mod P), Bʹ = B·R (mod P)
• Select R co-prime to P. The natural choice is R = 2^n, where n is the operand size
• Montgomery multiplication computes
  – MonPro(A, B) = A·B·R^(-1) (mod P)
• Given Aʹ = A·R (mod P) and B
  – MonPro(Aʹ, B) = Aʹ·B·R^(-1) (mod P) = A·R·B·R^(-1) (mod P) = A·B (mod P)
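
The identities above can be checked numerically with a reference model. The sketch below defines MonPro directly from its mathematical definition using Python's built-in modular inverse (`pow` with exponent -1, available from Python 3.8); it is not a hardware algorithm. The values P = 61, n = 6, A = 17, B = 23 are arbitrary choices for illustration.

```python
def monpro_reference(x, y, p, n):
    """Reference model of MonPro: x * y * R^(-1) mod p with R = 2**n.

    Requires gcd(R, p) == 1, i.e. p odd.
    """
    r_inv = pow(1 << n, -1, p)  # modular inverse of R (Python 3.8+)
    return (x * y * r_inv) % p

p, n = 61, 6                    # illustrative odd modulus and operand size
a, b = 17, 23
a_res = (a << n) % p            # A' = A*R (mod P)
b_res = (b << n) % p            # B' = B*R (mod P)

mixed = monpro_reference(a_res, b, p, n)           # A*B (mod P)
in_domain = monpro_reference(a_res, b_res, p, n)   # (A*B)*R (mod P), still a residue
back = monpro_reference(in_domain, 1, p, n)        # leaves the residue domain
```

Multiplying two residues keeps the result in the residue domain, which is why the conversion cost is amortized when many multiplications share the same modulus.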

Binary Montgomery Modular Multiplication
• A, B, P are n-bit numbers (A, B, P < 2^n)
• Let A = (A_(n-1) A_(n-2) ··· A_0) be the binary representation of A
• Choose R = 2^n
• MonPro(A, B) = A·B·2^(-n) (mod P)
• Start from the least significant bit, and obtain the following binary add-shift algorithm to compute T = A·B·2^(-n)

Binary Montgomery Modular Multiplication
• We are interested in T = A·B·2^(-n) (mod P), not T = A·B·2^(-n)
• Reduce the partial product T in each iteration
  – If T is even, then
    • T/2 (mod P) = T/2
    • Reduce by just right-shifting by one bit
  – If T is odd, then T + P must be even
    • We know T < P
    • T (mod P) = T + P (mod P)
    • (T + P) < 2P => (T + P)/2 < P
    • The result is already reduced modulo P
    • Reduce by adding P and then right-shifting by one bit
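
Putting the add-shift loop and the even/odd rule together gives the usual bit-serial form of Montgomery multiplication. The sketch below is the standard textbook version, assumed to correspond to the algorithm figure that is missing from this transcript; it requires P odd, A < 2^n, and B < P.

```python
def monpro_binary(a, b, p, n):
    """Bit-serial Montgomery product: a * b * 2^(-n) mod p (p odd, b < p)."""
    t = 0
    for i in range(n):
        t += b * ((a >> i) & 1)  # add B when bit A_i is set
        if t & 1:                # one-bit test: is T odd?
            t += p               # T + P is even because P is odd
        t >>= 1                  # exact division by 2
    if t >= p:                   # T stays below 2P, so one subtraction suffices
        t -= p
    return t
```

In this combined loop the running value is only guaranteed to stay below 2P rather than P, which is where the extra subtraction at the end comes from.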

Binary Montgomery Modular Multiplication

Advantages
• On average more than one addition for each iteration
• Only a one-bit comparison is performed to decide on the addition of P

Disadvantages
• One extra subtraction is needed at the end
• Requires conversion to the residue domain
  – Not a big problem if multiple multiplications are required for the same modulus

Montgomery Multiplication with Pre-computation
• Before computing the partial product, it is known that one of 0, P, B, B+P needs to be added
• The following truth table shows what is to be added

R0 | Ai | B0 | Precomp
 0 |  0 |  0 | 0
 0 |  0 |  1 | 0
 0 |  1 |  0 | B
 0 |  1 |  1 | B+P
 1 |  0 |  0 | P
 1 |  0 |  1 | P
 1 |  1 |  0 | B+P
 1 |  1 |  1 | B
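
The truth table follows from one observation: after adding A_i·B, the partial product is odd exactly when R0 XOR (A_i AND B0) is 1, and P is added only in that case. A short Python sketch, with function names chosen for this example and the same assumptions as before (P odd, B < P):

```python
def montgomery_addend(r0, ai, b0, b, p):
    """Value selected by the truth table for one iteration."""
    odd = r0 ^ (ai & b0)  # parity of R + A_i*B, known before any addition
    return (b if ai else 0) + (p if odd else 0)

def monpro_precomp(a, b, p, n):
    """a * b * 2^(-n) mod p with one table-selected addition per iteration."""
    table = (0, p, b, b + p)  # index = 2*A_i + odd; B+P is computed once
    t = 0
    for i in range(n):
        ai = (a >> i) & 1
        odd = (t & 1) ^ (ai & b & 1)
        t = (t + table[2 * ai + odd]) >> 1
    return t - p if t >= p else t
```

Only B+P has to be stored in addition to the operands, which matches the storage cost listed for this variant.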

Montgomery with Pre-computation

Advantages
• Less than one addition per iteration
  – Latency decreased
• Simpler datapath

Disadvantages
• Storage is required to save B+P
• B+P has to be calculated before the iterations start
• Slightly more complex loop control compared to simple Montgomery multiplication
  – Negligible

Montgomery with CSA utilization
[Figure: datapath]

Montgomery with CSA utilization

Advantages
• Additions are done by a CSA, which has 1 FA delay
  – Improves operating frequency
• Almost one addition per iteration

Disadvantages
• Memory bandwidth is 3 operands per cycle (C, S, I)
• Requires 1 extra iteration to restore the result
• Storage increases
  – X, Y, P, Y+P, C, S need to be stored
• Complex datapath (2x larger because of the redundant representation {C, S})
  – A conventional adder is needed to get C+S
    • Directly affects operating frequency (think of the RCA's n*FA delay)
• The result of the conventional addition needs to be reduced (final reduction)
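
A software sketch of the carry-save variant, keeping the partial product as a pair (S, C) and selecting the addend I from {0, P, B, B+P} as in the pre-computation table. This is a simplified model under the assumptions P odd and B < P, not the slide's exact datapath: it does not model the extra iteration mentioned above, and Python integers hide the register widths.

```python
def csa(x, y, z):
    """3:2 carry-save step: returns (s, c) with s + c == x + y + z."""
    return x ^ y ^ z, ((x & y) | (x & z) | (y & z)) << 1

def monpro_csa(a, b, p, n):
    """a * b * 2^(-n) mod p with the partial product held as (S, C)."""
    table = (0, p, b, b + p)               # index = 2*A_i + odd
    s, c = 0, 0
    for i in range(n):
        ai = (a >> i) & 1
        odd = (s ^ c ^ (ai & b)) & 1       # parity of S + C + A_i*B from the low bits
        s, c = csa(s, c, table[2 * ai + odd])
        # The total S + C is now even and C has a zero low bit,
        # so S is even too and both words can be shifted without loss.
        s, c = s >> 1, c >> 1
    t = s + c                              # the only carry-propagating addition
    return t - p if t >= p else t          # final reduction
```

The loop body contains no carry-propagating addition; the conventional adder and the final reduction appear once, after the last iteration.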

Comparisons of MM Algorithms

Algorithm                        | # of additions / iteration                    | # of adders                                          | Storage needed
Interleaved                      | Greater than 2                                | 1                                                    | 3x operand length
Interleaved with Pre-computation | Slightly greater than 1 (one extra iteration) | 1                                                    | 7x operand length
Interleaved with CSA             | Slightly greater than 1 (one extra iteration) | 2 (1 CSA, 1 RCA); complex datapath (redundant rep.)  | 9x operand length
Montgomery                       | Greater than 1, less than 1.5                 | 1                                                    | 3x operand length
Montgomery with Pre-computation  | Less than 1                                   | 1                                                    | 4x operand length
Montgomery with CSA              | Slightly greater than 1 (one extra iteration) | 2 (1 CSA, 1 RCA); complex datapath (redundant rep.)  | 4x operand length