1
2
In this last lecture of the course, I will be looking at how adder circuits work. In particular, I will be examining the implementation of adders on the Cyclone V FPGAs. Furthermore, since the Cyclone V also contains a large number of multipliers inside DSP blocks, I will briefly explain how these can be instantiated.
3
The building block for an n-bit adder is a one-bit full adder. This is essentially a component that adds THREE one-bit values together: P, Q and CI. It produces two outputs: Sum S, and Carry C.
Note that all three inputs are symmetrical: they can be swapped without any change in the outputs.
Further, if you invert all inputs, the outputs are also all inverted. This is known as “self-dual”.
The Boolean equations for C and S are shown here. This is something that has been covered since the first year.
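The slide's equations are not reproduced in these notes, so here is a minimal Python sketch using the standard full-adder equations (S is the XOR of the three inputs, C is their majority). The function name `full_adder` is my own choice for illustration. The loop checks the symmetry and self-dual properties described above for all eight input combinations.

```python
from itertools import permutations, product

def full_adder(p, q, ci):
    """One-bit full adder: returns (sum, carry) for three one-bit inputs."""
    s = p ^ q ^ ci                       # sum: XOR of all three inputs
    c = (p & q) | (p & ci) | (q & ci)    # carry: majority of the three inputs
    return s, c

for p, q, ci in product((0, 1), repeat=3):
    s, c = full_adder(p, q, ci)
    # Together, C and S encode how many of the inputs are 1.
    assert 2 * c + s == p + q + ci
    # Symmetry: reordering the inputs never changes the outputs.
    assert all(full_adder(*perm) == (s, c) for perm in permutations((p, q, ci)))
    # Self-dual: inverting every input inverts both outputs.
    assert full_adder(1 - p, 1 - q, 1 - ci) == (1 - s, 1 - c)
```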
4
To build a 4-bit adder from 1-bit full adders, we can connect four of these serially as shown here. It is important to appreciate that this 4-bit adder works for both signed and unsigned inputs, provided that signed numbers use 2’s complement representation. In other words, if you interpret the inputs as unsigned 4-bit numbers, then the adder produces an unsigned 5-bit output (including the carry-out signal).
If you interpret the inputs as 2’s complement signed numbers, then the output is correct as a 4-bit SIGNED output, as long as the true sum fits in 4 bits. (In this case, you cannot use C3 as the 5th bit.)
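As a rough illustration of the point above, the following Python sketch chains four full adders and then reads the same bit patterns both ways. The helper names (`ripple_add4`, `as_signed4`) are hypothetical, and the operand values are simply ones I picked for the example.

```python
def full_adder(p, q, ci):
    """One-bit full adder: returns (sum, carry)."""
    return p ^ q ^ ci, (p & q) | (p & ci) | (q & ci)

def ripple_add4(p, q, ci=0):
    """Add two 4-bit patterns bit by bit; returns (4-bit sum, carry-out C3)."""
    s, c = 0, ci
    for i in range(4):                       # stage 0 is the LSB
        bit, c = full_adder((p >> i) & 1, (q >> i) & 1, c)
        s |= bit << i
    return s, c

def as_signed4(x):
    """Interpret a 4-bit pattern as a 2's complement value."""
    return x - 16 if x & 0b1000 else x

s, c3 = ripple_add4(0b1101, 0b0101)

# Unsigned reading: 13 + 5 = 18, with C3 acting as the fifth bit.
unsigned_sum = (c3 << 4) | s

# Signed reading of the same patterns: -3 + 5 = 2. C3 is ignored here.
signed_sum = as_signed4(s)

print(unsigned_sum, signed_sum)
```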
5
If you add two 4-bit numbers together and the inputs are unsigned, then the output will be in the range of 0 to 30. However, if you want to use the adder for signed number addition, the input range is -8 to +7, therefore the output range is -16 to +14. In both cases, we would need a 5-bit adder to avoid any overflow.
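The ranges quoted above can be checked with a few lines of Python. This is only a sketch; the helper names are made up for the example.

```python
def unsigned_range(bits):
    """Smallest and largest value of an unsigned number of this width."""
    return 0, (1 << bits) - 1

def signed_range(bits):
    """Smallest and largest value of a 2's complement number of this width."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1

def sum_range(rng):
    """Range of the sum of two numbers that each lie in rng."""
    lo, hi = rng
    return lo + lo, hi + hi

def fits(value_range, container_range):
    """True if every value in value_range lies inside container_range."""
    return container_range[0] <= value_range[0] and value_range[1] <= container_range[1]

u_sum = sum_range(unsigned_range(4))   # (0, 30)
s_sum = sum_range(signed_range(4))     # (-16, 14)

# Neither sum range fits in 4 bits, but both fit in 5 bits.
print(fits(u_sum, unsigned_range(4)), fits(u_sum, unsigned_range(5)))
print(fits(s_sum, signed_range(4)), fits(s_sum, signed_range(5)))
```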
For the unsigned case, you could use the carry-out as the 5th output bit. For now, let us use only the sum output, and we need a 5-bit adder.
For signed addition, we cannot use the carry-out as the 5th bit. We MUST use a 5-bit adder. So, we need to expand the input numbers from 4-bit to 5-bit. What do we do with the MSB?
6
For unsigned numbers, expanding a 4-bit number to an 8-bit number simply requires adding 4 extra ‘0’s to the MSB part of the number.
For signed numbers, we need to extend the number to the left with four copies of the sign bit in order to preserve the sign of the number and maintain the correct value. This is known as “sign extension”.
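A small Python sketch of the two kinds of extension, working on integer bit patterns. The function names and the example pattern `1011` are my own choices.

```python
def zero_extend(x, from_bits, to_bits):
    """Widen an unsigned pattern: the new upper bits are all 0."""
    assert to_bits >= from_bits
    return x & ((1 << from_bits) - 1)

def sign_extend(x, from_bits, to_bits):
    """Widen a 2's complement pattern by copying the sign bit upwards."""
    assert to_bits >= from_bits
    x &= (1 << from_bits) - 1
    if (x >> (from_bits - 1)) & 1:
        # Set every bit from position from_bits up to to_bits - 1.
        x |= ((1 << to_bits) - 1) & ~((1 << from_bits) - 1)
    return x

def to_signed(x, bits):
    """Interpret a bit pattern of the given width as 2's complement."""
    return x - (1 << bits) if x & (1 << (bits - 1)) else x

pattern = 0b1011   # 11 if unsigned, -5 if signed

print(format(zero_extend(pattern, 4, 8), "08b"))   # 00001011
print(format(sign_extend(pattern, 4, 8), "08b"))   # 11111011
```

In both cases the value is preserved: the zero-extended pattern is still 11, and the sign-extended pattern is still -5 when read as an 8-bit signed number.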
7
To shrink a binary number to a smaller number of bits, it is easy for unsigned numbers: simply delete all leading ‘0’s from the MSB. The value of the unsigned number will not change.
For signed numbers, if the MSB is ‘0’, you can delete all ‘0’s from the MSB down except the last ‘0’. If the MSB is ‘1’, you can delete all ‘1’s from the MSB down except the last ‘1’. In that way, you preserve the correct sign of the number.
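The rule for signed numbers amounts to dropping the MSB for as long as it merely repeats the bit below it. A minimal Python sketch, with a hypothetical helper name:

```python
def shrink_signed(pattern, bits):
    """Drop redundant sign bits; returns (shorter pattern, new width)."""
    while bits > 1:
        msb = (pattern >> (bits - 1)) & 1
        below = (pattern >> (bits - 2)) & 1
        if msb != below:
            break                      # this MSB is the last sign bit: keep it
        bits -= 1                      # MSB duplicates the bit below: delete it
        pattern &= (1 << bits) - 1
    return pattern, bits

def to_signed(x, bits):
    """Interpret a bit pattern of the given width as 2's complement."""
    return x - (1 << bits) if x & (1 << (bits - 1)) else x

pos = shrink_signed(0b00000101, 8)   # +5 shrinks to 0101
neg = shrink_signed(0b11111011, 8)   # -5 shrinks to 1011
print(pos, neg)
```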
You can also shrink the number of bits by removing unwanted LSBs through truncation. Alternatively you can perform rounding.
Truncation is easy: to reduce an 8-bit number to 4-bit, just remove the least significant 4 bits.
Rounding is harder, and there are various methods of rounding that can be used. The simplest method in the case of 8-bit rounded to 4-bit is to add 8’b00001000 to the number, then truncate the bottom 4 bits. Basically, you add half of the LSB of the final number to the original number before chopping off the lower bits. This is the same as how we round with decimal numbers.
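The two methods can be compared with a short Python sketch. It assumes unsigned 8-bit inputs, and the function names are illustrative only.

```python
def truncate_8_to_4(x):
    """Keep the top 4 bits of an unsigned 8-bit number."""
    return (x & 0xFF) >> 4

def round_8_to_4(x):
    """Add half of the final LSB (8'b00001000), then drop the bottom 4 bits.

    Note: for inputs of 0xF8 and above the addition carries out of 8 bits,
    so the result is 16 and no longer fits in 4 bits.
    """
    return ((x & 0xFF) + 0b00001000) >> 4

for x in (0b10100111, 0b10101000):
    print(format(x, "08b"), truncate_8_to_4(x), round_8_to_4(x))
```

The first input (167, i.e. 10.4375 in units of the final LSB) gives 10 with either method. The second (168, i.e. 10.5) truncates to 10 but rounds up to 11.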
8
In order to avoid overflow when adding two 4-bit numbers together, we need to use a 5-bit adder. For an unsigned add, we zero-extend the inputs to 5-bit and then use a 5-bit adder to produce S4:0 as shown here.
Of course, we could have used the 4-bit adder and used the carry-out C3 as S4.
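To see that the two approaches agree, here is a sketch that models both in Python and compares them over every pair of 4-bit inputs. The adders are modelled with integer arithmetic rather than gates, and the names are hypothetical.

```python
def add4_with_carry(p, q):
    """4-bit adder: returns (S3:0, carry-out C3)."""
    total = (p & 0xF) + (q & 0xF)
    return total & 0xF, total >> 4

def add5_zero_extended(p, q):
    """Zero-extend both inputs to 5 bits, then use a 5-bit adder (S4:0)."""
    p5 = p & 0x0F          # bit 4 is 0 after zero extension
    q5 = q & 0x0F
    return (p5 + q5) & 0x1F

def carry_as_s4(p, q):
    """Build the 5-bit result from the 4-bit adder, using C3 as S4."""
    s, c3 = add4_with_carry(p, q)
    return (c3 << 4) | s

same = all(add5_zero_extended(p, q) == carry_as_s4(p, q)
           for p in range(16) for q in range(16))
print(same)
```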
9
To build a 4-bit signed adder WITHOUT overflow, we need to first extend the inputs to 5 bits with sign extension as shown here. Then use a 5-bit adder circuit to produce a 5-bit signed result.
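A minimal Python sketch of this arrangement, again modelling the adder with integer arithmetic. The function names are my own. The final check confirms that the 5-bit result is the correct signed sum for every pair of 4-bit inputs.

```python
def sign_extend_4_to_5(x):
    """Copy bit 3 (the sign bit) into bit 4."""
    x &= 0xF
    return (x | 0x10) if (x & 0x8) else x

def to_signed(x, bits):
    """Interpret a bit pattern of the given width as 2's complement."""
    return x - (1 << bits) if x & (1 << (bits - 1)) else x

def signed_add_4bit(p, q):
    """Sign-extend both inputs, add with a 5-bit adder, discard its carry-out."""
    return (sign_extend_4_to_5(p) + sign_extend_4_to_5(q)) & 0x1F

# Extreme cases: -8 + -8 = -16 and 7 + 7 = 14 both fit in 5 bits.
print(to_signed(signed_add_4bit(0b1000, 0b1000), 5))
print(to_signed(signed_add_4bit(0b0111, 0b0111), 5))

all_correct = all(
    to_signed(signed_add_4bit(p, q), 5) == to_signed(p, 4) + to_signed(q, 4)
    for p in range(16) for q in range(16)
)
```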
10
Let us consider the propagation delay through a 4-bit adder. The worst-case path is from the P0 or Q0 input, then through the carry chain to the MSB sum output.
Assuming the gate delay to C is 2 and to S is 3, then the worst-case delay is 9 (three carry stages at 2 each, plus 3 for the final sum).
This is called a ripple carry adder because the carry signal has to propagate all the way from the LSB stage to the MSB stage.
An example scenario for the worst-case propagation delay is shown here. If initially P=4’b0000 and Q=4’b1111, then S=5’b01111.
Now if P changes to 4’b0001, the carry generated in the LSB stage is propagated all the way to S4, giving S=5’b10000. The worst-case path is exercised.
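The delay figure can be reproduced with a simple timing model in Python. It assumes that all inputs change at time 0 and that every stage has a delay of 2 from any input to C and 3 from any input to S, as stated above. The function name is hypothetical.

```python
def ripple_delays(n_bits, t_carry=2, t_sum=3):
    """Arrival times in a ripple carry adder when all inputs change at t = 0.

    Returns (list of sum arrival times from LSB to MSB, carry-out arrival time).
    """
    carry_arrival = 0          # carry-in of stage 0 is valid at t = 0
    sum_times = []
    for _ in range(n_bits):
        sum_times.append(carry_arrival + t_sum)   # S waits for the incoming carry
        carry_arrival += t_carry                  # so does C of this stage
    return sum_times, carry_arrival

sum_times, carry_out_time = ripple_delays(4)
print(sum_times, carry_out_time)   # [3, 5, 7, 9] 8
print(max(sum_times))              # 9
```

Under this model the worst-case delay grows linearly with the number of bits, which is the main weakness of the ripple carry structure.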
11
12
13
We have already discussed the internal structure of the FPGA in Lecture 2. Here is a reminder. The Altera Cyclone V FPGA has a more advanced programmable logic element than the simple 4-input LUT that we have considered up to now. They call this an Adaptive Logic Module or ALM.
An ALM can take up to 8 Boolean input signals and produce four outputs with or without a register. Additionally, each ALM can also perform the function of a 2-bit binary full adder. This is what interests us most for this lecture.
As a user of the Cyclone V FPGA, you don’t actually need to worry too much about exactly how the ALM is configured to implement your design. The CAD software will take care of the mapping between your design and the physical implementation using the ALMs. It is however useful to know that as the technology evolves, more and more complicated programmable logic elements are being developed by the manufacturers in order to improve the area utilization of the FPGAs.
The Cyclone V on the DE1-SOC board has 32,000 ALMs, which could be estimated to be equivalent to 85K+ of the old-style LEs. Putting this in context, you could put onto this one chip 2,000 32-bit binary adder circuits!
13
14
15
Given that the FPGA has a special adder mode, you should never specify your adder as individual full adder circuits connected together. The synthesis system WILL NOT be able to exploit the dedicated adder mode configuration shown in the previous slide. Instead use the ‘+’ operator in Verilog as shown here. It is simpler and will produce a very fast adder.
16
How fast are adders within a typical FPGA? Here is an n-bit adder circuit sandwiched between registers. The plot is based on a Cyclone III FPGA. (I don’t have the data for Cyclone V.)
We can use the timing analyzer to estimate how fast we can clock this circuit without error as the number of bits n is increased from 1 to 64. The equation of the red fitted line shows that each adder bit adds around 57 ps of delay. In addition, there is a fixed 1.8 ns delay made up of the register clock-to-Q delay plus the register setup time. I expect the timing for the Cyclone V devices to be faster.
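The fitted equation itself is not reproduced in these notes, so the sketch below simply assumes a straight line built from the two figures quoted above (1.8 ns fixed, 57 ps per bit). Treat the numbers it prints as rough estimates for the Cyclone III data, not as measured values.

```python
def min_clock_period_ns(n_bits, fixed_ns=1.8, per_bit_ns=0.057):
    """Assumed linear model: fixed register overhead plus a per-bit adder delay."""
    return fixed_ns + per_bit_ns * n_bits

def fmax_mhz(n_bits):
    """Highest clock frequency the model allows for an n-bit registered adder."""
    return 1000.0 / min_clock_period_ns(n_bits)

for n in (8, 16, 32, 64):
    print(n, round(min_clock_period_ns(n), 3), round(fmax_mhz(n), 1))
```

With these assumptions a 32-bit adder needs a clock period of about 3.6 ns, and a 64-bit adder about 5.4 ns.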
17
Older FPGAs (e.g. Cyclone III) have embedded multipliers to make the implementation of digital signal processing algorithms more efficient on the FPGA. The Cyclone V has DSP support far superior to just simple configurable multipliers. The DSP blocks on the Cyclone V can be used for a combination of different functions. They can do multiplication at different precisions and can also perform the multiply-accumulate function. An accumulator is an adder whose output is used as one of the two inputs of the adder on the next clock cycle. Therefore an accumulator usually only has one input, whose value gets “accumulated” cycle by cycle. The DSP block also has internal storage to store a constant value. Typically this is a filter coefficient for implementing a finite-impulse response (FIR) filter. Details of how to use a DSP block to implement one tap (or one stage) of a FIR filter are beyond the scope of this course. Those interested can read the Cyclone V Devices Handbook, Vol. 1, p. 3-17.
18
For this lecture, we concentrate on the configurable multipliers in the DSP block. Each DSP block can be configured as three 9x9 multipliers. This is particularly useful for real-time video processing since each pixel value is often represented as an 8-bit unsigned value (or three 8-bit unsigned numbers if we are using colour). Alternatively, each block can provide TWO 18x18-bit multipliers or one 27x27-bit multiplier. Of course if you need a lower number of bits (say 14x14), you can always either zero-extend or sign-extend the operands to fit and use the 18x18 multiplier. Each multiplier can also be configured to operate with the adder or the accumulator at the output. It is important to understand that when you multiply an N-bit and an M-bit unsigned number together, you get a product which is N+M bits. The same also applies if you multiply one signed and one unsigned number together. You can try this yourself for two four-bit numbers! However, if you multiply two signed 2’s complement numbers together, you get a product that needs only N+M-1 bits in all but one case. In an N+M-bit result the top two bits are then the same and they both provide the sign of the product value (i.e. you get two identical sign bits in the product). The single exception is when both operands are the most negative value, e.g. -8 x -8 = +64 for 4-bit inputs, which needs the full N+M bits. Again try this yourself with two 4-bit signed numbers.
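The "try this yourself" exercise for 4-bit operands can be done exhaustively in Python. This is a sketch with made-up helper names. It reports how many bits each kind of product needs.

```python
def bits_needed_unsigned(v):
    """Smallest width that holds a non-negative value as an unsigned number."""
    return max(1, v.bit_length())

def bits_needed_signed(v):
    """Smallest width that holds v in 2's complement."""
    bits = 1
    while not (-(1 << (bits - 1)) <= v < (1 << (bits - 1))):
        bits += 1
    return bits

unsigned4 = range(0, 16)
signed4 = range(-8, 8)

# Unsigned x unsigned: worst case is 15 x 15 = 225.
uu = max(bits_needed_unsigned(a * b) for a in unsigned4 for b in unsigned4)

# Signed x unsigned: worst case is -8 x 15 = -120.
su = max(bits_needed_signed(a * b) for a in signed4 for b in unsigned4)

# Signed x signed, leaving out the single case -8 x -8.
ss = max(bits_needed_signed(a * b)
         for a in signed4 for b in signed4 if (a, b) != (-8, -8))

# The exception: -8 x -8 = +64.
ss_exception = bits_needed_signed(-8 * -8)

print(uu, su, ss, ss_exception)   # 8 8 7 8
```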
19
If you configure a DSP block to be THREE 9x9 multipliers, the three 9-bit operands on each of the two inputs ax and ay are bit-cascaded to form a 27-bit input value to the DSP block as shown above. Similarly, the three products, each an 18-bit RESULT value, are provided together as a 54-bit value.
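The packing can be sketched in Python as below. The slide is not reproduced here, so the field order is an assumption on my part: I place operand 0 in the lowest 9 bits and product 0 in the lowest 18 bits. I also assume unsigned operands. All function names are hypothetical, and this is a behavioural model only, not the actual DSP block interface.

```python
MASK9 = (1 << 9) - 1
MASK18 = (1 << 18) - 1

def pack_3x9(v0, v1, v2):
    """Cascade three 9-bit operands into one 27-bit value (v0 in the low bits)."""
    return ((v2 & MASK9) << 18) | ((v1 & MASK9) << 9) | (v0 & MASK9)

def unpack_3x9(word27):
    """Split a 27-bit value back into three 9-bit operands."""
    return [(word27 >> (9 * i)) & MASK9 for i in range(3)]

def three_9x9(ax, ay):
    """Model of three independent 9x9 multiplies; returns one 54-bit result."""
    result = 0
    for i, (a, b) in enumerate(zip(unpack_3x9(ax), unpack_3x9(ay))):
        result |= ((a * b) & MASK18) << (18 * i)   # each product fits in 18 bits
    return result

def unpack_3x18(word54):
    """Split the 54-bit result into three 18-bit products."""
    return [(word54 >> (18 * i)) & MASK18 for i in range(3)]

ax = pack_3x9(3, 100, 511)
ay = pack_3x9(5, 200, 511)
products = unpack_3x18(three_9x9(ax, ay))
print(products)   # [15, 20000, 261121]
```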
20
To instantiate multipliers in a Cyclone V, we can use the IP Catalog tool in Quartus. The multiplier is shown under the Library > Basic Functions > Arithmetic category. You should choose LPM_MULT. A dialogue form will pop up. You can then create the type of multiplier you need for your design. Shown above is a 10-bit x 14-bit multiplier used in ex15 in Part 3 of VERI. The 10-bit input is the frequency value specified by the switches or the ADC, and the 14-bit input is the multiplying constant 14’d10000.
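As a quick sanity check on the widths involved, here is a behavioural Python sketch of such a multiplier. It assumes both operands are unsigned, so the product is 10 + 14 = 24 bits wide. The function and argument names are mine and are not the ports of the generated LPM_MULT module.

```python
def mult_10x14(freq, constant=10000):
    """Unsigned 10-bit x 14-bit multiply; the product fits in 24 bits."""
    assert 0 <= freq < (1 << 10), "first operand must fit in 10 bits"
    assert 0 <= constant < (1 << 14), "second operand must fit in 14 bits"
    return (freq * constant) & ((1 << 24) - 1)

# Largest possible input: 1023 x 10000 = 10,230,000, which is below 2**24.
largest = mult_10x14(1023)
print(largest, largest.bit_length())   # 10230000 24
```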