CS 152 Computer Architecture and Engineering Lecture 1 ...cs152/sp16/lectures/L01-Intro.pdf · CS...

Preview:

Citation preview

CS152,Spring2016

CS152ComputerArchitectureandEngineering

Lecture1-Introduc:on

Dr.GeorgeMichelogiannakisEECS,UniversityofCaliforniaatBerkeley

CRD,LawrenceBerkeleyNaFonalLaboratory!

http://inst.eecs.berkeley.edu/~cs152!

CS152,Spring2016

Pronuncia:on

Miheloyannakis

(opFonal)

2

CS152,Spring2016

WhatisComputerArchitecture?

3

ApplicaFon

Physics

Gaptoolargetobridgeinonestep

InitsbroadestdefiniFon,computerarchitectureisthedesignoftheabstrac0onlayersthatallowustoimplementinformaFonprocessingapplicaFonsefficientlyusingavailablemanufacturingtechnologies.

(butthereareexcep0ons,e.g.magne0ccompass)

CS152,Spring2016

WhatisComputerArchitecture?

•  AsetofrulesandmethodsthatdescribethefuncFonality,organizaFonandimplementaFonofcomputersystems.

•  ComputerArchitectureisthescienceandartofselecFngandinterconnecFnghardwarecomponentstocreatecomputersthatmeetfuncFonal,performanceandcostgoals.

•  Computerarchitectureactsastheintermediatebetweenprogrammersanddevices(e.g.,VLSI).

•  Whatareyouheretolearn?

4

CS152,Spring2016 5

Abstrac:onLayersinModernSystems

Algorithm

Gates/Register-TransferLevel(RTL)

ApplicaFon

InstrucFonSetArchitecture(ISA)

OperaFngSystem/VirtualMachines

Microarchitecture

Devices

ProgrammingLanguage

Circuits

Physics

EE141CS150

CS162

CS170CS164

EE143

CS152

UCBEECSCourses

CS152,Spring2016

Costofso\waredevelopmentmakescompaFbilityamajorforceinmarket

ArchitectureCon:nuallyChanging

6

ApplicaFons

Technology

ApplicaFonssuggesthowtoimprovetechnology,providerevenuetofunddevelopment

ImprovedtechnologiesmakenewapplicaFonspossible

CS152,Spring2016

Example:x86BackwardsCompa:bility

•  Intel’s8086wasreleasedin1978with~50instrucFons•  Today,x86has~650withallextensions

–  Mostarerarelyemidedbycompilers

7

CS152,Spring2016 8

Compu:ngDevicesThen…

EDSAC,UniversityofCambridge,UK,1949

CS152,Spring2016 9

Compu:ngDevicesNow

Robots

Supercomputers Automobiles

Laptops

Set-top boxes

Smart phones

Servers Media Players

Sensor Nets

Routers

Cameras Games

CS152,Spring2016

Moore’sLaw

•  TheobservaFonthat,overthehistoryofcompuFnghardware,thenumberoftransistorsinadenseintegratedcircuit(chip)hasdoubledapproximatelyeverytwoyears.

10

CS152,Spring2016

DesignComplexity

11

CS152,Spring2016

DesignCapacity

•  In1978,Intelcoulddesignachip(8086)with29,000transistors

•  In2012,2,104million(IvyBridge)

•  Rocket(RISC-V)whichyou’llbeusinghas75+milliontransistors

•  DoeshumanitygetsmarterwithFme?

12

CS152,Spring2016

ComputerArchitectsThen

13

CS152,Spring2016

ComputerArchitectsNow

14

CS152,Spring2016

TechnologyTrends

15

CS152,Spring2016

PowerDissipa:on

16

CS152,Spring2016

PowerWallinModernProcessors

17

While at the same time chips keep getting larger.

Therefore, not all of the chip can be powered on at the same time

CS152,Spring2016 18

TheEndoftheUniprocessorEra

Singlebiggestchangeinthehistoryofcompu0ngsystems

CS152,Spring2016

WeWentFromThis

•  Cray-1

•  Singleprocessor

19

CS152,Spring2016

ToThis

•  Titan,anXK7supercomputeratOakRidgeNaFonalLaboratory(CrayXT3)(299,008AMDOpteroncores)

20

CS152,Spring2016

Result:SimpleCores

21

J. Huh, D.C. Burger, and S.W. Keckler. Exploring the Design Space of Future CMPs.

In International Conference on Parallel Architectures and Compilation Techniques (PACT), September, 2001

CS152,Spring2016

Result:SimpleCores

22

CS152,Spring2016

Result:Specializa:on

23

CS152,Spring2016

BeforeThat:DennardScaling

•  Power=AxCxFxV2

–  A:AcFvityfactor–  C:Capacitance–  F:Frequency–  V:Voltage

•  Capacitanceisrelatedtoarea–  So,asthesizeofthetransistorsshrunk,andthevoltagewasreduced,circuitscouldoperateathigherfrequenciesatthesamepower

•  Butleakagecurrentandthresholdvoltageoftransistorssetalowerboundforvoltage

•  Transistorsgetsmaller,theirpoweristhesame->Powerdensityincreases

24

CS152,Spring2016

ALITTLEHISTORYLearnfromthemistakesofothers

25

CS152,Spring2016

An:kytheraMechanism

•  FoundinaGreekshipbelievedtohavesankaround80B.C.

•  Itaccuratelypredictedlunarandsolareclipses,aswellassolar,lunarandplanetaryposiFons–  Size:8inchesacross

26

CS152,Spring2016 27

DifferenceEngine

1855.Cancomputeany6thdegreepolynomialbycalculaFngthedifferencebetween2Dmatrixelements

Speed:33to4432-digitnumbers

perminute!

Now the machine is at the Smithsonian

n

d2(n)

d1(n)

f(n)

0

41

1

2

2

2

3

2

4

2

4 6 8

43 47 53 61

CS152,Spring2016 28

HarvardMarkI

• Builtin1944inIBMEndico`laboratories– HowardAiken–ProfessorofPhysicsatHarvard– Essen:allymechanicalbuthadsomeelectro-magne:callycontrolledrelaysandgears

– Weighed5tonsandhad750,000components– Asynchronizingclockthatbeatevery0.015seconds(66Hz)–  InspiredbyCharlesBabbage’sanaly:cengine

Performance: 0.3 seconds for addition 6 seconds for multiplication 1 minute for a sine calculation Decimal arithmetic No Conditional Branch!

Broke down once a week!

CS152,Spring2016 29

ElectronicNumericalIntegratorandComputer(ENIAC)•  InspiredbyAtanasoffandBerry,EckertandMauchlydesignedand

builtENIAC(1943-45)attheUniversityofPennsylvania•  Thefirst,completelyelectronic,operaFonal,general-purpose

analyFcalcalculator!–  30tons,72squaremeters,200KW

•  Performance–  Readin120cardsperminute–  AddiFontook200µs,Division6ms–  1000FmesfasterthanMarkI

•  Notveryreliable!

Application: Ballistic calculations angle = f (location, tail wind, cross wind, air density, temperature, weight of shell, propellant charge, ... )

WW-2 Effort

CS152,Spring2016 30

Computersinmid50’s

•  Hardwarewasexpensive•  StoreinstrucFonsweresmall(1000words)

⇒Noresidentsystemso\ware!

•  MemoryaccessFmewas10to50Fmesslowerthantheprocessorcycle⇒InstrucFonexecuFonFmewastotallydominatedbythememory

reference0me.

•  TheabilitytodesigncomplexcontrolcircuitstoexecuteaninstrucFonwasthecentraldesignconcernasopposedtothespeedofdecodingoranALUoperaFon

•  Programmer’sviewofthemachinewasinseparablefromtheactualhardwareimplementaFon

•  MTBF20minuteswasstateoftheart

CS152,Spring2016 31

Compa:bilityProblematIBM

By early 60’s, IBM had 4 incompatible lines of computers!

701 → 7094 650 → 7074 702 → 7080 1401 → 7010

Each system had its own

•  Instruction set •  I/O system and Secondary Storage: magnetic tapes, drums and disks •  assemblers, compilers, libraries,... •  market niche

business, scientific, real time, ...

⇒ IBM 360

CS152,Spring2016 32

IBM360:DesignPremisesAmdahl,BlaauwandBrooks,1964

• Thedesignmustlenditselftogrowthandsuccessormachines• GeneralmethodforconnecFngI/Odevices• Totalperformance-answerspermonthratherthanbitspermicrosecond⇒ programmingaids

• MachinemustbecapableofsupervisingitselfwithoutmanualintervenFon

• Built-inhardwarefaultcheckingandlocaFngaidstoreducedownFme

• SimpletoassemblesystemswithredundantI/Odevices,memoriesetc.forfaulttolerance

• SomeproblemsrequiredfloaFng-pointlargerthan36bits

CS152,Spring2016 33

IBM360:AGeneral-PurposeRegister(GPR)Machine• ProcessorState

–  16General-Purpose32-bitRegisters» maybeusedasindexandbaseregister

» Register0hassomespecialproper0es–  4FloaFngPoint64-bitRegisters–  AProgramStatusWord(PSW)

» PC,Condi0oncodes,Controlflags•  A32-bitmachinewith24-bitaddresses

–  ButnoinstrucFoncontainsa24-bitaddress!•  DataFormats

–  8-bitbytes,16-bithalf-words,32-bitwords,64-bitdouble-words

The IBM 360 is why bytes are 8-bits long today!

CS152,Spring2016 34

IBM360:Ini:alImplementa:ons

Model30 ... Model70Storage 8K-64KB 256K-512KBDatapath 8-bit 64-bitCircuitDelay 30nsec/level 5nsec/levelLocalStore MainStore TransistorRegistersControlStore Readonly1µsec ConvenFonalcircuits

IBM360instruc0onsetarchitecture(ISA)completelyhidtheunderlyingtechnologicaldifferencesbetweenvariousmodels.Milestone:ThefirsttrueISAdesignedasportablehardware-soRwareinterface!

Withminormodifica0onsits0llsurvivestoday!

CS152,Spring2016 35

IBM360:47yearslater…ThezSeriesz11Microprocessor

•  5.2GHzinIBM45nmPD-SOICMOStechnology•  1.4billiontransistorsin512mm2•  64-bitvirtualaddressing

–  originalS/360was24-bit,andS/370was31-bitextension

• Quad-coredesign•  Three-issueout-of-ordersuperscalarpipeline• Out-of-ordermemoryaccesses•  Redundantdatapaths

–  everyinstrucFonperformedintwoparalleldatapathsandresultscompared

•  64KBL1I-cache,128KBL1D-cacheon-chip•  1.5MBprivateL2unifiedcachepercore,on-chip• On-Chip24MBeDRAML3cache•  Scalesto96-coremulFprocessorwith768MBofsharedL4eDRAM[ IBM, HotChips, 2010]

CS152,Spring2016

StorageDevicesAlsoProgressed

36

CS152,Spring2016

Magne:cStorageDevices

37

7.25 MB

CS152,Spring2016

LOGISTICS

38

CS152,Spring2016 39

RelatedCourses

CS61C CS152

CS150

Basiccomputerorganiza:on,firstlookatpipelines+caches

ComputerArchitecture,Firstlookatparallelarchitectures

DigitalLogicDesign,FPGAs

Strong

Prerequisite

CS250

VLSI Systems Design

CS252

GraduateComputerArchitecture,Advanced

Topics

CS152,Spring2016 40

CS61CvsCS152vsCS252

•  CS152focusesoninteracFonofso\wareandhardware–  morearchitectureandlessdigitalengineering–  moreusefulforOSdevelopers,compilerwriters,performanceprogrammers

•  Muchofthematerialyou’lllearnthistermwaspreviouslyinCS252–  SomeofthecurrentCS61CwasinCS252over20yearsago!–  Maybeevery10years,shi\CS252->CS152->CS61C?

•  CS152beginswhereCS61Cle\off(withoverlap)

•  CS252delvesintomoredetailandhasaresearchproject

CS152,Spring2016 41

CS152Execu:veSummary

TheprocessoryoubuiltinCS61C

Plus,thetechnologybehindchip-scalemulFprocessors(CMPs)andgraphicsprocessingunits(GPUs)

Whatyou’llunderstandandexperimentwithinCS152

CS152,Spring2016 42

CS152StructureandSyllabusFivemodules

1.  Simplemachinedesign(ISAs,microprogramming,unpipelinedmachines,IronLaw,simplepipelines)

2. Memoryhierarchy(DRAM,caches,opFmizaFons)plusvirtualmemorysystems,excepFons,interrupts

3.  Complexpipelining(score-boarding,out-of-orderissue)4.  Explicitlyparallelprocessors(vectormachines,VLIW

machines,mulFthreadedmachines)5. MulFprocessorarchitectures(memorymodels,cache

coherence,synchronizaFon)

CS152,Spring2016 43

CS152AdministriviaInstructor:GeorgeMichelogiannakis,mihelog@eecs!

OfficeHours:A\erlectures,Wednesdays11-12:30pm341ASodaT.A.: ColinSchmidt,colins@eecs

OfficeHours:Tuesday2-4pm651SodaLectures: M/W,9-10:30AM,306Soda

SecFon: Th2PM-4PM,9105LaFmerText: ComputerArchitecture:AQuan0ta0veApproach,

HennesseyandPaWerson,5thEdi0on(2012)ReadingsassignedfromthisediFon,somereadingsavailableinolder

ediFons–seewebpage.

Webpage:http://inst.eecs.berkeley.edu/~cs152!Lecturesavailableonline

Piazzza: http://piazza.com/berkeley/spring2016/cs152

CS152,Spring2016 44

CS152CourseComponents

•  15%Problemsets(onepermodule)–  Intendedtohelpyoulearnthematerial.Feelfreetodiscusswithotherstudentsandinstructors,butmustturninyourownsoluFons.Gradingbasedmostlyoneffort,butquizzesassumethatyouhaveworkedthroughallproblems.SoluFonsreleaseda\erPSshandedin

•  45%Quizzes(onepermodule)–  In-class,closed-book,nocalculators,nosmartphones,nolaptops,...–  Basedonlectures,readings,problemsets,andlabs

•  40%Labs(onepermodule)–  Labsuseadvancedprocessorandsystemsimulators–  Directedplusopen-endedsecFonstoeachlab

•  SecFonswillrevieweachoftheabove•  Checkthewebsitefordeadlines!•  SignupforPiazza!

CS152,Spring2016 45

CS152Labs•  Eachlabhasdirectedplusopen-endedassignments•  DirectedporFon(2/7)isintendedtoensurestudentslearnmainconceptsbehindlab–  Eachstudentmustperformownlabandhandintheirownlabreport

•  Open-endedassignment(5/7)istoallowyoutoshowyourcreaFvity–  Roughlya“mini-project”

»  E.g.,tryanarchitecturalideaandmeasurepotenFal,ortrytoimproveadesign.NegaFveresultsOK(ifexplainable!)

–  Studentscanworkindividuallyoringroupsoftwo–  Groupopen-endedlabreportsmustbehandedinseparately(butstatewhoyouworkedwith)

–  Studentscanworkindifferentgroupsfordifferentassignments

•  LabreportsmustbereadableEnglishsummaries•  Twofreetwo-dayextensionsperstudent•  YoumayhavetolearnscripFnglanguages

CS152,Spring2016

RISC-VISA

•  RISC-Visanewsimple,clean,extensibleISAwedevelopedatBerkeleyforeducaFonandresearch–  RISC-I/II,firstBerkeleyRISCimplementaFons–  BerkeleyresearchmachinesSOAR/SPURconsideredRISC-III/IV

•  BothofthedominantISAs(x86andARM)aretoocomplextouseforteaching

•  RISC-VISAmanualavailableonwebpage–  See“resources”onclasswebsite

•  FullGCC-basedtoolchainavailable

46

CS152,Spring2016

Chiselsimulators

•  ChiselisanewhardwaredescripFonlanguagewedevelopedatBerkeleybasedonScala–  ConstrucFngHardwareinaScalaEmbeddedLanguage

•  LabswilluseRISC-VprocessorsimulatorsderivedfromChiselprocessordesigns–  GivesyoumuchmoredetailedinformaFonthanothersimulators–  CanmaptoFPGAorrealchiplayout

•  YouneedtolearnsomeminimalChiselinCS152,butwe’llmakeChiselRTLsourceavailablesoyoucanseeallthedetailsofourprocessors

•  CandolabprojectsbasedonmodifyingtheChiselRTLcodeifdesired

47

CS152,Spring2016

ChiselDesignFlow

48

ChiselDesignDescripFon

C++code

FPGAVerilog

ASICVerilog

C++Simulator

C++Compiler

ChiselCompiler

FPGAEmulaFon

FPGATools

GDSLayout

ASICTools

CS152,Spring2016

FAMILIARITYQUIZ

49

CS152,Spring2016

PipelinedProcessor

50

CS152,Spring2016

VirtualAddresses

51

CS152,Spring2016

Caches

52

CS152,Spring2016

BirdsCache(hoard)too!

•  Sameidea.Bringvaluableobjectsclose•  AcornWoodpeckersstoretheirfoodinholesdrilledintrees

53

CS152,Spring2016 54

InConclusion

•  ComputerArchitecture>>ISAsandRTL•  CS152isaboutinteracFonofhardwareandso\ware,anddesignofappropriateabstracFonlayers

•  ComputerarchitectureisshapedbytechnologyandapplicaFons–  Historyprovideslessonsforthefuture

•  ComputerScienceatthecrossroadsfromsequenFaltoparallelcompuFng–  SalvaFonrequiresinnovaFoninmanyfields,includingcomputerarchitecture

•  ReadChapter1&AppendixAfornextFme!

CS152,Spring2016 55

Acknowledgements

•  Theseslidescontainmaterialdevelopedandcopyrightby:–  Arvind(MIT)–  KrsteAsanovic(MIT/UCB)–  JoelEmer(Intel/MIT)–  JamesHoe(CMU)–  JohnKubiatowicz(UCB)–  DavidPaderson(UCB)–  Variouswebsitesandpapers

•  MITmaterialderivedfromcourse6.823•  UCBmaterialderivedfromcourseCS252

Recommended