Computer Architecture ELE 475 / COS 475
Slide Deck 1: Introduction and Instruction Set Architectures
David Wentzlaff
Department of Electrical Engineering
Princeton University



Page 2: Computer Architecture ELE 475 / COS 475 Slide Deck 1

What is Computer Architecture?

In its broadest definition, computer architecture is the design of the abstraction/implementation layers that allow us to execute information processing applications efficiently using manufacturing technologies.

Application
(Gap too large to bridge in one step)
Physics

Page 7: Computer Architecture ELE 475 / COS 475 Slide Deck 1

Abstractions in Modern Computing Systems

Layers, listed from Physics up to Application:

Physics
Devices
Circuits
Gates
Register-Transfer Level
Microarchitecture
Instruction Set Architecture
Operating System/Virtual Machines
Programming Language
Algorithm
Application

Slide annotation: Computer Architecture (ELE 475)

Page 9: Computer Architecture ELE 475 / COS 475 Slide Deck 1

Computer Architecture is Constantly Changing

Layers: Physics, Devices, Circuits, Gates, Register-Transfer Level, Microarchitecture, Instruction Set Architecture, Operating System/Virtual Machines, Programming Language, Algorithm, Application

Application Requirements:
•  Suggest how to improve architecture
•  Provide revenue to fund development

Technology Constraints:
•  Restrict what can be done efficiently
•  New technologies make new arch possible

Architecture provides feedback to guide application and technology research directions

Page 11: Computer Architecture ELE 475 / COS 475 Slide Deck 1

Computers Then…

IAS Machine. Design directed by John von Neumann. First booted in Princeton NJ in 1952.
Smithsonian Institution Archives (Smithsonian Image 95-06151)

Page 12: Computer Architecture ELE 475 / COS 475 Slide Deck 1

Computers Now

•  Robots
•  Supercomputers
•  Automobiles
•  Laptops
•  Set-top boxes
•  Smartphones
•  Servers
•  Media Players
•  Sensor Nets
•  Routers
•  Cameras
•  Games
•  Audio Assistants

Page 13: Computer Architecture ELE 475 / COS 475 Slide Deck 1

Major Technology Generations [from Kurzweil]

[Figure: technology generations, labeled Electromechanical, Relays, Vacuum Tubes, Bipolar, pMOS, nMOS, CMOS]

Page 14: Computer Architecture ELE 475 / COS 475 Slide Deck 1

Sequential Processor Performance

[Figure: sequential processor performance over time, annotated "RISC" and "Move to multi-processor"]

From Hennessy and Patterson Ed. 5. Image Copyright © 2011, Elsevier Inc. All rights Reserved.

Page 17: Computer Architecture ELE 475 / COS 475 Slide Deck 1

Course Administration

Instructor:      Prof. David Wentzlaff ([email protected])
                 Office: EQuad B228
                 Office Hours: Mon. 1:30-2:30pm EQuad B228
TA:              Ang Li ([email protected])
                 Office Hours: TBD EQuad F210
Lectures:        Monday & Wednesday 11am-12:20pm EQuad B205
Text:            Computer Architecture: A Quantitative Approach
                 Hennessy and Patterson, 5th Edition (2012)
                 Modern Processor Design: Fundamentals of Superscalar Processors (2004)
                 John P. Shen and Mikko H. Lipasti
Prerequisite:    ELE 375 & ELE 206
Course Webpage:  https://eleclass.princeton.edu/classes/ele475/spring_2018/

Page 18: Computer Architecture ELE 475 / COS 475 Slide Deck 1

Course Structure

•  Midterm (20%)
•  Final Exam (30%)
•  Labs (25%)
   –  4 Design labs (Verilog)
•  Design Project (20%)
   –  Using OpenPiton
   –  In small groups
•  Class Participation (5%)
•  Ungraded Problem Sets (5 PS's) (0%)
   –  Very useful for exam preparation

Page 19: Computer Architecture ELE 475 / COS 475 Slide Deck 1

Course Content: Computer Organization (COS/ELE 375)

Computer Organization
•  Basic Pipelined Processor

~50,000 Transistors

Photo of Berkeley RISC I, © University of California (Berkeley)

Page 20: Computer Architecture ELE 475 / COS 475 Slide Deck 1

Course Content: Computer Architecture (ELE 475)

Intel Nehalem Processor, Original Core i7, Image Credit Intel: http://download.intel.com/pressroom/kits/corei7/images/Nehalem_Die_Shot_3.jpg

~700,000,000 Transistors

Die photo annotation: Computer Organization (ELE 375) Processor

•  Instruction Level Parallelism
   –  Superscalar
   –  Very Long Instruction Word (VLIW)
•  Long Pipelines (Pipeline Parallelism)
•  Advanced Memory and Caches
•  Data Level Parallelism
   –  Vector
   –  GPU
•  Thread Level Parallelism
   –  Multithreading
   –  Multiprocessor
   –  Multicore
   –  Manycore

Page 24: Computer Architecture ELE 475 / COS 475 Slide Deck 1

Architecture vs. Microarchitecture

“Architecture” / Instruction Set Architecture:
•  Programmer visible state (Memory & Register)
•  Operations (Instructions and how they work)
•  Execution Semantics (interrupts)
•  Input/Output
•  Data Types/Sizes

Microarchitecture / Organization:
•  Tradeoffs on how to implement ISA for some metric (Speed, Energy, Cost)
•  Examples: Pipeline depth, number of pipelines, cache size, silicon area, peak power, execution ordering, bus widths, ALU widths

Page 25: Computer Architecture ELE 475 / COS 475 Slide Deck 1

Software Developments

up to 1955    Libraries of numerical routines
              - Floating point operations
              - Transcendental functions
              - Matrix manipulation, equation solvers, ...

1955-60       High level Languages - Fortran 1956
              Operating Systems -
              - Assemblers, Loaders, Linkers, Compilers
              - Accounting programs to keep track of usage and charges

Machines required experienced operators
•  Most users could not be expected to understand these programs, much less write them
•  Machines had to be sold with a lot of resident software

Page 27: Computer Architecture ELE 475 / COS 475 Slide Deck 1

Compatibility Problem at IBM

By early 1960’s, IBM had 4 incompatible lines of computers!

701  ⇒ 7094
650  ⇒ 7074
702  ⇒ 7080
1401 ⇒ 7010

Each system had its own
•  Instruction set
•  I/O system and Secondary Storage: magnetic tapes, drums and disks
•  assemblers, compilers, libraries, ...
•  market niche: business, scientific, real time, ...

⇒ IBM 360

Page 30: Computer Architecture ELE 475 / COS 475 Slide Deck 1

IBM 360: A General-Purpose Register (GPR) Machine

•  Processor State
   –  16 General-Purpose 32-bit Registers
      •  may be used as index and base register
      •  Register 0 has some special properties
   –  4 Floating Point 64-bit Registers
   –  A Program Status Word (PSW)
      •  PC, Condition codes, Control flags
•  A 32-bit machine with 24-bit addresses
   –  But no instruction contains a 24-bit address!
•  Data Formats
   –  8-bit bytes, 16-bit half-words, 32-bit words, 64-bit double-words

The IBM 360 is why bytes are 8 bits long today!
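The point that no instruction contains a 24-bit address can be illustrated with a small sketch. It assumes the commonly described S/360 scheme in which a storage address is formed from a base register, an optional index register, and a 12-bit displacement, and in which naming register 0 as base or index contributes zero. The function name and the register contents are hypothetical, chosen only for illustration.

```python
ADDR_MASK = (1 << 24) - 1  # addresses are 24 bits wide


def s360_effective_address(regs, base, index, disp):
    """Form a 24-bit address from base + index + displacement.

    Assumptions (not stated on the slide): the displacement is 12 bits,
    and register number 0 used as base or index means "no register".
    """
    if not 0 <= disp < (1 << 12):
        raise ValueError("displacement must fit in 12 bits")
    base_value = regs[base] if base != 0 else 0
    index_value = regs[index] if index != 0 else 0
    return (base_value + index_value + disp) & ADDR_MASK


regs = [0] * 16
regs[0] = 0xFFFF      # ignored when register 0 is named as base or index
regs[12] = 0x012000   # hypothetical base of a data area
regs[3] = 0x10        # hypothetical index
regs[5] = 0xFFFFFF    # used to show wrap-around at 24 bits

ea_indexed = s360_effective_address(regs, base=12, index=3, disp=0x0FF)
ea_no_regs = s360_effective_address(regs, base=0, index=0, disp=100)
ea_wrapped = s360_effective_address(regs, base=5, index=0, disp=1)
```

Under these assumptions the instruction only needs a few register-number bits plus the short displacement, while the full 24-bit address lives in a 32-bit register.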

Page 32: Computer Architecture ELE 475 / COS 475 Slide Deck 1

IBM 360: Initial Implementations

                 Model 30            . . .   Model 70
Storage          8K - 64 KB                  256K - 512 KB
Datapath         8-bit                       64-bit
Circuit Delay    30 nsec/level               5 nsec/level
Local Store      Main Store                  Transistor Registers
Control Store    Read only 1 µsec            Conventional circuits

IBM 360 instruction set architecture (ISA) completely hid the underlying technological differences between various models.
Milestone: The first true ISA designed as portable hardware-software interface!

With minor modifications it still survives today!

Page 34: Computer Architecture ELE 475 / COS 475 Slide Deck 1

IBM 360: Over 50 years later… The zSeries z14 Microprocessor

•  5.2 GHz in IBM 14 nm SOI CMOS technology
•  6.1 billion transistors in 696 mm²
•  64-bit virtual addressing
   –  original S/360 was 24-bit, and S/370 was 31-bit extension
•  10-core design
•  6-fetch/cycle
•  10-issue/cycle out-of-order superscalar pipeline
•  Out-of-order memory accesses
•  Redundant datapaths
   –  every instruction performed in two parallel datapaths and results compared
•  128 KB L1 I-cache, 128 KB L1 D-cache on-chip
•  2 MB private I-cache L2 per core
•  4 MB private D-cache L2 per core
•  On-Chip 128 MB eDRAM L3 cache
•  Up to 672 MB eDRAM L4

Image Credit: IBM. Courtesy of International Business Machines Corporation, © International Business Machines Corporation.

Page 35: Computer Architecture ELE 475 / COS 475 Slide Deck 1

Same Architecture Different Microarchitecture

AMD Phenom X4
•  X86 Instruction Set
•  Quad Core
•  125W
•  Decode 3 Instructions/Cycle/Core
•  64KB L1 I Cache, 64KB L1 D Cache
•  512KB L2 Cache
•  Out-of-order
•  2.6GHz

Intel Atom
•  X86 Instruction Set
•  Single Core
•  2W
•  Decode 2 Instructions/Cycle/Core
•  32KB L1 I Cache, 24KB L1 D Cache
•  512KB L2 Cache
•  In-order
•  1.6GHz

Image Credit: Intel
Image Credit: AMD

Page 36: Computer Architecture ELE 475 / COS 475 Slide Deck 1

Different Architecture Different Microarchitecture

AMD Phenom X4
•  X86 Instruction Set
•  Quad Core
•  125W
•  Decode 3 Instructions/Cycle/Core
•  64KB L1 I Cache, 64KB L1 D Cache
•  512KB L2 Cache
•  Out-of-order
•  2.6GHz

IBM POWER7
•  Power Instruction Set
•  Eight Core
•  200W
•  Decode 6 Instructions/Cycle/Core
•  32KB L1 I Cache, 32KB L1 D Cache
•  256KB L2 Cache
•  Out-of-order
•  4.25GHz

Image Credit: IBM. Courtesy of International Business Machines Corporation, © International Business Machines Corporation.
Image Credit: AMD

Page 37: Computer Architecture ELE 475 / COS 475 Slide Deck 1

Where Do Operands Come from And Where Do Results Go?

[Figure: four machine models, each drawn as a Processor containing an ALU and connected to Memory. In the stack model the top of the stack is marked TOS.]

Machine model        Number of explicitly named operands
Stack                0
Accumulator          1
Register-Memory      2 or 3
Register-Register    2 or 3

Page 47: Computer Architecture ELE 475 / COS 475 Slide Deck 1

Stack-Based Instruction Set Architecture (ISA)

•  Burroughs B5000 (1960)
•  Burroughs B6700
•  HP 3000
•  ICL 2900
•  Symbolics 3600
•  Inmos Transputer

Modern
•  Forth machines
•  Java Virtual Machine
•  Intel x87 Floating Point Unit

[Figure: stack machine, a Processor containing an ALU and a stack whose top is marked TOS, connected to Memory]

Page 48: Computer Architecture ELE 475 / COS 475 Slide Deck 1

Evaluation of Expressions

Expression: (a + b * c) / (a + d * c - e)

Expression tree:

                 /
         +               -
       a   *           +   e
          b c        a   *
                        d c

Reverse Polish: a b c * + a d c * + e - /

Evaluation stack after each step (top of stack on the right):

push a      a
push b      a   b
push c      a   b   c
multiply    a   b * c
add         a + b * c
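The steps above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the tuple encoding of the expression tree, the function names `to_rpn` and `eval_rpn`, and the sample values for a through e are all assumptions made for the example.

```python
import operator

# Binary operators recognized by the evaluator.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def to_rpn(tree):
    """Post-order walk of the tree: both operands first, then the operator."""
    if isinstance(tree, str):
        return [tree]
    op, left, right = tree
    return to_rpn(left) + to_rpn(right) + [op]


def eval_rpn(tokens, env):
    """Evaluate Reverse Polish tokens with an explicit evaluation stack."""
    stack = []
    for tok in tokens:
        if tok in OPS:
            right = stack.pop()   # top of stack is the right operand
            left = stack.pop()
            stack.append(OPS[tok](left, right))
        else:
            stack.append(env[tok])  # "push" a variable's value
    (result,) = stack               # exactly one value must remain
    return result


# (a + b * c) / (a + d * c - e) as a nested (operator, left, right) tuple.
tree = ("/",
        ("+", "a", ("*", "b", "c")),
        ("-", ("+", "a", ("*", "d", "c")), "e"))

rpn = to_rpn(tree)
env = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}  # illustrative values
value = eval_rpn(rpn, env)
```

The post-order walk reproduces the Reverse Polish string from the slide, and the evaluator never needs operand names in its arithmetic instructions, which is what lets a stack ISA name zero explicit operands.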

Page 62: Computer Architecture ELE 475 / COS 475 Slide Deck 1

Hardware organization of the stack

•  Stack is part of the processor state
   ⇒ stack must be bounded and small
   ≈ number of Registers, not the size of main memory

•  Conceptually stack is unbounded
   ⇒ a part of the stack is included in the processor state; the rest is kept in the main memory

Page 63: Computer Architecture ELE 475 / COS 475 Slide Deck 1

Stack Operations and Implicit Memory References

•  Suppose the top 2 elements of the stack are kept in registers and the rest is kept in the memory.

   Each push operation ⇒ 1 memory reference
        pop operation  ⇒ 1 memory reference

   No Good!

•  Better performance by keeping the top N elements in registers, and memory references are made only when register stack overflows or underflows.

   Issue - when to Load/Unload registers?

Page 66: Computer Architecture ELE 475 / COS 475 Slide Deck 1

Stack Size and Memory References

a b c * + a d c * + e - /

program     stack (size = 2)     memory refs
push a      R0                   a
push b      R0 R1                b
push c      R0 R1 R2             c, ss(a)
*           R0 R1                sf(a)
+           R0
push a      R0 R1                a
push d      R0 R1 R2             d, ss(a+b*c)
push c      R0 R1 R2 R3          c, ss(a)
*           R0 R1 R2             sf(a)
+           R0 R1                sf(a+b*c)
push e      R0 R1 R2             e, ss(a+b*c)
-           R0 R1                sf(a+b*c)
/           R0

4 stores, 4 fetches (implicit)

Page 68: Computer Architecture ELE 475 / COS 475 Slide Deck 1

Stack Size and Expression Evaluation

a b c * + a d c * + e - /

program     stack (size = 4)
push a      R0
push b      R0 R1
push c      R0 R1 R2
*           R0 R1
+           R0
push a      R0 R1
push d      R0 R1 R2
push c      R0 R1 R2 R3
*           R0 R1 R2
+           R0 R1
push e      R0 R1 R2
-           R0 R1
/           R0

a and c are “loaded” twice ⇒ not the best use of registers!
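The two stack-size tables can be reproduced with a short simulation. This is a sketch under stated assumptions: the ss(x) and sf(x) entries are read as implicit stack stores and fetches, the register stack spills its deepest entry when a push finds it full, and it refills eagerly after each binary operation. The function name `count_implicit_refs` is hypothetical, and real stack machines may choose a different load/unload policy.

```python
OPERATORS = {"+", "-", "*", "/"}


def count_implicit_refs(rpn_tokens, num_regs):
    """Count implicit stack stores and fetches for a register stack.

    Assumed policy: the top `num_regs` entries live in registers; a push
    onto a full register stack stores the deepest register entry to
    memory, and after each binary operation one spilled entry (if any)
    is fetched back.
    """
    regs = []      # entries held in registers, last element is TOS
    spilled = []   # deeper entries held in memory
    stores = fetches = 0
    for tok in rpn_tokens:
        if tok in OPERATORS:
            right = regs.pop()
            left = regs.pop()
            regs.append(f"({left}{tok}{right})")
            if spilled:                       # eager refill from memory
                regs.insert(0, spilled.pop())
                fetches += 1
        else:
            if len(regs) == num_regs:         # overflow: spill the deepest entry
                spilled.append(regs.pop(0))
                stores += 1
            regs.append(tok)
    return stores, fetches


program = "a b c * + a d c * + e - /".split()
refs_size_2 = count_implicit_refs(program, num_regs=2)
refs_size_4 = count_implicit_refs(program, num_regs=4)
```

With two registers this policy gives the 4 stores and 4 fetches shown in the first table, and with four registers the expression fits entirely in registers. The explicit operand loads (a, b, c, a, d, c, e) are not counted here.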

Page 70: Computer Architecture ELE 475 / COS 475 Slide Deck 1

Machine Model Summary

[Figure: the four machine models (Stack, Accumulator, Register-Memory, Register-Register), each drawn as a Processor containing an ALU and connected to Memory]

Code for C = A + B in each model:

Stack        Accumulator    Register-Memory     Register-Register
Push A       Load A         Load R1, A          Load R1, A
Push B       Add B          Add R3, R1, B       Load R2, B
Add          Store C        Store R3, C         Add R3, R1, R2
Pop C                                           Store R3, C
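The four instruction sequences for C = A + B can be run on toy interpreters to see where each model finds its operands. The sketch below is illustrative only: the tuple encoding of instructions, the function names, and the memory contents are assumptions, and one interpreter is shared by the register-memory and register-register forms.

```python
def run_stack(program, mem):
    """Stack model: Add names no operands, it uses the top two stack entries."""
    stack = []
    for op, *args in program:
        if op == "Push":
            stack.append(mem[args[0]])
        elif op == "Add":
            stack.append(stack.pop() + stack.pop())
        elif op == "Pop":
            mem[args[0]] = stack.pop()
    return mem


def run_accumulator(program, mem):
    """Accumulator model: one named memory operand, the other is implicit."""
    acc = 0
    for op, addr in program:
        if op == "Load":
            acc = mem[addr]
        elif op == "Add":
            acc += mem[addr]
        elif op == "Store":
            mem[addr] = acc
    return mem


def run_register(program, mem):
    """Register models: Add sources may be registers or (reg-mem only) memory."""
    regs = {}

    def value(operand):
        # A name already written as a register is a register, else memory.
        return regs[operand] if operand in regs else mem[operand]

    for op, *args in program:
        if op == "Load":
            regs[args[0]] = mem[args[1]]
        elif op == "Add":
            regs[args[0]] = value(args[1]) + value(args[2])
        elif op == "Store":
            mem[args[1]] = regs[args[0]]
    return mem


stack_prog = [("Push", "A"), ("Push", "B"), ("Add",), ("Pop", "C")]
acc_prog = [("Load", "A"), ("Add", "B"), ("Store", "C")]
reg_mem_prog = [("Load", "R1", "A"), ("Add", "R3", "R1", "B"), ("Store", "R3", "C")]
reg_reg_prog = [("Load", "R1", "A"), ("Load", "R2", "B"),
                ("Add", "R3", "R1", "R2"), ("Store", "R3", "C")]

results = {
    "stack": run_stack(stack_prog, {"A": 2, "B": 3, "C": 0})["C"],
    "accumulator": run_accumulator(acc_prog, {"A": 2, "B": 3, "C": 0})["C"],
    "register-memory": run_register(reg_mem_prog, {"A": 2, "B": 3, "C": 0})["C"],
    "register-register": run_register(reg_reg_prog, {"A": 2, "B": 3, "C": 0})["C"],
}
```

All four sequences leave the same value in C. They differ in how many operands each Add names explicitly (0, 1, or 3 here) and in whether Add is allowed to touch memory.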

Page 73: Computer Architecture ELE 475 / COS 475 Slide Deck 1

Classes of Instructions

•  Data Transfer
   –  LD, ST, MFC1, MTC1, MFC0, MTC0
•  ALU
   –  ADD, SUB, AND, OR, XOR, MUL, DIV, SLT, LUI
•  Control Flow
   –  BEQZ, JR, JAL, TRAP, ERET
•  Floating Point
   –  ADD.D, SUB.S, MUL.D, C.LT.D, CVT.S.W
•  Multimedia (SIMD)
   –  ADD.PS, SUB.PS, MUL.PS, C.LT.PS
•  String
   –  REP MOVSB (x86)

Page 74: Computer Architecture ELE 475 / COS 475 Slide Deck 1

Addressing Modes: How to Get Operands from Memory

Addressing Mode      Instruction                  Function
Register             Add R4, R3, R2               Regs[R4] <- Regs[R3] + Regs[R2] **
Immediate            Add R4, R3, #5               Regs[R4] <- Regs[R3] + 5 **
Displacement         Add R4, R3, 100(R1)          Regs[R4] <- Regs[R3] + Mem[100 + Regs[R1]]
Register Indirect    Add R4, R3, (R1)             Regs[R4] <- Regs[R3] + Mem[Regs[R1]]
Absolute             Add R4, R3, (0x475)          Regs[R4] <- Regs[R3] + Mem[0x475]
Memory Indirect      Add R4, R3, @(R1)            Regs[R4] <- Regs[R3] + Mem[Mem[Regs[R1]]]
PC relative          Add R4, R3, 100(PC)          Regs[R4] <- Regs[R3] + Mem[100 + PC]
Scaled               Add R4, R3, 100(R1)[R5]      Regs[R4] <- Regs[R3] + Mem[100 + Regs[R1] + Regs[R5] * 4]

** May not actually access memory!
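The Function column of the table can be written out as a small Python helper that returns the second source operand for each mode. This is only a sketch: the function name `second_operand`, its keyword parameters, and the register and memory contents are hypothetical, and the scale factor of 4 is taken from the Scaled row.

```python
def second_operand(mode, regs, mem, pc=0, *, reg=None, base=None, index=None,
                   imm=0, disp=0, addr=0, scale=4):
    """Return the value added to Regs[R3] under each addressing mode."""
    if mode == "register":
        return regs[reg]                       # no memory access
    if mode == "immediate":
        return imm                             # no memory access
    if mode == "displacement":
        return mem[disp + regs[base]]
    if mode == "register_indirect":
        return mem[regs[base]]
    if mode == "absolute":
        return mem[addr]
    if mode == "memory_indirect":
        return mem[mem[regs[base]]]            # two memory accesses
    if mode == "pc_relative":
        return mem[disp + pc]
    if mode == "scaled":
        return mem[disp + regs[base] + regs[index] * scale]
    raise ValueError(f"unknown addressing mode: {mode}")


# Illustrative machine state (all values are made up for the example).
regs = {"R1": 8, "R2": 7, "R3": 10, "R5": 2}
mem = {2: 4, 8: 2, 108: 1, 116: 5, 140: 6, 0x475: 3}
pc = 40

r3 = regs["R3"]
results = {
    "register": r3 + second_operand("register", regs, mem, reg="R2"),
    "immediate": r3 + second_operand("immediate", regs, mem, imm=5),
    "displacement": r3 + second_operand("displacement", regs, mem, base="R1", disp=100),
    "register_indirect": r3 + second_operand("register_indirect", regs, mem, base="R1"),
    "absolute": r3 + second_operand("absolute", regs, mem, addr=0x475),
    "memory_indirect": r3 + second_operand("memory_indirect", regs, mem, base="R1"),
    "pc_relative": r3 + second_operand("pc_relative", regs, mem, pc, disp=100),
    "scaled": r3 + second_operand("scaled", regs, mem, base="R1", index="R5", disp=100),
}
```

Each entry in `results` is the value that would be written to R4 by the corresponding row of the table, given the sample state.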

Page 75: Computer Architecture ELE 475 / COS 475 Slide Deck 1

Data Types and Sizes

•  Types
   –  Binary Integer
   –  Binary Coded Decimal (BCD)
   –  Floating Point
      •  IEEE 754
      •  Cray Floating Point
      •  Intel Extended Precision (80-bit)
   –  Packed Vector Data
   –  Addresses
•  Width
   –  Binary Integer (8-bit, 16-bit, 32-bit, 64-bit)
   –  Floating Point (32-bit, 40-bit, 64-bit, 80-bit)
   –  Addresses (16-bit, 24-bit, 32-bit, 48-bit, 64-bit)

Page 76: Computer Architecture ELE 475 / COS 475 Slide Deck 1

ISA Encoding

Fixed Width: Every Instruction has same width
•  Easy to decode
   (RISC Architectures: MIPS, PowerPC, SPARC, ARM…)
   Ex: MIPS, every instruction 4-bytes

Variable Length: Instructions can vary in width
•  Takes less space in memory and caches
   (CISC Architectures: IBM 360, x86, Motorola 68k, VAX…)
   Ex: x86, instructions 1-byte up to 17-bytes

Mostly Fixed or Compressed:
•  Ex: MIPS16, THUMB (only two formats 2 and 4 bytes)
•  PowerPC and some VLIWs (Store instructions compressed, decompress into Instruction Cache)

(Very) Long Instruction Word:
•  Multiple instructions in a fixed width bundle
•  Ex: Multiflow, HP/ST Lx, TI C6000
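To make "easy to decode" concrete, the sketch below splits a byte stream into 4-byte instructions and extracts the fields of one of them. It assumes the standard MIPS R-type layout (6-bit opcode, three 5-bit register fields, 5-bit shift amount, 6-bit function code) and big-endian byte order, neither of which is spelled out on the slide. The function names and the sample bytes are illustrative.

```python
def split_fixed_width(code, width=4):
    """With a fixed width, instruction boundaries are known without decoding."""
    return [int.from_bytes(code[i:i + width], "big")
            for i in range(0, len(code), width)]


def decode_r_type(word):
    """Extract R-type fields with shifts and masks (assumed 6/5/5/5/5/6 layout)."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "shamt": (word >> 6) & 0x1F,
        "funct": word & 0x3F,
    }


# Two 4-byte instructions: 0x012A4020 (an R-type add) followed by 0x00000000.
code = bytes.fromhex("012a4020" "00000000")
words = split_fixed_width(code)
fields = decode_r_type(words[0])
```

Every field sits at a fixed bit position, so a hardware decoder can extract all of them in parallel. A variable-length encoding has to examine earlier bytes before it knows where later fields, or even the next instruction, begin.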

Page 78: Computer Architecture ELE 475 / COS 475 Slide Deck 1

x86 (IA-32) Instruction Encoding

Fields, in the order they appear in an instruction:

Field                    Size
Instruction Prefixes     Up to four Prefixes (1 byte each)
Opcode                   1, 2, or 3 bytes
ModR/M                   1 byte (if needed)
Scale, Index, Base       1 byte (if needed)
Displacement             0, 1, 2, or 4 bytes
Immediate                0, 1, 2, or 4 bytes

x86 and x86-64 Instruction Formats
Possible instructions 1 to 18 bytes long

Page 79: Computer Architecture ELE 475 / COS 475 Slide Deck 1

MIPS64 Instruction Encoding

[Figure: MIPS64 instruction formats]

Image Copyright © 2011, Elsevier Inc. All rights Reserved.

Page 80: Computer Architecture ELE 475 / COS 475 Slide Deck 1

Real World Instruction Sets

Arch        Type      # Oper   # Mem   Data Size        # Regs   Addr Size   Use
Alpha       Reg-Reg   3        0       64-bit           32       64-bit      Workstation
ARM         Reg-Reg   3        0       32/64-bit        16       32/64-bit   Cell Phones, Embedded
MIPS        Reg-Reg   3        0       32/64-bit        32       32/64-bit   Workstation, Embedded
SPARC       Reg-Reg   3        0       32/64-bit        24-32    32/64-bit   Workstation
TI C6000    Reg-Reg   3        0       32-bit           32       32-bit      DSP
IBM 360     Reg-Mem   2        1       32-bit           16       24/31/64    Mainframe
x86         Reg-Mem   2        1       8/16/32/64-bit   4/8/24   16/32/64    Personal Computers
VAX         Mem-Mem   3        3       32-bit           16       32-bit      Minicomputer
Mot. 6800   Accum.    1        1/2     8-bit            0        16-bit      Microcontroller

Page 81: Computer Architecture ELE 475 / COS 475 Slide Deck 1

Why the Diversity in ISAs?

Technology Influenced ISA
•  Storage is expensive, tight encoding important
•  Reduced Instruction Set Computer
   –  Remove instructions until whole computer fits on die
•  Multicore/Manycore
   –  Transistors not turning into sequential performance

Application Influenced ISA
•  Instructions for Applications
   –  DSP instructions
•  Compiler Technology has improved
   –  SPARC Register Windows no longer needed
   –  Compiler can register allocate effectively

Page 82: Computer Architecture ELE 475 / COS 475 Slide Deck 1

Recap

•  ISA vs Microarchitecture
•  ISA Characteristics
   –  Machine Models
   –  Encoding
   –  Data Types
   –  Instructions
   –  Addressing Modes

Layers: Physics, Devices, Circuits, Gates, Register-Transfer Level, Microarchitecture, Instruction Set Architecture, Operating System/Virtual Machines, Programming Language, Algorithm, Application

Slide annotation: Computer Architecture (ELE 475)

Page 84: Computer Architecture ELE 475 / COS 475 Slide Deck 1

Computer Architecture Lecture 1

Next Class: Microcode and Review of Pipelining

Page 85: Computer Architecture ELE 475 / COS 475 Slide Deck 1

Acknowledgements

•  These slides contain material developed and copyright by:
   –  Arvind (MIT)
   –  Krste Asanovic (MIT/UCB)
   –  Joel Emer (Intel/MIT)
   –  James Hoe (CMU)
   –  John Kubiatowicz (UCB)
   –  David Patterson (UCB)
   –  Christopher Batten (Cornell)
•  MIT material derived from course 6.823
•  UCB material derived from course CS252 & CS152
•  Cornell material derived from course ECE 4750

Page 86: Computer Architecture ELE 475 / COS 475 Slide Deck 1

Copyright © 2018 David Wentzlaff