CS 110 Computer Architecture Lecture 24: More I/O: DMA, Disks, Networking. Instructor: Sören Schwertfeger. http://shtech.org/courses/ca/ School of Information Science and Technology (SIST), ShanghaiTech University. Slides based on UC Berkeley's CS61C


Page 1: CS 110 Computer Architecture Lecture 24

CS 110 Computer Architecture

Lecture 24: More I/O: DMA, Disks, Networking

Instructor: Sören Schwertfeger

http://shtech.org/courses/ca/

School of Information Science and Technology SIST

ShanghaiTech University

Slides based on UC Berkeley's CS61C

Page 2: CS 110 Computer Architecture Lecture 24

Virtual Memory Review

2

Page 3: CS 110 Computer Architecture Lecture 24

3

Modern Virtual Memory Systems
Illusion of a large, private, uniform store

Protection & Privacy
* Many processes, each with their own private address space and one or more shared address spaces

Demand Paging
* Many processes share DRAM.
* Provides ability to run programs with large address space. Pages that aren’t yet allocated or pages that don’t fit swap to secondary storage.
* Hides differences in machine configurations

The price is address translation on each memory reference

[Figure: OS and Prog 1 in Primary Memory, backed by a Swapping Store (Disk); VA to PA mapping through the TLB]

Page 4: CS 110 Computer Architecture Lecture 24

4

Private (Virtual) Address Space per Program

• Each prog has a page table
• Page table contains an entry for each prog page

[Figure: Prog 1, Prog 2, and Prog 3 each map VA1 through their own Page Table into Physical Memory, which also holds OS pages and free pages]

Page 5: CS 110 Computer Architecture Lecture 24

5

Translation Lookaside Buffers (TLB)

Address translation is very expensive!
In a two-level page table, each reference becomes several memory accesses

Solution: Cache translations in TLB
TLB hit => Single-Cycle Translation
TLB miss => Page-Table Walk to refill

[Figure: the virtual address is split into VPN and offset; the VPN is compared against the tag of each TLB entry (fields: V, R, W, D, tag, PPN); on a hit, the physical address is the PPN followed by the offset]

(VPN = virtual page number)

(PPN = physical page number)
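The lookup described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 4 KiB page size, the names `TLBEntry` and `tlb_translate`, and the reading of the R/W/D bits as readable/writable/dirty are not given on the slide.

```python
from dataclasses import dataclass

PAGE_BITS = 12  # assumption: 4 KiB pages, so the low 12 bits are the offset


@dataclass
class TLBEntry:
    valid: bool     # V bit
    readable: bool  # R bit (assumed meaning)
    writable: bool  # W bit (assumed meaning)
    dirty: bool     # D bit (assumed meaning)
    tag: int        # virtual page number (VPN) this entry translates
    ppn: int        # physical page number (PPN)


def tlb_translate(tlb, vaddr, write=False):
    """Return the physical address on a TLB hit, or None on a miss."""
    vpn = vaddr >> PAGE_BITS
    offset = vaddr & ((1 << PAGE_BITS) - 1)
    for entry in tlb:
        if entry.valid and entry.tag == vpn:
            allowed = entry.writable if write else entry.readable
            if not allowed:
                raise PermissionError("protection violation")
            if write:
                entry.dirty = True
            # Hit: physical address is the PPN followed by the unchanged offset.
            return (entry.ppn << PAGE_BITS) | offset
    return None  # Miss: a page-table walk would refill the TLB.


tlb = [TLBEntry(True, True, False, False, tag=0x12345, ppn=0x42)]
print(hex(tlb_translate(tlb, 0x12345ABC)))
```

A real TLB compares all tags in parallel in hardware; the loop here only models the result of that comparison.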

Page 6: CS 110 Computer Architecture Lecture 24

6

Page-Based Virtual-Memory Machine (Hardware Page-Table Walk)

• Page tables held in untranslated physical memory

[Figure: pipeline with PC, Inst. TLB, Inst. Cache, Decode, Execute, Memory (Data TLB, Data Cache), and Writeback. Each TLB turns a Virtual Address into a Physical Address and can signal "Page Fault?" or "Protection violation?". On a TLB "Miss?", the Hardware Page Table Walker uses the Page-Table Base Register and sends Physical Addresses through the Memory Controller to Main Memory (DRAM).]

Page 7: CS 110 Computer Architecture Lecture 24

7

Address Translation: putting it all together

[Flowchart: Virtual Address -> TLB Lookup. Hit -> Protection Check. Miss -> Page Table Walk. Page Table Walk: the page is in memory -> Update TLB; the page is not in memory -> Page Fault (OS loads page). Protection Check: permitted -> Physical Address (to cache); denied -> Protection Fault. Stage labels: hardware, hardware or software, software.]

SEGFAULT: Where?

Page 8: CS 110 Computer Architecture Lecture 24

Review: I/O
• “Memory mapped I/O”: Device control/data registers mapped to CPU address space
• CPU synchronizes with I/O device:
– Polling
– Interrupts
• “Programmed I/O”:
– CPU execs lw/sw instructions for all data movement to/from devices
– CPU spends time doing 2 things:
1. Getting data from device to main memory
2. Using data to compute

8

Page 9: CS 110 Computer Architecture Lecture 24

Working with real devices
• “Memory mapped I/O”: Device control/data registers mapped to CPU address space
• CPU synchronizes with I/O device:
– Polling
– Interrupts
• “Programmed I/O”: DMA
– CPU execs lw/sw instructions for all data movement to/from devices
– CPU spends time doing 2 things:
1. Getting data from device to main memory
2. Using data to compute

9

Page 10: CS 110 Computer Architecture Lecture 24

Agenda

• Direct Memory Access (DMA)
• Disks
• Networking

10

Page 11: CS 110 Computer Architecture Lecture 24

What’s wrong with Programmed I/O?

• Not ideal because…
1. CPU has to execute all transfers, could be doing other work
2. Device speeds don’t align well with CPU speeds
3. Energy cost of using beefy general-purpose CPU where simpler hardware would suffice
• Until now CPU has sole control of main memory

11

Page 12: CS 110 Computer Architecture Lecture 24

PIO vs. DMA

12

Page 13: CS 110 Computer Architecture Lecture 24

Direct Memory Access (DMA)

• Allows I/O devices to directly read/write main memory
• New Hardware: the DMA Engine
• DMA engine contains registers written by CPU:
– Memory address to place data
– # of bytes
– I/O device #, direction of transfer
– unit of transfer, amount to transfer per burst

13

Page 14: CS 110 Computer Architecture Lecture 24

Operation of a DMA Transfer

14

[From Section 5.1.4 Direct Memory Access in Modern Operating Systems by Andrew S. Tanenbaum, Herbert Bos, 2014]

Page 15: CS 110 Computer Architecture Lecture 24

DMA: Incoming Data

1. Receive interrupt from device
2. CPU takes interrupt, begins transfer
– Instructs DMA engine/device to place data @ certain address
3. Device/DMA engine handle the transfer
– CPU is free to execute other things
4. Upon completion, Device/DMA engine interrupt the CPU again

15

Page 16: CS 110 Computer Architecture Lecture 24

DMA: Outgoing Data

1. CPU decides to initiate transfer, confirms that external device is ready
2. CPU begins transfer
– Instructs DMA engine/device that data is available @ certain address
3. Device/DMA engine handle the transfer
– CPU is free to execute other things
4. Device/DMA engine interrupt the CPU again to signal completion
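The DMA registers and the transfer steps for incoming and outgoing data can be modelled with a small Python sketch. Everything named here (`DMARegisters`, `DMAEngine`, `on_complete`) is hypothetical, and the completion callback only stands in for the interrupt. The model also runs the copy synchronously, whereas a real engine works while the CPU executes other code.

```python
from dataclasses import dataclass


@dataclass
class DMARegisters:
    """Registers the CPU writes before starting a transfer (names are assumptions)."""
    mem_addr: int     # memory address to place (or fetch) data
    num_bytes: int    # number of bytes to move
    device_id: int    # I/O device number
    to_memory: bool   # direction: True = device -> memory (incoming)
    burst_bytes: int  # amount to transfer per burst


class DMAEngine:
    def __init__(self, memory: bytearray):
        self.memory = memory

    def transfer(self, regs: DMARegisters, device_buffer: bytearray, on_complete):
        """Move the data burst by burst, then signal completion."""
        done = 0
        while done < regs.num_bytes:
            n = min(regs.burst_bytes, regs.num_bytes - done)
            lo = regs.mem_addr + done
            if regs.to_memory:
                self.memory[lo:lo + n] = device_buffer[done:done + n]
            else:
                device_buffer[done:done + n] = self.memory[lo:lo + n]
            done += n
        on_complete(regs.device_id)  # stands in for the completion interrupt


memory = bytearray(16)
interrupts = []
engine = DMAEngine(memory)
engine.transfer(DMARegisters(4, 5, 7, True, 2), bytearray(b"hello"), interrupts.append)
print(bytes(memory[4:9]), interrupts)
```

The per-burst loop is where an arbitration policy would plug in: the engine could yield the bus between bursts instead of copying everything at once.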

16

Page 17: CS 110 Computer Architecture Lecture 24

DMA: Some new problems

• Where in the memory hierarchy do we plug in the DMA engine? Two extremes:
– Between L1 and CPU:
  • Pro: Free coherency
  • Con: Trash the CPU’s working set with transferred data
– Between Last-level cache and main memory:
  • Pro: Don’t mess with caches
  • Con: Need to explicitly manage coherency

17

Page 18: CS 110 Computer Architecture Lecture 24

DMA: Some new problems

• How do we arbitrate between CPU and DMA Engine/Device access to memory? Three options:
– Burst Mode
  • Start transfer of data block, CPU cannot access memory in the meantime
– Cycle Stealing Mode
  • DMA engine transfers a byte, releases control, then repeats - interleaves processor/DMA engine accesses
– Transparent Mode
  • DMA transfer only occurs when CPU is not using the system bus

18

Page 19: CS 110 Computer Architecture Lecture 24

Agenda

• Direct Memory Access (DMA)
• Disks
• Networking

19

Page 20: CS 110 Computer Architecture Lecture 24

Computer Memory Hierarchy

[Figure label: Today]

20

Page 21: CS 110 Computer Architecture Lecture 24

Magnetic Disk – common I/O device
• A kind of computer memory
– Information stored by magnetizing ferrite material on surface of rotating disk
  • similar to tape recorder except digital rather than analog data
• A type of non-volatile storage
– retains its value without applying power to disk.
• Two Types of Magnetic Disk
1. Hard Disk Drives (HDD) – faster, more dense, non-removable.
2. Floppy disks – slower, less dense, removable (now replaced by USB “flash drive”).
• Purpose in computer systems (Hard Drive):
1. Working file system + long-term backup for files
2. Secondary “backing store” for main-memory. Large, inexpensive, slow level in the memory hierarchy (virtual memory)

21

Page 22: CS 110 Computer Architecture Lecture 24

Photo of Disk Head, Arm, Actuator

[Photo labels: Arm, Head, Spindle]

22

Page 23: CS 110 Computer Architecture Lecture 24

Disk Device Terminology

• Several platters, with information recorded magnetically on both surfaces (usually)
• Bits recorded in tracks, which in turn are divided into sectors (e.g., 512 Bytes)
• Actuator moves head (end of arm) over track (“seek”), waits for sector to rotate under head, then reads or writes

[Figure labels: Outer Track, Inner Track, Sector, Actuator, Head, Arm, Platter]

23

Page 24: CS 110 Computer Architecture Lecture 24

Hard Drives are Sealed. Why?
• The closer the head to the disk, the smaller the “spot size” and thus the denser the recording.
– Measured in Gbit/in^2
– ~900 Gbit/in^2 is state of the art
– Started out at 2 Kbit/in^2
– ~450,000,000x improvement in ~60 years
• Disks are sealed to keep the dust out.
– Heads are designed to “fly” at around 3-20 nm above the surface of the disk.
– 99.999% of the head/arm weight is supported by the air bearing force (air cushion) developed between the disk and the head.

24

3-20nm

Page 25: CS 110 Computer Architecture Lecture 24

Disk Device Performance (1/2)

• Disk Access Time = Seek Time + Rotation Time + Transfer Time + Controller Overhead
– Seek Time = time to position the head assembly at the proper cylinder
– Rotation Time = time for the disk to rotate to the point where the first sectors of the block to access reach the head
– Transfer Time = time taken by the sectors of the block and any gaps between them to rotate past the head

[Figure labels: Platter, Arm, Actuator, Head, Sector, Inner Track, Outer Track, Controller, Spindle]

25

Page 26: CS 110 Computer Architecture Lecture 24

Disk Device Performance (2/2)

• Average values to plug into the formula:
• Rotation Time: Average distance of sector from head?
– 1/2 time of a rotation
  • 7200 Revolutions Per Minute => 120 Rev/sec
  • 1 revolution = 1/120 sec => 8.33 milliseconds
  • 1/2 rotation (revolution) => 4.17 ms
• Seek time: Average no. tracks to move arm?
– Number of tracks/3
– Then, seek time = number of tracks moved × time to move across one track
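The two averages above are easy to check numerically. The following is a minimal sketch; the function names are made up for illustration and the inputs are the slide's 7200 RPM example plus a hypothetical track count.

```python
def avg_rotation_ms(rpm: float) -> float:
    """Average rotational delay: half of one full revolution, in milliseconds."""
    ms_per_revolution = 60_000 / rpm  # 60,000 ms per minute
    return ms_per_revolution / 2


def avg_seek_ms(num_tracks: int, ms_per_track: float) -> float:
    """Average seek: a third of the tracks, times the time to cross one track."""
    return (num_tracks / 3) * ms_per_track


print(round(avg_rotation_ms(7200), 2))  # half of 8.33 ms
print(avg_seek_ms(3000, 0.5))           # hypothetical disk: 3000 tracks, 0.5 ms each
```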

26

Page 27: CS 110 Computer Architecture Lecture 24

But wait!

• Performance estimates are different in practice:
• Many disks have on-disk caches, which are completely hidden from the outside world
– Previous formula completely replaced with on-disk cache access time

27

Page 28: CS 110 Computer Architecture Lecture 24

Where does Flash memory come in?
• ~10 years ago: Microdrives and Flash memory (e.g., CompactFlash) went head-to-head
– Both non-volatile (retains contents without power supply)
– Flash benefits: lower power, no crashes (no moving parts, need to spin µdrives up/down)
– Disk cost = fixed cost of motor + arm mechanics, but actual magnetic media cost very low
– Flash cost = most cost/bit of flash chips
– Over time, cost/bit of flash came down, became cost competitive

28

Page 29: CS 110 Computer Architecture Lecture 24

Flash Memory / SSD Technology

• NMOS transistor with an additional conductor between gate and source/drain which “traps” electrons. The presence/absence is a 1 or 0
• Memory cells can withstand a limited number of program-erase cycles. Controllers use a technique called wear leveling to distribute writes as evenly as possible across all the flash blocks in the SSD.

29

Page 30: CS 110 Computer Architecture Lecture 24

What did Apple put in its iPods?

shuffle: Toshiba flash, 2 GB
nano: Samsung flash, 16 GB
classic: Toshiba 1.8-inch HDD, 80, 120, 160 GB
touch: Toshiba flash, 32, 64 GB

30

Page 31: CS 110 Computer Architecture Lecture 24

Flash Memory in Smart Phones

31

iPhone 6: up to 128 GB

Page 32: CS 110 Computer Architecture Lecture 24

Flash Memory in Laptops – Solid State Drive (SSD)

32

capacities up to 1 TB

Page 33: CS 110 Computer Architecture Lecture 24

HDD vs SSD speed

33

Page 34: CS 110 Computer Architecture Lecture 24

34

Page 35: CS 110 Computer Architecture Lecture 24

Question
• We have the following disk:
– 15000 Cylinders, 1 ms to cross 1000 Cylinders
– 15000 RPM = 4 ms per rotation
– Want to copy 1 MB, transfer rate of 1000 MB/s
– 1 ms controller processing time
• What is the access time using our model?

Disk Access Time = Seek Time + Rotation Time + Transfer Time + Controller Processing Time

35

A: 10.5 ms   B: 9 ms   C: 8.5 ms   D: 11.4 ms   E: 12 ms

Page 36: CS 110 Computer Architecture Lecture 24

Question

• We have the following disk:
– 15000 Cylinders, 1 ms to cross 1000 Cylinders
– 15000 RPM = 4 ms per rotation
– Want to copy 1 MB, transfer rate of 1000 MB/s
– 1 ms controller processing time
• What is the access time?
Seek = # cylinders/3 * time = 15000/3 * 1 ms/1000 cylinders = 5 ms
Rotation = time for ½ rotation = 4 ms/2 = 2 ms
Transfer = Size/transfer rate = 1 MB/(1000 MB/s) = 1 ms
Controller = 1 ms
Total = 5 + 2 + 1 + 1 = 9 ms
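The worked answer can be reproduced with a short Python function that applies the lecture's access-time model term by term. The function and parameter names are illustrative only.

```python
def disk_access_ms(cylinders, cylinders_per_ms, rpm, size_mb, rate_mb_per_s, controller_ms):
    """Disk Access Time = Seek + Rotation + Transfer + Controller, all in ms."""
    seek = (cylinders / 3) / cylinders_per_ms  # average seek crosses 1/3 of the cylinders
    rotation = (60_000 / rpm) / 2              # average wait is half a revolution
    transfer = size_mb * 1000 / rate_mb_per_s  # seconds converted to milliseconds
    return seek + rotation + transfer + controller_ms


# Values from the question: 15000 cylinders, 1000 cylinders crossed per ms,
# 15000 RPM, 1 MB at 1000 MB/s, 1 ms controller time.
print(disk_access_ms(15000, 1000, 15000, 1, 1000, 1))  # 5 + 2 + 1 + 1
```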

36

Page 37: CS 110 Computer Architecture Lecture 24

Agenda

• Direct Memory Access (DMA)
• Disks
• Networking

37

Page 38: CS 110 Computer Architecture Lecture 24

Networks: Talking to the Outside World

• Originally sharing I/O devices between computers
– E.g., printers
• Then communicating between computers
– E.g., file transfer protocol
• Then communicating between people
– E.g., e-mail
• Then communicating between networks of computers
– E.g., file sharing, www, …

38

Page 39: CS 110 Computer Architecture Lecture 24

The Internet (1962)
www.computerhistory.org/internet_history
www.greatachievements.org/?id=3736
en.wikipedia.org/wiki/Internet_Protocol_Suite

• History
– 1963: JCR Licklider, while at DoD’s ARPA, writes a memo describing desire to connect the computers at various research universities: Stanford, Berkeley, UCLA, ...
– 1969: ARPA deploys 4 “nodes” @ UCLA, SRI, Utah, & UCSB
– 1973: Robert Kahn & Vint Cerf invent TCP, now part of the Internet Protocol Suite
• Internet growth rates
– Exponential since start!

[Photo captions: “Lick”; Vint Cerf]

“Revolutions like this don't come along very often”

39

Page 40: CS 110 Computer Architecture Lecture 24

The World Wide Web (1989)
en.wikipedia.org/wiki/History_of_the_World_Wide_Web

• “System of interlinked hypertext documents on the Internet”
• History
– 1945: Vannevar Bush describes hypertext system called “memex” in article
– 1989: Sir Tim Berners-Lee proposed and implemented the first successful communication between a Hypertext Transfer Protocol (HTTP) client and server using the internet.
– ~2000: Dot-com entrepreneurs rushed in, 2001 bubble burst
• Today: Access anywhere!

[Photo captions: Tim Berners-Lee; World’s first web server in 1990]

40

Page 41: CS 110 Computer Architecture Lecture 24

Shared vs. Switch-Based Networks

• Shared vs. Switched:
  • Shared: 1 at a time (CSMA/CD)
  • Switched: pairs (“point-to-point” connections) communicate at same time
• Aggregate bandwidth (BW) in switched network is many times that of shared:
  • point-to-point faster since no arbitration, simpler interface

[Figure: three Nodes on a Shared medium; four Nodes connected through a Crossbar Switch]

41

Page 42: CS 110 Computer Architecture Lecture 24

What makes networks work?
• Links connecting switches and/or routers to each other and to computers or devices

[Figure: a Computer with a network interface attached to a network of switches]

• Ability to name the components and to route packets of information - messages - from a source to a destination
• Layering, redundancy, protocols, and encapsulation as means of abstraction (big idea in Computer Architecture)

42

Page 43: CS 110 Computer Architecture Lecture 24

Software Protocol to Send and Receive
• SW Send steps
1: Application copies data to OS buffer
2: OS calculates checksum, starts timer
3: OS sends data to network interface HW and says start
• SW Receive steps
3: OS copies data from network interface HW to OS buffer
2: OS calculates checksum, if OK, send ACK; if not, delete message (sender resends when timer expires)
1: If OK, OS copies data to user address space, & signals application to continue

Packet format:
Header: Dest Net ID, Src Net ID, Len
Payload: CMD / Address / Data
Trailer: Checksum, ACK INFO
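The checksum steps on the send and receive sides can be illustrated with a small Python sketch. The field widths, the use of CRC-32 as the checksum, and the function names are assumptions made for the example; the slide does not specify them, and the timer and ACK handling are left out.

```python
import struct
import zlib

HEADER = struct.Struct("!HHI")  # assumed layout: dest net ID, src net ID, payload length
TRAILER = struct.Struct("!I")   # assumed layout: 32-bit checksum


def build_packet(dest: int, src: int, payload: bytes) -> bytes:
    """Send side: prepend the header, then append a checksum over header + payload."""
    header = HEADER.pack(dest, src, len(payload))
    checksum = zlib.crc32(header + payload)
    return header + payload + TRAILER.pack(checksum)


def receive_packet(packet: bytes):
    """Receive side: return the payload if the checksum matches, else None (drop it)."""
    if len(packet) < HEADER.size + TRAILER.size:
        return None
    _dest, _src, length = HEADER.unpack_from(packet, 0)
    body_end = HEADER.size + length
    if len(packet) != body_end + TRAILER.size:
        return None
    (checksum,) = TRAILER.unpack_from(packet, body_end)
    if zlib.crc32(packet[:body_end]) != checksum:
        return None  # corrupted: the sender would resend when its timer expires
    return packet[HEADER.size:body_end]


packet = build_packet(dest=2, src=1, payload=b"hello")
print(receive_packet(packet))
```

Returning None models "delete message"; in the protocol described above, the receiver would send an ACK only in the success case.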

43

Page 44: CS 110 Computer Architecture Lecture 24

What does it take to send packets across the globe?
• Bits on wire or air
• Packets on wire or air
• Deliver packets within a single physical network
• Deliver packets across multiple networks
• Ensure the destination received the data
• Create data at the sender and make use of the data at the receiver

Protocols for Networks of Networks?

44

Page 45: CS 110 Computer Architecture Lecture 24

Protocol for Networks of Networks?

Lots to do and at multiple levels!
Use abstraction to cope with complexity of communication

• Hierarchy of layers:
- Application (chat client, game, etc.)
- Transport (TCP, UDP)
- Network (IP)
- Data Link Layer (Ethernet)
- Physical Link (copper, wireless, etc.)

45

Page 46: CS 110 Computer Architecture Lecture 24

Protocol Family Concept
• Protocol: packet structure and control commands to manage communication
• Protocol families (suites): a set of cooperating protocols that implement the network stack
• Key to protocol families is that communication occurs logically at the same level of the protocol, called peer-to-peer…
…but is implemented via services at the next lower level
• Encapsulation: carry higher level information within lower level “envelope”

46

Page 47: CS 110 Computer Architecture Lecture 24

Inspiration…

• CEO A writes letter to CEO B
– Folds letter and hands it to assistant
• Assistant:
– Puts letter in envelope with CEO B’s full name
– Takes to FedEx
• FedEx Office
– Puts letter in larger envelope
– Puts name and street address on FedEx envelope
– Puts package on FedEx delivery truck
• FedEx delivers to other company

[Letter shown on slide:]
Dear John,
Your days are numbered.
--Pat

47

Page 48: CS 110 Computer Architecture Lecture 24

The Path of the Letter

[Figure: CEO, Aide, and FedEx on each side. CEO to CEO: Letter (Semantic Content). Aide to Aide: Envelope (Identity). FedEx to FedEx: Fedex Envelope (FE) (Location).]

“Peers” on each side understand the same things
No one else needs to
Lowest level has most packaging

48

Page 49: CS 110 Computer Architecture Lecture 24

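Protocol Family Concept

[Figure: at the top level, a Message passes logically between peers. At each level below, the message is wrapped in a header (H) and trailer (T) and again passes logically between peers. The actual transfer happens at the Physical level at the bottom.]

49

Each lower level of stack “encapsulates” information from layer above by adding header and trailer.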

Page 50: CS 110 Computer Architecture Lecture 24

Most Popular Protocol for Network of Networks

• Transmission Control Protocol/Internet Protocol (TCP/IP)
• This protocol family is the basis of the Internet, a WAN (wide area network) protocol
  • IP makes best effort to deliver
  • Packets can be lost, corrupted
  • TCP guarantees delivery
• TCP/IP so popular it is used even when communicating locally: even across homogeneous LAN (local area network)

50

Page 51: CS 110 Computer Architecture Lecture 24

TCP/IP packet, Ethernet packet, protocols

[Figure: the Message becomes TCP data behind a TCP Header; that becomes IP Data behind an IP Header; that is carried inside Ethernet framing (EH = Ethernet Hdr)]

• Application sends message
• TCP breaks into 64KiB segments, adds 20B header
• IP adds 20B header, sends to network
• If Ethernet, broken into 1500B packets with headers, trailers
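The layering on this slide can be sketched in Python using the slide's simplified numbers. The headers here are zero-filled placeholders rather than real TCP, IP, or Ethernet layouts, and the 14-byte Ethernet header and 4-byte trailer sizes are assumptions that the slide does not state.

```python
TCP_SEGMENT = 64 * 1024  # slide: TCP breaks the message into 64 KiB segments
TCP_HEADER = 20          # slide: 20 B TCP header
IP_HEADER = 20           # slide: 20 B IP header
ETH_PAYLOAD = 1500       # slide: 1500 B Ethernet packets
ETH_HEADER = 14          # assumption: not given on the slide
ETH_TRAILER = 4          # assumption: not given on the slide


def chunks(data: bytes, size: int):
    """Split data into consecutive pieces of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def encapsulate(message: bytes):
    """Wrap a message layer by layer and return the resulting Ethernet frames."""
    frames = []
    for segment in chunks(message, TCP_SEGMENT):
        tcp_packet = bytes(TCP_HEADER) + segment    # placeholder TCP header
        ip_packet = bytes(IP_HEADER) + tcp_packet   # placeholder IP header
        for piece in chunks(ip_packet, ETH_PAYLOAD):
            frames.append(bytes(ETH_HEADER) + piece + bytes(ETH_TRAILER))
    return frames


frames = encapsulate(bytes(100_000))
print(len(frames), len(frames[0]))
```

Each lower layer treats everything handed down from above as opaque payload, which is the encapsulation idea the slide illustrates.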

51

Page 52: CS 110 Computer Architecture Lecture 24

“And in conclusion…”
• I/O gives computers their 5 senses
• I/O speed range is 100-million to one
• Polling vs. Interrupts
• DMA to avoid wasting CPU time on data transfers
• Disks for persistent storage, replaced by flash
• Networks: computer-to-computer I/O
– Protocol suites allow networking of heterogeneous components. Abstraction!!!

52