Distributed systems
Lecture 1: Introduction to distributed systems; RPC
Lent 2016, Dr Robert N. M. Watson
(With thanks to Dr Steven Hand)

Recommended Reading
• “Distributed Systems: Concepts and Design” (5th Ed), Coulouris et al, Addison-Wesley 2012
• “Distributed Systems: Principles and Paradigms” (2nd Ed), Tanenbaum et al, Prentice Hall, 2006
• “Operating Systems, Concurrent and Distributed S/W Design”, Bacon & Harris, Addison-Wesley 2003
  – or “Concurrent Systems” (2nd Ed), Jean Bacon, Addison-Wesley 1997

What are Distributed Systems?
• A set of discrete computers (“nodes”) that cooperate to perform a computation
  – Operates “as if” it were a single computing system
• Examples include:
  – Compute clusters (e.g. CERN, HPCF)
  – BOINC (aka SETI@Home and friends)
  – Distributed storage systems (e.g. NFS, Dropbox, …)
  – The Web (client/server; CDNs; and back-end too!)
  – Peer-to-peer systems such as Tor
  – Vehicles, factories, buildings (?)

Concurrent systems reminder
• Foundations of concurrency: processor(s), ISAs, threads
• Mutual exclusion: locks, semaphores, monitors, etc.
• Producer-consumer, active objects, message passing
• Races, deadlock, livelock, starvation, priority inversion
• Transactions, ACID, isolation, serialisability, schedules
• 2-phase locking, rollback, time-stamp ordering (TSO), optimistic concurrency control (OCC)
• Durability, write-ahead logging, crash recovery
• Lock-free algorithms, transactional memory
• Operating-system case study

These problems were not difficult enough: distributed systems add loss of global visibility, loss of global ordering, and new failure modes.

Distributed Systems: Advantages
• Scale and performance
  – Cheaper to buy 100 PCs than a supercomputer…
  – …and easier to incrementally scale up too!
• Sharing and Communication
  – Allow access to shared resources (e.g. a printer) and information (e.g. distributed FS or DBMS)
  – Enable explicit communication between machines (e.g. EDI, CDNs) or people (e.g. email, Twitter)
• Reliability
  – Can hopefully continue to operate even if some parts of the system are inaccessible, or simply crash

Distributed Systems: Challenges
• Distributed Systems are Concurrent Systems
  – Need to coordinate independent execution at each node (cf. first part of course)
• Failure of any component (nodes, network)
  – At any time, for any reason
• Network delays
  – Can’t distinguish congestion from crash/partition
• No global time
  – Tricky to coordinate, or even agree on ordering!

Middleware
• Middleware helps application authors write software intended to run on more than one machine at a time.

[Figure: the same layered software stack on each machine (Machine A, Machine B, …), joined by a network. Layers, top to bottom:]
  – Distributed applications: what you actually wanted to do!
  – Middleware services: e.g., Java RMI
  – Local network/OS services: e.g., Java runtime
  – Kernel: e.g., Linux, BSD, Windows
  – Network: e.g., TCP/IP, Ethernet

Transparency & Middleware
• Recall a distributed system should appear “as if” it were executing on a single computer
• We often call this transparency:
  – User is unaware of multiple machines
  – Programmer is unaware of multiple machines
• How “unaware” can vary quite a bit
  – e.g. web user aware that there’s network communication... but not the number or location of the machines involved
  – e.g. programmer may explicitly code communication, or may have layers of abstraction: middleware

Classical types of Transparency

Transparency   Description
Access         Hide differences in data representation and how a resource is accessed
Location       Hide where a resource is located
Migration      Hide that a resource may move to another location
Relocation     Hide that a resource may be moved to another location while in use
Replication    Hide that a resource may be provided by multiple cooperating systems
Concurrency    Hide that a resource may be simultaneously shared by several competitive users
Failure        Hide the failure and recovery of a resource
Persistence    Hide whether a (software) resource is in memory or on disk

Scalability is increasingly important: “performance transparency”?

In this Course
• We will look at techniques, protocols & algorithms used in distributed systems
  – in many cases, these will be provided for you by a middleware software suite
  – but knowing how things work will still be useful!
• Assume OS & networking support
  – processes, threads, synchronization
  – basic communication via messages
  – (will see later how assumptions about messages will influence the systems we [can] build)
• Let’s start with simple client-server systems

Client-Server Model
• 1970s: development of Local Area Networks (LANs)
• 1980s: standard deployment involves small number of servers, plus many workstations
  – Servers: always-on, powerful machines
  – Workstations: personal computers
• Workstations request ‘service’ from servers over the network, e.g. access to a shared file-system.

Request-Reply Protocols
• Basic scheme:
  – Client issues a request message
  – Server performs operation, and sends reply
• Simplest version is synchronous:
  – client blocks awaiting reply
• Example: HTTP 1.0
  – Client (browser) sends “GET /index.html”
  – Web server fetches file and returns it
  – Browser displays HTML web page
• Later we will talk about asynchronous models:
  – Clients can continue work without blocking awaiting reply

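The synchronous scheme can be sketched in a few lines of Python. This is a toy line-based protocol, not real HTTP: the page table, the function names, and the use of socket.socketpair() as a stand-in for a network connection are all assumptions made for illustration.

```python
import socket
import threading

def recv_line(sock):
    """Read bytes from the socket until a newline (or end of stream)."""
    data = b""
    while not data.endswith(b"\n"):
        chunk = sock.recv(1024)
        if not chunk:
            break
        data += chunk
    return data.decode().strip()

def serve_one(conn):
    """Server side: read one request, perform the operation, send the reply."""
    pages = {"GET /index.html": "<html>hello</html>"}  # hypothetical content
    with conn:
        request = recv_line(conn)
        reply = pages.get(request, "404 Not Found")
        conn.sendall((reply + "\n").encode())

def request_reply(request):
    """Client side: send a request, then block until the reply arrives."""
    client, server = socket.socketpair()  # stands in for a network link
    worker = threading.Thread(target=serve_one, args=(server,))
    worker.start()
    with client:
        client.sendall((request + "\n").encode())
        reply = recv_line(client)  # the client blocks here awaiting the reply
    worker.join()
    return reply
```

Calling request_reply("GET /index.html") returns only once the server thread has replied, which is exactly the blocking behaviour that asynchronous models avoid.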
Handling Errors & Failures
• Errors are application-level things => easy ;-)
  – E.g. client requests non-existent web page
  – Need special reply (e.g. “404 Not Found”)
• Failures are system-level things, e.g.:
  – lost message, client/server crash, network down, …
• To handle failure, client must timeout if it doesn’t receive a reply within a certain time T
  – On timeout, client can retry request
  – (Q: what should we set T to?)

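A minimal sketch of timeout-and-retry on the client, in Python. The lossy channel is simulated rather than timed (nothing actually waits for T seconds), and the names Timeout, make_lossy_send, and call_with_retry are hypothetical.

```python
class Timeout(Exception):
    """Raised when no reply arrives within the time limit T."""

def make_lossy_send(drop_first_n):
    """Return a hypothetical send function that loses the first n exchanges."""
    state = {"calls": 0}

    def send(request, timeout):
        state["calls"] += 1
        if state["calls"] <= drop_first_n:
            raise Timeout(f"no reply within {timeout}s")
        return f"reply to {request}"

    return send

def call_with_retry(send, request, timeout=1.0, max_attempts=3):
    """Send the request; on timeout, retry up to max_attempts times in total."""
    for attempt in range(1, max_attempts + 1):
        try:
            return send(request, timeout), attempt
        except Timeout:
            # The client cannot tell whether the request, the server,
            # or the reply was lost; all it can do is try again.
            continue
    raise Timeout(f"gave up after {max_attempts} attempts")
```

The choice of max_attempts and T is a policy decision the sketch does not answer: too small a T causes needless retries, too large a T delays failure detection.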
Retry Semantics
• Client could timeout because:
  1. Request was lost
  2. Request was sent, but server crashed on receipt
  3. Request was sent & received, and server performed operation (or some of it?), but crashed before replying
  4. Request was sent & received, and server performed operation correctly, and sent reply… which was then lost
  5. As #4, but reply has just been delayed for longer than T
• For read-only stateless requests (like HTTP GET), can retry in all cases, but what if request was an order with Amazon?
  – In case #1, we probably want to re-order… and in case #5 we want to wait for a little bit longer, and otherwise we… erm?
• Worse: we don’t know what case it actually was!

Ideal Semantics
• What we want is exactly-once semantics:
  – Our request occurs once no matter how many times we retry (or if the network duplicates our messages)
• E.g. add a unique ID to every request
  – Server remembers IDs, and associated responses
  – If sees a duplicate, just returns old response
  – Client ignores duplicate responses
• Pretty tricky to ensure exactly-once in practice
  – e.g. if server explodes ;-)

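The unique-ID scheme can be sketched as follows. DedupServer and its deposit-style operation are hypothetical, and the reply table lives only in memory, so this sketch gives no protection at all if the server crashes.

```python
import uuid

class DedupServer:
    """Toy server that remembers request IDs and the replies it sent."""

    def __init__(self):
        self.balance = 0
        self.replies = {}  # request ID -> reply already sent

    def handle(self, request_id, amount):
        if request_id in self.replies:
            # Duplicate (a retry or a network duplicate): do not re-execute,
            # just return the old response.
            return self.replies[request_id]
        self.balance += amount  # perform the operation once
        reply = f"balance={self.balance}"
        self.replies[request_id] = reply
        return reply

server = DedupServer()
request_id = str(uuid.uuid4())          # client attaches a unique ID
first = server.handle(request_id, 10)   # original request
retry = server.handle(request_id, 10)   # retransmission of the same request
```

Both calls return the same reply and the balance changes only once. Surviving a crash would additionally require the table of IDs and replies to be stored persistently.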
Practical Semantics
• In practice, protocols guarantee one of:
• All-or-nothing (atomic) semantics
  – Use scheme on previous page; persistent log
  – (similar idea to transaction processing).
• At-most-once semantics
  – Request carried out once, or not at all
  – If no reply, we don’t know which outcome it was
  – e.g. send one request; give up on timeout
• At-least-once semantics
  – Retry on timeout; risk operation occurring again
  – OK if the operation is read-only, or idempotent
• Note: Assumption of no network duplication
[Slide annotations: “Server state not required”; “Server state required to suppress retries”]

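Why at-least-once is only safe for idempotent operations can be seen in a small simulation. The Account class and the at_least_once helper are hypothetical; lost replies are modelled by simply executing the operation again, which is what a retry after a lost reply amounts to on the server.

```python
class Account:
    def __init__(self):
        self.balance = 0

    def add(self, amount):
        """Not idempotent: every execution changes the outcome."""
        self.balance += amount
        return self.balance

    def set_balance(self, value):
        """Idempotent: repeating it leaves the same state."""
        self.balance = value
        return self.balance

def at_least_once(operation, *args, lost_replies=1):
    """Run operation as the server would see it when replies are lost."""
    for _ in range(lost_replies):
        operation(*args)      # server executed, but the reply never arrived
    return operation(*args)   # the retry whose reply finally gets through

a = Account()
added = at_least_once(a.add, 10)             # executed twice
b = Account()
assigned = at_least_once(b.set_balance, 10)  # executed twice, same state
```

With one lost reply, add(10) leaves a balance of 20 while set_balance(10) still leaves 10.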
Remote Procedure Call (RPC)
• Request/response protocols are useful, and widely used, but rather clunky to use
  – e.g. need to define the set of requests, including how they are represented in network messages
• A nicer abstraction is Remote Procedure Call (RPC)
  – Programmer simply invokes a procedure…
  – …but it executes on a remote machine (the server)
  – RPC subsystem handles message formats, sending & receiving, handling timeouts, etc
• Aim is to make distribution (mostly) transparent
  – Certain failure cases wouldn’t happen locally
  – Distributed and local function call performance different

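A minimal sketch of the RPC idea in Python: the caller writes what looks like an ordinary method call, and a proxy turns it into a request message. JSON as the message format, the RemoteProxy class, and the in-process server_dispatch "transport" are all assumptions for illustration; a real RPC subsystem would send the message over a network and handle timeouts.

```python
import json

def server_dispatch(message):
    """Server side: decode a request message and run the named procedure."""
    procedures = {
        "add": lambda a, b: a + b,
        "upper": lambda s: s.upper(),
    }
    request = json.loads(message)
    result = procedures[request["proc"]](*request["args"])
    return json.dumps({"result": result})

class RemoteProxy:
    """Client side: turns attribute access into request messages."""

    def __init__(self, transport):
        self._transport = transport  # callable: request str -> reply str

    def __getattr__(self, proc):
        def stub(*args):
            message = json.dumps({"proc": proc, "args": list(args)})
            reply = self._transport(message)  # would cross the network
            return json.loads(reply)["result"]
        return stub

remote = RemoteProxy(server_dispatch)
total = remote.add(2, 3)      # reads like a local call
shout = remote.upper("rpc")
```

The call site hides the messaging, but not the new failure cases: with a real transport, remote.add could time out in a way no local addition ever would.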
Marshalling Arguments
• RPC is integrated with the programming language
  – Some additional magic to specify things are remote
• RPC layer marshals parameters to the call, as well as any return value(s), e.g.

[Figure: message sequence across four columns: Caller, RPC Service, RPC Service, Remote Function. The caller invokes call(…) and the remote function runs fun(…). Numbered steps:]
  1) Marshal args
  2) Generate ID
  3) Send message
  4) Start timer
  5) Unmarshal args
  6) Record ID
  7) Marshal return values
  8) Send reply
  9) Set timer
  10) Unmarshal return values
  11) Acknowledge

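Steps 1, 2, and 5 of the sequence can be sketched with Python's struct module. The message layout here (request ID, procedure number, argument count, then 32-bit integer arguments, all big-endian) is an invented format for illustration, not the layout used by any particular RPC system.

```python
import itertools
import struct

_next_id = itertools.count(1)  # source of fresh request IDs

def marshal_call(proc_num, args):
    """Steps 1-2: pack a fresh request ID, procedure number, and int args."""
    request_id = next(_next_id)
    header = struct.pack(">III", request_id, proc_num, len(args))
    body = struct.pack(f">{len(args)}i", *args)
    return request_id, header + body

def unmarshal_call(message):
    """Step 5: recover the ID, procedure number, and args from the bytes."""
    request_id, proc_num, argc = struct.unpack_from(">III", message, 0)
    args = struct.unpack_from(f">{argc}i", message, 12)  # header is 12 bytes
    return request_id, proc_num, list(args)

request_id, message = marshal_call(7, [1, -2, 3])
decoded = unmarshal_call(message)
```

Both sides must agree on this layout in advance, which is the problem an interface definition language addresses.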
IDLs and Stubs
• To marshal, the RPC layer (on both sides!) must know:
  – how many arguments the procedure has,
  – how many results are expected, and
  – the types of all of the above
• The programmer must specify this by describing things in an Interface Definition Language (IDL)
  – In higher-level languages, this may already be included as standard (e.g. C#, Java)
  – In others (e.g. C), IDL is part of the middleware
• The RPC layer can then automatically generate stubs
  – Small pieces of code at client and server (see previous)
  – May also provide authentication, encryption
  – Provides integrity, confidentiality

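Stub generation can be illustrated with a Python dictionary standing in for an IDL. INTERFACE, make_client_stubs, and fake_server are hypothetical names; a real IDL compiler would emit source code rather than closures, but the information consumed is the same: argument count, argument types, and result type.

```python
INTERFACE = {  # hypothetical IDL: name -> (argument types, result type)
    "add": ((int, int), int),
    "greet": ((str,), str),
}

def make_client_stubs(interface, transport):
    """Generate one client stub per procedure, checking arity and types."""
    def make_stub(name, arg_types, result_type):
        def stub(*args):
            if len(args) != len(arg_types):
                raise TypeError(f"{name} expects {len(arg_types)} arguments")
            for value, expected in zip(args, arg_types):
                if not isinstance(value, expected):
                    raise TypeError(f"{name}: {value!r} is not {expected.__name__}")
            result = transport(name, args)
            if not isinstance(result, result_type):
                raise TypeError(f"{name}: bad result {result!r}")
            return result
        return stub
    return {name: make_stub(name, a, r) for name, (a, r) in interface.items()}

def fake_server(name, args):
    """Stand-in for the server-side stub plus the real procedures."""
    implementations = {
        "add": lambda a, b: a + b,
        "greet": lambda who: "hello " + who,
    }
    return implementations[name](*args)

stubs = make_client_stubs(INTERFACE, fake_server)
```

Because the stubs are derived from the interface description, a call with the wrong number or type of arguments is rejected on the client before any message is sent.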
Example: SunRPC
• Developed mid-80s for Sun Unix systems
• Simple request/response protocol:
  – Server registers one or more “programs” (services)
  – Client issues requests to invoke specific procedures within a specific service
• Messages can be sent over any transport protocol (most commonly UDP/IP and later TCP/IP)
  – Requests have a unique transaction ID that can be used to detect & handle retransmissions
  – At-least-once semantics
  – Various types of access transparency including byte-order

XDR: External Data Representation
• SunRPC used XDR for describing interfaces:

    // file: test.x
    program test {
        version testver {
            int get(getargs) = 1; // procedure number
            int put(putargs) = 2; // procedure number
        } = 1;                    // version number
    } = 0x12345678;               // program number

• rpcgen generates [un]marshaling code, stubs
• Single arguments… but recursively convert values
• Some support for following pointers too
• Data on the wire always in big-endian format (oops!)

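The wire format can be illustrated by hand-encoding two XDR types in Python. This sketch follows the encoding rules of the XDR standard (RFC 4506) for 32-bit integers and strings; it is not the code rpcgen generates, and the function names are hypothetical.

```python
import struct

def xdr_int(value):
    """Encode a signed 32-bit integer as 4 big-endian bytes."""
    return struct.pack(">i", value)

def xdr_string(text):
    """Encode a string as a 4-byte length, the bytes, then zero padding
    up to a multiple of 4 bytes."""
    data = text.encode("ascii")
    padding = (4 - len(data) % 4) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * padding

def xdr_unpack_int(buf, offset=0):
    """Decode one integer; return it with the offset of the next item."""
    return struct.unpack_from(">i", buf, offset)[0], offset + 4

one = xdr_int(1)
hello = xdr_string("hello")
```

Here one is the bytes 00 00 00 01, and hello is 12 bytes long: a length of 5, the five characters, and three bytes of padding. On a little-endian machine every integer must be byte-swapped on the way in and out, which is presumably the point of the "(oops!)".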
Using SunRPC
1. Write XDR, and use rpcgen to generate skeleton code
2. Fill in blanks (i.e. write client/server parts), compile code
3. Run server program & register with portmapper (now: rpcbind)
  – Mappings from {prog#, ver#, proto} -> port
  – (on Linux/UNIX, try “/usr/sbin/rpcinfo -p”)
  – Portmapper is itself an RPC service on a well-known port
4. Server process will then listen(), awaiting clients
5. When a client starts, client stub calls clnt_create()
  – Sends {prog#, ver#, proto} to portmapper on server, receives appropriate port number to use for actual RPC connection
  – Client invokes remote procedures as needed
6. Recently: GSS authentication/encryption
  – e.g., Kerberos

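The registration and lookup in steps 3 and 5 amount to a table keyed by {prog#, ver#, proto}. The sketch below is an in-memory model in Python, not the portmapper protocol itself; ToyPortmapper and the port number 40001 are made up, and returning 0 for an unknown program is a convention chosen for this sketch.

```python
class ToyPortmapper:
    """In-memory stand-in for portmapper/rpcbind."""

    def __init__(self):
        self._table = {}  # (prog, ver, proto) -> port

    def register(self, prog, ver, proto, port):
        """Step 3: a server announces where its program is listening."""
        self._table[(prog, ver, proto)] = port

    def lookup(self, prog, ver, proto):
        """Step 5: a client asks which port to use (0 = not registered)."""
        return self._table.get((prog, ver, proto), 0)

portmapper = ToyPortmapper()
portmapper.register(0x12345678, 1, "udp", 40001)  # server start-up
port = portmapper.lookup(0x12345678, 1, "udp")    # what clnt_create() needs
missing = portmapper.lookup(0x12345678, 2, "udp") # version 2 never registered
```

The indirection means only the portmapper needs a well-known port; each service can listen wherever the operating system places it.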
Summary + next time
• About this course
• Advantages and challenges of distributed systems
• Types of transparency (+ scalability)
• Middleware, the client-server model
• Errors and retry semantics
• RPC, marshalling, SunRPC, and XDR

• Sun’s Network File System (NFS)
• Object-Oriented Middleware (OOM)