
Programming Patterns and Tools for Cloud Computing

SeSe ACC


Basic service models: IaaS, PaaS, SaaS

• IaaS: Amazon Web Services, HP Helion, Rackspace, SNIC Cloud
• PaaS: Google App Engine, OpenShift, Cloud Foundry, …
• SaaS: Google Docs, Office 365, Dropbox, StochSS

Figure credit: Sam Johnson, http://creativecommons.org/licenses/by-sa/3.0/

Virtual Appliances

Installation of a complete software stack packaged as virtual machine images.

• Distributed as files (image formats) or directly in a cloud environment
• Typically used to package up a single application as a VMI, for simple distribution
• Common way to make SaaS out of a legacy application
• Can easily move between different hosting environments (public/private cloud, VirtualBox, VMware, etc.)
• Users typically deploy and use them as virtual private resources

https://en.wikipedia.org/wiki/Virtual_appliance

Platform as a Service (PaaS)

The capability provided to the consumer is to deploy onto the cloud infrastructure consumer-created or acquired applications created using programming languages, libraries, services, and tools supported by the provider. The consumer does not manage or control the underlying cloud infrastructure including network, servers, operating systems, or storage, but has control over the deployed applications and possibly configuration settings for the application-hosting environment.

A major aim of PaaS is to aid in the development of “cloudy” applications by:

• Reducing the burden of explicitly managing IaaS
• Cloud interoperability (a goal for some of them)
• Enforcing programming models/frameworks that lead to designs that make good use of the cloud model

Directly consuming IaaS gives great flexibility and control but may be too complicated for the average user. Platform as a Service aims at simplifying application development by abstracting away IaaS. Also, in some cases, by enforcing certain environments and programming models, the platform can provide added value, particularly autoscaling (see Johan Tordsson’s lectures).

Example: Google App Engine

• Lets you easily build scalable web applications
• Develop in a local SDK sandbox, then deploy to Google Cloud with no code modification
• The platform will autoscale your web service to meet increased load and traffic
• Supports different languages, e.g. Python, Java, Go, and your favorite frameworks like Django, Jinja2, etc.
• But not all possible Python packages (sandbox)
• Works really well for many (traditional) web applications
• Sandbox can be limiting for non-typical applications, like those from CSE
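To make the "traditional web application" case concrete, here is a minimal sketch of the kind of stateless request handler a Python PaaS can replicate behind a load balancer. It uses the standard WSGI calling convention; the function name `application` and the response text are illustrative assumptions, not App Engine specifics.

```python
def application(environ, start_response):
    """Minimal WSGI app: a stateless handler that can be replicated freely."""
    path = environ.get("PATH_INFO", "/")
    body = f"Hello from {path}\n".encode("utf-8")
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]

# To try it locally with the standard library (blocks until interrupted):
#   from wsgiref.simple_server import make_server
#   make_server("localhost", 8080, application).serve_forever()
```

Because the handler keeps no state between requests, the platform is free to start or stop copies of it as load changes.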

Example: CloudFoundry

http://12factor.net/

https://www.cloudfoundry.org/platform/

Example: OpenShift

https://www.openshift.org/

From Red Hat. Similar goals to Cloud Foundry, based on Docker and Kubernetes.

Software as a Service

The capability provided to the consumer is to use the provider’s applications running on a cloud infrastructure. The applications are accessible from various client devices through either a thin client interface, such as a web browser (e.g., web-based email), or a program interface. The consumer does not manage or control the underlying cloud infrastructure including network, servers, operating systems, storage, or even individual application capabilities, with the possible exception of limited user-specific application configuration settings.

Case study: MOLNs and StochSS

www.molns.org, github.com/MOLNs/molns: cloud virtual appliance/platform for large-scale simulations with PyURDME

The tasks: Spatial Stochastic Simulations in Systems Biology

PyURDME (www.pyurdme.org) is a modeling and simulation library for stochastic reaction-diffusion simulations of biochemical networks.

• Stochastic process simulated on a computational mesh.
• Applications include studying gene regulation and yeast polarization.
• Output is a time series of spatial data, X.

What problems did we want to solve?

1. PyURDME simulations, in particular parameter sweeps, require large computational resources.

X is just one possible outcome; we need to generate large ensembles of realizations to conduct a statistical analysis.

1k-1M trajectories -> 10*1000 ~ 10 Gb data

Falls in the categories “High-Throughput Computing (HTC)” and “Many-Task Computing (MTC)”
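The ensemble pattern can be sketched in a few lines. This is a toy stand-in, not PyURDME: `simulate_trajectory` is a hypothetical 1D random walk, and a thread pool stands in for a pool of cloud workers. The point is only that each realization depends on nothing but its seed, which is what makes the workload many-task.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from statistics import mean

def simulate_trajectory(seed, n_steps=100):
    """Toy stand-in for one stochastic realization X: a 1D random walk."""
    rng = random.Random(seed)  # independent, reproducible stream per task
    x = 0
    path = []
    for _ in range(n_steps):
        x += rng.choice((-1, 1))
        path.append(x)
    return path

def run_ensemble(n_trajectories, n_steps=100, max_workers=4):
    """Run independent realizations; each task needs only its seed."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda s: simulate_trajectory(s, n_steps),
                             range(n_trajectories)))

ensemble = run_ensemble(200)
final_positions = [path[-1] for path in ensemble]
print("ensemble mean of final position:", mean(final_positions))
```

Statistics are computed over the whole ensemble afterwards, so the only synchronization point is the final gather.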

Potential solution: use HPC clusters

We did that for a while, and used them to make large sweeps and write papers ourselves. Then we started to think about the problem of scientific software. There were problems:

• The modeling process really benefits from interactivity, but HPC resources are not for that.
• It was daunting to try to write software to work with any type of university cluster.
• The software stack needed is complex and fragile and would likely be very time-consuming to install in the HPC environment (leading to long wait times for users).
• Computational experiments become hard to reproduce since they rely on specific resources and organizations.

Components in the implementation

Cloud infrastructure: Amazon EC2, HP Helion, OpenStack

• PyURDME: spatial stochastic modeling and simulation engine
• molnsclient: automatic setup and provisioning of a distributed computational environment, the “virtual lab”
• molnsutil: library for scalable parallel execution of simulation engines; unified data management in the cloud environments
• IPython project: web-based notebook front-end to Python; sharable, computable documents; parallel computing via IPython.parallel

[Diagram: the MOLNs stack, with simulation engines and data analysis frameworks]

Summary of Experiences so far

• Building a virtual appliance (VM images with the software stack for one OS only) reduces time to release, which is critical for a small science team.

• Some parts were easy, essentially just deploying existing software in a virtual environment (e.g. IPython).

• Special care was needed for data transport and for storage abstractions.

• As the stack grows, the large monolithic virtual appliance/image is starting to give us problems. We are now looking into alternative technologies as a remedy, e.g. containers and more extensive use of orchestration tools.

What we have not covered in this course that is important for developing SaaS

• Scalable web sites/services
• Frameworks
• Sessions
• Authentication
• Databases
• Security

In practice, this is an integral part of almost all cloud computing software projects, but here many sources of information, tools and frameworks (Webapp2, Django, Flask, ...) and PaaS offerings (GAE, …) exist that will make your life fairly easy.

“I need my application to scale”

• What do we mean by a scalable CSE application?

Tightly coupled applications?

• Common in CSE; typical examples are Partial Differential Equation (PDE) solvers
• Communication over block boundaries

HPC programming models

• Message Passing Interface (distributed memory)
• OpenMP, Pthreads, etc. (shared memory)
• CUDA, OpenCL (GPGPU)
• FPGA

• Low-latency interprocess communication is critical to application performance
• Specialized hardware/instruction sets
• “Every cycle counts”: users of HPC become concerned about VM overhead
• Failed tasks cause major problems/performance losses
• Compute infrastructure is organized accordingly
• High-Performance Computing (HPC) cluster resources: wait in a queue to obtain dedicated, direct access to high-bandwidth hardware resources

“Cloud computing doesn’t scale”

When you hear those types of statements, they typically refer to conceptions about traditional communication-intensive HPC applications, typically implemented with MPI.

• Refers to the low-performing network between instances in public clouds
• Does not typically account for tailored HPC cluster offerings and placement groups (available in e.g. EC2)

Represents a very narrow view of what applications and programming models are admissible.

StarCluster deploys MPI clusters over AWS: http://star.mit.edu/cluster/

Vertical vs. Horizontal Scaling

Vertical scaling (“scale-up”): Increase the capacity of your servers to meet application demand.

• Add more vCPUs
• Add more RAM/storage
• Fundamentally limited by the HW capacity of the underlying host
• Cost increases sharply for high-end machines

Horizontal scaling (“scale-out”): Meet increased application demand by adding more servers/storage/etc.

• Also has limitations, but there is a lot of work going on to improve scale-out properties
• Cost of adding another machine of the same (commodity) type scales linearly

Horizontal scaling is typically what we are most interested in in Cloud Computing.

Strong and Weak Scaling

Strong scaling: How does the execution time decrease as a function of increased resources, for a fixed problem size?

How quickly can I solve a problem of this size by adding more HW?

Weak scaling: How does the execution time behave as a function of increased resources, for an increasing problem size?

How much can I grow my problem size and still maintain the same execution time?
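Both questions are usually answered from measured wall-clock times. The sketch below uses the standard definitions: strong-scaling speedup S(p) = T(1)/T(p) with efficiency S(p)/p, and weak-scaling efficiency T(1)/T(p) when the problem size grows in proportion to p. The timings are made-up numbers for illustration only.

```python
def strong_scaling(times):
    """times: {n_workers: wall time} for a FIXED problem size.
    Returns {n_workers: (speedup, parallel efficiency)}."""
    t1 = times[1]
    return {p: (t1 / t, t1 / (t * p)) for p, t in sorted(times.items())}

def weak_scaling(times):
    """times: {n_workers: wall time} where the problem size grows
    proportionally with n_workers. Returns {n_workers: efficiency}."""
    t1 = times[1]
    return {p: t1 / t for p, t in sorted(times.items())}

# Hypothetical measurements in seconds (illustrative only).
strong_times = {1: 100.0, 2: 52.0, 4: 28.0, 8: 17.0}
weak_times = {1: 100.0, 2: 104.0, 4: 110.0, 8: 125.0}

for p, (speedup, eff) in strong_scaling(strong_times).items():
    print(f"strong p={p}: speedup={speedup:.2f}, efficiency={eff:.2f}")
for p, eff in weak_scaling(weak_times).items():
    print(f"weak   p={p}: efficiency={eff:.2f}")
```

An efficiency close to 1 means the added resources are being used well; a steady drop points to communication or serial bottlenecks.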

Scalability and Elasticity

Capabilities can be elastically provisioned and released, in some cases automatically, to scale rapidly outward and inward commensurate with demand. To the consumer, the capabilities available for provisioning often appear to be unlimited and can be appropriated in any quantity at any time.

Design principles to allow for horizontal scaling and elasticity?

(Discuss/brainstorm in small groups for 5 min)

Strive for statelessness of task/component

[Diagram: Controller and Worker exchanging data and task input/output]

“Growing is easier than shrinking”

But of course, all computations/applications need data and state. It is not so much about if you store it as it is about where you store it.

Ideally you want your task to be able to execute anywhere, at any time.

• Robustness
• Can increase or decrease the number of workers
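A minimal sketch of what a stateless task can look like. The dictionary `store` is a hypothetical stand-in for an external store, and the key layout `input/<id>`, `output/<id>` is an assumption made for this example. The task reads everything it needs by key and writes its result back, so any worker can run it, in any order, and it can be rerun after a failure.

```python
def run_task(task_id, store):
    """Stateless task: all input comes from the store, all output goes back.
    Nothing is kept on the worker between calls."""
    data = store[f"input/{task_id}"]
    result = sum(x * x for x in data)
    store[f"output/{task_id}"] = result
    return result

# Stand-in for an external store (hypothetical; not a real cloud API).
store = {f"input/{i}": list(range(i + 1)) for i in range(4)}

# Tasks can run in any order, on any worker.
for task_id in (3, 1, 0, 2):
    run_task(task_id, store)

# Rerunning a task (e.g. after a worker was lost) gives the same result.
run_task(3, store)
print({k: v for k, v in store.items() if k.startswith("output/")})
```

Removing a worker then loses no data, only the work in flight, which can simply be resubmitted.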

[Diagram: Controller and Worker]

• Is the controller machine designed for scalable storage?
• High pressure on a critical component (robustness)
• The network is a potential performance bottleneck

Use the object store when you can

[Diagram: Controller, Worker, and object store (S3, Swift)]

S3, Swift: key-value storage, designed for horizontal scaling

In this kind of master-slave setup the controller will still maintain a lot of information, route tasks, etc.

Why even store task information on the controller? Performance (latency) is one good reason.

Decouple the lifetime of compute servers and data!

• Needed for elasticity
• Servers are cattle (http://www.slideshare.net/randybias/pets-vs-cattle-the-elastic-cloud-story)
• Data can be precious

Another option for robustness: store replicas of data over multiple hosts in a distributed filesystem.

• HDFS (Hadoop, Big Data)
• Does not decouple compute/storage lifetimes as easily
• Designed for extreme I/O throughput

Asynchronous Tasks

Aim to execute tasks in non-blocking mode, and have as few global synchronization points as possible.
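The non-blocking idea can be illustrated with Python's standard `concurrent.futures`: tasks are submitted without waiting, and results are handled as each one finishes instead of at a global barrier. The task body and durations below are placeholders; in a cloud setting a task queue would play the role of the thread pool.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def task(task_id, duration):
    """Placeholder for real work; sleeps to mimic uneven task run times."""
    time.sleep(duration)
    return task_id, duration

durations = {0: 0.3, 1: 0.05, 2: 0.15}  # illustrative values
completed = []

with ThreadPoolExecutor(max_workers=3) as pool:
    # submit() returns a Future immediately; the caller is not blocked.
    futures = [pool.submit(task, tid, d) for tid, d in durations.items()]
    # Handle each result as soon as it is ready, in completion order.
    for future in as_completed(futures):
        task_id, _ = future.result()
        completed.append(task_id)

print("completion order:", completed)
```

Short tasks are processed without waiting for the slowest one, so a single slow worker does not stall the whole flow.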

Example: Celery/RabbitMQ (Lab3)

• Celery: distributed task queue
• Redis: key-value store database

http://abhishek-tiwari.com/post/amqp-rabbitmq-and-celery-a-visual-guide-for-dummies#fn:1

Our favorite example: MOLNs

PyURDME (www.pyurdme.org) is a modeling and simulation library for stochastic reaction-diffusion simulations of biochemical networks.

We have seen how we worked with automation, orchestration, etc. to make things into a cloud service; let’s look a bit at the computational problem.

What problems did we want to solve?

1. PyURDME simulations, in particular parameter sweeps, require large computational resources.

• Monte Carlo
• Tasks are numerous
• But independent

HPC, HTC or MTC? Mostly MTC.

But a lot of the time we spend building and debugging models, or developing solvers. For this, we don’t need big resources, but user interactivity is invaluable.

Interactive Computation

Personal opinion: One of the more exciting things about cloud computing technologies from the CSE perspective is the opportunity to enable interactive parallel computing.

• Non-interactive: obtain the complete output when the job is done.
• Interactive: interact with partial output as the computation proceeds.

MOLNs revisited


Architecture

[Architecture diagram. Labels: Web Browser; Molns Client (infrastructure deployment); Controller (IPython Notebook Server, IPython Controller); Workers (IPython Engine(s)); SharedStorage; PersistentStorage (S3 / Swift)]

IPython Parallel

Two types of views to your cluster:

• direct_view
  • Supports explicit mapping of data/tasks to engines (workers)
  • Run tasks on specific machines
  • MPI
• load_balanced_view
  • Asynchronous task queue (uses ZeroMQ)

Engines always block on execution. A set of schedulers provides an asynchronous interface to clients.

IPython Parallel

• Lets you work with many models simultaneously
• Interactively from IPython (Notebook)
• Ideal for rapid prototyping/simple parallel flows

IPparallel in clouds: does not attempt to solve the problem of data management. That is one of the goals of ‘molnsutil’.

In molnsutil we have abstractions for storage systems: SharedStorage(), PersistentStorage(), LocalStorage()
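The idea behind such storage abstractions can be sketched as a common put/get interface with interchangeable backends. The classes below are hypothetical illustrations written for this example; they are not the molnsutil API, and the method names `put` and `get` are assumptions.

```python
import os
import pickle
import tempfile

class InMemoryStorage:
    """Hypothetical backend: stands in for a remote key-value object store."""
    def __init__(self):
        self._objects = {}

    def put(self, name, data):
        self._objects[name] = pickle.dumps(data)

    def get(self, name):
        return pickle.loads(self._objects[name])

class DirectoryStorage:
    """Hypothetical backend: stores each object as a file under one directory."""
    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, name):
        return os.path.join(self.root, name)

    def put(self, name, data):
        with open(self._path(name), "wb") as fh:
            pickle.dump(data, fh)

    def get(self, name):
        with open(self._path(name), "rb") as fh:
            return pickle.load(fh)

def save_result(storage, name, result):
    """Analysis code depends only on put/get, not on where the data lives."""
    storage.put(name, result)

with tempfile.TemporaryDirectory() as tmp:
    for backend in (InMemoryStorage(), DirectoryStorage(tmp)):
        save_result(backend, "trajectory_0", {"mean": 1.5, "n": 100})
        print(type(backend).__name__, backend.get("trajectory_0"))
```

With a shared interface, the same notebook code can write to local, shared or persistent storage by swapping one object.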

Programming Patterns and Tools for Cloud Computing

Part 2: “Big Data”

What is “Big Data”

Not easy to define just based on size:

• Moving target: today’s “big” might be tomorrow’s “small”
• Right now, maybe multiple TBs to PBs
• Depends on the type of data (structured, unstructured)
• Depends on the “velocity” of the data (streaming data)

Relative to state-of-the-art commodity systems: the problem is “Big Data” if it can’t be stored, handled, or processed using commodity systems or traditional RDBMS solutions. Typically, single workstations/disks are not sufficient. More about this in Lectures 2 and 3.

The three Vs: Volume, Velocity, Variety

The three Vs, Volume, Velocity and Variety

http://www.datasciencecentral.com/forum/topics/the-3vs-that-define-big-data

Volume

Big Data problems involve large datasets, and computations typically become I/O bound:

• Hard drive capacity to (cheaply) store large amounts of data has increased rapidly, but read/write speeds have not.

-> Distribute datasets over multiple disks/machines and process them in parallel.

This poses fundamental requirements on the systems that support such analysis (more about this in Lectures 2 and 3).
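A back-of-the-envelope calculation shows why distributing the data matters. The dataset size (1 TB) and the sequential read rate (100 MB/s per disk) are illustrative assumptions, not measurements, and the estimate ignores network and coordination overhead.

```python
def scan_time_seconds(dataset_bytes, read_bytes_per_s, n_disks=1):
    """Time to read the whole dataset once, assuming it is spread evenly
    over n_disks that are read in parallel (idealized)."""
    return dataset_bytes / (read_bytes_per_s * n_disks)

TB = 10**12
MB = 10**6

single = scan_time_seconds(1 * TB, 100 * MB)
parallel = scan_time_seconds(1 * TB, 100 * MB, n_disks=100)

print(f"1 disk:    {single / 3600:.1f} h")   # 2.8 h
print(f"100 disks: {parallel:.0f} s")        # 100 s
```

Under these assumptions a single full scan drops from hours to minutes, which is the motivation for bringing the computation to where each block of data is stored.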

1000 Genomes project

Today: DNA sequences from ~2600 individuals -> more than 460 TB of raw data (full genomic sequences), and growing.

Aim: create an atlas of human genetic variation

• Find biomedically relevant DNA variations
• Understand simple and complex traits, and the in-betweens
• Capture genetic variations that are found in at least 1% of the human population


Fig. 3. Cloud computing usage in big data.

Ibrahim Abaker Targio Hashem, Ibrar Yaqoob, Nor Badrul Anuar, Salimah Mokhtar, Abdullah Gani, Samee Ullah Khan

The rise of “big data” on cloud computing: Review and open research issues

Information Systems, Volume 47, 2015, 98–115

http://dx.doi.org/10.1016/j.is.2014.07.006

Frameworks for Big Data

Problems with very large datasets require special frameworks:

• Hadoop
• Hive
• Spark
• …

This is the focus of “Large Datasets for Scientific Applications”, which can be followed as a PhD course (March 28 to end of semester).

Source: The rise of “big data” on cloud computing: Review and open research issues

What is Hadoop?

The original implementation of MapReduce is by Google. (Read “googlemapreduce.pdf” in the portal.)

Hadoop is an open source implementation inspired by Google’s original framework.

First developers: Doug Cutting and Mike Cafarella, at that time at Yahoo!. Named after Cutting’s son’s toy elephant.

Hadoop

Today, the Hadoop ecosystem is quite large and contains many analytics tools. The core components are:

• HDFS (distributed filesystem)
• Hadoop Common (libraries, utilities, etc.)
• YARN (resource management platform, not in 1.2.1)
• MapReduce

What is MapReduce?

• A programming model for writing (simple) (data) parallel programs
• A framework to execute such programs

Advantages: if you manage to express your problem as MapReduce, the framework handles:

• Data distribution (HDFS)
• Fault tolerance (on many levels)
• Brings computations close to data*
• Runs on commodity hardware
• Scales to 1000s of nodes (failure is inevitable)
• Simple programming model (when you know it…)

The data (and the cluster) should be very large for the advantages to become apparent.

* It was designed for the case of commodity HW without high-performance network interconnects. You should not think of HDFS as an “HPC parallel filesystem”. You don’t communicate data to compute, you communicate compute to data.

Basic design

[Diagram from http://en.wikipedia.org/wiki/Apache_Hadoop]

• Handles jobs, tasks, and the logic of computing close to data
• Distributed filesystem, HDFS

Resources

There are a couple of excellent basic tutorials for getting started playing with Hadoop on single-node clusters or small distributed clusters (yes, this is fun; I recommend it if you have the time and some Linux (admin) experience):

http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/

http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/

Also, it is rather easy to set up a “toy Hadoop” on your own laptop:

https://hadoop.apache.org/docs/r1.2.1/single_node_setup.html

Setting up Hadoop in Production

This is a complex process. In practice, there are many vendors that provide hosted services, or projects that make it simpler, e.g.

• Cloudera
• Hortonworks
• Elastic MapReduce (EMR) in AWS
• OpenStack Sahara
• Apache Bigtop

-> More important to get familiar with MapReduce.

https://developers.google.com/appengine/docs/python/dataprocessing/

So how do you use it?

The framework provides optimized sort and shuffle.

You provide the map and reduce functions.

Key stages of MapReduce

1. Split input
2. Map
3. Shuffle
4. Reduce
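The four stages can be mimicked in plain Python on a word-count problem. This is a single-process sketch of the programming model, not of Hadoop itself: the function names are chosen for this example, and a real framework would run the map and reduce calls on different machines and do the shuffle over the network.

```python
from collections import defaultdict

def split_input(text, n_splits):
    """1. Split: divide the input lines into independent chunks."""
    lines = text.splitlines()
    return [lines[i::n_splits] for i in range(n_splits)]

def map_phase(lines):
    """2. Map: emit a (key, value) pair for every word."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1

def shuffle(mapped_pairs):
    """3. Shuffle: group all values by key (sorted, as the framework does)."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))

def reduce_phase(key, values):
    """4. Reduce: combine all values for one key."""
    return key, sum(values)

def word_count(text, n_splits=2):
    splits = split_input(text, n_splits)
    mapped = [pair for split in splits for pair in map_phase(split)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())

text = "the cloud scales out\nthe data is big\nbig data in the cloud"
counts = word_count(text)
print(counts)
```

Only `map_phase` and `reduce_phase` contain problem-specific logic; splitting and shuffling are generic, which is why a framework can take them over.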

Hadoop Streaming

What if I don’t want to use Java?

The Hadoop Streaming API lets you write the mapper and reducer as standalone programs, reading from stdin and writing to stdout. Your mapper and reducer can now be written in Python, Ruby, Bash, R, you name it.

Logically, it corresponds to the following Unix pipe:

cat input.txt | ./map | sort | ./reduce > output.txt
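A sketch of what a word-count mapper and reducer for such a pipe could look like in Python. Both work on lines of text with tab-separated key and value, and the reducer relies on its input being sorted by key. They are written as functions over iterables so they can be tried locally; `run_pipe` is a hypothetical helper that plays the role of the Unix pipe. In an actual streaming job each function would sit in its own script that reads `sys.stdin` and prints its output lines.

```python
from itertools import groupby

def mapper(lines):
    """Emit 'word<TAB>1' for every word in the input lines."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reducer(sorted_lines):
    """Sum the counts per word. Requires input sorted by key, so that
    all lines for one word are adjacent."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"

def run_pipe(lines):
    """Local equivalent of: cat input | ./map | sort | ./reduce"""
    return list(reducer(sorted(mapper(lines))))

for out in run_pipe(["big data in the cloud", "big cloud"]):
    print(out)
```

The sort between the two stages is what guarantees that each reducer sees all values for a key together.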

Performance guidelines

• There is a significant latency in starting map tasks by the framework
• Maps should do considerable work (rule of thumb: work for at least 1 min)
• Default block size is 64-128 MB; input files should be large (GB-sized)
• Works best for “one-shot” computations, not iterative flows, due to the large latency
• Use a Combiner (local reduce) to reduce the network overhead for shuffle/reduce
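The effect of a combiner can be seen by counting how many key-value pairs would have to be sent to the reducers with and without it. This is a toy, single-process sketch with made-up input splits; the function names are chosen for this example.

```python
from collections import Counter

def map_words(lines):
    """Mapper output for one input split: one (word, 1) pair per word."""
    return [(word, 1) for line in lines for word in line.split()]

def combine(pairs):
    """Combiner: a local reduce on a single mapper's output."""
    local = Counter()
    for word, count in pairs:
        local[word] += count
    return sorted(local.items())

# Two input splits, each processed by its own (hypothetical) mapper.
splits = [["big data big data big"], ["data big cloud"]]

raw = [map_words(split) for split in splits]
combined = [combine(pairs) for pairs in raw]

pairs_without = sum(len(pairs) for pairs in raw)
pairs_with = sum(len(pairs) for pairs in combined)

# The final reduce gives the same totals either way.
total = Counter()
for part in combined:
    for word, count in part:
        total[word] += count

print("pairs shuffled without combiner:", pairs_without)
print("pairs shuffled with combiner:   ", pairs_with)
print(dict(total))
```

This only works because summation is associative and commutative; a reduce function without that property cannot be reused as a combiner as-is.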


Criticism against MapReduce

• Restricted programming model
• Performance
• Not novel

http://homes.cs.washington.edu/~billhowe/mapreduce_a_major_step_backwards.html

An old but still interesting discussion:

http://blog.tonybain.com/tony_bain/2010/09/was-stonebraker-right.html

Apache Spark
