
Introduction to Python
dsc.soic.indiana.edu/publications/CloudComputing-Python.pdf


INTRODUCTION TO PYTHON

Gregor von Laszewski

(c) Gregor von Laszewski, 2018, 2019


1 PREFACE
  1.1 Disclaimer
    1.1.1 Acknowledgment
    1.1.2 Extensions
2 INTRODUCTION
  2.1 Introduction to Python
    2.1.1 References
3 INSTALLATION
  3.1 Python 3.7.4 Installation
    3.1.1 Hardware
    3.1.2 Prerequisites Ubuntu 19.04
    3.1.3 Prerequisites macOS
      3.1.3.1 Installation from Apple App Store
      3.1.3.2 Installation from python.org
      3.1.3.3 Installation from Homebrew
    3.1.4 Prerequisites Ubuntu 18.04
    3.1.5 Prerequisite Windows 10
      3.1.5.1 Linux Subsystem Install
    3.1.6 Prerequisite venv
    3.1.7 Install Python 3.7 via Anaconda
      3.1.7.1 Download conda installer
      3.1.7.2 Install conda
      3.1.7.3 Install Python 3.7.4 via conda
  3.2 Multi-Version Python Installation
    3.2.1 Disabling wrong python installs
    3.2.2 Managing 2.7 and 3.7 Python Versions without Pyenv
    3.2.3 Managing Multiple Python Versions with Pyenv
      3.2.3.1 Installation pyenv via Homebrew
      3.2.3.2 Install pyenv on Ubuntu 18.04
      3.2.3.3 Using pyenv
        3.2.3.3.1 Using pyenv to Install Different Python Versions
        3.2.3.3.2 Switching Environments
      3.2.3.4 Updating Python Version List
        3.2.3.4.1 Updating to a new version of Python with pyenv
    3.2.4 Anaconda and Miniconda and Conda
      3.2.4.1 Miniconda
      3.2.4.2 Anaconda
    3.2.5 Exercises
4 FIRST STEPS
  4.1 Interactive Python
    4.1.1 REPL (Read Eval Print Loop)
    4.1.2 Interpreter
    4.1.3 Python 3 Features in Python 2
  4.2 Editors
    4.2.1 Pycharm
    4.2.2 Python in 45 minutes
  4.3 Google Colab
    4.3.1 Introduction to Google Colab
    4.3.2 Programming in Google Colab
    4.3.3 Benchmarking in Google Colab with Cloudmesh
5 LANGUAGE
  5.1 Language
    5.1.1 Statements and Strings
    5.1.2 Comments
    5.1.3 Variables
    5.1.4 Data Types
      5.1.4.1 Booleans
      5.1.4.2 Numbers
    5.1.5 Module Management
      5.1.5.1 Import Statement
      5.1.5.2 The from…import Statement
    5.1.6 Date Time in Python
    5.1.7 Control Statements
      5.1.7.1 Comparison
      5.1.7.2 Iteration
    5.1.8 Datatypes
      5.1.8.1 Lists
      5.1.8.2 Sets
      5.1.8.3 Removal and Testing for Membership in Sets
      5.1.8.4 Dictionaries
      5.1.8.5 Dictionary Keys and Values
      5.1.8.6 Counting with Dictionaries
    5.1.9 Functions
    5.1.10 Classes
    5.1.11 Modules
    5.1.12 Lambda Expressions
      5.1.12.1 map
      5.1.12.2 dictionary
    5.1.13 Iterators
    5.1.14 Generators
      5.1.14.1 Generators with function
      5.1.14.2 Generators using for loop
      5.1.14.3 Generators with List Comprehension
      5.1.14.4 Why to use Generators?

6 CLOUDMESH
  6.1 Introduction
  6.2 Installation
    6.2.1 Prerequisite
    6.2.2 Basic Install
  6.3 Output
    6.3.1 Console
    6.3.2 Banner
    6.3.3 Heading
    6.3.4 VERBOSE
    6.3.5 Using print and pprint
  6.4 Dictionaries
    6.4.1 Dotdict
    6.4.2 FlatDict
    6.4.3 Printing Dicts
  6.5 Shell
  6.6 StopWatch
  6.7 Cloudmesh Command Shell
    6.7.1 CMD5
      6.7.1.1 Resources
      6.7.1.2 Installation from source
      6.7.1.3 Execution
      6.7.1.4 Create your own Extension
      6.7.1.5 Bug: Quotes
  6.8 Exercises
    6.8.1 Cloudmesh Common
    6.8.2 Cloudmesh Shell
7 LIBRARIES
  7.1 Python Modules
    7.1.1 Updating Pip
    7.1.2 Using pip to Install Packages
    7.1.3 GUI
      7.1.3.1 GUIZero
      7.1.3.2 Kivy
    7.1.4 Formatting and Checking Python Code
    7.1.5 Using autopep8
    7.1.6 Writing Python 3 Compatible Code
    7.1.7 Using Python on FutureSystems
    7.1.8 Ecosystem
      7.1.8.1 pypi
      7.1.8.2 Alternative Installations
    7.1.9 Resources
      7.1.9.1 Jupyter Notebook Tutorials
    7.1.10 Exercises
  7.2 Data Management
    7.2.1 Formats
      7.2.1.1 Pickle
      7.2.1.2 Text Files
      7.2.1.3 CSV Files
      7.2.1.4 Excel spreadsheets
      7.2.1.5 YAML
      7.2.1.6 JSON
      7.2.1.7 XML
      7.2.1.8 RDF
      7.2.1.9 PDF
      7.2.1.10 HTML
      7.2.1.11 ConfigParser
      7.2.1.12 ConfigDict
    7.2.2 Encryption
    7.2.3 Database Access
    7.2.4 SQLite
      7.2.4.1 Exercises
  7.3 Plotting with matplotlib
  7.4 DocOpts
  7.5 OpenCV
    7.5.1 Overview
    7.5.2 Installation
    7.5.3 A Simple Example
      7.5.3.1 Loading an image
      7.5.3.2 Displaying the image
      7.5.3.3 Scaling and Rotation
      7.5.3.4 Gray-scaling
      7.5.3.5 Image Thresholding
      7.5.3.6 Edge Detection
    7.5.4 Additional Features
  7.6 Secchi Disk
    7.6.1 Setup for OSX
    7.6.2 Step 1: Record the video
    7.6.3 Step 2: Analyse the images from the Video
      7.6.3.1 Image Thresholding
      7.6.3.2 Edge Detection
      7.6.3.3 Black and white

8 DATA
  8.1 Data Formats
    8.1.1 YAML
    8.1.2 JSON
    8.1.3 XML
9 MONGO
  9.1 MongoDB in Python
    9.1.1 Cloudmesh MongoDB Usage Quickstart
    9.1.2 MongoDB
      9.1.2.1 Installation
        9.1.2.1.1 Installation procedure
      9.1.2.2 Collections and Documents
        9.1.2.2.1 Collection example
        9.1.2.2.2 Document structure
        9.1.2.2.3 Collection Operations
      9.1.2.3 MongoDB Querying
        9.1.2.3.1 Mongo Queries examples
      9.1.2.4 MongoDB Basic Functions
        9.1.2.4.1 Import/Export functions examples
      9.1.2.5 Security Features
        9.1.2.5.1 Collection based access control example
      9.1.2.6 MongoDB Cloud Service
    9.1.3 PyMongo
      9.1.3.1 Installation
      9.1.3.2 Dependencies
      9.1.3.3 Running PyMongo with Mongo Daemon
      9.1.3.4 Connecting to a database using MongoClient
      9.1.3.5 Accessing Databases
      9.1.3.6 Creating a Database
      9.1.3.7 Inserting and Retrieving Documents (Querying)
      9.1.3.8 Limiting Results
      9.1.3.9 Updating Collection
      9.1.3.10 Counting Documents
      9.1.3.11 Indexing
      9.1.3.12 Sorting
      9.1.3.13 Aggregation
      9.1.3.14 Deleting Documents from a Collection
      9.1.3.15 Copying a Database
      9.1.3.16 PyMongo Strengths
    9.1.4 MongoEngine
      9.1.4.1 Installation
      9.1.4.2 Connecting to a database using MongoEngine
      9.1.4.3 Querying using MongoEngine
    9.1.5 Flask-PyMongo
      9.1.5.1 Installation
      9.1.5.2 Configuration
      9.1.5.3 Connection to multiple databases/servers
      9.1.5.4 Flask-PyMongo Methods
      9.1.5.5 Additional Libraries
      9.1.5.6 Classes and Wrappers
  9.2 Mongoengine
    9.2.1 Introduction
    9.2.2 Install and connect
    9.2.3 Basics
10 OTHER
  10.1 Word Count with Parallel Python
    10.1.1 Generating a Document Collection
    10.1.2 Serial Implementation
    10.1.3 Serial Implementation Using map and reduce
    10.1.4 Parallel Implementation
    10.1.5 Benchmarking
    10.1.6 Exercises
    10.1.7 References

  10.2 NumPy
    10.2.1 Installing NumPy
    10.2.2 NumPy Basics
    10.2.3 Data Types: The Basic Building Blocks
    10.2.4 Arrays: Stringing Things Together
    10.2.5 Matrices: An Array of Arrays
    10.2.6 Slicing Arrays and Matrices
    10.2.7 Useful Functions
    10.2.8 Linear Algebra
    10.2.9 NumPy Resources
  10.3 Scipy
    10.3.1 Introduction
    10.3.2 References
  10.4 Scikit-learn
    10.4.1 Introduction to Scikit-learn
    10.4.2 Installation
    10.4.3 Supervised Learning
    10.4.4 Unsupervised Learning
    10.4.5 Building an end to end pipeline for Supervised machine learning using Scikit-learn
    10.4.6 Steps for developing a machine learning model
    10.4.7 Exploratory Data Analysis
      10.4.7.1 Bar plot
      10.4.7.2 Correlation between attributes
      10.4.7.3 Histogram Analysis of dataset attributes
      10.4.7.4 Box plot Analysis
      10.4.7.5 Scatter plot Analysis
    10.4.8 Data Cleansing - Removing Outliers
    10.4.9 Pipeline Creation
      10.4.9.1 Defining DataFrameSelector to separate Numerical and Categorical attributes
      10.4.9.2 Feature Creation / Additional Feature Engineering
    10.4.10 Creating Training and Testing datasets
    10.4.11 Creating pipeline for numerical and categorical attributes
    10.4.12 Selecting the algorithm to be applied
      10.4.12.1 Linear Regression
      10.4.12.2 Logistic Regression
      10.4.12.3 Decision trees
      10.4.12.4 KMeans
      10.4.12.5 Support Vector Machines
      10.4.12.6 Naive Bayes
      10.4.12.7 Random Forest
      10.4.12.8 Neural networks
      10.4.12.9 Deep Learning using Keras
      10.4.12.10 XGBoost
    10.4.13 Scikit Cheat Sheet
    10.4.14 Parameter Optimization
      10.4.14.1 Hyperparameter optimization/tuning algorithms
    10.4.15 Experiments with Keras (deep learning), XGBoost, and SVM (SVC) compared to Logistic Regression (Baseline)
      10.4.15.1 Creating a parameter grid
      10.4.15.2 Implementing Grid search with models and also creating metrics from each of the model.
      10.4.15.3 Results table from the Model evaluation with metrics.
      10.4.15.4 ROC AUC Score
    10.4.16 K-means in scikit learn.
      10.4.16.1 Import
    10.4.17 K-means Algorithm
      10.4.17.1 Import
      10.4.17.2 Create samples
      10.4.17.3 Create samples
      10.4.17.4 Visualize
      10.4.17.5 Visualize
  10.5 Dask - Random Forest Feature Detection
    10.5.1 Setup
    10.5.2 Dataset
    10.5.3 Detecting Features
      10.5.3.1 Data Preparation
    10.5.4 Random Forest
    10.5.5 Acknowledgement
  10.6 Parallel Computing in Python
    10.6.1 Multi-threading in Python
      10.6.1.1 Thread vs Threading
      10.6.1.2 Locks
    10.6.2 Multi-processing in Python
      10.6.2.1 Process
      10.6.2.2 Pool
        10.6.2.2.1 Synchronous Pool.map()
        10.6.2.2.2 Asynchronous Pool.map_async()
      10.6.2.3 Locks
      10.6.2.4 Process Communication
        10.6.2.4.1 Value
  10.7 Dask
    10.7.1 How Dask Works
    10.7.2 Dask Bag
    10.7.3 Concurrency Features
    10.7.4 Dask Array
    10.7.5 Dask DataFrame
    10.7.6 Dask DataFrame Storage
    10.7.7 Links

11 APPLICATIONS
  11.1 Fingerprint Matching
    11.1.1 Overview
    11.1.2 Objectives
    11.1.3 Prerequisites
    11.1.4 Implementation
    11.1.5 Utility functions
    11.1.6 Dataset
    11.1.7 Data Model
      11.1.7.1 Utilities
        11.1.7.1.1 Checksum
        11.1.7.1.2 Path
        11.1.7.1.3 Image
      11.1.7.2 Mindtct
      11.1.7.3 Bozorth3
        11.1.7.3.1 Running Bozorth3
          11.1.7.3.1.1 One-to-one
          11.1.7.3.1.2 One-to-many
    11.1.8 Plotting
    11.1.9 Putting it all Together
  11.2 NIST Pedestrian and Face Detection
      11.2.0.1 Introduction
        11.2.0.1.1 INRIA Person Dataset
        11.2.0.1.2 HOG with SVM model
        11.2.0.1.3 Ansible Automation Tool
      11.2.0.2 Deployment by Ansible
      11.2.0.3 Cloudmesh for Provisioning
      11.2.0.4 Roles Explained for Installation
        11.2.0.4.1 Server groups for Masters/Slaves by Ansible inventory
      11.2.0.5 Instructions for Deployment
        11.2.0.5.1 Cloning Pedestrian Detection Repository from Github
        11.2.0.5.2 Ansible Playbook
      11.2.0.6 OpenCV in Python
        11.2.0.6.1 Import cv2
        11.2.0.6.2 Image Detection
      11.2.0.7 Human and Face Detection in OpenCV
        11.2.0.7.1 INRIA Person Dataset
        11.2.0.7.2 Face Detection using Haar Cascades
        11.2.0.7.3 Face Detection Python Code Snippet
      11.2.0.8 Pedestrian Detection using HOG Descriptor
        11.2.0.8.1 Python Code Snippet
      11.2.0.9 Processing by Apache Spark
        11.2.0.9.1 Parallelize in SparkContext
        11.2.0.9.2 Map Function (apply_batch)
        11.2.0.9.3 Collect Function
      11.2.0.10 Results for 100+ images by Spark Cluster
12 REFERENCES

1 PREFACE

Sat Nov 23 05:25:16 EST 2019

1.1 DISCLAIMER

This book has been generated with Cyberaide Bookmanager.

Bookmanager is a tool to create a publication from a number of sources on the internet. It is especially useful to create customized books, lecture notes, or handouts. Content is best integrated in markdown format as it is very fast to produce the output.

Bookmanager has been developed based on our experience over the last 3 years with a more sophisticated approach. Bookmanager takes the lessons from this approach and distributes a tool that can easily be used by others.

The following shields provide some information about it.

pypi v0.2.28 | License Apache 2.0 | python 3.7 | format wheel | status stable | build unknown

1.1.1 Acknowledgment

If you use bookmanager to produce a document you must include the following acknowledgement.

“This document was produced with Cyberaide Bookmanager developed by Gregor von Laszewski available at https://pypi.python.org/pypi/cyberaide-bookmanager. It is in the responsibility of the user to make sure an author acknowledgement section is included in your document. Copyright verification of content included in a book is responsibility of the book editor.”

The bibtex entry is

    @Misc{www-cyberaide-bookmanager,
      author = {Gregor von Laszewski},
      title = {{Cyberaide Book Manager}},
      howpublished = {pypi},
      month = apr,
      year = 2019,
      url = {https://pypi.org/project/cyberaide-bookmanager/}
    }

1.1.2 Extensions

We are happy to discuss with you bugs, issues and ideas for enhancements. Please use the convenient github issues at

https://github.com/cyberaide/bookmanager/issues

Please do not file with us issues that relate to an editor's book. They will provide you with their own mechanism on how to correct their content.

2 INTRODUCTION

2.1 INTRODUCTION TO PYTHON

Learning Objectives

  • Learn Python quickly under the assumption that you know a programming language
  • Work with modules
  • Understand docopts and cmd
  • Conduct some Python examples to refresh your Python knowledge
  • Learn about the map function in Python
  • Learn how to start subprocesses and redirect their output
  • Learn more advanced constructs such as multiprocessing and Queues
  • Understand why we do not use anaconda
  • Get familiar with pyenv

Portions of this lesson have been adapted from the official Python Tutorial copyright Python Software Foundation.

Python is an easy to learn programming language. It has efficient high-level data structures and a simple but effective approach to object-oriented programming. Python’s simple syntax and dynamic typing, together with its interpreted nature, make it an ideal language for scripting and rapid application development in many areas on most platforms. The Python interpreter and the extensive standard library are freely available in source or binary form for all major platforms from the Python Web site, https://www.python.org/, and may be freely distributed. The same site also contains distributions of and pointers to many free third party Python modules, programs and tools, and additional documentation. The Python interpreter can be extended with new functions and data types implemented in C or C++ (or other languages callable from C). Python is also suitable as an extension language for customizable applications.

Python is an interpreted, dynamic, high-level programming language suitable for a wide range of applications.

The philosophy of Python is summarized in The Zen of Python as follows:

  • Explicit is better than implicit
  • Simple is better than complex
  • Complex is better than complicated
  • Readability counts

The main features of Python are:

  • Use of indentation whitespace to indicate blocks
  • Object-oriented paradigm
  • Dynamic typing
  • Interpreted runtime
  • Garbage collected memory management
  • A large standard library
  • A large repository of third-party libraries
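A few of these features can be seen together in a minimal sketch. The names Greeter and describe are made up for this illustration and do not come from the book; the sketch only assumes a standard Python installation.

```python
from collections import Counter  # one small part of the standard library


class Greeter:
    """A tiny class, illustrating the object-oriented paradigm."""

    def __init__(self, name):
        self.name = name

    def greet(self):
        # The indented lines form the body of the method; no braces are needed.
        return "Hello, " + self.name


def describe(value):
    # Dynamic typing: the same parameter can hold an int, a str, or a list.
    return type(value).__name__


if __name__ == "__main__":
    print(Greeter("Python").greet())
    for item in (42, "text", [1, 2, 3]):
        print(describe(item))
    # Counter counts the characters of the string without any extra code.
    print(Counter("banana").most_common(1))
```

Because Python is interpreted, the file can be run directly with the python command; no separate compile step is involved.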

Python is used by many companies and is applied for web development, scientific computing, embedded applications, artificial intelligence, software development, and information security, to name a few.

The material collected here introduces the reader to the basic concepts and features of the Python language and system. After you have worked through the material you will be able to:

  • use Python
  • use the interactive Python interface
  • understand the basic syntax of Python
  • write and run Python programs
  • have an overview of the standard library
  • install Python libraries using pyenv for multi-python interpreter development.

We do not attempt to be comprehensive and cover every single feature, or even every commonly used feature. Instead, this material introduces many of Python’s most noteworthy features, and will give you a good idea of the language’s flavor and style. After reading it, you will be able to read and write Python modules and programs, and you will be ready to learn more about the various Python library modules.

In order to conduct this lesson you need

  • A computer with Python 2.7.16 or 3.7.4
  • Familiarity with command line usage
  • A text editor such as PyCharm, emacs, vi or others. You should identify which works best for you and set it up.
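The version requirement can also be checked from within Python itself. The following is a minimal sketch; the helper name is_supported is made up for this illustration, and treating any newer release of the same major version as acceptable is an assumption, not something the requirement list states.

```python
import sys

# Versions named in the requirement list above.
SUPPORTED = [(2, 7, 16), (3, 7, 4)]


def is_supported(version, supported=SUPPORTED):
    """Return True if `version` has the same major version as one of the
    listed releases and is at least as new as that release (assumption)."""
    version = tuple(version[:3])
    return any(
        version[0] == minimum[0] and version >= minimum
        for minimum in supported
    )


if __name__ == "__main__":
    current = tuple(sys.version_info[:3])
    status = "ok" if is_supported(current) else "older than assumed here"
    print("Python %d.%d.%d: %s" % (current + (status,)))
```

The comparison relies on Python comparing tuples element by element, so (3, 7, 4) counts as newer than (3, 6, 9).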

2.1.1 References

Some important additional information can be found on the following Web pages.

  • Python
  • Pip
  • Virtualenv
  • NumPy
  • SciPy
  • Matplotlib
  • Pandas
  • pyenv
  • PyCharm

Python module of the week is a Web site that provides a number of short examples on how to use some elementary Python modules. Not all modules are equally useful and you should decide if there are better alternatives. However, for beginners this site provides a number of good examples.

  • Python 2: https://pymotw.com/2/
  • Python 3: https://pymotw.com/3/

3 INSTALLATION

3.1 PYTHON 3.7.4 INSTALLATION

Learning Objectives

  • Learn how to install Python.
  • Find additional information about Python.
  • Make sure your computer supports Python.

In this section we explain how to install Python 3.7.4 on a computer. Likely much of the code will work with earlier versions, but we do the development in Python on the newest version of Python available at https://www.python.org/downloads.

3.1.1 Hardware

Python does not require any special hardware. We have installed Python not only on PCs and laptops, but also on Raspberry Pis and Lego Mindstorms.

However, there are some things to consider. If you use many programs on your desktop and run them all at the same time, you will quickly find yourself out of memory, even on up-to-date operating systems. This is especially true if you use editors such as PyCharm, which we highly recommend. Furthermore, as you likely have lots of disk access, make sure to use a fast HDD or, better, an SSD.

A typical modern developer PC or laptop has 16GB RAM and an SSD. You can certainly do Python on a $35 Raspberry Pi, but you probably will not be able to run PyCharm. There are many alternative editors with a smaller memory footprint available.

3.1.2 Prerequisites Ubuntu 19.04

Python 3.7 is installed in Ubuntu 19.04. Therefore, it already fulfills the prerequisites. However, we recommend that you update to the newest version of Python and pip. Please visit: https://www.python.org/downloads
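To see which Python and pip versions are currently in use, the interpreter can be asked directly. This is a minimal sketch; the function name pip_version is made up for this illustration. It runs pip as a module of the current interpreter (python -m pip --version), so it reports the pip that belongs to that interpreter rather than whichever pip command happens to be first on the PATH.

```python
import subprocess
import sys


def pip_version():
    """Return the first line printed by `python -m pip --version`,
    or None if pip is not available for this interpreter."""
    try:
        output = subprocess.check_output(
            [sys.executable, "-m", "pip", "--version"],
            stderr=subprocess.DEVNULL,  # ignore warnings printed by pip
        )
    except (subprocess.CalledProcessError, OSError):
        return None
    lines = output.decode("utf-8", "replace").splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    print(pip_version() or "pip is not installed for this interpreter")
```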

3.1.3 Prerequisites macOS

3.1.3.1 Installation from Apple App Store

You want a number of useful tools on your macOS. They are not installed by default, but are available via Xcode. First you need to install Xcode from

https://apps.apple.com/us/app/xcode/id497799835

Next you need to install the macOS Xcode command line tools:

    $ xcode-select --install

3.1.3.2 Installation from python.org

The easiest installation of Python for cloudmesh is to use the installation from https://www.python.org/downloads. Please visit the page and follow the instructions. After this install you have python3 available from the command line.

3.1.3.3 Installation from Homebrew

An alternative installation is provided by Homebrew. To use this install method, you need to install Homebrew first, using the instructions on its web page:

    $ /usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

Then you should be able to install Python 3.7.4 using:

    $ brew install python

3.1.4 Prerequisites Ubuntu 18.04

We recommend you update your Ubuntu version to 19.04 and follow the instructions for that version instead, as it is significantly easier. If you however are not able to do so, the following instructions may be helpful.

We first need to make sure that the correct version of Python 3 is installed. The default version of Python on Ubuntu 18.04 is 3.6. You can get the version with:

    $ python3 --version

If the version is not 3.7.4 or newer, you can update it as follows:

    $ sudo apt-get update
    $ sudo apt install software-properties-common
    $ sudo add-apt-repository ppa:deadsnakes/ppa
    $ sudo apt-get install python3.7 python3-dev python3.7-dev

You can then check the installed version using python3.7 --version, which should be 3.7.4.

Now we will create a new virtual environment:

    $ python3.7 -m venv --without-pip ~/ENV3

Then edit the ~/.bashrc file and add the following lines at the end:

    alias ENV3="source ~/ENV3/bin/activate"
    ENV3

Now activate the virtual environment using:

    $ source ~/.bashrc

Now you can install pip for the virtual environment without conflicting with the native pip:

    $ curl "https://bootstrap.pypa.io/get-pip.py" -o "get-pip.py"
    $ python get-pip.py
    $ rm get-pip.py

3.1.5 Prerequisite Windows 10

Python 3.7 can be installed on Windows 10 using: https://www.python.org/downloads

For 3.7.4 you can go to the download page and download one of the different files for Windows.

Let us assume you chose the web-based installer; then you click on the file in the Edge browser (make sure the account you use has administrative privileges). Follow the instructions that the installer gives. It is important that at one point you select “[x] Add to Path”. There will be an empty checkbox for this that you need to click on.

Once it is installed, choose a terminal and execute

    python --version

However, if you have installed conda for some reason, you need to read up on how to install Python 3.7.4 in conda or identify how to run conda and python.org at the same time. We often see others giving the wrong installation instructions.

An alternative is to use Python from within the Linux Subsystem. But that has some limitations, and you will need to explore how to access the file system in the subsystem to have a smooth integration with your Windows host so you can, for example, use PyCharm.

3.1.5.1 Linux Subsystem Install

To activate the Linux Subsystem, please follow the instructions at

https://docs.microsoft.com/en-us/windows/wsl/install-win10

A suitable distribution would be

https://www.microsoft.com/en-us/p/ubuntu-1804-lts/9n9tngvndl3q?activetab=pivot:overviewtab

However, as it uses an older version of Python, you will have to update it.

3.1.6 Prerequisite venv

This step is highly recommended if you have not yet installed a venv for Python, to make sure you are not interfering with your system Python. Not using a venv could have catastrophic consequences and destroy your operating system tools if they rely on Python. The use of venv is simple. For our purposes we assume that you use the directory:

    ~/ENV3

Follow these steps first:

First cd to your home directory. Then execute

    $ python3 -m venv ~/ENV3
    $ source ~/ENV3/bin/activate

You can add at the end of your .bashrc (Ubuntu) or .bash_profile (macOS) file the line

    source ~/ENV3/bin/activate

so the environment is always loaded. Now you are ready to install cloudmesh.

Check if you have the right version of Python installed with

    $ python --version

To make sure you have an up to date version of pip issue the command

    $ pip install pip -U

3.1.7 Install Python 3.7 via Anaconda

3.1.7.1 Download conda installer

Miniconda is recommended here. Download an installer for Windows, macOS, and Linux from this page: https://docs.conda.io/en/latest/miniconda.html

3.1.7.2 Install conda

Follow instructions to install conda for your operating systems:

  • Windows: https://conda.io/projects/conda/en/latest/user-guide/install/windows.html
  • macOS: https://conda.io/projects/conda/en/latest/user-guide/install/macos.html
  • Linux: https://conda.io/projects/conda/en/latest/user-guide/install/linux.html

3.1.7.3 Install Python 3.7.4 via conda

    $ cd ~
    $ conda create -n ENV3 python=3.7.4
    $ conda activate ENV3
    $ conda install -c anaconda pip
    $ conda deactivate

It is very important to make sure you have a newer version of pip installed. After you installed and created the ENV3 you need to activate it. This can be done with

    $ conda activate ENV3

If you like to activate it when you start a new terminal, please add this line to your .bashrc or .bash_profile.

If you use zsh please add it to .zprofile instead.

3.2 MULTI-VERSION PYTHON INSTALLATION

Learning Objectives

  • Understand why we need to worry about Python 3.7 and 2.7
  • Use pyenv to support both versions
  • Understand the limitations of anaconda/conda for developers

We are living at an interesting junction point in the development of Python. As of January 2019, Python developers are encouraged to switch from Python version 2.7 to Python version 3.7.

However, there may be the requirement that you still need to develop code not only in Python 3.7 but also in Python 2.7. To facilitate this multi-python version development, the best tool we know of is pyenv. We will explain in this section how to install both versions with the help of pyenv.

Python is easy to install and very good instructions for most platforms can be found on the python.org Web page. We see two different versions:

  • Python 2.7.16
  • Python 3.7.4

To manage Python modules, it is useful to have the pip package installation tool on your system.

We assume that you have a computer with Python installed. The version of Python however may not be the newest version. Please check with

    $ python --version

which version of Python you run. If it is not the newest version, we use pyenv to install a newer version so you do not affect the default version of Python on your system.

3.2.1 Disabling wrong python installs

While working with students we have seen at times that they take other classes either at universities or online that teach them how to program in Python. Unfortunately, these classes often seem to neglect teaching how to properly install Python. I just recently had a student who had installed Python 7 different times on his macOS machine, while another student had 3 different installations, all of which conflicted with each other as they were not set up properly. The students did not even realize that they were using Python incorrectly on their computer due to setup issues and conflicting libraries.

We recommend that you inspect if you have files such as ~/.bashrc or ~/.bash_profile in your home directory and identify if they activate various versions of Python on your computer. If so, you could try to deactivate them by commenting out the various versions with the # character at the beginning of the line, start a new terminal, and see if the terminal shell still works. Then you can follow our instructions here while using an install on pyenv.

3.2.2 Managing 2.7 and 3.7 Python Versions without Pyenv

If you need to have more than one Python version installed and do not want to or cannot use pyenv, we recommend you download and install Python 2.7.16 and 3.7.4 from python.org (https://www.python.org/downloads/)

You can then use either python2 or python3 to invoke the Python interpreter.
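When several installations coexist, it helps to know which executable each command name actually resolves to. The following is a minimal sketch using shutil.which from the standard library (Python 3.3 or newer); the function name find_interpreters and the list of candidate names are made up for this illustration.

```python
import shutil

# Command names mentioned in the text; extend the tuple as needed.
CANDIDATES = ("python", "python2", "python3")


def find_interpreters(names=CANDIDATES):
    """Map each command name to the executable the shell would run,
    or None if the name is not found on the PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_interpreters().items():
        print("%-8s -> %s" % (name, path or "not found"))
```

If a name points into an unexpected directory, a line in a shell configuration file is probably putting that directory in front of the PATH.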

3.2.3 Managing Multiple Python Versions with Pyenv

Python has several versions that are used by the community. This includes Python 2 and Python 3, but also different ways of managing the Python libraries. Each OS may have its own version of Python installed. It is recommended that you not modify that version. Instead you may want to create a localized Python installation that you as a user can modify. To do that we recommend pyenv. Pyenv allows users to switch between multiple versions of Python (https://github.com/yyuu/pyenv). To summarize, pyenv allows:

  • users to change the global Python version on a per-user basis;
  • users to enable support for per-project Python versions;
  • easy version changes without complex environment variable management;
  • searching installed commands across different Python versions;
  • integration with tox (https://tox.readthedocs.io/).

To install pyenv on your system you can use the command

    $ curl https://pyenv.run | bash

Now you can install different Python versions on your system such as Python 2.7 and 3.7 with a few commands:

    $ pyenv install 3.7.4
    $ pyenv install 2.7.16
    $ pyenv virtualenv 3.7.4 ENV3
    $ pyenv virtualenv 2.7.16 ENV2

To automatically access them from your shell we integrate them into bash by editing the bash configuration files. Make sure that on Linux you add to the ~/.bashrc file and on macOS to the file ~/.bash_profile or .zprofile.

    export PYENV_ROOT="$HOME/.pyenv"
    export PATH="$PYENV_ROOT/bin:$PATH"
    export PYENV_VIRTUALENV_DISABLE_PROMPT=1
    eval "$(pyenv init -)"
    eval "$(pyenv virtualenv-init -)"

    __pyenv_version_ps1() {
        local ret=$?;
        output=$(pyenv version-name)
        if [[ ! -z $output ]]; then
            echo -n "($output)"
        fi
        return $ret;
    }

    PS1="\$(__pyenv_version_ps1)${PS1}"

    alias ENV2="pyenv activate ENV2"
    alias ENV3="pyenv activate ENV3"

    ENV3

We recommend that you do this towards the end of your file. Then look up our convenience methods to set an alias and install Python 3.7.4 via pyenv.

Next we recommend to update pip

    $ ENV2
    $ pip install pip -U
    $ ENV3
    $ pip install pip -U

3.2.3.1 Installation pyenv via Homebrew

On macOS you can also install pyenv via Homebrew. Before installing anything on your computer make sure you have enough space. Use in the terminal the command:

    $ df -h

which gives you an overview of your file system. If you do not have enough space, please make sure you free up unused files from your drive.

On many occasions it is beneficial to use readline, as it provides nice editing features for the terminal, and xz for compression. First, make sure you have Xcode installed:

    $ xcode-select --install

On Mojave you will get an error that zlib is not installed. This is because the header files are not properly installed. To install them you can say

    $ sudo installer -pkg /Library/Developer/CommandLineTools/Packages/macOS_SDK_headers_for_macOS_10.14.pkg -target /

Next install homebrew, pyenv, pyenv-virtualenv and pyenv-virtualenvwrapper. Additionally install readline and some compression tools:

    $ /usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
    $ brew update
    $ brew install readline xz
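Libraries such as readline, xz, and zlib matter because pyenv builds Python from source, and the corresponding standard-library modules are only included if the libraries are present at build time. Once an interpreter is installed, this can be verified from Python. The following is a minimal sketch; the function name available_modules and the selection of modules are made up for this illustration.

```python
import importlib

# Standard-library modules that depend on external libraries:
# readline -> GNU readline, lzma -> xz, zlib -> zlib, bz2 -> bzip2,
# sqlite3 -> SQLite, ssl -> OpenSSL.
OPTIONAL_MODULES = ("readline", "lzma", "zlib", "bz2", "sqlite3", "ssl")


def available_modules(names=OPTIONAL_MODULES):
    """Try to import each module and report whether the import worked."""
    result = {}
    for name in names:
        try:
            importlib.import_module(name)
            result[name] = True
        except ImportError:
            result[name] = False
    return result


if __name__ == "__main__":
    for name, ok in available_modules().items():
        print("%-8s %s" % (name, "ok" if ok else "MISSING"))
```

The modules are actually imported rather than only located, because wrappers such as lzma and sqlite3 exist as Python files even when the compiled part they depend on was not built.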

To install pyenv with homebrew execute in the terminal:

    $ brew install pyenv pyenv-virtualenv pyenv-virtualenvwrapper

3.2.3.2 Install pyenv on Ubuntu 18.04

The following steps will install pyenv in a new Ubuntu 18.04 distribution.

Start up a terminal and execute the following commands. We recommend that you do it one command at a time so you can observe if the command succeeds:

    $ sudo apt-get update
    $ sudo apt-get install git python-pip make build-essential libssl-dev
    $ sudo apt-get install zlib1g-dev libbz2-dev libreadline-dev libsqlite3-dev
    $ sudo pip install virtualenvwrapper
    $ git clone https://github.com/yyuu/pyenv.git ~/.pyenv
    $ git clone https://github.com/pyenv/pyenv-virtualenv.git ~/.pyenv/plugins/pyenv-virtualenv
    $ git clone https://github.com/yyuu/pyenv-virtualenvwrapper.git ~/.pyenv/plugins/pyenv-virtualenvwrapper
    $ echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.bashrc
    $ echo 'export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.bashrc

You can also install pyenv using the curl command in the following way:

    $ curl -L https://raw.githubusercontent.com/yyuu/pyenv-installer/master/bin/pyenv-installer | bash

Then install its dependencies:

    $ sudo apt-get update && sudo apt-get upgrade
    $ sudo apt-get install -y make build-essential libssl-dev
    $ sudo apt-get install -y zlib1g-dev libbz2-dev libreadline-dev libsqlite3-dev
    $ sudo apt-get install -y wget curl llvm libncurses5-dev git

Now that you have installed pyenv it is not yet activated in your current terminal. The easiest thing to do is to start a new terminal and type in:

    $ which pyenv

If you see a response, pyenv is installed and you can proceed with the next steps.

Please remember that whenever you modify .bashrc or .bash_profile or .zprofile you need to start a new terminal.

3.2.3.3 Using pyenv

3.2.3.3.1 Using pyenv to Install Different Python Versions


Pyenv provides a large list of different python versions. To see the entire list please use the command:

    $ pyenv install -l

However, for us we only need to worry about python 2.7.16 and python 3.7.4. You can now install different versions of python into your local environment with the following commands:

    $ pyenv update
    $ pyenv install 2.7.16
    $ pyenv install 3.7.4

You can set the global python default version with:

    $ pyenv global 3.7.4

Type the following to determine which version you activated:

    $ pyenv version

Type the following to determine which versions you have available:

    $ pyenv versions

To associate a specific environment name with a certain python version, use the following commands:

    $ pyenv virtualenv 2.7.16 ENV2
    $ pyenv virtualenv 3.7.4 ENV3

In the example, ENV2 would represent python 2.7.16 while ENV3 would represent python 3.7.4. Often it is easier to type the alias rather than the explicit version.

3.2.3.3.2 Switching Environments

After setting up the different environments, switching between them is now easy. Simply use the following commands:

    (2.7.16) $ pyenv activate ENV2
    (ENV2) $ pyenv activate ENV3
    (ENV3) $ pyenv activate ENV2
    (ENV2) $ pyenv deactivate ENV2
    (2.7.16) $

To make it even easier, you can add the following lines to your .bash_profile or .zprofile file:

    alias ENV2="pyenv activate ENV2"
    alias ENV3="pyenv activate ENV3"

If you start a new terminal, you can switch between the different versions of python simply by typing:

    $ ENV2
    $ ENV3

3.2.3.4 Updating Python Version List

Pyenv maintains locally a list of available python versions. To update the list and see it, use the commands

    $ pyenv update
    $ pyenv install -l

You will see the updated list.

3.2.3.4.1 Updating to a new version of Python with pyenv

Naturally python itself evolves and new versions will become available via pyenv. To use such a new version you need to first install it into pyenv. Let us assume you had an old version of python installed onto the ENV3 environment. Then you need to execute the following steps:

    $ pyenv deactivate
    $ pyenv uninstall ENV3
    $ pyenv install 3.7.4
    $ pyenv virtualenv 3.7.4 ENV3
    $ ENV3
    $ pip install pip -U

With the pip install command, we make sure we have the newest version of pip. In case you get an error, you may have to update xcode as follows and try again:

    $ xcode-select --install

After you installed it you can activate it by typing ENV3. Naturally this requires that you added it to your bash environment as discussed in Section 1.1.1.8.

3.2.4 Anaconda and Miniconda and Conda


While others on the internet or in your classes may have taught you to use anaconda, we will avoid it as it has several disadvantages for developers. The reason for this is that it installs many packages that you are likely not to use. In fact installing anaconda on your VM will waste space and time and you should look into other installs.

We do not recommend that you use anaconda or miniconda as it may interfere with your default python interpreters and setup.

Please note that beginners to python should use anaconda or miniconda only after they have installed pyenv and are using it. For this class neither anaconda nor miniconda is required. In fact we do not recommend it. We keep this section as we know that other classes at IU may use anaconda. We are not aware if these classes teach you the right way to install it, with pyenv.

3.2.4.1 Miniconda

This section about miniconda is experimental and has not been tested. We are looking for contributors that help completing it. If you use anaconda or miniconda we recommend to manage it via pyenv.

To install miniconda you can use the following commands:

    $ mkdir ana
    $ cd ana
    $ pyenv install miniconda3-latest
    $ pyenv local miniconda3-latest
    $ pyenv activate miniconda3-latest
    $ conda create -n ana anaconda

To activate use:

    $ source activate ana

To deactivate use:

    $ source deactivate

3.2.4.2 Anaconda

This section about anaconda is experimental and has not been tested. We are looking for contributors that help completing it.

You can add anaconda to your pyenv with the following commands:

    $ pyenv install anaconda3-4.3.1

To switch more easily we recommend that you use the following in your .bash_profile or .zprofile file:

    alias ANA="pyenv activate anaconda3-4.3.1"

Once you have done this you can easily switch to anaconda with the command:

    $ ANA

Terminology in anaconda could lead to confusion. Thus we like to point out that the version number of anaconda is unrelated to the python version. Furthermore, anaconda uses the term root not for the root user, but for the originating directory in which the anaconda program is installed.

In case you like to build your own conda packages at a later time we recommend that you install the conda-build package:

    $ conda install conda-build

When executing:

    $ pyenv versions

you will see, after the install completed, the anaconda versions installed:

    pyenv versions
      system
      2.7.16
      2.7.16/envs/ENV2
      3.7.4
      3.7.4/envs/ENV3
      ENV2
      ENV3
    * anaconda3-4.3.1 (set by PYENV_VERSION environment variable)

Let us now create a virtualenv for anaconda:

    $ pyenv virtualenv anaconda3-4.3.1 ANA

To activate it you can now use:

    $ pyenv activate ANA


However, anaconda may modify your .bashrc or .bash_profile or .zprofile files and may result in incompatibilities with other python versions. For this reason we recommend not to use it. If you find ways to get it to work reliably with other versions, please let us know and we will update this tutorial.

3.2.5 Exercises

E.Python.Install.1:

Install Python 3.7.4

E.Python.Install.1:

Write installation instructions for an operating system of your choice and add them to this documentation.

E.Python.Install.2:

Replicate the steps to install pyenv, so you can type in ENV2 and ENV3 in your terminals to switch between python 2 and 3.

E.Python.Install.3:

Why do you generally not want to use anaconda for cloud computing? When is it ok to use anaconda?
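When working through the installation exercises, it can help to confirm from inside Python which interpreter an environment such as ENV2 or ENV3 actually activated. The following is a minimal sketch that uses only the standard library; the function name describe_interpreter is our own choice for illustration and is not part of pyenv.

```python
import sys


def describe_interpreter():
    """Return basic facts about the Python interpreter running this code."""
    major, minor, micro = sys.version_info[:3]
    return {
        "executable": sys.executable,  # path of the running interpreter
        "version": "%d.%d.%d" % (major, minor, micro),
        "is_python3": major >= 3,
    }


info = describe_interpreter()
print("Interpreter:", info["executable"])
print("Version:", info["version"])
```

Running this once after activating each environment shows whether the expected interpreter is the one on your path.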


4 FIRST STEPS

4.1 INTERACTIVE PYTHON

Python can be used interactively. You can enter the interactive loop by executing the command:

    $ python

You will see something like the following:

    $ python
    Python 3.7.1 (default, Nov 24 2018, 14:27:15)
    [Clang 10.0.0 (clang-1000.11.45.5)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>>

The >>> is the prompt used by the interpreter. This is similar to bash where commonly $ is used.

Sometimes it is convenient to show the prompt when illustrating an example. This is to provide some context for what we are doing. If you are following along you will not need to type in the prompt.

This interactive python process does the following:

1. read your input commands
2. evaluate your command
3. print the result of evaluation
4. loop back to the beginning.

This is why you may see the interactive loop referred to as a REPL: Read-Evaluate-Print-Loop.

4.1.1 REPL (Read Eval Print Loop)

There are many different types beyond what we have seen so far, such as dictionaries, lists, sets. One handy way of using the interactive python is to get the type of a value using type():


    >>> type(42)
    <class 'int'>
    >>> type('hello')
    <class 'str'>
    >>> type(3.14)
    <class 'float'>

You can also ask for help about something using help():

    >>> help(int)
    >>> help(list)
    >>> help(str)

Using help() opens up a help message within a pager. To navigate you can use the spacebar to go down a page, w to go up a page, the arrow keys to go up/down line-by-line, or q to exit.

4.1.2 Interpreter

Although the interactive mode provides a convenient tool to test things out, you will see quickly that for our class we want to use the python interpreter from the command line. Let us assume the program is called prg.py. Once you have written it in that file you simply can call it with

    $ python prg.py

It is important to name the program with meaningful names.

4.1.3 Python 3 Features in Python 2

In this course we want to be able to seamlessly switch between python 2 and python 3. Thus it is convenient from the start to use python 3 syntax when it is also supported in python 2. One of the most used functions is print, which in python 3 is a function called with parentheses. To enable it in python 2 you just need to import this function:

    from __future__ import print_function, division

The first of these imports allows us to use the print function to output text to the screen, instead of the print statement, which Python 2 uses. This is simply a design decision that better reflects Python's underlying philosophy.

Other functions such as the division also behave differently. Thus we use

    from __future__ import division
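To see the two division behaviours side by side, the following sketch compares the / operator with the // operator. It is our own illustration and is intended for Python 3, or for Python 2 once the division import has been placed at the top of the file; the variable names are arbitrary.

```python
true_quotient = 5 / 2    # true division, keeps the fractional part
floor_quotient = 5 // 2  # floor division, rounds down to an integer

print(true_quotient)
print(floor_quotient)
```

Using // whenever an integer result is intended keeps a program's behaviour the same under both Python versions.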


This import makes sure that the division operator behaves in a way a newcomer to the language might find more intuitive. In Python 2, division / is floor division when the arguments are integers, meaning that the following

    (5 / 2 == 2) is True

In Python 3, division / is a floating point division, thus

    (5 / 2 == 2.5) is True

4.2 EDITORS

This section is meant to give an overview of the python editing tools needed for this course. There are many other alternatives, however, we do recommend to use PyCharm.

4.2.1 Pycharm

PyCharm is an Integrated Development Environment (IDE) used for programming in Python. It provides code analysis, a graphical debugger, an integrated unit tester, and integration with git.

Python 8:56 Pycharm

4.2.2 Python in 45 minutes

An additional community video about the Python programming language that we found on the internet. Naturally there are many alternatives to this video, but the video is probably a good start. It also uses PyCharm which we recommend.

Python 43:16 PyCharm

How much you want to understand of python is actually a bit up to you. While it is good to know classes and inheritance, you may be able to get away without using them for this class. However, we do recommend that you learn them.

PyCharm Installation:

Method 1: PyCharm installation on ubuntu using umake


    sudo add-apt-repository ppa:ubuntu-desktop/ubuntu-make
    sudo apt-get update
    sudo apt-get install ubuntu-make

Once the umake command is installed, use the next command to install the Pycharm community edition:

    umake ide pycharm

If you want to remove PyCharm installed using the umake command, use this:

    umake -r ide pycharm

Method 2: PyCharm installation on ubuntu using PPA

    sudo add-apt-repository ppa:mystic-mirage/pycharm
    sudo apt-get update
    sudo apt-get install pycharm-community

PyCharm also has a Professional (paid) version which can be installed using the following command:

    sudo apt-get install pycharm

Once installed, go to your VM dashboard and search for PyCharm.

4.3 GOOGLE COLAB

In this section we are going to introduce you to Google Colab and how to use it to run deep learning models.

4.3.1 Introduction to Google Colab

This video contains the introduction to Google Colab. In this section we will be learning how to start a Google Colab project.

4.3.2 Programming in Google Colab

In this video we will learn how to create a simple Colab Notebook.


Required Installations

    pip install numpy

4.3.3 Benchmarking in Google Colab with Cloudmesh

In this video we learn how to do a basic benchmark with Cloudmesh tools. Cloudmesh StopWatch will be used in this tutorial.

Required Installations

    pip install numpy
    pip install cloudmesh-installer
    pip install cloudmesh-common
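The StopWatch interface itself is not shown in this section. To illustrate the general idea of a named timer used for a basic benchmark, here is a minimal sketch built only on the standard library function time.perf_counter. The class SimpleTimer and its methods are hypothetical names introduced for this example; they are not the Cloudmesh implementation.

```python
import time


class SimpleTimer(object):
    """A tiny named-timer helper (hypothetical, for illustration only)."""

    def __init__(self):
        self._start = {}    # timer name -> start time
        self._elapsed = {}  # timer name -> measured duration in seconds

    def start(self, name):
        self._start[name] = time.perf_counter()

    def stop(self, name):
        self._elapsed[name] = time.perf_counter() - self._start[name]

    def get(self, name):
        return self._elapsed[name]


timer = SimpleTimer()
timer.start("sum")
total = sum(range(100000))  # the work being benchmarked
timer.stop("sum")
print("sum took %.6f seconds" % timer.get("sum"))
```

Giving each timer a name makes it possible to measure several parts of a notebook independently and report them together at the end.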


5 LANGUAGE

5.1 LANGUAGE

5.1.1 Statements and Strings

Let us explore the syntax of Python while starting with a print statement

    print("Hello world from Python!")

This will print on the terminal

    Hello world from Python!

The print function was given a string to process. A string is a sequence of characters. A character can be alphabetic (A through Z, lower and upper case), numeric (any of the digits), white space (spaces, tabs, newlines, etc), a syntactic directive (comma, colon, quotation, exclamation, etc), and so forth. A string is typically indicated by surrounding the characters in double quotes.

Standard output is discussed in the Section Linux.

So, what happened when you pressed Enter? The interactive Python program read the line print("Hello world from Python!"), split it into the print statement and the "Hello world from Python!" string, and then executed the line, showing you the output.

5.1.2 Comments

Comments in python start with a #:

    # This is a comment

5.1.3 Variables

You can store data into a variable to access it later. For instance:

    hello = 'Hello world from Python!'
    print(hello)


This will print again

    Hello world from Python!

5.1.4 Data Types

5.1.4.1 Booleans

A boolean is a value that can have the values True or False. You can combine booleans with boolean operators such as and and or

    print(True and True)    # True
    print(True and False)   # False
    print(False and False)  # False
    print(True or True)     # True
    print(True or False)    # True
    print(False or False)   # False

5.1.4.2 Numbers

The interactive interpreter can also be used as a calculator. For instance, say we wanted to compute a multiple of 21:

    print(21 * 2)  # 42

We saw here the print statement again. We passed in the result of the operation 21 * 2. An integer (or int) in Python is a numeric value without a fractional component (numbers with a fractional component are called floating point numbers, or float for short).

The mathematical operators apply the related mathematical operation to the provided numbers. Some operators are:

    Operator    Function
    *           multiplication
    /           division
    +           addition
    -           subtraction
    **          exponent

Exponentiation x^y is written as x**y, that is, x to the yth power.
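As a small sketch of the operators listed in the table, the example below applies each of them to the same pair of integers. The variable names are our own and the values were chosen only for illustration; the division result assumes Python 3 (or Python 2 with the division import).

```python
a = 7
b = 2

product = a * b     # multiplication
quotient = a / b    # division (true division in Python 3)
total = a + b       # addition
difference = a - b  # subtraction
power = a ** b      # exponent: a to the b-th power

print(product, quotient, total, difference, power)
```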


You can combine floats and ints:

    print(3.14 * 42 / 11 + 4 - 2)  # 13.9890909091
    print(2**3)                    # 8

Note that operator precedence is important. Using parentheses to affect the order of operations gives different results, as expected:

    print(3.14 * (42 / 11) + 4 - 2)  # 11.42
    print(1 + 2 * 3 - 4 / 5.0)       # 6.2
    print((1 + 2) * (3 - 4) / 5.0)   # -0.6

The value 11.42 in the first line results from the Python 2 integer division of 42 / 11. With Python 3, or with the division import, the same expression evaluates to about 13.99.

5.1.5 Module Management

A module allows you to logically organize your Python code. Grouping related code into a module makes the code easier to understand and use. A module is a Python object with arbitrarily named attributes that you can bind and reference. A module is a file consisting of Python code. A module can define functions, classes and variables. A module can also include runnable code.

5.1.5.1 Import Statement

When the interpreter encounters an import statement, it imports the module if the module is present in the search path. A search path is a list of directories that the interpreter searches before importing a module. It is preferred to use for each import its own line such as:

    import numpy
    import matplotlib

5.1.5.2 The from…import Statement

Python's from statement lets you import specific attributes from a module into the current namespace. The from…import has the following syntax:

    from datetime import datetime
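The difference between the two import forms can be sketched with standard library modules only. The example below is an illustration of ours: the names circle_area and moon_landing are arbitrary and carry no special meaning.

```python
import math                    # plain import: names are reached via the module
from datetime import datetime  # from...import: the name is bound directly

# Attributes of an imported module are accessed with the module prefix.
circle_area = math.pi * 2 ** 2

# A name imported with from...import is used without a prefix.
moon_landing = datetime(1969, 7, 20)

print(circle_area)
print(moon_landing.year)
```

The plain form keeps the origin of every name visible at the point of use, while the from form is shorter when a single name is used frequently.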


5.1.6 Date Time in Python

The datetime module supplies classes for manipulating dates and times in both simple and complex ways. While date and time arithmetic is supported, the focus of the implementation is on efficient attribute extraction for output formatting and manipulation. For related functionality, see also the time and calendar modules.

The import statement: you can use any Python source file as a module by executing an import statement in some other Python source file.

    from datetime import datetime

The dateutil parser module offers a generic date/time string parser which is able to parse most known formats to represent a date and/or time.

    from dateutil.parser import parse

pandas is an open source Python library for data analysis that needs to be imported.

    import pandas as pd

Create a string variable with the class start time

    fall_start = '08-21-2017'

Convert the string to datetime format

    datetime.strptime(fall_start, '%m-%d-%Y')
    # datetime.datetime(2017, 8, 21, 0, 0)

Create a list of strings as dates

    class_dates = [
        '8/25/2017',
        '9/1/2017',
        '9/8/2017',
        '9/15/2017',
        '9/22/2017',
        '9/29/2017']

Convert the class_dates strings into datetime format and save the list into variable a

    a = [datetime.strptime(x, '%m/%d/%Y') for x in class_dates]

Use parse() to attempt to auto-convert common string formats. The argument to the parser must be a string or character stream, not a list.

    parse(fall_start)  # datetime.datetime(2017, 8, 21, 0, 0)

Use parse() on every element of the class_dates list.

    [parse(x) for x in class_dates]
    # [datetime.datetime(2017, 8, 25, 0, 0),
    #  datetime.datetime(2017, 9, 1, 0, 0),
    #  datetime.datetime(2017, 9, 8, 0, 0),
    #  datetime.datetime(2017, 9, 15, 0, 0),
    #  datetime.datetime(2017, 9, 22, 0, 0),
    #  datetime.datetime(2017, 9, 29, 0, 0)]

Use parse, but designate that the day is first.

    parse(fall_start, dayfirst=True)
    # datetime.datetime(2017, 8, 21, 0, 0)

Create a dataframe. A DataFrame is a tabular data structure comprised of rows and columns, akin to a spreadsheet or database table. A DataFrame can be seen as a group of Series objects that share an index (the column names).

    import pandas as pd

    data = {
        'dates': [
            '8/25/2017 18:47:05.069722',
            '9/1/2017 18:47:05.119994',
            '9/8/2017 18:47:05.178768',
            '9/15/2017 18:47:05.230071',
            '9/22/2017 18:47:05.230071',
            '9/29/2017 18:47:05.280592'],
        'complete': [1, 0, 1, 1, 0, 1]}

    df = pd.DataFrame(
        data,
        columns=['dates', 'complete'])

    print(df)
    #                        dates  complete
    # 0  8/25/2017 18:47:05.069722         1
    # 1   9/1/2017 18:47:05.119994         0
    # 2   9/8/2017 18:47:05.178768         1
    # 3  9/15/2017 18:47:05.230071         1
    # 4  9/22/2017 18:47:05.230071         0
    # 5  9/29/2017 18:47:05.280592         1

Convert df['dates'] from string to datetime

    import pandas as pd

    pd.to_datetime(df['dates'])
    # 0   2017-08-25 18:47:05.069722
    # 1   2017-09-01 18:47:05.119994
    # 2   2017-09-08 18:47:05.178768
    # 3   2017-09-15 18:47:05.230071
    # 4   2017-09-22 18:47:05.230071
    # 5   2017-09-29 18:47:05.280592
    # Name: dates, dtype: datetime64[ns]

5.1.7 Control Statements

5.1.7.1 Comparison


Computer programs do not only execute instructions. Occasionally, a choice needs to be made. Such a choice is based on a condition. Python has several conditional operators:

    Operator    Function
    >           greater than
    <           smaller than
    ==          equals
    !=          not equal to

Conditions are usually combined with variables. A program can make a choice using the if keyword. For example:

    x = int(input("Guess x:"))
    if x == 4:
        print('Correct!')

In this example, Correct! will only be printed if the variable x equals four. Python can also check multiple conditions using the elif and else keywords.

    x = int(input("Guess x:"))
    if x == 4:
        print('Correct!')
    elif abs(4 - x) == 1:
        print('Wrong, but close!')
    else:
        print('Wrong, way off!')

5.1.7.2 Iteration

To repeat code, the for keyword can be used. For example, to display the numbers from 1 to 10, we could write something like this:

    for i in range(1, 11):
        print(i)

The second argument to range, 11, is not inclusive, meaning that the loop will only get to 10 before it finishes. Python itself starts counting from 0, so this code will also work:

    for i in range(0, 10):
        print(i + 1)
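As a small sketch of how range treats its bounds, the example below materializes both ways of counting from 1 to 10 as lists and compares them. It is an illustration of ours, with arbitrary variable names.

```python
one_to_ten = list(range(1, 11))          # the stop value 11 is excluded
shifted = [i + 1 for i in range(0, 10)]  # count from 0, then add 1

print(one_to_ten)
print(one_to_ten == shifted)
```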


In fact, the range function defaults to a starting value of 0, so it is equivalent to:

    for i in range(10):
        print(i + 1)

We can also nest loops inside each other:

    for i in range(0, 10):
        for j in range(0, 10):
            print(i, '', j)

In this case we have two nested loops. The code will iterate over the entire coordinate range (0,0) to (9,9)

5.1.8 Data types

5.1.8.1 Lists

see: https://www.tutorialspoint.com/python/python_lists.htm

Lists in Python are ordered sequences of elements, where each element can be accessed using a 0-based index.

To define a list, you simply list its elements between square brackets '[]':

    names = [
        'Albert',
        'Jane',
        'Liz',
        'John',
        'Abby']

    # access the first element of the list
    names[0]
    # 'Albert'

    # access the third element of the list
    names[2]
    # 'Liz'

You can also use a negative index if you want to start counting elements from the end of the list. Thus, the last element has index -1, the second before last element has index -2 and so on:

    # access the last element of the list
    names[-1]
    # 'Abby'

    # access the second last element of the list
    names[-2]
    # 'John'

Python also allows you to take whole slices of the list by specifying a beginning and end of the slice separated by a colon


    # the middle elements, excluding first and last
    names[1:-1]
    # ['Jane', 'Liz', 'John']

As you can see from the example, the starting index in the slice is inclusive and the ending one, exclusive.

Python provides a variety of methods for manipulating the members of a list.

You can add elements with 'append':

    names.append('Liz')
    names
    # ['Albert', 'Jane', 'Liz',
    #  'John', 'Abby', 'Liz']

As you can see, the elements in a list need not be unique.

Merge two lists with 'extend':

    names.extend(['Lindsay', 'Connor'])
    names
    # ['Albert', 'Jane', 'Liz', 'John',
    #  'Abby', 'Liz', 'Lindsay', 'Connor']

Find the index of the first occurrence of an element with 'index':

    names.index('Liz')  # 2

Remove elements by value with 'remove':

    names.remove('Abby')
    names
    # ['Albert', 'Jane', 'Liz', 'John',
    #  'Liz', 'Lindsay', 'Connor']

Remove elements by index with 'pop':

    names.pop(1)
    # 'Jane'
    names
    # ['Albert', 'Liz', 'John',
    #  'Liz', 'Lindsay', 'Connor']

Notice that pop returns the element being removed, while remove does not.

If you are familiar with stacks from other programming languages, you can use 'insert' and 'pop':

    names.insert(0, 'Lincoln')
    names
    # ['Lincoln', 'Albert', 'Liz',
    #  'John', 'Liz', 'Lindsay', 'Connor']
    names.pop()


    # 'Connor'
    names
    # ['Lincoln', 'Albert', 'Liz',
    #  'John', 'Liz', 'Lindsay']

The Python documentation contains a full list of list operations.

To go back to the range function you used earlier, it simply creates a sequence of numbers. In Python 2 this is a list; in Python 3 it is a range object that list() turns into a list:

    list(range(10))
    # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    list(range(2, 10, 2))
    # [2, 4, 6, 8]

5.1.8.2 Sets

Python lists can contain duplicates as you saw previously:

    names = ['Lincoln', 'Albert', 'Liz',
             'John', 'Liz', 'Lindsay']

When we do not want this to be the case, we can use a set:

    unique_names = set(names)
    unique_names
    # set(['Lincoln', 'John', 'Albert', 'Liz', 'Lindsay'])

Keep in mind that the set is an unordered collection of objects, thus we cannot access them by index:

    unique_names[0]
    # Traceback (most recent call last):
    #   File "<stdin>", line 1, in <module>
    # TypeError: 'set' object does not support indexing

However, we can convert a set to a list easily:

    unique_names = list(unique_names)
    unique_names
    # ['Lincoln', 'John', 'Albert', 'Liz', 'Lindsay']
    unique_names[0]
    # 'Lincoln'

Notice that in this case, the order of elements in the new list matches the order in which the elements were displayed when we created the set. We had set(['Lincoln', 'John', 'Albert', 'Liz', 'Lindsay'])

and now we have ['Lincoln', 'John', 'Albert', 'Liz', 'Lindsay']


You should not assume this is the case in general. That is, do not make any assumptions about the order of elements in a set when it is converted to any type of sequential data structure.

You can change a set's contents using the add, remove and update methods which correspond to the append, remove and extend methods in a list. In addition to these, set objects support the operations you may be familiar with from mathematical sets: union, intersection, difference, as well as operations to check containment. You can read about this in the Python documentation for sets.

5.1.8.3 Removal and Testing for Membership in Sets

One important advantage of a set over a list is that testing whether an element is a member is fast. If you are familiar with different data structures from a Computer Science class, the Python list is implemented by an array, while the set is implemented by a hash table.

We will demonstrate this with an example. Let us say we have a list and a set of the same number of elements (approximately 100 thousand):

    import sys, random, timeit
    nums_set = set([random.randint(0, sys.maxsize) for _ in range(10**5)])
    nums_list = list(nums_set)
    len(nums_set)
    # 100000

We will use the timeit Python module to time 100 operations that test for the existence of a member in either the list or set:

    timeit.timeit('random.randint(0, sys.maxsize) in nums',
                  setup='import random, sys; nums=%s' % str(nums_set), number=100)
    # 0.0004038810729980469

    timeit.timeit('random.randint(0, sys.maxsize) in nums',
                  setup='import random, sys; nums=%s' % str(nums_list), number=100)
    # 0.398054122924804

The exact duration of the operations on your system will be different, but the takeaway will be the same: searching for an element in a set is orders of magnitude faster than in a list. This is important to keep in mind when you work with large amounts of data.

5.1.8.4 Dictionaries


One of the very important data structures in python is a dictionary, also referred to as dict.

A dictionary represents a key value store:

    person = {
        'Name': 'Albert',
        'Age': 100,
        'Class': 'Scientist'
    }
    print("person['Name']:", person['Name'])
    # person['Name']: Albert
    print("person['Age']:", person['Age'])
    # person['Age']: 100

A convenient form to print by named attributes is

    print("{Name} {Age}".format(**person))

This form of printing with the format statement and a reference to data increases readability of the print statements.

You can delete elements with the following commands:

    del person['Name']  # remove entry with key 'Name'
    # person
    # {'Age': 100, 'Class': 'Scientist'}

    person.clear()  # remove all entries in dict
    # person
    # {}

    del person  # delete entire dictionary
    # person
    # Traceback (most recent call last):
    #   File "<stdin>", line 1, in <module>
    # NameError: name 'person' is not defined

You can iterate over a dict:

    person = {
        'Name': 'Albert',
        'Age': 100,
        'Class': 'Scientist'
    }
    for item in person:
        print(item, person[item])
    # Age 100
    # Name Albert
    # Class Scientist

5.1.8.5 Dictionary Keys and Values

You can retrieve both the keys and values of a dictionary using the keys() and values() methods of the dictionary, respectively:

    person.keys()    # ['Age', 'Name', 'Class']
    person.values()  # [100, 'Albert', 'Scientist']


Both methods return lists in Python 2 (in Python 3 they return view objects, which you can convert with list()). Notice, however, that the order in which the elements appear in the returned lists (Age, Name, Class) is different from the order in which we listed the elements when we declared the dictionary initially (Name, Age, Class). It is important to keep this in mind:

You cannot make any assumptions about the order in which the elements of a dictionary will be returned by the keys() and values() methods.

This holds for Python 2. Starting with Python 3.7, dictionaries preserve the order in which the elements were inserted.

However, you can assume that if you call keys() and values() in sequence, the order of elements will at least correspond in both methods. In the example Age corresponds to 100, Name to Albert, and Class to Scientist, and you will observe the same correspondence in general as long as keys() and values() are called one right after the other.

5.1.8.6 Counting with Dictionaries

One application of dictionaries that frequently comes up is counting the elements in a sequence. For example, say we have a sequence of coin flips:

    from __future__ import division
    import random

    coin_flips = [
        random.choice(['heads', 'tails']) for _ in range(10)
    ]
    # coin_flips
    # ['heads', 'tails', 'heads',
    #  'tails', 'heads', 'heads',
    #  'tails', 'heads', 'heads', 'heads']

The actual list coin_flips will likely be different when you execute this on your computer since the outcomes of the coin flips are random.

To compute the probabilities of heads and tails, we could count how many heads and tails we have in the list:

    counts = {'heads': 0, 'tails': 0}
    for outcome in coin_flips:
        assert outcome in counts
        counts[outcome] += 1

    print('Probability of heads: %.2f' % (counts['heads'] / len(coin_flips)))
    # Probability of heads: 0.70

    print('Probability of tails: %.2f' % (counts['tails'] / sum(counts.values())))
    # Probability of tails: 0.30

In addition to how we use the dictionary counts to count the elements of coin_flips, notice a couple of things about this example:

1. We used the assert outcome in counts statement. The assert statement in Python allows you to easily insert debugging statements in your code to help you discover errors more quickly. assert statements are executed whenever the internal Python __debug__ variable is set to True, which is always the case unless you start Python with the -O option which allows you to run optimized Python.

2. When we computed the probability of tails, we used the built-in sum function, which allowed us to quickly find the total number of coin flips. sum is one of many built-in functions you can read about in the Python documentation.

5.1.9 Functions

You can reuse code by putting it inside a function that you can call in other parts of your programs. Functions are also a good way of grouping code that logically belongs together in one coherent whole. A function has a unique name in the program. Once you call a function, it will execute its body which consists of one or more lines of code:

    def check_triangle(a, b, c):
        return \
            a < b + c and a > abs(b - c) and \
            b < a + c and b > abs(a - c) and \
            c < a + b and c > abs(a - b)

    print(check_triangle(4, 5, 6))

The def keyword tells Python we are defining a function. As part of the definition, we have the function name, check_triangle, and the parameters of the function, that is, variables that will be populated when the function is called.

We call the function with arguments 4, 5 and 6, which are passed in order into the parameters a, b and c. A function can be called several times with varying arguments. There is no limit to the number of function calls.

It is also possible to store the output of a function in a variable, so it can be reused.

    def check_triangle(a, b, c):
        return \
            a < b + c and a > abs(b - c) and \
            b < a + c and b > abs(a - c) and \
            c < a + b and c > abs(a - b)


result = check_triangle(4, 5, 6)
print(result)

5.1.10 Classes

A class is an encapsulation of data and the processes that work on them. The data is represented in member variables, and the processes are defined in the methods of the class (methods are functions inside the class). For example, let’s see how to define a Triangle class:

class Triangle(object):

    def __init__(self, length, width,
                 height, angle1, angle2, angle3):
        if not self._sides_ok(length, width, height):
            print('The sides of the triangle are invalid.')
        elif not self._angles_ok(angle1, angle2, angle3):
            print('The angles of the triangle are invalid.')

        self._length = length
        self._width = width
        self._height = height

        self._angle1 = angle1
        self._angle2 = angle2
        self._angle3 = angle3

    def _sides_ok(self, a, b, c):
        return \
            a < b + c and a > abs(b - c) and \
            b < a + c and b > abs(a - c) and \
            c < a + b and c > abs(a - b)

    def _angles_ok(self, a, b, c):
        return a + b + c == 180

triangle = Triangle(4, 5, 6, 35, 65, 80)

Python has full object-oriented programming (OOP) capabilities; however, we cannot cover all of them in this section, so if you need more information please refer to the Python docs on classes and OOP.

5.1.11 Modules

Now write this simple program and save it:

print("Hello world!")

As a check, make sure the file contains the expected contents on the command line:

$ cat hello.py
print("Hello world!")


To execute your program pass the file as a parameter to the python command:

$ python hello.py
Hello world!

Files in which Python code is stored are called modules. You can execute a Python module from the command line like you just did, or you can import it in other Python code using the import statement.

Let us write a more involved Python program that will receive as input the lengths of the three sides of a triangle, and will output whether they define a valid triangle. A triangle is valid if the length of each side is less than the sum of the lengths of the other two sides and greater than the difference of the lengths of the other two sides:

"""Usage: check_triangle.py [-h] LENGTH WIDTH HEIGHT

Check if a triangle is valid.

Arguments:
  LENGTH     The length of the triangle.
  WIDTH      The width of the triangle.
  HEIGHT     The height of the triangle.

Options:
  -h --help
"""
from docopt import docopt

if __name__ == '__main__':
    arguments = docopt(__doc__)
    # Parentheses are needed so the tuple can span several lines.
    a, b, c = (int(arguments['LENGTH']),
               int(arguments['WIDTH']),
               int(arguments['HEIGHT']))
    valid_triangle = \
        a < b + c and a > abs(b - c) and \
        b < a + c and b > abs(a - c) and \
        c < a + b and c > abs(a - b)
    print('Triangle with sides %d, %d and %d is valid: %r' % (
        a, b, c, valid_triangle
    ))

Assuming we save the program in a file called check_triangle.py, we can run it like so:

$ python check_triangle.py 4 5 6
Triangle with sides 4, 5 and 6 is valid: True

Let us break this down a bit.

1. We are importing the print_function and division modules from python 3 like we did earlier in this section. It’s a good idea to always include these in your programs.

2. We’ve defined a boolean expression that tells us if the sides that were input define a valid triangle. The result of the expression is stored in the


valid_triangle variable, which is True if all the conditions inside are true, and False otherwise.

3. We’ve used the backslash symbol \ to format our code nicely. The backslash simply indicates that the current line is being continued on the next line.

4. When we run the program, we do the check if __name__ == '__main__'. __name__ is an internal Python variable that allows us to tell whether the current file is being run from the command line (value '__main__'), or is being imported by a module (the value will be the name of the module). Thus, with this statement we’re just making sure the program is being run from the command line.

5. We are using the docopt module to handle command line arguments. The advantage of using this module is that it generates a usage help statement for the program and enforces command line arguments automatically. All of this is done by parsing the docstring at the top of the file.

6. In the print function, we are using Python’s string formatting capabilities to insert values into the string we are displaying.
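The __name__ check and the string formatting described in this list can also be tried without docopt. The following is a minimal sketch, assuming fixed sample values in place of command line arguments; the names is_valid_triangle and main are hypothetical and chosen only for this illustration. It uses parentheses instead of backslashes to continue lines, which is an alternative way to split a long expression:

```python
def is_valid_triangle(a, b, c):
    """Return True if sides a, b and c satisfy the triangle inequality."""
    return (a < b + c and a > abs(b - c) and
            b < a + c and b > abs(a - c) and
            c < a + b and c > abs(a - b))


def main():
    # Fixed sample values stand in for command line arguments.
    a, b, c = 4, 5, 6
    print('Triangle with sides %d, %d and %d is valid: %r' % (
        a, b, c, is_valid_triangle(a, b, c)))


if __name__ == '__main__':
    # Runs only when the file is executed directly, not when imported.
    main()
```

Putting the check into a function means another module can import the file and call is_valid_triangle without triggering the printout.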

5.1.12 Lambda Expressions

As opposed to normal functions in Python, which are defined using the def keyword, lambda functions in Python are anonymous functions which do not have a name and are defined using the lambda keyword. The generic syntax of a lambda function is of the form lambda arguments: expression, as shown in the following example:

greeter = lambda x: print('Hello %s!' % x)
greeter('Albert')

As you could probably guess, the result is:

Hello Albert!

Now consider the following examples:

power2 = lambda x: x ** 2

The power2 function defined in the expression is equivalent to the following definition:

def power2(x):
    return x ** 2
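The equivalence of the two forms can be checked directly. This is a small sketch with hypothetical names (square_lambda, square_def, words) that also shows a lambda written inline as a sort key:

```python
# Hypothetical names chosen for illustration.
square_lambda = lambda x: x ** 2


def square_def(x):
    return x ** 2


# Both callables produce the same results for the same inputs.
print([square_lambda(n) for n in range(5)])
print([square_def(n) for n in range(5)])

# A lambda is often written inline, for example as a sort key.
words = ['banana', 'fig', 'cherry']
print(sorted(words, key=lambda w: len(w)))
```

The only practical difference here is that the def form has a name of its own, which helps in tracebacks, while the lambda form is convenient when the function is used once.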


Lambda functions are useful when you need a function for a short period of time. Note that they can also be very useful when passed as an argument to other built-in functions that take a function as an argument, e.g. filter() and map(). In the next example we show how a lambda function can be combined with the filter function. Consider the array all_names, which contains five words that rhyme together. We want to filter the words that contain the word name. To achieve this, we pass the function lambda x: 'name' in x as the first argument. This lambda function returns True if the word name exists as a sub-string in the string x. The second argument of the filter function is the array of names, i.e. all_names.

all_names = ['surname', 'rename', 'nickname', 'acclaims', 'defame']
filtered_names = list(filter(lambda x: 'name' in x, all_names))
print(filtered_names)
# ['surname', 'rename', 'nickname']

As you can see, the names are successfully filtered as we expected.

In Python 3, the filter function returns a filter object, an iterator which gets lazily evaluated. This means we can neither access the elements of the filter object by index nor use len() to find the length of the filter object.

list_a = [1, 2, 3, 4, 5]
filter_obj = filter(lambda x: x % 2 == 0, list_a)
# Convert the filter object to a list
even_num = list(filter_obj)
print(even_num)
# Output: [2, 4]

In Python, we can have a small, usually single-line, anonymous function called a lambda function, which can have any number of arguments just like a normal function but with only one expression and no return statement. The result of this expression can be applied to a value.

Basic Syntax:

lambda arguments: expression

For an example: a function in Python

def multiply(a, b):
    return a * b

# call the function
multiply(3, 5)  # outputs: 15

The same function can be written as a lambda function. This function, named multiply, has 2 arguments and returns their product.


Lambda equivalent for this function would be:

multiply = lambda a, b: a * b
print(multiply(3, 5))
# outputs: 15

Here a and b are the 2 arguments and a*b is the expression whose value is returned as an output.

Also we don’t need to assign a lambda function to a variable.

(lambda a, b: a * b)(3, 5)

Lambda functions are mostly passed as a parameter to a function which expects a function object, like map or filter.

5.1.12.1 map

The basic syntax of the map function is

map(function_object, iterable1, iterable2, ...)

The map function expects a function object and any number of iterables, like a list or dictionary. It executes the function_object for each element in the sequence and returns the elements modified by the function object.

Example:

def multiply2(x):
    return x * 2

list(map(multiply2, [2, 4, 6, 8]))
# Output [4, 8, 12, 16]

If we want to write the same function using lambda

list(map(lambda x: x * 2, [2, 4, 6, 8]))
# Output [4, 8, 12, 16]

5.1.12.2 dictionary

Now, let's see how we can iterate over a dictionary using map and lambda. Let's say we have a dictionary object

dict_movies = [


    {'movie': 'avengers', 'comic': 'marvel'},
    {'movie': 'superman', 'comic': 'dc'}]

We can iterate over this dictionary and read its elements using map and lambda functions in the following way:

list(map(lambda x: x['movie'], dict_movies))  # Output: ['avengers', 'superman']
list(map(lambda x: x['comic'], dict_movies))  # Output: ['marvel', 'dc']
list(map(lambda x: x['movie'] == "avengers", dict_movies))
# Output: [True, False]

In Python 3, the map function returns a map object, an iterator which gets lazily evaluated. This means we can neither access the elements of the map object by index nor use len() to find the length of the map object. We can force the conversion of the map output, i.e. the map object, to a list as shown next:

map_output = map(lambda x: x * 2, [1, 2, 3, 4])
print(map_output)
# Output: map object: <map object at 0x04D6BAB0>
list_map_output = list(map_output)
print(list_map_output)  # Output: [2, 4, 6, 8]

5.1.13 Iterators

In Python, the iterator protocol is defined using two methods: __iter__() and __next__(). The former returns the iterator object and the latter returns the next element of a sequence. Some advantages of iterators are as follows:

Readability
Supports sequences of infinite length
Saving resources

There are several built-in objects in Python which support the iterator protocol, e.g. string, list, dictionary. In the following example, we create a new class that follows the iterator protocol. We then use the class to generate log2 of numbers:

from math import log2

class LogTwo:
    "Implements an iterator of log two"

    def __init__(self, last=0):
        self.last = last

    def __iter__(self):
        self.current_num = 1
        return self

    def __next__(self):
        if self.current_num <= self.last:
            result = log2(self.current_num)
            self.current_num += 1
            return result


        else:
            raise StopIteration

L = LogTwo(5)
i = iter(L)
print(next(i))
print(next(i))
print(next(i))
print(next(i))

As you can see, we first create an instance of the class and assign the result of its __iter__() function to a variable called i. Then by calling the next() function four times, we get the following output:

$ python iterator.py
0.0
1.0
1.584962500721156
2.0

As you probably noticed, the lines are log2() of 1, 2, 3, 4 respectively.

5.1.14 Generators

Before we go to generators, please make sure you understand iterators. Generators are also iterators, but they can only be iterated over once. That is because generators do not store the values in memory; instead they generate the values on the go. If we want to print those values then we can either simply step through them with next() or use a for loop.

5.1.14.1 Generators with function

For example: we have a function named multiplyBy10 which returns all the input numbers multiplied by 10.

def multiplyBy10(numbers):
    result = []
    for i in numbers:
        result.append(i * 10)
    return result

new_numbers = multiplyBy10([1, 2, 3, 4, 5])
print(new_numbers)  # Output: [10, 20, 30, 40, 50]

Now, if we want to use generators here then we will make the following changes.

def multiplyBy10(numbers):
    for i in numbers:
        yield (i * 10)

new_numbers = multiplyBy10([1, 2, 3, 4, 5])


print(new_numbers)  # Output: generator object
print(next(new_numbers))  # Output: 10

In generators, we use the yield keyword in place of return. So when we try to print new_numbers now, it just prints a generator object. The reason for this is that generators do not hold any values in memory; a generator yields one result at a time. So essentially it is just waiting for us to ask for the next result. To print the next result we can just say print(next(new_numbers)). The way it works is that it reads the first value, multiplies it by 10 and yields the value 10. In this case we can call print(next(new_numbers)) 5 times to print all numbers, and if we do it a 6th time we will get a StopIteration error, which means the generator has exhausted its values and has no 6th element to print.

5.1.14.2 Generators using for loop

If we now want to print the complete list of multiplied values then we can just do:

def multiplyBy10(numbers):
    for i in numbers:
        yield (i * 10)

new_numbers = multiplyBy10([1, 2, 3, 4, 5])

for num in new_numbers:
    print(num)

The output will be:

10
20
30
40
50

5.1.14.3 Generators with List Comprehension

Python has something called list comprehension. If we use this then we can replace the complete function def with just:

new_numbers = [x * 10 for x in [1, 2, 3, 4, 5]]
print(new_numbers)  # Output: [10, 20, 30, 40, 50]

Here the point to note is that the square brackets [] in line 1 are very important. If we change them to () then we will again get a generator object.

new_numbers = (x * 10 for x in [1, 2, 3, 4, 5])
print(new_numbers)  # Output: generator object


We can get the individual elements again from the generator if we do a for loop over new_numbers like we did previously. Alternatively, we can convert it into a list and then print it.

new_numbers = (x * 10 for x in [1, 2, 3, 4, 5])
print(list(new_numbers))  # Output: [10, 20, 30, 40, 50]

But if we convert this into a list then we lose on performance, which we will see next.

5.1.14.4 Why to use Generators?

Generators are better for performance because they do not hold the values in memory. With the small examples we provide here this is not a big deal, since we are dealing with a small amount of data. But consider a scenario where the data set has millions of records. If we try to convert millions of data elements into a list, that will definitely have an impact on memory and performance because everything will be held in memory.

Let's see an example of how generators help with performance. First, without generators, a normal function takes 1 million records and returns the result (people) for 1 million.

import random
import time

# mem_profile is assumed to be a separate helper module for measuring
# memory; it is not part of the standard library.
import mem_profile

names = ['John', 'Jack', 'Adam', 'Steve', 'Rick']
majors = ['Math',
          'CompScience',
          'Arts',
          'Business',
          'Economics']

# prints the memory before we run the function
memory = mem_profile.memory_usage_resource()
print(f'Memory (Before): {memory}Mb')

def people_list(people):
    result = []
    for i in range(people):
        person = {
            'id': i,
            'name': random.choice(names),
            'major': random.choice(majors)
        }
        result.append(person)
    return result

t1 = time.perf_counter()
people = people_list(1000000)
t2 = time.perf_counter()

# prints the memory after we run the function
memory = mem_profile.memory_usage_resource()
print(f'Memory (After): {memory}Mb')
print('Took {time} seconds'.format(time=t2 - t1))

# Output
Memory (Before): 15Mb


Memory (After): 318Mb
Took 1.2 seconds

These are just approximate values to compare with the next execution, but if you run it you will see a serious consumption of memory and a good amount of time taken.

Now, after running the same code using generators, we will see a significant performance boost, with almost 0 seconds taken. The reason behind this is that in the case of generators we do not keep anything in memory, so the system just reads one item at a time and yields it.

import random
import time

# mem_profile is assumed to be a separate helper module for measuring
# memory; it is not part of the standard library.
import mem_profile

names = ['John', 'Jack', 'Adam', 'Steve', 'Rick']
majors = ['Math',
          'CompScience',
          'Arts',
          'Business',
          'Economics']

# prints the memory before we run the function
memory = mem_profile.memory_usage_resource()
print(f'Memory (Before): {memory}Mb')

def people_generator(people):
    for i in range(people):
        person = {
            'id': i,
            'name': random.choice(names),
            'major': random.choice(majors)
        }
        yield person

t1 = time.perf_counter()
people = people_generator(1000000)
t2 = time.perf_counter()

# prints the memory after we run the function
memory = mem_profile.memory_usage_resource()
print(f'Memory (After): {memory}Mb')
print('Took {time} seconds'.format(time=t2 - t1))

# Output
Memory (Before): 15Mb
Memory (After): 15Mb
Took 0.01 seconds
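The memory difference can also be observed using only the standard library. The following is a minimal sketch, assuming a hypothetical element count n and using sys.getsizeof in place of the mem_profile helper from the example above; the names squares_list and squares_gen are made up for the illustration:

```python
import sys

# Hypothetical size, chosen only to keep the run short.
n = 100000

squares_list = [x * x for x in range(n)]   # all values stored at once
squares_gen = (x * x for x in range(n))    # values produced on demand

# sys.getsizeof reports the size of the container object itself,
# not of the elements it refers to.
print('list object size:      %d bytes' % sys.getsizeof(squares_list))
print('generator object size: %d bytes' % sys.getsizeof(squares_gen))

# Both yield the same values; the generator can be consumed only once.
print(sum(squares_list) == sum(squares_gen))
print(sum(squares_gen))  # already exhausted, so the sum is 0
```

The list grows with n, while the generator object stays small regardless of how many values it will eventually produce. The price is that the values are gone once they have been consumed.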


6 CLOUDMESH

6.1 INTRODUCTION ☁

Learning Objectives

Introduction to the cloudmesh API
Using cmd5 via cms
Introduction to cloudmesh convenience API for output, dotdict, shell, stopwatch, benchmark management
Creating your own cms commands
Cloudmesh configuration file
Cloudmesh inventory

In this chapter we would like to introduce you to cloudmesh, which provides you with a number of convenient methods to interface with the local system, but also with cloud services. We will start by focusing on some simple APIs and then gradually introduce the cloudmesh shell, which not only provides a shell, but also a command line interface so you can use cloudmesh from a terminal. This dual ability is quite useful as we can write cloudmesh scripts, but can also invoke the functionality from the terminal. This is quite an important distinction from other tools that only allow command line interfaces.

Moreover, we also show you that it is easy to create new commands and add them dynamically to the cloudmesh shell via simple pip installs.

Cloudmesh is an evolving project and you have the opportunity to improve it if you see some features missing.

The manual of cloudmesh can be found at

https://cloudmesh.github.io/cloudmesh-manual

The API documentation is located at


https://cloudmesh.github.io/cloudmesh-manual/api/index.html#cloudmesh-api

We will initially focus on a subset of this functionality.

6.2 INSTALLATION ☁

The installation of cloudmesh is simple and can technically be done via pip by a user. However, you are not a user, you are a developer. Cloudmesh is distributed in different topical repositories, and in order for developers to easily interact with them we have written a convenient cloudmesh-installer program.

As a developer you must also use a python virtual environment to avoid affecting your system-wide python installation. This can be achieved by using Python 3 from python.org or via conda. We do recommend that you use python.org as this is the vanilla python that most developers in the world use. Conda is often used by users of python if they do not need to use bleeding-edge but older prepackaged python tools and libraries.

6.2.1 Prerequisite

We require you to create a python virtual environment and activate it. How to do this was discussed in Section 3.1. Please create the ENV3 environment. Please activate it.

6.2.2 Basic Install

Cloudmesh can install a number of bundles for developers. A bundle is a set of git repositories that are needed for a particular install. For us, we are mostly interested in the bundles cms, cloud, storage. We will introduce you to other bundles throughout this documentation.

If you would like to find out more about the details of this you can look at cloudmesh-installer, which will be regularly updated.

To make use of the bundle and the easy installation for developers please install the cloudmesh-installer via pip, but make sure you do this in a python virtualenv


as discussed previously. If not, you may impact your system negatively. Please note that we are not responsible for fixing your computer. Naturally, you can also use a virtual machine, if you prefer. It is also important that we create a uniform development environment. In our case we create an empty directory called cm in which we place the bundle.

$ mkdir cm
$ cd cm
$ pip install cloudmesh-installer

To see the bundle you can use

$ cloudmesh-installer bundles

We will start with the basic cloudmesh functionality at this time and only install the shell and some common APIs.

$ cloudmesh-installer git clone cms
$ cloudmesh-installer install cms -e

These commands download and install cloudmesh shell into your environment. It is important that you use the -e flag.

To see if it works you can use the command

$ cms help

You will see an output. If this does not work for you, and you cannot figure out the issue, please contact us so we can identify what went wrong.

For more information, please visit our Installation Instructions for Developers

6.3 OUTPUT ☁

Cloudmesh provides a number of convenient APIs to make output easier or more fanciful.

These APIs include

Console
Banner
Heading


VERBOSE

6.3.1 Console

Print is the usual function to output to the terminal. However, often we like to have colored output that helps us in the notification to the user. For this reason we have a simple Console class that has several built-in features. You can even switch and define your own color schemes.

from cloudmesh.common.console import Console

msg = "my message"
Console.ok(msg)     # prints a green message
Console.error(msg)  # prints a red message preceded by ERROR
Console.msg(msg)    # prints a regular black message

In case of the error message we also have convenient flags that allow us to include the traceback in the output.

Console.error(msg, prefix=True, traceflag=True)

The prefix can be switched on and off with the prefix flag, while the traceflag switches on and off if the trace should be set.

The verbosity of the output is controlled via variables that are stored in the ~/.cloudmesh directory.

from cloudmesh.common.variables import Variables

variables = Variables()
variables['debug'] = True
variables['trace'] = True
variables['verbose'] = 10

For more features, see API: Console

6.3.2 Banner

In case you need a banner you can do this with

from cloudmesh.common.util import banner

banner("my text")

For more features, see API: Banner
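To give an idea of what a banner is without installing cloudmesh, here is a plain Python sketch. It is an assumption about the general idea only: the function simple_banner and its parameters char and width are hypothetical and do not reproduce the cloudmesh banner function or its exact output format.

```python
def simple_banner(text, char='#', width=70):
    """Return text framed by two lines of repeated characters.

    Hypothetical helper for illustration; not the cloudmesh function.
    """
    line = char * width
    return '\n'.join([line, '%s %s' % (char, text), line])


print(simple_banner('my text'))
```

Framing a message this way makes it easy to spot in long terminal output, which is the purpose of a banner in logs and test runs.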


6.3.3 Heading

A particularly useful function is HEADING(), which prints the method name.

from cloudmesh.common.util import HEADING

class Example(object):

    def doit(self):
        HEADING()
        print("Hello")

The invocation of the HEADING() function in doit prints a banner with the name information. The reason we did not do it as a decorator is that you can place the HEADING() function in an arbitrary location of the method body.

For more features, see API: Heading

6.3.4 VERBOSE

VERBOSE is a very useful method allowing you to print a dictionary. Not only will it print the dict, but it will also provide you with the information in which file it is used and on which line number. It will even print the name of the dict that you use in your code.

To use this you will have to enable the debugging methods for cloudmesh as discussed in Section 6.3.1

from cloudmesh.common.debug import VERBOSE

m = {"key": "value"}
VERBOSE(m)

For more features, please see VERBOSE

6.3.5 Using print and pprint

In many cases it may be sufficient to use print and pprint for debugging. However, as the code is big and you may forget where you placed print statements, or the print statements may have been added by others, we recommend that you use the VERBOSE function. If you use print or pprint we recommend using a unique prefix, such as:

from pprint import pprint


d = {"sample": "value"}
print("MYDEBUG:")
pprint(d)

# or with print
print("MYDEBUG:", d)

6.4 DICTIONARIES ☁

6.4.1 Dotdict

For simple dictionaries we sometimes like to simplify the notation with a . instead of using the []:

You can achieve this with dotdict

from cloudmesh.common.dotdict import dotdict

data = {
    "name": "Gregor"
}

data = dotdict(data)

Now you can either call

data["name"]

or

data.name

This is especially useful in if conditions as it may be easier to read and write

if data.name == "Gregor":
    print("this is quite readable")

and is the same as

if data["name"] == "Gregor":
    print("this is quite readable")

For more features, see API: dotdict

6.4.2 FlatDict


In some cases it is useful to be able to flatten out dictionaries that contain dicts within dicts. For this we can use FlatDict.

from cloudmesh.common.Flatdict import FlatDict

data = {
    "name": "Gregor",
    "address": {
        "city": "Bloomington",
        "state": "IN"
    }
}

flat = FlatDict(data, sep=".")

This will be converted to a dict with the following structure.

flat = {
    "name": "Gregor",
    "address.city": "Bloomington",
    "address.state": "IN"
}

With sep you can change the separator between the nested dict attributes. For more features, see API: dotdict

6.4.3 Printing Dicts

In case we want to print dicts and lists of dicts in various formats, we have included a simple Printer that can print a dict in yaml, json, table, and csv format.

The function can even guess from the passed parameters what the input format is and uses the appropriate internal function.

A common example is

from pprint import pprint
from cloudmesh.common.Printer import Printer

data = [
    {
        "name": "Gregor",
        "address": {
            "street": "Funny Lane 11",
            "city": "Cloudville"
        }
    },
    {
        "name": "Albert",
        "address": {
            "street": "Memory Lane 1901",
            "city": "Cloudnine"
        }
    }
]


pprint(data)

table = Printer.flatwrite(data,
                          sort_keys=["name"],
                          order=["name", "address.street", "address.city"],
                          header=["Name", "Street", "City"],
                          output='table')

print(table)

For more features, see API: Printer

More examples are available in the source code as tests

6.5 SHELL ☁

Python provides a sophisticated method for starting background processes. However, in many cases it is quite complex to interact with it. It also does not provide convenient wrappers that we can use to start them in a pythonic fashion. For this reason we have written a primitive Shell class that provides just enough functionality to be useful in many cases.

Let us review some examples where result is set to the output of the command being executed.

from cloudmesh.common.Shell import Shell

result = Shell.execute('pwd')
print(result)

result = Shell.execute('ls', ["-l", "-a"])
print(result)

result = Shell.execute('ls', "-l -a")
print(result)

For many common commands, we provide built-in functions. For example:

result = Shell.ls("-aux")
print(result)

result = Shell.ls("-a", "-u", "-x")
print(result)

result = Shell.pwd()
print(result)

The list includes the following functions. Naturally, the commands must be available on your OS. If the shell command is not available on your OS, please help us improve the code to either provide functions that work on your OS or develop with us platform-independent functionality for a subset of the functionality of the shell command


that we may benefit from.

VBoxManage(cls,*args)

bash(cls,*args)

blockdiag(cls,*args)

brew(cls,*args)

cat(cls,*args)

check_output(cls,*args,**kwargs)

check_python(cls)

cm(cls,*args)

cms(cls,*args)

command_exists(cls,name)

dialog(cls,*args)

edit(filename)

execute(cls,*args)

fgrep(cls,*args)

find_cygwin_executables(cls)

find_lines_with(cls,lines,what)

get_python(cls)

git(cls,*args)

grep(cls,*args)

head(cls,*args)

install(cls,name)


keystone(cls,*args)

kill(cls,*args)

live(cls,command,cwd=None)

ls(cls,*args)

mkdir(cls,directory)

mongod(cls,*args)

nosetests(cls,*args)

nova(cls,*args)

operating_system(cls)

pandoc(cls,*args)

ping(cls,host=None,count=1)

pip(cls,*args)

ps(cls,*args)


pwd(cls,*args)

rackdiag(cls,*args)

remove_line_with(cls,lines,what)

rm(cls,*args)

rsync(cls,*args)

scp(cls,*args)

sh(cls,*args)

sort(cls,*args)

ssh(cls,*args)

sudo(cls,*args)

tail(cls,*args)

terminal(cls,command='pwd')

terminal_type(cls)

unzip(cls,source_filename,dest_dir)

vagrant(cls,*args)

version(cls,name)

which(cls,command)

For more features, please see Shell

6.6 STOPWATCH ☁

Often you find yourself in a situation where you would like to measure the time between two events. We provide a simple StopWatch that allows you not only to measure a number of times, but also to print them out in a convenient format.

from cloudmesh.common.StopWatch import StopWatch
from time import sleep

StopWatch.start("test")
sleep(1)
StopWatch.stop("test")

print(StopWatch.get("test"))

To print them, you can also use:

StopWatch.benchmark()

For more features, please see StopWatch
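The idea behind named timers can be sketched with the standard library alone. The class below is a hypothetical stand-in written for this illustration (SimpleStopWatch and its attributes are assumptions); it is not the cloudmesh StopWatch and offers no benchmark report:

```python
import time


class SimpleStopWatch:
    """Hypothetical named-timer sketch; not the cloudmesh StopWatch."""

    def __init__(self):
        self._start = {}
        self._elapsed = {}

    def start(self, name):
        # perf_counter is a monotonic clock suited to measuring intervals.
        self._start[name] = time.perf_counter()

    def stop(self, name):
        self._elapsed[name] = time.perf_counter() - self._start[name]

    def get(self, name):
        return self._elapsed[name]


watch = SimpleStopWatch()
watch.start("test")
time.sleep(0.1)
watch.stop("test")
print("test took %.2f seconds" % watch.get("test"))
```

Keeping timers in dictionaries keyed by name is what allows several measurements to run side by side and be reported together afterwards.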


6.7 CLOUDMESH COMMAND SHELL ☁

6.7.1 CMD5

Python’s CMD (https://docs.python.org/2/library/cmd.html) is a very useful package to create command line shells. However, it does not allow the dynamic integration of newly defined commands. Furthermore, additions to CMD need to be done within the same source tree. To simplify developing commands by a number of people and to have a dynamic plugin mechanism, we developed cmd5. It is a rewrite of our earlier efforts in cloudmesh client and cmd3.

6.7.1.1 Resources

The source code for cmd5 is located in github:

https://github.com/cloudmesh/cmd5

We have discussed in Section 6.2 how to install cloudmesh as a developer and have access to the source code in a directory called cm. As you read this document we assume you are a developer and can skip the next section.

6.7.1.2 Installation from source

WARNING: DO NOT EXECUTE THIS IF YOU ARE A DEVELOPER OR YOUR ENVIRONMENT WILL NOT PROPERLY WORK.

However, if you are a user of cloudmesh you can install it with

$ pip install cloudmesh-cmd5

6.7.1.3 Execution

To run the shell you can activate it with the cms command. cms stands for cloudmesh shell:

(ENV2) $ cms

It will print the banner and enter the shell:


+-------------------------------------------------------+
|   ____ _                 _                     _      |
|  / ___| | ___  _   _  __| |_ __ ___   ___  ___| |__   |
| | |   | |/ _ \| | | |/ _` | '_ ` _ \ / _ \/ __| '_ \  |
| | |___| | (_) | |_| | (_| | | | | | |  __/\__ \ | | | |
|  \____|_|\___/ \__,_|\__,_|_| |_| |_|\___||___/_| |_| |
+-------------------------------------------------------+
|                 Cloudmesh CMD5 Shell                  |
+-------------------------------------------------------+

cms>

To see the list of commands you can say:

cms> help

To see the manual page for a specific command, please use:

help COMMANDNAME

6.7.1.4 Create your own Extension

One of the most important features of CMD5 is its ability to extend it with new commands. This is done via packaged namespaces. We recommend you name it cloudmesh-mycommand, where mycommand is the name of the command that you would like to create. This can easily be done using the sys cloudmesh command (we suggest you use a different name than gregor, maybe your first name):

$ cms sys command generate gregor

It will download a template from cloudmesh called cloudmesh-bar and generate a new directory cloudmesh-gregor with all the needed files to create your own command and register it dynamically with cloudmesh. All you have to do is to cd into the directory and install the code:

$ cd cloudmesh-gregor
$ python setup.py install
# pip install .

Adding your own command is easy. It is important that all objects are defined in the command itself and that no global variables be used in order to allow each shell command to stand alone. Naturally you should develop API libraries outside of the cloudmesh shell command and reuse them in order to keep the command code as small as possible. We place the command in:

cloudmesh/mycommand/command/gregor.py


Now you can go ahead and modify your command in that directory. It will look similar to (if you used the command name gregor):

from __future__ import print_function
from cloudmesh.shell.command import command
from cloudmesh.shell.command import PluginCommand

class GregorCommand(PluginCommand):

    @command
    def do_gregor(self, args, arguments):
        """
        ::

          Usage:
                gregor -f FILE
                gregor list

          This command does some useful things.

          Arguments:
              FILE   a file name

          Options:
              -f      specify the file

        """
        print(arguments)
        if arguments.FILE:
            print("You have used file: ", arguments.FILE)
        return ""

An important difference from other CMD solutions is that our commands can leverage (besides the standard definition) docopts as a way to define the manual page. This allows us to use arguments as a dict and use simple if conditions to interpret the command. Using docopts has the advantage that contributors are forced to think about the command and its options and document them from the start. Previously we did not use docopts but argparse and click. However, we noticed that for our contributors both systems led to commands that were either not properly documented, or the developers delivered ambiguous commands that resulted in confusion and wrong usage by subsequent users. Hence, we do recommend that you use docopts for documenting cmd5 commands. The transformation is enabled by the @command decorator, which generates a manual page and creates a proper help message for the shell automatically. Thus there is no need to introduce a separate help method as would normally be needed in CMD, while reducing the effort it takes to contribute new commands in a dynamic fashion.

6.7.1.5 Bug: Quotes

We have one bug in cmd5 that relates to the use of quotes on the command line

For example you need to say

$ cms gregor -f \"file name with spaces\"

If you would like to help us fix this, that would be great. It requires the use of shlex. Unfortunately, we have not yet had the time to fix this “feature”.
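To see why shlex is relevant here, the following minimal sketch compares a plain whitespace split with shlex.split from the Python standard library. The input line is a made-up example and the snippet is not part of cmd5; it only illustrates the quoting behavior that a fix would presumably rely on.

```python
import shlex

# Hypothetical raw input line, as a user might type it at the cms> prompt.
line = 'gregor -f "file name with spaces"'

# A naive whitespace split tears the quoted file name into several tokens.
naive = line.split()

# shlex.split follows POSIX shell quoting rules and keeps it as one argument.
tokens = shlex.split(line)

print(naive)
print(tokens)
```

With shlex.split the file name arrives as a single argument, so the backslash-escaped quotes shown above would no longer be needed.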

6.8 EXERCISES ☁

When doing your assignment, make sure you label the programs appropriately with comments that clearly identify the assignment. Place all assignments in a folder on GitHub named “cloudmesh-exercises”.

For example, name the program solving E.Cloudmesh.Common.1 e-cloudmesh-1.py and so on. For more complex assignments you can name them as you like, as long as the file has a comment such as

    # fa19-516-000 E.Cloudmesh.Common.1

at the beginning of the file. Please do not store any screenshots of your working program in your git repository.

6.8.1 Cloudmesh Common

E.Cloudmesh.Common.1

Develop a program that demonstrates the use of banner, HEADING, and VERBOSE.

E.Cloudmesh.Common.2

Develop a program that demonstrates the use of dotdict.

E.Cloudmesh.Common.3

Develop a program that demonstrates the use of FlatDict.

E.Cloudmesh.Common.4

Develop a program that demonstrates the use of cloudmesh.common.Shell.

E.Cloudmesh.Common.5

Develop a program that demonstrates the use of cloudmesh.common.StopWatch.

6.8.2 Cloudmesh Shell

E.Cloudmesh.Shell.1

Install cmd5 and the command cms on your computer.

E.Cloudmesh.Shell.2

Write a new command with your first name as the command name.

E.Cloudmesh.Shell.3

Write a new command and experiment with docopt syntax and argument interpretation of the dict with if conditions.

E.Cloudmesh.Shell.4

If you have useful extensions that you would like us to add by default, please work with us.

E.Cloudmesh.Shell.5

At this time one needs to quote the " in some commands on the shell command line. Develop and test code that fixes this.

7 LIBRARIES

7.1 PYTHON MODULES ☁

Often you may need functionality that is not present in Python’s standard library. In this case you have two options:

- implement the features yourself
- use a third-party library that has the desired features.

Often you can find a previous implementation of what you need. Since this is a common situation, there is a service supporting it: the Python Package Index (or PyPI for short).

Our task here is to install the autopep8 tool from PyPI. This will allow us to illustrate the use of virtual environments using the pyenv or virtualenv command, and installing and uninstalling PyPI packages using pip.

7.1.1 Updating Pip

It is important that you have the newest version of pip installed for your version of python. Let us assume your python is registered with python and you use pyenv, then you can update pip with

    pip install -U pip

without interfering with a potential system-wide installed version of pip that may be needed by the system default version of python. See the section about pyenv for more details.

7.1.2 Using pip to Install Packages

Let us now look at another important tool for Python development: the Python Package Index, or PyPI for short. PyPI provides a large set of third-party python packages. If you want to do something in python, first check PyPI, as odds are someone already ran into the problem and created a package solving it.

In order to install a package from PyPI, use the pip command. We can search PyPI for packages:

    $ pip search --trusted-host pypi.python.org autopep8 pylint

It appears that the top two results are what we want, so install them:

    $ pip install --trusted-host pypi.python.org autopep8 pylint

This will cause pip to download the packages from PyPI, extract them, check their dependencies and install those as needed, then install the requested packages.

You can skip the ‘--trusted-host pypi.python.org’ option if you have patched urllib3 on Python 2.7.9.

7.1.3 GUI

7.1.3.1 GUIZero

Install guizero with the following command:

    sudo pip install guizero

For a comprehensive tutorial on guizero, click here.

7.1.3.2 Kivy

You can install Kivy on macOS as follows:

    brew install pkg-config sdl2 sdl2_image sdl2_ttf sdl2_mixer gstreamer
    pip install -U Cython
    pip install kivy
    pip install pygame

A hello world program for kivy is included in the cloudmesh.robot repository, which you can find here:

https://github.com/cloudmesh/cloudmesh.robot/tree/master/projects/kivy

To run the program, please download it or execute it in cloudmesh.robot as follows:

    cd cloudmesh.robot/projects/kivy
    python swim.py

To create stand-alone packages with kivy, please see:

- https://kivy.org/docs/guide/packaging-osx.html

7.1.4 Formatting and Checking Python Code

First, get the bad code:

    $ wget --no-check-certificate http://git.io/pXqb -O bad_code_example.py

Examine the code:

    $ emacs bad_code_example.py

As you can see, this is very dense and hard to read. Cleaning it up by hand would be a time-consuming and error-prone process. Luckily, this is a common problem, so there exist a couple of packages to help in this situation.

7.1.5 Using autopep8

We can now run the bad code through autopep8 to fix formatting problems:

    $ autopep8 bad_code_example.py > code_example_autopep8.py

Let us look at the result. This is considerably better than before. It is easy to tell what the example1 and example2 functions are doing.

It is a good idea to develop a habit of using autopep8 in your python-development workflow. For instance: use autopep8 to check a file, and if it passes, make any changes in place using the -i flag:

    $ autopep8 file.py     # check output to see if it passes
    $ autopep8 -i file.py  # update in place

If you use PyCharm you have the ability to use a similar function by pressing on Inspect Code.

7.1.6 Writing Python 3 Compatible Code

To write python 2 and 3 compatible code we recommend that you take a look at: http://python-future.org/compatible_idioms.html
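As a small illustration of such an idiom, the following sketch imports a module that was renamed between Python 2 and 3 by trying the Python 3 location first. The helper name host_of is our own and only serves the example; the linked page lists many more idioms.

```python
# Renamed standard-library modules: try the Python 3 location first,
# then fall back to the Python 2 name.
try:
    from urllib.parse import urlparse   # Python 3
except ImportError:
    from urlparse import urlparse       # Python 2


def host_of(url):
    """Return the host part of a URL on either interpreter."""
    return urlparse(url).netloc


print(host_of("http://python-future.org/compatible_idioms.html"))
```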

7.1.7 Using Python on FutureSystems

This is only important if you use FutureSystems resources.

In order to use Python you must log in to your FutureSystems account. Then at the shell prompt execute the following command:

    $ module load python

This will make the python and virtualenv commands available to you.

The details of what the module load command does are described in the future lesson modules.

7.1.8 Ecosystem

7.1.8.1 pypi

The Python Package Index (pypi) is a large repository of software for the Python programming language containing a large number of packages. The nice thing about pypi is that many packages can be installed with the program ‘pip’.

To do so you have to locate the <package_name>, for example with the search function in pypi, and say on the command line:

    $ pip install <package_name>

where package_name is the string name of the package. An example would be the package called cloudmesh_client, which you can install with:

    $ pip install cloudmesh_client

If all goes well the package will be installed.

7.1.8.2 Alternative Installations

The basic installation of python is provided by python.org. However, others claim to have alternative environments that allow you to install python. This includes

- Canopy
- Anaconda
- IronPython

Typically they include not only the python compiler but also several useful packages. It is fine to use such environments for the class, but it should be noted that in these cases not every python library may be available for install in the given environment. For example, if you need to use cloudmesh client, it may not be available as a conda or Canopy package. This is also the case for many other cloud-related and useful python libraries. Hence, if you are new to python, we recommend that you use the distribution from python.org, and use pip and virtualenv.

Additionally, some python versions have platform-specific libraries or dependencies. Cocoa libraries, .NET, or other frameworks are examples. For the assignments and the projects such platform-dependent libraries are not to be used.

If, however, you can write platform-independent code that works on Linux, macOS, and Windows while using the python.org version, but develop it with any of the other tools, that is just fine. However, it is up to you to guarantee that this independence is maintained and implemented. You do have to write requirements.txt files that will install the necessary python libraries in a platform-independent fashion. The homework assignment PRG1 even has a requirement to do so.

In order to provide platform independence we have given in the class a minimal python version that we have tested with hundreds of students: python.org. If you use any other version, that is your decision. Additionally, some students not only use python.org but have used iPython, which is fine too. However, this class is not only about python, but also about how to have your code run on any platform. The homework is designed so that you can identify a setup that works for you.

However, we have concerns if you, for example, wanted to use chameleon cloud, which we require you to access with cloudmesh. cloudmesh is not available as a conda, canopy, or other framework package. Cloudmesh client is available from pypi, which is standard and should be supported by the frameworks. We have not tested cloudmesh on any other python version than python.org, which is the open source community standard. None of the other versions are standard.

In fact, we had students over the summer using canopy on their machines, and they got confused as they now had multiple python versions and did not know how to switch between them and activate the correct version. Certainly, if you know how to do that, then feel free to use canopy; all this is up to you. However, the homework and project require you to make your program portable to python.org. If you know how to do that even if you use canopy, anaconda, or any other python version, that is fine. Graders will test your programs on a python.org installation while using virtualenv, and not on canopy, anaconda, or ironpython. It is obvious why. If you do not know the answer, you may want to consider that every time they test a program they need to create a new virtualenv and run vanilla python in it. If we were to run two installs in the same system, this would not work, as we do not know if one student will cause a side effect for another. Thus we as instructors do not just have to look at your code but at the code of hundreds of students with different setups. This is a non-scalable solution, as every time we test out code from a student we would have to wipe out the OS, install it new, install a new version of whatever python you have elected, become familiar with that version, and so on. This is the reason why the open source community is using python.org. We follow best practices. Using other versions is not a community best practice, but may work for an individual.
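The idea of a fresh environment per test run can be sketched with the venv module from the Python 3 standard library, which we use here as a stand-in for the virtualenv command. The function name fresh_environment and the directory prefix are assumptions made for this illustration; this is not the graders' actual tooling.

```python
import os
import tempfile
import venv


def fresh_environment(prefix="submission-"):
    """Create an empty virtual environment in a new temporary directory."""
    env_dir = tempfile.mkdtemp(prefix=prefix)
    # with_pip=False keeps the sketch fast; a real setup would likely want pip.
    venv.create(env_dir, with_pip=False)
    return env_dir


env_dir = fresh_environment()
# Every environment created this way carries its own pyvenv.cfg marker file.
print(os.path.isfile(os.path.join(env_dir, "pyvenv.cfg")))
```

Because each call creates a new directory, packages installed for one program cannot leak into the test of another.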

We have, however, in regards to using other python versions, additional bonus projects such as

- deploy, run, and document cloudmesh on ironpython
- deploy, run, and document cloudmesh on anaconda, develop a script to generate a conda package from github
- deploy, run, and document cloudmesh on canopy, develop a script to generate a conda package from github
- other documentation that would be useful

7.1.9 Resources

If you are unfamiliar with programming in Python, we also refer you to some of the numerous online resources. You may wish to start with Learn Python or the book Learn Python the Hard Way. Other options include TutorialsPoint or Code Academy, and the Python wiki page contains a long list of references for learning as well. Additional resources include:

- https://virtualenvwrapper.readthedocs.io
- https://github.com/yyuu/pyenv
- https://amaral.northwestern.edu/resources/guides/pyenv-tutorial
- https://godjango.com/96-django-and-python-3-how-to-setup-pyenv-for-multiple-pythons/
- https://www.accelebrate.com/blog/the-many-faces-of-python-and-how-to-manage-them/
- http://ivory.idyll.org/articles/advanced-swc/
- http://python.net/~goodger/projects/pycon/2007/idiomatic/handout.html
- http://www.youtube.com/watch?v=0vJJlVBVTFg
- http://www.korokithakis.net/tutorials/python/
- http://www.afterhoursprogramming.com/tutorial/Python/Introduction/
- http://www.greenteapress.com/thinkpython/thinkCSpy.pdf
- https://docs.python.org/3.3/tutorial/modules.html
- https://www.learnpython.org/en/Modules_and_Packages
- https://docs.python.org/2/library/datetime.html
- https://chrisalbon.com/python/strings_to_datetime.html

A very long list of useful information is also available from

- https://github.com/vinta/awesome-python
- https://github.com/rasbt/python_reference

This list may be useful as it also contains links to data visualization and manipulation libraries, and AI tools and libraries. Please note that for this class you can reuse such libraries if not otherwise stated.

7.1.9.1 Jupyter Notebook Tutorials

A Short Introduction to Jupyter Notebooks and NumPy

To view the notebook, open this link in a background tab https://nbviewer.jupyter.org/ and copy and paste the following link in the URL input area https://cloudmesh.github.io/classes/lesson/prg/Jupyter-NumPy-tutorial-I523-F2017.ipynb Then hit Go.

7.1.10 Exercises

E.Python.Lib.1:

Write a python program called iterate.py that accepts an integer n from the command line. Pass this integer to a function called iterate.

The iterate function should then iterate from 1 to n. If the i-th number is a multiple of three, print multiple of 3, if a multiple of 5 print multiple of 5, if a multiple of both print multiple of 3 and 5, else print the value.

E.Python.Lib.2:

1. Create a pyenv or virtualenv ~/ENV

2. Modify your ~/.bashrc shell file to activate your environment upon login.

3. Install the docopt python package using pip

4. Write a program that uses docopt to define a command line program. Hint: modify the iterate program.

5. Demonstrate the program works.

7.2 DATA MANAGEMENT ☁

Obviously when dealing with big data we may not only be dealing with data in one format but in many different formats. It is important that you will be able to master such formats and seamlessly integrate them in your analysis. Thus we provide some simple examples on which different data formats exist and how to use them.

7.2.1 Formats

7.2.1.1 Pickle

Python pickle allows you to save data in a python native format into a file that can later be read in by other programs. However, the data format may not be portable among different python versions, thus the format is often not suitable to store information. Instead we recommend for standard data to use either json or yaml.

    import pickle

    flavor = {
        "small": 100,
        "medium": 1000,
        "large": 10000
    }

    pickle.dump(flavor, open("data.p", "wb"))

To read it back in use

    flavor = pickle.load(open("data.p", "rb"))

7.2.1.2 Text Files

To read text files into a variable called content you can use

    content = open('filename.txt', 'r').read()

You can also use the following code while using the convenient with statement

    with open('filename.txt', 'r') as file:
        content = file.read()

To split up the lines of the file into an array you can do

    with open('filename.txt', 'r') as file:
        lines = file.read().splitlines()

This can also be done with the built-in readlines function (note that readlines keeps the trailing newline characters)

    lines = open('filename.txt', 'r').readlines()

In case the file is too big you will want to read the file line by line:

    with open('filename.txt', 'r') as file:
        for line in file:
            print(line)

7.2.1.3 CSV Files

Often data is contained in comma separated values (CSV) within a file. To read such files you can use the csv package.

    import csv

    with open('data.csv', newline='') as f:
        contents = csv.reader(f)
        for row in contents:
            print(row)

Using pandas you can read them as follows.

    import pandas as pd

    df = pd.read_csv("example.csv")

There are many other modules and libraries that include CSV read functions. In case you need to split a single line by comma, you may also use the split function. However, remember it will split at every comma, including those contained in quotes. So this method, although it looks convenient at first, has limitations.

7.2.1.4 Excel spreadsheets

Pandas contains a method to read Excel files

    import pandas as pd

    filename = 'data.xlsx'
    data = pd.ExcelFile(filename)
    df = data.parse('Sheet1')

7.2.1.5 YAML

YAML is a very important format as it allows you easily to structure data in hierarchical fields. It is frequently used to coordinate programs while using yaml as the specification for configuration files, but also data files. To read in a yaml file the following code can be used

    import yaml

    with open('data.yaml', 'r') as f:
        content = yaml.safe_load(f)

The nice part is that this code can also be used to verify if a file is valid yaml. To write data out we can use

    with open('data.yml', 'w') as f:
        yaml.dump(data, f, default_flow_style=False)

The flow style set to false formats the data in a nice readable fashion with indentations.

7.2.1.6 JSON

    import json

    with open('strings.json') as f:
        content = json.load(f)

7.2.1.7 XML

XML format is extensively used to transport data across the web. It has a hierarchical data format, and can be represented in the form of a tree.

A sample XML data looks like:

    <data>
        <items>
            <item name="item-1"></item>
            <item name="item-2"></item>
            <item name="item-3"></item>
        </items>
    </data>

Python provides the ElementTree XML API to parse and create XML data.

Importing XML data from a file:

    import xml.etree.ElementTree as ET

    tree = ET.parse('data.xml')
    root = tree.getroot()

Reading XML data from a string directly:

    root = ET.fromstring(data_as_string)

Iterating over child nodes in a root:

    for child in root:
        print(child.tag, child.attrib)

Modifying XML data using ElementTree:

Modifying text within a tag of an element using the .text attribute:

    tag.text = new_data
    tree.write('output.xml')
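The individual ElementTree calls can be combined into one runnable sketch. It parses the sample document from a string instead of data.xml so that it is self-contained; the variable names and the text value "hello" are chosen for illustration only.

```python
import xml.etree.ElementTree as ET

# The sample document from this section, held in a string.
data_as_string = """
<data>
    <items>
        <item name="item-1"></item>
        <item name="item-2"></item>
        <item name="item-3"></item>
    </items>
</data>
"""

root = ET.fromstring(data_as_string)

# iter() walks the whole tree, so the nested <item> elements are found too.
names = [item.attrib["name"] for item in root.iter("item")]
print(names)

# Change the text of the first item and serialize that element to a string.
first = root.find("./items/item")
first.text = "hello"
print(ET.tostring(first, encoding="unicode"))
```

Note that iterating directly over root, as in the loop shown earlier, only visits the direct children (here the single items element), which is why the sketch uses iter() to reach the nested entries.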

Adding/modifying an attribute using the .set() method:

    tag.set('key', 'value')
    tree.write('output.xml')

Other Python modules used for parsing XML data include

- minidom: https://docs.python.org/3/library/xml.dom.minidom.html
- BeautifulSoup: https://www.crummy.com/software/BeautifulSoup/

7.2.1.8 RDF

To read RDF files you will need to install RDFlib with

    $ pip install rdflib

This will then allow you to read RDF files

    from rdflib.graph import Graph

    g = Graph()
    g.parse("filename.rdf", format="format")
    for entry in g:
        print(entry)

Good examples on using RDF are provided on the RDFlib Web page at https://github.com/RDFLib/rdflib

From the Web page we showcase also how to directly process RDF data from the Web

    import rdflib

    g = rdflib.Graph()
    g.parse('http://dbpedia.org/resource/Semantic_Web')

    for s, p, o in g:
        print(s, p, o)

7.2.1.9 PDF

The Portable Document Format (PDF) has been made available by Adobe Inc. royalty free. This has enabled PDF to become a worldwide adopted format that also has been standardized in 2008 (ISO/IEC 32000-1:2008, https://www.iso.org/standard/51502.html). A lot of research is published in papers, making PDF one of the de-facto standards for publishing. However, PDF is difficult to parse and is focused on high quality output instead of data representation. Nevertheless, tools to manipulate PDF exist:

PDFMiner

https://pypi.python.org/pypi/pdfminer/ allows the simple translation of PDF into text that can then be further mined. The manual page helps to demonstrate some examples http://euske.github.io/pdfminer/index.html.

pdf-parser.py

https://blog.didierstevens.com/programs/pdf-tools/ parses pdf documents and identifies some structural elements that can then be further processed.

If you know about other tools, let us know.

7.2.1.10 HTML

A very powerful library to parse HTML Web pages is provided with https://www.crummy.com/software/BeautifulSoup/

More details about it are provided in the documentation page https://www.crummy.com/software/BeautifulSoup/bs4/doc/

TODO: Students can contribute a section

Beautiful Soup is a python library to parse, process and edit HTML documents.

To install Beautiful Soup, use the pip command as follows:

    $ pip install beautifulsoup4

In order to process HTML documents, a parser is required. Beautiful Soup supports the HTML parser included in Python’s standard library, but it also supports a number of third-party Python parsers like the lxml parser, which is commonly used [1].

The following command can be used to install the lxml parser

    $ pip install lxml

To begin with, we import the package and instantiate an object as follows for an html document html_handle:

    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html_handle, 'lxml')

Now, we will discuss a few functions, attributes and methods of Beautiful Soup.

prettify function

The prettify() method will turn a Beautiful Soup parse tree into a nicely formatted Unicode string, with a separate line for each HTML/XML tag and string. It is analogous to the pprint() function. The object created above can be viewed by printing the prettified version of the document as follows:

    print(soup.prettify())

tag Object

A tag object refers to tags in the HTML document. It is possible to go down to the inner levels of the DOM tree. To access a tag div under the tag body, it can be done as follows:

    body_div = soup.body.div
    print(body_div.prettify())

The attrs attribute of the tag object returns a dictionary of all the defined attributes of the HTML tag as keys.

has_attr() method

To check if a tag object has a specific attribute, the has_attr() method can be used.

    if body_div.has_attr('p'):
        print('The value of \'p\' attribute is:', body_div['p'])

tag object attributes

- name - This attribute returns the name of the tag selected.
- attrs - This attribute returns a dictionary of all the defined attributes of the HTML tag as keys.
- contents - This attribute returns a list of contents enclosed within the HTML tag
- string - This attribute returns the text enclosed within the HTML tag. This returns None if there are multiple children
- strings - This overcomes the limitation of string and returns a generator of all strings enclosed within the given tag

The following code showcases usage of the above discussed attributes:

    body_tag = soup.body
    print('Name of the tag:', body_tag.name)
    attrs = body_tag.attrs
    print('The attributes defined for body tag are:', attrs)
    print('The contents of \'body\' tag are:\n', body_tag.contents)
    print('The string value enclosed in \'body\' tag is:', body_tag.string)
    for s in body_tag.strings:
        print(repr(s))

Searching the Tree

- The find() function takes a filter expression as argument and returns the first match found
- The find_all() function returns a list of all the matching elements

    from pprint import pprint

    search_elem = soup.find('a')
    print(search_elem.prettify())

    search_elems = soup.find_all("a", class_="sample")
    pprint(search_elems)

- The select() function can be used to search the tree using CSS selectors

    # Select `a` tag with class `sample`
    a_tag_elems = soup.select('a.sample')
    print(a_tag_elems)

7.2.1.11 ConfigParser

TODO: Students can contribute a section

https://pymotw.com/2/ConfigParser/

7.2.1.12 ConfigDict

https://github.com/cloudmesh/cloudmesh-common/blob/master/cloudmesh/common/ConfigDict.py

7.2.2 Encryption

Often we need to protect the information stored in a file. This is achieved with encryption. There are many methods of supporting encryption, and even if a file is encrypted it may be a target of attacks. Thus it is not only important to encrypt data that you do not want others to see but also to make sure that the system on which the data is hosted is secure. This is especially important if we talk about big data having a potentially large effect if it gets into the wrong hands.

To illustrate one type of encryption that is non-trivial we have chosen to demonstrate how to encrypt a file with an ssh key. In case you have openssl installed on your system, this can be achieved as follows.

    #!/bin/sh

    # Step 1. Creating a file with data
    echo "Big Data is the future." > file.txt

    # Step 2. Create the pem
    openssl rsa -in ~/.ssh/id_rsa -pubout > ~/.ssh/id_rsa.pub.pem

    # Step 3. look at the pem file to illustrate how it looks like (optional)
    cat ~/.ssh/id_rsa.pub.pem

    # Step 4. encrypt the file into secret.txt
    openssl rsautl -encrypt -pubin -inkey ~/.ssh/id_rsa.pub.pem -in file.txt -out secret.txt

    # Step 5. decrypt the file and print the contents to stdout
    openssl rsautl -decrypt -inkey ~/.ssh/id_rsa -in secret.txt

Most important here are Step 4 that encrypts the file and Step 5 that decrypts the file. Using the Python os module it is straightforward to implement this. However, we are providing in cloudmesh a convenient class that makes the use in python very simple.

    from cloudmesh.common.ssh.encrypt import EncryptFile

    e = EncryptFile('file.txt', 'secret.txt')
    e.encrypt()
    e.decrypt()

In our class we initialize it with the locations of the file that is to be encrypted and decrypted. To initiate that action just call the methods encrypt and decrypt.

7.2.3 Database Access

TODO: Students: define conventional database access section

see: https://www.tutorialspoint.com/python/python_database_access.htm

7.2.4 SQLite

TODO: Students can contribute to this section

https://www.sqlite.org/index.html

https://docs.python.org/3/library/sqlite3.html
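Until this section is contributed, the following minimal sketch shows the basic workflow of the sqlite3 module from the Python standard library: connect, create a table, insert rows, and query them. The table name flavor and its columns are our own illustration (reusing the values of the pickle example), not part of any cloudmesh schema.

```python
import sqlite3

# An in-memory database keeps the sketch self-contained; pass a file name
# such as "flavors.db" (hypothetical) to persist the data instead.
connection = sqlite3.connect(":memory:")
cursor = connection.cursor()

cursor.execute("CREATE TABLE flavor (name TEXT PRIMARY KEY, size INTEGER)")

# Parameter placeholders (?) avoid building SQL by string concatenation.
rows = [("small", 100), ("medium", 1000), ("large", 10000)]
cursor.executemany("INSERT INTO flavor VALUES (?, ?)", rows)
connection.commit()

cursor.execute(
    "SELECT name, size FROM flavor WHERE size >= ? ORDER BY size", (1000,)
)
result = cursor.fetchall()
print(result)

connection.close()
```

fetchall() returns the matching rows as a list of tuples, which can be processed like any other Python data.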

7.2.4.1 Exercises

E.Encryption.1:

Test the shell script to replicate how this example works

E.Encryption.2:

Test the cloudmesh encryption class

E.Encryption.3:

What other encryption methods exist? Can you provide an example and contribute to the section?

E.Encryption.4:

What is the issue of encryption that makes it challenging for Big Data?

E.Encryption.5:

Given a test dataset with many text files, how long will it take to encrypt and decrypt them on various machines? Write a benchmark that you test. Develop this benchmark as a group, and test out the time it takes to execute it on a variety of platforms.

7.3 PLOTTING WITH MATPLOTLIB ☁

A brief overview of plotting with matplotlib along with examples is provided. First matplotlib must be installed, which can be accomplished with pip install as follows:

    $ pip install matplotlib

We will start by plotting a simple line graph using built-in numpy functions for sine and cosine. The first step is to import the proper libraries shown next.

    import numpy as np
    import matplotlib.pyplot as plt

Next we will define the values for the x axis; we do this with the linspace option in numpy. The first two parameters are the starting and ending points; these must be scalars. The third parameter is optional and defines the number of samples to be generated between the starting and ending points; this value must be an integer. Additional parameters for the linspace utility can be found here:

    x = np.linspace(-np.pi, np.pi, 16)

Now we will use the sine and cosine functions in order to generate y values; for this we will use the values of x for the argument of both our sine and cosine functions, i.e. cos(x).

    cos = np.cos(x)
    sin = np.sin(x)

You can display the values of the three parameters we have defined by typing them in a python shell.

    x
    array([-3.14159265, -2.72271363, -2.30383461, -1.88495559, -1.46607657,
           -1.04719755, -0.62831853, -0.20943951,  0.20943951,  0.62831853,
            1.04719755,  1.46607657,  1.88495559,  2.30383461,  2.72271363,
            3.14159265])

Having defined x and y values we can generate a line plot, and since we imported matplotlib.pyplot as plt we simply use plt.plot.

    plt.plot(x, cos)

We can display the plot using plt.show(), which will pop up a figure displaying the plot defined.

    plt.show()

Additionally we can add the sine line to our line graph by entering the following.

    plt.plot(x, sin)

Invoking plt.show() now will show a figure with both sine and cosine lines displayed. Now that we have a figure generated it would be useful to label the x
and y axis and provide a title. This is done by the following three commands:

    plt.xlabel("X-label (units)")
    plt.ylabel("Y-label (units)")
    plt.title("A clever Title for your Figure")

Along with axis labels and a title another useful figure feature may be a legend. In order to create a legend you must first designate a label for the line; this label will be what shows up in the legend. The label is defined in the initial plt.plot(x, y) instance; next is an example.

    plt.plot(x, cos, label="cosine")

Then in order to display the legend the following command is issued:

    plt.legend(loc='upper right')

The location is specified by using upper or lower and left or right. Naturally all these commands can be combined and put in a file with the .py extension and run from the command line.

    import numpy as np
    import matplotlib.pyplot as plt

    x = np.linspace(-np.pi, np.pi, 16)
    cos = np.cos(x)
    sin = np.sin(x)
    plt.plot(x, cos, label="cosine")
    plt.plot(x, sin, label="sine")
    plt.xlabel("X-label (units)")
    plt.ylabel("Y-label (units)")
    plt.title("A clever Title for your Figure")
    plt.legend(loc='upper right')
    plt.show()

An example of a bar chart is presented next using data from [T:fast-cars].

    import matplotlib.pyplot as plt

    x = ['Toyota Prius',
         'Tesla Roadster',
         'Bugatti Veyron',
         'Honda Civic',
         'Lamborghini Aventador']
    horse_power = [120, 288, 1200, 158, 695]

    x_pos = [i for i, _ in enumerate(x)]

    plt.bar(x_pos, horse_power, color='green')
    plt.xlabel("Car Model")
    plt.ylabel("Horse Power (Hp)")
    plt.title("Horse Power for Selected Cars")

    plt.xticks(x_pos, x)
    plt.show()

You can customize plots further by using plt.style.use(), in python 3. If you provide the following command inside a python command shell you will see a list of available styles.

    print(plt.style.available)

An example of using a predefined style is shown next.

    # On matplotlib 3.6 and newer this style is listed as 'seaborn-v0_8'.
    plt.style.use('seaborn')

Up to this point we have only showcased how to display figures through python output; however, web browsers are a popular way to display figures. One example is Bokeh; the following lines can be entered in a python shell and the figure is output to a browser.

    from bokeh.io import show
    from bokeh.plotting import figure

    x_values = [1, 2, 3, 4, 5]
    y_values = [6, 7, 2, 3, 6]

    p = figure()
    p.circle(x=x_values, y=y_values)
    show(p)

7.4 DOCOPTS ☁

When we want to design command line arguments for python programs we have many options. However, as our approach is to create documentation first, docopts provides also a good approach for Python. The code for it is located at

https://github.com/docopt/docopt

It can be installed with

    $ pip install docopt

Sample programs are located at

https://github.com/docopt/docopt/blob/master/examples/options_example.py

A sample program of using docopts for our purposes looks as follows


"""Cloudmesh VM management

Usage:
  cm-go vm start NAME [--cloud=CLOUD]
  cm-go vm stop NAME [--cloud=CLOUD]
  cm-go set --cloud=CLOUD
  cm-go -h | --help
  cm-go --version

Options:
  -h --help      Show this screen.
  --version      Show version.
  --cloud=CLOUD  The name of the cloud.
  --moored       Moored (anchored) mine.
  --drifting     Drifting mine.

ARGUMENTS:
  NAME  The name of the VM
"""
from docopt import docopt

if __name__ == '__main__':
    # docopt parses sys.argv against the usage text in the module docstring
    arguments = docopt(__doc__, version='1.0.0rc2')
    print(arguments)

Another good feature of using docopt is that we can use the same verbal description in other programming languages as showcased in this book.

7.5 OPENCV

Learning Objectives

Provide some simple calculations so we can test cloud services.
Showcase some elementary OpenCV functions.
Show an environmental image analysis application using Secchi disks.

OpenCV (Open Source Computer Vision Library) is a library of thousands of algorithms for various applications in computer vision and machine learning. It has C++, C, Python, Java and MATLAB interfaces and supports Windows, Linux, Android and Mac OS. In this section, we will explain basic features of this library, including the implementation of a simple example.

7.5.1 Overview

OpenCV has a large number of functions for image and video processing. The pipeline starts with reading the images, low-level operations on pixel values, preprocessing (e.g. denoising), and then multiple steps of higher-level operations which vary depending on the application. OpenCV covers the whole pipeline, especially providing a large set of library functions for high-level operations. A simpler library for image processing in Python is SciPy's multi-dimensional image processing package (scipy.ndimage).

7.5.2 Installation

OpenCV for Python can be installed on Linux in multiple ways, namely PyPI (Python Package Index), the Linux package manager (apt-get for Ubuntu), the Conda package manager, and also building from source. We recommend using PyPI. Here is the command that you need to run:

$ pip install opencv-python

This was tested on Ubuntu 16.04 with a fresh Python 3.6 virtual environment. In order to test, import the module in the Python command line:

import cv2

If it does not raise an error, it is installed correctly. Otherwise, try to solve the error.

For installation on Windows, see:

https://docs.opencv.org/3.0-beta/doc/py_tutorials/py_setup/py_setup_in_windows/py_setup_in_windows.html#install-opencv-python-in-windows

Note that building from source can take a long time and may not be feasible for deploying to limited platforms such as Raspberry Pi.

7.5.3 A Simple Example

In this example, an image is loaded. A simple processing is performed, and the result is written to a new image.

7.5.3.1 Loading an image

%matplotlib inline
import cv2


img = cv2.imread('images/opencv/4.2.01.tiff')

The image was downloaded from the USC standard database:

http://sipi.usc.edu/database/database.php?volume=misc&image=9

7.5.3.2 Displaying the image

The image is saved in a numpy array. Each pixel is represented with 3 values (red, green, and blue; OpenCV stores them in B, G, R order). This provides you with access to manipulate the image at the level of single pixels. You can display the image using OpenCV's imshow function as well as Matplotlib's imshow function.

You can display the image using the imshow function:

cv2.imshow('Original', img)
cv2.waitKey(0)
cv2.destroyAllWindows()

or you can use Matplotlib. If you have not installed Matplotlib before, install it using:

$ pip install matplotlib

Now you can use:

import matplotlib.pyplot as plt
plt.imshow(img)

which results in Figure 1

Figure 1: Image display
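One detail is worth keeping in mind when mixing the two display functions: cv2.imread returns the color channels in B, G, R order, while Matplotlib's imshow expects R, G, B, so colors can appear swapped. The following is a minimal sketch of the reordering using only NumPy slicing. The small array named bgr is an assumption that stands in for a loaded image; with OpenCV installed, cv2.cvtColor(img, cv2.COLOR_BGR2RGB) performs the same conversion.

```python
import numpy as np

# A synthetic 2x2 "image" in OpenCV's channel order (B, G, R).
# Assumption: this array stands in for the result of cv2.imread.
bgr = np.zeros((2, 2, 3), dtype=np.uint8)
bgr[..., 0] = 255  # fill the first channel, which is blue in BGR order

# Reversing the last axis turns B, G, R into R, G, B for Matplotlib.
rgb = bgr[..., ::-1]

print(bgr[0, 0])  # first pixel, channels in B, G, R order
print(rgb[0, 0])  # same pixel, channels in R, G, B order
```

Passing rgb instead of bgr to plt.imshow would then show the pixel as blue rather than red.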


7.5.3.3 Scaling and Rotation

Scaling (resizing) the image relative to different axes:

res = cv2.resize(img,
                 None,
                 fx=1.2,
                 fy=0.7,
                 interpolation=cv2.INTER_CUBIC)
plt.imshow(res)

which results in Figure 2

Figure 2: Scaling and rotation

Rotation of the image by an angle of t degrees:

rows, cols, _ = img.shape
t = 45
# rotate about the image center by t degrees, keeping the scale at 1
M = cv2.getRotationMatrix2D((cols / 2, rows / 2), t, 1)
dst = cv2.warpAffine(img, M, (cols, rows))
plt.imshow(dst)

which results in Figure 3

Figure 3: image
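To make the rotation step less of a black box, the following sketch builds the 2x3 affine matrix by hand, following the formula given in the OpenCV documentation for getRotationMatrix2D (with a = scale * cos(t) and b = scale * sin(t)). The function names rotation_matrix and transform_point are hypothetical and only used for illustration; the sketch needs nothing beyond the standard library.

```python
import math

def rotation_matrix(center, angle_deg, scale=1.0):
    """Return a 2x3 affine matrix that rotates about center.

    Positive angles rotate counter-clockwise in image coordinates
    (origin at the top-left corner).
    """
    cx, cy = center
    a = scale * math.cos(math.radians(angle_deg))
    b = scale * math.sin(math.radians(angle_deg))
    return [[a, b, (1 - a) * cx - b * cy],
            [-b, a, b * cx + (1 - a) * cy]]

def transform_point(M, point):
    """Apply the 2x3 affine matrix M to a single (x, y) point."""
    x, y = point
    return (M[0][0] * x + M[0][1] * y + M[0][2],
            M[1][0] * x + M[1][1] * y + M[1][2])

M = rotation_matrix((256, 256), 45)
# The rotation center is the one point that stays where it is.
print(transform_point(M, (256, 256)))
```

The translation column is what keeps the chosen center fixed; without it the image would rotate about the top-left corner.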

7.5.3.4 Gray-scaling

img2 = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
plt.imshow(img2, cmap='gray')

which results in Figure 4

Figure 4: Gray scaling

7.5.3.5 Image Thresholding

# pixels above 127 become 255, all others become 0
ret, thresh = cv2.threshold(img2, 127, 255, cv2.THRESH_BINARY)
plt.subplot(1, 2, 1), plt.imshow(img2, cmap='gray')
plt.subplot(1, 2, 2), plt.imshow(thresh, cmap='gray')

which results in Figure 5

Figure 5: Image Thresholding
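Binary thresholding is simple enough to reproduce by hand, which may help when reasoning about the threshold value. The sketch below mirrors the rule used by cv2.THRESH_BINARY (pixels strictly greater than the threshold become maxval, all others become 0) with plain NumPy. The function name binary_threshold and the tiny test array are assumptions for illustration.

```python
import numpy as np

def binary_threshold(gray, thresh=127, maxval=255):
    """Set pixels strictly above thresh to maxval and all others to 0."""
    return np.where(gray > thresh, maxval, 0).astype(np.uint8)

# A hypothetical 2x3 gray-scale image with values around the threshold.
gray = np.array([[0, 127, 128],
                 [200, 50, 255]], dtype=np.uint8)

print(binary_threshold(gray))
```

Note that a pixel equal to the threshold (127 here) ends up black, since the comparison is strict.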

7.5.3.6 Edge Detection

Edge detection using the Canny edge detection algorithm:

edges = cv2.Canny(img2, 100, 200)
plt.subplot(121), plt.imshow(img2, cmap='gray')
plt.subplot(122), plt.imshow(edges, cmap='gray')

which results in Figure 6

Figure 6: Edge detection

7.5.4 Additional Features

OpenCV has implementations of many machine learning techniques such as KMeans and Support Vector Machines that can be put into use with only a few lines of code. It also has functions especially for video analysis, feature detection, object recognition and many more. You can find out more about them on their website.

OpenCV (https://docs.opencv.org/3.0-beta/index.html) was initially developed for C++ and still has a focus on that language, but it is still one of the most valuable image processing libraries in Python.

7.6 SECCHI DISK

We are developing an autonomous robot boat that you can be part of developing within this class. The robot boat is actually measuring turbidity or water clarity. Traditionally this has been done with a Secchi disk. The use of the Secchi disk is as follows:

1. Lower the Secchi disk into the water.
2. Measure the point when you can no longer see it.
3. Record the depth at various levels and plot in a geographical 3D map.

One of the things we can do is take a video of the measurement instead of a human recording them. Then we can analyse the video automatically to see how deep a disk was lowered. This is a classical image analysis program. You are encouraged to identify algorithms that can identify the depth. The simplest seems to be to do a histogram at a variety of depth steps, and measure when the histogram no longer changes significantly. The depth at that image will be the measurement we look for.
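The histogram idea described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: each frame is represented as a flat list of gray values (0 to 255), the function names and the tolerance of 0.05 are hypothetical, and the sample frames are made up rather than taken from the robot's video.

```python
def histogram(pixels, bins=8, max_value=256):
    """Return the normalized histogram of a flat list of gray values."""
    counts = [0] * bins
    width = max_value / bins
    for p in pixels:
        counts[min(int(p / width), bins - 1)] += 1
    return [c / len(pixels) for c in counts]

def histogram_distance(h1, h2):
    """Sum of absolute differences between two histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def first_stable_step(frames, tolerance=0.05):
    """Index of the first frame whose histogram barely differs from the
    previous frame's histogram, or None if no such frame exists."""
    previous = histogram(frames[0])
    for i in range(1, len(frames)):
        current = histogram(frames[i])
        if histogram_distance(previous, current) < tolerance:
            return i
        previous = current
    return None

# Hypothetical gray values, one list per depth step.
frames = [
    [250] * 50 + [20] * 50,  # disk clearly visible: bright and dark areas
    [180] * 50 + [60] * 50,  # contrast is fading
    [110] * 100,             # disk no longer distinguishable
    [110] * 100,             # no further change
]
print(first_stable_step(frames))
```

The depth recorded for the returned frame would then serve as the measurement. A real analysis would likely need a tolerance tuned to the camera noise.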

Thus if we analyse the images we need to look at the image and identify the numbers on the measuring tape, as well as the visibility of the disk.

To showcase what such a disk looks like, we refer to the image showcasing different Secchi disks. For our purpose the black-white contrast Secchi disk works well. See Figure 7


Figure 7: Secchi disk types. A marine style on the left and the freshwater version on the right (Wikipedia).

More information about the Secchi disk can be found at:

https://en.wikipedia.org/wiki/Secchi_disk

We have included next a couple of examples while using some obviously useful OpenCV methods. Surprisingly, the use of the edge detection that comes to mind first to identify if we still can see the disk seems too complicated to use for analysis. We at this time believe the histogram will be sufficient.

Please inspect our examples.

7.6.1 Setup for OSX

First let us set up the OpenCV environment for OSX. Naturally you will have to update the versions based on your versions of Python. When we tried the install of OpenCV on macOS, the setup was slightly more complex than for other packages. This may have changed by now, and if you have improved instructions, please let us know. However, we do not want to install it via Anaconda for the obvious reason that Anaconda installs too many other things.

import os, sys
from os.path import expanduser


os.path
home = expanduser("~")
sys.path.append('/usr/local/Cellar/opencv/3.3.1_1/lib/python3.6/site-packages/')
sys.path.append(home + '/.pyenv/versions/OPENCV/lib/python3.6/site-packages/')
import cv2
cv2.__version__

!pip install numpy > tmp.log
!pip install matplotlib >> tmp.log

7.6.2 Step 1: Record the video

Record the video on the robot

We have actually done this for you and will provide you with images and videos if you are interested in analyzing them. See Figure 8

7.6.3 Step 2: Analyse the images from the Video

For now we just selected 4 images from the video

%matplotlib inline
import cv2
import matplotlib.pyplot as plt

img1 = cv2.imread('secchi/secchi1.png')
img2 = cv2.imread('secchi/secchi2.png')
img3 = cv2.imread('secchi/secchi3.png')
img4 = cv2.imread('secchi/secchi4.png')

# a 4 x 3 grid: one row per image, one column per view
figures = []
fig = plt.figure(figsize=(18, 16))
for i in range(1, 13):
    figures.append(fig.add_subplot(4, 3, i))

count = 0
for img in [img1, img2, img3, img4]:
    figures[count].imshow(img)
    color = ('b', 'g', 'r')
    # one histogram curve per color channel
    for i, col in enumerate(color):
        histr = cv2.calcHist([img], [i], None, [256], [0, 256])
        figures[count + 1].plot(histr, color=col)
    # histogram over all pixel values of the image
    figures[count + 2].hist(img.ravel(), 256, [0, 256])
    count += 3

print("Legend")
print("First column = image of Secchi disk")
print("Second column = histogram of colors in image")
print("Third column = histogram of all values")

plt.show()

Figure 8: Histogram

7.6.3.1 Image Thresholding

See Figure 9, Figure 10, Figure 11, Figure 12

def threshold(img):
    # pixels above 150 become 255, all others become 0
    ret, thresh = cv2.threshold(img, 150, 255, cv2.THRESH_BINARY)
    plt.subplot(1, 2, 1), plt.imshow(img, cmap='gray')
    plt.subplot(1, 2, 2), plt.imshow(thresh, cmap='gray')

threshold(img1)
threshold(img2)
threshold(img3)
threshold(img4)

Figure 9: Threshold 1

Figure 10: Threshold 2

Figure 11: Threshold 3

Figure 12: Threshold 4

7.6.3.2 Edge Detection


See Figure 13, Figure 14, Figure 15, Figure 16, Figure 17. Edge detection using the Canny edge detection algorithm

def find_edge(img):
    edges = cv2.Canny(img, 50, 200)
    plt.subplot(121), plt.imshow(img, cmap='gray')
    plt.subplot(122), plt.imshow(edges, cmap='gray')

find_edge(img1)
find_edge(img2)
find_edge(img3)
find_edge(img4)

Figure 13: Edge Detection 1

Figure 14: Edge Detection 2

Figure 15: Edge Detection 3

Figure 16: Edge Detection 4

7.6.3.3 Black and white

bw1 = cv2.cvtColor(img1, cv2.COLOR_BGR2GRAY)
plt.imshow(bw1, cmap='gray')

Figure 17: Black/white conversion
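For readers who wonder what the conversion to gray computes, the sketch below reproduces the idea with NumPy: a weighted sum of the three color channels using the luma weights 0.299 (red), 0.587 (green), and 0.114 (blue) that the OpenCV documentation lists for RGB-to-gray conversion. The function name bgr_to_gray and the two sample pixels are assumptions; OpenCV's own rounding may differ slightly from this sketch.

```python
import numpy as np

# Luma weights for the red, green, and blue channels.
WEIGHT_R, WEIGHT_G, WEIGHT_B = 0.299, 0.587, 0.114

def bgr_to_gray(bgr):
    """Weighted sum of the channels of a BGR image, rounded to uint8."""
    b, g, r = bgr[..., 0], bgr[..., 1], bgr[..., 2]
    gray = WEIGHT_R * r + WEIGHT_G * g + WEIGHT_B * b
    return np.rint(gray).astype(np.uint8)

# One row with two hypothetical pixels in B, G, R order: white and pure blue.
pixels = np.array([[[255, 255, 255], [255, 0, 0]]], dtype=np.uint8)
print(bgr_to_gray(pixels))
```

Blue contributes the least to perceived brightness, which is why a pure blue pixel maps to a dark gray value.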


8 DATA

8.1 DATA FORMATS

8.1.1 YAML

The term YAML stands for "YAML Ain't Markup Language". According to the Web page at

http://yaml.org/

"YAML is a human friendly data serialization standard for all programming languages." There are multiple versions of YAML in existence, and one needs to take care that the software used supports the right version. The current version is YAML 1.2.

YAML is often used for configuration and in many cases can also be used as an XML replacement. Importantly, YAML in contrast to XML removes the tags while replacing them with indentation. This naturally has the advantage that it is easier to read; however, the format is strict and needs to adhere to proper indentation. Thus it is important that you check your YAML files for correctness, either by writing for example a Python program that reads your YAML file, or by using an online YAML checker such as the one provided at

http://www.yamllint.com/

An example of how to use YAML in Python is provided in our next example. Please note that YAML is a superset of JSON. Originally YAML was designed as a markup language. However, as it is not document oriented but data oriented, it has been recast and no longer classifies itself as a markup language.

import sys
import yaml

try:
    # the file name is expected as the first command line argument
    yamlFilename = sys.argv[1]
    yamlFile = open(yamlFilename, "r")
except (IndexError, OSError):
    print("filename does not exist")
    sys.exit()

try:
    # safe_load parses plain YAML data without constructing arbitrary objects
    yaml.safe_load(yamlFile.read())
except yaml.YAMLError:
    print("YAML file is not valid.")

Resources:

http://yaml.org/
https://en.wikipedia.org/wiki/YAML
http://www.yamllint.com/

8.1.2 JSON

The term JSON stands for JavaScript Object Notation. It is targeted as an open-standard file format that emphasizes the integration of human-readable text to transmit data objects. The data objects contain attribute-value pairs. Although it originates from JavaScript, the format itself is language independent. It uses brackets to allow organization of the data. Please note that YAML is a superset of JSON and not all YAML documents can be converted to JSON. Furthermore, JSON does not support comments. For these reasons we often prefer to use YAML instead of JSON. However, JSON data can easily be translated to YAML as well as XML.

Resources:

https://en.wikipedia.org/wiki/JSON
https://www.json.org/

8.1.3 XML

XML stands for Extensible Markup Language. XML allows one to define documents with the help of a set of rules in order to make them machine readable. The emphasis here is on machine readable, as documents in XML can quickly become complex and difficult to understand for humans. XML is used for documents as well as data structures.

A tutorial about XML is available at

https://www.w3schools.com/xml/default.asp

Resources:

https://en.wikipedia.org/wiki/XML
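To make the comparison between the formats in this section more concrete, the following sketch writes the same small record once as JSON and once as XML, using only Python's standard library (json and xml.etree.ElementTree). The record and the element names person, name, age, and group are made up for illustration.

```python
import json
import xml.etree.ElementTree as ET

# A hypothetical record with a string, a number, and a list.
record = {"name": "Albert", "age": 21, "group": ["AI", "Machine Learning"]}

# JSON: attribute-value pairs and brackets map directly onto dicts and lists.
json_text = json.dumps(record, indent=2)
print(json_text)

# XML: the same data expressed with explicit opening and closing tags.
person = ET.Element("person")
ET.SubElement(person, "name").text = record["name"]
ET.SubElement(person, "age").text = str(record["age"])  # XML text is untyped
for entry in record["group"]:
    ET.SubElement(person, "group").text = entry
xml_text = ET.tostring(person, encoding="unicode")
print(xml_text)

# Reading both representations back.
from_json = json.loads(json_text)
from_xml = ET.fromstring(xml_text)
groups = [element.text for element in from_xml.findall("group")]
```

JSON keeps the number 21 as a number, whereas the XML text content comes back as the string "21" and has to be converted by the reader.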


9 MONGO

9.1 MONGODB IN PYTHON

Learning Objectives

Introduction to basic MongoDB knowledge
Use of MongoDB via PyMongo
Use of MongoEngine and Object-Document mapper
Use of Flask-Mongo

In today's era, NoSQL databases have developed an enormous potential to process unstructured data efficiently. Modern information is complex, extensive, and may not have pre-existing relationships. With the advent of advanced search engines, machine learning, and Artificial Intelligence, technology expectations to process, store, and analyze such data have grown tremendously [2]. NoSQL database engines such as MongoDB, Redis, and Cassandra have successfully overcome traditional relational database challenges such as scalability, performance, unstructured data growth, agile sprint cycles, and growing needs of processing data in real-time with minimal hardware processing power [3]. The NoSQL databases are a new generation of engines that do not necessarily require the SQL language and are sometimes also called Not Only SQL databases. However, most of them support various third-party open connectivity drivers that can map NoSQL queries to SQL. It would be safe to say that although NoSQL databases are still far from replacing relational databases, they add immense value when used in hybrid IT environments in conjunction with relational databases, based on the application specific needs [3]. We will be covering the MongoDB technology, its driver PyMongo, its object-document mapper MongoEngine, and the Flask-PyMongo micro-web framework that make MongoDB more attractive and user-friendly.

9.1.1 Cloudmesh MongoDB Usage Quickstart


Before you read on, we would like you to read this quickstart. The easiest way for many of the activities we do to interact with MongoDB is to use our cloudmesh functionality. This prelude section is not intended to describe all the details, but to get you started quickly while leveraging cloudmesh.

This is done via the cloudmesh cmd5 and the cloudmesh_community/cm code:

https://cloudmesh-community.github.io/cm/

To install mongo on, for example, macOS you can use

$ cms admin mongo install

To start, stop and see the status of mongo you can use

$ cms admin mongo start
$ cms admin mongo stop
$ cms admin mongo status

To add an object to Mongo, you simply have to define a dict with predefined values for kind and cloud. In the future such attributes can be passed to the function to determine the MongoDB collection.

from cloudmesh.mongo.DataBaseDecorator import DatabaseUpdate

@DatabaseUpdate
def test():
    data = {
        "kind": "test",
        "cloud": "testcloud",
        "value": "hello"
    }
    return data

result = test()

When you invoke the function it will automatically store the information into MongoDB. Naturally this requires that the ~/.cloudmesh/cloudmesh.yaml file is properly configured.

9.1.2 MongoDB

Today MongoDB is one of the leading NoSQL databases. It is fully capable of handling dynamic changes, processing large volumes of complex and unstructured data, easily using object-oriented programming features, as well as addressing distributed system challenges [4]. At its core, MongoDB is an open source, cross-platform, document database mainly written in C++.


9.1.2.1 Installation

MongoDB can be installed on various Unix platforms, including Linux, Ubuntu, Amazon Linux, etc. [5]. This section focuses on installing MongoDB on Ubuntu 18.04 Bionic Beaver, used as the standard OS for a virtual machine in the Big Data Application Class during the 2018 Fall semester.

9.1.2.1.1 Installation procedure

Before installing, it is recommended to configure a non-root user and provide administrative privileges to it, in order to be able to perform general MongoDB admin tasks. This can be accomplished by logging in as the root user in the following manner [6].

$ adduser mongoadmin
$ usermod -aG sudo mongoadmin

When logged in as a regular user, one can perform actions with superuser privileges by typing sudo before each command [6].

Once the user setup is completed, one can log in as a regular user (mongoadmin) and use the following instructions to install MongoDB.

To update the Ubuntu package index so that the most recent versions are used, use the next command:

$ sudo apt update

To install the MongoDB package:

$ sudo apt install -y mongodb

To check the service and database status:

$ sudo systemctl status mongodb

A successful MongoDB installation can be confirmed with an output similar to this:

mongodb.service - An object/document-oriented database
   Loaded: loaded (/lib/systemd/system/mongodb.service; enabled; vendor preset: enabled)
   Active: active (running) since Sat 2018-11-15 07:48:04 UTC; 2min 17s ago
     Docs: man:mongod(1)
 Main PID: 2312 (mongod)
    Tasks: 23 (limit: 1153)


   CGroup: /system.slice/mongodb.service
           └─2312 /usr/bin/mongod --unixSocketPrefix=/run/mongodb --config /etc/mongodb.conf

To verify the configuration, more specifically the installed version, server, and port, use the following command:

$ mongo --eval 'db.runCommand({ connectionStatus: 1 })'

Similarly, to restart MongoDB, use the following:

$ sudo systemctl restart mongodb

To allow access to MongoDB from an outside hosted server, one can use the following command, which opens the firewall connections [5].

$ sudo ufw allow from your_other_server_ip/32 to any port 27017

Status can be verified by using:

$ sudo ufw status

Other MongoDB configurations, such as port, hostnames, and file paths, can be edited through the /etc/mongodb.conf file.

$ sudo nano /etc/mongodb.conf

Also, to complete this step, the server's IP address must be added to the bind_ip value [5].

logappend=true
bind_ip = 127.0.0.1,your_server_ip
port = 27017

MongoDB is now listening for a remote connection that can be accessed by anyone with appropriate credentials [5].

9.1.2.2 Collections and Documents

Each database within the Mongo environment contains collections, which in turn contain documents. Collections and documents are analogous to tables and rows, respectively, in relational databases. The document structure is in a key-value form, which allows storing of complex data types composed of field and value pairs. Documents are objects which correspond to native data types in many programming languages; hence a well defined, embedded document can help reduce expensive joins and improve query performance. The _id field helps to identify each document uniquely [3].
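The remark about embedded documents reducing joins can be illustrated with plain Python dictionaries, which is also how documents are usually represented on the Python side. The following is only a sketch with made-up data (the names persons, addresses, and the city value are assumptions); it contrasts a relational-style lookup across two lists with a single embedded document.

```python
# Relational style: two "tables" linked by a key and combined with a join.
persons = [{"_id": 1, "name": "Albert"}]
addresses = [{"person_id": 1, "city": "Bloomington"}]

joined = [
    {**person, "city": address["city"]}
    for person in persons
    for address in addresses
    if address["person_id"] == person["_id"]
]

# Document style: the address is embedded, so no join is needed.
person_document = {
    "_id": 1,
    "name": "Albert",
    "address": {"city": "Bloomington"},
}
city = person_document["address"]["city"]

print(joined[0]["city"], city)
```

The embedded form trades some duplication (the same address may appear in several documents) for a single read without a join.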

MongoDB offers flexibility to write records that are not restricted by column types. The data storage approach is flexible as it allows one to store data as it grows and to fulfill varying needs of applications and/or users. It supports a JSON-like binary format known as BSON, where data can be stored without specifying the type of data. Moreover, it can be distributed to multiple machines at high speed. It includes a sharding feature that partitions and spreads the data out across various servers. This makes MongoDB an excellent choice for cloud data processing. Its utilities can load high volumes of data at high speed, which ultimately provides greater flexibility and availability in a cloud-based environment [2].

The dynamic schema structure within MongoDB allows easy testing of the small sprints in Agile project management life cycles and research projects that require frequent changes to the data structure with minimal downtime. Contrary to this flexible process, modifying the data structure of relational databases can be a very tedious process [2].

9.1.2.2.1 Collection example

The following collection example for a person named Albert includes additional information such as age, status, and group [7].

{
  name: "Albert",
  age: "21",
  status: "Open",
  group: ["AI", "Machine Learning"]
}

9.1.2.2.2 Document structure

{
  field1: value1,
  field2: value2,
  field3: value3,
  ...
  fieldN: valueN
}

9.1.2.2.3 Collection Operations

If a collection does not exist, the MongoDB database will create the collection by default.

> db.myNewCollection1.insertOne( { x: 1 } )
> db.myNewCollection2.createIndex( { y: 1 } )

9.1.2.3 MongoDB Querying

The data retrieval patterns and the frequency of data manipulation statements such as inserts, updates, and deletes may demand the use of indexes or of the sharding feature to improve query performance and efficiency of the MongoDB environment [3]. One of the significant differences between relational databases and NoSQL databases is joins. In a relational database, one can combine results from two or more tables using a common column, often called a key. The native table contains the primary key column while the referenced table contains a foreign key. This mechanism allows one to make changes in a single row instead of changing all rows in the referenced table. This action is referred to as normalization. MongoDB is a document database and mainly contains denormalized data, which means the data is repeated instead of indexed over a specific key. If the same data is required in more than one table, it needs to be repeated.

This constraint has been eliminated in MongoDB's version 3.2. That release introduced a $lookup feature which works much like a left-outer-join. Lookups are restricted to aggregation functions, which means that data usually needs some type of filtering and grouping operations to be conducted beforehand. For this reason, joins in MongoDB require more complicated querying compared to traditional relational database joins. Although at this time lookups are still very far from replacing joins, this is a prominent feature that can resolve some of the relational data challenges for MongoDB [8]. MongoDB queries support regular expressions as well as range queries for specific fields that eliminate the need of returning entire documents [3]. MongoDB collections do not enforce document structure like SQL databases, which is a compelling feature. However, it is essential to keep in mind the needs of the applications [2].

9.1.2.3.1 Mongo Queries examples

The queries can be executed from the Mongo shell as well as through scripts.

To query the data from a MongoDB collection, one would use MongoDB's find() method.

> db.COLLECTION_NAME.find()

The output can be formatted by using the pretty() command.

> db.mycol.find().pretty()

The MongoDB insert statements can be performed in the following manner:

> db.COLLECTION_NAME.insert(document)

"The $lookup command performs a left-outer-join to an unsharded collection in the same database to filter in documents from the joined collection for processing" [9].

{
  $lookup:
    {
      from: <collection to join>,
      localField: <field from the input documents>,
      foreignField: <field from the documents of the "from" collection>,
      as: <output array field>
    }
}

This operation is equivalent to the following SQL operation:

SELECT *, <output array field>
FROM collection
WHERE <output array field> IN (SELECT *
                               FROM <collection to join>
                               WHERE <foreignField> = <collection.localField>);

To perform a Like Match (Regex), one would use the following command:

> db.products.find( { sku: { $regex: /789$/ } } )

9.1.2.4 MongoDB Basic Functions

When it comes to the technical elements of MongoDB, it possesses a rich interface for importing and storage of external data in various formats. By using the Mongo Import/Export tool, one can easily transfer contents from JSON, CSV, or TSV files into a database. MongoDB supports CRUD (create, read, update, delete) operations efficiently and has detailed documentation available on the product website. It can also query geospatial data, and it is capable of storing geospatial data in GeoJSON objects. The aggregation operation of MongoDB processes data records and returns computed results. The MongoDB aggregation framework is modeled on the concept of data pipelines [10].
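The pipeline concept, together with the left-outer-join behavior of $lookup described earlier in this section, can be illustrated without a running server. The sketch below is plain Python, not MongoDB code: the functions match and lookup are hypothetical stand-ins for the $match and $lookup stages, and the orders and inventory lists are made-up data. Each stage takes a list of documents and returns a new list, which is the essence of a data pipeline.

```python
def match(docs, field, value):
    """Keep only documents whose field equals value (similar to $match)."""
    return [d for d in docs if d.get(field) == value]

def lookup(docs, foreign_docs, local_field, foreign_field, as_field):
    """Attach all matching foreign documents as an array field.

    Documents without a match are kept with an empty array, which is
    the left-outer-join behavior.
    """
    result = []
    for d in docs:
        matches = [f for f in foreign_docs
                   if f.get(foreign_field) == d.get(local_field)]
        result.append({**d, as_field: matches})
    return result

orders = [
    {"_id": 1, "sku": "abc789", "status": "open"},
    {"_id": 2, "sku": "xyz123", "status": "open"},
    {"_id": 3, "sku": "abc789", "status": "closed"},
]
inventory = [{"sku": "abc789", "stock": 12}]

# Stage 1 filters, stage 2 joins; the output of one feeds the next.
stage1 = match(orders, "status", "open")
stage2 = lookup(stage1, inventory, "sku", "sku", "inventory_docs")

for doc in stage2:
    print(doc["_id"], doc["inventory_docs"])
```

Order 2 has no inventory entry but still appears in the result with an empty inventory_docs array, which is what distinguishes a left outer join from an inner join.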

9.1.2.4.1 Import/Export functions examples

To import JSON documents, one would use the following command:

$ mongoimport --db users --collection contacts --file contacts.json

The CSV import uses the input file name to import a collection; hence, the collection name is optional [10].

$ mongoimport --db users --type csv --headerline --file /opt/backups/contacts.csv

"Mongoexport is a utility that produces a JSON or CSV export of data stored in a MongoDB instance" [10].

$ mongoexport --db test --collection traffic --out traffic.json

9.1.2.5 Security Features

Data security is a crucial aspect of enterprise infrastructure management and is the reason why MongoDB provides various security features such as role based access control, numerous authentication options, and encryption. It supports mechanisms such as SCRAM, LDAP, and Kerberos authentication. The administrator can create role/collection-based access control; roles can be predefined or custom. MongoDB can audit activities such as DDL, CRUD statements, and authentication and authorization operations [11].

9.1.2.5.1 Collection based access control example

A user defined role can contain the following privileges [11].

privileges: [
  { resource: { db: "products", collection: "inventory" }, actions: ["find", "update"] },
  { resource: { db: "products", collection: "orders" }, actions: ["find"] }
]

9.1.2.6 MongoDB Cloud Service

In regards to cloud technologies, MongoDB also offers a fully automated cloud service called Atlas with competitive pricing options. The Mongo Atlas Cloud interface offers an interactive GUI for managing cloud resources and deploying applications quickly. The service is equipped with geographically distributed instances to ensure no single point of failure. Also, a well-rounded performance monitoring interface allows users to promptly detect anomalies and generate index suggestions to optimize the performance and reliability of the database. Global technology leaders such as Google, Facebook, eBay, and Nokia are leveraging MongoDB and Atlas cloud services, making MongoDB one of the most popular choices among the NoSQL databases [12].

9.1.3 PyMongo

PyMongo is the official Python driver or distribution that allows one to work with the NoSQL database MongoDB [13]. The first version of the driver was developed in 2009 [14], only two years after the development of MongoDB was started. This driver allows developers to combine both Python's versatility and MongoDB's flexible schema nature into successful applications. Currently, this driver supports MongoDB versions 2.6, 3.0, 3.2, 3.4, 3.6, and 4.0 [15]. MongoDB and Python represent a compatible fit considering that BSON (binary JSON) used in this NoSQL database is very similar to Python dictionaries, which makes the collaboration between the two even more appealing [16]. For this reason, dictionaries are the recommended tools to be used in PyMongo when representing documents [17].
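The following sketch illustrates why dictionaries are such a natural fit for documents. Since no MongoDB server is assumed here, it uses a small in-memory stand-in: the class FakeCollection is hypothetical and not part of PyMongo. Its two methods are named after PyMongo's insert_one and find, which also accept dictionaries, but the real find returns a cursor and supports far richer queries than the exact-match filter shown here.

```python
class FakeCollection:
    """In-memory stand-in for a collection (illustration only, not PyMongo)."""

    def __init__(self):
        self._documents = []

    def insert_one(self, document):
        # Store a copy so later changes to the caller's dict are not shared.
        self._documents.append(dict(document))

    def find(self, query):
        # Return documents whose fields equal every key/value pair in query.
        return [d for d in self._documents
                if all(d.get(k) == v for k, v in query.items())]

people = FakeCollection()

# Documents are ordinary dicts; they do not need to share the same fields.
people.insert_one({"name": "Albert", "age": 21, "status": "Open",
                   "group": ["AI", "Machine Learning"]})
people.insert_one({"name": "Grace", "status": "Closed"})

# Queries are dicts as well: field names mapped to the wanted values.
matches = people.find({"status": "Open"})
print(matches)
```

With a real connection, the same two dictionaries could be passed unchanged to a PyMongo collection object.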

9.1.3.1 Installation

Prior to being able to exploit the benefits of Python and MongoDB simultaneously, the PyMongo distribution must be installed using pip. To install it on all platforms, the following command should be used [18]:

$ python -m pip install pymongo

Specific versions of PyMongo can be installed with command lines such as in our example, where the 3.5.1 version is installed [18].

$ python -m pip install pymongo==3.5.1

A single line of code can be used to upgrade the driver as well [18].

$ python -m pip install --upgrade pymongo


Furthermore, the installation process can be completed with the help of the easy_install tool, which requires the following command [18]:

$ python -m easy_install pymongo

To upgrade the driver using this tool, the following command is recommended [18]:

$ python -m easy_install -U pymongo

PyMongo can also be installed directly from source in several other ways. These methods use the most up-to-date sources on GitHub, but they require the C extension dependencies to be installed before the driver itself [18].

To check whether the installation was completed correctly, the following command is used in the Python console [19]:

import pymongo

If the command raises no exception within the Python shell, the PyMongo installation can be considered successful.

9.1.3.2 Dependencies

The PyMongo driver has a few dependencies that should be taken into consideration prior to its usage. Currently, it supports the CPython 2.7 and 3.4+, PyPy, and PyPy3.5+ interpreters [15]. An optional dependency that requires some additional components to be installed is GSSAPI authentication [15]. On Unix-based machines it requires pykerberos, while on Windows machines WinKerberos is needed to fulfill this requirement [15]. This dependency can be installed automatically together with the driver in the following manner:

$ python -m pip install pymongo[gssapi]

Other third-party dependencies such as ipaddress, certifi, or wincertstore are necessary for connections using TLS/SSL and can also be installed together with the driver [15].
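Whether the driver and its optional dependencies are present can also be checked from Python. The following is a minimal sketch using only the standard library; the module names in the list are assumptions (for example, the pykerberos package is assumed to be importable as kerberos).

```python
import importlib.util


def importable(modules):
    """Map each module name to True if it can be found in this environment."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


# Assumed import names of the driver and its optional dependencies.
OPTIONAL_MODULES = ["pymongo", "kerberos", "winkerberos", "certifi", "ipaddress"]

if __name__ == "__main__":
    for name, found in importable(OPTIONAL_MODULES).items():
        print(f"{name:12} {'available' if found else 'missing'}")
```

find_spec() only locates a module without importing it, so the check has no side effects on the running interpreter.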


9.1.3.3 Running PyMongo with Mongo Daemon

Once PyMongo is installed, the Mongo daemon can be run with a very simple command in a new terminal window [19]:

$ mongod

9.1.3.4 Connecting to a database using MongoClient

In order to establish a connection with a database, the MongoClient class needs to be imported, which subsequently allows a MongoClient object to communicate with the database [19].

from pymongo import MongoClient
client = MongoClient()

This command connects to the default host, localhost, through port 27017. Depending on the programming requirements, one can also specify the host and port explicitly in the client instance or pass the same information in the MongoDB URI format [19].

9.1.3.5 Accessing Databases

Since a MongoClient object represents the connection to the server, it can be used to access any desired database in an easy way. To do that, one can use two different approaches. The first approach is attribute-style access, where the name of the desired database is given as an attribute, and the second approach is dictionary-style access [19]. For example, to access a database called cloudmesh_community, one would use the following commands for the attribute and for the dictionary method, respectively.

db = client.cloudmesh_community
db = client['cloudmesh_community']

9.1.3.6 Creating a Database

Creating a database is a straightforward process. First, one must create a MongoClient object and specify the connection (IP address) as well as the name of the database to be created [20]. An example of this command is presented in the following section:


import pymongo
client = pymongo.MongoClient('mongodb://localhost:27017/')
db = client['cloudmesh']

9.1.3.7 Inserting and Retrieving Documents (Querying)

Creating documents and storing data using PyMongo is as easy as accessing and creating databases. In order to add new data, a collection must be specified first. In this example, the cloudmesh collection of documents is used.

cloudmesh = db.cloudmesh

Once this step is completed, data may be inserted using the insert_one() method, which creates a single document. Inserting multiple documents at the same time is possible as well with the insert_many() method [19]. An example of insert_one() is as follows:

course_info = {
    'course': 'Big Data Applications and Analytics',
    'instructor': 'Gregor von Laszewski',
    'chapter': 'technologies'
}
result = cloudmesh.insert_one(course_info)

Inserting documents can also create a collection. If we wanted to create a collection of students in the cloudmesh database, we would do it in the following manner:

from pymongo import MongoClient

student = [{'name': 'John', 'st_id': 52642},
           {'name': 'Mercedes', 'st_id': 5717},
           {'name': 'Anna', 'st_id': 5654},
           {'name': 'Greg', 'st_id': 5423},
           {'name': 'Amaya', 'st_id': 3540},
           {'name': 'Cameron', 'st_id': 2343},
           {'name': 'Bozer', 'st_id': 4143},
           {'name': 'Cody', 'st_id': 2165}]

client = MongoClient('mongodb://localhost:27017/')
with client:
    db = client.cloudmesh
    db.students.insert_many(student)

Retrieving documents is as simple as creating them. The find_one() method can be used to retrieve one document [19]. An implementation of this method is given in the following example.

gregors_course = cloudmesh.find_one({'instructor': 'Gregor von Laszewski'})

Similarly, to retrieve multiple documents, one would use the find() method instead of find_one(). For example, to find all courses taught by Professor von Laszewski, one would use the following command:

gregors_course = cloudmesh.find({'instructor': 'Gregor von Laszewski'})

One thing that users should be cognizant of when using the find() method is that it does not return results as an array but as a cursor object, which combines methods that work together to help with data querying [19]. In order to obtain individual documents, one must iterate over the result [19].

9.1.3.8 Limiting Results

When it comes to working with large databases it is always useful to limit the number of query results. PyMongo supports this option with its limit() method [20]. This method takes one parameter, which specifies the number of documents to be returned [20]. For example, if we had a collection with a large number of cloud technologies as individual documents, one could modify the query to return only the first 10 technologies. To do this, the following example could be utilized:

client = pymongo.MongoClient('mongodb://localhost:27017/')
db = client['cloudmesh']
col = db['technologies']
topten = col.find().limit(10)

9.1.3.9 Updating Collection

Updating documents is very similar to inserting and retrieving them. Depending on the number of documents to be updated, one would use the update_one() or update_many() method [20]. Two parameters need to be passed to the update_one() method for it to execute successfully. The first argument is the query object that specifies the document to be changed, and the second argument is the object that specifies the new values in the document. An example of the update_one() method in action is the following:

myquery = {'course': 'Big Data Applications and Analytics'}
newvalues = {'$set': {'course': 'Cloud Computing'}}
cloudmesh.update_one(myquery, newvalues)

Updating all documents that fall under the same criteria can be done with the update_many() method [20]. For example, to update all documents in which the course title starts with the letter B with different instructor information, we would do the following:

client = pymongo.MongoClient('mongodb://localhost:27017/')
db = client['cloudmesh']
col = db['courses']
query = {'course': {'$regex': '^B'}}
newvalues = {'$set': {'instructor': 'Gregor von Laszewski'}}
edited = col.update_many(query, newvalues)

9.1.3.10 Counting Documents

Counting documents can be done with one simple operation called count_documents() instead of using a full query [21]. For example, we can count the documents in the cloudmesh collection by using the following command:

count = cloudmesh.count_documents({})

To create a more specific count, one would use a command similar to this:

count = cloudmesh.count_documents({'author': 'von Laszewski'})

This technology supports some more advanced querying options as well. Those advanced queries allow one to add certain constraints and narrow down the results even more. For example, to get the courses dated before a certain date, sorted by author, one would use the following command:

import datetime
import pprint

d = datetime.datetime(2017, 11, 12, 12)
for course in cloudmesh.find({'date': {'$lt': d}}).sort('author'):
    pprint.pprint(course)

9.1.3.11 Indexing

Indexing is a very important part of querying. It can greatly improve query performance, and it can also add functionality and aid in storing documents [21].

“To create a unique index on a key that rejects documents whose value for that key already exists in the index” [21].

We first need to create the index in the following manner:

result = db.profiles.create_index([('user_id', pymongo.ASCENDING)],
                                  unique=True)
sorted(list(db.profiles.index_information()))

After this command the collection has two different indexes. The first one is the _id index, created by MongoDB automatically, and the second one is the user_id index, created by the user.

The purpose of the unique index is to prevent documents with an already existing user_id from being added to the collection.
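The index creation described above can be wrapped in a small helper. This is a sketch under assumptions: the collection argument is expected to be a PyMongo Collection (or any object with the same create_index() and index_information() methods), the helper name is hypothetical, and the constant 1 is the value of pymongo.ASCENDING.

```python
ASCENDING = 1  # same value as pymongo.ASCENDING


def ensure_unique_index(collection, field):
    """Create a unique ascending index on `field` and return all index names.

    `collection` is assumed to behave like pymongo.collection.Collection.
    """
    collection.create_index([(field, ASCENDING)], unique=True)
    # index_information() returns a dictionary keyed by index name.
    return sorted(collection.index_information())


# Hypothetical usage against a running server:
#   from pymongo import MongoClient
#   profiles = MongoClient()["cloudmesh"]["profiles"]
#   print(ensure_unique_index(profiles, "user_id"))
```

Calling create_index() for an index that already exists with the same options has no further effect, so the helper can be run at every program start.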

9.1.3.12 Sorting

Sorting on the server side is also available via MongoDB. The PyMongo sort() method is equivalent to the SQL ORDER BY statement, and it can be performed in ascending or descending order with pymongo.ASCENDING and pymongo.DESCENDING [22]. This method is much more efficient because it is completed on the server side, compared to sorting completed on the client side. For example, to return all users with the first name Gregor sorted in descending order by birthdate we would use a command such as this:

users = cloudmesh.users.find({'firstname': 'Gregor'}).sort('dateofbirth', pymongo.DESCENDING)
for user in users:
    print(user.get('email'))

9.1.3.13 Aggregation

Aggregation operations are used to process given data and produce summarized results. Aggregation operations collect data from a number of documents and provide collective results by grouping data. PyMongo in its documentation offers a separate framework that supports data aggregation. This aggregation framework can be used to

“provide projection capabilities to reshape the returned data” [23].

In the aggregation pipeline, documents pass through multiple pipeline stages which convert documents into result data. The basic pipeline stages include filters. Those filters act like document transformations by helping change the document output form. Other pipeline stages help group or sort documents by specific fields. By using native operations from MongoDB, the pipeline operators are efficient in aggregating results.

The addFields stage is used to add new fields into documents. It reshapes each document in the stream, similarly to the project stage. The output documents contain the existing fields from the input documents and the newly added fields [24]. The following example shows how to add student details into a document.

db.cloudmesh_community.aggregate([
  {
    $addFields: {
      "document.StudentDetails": {
        $concat: ['$document.student.FirstName', '$document.student.LastName']
      }
    }
  }])

The bucket stage is used to categorize incoming documents into groups based on specified expressions. Those groups are called buckets [24]. The following example shows the bucket stage in action.

db.user.aggregate([
  {"$group": {
    "_id": {
      "city": "$city",
      "age": {
        "$let": {
          "vars": {
            "age": {"$subtract": [{"$year": new Date()}, {"$year": "$birthDay"}]}},
          "in": {
            "$switch": {
              "branches": [
                {"case": {"$lt": ["$$age", 20]}, "then": 0},
                {"case": {"$lt": ["$$age", 30]}, "then": 20},
                {"case": {"$lt": ["$$age", 40]}, "then": 30},
                {"case": {"$lt": ["$$age", 50]}, "then": 40},
                {"case": {"$lt": ["$$age", 200]}, "then": 50}
              ]}}}}},
    "count": {"$sum": 1}}}])

In the bucketAuto stage, the boundaries are automatically determined in an attempt to evenly distribute documents into a specified number of buckets. In the following operation, input documents are grouped into four buckets according to the values in the price field [24].

db.artwork.aggregate([
  {
    $bucketAuto: {
      groupBy: "$price",
      buckets: 4
    }
  }
])

The collStats stage returns statistics regarding a collection or view [24].

db.matrices.aggregate([{$collStats: {latencyStats: {histograms: true}}
}])

The count stage passes a document to the next stage that contains the number of documents that were input to the stage [24].

db.scores.aggregate([{


  $match: {score: {$gt: 80}}},
  {$count: "passing_scores"}])

The facet stage helps process multiple aggregation pipelines in a single stage [24].

db.artwork.aggregate([{
  $facet: {
    "categorizedByTags": [
      {$unwind: "$tags"},
      {$sortByCount: "$tags"}],
    "categorizedByPrice": [
      // Filter out documents without a price e.g., _id: 7
      {$match: {price: {$exists: 1}}},
      {$bucket: {
        groupBy: "$price",
        boundaries: [0, 150, 200, 300, 400],
        default: "Other",
        output: {"count": {$sum: 1},
                 "titles": {$push: "$title"}
        }}}],
    "categorizedByYears(Auto)": [
      {$bucketAuto: {groupBy: "$year", buckets: 4}
      }]}}])

The geoNear stage returns an ordered stream of documents based on the proximity to a geospatial point. The output documents include an additional distance field and can include a location identifier field [24].

db.places.aggregate([
  {$geoNear: {
    near: {type: "Point", coordinates: [-73.99279, 40.719296]},
    distanceField: "dist.calculated",
    maxDistance: 2,
    query: {type: "public"},
    includeLocs: "dist.location",
    num: 5,
    spherical: true
  }}])

The graphLookup stage performs a recursive search on a collection. To each output document, it adds a new array field that contains the traversal results of the recursive search for that document [24].

db.travelers.aggregate([
  {
    $graphLookup: {
      from: "airports",
      startWith: "$nearestAirport",
      connectFromField: "connects",
      connectToField: "airport",
      maxDepth: 2,
      depthField: "numConnections",
      as: "destinations"
    }
  }
])

The group stage consumes the input documents and outputs one document for each distinct group. It has a RAM limit of 100 MB. If the stage exceeds this limit, the group stage produces an error [24].

db.sales.aggregate(
  [
    {
      $group: {
        _id: {month: {$month: "$date"}, day: {$dayOfMonth: "$date"},


              year: {$year: "$date"}},
        totalPrice: {$sum: {$multiply: ["$price", "$quantity"]}},
        averageQuantity: {$avg: "$quantity"},
        count: {$sum: 1}
      }
    }
  ]
)

The indexStats stage returns statistics regarding the use of each index for a collection [24].

db.orders.aggregate([{$indexStats: {}}])

The limit stage is used for controlling the number of documents passed to the next stage in the pipeline [24].

db.article.aggregate(
  [{$limit: 5}]
)

The listLocalSessions stage gives information about the sessions of the currently connected mongos or mongod instance [24].

db.aggregate([{$listLocalSessions: {allUsers: true}}])

The listSessions stage lists all sessions that have been active long enough to propagate to the system.sessions collection [24].

use config
db.system.sessions.aggregate([{$listSessions: {allUsers: true}}])

The lookup stage is useful for performing left outer joins to other collections in the same database [24].

{
  $lookup:
    {
      from: <collection to join>,
      localField: <field from the input documents>,
      foreignField: <field from the documents of the "from" collection>,
      as: <output array field>
    }
}

The match stage is used to filter the document stream. Only matching documents pass to the next stage [24].

db.articles.aggregate(
  [{$match: {author: "dave"}}]
)

The project stage is used to reshape the documents by adding or deleting fields.


db.books.aggregate([{$project: {title: 1, author: 1}}])

The redact stage reshapes stream documents by restricting information using information stored in the documents themselves [24].

db.accounts.aggregate(
  [
    {$match: {status: "A"}},
    {
      $redact: {
        $cond: {
          if: {$eq: ["$level", 5]},
          then: "$$PRUNE",
          else: "$$DESCEND"
        }}}]);

The replaceRoot stage is used to replace a document with a specified embedded document [24].

db.produce.aggregate([
  {
    $replaceRoot: {newRoot: "$in_stock"}
  }
])

The sample stage is used to sample data by randomly selecting a specified number of documents from the input [24].

db.users.aggregate(
  [{$sample: {size: 3}}]
)

The skip stage skips a specified initial number of documents and passes the remaining documents to the pipeline [24].

db.article.aggregate(
  [{$skip: 5}]
);

The sort stage is useful for reordering the document stream by a specified sort key [24].

db.users.aggregate(
  [
    {$sort: {age: -1, posts: 1}}
  ]
)

The sortByCount stage groups the incoming documents based on a specified expression value and counts the documents in each distinct group [24].

db.exhibits.aggregate(
  [{$unwind: "$tags"}, {$sortByCount: "$tags"}])

The unwind stage deconstructs an array field from the input documents to output a document for each element [24].

db.inventory.aggregate([{$unwind: "$sizes"}])
db.inventory.aggregate([{$unwind: {path: "$sizes"}}])

The out stage is used to write aggregation pipeline results into a collection. This stage should be the last stage of a pipeline [24].

db.books.aggregate([
  {$group: {_id: "$author", books: {$push: "$title"}}},
  {$out: "authors"}
])

Another option from the aggregation operations is the Map/Reduce framework, which essentially includes two different functions, map and reduce. The first one provides a key-value pair for each tag in the array, while the latter one

“sums over all of the emitted values for a given key” [23].

The last step in the Map/Reduce process is to call the map_reduce() function and iterate over the results [23]. The Map/Reduce operation provides result data in a collection or returns results in-line. One can perform subsequent operations with the same input collection if the output is written to a collection [25]. An operation that produces results in-line must provide results within the BSON document size limit. The current limit for a BSON document is 16 MB. These types of operations are not supported by views [25]. PyMongo’s API supports all features of MongoDB’s Map/Reduce engine [26]. Moreover, Map/Reduce can return more detailed results when the full_response=True argument is passed to the map_reduce() function [26].

9.1.3.14 Deleting Documents from a Collection

The deletion of documents with PyMongo is fairly straightforward. To do so, one would use the remove() method of the PyMongo Collection object [22]. Similarly to reads and updates, the documents to be removed must be specified. For example, removing all documents with a score of 1 from a collection would require one to use the following command:

cloudmesh.users.remove({"score": 1}, safe=True)

The safe parameter set to True ensures the operation was completed [22].
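The remove() method and its safe parameter belong to older PyMongo releases; since PyMongo 3.0 the Collection object also offers delete_one() and delete_many(). The following is a minimal sketch of the same deletion with delete_many(), assuming the collection argument is a PyMongo 3.0+ Collection; the helper name is hypothetical.

```python
def delete_by_score(collection, score):
    """Delete every document whose `score` field equals `score`.

    `collection` is assumed to behave like pymongo.collection.Collection
    (PyMongo 3.0 or newer). Returns the number of deleted documents.
    """
    result = collection.delete_many({"score": score})
    return result.deleted_count


# Hypothetical usage against a running server:
#   from pymongo import MongoClient
#   users = MongoClient()["cloudmesh"]["users"]
#   print(delete_by_score(users, 1))
```

The returned count plays a similar role to the old safe=True flag, since it reports what the server actually deleted.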


9.1.3.15 Copying a Database

Copying databases within the same mongod instance or between different mongod servers is made possible with the command() method after connecting to the desired mongod instance [27]. For example, to copy the cloudmesh database and name the new database cloudmesh_copy, one would use the command() method in the following manner:

client.admin.command('copydb',
                     fromdb='cloudmesh',
                     todb='cloudmesh_copy')

There are two ways to copy a database between servers. If a server is not password-protected, one would not need to pass in the credentials nor to authenticate to the admin database [27]. In that case, to copy a database one would use the following command:

client.admin.command('copydb',
                     fromdb='cloudmesh',
                     todb='cloudmesh_copy',
                     fromhost='source.example.com')

On the other hand, if the server to which we are copying the database is protected, one would use this command instead:

client = MongoClient('target.example.com',
                     username='administrator',
                     password='pwd')
client.admin.command('copydb',
                     fromdb='cloudmesh',
                     todb='cloudmesh_copy',
                     fromhost='source.example.com')

9.1.3.16 PyMongo Strengths

One of PyMongo’s strengths is that it allows document creation and querying natively

“through the use of existing language features such as nested dictionaries and lists” [22].

For moderately experienced Python developers, it is very easy to learn and to quickly feel comfortable with.

“For these reasons, MongoDB and Python make a powerful combination for rapid, iterative development of horizontally scalable backend applications” [22].

According to [22], MongoDB is very applicable to modern applications, which makes PyMongo equally valuable.

9.1.4 MongoEngine

“MongoEngine is an Object-Document Mapper, written in Python for working with MongoDB” [28].

It is a library that allows more advanced communication with MongoDB compared to PyMongo. As MongoEngine is technically considered to be an object-document mapper (ODM), it can also be considered to be

“equivalent to a SQL-based object relational mapper (ORM)” [19].

The primary reason to use an ODM is data conversion between computer systems that are not compatible with each other [29]. For the purpose of converting data to the appropriate form, a virtual object database must be created within the utilized programming language [29]. This library is also used to define schemata for documents within MongoDB, which ultimately helps with minimizing coding errors as well as defining methods on existing fields [30]. It is also very beneficial to the overall workflow as it tracks changes made to the documents and aids in the document saving process [31].

9.1.4.1 Installation

The installation process for this technology is fairly simple as it is considered to be a library. To install it, one would use the following command [32]:

$ pip install mongoengine

A bleeding-edge version of MongoEngine can be installed directly from GitHub by first cloning the repository on the local machine, virtual machine, or cloud.

9.1.4.2 Connecting to a database using MongoEngine

Once installed, MongoEngine needs to be connected to an instance of mongod, similarly to PyMongo [33]. The connect() function must be used to complete this step, and the argument that must be passed to this function is the name of the desired database [33]. Prior to using this function, it needs to be imported from the MongoEngine library.

from mongoengine import connect
connect('cloudmesh_community')

Similarly to MongoClient, MongoEngine uses localhost and port 27017 by default; however, the connect() function also allows specifying other host and port arguments [33].

connect('cloudmesh_community', host='196.185.1.62', port=16758)

Other types of connections are also supported (e.g., URI), and they can be completed by providing the URI in the connect() function [33].

9.1.4.3 Querying using MongoEngine

To query MongoDB using MongoEngine, the objects attribute is used, which is, technically, a part of the document class [34]. This attribute is called the QuerySetManager, which in return

“creates a new QuerySet object on access” [34].

To be able to access individual documents from a database, this object needs to be iterated over. For example, to return/print all students in the cloudmesh_community object (database), the following command would be used.

for user in cloudmesh_community.objects:
    print(user.student)

MongoEngine also has the capability of query filtering, which means that a keyword can be used within the called QuerySet object to retrieve specific information [34]. Let us say one would like to iterate over cloudmesh_community students that are natives of Indiana. To achieve this, one would use the following command:

indy_students = cloudmesh_community.objects(state='IN')

This library also allows operators other than equality to be used in its queries and, moreover, has the capability of handling string queries, geo queries, list querying, and querying of the raw PyMongo queries [34].
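In MongoEngine, operators other than equality are attached to a field name with a double underscore, for example age__gte=18. The following toy function sketches how such keywords correspond to a raw MongoDB filter. It is an illustration only: the function name is hypothetical, only a few comparison operators are handled, and it is not MongoEngine’s own implementation.

```python
def to_mongo_filter(**kwargs):
    """Translate `field=value` and `field__op=value` keywords to a raw filter.

    Toy illustration of the double-underscore convention; supports only a
    handful of comparison operators.
    """
    comparison = {"gt", "gte", "lt", "lte", "ne", "in", "nin"}
    query = {}
    for key, value in kwargs.items():
        field, _, op = key.partition("__")
        if not op:
            query[field] = value  # plain equality
        elif op in comparison:
            clause = query.setdefault(field, {})
            if not isinstance(clause, dict):
                raise ValueError(f"cannot combine equality and operators on {field}")
            clause["$" + op] = value
        else:
            raise ValueError(f"unsupported operator: {op}")
    return query


if __name__ == "__main__":
    print(to_mongo_filter(state="IN", age__gte=18, age__lt=30))
```

Several operators on the same field are merged into one sub-document, which mirrors how range conditions are written in raw MongoDB queries.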

String queries are useful for performing text operations in conditional queries. A query to find a document whose State exactly matches ACTIVE can be performed in the following manner:

cloudmesh_community.objects(State__exact="ACTIVE")

The query to retrieve document data for names that start with a case-sensitive AL can be written as:

cloudmesh_community.objects(Name__startswith="AL")

To perform the same query for a case-insensitive AL one would use the following command:

cloudmesh_community.objects(Name__istartswith="AL")

MongoEngine allows data extraction of geographical locations by using geo queries. The geo_within operator checks if a geometry is within a polygon.

cloudmesh_community.objects(
    point__geo_within=[[[40, 5], [40, 6], [41, 6], [40, 5]]])
cloudmesh_community.objects(
    point__geo_within={"type": "Polygon",
                       "coordinates": [[[40, 5], [40, 6], [41, 6], [40, 5]]]})

The list query looks up the documents where the specified field matches exactly the given value. To match all pages that have the word coding as an item in the tags list one would use the following query:

class Page(Document):
    tags = ListField(StringField())

Page.objects(tags='coding')

Overall, it would be safe to say that MongoEngine has good compatibility with Python. It provides different functions to utilize Python easily with MongoDB, which makes this pair even more attractive to application developers.

9.1.5 Flask-PyMongo

“Flask is a micro-web framework written in Python” [35].


It was developed after Django, and it is very pythonic in nature, which implies that it explicitly targets the Python user community. It is lightweight as it does not require additional tools or libraries and hence is classified as a micro-web framework. It is often used with MongoDB through the PyMongo connector, and it treats data within MongoDB as searchable Python dictionaries. Applications such as Pinterest, LinkedIn, and the community web page for Flask use the Flask framework. Moreover, it supports various features such as RESTful request dispatching, secure cookies, Google App Engine compatibility, and integrated support for unit testing [35]. When it comes to connecting to a database, the connection details for MongoDB can be passed as a variable or configured in the PyMongo constructor with additional arguments such as username and password, if required. It is important that the versions of both Flask and MongoDB are compatible with each other to avoid functionality breaks [36].

9.1.5.1 Installation

Flask-PyMongo can be installed with an easy command such as this:

$ pip install Flask-PyMongo

PyMongo can be added in the following manner:

from flask import Flask
from flask_pymongo import PyMongo
app = Flask(__name__)
app.config["MONGO_URI"] = "mongodb://localhost:27017/cloudmesh_community"
mongo = PyMongo(app)

9.1.5.2 Configuration

There are two ways to configure Flask-PyMongo. The first way would be to pass a MongoDB URI to the PyMongo constructor, while the second way would be to

“assign it to the MONGO_URI Flask configuration variable” [36].

9.1.5.3 Connection to multiple databases/servers

Multiple PyMongo instances can be used to connect to multiple databases or database servers. To achieve this, one would use a command similar to the following:


app = Flask(__name__)
mongo1 = PyMongo(app, uri="mongodb://localhost:27017/cloudmesh_community_one")
mongo2 = PyMongo(app, uri="mongodb://localhost:27017/cloudmesh_community_two")
mongo3 = PyMongo(app, uri=
                 "mongodb://another.host:27017/cloudmesh_community_Three")

9.1.5.4 Flask-PyMongo Methods

Flask-PyMongo provides helpers for some common tasks. One of them is the Collection.find_one_or_404 method shown in the following example:

@app.route("/user/<username>")
def user_profile(username):
    user = mongo.db.cloudmesh_community.find_one_or_404({"_id": username})
    return render_template("user.html", user=user)

This method is very similar to MongoDB’s find_one() method; however, instead of returning None it causes a 404 Not Found HTTP status [36].

Similarly, the PyMongo.send_file and PyMongo.save_file methods work with file-like objects: save_file stores a file in GridFS using the given filename, and send_file serves a stored file from GridFS [36].

9.1.5.5 Additional Libraries

Flask-MongoAlchemy and Flask-MongoEngine are additional libraries that can be used to connect to a MongoDB database while using enhanced features with the Flask app. Flask-MongoAlchemy is used as a proxy between Python and MongoDB. It provides options such as server-based or database-based authentication to connect to MongoDB. While the default is server-based authentication, to use database-based authentication the config value MONGOALCHEMY_SERVER_AUTH must be set to False [37].

Flask-MongoEngine is the Flask extension that provides integration with MongoEngine. It handles connection management for the apps. It can be installed through pip and set up very easily as well. The default configuration is set to localhost and port 27017. For a custom port and in cases where MongoDB is running on another server, the host and port must be explicitly specified in the MONGODB_SETTINGS dictionary in app.config, along with the database username and password in cases where database authentication is enabled. URI-style connections are also supported by supplying the URI as the host in the MONGODB_SETTINGS dictionary in app.config. There are various custom querysets available within Flask-MongoEngine that are attached to MongoEngine’s default queryset [38].
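The MONGODB_SETTINGS dictionary described above can be assembled with a small helper. This is a sketch under assumptions: the helper name is hypothetical, and the key names (db, host, port, username, password) are assumed to match the settings of the installed Flask-MongoEngine version and should be checked against its documentation.

```python
def mongodb_settings(db, host="localhost", port=27017,
                     username=None, password=None):
    """Build a MONGODB_SETTINGS dictionary for Flask-MongoEngine.

    Key names are assumptions; verify them for the installed version.
    """
    settings = {"db": db, "host": host}
    # A URI given as host already carries the port, so none is added.
    if not host.startswith(("mongodb://", "mongodb+srv://")):
        settings["port"] = port
    if username is not None:
        settings["username"] = username
        settings["password"] = password
    return settings


# Hypothetical usage inside a Flask application:
#   app.config["MONGODB_SETTINGS"] = mongodb_settings(
#       "cloudmesh_community", host="db.example.org", port=27018)
```

Keeping the settings in one function makes it easy to switch between a local default and a remote, authenticated server.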

9.1.5.6 Classes and Wrappers

Attributes such as cx and db in the PyMongo objects are the ones that help provide access to the MongoDB server [36]. To achieve this, one must pass the Flask app to the constructor or call init_app() [36].

“Flask-PyMongo wraps PyMongo’s MongoClient, Database, and Collection classes, and overrides their attribute and item accessors” [36].

This type of wrapping allows Flask-PyMongo to add methods to Collection while at the same time allowing MongoDB-style dotted expressions in the code [36].

type(mongo.cx)
type(mongo.db)
type(mongo.db.cloudmesh_community)

Flask-PyMongo creates connectivity between Python and Flask using a MongoDB database and supports

“extensions that can add application features as if they were implemented in Flask itself” [39],

hence, it can be used as additional Flask functionality in Python code. The extensions exist to support form validation, authentication technologies, object-relational mappers, and framework-related tools, which ultimately adds a lot of strength to this micro-web framework [39]. One of the main reasons why it is frequently used with MongoDB is its capability of adding more control over databases and history [39].

9.2 MONGOENGINE

9.2.1 Introduction

MongoEngine is a document mapper for working with MongoDB from Python. To be able to use MongoEngine, MongoDB should already be installed and running.
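Whether a server is listening can be checked before connecting. The following is a minimal sketch with the standard library, assuming the default host localhost and port 27017; the function name is hypothetical. It only shows that something accepts TCP connections on that port, not that the listener is actually MongoDB.

```python
import socket


def mongod_reachable(host="localhost", port=27017, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


if __name__ == "__main__":
    print("mongod reachable:", mongod_reachable())
```

Such a check gives a quick and clear error message before any MongoEngine call is attempted.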

9.2.2 Install and connect

MongoEngine can be installed by running:

$ pip install mongoengine

This will install six, pymongo, and mongoengine.

To connect to MongoDB, use the connect() function and specify the name of the MongoDB database. You do not need to go to the mongo shell; this can be done from a Unix shell or the command line. In this case we are connecting to a database named student_db.

from mongoengine import *
connect('student_db')

If MongoDB is running on a port different from the default port, the port number and host need to be specified. If MongoDB requires authentication, a username and password need to be specified.

9.2.3 Basics

MongoDB does not enforce schemas. Compared to an RDBMS, a row in MongoDB is called a “document” and a table can be compared to a collection. Defining a schema is helpful as it minimizes coding errors. To define a schema we create a class that inherits from Document.

from mongoengine import *

class Student(Document):
    first_name = StringField(max_length=50)
    last_name = StringField(max_length=50)

Fields are not mandatory, but if a field is needed, set the required keyword argument to True. There are multiple field types available. Each field can be customized by keyword arguments. If each student is sending text messages to the university’s central database, these can be stored using MongoDB. Each text can have a different data type; some might have images and some might have URLs. So we can create a class Text and link it to a student by using a ReferenceField (similar to a foreign key in an RDBMS).

    class Text(Document):
        title = StringField(max_length=120, required=True)
        author = ReferenceField(Student)
        meta = {'allow_inheritance': True}

    class OnlyText(Text):
        content = StringField()

    class ImagePost(Text):
        image_path = StringField()

    class LinkPost(Text):
        link_url = StringField()

MongoDB supports adding tags to individual texts rather than storing them separately and then having them referenced. Similarly, comments can also be stored directly in a Text.

    class Text(Document):
        title = StringField(max_length=120, required=True)
        # the author refers to the Student class defined earlier
        author = ReferenceField(Student)
        tags = ListField(StringField(max_length=30))
        # Comment is assumed to be an EmbeddedDocument subclass defined elsewhere
        comments = ListField(EmbeddedDocumentField(Comment))

To access data, for example if we need to get the titles:

    for text in OnlyText.objects:
        print(text.title)

To search for texts with a given tag:

    for text in Text.objects(tags='mongodb'):
        print(text.title)


10 OTHER

10.1 WORD COUNT WITH PARALLEL PYTHON ☁

We will demonstrate Python's multiprocessing API for parallel computation by writing a program that counts how many times each word in a collection of documents appears.

10.1.1 Generating a Document Collection

Before we begin, let us write a script that will generate document collections by specifying the number of documents and the number of words per document. This will make benchmarking straightforward.

To keep it simple, the vocabulary of the document collection will consist of random numbers rather than the words of an actual language:

    '''Usage: generate_nums.py [-h] NUM_LISTS INTS_PER_LIST MIN_INT MAX_INT DEST_DIR

    Generate random lists of integers and save them
    as 1.txt, 2.txt, etc.

    Arguments:
      NUM_LISTS      The number of lists to create.
      INTS_PER_LIST  The number of integers in each list.
      MIN_INT        Each generated integer will be >= MIN_INT.
      MAX_INT        Each generated integer will be <= MAX_INT.
      DEST_DIR       A directory where the generated numbers will be stored.

    Options:
      -h --help
    '''

    from __future__ import print_function
    import os, random, logging
    from docopt import docopt

    def generate_random_lists(num_lists,
                              ints_per_list, min_int, max_int):
        return [[random.randint(min_int, max_int)
                 for i in range(ints_per_list)] for i in range(num_lists)]

    if __name__ == '__main__':
        args = docopt(__doc__)
        num_lists, ints_per_list, min_int, max_int, dest_dir = [
            int(args['NUM_LISTS']),
            int(args['INTS_PER_LIST']),
            int(args['MIN_INT']),
            int(args['MAX_INT']),
            args['DEST_DIR']
        ]

        if not os.path.exists(dest_dir):
            os.makedirs(dest_dir)

        lists = generate_random_lists(num_lists,
                                      ints_per_list,
                                      min_int,
                                      max_int)
        curr_list = 1
        for lst in lists:
            with open(os.path.join(dest_dir, '%d.txt' % curr_list), 'w') as f:
                f.write(os.linesep.join(map(str, lst)))
            curr_list += 1
        logging.debug('Numbers written.')

Notice that we are using the docopt module, which you should be familiar with from the section Python DocOpts, to make the script easy to run from the command line.

You can generate a document collection with this script as follows:

    python generate_nums.py 1000 10000 0 100 docs-1000-10000

10.1.2 Serial Implementation

A first serial implementation of word count is straightforward:

    '''Usage: wordcount.py [-h] DATA_DIR

    Read a collection of .txt documents and count how many times each word
    appears in the collection.

    Arguments:
      DATA_DIR  A directory with documents (.txt files).

    Options:
      -h --help
    '''

    from __future__ import division, print_function
    import os, glob, logging
    from docopt import docopt

    logging.basicConfig(level=logging.DEBUG)

    def wordcount(files):
        counts = {}
        for filepath in files:
            with open(filepath, 'r') as f:
                words = [word.strip() for word in f.read().split()]
            for word in words:
                if word not in counts:
                    counts[word] = 0
                counts[word] += 1
        return counts

    if __name__ == '__main__':
        args = docopt(__doc__)
        if not os.path.exists(args['DATA_DIR']):
            raise ValueError('Invalid data directory: %s' % args['DATA_DIR'])

        counts = wordcount(glob.glob(os.path.join(args['DATA_DIR'], '*.txt')))
        logging.debug(counts)
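The counting loop of the serial version can also be expressed with collections.Counter from the standard library. The following is only a minimal sketch under the assumption that the documents are available as in-memory strings instead of files; the function name count_words_in_texts and the sample data are illustrative and not part of the original script.

```python
from collections import Counter

def count_words_in_texts(texts):
    """Count how often each whitespace-separated word occurs in texts."""
    counts = Counter()
    for text in texts:
        # split() without arguments already drops surrounding whitespace
        counts.update(text.split())
    return counts

# two tiny "documents" whose vocabulary consists of numbers, as in the text
docs = ["1 2 2", "2 3"]
counts = count_words_in_texts(docs)
print(dict(counts))
```

A Counter behaves like a dictionary, so it could be logged or merged in the same way as the plain counts dictionary used above.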


10.1.3 Serial Implementation Using map and reduce

We can improve the serial implementation in anticipation of parallelizing the program by making use of Python's map and reduce functions.

In short, you can use map to apply the same function to the members of a collection. For example, to convert a list of numbers to strings, you could do:

    import random
    nums = [random.randint(1, 2) for _ in range(10)]
    print(nums)
    [2, 1, 1, 1, 2, 2, 2, 2, 2, 2]
    # in Python 3, map returns an iterator, so we wrap it in list() to print it
    print(list(map(str, nums)))
    ['2', '1', '1', '1', '2', '2', '2', '2', '2', '2']

We can use reduce to apply the same function cumulatively to the items of a sequence. In Python 3, reduce has to be imported from the functools module. For example, to find the total of the numbers in our list, we could use reduce as follows:

    from functools import reduce

    def add(x, y):
        return x + y

    print(reduce(add, nums))
    17

We can simplify this even more by using a lambda function:

    print(reduce(lambda x, y: x + y, nums))
    17

You can read more about Python's lambda function in the docs.

With this in mind, we can reimplement the word count example as follows:

    '''Usage: wordcount_mapreduce.py [-h] DATA_DIR

    Read a collection of .txt documents and count how
    many times each word
    appears in the collection.

    Arguments:
      DATA_DIR  A directory with documents (.txt files).

    Options:
      -h --help
    '''

    from __future__ import division, print_function
    import os, glob, logging
    from functools import reduce
    from docopt import docopt

    logging.basicConfig(level=logging.DEBUG)

    def count_words(filepath):
        counts = {}
        with open(filepath, 'r') as f:
            words = [word.strip() for word in f.read().split()]
        for word in words:
            if word not in counts:
                counts[word] = 0
            counts[word] += 1
        return counts

    def merge_counts(counts1, counts2):
        for word, count in counts2.items():
            if word not in counts1:
                counts1[word] = 0
            counts1[word] += count
        return counts1

    if __name__ == '__main__':
        args = docopt(__doc__)
        if not os.path.exists(args['DATA_DIR']):
            raise ValueError('Invalid data directory: %s' % args['DATA_DIR'])

        per_doc_counts = map(count_words,
                             glob.glob(os.path.join(args['DATA_DIR'],
                                                    '*.txt')))
        # start from an empty dictionary; this works whether map returns
        # a list (Python 2) or an iterator (Python 3)
        counts = reduce(merge_counts, per_doc_counts, {})
        logging.debug(counts)

10.1.4 Parallel Implementation

Drawing on the previous implementation using map and reduce, we can parallelize the implementation using Python's multiprocessing API:

    '''Usage: wordcount_mapreduce_parallel.py [-h] DATA_DIR NUM_PROCESSES

    Read a collection of .txt documents and count, in parallel, how many
    times each word appears in the collection.

    Arguments:
      DATA_DIR       A directory with documents (.txt files).
      NUM_PROCESSES  The number of parallel processes to use.

    Options:
      -h --help
    '''

    from __future__ import division, print_function
    import os, glob, logging
    from functools import reduce
    from docopt import docopt
    from wordcount_mapreduce import count_words, merge_counts
    from multiprocessing import Pool

    logging.basicConfig(level=logging.DEBUG)

    if __name__ == '__main__':
        args = docopt(__doc__)
        if not os.path.exists(args['DATA_DIR']):
            raise ValueError('Invalid data directory: %s' % args['DATA_DIR'])
        num_processes = int(args['NUM_PROCESSES'])

        pool = Pool(processes=num_processes)
        per_doc_counts = pool.map(count_words,
                                  glob.glob(os.path.join(args['DATA_DIR'],
                                                         '*.txt')))
        counts = reduce(merge_counts, per_doc_counts, {})
        logging.debug(counts)

10.1.5 Benchmarking
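Run time can also be measured from inside a Python program. The following is a minimal sketch and not part of the original scripts: time.perf_counter reports elapsed wall clock time, and time.process_time reports the CPU time (user plus system) of the current process only, so work done in the child processes of a multiprocessing Pool is not included in it. The helper name timed and the sample workload are illustrative assumptions.

```python
import time

def timed(func, *args):
    """Run func(*args) and return (result, wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    result = func(*args)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return result, wall, cpu

def count_words(text):
    """Count whitespace-separated words in a single string."""
    counts = {}
    for word in text.split():
        counts[word] = counts.get(word, 0) + 1
    return counts

# 100000 "words" drawn from a vocabulary of the numbers 0..99
text = " ".join(str(i % 100) for i in range(100000))
counts, wall, cpu = timed(count_words, text)
print("wall: %.4f s, cpu: %.4f s" % (wall, cpu))
```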

To time each of the examples, enter it into its own Python file and use Linux's time command:

    $ time python wordcount.py docs-1000-10000

The output contains the real run time and the user run time. real is wall clock time, that is, the time from start to finish of the call. user is the amount of CPU time spent in user-mode code (outside the kernel) within the process, that is, only actual CPU time used in executing the process.

10.1.6 Exercises

E.python.wordcount.1:

Run the three different programs (serial, serial with map and reduce, parallel) and answer the following questions:

1. Is there any performance difference between the different versions of the program?

2. Does user time significantly differ from real time for any of the versions of the program?

3. Experiment with different numbers of processes for the parallel example, starting with 1. What is the performance gain when you go from 1 to 2 processes? From 2 to 3? When do you stop seeing improvement? (This will depend on your machine architecture.)

10.1.7 References

Map, Filter and Reduce
multiprocessing API

10.2 NUMPY ☁

NumPy is a popular library that is used by many other Python packages such as Pandas, SciPy, and scikit-learn. It provides a fast, simple-to-use way of interacting with numerical data organized in vectors and matrices. In this section, we will provide a short introduction to NumPy.


10.2.1 Installing NumPy

The most common way of installing NumPy, if it wasn't included with your Python installation, is to install it via pip:

    $ pip install numpy

If NumPy has already been installed, you can update to the most recent version using:

    $ pip install -U numpy

You can verify that NumPy is installed by trying to use it in a Python program:

    import numpy as np

Note that, by convention, we import NumPy using the alias 'np'. Whenever you see 'np' sprinkled in example Python code, it's a good bet that it is using NumPy.

10.2.2 NumPy Basics

At its core, NumPy is a container for n-dimensional data. Typically, 1-dimensional data is called an array and 2-dimensional data is called a matrix. Data with more than 2 dimensions would be considered a multidimensional array. Examples where you'll encounter these dimensions may include:

1 Dimensional: time series data such as audio, stock prices, or a single observation in a dataset.
2 Dimensional: connectivity data between network nodes, user-product recommendations, and database tables.
3+ Dimensional: network latency between nodes over time, video (RGB+time), and version controlled datasets.

All of these data can be placed into NumPy's array object, just with varying dimensions.

10.2.3 Data Types: The Basic Building Blocks

Before we delve into arrays and matrices, we will start off with the most basic element of those: a single value. NumPy can represent data utilizing many different standard datatypes such as uint8 (an 8-bit unsigned integer), float64 (a 64-bit float), or str (a string). An exhaustive listing can be found at:

https://docs.scipy.org/doc/numpy-1.15.0/user/basics.types.html
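The value range and the storage size of a datatype can be inspected directly from Python. The snippet below is a small sketch using np.iinfo, np.finfo and the itemsize and nbytes attributes of arrays; the array names small and large are only illustrative.

```python
import numpy as np

# integer types: np.iinfo reports the representable range
info = np.iinfo(np.uint8)
print(info.min, info.max)

# floating point types: np.finfo reports the largest finite value
print(np.finfo(np.float64).max)

# itemsize is the number of bytes per element, nbytes the total size
small = np.zeros(1000, dtype=np.uint8)
large = np.zeros(1000, dtype=np.float64)
print(small.itemsize, small.nbytes)
print(large.itemsize, large.nbytes)
```

For the same number of elements, the float64 array occupies eight times as many bytes as the uint8 array.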

Before moving on, it is important to know about the tradeoff made when using different datatypes. For example, a uint8 can only contain values between 0 and 255. This contrasts with float64, which can express values with a magnitude of up to about 1.80e+308. So why wouldn't we just always use float64s? Though they allow us to be more expressive in terms of numbers, they also consume more memory. If we were working with a 12 megapixel image, for example, storing that image using uint8 values would require 3000 * 4000 * 8 = 96 million bits, or 12 million bytes (about 11.44 MiB) of memory. If we were to store the same image utilizing float64, our image would consume 8 times as much memory: 768 million bits, or 96 million bytes (about 91.55 MiB). It's important to use the right datatype for the job to avoid consuming unnecessary resources or slowing down processing.

Finally, while NumPy will conveniently convert between datatypes, one must be aware of overflows when using smaller datatypes. For example:

    a = np.array([6], dtype=np.uint8)
    print(a)
    >>> [6]
    a = a + np.array([7], dtype=np.uint8)
    print(a)
    >>> [13]
    a = a + np.array([245], dtype=np.uint8)
    print(a)
    >>> [2]

In this example, it makes sense that 6+7=13. But how does 13+245=2? Put simply, the object type (uint8) ran out of space to store the value and wrapped back around to the beginning. An 8-bit number is only capable of storing 2^8, or 256, unique values. An operation that results in a value above that range will 'overflow' and cause the value to wrap back around to zero. Likewise, anything below that range will 'underflow' and wrap back around to the end. In our example, 13+245 became 258, which was too large to store in 8 bits, so it wrapped back around to 0 and ended up at 2.

NumPy versions before 2.0 will, generally, try to avoid this situation by dynamically retyping to whatever datatype will support the result (NumPy 2.0 and later instead raise an OverflowError for the addition shown here):

    a = a + 260


    print(a)
    >>> [262]

Here, our addition caused our array, 'a', to be upscaled to use uint16 instead of uint8. Finally, NumPy offers convenience functions akin to Python's range() function to create arrays of sequential numbers:

    X = np.arange(0.2, 1, .1)
    print(X)
    >>> [0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9]

We can also use this function to generate parameter spaces that can be iterated on:

    P = 10.0 ** np.arange(-7, 1, 1)
    print(P)
    for x, p in zip(X, P):
        print('%f, %f' % (x, p))

10.2.4 Arrays: Stringing Things Together

With our knowledge of datatypes in hand, we can begin to explore arrays. Simply put, arrays can be thought of as a sequence of values (not necessarily numbers). Arrays are 1-dimensional and can be created and accessed simply:

    a = np.array([1, 2, 3])
    print(type(a))
    >>> <class 'numpy.ndarray'>
    print(a)
    >>> [1 2 3]
    print(a.shape)
    >>> (3,)
    a[0]
    >>> 1

Arrays (and, later, matrices) are zero-indexed. This makes it convenient when, for example, using Python's range() function to iterate through an array:

    for i in range(3):
        print(a[i])
    >>> 1
    >>> 2
    >>> 3

Arrays are also mutable and can be changed easily:

    a[0] = 42
    print(a)
    >>> [42  2  3]

NumPy also includes incredibly powerful broadcasting features. This makes it very simple to perform mathematical operations on arrays in a way that also makes intuitive sense:

    a = np.array([1, 2, 3])  # start again from the original values
    a * 3
    >>> array([3, 6, 9])
    a ** 2
    >>> array([1, 4, 9])

Arrays can also interact with other arrays:

    b = np.array([2, 3, 4])
    print(a * b)
    >>> [ 2  6 12]

In this example, the result of multiplying together two arrays is to take the element-wise product, while multiplying by a constant will multiply each element in the array by that constant. NumPy supports all of the basic mathematical operations: addition, subtraction, multiplication, division, and powers. It also includes an extensive suite of mathematical functions, such as log() and max(), which are covered later.

10.2.5 Matrices: An Array of Arrays

Matrices can be thought of as an extension of arrays: rather than having one dimension, matrices have 2 (or more). Much like arrays, matrices can be created easily within NumPy:

    m = np.array([[1, 2], [3, 4]])
    print(m)
    >>> [[1 2]
    >>>  [3 4]]

Accessing individual elements is similar to how we did it for arrays. We simply need to pass in a number of arguments equal to the number of dimensions:

    m[1][0]
    >>> 3

In this example, our first index selected the row and the second selected the column, giving us our result of 3. Matrices can be extended out to any number of dimensions by simply using more indices to access specific elements (though use-cases beyond 4 may be somewhat rare).

Matrices support all of the normal mathematical operators such as +, -, *, and /. A special note: the * operator will result in an element-wise multiplication. Use @ or np.matmul() for matrix multiplication:


    print(m - m)  # element-wise subtraction
    print(m * m)  # element-wise multiplication
    print(m / m)  # element-wise division
    print(m @ m)  # matrix multiplication

More complex mathematical functions can typically be found within the NumPy library itself:

    print(np.sin(x))
    print(np.sum(x))

A full listing can be found at: https://docs.scipy.org/doc/numpy/reference/routines.math.html

10.2.6 Slicing Arrays and Matrices

As one can imagine, accessing elements one at a time is both slow and can potentially require many lines of code to iterate over every dimension in the matrix. Thankfully, NumPy incorporates a very powerful slicing engine that allows us to access ranges of elements easily:

    m[1, :]
    >>> array([3, 4])

The ':' value tells NumPy to select all elements in the given dimension. Here, we've requested all elements in the second row (index 1). We can also use indexing to request elements within a given range:

    a = np.arange(0, 10, 1)
    print(a)
    >>> [0 1 2 3 4 5 6 7 8 9]
    a[4:8]
    >>> array([4, 5, 6, 7])

Here, we asked NumPy to give us elements 4 through 7 (ranges in Python are inclusive at the start and non-inclusive at the end). We can even go backwards:

    a[-5:]
    >>> array([5, 6, 7, 8, 9])

In the previous example, the negative value is asking NumPy to return the last 5 elements of the array. Had the argument been ':-5', NumPy would've returned everything BUT the last five elements:

    a[:-5]
    >>> array([0, 1, 2, 3, 4])

Becoming more familiar with NumPy's accessor conventions will allow you to write more efficient, clearer code, as it is easier to read a simple one-line accessor than it is a multi-line, nested loop when extracting values from an array or matrix.
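The same slicing notation extends to matrices, where each dimension gets its own slice. The following sketch uses a small 3x4 matrix; the variable names are illustrative only.

```python
import numpy as np

# a 3x4 matrix with rows [0..3], [4..7] and [8..11]
m = np.arange(12).reshape(3, 4)

first_column = m[:, 0]      # every row, column 0
block = m[0:2, 1:3]         # rows 0 and 1, columns 1 and 2
every_other = m[0, ::2]     # row 0, every second element
reversed_row = m[0, ::-1]   # row 0 in reverse order

print(first_column)
print(block)
print(every_other)
print(reversed_row)
```

Each of these one-line accessors replaces what would otherwise be an explicit loop over rows or columns.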

10.2.7 Useful Functions

The NumPy library provides several convenient mathematical functions that users can use. These functions provide several advantages over code written by users:

They are open source and typically have multiple contributors checking for errors.
Many of them utilize a C interface and will run much faster than native Python code.
They're written to be very flexible.

NumPy arrays and matrices contain many useful aggregating functions such as max(), min(), mean(), etc. These functions are usually able to run an order of magnitude faster than looping through the object, so it's important to understand what functions are available to avoid 'reinventing the wheel.' In addition, many of the functions are able to sum or average across axes, which makes them extremely useful if your data has inherent grouping. To return to a previous example:

    m = np.array([[1, 2], [3, 4]])
    print(m)
    >>> [[1 2]
    >>>  [3 4]]
    m.sum()
    >>> 10
    m.sum(axis=1)
    >>> array([3, 7])
    m.sum(axis=0)
    >>> array([4, 6])

In this example, we created a 2x2 matrix containing the numbers 1 through 4. The sum of the matrix returned the addition of all elements of the entire matrix. Summing along axis 1 collapses the columns and returned a new array with the sum of each row. Likewise, summing along axis 0 collapses the rows and returned the sum of each column.

10.2.8 Linear Algebra

Perhaps one of the most important uses for NumPy is its robust support for Linear Algebra functions. Like the aggregation functions described in the previous section, these functions are optimized to be much faster than user implementations and can utilize processor-level features to provide very quick computations. These functions can be accessed very easily from the NumPy package:

    a = np.array([[1, 2], [3, 4]])
    b = np.array([[5, 6], [7, 8]])
    print(np.matmul(a, b))
    >>> [[19 22]
    >>>  [43 50]]

Included within np.linalg are functions for calculating the eigendecomposition of square matrices and symmetric matrices. Finally, to give a quick example of how easy it is to implement algorithms in NumPy, we can easily use it to calculate the cost and gradient when using simple Mean-Squared-Error (MSE):

    # X holds the inputs, Y the targets and weights the model parameters
    cost = np.power(Y - np.matmul(X, weights), 2).mean()
    # gradient of the cost with respect to weights, up to a constant factor
    gradient = np.matmul(X.T, np.matmul(X, weights) - Y)

Finally, more advanced functions are easily available to users via the linalg library of NumPy as:

    from numpy import linalg
    A = np.diag((1, 2, 3))
    w, v = linalg.eig(A)
    print('w =', w)
    print('v =', v)

10.2.9 NumPy Resources

https://docs.scipy.org/doc/numpy
http://cs231n.github.io/python-numpy-tutorial/#numpy
https://docs.scipy.org/doc/numpy-1.15.1/reference/routines.linalg.html
https://en.wikipedia.org/wiki/Mean_squared_error

10.3 SCIPY ☁

SciPy is a library built around NumPy and has a number of off-the-shelf algorithms and operations implemented. These include algorithms from calculus (such as integration), statistics, linear algebra, image processing, signal processing, and machine learning.

To achieve this, SciPy bundles a number of useful open-source software packages for mathematics, science, and engineering. It includes the following packages:

NumPy, for managing N-dimensional arrays
SciPy library, to access fundamental scientific computing capabilities
Matplotlib, to conduct 2D plotting
IPython, for an interactive console (see jupyter)
Sympy, for symbolic mathematics
pandas, for providing data structures and analysis

10.3.1 Introduction

First we add the usual scientific computing modules with the typical abbreviations, including sp for scipy. We could invoke scipy's statistical package as sp.stats, but for the sake of laziness we abbreviate that too.

    import numpy as np  # import numpy
    import scipy as sp  # import scipy
    from scipy import stats  # refer directly to stats rather than sp.stats
    import matplotlib as mpl  # for visualization
    from matplotlib import pyplot as plt  # refer directly to pyplot
                                          # rather than mpl.pyplot

Now we create some random data to play with. We generate 100 samples from a Gaussian distribution centered at zero.

    s = np.random.randn(100)

How many elements are in the set?

    print('There are', len(s), 'elements in the set')

What is the mean (average) of the set?

    print('The mean of the set is', s.mean())

What is the minimum of the set?

    print('The minimum of the set is', s.min())

What is the maximum of the set?

    print('The maximum of the set is', s.max())

We can use module-level functions too. Older SciPy releases exposed these NumPy functions as sp.randn, sp.median, sp.std and sp.var; these aliases are deprecated, so we call NumPy directly. What's the median?

    print('The median of the set is', np.median(s))

What about the standard deviation and variance?

    print('The standard deviation is', np.std(s),
          'and the variance is', np.var(s))

Isn't the variance the square of the standard deviation?

    print('The square of the standard deviation is', np.std(s)**2)

How close are the measures? The difference is tiny, as the following calculation shows:

    print('The difference is', abs(np.std(s)**2 - np.var(s)))
    print('And in decimal form, the difference is %0.16f' %
          (abs(np.std(s)**2 - np.var(s))))

How does this look as a histogram? See Figure 18, Figure 19, Figure 20.

    plt.hist(s)  # yes, one line of code for a histogram
    plt.show()


Figure 18: Histogram 1

Let us add some titles.

    plt.clf()  # clear out the previous plot
    plt.hist(s)
    plt.title("Histogram Example")
    plt.xlabel("Value")
    plt.ylabel("Frequency")
    plt.show()

Figure 19: Histogram 2

Typically we do not include titles when we prepare images for inclusion in LaTeX. There we use the caption to describe what the figure is about.

    plt.clf()  # clear out the previous plot
    plt.hist(s)
    plt.xlabel("Value")
    plt.ylabel("Frequency")
    plt.show()

Figure 20: Histogram 3

Let us try out some linear regression, or curve fitting. See Figure 21.

    import random

    def F(x):
        return 2 * x - 2

    def add_noise(x):
        return x + random.uniform(-1, 1)

    X = range(0, 10, 1)
    Y = []
    for i in range(len(X)):
        # noisy observations of the line F(x) = 2x - 2
        Y.append(add_noise(F(X[i])))

    plt.clf()  # clear out the old figure
    plt.plot(X, Y, '.')
    plt.show()

Figure 21: Result 1
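For reference, the slope and intercept of a least-squares line can also be computed by hand: the slope is the sum of the products of the deviations of x and y from their means, divided by the sum of the squared deviations of x, and the intercept follows from the two means. The following plain Python sketch illustrates this; the function name least_squares_line is an assumption made for this example, and noise-free data is used so that the result is easy to check.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # sum of products of deviations and sum of squared x deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = list(range(10))
ys = [2 * x - 2 for x in xs]  # points exactly on the line y = 2x - 2
slope, intercept = least_squares_line(xs, ys)
print(slope, intercept)
```

With noisy data the recovered values only approximate the slope and intercept of the underlying line.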

Now let's try linear regression to fit the curve.

    m, b, r, p, est_std_err = stats.linregress(X, Y)

What is the slope and y-intercept of the fitted curve?

    print('The slope is', m, 'and the y-intercept is', b)

Now let's see how well the curve fits the data. We'll call the fitted curve F'.

    def Fprime(x):  # the fitted curve
        return m * x + b

    X = range(0, 10, 1)
    Yprime = []
    for i in range(len(X)):
        Yprime.append(Fprime(X[i]))

    plt.clf()  # clear out the old figure

    # the observed points, blue dots
    plt.plot(X, Y, '.', label='observed points')

    # the interpolated curve, connected red line
    plt.plot(X, Yprime, 'r-', label='estimated points')

    plt.title("Linear Regression Example")  # title
    plt.xlabel("x")  # horizontal axis title
    plt.ylabel("y")  # vertical axis title

    # legend labels to plot
    plt.legend(['observed points', 'estimated points'])

    # comment out so that you can save the figure
    # plt.show()

To save images into a PDF file for inclusion into LaTeX documents you can save the images as follows. Other formats such as png are also possible, but the quality is naturally not sufficient for inclusion in papers and documents. For that you certainly want to use PDF. The save of the figure has to occur before you use the show() command. See Figure 22.

    plt.savefig("regression.pdf", bbox_inches='tight')
    plt.savefig('regression.png')
    plt.show()

Figure 22: Result 2

10.3.2 References

For more information about SciPy we recommend that you visit the following link

https://www.scipy.org/getting-started.html#learning-to-work-with-scipy

Additional material and inspiration for this section are from

[ ] "Getting Started guide" https://www.scipy.org/getting-started.html

[ ] Prasanth. "Simple statistics with SciPy." Comfort at 1 AU. February 28, 2011. https://oneau.wordpress.com/2011/02/28/simple-statistics-with-scipy/.

[ ] SciPy Cookbook. Last updated: 2015. http://scipy-cookbook.readthedocs.io/.


create bibtex entries

10.4 SCIKIT-LEARN ☁

Learning Objectives

Exploratory data analysis
Pipeline to prepare data
Full learning pipeline
Fine tune the model
Significance tests

10.4.1 Introduction to Scikit-learn

Scikit-learn is a machine learning library for Python. The library can be used for data mining and analysis. It is built on top of NumPy, matplotlib and SciPy. Scikit-learn features dimensionality reduction, clustering, regression and classification algorithms. It also features model selection using grid search, cross validation and metrics.

Scikit-learn also enables users to preprocess the data, which can then be used for machine learning, using modules like preprocessing and feature extraction.

In this section we demonstrate how simple it is to use k-means in scikit-learn.

10.4.2 Installation

If you already have a working installation of numpy and scipy, the easiest way to install scikit-learn is using pip:

    $ pip install numpy
    $ pip install scipy -U
    $ pip install -U scikit-learn

10.4.3 Supervised Learning


Supervised learning is used in machine learning when we already know the outputs (targets) for a set of inputs, and based on that we need to predict the target for a new input. Training data is used to train the model, which then can be used to predict the output from a bounded set.

Problems can be of two types:

1. Classification: the training data belongs to a fixed set of classes/categories (for example three or four), and based on the labels we want to predict the class/category for the unlabeled data.

2. Regression: the target is a continuous value rather than a class, and we want to predict this value for new inputs.

10.4.4 Unsupervised Learning

Unsupervised learning is used in machine learning when we have the training set available but without any corresponding target. The goal is to discover groups within the provided input. It can be done in many ways.

A few of them are listed here:

1. Clustering: discover groups of similar examples.
2. Density estimation: determine the distribution of data within the input space. A histogram is the most basic form. A related task is changing the data from a high-dimensional space to two or three dimensions.

10.4.5 Building an end-to-end pipeline for supervised machine learning using Scikit-learn

A data pipeline is a set of processing components that are sequenced to produce meaningful data. Pipelines are commonly used in machine learning, since there is a lot of data transformation and manipulation that needs to be applied to make data useful for machine learning. All components are sequenced in a way that the output of one component becomes the input for the next, and each component is self-contained. Components interact with each other using data.

Even if a component breaks, the downstream component can run normally using the last output. Scikit-learn provides the ability to build pipelines in which data can be transformed and modeled for machine learning.
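The idea that the output of one component becomes the input of the next can be sketched in a few lines of plain Python. This is only an illustration of the concept and not scikit-learn's own API (scikit-learn offers the Pipeline class in sklearn.pipeline for chaining transformers and an estimator); the step functions, the field names and the threshold are hypothetical.

```python
from functools import reduce

def fill_missing(rows):
    """Replace missing amounts (None) with 0.0."""
    return [{**row, "amount": 0.0 if row["amount"] is None else row["amount"]}
            for row in rows]

def add_is_large(rows, threshold=1000.0):
    """Add a boolean feature that marks large amounts."""
    return [{**row, "is_large": row["amount"] > threshold} for row in rows]

def run_pipeline(data, steps):
    """Feed the output of each step into the next one."""
    return reduce(lambda current, step: step(current), steps, data)

raw = [{"amount": 2500.0}, {"amount": None}, {"amount": 40.0}]
prepared = run_pipeline(raw, [fill_missing, add_is_large])
print(prepared)
```

Because each step only depends on the data it receives, steps can be tested, replaced or reordered independently.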

10.4.6 Steps for developing a machine learning model

1. Explore the domain space
2. Extract the problem definition
3. Get the data that can be used to make the system learn to solve the problem definition.
4. Discover and visualize the data to gain insights
5. Feature engineering and prepare the data
6. Fine tune your model
7. Evaluate your solution using metrics
8. Once proven, launch and maintain the model.

10.4.7 Exploratory Data Analysis

Example project = Fraud detection system

The first step is to load the data into a dataframe in order for a proper analysis to be done on the attributes.

    import pandas as pd

    data = pd.read_csv('dataset/data_file.csv')
    data.head()

Perform the basic analysis on the data shape and null value information.

    print(data.shape)
    print(data.info())
    data.isnull().values.any()

Here are examples of a few of the visual data analysis methods.

10.4.7.1 Bar plot

A bar chart or graph is a graph with rectangular bars or bins that are used to plot categorical values. Each bar in the graph represents a categorical variable and the height of the bar is proportional to the value represented by it.


Bar graphs are used:

To make comparisons between variables
To visualize any trend in the data, i.e., they show the dependence of one variable on another
To estimate values of a variable

    # plot the number of transactions per type, then label the axes
    data.type.value_counts().plot.bar()
    plt.ylabel('Transactions')
    plt.xlabel('Type')

Figure 23: Example of scikit-learn barplots

10.4.7.2 Correlation between attributes

Attributes in a dataset can be related based on different aspects.

Examples include attributes that depend on one another, or that are loosely or tightly coupled. Another example is two variables that are both associated with a third one.

In order to understand the relationship between attributes, correlation represents the best visual way to get an insight. Positive correlation means that both attributes move in the same direction. Negative correlation refers to opposite directions: an increase in the values of one attribute results in a decrease in the values of the other. Zero correlation is when the attributes are unrelated.
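The three cases can be illustrated with the Pearson correlation coefficient, computed here by hand in plain Python. This is a sketch for illustration only, with made-up data and an assumed helper name pearson.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

x = [1, 2, 3, 4, 5]
positive = pearson(x, [2, 4, 6, 8, 10])   # y grows with x
negative = pearson(x, [10, 8, 6, 4, 2])   # y shrinks as x grows
unrelated = pearson(x, [1, -1, 0, -1, 1]) # no linear relationship
print(positive, negative, unrelated)
```

The coefficient lies between -1 and 1; values near zero indicate that there is no linear relationship between the two attributes.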

    import seaborn as sns

    # compute the correlation matrix
    corr = data.corr()

    # generate a mask that hides the upper triangle
    mask = np.zeros_like(corr, dtype=bool)
    mask[np.triu_indices_from(mask)] = True

    # set up the matplotlib figure
    f, ax = plt.subplots(figsize=(18, 18))

    # generate a custom diverging color map
    cmap = sns.diverging_palette(220, 10, as_cmap=True)

    # draw the heatmap with the mask and correct aspect ratio
    sns.heatmap(corr, mask=mask, cmap=cmap, vmax=.3,
                square=True,
                linewidths=.5, cbar_kws={"shrink": .5}, ax=ax);

Figure 24: scikit-learn correlation array

10.4.7.3 Histogram Analysis of dataset attributes

A histogram consists of a set of counts that represent the number of times some event occurred.

Figure25:scikit-learn

10.4.7.4BoxplotAnalysis

Box plot analysis is useful in detecting whether a distribution is skewed anddetectoutliersinthedata.

%matplotlibinline

data.hist(bins=30,figsize=(20,15))

plt.show()

fig,axs=plt.subplots(2,2,figsize=(10,10))

tmp=data.loc[(data.type=='TRANSFER'),:]

a=sns.boxplot(x='isFlaggedFraud',y='amount',data=tmp,ax=axs[0][0])

axs[0][0].set_yscale('log')

b=sns.boxplot(x='isFlaggedFraud',y='oldbalanceDest',data=tmp,ax=axs[0][1])

axs[0][1].set(ylim=(0,0.5e8))

c=sns.boxplot(x='isFlaggedFraud',y='oldbalanceOrg',data=tmp,ax=axs[1][0])

axs[1][0].set(ylim=(0,3e7))

d=sns.regplot(x='oldbalanceOrg',y='amount',data=tmp.loc[(tmp.isFlaggedFraud==1),:],ax=axs[1][1])

plt.show()


Figure 26: scikit-learn

10.4.7.5 Scatter plot Analysis

The scatter plot displays values of two numerical variables as Cartesian coordinates.

plt.figure(figsize=(12, 8))
sns.pairplot(data[['amount', 'oldbalanceOrg', 'oldbalanceDest', 'isFraud']], hue='isFraud')


Figure 27: scikit-learn scatter plots

10.4.8 Data Cleansing - Removing Outliers

If the transaction amount falls below the 5th percentile of all transaction amounts AND does not exceed USD 3000, we will exclude it from our analysis to reduce Type 1 costs.

If the transaction amount lies above the 95th percentile of all transaction amounts AND exceeds USD 500000, we will exclude it from our analysis, and use a blanket review process for such transactions (similar to the isFlaggedFraud column in the original dataset) to reduce Type 2 costs.

low_exclude = np.round(np.minimum(fin_samp_data.amount.quantile(0.05), 3000), 2)
high_exclude = np.round(np.maximum(fin_samp_data.amount.quantile(0.95), 500000), 2)
### Updating Data to exclude records prone to Type 1 and Type 2 costs
low_data = fin_samp_data[fin_samp_data.amount > low_exclude]
data = low_data[low_data.amount < high_exclude]
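The two cut-offs can be tried on a handful of numbers before applying them to the full dataset. The sketch below uses made-up amounts and a hypothetical helper named exclusion_bounds; only the quantile levels and the USD limits come from the rules stated in this section.

```python
import numpy as np

def exclusion_bounds(amounts, low_cap=3000, high_floor=500000):
    """Return (low, high) cut-offs following the two rules in the text."""
    low = min(np.quantile(amounts, 0.05), low_cap)
    high = max(np.quantile(amounts, 0.95), high_floor)
    return float(low), float(high)

# made-up transaction amounts in USD (an assumption for illustration)
amounts = np.array([100, 2500, 5000, 20000, 75000,
                    300000, 450000, 600000, 900000, 2000000], dtype=float)

low, high = exclusion_bounds(amounts)
# keep only the amounts strictly between the two cut-offs
kept = amounts[(amounts > low) & (amounts < high)]
print(low, high, len(kept))
```

Taking the minimum with 3000 and the maximum with 500000 means a record is dropped only when both conditions of a rule hold at once.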


10.4.9 Pipeline Creation

A machine learning pipeline is used to help automate machine learning workflows. Pipelines operate by enabling a sequence of data to be transformed and correlated together in a model that can be tested and evaluated to achieve an outcome, whether positive or negative.

10.4.9.1 Defining DataFrameSelector to separate Numerical and Categorical attributes

Sample class to separate out numerical and categorical attributes.

from sklearn.base import BaseEstimator, TransformerMixin
# Create a class to select numerical or categorical columns
# since Scikit-Learn doesn't handle DataFrames yet
class DataFrameSelector(BaseEstimator, TransformerMixin):
    def __init__(self, attribute_names):
        self.attribute_names = attribute_names
    def fit(self, X, y=None):
        return self
    def transform(self, X):
        return X[self.attribute_names].values

10.4.9.2 Feature Creation / Additional Feature Engineering

During EDA we identified that there are transactions where the balances do not tally after the transaction is completed. We believe these could potentially be cases where fraud is occurring. To account for this error in the transactions, we define two new features, "errorBalanceOrig" and "errorBalanceDest", calculated by adjusting the amount with the before and after balances for the Originator and Destination accounts.

Below, we create a transformer class that allows us to create these features in a pipeline.

import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
# column index
amount_ix, oldbalanceOrg_ix, newbalanceOrig_ix, oldbalanceDest_ix, newbalanceDest_ix = 0, 1, 2, 3, 4
class CombinedAttributesAdder(BaseEstimator, TransformerMixin):
    def __init__(self):  # no *args or **kargs
        pass
    def fit(self, X, y=None):
        return self  # nothing else to do
    def transform(self, X, y=None):
        errorBalanceOrig = X[:, newbalanceOrig_ix] + X[:, amount_ix] - X[:, oldbalanceOrg_ix]
        errorBalanceDest = X[:, oldbalanceDest_ix] + X[:, amount_ix] - X[:, newbalanceDest_ix]
        return np.c_[X, errorBalanceOrig, errorBalanceDest]
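The arithmetic behind the two error features can be checked on a pair of toy rows. This is a minimal sketch with made-up values; the variable names are illustrative and only the two formulas mirror the transform method shown in this section.

```python
import numpy as np

# columns: amount, oldbalanceOrg, newbalanceOrig, oldbalanceDest, newbalanceDest
# (made-up values for illustration, not rows from the dataset)
X = np.array([
    [100.0,  500.0, 400.0, 0.0, 100.0],   # balances tally on both sides
    [1000.0, 1000.0,  0.0, 0.0,   0.0],   # destination never receives the money
])

amount, old_org, new_org, old_dest, new_dest = X.T

# same formulas as in the transformer, written with named columns
error_balance_orig = new_org + amount - old_org
error_balance_dest = old_dest + amount - new_dest

print(error_balance_orig)  # zero where the originator balance tallies
print(error_balance_dest)  # non-zero where the destination balance does not
```

A non-zero value flags a transaction whose balances do not tally, which is the signal the new features are meant to expose to the model.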


10.4.10 Creating Training and Testing datasets

The training set includes the set of input examples that the model will be fit to, or trained on, by adjusting its parameters. The testing dataset is critical to test the generalizability of the model. By using this set, we can get the working accuracy of our model.

The testing set should not be exposed to the model until model training has been completed. This way the results from testing will be more reliable.

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=42, stratify=y)

10.4.11 Creating pipeline for numerical and categorical attributes

Identifying columns with Numerical and Categorical characteristics.

X_train_num = X_train[["amount", "oldbalanceOrg", "newbalanceOrig", "oldbalanceDest", "newbalanceDest"]]
X_train_cat = X_train[["type"]]
X_model_col = ["amount", "oldbalanceOrg", "newbalanceOrig", "oldbalanceDest", "newbalanceDest", "type"]

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
# SimpleImputer replaces preprocessing.Imputer, which was removed in scikit-learn 0.22
from sklearn.impute import SimpleImputer
num_attribs = list(X_train_num)
cat_attribs = list(X_train_cat)
num_pipeline = Pipeline([
    ('selector', DataFrameSelector(num_attribs)),
    ('attribs_adder', CombinedAttributesAdder()),
    ('std_scaler', StandardScaler())
])
# CategoricalEncoder must be defined or imported separately;
# released scikit-learn versions provide OneHotEncoder instead
cat_pipeline = Pipeline([
    ('selector', DataFrameSelector(cat_attribs)),
    ('cat_encoder', CategoricalEncoder(encoding="onehot-dense"))
])

10.4.12 Selecting the algorithm to be applied

Algorithm selection primarily depends on the objective you are trying to solve and what kind of dataset is available. There are different types of algorithms which can be applied and we will look into a few of them here.

10.4.12.1 Linear Regression

This algorithm can be applied when you want to compute some continuous value. To predict some future value of a process which is currently running, you can go with a regression algorithm.

Examples where linear regression can be used are:

1. Predict the time taken to go from one place to another
2. Predict the sales for a future month
3. Predict sales data and improve yearly projections.

from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler
import time
scl = StandardScaler()
X_train_std = scl.fit_transform(X_train)
X_test_std = scl.transform(X_test)
start = time.time()
lin_reg = LinearRegression()
lin_reg.fit(X_train_std, y_train)  # SKLearn's linear regression
y_train_pred = lin_reg.predict(X_train_std)
train_time = time.time() - start

10.4.12.2 Logistic Regression

This algorithm can be used to perform binary classification. It can be used if you want a probabilistic framework, and also in case you expect to receive more training data in the future that you want to be able to quickly incorporate into your model.

1. Customer churn prediction.
2. Credit Scoring & Fraud Detection, which is the example problem we are trying to solve in this chapter.
3. Calculating the effectiveness of marketing campaigns.

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
X_train, _, y_train, _ = train_test_split(X_train, y_train, stratify=y_train, train_size=subsample_rate, random_state=42)
X_test, _, y_test, _ = train_test_split(X_test, y_test, stratify=y_test, train_size=subsample_rate, random_state=42)
model_lr_sklearn = LogisticRegression(multi_class="multinomial", C=1e6, solver="sag", max_iter=15)
model_lr_sklearn.fit(X_train, y_train)
y_pred_test = model_lr_sklearn.predict(X_test)
acc = accuracy_score(y_test, y_pred_test)
results.loc[len(results)] = ["LR Sklearn", np.round(acc, 3)]
results

10.4.12.3 Decision trees

Decision trees handle feature interactions and they're non-parametric. They do not support online learning, and the entire tree needs to be rebuilt when a new training dataset comes in. Memory consumption is very high.

Can be used for the following cases:

1. Investment decisions
2. Customer churn
3. Bank loan defaulters
4. Build vs Buy decisions
5. Sales lead qualifications

from sklearn.tree import DecisionTreeRegressor
dt = DecisionTreeRegressor()
start = time.time()
dt.fit(X_train_std, y_train)
y_train_pred = dt.predict(X_train_std)
train_time = time.time() - start
start = time.time()
y_test_pred = dt.predict(X_test_std)
test_time = time.time() - start

10.4.12.4 K Means

This algorithm is used when we are not aware of the labels and they need to be created based on the features of objects. An example would be to divide a group of people into different subgroups based on a common theme or attribute.

The main disadvantage of K-means is that you need to know exactly the number of clusters or groups which is required. It takes a lot of iterations to come up with the best K.

# note: KNeighborsClassifier is a supervised k-nearest-neighbours classifier, not K-means clustering
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split, GridSearchCV, PredefinedSplit
from sklearn.metrics import accuracy_score
X_train, _, y_train, _ = train_test_split(X_train, y_train, stratify=y_train, train_size=subsample_rate, random_state=42)
X_test, _, y_test, _ = train_test_split(X_test, y_test, stratify=y_test, train_size=subsample_rate, random_state=42)
model_knn_sklearn = KNeighborsClassifier(n_jobs=-1)
model_knn_sklearn.fit(X_train, y_train)
y_pred_test = model_knn_sklearn.predict(X_test)
acc = accuracy_score(y_test, y_pred_test)
results.loc[len(results)] = ["KNN Arbitary Sklearn", np.round(acc, 3)]
results

10.4.12.5 Support Vector Machines

SVM is a supervised ML technique used for pattern recognition and classification problems when your data has exactly two classes. It is popular in text classification problems.

A few cases where SVM can be used are:

1. Detecting persons with common diseases.
2. Hand-written character recognition
3. Text categorization
4. Stock market price prediction

10.4.12.6 Naive Bayes

Naive Bayes is used for large datasets. This algorithm works well even when we have limited CPU and memory available. It works by calculating a bunch of counts. It requires less training data. The algorithm can't learn interactions between features.

Naive Bayes can be used in real-world applications such as:

1. Sentiment analysis and text classification
2. Recommendation systems like Netflix, Amazon
3. To mark an email as spam or not spam
4. Face recognition

10.4.12.7 Random Forest

Random forest is similar to a decision tree. It can be used for both regression and classification problems with large datasets.

A few cases where it can be applied:

1. Predict patients for high risks.
2. Predict parts failures in manufacturing.
3. Predict loan defaulters.

from sklearn.ensemble import RandomForestRegressor
# the criterion 'mse' is called 'squared_error' in scikit-learn 1.0 and later
forest = RandomForestRegressor(n_estimators=400, criterion='mse', random_state=1, n_jobs=-1)
start = time.time()
forest.fit(X_train_std, y_train)
y_train_pred = forest.predict(X_train_std)
train_time = time.time() - start
start = time.time()
y_test_pred = forest.predict(X_test_std)
test_time = time.time() - start

10.4.12.8 Neural networks

A neural network works based on the weights of connections between neurons. The weights are trained, and based on them the neural network can be utilized to predict a class or a quantity. Neural networks are resource and memory intensive.

A few cases where it can be applied:

1. Applied to unsupervised learning tasks, such as feature extraction.
2. Extracts features from raw images or speech with much less human intervention

10.4.12.9 Deep Learning using Keras

Keras is one of the most powerful and easy-to-use Python libraries for developing and evaluating deep learning models. It wraps the efficient numerical computation libraries Theano and TensorFlow.

10.4.12.10 XGBoost

XGBoost stands for eXtreme Gradient Boosting. XGBoost is an implementation of gradient boosted decision trees designed for speed and performance. It is engineered for efficiency of compute time and memory resources.

10.4.13 Scikit Cheat Sheet

Scikit-learn has put together a very in-depth and well-explained flow chart to help you choose the right algorithm, which I find very handy.

Figure 28: scikit-learn

10.4.14 Parameter Optimization

Machine learning models are parameterized so that their behavior can be tuned for a given problem. These models can have many parameters and finding the best combination of parameters can be treated as a search problem.

A parameter is a configuration that is part of the model and whose values can be derived from the given data. Parameters are:

1. Required by the model when making predictions.
2. Values define the skill of the model on your problem.
3. Estimated or learned from data.
4. Often not set manually by the practitioner.
5. Often saved as part of the learned model.

10.4.14.1 Hyperparameter optimization/tuning algorithms

Grid search is an approach to hyperparameter tuning that will methodically build and evaluate a model for each combination of algorithm parameters specified in a grid.

Random search provides a statistical distribution for each hyperparameter from which values may be randomly sampled.
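The difference between the two strategies can be sketched without any machine learning library. In the example below, toy_score is a hypothetical stand-in for "train a model and return its validation score", and the sampling ranges in random_search are assumptions chosen for illustration; the grid values are the same ones used for C and gamma in the SVM entry of the parameter grid later in this chapter.

```python
import itertools
import math
import random

def toy_score(C, gamma):
    """Hypothetical validation score; peaks at C=1.0, gamma=0.5."""
    return -(math.log10(C) ** 2) - (gamma - 0.5) ** 2

def grid_search(grid):
    """Evaluate every combination in the grid and return the best one."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        score = toy_score(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def random_search(n_iter, seed=42):
    """Sample each hyperparameter from a distribution n_iter times."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        params = {
            "C": 10 ** rng.uniform(-2, 2),   # log-uniform between 0.01 and 100
            "gamma": rng.uniform(0.1, 1.0),  # uniform between 0.1 and 1.0
        }
        score = toy_score(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

grid = {"C": [0.01, 1.0, 100], "gamma": [0.5, 1]}
print(grid_search(grid))         # tries all 3 * 2 = 6 combinations
print(random_search(n_iter=20))  # tries 20 sampled combinations
```

Grid search cost grows with the product of the list lengths, while random search cost is fixed by n_iter regardless of how many hyperparameters are sampled.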

10.4.15 Experiments with Keras (deep learning), XGBoost, and SVM (SVC) compared to Logistic Regression (Baseline)

10.4.15.1 Creating a parameter grid

grid_param = [
    [{  # LogisticRegression
        'model__penalty': ['l1', 'l2'],
        'model__C': [0.01, 1.0, 100]
    }],
    [{  # keras
        'model__optimizer': optimizer,
        'model__loss': loss
    }],
    [{  # SVM
        'model__C': [0.01, 1.0, 100],
        'model__gamma': [0.5, 1],
        'model__max_iter': [-1]
    }],
    [{  # XGBClassifier
        'model__min_child_weight': [1, 3, 5],
        'model__gamma': [0.5],
        'model__subsample': [0.6, 0.8],
        'model__colsample_bytree': [0.6],
        'model__max_depth': [3]
    }]
]

Pipeline(memory=None,
steps=[('preparation',FeatureUnion(n_jobs=None,
transformer_list=[('num_pipeline',Pipeline(memory=None,
steps=[('selector',DataFrameSelector(attribute_names=['amount','oldbalanceOrg','newbalanceOrig','oldbalanceDest'
tol=0.0001,verbose=0,warm_start=False))])

10.4.15.2 Implementing Grid search with models and also creating metrics from each of the models.

from sklearn import linear_model, metrics
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import mean_squared_error
from sklearn.metrics import classification_report
from sklearn.metrics import f1_score
from xgboost.sklearn import XGBClassifier
from sklearn.svm import SVC
test_scores = []
# Machine Learning Algorithm (MLA) Selection and Initialization
MLA = [
    linear_model.LogisticRegression(),
    keras_model,
    SVC(),
    XGBClassifier()
]
# create table to compare MLA metrics
MLA_columns = ['Name', 'Score', 'Accuracy_Score', 'ROC_AUC_score', 'final_rmse', 'Classification_error', 'Recall_Score', 'Precision_Score'
MLA_compare = pd.DataFrame(columns=MLA_columns)
Model_Scores = pd.DataFrame(columns=['Name', 'Score'])
row_index = 0
for alg in MLA:
    # set name and parameters
    MLA_name = alg.__class__.__name__
    MLA_compare.loc[row_index, 'Name'] = MLA_name
    # MLA_compare.loc[row_index, 'Parameters'] = str(alg.get_params())
    full_pipeline_with_predictor = Pipeline([
        ("preparation", full_pipeline),  # combination of numerical and categorical pipelines
        ("model", alg)
    ])
    grid_search = GridSearchCV(full_pipeline_with_predictor, grid_param[row_index], cv=4, verbose=2, scoring='f1', return_train_score
    grid_search.fit(X_train[X_model_col], y_train)
    y_pred = grid_search.predict(X_test)
    MLA_compare.loc[row_index, 'Accuracy_Score'] = np.round(accuracy_score(y_pred, y_test), 3)
    MLA_compare.loc[row_index, 'ROC_AUC_score'] = np.round(metrics.roc_auc_score(y_test, y_pred), 3)
    MLA_compare.loc[row_index, 'Score'] = np.round(grid_search.score(X_test, y_test), 3)
    # note: with scoring='f1', best_score_ is an F1 score, so the next two lines
    # are leftovers from an MSE-based workflow; 'scores' is not used below
    negative_mse = grid_search.best_score_
    scores = np.sqrt(-negative_mse)
    final_mse = mean_squared_error(y_test, y_pred)
    final_rmse = np.sqrt(final_mse)
    MLA_compare.loc[row_index, 'final_rmse'] = final_rmse
    confusion_matrix_var = confusion_matrix(y_test, y_pred)
    TP = confusion_matrix_var[1, 1]
    TN = confusion_matrix_var[0, 0]
    FP = confusion_matrix_var[0, 1]
    FN = confusion_matrix_var[1, 0]
    MLA_compare.loc[row_index, 'Classification_error'] = np.round(((FP + FN) / float(TP + TN + FP + FN)), 5)
    MLA_compare.loc[row_index, 'Recall_Score'] = np.round(metrics.recall_score(y_test, y_pred), 5)
    MLA_compare.loc[row_index, 'Precision_Score'] = np.round(metrics.precision_score(y_test, y_pred), 5)
    MLA_compare.loc[row_index, 'F1_Score'] = np.round(f1_score(y_test, y_pred), 5)
    MLA_compare.loc[row_index, 'mean_test_score'] = grid_search.cv_results_['mean_test_score'].mean()
    MLA_compare.loc[row_index, 'mean_fit_time'] = grid_search.cv_results_['mean_fit_time'].mean()
    Model_Scores.loc[row_index, 'MLA Name'] = MLA_name
    Model_Scores.loc[row_index, 'ML Score'] = np.round(metrics.roc_auc_score(y_test, y_pred), 3)
    # Collect Mean Test scores for statistical significance test
    test_scores.append(grid_search.cv_results_['mean_test_score'])
    row_index += 1

10.4.15.3 Results table from the Model evaluation with metrics.


Figure 29: scikit-learn
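The columns of the results table that are derived from the confusion matrix (classification error, recall, precision and F1) can be reproduced by hand. The following sketch uses made-up labels and a hypothetical helper named binary_metrics; it assumes binary labels with at least one actual and one predicted positive, so that no division by zero occurs.

```python
def binary_metrics(y_true, y_pred):
    """Compute the confusion-matrix cells and the scores derived from them."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    total = tp + tn + fp + fn
    precision = tp / (tp + fp)   # share of predicted positives that are correct
    recall = tp / (tp + fn)      # share of actual positives that were found
    return {
        "TP": tp, "TN": tn, "FP": fp, "FN": fn,
        "classification_error": (fp + fn) / total,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }

# made-up labels (1 = fraud, 0 = not fraud), an assumption for illustration
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
print(binary_metrics(y_true, y_pred))
```

For a heavily imbalanced problem such as fraud detection, precision and recall are usually more informative than the plain classification error, because a model that never predicts fraud can still have a low error.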

10.4.15.4 ROC AUC Score

The AUC-ROC curve is a performance measurement for classification problems at various threshold settings. ROC is a probability curve and AUC represents the degree or measure of separability. It tells how capable the model is of distinguishing between classes. The higher the AUC, the better the model is at predicting 0s as 0s and 1s as 1s.
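One way to read the AUC is as the fraction of (positive, negative) pairs in which the positive example receives the higher score. The sketch below computes the AUC this way in plain Python; the helper name auc_by_pairs and the four-point data are illustrative assumptions, and the pairwise loop is meant for small examples rather than large datasets.

```python
def auc_by_pairs(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a pair.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))

y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auc_by_pairs(y_true, scores))  # 0.75: three of the four pairs are ordered correctly
```

A value of 1.0 means every positive is scored above every negative, and 0.5 corresponds to random ordering.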

Figure 30: scikit-learn


Figure 31: scikit-learn

10.4.16 K-means in scikit learn.

10.4.16.1 Import

10.4.17 K-means Algorithm

In this section we demonstrate how simple it is to use k-means in scikit learn.

10.4.17.1 Import

from time import time
import numpy as np
import matplotlib.pyplot as plt
from sklearn import metrics
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.preprocessing import scale

10.4.17.2 Create samples

np.random.seed(42)
digits = load_digits()
data = scale(digits.data)


10.4.17.3 Create samples

np.random.seed(42)
digits = load_digits()
data = scale(digits.data)
n_samples, n_features = data.shape
n_digits = len(np.unique(digits.target))
labels = digits.target
sample_size = 300
print("n_digits: %d, \t n_samples %d, \t n_features %d"
      % (n_digits, n_samples, n_features))
print(79 * '_')
print('%9s' % 'init'
      '    time  inertia    homo   compl  v-meas     ARI     AMI  silhouette')

def bench_k_means(estimator, name, data):
    # fit the estimator and print timing, inertia and clustering quality scores
    t0 = time()
    estimator.fit(data)
    print('%9s   %.2fs    %i   %.3f   %.3f   %.3f   %.3f   %.3f    %.3f'
          % (name, (time() - t0), estimator.inertia_,
             metrics.homogeneity_score(labels, estimator.labels_),
             metrics.completeness_score(labels, estimator.labels_),
             metrics.v_measure_score(labels, estimator.labels_),
             metrics.adjusted_rand_score(labels, estimator.labels_),
             metrics.adjusted_mutual_info_score(labels, estimator.labels_),
             metrics.silhouette_score(data, estimator.labels_,
                                      metric='euclidean',
                                      sample_size=sample_size)))

bench_k_means(KMeans(init='k-means++', n_clusters=n_digits, n_init=10),
              name="k-means++", data=data)
bench_k_means(KMeans(init='random', n_clusters=n_digits, n_init=10),
              name="random", data=data)
# in this case the seeding of the centers is deterministic, hence we run the
# kmeans algorithm only once with n_init=1
pca = PCA(n_components=n_digits).fit(data)
bench_k_means(KMeans(init=pca.components_, n_clusters=n_digits, n_init=1),
              name="PCA-based",
              data=data)
print(79 * '_')

10.4.17.4 Visualize

See Figure 32


10.4.17.5 Visualize

See Figure 32

print(79 * '_')
reduced_data = PCA(n_components=2).fit_transform(data)
kmeans = KMeans(init='k-means++', n_clusters=n_digits, n_init=10)
kmeans.fit(reduced_data)
# Step size of the mesh. Decrease to increase the quality of the VQ.
h = .02
# Plot the decision boundary. For that, we will assign a color to each
# point in the mesh [x_min, x_max]x[y_min, y_max].
x_min, x_max = reduced_data[:, 0].min() - 1, reduced_data[:, 0].max() + 1
y_min, y_max = reduced_data[:, 1].min() - 1, reduced_data[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
# Obtain labels for each point in mesh. Use last trained model.
Z = kmeans.predict(np.c_[xx.ravel(), yy.ravel()])
# Put the result into a color plot
Z = Z.reshape(xx.shape)
plt.figure(1)
plt.clf()
plt.imshow(Z, interpolation='nearest',
           extent=(xx.min(), xx.max(), yy.min(), yy.max()),
           cmap=plt.cm.Paired,
           aspect='auto', origin='lower')
plt.plot(reduced_data[:, 0], reduced_data[:, 1], 'k.', markersize=2)
# Plot the centroids as a white X
centroids = kmeans.cluster_centers_
plt.scatter(centroids[:, 0], centroids[:, 1],
            marker='x', s=169, linewidths=3,
            color='w', zorder=10)
plt.title('K-means clustering on the digits dataset (PCA-reduced data)\n'
          'Centroids are marked with white cross')
plt.xlim(x_min, x_max)
plt.ylim(y_min, y_max)
plt.xticks(())
plt.yticks(())
plt.show()

Figure 32: Result
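The digits example fixes the number of clusters to the known number of digits. When the number of clusters is not known, a common rule of thumb (an addition here, not something taken from the example above) is to run k-means for several values of k and look for the point where the inertia stops dropping sharply. The sketch below is a plain NumPy version of Lloyd's algorithm with a deterministic farthest-point start; it is not scikit-learn's KMeans implementation, and the helper names and the toy data are assumptions made for illustration.

```python
import numpy as np

def farthest_point_init(X, k):
    """Pick k starting centers: the first point, then repeatedly the
    point farthest from all centers chosen so far (deterministic)."""
    centers = [X[0]]
    while len(centers) < k:
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[int(np.argmax(d2))])
    return np.array(centers, dtype=float)

def kmeans_inertia(X, k, n_iter=50):
    """Run plain Lloyd iterations and return the final inertia
    (sum of squared distances of points to their nearest center)."""
    centers = farthest_point_init(X, k)
    for _ in range(n_iter):
        # squared distance of every point to every center
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:      # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return float(d2.min(axis=1).sum())

# three well-separated made-up groups of four points each
corners = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
X = np.vstack([corners, corners + [10, 10], corners + [0, 10]])

for k in range(1, 6):
    print(k, kmeans_inertia(X, k))
```

On this toy data the inertia falls steeply up to k = 3 and only slightly afterwards, which matches the three groups that were built into it.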


10.5 DASK - RANDOM FOREST FEATURE DETECTION ☁

10.5.1 Setup

First we need our tools. pandas gives us the DataFrame, very similar to R's DataFrames. The DataFrame is a structure that allows us to work with our data more easily. It has nice features for slicing and transformation of data, and easy ways to do basic statistics.

numpy has some very handy functions that work on DataFrames.

import pandas as pd
import numpy as np

10.5.2 Dataset

We are using the wine quality dataset, archived at UCI's Machine Learning Repository (http://archive.ics.uci.edu/ml/index.php).

Now we will load our data. pandas makes it easy!

# red wine quality data, packed in a DataFrame
red_df = pd.read_csv('winequality-red.csv', sep=';', header=0, index_col=False)
# white wine quality data, packed in a DataFrame
white_df = pd.read_csv('winequality-white.csv', sep=';', header=0, index_col=False)
# rose? other fruit wines? plum wine? :(

Like in R, there is a .describe() method that gives basic statistics for every column in the dataset.

# for red wines
red_df.describe()

       fixed acidity  volatile acidity  citric acid  residual sugar    chlorides
count    1599.000000       1599.000000  1599.000000     1599.000000  1599.000000
mean        8.319637          0.527821     0.270976        2.538806     0.087467
std         1.741096          0.179060     0.194801        1.409928     0.047065
min         4.600000          0.120000     0.000000        0.900000     0.012000
25%         7.100000          0.390000     0.090000        1.900000     0.070000
50%         7.900000          0.520000     0.260000        2.200000     0.079000
75%         9.200000          0.640000     0.420000        2.600000     0.090000
max        15.900000          1.580000     1.000000       15.500000     0.611000

# for white wines
white_df.describe()

       fixed acidity  volatile acidity  citric acid  residual sugar    chlorides
count    4898.000000       4898.000000  4898.000000     4898.000000  4898.000000
mean        6.854788          0.278241     0.334192        6.391415     0.045772
std         0.843868          0.100795     0.121020        5.072058     0.021848
min         3.800000          0.080000     0.000000        0.600000     0.009000
25%         6.300000          0.210000     0.270000        1.700000     0.036000
50%         6.800000          0.260000     0.320000        5.200000     0.043000
75%         7.300000          0.320000     0.390000        9.900000     0.050000
max        14.200000          1.100000     1.660000       65.800000     0.346000

Sometimes it is easier to understand the data visually. A histogram of the white wine quality data citric acid samples is shown next. You can of course visualize other columns' data or other datasets. Just replace the DataFrame and column name (see Figure 33).

import matplotlib.pyplot as plt
def extract_col(df, col_name):
    return list(df[col_name])
col = extract_col(white_df, 'citric acid')  # can replace with another dataframe or column
plt.hist(col)
# TODO: add axes and such to set a good example
plt.show()


Figure 33: Histogram

10.5.3 Detecting Features

Let us try out some elementary machine learning models. These models are not always for prediction. They are also useful to find what features are most predictive of a variable of interest. Depending on the classifier you use, you may need to transform the data pertaining to that variable.

10.5.3.1 Data Preparation

Let us assume we want to study what features are most correlated with pH. pH of course is real-valued, and continuous. The classifiers we want to use usually need labeled or integer data. Hence, we will transform the pH data, assigning wines with pH higher than average as hi (more basic or alkaline) and wines with pH lower than average as lo (more acidic).

# refresh to make Jupyter happy
red_df = pd.read_csv('winequality-red.csv', sep=';', header=0, index_col=False)
white_df = pd.read_csv('winequality-white.csv', sep=';', header=0, index_col=False)
# TODO: data cleansing functions here, e.g. replacement of NaN
# if the variable you want to predict is continuous, you can map ranges of values
# to integer/binary/string labels
# for example, map the pH data to 'hi' and 'lo' if a pH value is more than or
# less than the mean pH, respectively
M = np.mean(list(red_df['pH']))  # expect inelegant code in these mappings
Lf = lambda p: int(p < M) * 'lo' + int(p >= M) * 'hi'  # some C-style hackery
# create the new classifiable variable
# (Series.map returns a Series; the built-in map() would return an iterator in Python 3)
red_df['pH-hi-lo'] = red_df['pH'].map(Lf)
# and remove the predecessor
del red_df['pH']


Now we specify which dataset and variable we want to predict by assigning values to SELECTED_DF and TARGET_VAR, respectively.

We like to keep a parameter file where we specify data sources and such. This lets us create generic analytics code that is easy to reuse.

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn import metrics
# make selections here without digging in code
SELECTED_DF = red_df  # selected dataset
TARGET_VAR = 'pH-hi-lo'  # the predicted variable
# generate nameless data structures
df = SELECTED_DF
target = np.array(df[TARGET_VAR]).ravel()
del df[TARGET_VAR]  # no cheating
# TODO: data cleansing function calls here

After we have specified what dataset we want to study, we split the training and test datasets. We then scale (normalize) the data, which makes most classifiers run better.

# split datasets for training and testing
X_train, X_test, y_train, y_test = train_test_split(df, target, test_size=0.2)
# set up the scaler
scaler = StandardScaler()
scaler.fit(X_train)
# apply the scaler
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

Now we pick a classifier. As you can see, there are many to try out, and even more in scikit-learn's documentation and many examples and tutorials. Random Forests are data science workhorses. They are the go-to method for most data scientists. Be careful relying on them though, as they tend to overfit. We try to avoid overfitting by separating the training and test datasets.

# pick a classifier
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor, ExtraTreeClassifier, ExtraTreeRegressor
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier
clf = RandomForestClassifier()

10.5.4 Random Forest

Now we will test it out with the default parameters.

Note that this code is boilerplate. You can use it interchangeably for most scikit-learn models.

# test it out
model = clf.fit(X_train, y_train)
pred = clf.predict(X_test)
conf_matrix = metrics.confusion_matrix(y_test, pred)
var_score = clf.score(X_test, y_test)

Now output the results. For Random Forests, we get a feature ranking. Relative importances usually exponentially decay. The first few highly-ranked features are usually the most important.

# the results
importances = clf.feature_importances_
indices = np.argsort(importances)[::-1]
# for the sake of clarity
num_features = X_train.shape[1]
features = [df.columns[x] for x in indices]
feature_importances = [importances[x] for x in indices]
print('Feature ranking:\n')
for i in range(num_features):
    feature_name = features[i]
    feature_importance = feature_importances[i]
    print('%s%f' % (feature_name.ljust(30), feature_importance))

Feature ranking:

fixed acidity                 0.269778
citric acid                   0.171337
density                       0.089660
volatile acidity              0.088965
chlorides                     0.082945
alcohol                       0.080437
total sulfur dioxide          0.067832
sulphates                     0.047786
free sulfur dioxide           0.042727
residual sugar                0.037459
quality                       0.021075

Sometimes it's easier to visualize. We'll use a bar chart. See Figure 34

plt.clf()
plt.bar(range(num_features), feature_importances)
plt.xticks(range(num_features), features, rotation=90)
plt.ylabel('relative importance (a.u.)')
plt.title('Relative importances of most predictive features')
plt.show()


Figure 34: Result

import dask.dataframe as dd
red_df = dd.read_csv('winequality-red.csv', sep=';', header=0)
white_df = dd.read_csv('winequality-white.csv', sep=';', header=0)

10.5.5 Acknowledgement

This notebook was developed by Juliette Zerick and Gregor von Laszewski.

10.6 PARALLEL COMPUTING IN PYTHON ☁

In this module we will review the available Python modules that can be used for parallel computing. Parallel computing can take the form of either multi-threading or multi-processing. In the multi-threading approach, the threads run in the same shared memory heap, whereas in the case of multi-processing, the memory heaps of the processes are separate and independent; therefore communication between the processes is a little more complex.

10.6.1 Multi-threading in Python

Threading in Python is perfect for I/O operations where the process is expected to be idle regularly, e.g. web scraping. This is a very useful feature because several applications and scripts might spend the majority of their runtime waiting for network or data I/O. In several cases, e.g. web scraping, the resources, i.e. downloads from different websites, are most of the time independent. Therefore the processor can download in parallel and join the results at the end.
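The download-in-parallel pattern can be sketched with the standard threading module. In the example below the network request is simulated with time.sleep, and the function names fake_download and download_all as well as the example host names are hypothetical; a real scraper would replace the sleep with an actual request.

```python
import threading
import time

def fake_download(url, results, index):
    """Stand-in for a network request: wait, then store a result."""
    time.sleep(0.2)                      # simulated I/O wait
    results[index] = "content of " + url

def download_all(urls):
    """Start one thread per URL and join them all before returning."""
    results = [None] * len(urls)
    threads = [
        threading.Thread(target=fake_download, args=(url, results, i))
        for i, url in enumerate(urls)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                         # wait for every download to finish
    return results

urls = ["site-a.example", "site-b.example", "site-c.example"]
start = time.time()
pages = download_all(urls)
print(pages)
# the waits overlap, so the total is typically close to 0.2s rather than 0.6s
print("elapsed: %.2fs" % (time.time() - start))
```

Each thread writes to its own slot of the results list, so no two threads modify the same element.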

10.6.1.1 Thread vs Threading

There are two built-in modules in Python that are related to threading, namely thread and threading. The former was deprecated for some time in Python 2, and in Python 3 it was renamed to _thread, a backward-incompatible change. The _thread module provides a low-level threading API for multi-threading in Python, whereas the threading module builds a high-level threading interface on top of it.

Thread is the main class of the threading module. The two most important arguments of its constructor are target, which specifies the callable object, and args, which passes the arguments for the target callable. We illustrate these in the following example:

import threading

def hello_thread(thread_num):
    print("Hello from Thread", thread_num)

if __name__ == '__main__':
    for thread_num in range(5):
        # args must be a tuple, hence the trailing comma
        t = threading.Thread(target=hello_thread, args=(thread_num,))
        t.start()

This is the output of the previous example:

In [1]: %run threading.py
Hello from Thread 0
Hello from Thread 1
Hello from Thread 2
Hello from Thread 3
Hello from Thread 4

In case you are not familiar with the if __name__ == '__main__': statement, it makes sure that the code nested under this condition runs only if you run your module as a program; it does not run if your module is imported from another file.

10.6.1.2 Locks

As mentioned before, the memory space is shared between the threads. This is at the same time beneficial and problematic: it is beneficial in the sense that communication between the threads becomes easy; however, you might experience strange outcomes if you let several threads change the same variable without caution, e.g. thread 2 changes variable x while thread 1 is working with it. This is when a lock comes into play. Using a lock, you can allow only one thread at a time to work with a variable. In other words, only a single thread can hold the lock. If other threads need to work with that variable, they have to wait until the thread holding the lock is done and the variable is "unlocked".
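A lock can also be used as a context manager. The following sketch, in which the names add_many and run are hypothetical, uses the with statement of threading.Lock so that the lock is released even if the protected code raises an exception:

```python
import threading

counter = 0
counter_lock = threading.Lock()


def add_many(n):
    """Increment the shared counter n times, one locked step at a time."""
    global counter
    for _ in range(n):
        # The lock is acquired on entry and released on exit of the block.
        with counter_lock:
            counter += 1


def run(num_threads=4, n=10000):
    """Reset the counter, run the threads to completion and return the total."""
    global counter
    counter = 0
    threads = [threading.Thread(target=add_many, args=(n,))
               for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter


if __name__ == '__main__':
    print(run())
```

With the lock in place, every increment is applied, so the final value equals num_threads * n.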

We illustrate this with a simple example:

import threading

counter = 0

def incrementer1():
    global counter
    for j in range(2):
        for i in range(3):
            counter += 1
            print("Greeter 1 incremented the counter by 1")
        print("Counter is %d" % counter)

def incrementer2():
    global counter
    for j in range(2):
        for i in range(3):
            counter += 1
            print("Greeter 2 incremented the counter by 1")
        print("Counter is %d" % counter)

if __name__ == '__main__':
    t1 = threading.Thread(target=incrementer1)
    t2 = threading.Thread(target=incrementer2)
    t1.start()
    t2.start()

Suppose we want to print the multiples of 3 between 1 and 12, i.e. 3, 6, 9 and 12. For the sake of argument, we do this using two threads and a nested for loop. We create a global variable called counter and initialize it with 0. Whenever one of the incrementer1 or incrementer2 functions is called, the counter is incremented by 3 twice (i.e. by 6 in each function call). If you run the previous code, you would have to be really lucky to get the following as part of your output:

Counter is 3
Counter is 6
Counter is 9
Counter is 12

The reason is the conflict that happens between the threads while incrementing the counter in the nested for loop. As you probably noticed, one iteration of the outer for loop is equivalent to adding 3 to the counter; the conflict does not occur at that level but within the nested for loop. Accordingly, the output of the previous code differs from run to run. This is an example output:

$ python3 lock_example.py
Greeter 1 incremented the counter by 1
Greeter 1 incremented the counter by 1
Greeter 1 incremented the counter by 1
Counter is 4
Greeter 2 incremented the counter by 1
Greeter 2 incremented the counter by 1
Greeter 1 incremented the counter by 1
Greeter 2 incremented the counter by 1
Greeter 1 incremented the counter by 1
Counter is 8
Greeter 1 incremented the counter by 1
Greeter 2 incremented the counter by 1
Counter is 10
Greeter 2 incremented the counter by 1
Greeter 2 incremented the counter by 1
Counter is 12

We can fix this issue using a lock: whenever one of the functions is going to increment the value by 3, it will acquire() the lock, and when it is done it will release() the lock. This mechanism is illustrated in the following code:

import threading

increment_by_3_lock = threading.Lock()

counter = 0

def incrementer1():
    global counter
    for j in range(2):
        # block until the lock is available
        increment_by_3_lock.acquire(True)
        for i in range(3):
            counter += 1
            print("Greeter 1 incremented the counter by 1")
        print("Counter is %d" % counter)
        increment_by_3_lock.release()

def incrementer2():
    global counter
    for j in range(2):
        increment_by_3_lock.acquire(True)
        for i in range(3):
            counter += 1
            print("Greeter 2 incremented the counter by 1")
        print("Counter is %d" % counter)
        increment_by_3_lock.release()

if __name__ == '__main__':
    t1 = threading.Thread(target=incrementer1)
    t2 = threading.Thread(target=incrementer2)
    t1.start()
    t2.start()

No matter how many times you run this code, the output will always be in the correct order:

$ python3 lock_example.py
Greeter 1 incremented the counter by 1
Greeter 1 incremented the counter by 1
Greeter 1 incremented the counter by 1
Counter is 3
Greeter 1 incremented the counter by 1
Greeter 1 incremented the counter by 1
Greeter 1 incremented the counter by 1
Counter is 6
Greeter 2 incremented the counter by 1
Greeter 2 incremented the counter by 1
Greeter 2 incremented the counter by 1
Counter is 9
Greeter 2 incremented the counter by 1
Greeter 2 incremented the counter by 1
Greeter 2 incremented the counter by 1
Counter is 12

Using the threading module increases both the overhead associated with thread management and the complexity of the program, which is why in many situations employing the multiprocessing module might be a better approach.

10.6.2 Multi-processing in Python

We already mentioned that multi-threading might not be sufficient in many applications and that we might need to use multiprocessing sometimes, or, better to say, most of the time. That is why we dedicate this subsection to this particular module. The module provides you with an API for spawning processes the way you spawn threads using the threading module. Moreover, it offers some functionality that is not available in the threading module, e.g. the Pool class, which allows you to run a batch of jobs using a pool of worker processes.

10.6.2.1 Process

Similar to the threading module, which employs thread (aka _thread) under the hood, multiprocessing employs the Process class. Consider the following example:

from multiprocessing import Process
import os

def greeter(name):
    proc_idx = os.getpid()
    print("Process {0}: Hello {1}!".format(proc_idx, name))

if __name__ == '__main__':
    name_list = ['Harry', 'George', 'Dirk', 'David']
    process_list = []
    for name_idx, name in enumerate(name_list):
        current_process = Process(target=greeter, args=(name,))
        process_list.append(current_process)
        current_process.start()
    for process in process_list:
        process.join()

In this example, after importing the Process class, we created a greeter() function that takes a name and greets that person. It also prints the pid (process identifier) of the process that is running it. Note that we used the os module to get the pid. At the bottom of the code, after checking the __name__ == '__main__' condition, we create a series of Process objects and start them. Finally, in the last for loop, we use the join method to tell Python to wait for the processes to terminate. This is one of the possible outputs of the code:

$ python3 process_example.py
Process 23451: Hello Harry!
Process 23452: Hello George!
Process 23453: Hello Dirk!
Process 23454: Hello David!

10.6.2.2 Pool

Consider the Pool class as a pool of worker processes. There are several ways of assigning jobs to the Pool class and we introduce the most important ones in this section. These methods are categorized as blocking or non-blocking. The former means that after calling the API, it blocks the thread/process until it has the result or answer ready, and control returns only when the call completes. In the non-blocking case, on the other hand, control returns immediately.

10.6.2.2.1 Synchronous Pool.map()

We illustrate the Pool.map method by re-implementing our previous greeter example using Pool.map. The example has seven names, but we do not want to dedicate a separate process to each greeting. Instead, we do the whole job of "greeting seven people" using three processes. We create a pool of three processes with the Pool(processes=3) syntax and then map an iterable called names to the greeter function using pool.map(greeter, names). As expected, the greetings in the output are printed from three different processes:

from multiprocessing import Pool
import os

def greeter(name):
    pid = os.getpid()
    print("Process {0}: Hello {1}!".format(pid, name))

if __name__ == '__main__':
    names = ['Jenna', 'David', 'Marry', 'Ted', 'Jerry', 'Tom', 'Justin']
    pool = Pool(processes=3)
    sync_map = pool.map(greeter, names)
    print("Done!")

$ python poolmap_example.py
Process 30585: Hello Jenna!
Process 30586: Hello David!
Process 30587: Hello Marry!
Process 30585: Hello Ted!
Process 30585: Hello Jerry!
Process 30587: Hello Tom!
Process 30585: Hello Justin!
Done!

Note that Pool.map() is in the blocking category and does not return control to your script until it has finished calculating the results. That is why Done! is printed after all of the greetings are over.

10.6.2.2.2 Asynchronous Pool.map_async()

As the name implies, you can use the map_async method when you want to assign many function calls to a pool of worker processes asynchronously. Note that, unlike map, control is returned immediately, and the order in which the calls are executed is not guaranteed. We now implement the previous example using map_async:

from multiprocessing import Pool
import os

def greeter(name):
    pid = os.getpid()
    print("Process {0}: Hello {1}!".format(pid, name))

if __name__ == '__main__':
    names = ['Jenna', 'David', 'Marry', 'Ted', 'Jerry', 'Tom', 'Justin']
    pool = Pool(processes=3)
    async_map = pool.map_async(greeter, names)
    print("Done!")
    async_map.wait()

As you probably noticed, the only difference (apart from the map_async method name) is the call to the wait() method in the last line. The wait() method tells your script to wait for the result of map_async before terminating:

$ python poolmap_example.py
Done!
Process 30740: Hello Jenna!
Process 30741: Hello David!
Process 30740: Hello Ted!
Process 30742: Hello Marry!
Process 30740: Hello Jerry!
Process 30741: Hello Tom!
Process 30742: Hello Justin!

Note that the order of the greetings is not preserved. Moreover, Done! is printed before any of the results, meaning that if we do not use the wait() method, you will probably not see the results at all.
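When the worker function returns a value instead of printing it, the AsyncResult object returned by map_async also offers a get() method. It blocks until all results are ready and returns them as a list in the order of the inputs, regardless of which worker finished first. A minimal sketch, with square and main as hypothetical names:

```python
from multiprocessing import Pool


def square(x):
    """Worker function; it returns its result instead of printing it."""
    return x * x


def main():
    with Pool(processes=3) as pool:
        async_result = pool.map_async(square, range(7))
        # Control returns immediately, so this line runs while workers are busy.
        print("Submitted!")
        # get() waits for completion and returns the results in input order.
        results = async_result.get(timeout=30)
    print(results)
    return results


if __name__ == '__main__':
    main()
```

The timeout of 30 seconds is an arbitrary choice for this sketch; without it, get() waits indefinitely.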


10.6.2.3 Locks

The way the multiprocessing module implements locks is almost identical to the way the threading module does. After importing Lock from multiprocessing, all you need to do is acquire it, do some computation and then release the lock. We clarify the use of Lock with an example in the next section about process communication.

10.6.2.4 Process Communication

Process communication in multiprocessing is one of the most important, yet complicated, features for making better use of this module. As opposed to threading, the Process objects do not have access to any shared variable by default, i.e. there is no shared memory space between the processes by default. This effect is illustrated in the following example:

from multiprocessing import Process

counter = 0

def incrementer1():
    global counter
    for j in range(2):
        for i in range(3):
            counter += 1
        print("Greeter 1: Counter is %d" % counter)

def incrementer2():
    global counter
    for j in range(2):
        for i in range(3):
            counter += 1
        print("Greeter 2: Counter is %d" % counter)

if __name__ == '__main__':
    t1 = Process(target=incrementer1)
    t2 = Process(target=incrementer2)
    t1.start()
    t2.start()

Probably you already noticed that this is almost identical to our example in the threading section. Now, take a look at the strange output:

$ python communication_example.py
Greeter 1: Counter is 3
Greeter 1: Counter is 6
Greeter 2: Counter is 3
Greeter 2: Counter is 6

As you can see, it is as if the processes do not see each other. Instead of having two processes, one counting to 6 and the other counting from 6 to 12, we have two processes counting to 6.

Nevertheless, there are several ways in which Processes from multiprocessing can communicate with each other, including Pipe, Queue, Value, Array and Manager. Pipe and Queue are appropriate for inter-process message passing. To be more specific, Pipe is useful for process-to-process scenarios, while Queue is more appropriate for processes-to-processes ones. Value and Array are both used to provide synchronized access to shared data (very much like shared memory), and Managers can be used on different data types. In the following sub-sections, we cover both Value and Array since they are both lightweight, yet useful, approaches.
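Since Queue is only mentioned here, the following is a minimal sketch of message passing with multiprocessing.Queue. The names worker and main are hypothetical; each child process puts one message on the queue and the parent collects them.

```python
from multiprocessing import Process, Queue


def worker(name, queue):
    """Send one greeting back to the parent through the shared queue."""
    queue.put("Hello from {0}".format(name))


def main():
    queue = Queue()
    names = ['Harry', 'George']
    processes = [Process(target=worker, args=(name, queue)) for name in names]
    for process in processes:
        process.start()
    # Read the messages before joining, so that no child is left blocked
    # while flushing its data to the queue.
    messages = [queue.get() for _ in processes]
    for process in processes:
        process.join()
    # The arrival order is not deterministic, so sort for a stable result.
    return sorted(messages)


if __name__ == '__main__':
    print(main())
```

Unlike Value and Array, a Queue copies each message between processes rather than exposing a shared piece of memory.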

10.6.2.4.1 Value

The following example re-implements the broken example from the previous section. We fix the strange output by using both Lock and Value:

from multiprocessing import Process, Lock, Value
import time

increment_by_3_lock = Lock()

def incrementer1(counter):
    for j in range(2):
        increment_by_3_lock.acquire(True)
        for i in range(3):
            counter.value += 1
            time.sleep(0.1)
        print("Greeter 1: Counter is %d" % counter.value)
        increment_by_3_lock.release()

def incrementer2(counter):
    for j in range(2):
        increment_by_3_lock.acquire(True)
        for i in range(3):
            counter.value += 1
            time.sleep(0.05)
        print("Greeter 2: Counter is %d" % counter.value)
        increment_by_3_lock.release()

if __name__ == '__main__':
    # 'i' is the type code of a signed integer, 0 is the initial value
    counter = Value('i', 0)
    t1 = Process(target=incrementer1, args=(counter,))
    t2 = Process(target=incrementer2, args=(counter,))
    t2.start()
    t1.start()

The usage of the Lock object in this example is identical to the example in the threading section. The usage of counter, on the other hand, is the novel part. First, note that counter is not a global variable anymore; instead it is a Value, which returns a ctypes object allocated from memory shared between the processes. The first argument 'i' indicates a signed integer, and the second argument defines the initialization value. In this case we assign a signed integer in shared memory, initialized to 0, to the counter variable. We then modified our two functions to take this shared variable as an argument. Finally, we changed the way we increment the counter, since counter is no longer a Python integer but a ctypes signed integer whose value we access through the value attribute. The output of the code is now as we expected:

$ python mp_lock_example.py
Greeter 2: Counter is 3
Greeter 2: Counter is 6
Greeter 1: Counter is 9
Greeter 1: Counter is 12

The last example related to parallel processing illustrates the use of both Value and Array, as well as a technique for passing multiple arguments to a function. Note that Process can pass several positional arguments through its args tuple; packing them into a single tuple, as done here, is nevertheless a useful technique, because it also works when you want to pass multiple arguments to map or map_async, which hand exactly one argument to the mapped function:

from multiprocessing import Process, Lock, Value, Array
import time
from ctypes import c_char_p

increment_by_3_lock = Lock()

def incrementer1(counter_and_names):
    # unpack the single tuple argument
    counter = counter_and_names[0]
    names = counter_and_names[1]
    for j in range(2):
        increment_by_3_lock.acquire(True)
        for i in range(3):
            counter.value += 1
            time.sleep(0.1)
        name_idx = counter.value // 3 - 1
        print("Greeter 1: Greeting {0}! Counter is {1}".format(names.value[name_idx], counter.value))
        increment_by_3_lock.release()

def incrementer2(counter_and_names):
    counter = counter_and_names[0]
    names = counter_and_names[1]
    for j in range(2):
        increment_by_3_lock.acquire(True)
        for i in range(3):
            counter.value += 1
            time.sleep(0.05)
        name_idx = counter.value // 3 - 1
        print("Greeter 2: Greeting {0}! Counter is {1}".format(names.value[name_idx], counter.value))
        increment_by_3_lock.release()

if __name__ == '__main__':
    counter = Value('i', 0)
    names = Array(c_char_p, 4)
    names.value = ['James', 'Tom', 'Sam', 'Larry']
    t1 = Process(target=incrementer1, args=((counter, names),))
    t2 = Process(target=incrementer2, args=((counter, names),))
    t2.start()
    t1.start()

In this example we created a multiprocessing.Array() object and assigned it to a variable called names. As we mentioned before, the first argument is the ctypes data type, and since we want to create an array of strings with a length of 4 (second argument), we imported c_char_p and passed it as the first argument.

Instead of passing the arguments separately, we merged both the Value and Array objects into a tuple and passed the tuple to the functions. We then modified the functions to unpack the objects in their first two lines. Finally, we changed the print statement so that each process greets a particular name. The output of the example is:

$ python3 mp_lock_example.py
Greeter 2: Greeting James! Counter is 3
Greeter 2: Greeting Tom! Counter is 6
Greeter 1: Greeting Sam! Counter is 9
Greeter 1: Greeting Larry! Counter is 12

10.7 DASK ☁

Dask is a Python-based parallel computing library for analytics. Parallel computing is a type of computation in which many calculations or the execution of processes are carried out simultaneously. Large problems can often be divided into smaller ones, which can then be solved concurrently.

Dask is composed of two components:

1. Dynamic task scheduling optimized for computation. This is similar to Airflow, Luigi, Celery, or Make, but optimized for interactive computational workloads.

2. Big Data collections like parallel arrays, dataframes, and lists that extend common interfaces like NumPy, Pandas, or Python iterators to larger-than-memory or distributed environments. These parallel collections run on top of the dynamic task schedulers.

Dask emphasizes the following virtues:

Familiar: Provides parallelized NumPy array and Pandas DataFrame objects.
Flexible: Provides a task scheduling interface for more custom workloads and integration with other projects.
Native: Enables distributed computing in pure Python with access to the PyData stack.
Fast: Operates with low overhead, low latency, and minimal serialization necessary for fast numerical algorithms.
Scales up: Runs resiliently on clusters with 1000s of cores.
Scales down: Trivial to set up and run on a laptop in a single process.
Responsive: Designed with interactive computing in mind, it provides rapid feedback and diagnostics to aid humans.

The section is structured in a number of subsections addressing the following topics:

Foundations:

an explanation of what Dask is, how it works, and how to use lower level primitives to set up computations. Casual users may wish to skip this section, although we consider it useful knowledge for all users.

Distributed Features:

information on running Dask on the distributed scheduler, which enables scale-up to distributed settings and enhanced monitoring of task operations. The distributed scheduler is now generally the recommended engine for executing task work, even on single workstations or laptops.

Collections:

convenient abstractions giving a familiar feel to big data.

Bags:

Python iterators with a functional paradigm, such as found in func/iter-tools and toolz - generalize lists/generators to big data; this will seem very familiar to users of PySpark’s RDD

Array:

massive multi-dimensional numerical data, with Numpy functionality

Dataframe:

massive tabular data, with Pandas functionality

10.7.1 How Dask Works

Dask is a computation tool for larger-than-memory datasets, parallel execution or delayed/background execution.

We can summarize the basics of Dask as follows:

process data that does not fit into memory by breaking it into blocks and specifying task chains
parallelize execution of tasks across cores and even nodes of a cluster
move computation to the data rather than the other way around, to minimize communication overheads

We use for-loops to build basic tasks, Python iterators, and the Numpy (array) and Pandas (dataframe) functions for multi-dimensional or tabular data, respectively.
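The idea of breaking data into blocks and turning each block into an independent task can be sketched with the standard library alone. The following is not how Dask is implemented; it is a small illustration that uses concurrent.futures as a stand-in scheduler, and the names split_into_blocks, partial_sum and blocked_mean are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_blocks(values, block_size):
    """Yield consecutive slices of at most block_size elements."""
    for start in range(0, len(values), block_size):
        yield values[start:start + block_size]


def partial_sum(block):
    """One task: reduce a single block to a (sum, count) pair."""
    return sum(block), len(block)


def blocked_mean(values, block_size=4, workers=2):
    """Compute the mean block by block; each block is an independent task."""
    with ThreadPoolExecutor(max_workers=workers) as executor:
        partials = list(executor.map(partial_sum,
                                     split_into_blocks(values, block_size)))
    # Combine the small per-block results into the final answer.
    total = sum(block_total for block_total, _ in partials)
    count = sum(block_count for _, block_count in partials)
    return total / count


if __name__ == '__main__':
    print(blocked_mean(list(range(10)), block_size=3))  # prints 4.5
```

Only the per-block (sum, count) pairs have to be kept together at the end, which is what makes the approach attractive when the full data set does not fit into memory.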

Dask allows us to construct a prescription for the calculation we want to carry out. A module named dask.delayed lets us parallelize custom code. It is useful whenever our problem doesn’t quite fit a high-level parallel object like dask.array or dask.dataframe but could still benefit from parallelism. dask.delayed works by delaying our function evaluations and putting them into a dask graph. Here is a small example:

from dask import delayed

@delayed
def inc(x):
    return x + 1

@delayed
def add(x, y):
    return x + y

Here we have used the delayed annotation to show that we want these functions to operate lazily - to save the set of inputs and execute only on demand.

10.7.2 Dask Bag


Dask-bag excels in processing data that can be represented as a sequence of arbitrary inputs. We’ll refer to this as “messy” data, because it can contain complex nested structures, missing fields, mixtures of data types, etc. The functional programming style fits very nicely with standard Python iteration, such as can be found in the itertools module.

Messy data is often encountered at the beginning of data processing pipelines when large volumes of raw data are first consumed. The initial set of data might be JSON, CSV, XML, or any other format that does not enforce strict structure and data types. For this reason, the initial data massaging and processing is often done with Python lists, dicts, and sets.

These core data structures are optimized for general-purpose storage and processing. Adding streaming computation with iterators/generator expressions or libraries like itertools or toolz lets us process large volumes in a small space. If we combine this with parallel processing then we can churn through a fair amount of data.
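The streaming style described above can be sketched with generators and itertools from the standard library. The sample records and the names parse_records and amounts are made up for this illustration; malformed lines and records with missing fields are simply skipped.

```python
import itertools
import json


def parse_records(lines):
    """Lazily parse JSON lines, skipping anything that is not a JSON object."""
    for line in lines:
        try:
            record = json.loads(line)
        except ValueError:
            continue  # malformed line
        if isinstance(record, dict):
            yield record


def amounts(records):
    """Lazily extract the 'amount' field from the records that have one."""
    return (record['amount'] for record in records if 'amount' in record)


sample_lines = [
    '{"name": "Alice", "amount": 120}',
    'this line is not JSON',
    '{"name": "Bob"}',
    '{"name": "Carol", "amount": 75}',
    '[1, 2, 3]',
    '{"name": "Dave", "amount": 30}',
]

if __name__ == '__main__':
    # Only as many lines as needed are parsed to produce two results.
    first_two = list(itertools.islice(amounts(parse_records(sample_lines)), 2))
    print(first_two)  # prints [120, 75]
```

Because every stage is a generator, only one record is held in memory at a time, which is the property that lets this style scale to inputs much larger than the sample list.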

Dask.bag is a high level Dask collection to automate common workloads of this form. In a nutshell

dask.bag = map, filter, toolz + parallel execution

You can create a Bag from a Python sequence, from files, from data on S3, etc.

# each element is an integer
import dask.bag as db
b = db.from_sequence([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# each element is a text file of JSON lines
import os
b = db.read_text(os.path.join('data', 'accounts.*.json.gz'))

# Requires `s3fs` library
# each element is a remote CSV text file
b = db.read_text('s3://dask-data/nyc-taxi/2015/yellow_tripdata_2015-01.csv')

Bag objects hold the standard functional API found in projects like the Python standard library, toolz, or pyspark, including map, filter, groupby, etc.

As with Array and DataFrame objects, operations on Bag objects create new bags. Call the .compute() method to trigger execution.

def is_even(n):
    return n % 2 == 0

b = db.from_sequence([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
c = b.filter(is_even).map(lambda x: x ** 2)
c

# blocking form: wait for completion (which is very fast in this case)
c.compute()

For more details on Dask Bag check https://dask.pydata.org/en/latest/bag.html

10.7.3 Concurrency Features

Dask supports a real-time task framework that extends Python’s concurrent.futures interface. This interface is good for arbitrary task scheduling, like dask.delayed, but is immediate rather than lazy, which provides some more flexibility in situations where the computations may evolve over time. These features depend on the second generation task scheduler found in dask.distributed (which, despite its name, runs very well on a single machine).

Dask allows us to simply construct graphs of tasks with dependencies. Graphs can also be created automatically for us using functional, Numpy or Pandas syntax on data collections. None of this would be very useful if there weren’t also a way to execute these graphs in a parallel and memory-aware way. Dask comes with four available schedulers:

dask.threaded.get: a scheduler backed by a thread pool
dask.multiprocessing.get: a scheduler backed by a process pool
dask.async.get_sync: a synchronous scheduler, good for debugging
distributed.Client.get: a distributed scheduler for executing graphs on multiple machines.

Here is a simple program for the dask.distributed library:

from dask.distributed import Client

client = Client('scheduler:port')

futures = []
for fn in filenames:
    future = client.submit(load, fn)
    futures.append(future)

summary = client.submit(summarize, futures)
summary.result()

For more details on Concurrent Features by Dask check https://dask.pydata.org/en/latest/futures.html

10.7.4 Dask Array


Dask arrays implement a subset of the NumPy interface on large arrays using blocked algorithms and task scheduling. These behave like numpy arrays, but break a massive job into tasks that are then executed by a scheduler. The default scheduler uses threading, but you can also use multiprocessing or distributed or even serial processing (mainly for debugging). You can tell the dask array how to break the data into chunks for processing.

import h5py
import dask.array as da

f = h5py.File('myfile.hdf5')
x = da.from_array(f['/big-data'], chunks=(1000, 1000))
x - x.mean(axis=1).compute()

For more details on Dask Array check https://dask.pydata.org/en/latest/array.html

10.7.5 Dask DataFrame

A Dask DataFrame is a large parallel dataframe composed of many smaller Pandas dataframes, split along the index. These pandas dataframes may live on disk for larger-than-memory computing on a single machine, or on many different machines in a cluster. Dask.dataframe implements a commonly used subset of the Pandas interface including elementwise operations, reductions, grouping operations, joins, timeseries algorithms, and more. It copies the Pandas interface for these operations exactly and so should be very familiar to Pandas users. Because Dask.dataframe operations merely coordinate Pandas operations, they usually exhibit performance characteristics similar to those found in Pandas. To run the following code, save the ‘student.csv’ file on your machine.

import pandas as pd

df = pd.read_csv('student.csv')
d = df.groupby(df.HID).Serial_No.mean()
print(d)

ID
101     1
102     2
104     3
105     4
106     5
107     6
109     7
111     8
201     9
202    10
Name: Serial_No, dtype: int64

import dask.dataframe as dd

df = dd.read_csv('student.csv')
dt = df.groupby(df.HID).Serial_No.mean().compute()
print(dt)

ID
101     1.0
102     2.0
104     3.0
105     4.0
106     5.0
107     6.0
109     7.0
111     8.0
201     9.0
202    10.0
Name: Serial_No, dtype: float64

For more details on Dask DataFrame check https://dask.pydata.org/en/latest/dataframe.html

10.7.6 Dask DataFrame Storage

Efficient storage can dramatically improve performance, particularly when operating repeatedly from disk.

Decompressing text and parsing CSV files is expensive. One of the most effective strategies with medium data is to use a binary storage format like HDF5.

# be sure to shut down other kernels running distributed clients
from dask.distributed import Client
client = Client()

Create data if we don’t have any

from prep import accounts_csvs
accounts_csvs(3, 1000000, 500)

First we read our csv data as before.

CSV and other text-based file formats are the most common storage for data from many sources, because they require minimal pre-processing, can be written line-by-line and are human-readable. Since Pandas’ read_csv is well-optimized, CSVs are a reasonable input, but far from optimized, since reading requires extensive text parsing.

import os
filename = os.path.join('data', 'accounts.*.csv')
filename

import dask.dataframe as dd
df_csv = dd.read_csv(filename)
df_csv.head()

HDF5 and netCDF are binary array formats very commonly used in the scientific realm.

Pandas contains a specialized HDF5 format, HDFStore. The dd.DataFrame.to_hdf method works exactly like the pd.DataFrame.to_hdf method.

target = os.path.join('data', 'accounts.h5')
target

%time df_csv.to_hdf(target, '/data')

df_hdf = dd.read_hdf(target, '/data')
df_hdf.head()

For more information on Dask DataFrame Storage, see http://dask.pydata.org/en/latest/dataframe-create.html

10.7.7 Links

https://dask.pydata.org/en/latest/
http://matthewrocklin.com/blog/work/2017/10/16/streaming-dataframes-1
http://people.duke.edu/~ccc14/sta-663-2017/18A_Dask.html
https://www.kdnuggets.com/2016/09/introducing-dask-parallel-programming.html
https://pypi.python.org/pypi/dask/
https://www.hdfgroup.org/2015/03/hdf5-as-a-zero-configuration-ad-hoc-scientific-database-for-python/
https://github.com/dask/dask-tutorial


11 APPLICATIONS

11.1 FINGERPRINT MATCHING ☁

Please note that NIST has temporarily removed the Fingerprint data set. We unfortunately do not have a copy of the dataset. If you have one, please notify us.

Python is a flexible and popular language for running data analysis pipelines. In this section we will implement a solution for fingerprint matching.

11.1.1 Overview

Fingerprint recognition refers to the automated method for verifying a match between two fingerprints, which is used to identify individuals and verify their identity. Fingerprints (Figure 35) are the most widely used form of biometric used to identify individuals.

Figure 35: Fingerprints

Automated fingerprint matching generally requires the detection of different fingerprint features (aggregate characteristics of ridges, and minutia points) and then the use of a fingerprint matching algorithm, which can do both one-to-one and one-to-many matching operations. Based on the number of matches, a proximity score (distance or similarity) can be calculated.

We use the following NIST dataset for the study:

Special Database 14 - NIST Mated Fingerprint Card Pairs 2. (http://www.nist.gov/itl/iad/ig/special_dbases.cfm)

11.1.2 Objectives

Match the fingerprint images from a probe set to a gallery set and report the match scores.

11.1.3 Prerequisites

For this work we will use the following algorithms:

MINDTCT: The NIST minutiae detector, which automatically locates and records ridge endings and bifurcations in a fingerprint image. (http://www.nist.gov/itl/iad/ig/nbis.cfm)
BOZORTH3: A NIST fingerprint matching algorithm, which is a minutiae-based fingerprint-matching algorithm. It can do both one-to-one and one-to-many matching operations. (http://www.nist.gov/itl/iad/ig/nbis.cfm)

In order to follow along, you must have the NBIS tools, which provide mindtct and bozorth3, installed. If you are on Ubuntu 16.04 Xenial, the following steps will accomplish this:

$ sudo apt-get update -qq
$ sudo apt-get install -y build-essential cmake unzip
$ wget "http://nigos.nist.gov:8080/nist/nbis/nbis_v5_0_0.zip"
$ unzip -d nbis nbis_v5_0_0.zip
$ cd nbis/Rel_5.0.0
$ ./setup.sh /usr/local --without-X11
$ sudo make

11.1.4 Implementation

1. Fetch the fingerprint images from the web
2. Call out to external programs to prepare and compute the match scores
3. Store the results in a database
4. Generate a plot to identify likely matches.


We will be interacting with the operating system and manipulating files and their pathnames.

Some generally useful utilities.

Using the attrs library provides some nice shortcuts to defining objects.

We will be randomly dividing the entire dataset, based on user input, into the probe and gallery sets.

We will need to call out to the NBIS software. We will also be using multiple processes to take advantage of all the cores on our machine.

As for plotting, we will use matplotlib, though there are many alternatives.

Finally, we will write the results to a database.

11.1.5 Utility functions

Next, we will define some utility functions:

from __future__ import print_function
import urllib
import zipfile
import hashlib
import os.path
import os
import sys
import shutil
import tempfile
import itertools
import functools
import types
from pprint import pprint
import attr
import sys
import random
import subprocess
import multiprocessing
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import sqlite3


def take(n, iterable):
    "Returns a generator of the first **n** elements of an iterable"
    return itertools.islice(iterable, n)

def zipWith(function, *iterables):
    "Zip a set of **iterables** together and apply **function** to each tuple"
    for group in itertools.izip(*iterables):
        yield function(*group)

def uncurry(function):
    "Transforms an N-ary **function** so that it accepts a single parameter of an N-tuple"
    @functools.wraps(function)
    def wrapper(args):
        return function(*args)
    return wrapper

def fetch_url(url, sha256, prefix='.', checksum_blocksize=2**20, dryRun=False):
    """Download a url.

    :param url: the url to the file on the web
    :param sha256: the SHA-256 checksum. Used to determine if the file was previously downloaded.
    :param prefix: directory to save the file
    :param checksum_blocksize: blocksize to use when computing the checksum
    :param dryRun: boolean indicating that calling this function should do nothing
    :returns: the local path to the downloaded file
    :rtype:
    """
    if not os.path.exists(prefix):
        os.makedirs(prefix)

    local = os.path.join(prefix, os.path.basename(url))

    if dryRun: return local

    if os.path.exists(local):
        print('Verifying checksum')
        chk = hashlib.sha256()
        with open(local, 'rb') as fd:
            while True:
                bits = fd.read(checksum_blocksize)
                if not bits: break
                chk.update(bits)
        if sha256 == chk.hexdigest():
            return local

    print('Downloading', url)

    def report(sofar, blocksize, totalsize):
        msg = '{}%\r'.format(100 * sofar * blocksize / totalsize, 100)
        sys.stderr.write(msg)

    urllib.urlretrieve(url, local, report)

    return local

11.1.6 Dataset

We will now define some global parameters.

First, the fingerprint dataset:

DATASET_URL = 'https://s3.amazonaws.com/nist-srd/SD4/NISTSpecialDatabase4GrayScaleImagesofFIGS.zip'
DATASET_SHA256 = '4db6a8f3f9dc14c504180cbf67cdf35167a109280f121c901be37a80ac13c449'


We'll define how to download the dataset. This function is general enough that it could be used to retrieve most files, but we will default it to use the values defined previously.
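The download helper skips the transfer when a local copy with the expected SHA-256 checksum already exists. The check can be sketched on its own as follows; this is a minimal Python 3 illustration, and the names `sha256_of_file` and `already_downloaded` as well as the throwaway demo file are assumptions made for this example, not part of the original code.

```python
import hashlib
import os
import tempfile

def sha256_of_file(path, blocksize=2**20):
    """Return the hex SHA-256 digest of a file, reading it block by block."""
    digest = hashlib.sha256()
    with open(path, "rb") as fd:
        # iter() with a sentinel stops when read() returns an empty bytes object
        for block in iter(lambda: fd.read(blocksize), b""):
            digest.update(block)
    return digest.hexdigest()

def already_downloaded(path, expected_sha256):
    """True only if the file exists and its checksum matches."""
    return os.path.exists(path) and sha256_of_file(path) == expected_sha256

# Self-check with a throwaway file (hypothetical content)
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"fingerprint")
demo_path = tmp.name
demo_ok = already_downloaded(demo_path, hashlib.sha256(b"fingerprint").hexdigest())
os.remove(demo_path)
```

Reading in blocks keeps memory use bounded even for large archives such as the fingerprint dataset.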

11.1.7 Data Model

We will define some classes so we have a nice API for working with the dataflow. We set slots=True so that the resulting objects will be more space-efficient.

11.1.7.1 Utilities

11.1.7.1.1 Checksum

The checksum consists of the actual hash value (value) as well as a string representing the hashing algorithm. The validator enforces that the algorithm can only be one of the listed acceptable methods.
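To make the enforcement concrete, the sketch below rejects an unknown algorithm by raising an exception, which is also how attrs validators are expected to report a bad value (a returned False is not enough). It uses the standard library dataclasses module instead of attrs so that it runs without extra packages; the class name `CheckedChecksum` and the constant `ALLOWED_KINDS` are assumptions for this illustration.

```python
from dataclasses import dataclass

ALLOWED_KINDS = ("md5", "sha1", "sha224", "sha256", "sha384", "sha512")

@dataclass(frozen=True)
class CheckedChecksum:
    """Hypothetical stand-in for Checksum that rejects unknown algorithms."""
    value: str
    kind: str

    def __post_init__(self):
        # Raise instead of returning False so that invalid objects cannot exist
        if self.kind not in ALLOWED_KINDS:
            raise ValueError("unsupported hash algorithm: {!r}".format(self.kind))

ok = CheckedChecksum(value="98b15d56330cb17f1982ae79348f711d", kind="md5")

try:
    CheckedChecksum(value="abc", kind="crc32")
    rejected = False
except ValueError:
    rejected = True
```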

def prepare_dataset(url=None, sha256=None, prefix='.', skip=False):
    url = url or DATASET_URL
    sha256 = sha256 or DATASET_SHA256
    local = fetch_url(url, sha256=sha256, prefix=prefix, dryRun=skip)

    if not skip:
        print('Extracting', local, 'to', prefix)
        with zipfile.ZipFile(local, 'r') as zip:
            zip.extractall(prefix)

    name, _ = os.path.splitext(local)
    return name

def locate_paths(path_md5list, prefix):
    with open(path_md5list) as fd:
        for line in itertools.imap(str.strip, fd):
            parts = line.split()
            if not len(parts) == 2: continue
            md5sum, path = parts
            chksum = Checksum(value=md5sum, kind='md5')
            filepath = os.path.join(prefix, path)
            yield Path(checksum=chksum, filepath=filepath)

def locate_images(paths):

    def predicate(path):
        _, ext = os.path.splitext(path.filepath)
        return ext in ['.png']

    for path in itertools.ifilter(predicate, paths):
        yield image(id=path.checksum.value, path=path)

@attr.s(slots=True)
class Checksum(object):
    value = attr.ib()
    kind = attr.ib(validator=lambda o, a, v: v in 'md5 sha1 sha224 sha256 sha384 sha512'.split())


11.1.7.1.2 Path

Paths refer to an image's file path and associated Checksum. We get the checksum "for free" since the MD5 hash is provided for each image in the dataset.

11.1.7.1.3 Image

The start of the data pipeline is the image. An image has an id (the md5 hash) and the path to the image.

11.1.7.2 Mindtct

The next step in the pipeline is to apply the mindtct program from NBIS. A mindtct object therefore represents the results of applying mindtct on an image. The xyt output is needed for the next step, and the image attribute represents the image id.

We need a way to construct a mindtct object from an image object. A straightforward way of doing this would be to have a from_image @staticmethod or @classmethod, but that doesn't work well with multiprocessing: top-level functions work best because they need to be serialized.
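The serialization requirement can be seen directly with the pickle module, which multiprocessing relies on to send callables and arguments to worker processes. The following Python 3 sketch is only an illustration; the helper name `can_pickle` is an assumption, and `math.sqrt` stands in for any function that is importable by its module-level name.

```python
import math
import pickle

def can_pickle(obj):
    """Return True if obj survives pickle.dumps, as multiprocessing requires."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False

# A function reachable by name is pickled as a reference to that name
top_level_ok = can_pickle(math.sqrt)

# A lambda has no importable name, so pickling it fails
lambda_ok = can_pickle(lambda x: x + 1)
```

This is why the pipeline uses plain module-level functions such as mindtct_from_image as the targets of pool.map.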

@attr.s(slots=True)
class Path(object):
    checksum = attr.ib()
    filepath = attr.ib()

@attr.s(slots=True)
class image(object):
    id = attr.ib()
    path = attr.ib()

@attr.s(slots=True)
class mindtct(object):
    image = attr.ib()
    xyt = attr.ib()

    def pretty(self):
        d = dict(id=self.image.id, path=self.image.path)
        return pprint(d)

def mindtct_from_image(image):
    imgpath = os.path.abspath(image.path.filepath)
    tempdir = tempfile.mkdtemp()
    oroot = os.path.join(tempdir, 'result')
    cmd = ['mindtct', imgpath, oroot]

    try:
        subprocess.check_call(cmd)
        with open(oroot + '.xyt') as fd:
            xyt = fd.read()
        result = mindtct(image=image.id, xyt=xyt)
        return result
    finally:
        shutil.rmtree(tempdir)

11.1.7.3 Bozorth3

The final step in the pipeline is running bozorth3 from NBIS. The bozorth3 class represents the match being done: tracking the ids of the probe and gallery images as well as the match score.

Since we will be writing these instances out to a database, we provide some static methods for SQL statements. While there are many Object-Relational-Model (ORM) libraries available for Python, this approach keeps the current implementation simple.

In order to work well with multiprocessing, we define a class representing the input parameters to bozorth3 and a helper function to run bozorth3. This way the pipeline definition can be kept simple: a map to create the input and then a map to run the program.

As NBIS bozorth3 can be called to compare one-to-one or one-to-many, we will also dynamically choose between these approaches depending on whether the gallery attribute is a list or a single object.
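Choosing the matching mode from the type of the gallery argument can be sketched independently of NBIS. In the Python 3 illustration below, plain strings stand in for minutiae records and the function `describe_match` is a hypothetical name; it only reports which branch a real implementation would take.

```python
def describe_match(probe, gallery):
    """Pick a matching mode based on whether gallery is one item or a list."""
    if isinstance(gallery, (list, tuple)):
        return ("one-to-many", len(gallery))
    if isinstance(gallery, str):
        return ("one-to-one", 1)
    # Anything else is a programming error and should fail loudly
    raise ValueError("Unhandled type for gallery: {}".format(type(gallery)))

single = describe_match("probe", "g1")
many = describe_match("probe", ["g1", "g2", "g3"])
```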

@attr.s(slots=True)
class bozorth3(object):
    probe = attr.ib()
    gallery = attr.ib()
    score = attr.ib()

    @staticmethod
    def sql_stmt_create_table():
        return 'CREATE TABLE IF NOT EXISTS bozorth3' \
            + '(probe TEXT, gallery TEXT, score NUMERIC)'

    @staticmethod
    def sql_prepared_stmt_insert():
        return 'INSERT INTO bozorth3 VALUES (?, ?, ?)'

    def sql_prepared_stmt_insert_values(self):
        return self.probe, self.gallery, self.score

@attr.s(slots=True)
class bozorth3_input(object):
    probe = attr.ib()
    gallery = attr.ib()

    def run(self):
        if isinstance(self.gallery, mindtct):
            return bozorth3_from_one_to_one(self.probe, self.gallery)
        elif isinstance(self.gallery, types.ListType):
            return bozorth3_from_one_to_many(self.probe, self.gallery)
        else:
            # report the type of the attribute; a bare `gallery` is undefined here
            raise ValueError('Unhandled type for gallery: {}'.format(type(self.gallery)))
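The SQL statements returned by the bozorth3 class can be tried out against an in-memory SQLite database. The sketch below is a small Python 3 illustration; the rows are made-up values, and the final query is an assumption added to show how the stored scores might be read back.

```python
import sqlite3

CREATE = "CREATE TABLE IF NOT EXISTS bozorth3 (probe TEXT, gallery TEXT, score NUMERIC)"
INSERT = "INSERT INTO bozorth3 VALUES (?, ?, ?)"

# Made-up (probe, gallery, score) rows for illustration only
rows = [("p1", "g1", 4), ("p1", "g2", 57), ("p2", "g1", 9)]

conn = sqlite3.connect(":memory:")
conn.execute(CREATE)
# The ? placeholders are filled in by sqlite3, one tuple per row
conn.executemany(INSERT, rows)
conn.commit()

best = conn.execute(
    "SELECT gallery, score FROM bozorth3 WHERE probe = ? ORDER BY score DESC LIMIT 1",
    ("p1",),
).fetchone()
conn.close()
```

Using prepared statements with placeholders avoids building SQL strings by hand from the match results.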


Next is the top-level function for running bozorth3. It accepts an instance of bozorth3_input. This is implemented as a simple top-level wrapper so that it can be easily passed to the multiprocessing library.

11.1.7.3.1 Running Bozorth3

There are two cases to handle:

1. One-to-one probe to gallery sets
2. One-to-many probe to gallery sets

Both approaches are implemented next. The implementations follow the same pattern:

1. Create a temporary directory within which to work
2. Write the probe and gallery images to files in the temporary directory
3. Call the bozorth3 executable
4. The match score is written to stdout, which is captured and then parsed.
5. Return a bozorth3 instance for each match
6. Make sure to clean up the temporary directory
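The pattern above can be sketched without the NBIS tools installed. In this Python 3 illustration the external program is a stand-in: a one-line Python script, run through `sys.executable`, that prints how many lines the two files share. The function name `run_on_tempfiles` and the stand-in script are assumptions; only the overall shape (temporary directory, input files, captured stdout, cleanup) mirrors the steps listed.

```python
import os
import subprocess
import sys
import tempfile

def run_on_tempfiles(probe_text, gallery_text):
    """Write inputs to a temporary directory, run a program, parse its stdout."""
    # TemporaryDirectory removes the directory when the with-block exits
    with tempfile.TemporaryDirectory() as tempdir:
        probe_file = os.path.join(tempdir, "probe.xyt")
        gallery_file = os.path.join(tempdir, "gallery.xyt")
        with open(probe_file, "w") as fd:
            fd.write(probe_text)
        with open(gallery_file, "w") as fd:
            fd.write(gallery_text)

        # Stand-in for the bozorth3 executable (NOT the real matcher)
        script = (
            "import sys;"
            "a = set(open(sys.argv[1]).read().splitlines());"
            "b = set(open(sys.argv[2]).read().splitlines());"
            "print(len(a & b))"
        )
        cmd = [sys.executable, "-c", script, probe_file, gallery_file]
        output = subprocess.check_output(cmd)
        return int(output.strip())

score = run_on_tempfiles("1 2 3\n4 5 6\n", "4 5 6\n7 8 9\n")
```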

def run_bozorth3(input):
    return input.run()

11.1.7.3.1.1 One-to-one

def bozorth3_from_one_to_one(probe, gallery):
    tempdir = tempfile.mkdtemp()
    probeFile = os.path.join(tempdir, 'probe.xyt')
    galleryFile = os.path.join(tempdir, 'gallery.xyt')

    with open(probeFile, 'wb') as fd: fd.write(probe.xyt)
    with open(galleryFile, 'wb') as fd: fd.write(gallery.xyt)

    cmd = ['bozorth3', probeFile, galleryFile]

    try:
        result = subprocess.check_output(cmd)
        score = int(result.strip())
        return bozorth3(probe=probe.image, gallery=gallery.image, score=score)
    finally:
        shutil.rmtree(tempdir)

11.1.7.3.1.2 One-to-many

def bozorth3_from_one_to_many(probe, galleryset):
    tempdir = tempfile.mkdtemp()
    probeFile = os.path.join(tempdir, 'probe.xyt')
    galleryFiles = [os.path.join(tempdir, 'gallery%d.xyt' % i)
                    for i, _ in enumerate(galleryset)]

    with open(probeFile, 'wb') as fd: fd.write(probe.xyt)
    for galleryFile, gallery in itertools.izip(galleryFiles, galleryset):
        with open(galleryFile, 'wb') as fd: fd.write(gallery.xyt)

    cmd = ['bozorth3', '-p', probeFile] + galleryFiles

    try:
        result = subprocess.check_output(cmd).strip()
        scores = map(int, result.split('\n'))
        return [bozorth3(probe=probe.image, gallery=gallery.image, score=score)
                for score, gallery in zip(scores, galleryset)]
    finally:
        shutil.rmtree(tempdir)

11.1.8 Plotting

For plotting we will operate only on the database. We will select a small number of probe images and plot the score between them and the rest of the gallery images.

The mk_short_labels helper function will be defined next.

The image ids are long hash strings. In order to minimize the amount of space on the figure the labels occupy, we provide a helper function to create a short label that still uniquely identifies each probe image in the selected sample.

11.1.9 Putting it all Together

First, set up a temporary directory in which to work:

Next we download and extract the fingerprint images from NIST:

def plot(dbfile, nprobes=10):
    conn = sqlite3.connect(dbfile)
    results = pd.read_sql(
        "SELECT DISTINCT probe FROM bozorth3 ORDER BY score LIMIT '%s'" % nprobes,
        con=conn
    )
    shortlabels = mk_short_labels(results.probe)
    plt.figure()

    for i, probe in results.probe.iteritems():
        stmt = 'SELECT gallery, score FROM bozorth3 WHERE probe = ? ORDER BY gallery DESC'
        matches = pd.read_sql(stmt, params=(probe,), con=conn)
        xs = np.arange(len(matches), dtype=np.int)
        plt.plot(xs, matches.score, label='probe %s' % shortlabels[i])

    plt.ylabel('Score')
    plt.xlabel('Gallery')
    plt.legend(bbox_to_anchor=(0, 0, 1, -0.2))
    plt.show()

def mk_short_labels(series, start=7):
    for size in xrange(start, len(series[0])):
        if len(series) == len(set(map(lambda s: s[:size], series))):
            break
    return map(lambda s: s[:size], series)
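The idea behind the short labels is to find the smallest prefix length at which all ids are still distinct. A standalone Python 3 sketch of that idea follows; the function name `short_labels`, the fallback to the full ids, and the sample ids are assumptions made for this illustration.

```python
def short_labels(ids, start=7):
    """Return equal-length prefixes (at least `start` long) that stay unique."""
    longest = max(len(s) for s in ids)
    for size in range(start, longest + 1):
        labels = [s[:size] for s in ids]
        # Unique prefixes found when no two labels collapse into one
        if len(set(labels)) == len(ids):
            return labels
    return list(ids)  # fall back to the full ids

# Hypothetical ids; the first two only differ from the 9th character on
demo = short_labels(["caf9143b268701", "caf9143bff0012", "55ac57f711eba0"])
```

With these sample ids, prefixes of length 7 and 8 collide for the first two entries, so the labels are cut at length 9.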

pool = multiprocessing.Pool()

prefix = '/tmp/fingerprint_example/'
if not os.path.exists(prefix):
    os.makedirs(prefix)

%%time
dataprefix = prepare_dataset(prefix=prefix)


Verifying checksum
Extracting /tmp/fingerprint_example/NISTSpecialDatabase4GrayScaleImagesofFIGS.zip to /tmp/fingerprint_example/
CPU times: user 3.34 s, sys: 645 ms, total: 3.99 s
Wall time: 4.01 s

Next we will configure the location of the MD5 checksum file that comes with the download:

md5listpath = os.path.join(prefix, 'NISTSpecialDatabase4GrayScaleImagesofFIGS/sd04/sd04_md5.lst')

Load the images from the downloaded files to start the analysis pipeline:

%%time
print('Loading images')
paths = locate_paths(md5listpath, dataprefix)
images = locate_images(paths)
mindtcts = pool.map(mindtct_from_image, images)
print('Done')

Loading images
Done
CPU times: user 187 ms, sys: 17 ms, total: 204 ms
Wall time: 1min 21s

We can examine one of the loaded images. Note that image refers to the MD5 checksum that came with the image, and the xyt attribute holds the contents of the .xyt file written by mindtct.

print(mindtcts[0].image)
print(mindtcts[0].xyt[:50])

98b15d56330cb17f1982ae79348f711d
141462146252382237255118020
30332214

For example purposes we will only use a small percentage of the database, randomly selected, for our probe and gallery datasets.

perc_probe = 0.001
perc_gallery = 0.1

%%time
print('Generating samples')
probes = random.sample(mindtcts, int(perc_probe * len(mindtcts)))
gallery = random.sample(mindtcts, int(perc_gallery * len(mindtcts)))
print('|Probes| =', len(probes))
print('|Gallery| =', len(gallery))

Generating samples
|Probes| = 4
|Gallery| = 400
CPU times: user 2 ms, sys: 0 ns, total: 2 ms
Wall time: 993 µs

We can now compute the matching scores between the probe and gallery sets. This will use all cores available on this workstation.

%%time
print('Matching')
input = [bozorth3_input(probe=probe, gallery=gallery)
         for probe in probes]
bozorth3s = pool.map(run_bozorth3, input)


Matching
CPU times: user 19 ms, sys: 1 ms, total: 20 ms
Wall time: 1.07 s

bozorth3s is now a list of lists of bozorth3 instances.

print('|Probes| =', len(bozorth3s))
print('|Gallery| =', len(bozorth3s[0]))
print('Result:', bozorth3s[0][0])

|Probes| = 4
|Gallery| = 400
Result: bozorth3(probe='caf9143b268701416fbed6a9eb2eb4cf', gallery='22fa0f24998eaea39dea152e4a73f267', score=4)

Now add the results to the database:

dbfile = os.path.join(prefix, 'scores.db')
conn = sqlite3.connect(dbfile)
cursor = conn.cursor()
cursor.execute(bozorth3.sql_stmt_create_table())

<sqlite3.Cursor at 0x7f8a2f677490>

%%time
for group in bozorth3s:
    vals = map(bozorth3.sql_prepared_stmt_insert_values, group)
    cursor.executemany(bozorth3.sql_prepared_stmt_insert(), vals)
    conn.commit()
    print('Inserted results for probe', group[0].probe)

Inserted results for probe caf9143b268701416fbed6a9eb2eb4cf
Inserted results for probe 55ac57f711eba081b9302eab74dea88e
Inserted results for probe 4ed2d53db3b5ab7d6b216ea0314beb4f
Inserted results for probe 20f68849ee2dad02b8fb33ecd3ece507
CPU times: user 2 ms, sys: 3 ms, total: 5 ms
Wall time: 3.57 ms

We now plot the results (Figure 36):

plot(dbfile, nprobes=len(probes))

Figure 36: Result

cursor.close()

11.2 NIST PEDESTRIAN AND FACE DETECTION ☁

No

Pedestrian and Face Detection uses OpenCV to identify people standing in a picture or a video. The NIST use case in this document is built with Apache Spark and Mesos clusters on multiple compute nodes.

The example in this tutorial deploys software packages on OpenStack using Ansible with its roles. See Figure 37, Figure 38, Figure 39, Figure 40.

Figure 37: Original

Figure 38: Pedestrian Detected

Figure 39: Original

Figure 40: Pedestrian and Face/eyes Detected

11.2.0.1 Introduction

Human (pedestrian) detection and face detection have been studied during the last several years, and models for them have improved along with Histograms of Oriented Gradients (HOG) for Human Detection [1]. OpenCV is a computer vision library that includes the SVM classifier and the HOG object detector for pedestrian detection, and the INRIA Person Dataset [2] is one of the popular samples for both training and testing purposes. In this document, we deploy Apache Spark on Mesos clusters to train and apply detection models from OpenCV using the Python API.

11.2.0.1.1 INRIA Person Dataset

This dataset contains positive and negative images for training and test purposes, with annotation files for upright persons in each image. 288 positive test images, 453 negative test images, 614 positive training images and 1218 negative training images are included, along with normalized 64x128 pixel formats. The 970 MB dataset is available to download [3].

11.2.0.1.2 HOG with SVM model

Histogram of Oriented Gradients (HOG) and Support Vector Machine (SVM) are used as object detectors and classifiers, and built-in Python libraries from OpenCV provide these models for human detection.


11.2.0.1.3 Ansible Automation Tool

Ansible is a Python tool to install, configure, and manage software on multiple machines with YAML files where system descriptions are defined. There are several reasons why we use Ansible:

- Expandable: leverages Python (default), but modules can be written in any language
- Agentless: no setup required on managed node
- Security: allows deployment from user space; uses ssh for authentication
- Flexibility: only requires ssh access to privileged user
- Transparency: YAML-based script files express the steps of installing and configuring software
- Modularity: a single Ansible role (should) contain all required commands and variables to deploy a software package independently
- Sharing and portability: roles are available from source (GitHub, Bitbucket, GitLab, etc.) or the Ansible Galaxy portal

We use Ansible roles to install software packages for Human and Face Detection, which requires running OpenCV Python libraries on Apache Mesos with a cluster configuration. The dataset is also downloaded from the web using an Ansible role.

11.2.0.2 Deployment by Ansible

Ansible is used to deploy applications and build clusters for batch-processing large datasets on target machines, e.g. VM instances on OpenStack. We use Ansible roles with the include directive to organize layers of big data software stacks (BDSS). Ansible provides abstractions by Playbook Roles and reusability by Include statements. We define X application in X Ansible Role, for example, and use include statements to combine with other applications, e.g. Y or Z. The layers exist in sub directories (see next) to add modularity to your Ansible deployment. For example, there are five roles used in this example: Apache Mesos in a scheduler layer, Apache Spark in a processing layer, an OpenCV library in an application layer, the INRIA Person Dataset in a dataset layer, and a Python script for human and face detection in an analytics layer. If you have an additional software package to add, you can simply add a new role in the main Ansible playbook with the include directive. With this, your Ansible playbook stays simple but flexible enough to add more roles without having a large single file that becomes difficult to read when it deploys more applications on multiple layers. The main Ansible playbook runs Ansible roles in order, which looks like:

```
include: sched/00-mesos.yml
include: proc/01-spark.yml
include: apps/02-opencv.yml
include: data/03-inria-dataset.yml
include: anlys/04-human-face-detection.yml
```

Directory names, e.g. sched, proc, data, or anlys, indicate BDSS layers like:

- sched: scheduler layer
- proc: data processing layer
- apps: application layer
- data: dataset layer
- anlys: analytics layer

The two digits in the filename indicate the order in which roles are run.

11.2.0.3 Cloudmesh for Provisioning

It is assumed that virtual machines are created by cloudmesh, the cloud management software. For example, on OpenStack, the cm cluster create -N=6 command starts a set of virtual machine instances. The number of machines and groups for clusters, e.g. namenodes and datanodes, are defined in the Ansible inventory file, a list of target machines with groups, which will be generated by cloudmesh once the machines are ready to use. Ansible roles install software and the dataset on virtual clusters after that stage.

11.2.0.4 Roles Explained for Installation

The Mesos role is installed first as a scheduler layer for masters and slaves, where mesos-master runs on the masters group and mesos-slave runs on the slaves group. Apache Zookeeper is included in the mesos role, so mesos slaves can find an elected mesos leader for the coordination. Spark, as a data processing layer, provides two options for distributed job processing: batch job processing via a cluster mode and real-time processing via a client mode. The Mesos dispatcher runs on the masters group to accept batch job submissions, and the Spark interactive shell, which is the client mode, provides real-time processing on any node in the cluster. Either way, Spark is installed later to detect a master (leader) host for a job submission. The other roles for OpenCV, the INRIA Person Dataset and the Human and Face Detection Python applications follow.

The following software is expected in the stacks according to the GitHub repository:

- mesos cluster (master, worker)
- spark (with dispatcher for mesos cluster mode)
- openCV
- zookeeper
- INRIA Person Dataset
- Detection Analytics in Python

[1] Dalal, Navneet, and Bill Triggs. "Histograms of oriented gradients for human detection." 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05). Vol. 1. IEEE, 2005. [pdf]

[2] http://pascal.inrialpes.fr/data/human/

[3] ftp://ftp.inrialpes.fr/pub/lear/douze/data/INRIAPerson.tar

[4] https://docs.python.org/2/library/configparser.html

11.2.0.4.1 Server groups for Masters/Slaves by Ansible inventory

We may separate compute nodes into two groups, masters and workers, so that Mesos masters and zookeeper quorums manage job requests and leaders, and workers run the actual tasks. Ansible needs group definitions in its inventory so that the software installation associated with each group can be completed.

Example of Ansible Inventory file (inventory.txt)

[masters]
10.0.5.67
10.0.5.68
10.0.5.69
[slaves]
10.0.5.70
10.0.5.71
10.0.5.72

11.2.0.5 Instructions for Deployment

The following commands complete the NIST Pedestrian and Face Detection deployment on OpenStack.

11.2.0.5.1 Cloning Pedestrian Detection Repository from Github

Roles are included as submodules, which require the --recursive option to check them all out.

$ git clone --recursive https://github.com/futuresystems/pedestrian-and-face-detection.git

Change the following variable with actual IP addresses:

sample_inventory="""[masters]
10.0.5.67
10.0.5.68
10.0.5.69
[slaves]
10.0.5.70
10.0.5.71
10.0.5.72"""

Create an inventory.txt file with the variable in your local directory.

!printf "$sample_inventory" > inventory.txt
!cat inventory.txt

Add an ansible.cfg file with options for ssh host key checking and login name.

ansible_config="""[defaults]
host_key_checking=false
remote_user=ubuntu"""

!printf "$ansible_config" > ansible.cfg
!cat ansible.cfg

Check accessibility by ansible ping like:

!ansible -m ping -i inventory.txt all
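Before running Ansible, it can be handy to check from Python which hosts ended up in which group. Because the sample inventory uses an INI-like layout, the standard library configparser module can read it when bare host lines are allowed. This is only a quick sanity check under that assumption; real Ansible inventories may contain host variables, ranges, or child groups that this sketch does not handle.

```python
import configparser

inventory_text = """[masters]
10.0.5.67
10.0.5.68
10.0.5.69
[slaves]
10.0.5.70
10.0.5.71
10.0.5.72
"""

# allow_no_value lets lines without "key = value" be parsed as bare keys
parser = configparser.ConfigParser(allow_no_value=True)
parser.read_string(inventory_text)

# Map each group name to its list of host addresses
groups = {section: list(parser[section]) for section in parser.sections()}
```

To inspect the file written earlier, `parser.read("inventory.txt")` could be used in place of `read_string`.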


Make sure that you have a correct ssh key in your account; otherwise you may encounter 'FAILURE' in the previous ping test.

11.2.0.5.2 Ansible Playbook

We use a main Ansible playbook to deploy software packages for NIST Pedestrian and Face detection, which includes:

- mesos
- spark
- zookeeper
- opencv
- INRIA Person dataset
- Python script for the detection

!cd pedestrian-and-face-detection/ && ansible-playbook -i ../inventory.txt site.yml

The installation may take 30 minutes or an hour to complete.

11.2.0.6 OpenCV in Python

Before we run our code for this project, let's try OpenCV first to see how it works.

11.2.0.6.1 Import cv2

Let us import the opencv python module. We will use images from the online database image-net.org to test OpenCV image recognition. See Figure 41, Figure 42.

import cv2

Let us download a mailbox image with a red color to see if opencv identifies the shape with a color. The example file in this tutorial is:

$ curl http://farm4.static.flickr.com/3061/2739199963_ee78af76ef.jpg > mailbox.jpg

100  167k  100  167k    0     0   686k      0 --:--:-- --:--:-- --:--:--  684k

%matplotlib inline
from IPython.display import Image
mailbox_image = "mailbox.jpg"
Image(filename=mailbox_image)

Figure 41: Mailbox image

You can try other images. Check out the image-net.org for mailbox images: http://image-net.org/synset?wnid=n03710193

11.2.0.6.2 Image Detection

Just for a test, let's try to detect a red color shaped mailbox using opencv python functions.

There are key functions that we use:

* cvtColor: to convert a color space of an image
* inRange: to detect a mailbox based on the range of red color pixel values
* np.array: to define the range of red color using a Numpy library for better calculation
* findContours: to find an outline of the object
* bitwise_and: to black-out the area of contours found

import numpy as np
import matplotlib.pyplot as plt

# imread for loading an image
img = cv2.imread(mailbox_image)
# cvtColor for color conversion
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

# define range of red color in hsv
lower_red1 = np.array([0, 50, 50])
upper_red1 = np.array([10, 255, 255])
lower_red2 = np.array([170, 50, 50])
upper_red2 = np.array([180, 255, 255])

# threshold the hsv image to get only red colors
mask1 = cv2.inRange(hsv, lower_red1, upper_red1)
mask2 = cv2.inRange(hsv, lower_red2, upper_red2)
mask = mask1 + mask2

# find a red color mailbox from the image
im2, contours, hierarchy = cv2.findContours(mask, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)

# bitwise_and to remove other areas in the image except the detected object
res = cv2.bitwise_and(img, img, mask=mask)

# turn off - x, y axis bar
plt.axis("off")
# text for the masked image
cv2.putText(res, "masked image", (20, 300), cv2.FONT_HERSHEY_SIMPLEX, 2, (255, 255, 255))
# display
plt.imshow(cv2.cvtColor(res, cv2.COLOR_BGR2RGB))
plt.show()

Figure 42: Masked image

The red color mailbox is left alone in the image, which is what we wanted to find in this example with the opencv functions. You can try other images with different colors to detect the different shapes of objects using findContours and inRange from opencv.

For more information, see the following useful links.

- contours features: http://docs.opencv.org/3.1.0/dd/d49/tutorial_py_contour_features.html
- contours: http://docs.opencv.org/3.1.0/d4/d73/tutorial_py_contours_begin.html
- red color in hsv: http://stackoverflow.com/questions/30331944/finding-red-color-using-python-opencv
- inrange: http://docs.opencv.org/master/da/d97/tutorial_threshold_inRange.html
- inrange: http://docs.opencv.org/3.0-beta/doc/py_tutorials/py_imgproc/py_colorspaces/py_colorspaces.html
- numpy: http://docs.opencv.org/3.0-beta/doc/py_tutorials/py_core/py_basic_ops/py_basic_ops.html
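The mailbox example uses two hue ranges for red because hue is an angle: red sits at both ends of the scale, near 0 and near 180 in OpenCV's 8-bit HSV convention. The sketch below reproduces that logic for single pixels with the standard library colorsys module, so it runs without OpenCV. The function names `opencv_hsv` and `is_red` are assumptions for this illustration, and the rounding is only an approximation of what cvtColor produces.

```python
import colorsys

def opencv_hsv(r, g, b):
    """Convert 8-bit RGB to OpenCV-style HSV (H in 0..179, S and V in 0..255)."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return int(h * 180) % 180, int(round(s * 255)), int(round(v * 255))

def is_red(r, g, b):
    """Mimic the two inRange masks: hue near 0 or near 180, with S and V >= 50."""
    h, s, v = opencv_hsv(r, g, b)
    saturated = s >= 50 and v >= 50
    return saturated and (h <= 10 or h >= 170)

pure_red = is_red(255, 0, 0)   # hue at the low end of the scale
crimson = is_red(220, 20, 60)  # hue just below the wrap-around point
green = is_red(0, 255, 0)      # hue far from either end
```

A single range such as 0 to 10 would miss reds like the crimson pixel, which is why the two masks are added together.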

11.2.0.7 Human and Face Detection in OpenCV

11.2.0.7.1 INRIA Person Dataset

We use the INRIA Person dataset to detect upright people and faces in images in this example. Let us download it first.

$ curl ftp://ftp.inrialpes.fr/pub/lear/douze/data/INRIAPerson.tar > INRIAPerson.tar

100  969M  100  969M    0     0  8480k      0  0:01:57  0:01:57 --:--:-- 12.4M

$ tar xvf INRIAPerson.tar > logfile && tail logfile

11.2.0.7.2 Face Detection using Haar Cascades

This section is prepared based on the opencv-python tutorial: http://docs.opencv.org/3.1.0/d7/d8b/tutorial_py_face_detection.html#gsc.tab=0

There is a pre-trained classifier for face detection; download it from here:

$ curl https://raw.githubusercontent.com/opencv/opencv/master/data/haarcascades/haarcascade_frontalface_default.xml > haarcascade_frontalface_default.xml

100  908k  100  908k    0     0  2225k      0 --:--:-- --:--:-- --:--:-- 2259k

This classifier XML file will be used to detect faces in images. If you would like to create a new classifier, find out more information about training from here: http://docs.opencv.org/3.1.0/dc/d88/tutorial_traincascade.html


11.2.0.7.3 Face Detection Python Code Snippet

Now, we detect faces from the first five images using the classifier. See Figure 43, Figure 44, Figure 45, Figure 46, Figure 47, Figure 48, Figure 49, Figure 50, Figure 51, Figure 52, Figure 53.

# import the necessary packages
from __future__ import print_function
import numpy as np
import cv2
from os import listdir
from os.path import isfile, join
import matplotlib.pyplot as plt

mypath = "INRIAPerson/Test/pos/"
face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')
onlyfiles = [join(mypath, f) for f in listdir(mypath) if isfile(join(mypath, f))]

cnt = 0
for filename in onlyfiles:
    image = cv2.imread(filename)
    image_grayscale = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(image_grayscale, 1.3, 5)
    if len(faces) == 0:
        continue

    cnt_faces = 1
    for (x, y, w, h) in faces:
        cv2.rectangle(image, (x, y), (x + w, y + h), (255, 0, 0), 2)
        cv2.putText(image, "face" + str(cnt_faces), (x, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 0), 2)
        plt.figure()
        plt.axis("off")
        plt.imshow(cv2.cvtColor(image[y:y + h, x:x + w], cv2.COLOR_BGR2RGB))
        cnt_faces += 1
    plt.figure()
    plt.axis("off")
    plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
    cnt = cnt + 1
    if cnt == 5:
        break

Figure 43: Example

Figure 44: Example

Figure 45: Example

Figure 46: Example

Figure 47: Example

Figure 48: Example

Figure 49: Example

Figure 50: Example

Figure 51: Example

Figure 52: Example

Figure 53: Example

11.2.0.8PedestrianDetectionusingHOGDescriptor

WewilluseHistogramofOrientedGradients(HOG)todetectauprightpersonfrom images. See Figure 54, Figure 55, Figure 56, Figure 57, Figure 58,Figure59,Figure60,Figure61,Figure62,Figure63

11.2.0.8.1 Python Code Snippet

# initialize the HOG descriptor/person detector
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

cnt = 0
for filename in onlyfiles:
    img = cv2.imread(filename)
    orig = img.copy()
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # detect people in the image
    (rects, weights) = hog.detectMultiScale(img, winStride=(8, 8),
                                            padding=(16, 16), scale=1.05)

    # draw the final bounding boxes
    for (x, y, w, h) in rects:
        cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)

    plt.figure()
    plt.axis("off")
    plt.imshow(cv2.cvtColor(orig, cv2.COLOR_BGR2RGB))

    plt.figure()
    plt.axis("off")
    plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))

    cnt = cnt + 1
    if cnt == 5:
        break

Figure 54: Example

Figure 55: Example

Figure 56: Example

Figure 57: Example

Figure 58: Example

Figure 59: Example

Figure 60: Example

Figure 61: Example

Figure 62: Example

Figure 63: Example

11.2.0.9 Processing by Apache Spark

The INRIA Person dataset provides 100+ images, and Spark can be used to process them in parallel. We load 288 images from the “Test/pos” directory.

Spark provides a special object ‘sc’ that connects a Spark cluster with functions in Python code. Therefore, we can run Python functions in parallel to detect objects in this example.

The map function is used to run pedestrian and face detection per image on the data set returned by the parallelize() function of the ‘sc’ Spark context.

The collect function merges the results into an array.

def apply_batch(imagePath):
    import cv2
    import numpy as np
    # initialize the HOG descriptor/person detector
    hog = cv2.HOGDescriptor()
    hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())
    image = cv2.imread(imagePath)
    # detect people in the image
    (rects, weights) = hog.detectMultiScale(image, winStride=(8, 8),
                                            padding=(16, 16), scale=1.05)
    # draw the final bounding boxes
    for (x, y, w, h) in rects:
        cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
    return image

11.2.0.9.1 Parallelize in Spark Context

The list of image files is given to parallelize.

11.2.0.9.2 Map Function (apply_batch)

The ‘apply_batch’ function that we created previously is given to the map function so that it is run in a Spark cluster.

11.2.0.9.3 Collect Function

The result of each map process is merged into an array.
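The three steps above (distribute a list, map a function over it, collect the results) can be sketched without a Spark cluster. The example below is only an analogy built on the Python standard library's thread pool, not Spark itself. The names `parallel_map_collect` and `describe`, and the file names, are hypothetical. `describe` stands in for `apply_batch` and returns a label rather than an image.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_map_collect(func, items, workers=4):
    """Apply func to every item on a pool of workers and gather the results.

    Results come back in the same order as the input items.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(func, items))


def describe(path):
    # Hypothetical stand-in for apply_batch: no image is read here.
    return path.upper()


# Hypothetical file names that play the role of the image list.
paths = ["a.png", "b.png", "c.png"]
labels = parallel_map_collect(describe, paths)
```

In Spark the work is distributed across the executors of a cluster rather than the threads of one process, but the shape of the computation is the same: one function, applied independently to each element, with the results gathered at the end.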

11.2.0.10 Results for 100+ images by Spark Cluster

pd = sc.parallelize(onlyfiles)
pdc = pd.map(apply_batch)
result = pdc.collect()

for image in result:
    plt.figure()
    plt.axis("off")
    plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
