Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow
SECOND EDITION
Concepts, Tools, and Techniques to Build Intelligent Systems
Aurélien Géron
Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow by Aurélien Géron
Copyright © 2019 Aurélien Géron. All rights reserved.
Printed in Canada.
Published by O'Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.
O'Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional sales department: [email protected].
Editors: Rachel Roumeliotis and Nicole Tache
Production Editor: Kristen Brown
Copyeditor: Amanda Kersey
Proofreader: Rachel Head
Indexer: Judith McConville
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Rebecca Demarest
September 2019: Second Edition
Revision History for the Second Edition
2019-09-05: First Release
See http://oreilly.com/catalog/errata.csp?isbn=9781492032649 for release details.
The O'Reilly logo is a registered trademark of O'Reilly Media, Inc. Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, the cover image, and related trade dress are trademarks of O'Reilly Media, Inc.
The views expressed in this work are those of the author, and do not represent the publisher's views. While the publisher and the author have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the author disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.
978-1-492-03264-9
Preface
The Machine Learning Tsunami
In 2006, Geoffrey Hinton et al. published a paper showing how to train a deep neural network capable of recognizing handwritten digits with state-of-the-art precision (>98%). They branded this technique "Deep Learning." A deep neural network is a (very) simplified model of our cerebral cortex, composed of a stack of layers of artificial neurons. Training a deep neural net was widely considered impossible at the time, and most researchers had abandoned the idea in the late 1990s. This paper revived the interest of the scientific community, and before long many new papers demonstrated that Deep Learning was not only possible, but capable of mind-blowing achievements that no other Machine Learning (ML) technique could hope to match (with the help of tremendous computing power and great amounts of data). This enthusiasm soon extended to many other areas of Machine Learning.
A decade or so later, Machine Learning has conquered the industry: it is at the heart of much of the magic in today's high-tech products, ranking your web search results, powering your smartphone's speech recognition, recommending videos, and beating the world champion at the game of Go. Before you know it, it will be driving your car.
Machine Learning in Your Projects
So, naturally you are excited about Machine Learning and would love to join the party!
Perhaps you would like to give your homemade robot a brain of its own? Make it recognize faces? Or learn to walk around?
Or maybe your company has tons of data (user logs, financial data, production data, machine sensor data, hotline stats, HR reports, etc.), and more than likely you could unearth some hidden gems if you just knew where to look. With Machine Learning, you could accomplish the following and more:
Segment customers and find the best marketing strategy for each group.
Recommend products for each client based on what similar clients bought.
Detect which transactions are likely to be fraudulent.
Forecast next year's revenue.
Whatever the reason, you have decided to learn Machine Learning and implement it in your projects. Great idea!
Objective and Approach
This book assumes that you know close to nothing about Machine Learning. Its goal is to give you the concepts, tools, and intuition you need to implement programs capable of learning from data.
We will cover a large number of techniques, from the simplest and most commonly used (such as Linear Regression) to some of the Deep Learning techniques that regularly win competitions.
Rather than implementing our own toy versions of each algorithm, we will be using production-ready Python frameworks:
Scikit-Learn is very easy to use, yet it implements many Machine Learning algorithms efficiently, so it makes for a great entry point to learning Machine Learning.
TensorFlow is a more complex library for distributed numerical computation. It makes it possible to train and run very large neural networks efficiently by distributing the computations across potentially hundreds of multi-GPU (graphics processing unit) servers. TensorFlow (TF) was created at Google and supports many of its large-scale Machine Learning applications. It was open sourced in November 2015.
Keras is a high-level Deep Learning API that makes it very simple to train and run neural networks. It can run on top of either TensorFlow, Theano, or Microsoft Cognitive Toolkit (formerly known as CNTK). TensorFlow comes with its own implementation of this API, called tf.keras, which provides support for some advanced TensorFlow features (e.g., the ability to efficiently load data).
The book favors a hands-on approach, growing an intuitive understanding of Machine Learning through concrete working examples and just a little bit of theory. While you can read this book without picking up your laptop, I highly recommend you experiment with the code examples available online as Jupyter notebooks at https://github.com/ageron/handson-ml2.
Prerequisites
This book assumes that you have some Python programming experience and that you are familiar with Python's main scientific libraries, in particular NumPy, pandas, and Matplotlib.
Also, if you care about what's under the hood, you should have a reasonable understanding of college-level math as well (calculus, linear algebra, probabilities, and statistics).
If you don't know Python yet, http://learnpython.org/ is a great place to start. The official tutorial on Python.org is also quite good.
If you have never used Jupyter, Chapter 2 will guide you through installation and the basics: it is a powerful tool to have in your toolbox.
If you are not familiar with Python's scientific libraries, the provided Jupyter notebooks include a few tutorials. There is also a quick math tutorial for linear algebra.
Roadmap
This book is organized in two parts. Part I, The Fundamentals of Machine Learning, covers the following topics:
What Machine Learning is, what problems it tries to solve, and the main categories and fundamental concepts of its systems
The steps in a typical Machine Learning project
Learning by fitting a model to data
Optimizing a cost function
Handling, cleaning, and preparing data
Selecting and engineering features
Selecting a model and tuning hyperparameters using cross-validation
The challenges of Machine Learning, in particular underfitting and overfitting (the bias/variance trade-off)
The most common learning algorithms: Linear and Polynomial Regression, Logistic Regression, k-Nearest Neighbors, Support Vector Machines, Decision Trees, Random Forests, and Ensemble methods
Reducing the dimensionality of the training data to fight the "curse of dimensionality"
Other unsupervised learning techniques, including clustering, density estimation, and anomaly detection
Part II, Neural Networks and Deep Learning, covers the following topics:
What neural nets are and what they're good for
Building and training neural nets using TensorFlow and Keras
The most important neural net architectures: feedforward neural nets for tabular data, convolutional nets for computer vision, recurrent nets and long short-term memory (LSTM) nets for sequence processing, encoder/decoders and Transformers for natural language processing, autoencoders and generative adversarial networks (GANs) for generative learning
Techniques for training deep neural nets
How to build an agent (e.g., a bot in a game) that can learn good strategies through trial and error, using Reinforcement Learning
Loading and preprocessing large amounts of data efficiently
Training and deploying TensorFlow models at scale
The first part is based mostly on Scikit-Learn, while the second part uses TensorFlow and Keras.
CAUTION
Don't jump into deep waters too hastily: while Deep Learning is no doubt one of the most exciting areas in Machine Learning, you should master the fundamentals first. Moreover, most problems can be solved quite well using simpler techniques such as Random Forests and Ensemble methods (discussed in Part I). Deep Learning is best suited for complex problems such as image recognition, speech recognition, or natural language processing, provided you have enough data, computing power, and patience.
Changes in the Second Edition
This second edition has six main objectives:
1. Cover additional ML topics: more unsupervised learning techniques (including clustering, anomaly detection, density estimation, and mixture models); more techniques for training deep nets (including self-normalized networks); additional computer vision techniques (including Xception, SENet, object detection with YOLO, and semantic segmentation using R-CNN); handling sequences using convolutional neural networks (CNNs, including WaveNet); natural language processing using recurrent neural networks (RNNs), CNNs, and Transformers; and GANs.
2. Cover additional libraries and APIs (Keras, the Data API, TF-Agents for Reinforcement Learning) and training and deploying TF models at scale using the Distribution Strategies API, TF Serving, and Google Cloud AI Platform. Also briefly introduce TF Transform, TFLite, TF Addons/Seq2Seq, and TensorFlow.js.
3. Discuss some of the latest important results from Deep Learning research.
4. Migrate all TensorFlow chapters to TensorFlow 2, and use TensorFlow's implementation of the Keras API (tf.keras) whenever possible.
5. Update the code examples to use the latest versions of Scikit-Learn, NumPy, pandas, Matplotlib, and other libraries.
6. Clarify some sections and fix some errors, thanks to plenty of great feedback from readers.
Some chapters were added, others were rewritten, and a few were reordered. See https://homl.info/changes2 for more details on what changed in the second edition.
Other Resources
Many excellent resources are available to learn about Machine Learning. For example, Andrew Ng's ML course on Coursera is amazing, although it requires a significant time investment (think months).
There are also many interesting websites about Machine Learning, including of course Scikit-Learn's exceptional User Guide. You may also enjoy Dataquest, which provides very nice interactive tutorials, and ML blogs such as those listed on Quora. Finally, the Deep Learning website has a good list of resources to check out to learn more.
There are many other introductory books about Machine Learning. In particular:
Joel Grus's Data Science from Scratch (O'Reilly) presents the fundamentals of Machine Learning and implements some of the main algorithms in pure Python (from scratch, as the name suggests).
Stephen Marsland's Machine Learning: An Algorithmic Perspective (Chapman & Hall) is a great introduction to Machine Learning, covering a wide range of topics in depth with code examples in Python (also from scratch, but using NumPy).
Sebastian Raschka's Python Machine Learning (Packt Publishing) is also a great introduction to Machine Learning and leverages Python open source libraries (Pylearn 2 and Theano).
François Chollet's Deep Learning with Python (Manning) is a very practical book that covers a large range of topics in a clear and concise way, as you might expect from the author of the excellent Keras library. It favors code examples over mathematical theory.
Andriy Burkov's The Hundred-Page Machine Learning Book is very short and covers an impressive range of topics, introducing them in approachable terms without shying away from the math equations.
Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin's Learning from Data (AMLBook) is a rather theoretical approach to ML that provides deep insights, in particular on the bias/variance trade-off (see Chapter 4).
Stuart Russell and Peter Norvig's Artificial Intelligence: A Modern Approach, 3rd Edition (Pearson), is a great (and huge) book covering an incredible amount of topics, including Machine Learning. It helps put ML into perspective.
Finally, joining ML competition websites such as Kaggle.com will allow you to practice your skills on real-world problems, with help and insights from some of the best ML professionals out there.
Conventions Used in This Book
The following typographical conventions are used in this book:
Italic
Indicates new terms, URLs, email addresses, filenames, and file extensions.
Constant width
Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements and keywords.
Constant width bold
Shows commands or other text that should be typed literally by the user.
Constant width italic
Shows text that should be replaced with user-supplied values or by values determined by context.
TIP
This element signifies a tip or suggestion.
NOTE
This element signifies a general note.
WARNING
This element indicates a warning or caution.
Code Examples
There is a series of Jupyter notebooks full of supplemental material, such as code examples and exercises, available for download at https://github.com/ageron/handson-ml2.
Some of the code examples in the book leave out repetitive sections or details that are obvious or unrelated to Machine Learning. This keeps the focus on the important parts of the code and saves space to cover more topics. If you want the full code examples, they are all available in the Jupyter notebooks.
Note that when the code examples display some outputs, these code examples are shown with Python prompts (>>> and ...), as in a Python shell, to clearly distinguish the code from the outputs. For example, this code defines the square() function, then it computes and displays the square of 3:
>>> def square(x):
...     return x ** 2
...
>>> result = square(3)
>>> result
9
When code does not display anything, prompts are not used. However, the result may sometimes be shown as a comment, like this:
def square(x):
    return x ** 2

result = square(3)  # result is 9
Using Code Examples
This book is here to help you get your job done. In general, if example code is offered with this book, you may use it in your programs and documentation. You do not need to contact us for permission unless you're reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing a CD-ROM of examples from O'Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product's documentation does require permission.
We appreciate, but do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: "Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd Edition, by Aurélien Géron (O'Reilly). Copyright 2019 Aurélien Géron, 978-1-492-03264-9." If you feel your use of code examples falls outside fair use or the permission given above, feel free to contact us at [email protected].
O'Reilly Online Learning
NOTE
For almost 40 years, O'Reilly Media has provided technology and business training, knowledge, and insight to help companies succeed.
Our unique network of experts and innovators share their knowledge and expertise through books, articles, conferences, and our online learning platform. O'Reilly's online learning platform gives you on-demand access to live training courses, in-depth learning paths, interactive coding environments, and a vast collection of text and video from O'Reilly and 200+ other publishers. For more information, please visit http://oreilly.com.
How to Contact Us
Please address comments and questions concerning this book to the publisher:
O'Reilly Media, Inc.
1005 Gravenstein Highway North
Sebastopol, CA 95472
800-998-9938 (in the United States or Canada)
707-829-0515 (international or local)
707-829-0104 (fax)
We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at https://homl.info/oreilly2.
To comment or ask technical questions about this book, send email to [email protected].
For more information about our books, courses, conferences, and news, see our website at http://www.oreilly.com.
Find us on Facebook: http://facebook.com/oreilly
Follow us on Twitter: http://twitter.com/oreillymedia
Watch us on YouTube: http://www.youtube.com/oreillymedia
Acknowledgments
Never in my wildest dreams did I imagine that the first edition of this book would get such a large audience. I received so many messages from readers, many asking questions, some kindly pointing out errata, and most sending me encouraging words. I cannot express how grateful I am to all these readers for their tremendous support. Thank you all so very much! Please do not hesitate to file issues on GitHub if you find errors in the code examples (or just to ask questions), or to submit errata if you find errors in the text. Some readers also shared how this book helped them get their first job, or how it helped them solve a concrete problem they were working on. I find such feedback incredibly motivating. If you find this book helpful, I would love it if you could share your story with me, either privately (e.g., via LinkedIn) or publicly (e.g., in a tweet or through an Amazon review).
I am also incredibly thankful to all the amazing people who took time out of their busy lives to review my book with such care. In particular, I would like to thank François Chollet for reviewing all the chapters based on Keras and TensorFlow and giving me some great in-depth feedback. Since Keras is one of the main additions to this second edition, having its author review the book was invaluable. I highly recommend François's book Deep Learning with Python (Manning): it has the conciseness, clarity, and depth of the Keras library itself. Special thanks as well to Ankur Patel, who reviewed every chapter of this second edition and gave me excellent feedback, in particular on Chapter 9, which covers unsupervised learning techniques. He could write a whole book on the topic… oh, wait, he did! Do check out Hands-On Unsupervised Learning Using Python: How to Build Applied Machine Learning Solutions from Unlabeled Data (O'Reilly). Huge thanks as well to Olzhas Akpambetov, who reviewed all the chapters in the second part of the book, tested much of the code, and offered many great suggestions. I'm grateful to Mark Daoust, Jon Krohn, Dominic Monn, and Josh Patterson for reviewing the second part of this book so thoroughly and offering their expertise. They left no stone unturned and provided amazingly useful feedback.
While writing this second edition, I was fortunate enough to get plenty of help from members of the TensorFlow team, in particular Martin Wicke, who tirelessly answered dozens of my questions and dispatched the rest to the right people, including Karmel Allison, Paige Bailey, Eugene Brevdo, William Chargin, Daniel "Wolff" Dobson, Nick Felt, Bruce Fontaine, Goldie Gadde, Sandeep Gupta, Priya Gupta, Kevin Haas, Konstantinos Katsiapis, Viacheslav Kovalevskyi, Allen Lavoie, Clemens Mewald, Dan Moldovan, Sean Morgan, Tom O'Malley, Alexandre Passos, André Susano Pinto, Anthony Platanios, Oscar Ramirez, Anna Revinskaya, Saurabh Saxena, Ryan Sepassi, Jiri Simsa, Xiaodan Song, Christina Sorokin, Dustin Tran, Todd Wang, Pete Warden (who also reviewed the first edition), Edd Wilder-James, and Yuefeng Zhou, all of whom were tremendously helpful. Huge thanks to all of you, and to all other members of the TensorFlow team, not just for your help, but also for making such a great library! Special thanks to Irene Giannoumis and Robert Crowe of the TFX team for reviewing Chapters 13 and 19 in depth.
Many thanks as well to O'Reilly's fantastic staff, in particular Nicole Taché, who gave me insightful feedback and was always cheerful, encouraging, and helpful: I could not dream of a better editor. Big thanks to Michele Cronin as well, who was very helpful (and patient) at the start of this second edition, and to Kristen Brown, the production editor for the second edition, who saw it through all the steps (she also coordinated fixes and updates for each reprint of the first edition). Thanks as well to Rachel Monaghan and Amanda Kersey for their thorough copyediting (respectively for the first and second edition), and to Johnny O'Toole, who managed the relationship with Amazon and answered many of my questions. Thanks to Marie Beaugureau, Ben Lorica, Mike Loukides, and Laurel Ruma for believing in this project and helping me define its scope. Thanks to Matt Hacker and all of the Atlas team for answering all my technical questions regarding formatting, AsciiDoc, and LaTeX, and thanks to Nick Adams, Rebecca Demarest, Rachel Head, Judith McConville, Helen Monroe, Karen Montgomery, Rachel Roumeliotis, and everyone else at O'Reilly who contributed to this book.
I would also like to thank my former Google colleagues, in particular the YouTube video classification team, for teaching me so much about Machine Learning. I could never have started the first edition without them. Special thanks to my personal ML gurus: Clément Courbet, Julien Dubois, Mathias Kende, Daniel Kitachewsky, James Pack, Alexander Pak, Anosh Raj, Vitor Sessak, Wiktor Tomczak, Ingrid von Glehn, and Rich Washington. And thanks to everyone else I worked with at YouTube and in the amazing Google research teams in Mountain View. Many thanks as well to Martin Andrews, Sam Witteveen, and Jason Zaman for welcoming me into their Google Developer Experts group in Singapore, with the kind support of Soonson Kwon, and for all the great discussions we had about Deep Learning and TensorFlow. Anyone interested in Deep Learning in Singapore should definitely join their Deep Learning Singapore meetup. Jason deserves special thanks for sharing some of his TFLite expertise for Chapter 19!
I will never forget the kind people who reviewed the first edition of this book, including David Andrzejewski, Lukas Biewald, Justin Francis, Vincent Guilbeau, Eddy Hung, Karim Matrah, Grégoire Mesnil, Salim Sémaoune, Iain Smears, Michel Tessier, Ingrid von Glehn, Pete Warden, and of course my dear brother Sylvain. Special thanks to Haesun Park, who gave me plenty of excellent feedback and caught several errors while he was writing the Korean translation of the first edition of this book. He also translated the Jupyter notebooks into Korean, not to mention TensorFlow's documentation. I do not speak Korean, but judging by the quality of his feedback, all his translations must be truly excellent! Haesun also kindly contributed some of the solutions to the exercises in this second edition.
Last but not least, I am infinitely grateful to my beloved wife, Emmanuelle, and to our three wonderful children, Alexandre, Rémi, and Gabrielle, for encouraging me to work hard on this book. I'm also thankful to them for their insatiable curiosity: explaining some of the most difficult concepts in this book to my wife and children helped me clarify my thoughts and directly improved many parts of it. And they keep bringing me cookies and coffee! What more can one dream of?
1. Geoffrey E. Hinton et al., "A Fast Learning Algorithm for Deep Belief Nets," Neural Computation 18 (2006): 1527–1554.
2. Yann LeCun's deep convolutional neural networks had worked well for image recognition since the 1990s, although they were not as general-purpose.
Part I. The Fundamentals of Machine Learning
Chapter 1. The Machine Learning Landscape
When most people hear "Machine Learning," they picture a robot: a dependable butler or a deadly Terminator, depending on who you ask. But Machine Learning is not just a futuristic fantasy; it's already here. In fact, it has been around for decades in some specialized applications, such as Optical Character Recognition (OCR). But the first ML application that really became mainstream, improving the lives of hundreds of millions of people, took over the world back in the 1990s: the spam filter. It's not exactly a self-aware Skynet, but it does technically qualify as Machine Learning (it has actually learned so well that you seldom need to flag an email as spam anymore). It was followed by hundreds of ML applications that now quietly power hundreds of products and features that you use regularly, from better recommendations to voice search.
Where does Machine Learning start and where does it end? What exactly does it mean for a machine to learn something? If I download a copy of Wikipedia, has my computer really learned something? Is it suddenly smarter? In this chapter we will start by clarifying what Machine Learning is and why you may want to use it.
Then, before we set out to explore the Machine Learning continent, we will take a look at the map and learn about the main regions and the most notable landmarks: supervised versus unsupervised learning, online versus batch learning, instance-based versus model-based learning. Then we will look at the workflow of a typical ML project, discuss the main challenges you may face, and cover how to evaluate and fine-tune a Machine Learning system.
This chapter introduces a lot of fundamental concepts (and jargon) that every data scientist should know by heart. It will be a high-level overview (it's the only chapter without much code), all rather simple, but you should make sure everything is crystal clear to you before continuing on to the rest of the book. So grab a coffee and let's get started!
TIP
If you already know all the Machine Learning basics, you may want to skip directly to Chapter 2. If you are not sure, try to answer all the questions listed at the end of the chapter before moving on.
What Is Machine Learning?
Machine Learning is the science (and art) of programming computers so they can learn from data.
Here is a slightly more general definition:
[Machine Learning is the] field of study that gives computers the ability to learn without being explicitly programmed.
—Arthur Samuel, 1959
And a more engineering-oriented one:
A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.
—Tom Mitchell, 1997
Your spam filter is a Machine Learning program that, given examples of spam emails (e.g., flagged by users) and examples of regular (nonspam, also called "ham") emails, can learn to flag spam. The examples that the system uses to learn are called the training set. Each training example is called a training instance (or sample). In this case, the task T is to flag spam for new emails, the experience E is the training data, and the performance measure P needs to be defined; for example, you can use the ratio of correctly classified emails. This particular performance measure is called accuracy, and it is often used in classification tasks.
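The accuracy measure described above is simple enough to sketch in a few lines of plain Python. The predictions and labels below are invented purely for illustration:

```python
# Accuracy: the ratio of correctly classified emails, as defined above.
def accuracy(predicted_labels, true_labels):
    correct = sum(p == t for p, t in zip(predicted_labels, true_labels))
    return correct / len(true_labels)

# Hypothetical classifier outputs for five emails ("spam" or "ham")
predictions = ["spam", "ham", "ham", "spam", "ham"]
truth = ["spam", "ham", "spam", "spam", "ham"]
print(accuracy(predictions, truth))  # 4 of 5 correct -> 0.8
```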
If you just download a copy of Wikipedia, your computer has a lot more data, but it is not suddenly better at any task. Thus, downloading a copy of Wikipedia is not Machine Learning.
Why Use Machine Learning?
Consider how you would write a spam filter using traditional programming techniques (Figure 1-1):
1. First you would consider what spam typically looks like. You might notice that some words or phrases (such as "4U," "credit card," "free," and "amazing") tend to come up a lot in the subject line. Perhaps you would also notice a few other patterns in the sender's name, the email's body, and other parts of the email.
2. You would write a detection algorithm for each of the patterns that you noticed, and your program would flag emails as spam if a number of these patterns were detected.
3. You would test your program and repeat steps 1 and 2 until it was good enough to launch.
Figure 1-1. The traditional approach
Since the problem is difficult, your program will likely become a long list of complex rules, pretty hard to maintain.
In contrast, a spam filter based on Machine Learning techniques automatically learns which words and phrases are good predictors of spam by detecting unusually frequent patterns of words in the spam examples compared to the ham examples (Figure 1-2). The program is much shorter, easier to maintain, and most likely more accurate.
What if spammers notice that all their emails containing "4U" are blocked? They might start writing "For U" instead. A spam filter using traditional programming techniques would need to be updated to flag "For U" emails. If spammers keep working around your spam filter, you will need to keep writing new rules forever.
In contrast, a spam filter based on Machine Learning techniques automatically notices that "For U" has become unusually frequent in spam flagged by users, and it starts flagging them without your intervention (Figure 1-3).
Figure 1-2. The Machine Learning approach
Figure 1-3. Automatically adapting to change
Another area where Machine Learning shines is for problems that either are too complex for traditional approaches or have no known algorithm. For example, consider speech recognition. Say you want to start simple and write a program capable of distinguishing the words "one" and "two." You might notice that the word "two" starts with a high-pitch sound ("T"), so you could hardcode an algorithm that measures high-pitch sound intensity and use that to distinguish ones and twos, but obviously this technique will not scale to thousands of words spoken by millions of very different people in noisy environments and in dozens of languages. The best solution (at least today) is to write an algorithm that learns by itself, given many example recordings for each word.
Finally, Machine Learning can help humans learn (Figure 1-4). ML algorithms can be inspected to see what they have learned (although for some algorithms this can be tricky). For instance, once a spam filter has been trained on enough spam, it can easily be inspected to reveal the list of words and combinations of words that it believes are the best predictors of spam. Sometimes this will reveal unsuspected correlations or new trends, and thereby lead to a better understanding of the problem. Applying ML techniques to dig into large amounts of data can help discover patterns that were not immediately apparent. This is called data mining.
Figure 1-4. Machine Learning can help humans learn
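The core idea above, spotting words that are unusually frequent in spam relative to ham, can be sketched in a few lines of Python. The tiny corpus here is invented purely for illustration:

```python
# Toy corpora (invented for this sketch): count word frequencies in each class.
spam_emails = ["free credit card 4U", "amazing free offer 4U", "free money"]
ham_emails = ["meeting at noon", "project update attached", "lunch tomorrow?"]

def word_counts(emails):
    counts = {}
    for email in emails:
        for word in email.lower().split():
            counts[word] = counts.get(word, 0) + 1
    return counts

spam_counts = word_counts(spam_emails)
ham_counts = word_counts(ham_emails)

# Words far more frequent in spam than in ham make good spam predictors.
predictors = sorted(w for w, c in spam_counts.items()
                    if c > 2 * ham_counts.get(w, 0))
print(predictors)  # includes spam-heavy words such as "4u" and "free"
```

A real spam filter would of course use a proper learning algorithm rather than a hand-picked frequency threshold, but the principle, letting the data reveal the predictors, is the same.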
To summarize, Machine Learning is great for:
Problems for which existing solutions require a lot of fine-tuning or long lists of rules: one Machine Learning algorithm can often simplify code and perform better than the traditional approach.
Complex problems for which using a traditional approach yields no good solution: the best Machine Learning techniques can perhaps find a solution.
Fluctuating environments: a Machine Learning system can adapt to new data.
Getting insights about complex problems and large amounts of data.
Examples of Applications
Let's look at some concrete examples of Machine Learning tasks, along with the techniques that can tackle them:
Analyzing images of products on a production line to automatically classify them
This is image classification, typically performed using convolutional neural networks (CNNs; see Chapter 14).
Detecting tumors in brain scans
This is semantic segmentation, where each pixel in the image is classified (as we want to determine the exact location and shape of tumors), typically using CNNs as well.
Automatically classifying news articles
This is natural language processing (NLP), and more specifically text classification, which can be tackled using recurrent neural networks (RNNs), CNNs, or Transformers (see Chapter 16).
Automatically flagging offensive comments on discussion forums
This is also text classification, using the same NLP tools.
Summarizing long documents automatically
This is a branch of NLP called text summarization, again using the same tools.
Creating a chatbot or a personal assistant
This involves many NLP components, including natural language understanding (NLU) and question-answering modules.
Forecasting your company's revenue next year, based on many performance metrics
This is a regression task (i.e., predicting values) that may be tackled using any regression model, such as a Linear Regression or Polynomial Regression model (see Chapter 4), a regression SVM (see Chapter 5), a regression Random Forest (see Chapter 7), or an artificial neural network (see Chapter 10). If you want to take into account sequences of past performance metrics, you may want to use RNNs, CNNs, or Transformers (see Chapters 15 and 16).
Making your app react to voice commands
This is speech recognition, which requires processing audio samples: since they are long and complex sequences, they are typically processed using RNNs, CNNs, or Transformers (see Chapters 15 and 16).
Detecting credit card fraud
This is anomaly detection (see Chapter 9).
Segmenting clients based on their purchases so that you can design a different marketing strategy for each segment
This is clustering (see Chapter 9).
Representing a complex, high-dimensional dataset in a clear and insightful diagram
This is data visualization, often involving dimensionality reduction techniques (see Chapter 8).
Recommending a product that a client may be interested in, based on past purchases
This is a recommender system. One approach is to feed past purchases (and other information about the client) to an artificial neural network (see Chapter 10), and get it to output the most likely next purchase. This neural net would typically be trained on past sequences of purchases across all clients.
Building an intelligent bot for a game
This is often tackled using Reinforcement Learning (RL; see Chapter 18), which is a branch of Machine Learning that trains agents (such as bots) to pick the actions that will maximize their rewards over time (e.g., a bot may get a reward every time the player loses some life points), within a given environment (such as the game). The famous AlphaGo program that beat the world champion at the game of Go was built using RL.
This list could go on and on, but hopefully it gives you a sense of the incredible breadth and complexity of the tasks that Machine Learning can tackle, and the types of techniques that you would use for each task.
Types of Machine Learning Systems
There are so many different types of Machine Learning systems that it is useful to classify them in broad categories, based on the following criteria:
Whether or not they are trained with human supervision (supervised, unsupervised, semisupervised, and Reinforcement Learning)
Whether or not they can learn incrementally on the fly (online versus batch learning)
Whether they work by simply comparing new data points to known data points, or instead by detecting patterns in the training data and building a predictive model, much like scientists do (instance-based versus model-based learning)
These criteria are not exclusive; you can combine them in any way you like. For example, a state-of-the-art spam filter may learn on the fly using a deep neural network model trained using examples of spam and ham; this makes it an online, model-based, supervised learning system.
Let's look at each of these criteria a bit more closely.
Supervised/Unsupervised Learning
Machine Learning systems can be classified according to the amount and type of supervision they get during training. There are four major categories: supervised learning, unsupervised learning, semisupervised learning, and Reinforcement Learning.
Supervised learning
In supervised learning, the training set you feed to the algorithm includes the desired solutions, called labels (Figure 1-5).
Figure 1-5. A labeled training set for spam classification (an example of supervised learning)
A typical supervised learning task is classification. The spam filter is a good example of this: it is trained with many example emails along with their class (spam or ham), and it must learn how to classify new emails.
Another typical task is to predict a target numeric value, such as the price of a car, given a set of features (mileage, age, brand, etc.) called predictors. This sort of task is called regression (Figure 1-6). To train the system, you need to give it many examples of cars, including both their predictors and their labels (i.e., their prices).
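As a minimal sketch of such a regression task (the cars, mileages, and prices below are made up for illustration, not taken from the book's datasets), here is a Linear Regression model trained on two predictors, mileage and age:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Invented training examples: predictors (mileage, age) and labels (price in $).
X = np.array([[15_000, 1], [60_000, 4], [90_000, 7], [120_000, 9]])
y = np.array([22_000, 15_000, 11_000, 8_000])

model = LinearRegression()
model.fit(X, y)  # supervised learning: the labels are given during training
price = model.predict([[40_000, 3]])  # predict the price of an unseen car
print(round(price[0]))
```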
NOTEInMachineLearninganattributeisadatatype(e.g.,“mileage”),whileafeaturehasseveralmeanings,dependingonthecontext,butgenerallymeansanattributeplusitsvalue(e.g.,“mileage=15,000”).Manypeopleusethewordsattributeandfeatureinterchangeably.
Note that some regression algorithms can be used for classification as well, and vice versa. For example, Logistic Regression is commonly used for classification, as it can output a value that corresponds to the probability of belonging to a given class (e.g., 20% chance of being spam).
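As a minimal sketch of this idea, Scikit-Learn's LogisticRegression exposes such class probabilities via its predict_proba method. The tiny training set below is invented (a single made-up feature counting suspicious words per email, with label 1 meaning spam):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy training set: one feature (number of suspicious words per email);
# label 1 = spam, 0 = ham
X = np.array([[0], [1], [2], [8], [9], [10]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression()
clf.fit(X, y)

# predict_proba returns [P(ham), P(spam)] for each input
proba = clf.predict_proba([[9]])
print(proba)
```

An email with 9 suspicious words lands deep in the spam region, so the second probability is well above 50%.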
Figure 1-6. A regression problem: predict a value, given an input feature (there are usually multiple input features, and sometimes multiple output values)
Here are some of the most important supervised learning algorithms (covered in this book):
k-Nearest Neighbors

Linear Regression

Logistic Regression

Support Vector Machines (SVMs)

Decision Trees and Random Forests

Neural networks
Unsupervised learning

In unsupervised learning, as you might guess, the training data is unlabeled (Figure 1-7). The system tries to learn without a teacher.
Figure 1-7. An unlabeled training set for unsupervised learning
Here are some of the most important unsupervised learning algorithms (most of these are covered in Chapters 8 and 9):
Clustering

K-Means

DBSCAN

Hierarchical Cluster Analysis (HCA)

Anomaly detection and novelty detection

One-class SVM

Isolation Forest

Visualization and dimensionality reduction

Principal Component Analysis (PCA)

Kernel PCA

Locally Linear Embedding (LLE)

t-Distributed Stochastic Neighbor Embedding (t-SNE)

Association rule learning

Apriori

Eclat
For example, say you have a lot of data about your blog's visitors. You may want to run a clustering algorithm to try to detect groups of similar visitors (Figure 1-8). At no point do you tell the algorithm which group a visitor belongs to: it finds those connections without your help. For example, it might notice that 40% of your visitors are males who love comic books and generally read your blog in the evening, while 20% are young sci-fi lovers who visit during the weekends. If you use a hierarchical clustering algorithm, it may also subdivide each group into smaller groups. This may help you target your posts for each group.
Figure 1-8. Clustering
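The blog-visitor example above can be sketched with Scikit-Learn's K-Means. All the data below is synthetic, and the two visitor features ([age, visits per week]) are invented for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
# Synthetic visitors described by two made-up features: [age, visits per week]
visitors = np.vstack([
    rng.normal(loc=[38, 2], scale=[5, 0.5], size=(100, 2)),  # e.g., evening comic-book readers
    rng.normal(loc=[22, 6], scale=[3, 1.0], size=(100, 2)),  # e.g., weekend sci-fi lovers
])

# Note that we never tell the algorithm which group a visitor belongs to
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
labels = kmeans.fit_predict(visitors)
print(kmeans.cluster_centers_)
```

The algorithm recovers the two groups on its own; you would then inspect each cluster's centroid to interpret what kind of visitor it represents.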
Visualization algorithms are also good examples of unsupervised learning algorithms: you feed them a lot of complex and unlabeled data, and they output a 2D or 3D representation of your data that can easily be plotted (Figure 1-9). These algorithms try to preserve as much structure as they can (e.g., trying to keep separate clusters in the input space from overlapping in the visualization) so that you can understand how the data is organized and perhaps identify unsuspected patterns.
Figure 1-9. Example of a t-SNE visualization highlighting semantic clusters
A related task is dimensionality reduction, in which the goal is to simplify the data without losing too much information. One way to do this is to merge several correlated features into one. For example, a car's mileage may be strongly correlated with its age, so the dimensionality reduction algorithm will merge them into one feature that represents the car's wear and tear. This is called feature extraction.
TIP
It is often a good idea to try to reduce the dimension of your training data using a dimensionality reduction algorithm before you feed it to another Machine Learning algorithm (such as a supervised learning algorithm). It will run much faster, the data will take up less disk and memory space, and in some cases it may also perform better.
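For instance, the mileage/age example could look like this with PCA. The car data is simulated, and for brevity this sketch skips feature scaling, which you would normally apply first:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Simulated cars: mileage is strongly correlated with age
age = rng.uniform(1, 20, size=200)
mileage = age * 10_000 + rng.normal(0, 5_000, size=200)
X = np.column_stack([age, mileage])

# Merge the two correlated features into a single "wear and tear" feature
pca = PCA(n_components=1)
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)                # (200, 1)
print(pca.explained_variance_ratio_)  # close to 1.0: little information lost
```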
Yet another important unsupervised task is anomaly detection: for example, detecting unusual credit card transactions to prevent fraud, catching manufacturing defects, or automatically removing outliers from a dataset before feeding it to another learning algorithm. The system is shown mostly normal instances during training, so it learns to recognize them; then, when it sees a new instance, it can tell whether it looks like a normal one or whether it is likely an anomaly (see Figure 1-10). A very similar task is novelty detection: it aims to detect new instances that look different from all instances in the training set. This requires having a very "clean" training set, devoid of any instance that you would like the algorithm to detect. For example, if you have thousands of pictures of dogs, and 1% of these pictures represent Chihuahuas, then a novelty detection algorithm should not treat new pictures of Chihuahuas as novelties. On the other hand, anomaly detection algorithms may consider these dogs as so rare and so different from other dogs that they would likely classify them as anomalies (no offense to Chihuahuas).
Figure 1-10. Anomaly detection
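Here is a minimal anomaly-detection sketch using Isolation Forest (one of the algorithms listed above), on synthetic data where a handful of points are deliberately placed far from the rest:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(1)
normal = rng.normal(0, 1, size=(300, 2))   # mostly normal instances
outliers = rng.uniform(6, 8, size=(5, 2))  # a few clearly unusual instances
X = np.vstack([normal, outliers])

iso = IsolationForest(contamination=0.02, random_state=1)
preds = iso.fit_predict(X)  # +1 = looks normal, -1 = flagged as an anomaly
print((preds == -1).sum())  # roughly 2% of instances get flagged
```

The contamination parameter tells the model what fraction of the data to treat as anomalous; the extreme points end up among those flagged.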
Finally, another common unsupervised task is association rule learning, in which the goal is to dig into large amounts of data and discover interesting relations between attributes. For example, suppose you own a supermarket. Running an association rule on your sales logs may reveal that people who purchase barbecue sauce and potato chips also tend to buy steak. Thus, you may want to place these items close to one another.
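The supermarket example can be sketched in pure Python by counting how often pairs of items appear in the same basket, which is the frequent-itemset counting at the heart of algorithms like Apriori and Eclat. The baskets below are invented:

```python
from collections import Counter
from itertools import combinations

# Hypothetical sales logs: one set of purchased items per checkout
baskets = [
    {"bbq sauce", "potato chips", "steak"},
    {"bbq sauce", "potato chips", "steak", "soda"},
    {"potato chips", "soda"},
    {"bbq sauce", "steak"},
]

# Count how often each pair of items is bought together
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

print(pair_counts.most_common(2))
```

Pairs with high counts relative to the counts of their individual items are candidates for association rules such as "barbecue sauce and potato chips imply steak".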
Semisupervised learning

Since labeling data is usually time-consuming and costly, you will often have plenty of unlabeled instances, and few labeled instances. Some algorithms can deal with data that's partially labeled. This is called semisupervised learning (Figure 1-11).
Figure 1-11. Semisupervised learning with two classes (triangles and squares): the unlabeled examples (circles) help classify a new instance (the cross) into the triangle class rather than the square class, even though it is closer to the labeled squares
Some photo-hosting services, such as Google Photos, are good examples of this. Once you upload all your family photos to the service, it automatically recognizes that the same person A shows up in photos 1, 5, and 11, while another person B shows up in photos 2, 5, and 7. This is the unsupervised part of the algorithm (clustering). Now all the system needs is for you to tell it who these people are. Just add one label per person and it is able to name everyone in every photo, which is useful for searching photos.
Most semisupervised learning algorithms are combinations of unsupervised and supervised algorithms. For example, deep belief networks (DBNs) are based on unsupervised components called restricted Boltzmann machines (RBMs) stacked on top of one another. RBMs are trained sequentially in an unsupervised manner, and then the whole system is fine-tuned using supervised learning techniques.
Reinforcement Learning

Reinforcement Learning is a very different beast. The learning system, called an agent in this context, can observe the environment, select and perform actions, and get rewards in return (or penalties in the form of negative rewards, as shown in Figure 1-12). It must then learn by itself what is the best strategy, called a policy, to get the most reward over time. A policy defines what action the agent should choose when it is in a given situation.
Figure 1-12. Reinforcement Learning
For example, many robots implement Reinforcement Learning algorithms to learn how to walk. DeepMind's AlphaGo program is also a good example of Reinforcement Learning: it made the headlines in May 2017 when it beat the world champion Ke Jie at the game of Go. It learned its winning policy by analyzing millions of games, and then playing many games against itself. Note that learning was turned off during the games against the champion; AlphaGo was just applying the policy it had learned.
Batch and Online Learning

Another criterion used to classify Machine Learning systems is whether or not the system can learn incrementally from a stream of incoming data.
Batch learning

In batch learning, the system is incapable of learning incrementally: it must be trained using all the available data. This will generally take a lot of time and computing resources, so it is typically done offline. First the system is trained, and then it is launched into production and runs without learning anymore; it just applies what it has learned. This is called offline learning.
If you want a batch learning system to know about new data (such as a new type of spam), you need to train a new version of the system from scratch on the full dataset (not just the new data, but also the old data), then stop the old system and replace it with the new one.
Fortunately, the whole process of training, evaluating, and launching a Machine Learning system can be automated fairly easily (as shown in Figure 1-3), so even a batch learning system can adapt to change. Simply update the data and train a new version of the system from scratch as often as needed.
This solution is simple and often works fine, but training using the full set of data can take many hours, so you would typically train a new system only every 24 hours or even just weekly. If your system needs to adapt to rapidly changing data (e.g., to predict stock prices), then you need a more reactive solution.
Also, training on the full set of data requires a lot of computing resources (CPU, memory space, disk space, disk I/O, network I/O, etc.). If you have a lot of data and you automate your system to train from scratch every day, it will end up costing you a lot of money. If the amount of data is huge, it may even be impossible to use a batch learning algorithm.
Finally, if your system needs to be able to learn autonomously and it has limited resources (e.g., a smartphone application or a rover on Mars), then carrying around large amounts of training data and taking up a lot of resources to train for hours every day is a showstopper.
Fortunately, a better option in all these cases is to use algorithms that are capable of learning incrementally.
Online learning

In online learning, you train the system incrementally by feeding it data instances sequentially, either individually or in small groups called mini-batches. Each learning step is fast and cheap, so the system can learn about new data on the fly, as it arrives (see Figure 1-13).
Figure 1-13. In online learning, a model is trained and launched into production, and then it keeps learning as new data comes in
Online learning is great for systems that receive data as a continuous flow (e.g., stock prices) and need to adapt to change rapidly or autonomously. It is also a good option if you have limited computing resources: once an online learning system has learned about new data instances, it does not need them anymore, so you can discard them (unless you want to be able to roll back to a previous state and "replay" the data). This can save a huge amount of space.
Online learning algorithms can also be used to train systems on huge datasets that cannot fit in one machine's main memory (this is called out-of-core learning). The algorithm loads part of the data, runs a training step on that data, and repeats the process until it has run on all of the data (see Figure 1-14).
WARNING
Out-of-core learning is usually done offline (i.e., not on the live system), so online learning can be a confusing name. Think of it as incremental learning.
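To make this concrete, here is a small incremental-learning sketch using Scikit-Learn's SGDRegressor, whose partial_fit method performs one learning step per mini-batch. The "stream" of data is simulated from y = 3x plus noise:

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(7)
sgd = SGDRegressor(learning_rate='constant', eta0=0.01, random_state=7)

# Simulate a stream of mini-batches arriving over time (true relation: y = 3x + noise)
for _ in range(200):
    X_batch = rng.uniform(-1, 1, size=(32, 1))
    y_batch = 3 * X_batch.ravel() + rng.normal(0, 0.1, size=32)
    sgd.partial_fit(X_batch, y_batch)  # one fast, cheap learning step

print(sgd.coef_)  # should be close to [3.]
```

Each batch could be discarded after its partial_fit call, which is exactly what makes this approach suitable for continuous data flows and out-of-core training.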
One important parameter of online learning systems is how fast they should adapt to changing data: this is called the learning rate. If you set a high learning rate, then your system will rapidly adapt to new data, but it will also tend to quickly forget the old data (you don't want a spam filter to flag only the latest kinds of spam it was shown). Conversely, if you set a low learning rate, the system will have more inertia; that is, it will learn more slowly, but it will also be less sensitive to noise in the new data or to sequences of nonrepresentative data points (outliers).
Figure 1-14. Using online learning to handle huge datasets
A big challenge with online learning is that if bad data is fed to the system, the system's performance will gradually decline. If it's a live system, your clients will notice. For example, bad data could come from a malfunctioning sensor on a robot, or from someone spamming a search engine to try to rank high in search results. To reduce this risk, you need to monitor your system closely and promptly switch learning off (and possibly revert to a previously working state) if you detect a drop in performance. You may also want to monitor the input data and react to abnormal data (e.g., using an anomaly detection algorithm).
Instance-Based Versus Model-Based Learning

One more way to categorize Machine Learning systems is by how they generalize. Most Machine Learning tasks are about making predictions. This means that given a number of training examples, the system needs to be able to make good predictions for (generalize to) examples it has never seen before. Having a good performance measure on the training data is good, but insufficient; the true goal is to perform well on new instances.
There are two main approaches to generalization: instance-based learning and model-based learning.
Instance-based learning

Possibly the most trivial form of learning is simply to learn by heart. If you were to create a spam filter this way, it would just flag all emails that are identical to emails that have already been flagged by users: not the worst solution, but certainly not the best.
Instead of just flagging emails that are identical to known spam emails, your spam filter could be programmed to also flag emails that are very similar to known spam emails. This requires a measure of similarity between two emails. A (very basic) similarity measure between two emails could be to count the number of words they have in common. The system would flag an email as spam if it has many words in common with a known spam email.
This is called instance-based learning: the system learns the examples by heart, then generalizes to new cases by using a similarity measure to compare them to the learned examples (or a subset of them). For example, in Figure 1-15 the new instance would be classified as a triangle because the majority of the most similar instances belong to that class.
Figure 1-15. Instance-based learning
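The word-overlap similarity measure described above can be written in a few lines of Python (the two emails are invented):

```python
def similarity(email_a: str, email_b: str) -> int:
    """A (very basic) similarity measure: the number of words two emails share."""
    return len(set(email_a.lower().split()) & set(email_b.lower().split()))

known_spam = "win a free prize now"
new_email = "claim your free prize now"
print(similarity(new_email, known_spam))  # → 3 (shared words: "free", "prize", "now")
```

An instance-based spam filter would compute this score against every stored spam example and flag the new email if enough scores are high.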
Model-based learning

Another way to generalize from a set of examples is to build a model of these examples and then use that model to make predictions. This is called model-based learning (Figure 1-16).
Figure 1-16. Model-based learning
For example, suppose you want to know if money makes people happy, so you download the Better Life Index data from the OECD's website and stats about gross domestic product (GDP) per capita from the IMF's website. Then you join the tables and sort by GDP per capita. Table 1-1 shows an excerpt of what you get.
Table 1-1. Does money make people happier?
Country          GDP per capita (USD)   Life satisfaction
Hungary          12,240                 4.9
Korea            27,195                 5.8
France           37,675                 6.5
Australia        50,962                 7.3
United States    55,805                 7.2
Let's plot the data for these countries (Figure 1-17).
Figure 1-17. Do you see a trend here?
There does seem to be a trend here! Although the data is noisy (i.e., partly random), it looks like life satisfaction goes up more or less linearly as the country's GDP per capita increases. So you decide to model life satisfaction as a linear function of GDP per capita. This step is called model selection: you selected a linear model of life satisfaction with just one attribute, GDP per capita (Equation 1-1).
Equation 1-1. A simple linear model

life_satisfaction = θ₀ + θ₁ × GDP_per_capita
This model has two model parameters, θ₀ and θ₁. By tweaking these parameters, you can make your model represent any linear function, as shown in Figure 1-18.
Figure 1-18. A few possible linear models
Before you can use your model, you need to define the parameter values θ₀ and θ₁. How can you know which values will make your model perform best? To answer this question, you need to specify a performance measure. You can either define a utility function (or fitness function) that measures how good your model is, or you can define a cost function that measures how bad it is. For Linear Regression problems, people typically use a cost function that measures the distance between the linear model's predictions and the training examples; the objective is to minimize this distance.
This is where the Linear Regression algorithm comes in: you feed it your training examples, and it finds the parameters that make the linear model fit best to your data. This is called training the model. In our case, the algorithm finds that the optimal parameter values are θ₀ = 4.85 and θ₁ = 4.91 × 10⁻⁵.
WARNING
Confusingly, the same word "model" can refer to a type of model (e.g., Linear Regression), to a fully specified model architecture (e.g., Linear Regression with one input and one output), or to the final trained model ready to be used for predictions (e.g., Linear Regression with one input and one output, using θ₀ = 4.85 and θ₁ = 4.91 × 10⁻⁵). Model selection consists in choosing the type of model and fully specifying its architecture. Training a model means running an algorithm to find the model parameters that will make it best fit the training data (and hopefully make good predictions on new data).
Now the model fits the training data as closely as possible (for a linear model), as you can see in Figure 1-19.
Figure 1-19. The linear model that fits the training data best
You are finally ready to run the model to make predictions. For example, say you want to know how happy Cypriots are, and the OECD data does not have the answer. Fortunately, you can use your model to make a good prediction: you look up Cyprus's GDP per capita, find $22,587, and then apply your model and find that life satisfaction is likely to be somewhere around 4.85 + 22,587 × 4.91 × 10⁻⁵ = 5.96.
To whet your appetite, Example 1-1 shows the Python code that loads the data, prepares it, creates a scatterplot for visualization, and then trains a linear model and makes a prediction.
Example 1-1. Training and running a linear model using Scikit-Learn

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import sklearn.linear_model

# Load the data
oecd_bli = pd.read_csv("oecd_bli_2015.csv", thousands=',')
gdp_per_capita = pd.read_csv("gdp_per_capita.csv", thousands=',', delimiter='\t',
                             encoding='latin1', na_values="n/a")

# Prepare the data
country_stats = prepare_country_stats(oecd_bli, gdp_per_capita)
X = np.c_[country_stats["GDP per capita"]]
y = np.c_[country_stats["Life satisfaction"]]

# Visualize the data
country_stats.plot(kind='scatter', x="GDP per capita", y='Life satisfaction')
plt.show()

# Select a linear model
model = sklearn.linear_model.LinearRegression()

# Train the model
model.fit(X, y)

# Make a prediction for Cyprus
X_new = [[22587]]  # Cyprus's GDP per capita
print(model.predict(X_new))  # outputs [[ 5.96242338]]
NOTE
If you had used an instance-based learning algorithm instead, you would have found that Slovenia has the closest GDP per capita to that of Cyprus ($20,732), and since the OECD data tells us that Slovenians' life satisfaction is 5.7, you would have predicted a life satisfaction of 5.7 for Cyprus. If you zoom out a bit and look at the two next-closest countries, you will find Portugal and Spain with life satisfactions of 5.1 and 6.5, respectively. Averaging these three values, you get 5.77, which is pretty close to your model-based prediction. This simple algorithm is called k-Nearest Neighbors regression (in this example, k = 3).
Replacing the Linear Regression model with k-Nearest Neighbors regression in the previous code is as simple as replacing these two lines:
import sklearn.linear_model
model = sklearn.linear_model.LinearRegression()
with these two:
import sklearn.neighbors
model = sklearn.neighbors.KNeighborsRegressor(n_neighbors=3)
If all went well, your model will make good predictions. If not, you may need to use more attributes (employment rate, health, air pollution, etc.), get more or better-quality training data, or perhaps select a more powerful model (e.g., a Polynomial Regression model).
In summary:
You studied the data.

You selected a model.

You trained it on the training data (i.e., the learning algorithm searched for the model parameter values that minimize a cost function).

Finally, you applied the model to make predictions on new cases (this is called inference), hoping that this model will generalize well.
This is what a typical Machine Learning project looks like. In Chapter 2 you will experience this firsthand by going through a project end to end.
We have covered a lot of ground so far: you now know what Machine Learning is really about, why it is useful, what some of the most common categories of ML systems are, and what a typical project workflow looks like. Now let's look at what can go wrong in learning and prevent you from making accurate predictions.