Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow
SECOND EDITION
Concepts, Tools, and Techniques to Build Intelligent Systems
Aurélien Géron
Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow by Aurélien Géron
Copyright © 2019 Aurélien Géron. All rights reserved.
Printed in Canada.
Published by O'Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.
O'Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional sales department: [email protected].
Editors: Rachel Roumeliotis and Nicole Tache
Production Editor: Kristen Brown
Copyeditor: Amanda Kersey
Proofreader: Rachel Head
Indexer: Judith McConville
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Rebecca Demarest
September 2019: Second Edition
Revision History for the Second Edition
2019-09-05: First Release
See http://oreilly.com/catalog/errata.csp?isbn=9781492032649 for release details.
The O'Reilly logo is a registered trademark of O'Reilly Media, Inc. Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, the cover image, and related trade dress are trademarks of O'Reilly Media, Inc.
The views expressed in this work are those of the author, and do not represent the publisher's views. While the publisher and the author have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the author disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.
978-1-492-03264-9
Preface
The Machine Learning Tsunami
In 2006, Geoffrey Hinton et al. published a paper showing how to train a deep neural network capable of recognizing handwritten digits with state-of-the-art precision (>98%). They branded this technique "Deep Learning." A deep neural network is a (very) simplified model of our cerebral cortex, composed of a stack of layers of artificial neurons. Training a deep neural net was widely considered impossible at the time, and most researchers had abandoned the idea in the late 1990s. This paper revived the interest of the scientific community, and before long many new papers demonstrated that Deep Learning was not only possible, but capable of mind-blowing achievements that no other Machine Learning (ML) technique could hope to match (with the help of tremendous computing power and great amounts of data). This enthusiasm soon extended to many other areas of Machine Learning.
A decade or so later, Machine Learning has conquered the industry: it is at the heart of much of the magic in today's high-tech products, ranking your web search results, powering your smartphone's speech recognition, recommending videos, and beating the world champion at the game of Go. Before you know it, it will be driving your car.
Machine Learning in Your Projects
So, naturally you are excited about Machine Learning and would love to join the party!
Perhaps you would like to give your homemade robot a brain of its own? Make it recognize faces? Or learn to walk around?
Or maybe your company has tons of data (user logs, financial data, production data, machine sensor data, hotline stats, HR reports, etc.), and more than likely you could unearth some hidden gems if you just knew where to look. With Machine Learning, you could accomplish the following and more:
Segment customers and find the best marketing strategy for each group.
Recommend products for each client based on what similar clients bought.
Detect which transactions are likely to be fraudulent.
Forecast next year's revenue.
Whatever the reason, you have decided to learn Machine Learning and implement it in your projects. Great idea!
Objective and Approach
This book assumes that you know close to nothing about Machine Learning. Its goal is to give you the concepts, tools, and intuition you need to implement programs capable of learning from data.
We will cover a large number of techniques, from the simplest and most commonly used (such as Linear Regression) to some of the Deep Learning techniques that regularly win competitions.
Rather than implementing our own toy versions of each algorithm, we will be using production-ready Python frameworks:
Scikit-Learn is very easy to use, yet it implements many Machine Learning algorithms efficiently, so it makes for a great entry point to learning Machine Learning.
TensorFlow is a more complex library for distributed numerical computation. It makes it possible to train and run very large neural networks efficiently by distributing the computations across potentially hundreds of multi-GPU (graphics processing unit) servers. TensorFlow (TF) was created at Google and supports many of its large-scale Machine Learning applications. It was open sourced in November 2015.
Keras is a high-level Deep Learning API that makes it very simple to train and run neural networks. It can run on top of either TensorFlow, Theano, or Microsoft Cognitive Toolkit (formerly known as CNTK). TensorFlow comes with its own implementation of this API, called tf.keras, which provides support for some advanced TensorFlow features (e.g., the ability to efficiently load data).
The book favors a hands-on approach, growing an intuitive understanding of Machine Learning through concrete working examples and just a little bit of theory. While you can read this book without picking up your laptop, I highly recommend you experiment with the code examples available online as Jupyter notebooks at https://github.com/ageron/handson-ml2.
Prerequisites
This book assumes that you have some Python programming experience and that you are familiar with Python's main scientific libraries, in particular NumPy, pandas, and Matplotlib.
Also, if you care about what's under the hood, you should have a reasonable understanding of college-level math as well (calculus, linear algebra, probabilities, and statistics).
If you don't know Python yet, http://learnpython.org/ is a great place to start. The official tutorial on Python.org is also quite good.
If you have never used Jupyter, Chapter 2 will guide you through installation and the basics: it is a powerful tool to have in your toolbox.
If you are not familiar with Python's scientific libraries, the provided Jupyter notebooks include a few tutorials. There is also a quick math tutorial for linear algebra.
Roadmap
This book is organized in two parts. Part I, The Fundamentals of Machine Learning, covers the following topics:
What Machine Learning is, what problems it tries to solve, and the main categories and fundamental concepts of its systems
The steps in a typical Machine Learning project
Learning by fitting a model to data
Optimizing a cost function
Handling, cleaning, and preparing data
Selecting and engineering features
Selecting a model and tuning hyperparameters using cross-validation
The challenges of Machine Learning, in particular underfitting and overfitting (the bias/variance trade-off)
The most common learning algorithms: Linear and Polynomial Regression, Logistic Regression, k-Nearest Neighbors, Support Vector Machines, Decision Trees, Random Forests, and Ensemble methods
Reducing the dimensionality of the training data to fight the "curse of dimensionality"
Other unsupervised learning techniques, including clustering, density estimation, and anomaly detection
Part II, Neural Networks and Deep Learning, covers the following topics:
What neural nets are and what they're good for
Building and training neural nets using TensorFlow and Keras
The most important neural net architectures: feedforward neural nets for tabular data, convolutional nets for computer vision, recurrent nets and long short-term memory (LSTM) nets for sequence processing, encoder/decoders and Transformers for natural language processing, autoencoders and generative adversarial networks (GANs) for generative learning
Techniques for training deep neural nets
How to build an agent (e.g., a bot in a game) that can learn good strategies through trial and error, using Reinforcement Learning
Loading and preprocessing large amounts of data efficiently
Training and deploying TensorFlow models at scale
The first part is based mostly on Scikit-Learn, while the second part uses TensorFlow and Keras.
CAUTION
Don't jump into deep waters too hastily: while Deep Learning is no doubt one of the most exciting areas in Machine Learning, you should master the fundamentals first. Moreover, most problems can be solved quite well using simpler techniques such as Random Forests and Ensemble methods (discussed in Part I). Deep Learning is best suited for complex problems such as image recognition, speech recognition, or natural language processing, provided you have enough data, computing power, and patience.
Changes in the Second Edition
This second edition has six main objectives:
1. Cover additional ML topics: more unsupervised learning techniques (including clustering, anomaly detection, density estimation, and mixture models); more techniques for training deep nets (including self-normalized networks); additional computer vision techniques (including Xception, SENet, object detection with YOLO, and semantic segmentation using R-CNN); handling sequences using convolutional neural networks (CNNs, including WaveNet); natural language processing using recurrent neural networks (RNNs), CNNs, and Transformers; and GANs.
2. Cover additional libraries and APIs (Keras, the Data API, TF-Agents for Reinforcement Learning) and training and deploying TF models at scale using the Distribution Strategies API, TF Serving, and Google Cloud AI Platform. Also briefly introduce TF Transform, TFLite, TF Addons/Seq2Seq, and TensorFlow.js.
3. Discuss some of the latest important results from Deep Learning research.
4. Migrate all TensorFlow chapters to TensorFlow 2, and use TensorFlow's implementation of the Keras API (tf.keras) whenever possible.
5. Update the code examples to use the latest versions of Scikit-Learn, NumPy, pandas, Matplotlib, and other libraries.
6. Clarify some sections and fix some errors, thanks to plenty of great feedback from readers.
Some chapters were added, others were rewritten, and a few were reordered. See https://homl.info/changes2 for more details on what changed in the second edition.
Other Resources
Many excellent resources are available to learn about Machine Learning. For example, Andrew Ng's ML course on Coursera is amazing, although it requires a significant time investment (think months).
There are also many interesting websites about Machine Learning, including of course Scikit-Learn's exceptional User Guide. You may also enjoy Dataquest, which provides very nice interactive tutorials, and ML blogs such as those listed on Quora. Finally, the Deep Learning website has a good list of resources to check out to learn more.
There are many other introductory books about Machine Learning. In particular:
Joel Grus's Data Science from Scratch (O'Reilly) presents the fundamentals of Machine Learning and implements some of the main algorithms in pure Python (from scratch, as the name suggests).
Stephen Marsland's Machine Learning: An Algorithmic Perspective (Chapman & Hall) is a great introduction to Machine Learning, covering a wide range of topics in depth with code examples in Python (also from scratch, but using NumPy).
Sebastian Raschka's Python Machine Learning (Packt Publishing) is also a great introduction to Machine Learning and leverages Python open source libraries (Pylearn 2 and Theano).
François Chollet's Deep Learning with Python (Manning) is a very practical book that covers a large range of topics in a clear and concise way, as you might expect from the author of the excellent Keras library. It favors code examples over mathematical theory.
Andriy Burkov's The Hundred-Page Machine Learning Book is very short and covers an impressive range of topics, introducing them in approachable terms without shying away from the math equations.
Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin's Learning from Data (AMLBook) is a rather theoretical approach to ML that provides deep insights, in particular on the bias/variance trade-off (see Chapter 4).
Stuart Russell and Peter Norvig's Artificial Intelligence: A Modern Approach, 3rd Edition (Pearson), is a great (and huge) book covering an incredible amount of topics, including Machine Learning. It helps put ML into perspective.
Finally, joining ML competition websites such as Kaggle.com will allow you to practice your skills on real-world problems, with help and insights from some of the best ML professionals out there.
Conventions Used in This Book
The following typographical conventions are used in this book:
Italic
Indicates new terms, URLs, email addresses, filenames, and file extensions.
Constant width
Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements and keywords.
Constant width bold
Shows commands or other text that should be typed literally by the user.
Constant width italic
Shows text that should be replaced with user-supplied values or by values determined by context.
TIP
This element signifies a tip or suggestion.
NOTE
This element signifies a general note.
WARNING
This element indicates a warning or caution.
Code Examples
There is a series of Jupyter notebooks full of supplemental material, such as code examples and exercises, available for download at https://github.com/ageron/handson-ml2.
Some of the code examples in the book leave out repetitive sections or details that are obvious or unrelated to Machine Learning. This keeps the focus on the important parts of the code and saves space to cover more topics. If you want the full code examples, they are all available in the Jupyter notebooks.
Note that when the code examples display some outputs, these code examples are shown with Python prompts (>>> and ...), as in a Python shell, to clearly distinguish the code from the outputs. For example, this code defines the square() function, then it computes and displays the square of 3:
>>> def square(x):
...     return x ** 2
...
>>> result = square(3)
>>> result
9
When code does not display anything, prompts are not used. However, the result may sometimes be shown as a comment, like this:
def square(x):
    return x ** 2

result = square(3)  # result is 9
Using Code Examples
This book is here to help you get your job done. In general, if example code is offered with this book, you may use it in your programs and documentation. You do not need to contact us for permission unless you're reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing a CD-ROM of examples from O'Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product's documentation does require permission.
We appreciate, but do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: "Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd Edition, by Aurélien Géron (O'Reilly). Copyright 2019 Aurélien Géron, 978-1-492-03264-9." If you feel your use of code examples falls outside fair use or the permission given above, feel free to contact us at [email protected].
O'Reilly Online Learning
NOTE
For almost 40 years, O'Reilly Media has provided technology and business training, knowledge, and insight to help companies succeed.
Our unique network of experts and innovators share their knowledge and expertise through books, articles, conferences, and our online learning platform. O'Reilly's online learning platform gives you on-demand access to live training courses, in-depth learning paths, interactive coding environments, and a vast collection of text and video from O'Reilly and 200+ other publishers. For more information, please visit http://oreilly.com.
How to Contact Us
Please address comments and questions concerning this book to the publisher:
O'Reilly Media, Inc.
1005 Gravenstein Highway North
Sebastopol, CA 95472
800-998-9938 (in the United States or Canada)
707-829-0515 (international or local)
707-829-0104 (fax)
We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at https://homl.info/oreilly2.
To comment or ask technical questions about this book, send email to [email protected].
For more information about our books, courses, conferences, and news, see our website at http://www.oreilly.com.
Find us on Facebook: http://facebook.com/oreilly
Follow us on Twitter: http://twitter.com/oreillymedia
Watch us on YouTube: http://www.youtube.com/oreillymedia
Acknowledgments
Never in my wildest dreams did I imagine that the first edition of this book would get such a large audience. I received so many messages from readers, many asking questions, some kindly pointing out errata, and most sending me encouraging words. I cannot express how grateful I am to all these readers for their tremendous support. Thank you all so very much! Please do not hesitate to file issues on GitHub if you find errors in the code examples (or just to ask questions), or to submit errata if you find errors in the text. Some readers also shared how this book helped them get their first job, or how it helped them solve a concrete problem they were working on. I find such feedback incredibly motivating. If you find this book helpful, I would love it if you could share your story with me, either privately (e.g., via LinkedIn) or publicly (e.g., in a tweet or through an Amazon review).
I am also incredibly thankful to all the amazing people who took time out of their busy lives to review my book with such care. In particular, I would like to thank François Chollet for reviewing all the chapters based on Keras and TensorFlow and giving me some great in-depth feedback. Since Keras is one of the main additions to this second edition, having its author review the book was invaluable. I highly recommend François's book Deep Learning with Python (Manning): it has the conciseness, clarity, and depth of the Keras library itself. Special thanks as well to Ankur Patel, who reviewed every chapter of this second edition and gave me excellent feedback, in particular on Chapter 9, which covers unsupervised learning techniques. He could write a whole book on the topic… oh, wait, he did! Do check out Hands-On Unsupervised Learning Using Python: How to Build Applied Machine Learning Solutions from Unlabeled Data (O'Reilly). Huge thanks as well to Olzhas Akpambetov, who reviewed all the chapters in the second part of the book, tested much of the code, and offered many great suggestions. I'm grateful to Mark Daoust, Jon Krohn, Dominic Monn, and Josh Patterson for reviewing the second part of this book so thoroughly and offering their expertise. They left no stone unturned and provided amazingly useful feedback.
While writing this second edition, I was fortunate enough to get plenty of help from members of the TensorFlow team, in particular Martin Wicke, who tirelessly answered dozens of my questions and dispatched the rest to the right people, including Karmel Allison, Paige Bailey, Eugene Brevdo, William Chargin, Daniel "Wolff" Dobson, Nick Felt, Bruce Fontaine, Goldie Gadde, Sandeep Gupta, Priya Gupta, Kevin Haas, Konstantinos Katsiapis, Viacheslav Kovalevskyi, Allen Lavoie, Clemens Mewald, Dan Moldovan, Sean Morgan, Tom O'Malley, Alexandre Passos, André Susano Pinto, Anthony Platanios, Oscar Ramirez, Anna Revinskaya, Saurabh Saxena, Ryan Sepassi, Jiri Simsa, Xiaodan Song, Christina Sorokin, Dustin Tran, Todd Wang, Pete Warden (who also reviewed the first edition), Edd Wilder-James, and Yuefeng Zhou, all of whom were tremendously helpful. Huge thanks to all of you, and to all other members of the TensorFlow team, not just for your help, but also for making such a great library! Special thanks to Irene Giannoumis and Robert Crowe of the TFX team for reviewing Chapters 13 and 19 in depth.
Many thanks as well to O'Reilly's fantastic staff, in particular Nicole Taché, who gave me insightful feedback and was always cheerful, encouraging, and helpful: I could not dream of a better editor. Big thanks to Michele Cronin as well, who was very helpful (and patient) at the start of this second edition, and to Kristen Brown, the production editor for the second edition, who saw it through all the steps (she also coordinated fixes and updates for each reprint of the first edition). Thanks as well to Rachel Monaghan and Amanda Kersey for their thorough copyediting (respectively for the first and second edition), and to Johnny O'Toole, who managed the relationship with Amazon and answered many of my questions. Thanks to Marie Beaugureau, Ben Lorica, Mike Loukides, and Laurel Ruma for believing in this project and helping me define its scope. Thanks to Matt Hacker and all of the Atlas team for answering all my technical questions regarding formatting, AsciiDoc, and LaTeX, and thanks to Nick Adams, Rebecca Demarest, Rachel Head, Judith McConville, Helen Monroe, Karen Montgomery, Rachel Roumeliotis, and everyone else at O'Reilly who contributed to this book.
I would also like to thank my former Google colleagues, in particular the YouTube video classification team, for teaching me so much about Machine Learning. I could never have started the first edition without them. Special thanks to my personal ML gurus: Clément Courbet, Julien Dubois, Mathias Kende, Daniel Kitachewsky, James Pack, Alexander Pak, Anosh Raj, Vitor Sessak, Wiktor Tomczak, Ingrid von Glehn, and Rich Washington. And thanks to everyone else I worked with at YouTube and in the amazing Google research teams in Mountain View. Many thanks as well to Martin Andrews, Sam Witteveen, and Jason Zaman for welcoming me into their Google Developer Experts group in Singapore, with the kind support of Soonson Kwon, and for all the great discussions we had about Deep Learning and TensorFlow. Anyone interested in Deep Learning in Singapore should definitely join their Deep Learning Singapore meetup. Jason deserves special thanks for sharing some of his TFLite expertise for Chapter 19!
I will never forget the kind people who reviewed the first edition of this book, including David Andrzejewski, Lukas Biewald, Justin Francis, Vincent Guilbeau, Eddy Hung, Karim Matrah, Grégoire Mesnil, Salim Sémaoune, Iain Smears, Michel Tessier, Ingrid von Glehn, Pete Warden, and of course my dear brother Sylvain. Special thanks to Haesun Park, who gave me plenty of excellent feedback and caught several errors while he was writing the Korean translation of the first edition of this book. He also translated the Jupyter notebooks into Korean, not to mention TensorFlow's documentation. I do not speak Korean, but judging by the quality of his feedback, all his translations must be truly excellent! Haesun also kindly contributed some of the solutions to the exercises in this second edition.
Last but not least, I am infinitely grateful to my beloved wife, Emmanuelle, and to our three wonderful children, Alexandre, Rémi, and Gabrielle, for encouraging me to work hard on this book. I'm also thankful to them for their insatiable curiosity: explaining some of the most difficult concepts in this book to my wife and children helped me clarify my thoughts and directly improved many parts of it. And they keep bringing me cookies and coffee! What more can one dream of?
1. Geoffrey E. Hinton et al., "A Fast Learning Algorithm for Deep Belief Nets," Neural Computation 18 (2006): 1527–1554.
2. Yann LeCun's deep convolutional neural networks had worked well for image recognition since the 1990s, although they were not as general-purpose.
Part I. The Fundamentals of Machine Learning
Chapter 1. The Machine Learning Landscape
When most people hear "Machine Learning," they picture a robot: a dependable butler or a deadly Terminator, depending on who you ask. But Machine Learning is not just a futuristic fantasy; it's already here. In fact, it has been around for decades in some specialized applications, such as Optical Character Recognition (OCR). But the first ML application that really became mainstream, improving the lives of hundreds of millions of people, took over the world back in the 1990s: the spam filter. It's not exactly a self-aware Skynet, but it does technically qualify as Machine Learning (it has actually learned so well that you seldom need to flag an email as spam anymore). It was followed by hundreds of ML applications that now quietly power hundreds of products and features that you use regularly, from better recommendations to voice search.
Where does Machine Learning start and where does it end? What exactly does it mean for a machine to learn something? If I download a copy of Wikipedia, has my computer really learned something? Is it suddenly smarter? In this chapter we will start by clarifying what Machine Learning is and why you may want to use it.
Then, before we set out to explore the Machine Learning continent, we will take a look at the map and learn about the main regions and the most notable landmarks: supervised versus unsupervised learning, online versus batch learning, instance-based versus model-based learning. Then we will look at the workflow of a typical ML project, discuss the main challenges you may face, and cover how to evaluate and fine-tune a Machine Learning system.
This chapter introduces a lot of fundamental concepts (and jargon) that every data scientist should know by heart. It will be a high-level overview (it's the only chapter without much code), all rather simple, but you should make sure everything is crystal clear to you before continuing on to the rest of the book. So grab a coffee and let's get started!
TIP
If you already know all the Machine Learning basics, you may want to skip directly to Chapter 2. If you are not sure, try to answer all the questions listed at the end of the chapter before moving on.
What Is Machine Learning?
Machine Learning is the science (and art) of programming computers so they can learn from data.
Here is a slightly more general definition:
[Machine Learning is the] field of study that gives computers the ability to learn without being explicitly programmed.
—Arthur Samuel, 1959
And a more engineering-oriented one:
A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.
—Tom Mitchell, 1997
Your spam filter is a Machine Learning program that, given examples of spam emails (e.g., flagged by users) and examples of regular (nonspam, also called "ham") emails, can learn to flag spam. The examples that the system uses to learn are called the training set. Each training example is called a training instance (or sample). In this case, the task T is to flag spam for new emails, the experience E is the training data, and the performance measure P needs to be defined; for example, you can use the ratio of correctly classified emails. This particular performance measure is called accuracy, and it is often used in classification tasks.
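The accuracy measure described above is simple enough to sketch in a few lines of plain Python. The predictions and labels below are invented purely for illustration:

```python
# Accuracy: the ratio of correctly classified emails, as defined above.
def accuracy(predicted_labels, true_labels):
    correct = sum(p == t for p, t in zip(predicted_labels, true_labels))
    return correct / len(true_labels)

# Hypothetical classifier outputs for five emails ("spam" or "ham")
predictions = ["spam", "ham", "ham", "spam", "ham"]
truth = ["spam", "ham", "spam", "spam", "ham"]
print(accuracy(predictions, truth))  # 4 of 5 correct -> 0.8
```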
If you just download a copy of Wikipedia, your computer has a lot more data, but it is not suddenly better at any task. Thus, downloading a copy of Wikipedia is not Machine Learning.
Why Use Machine Learning?
Consider how you would write a spam filter using traditional programming techniques (Figure 1-1):
1. First you would consider what spam typically looks like. You might notice that some words or phrases (such as "4U," "credit card," "free," and "amazing") tend to come up a lot in the subject line. Perhaps you would also notice a few other patterns in the sender's name, the email's body, and other parts of the email.
2. You would write a detection algorithm for each of the patterns that you noticed, and your program would flag emails as spam if a number of these patterns were detected.
3. You would test your program and repeat steps 1 and 2 until it was good enough to launch.
Figure 1-1. The traditional approach
Since the problem is difficult, your program will likely become a long list of complex rules, pretty hard to maintain.
In contrast, a spam filter based on Machine Learning techniques automatically learns which words and phrases are good predictors of spam by detecting unusually frequent patterns of words in the spam examples compared to the ham examples (Figure 1-2). The program is much shorter, easier to maintain, and most likely more accurate.
What if spammers notice that all their emails containing "4U" are blocked? They might start writing "For U" instead. A spam filter using traditional programming techniques would need to be updated to flag "For U" emails. If spammers keep working around your spam filter, you will need to keep writing new rules forever.
In contrast, a spam filter based on Machine Learning techniques automatically notices that "For U" has become unusually frequent in spam flagged by users, and it starts flagging them without your intervention (Figure 1-3).
Figure 1-2. The Machine Learning approach
Figure 1-3. Automatically adapting to change
Another area where Machine Learning shines is for problems that either are too complex for traditional approaches or have no known algorithm. For example, consider speech recognition. Say you want to start simple and write a program capable of distinguishing the words "one" and "two." You might notice that the word "two" starts with a high-pitch sound ("T"), so you could hardcode an algorithm that measures high-pitch sound intensity and use that to distinguish ones and twos, but obviously this technique will not scale to thousands of words spoken by millions of very different people in noisy environments and in dozens of languages. The best solution (at least today) is to write an algorithm that learns by itself, given many example recordings for each word.
Finally, Machine Learning can help humans learn (Figure 1-4). ML algorithms can be inspected to see what they have learned (although for some algorithms this can be tricky). For instance, once a spam filter has been trained on enough spam, it can easily be inspected to reveal the list of words and combinations of words that it believes are the best predictors of spam. Sometimes this will reveal unsuspected correlations or new trends, and thereby lead to a better understanding of the problem. Applying ML techniques to dig into large amounts of data can help discover patterns that were not immediately apparent. This is called data mining.
Figure 1-4. Machine Learning can help humans learn
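The core idea above, spotting words that are unusually frequent in spam relative to ham, can be sketched in a few lines of Python. The tiny corpus here is invented purely for illustration:

```python
# Toy corpora (invented for this sketch): count word frequencies in each class.
spam_emails = ["free credit card 4U", "amazing free offer 4U", "free money"]
ham_emails = ["meeting at noon", "project update attached", "lunch tomorrow?"]

def word_counts(emails):
    counts = {}
    for email in emails:
        for word in email.lower().split():
            counts[word] = counts.get(word, 0) + 1
    return counts

spam_counts = word_counts(spam_emails)
ham_counts = word_counts(ham_emails)

# Words far more frequent in spam than in ham make good spam predictors.
predictors = sorted(w for w, c in spam_counts.items()
                    if c > 2 * ham_counts.get(w, 0))
print(predictors)  # includes spam-heavy words such as "4u" and "free"
```

A real spam filter would of course use a proper learning algorithm rather than a hand-picked frequency threshold, but the principle, letting the data reveal the predictors, is the same.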
To summarize, Machine Learning is great for:
Problems for which existing solutions require a lot of fine-tuning or long lists of rules: one Machine Learning algorithm can often simplify code and perform better than the traditional approach.
Complex problems for which using a traditional approach yields no good solution: the best Machine Learning techniques can perhaps find a solution.
Fluctuating environments: a Machine Learning system can adapt to new data.
Getting insights about complex problems and large amounts of data.
Examples of Applications
Let's look at some concrete examples of Machine Learning tasks, along with the techniques that can tackle them:
Analyzing images of products on a production line to automatically classify them
This is image classification, typically performed using convolutional neural networks (CNNs; see Chapter 14).
Detecting tumors in brain scans
This is semantic segmentation, where each pixel in the image is classified (as we want to determine the exact location and shape of tumors), typically using CNNs as well.
Automatically classifying news articles
This is natural language processing (NLP), and more specifically text classification, which can be tackled using recurrent neural networks (RNNs), CNNs, or Transformers (see Chapter 16).
Automatically flagging offensive comments on discussion forums
This is also text classification, using the same NLP tools.
Summarizing long documents automatically
This is a branch of NLP called text summarization, again using the same tools.
Creating a chatbot or a personal assistant
This involves many NLP components, including natural language understanding (NLU) and question-answering modules.
Forecasting your company's revenue next year, based on many performance metrics
This is a regression task (i.e., predicting values) that may be tackled using any regression model, such as a Linear Regression or Polynomial Regression model (see Chapter 4), a regression SVM (see Chapter 5), a regression Random Forest (see Chapter 7), or an artificial neural network (see Chapter 10). If you want to take into account sequences of past performance metrics, you may want to use RNNs, CNNs, or Transformers (see Chapters 15 and 16).
Making your app react to voice commands
This is speech recognition, which requires processing audio samples: since they are long and complex sequences, they are typically processed using RNNs, CNNs, or Transformers (see Chapters 15 and 16).
Detecting credit card fraud
This is anomaly detection (see Chapter 9).
Segmenting clients based on their purchases so that you can design a different marketing strategy for each segment
This is clustering (see Chapter 9).
Representing a complex, high-dimensional dataset in a clear and insightful diagram
This is data visualization, often involving dimensionality reduction techniques (see Chapter 8).
Recommending a product that a client may be interested in, based on past purchases
This is a recommender system. One approach is to feed past purchases (and other information about the client) to an artificial neural network (see Chapter 10), and get it to output the most likely next purchase. This neural net would typically be trained on past sequences of purchases across all clients.
Building an intelligent bot for a game
This is often tackled using Reinforcement Learning (RL; see Chapter 18), which is a branch of Machine Learning that trains agents (such as bots) to pick the actions that will maximize their rewards over time (e.g., a bot may get a reward every time the player loses some life points), within a given environment (such as the game). The famous AlphaGo program that beat the world champion at the game of Go was built using RL.
This list could go on and on, but hopefully it gives you a sense of the incredible breadth and complexity of the tasks that Machine Learning can tackle, and the types of techniques that you would use for each task.
Types of Machine Learning Systems
There are so many different types of Machine Learning systems that it is useful to classify them in broad categories, based on the following criteria:
Whether or not they are trained with human supervision (supervised, unsupervised, semisupervised, and Reinforcement Learning)
Whether or not they can learn incrementally on the fly (online versus batch learning)
Whether they work by simply comparing new data points to known data points, or instead by detecting patterns in the training data and building a predictive model, much like scientists do (instance-based versus model-based learning)
These criteria are not exclusive; you can combine them in any way you like. For example, a state-of-the-art spam filter may learn on the fly using a deep neural network model trained using examples of spam and ham; this makes it an online, model-based, supervised learning system.
Let's look at each of these criteria a bit more closely.
Supervised/Unsupervised Learning
Machine Learning systems can be classified according to the amount and type of supervision they get during training. There are four major categories: supervised learning, unsupervised learning, semisupervised learning, and Reinforcement Learning.
Supervised learning
In supervised learning, the training set you feed to the algorithm includes the desired solutions, called labels (Figure 1-5).
Figure 1-5. A labeled training set for spam classification (an example of supervised learning)
A typical supervised learning task is classification. The spam filter is a good example of this: it is trained with many example emails along with their class (spam or ham), and it must learn how to classify new emails.
Another typical task is to predict a target numeric value, such as the price of a car, given a set of features (mileage, age, brand, etc.) called predictors. This sort of task is called regression (Figure 1-6). To train the system, you need to give it many examples of cars, including both their predictors and their labels (i.e., their prices).
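As a minimal sketch of such a regression task (the cars, mileages, and prices below are made up for illustration, not taken from the book's datasets), here is a Linear Regression model trained on two predictors, mileage and age:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Invented training examples: predictors (mileage, age) and labels (price in $).
X = np.array([[15_000, 1], [60_000, 4], [90_000, 7], [120_000, 9]])
y = np.array([22_000, 15_000, 11_000, 8_000])

model = LinearRegression()
model.fit(X, y)  # supervised learning: the labels are given during training
price = model.predict([[40_000, 3]])  # predict the price of an unseen car
print(round(price[0]))
```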
NOTEInMachineLearninganattributeisadatatype(e.g.,“mileage”),whileafeaturehasseveralmeanings,dependingonthecontext,butgenerallymeansanattributeplusitsvalue(e.g.,“mileage=15,000”).Manypeopleusethewordsattributeandfeatureinterchangeably.
Note that some regression algorithms can be used for classification as well, and vice versa. For example, Logistic Regression is commonly used for classification, as it can output a value that corresponds to the probability of belonging to a given class (e.g., 20% chance of being spam).
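As a minimal sketch of this idea, Scikit-Learn's LogisticRegression exposes such class probabilities via its predict_proba method. The tiny training set below is invented (a single made-up feature counting suspicious words per email, with label 1 meaning spam):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy training set: one feature (number of suspicious words per email);
# label 1 = spam, 0 = ham
X = np.array([[0], [1], [2], [8], [9], [10]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression()
clf.fit(X, y)

# predict_proba returns [P(ham), P(spam)] for each input
proba = clf.predict_proba([[9]])
print(proba)
```

An email with 9 suspicious words lands deep in the spam region, so the second probability is well above 50%.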
Figure 1-6. A regression problem: predict a value, given an input feature (there are usually multiple input features, and sometimes multiple output values)
Here are some of the most important supervised learning algorithms (covered in this book):
k-Nearest Neighbors

Linear Regression

Logistic Regression

Support Vector Machines (SVMs)

Decision Trees and Random Forests

Neural networks
Unsupervised learning

In unsupervised learning, as you might guess, the training data is unlabeled (Figure 1-7). The system tries to learn without a teacher.
Figure 1-7. An unlabeled training set for unsupervised learning
Here are some of the most important unsupervised learning algorithms (most of these are covered in Chapters 8 and 9):
Clustering

K-Means

DBSCAN

Hierarchical Cluster Analysis (HCA)

Anomaly detection and novelty detection

One-class SVM

Isolation Forest

Visualization and dimensionality reduction

Principal Component Analysis (PCA)

Kernel PCA

Locally Linear Embedding (LLE)

t-Distributed Stochastic Neighbor Embedding (t-SNE)

Association rule learning

Apriori

Eclat
For example, say you have a lot of data about your blog's visitors. You may want to run a clustering algorithm to try to detect groups of similar visitors (Figure 1-8). At no point do you tell the algorithm which group a visitor belongs to: it finds those connections without your help. For example, it might notice that 40% of your visitors are males who love comic books and generally read your blog in the evening, while 20% are young sci-fi lovers who visit during the weekends. If you use a hierarchical clustering algorithm, it may also subdivide each group into smaller groups. This may help you target your posts for each group.
Figure 1-8. Clustering
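The blog-visitor example above can be sketched with Scikit-Learn's K-Means. All the data below is synthetic, and the two visitor features ([age, visits per week]) are invented for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
# Synthetic visitors described by two made-up features: [age, visits per week]
visitors = np.vstack([
    rng.normal(loc=[38, 2], scale=[5, 0.5], size=(100, 2)),  # e.g., evening comic-book readers
    rng.normal(loc=[22, 6], scale=[3, 1.0], size=(100, 2)),  # e.g., weekend sci-fi lovers
])

# Note that we never tell the algorithm which group a visitor belongs to
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
labels = kmeans.fit_predict(visitors)
print(kmeans.cluster_centers_)
```

The algorithm recovers the two groups on its own; you would then inspect each cluster's centroid to interpret what kind of visitor it represents.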
Visualization algorithms are also good examples of unsupervised learning algorithms: you feed them a lot of complex and unlabeled data, and they output a 2D or 3D representation of your data that can easily be plotted (Figure 1-9). These algorithms try to preserve as much structure as they can (e.g., trying to keep separate clusters in the input space from overlapping in the visualization) so that you can understand how the data is organized and perhaps identify unsuspected patterns.
Figure 1-9. Example of a t-SNE visualization highlighting semantic clusters
A related task is dimensionality reduction, in which the goal is to simplify the data without losing too much information. One way to do this is to merge several correlated features into one. For example, a car's mileage may be strongly correlated with its age, so the dimensionality reduction algorithm will merge them into one feature that represents the car's wear and tear. This is called feature extraction.
TIP
It is often a good idea to try to reduce the dimension of your training data using a dimensionality reduction algorithm before you feed it to another Machine Learning algorithm (such as a supervised learning algorithm). It will run much faster, the data will take up less disk and memory space, and in some cases it may also perform better.
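For instance, the mileage/age example could look like this with PCA. The car data is simulated, and for brevity this sketch skips feature scaling, which you would normally apply first:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Simulated cars: mileage is strongly correlated with age
age = rng.uniform(1, 20, size=200)
mileage = age * 10_000 + rng.normal(0, 5_000, size=200)
X = np.column_stack([age, mileage])

# Merge the two correlated features into a single "wear and tear" feature
pca = PCA(n_components=1)
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)                # (200, 1)
print(pca.explained_variance_ratio_)  # close to 1.0: little information lost
```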
Yet another important unsupervised task is anomaly detection: for example, detecting unusual credit card transactions to prevent fraud, catching manufacturing defects, or automatically removing outliers from a dataset before feeding it to another learning algorithm. The system is shown mostly normal instances during training, so it learns to recognize them; then, when it sees a new instance, it can tell whether it looks like a normal one or whether it is likely an anomaly (see Figure 1-10). A very similar task is novelty detection: it aims to detect new instances that look different from all instances in the training set. This requires having a very "clean" training set, devoid of any instance that you would like the algorithm to detect. For example, if you have thousands of pictures of dogs, and 1% of these pictures represent Chihuahuas, then a novelty detection algorithm should not treat new pictures of Chihuahuas as novelties. On the other hand, anomaly detection algorithms may consider these dogs as so rare and so different from other dogs that they would likely classify them as anomalies (no offense to Chihuahuas).
Figure 1-10. Anomaly detection
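Here is a minimal anomaly-detection sketch using Isolation Forest (one of the algorithms listed above), on synthetic data where a handful of points are deliberately placed far from the rest:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(1)
normal = rng.normal(0, 1, size=(300, 2))   # mostly normal instances
outliers = rng.uniform(6, 8, size=(5, 2))  # a few clearly unusual instances
X = np.vstack([normal, outliers])

iso = IsolationForest(contamination=0.02, random_state=1)
preds = iso.fit_predict(X)  # +1 = looks normal, -1 = flagged as an anomaly
print((preds == -1).sum())  # roughly 2% of instances get flagged
```

The contamination parameter tells the model what fraction of the data to treat as anomalous; the extreme points end up among those flagged.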
Finally, another common unsupervised task is association rule learning, in which the goal is to dig into large amounts of data and discover interesting relations between attributes. For example, suppose you own a supermarket. Running an association rule on your sales logs may reveal that people who purchase barbecue sauce and potato chips also tend to buy steak. Thus, you may want to place these items close to one another.
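The supermarket example can be sketched in pure Python by counting how often pairs of items appear in the same basket, which is the frequent-itemset counting at the heart of algorithms like Apriori and Eclat. The baskets below are invented:

```python
from collections import Counter
from itertools import combinations

# Hypothetical sales logs: one set of purchased items per checkout
baskets = [
    {"bbq sauce", "potato chips", "steak"},
    {"bbq sauce", "potato chips", "steak", "soda"},
    {"potato chips", "soda"},
    {"bbq sauce", "steak"},
]

# Count how often each pair of items is bought together
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

print(pair_counts.most_common(2))
```

Pairs with high counts relative to the counts of their individual items are candidates for association rules such as "barbecue sauce and potato chips imply steak".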
Semisupervised learning

Since labeling data is usually time-consuming and costly, you will often have plenty of unlabeled instances, and few labeled instances. Some algorithms can deal with data that's partially labeled. This is called semisupervised learning (Figure 1-11).
Figure 1-11. Semisupervised learning with two classes (triangles and squares): the unlabeled examples (circles) help classify a new instance (the cross) into the triangle class rather than the square class, even though it is closer to the labeled squares
Some photo-hosting services, such as Google Photos, are good examples of this. Once you upload all your family photos to the service, it automatically recognizes that the same person A shows up in photos 1, 5, and 11, while another person B shows up in photos 2, 5, and 7. This is the unsupervised part of the algorithm (clustering). Now all the system needs is for you to tell it who these people are. Just add one label per person and it is able to name everyone in every photo, which is useful for searching photos.
Most semisupervised learning algorithms are combinations of unsupervised and supervised algorithms. For example, deep belief networks (DBNs) are based on unsupervised components called restricted Boltzmann machines (RBMs) stacked on top of one another. RBMs are trained sequentially in an unsupervised manner, and then the whole system is fine-tuned using supervised learning techniques.
Reinforcement Learning

Reinforcement Learning is a very different beast. The learning system, called an agent in this context, can observe the environment, select and perform actions, and get rewards in return (or penalties in the form of negative rewards, as shown in Figure 1-12). It must then learn by itself what is the best strategy, called a policy, to get the most reward over time. A policy defines what action the agent should choose when it is in a given situation.
Figure 1-12. Reinforcement Learning
For example, many robots implement Reinforcement Learning algorithms to learn how to walk. DeepMind's AlphaGo program is also a good example of Reinforcement Learning: it made the headlines in May 2017 when it beat the world champion Ke Jie at the game of Go. It learned its winning policy by analyzing millions of games, and then playing many games against itself. Note that learning was turned off during the games against the champion; AlphaGo was just applying the policy it had learned.
Batch and Online Learning

Another criterion used to classify Machine Learning systems is whether or not the system can learn incrementally from a stream of incoming data.
Batch learning

In batch learning, the system is incapable of learning incrementally: it must be trained using all the available data. This will generally take a lot of time and computing resources, so it is typically done offline. First the system is trained, and then it is launched into production and runs without learning anymore; it just applies what it has learned. This is called offline learning.
If you want a batch learning system to know about new data (such as a new type of spam), you need to train a new version of the system from scratch on the full dataset (not just the new data, but also the old data), then stop the old system and replace it with the new one.
Fortunately, the whole process of training, evaluating, and launching a Machine Learning system can be automated fairly easily (as shown in Figure 1-3), so even a batch learning system can adapt to change. Simply update the data and train a new version of the system from scratch as often as needed.
This solution is simple and often works fine, but training using the full set of data can take many hours, so you would typically train a new system only every 24 hours or even just weekly. If your system needs to adapt to rapidly changing data (e.g., to predict stock prices), then you need a more reactive solution.
Also, training on the full set of data requires a lot of computing resources (CPU, memory space, disk space, disk I/O, network I/O, etc.). If you have a lot of data and you automate your system to train from scratch every day, it will end up costing you a lot of money. If the amount of data is huge, it may even be impossible to use a batch learning algorithm.
Finally, if your system needs to be able to learn autonomously and it has limited resources (e.g., a smartphone application or a rover on Mars), then carrying around large amounts of training data and taking up a lot of resources to train for hours every day is a showstopper.
Fortunately, a better option in all these cases is to use algorithms that are capable of learning incrementally.
Online learning

In online learning, you train the system incrementally by feeding it data instances sequentially, either individually or in small groups called mini-batches. Each learning step is fast and cheap, so the system can learn about new data on the fly, as it arrives (see Figure 1-13).
Figure 1-13. In online learning, a model is trained and launched into production, and then it keeps learning as new data comes in
Online learning is great for systems that receive data as a continuous flow (e.g., stock prices) and need to adapt to change rapidly or autonomously. It is also a good option if you have limited computing resources: once an online learning system has learned about new data instances, it does not need them anymore, so you can discard them (unless you want to be able to roll back to a previous state and "replay" the data). This can save a huge amount of space.
Online learning algorithms can also be used to train systems on huge datasets that cannot fit in one machine's main memory (this is called out-of-core learning). The algorithm loads part of the data, runs a training step on that data, and repeats the process until it has run on all of the data (see Figure 1-14).
WARNING
Out-of-core learning is usually done offline (i.e., not on the live system), so online learning can be a confusing name. Think of it as incremental learning.
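To make this concrete, here is a small incremental-learning sketch using Scikit-Learn's SGDRegressor, whose partial_fit method performs one learning step per mini-batch. The "stream" of data is simulated from y = 3x plus noise:

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(7)
sgd = SGDRegressor(learning_rate='constant', eta0=0.01, random_state=7)

# Simulate a stream of mini-batches arriving over time (true relation: y = 3x + noise)
for _ in range(200):
    X_batch = rng.uniform(-1, 1, size=(32, 1))
    y_batch = 3 * X_batch.ravel() + rng.normal(0, 0.1, size=32)
    sgd.partial_fit(X_batch, y_batch)  # one fast, cheap learning step

print(sgd.coef_)  # should be close to [3.]
```

Each batch could be discarded after its partial_fit call, which is exactly what makes this approach suitable for continuous data flows and out-of-core training.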
One important parameter of online learning systems is how fast they should adapt to changing data: this is called the learning rate. If you set a high learning rate, then your system will rapidly adapt to new data, but it will also tend to quickly forget the old data (you don't want a spam filter to flag only the latest kinds of spam it was shown). Conversely, if you set a low learning rate, the system will have more inertia; that is, it will learn more slowly, but it will also be less sensitive to noise in the new data or to sequences of nonrepresentative data points (outliers).
Figure 1-14. Using online learning to handle huge datasets
A big challenge with online learning is that if bad data is fed to the system, the system's performance will gradually decline. If it's a live system, your clients will notice. For example, bad data could come from a malfunctioning sensor on a robot, or from someone spamming a search engine to try to rank high in search results. To reduce this risk, you need to monitor your system closely and promptly switch learning off (and possibly revert to a previously working state) if you detect a drop in performance. You may also want to monitor the input data and react to abnormal data (e.g., using an anomaly detection algorithm).
Instance-Based Versus Model-Based Learning

One more way to categorize Machine Learning systems is by how they generalize. Most Machine Learning tasks are about making predictions. This means that given a number of training examples, the system needs to be able to make good predictions for (generalize to) examples it has never seen before. Having a good performance measure on the training data is good, but insufficient; the true goal is to perform well on new instances.
There are two main approaches to generalization: instance-based learning and model-based learning.
Instance-based learning

Possibly the most trivial form of learning is simply to learn by heart. If you were to create a spam filter this way, it would just flag all emails that are identical to emails that have already been flagged by users: not the worst solution, but certainly not the best.
Instead of just flagging emails that are identical to known spam emails, your spam filter could be programmed to also flag emails that are very similar to known spam emails. This requires a measure of similarity between two emails. A (very basic) similarity measure between two emails could be to count the number of words they have in common. The system would flag an email as spam if it has many words in common with a known spam email.
This is called instance-based learning: the system learns the examples by heart, then generalizes to new cases by using a similarity measure to compare them to the learned examples (or a subset of them). For example, in Figure 1-15 the new instance would be classified as a triangle because the majority of the most similar instances belong to that class.
Figure 1-15. Instance-based learning
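The word-overlap similarity measure described above can be written in a few lines of Python (the two emails are invented):

```python
def similarity(email_a: str, email_b: str) -> int:
    """A (very basic) similarity measure: the number of words two emails share."""
    return len(set(email_a.lower().split()) & set(email_b.lower().split()))

known_spam = "win a free prize now"
new_email = "claim your free prize now"
print(similarity(new_email, known_spam))  # → 3 (shared words: "free", "prize", "now")
```

An instance-based spam filter would compute this score against every stored spam example and flag the new email if enough scores are high.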
Model-based learning

Another way to generalize from a set of examples is to build a model of these examples and then use that model to make predictions. This is called model-based learning (Figure 1-16).
Figure 1-16. Model-based learning
For example, suppose you want to know if money makes people happy, so you download the Better Life Index data from the OECD's website and stats about gross domestic product (GDP) per capita from the IMF's website. Then you join the tables and sort by GDP per capita. Table 1-1 shows an excerpt of what you get.
Table 1-1. Does money make people happier?
Country          GDP per capita (USD)   Life satisfaction
Hungary          12,240                 4.9
Korea            27,195                 5.8
France           37,675                 6.5
Australia        50,962                 7.3
United States    55,805                 7.2
Let's plot the data for these countries (Figure 1-17).
Figure 1-17. Do you see a trend here?
There does seem to be a trend here! Although the data is noisy (i.e., partly random), it looks like life satisfaction goes up more or less linearly as the country's GDP per capita increases. So you decide to model life satisfaction as a linear function of GDP per capita. This step is called model selection: you selected a linear model of life satisfaction with just one attribute, GDP per capita (Equation 1-1).
Equation 1-1. A simple linear model

life_satisfaction = θ₀ + θ₁ × GDP_per_capita
This model has two model parameters, θ₀ and θ₁. By tweaking these parameters, you can make your model represent any linear function, as shown in Figure 1-18.
Figure 1-18. A few possible linear models
Before you can use your model, you need to define the parameter values θ₀ and θ₁. How can you know which values will make your model perform best? To answer this question, you need to specify a performance measure. You can either define a utility function (or fitness function) that measures how good your model is, or you can define a cost function that measures how bad it is. For Linear Regression problems, people typically use a cost function that measures the distance between the linear model's predictions and the training examples; the objective is to minimize this distance.
This is where the Linear Regression algorithm comes in: you feed it your training examples, and it finds the parameters that make the linear model fit best to your data. This is called training the model. In our case, the algorithm finds that the optimal parameter values are θ₀ = 4.85 and θ₁ = 4.91 × 10⁻⁵.
WARNING
Confusingly, the same word "model" can refer to a type of model (e.g., Linear Regression), to a fully specified model architecture (e.g., Linear Regression with one input and one output), or to the final trained model ready to be used for predictions (e.g., Linear Regression with one input and one output, using θ₀ = 4.85 and θ₁ = 4.91 × 10⁻⁵). Model selection consists in choosing the type of model and fully specifying its architecture. Training a model means running an algorithm to find the model parameters that will make it best fit the training data (and hopefully make good predictions on new data).
Now the model fits the training data as closely as possible (for a linear model), as you can see in Figure 1-19.
Figure 1-19. The linear model that fits the training data best
You are finally ready to run the model to make predictions. For example, say you want to know how happy Cypriots are, and the OECD data does not have the answer. Fortunately, you can use your model to make a good prediction: you look up Cyprus's GDP per capita, find $22,587, and then apply your model and find that life satisfaction is likely to be somewhere around 4.85 + 22,587 × 4.91 × 10⁻⁵ = 5.96.
To whet your appetite, Example 1-1 shows the Python code that loads the data, prepares it, creates a scatterplot for visualization, and then trains a linear model and makes a prediction.
Example 1-1. Training and running a linear model using Scikit-Learn

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import sklearn.linear_model

# Load the data
oecd_bli = pd.read_csv("oecd_bli_2015.csv", thousands=',')
gdp_per_capita = pd.read_csv("gdp_per_capita.csv", thousands=',', delimiter='\t',
                             encoding='latin1', na_values="n/a")

# Prepare the data
country_stats = prepare_country_stats(oecd_bli, gdp_per_capita)
X = np.c_[country_stats["GDP per capita"]]
y = np.c_[country_stats["Life satisfaction"]]

# Visualize the data
country_stats.plot(kind='scatter', x="GDP per capita", y='Life satisfaction')
plt.show()

# Select a linear model
model = sklearn.linear_model.LinearRegression()

# Train the model
model.fit(X, y)

# Make a prediction for Cyprus
X_new = [[22587]]  # Cyprus's GDP per capita
print(model.predict(X_new))  # outputs [[ 5.96242338]]
NOTE
If you had used an instance-based learning algorithm instead, you would have found that Slovenia has the closest GDP per capita to that of Cyprus ($20,732), and since the OECD data tells us that Slovenians' life satisfaction is 5.7, you would have predicted a life satisfaction of 5.7 for Cyprus. If you zoom out a bit and look at the two next-closest countries, you will find Portugal and Spain with life satisfactions of 5.1 and 6.5, respectively. Averaging these three values, you get 5.77, which is pretty close to your model-based prediction. This simple algorithm is called k-Nearest Neighbors regression (in this example, k = 3).
Replacing the Linear Regression model with k-Nearest Neighbors regression in the previous code is as simple as replacing these two lines:
import sklearn.linear_model
model = sklearn.linear_model.LinearRegression()
with these two:
import sklearn.neighbors
model = sklearn.neighbors.KNeighborsRegressor(n_neighbors=3)
If all went well, your model will make good predictions. If not, you may need to use more attributes (employment rate, health, air pollution, etc.), get more or better-quality training data, or perhaps select a more powerful model (e.g., a Polynomial Regression model).
In summary:
You studied the data.

You selected a model.

You trained it on the training data (i.e., the learning algorithm searched for the model parameter values that minimize a cost function).

Finally, you applied the model to make predictions on new cases (this is called inference), hoping that this model will generalize well.
This is what a typical Machine Learning project looks like. In Chapter 2 you will experience this firsthand by going through a project end to end.
We have covered a lot of ground so far: you now know what Machine Learning is really about, why it is useful, what some of the most common categories of ML systems are, and what a typical project workflow looks like. Now let's look at what can go wrong in learning and prevent you from making accurate predictions.