Deploying Predictive Analytics: A Practitioner's Guide
October 13, 2016
Eric Just, Senior Vice President, and Levi Thatcher, Director of Data Science

[00:20]

[Eric Just] Before we get started, I wanted to begin with one of my favorite examples of practical analytics and how it influences our everyday life. I spend a lot of time traveling for my work, and I've really learned to appreciate the simple, meaningful, actionable predictive analytics that Uber provides. If you don't know what Uber is, it's a ride-sharing system.

Uber [00:31]

And it's kind of like being able to call a taxi from your phone. But it is more than that, because when you make a request from Uber, they know the GPS location of your device. They also know the GPS location of drivers who are around you. And Uber uses your GPS location and the GPS location of the drivers around you to estimate how long they think it will take to get [inaudible] [00:55]. I think it is now available as an Apple Watch application. So what you are seeing here is a picture of my watch yesterday, and at that time, Uber said it would take 5 minutes to get the driver to me, and it was all based on predictive analytics of my location, the location of the drivers around me, the routes between me and those drivers, the travel time, the time of day, and perhaps past travel times on those routes. And all of that information is put together in this simple number, 5 minutes, and this is great because it is super actionable. If I am happy with that number, I go ahead and hit that request and a driver shows up in about 5 minutes. If I'm not happy with that number, maybe it's too big, maybe I'll just cancel my request and call a cab, and I've done that before. It's just a great example of how Uber is taking all of this data about me and the drivers and compiling it into a very simple, actionable number that's delivered right to my watch.

Amazon and Netflix [01:55]

And it's not just Uber. We live in a world where predictive analytics is pervasive. So when you log in to Amazon or Netflix, this is what Amazon and Netflix think that I want to watch based on the viewing patterns of the account that I use. And interestingly, I think it is probably pretty obvious to many people that it's not necessarily me who is watching these videos. What is going on here is Amazon is taking my buying and viewing patterns, comparing them with users who have similar viewing patterns, and making suggestions based on those patterns. And what happens here is my kids log in on the weekends and they watch all sorts of cartoons, and Amazon and Netflix both think I have a strong interest in watching a (02:40) group of cartoon puppies solve problems, that's what PAW Patrol is, but the truth is I'm not really interested in these predictive analytics.

And an interesting thing to think about is one of the assumptions that we make about the underlying data as we use predictive analytics, and I think we'll ponder some of these questions about what it means for healthcare a little bit later in our presentation.

Poll Question #1: How important are predictive analytics for the future of healthcare? [03:04]

We are going to have another quick poll question. So, how important are predictive analytics for the future of healthcare? Not at all important, low importance, neutral, moderately important, extremely important, or unsure or not applicable.

[Tyler Morgan] Okay. Eric, while we're having everyone respond to those, I would like to apologize to everybody. We have a couple of audio issues. I think we've got this sorted out. It looks like we (03:28) software for us. We appreciate your patience with us.

Alright. Let's go ahead and share the results of our poll. We're showing 75 percent, extremely important. Eric, that is why they joined the webinar today.

[Eric Just] It sounds like a little bit of a selection bias, but I think it is important, and we will talk about how we can lower the barrier to doing predictive analytics in healthcare.

[Tyler Morgan] Alright. Just getting back to show that screen.

Predictive analytics is about using pattern recognition to predict future events but… [04:05]

[Eric Just] Alright. As we're moving to healthcare, first of all, at a high level, predictive analytics is about using pattern recognition, just like we talked about with the Amazon and Netflix examples, with patterns in that data that they are using to predict future events. We can apply that to healthcare, but it's really important to understand that predicting something is not good enough. You must have the data to act and intervene, and especially in healthcare, the organizational wherewithal to intervene. It's one thing to predict what videos I might want to view next or what things I might want to buy next. It's a different thing to start recommending care based on predictive analytics. So in healthcare, the stakes are higher but the rewards are potentially much greater, and it's important for an organization to buy into that risk balance and also to ensure that the analytics are incorporated in an appropriate way into the very complex operations of the healthcare organization. So it's definitely a different game.

What is Machine Learning? [05:07]

At this point, I wanted to lay out the definitions, and we have a few definition slides here in the presentation. I just wanted to clarify these as we move along because I will be using some jargon in the presentation, and I think it is good to get everybody on the same page about what that means.

So, we often hear machine learning and predictive analytics in the same breath, and sometimes they are even used synonymously. Machine learning explores the study and construction of algorithms that can learn from and make predictions on data. Within the field of analytics, machine learning is a method used to devise models that lend themselves to prediction. This is predictive analytics. So the way that I like to think about it is that machine learning is a technique that's used in predictive analytics. There are other ways to do predictive analytics, but machine learning is by far the most pervasive, popular, and growing method right now for predictive analytics. That is why you often hear them mentioned in the same breath.

Predictive Analytics in Healthcare: "Classic" Approaches [06:11]

Predictive analytics is not completely new to healthcare. Going all the way back to 1987, the Charlson Index is actually a predictive algorithm. It is designed to predict the mortality of a patient with multiple comorbidities. The Charlson Index was done by a group that took data from at-risk patients and classified their conditions into comorbid conditions, using a fairly narrow, fairly easy-to-get set of administrative data. Then they calculated those comorbid conditions and ranked them based on their severity, and they combined that comorbidity score with other information about the patients, such as their age, to develop a relative risk that the patient is going to die in the next 10 years. So it is a predictor of mortality and it has actually gained widespread popularity. We hear a lot about the Charlson Index still today.

The LACE Index is another example of predictive analytics. LACE is meant to predict readmissions, and the LACE group took data from all across the country and developed a model that predicts readmissions based on length of stay, acuity, comorbidities, and ER utilization. That's what LACE stands for. The group took data from a large number of different organizations contributing their data, and they developed this model that uses those inputs to determine the patient's risk of readmission. It also has gained widespread popularity, and we hear of a lot of organizations that are implementing LACE.
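As an illustration of how an index model like LACE turns a handful of easy-to-get inputs into a single score, here is a minimal Python sketch. The point values are the ones commonly cited for the published LACE index; they are not given in this talk, so treat them as an assumption to verify against the original publication. The function name `lace_score` and its parameter names are hypothetical.

```python
def lace_score(length_of_stay_days, acute_admission, charlson_index, ed_visits_6mo):
    """Return a LACE-style readmission risk score (0 to 19).

    Point values are the commonly cited ones (an assumption, not from the talk).
    """
    # L: length of stay in days
    if length_of_stay_days < 1:
        l_points = 0
    elif length_of_stay_days <= 3:
        l_points = int(length_of_stay_days)  # 1, 2, or 3 days -> 1, 2, or 3 points
    elif length_of_stay_days <= 6:
        l_points = 4
    elif length_of_stay_days <= 13:
        l_points = 5
    else:
        l_points = 7

    # A: acuity, 3 points if the admission was acute (emergent)
    a_points = 3 if acute_admission else 0

    # C: comorbidity, based on the Charlson index (4 or more scores 5 points)
    c_points = charlson_index if charlson_index <= 3 else 5

    # E: emergency department visits in the prior six months, capped at 4
    e_points = min(ed_visits_6mo, 4)

    return l_points + a_points + c_points + e_points
```

Under these assumed point values, `lace_score(5, True, 2, 1)` returns 10. Note that nothing in the score depends on why the patient was admitted, which is the limitation discussed next.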

What Has Happened Since 2010? [07:50]

The interesting thing about these models is that while they are very good, because they allow organizations without a deep machine learning capability to do predictive analytics and to do it on a small or easier-to-get set of data, there are problems with these models, and we are showing here two issues that have come out with this model, and these are just two of many. So, what you see in the top headline is the first citation. They were using LACE to predict readmissions for CHF patients, congestive heart failure patients. And in the next one, they were trying to use LACE to predict readmissions for older UK populations. And what both of the conclusions said was that LACE was not a good predictor for either of these specific populations. Part of the reason for that is that when the LACE model was created, they were using data from all different kinds of patients from all across the country. And we know, for example, that the factors that drive an appendectomy readmission are quite different from the factors that drive a congestive heart failure readmission. In the LACE model, all of those are mixed together, and as soon as you start using LACE to try to predict for a specific population, you lose your predictive value. So these general models, while they are helping to get people started, lose their predictive value as we start to look more and more into specific populations. And anybody who is working in healthcare today knows that we're doing a lot of that. We are looking into how we care for these specific populations. So those models don't hold up so well for that use case.

What Has Happened Since 2010? [09:21]

So what has happened since these models came out in 2010? First, we talked on the last slide about the limitations of those models. While they are good to get started, they are lacking in their ability to predict for specific populations. Next, data availability has grown a lot since 2010. We have been lucky enough to be a part of organizations that are investing in data warehouses. After the big investment in electronic health records, a lot more data became available. The premise of the index models is that they use a narrow set of data, but now organizations have access to much deeper repositories of data. Again, we have been lucky enough to be a part of that and see that play out.

There is also more advanced analytics capability. The basic understanding of how to use data to improve business processes and to improve care has taken a larger part of our national focus as well. Organizations are repeatedly using data to improve healthcare, and they are starting to ask that next level of question. We typically start with retrospective analytics: where have we gone wrong in the past, and how can we fix that moving forward? Now, organizations are asking more mature questions about how to get ahead of their problems, and predictive analytics are of course a big part of that.

And finally, we have much better machine learning tools since then. Even in a relatively short amount of time, there's been a huge explosion of open source tools and of online education that has helped to spread machine learning and how to do machine learning. And so, those better tools are also a part of this increased interest in machine learning-based predictive analytics.

Poll Question #2: What is the biggest barrier to implementing predictive analytics? [11:14]

So, we are going to ask another poll question. What is the biggest barrier to implementing predictive analytics? We are lacking the right people or skills; we do not have the right data or technical tools and infrastructure; we do not have the executive support or budget; past efforts have failed to show results; other; or unsure or not applicable.

[Tyler Morgan] Alright. We will give some time for folks to respond to the poll. And we would like to remind everyone, we have had a few questions about the slides. I would like to let everyone know we will be sending out an email after the event with links to the recorded on-demand webinar as well as the slides.

So let's go ahead and look at our results.

Results [12:01]

Okay. It looks like the top two responses are people or skills and the right data or technical tools. Hopefully those parts are well addressed within the rest of the presentation. Executive support or budget is also a very good factor, and we will hopefully have some information that can help convince the executives that this is a good thing to do as well.

Predictive analytics is easy (or at least easier!) [12:27]

So the main message of this presentation is that predictive analytics is easy. It's at least easier, and part of that is due to the explosion of tools. But what organizations are truly struggling with is making predictive analytics routine, pervasive, and actionable. And that is what we want to talk about today: how do we take predictive analytics and make it something that is easier to do and routine for an organization.

Typical 'Current State' for Predictive Analytics [12:57]

The typical 'current state' of predictive analytics is still not necessarily optimized for operationalization. What happens is you've got data scientists, and they may have read access to a data repository. The first thing that they do when they have a predictive model that they want to develop is write a really big query against that data source, because they need to get all of the data points they think they are going to need to make a prediction, and they know they are going to have to manipulate that data. So they write this really big SQL query and then they bring the results into their tool of choice. It could be Excel, it could be SAS, it could be R, but the idea is that they get all that data into their tool because that's where they feel comfortable manipulating data. And then they do that data manipulation. They get that data into a state that is ready to be used in a predictive model, and again, they are using a tool outside of their analytics environment to do this. Then they apply the tools and algorithms. So, for example, SAS, and we've got R and Python. All of these are tools that are available to take that data and turn it into a predictive model.

And then once they have developed a predictive model, there's a big question mark, two questions usually. Number one is how do we move this into production, and number two is how do we actually get it to improve care or enhance a decision. That is oftentimes a big question, and we have seen a bunch of times where a good predictive model is developed but is never really deployed.

Three Key Recommendations for Scaling Predictive Analytics [14:28]

Today, we would like to talk about three recommendations, and the rest of the presentation is structured around these three recommendations. Number one is to fully leverage your analytics environment, and we will talk about what that means. But in a nutshell, do not do a lot of data manipulation outside of your analytics environment, because then it becomes a silo and it is very difficult to re-use. Number two is to standardize tools and methods and create production-quality code that you feel comfortable putting into production. If you develop a good model, the logical next step is going to be to put it in production. So having really good code to do that is very important, and also having standard methods within your team.

And this one is last but it should really be first, because it is the most important of all of these points: deploy your models with a strategy for intervention. Make sure you know who is going to use the data to change what they are doing or to help make a decision, and how it is going to be presented to them. That is the most important point of all this, and we will talk about what that looks like a little bit later in the presentation.

Fully Leverage Your Analytics Environment [15:38]

So let's talk about fully leveraging your analytics environment.

What is a Feature? [15:42]

Here is another piece of jargon. In predictive analytics, a feature is simply an input parameter. Just think of it as an input to one of your data models, and in machine learning, we call it a feature. So when you hear me use the word 'feature', just think of the inputs to the model that I'm trying to generate a prediction from. And this definition is from Wikipedia.

Leverage Your Analytics Environment [16:09]

And let's think about what an analytics environment is. An analytics environment or a data warehouse, you can think of as almost chock-full of features. You've got a bunch of data there, but it is not always just sitting there in raw format. You've got things like clinical registries, you have comorbidity models, you have calculations on readmissions, length of stay, and other calculated fields. So all of these make great feature inputs to models. But it is really important to understand that read-only access is not enough. Data scientists and the folks who are generating predictive models need to be able to create their own features in the analytics environment. We will make a strong case for that here.

Polypharmacy Feature [16:51]

To illustrate the point, we are going to explore a polypharmacy feature, and this is a feature that we developed as an input to one of our models. One of our data scientists was developing a model for predicting complications in diabetic patients. And if you do not know what polypharmacy is, the New York Times here represents it as 'The Ever-Mounting Pile of Pills'. Quite simply put, it is the number of medications that a patient is on at any given point in time, and there are good examples in the literature of polypharmacy being a good predictor of specific outcomes. So this data scientist wanted to use a polypharmacy feature in his model.

Polypharmacy Data Mart [17:34]

When he looked at the medication data, though, it was a little bit messy, and I'm sure, given the number of data architects and analysts we have on the call, it should not come as a surprise that there is messy data underneath the hood in the data warehouse environment. What you see on the left-hand side is a table of medications. For every patient-medication pair, there's a start date and an end date: the date that a patient started a specific medication and the date they ended it.

You can see on the far right-hand side that there are several NULLs. So the end date is not known in some cases, and missing data can actually be very damaging to a predictive model. So, how do we clean that up? We've got to understand what's going on to create those NULL values. In some cases, the patient died before the end date, and in other cases the patient took a one-time dose where the end date was not put in because there was a single dose of the medication. And finally, there is yet another case where the patient just has not reached the end date yet. They are still on the medication.

So, understanding all of those business rules helped our data scientist fill in the missing end dates appropriately and create what you see on the right-hand side here, which is what an input to a predictive model looks like. It's very clean and it gives us that polypharmacy count. So for every patient encounter, for any point in time, we can easily tell how many medications a patient was on. And this is an example of what we call feature engineering.
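The three business rules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the row layout, the `single_dose` flag, and the function names are hypothetical, and the talk does not say how the original data mart identified one-time doses or deaths.

```python
from datetime import date

def fill_end_date(med, death_date, as_of):
    """Apply the three business rules to a medication row with a missing end date."""
    if med["end"] is not None:
        return med["end"]        # end date was recorded, nothing to fill
    if med.get("single_dose"):
        return med["start"]      # one-time dose: ends the day it starts
    if death_date is not None:
        return death_date        # patient died while on the medication
    return as_of                 # patient is still on the medication

def polypharmacy_count(meds, on_date, death_date=None, as_of=None):
    """Count medications active on `on_date` after filling missing end dates."""
    as_of = as_of or on_date
    count = 0
    for med in meds:
        end = fill_end_date(med, death_date, as_of)
        if med["start"] <= on_date <= end:
            count += 1
    return count

# Made-up rows for one patient (illustrative data only).
meds = [
    {"name": "metformin", "start": date(2016, 1, 1), "end": None},  # ongoing
    {"name": "lisinopril", "start": date(2016, 2, 1), "end": date(2016, 6, 1)},
    {"name": "ondansetron", "start": date(2016, 3, 10), "end": None, "single_dose": True},
]
print(polypharmacy_count(meds, date(2016, 3, 15)))
```

In a warehouse setting the same rules would more likely live in the ETL that builds the polypharmacy table, so that every model reads the cleaned count rather than re-deriving it.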

What is Feature Engineering? [19:11]

"Feature engineering is the process of transforming raw data into features that better represent the underlying problem to the predictive models, resulting in improved model accuracy." And feature engineering, in our opinion, is one of the most challenging and interesting parts of developing predictive models. It is also recognized by folks out there on the internet that "much of the success of machine learning is actually success in engineering features that a learner can understand." So, feature engineering is an absolutely critical part of data science and predictive analytics. We cannot underscore that point enough.

Other Examples of Feature Engineering [19:53]

Other examples of feature engineering, and I'm sure the data architects and data analysts on the phone will recognize how some of these things sound really simple but are actually a little bit more complicated to put together than you might think. So, the number of ER visits in the last year. Fairly simple. The number of line days that a patient is on. Sometimes the underlying data presents a challenge in calculating that. The number and types of comorbid conditions: how do you classify those comorbid conditions? Almost any input to a predictive model will need to be engineered in some way. And the ability for data scientists to engineer features is critical to the success of predictive analytics and the machine learning strategy.
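Even the "fairly simple" example hides definitional choices. The sketch below counts ER visits in the 365 days before an index date and deliberately excludes a visit on the index date itself. Both choices, along with the function name, are assumptions made for illustration; a real feature would follow whatever definition the team agrees on.

```python
from datetime import date, timedelta

def er_visits_last_year(visit_dates, index_date):
    """Count ER visits in the 365 days before (not including) index_date."""
    window_start = index_date - timedelta(days=365)
    return sum(1 for d in visit_dates if window_start <= d < index_date)

# Made-up visit history for one patient (illustrative data only).
visits = [date(2015, 10, 1), date(2016, 1, 15), date(2016, 9, 30), date(2016, 10, 13)]
print(er_visits_last_year(visits, date(2016, 10, 13)))
```

Here the first visit falls outside the window and the last one is the index visit, so only the two in between are counted.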

Fully Leverage Your Analytics Environment [20:36]

And remember, the point of this section is to fully leverage your analytics environment, and one of the main reasons why we say that is because the analytics environment is the best place to engineer features. The data scientist has to be able to promote efficient re-use of the engineered features, and that is one great example. So if we go back to that polypharmacy example, that polypharmacy table is now sitting in the data warehouse and available for other models to use. So by using the analytics environment to do our feature engineering and not doing it in a siloed tool, we are promoting re-use of all that great work.

Secondly, the data warehouse has standard tools to operationalize and run these on a nightly basis. We call them ETL, or Extract, Transform and Load, and those tools are very valuable in productionalizing that. So it becomes much easier to productionalize the features there than in a script in one of the machine learning languages.

Standard Tools and Methods Using Production-Quality Code [21:36]

Three Key Recommendations for Scaling Predictive Analytics [21:37]

So, going back to our three key recommendations, remember the first was to fully leverage the analytics environment, and the next is to standardize tools and methods using production-quality code.

You Need Lots of Smart People! [21:51]

As you start to put forth a data science, machine learning, predictive analytics strategy, you need lots of smart people to do this. This should not be a surprise. And the two roles that I want to talk about today are similar but different roles. So, the data scientist formulates hypotheses about features driving a predictive model. The data scientist is the one who is talking to clinicians and trying to understand the underlying causes of what we are trying to predict. The data scientist is doing what we call experiments and trying various models to determine the best approach for prediction. And the data scientist is assessing the model output, looking at the accuracy, and trying to decide on what the best approach is.

The machine learning engineer, like I said, is similar but different. The machine learning engineer has to have a lot of knowledge of data science, but one of the challenging things is to find somebody who has a knowledge of data science and a knowledge of software engineering best practices, because remember we talked to you about generating production-quality code. And one of the biggest impacts we have had on our group was when we hired Levi, who has got a great machine learning engineer approach. He understands the data science and he also has a knowledge of software engineering best practices, and that has really helped us to scale, and we will talk about what we mean by scale. But a machine learning engineer is a wonderful thing to have, and we will talk about the fruits of our machine learning engineering efforts a little bit later.

Predictive Analytics Processes [23:25]

And now, before we talk about what kind of code you need, I think it is good to talk a little bit about the predictive analytics processes and what it is that a data scientist is doing that we want to try to operationalize. And there are two pieces to this. One is a development process. Let's use the example of a readmission prediction. Let us say we are trying to develop a model to predict readmissions. The data scientist is going to first of all identify which patients were readmitted and which patients were not, because it is important to understand what the outcomes were. And then they are going to gather 30 to 40 feature inputs, and this is where hypothesis generation takes over. They are hypothesizing what the 30 or 40 most likely things to drive readmissions are.

That data set, those 30 to 40 input features and the outcome, is then split into two pieces. One we call the training set and one we call the test set. The training set is what we crunch all the numbers on, and that is where the model is generated from. The test set is what we use to measure the performance of that model. So it is important to hold back some data so that we can see how well our prediction would have done on predicting the items in the test set. And so, the data scientist is running multiple algorithms on that training set. They are looking at lots of different combinations of features and lots of different algorithms. And for each one of those, they are measuring the performance and deciding what the best model is.

It is an iterative process, so I've drawn this arrow going back to the beginning. Sometimes you need to go back to square one. But eventually you get to what you see in the orange box, where you've got the best algorithm and a smaller list of important features, usually around 10 or so. Once you've developed your model, you can then store those parameters for later use. Again, this development process is where the really intense computation is. We are looking at millions of records, crunching numbers, looking for patterns, and then extracting the patterns.
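The development loop described above (hold back a test set, fit several candidates on the training set, compare them on the held-back data) can be sketched with the standard library alone. Everything here is an illustrative assumption: the data is synthetic, the feature names are made up, and a one-feature threshold rule stands in for the real algorithms a data scientist would try.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle rows reproducibly and hold back a fraction as the test set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def accuracy(predict, rows):
    """Fraction of rows where predict(features) matches the recorded outcome."""
    return sum(predict(x) == y for x, y in rows) / len(rows)

def fit_threshold_rule(feature, train):
    """Toy 'algorithm': pick the cutoff on one feature with the best training accuracy."""
    cutoffs = sorted({x[feature] for x, _ in train})
    best = max(cutoffs, key=lambda c: accuracy(lambda x: int(x[feature] >= c), train))
    return lambda x: int(x[feature] >= best)

# Synthetic data: readmission is driven by prior admissions, not by age.
rows = []
for i in range(200):
    features = {"prior_admissions": i % 6, "age": 40 + (i * 7) % 50}
    rows.append((features, int(features["prior_admissions"] >= 3)))

train, test = train_test_split(rows)

# Fit each candidate on the training set, then measure it on the held-back test set.
scores = {}
for feature in ("prior_admissions", "age"):
    model = fit_threshold_rule(feature, train)
    scores[feature] = accuracy(model, test)
print(scores)
```

The key point is that the test rows never influence the choice of cutoff, so the test accuracy is an honest estimate of how each candidate would have done on patients it has not seen.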

But once we get to a model, the next step is to run the model. Running the model is what occurs every day, multiple times a day, and it is much less computationally intensive, using the output of the development process. Now, if we are going to do a readmission prediction, we do not need to crunch numbers on millions of patients or data. We have done that in the development process. Now, it's a matter of looking at who the patients are who just came in, getting those 10 important features on those, and, one record at a time, calculating that prediction and outputting it to the data warehouse. So, running the model is much less computationally intensive, but this is the part that gets put in production and is run every day, either as part of an ETL process or as part of a web service. We will talk about the different ways that it can be deployed. These are just the two different things that a machine learning code base should be able to address. In the development process, it is important to standardize on pieces of that, and for running the model, that's where we want to have really robust, tested code, so we can put it in production.
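To make the contrast with development concrete, here is a minimal sketch of the "run the model" step: load parameters that were stored earlier and score one incoming record. The talk does not name the algorithm, so a logistic model is assumed purely for illustration, and the parameter values and feature names are hypothetical.

```python
import json
import math

# Hypothetical parameters saved at the end of the development process.
saved = json.dumps({
    "intercept": -2.0,
    "weights": {"prior_admissions": 0.6, "length_of_stay": 0.1, "polypharmacy_count": 0.05},
})

def load_model(text):
    """Load stored model parameters (here from a JSON string)."""
    return json.loads(text)

def score_record(model, record):
    """Score one record: logistic function of a weighted sum of its features."""
    z = model["intercept"] + sum(
        weight * record[name] for name, weight in model["weights"].items()
    )
    return 1.0 / (1.0 + math.exp(-z))

model = load_model(saved)
new_patient = {"prior_admissions": 2, "length_of_stay": 5, "polypharmacy_count": 8}
risk = score_record(model, new_patient)
print(round(risk, 3))
```

Scoring is a handful of multiplications per patient, which is why it can run inside a nightly ETL job or behind a web service without the heavy computation that training needs.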

Developing a Machine Learning Code Base [26:33]

So, I want to address why you would want a code base or software to help you do this. There are a lot of tools out there that make it really easy to write some of these scripts, but it is important to focus the data scientist on the model development and not necessarily on writing the code.

The code is something that is standardizable. The data scientist's part is the questions that they are asking: what features do I use for input, how do I model those features in the database, how do I compare the performance of these two different models. That's the real value of a data scientist, not necessarily writing code or possibly reinventing a wheel that somebody in their department may have already built. So having a standard code base also allows a team of data scientists to standardize their methodologies. It is a real problem if your data scientists are using two different pieces of software to create their models, and even more of a problem if they are measuring the performance of their models in different ways. How are you ever going to know what the best model is if we are using different yardsticks? So that standardization piece is important here too: having that organizational code base that data scientists can use so that they are using the same methods.
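One way to read the "same yardstick" point is that the team's code base exposes a single evaluation function that every model is scored with. The sketch below uses the area under the ROC curve as that metric; the talk does not name a metric, so this choice and the function names are assumptions.

```python
def auc(y_true, y_score):
    """Area under the ROC curve: the chance that a randomly chosen positive
    case is scored higher than a randomly chosen negative one (ties count half)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def compare_models(y_true, scores_by_model):
    """Evaluate every candidate model with the same metric on the same outcomes."""
    return {name: auc(y_true, scores) for name, scores in scores_by_model.items()}

# Made-up test outcomes and model scores (illustrative data only).
outcomes = [0, 0, 1, 1]
results = compare_models(outcomes, {
    "model_a": [0.1, 0.4, 0.35, 0.8],
    "model_b": [0.5, 0.5, 0.5, 0.5],
})
print(results)
```

With these made-up scores, `model_a` gets 0.75 and the uninformative `model_b` gets 0.5, and because both numbers come from the same function on the same test outcomes they can be compared directly.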

And then finally, the point that I have made a bunch of times, and I probably will not make much more than this, is that putting models in production really requires that production-quality code. You do not want to put anything that might break into production.

Developing a Machine Learning Code Base: Best Practices [27:59]

And as we were developing our machine learning code base, we thought it was really important to adhere to software development best practices. Software development best practices are used in the software development world to solve a lot of these same problems. So, how do we create a robust, re-usable code base? One of the first things that we did was use version control. Version control is used by software developers and allows multiple developers to contribute code to a single repository. By keeping it as a single repository, many people can be editing the same code base at the same time, and there are tools to make sure that people do not step on each other's toes and that when there is a conflict, it can be resolved. So it is really important for teams of data scientists to have version control for their code base.

The other thing that is very important for this is unit testing. Unit testing has been used in the software development world for many, many years. The idea of unit testing is that as software becomes more modular and more re-used, it becomes a lot easier to accidentally break software. A good software code base is efficient and re-uses code, but you've got to make sure that as you make changes, your changes are not resulting in unexpected consequences. So unit testing basically tests all of the functions in your software to make sure that the output is as expected. So if I make a change to the software and I'm not sure how it is going to affect the rest of the software, if I run the unit tests and they all pass, I can be fairly confident that I haven't broken anything downstream. So these are some of the software development best practices that are required for having a good code base.
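As a small illustration of what a unit test looks like in practice, here is a sketch using Python's built-in `unittest` module. The function under test, `length_of_stay_days`, is a hypothetical feature calculation invented for this example and is not taken from the presenters' code base.

```python
import unittest
from datetime import date

def length_of_stay_days(admit, discharge):
    """Length of stay in whole days; rejects a discharge earlier than the admission."""
    if discharge < admit:
        raise ValueError("discharge date is before admit date")
    return (discharge - admit).days

class LengthOfStayTests(unittest.TestCase):
    def test_same_day_discharge_is_zero_days(self):
        self.assertEqual(length_of_stay_days(date(2016, 10, 13), date(2016, 10, 13)), 0)

    def test_multi_day_stay(self):
        self.assertEqual(length_of_stay_days(date(2016, 10, 10), date(2016, 10, 13)), 3)

    def test_discharge_before_admit_is_rejected(self):
        with self.assertRaises(ValueError):
            length_of_stay_days(date(2016, 10, 13), date(2016, 10, 10))
```

Saved in a file such as `test_features.py` (a hypothetical name), these tests can be run with `python -m unittest`. If someone later changes the function so that, say, same-day stays count as one day, the first test fails and flags the change before it reaches any model that depends on the feature.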

There are also things like documentation, how do we get people to find all of the functionality available in the software, and continuous integration. These are all best practices that we used in the development of our machine learning code base.

Developing a Machine Learning Code Base: Technology Choices [30:12]

So if you are going to embark on developing a machine learning code base, and please stay on for the entire presentation because we have good reasons why you might not want to develop your own, but if you are, there are a few technology choices out there. One is R, and R is a language that has been deployed and deeply entrenched in healthcare. I am sure most of the analysts and statisticians on the call are at least familiar with R. It has been around for a long time. And because of it being an analytics environment, it is more familiar to analysts and statisticians.

Python is another language that's out there. It is a fully functional software development language. The language itself is not new, but a lot of the tools that have been developed for machine learning are newer and there's lots of momentum behind Python. As a matter of fact, a lot of the online learning base in machine learning uses Python as the language in which a lot of the new data scientists are being trained. And Python is more familiar to software developers and data analysts because it is kind of a full-featured software programming language.

Azure ML is a Cloud-based solution from Microsoft. Because it is Cloud-based, it is very easy to set up and deploy. There is no installation required. You can just kind of create an Azure ML account and start creating models in the Cloud. Because it is Cloud-based and because we are in healthcare, the adoption of Azure ML is a little bit less than you might expect. I have read some stories about organizations that are leveraging Azure ML for predictive analytics in healthcare, and they have to de-identify and scrub their data before they put it into Azure for their models. And in one example that I read, they were working with dates and they had to mask the dates, and a date is actually an input to your predictive model. To me that's a little bit risky, to start manipulating dates to mask the data if you want to get a good predictive model. So for that reason, I think Azure ML has not seen the widespread adoption in healthcare that you might expect, at least compared with other industries.

There are plenty of other choices, but I think the industry right now is standardizing on R and Python, and that's where we put our efforts. We have developed software in both R and Python for our machine learning code base. And the reason why we chose that is R is probably more popular right now. There's support from major vendors, like Microsoft SQL Server. Python is more of the up-and-coming approach. So we want to be ready for both of those, and our clients have different preferences as well. So we addressed both of them.

Our Code Base Includes: [32:55]

So our code base includes tools for data ingestion. We have been talking a lot about how we leverage the analytics environment with our machine learning code. Well, we've got to be able to very quickly and easily get data out of that environment into our code base. So we have routines that load data from the database or a flat file. Date and time is important in machine learning. So we have tools that allow us to expand date/time into things like day of the week and week of the year, and make that really easy. Missing values can really complicate things and make predictions not very good. So we have a couple of routines for dealing with those in different ways, and by all means, the way that you deal with missing values is different for different models and different use cases. So you want to provide functions for that.
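As a rough illustration of the two preparation steps just described, the sketch below expands a timestamp into calendar features and shows two simple ways of handling missing values. The function names are hypothetical and use only the Python standard library; they are not the routines from the code base in the talk.

```python
from datetime import datetime
from statistics import mean

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def expand_datetime(ts: datetime) -> dict:
    """Turn one timestamp into several model-friendly features."""
    _, iso_week, _ = ts.isocalendar()
    return {
        "day_of_week": DAY_NAMES[ts.weekday()],  # weekday(): Monday == 0
        "week_of_year": iso_week,                # ISO 8601 week number
        "month": ts.month,
        "hour": ts.hour,
    }


def impute_mean(values: list) -> list:
    """Strategy 1: replace missing entries (None) with the observed mean."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def drop_incomplete(rows: list) -> list:
    """Strategy 2: discard any row (dict) that has a missing value."""
    return [row for row in rows if None not in row.values()]


if __name__ == "__main__":
    print(expand_datetime(datetime(2016, 10, 13, 9, 30)))
    print(impute_mean([1.0, None, 3.0]))
    print(drop_incomplete([{"age": 70, "bmi": None}, {"age": 55, "bmi": 24.1}]))
```

Which strategy is appropriate depends on the model and the use case, which is why it helps to keep them as separate functions.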

We also provide a large toolset around model development. This is the number crunching that we talked about in that workflow of the data scientist. So, splitting that data between test and training. Doing feature selection: how do we get from 40 features down to 10 features? And then of course the machine learning algorithms themselves: what are we running on the data? Random Forest is a very popular algorithm, Lasso is a regression-based method, and then Mixed Models are the kind that make (34:12) longitudinal data in healthcare easier, and K-means clustering, which we will be using extensively next year in our code base as well. And then all the tools to evaluate that performance and help the data scientist decide what's my best model.
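The first step in that list, splitting data between training and test sets, can be sketched in a few lines. This is an assumption-laden illustration (the function name, the 20 percent default, and the fixed seed are all choices made here, not details from the talk).

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle rows reproducibly and hold out a fraction for testing."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)                      # copy so the input is untouched
    random.Random(seed).shuffle(shuffled)      # seeded for repeatable splits
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    train, test = train_test_split(range(10), test_fraction=0.2)
    print(len(train), "training rows;", len(test), "test rows")
```

Fixing the seed means two people running the same script evaluate their models on the same held-out rows, which makes comparisons between models fair.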

In addition to the development tools, we have analysis tools: how do we generate our performance report for the models that we're creating? And then tools to help with trend identification and to perform risk-adjusted comparisons are part of the analysis suite in our code base.

Scaling People [34:48]

The good thing about the software is that it has really helped us to scale people. And when we think about what the big challenges are in data science, and this has come up over and over again, the big challenge is that feature engineering piece and how we represent the data. And it turns out, data architects have great domain knowledge of how to do that. They have been moving data in healthcare, analyzing data, and developing the routines to transform data into usable formats for years. They are also often looking for opportunities to advance their career and skills. And what we found is that, given the right tools, data architects make incredible feature engineers. Given their years and years of experience in manipulating data, we are just applying them to a different problem and it works really well. And then what our code has done is it has allowed data architects to easily get started in actually running predictive analytics algorithms.

And this is a quote from one of our data architects who was using our software to create a predictive model in one of his products. And this is Peter Monaco, and he said, "One awesome thing about the output from the R package you put together is the output aligns perfectly with creating Patient Stratification algorithms. The fact that I feel comfortable running this stuff speaks to how easy you have made it. Thanks again, Levi." And he's thanking Levi, who you are going to hear from in a little bit. But this is great. It allowed Peter to do what he does really well, get the data in a good format, and it lowered the barrier for him to actually run these algorithms and do some of the work that a data scientist does. So we see it as very promising for helping us to scale our machine learning efforts across a large number of people in the organization.

Putting Predictive Models in Production [36:37]

Modality #1: Extract, Transform, Load (ETL) Process [36:37]

So now it comes time to put models in production, and we are going to talk first about how we move them into production from a technical standpoint and then how we move them into an application or view that can actually change business processes or, better yet, provide care for patients.

So Modality #1 is to put them in production leveraging the ETL process. And this is appropriate if the prediction is not based on highly dynamic data, or if the intervention strategy is okay with some level of latency. So, an example of this would be a readmission prediction. Typically, readmission algorithms are not based on highly dynamic data. They are based on data that is not changing super fast. So if we are pulling data on a nightly basis or every 12 hours, a readmission algorithm is generally going to be okay with that. And in this case, we just put the machine learning code in the middle of the ETL process. So, we've got ETL to load the data sources. We've got ETL that data scientists or data architects create that loads those engineered features, the inputs to a predictive model, and then we run that code that can easily grab those features from the database and output a prediction to the database. Our machine learning code can also write these predictions to the database, and this is how we have deployed several models. It is easy and it just wraps right up with the ETL process.
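The batch pattern described here (read engineered features from the database, score them, write predictions back) can be sketched as one step in a nightly job. Everything specific below is an assumption made for illustration: the table and column names, the in-memory SQLite database standing in for the warehouse, and the made-up logistic coefficients, which are not a real readmission model.

```python
import math
import sqlite3

# Hypothetical coefficients for illustration only; not a validated model.
INTERCEPT = -2.0
WEIGHTS = {"prior_admissions": 0.5, "length_of_stay": 0.1}


def score(features: dict) -> float:
    """Logistic score in (0, 1) from the illustrative coefficients."""
    z = INTERCEPT + sum(WEIGHTS[name] * features[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


def run_scoring_step(conn: sqlite3.Connection) -> int:
    """ETL step: read feature rows, score each, write predictions back."""
    rows = conn.execute(
        "SELECT patient_id, prior_admissions, length_of_stay FROM features"
    ).fetchall()
    predictions = [
        (pid, score({"prior_admissions": pa, "length_of_stay": los}))
        for pid, pa, los in rows
    ]
    conn.execute("DELETE FROM predictions")  # replace the previous run
    conn.executemany(
        "INSERT INTO predictions (patient_id, risk) VALUES (?, ?)", predictions
    )
    conn.commit()
    return len(predictions)


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory database standing in for the warehouse."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE features (patient_id INTEGER PRIMARY KEY, "
        "prior_admissions INTEGER, length_of_stay INTEGER)"
    )
    conn.execute(
        "CREATE TABLE predictions (patient_id INTEGER PRIMARY KEY, risk REAL)"
    )
    conn.executemany(
        "INSERT INTO features VALUES (?, ?, ?)", [(1, 0, 2), (2, 3, 10)]
    )
    return conn


if __name__ == "__main__":
    db = build_demo_db()
    print(run_scoring_step(db), "patients scored")
    print(db.execute("SELECT * FROM predictions ORDER BY patient_id").fetchall())
```

Because the step only talks to the database, it can be scheduled right after the feature-loading ETL, and downstream applications read the predictions table like any other.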

Modality #2: Web Services [38:07]

Modality #2 is when the data is more dynamic. So an example of this would be sepsis early detection, where we are looking at changes in vital signs, and for the intervention strategy, we cannot wait up to 24 hours to intervene when sepsis has happened. It is something we need to intervene faster on. So in this case, we can deploy the predictive algorithm as a web service, and the web service is receiving real-time features. So those changes in vital signs, that's going to come in from a very dynamic setting. You might still be using some historic features, like what are the demographics, what is the age of the patient, that we can pull from the EDW. And then, the web service will be combining that live input with that historic data to run that machine learning code and then output the prediction back into the application. So this is definitely designed for more dynamic situations and more dynamic predictions.
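The core of such a web service is a handler that merges the live input with stored historic features and returns a prediction. The sketch below shows only that handler, with JSON in and JSON out, so it could sit behind any HTTP framework. The patient IDs, the feature names, the dictionary standing in for the EDW lookup, and the weights are all hypothetical; this is not a validated sepsis model.

```python
import json
import math

# Hypothetical stand-in for historic features pulled from the EDW.
HISTORIC_FEATURES = {"p-001": {"age": 72}, "p-002": {"age": 35}}

# Illustrative weights only.
INTERCEPT = -6.0
WEIGHTS = {"age": 0.03, "heart_rate": 0.02, "temp_c_delta": 0.8}


def predict_risk(features: dict) -> float:
    """Logistic score in (0, 1) from the illustrative weights."""
    z = INTERCEPT + sum(WEIGHTS[name] * features[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


def handle_request(body: str) -> str:
    """Combine live vitals from the request with historic features."""
    payload = json.loads(body)
    patient_id = payload["patient_id"]
    historic = HISTORIC_FEATURES.get(patient_id)
    if historic is None:
        return json.dumps({"error": "unknown patient"})
    features = {**historic, **payload["vitals"]}  # live values are added last
    risk = round(predict_risk(features), 3)
    return json.dumps({"patient_id": patient_id, "risk": risk})


if __name__ == "__main__":
    request = json.dumps(
        {"patient_id": "p-001",
         "vitals": {"heart_rate": 110, "temp_c_delta": 1.5}}
    )
    print(handle_request(request))
```

Keeping the handler free of any server code makes it easy to unit test and to move between hosting options.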

Scaling Predictive Analytics [39:08]

Deploy With a Strategy for Intervention [39:08]

So, going back to this point of deploying with a strategy for intervention, I would call this the most important point in the presentation. So, the idea here is how do we deploy and get these predictions to actually impact care.

Case Study: Central Line-Associated Bloodstream Infection (CLABSI) [39:26]

I'll talk about a little case study that we did with one of our clients on Central Line-Associated Bloodstream Infections, or CLABSIs. Approximately 41,000 patients actually end up with this condition (41,000 patients in the US per year, that should read), and one in four patients that get a CLABSI die. So it is a very serious condition, and organizations are really struggling to keep up with this. There are great guidelines out there, evidence-based guidelines, for how to care for patients so as to reduce the likelihood of a CLABSI, and we worked with a client to develop retrospective analytics to look at their compliance. And it really helped highlight some problems, and they got really good at using the data to find problem areas and then developing interventions to fix those. So they developed a muscle memory of using data to improve their care and business processes. Then they said, okay, take us to the next step. Now we do not want to know where we failed, we want to know what is coming next: who are the patients at high risk for CLABSI, so that we can intervene with them? And so, they came to Levi, and Levi and his team developed a predictive algorithm based on 16 features that predicts the likelihood that a patient is going to develop CLABSI. We will see what that looks like in a minute.

Model Performance Report [40:40]

It is important that every model that we develop and every model that we deploy comes with a performance report, and this performance report is not a highly technical report. And the idea here is that we are trying to briefly summarize what we are trying to predict, the variables or feature inputs that were considered in developing that model, what model we chose, and the accuracy of that model. And this report is not meant for technical people; it is used by business or clinical people to help them understand that algorithm. Communication about what an algorithm does is extremely critical to the adoption of that model.

Discussing Models With Clinicians [41:22]

When discussing models with clinicians, clinicians will adopt predictive analytics insofar as they understand it. If they do not understand what is going on, it is going to be a much harder conversation, because if you think about what clinicians do, they are running predictive algorithms in their head all day. They are looking at large amounts of patient data and boiling that down to hypotheses or conclusions about these patients. What we try to do with machine learning is standardize that. Typically, doctors do not all do that in the same way. So we help them to standardize it. And if they understand how we are doing it, and it is close to what they are doing, they are much more likely to adopt it.

The other point that we should make when talking about deploying predictive models is that complexity comes at a price. And it comes at an extreme price sometimes. So, a regression model can often strike a balance between predictive value and interpretability. Regression models are one way to do pretty good analytics, and there are more sophisticated machine learning algorithms that are much harder to explain. So if you have two models, a regression model and a more advanced model that might have marginally better accuracy, the regression model still may be the more favorable model to deploy because it is easier to explain. And in our code, we use a process called regularization that penalizes complexity. So the Lasso algorithm is a regression algorithm that actually has built into it the ability for the model to tune itself and to favor a simpler model. And the way that Wikipedia says it is that it enhances the prediction accuracy and the interpretability of the model. So, super, super important. Complexity comes at a price, especially in a clinical setting, for organizations that are new to predictive analytics.
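To show how a Lasso penalty favors a simpler model, here is a small pure-Python sketch of Lasso fitted by coordinate descent with soft-thresholding. It assumes centered features and response (no intercept) and minimizes (1/(2n)) * sum of squared residuals + penalty * sum of |coefficient|. The toy data are invented for illustration; a production code base would normally rely on an established library rather than this loop.

```python
def soft_threshold(value: float, penalty: float) -> float:
    """Shrink value toward zero by penalty; values within it become 0."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, penalty, n_iter=100):
    """Fit Lasso coefficients for centered data by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual that ignores feature j.
            rho = 0.0
            for i in range(n):
                partial = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
            rho /= n
            scale = sum(X[i][j] ** 2 for i in range(n)) / n
            beta[j] = soft_threshold(rho, penalty) / scale if scale > 0 else 0.0
    return beta


# Toy data: feature 0 drives the outcome, feature 1 contributes very little.
X = [[-2.0, 1.0], [-1.0, -1.0], [0.0, 0.0], [1.0, -1.0], [2.0, 1.0]]
y = [2.0 * a + 0.1 * b for a, b in X]

if __name__ == "__main__":
    print("no penalty:  ", lasso_coordinate_descent(X, y, penalty=0.0))
    print("with penalty:", lasso_coordinate_descent(X, y, penalty=0.5))
```

With no penalty the fit recovers both coefficients; with the penalty, the weak feature's coefficient is set exactly to zero and the strong one is shrunk, leaving a one-variable model that is easier to explain.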

What Does It Look Like? [43:09]

And what does it all look like? Now, this is the punchline. This is what predictive analytics looks like when it is put in front of clinicians and they are given the ability to make decisions based on it. And this is like Uber telling me it is five minutes to a car on my watch. And what you see here is a unit scoreboard for a client who was working on CLABSI, and I have to say this was generated on a scrubbed, de-identified dataset, but it fairly accurately reflects what the client was seeing in their environment. So I just wanted to make it clear that this is all de-identified data. But what you see here is, in that green box, there are 12 patients who are on an active list for CLABSI on this unit. And the unit reviewed this dashboard multiple times per day.

And what you see below in this list of patients here is the name of the patient, again (44:08), and the probability that they will actually develop a CLABSI, and the highest-risk patients are at the top here so that the eye goes immediately to the patients who have the highest risk. This top patient has a 64 percent chance of developing a Central Line-Associated Bloodstream Infection.

And the important thing here is not to just leave that number, but what goes on in this far right-hand column. This far right-hand column shows the risk factors that are driving that prediction. And moreover, it is not just the risk factors, it's what we call modifiable risk factors. So, the patient's age is often a risk factor in their development of a CLABSI. However, there is not much a clinician can do about that. So what we do is we show the factors that the clinician can actually change or modify to get that patient to a lesser risk state. And in many cases, it's the number of days that they have been on the central line, or the placement of that line. There are all sorts of factors that drive it, and what clinicians do is they look at this as a whole and decide what can I do to reduce that patient's chance of developing a CLABSI. We are really excited about this. It has been deployed in a production environment at one of our clients, and we are working with them to understand long-term how this will affect the CLABSI rate for their patients. But this is just one example of what it means to put predictive analytics in the hands of clinicians who can make decisions of care based on this.
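The scoreboard logic described above (rank patients by risk, then show only the risk factors a clinician can act on) can be sketched as follows. The patient records, factor names, and the set of factors treated as modifiable are invented for illustration and are not taken from the client's model.

```python
# Hypothetical set of factors a clinician could change.
MODIFIABLE_FACTORS = {"days_on_central_line", "line_placement_site"}


def build_scoreboard(patients: list) -> list:
    """Sort by risk (highest first) and keep only modifiable top factors."""
    ranked = sorted(patients, key=lambda p: p["risk"], reverse=True)
    return [
        {
            "name": p["name"],
            "risk_pct": round(p["risk"] * 100),
            "modifiable_factors": [
                f for f in p["top_factors"] if f in MODIFIABLE_FACTORS
            ],
        }
        for p in ranked
    ]


if __name__ == "__main__":
    demo_patients = [
        {"name": "Patient A", "risk": 0.21,
         "top_factors": ["age", "line_placement_site"]},
        {"name": "Patient B", "risk": 0.64,
         "top_factors": ["days_on_central_line", "age"]},
    ]
    for row in build_scoreboard(demo_patients):
        print(row)
```

Age still contributes to the score in this sketch; it is simply left off the display because nobody on the unit can change it.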

Models Built to Date: [45:42]

The models we have built to date, this is just a listing of them. CLABSI is just one of many models. We wanted to focus on one high-impact example. But as you can see, we have a lot of algorithms that we have built to date that are driving decisions across the country, and lots more in development, and lots and lots of ideas. So, this highlights, I think, the need for us to scale our machine learning and predictive capability. There is only so much that one team can do, and that's part of our strategy: to use the software that we have developed to make it easier for lots of people in the organization to be able to do this.

Poll Question #3: What are or would be the top three most important data sources to your organization in making predictions? (select 3, if applicable) [46:20]

We've got one more poll question. What are the top three important data sources to your organization in making predictions? Clinical EMR data, claims data, patient outcomes data, financial data, non-medical patient data, patient satisfaction data, or unsure or not applicable?

[Tyler Morgan] Alright. We've got that poll question up, Eric. And I would like to let everybody know this is a multiple selection. So please select up to three, if applicable. We will leave this open for a minute. We would like to remind everyone to please type in your questions or comments in the chat pane of your control panel. We've got a lot coming in here. That's great.

Results [46:59]

Alright. Let us go ahead and share the results.

[Eric Just] This is great. Okay. Patient outcomes data, that's great. That is something that we are seeing as a large trend in the industry as well. Clinical EMR and claims, of course, are very popular. But patient outcomes data is definitely a hot topic, especially patient-reported outcomes: how do we better measure those outcomes? And of course, that's what we're trying to predict in most cases, outcomes. So, thank you for taking the time to respond to the polls. They are very insightful.

Three Key Recommendations for Scaling Predictive Analytics [47:34]

So just to reiterate our three recommendations: fully leverage your analytics environment, and do the data manipulation in the data warehouse. It is easier to re-use and easier to operationalize. Standardize using production-quality code. So, having your group using the same repository increases the economy of scale, and it allows you to deploy the ability to do predictive analytics to more people. And then finally, deploy with a strategy for intervention. Always think about how the data is going to be used to make decisions.

What the Future Holds [48:09]

And before I cut over to Levi, I just want to talk briefly about what the future holds here.

Closed Loop Architecture™ [48:12]

What we see, of course, is that the clinical workflow engine is still the EHR. Clinicians spend most of their time in the electronic health record, and that is where the insights that influence their care decisions are going to be delivered to them. And in today's world, we hear a lot about SMART on FHIR. This is a technology that allows that EHR workflow to be augmented through web applications, and FHIR is an interface that is designed to sit over the EHR and make it easy to pull live data from the EHR and develop web and mobile applications that, again, augment the workflow of the EHR.

Where we see the analytics environment taking its place is really in providing a lot of power to this idea of putting new applications in the clinical workflow. And the data warehouse really becomes the analytics engine that is driving a lot of the data that shows up in that workflow. So, the data warehouse has a host of different data sources that are driving models. Like I said at the beginning, it is chock-full of features. We've got registry definitions, we've got text processing NLP algorithms, we've got all of these predictive algorithms that we're generating, almost like an algorithm library, and then we want to expose that through an API such that the real-time data from the EHR through the FHIR interface can be combined with all of that analytic data and then delivered back to the web and mobile applications, and really not only augment their workflow, but augment the data that they are seeing and make it based on a larger repository of highly valuable analytic data.

So I am going to cut over to Levi now. Levi is going to share with you some exciting news about the software that we have developed. We have decided to open source our software, and Levi will walk you through how you can get to our code repository, download the software, and use it for your team or yourself.

About HCTools

[Levi Thatcher, Director of Data Science] Hi everyone. Great job, Eric. So, Eric has given a fantastic overview of best practices in predictive analytics. So you are going to ask yourself, well, how can we take it from here? How can we actually do what you guys described in our organization? And that is why we are so excited to open source our packages and the work on them. So it's called HCTools, a global project. You can see here, we are at hctools.org. So if you want to get started today, simply type that in your browser, and this enables you to create models on your data with very simple examples.

Welcome to HCRTools [50:47]

So if you click into the documentation for HCRTools, it very basically describes why it is so great for healthcare, how to install the package, and how to get started with examples. And so, let's take a scenario that you might be interested in. So say you have a great dataset put together, say diabetic data, and you are wanting to predict, say, readmissions. So you can ask yourself, well, how does the tool help me do that?

[51:13]

So if we hop over to RStudio, if you have installed the package, what you do is you type, okay, well, library(HCRTools). So in R, you often load packages and they bring certain types of functionality, and you simply type ?HCRTools and that will bring up the examples associated with our package. So, nice built-in documentation, you can get it (51:35) to create a model. So once you have this dataset, you click on this, say, Lasso Development or Random Forest Development link, and that will give you the descriptions of the arguments and the function, as well as the example code.

Compare predictive models, created on your data [51:50]

So what you can do is scroll down and you can find the built-in dataset. So let's say, okay, well, you have your dataset and you are ready to go. And so what you do is you just basically grab this example code, open up a new script, copy it in, and hit run.

[52:09]

And what it basically will do is tell you, for this particular dataset, how well your model did. So you have a Lasso model, which Eric had mentioned, and you see the AUC is 0.86. So that's the measure of the accuracy, say if we're predicting a 30-day readmit flag. And then you have a Random Forest example as well that had an AUC of 0.82, and so you are able to quickly see, okay, well, so this is your dataset, and we have this particular model that did really well. And so, say that's the Random Forest model. You simply use the documentation to go and deploy that model, and you can see the links there.
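For readers unfamiliar with AUC, it can be read as the probability that a randomly chosen positive case (for example, a readmitted patient) receives a higher score than a randomly chosen negative case. The sketch below computes it directly from that definition. It is an illustration only, with invented labels and scores, and is not how HCRTools computes its reported values.

```python
def auc(labels: list, scores: list) -> float:
    """Area under the ROC curve via pairwise comparison (ties count half)."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # 1 = readmitted within 30 days, 0 = not readmitted (made-up values).
    print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

A value of 0.5 means the scores rank cases no better than chance, and 1.0 means every positive case outranks every negative case, so comparing two models' AUC on the same held-out data is a quick first check of which ranks patients better.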

[52:46]

So as you have your data put together, this is the website. Reach out. Let us know what you are working on and how we can improve these tools, because we really want to build a place where we can all collaborate and build something that helps everyone in healthcare.

[Eric Just] Thanks, Levi. So again, the URL to go to if you want to download the software is hctools.org. We will be renaming and redirecting the URLs at some point to a more marketing-friendly name. Our marketing team has informed us that this is the kind of name that you get when you get a bunch of data scientists into a room to name a product. So we will be renaming it eventually, but for now it is hctools.org. Please go there and give us your feedback on the software. If you want, download it and play with it, and make it part of something that you use in your organization. We are happy to answer questions and provide that tool for you.

[Tyler Morgan] Alright. That's great. Thanks, Eric. We are about ready for our Q&A time. We've got some good questions in.

How interested are you in someone from Health Catalyst® contacting you about a demonstration of our solutions? [53:54]

But while we have these questions in, we do have a final poll question for you. While these webinars are intended to be educational, we have had many requests for more information about Health Catalyst, who we are, and what we do. If you are interested in having someone from Health Catalyst reach out to you to schedule a demonstration of any of our solutions, please answer this poll question now.

QUESTIONS AND ANSWERS

Why are people not yet using predictive analytics in healthcare? Why does healthcare seem to be behind other industries, whether it may be contract risk or other reasons?

I think a lot of it comes down to the risk (54:33). It is riskier to start, and I think that we are a very risk-averse industry, of course for good reason. And I will get the predictive analytics that are delivered to me in Netflix. They are the wrong analytics. I do not care about those cartoons that are being suggested to me. And all of that has to do with the assumptions that we are making on the data, and I think healthcare of course still has a lot of issues to work out with trusting the data and the underlying quality of the data. So organizations, I think, who are wary of using predictive analytics may also be wary of their underlying data quality. Data governance is a topic that we hear about a lot lately, and helping organizations to actually improve the quality of their data as part of the data strategy is going to be very important for the increased adoption of this technology and these algorithms.

What are the system requirements to use the HCTools, the HCRTools set that you've got?

If you have R installed on your machine, you can simply visit hctools.org and follow the quick install guide. Just a few simple commands and that should be up and running.

Can you integrate our models developed outside of your organization or outside of Health Catalyst®?

Absolutely. The code base is designed to allow you to develop your own models. So, as long as you get your feature inputs in such a way that the software can address that, the software will help you develop your own models and either make those available to somebody publicly or just leave them in your own environment. The tool definitely supports running and creating models with Health Catalyst, as well as creating your own.

Is there an API that you have to deploy?

No. No API is necessary. So basically you install our package and you can use the built-in documentation or the documentation on the web pages, and we have some built-in data actually that you can start with. So the examples run out of the box, and then you can use those examples and tailor them to your specific data, on your tables and your databases, etc.

Can you please share some of your experience in terms of demonstrating the value of predictive analytics?

Absolutely. So, whenever we develop models and deploy them with a customer, one of the things that we track is outcomes. And in each of our applications, we have tools to track the variation in those outcomes. That's another area where a data scientist and an analyst are very helpful: helping to understand that trend, and whether that trend is actually going down since the implementation of that predictive model. So we do have ways to measure that. The other thing that we do is we make sure that, as we are deploying them, there is an organizational understanding of how to use them. That is actually very critical in creating that value.

If I'm not using the Health Catalyst data warehouse and BI platform, can I still utilize these models?

Yeah. For sure. So it is very flexible. So if you have, say, (57:59) files or a database, you can connect to any of those. But we want to be clear: you can use the software, but specific models, such as readmission models, do not come with the software. The software is just for running and creating those algorithms.

[Tyler Morgan] Alright. Well, thank you so much, Eric. Thanks, Levi. We would like to let everyone know, shortly after this webinar, you will receive an email with links to the recording of the webinar and the presentation slides. We have the link to hctools.org and also (58:28) download. Also, please look forward to the transcript notification that we will send you once it is ready, along with special invitations to the upcoming webinars in the predictive analytics webinar series.

On behalf of Eric Just, Levi Thatcher, as well as the rest of us here at Health Catalyst, thank you for joining us today. This webinar is now concluded.

[END OF TRANSCRIPT]