Deploying Predictive Analytics: A Practitioner's Guide
October 13, 2016
Eric Just, Senior Vice President, and Levi Thatcher, Director of Data Science
Deploying Predictive Analytics: A Practitioner's Guide [00:20]
[Eric Just] Before we get started, I wanted to begin with one of my favorite examples of practical analytics and how it influences our everyday life. I spend a lot of time traveling for my work and I've really learned to appreciate the simple, meaningful, actionable predictive analytics that Uber provides. If you don't know what Uber is, it's a ride-sharing system.
Uber [00:31]
And it's kind of like being able to call a taxi from your phone. But it is more than that, because when you make a request from Uber, they know the GPS location of your device. They also know the GPS location of drivers who are around you. And Uber uses your GPS location and the GPS location of the drivers around you to estimate how long they think it will take to get [inaudible] [00:55]. I think it is now available as an Apple Watch application. So what you are seeing here is a picture of my watch yesterday, and at that time, Uber said it would take 5 minutes to get the driver to me. It was all based on predictive analytics of my location, the location of the drivers around me, the routes between me and those drivers, the travel time, the time of day, and perhaps past travel times on those routes. All of that information is put together in this simple number, 5 minutes, and this is great because it is super actionable. If I am happy with that number, I go ahead and hit that request and a driver shows up in about 5 minutes. If I'm not happy with that number, maybe it's too big, maybe I'll just cancel my request and call a cab, and I've done that before. It's just a great example of how Uber is taking all of this data about me and the drivers and compiling it into a very simple, actionable number that's delivered right to my watch.
Amazon and Netflix [01:55]
And it's not just Uber. We live in a world where predictive analytics is pervasive. So when you log in to Amazon or Netflix, this is what Amazon and Netflix think that I want to watch based on the viewing patterns of the account that I use. And it's interesting, I think it is probably pretty obvious to many people that it's not necessarily me who is watching these videos. What is going on here is Amazon is taking my buying and viewing patterns, comparing them with users who have similar viewing patterns, and making suggestions based on those patterns. And what happens here is my kids log in on the weekends and they watch all sorts of cartoons, and Amazon and Netflix both think I have a strong interest in watching a (02:40) group of cartoon puppies solve problems, that's what PAW Patrol is, but the truth is I'm not really interested in these predictive analytics.
And an interesting thing to think about is the assumptions that we make about the underlying data as we use predictive analytics, and I think we'll ponder some of these questions about what it means for healthcare a little bit later in our presentation.
Poll Question #1: How important are predictive analytics for the future of healthcare? [03:04]
We are going to have another quick poll question. So, how important are predictive analytics for the future of healthcare? Not at all important, low importance, neutral, moderately important, extremely important, or unsure or not applicable.
[Tyler Morgan] Okay. Eric, while we're having everyone respond to those, I would like to apologize to everybody. We have had a couple of audio issues. I think we've got this sorted out. It looks like we (03:28) software for us. We appreciate your patience with us.
Alright. Let's go ahead and share the results of our poll. We're showing 75 percent, extremely important. Eric, that is why they joined the webinar today.
[Eric Just] It sounds like a little bit of a selection bias, but I think it is important, and we will talk about how we can lower the barrier to doing predictive analytics in healthcare.
[Tyler Morgan] Alright. Just getting back to show that screen.
Predictive analytics is about using pattern recognition to predict future events but… [04:05]
[Eric Just] Alright. As we're moving to healthcare, first of all, at a high level, predictive analytics is about using pattern recognition. Just like we talked about with the Amazon and Netflix examples, there are patterns in the data that they are using to predict future events. We can apply that to healthcare, but it's really important to understand that predicting something is not good enough. You must have the data to act and intervene, and especially in healthcare, the organizational wherewithal to intervene. It's one thing to predict what videos I might want to view next or what things I might want to buy next. It's a different thing to start recommending care based on predictive analytics. So in healthcare, the stakes are higher but the rewards are potentially much greater, and it's important for an organization to buy into that risk balance and also to ensure that the analytics are incorporated in an appropriate way into the very complex operations of the healthcare organization. So it's definitely a different game.
What is Machine Learning? [05:07]
At this point, I'll lay out some definitions; we have a few definition slides here in the presentation. I want to clarify these as we move along because I will be using some jargon in the presentation, and I think it is good to get everybody on the same page about what it means.
So, we often hear machine learning and predictive analytics in the same breath, and sometimes they are even used synonymously. Machine learning explores the study and construction of algorithms that can learn from and make predictions on data. Within the field of analytics, machine learning is a method used to devise models that lend themselves to prediction. This is predictive analytics. So the way that I like to think about it is that machine learning is a technique that's used in predictive analytics. There are other ways to do predictive analytics, but machine learning is by far the most pervasive, popular, and growing method right now for predictive analytics. That is why you often hear them mentioned in the same breath.
Predictive Analytics in Healthcare: "Classic" Approaches [06:11]
Predictive analytics is not completely new to healthcare. Going all the way back to 1987, the Charlson Index is actually a predictive algorithm. It is designed to predict the mortality of a patient with multiple comorbidities. The Charlson Index was developed by a group that took data from the risk patients and classified their conditions into comorbid conditions. They used a fairly narrow, fairly easy-to-get set of administrative data, calculated those comorbid conditions, and ranked them based on their severity. They then combined that comorbidity score with other information about the patient, such as age, to develop a relative risk that the patient is going to die in the next 10 years. So it is a predictor of mortality, and it has gained widespread popularity. We hear a lot about the Charlson Index still today.
The LACE Index is another example of predictive analytics. LACE is meant to predict readmissions, and the LACE group took data from all across the country and developed a model that predicts readmissions based on length of stay, acuity, comorbidities, and ER utilization. That's what LACE stands for. The group took data from a large number of different organizations contributing their data, and they developed this model that uses those inputs to determine the patient's risk of readmission. It also has gained widespread popularity, and we hear of a lot of organizations that are implementing LACE.
What Has Happened Since 2010? [07:50]
The interesting thing about these models is that they are very good in one respect: they allow organizations without a deep machine learning capability to do predictive analytics, and to do it on a small or easier-to-get set of data. But there are problems with these models, and we are showing here two issues that have come out with the LACE model; these are just two of many. What you see in the top headline is the citation for the first paper. They were using LACE to predict readmissions for CHF patients, congestive heart failure patients. In the next one, they were trying to use LACE to predict readmissions for older UK populations. Both concluded that LACE was not a good predictor for these specific populations, and part of the reason is that when the LACE model was created, they were using data from all different kinds of patients from all across the country. And we know, for example, that the factors that drive an appendectomy readmission are quite different from the factors that drive a congestive heart failure readmission. In the LACE model, all of those are mixed together, and as soon as you start using LACE to try to predict for a specific population, you lose your predictive value. So these general models, while they help people get started, lose their predictive value as we start to look more and more into specific populations. And anybody who is working in healthcare today knows that we're doing a lot of that. We are looking into how we care for these specific populations. So those models don't hold up so well for that use case.
What Has Happened Since 2010? [09:21]
So what has happened since these models came out in 2010? First, we talked on the last slide about the limitations of those models. While they are good to get started, they are lacking in their ability to predict for specific populations. Next, data availability has grown a lot since 2010. We have been lucky enough to be a part of organizations that are investing in data warehouses. After the big investment in electronic health records, a lot more data became available. The premise of the index models is that they use a narrow set of data, but now organizations have access to much deeper repositories of data. Again, we have been lucky enough to be a part of that and see that play out.
There is also more advanced analytics capability. The basic understanding of how to use data to improve business processes and to improve care has taken a larger part of our national focus as well. Organizations are repeatedly using data to improve healthcare and they are starting to ask that next level of question. We typically start with retrospective analytics: where have we gone wrong in the past, and how can we fix that moving forward? Now, organizations are asking more mature questions about how we get ahead of our problems, and predictive analytics are of course a big part of that.
And finally, we have much better machine learning tools since then. Even in a relatively short amount of time, there's been a huge explosion of open source tools and of online education that has helped to spread machine learning and how to do machine learning. And so, those better tools are also a part of this increased interest in machine learning-based predictive analytics.
Poll Question #2: What is the biggest barrier to implementing predictive analytics? [11:14]
So, we are going to ask another poll question. What is the biggest barrier to implementing predictive analytics? We are lacking the right people or skills, we do not have the right data or technical tools and infrastructure, we do not have the executive support or budget, past efforts have failed to show results, other, or unsure or not applicable.
[Tyler Morgan] Alright. We will give some time for folks to respond to the poll. And we would like to remind everyone, we have had a few questions about the slides. I would like to let everyone know we will be sending out an email after the event with links to the recorded on-demand webinar, as well as the slides.
So let's go ahead and look at our results.
Results [12:01]
Okay. It looks like the top two responses are people or skills and the right data or technical tools. Hopefully those parts are well addressed within the rest of the presentation. Executive support or budget is also an important factor, and we will hopefully have some information that can help convince the executives that this is a good thing to do as well.
Predictive analytics is easy (or at least easier!) [12:27]
So the main message of this presentation is that predictive analytics is easy. It's at least easier, and part of that is due to the explosion of tools. But what organizations are truly struggling with is making predictive analytics routine, pervasive, and actionable. And that is what we want to talk about today: how do we take predictive analytics and make it something that is easier to do and routine for an organization?
Typical 'Current State' for Predictive Analytics [12:57]
The typical 'current state' of predictive analytics is still not necessarily optimized for operationalization. What happens is you've got data scientists, and they may have read access to a data repository. The first thing that they do when they have a predictive model that they want to develop is write a really big query against that data source, because they need to get all of the data points they think they are going to need to make a prediction, and they know they are going to have to manipulate that data. So they write this really big SQL query and then they bring the results into their tool of choice. It could be Excel, it could be SAS, it could be R, but the idea is that they get all that data into their tool because that's where they feel comfortable manipulating data. And then they do that data manipulation. They get that data into a state that is ready to be used in a predictive model, and again, they are using a tool outside of their analytics environment to do this. Then they apply the tools and algorithms. So, for example, SAS, and we've got R and Python. All of these are tools that are available to take that data and turn it into a predictive model.
And then once they have developed a predictive model, there's a big question mark, two questions usually. Number one is how do we move this into production, and number two is how do we actually get it to improve care, or how do we get it to actually enhance a decision. That is oftentimes a big question, and we have seen a bunch of times where a good predictive model is developed but is never really deployed.
Three Key Recommendations for Scaling Predictive Analytics [14:28]
Today, we would like to talk about three recommendations, and the rest of the presentation is structured around them. Number one is to fully leverage your analytics environment, and we will talk about what that means. But in a nutshell, do not do a lot of data manipulation outside of your analytics environment, because then it becomes a silo and it is very difficult to re-use. The second is to standardize tools and methods and create production-quality code that you feel comfortable putting into production. If you develop a good model, the logical next step is going to be to put it in production. So having really good code to do that is very important, as is having standard methods within your team.
And this one is last, but it should really be first because it is the most important of all of these points: deploy your models with a strategy for intervention. Make sure you know who is going to use the data to change what they are doing or to help make a decision, and how it is going to be presented to them. That is the most important point of all this, and we will talk about what that looks like a little bit later in the presentation.
Fully Leverage Your Analytics Environment [15:38]
So let's talk about fully leveraging your analytics environment.
What is a Feature? [15:42]
Here is another piece of jargon. In predictive analytics, a feature is simply an input parameter. Just think of it as an input to one of your data models, and in machine learning, we call it a feature. So when you hear me use the word 'feature', just think of the inputs to the model that I'm trying to generate a prediction from. And this definition is from Wikipedia.
Leverage Your Analytics Environment [16:09]
And let's think about what an analytics environment is. An analytics environment, or a data warehouse, you can think of as almost chock-full of features. You've got a bunch of data there, but it is not always just sitting there in raw format. You've got things like clinical registries, you have comorbidity models, you have calculations on readmissions, length of stay, and other calculated fields. All of these make great feature inputs to models. But it is really important to understand that read-only access is not enough. Data scientists and the folks who are generating predictive models need to be able to create their own features in the analytics environment. We will make a strong case for that here.
Polypharmacy Feature [16:51]
To illustrate the point, we are going to explore a polypharmacy feature, a feature that we developed as an input to one of our models. One of our data scientists was developing a model for predicting complications in diabetic patients. If you do not know what polypharmacy is, the New York Times here represents it as 'The Ever-Mounting Pile of Pills'. Quite simply put, it is the number of medications that a patient is on at any given point in time, and there are good examples in the literature of polypharmacy being a good predictor of specific outcomes. So this data scientist wanted to use a polypharmacy feature in his model.
Polypharmacy Data Mart [17:34]
When he looked at the medication data, though, it was a little bit messy, and I'm sure, given the number of data architects and analysts we have on the call, it should not come as a surprise that there is messy data underneath the hood in the data warehouse environment. What you see on the left-hand side is a table of medications. For every patient-medication pair, there's a start date and an end date: the date that a patient started a specific medication and the date they ended it.
You can see on the far right-hand side that there are several NULLs. So the end date is not known in some cases, and missing data can actually be very damaging to a predictive model. So, how do we clean that up? We've got to understand what's going on to create those NULL values. In some cases, the patient died before the end date. In other cases, the patient took a one-time dose, and the end date was not put in because there was a single dose of the medication. And finally, there is yet another case where the patient just has not reached the end date yet. They are still on the medication.
So, understanding all of those business rules helped our data scientist fill in appropriate missing end dates and create what you see on the right-hand side here, which is what an input to a predictive model looks like. It's very clean and it gives us that polypharmacy count. So for every patient encounter, for any point in time, we can easily tell how many medications a patient was on. And this is an example of what we call feature engineering.
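The three business rules for missing end dates can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout, the `single_dose` flag, and the function names are hypothetical and are not the presenters' actual schema or code.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class MedicationRecord:
    # Hypothetical layout for one patient-medication row.
    patient_id: str
    medication: str
    start_date: date
    end_date: Optional[date]   # None stands in for a NULL end date
    single_dose: bool = False  # assumed flag marking one-time doses


def resolve_end_date(rec, as_of, death_date=None):
    """Fill a missing end date using the three rules described in the talk."""
    if rec.end_date is not None:
        return rec.end_date
    if rec.single_dose:
        return rec.start_date  # one-time dose: ends the day it starts
    if death_date is not None:
        return death_date      # patient died before an end date was recorded
    return as_of               # patient is still on the medication


def polypharmacy_count(records, patient_id, on_date, as_of, death_date=None):
    """Count distinct medications the patient was on at `on_date`."""
    active = set()
    for rec in records:
        if rec.patient_id != patient_id:
            continue
        end = resolve_end_date(rec, as_of, death_date)
        if rec.start_date <= on_date <= end:
            active.add(rec.medication)
    return len(active)
```

In a warehouse setting the same rules would more likely be written once in SQL or ETL logic so the cleaned table can be re-used; the Python form is only meant to make the rules concrete.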
What is Feature Engineering? [19:11]
"Feature engineering is the process of transforming raw data into features that better represent the underlying problem to the predictive models, resulting in improved model accuracy." And feature engineering, in our opinion, is one of the most challenging and interesting parts of developing predictive models. It is also recognized by folks out there on the internet that "much of the success of machine learning is actually success in engineering features that a learner can understand." So, feature engineering is an absolutely critical part of data science and predictive analytics. We cannot underscore that point enough.
Other Examples of Feature Engineering [19:53]
There are other examples of feature engineering, and I'm sure the data architects and data analysts on the phone will recognize how some of these things sound really simple but are actually a little bit more complicated to put together than you might think. The number of ER visits in the last year: fairly simple. The number of line days for a patient: sometimes the underlying data presents a challenge in calculating that. The number and types of comorbid conditions: how do you classify those comorbid conditions? Almost any input to a predictive model will need to be engineered in some way. And the ability for data scientists to engineer features is critical to the success of a predictive analytics and machine learning strategy.
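Even the "fairly simple" example hides a few decisions. A minimal sketch of the ER-visit count is shown below, assuming visits arrive as a plain list of dates and that the look-back window is the 365 days before an index date, excluding the index date itself. Both choices are assumptions made for illustration, and the function name is hypothetical.

```python
from datetime import date, timedelta


def er_visits_in_last_year(visit_dates, index_date):
    """Count ER visits in the 365 days before `index_date`.

    The window is [index_date - 365 days, index_date), so a visit on the
    index date itself (for example the current encounter) is not counted.
    """
    window_start = index_date - timedelta(days=365)
    return sum(1 for d in visit_dates if window_start <= d < index_date)
```

Whether the current encounter counts, and whether "a year" means 365 days or a calendar year, are exactly the kind of rules a team would want to define once and re-use.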
Fully Leverage Your Analytics Environment [20:36]
And remember, the point of this section is to fully leverage your analytics environment, and one of the main reasons why we say that is because the analytics environment is the best place to engineer features. Promoting efficient re-use of the engineered features is one great example. If we go back to that polypharmacy example, that polypharmacy table is now sitting in the data warehouse and available for other models to use. So by using the analytics environment to do our feature engineering and not doing it in a siloed tool, we are promoting re-use of all that great work.
Secondly, the data warehouse has standard tools to operationalize and run these on a nightly basis. We call it ETL, or Extract, Transform and Load, and those tools are very valuable in productionalizing that. So it becomes much easier to productionalize them there than in a script in one of the machine learning languages.
Standard Tools and Methods Using Production-Quality Code [21:36]
Three Key Recommendations for Scaling Predictive Analytics [21:37]
So, going back to our three key recommendations, remember the first was to fully leverage the analytics environment, and the next is to standardize tools and methods using production-quality code.
You Need Lots of Smart People! [21:51]
As you start to put forth a data science, machine learning, and predictive analytics strategy, you need lots of smart people to do this. This should not be a surprise. And the two roles that I want to talk about today are similar but different roles. So, the data scientist formulates hypotheses about features driving a predictive model. The data scientist is the one who is talking to clinicians and trying to understand the underlying causes of what is being predicted. The data scientist is doing what we call experiments and trying various models to determine the best approach for prediction. And the data scientist is assessing the model output, looking at the accuracy, and trying to decide on what the best approach is.
The machine learning engineer, like I said, is similar but different. The machine learning engineer has to have a lot of knowledge of data science, but one of the challenging things is to find somebody who has a knowledge of data science and a knowledge of software engineering best practices, because remember we talked to you about generating production-quality code. One of the biggest impacts we have had on our group was when we hired Levi, who has got a great machine learning engineer approach. He understands the data science and he also has a knowledge of software engineering best practices, and that has really helped us to scale, and we will talk about what we mean by scale. A machine learning engineer is a wonderful thing to have, and we will talk about the fruits of our machine learning engineering efforts a little bit later.
Predictive Analytics Processes [23:25]
And now, before we talk about what kind of code you need, I think it is good to talk a little bit about the predictive analytics processes and what it is that a data scientist is doing that we want to try to operationalize. There are two pieces to this. One is a development process. Let's take the example of a readmission prediction. Let us say we are trying to develop a model to predict readmissions. The data scientist is going to first of all identify which patients were readmitted and which patients were not, because it is important to understand what the outcomes were. Then they are going to gather 30 to 40 feature inputs, and this is where hypothesis generation takes over. They are hypothesizing what the 30 or 40 most likely things to drive readmissions are. That data set, the 30 to 40 input features and the outcome, is then split into two pieces. One we call the training set and one we call the test set. The training set is what we crunch all the numbers on, and that is where the model is generated from. The test set is what we use to measure the performance of that model. So it is important to hold back some data so that we can see how well our model would have done on predicting the items in the test set. And so, the data scientist is running multiple algorithms on that training set. They are looking at lots of different combinations of features and lots of different algorithms. For each one of those, they are measuring the performance and deciding what the best model is. It is an iterative process, so I've drawn this arrow going back to the beginning. Sometimes you need to go back to square one. But eventually you get to what you see in the orange box, where you've got the best algorithm and a smaller list of important features, usually around 10 or so. Once you've developed your model, you can then store those parameters for later use. Again, this development process is where the really intense computation happens. We are looking at millions of records, crunching numbers, looking for patterns, and then extracting the patterns.
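The development loop described above (split, fit candidates on the training set, compare them on the held-out test set) can be sketched with the standard library alone. Everything here is a toy stand-in: the one-feature threshold "model", the function names, and the use of plain accuracy as the yardstick are assumptions for illustration, not the algorithms or metrics the presenters used.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle reproducibly and hold back a fraction of rows as the test set."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(model, rows):
    """Fraction of (features, outcome) rows the model predicts correctly."""
    return sum(1 for features, outcome in rows if model(features) == outcome) / len(rows)


def fit_threshold_model(train_rows, feature):
    """Toy 'algorithm': predict 1 when the feature is at or above a threshold,
    choosing the threshold with the best accuracy on the training set."""
    def rule(threshold):
        return lambda features: int(features[feature] >= threshold)

    thresholds = sorted({features[feature] for features, _ in train_rows})
    best = max(thresholds, key=lambda t: accuracy(rule(t), train_rows))
    return rule(best)


def compare_models(rows, candidate_features, seed=0):
    """Fit one candidate per feature on the training set, score each on the test set."""
    train, test = train_test_split(rows, seed=seed)
    scores = {f: accuracy(fit_threshold_model(train, f), test) for f in candidate_features}
    return max(scores, key=scores.get), scores
```

The important design point is that every candidate is scored on the same held-back rows with the same metric, so the comparison is fair.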
But once we get to a model, the next step is to run the model. Running the model is what occurs every day, multiple times a day. It is much less computationally intensive, and it uses the output of the development process. Now, if we are going to do a readmission prediction, we do not need to crunch numbers on millions of patients or data points. We have done that in the development process. Now, it's a matter of looking at which patients just came in, getting those 10 important features on those patients and, one record at a time, calculating that prediction and outputting it to the data warehouse. So running the model is much less computationally intensive, but this is the part that gets put in production and is run every day, either as part of an ETL process or as part of a web service. We will talk about the different ways that it can be deployed. These are the two different things that machine learning code should be able to address. In the development process, it is important to standardize on pieces of that, and for running the model, that's where we want to have really robust, tested code so we can put it in production.
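To make the contrast with development concrete, here is a small sketch of the "run the model" step: parameters saved from development are loaded and applied one record at a time. The logistic form, the coefficient values, the feature names, and the JSON storage format are all assumptions for illustration only.

```python
import json
import math

# Hypothetical parameters saved at the end of the development process.
STORED_MODEL = json.dumps({
    "intercept": -2.0,
    "coefficients": {"length_of_stay": 0.10, "er_visits": 0.30},
})


def load_model(serialized):
    """Restore the stored parameters; no retraining happens here."""
    return json.loads(serialized)


def predict_risk(model, record):
    """Apply stored logistic-regression-style parameters to a single record."""
    z = model["intercept"] + sum(
        coef * record[name] for name, coef in model["coefficients"].items()
    )
    return 1.0 / (1.0 + math.exp(-z))


def score_new_patients(model, records):
    """Score only the patients who just came in, one record at a time."""
    return [
        {"patient_id": r["patient_id"], "readmission_risk": predict_risk(model, r)}
        for r in records
    ]
```

Scoring is just arithmetic on a handful of features per patient, which is why this step is cheap enough to run many times a day.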
Developing a Machine Learning Code Base [26:33]
So, I want to address why you would want a code base or software to help you do this. There are a lot of tools out there that make it really easy to write some of these scripts, but it is important to focus the data scientist on the model development and not necessarily on writing the code.
The code is something that is standardizable. The data scientist part is the questions that they are asking: what features do I use for input, how do I model those features in the database, how do I compare the performance of these two different models? That's the real value out of a data scientist, not necessarily writing code or possibly reinventing the wheel that somebody in their department may have already built. Having a standard code base also allows a team of data scientists to standardize their methodologies. It is a real problem if your data scientists are using two different pieces of software to create their models, and even more of a problem if they are measuring the performance of their models in different ways. How are you ever going to know what the best model is if we are using different yardsticks? So that standardization piece is important here too: having an organizational code base that data scientists can use so that they are using the same methods.
And then finally, the point that I have made a bunch of times, and I probably will not make much more than this, is that putting models in production really requires production-quality code. You do not want to put anything that might break into production.
Developing a Machine Learning Code Base: Best Practices [27:59]
As we were developing our machine learning code base, we thought it was really important to adhere to software development best practices. Software development best practices are used in the software development world to solve a lot of these same problems. So, how do we create a robust, re-usable code base? One of the first things that we did was use version control. Version control is used by software developers and allows multiple developers to contribute code to a single repository. By keeping it as a single repository, many people can be editing the same code base at the same time, and there are tools to make sure that people do not step on each other's toes and that when there is a conflict, it can be resolved. So it is really important for teams of data scientists to have version control for their code base.
The other thing that is very important for this is unit testing. Unit testing has been used in the software development world for many, many years. The idea of unit testing is that as software becomes more modular and more re-used, it becomes a lot easier to accidentally break software. A good software code base is efficient and re-uses code, but you've got to make sure that as you make changes, your changes are not resulting in unexpected consequences. So unit testing basically tests all of the functions in your software to make sure that the output is as expected. If I make a change to the software and I'm not sure how it is going to affect the rest of the software, and I run the unit tests and they all pass, I can be fairly confident that I haven't broken anything downstream. So these are some of the software development best practices that are required for having a good code base.
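As a small illustration of the unit-testing idea, the sketch below tests one hypothetical helper with Python's built-in `unittest` module. The helper `length_of_stay_days` and its rules are invented for this example; the point is only that each expected behavior is pinned down by a test that can be re-run after every change.

```python
import unittest
from datetime import date


def length_of_stay_days(admit, discharge):
    """Hypothetical helper: whole days between admission and discharge."""
    if discharge < admit:
        raise ValueError("discharge date precedes admission date")
    return (discharge - admit).days


class LengthOfStayTests(unittest.TestCase):
    def test_same_day_stay_is_zero(self):
        self.assertEqual(length_of_stay_days(date(2016, 10, 13), date(2016, 10, 13)), 0)

    def test_multi_day_stay(self):
        self.assertEqual(length_of_stay_days(date(2016, 10, 10), date(2016, 10, 13)), 3)

    def test_reversed_dates_are_rejected(self):
        with self.assertRaises(ValueError):
            length_of_stay_days(date(2016, 10, 13), date(2016, 10, 10))


def run_tests():
    """Run the suite programmatically and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(LengthOfStayTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling `run_tests().wasSuccessful()` after a change gives a quick signal that nothing downstream of the helper was broken; a continuous integration setup would typically run the same suite automatically.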
There are also things like documentation, which is how we get people to find all of the functionality available in the software, and continuous integration. These are all best practices that we used in the development of our machine learning code base.
Developing a Machine Learning Code Base: Technology Choices [30:12]
So if you are going to embark on developing a machine learning code base, and please stay on for the entire presentation because we have good reasons why you might not want to develop your own, but if you are, there are a few technology choices out there. One is R, and R is a language that has been deployed and is deeply entrenched in healthcare. I am sure most of the analysts and statisticians on the call are at least familiar with R. It has been around for a long time. And because it is an analytics environment, it is more familiar to analysts and statisticians.
Python is another language that's out there. It is a fully functional software development language. The language itself is not new, but a lot of the tools that have been developed for machine learning are newer, and there's lots of momentum behind Python. As a matter of fact, a lot of the online learning in machine learning uses Python as the language in which a lot of the new data scientists are being trained. And Python is more familiar to software developers and data analysts because it is a full-featured software programming language.
Azure ML is a Cloud-based solution from Microsoft. Because it is Cloud-based, it is very easy to set up and deploy. There is no installation required. You can just create an Azure ML account and start creating models in the Cloud. Because it is Cloud-based and because we are in healthcare, the adoption of Azure ML is a little bit less than you might expect. I have read some stories about organizations that are leveraging Azure ML for predictive analytics in healthcare, and they have to de-identify and scrub their data before they put it in Azure to build their models. In the example that I read, they were working with dates and had to mask the dates, and a date is actually an input to your predictive model. To me, it's a little bit risky to start manipulating dates to mask the data if you want to get a good predictive model. So for that reason, I think Azure ML has not seen the widespread adoption in healthcare that you might expect, at least compared with other industries.
There are plenty of other choices, but I think the industry right now is standardizing on R and Python, and that's where we put our efforts. We have developed software in both R and Python for our machine learning code base. The reason why we chose them is that R is probably more popular right now. There's support from major vendors, like Microsoft SQL Server. Python is more of the up-and-coming approach. We want to be ready for both of those, and our clients have different preferences as well. So we addressed both of them.
Our Code Base Includes: [32:55]
So our code base includes tools for data ingestion. We have been talking a lot about how we leverage the analytics environment with our machine learning code. Well, we've got to be able to very quickly and easily get data out of that environment and into our code base. So we have routines that load data from the database or a flat file. Date and time is important in machine learning. So we have tools that allow us to expand a date/time into things like day of the week and week of the year, and make that really easy. Missing values can really complicate things and make predictions not very good. So we have a couple of routines for dealing with those in different ways, and by all means, the way that you deal with missing values is different for different models and different use cases. So you want to provide functions for that.
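Two of the helpers mentioned here, date/time expansion and missing-value handling, might look roughly like the sketch below. The function names and the choice of mean imputation are assumptions for illustration and are not the API of the presenters' code base; as noted above, the right imputation strategy depends on the model and use case.

```python
from datetime import datetime
from statistics import mean

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def expand_datetime(ts):
    """Expand one timestamp into several calendar features."""
    _iso_year, iso_week, _iso_weekday = ts.isocalendar()
    return {
        "day_of_week": DAY_NAMES[ts.weekday()],
        "week_of_year": iso_week,  # ISO 8601 week number
        "month": ts.month,
        "hour": ts.hour,
    }


def impute_with_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]
```

For example, `expand_datetime(datetime(2016, 10, 13, 9, 30))` yields day of week "Thursday" and ISO week 41.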
We also provide a large toolset around model development. This is the number crunching that we talked about in that workflow of the data scientist. So splitting that data between test and training. Doing feature selection: how do we get from 40 features down to 10 features? And then of course the machine learning algorithms themselves: what are we running on the data? Random Forest is a very popular algorithm, Lasso is a regression-based method, and then Mixed Models are the kind that (34:12) longitudinal data in healthcare easier, and K-means clustering, which we will be using extensively next year in our code base as well. And then all the tools to evaluate that performance and help the data scientist decide what the best model is.
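The talk does not say which performance measure the code base reports. As one common example of a shared yardstick for comparing classifiers, the area under the ROC curve (AUC) can be computed directly from its definition: the probability that a randomly chosen positive case is scored higher than a randomly chosen negative case. The sketch below is an assumption-laden illustration, not the presenters' evaluation code.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Each (positive, negative) pair counts 1 if the positive case has the
    higher score, 0.5 for a tie, and 0 otherwise.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))
```

The pairwise form is quadratic in the number of cases, so it is fine for a held-out test set of modest size but a rank-based formula would be preferable for very large ones.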
In addition to the development tools, we have analysis tools. So, how do we generate our performance report for the models that we're creating? And then tools to help with trend identification and to perform risk-adjusted comparisons are part of the analysis suite in our code base.
Scaling People [34:48]
The good thing about the software is that it has really helped us to scale people. When we think about what the big challenges in data science are, and this has come up over and over again, the big challenge is that feature engineering piece and how we represent the data. And it turns out, data architects have great domain knowledge of how to do that. They have been moving data in healthcare, analyzing data, and developing the routines to transform data into usable formats for years. They are also often looking for opportunities to advance their career and skills. And what we found is that, given the right tools, data architects make incredible feature engineers. Given their years and years of experience in manipulating data, we are just applying them to a different problem and it works really well. And what our code has done is allow data architects to easily get started in actually running predictive analytics algorithms.
And this is a quote from one of our data architects who was using our software to create a predictive model in one of his products. This is Peter Monaco, and he said, "One awesome thing about the output from the R package you put together is the output aligns perfectly with creating Patient Stratification algorithms. The fact that I feel comfortable running this stuff speaks to how easy you have made it. Thanks again, Levi." And he's thanking Levi, who you are going to hear from in a little bit. But this is great. It allowed Peter to do what he does really well, get the data into a good format, and it lowered the barrier for him to actually run these algorithms and do some of the work that a data scientist does. So we see it as very promising for helping us to scale our machine learning efforts across a large number of people in the organization.
Putting Predictive Models in Production [36:37]
Modality #1: Extract, Transform, Load (ETL) Process [36:37]
So now it comes time to put models in production, and we are going to talk first about how we move them into production from a technical standpoint and then about how we move them into an application or view that can actually change a business process or, better yet, improve care for patients.
So Modality #1 is to put the model in production leveraging the ETL process. This is appropriate if the prediction is not based on highly dynamic data, or if the intervention strategy is okay with some level of latency. An example of this would be a readmission prediction. Typically, readmission algorithms are not based on highly dynamic data. They are based on data that is not changing super fast. So if we are pulling data on a nightly basis or every 12 hours, a readmission algorithm is generally going to be okay with that. In this case, we just put the machine learning code in the middle of the ETL process. We've got ETL to load the data sources. We've got ETL that data scientists or data architects create to load those engineered features, the inputs to a predictive model, and then we run the code that can easily grab those features from the database and output a prediction to the database. Our machine learning code can also write these predictions to the database, and this is how we have deployed several models. It is easy and it just wraps right up with the ETL process.
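A scoring step that sits inside a nightly ETL run can be sketched as follows, using an in-memory style SQLite connection as a stand-in for the data warehouse. The table names, column names, and model coefficients are hypothetical and chosen only for illustration.

```python
import math
import sqlite3

# Hypothetical stored parameters from the development process.
MODEL = {"intercept": -2.0, "length_of_stay": 0.10, "er_visits": 0.30}


def readmission_risk(length_of_stay, er_visits, model=MODEL):
    """Logistic-style score from two engineered features."""
    z = (model["intercept"]
         + model["length_of_stay"] * length_of_stay
         + model["er_visits"] * er_visits)
    return 1.0 / (1.0 + math.exp(-z))


def run_scoring_step(conn):
    """Read engineered features, score each patient, write predictions back.

    Assumes an upstream ETL step has already loaded `readmission_features`.
    INSERT OR REPLACE keeps the step safe to re-run on the same night.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readmission_predictions "
        "(patient_id TEXT PRIMARY KEY, risk REAL)"
    )
    rows = conn.execute(
        "SELECT patient_id, length_of_stay, er_visits FROM readmission_features"
    ).fetchall()
    conn.executemany(
        "INSERT OR REPLACE INTO readmission_predictions (patient_id, risk) VALUES (?, ?)",
        [(pid, readmission_risk(los, er)) for pid, los, er in rows],
    )
    conn.commit()
    return len(rows)
```

Because the predictions land in an ordinary table, downstream reports and applications can consume them like any other warehouse data.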
Modality #2: Web Services [38:07]
Modality #2 is when the data is more dynamic. So an example of this would be sepsis early detection, where we are looking at changes in vital signs, and with the intervention strategy, we cannot wait up to 24 hours to intervene when sepsis happens. It is something we need to intervene faster on. So in this case, we can deploy the predictive algorithm as a web service, and the web service is receiving real-time features. So those changes in vital signs, those vital signs, are going to come in from a very dynamic setting. You might still be using some historic features, like what are the demographics, what is the age of the patient, that we can pull from the EDW. And then, the web service will be combining that live input with that historic data to run that machine learning code and then output the prediction back into the application. So this is definitely designed for more dynamic situations and more dynamic predictions.
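The core of such a web service can be sketched as a single request handler that merges the live payload with stored historic features before scoring. This is an assumption-laden Python illustration, not the deployed service: the payload fields, the in-memory `HISTORIC_FEATURES` lookup (standing in for an EDW query), and the weights are all hypothetical.

```python
import math

# Stand-in for historic features pulled from the EDW (made-up values).
HISTORIC_FEATURES = {
    "patient-001": {"age": 71},
    "patient-002": {"age": 34},
}

# Hypothetical weights for illustration; not a validated sepsis model.
WEIGHTS = {
    "intercept": -6.0,
    "age": 0.03,
    "heart_rate_delta": 0.05,
    "temperature_delta": 1.2,
}


def handle_prediction_request(payload):
    """Combine live vital-sign changes with stored features and score them."""
    patient_id = payload["patient_id"]
    historic = HISTORIC_FEATURES.get(patient_id)
    if historic is None:
        return {"status": 404, "error": "unknown patient"}
    features = {
        "age": historic["age"],                            # historic feature
        "heart_rate_delta": payload["heart_rate_delta"],   # live feature
        "temperature_delta": payload["temperature_delta"], # live feature
    }
    z = WEIGHTS["intercept"] + sum(WEIGHTS[k] * v for k, v in features.items())
    risk = 1.0 / (1.0 + math.exp(-z))
    return {"status": 200, "patient_id": patient_id, "risk": round(risk, 3)}
```

In a real deployment this function would sit behind an HTTP endpoint, and the calling application would post new vital-sign readings as they arrive.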
Scaling Predictive Analytics [39:08]
Deploy With a Strategy for Intervention [39:08]
So, going back to this point of deploying with a strategy for intervention, I would call this the most important point in the presentation. So, the idea here is how do we deploy and get these predictions to actually impact care.
Case Study: Central Line-Associated Bloodstream Infection (CLABSI) [39:26]
I'll talk about a little case study that we did with one of our clients on Central Line-Associated Bloodstream Infections, or CLABSIs. Approximately 41,000 patients actually end up with this condition, 41,000 patients in the US per year, that should read, and actually one in four patients that get a CLABSI die. So it is a very serious condition, and organizations are really struggling to keep up with this. There are great guidelines out there, evidence-based guidelines, for how to care for patients so as to reduce the likelihood of a CLABSI, and we worked with a client to develop retrospective analytics to look at their compliance. And it really helped highlight some problems, and they got really good at using the data to find problem areas and then developing interventions to fix those. So they developed a muscle memory of using data to improve their care and business processes. Then they said, okay. Take us to the next step. Now we do not want to know where we failed, we want to know what is coming next, who are the patients that are at high risk for CLABSI, so that we can intervene with them. And so, they came to Levi, and Levi and his team developed a predictive algorithm that is based on 16 features that predicts the likelihood that a patient is going to develop CLABSI. We will see what that looks like in a minute.
Model Performance Report [40:40]
It is important that every model that we develop and every model that we deploy comes with a performance report, and this performance report is not a highly technical report. And the idea here is that we are trying to briefly summarize what we are trying to predict, the variables or feature inputs that were considered in developing that model, what model we chose, and the accuracy of that model. And this report is not for technical people; this report is used by business or clinical people to help them understand that algorithm. Communication about what an algorithm does is extremely critical to the adoption of that model.
Discussing Models With Clinicians [41:22]
When discussing models with clinicians, clinicians will adopt predictive analytics insofar as they understand it. If they do not understand what is going on, it is going to be a much harder conversation, because if you think about what clinicians do, they are running predictive algorithms in their heads all day. They are looking at large amounts of patient data and boiling that down to hypotheses or conclusions about these patients. What we try to do with machine learning is standardize that. Typically, doctors do not all do that in the same way. So we help them to standardize it. And if they understand how we are doing it, and it is close to what they are doing, they are much more likely to adopt it.
The other point that we should make when talking about deploying predictive models is that complexity comes at a price. And it comes at an extreme price sometimes. So, a regression model can often strike a balance between predictive value and interpretability. Regression models are one way to do pretty good analytics, and there are more sophisticated machine learning algorithms that are much harder to explain. So if you have two models, a regression model and a more advanced model that might have marginally better accuracy, the regression model still may be the more favorable model to deploy because it is easier to explain. And in our code, we use a process called regularization that penalizes complexity. So the Lasso algorithm is a regression algorithm that actually has built into it the ability for the model itself to tune itself and to favor a simpler model. And the way that Wikipedia puts it is that it enhances the prediction accuracy and the interpretability of the model. So, super, super important. Complexity comes at a price, especially in a clinical setting, for organizations that are new to predictive analytics.
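To make the "penalizes complexity" idea concrete, here is a small Python sketch of the soft-thresholding step that sits at the heart of the Lasso. In the special case of orthonormal features, applying this operator to the ordinary least-squares coefficients gives the Lasso solution exactly; general-purpose Lasso solvers apply the same operator repeatedly. The feature names and coefficient values below are made up for illustration and are not from the speakers' model.

```python
def soft_threshold(value, penalty):
    """Shrink a coefficient toward zero; small ones become exactly zero."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_shrink(coefficients, penalty):
    """Apply the L1 shrinkage to every coefficient in a dict."""
    return {name: soft_threshold(c, penalty) for name, c in coefficients.items()}


def active_features(coefficients):
    """Names of the features the shrunken model still uses."""
    return sorted(name for name, c in coefficients.items() if c != 0.0)


# Hypothetical unpenalized coefficients.
unpenalized = {"line_days": 2.0, "age": -0.3, "prior_infection": 0.5}
shrunk = lasso_shrink(unpenalized, penalty=0.5)
```

With a penalty of 0.5, `shrunk` keeps only `line_days` (at 1.5) and sets the other two coefficients to zero, which is why a Lasso model tends to be easier to explain: fewer inputs remain in the final model.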
What Does It Look Like? [43:09]
And what does it all look like? Now, this is the punch line. This is what predictive analytics looks like when it is put in front of clinicians and they are given the ability to make decisions based on this. And this is like Uber telling me five minutes to a car on my watch. And what you see here is a unit scoreboard for a client who was working on CLABSI, and I have to say this was generated on a scrubbed, de-identified data set, but it fairly accurately reflects what the client was seeing in their environment. So I just wanted to make it clear that this is all de-identified data. But what you see here is, in that green box, there are 12 patients who are on the active list for CLABSI on this unit. And the unit reviewed this dashboard multiple times per day.
And what you see below in this list of patients here is the name of the patient (44:08) and the probability that they will actually develop a CLABSI, and the highest-risk patients are at the top here so that the eye goes immediately to the patients who have the highest risk. This top patient has a 64 percent chance of developing a Central Line-Associated Bloodstream Infection.
And the important thing here is not to just leave that number, but what goes on in this far right-hand column. This far right-hand column shows the risk factors that are driving that prediction. And moreover, it is not just the risk factors, it's what we call modifiable risk factors. So, the patient's age is often a risk factor in their development of a CLABSI. However, there is not much a clinician can do about that. So what we do is we show the factors that the clinician can actually change or modify to get that patient to a lesser risk state. And in many cases, it's the number of days that they have been on the central line, or the placement of that line. There are all sorts of factors that drive it, and what a clinician does is they look at this as a whole and decide, what can I do to reduce that patient's chance of developing a CLABSI? We are really excited about this. It has been deployed in our production environment at one of our clients, and we are working with them to understand long-term how this will affect the CLABSI rate for their patients. But this is just one example of what it means to put predictive analytics in the hands of clinicians who can make decisions of care based on this.
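The scoreboard logic described here, ranking patients by risk and showing only the drivers a clinician can act on, can be sketched in a few lines of Python. The patient records, factor names, and the set of factors treated as modifiable are all hypothetical; the actual dashboard and its 16-feature model are not shown in the transcript.

```python
# Hypothetical set of risk factors a clinician can act on.
MODIFIABLE_FACTORS = {"line_days", "line_placement"}


def build_scoreboard(patients):
    """Sort patients by descending risk and keep only modifiable drivers."""
    rows = []
    for patient in sorted(patients, key=lambda p: p["risk"], reverse=True):
        drivers = [f for f in patient["top_factors"] if f in MODIFIABLE_FACTORS]
        rows.append({
            "name": patient["name"],
            "risk_percent": round(patient["risk"] * 100),
            "modifiable_factors": drivers,
        })
    return rows


# Made-up, de-identified style records for illustration.
demo_patients = [
    {"name": "Patient B", "risk": 0.21, "top_factors": ["age", "line_placement"]},
    {"name": "Patient A", "risk": 0.64, "top_factors": ["line_days", "age"]},
]
scoreboard = build_scoreboard(demo_patients)
```

Here `scoreboard[0]` is Patient A at 64 percent with `line_days` as the only listed driver; age contributes to the score but is filtered from the display because it cannot be changed.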
Models Built to Date: [45:42]
The models we built to date, this is just a listing of them. CLABSI is just one of many models. We wanted to focus on one high-impact example. But as you can see, we have a lot of algorithms that we have built to date that are driving decisions across the country, and lots more in development, and lots and lots of ideas. So, this highlights, I think, the need for us to scale our machine learning and predictive capability. There is only so much that one team can do, and that's part of our strategy, to use that software that we have developed to make it easier for lots of people in the organization to be able to do this.
Poll Question #3 What are or would be the top three most important data sources to your organization in making predictions? (select 3, if applicable) [46:20]
We've got one more poll question. What are the top three important data sources to your organization in making predictions? Clinical EMR data, claims data, patient outcomes data, financial data, non-medical patient data, patient satisfaction data, or unsure or not applicable?
[Tyler Morgan] Alright. We've got that poll question up, Eric. And I would like to let everybody know this is a multiple selection. So please select up to three, if applicable. We will leave this open for a minute. We would like to remind everyone to please type in your questions or comments in the chat pane of your control panel. We've got a lot coming in here. That's great.
Results [46:59]
Alright. Let us go ahead and share the results.
[Eric Just] This is great. Okay. Patient outcomes data, that's great. That is something that we are seeing as a large trend in the industry as well. Clinical EMR and claims, of course, are very popular. But patient outcomes data is definitely a hot topic, especially patient-reported outcomes, and how we better measure those outcomes. And of course, that's what we're trying to predict in most cases, outcomes. So, thank you for taking the time to respond to the polls. They are very insightful.
Three Key Recommendations for Scaling Predictive Analytics [47:34]
So just to reiterate our three recommendations: fully leverage your analytics environment, and do the data manipulation in the data warehouse. It is easier to re-use and easier to operationalize. Standardize using production-quality code. So, having your group using the same repository increases the economy of scale, and it allows you to extend the ability to do predictive analytics to more people. And then finally, deploy with a strategy for intervention. Always think about how the data is going to be used to make decisions.
What the Future Holds [48:09]
And before I cut over to Levi, I just want to talk briefly about what the future holds here.
Closed Loop Architecture™ [48:12]
What we see, of course, is that the clinical workflow engine is still the EHR. Clinicians spend most of their time in the electronic health record, and that is where the insights are going to be delivered to them that influence their care decisions. And in today's world, we hear a lot about SMART on FHIR. This is a technology that allows that EHR workflow to be augmented through web applications, and FHIR is an interface that is designed to sit over the EHR and make it easy to pull live data from the EHR and develop web and mobile applications that, again, augment the workflow of the EHR.
Where we see the analytics environment taking its place is in really providing a lot of power to this idea of putting new applications in the clinical workflow. And the data warehouse really becomes the analytics engine that is driving a lot of the data that shows up in that workflow. So, the data warehouse has a host of different data sources that are driving models. Like I said, I referred to it at the beginning as chock-full of features. We've got registry definitions, we've got text processing NLP algorithms, we've got all of these predictive algorithms that we're generating, almost like an algorithm library, and then we want to expose that through an API such that the real-time data from the EHR through the FHIR interface can be combined with all of that analytic data and then delivered back to the web and mobile applications, to really not only augment their workflow, but augment the data that they are seeing and make it based on a larger repository of highly valuable analytic data.
So I am going to cut over to Levi now. Levi is going to share with you some exciting news about our software that we have developed. We have decided to open source our software, and Levi will walk you through how you can get to our code repository and download the software and use it for your team or yourself.
About HCTools
[Levi Thatcher, Director of Data Science] Hi everyone. Great job, Eric. So, Eric has given a fantastic overview of best practices in predictive analytics. So you are going to ask yourself, well, how can we take it from here? How can we actually do what you guys described in our organization? And that is why we are so excited to open source this, our packages that we have been working on. So it's called HCTools, a global project. You can see here, we are at hctools.org. So if you want to get started today, simply type that in your browser, and this enables you to create models on your data with very simple examples.
Welcome to HCRTools [50:47]
So if you click into the documentation for HCRTools, it very basically describes why it is so great for healthcare, how to install the package, and how to get started with examples. And so, let's take those scenarios that you might be interested in. So say you have a great data set put together, say diabetic data, and you are wanting to predict, say, readmissions. So you can ask yourself, well, how does the tool help me do that?
[51:13]
So if we hop over to RStudio, if you have installed the package, what you do is you type, okay, well, library(HCRTools). So in R, you often load packages and they bring in certain types of functionality, and you simply type ?HCRTools and that will bring up the examples associated with our package. So, nice built-in documentation, you can get it (51:35) to create a model. So once you have this data set, you click on this, say, LassoDevelopment or RandomForestDevelopment link, and that will give you both the descriptions of the arguments and the function, as well as the example code.
Compare predictive models, created on your data [51:50]
So what you can do is scroll down and you can find the built-in data set. So let's say, okay, well, you have your data set, are you ready to go? And so what you do is you just basically grab this example code and open up a new script, copy it in, and hit run.
[52:09]
And what it basically will do, it will tell you, from this particular data set, how well your model did. So you have a Lasso model, which Eric had mentioned, and you see the AUC is 0.86. So that's the measure of the accuracy, say if we're predicting a 30-day readmit flag. And then you have a Random Forest example as well that had an AUC of 0.82, and so you are able to quickly see, okay, well, so this is your data set. We have this particular model that did really well. And so, if that's the Random Forest model, you simply use the documentation to go and deploy that model, and you can see the links there.
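For readers unfamiliar with AUC, the Python sketch below computes it directly from its definition: the probability that a randomly chosen positive case is scored higher than a randomly chosen negative case, with ties counted as half. It is an illustration only and does not use HCRTools; the labels and scores are made up, and the two "models" are just lists of hypothetical predicted probabilities.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Made-up readmission flags and predicted probabilities from two models.
readmitted = [1, 1, 0, 0]
model_a_scores = [0.9, 0.8, 0.3, 0.1]
model_b_scores = [0.9, 0.2, 0.3, 0.1]

auc_a = auc(readmitted, model_a_scores)
auc_b = auc(readmitted, model_b_scores)
```

On this toy data, model A ranks every readmitted patient above every non-readmitted one, while model B gets one of the four pairs wrong, so A would be the candidate to deploy. An AUC of 0.5 corresponds to random ranking, which is why values such as 0.86 and 0.82 are read as reasonably strong.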
[52:46]
So as you have your data put together, this is the website. Reach out. Let us know what you are working on and how we can improve these tools, because we really want to build a place where we can all collaborate and build something that helps everyone in healthcare.
[Eric Just] Thanks, Levi. So again, the URL to go to if you want to download the software is hctools.org. We will be renaming and redirecting the URLs at some point to a more marketing-friendly name. Our marketing team has informed us that this is the kind of name that you get when you get a bunch of data scientists into a room to name a product. So we will be renaming it eventually, but for now, hctools.org. Please go there and give us your feedback on the software. If you want to, download it and play with it, and make it part of something that you use in your organization. We are happy to answer questions and provide that tool for you.
[Tyler Morgan] Alright. That's great. Thanks, Eric. We are about ready for our Q&A time. We've got some good questions in.
How interested are you in someone from Health Catalyst® contacting you about a demonstration of our solutions? [53:54]
But while we have these questions in, we do have a final poll question for you. While these webinars are intended to be educational, we have had many requests for more information about Health Catalyst, who we are, and what we do. If you are interested in having someone from Health Catalyst reach out to you to schedule a demonstration of any of our solutions, please answer this poll question now.
QUESTIONS AND ANSWERS
Why are people not yet using predictive analytics in healthcare? Why does healthcare seem to be behind other industries, whether it may be contract risk or other reasons?
I think a lot of it comes down to the risk (54:33). It is riskier to start, and I think that we are very risk-averse, of course for a good reason, a risk-averse industry. And I will get the predictive analytics that are delivered to me in Netflix. They are the wrong analytics. I do not care about those cartoons that are being suggested to me. And all of that has to do with assumptions that we are making on the data, and I think healthcare of course still has a lot of issues to work out with trusting the data and the underlying quality of the data. So organizations, I think, who are wary of using predictive analytics may also be wary of their underlying data quality. Data governance is a topic that we hear about a lot lately, and helping organizations to actually improve the quality of their data as part of the data strategy is going to be very important for the increased adoption of this technology and these algorithms.
What are the system requirements to use the HCTools, the HCRTools set that you've got?
If you have R installed on your machine, you can simply visit hctools.org and follow the quick install guide. Just a few simple commands and that should be up and running.
Can you integrate our models developed outside of your organization or outside of Health Catalyst®?
Absolutely. The code base is designed to allow you to develop your own models. So, as long as you get your feature inputs in such a way that the software can use them, the software will help you develop your own models and either make those available to somebody publicly or just leave them in your own environment. The tool definitely supports running and creating models with Health Catalyst, as well as creating your own.
Is there an API that you have to deploy?
No. No API is necessary. So basically you install our package and you can use the built-in documentation or the documentation on the web pages here, and there is some built-in data, actually, that you can use to start with. So the examples run out of the box, and then you can use those examples and tailor them to your specific data, your tables and your databases, etc.
Can you please share some of your experience in terms of demonstrating the value of predictive analytics?
Absolutely. So, whenever we develop models and deploy them with a customer, one of the things that we track is outcomes. And in each of our applications, we have tools to track the variation in those outcomes. And that's another area where a data scientist and an analyst are very helpful, in helping to understand that trend, and is that trend actually going down since the implementation of that predictive model. So we do have ways to measure that. The other thing that we do is we make sure that as we are deploying them, there is an organizational understanding of how to use them. That is actually very critical in creating that value.
If I'm not using the Health Catalyst data warehouse and BI platform, can I still utilize these models?
Yeah. For sure. So it is very flexible. So if you have, say, (57:59) files or a database, you can connect to any of those. But we want to be clear, you can use the software. The specific models, such as readmission models, do not come with the software. The software is just for running and creating those algorithms.
[Tyler Morgan] Alright. Well, thank you so much, Eric. Thanks, Levi. We would like to let everyone know, shortly after this webinar, you will receive an email with links to the recording of the webinar and the presentation slides. We have the link to hctools.org and also (58:28) download. Also, please look forward to the transcript notification that we will send you once it is ready, along with special invitations to the upcoming webinars in the predictive analytics webinar series.
On behalf of Eric Just, Levi Thatcher, as well as the rest of us here at Health Catalyst, thank you for joining us today. This webinar is now concluded.
[END OF TRANSCRIPT]