Machine Learning Misconceptions May 3, 2017

Machine Learning Misconceptions May 3, 2017 [00:07]

[Michael Mastanduno, PhD] Thanks Tyler. So my name is Mike Mastanduno. Levi Thatcher will join me. Levi.

[Levi Thatcher, Director of Data Science] Yes. I am Levi Thatcher, Director of Data Science. I am excited to see what we have here, Mike.

[Michael Mastanduno] Great. So the way we are going to do this today is, you know, we want to keep it kind of conversational. We want to interact with you guys. So, ask us questions. We will try and field them as we go. It is really interesting to see all the data analysts and scientists in the audience. Hopefully, we can clear up some of these machine learning misconceptions across the whole gamut of data-hungry folks.

Data Science team [00:46]

So we are a part of the Health Catalyst® Data Science team. We are four on the team so far and soon to be five. We just hired a promising young scholar from UC Davis. So, we are really excited about that. And the purpose of our team's work is to build out (01:04) improvements and we do that through machine learning. So, we feel like we have some understanding of how machine learning works in healthcare and how it can be utilized, as well as implementation experience. So, we will go through some of what we have learned in the past and you can draw on our experience and hopefully we will all come out with a little bit more information.

Purpose of Today's Chat [01:29]

So the purpose of today's chat is, first, just to compare and contrast machine learning and artificial intelligence. We hear a lot about these two terms and we just want to make sure that at least the listeners are on the same page with what is realistic and what is not. We want to discuss techniques. So, machine learning algorithms that offer feedback into the system and when it is necessary to retrain the model. You know, we have had people ask us, does the model learn continuously, and the answer is sometimes, but we will go through the details on that.

We want to give advice on how to avoid common pitfalls in implementing machine learning, especially in healthcare. And then we want to talk about potential applications of the different types of machine learning algorithms, just so that we are all familiar with those and what to expect.

And then finally, you know, as we said, we will reserve some time for Q&A at the end, but we definitely have plenty of time for questions during the talk. So feel free to add them to the chat and we will be monitoring them as the broadcast goes on.

Machine Learning Definition [02:42]

Alright. So let us start with machine learning definitions. And you know, we have tons of experience, which is why we are going to go straight to Wikipedia just to see what is the highest level definition for machine learning, and it is a subfield of computer science that gives computers the ability to learn without being explicitly programmed. And such algorithms, you know, they overcome strictly static program instructions by making data-driven predictions or decisions through building a model from sample inputs. So that basically means that you use an algorithm to teach a computer to learn from data and then it can use those learnings in the future to make predictions.

[Tyler Morgan] Mike, we have an interesting question again related to that. "Machine learning is mainly for predictive analytics? True or False?"

[Michael Mastanduno] Machine learning, yeah, it feels like machine learning is mainly for predicting things but it does not necessarily have to be. Prediction is kind of a, you know, you are not necessarily going to be predicting whether something is going to happen. You could be kind of scaling things by whether someone will like it or how likely someone is to click on something, things like that. So we can get into that a little bit more in the applications.

[Tyler Morgan] Yeah. Sounds great.

[Michael Mastanduno] Okay.

Machine Learning Typical Use [03:59]

And so, you know, what are some typical uses of machine learning, like you said? Movie recommendations on Netflix, that is kind of predicting the likelihood that somebody is going to like a movie. People you may know on Facebook, they are going to try and show you friends or potential friends that you might know. Advertising, advertising is currently built on machine learning. If we took that out, we would not have advertising anymore. And now we get into healthcare stuff: a patient's likelihood of contracting sepsis or being readmitted. And then really any tabular data source to predict either a yes/no outcome or a continuous outcome. And so, you can think of a lot of ways to use EMR data to predict a yes/no outcome or a continuous outcome. For example, yes or no, will a patient contract sepsis? Yes or no, will a patient be readmitted?

Artificial Intelligence Definition [04:57]

So now, let us move on to artificial intelligence. Again, the Wikipedia article says that artificial intelligence is intelligence exhibited by machines. And so, the field of AI research defines itself as the study of intelligent agents, meaning any device that perceives its environment and takes actions that maximize its chance of success at some goal. And I think that is a pretty good definition. But you might say it is kind of similar to machine learning, don't you think?

[Levi Thatcher] There is a lot of overlap.

[Michael Mastanduno] Yes. And so, maybe let us dive in a little deeper. Yann LeCun, Director of Facebook AI Research, said that machine learning models are limited in their ability to reason, which means carrying out long chains of inferences or optimizations to arrive at an answer. Basically, the way humans think is kind of what is missing in traditional machine learning. And the number of steps in the computation is limited by the number of layers in feed-forward nets, or by the length of time a recurrent net will remember things. And those are AI techniques, but he is basically saying that there is some jump between what we are calling machine learning, neural networks, and even deep learning, and what is the very most cutting edge of the deep learning techniques.

[Levi Thatcher] So machine learning is more focused on one particular task, learning about one relationship.

[Michael Mastanduno] Yes, that is exactly right. So machine learning is, you know, you feed it data and then you ask it to predict the likelihood of sepsis, but it is not going to be able to say, based on this data, this patient is going to get sepsis, but hey, also, you know, look out for CLABSI.

[Levi Thatcher] Exactly.

[Michael Mastanduno] It would be a different model that would be required for that. Now, AI, on the other hand, or a doctor, you know, not an artificial intelligence, might be able to make that connection, but it is still at the cutting edge.

Artificial Intelligence Typical Use [06:46]

So, what are the typical uses of artificial intelligence? Well, speech translation is a big one; complex game playing, like Google's group just learned how to play Go; self-driving cars are considered artificial intelligence because they are taking scenarios they have seen and applying them to scenarios they have not seen; and then content delivery and complex systems, and maybe even radiology. There have been a lot of clinical academic studies concerning radiology using deep learning, artificial intelligence, call it what you want.

Difference Between ML and AI [07:30]

I think one interesting thing about artificial intelligence, and I believe I pointed this out the other day, is that the main difference is that it feels like machine learning is something we do understand, and artificial intelligence is something we are still, as a society or as a field, kind of wrestling with the actual understanding of. We do not fully understand why neural nets work as well as they do, but we have hundreds of conferences every year on what the best way to do it is. So our understanding is getting there.

[Levi Thatcher] Yes. So to the future stuff, we do not quite know how (08:05) we call AI, it seems like.

[Michael Mastanduno] Yes. That is sort of what it seems like to me.

[Levi Thatcher] Yes. So we have a couple questions related to that. So one person says, "Are neural nets the same as machine learning?"

[Michael Mastanduno] Yes, and I would say they are. They are an algorithm that is going to be applied to data, and they are going to be able to produce an outcome, yes or no, or predict a continuous value. Now, they have a lot more flexibility than a more traditional algorithm does because they are much more complex to implement, but as far as what they are doing, yes, they are taking a data set, they are learning from it, and then they are making predictions on a set number of variables.

[Levi Thatcher] That is a great explanation. So a few questions in the same space. So, I believe the same person also asked if deep learning is the same thing as using neural nets. Those two are related, yes?

[Michael Mastanduno] Yes. So, in that sense, neural nets is kind of the umbrella term that deep learning falls under because, like I said, neural nets have a lot of different configurations and architectures, as they are called. And so, sometimes you can do a simple one and we would just call that a neural net or multilayer perceptron. Sometimes you can do a very complicated architecture with many classes and many layers and that would qualify as deep learning.

[Levi Thatcher] That was a good explanation. One person asked about cognitive computing, and it is kind of a newer term, in terms of the Venn diagram, how it falls in with AI. I do not know if it is fully fleshed out or…

[Michael Mastanduno] I am not sure where that one would fit.

[Levi Thatcher] Yes, it is somewhere in there.

[Michael Mastanduno] It feels like AI, but I feel like one of the things we are trying to get across at this webinar is that all the different ways you can talk about this, machine learning, AI, predictive analytics, cognitive computing, they all mean more or less the same thing: using math to learn about data and make decisions.

[Levi Thatcher] Yes.

[Michael Mastanduno] So let us go through the main differences between AI and machine learning in our mind. Well, the first is that it is fuzzy. I hope we have conveyed that one person's AI is the same as another person's machine learning. Learning from data? You know, not really. That is not really the difference because they both do that. What about continuously learning from data? Again, not really, because both classes of algorithms can do that. AI feels more complicated. Would you agree with that?

[Levi Thatcher] Yes. Yes, definitely.

[Michael Mastanduno] I think the big difference is this one though, that AI should be able to learn a skill and generalize it to an entirely different thing and that is kind of the pipe dream of AI.

[Levi Thatcher] That is a great point. It hits the nail on the head. And one person wonders, so do both these technologies use the same kind of data?

[Michael Mastanduno] Yes, they do. They can. They can and they do. I think AI works better with computer vision. So, image processing, video processing, extracting information out of video, which is why you see self-driving cars being classified as AI, or speech recognition. You know, that is a continuous stream of frequencies in your voice, which is analyzed continuously, and that is kind of why AI falls into that realm. But again, I think many AI ideas, like computer vision, they just get rebranded as (11:42) as time goes on and we begin to understand them more, and then…

[Levi Thatcher] Yes. Just stay tuned.

[Michael Mastanduno] Yes. And then as we understand the meaning more and more, people forget about them and just say, oh, it is just math.

[Levi Thatcher] Yes, for sure.

Poll Question ML Usage [11:57]

[Michael Mastanduno] So, with that, we have our first poll question on machine learning usage.

Poll #1: Have you ever used machine learning or AI? 148 respondents [12:04]

[Tyler Morgan] Alright. We have got that poll question up. Have you ever used machine learning or AI? Please select one of the following: yes, in my daily work; yes, as a hobby; no, but I plan to; or no, not applicable to me.

So we will leave this up for a few moments and give everyone a chance to respond. We would like to thank everyone for their great participation and remind everyone to continue to type in your questions and comments in the questions pane of the control panel.

Alright. Let us go ahead and close this poll. Let us share the results.

Poll Results [12:33]

Alright, 21 percent responded yes, they use machine learning or AI in their daily work, 17 percent as a hobby, 52 percent no, but they plan to, and 9 percent have responded no, not applicable.

[Michael Mastanduno] I think that is a really good indication of kind of where the market is going. Don't you, Levi?

[Levi Thatcher] Yes, a lot of practical folks here are excited to get their hands dirty.

[Michael Mastanduno] Yes, I think that is great because all the other industries get to benefit from machine learning and healthcare might as well also. So that is great to see. All you people who are doing machine learning as hobbies, try and get it into your jobs. It can be a great side project and really exciting stuff.

Actually, before we move on, we had someone commenting that Waze is a great example of AI, with the reason being that it can tell you what time you should leave the office based on traffic. And you know, I think that could definitely be considered AI, you know, you are using Waze, but under the hood, I think Waze is just kind of checking the traffic situation and comparing that to what your normal commute time is like and then giving you a suggestion if it is going to be any different than normal. So, I do not know. You tell me, is that AI or is that just kind of making a prediction and checking versus the normal?

[Levi Thatcher] Yes. Yes. It definitely (13:57) together.

How is machine learning used? [14:01]

[Michael Mastanduno] Okay. So how does Health Catalyst® use machine learning? (14:06) put together this great diagram for us, and while it is a little busy at the moment, it does have a ton of great information. So, if we follow along the bottom row, we see the patient's flight path. So, the patient comes into the hospital and is identified for a cohort, such as being treated for COPD, which is a lung disease. And then at that time we want to give the physician a prediction of whether that patient will be readmitted or not readmitted based on all of the data that is in the EMR about that patient as well as other patients who either have or have not been readmitted. And so, to build that model, we are going to use this open source healthcare.ai package that our team has developed and data from the hospital's past patients. And you know, build that historical data set of all these patient characteristics and interventions and outcomes data and use those data, or features, in an algorithm to combine into a model. So the data plus the algorithm becomes a model. And we will call it machine learning at this point because we are using fairly simple but effective predictive algorithms. And so, the outcome of that model is a probability for each patient of being readmitted, which is exactly what the physician would get when they want it.
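
To make "the data plus the algorithm becomes a model" concrete, here is a minimal sketch in Python. It is not the open source package mentioned above and not Health Catalyst's actual model. The historical rows, the two feature names (prior admissions, lives alone), and the hand-rolled logistic regression are all assumptions made up for illustration. The point is only the shape of the workflow: past patients with known outcomes go in, and a readmission probability for a new patient comes out.

```python
import math

# Hypothetical historical data (invented for illustration):
# (prior_admissions, lives_alone) -> readmitted (1) or not (0).
HISTORY = [
    ((0, 0), 0), ((1, 0), 0), ((0, 1), 0), ((1, 1), 0),
    ((3, 0), 1), ((4, 1), 1), ((2, 1), 1), ((5, 0), 1),
]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(rows, learning_rate=0.1, epochs=2000):
    """The 'algorithm': fit weights and a bias by plain batch gradient descent."""
    n_features = len(rows[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for features, label in rows:
            p = sigmoid(bias + sum(w * x for w, x in zip(weights, features)))
            error = p - label
            for i, x in enumerate(features):
                grad_w[i] += error * x
            grad_b += error
        weights = [w - learning_rate * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(rows)
    return weights, bias


def predict_probability(model, features):
    """The 'model' in use: turn one patient's features into a probability."""
    weights, bias = model
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


# Data + algorithm = model.
model = train_logistic_regression(HISTORY)

low_risk = predict_probability(model, (0, 0))
high_risk = predict_probability(model, (4, 1))
print(f"patient with no prior admissions: {low_risk:.2f}")
print(f"patient with four prior admissions: {high_risk:.2f}")
```

In a real deployment the features would come from the EMR and the algorithm from a tested library; the sketch only shows that the physician-facing output is a per-patient probability rather than a yes/no answer.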

Now, after the physician sees that, oh, hey, Joe is really likely to be readmitted, they can do something to intervene, give Joe a little extra attention or maybe dig into why he might be readmitted and uncover another disease that he has and treat for that or whatever clinicians do. I am not a doctor. Well, I am a doctor but not that kind of doctor.

[Levi Thatcher] We are not the kind that helps people.

[Michael Mastanduno] Yes. Well, maybe we are.

[Levi Thatcher] You know, (16:04). So somebody asked related to this, what about the black box problem? So how do we help clinicians understand what to do about a high-risk patient?

[Michael Mastanduno] Yes, that is a great question. I think machine learning has this stigma of being uninterpretable, and that is definitely true in the most advanced algorithms and the deep learning and the AI. We do not even understand how it works, much less how to interpret it. But a lot of these simpler models, especially the ones that the healthcare.ai package is built on, are very well understood and they are very interpretable, and with the right visualizations, you can really get a feel for why a model has made the prediction that it has and also how to change that prediction for the better. And so, we focus a lot of our attention on interpretability and trust in the models because we know that that is important to adoption. So thanks for that great question.
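
One way to see why simpler models are interpretable: in a linear model such as logistic regression, each feature adds its weight times its value to the log-odds, so a prediction can be broken down feature by feature. The sketch below assumes hypothetical weights and feature names (they are invented, not taken from any real model) and ranks the factors behind one patient's score. It also shows how changing a modifiable factor would move the prediction.

```python
import math

# Hypothetical fitted weights, in log-odds per unit of each feature.
# Names and numbers are invented for illustration only.
WEIGHTS = {"prior_admissions": 0.8, "lives_alone": 0.5, "on_inhaler_plan": -1.2}
BIAS = -2.0


def explain(patient):
    """Return the predicted probability plus each feature's log-odds contribution,
    sorted from the largest risk-increasing factor to the most protective one."""
    contributions = {name: WEIGHTS[name] * patient[name] for name in WEIGHTS}
    log_odds = BIAS + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-log_odds))
    ranked = sorted(contributions.items(), key=lambda item: item[1], reverse=True)
    return probability, ranked


joe = {"prior_admissions": 4, "lives_alone": 1, "on_inhaler_plan": 0}
probability, ranked = explain(joe)
print(f"readmission probability: {probability:.2f}")
for name, contribution in ranked:
    print(f"  {name}: {contribution:+.2f} log-odds")

# "How to change that prediction for the better": flip a modifiable factor.
with_plan, _ = explain({**joe, "on_inhaler_plan": 1})
print(f"with an inhaler plan: {with_plan:.2f}")
```

This kind of breakdown is only exact for linear models; more complex models need approximate explanation methods.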

And then just to finish up this slide, after the patient gets the actual outcome, that data is fed back into the model and the model is updated with the most up-to-date data. And that is kind of how we envision it, at least for COPD-related readmissions. That is kind of what a potential model building framework might look like and how it fits into the clinical workflow.

Poll question Where's your org? [17:27]

With that, we have another poll question.

Poll #2: Where is your organization in terms of using machine learning in regular operations? 138 respondents [17:31]

[Tyler Morgan] Alright. Where is your organization in terms of using machine learning in regular operations? Are you using machine learning tools daily across many departments, daily across a couple of use cases, confined to a research study or two, or what is machine learning?

We will leave this open for a few moments. Are there any questions that have come in that we can respond to while folks are answering this poll question?

[Levi Thatcher] A good point. There are. So, in this workflow you just presented, folks have asked, is there a success story, or what are the use cases that Health Catalyst® has been working on using that workflow? Yes, so I think the best success stories we have currently are on CLABSI prediction. And so, there we were able to build an app to help deal with CLABSI cases and include a predictive model in that app and measure an outcomes improvement of around 20 percent, and you can read about that success story on our website. We have several other models that are in testing or evaluation periods. So they are in production at hospitals: COPD readmissions, and we have a no-show model. And preliminary results on those look promising, as far as outcomes improvements go, but we are still waiting for the data versus the control group to be able to write up our success story and put it online.

[Michael Mastanduno] Yes. There we go.


[Tyler Morgan] Well, the response to our poll question: 13 percent responded they are using machine learning tools daily across many departments, 17 percent responded daily across a couple of use cases, 49 percent responded confined to a research study or two, and 21 percent responded what is machine learning.

[Michael Mastanduno] Well, that is great to see that machine learning has made it into at least the research studies. When I wrote that question, I kind of phrased it from most usage to least usage. And so, I think as time goes on, we are going to see machine learning kind of integrating more and more deeply into clinical use, which is great, but it has to start in research. So, it is not surprising to see a lot of research usage.

When does a model learn? [19:52]

Alright. So the next thing is we wanted to discuss kind of when a model learns, and the answer is that different algorithms learn at different times. So, remember we need to combine data with an algorithm to get a machine learning model, but depending on our choice of algorithm, when it learns is going to be really different. And so, we have a few different classes of algorithms. There are algorithms that learn only during training. And so, there are a few things like logistic regression, a classic statistical technique, which learns once when the model is trained, but keep in mind that that data could go back five years. So, all at once, that model might be learning from five years' worth of data and then applying it to the future data. Random forest is another popular ensemble-based method that learns once, but again five years' worth of data could be going through that algorithm in just a handful of minutes and it learns from that. And then finally, clustering and this whole bunch of unsupervised techniques that learn once and then apply their knowledge.

You can also get models to train periodically after new data comes in. So, an example of that might be any of the above but with a more complex implementation, so you can update the models as new data comes in. Naïve Bayes is one that is well suited to that, and then also neural networks, again, are very well suited to that just because of the way the standard implementation usually works. And then deep learning as well, which is kind of the same as neural networks but with a more complex architecture.

And then there is this class of models that learns continuously as the new data comes in. And so, again, any of the above could really be used that way but it is going to be a more complex implementation.
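
The difference between learning once and learning as data arrives can be sketched with Naïve Bayes, which stores nothing but counts. The class below is a simplified, hypothetical implementation written for this illustration (categorical features, Laplace smoothing), not a library API. Training on all rows at once and updating one row at a time end up with the same counts, which is why this kind of algorithm is well suited to incremental updates.

```python
import math
from collections import defaultdict


class CountingNaiveBayes:
    """Naive Bayes over categorical features, stored purely as counts.

    Learning from one more labeled example only increments a few counters,
    so no full retraining pass over the historical data is needed.
    """

    def __init__(self):
        self.class_counts = defaultdict(int)
        self.feature_counts = defaultdict(int)  # (label, index, value) -> count
        self.values_seen = defaultdict(set)     # index -> distinct values seen

    def update(self, features, label):
        """Incremental learning: fold a single new example into the counts."""
        self.class_counts[label] += 1
        for index, value in enumerate(features):
            self.feature_counts[(label, index, value)] += 1
            self.values_seen[index].add(value)

    def fit(self, rows):
        """Batch learning: one pass over all historical rows."""
        for features, label in rows:
            self.update(features, label)
        return self

    def predict(self, features):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / total)
            for index, value in enumerate(features):
                seen = self.feature_counts.get((label, index, value), 0)
                n_values = len(self.values_seen.get(index, ())) or 1
                # Laplace smoothing so an unseen value does not zero out the class.
                score += math.log((seen + 1) / (count + n_values))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented rows: (risk_band, lives_alone) -> outcome.
ROWS = [
    (("high", "yes"), "readmitted"),
    (("high", "no"), "readmitted"),
    (("low", "no"), "not_readmitted"),
    (("low", "yes"), "not_readmitted"),
    (("high", "yes"), "readmitted"),
    (("low", "no"), "not_readmitted"),
]

# Learn once: all historical data goes through the algorithm in one pass.
batch_model = CountingNaiveBayes().fit(ROWS)

# Learn as data arrives: start from older rows, then update record by record.
online_model = CountingNaiveBayes().fit(ROWS[:3])
for features, label in ROWS[3:]:
    online_model.update(features, label)

same_counts = batch_model.feature_counts == online_model.feature_counts
prediction = batch_model.predict(("high", "no"))
print(same_counts, prediction)
```

Algorithms like logistic regression or random forest can also be refreshed with new data, but as noted above that usually means a more complex implementation or a full retrain.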

When should a model be retrained? [21:50]

So, with that, the question then is when should a model be retrained? Do you need continuous retraining or is it enough to retrain every couple of weeks, every couple of months?

[Levi Thatcher] Fantastic question.

[Michael Mastanduno] And so, one reason to retrain a model is after significant data turnover. Maybe you rebuild the source data and it no longer agrees with the way the model was originally trained. You should probably retrain the model. Another one is if performance in production drops over time. So when you build your model, you get a performance score, how good it is on the past data, and then you want to monitor that performance over time in the production environment. So do you know how your model is performing versus when it was trained? And so if you see that dropping down a significant amount as time goes on, then maybe it is time to retrain based on the new data. And you know, that could be caused by something like seasonality, maybe the holiday season is really different from a data perspective and the model relies heavily on temporal features, or maybe treatment methods are changing and being adopted and the model is no longer as accurate as it used to be. Another reason is if new features or techniques are identified. Maybe in development you would get much better performance out of a feature that you did not have previously, and you should retrain and put that in production. And then finally, if the use case changes. And you are going to get sick of me saying this, but I really want to stress that understanding the use case is the most important thing you can do when building a machine learning model. So, if your use case changes, your model is probably not the best model anymore and it should be retrained with the new use case in mind.
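
The "performance in production drops over time" trigger can be written down as a simple check. This is a sketch under assumptions: the function name, the 5 percent relative drop, and the three-period window are illustrative choices, not recommendations from the speakers, and the scores are made up.

```python
def should_retrain(baseline_score, recent_scores, max_relative_drop=0.05, min_periods=3):
    """Flag retraining when recent production performance falls well below
    the score measured when the model was built.

    baseline_score: metric at training time (for example, AUC on held-out data).
    recent_scores: the same metric for recent production periods, oldest first.
    max_relative_drop, min_periods: illustrative defaults, to be tuned per use case.
    """
    if len(recent_scores) < min_periods:
        return False  # not enough production evidence yet
    window = recent_scores[-min_periods:]
    recent_average = sum(window) / len(window)
    return recent_average < baseline_score * (1.0 - max_relative_drop)


# Invented monthly scores for two hypothetical models.
stable = should_retrain(0.82, [0.81, 0.83, 0.80, 0.82])
drifting = should_retrain(0.82, [0.81, 0.78, 0.75, 0.73])
print(stable, drifting)
```

Averaging over a few periods keeps one noisy month from triggering a retrain. The other reasons listed above (rebuilt source data, new features, a changed use case) are judgment calls that a score threshold will not catch.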

[Tyler Morgan] Related to that point, somebody asked, "For clinical decision support, are we talking about machine learning or AI?"

[Michael Mastanduno] That is a good question. We could be talking about either, I think. Clinical decision support, I think, is a great application for traditional machine learning, where you get a surgery consult and they say, "Oh, the risk of the surgery is high." And the doctor, the surgeon, is coming to that decision based on the patient's age and their health and the benefits they might get from it and kind of a gut feeling that they have around performing that surgery. And so, that would be a great opportunity for a machine learning model to pull everything from the EMR and look at patients who have been successful with that surgery versus patients who have not and give a risk score. You know, the surgeon is not going to look at that risk score and just say, "Oh, you have a 75 percent chance of doing fine through the surgery," but it might help guide their decision.

[Levi Thatcher] If you go to one domain (24:52).

[Michael Mastanduno] Yes. Whereas, I think AI for clinical decision support, that is still a little further off as far as kind of the holistic picture. Basically, I do not think doctors and clinical staff are going to go anywhere anytime soon, and AI is not going to be replacing them anytime soon. Maybe we will get self-driving cafeteria carts, but at the moment the market really is not, even the research is not really focused on AI for clinical decision support.

[Levi Thatcher] So I like how you said that we need a use case. So we need a business question to drive this. And related to that, somebody asked, "What was the best way to sell an executive on using machine learning for this business question that you found?"

[Michael Mastanduno] Well, maybe let me go into pitfall #1 and we can talk about it there.

Pitfall 1: Poorly Defined Use Case [25:48]

So, pitfall #1, I put it front and center because I think it is the most important thing, the most important reason that machine learning fails, and that is a poorly defined use case. And so, what that is going to lead to is incorrect usage of data fields, you are going to be looking at unavailable data, you are not going to get any adoption. And so, how do you correct that? The use case should always be the first priority. And so, you want to be asking kind of what is the question that this predictive model is going to help answer and what is the impact that answering that question will have. We want to know who the users are, we want to know when the users are going to be using it, and we want to know how they are using it. You have to understand all of these things to be able to implement an effective machine learning model. And so, you know, I saw an article recently about how machine learning engineers make good product managers, and it is because they think about this kind of stuff if they are doing it right: what is the question, who are the users, how are they using it, and when are they using it. You have to know those things and be able to really sell them to understand if the model is going to make an impact.

[Levi Thatcher] Yes, it seems like that gets to that executive question. So if you really know what business question you are dealing with, and you know what business question machine learning will help solve, that gets you a long way towards being able to describe to the executive why machine learning could solve it.

[Michael Mastanduno] Yes, and I would say, you know, if you are going to sell it, you can use your data to say, hey look, we have this problem, we can build a machine learning model to affect it or to solve it, and if we were to have some amount of success, we are going to impact it by this much and here is how we do it. The question is this exact thing, it is going to fit in and be used by this group of clinicians, they are going to use it during the patient admit cycle, and they are going to use it to benefit their clinical decisions while the patient is an inpatient. So if you can build up that whole story by really understanding the use case, I think that is a better way to sell it than just saying, you know, hey, machine learning is new and we should do it and let us do it, because you will fall into pitfall 1 if you do that.

[Levi Thatcher] That is a fantastic pitfall.

Pitfall 2: Production Environment is Different [28:25]

[Michael Mastanduno] The second pitfall is that the production environment is different, and I say this because, as we saw in that poll question, we had a lot of people saying that most machine learning usage in hospitals right now is kind of confined to academic one-off studies. And so, I think it is safe to say that usually those studies are going to be coming from a static database or CSV file or some kind of non-changing data set. And you have to remember that when a model is in production, that is going to be quite different because that data set is not static, it is being updated more or less continuously as the EMR updates, and if you are not ready for those additional challenges, your model is not going to be as good as it was when you were playing with it on your laptop. And so, some of the reasons for that: you know, data might not be available in the production environment; the timing of the data might lead to target leakage, meaning the outcome is hidden in the input variables if you are not careful, and that can come entirely from timing; or your predictions are going to be made multiple times per patient, depending on how you set up your production implementation. And so, the way you can fix those is by knowing your data. You can learn how your data is populated over time. Only train your model with what is available at the time of prediction, and you might go back and say, like, oh, those are kind of similar to the use case, aren't they? You have to know when the prediction is going to be made to know what is available at the time of prediction. And then learning how your data is populated over time, you need to know which features are available, when, and if they are going to matter in the use case that you are trying to accomplish, and I guess that is why I have said, you know, know your use case.
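
"Only train your model with what is available at the time of prediction" can be enforced mechanically if each value carries the time it was recorded. The sketch below assumes a hypothetical list of EMR-style events with invented field names. It keeps only what existed at the prediction time, so later fields such as discharge disposition, or the outcome itself, cannot leak into the inputs.

```python
from datetime import datetime

# Hypothetical events for one patient; names, values, and times are invented.
EVENTS = [
    {"name": "admit_diagnosis", "value": "COPD", "recorded_at": datetime(2017, 3, 1, 8, 0)},
    {"name": "oxygen_saturation", "value": 91, "recorded_at": datetime(2017, 3, 1, 9, 30)},
    {"name": "discharge_disposition", "value": "home", "recorded_at": datetime(2017, 3, 4, 15, 0)},
    {"name": "readmission_flag", "value": 1, "recorded_at": datetime(2017, 3, 20, 11, 0)},
]


def features_available_at(events, prediction_time):
    """Keep only values that had been recorded by the time the prediction is made.

    If several events share a name, the latest one at or before the cutoff wins.
    """
    available = {}
    for event in sorted(events, key=lambda e: e["recorded_at"]):
        if event["recorded_at"] <= prediction_time:
            available[event["name"]] = event["value"]
    return available


# Assumed use case: the score is shown to the physician on the day of admission.
at_admission = features_available_at(EVENTS, datetime(2017, 3, 1, 12, 0))
print(sorted(at_admission))
```

Building the training set with the same cutoff rule as production is what keeps the laptop results and the production results comparable. The cutoff itself comes from the use case: when the prediction is made and who sees it.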

[Levi Thatcher] Super important. Yes. So, speaking of which, folks have asked, who defines that use case or who defines the business question? Is that more the executive or the data scientist or analyst themselves? Any advice on that, friend?

[Michael Mastanduno] Yes, I think that could go, you know, you could do that in a bunch of different ways. I can talk about how it has worked at Catalyst in the past. Sometimes we have a clinical team who is really excited about trying a machine learning use case and they say, oh man, in the heart failure unit, we are having a lot of trouble, or we seem to have a lot of readmissions, and could we do anything about that from a machine learning perspective? And so, there you go, heart failure readmissions model. The clinical staff is excited about it. That is important for adoption. So, you have cleared your first hurdle and then you can sit down and really talk about how the model is going to be used and what they want out of it and define that business case really well.

Other ways, sometimes it comes from the top down. Sometimes it is executives just kind of saying, hey, we have a problem with sepsis patients and we would save a lot of money if we could give better care around sepsis. So, aside from the latest clinical best practices, what can we do? If that is really a problem for the hospital versus the national standards, it makes sense to go after the sepsis model from that perspective. So it can really come from anywhere. But I think knowing the data and kind of knowing where, as an organization, your shortcomings really are is important to where you are going to be able to make the most impact.

[Levi Thatcher] Knowing the business (32:16) clear advice.

[Michael Mastanduno] Yes. Okay. Let us see.

Pitfall 3: Bad Performance Metrics [32:20]

Moving on to pitfall 3, bad performance metrics. That is another one that is harder. So, I like to think of traditional statistics as doing all sorts of data preparation and important stuff beforehand and then building the statistical models to make predictions or regression predictors, and machine learning is the new kid on the block and standards are a little different. So people are not quite as stringent about data preparation or about whether there is causality, and that is fine from a machine learning model perspective, but where it bites you is if your performance metrics are bad. So you might have a model that is 99 percent accurate but it did not find any sick people, just because there are not very many sick people. And so, that is why you see statistics like positive predictive value being quoted in medicine, because it is a better metric. So, again, you see these things really crop up when the classes are imbalanced, and that is an incredibly common thing in medicine, which we really have to account for in our machine learning models. And then finally, with performance changing over time, you need to pick a metric that is going to show drops in performance as time goes on. So if you have a very static metric, you are not going to be able to monitor effectively.
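
The "99 percent accurate but it did not find any sick people" case is easy to reproduce. The sketch below uses an invented cohort of 1,000 patients, 10 of them sick, and a model that always predicts "healthy". The function and variable names are assumptions for illustration.

```python
def evaluate(actual, predicted):
    """Return accuracy, sensitivity (recall), and positive predictive value.

    A metric is None when its denominator is zero, for example PPV when the
    model never predicts the positive class.
    """
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if (tp + fn) else None,
        "ppv": tp / (tp + fp) if (tp + fp) else None,
    }


# Invented cohort: 1,000 patients, of whom 10 are sick (label 1).
actual = [1] * 10 + [0] * 990
always_healthy = [0] * 1000

result = evaluate(actual, always_healthy)
print(result)
```

Accuracy rewards the model for the 990 easy cases, while sensitivity shows it caught none of the sick patients and positive predictive value cannot even be computed because it never raised a flag.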

And so, basically the way you address these is do not use accuracy. Use area under an ROC curve or a precision recall curve as your metrics, and there are blog posts on the healthcare.ai™ website that you can read about the difference between these things and accuracy. When you are dealing with imbalanced classes, it makes sense to use sampling methods to either over-sample the underrepresented class or under-sample the overrepresented class, and that can help build a model that is more sensitive to the disease state that is less represented. And then you want to monitor the correct performance metric over time. So if you are evaluating on precision recall in training, that is probably what you should be watching over time.
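One of the sampling methods mentioned here, random over-sampling of the underrepresented class, might look like the sketch below. It assumes binary labels (1 = disease state) and uses only the standard library; `oversample_minority` is a hypothetical helper name, not a function from healthcare.ai.

```python
import random


def oversample_minority(rows, labels, seed=0):
    """Randomly duplicate minority-class rows until both classes are the same size."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    if not minority:
        raise ValueError("need at least one example of each class")
    # Draw extra minority rows (with replacement) to close the gap.
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = pos + neg + extra
    rng.shuffle(idx)
    return [rows[i] for i in idx], [labels[i] for i in idx]


# Illustrative data: 100 patients, 5 with the disease state.
rows = list(range(100))
labels = [1] * 5 + [0] * 95
balanced_rows, balanced_labels = oversample_minority(rows, labels)
print(balanced_labels.count(1), balanced_labels.count(0))
```

Resampling like this is normally applied to the training data only, so that the evaluation set still reflects the real class balance.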

[Levi Thatcher] That is a great point that ties in to our questions. So one person asked, "Is it typical to monitor the accuracy of ML models over time? And is that a separate task from the implementation or part of the same thing that is making the predictions?"

[Michael Mastanduno] I think it is typical to monitor over time. It is probably more typical to do it kind of ad hoc, just to check into your database every few weeks or so and do it like a snapshot check. Ideally, we would have a continuously monitoring system where every single day it would be spitting back the accuracy of the model over the last month.

[Levi Thatcher] It is kind of a separate…

[Michael Mastanduno] Yes. And that would definitely be a separate piece of implementation. I think once you have the infrastructure in place, you could use that same infrastructure for any model and it would not be too hard, but if you are putting that infrastructure in for the first time, it is definitely another step beyond implementation.
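The "every day, report the metric over the last month" idea can be sketched as a trailing-window calculation. This assumes each stored prediction is a tuple of (prediction date, observed outcome, risk score); the record layout, the 0.5 threshold, and both function names are assumptions for illustration.

```python
from datetime import date, timedelta


def precision_at(threshold):
    """Build a metric: of the patients flagged at this threshold, what fraction had the outcome?"""
    def metric(pairs):
        flagged = [outcome for outcome, score in pairs if score >= threshold]
        return sum(flagged) / len(flagged) if flagged else None
    return metric


def trailing_metric(records, as_of, metric_fn, window_days=30):
    """Apply metric_fn to the records dated within the window_days before as_of."""
    start = as_of - timedelta(days=window_days)
    window = [(outcome, score) for day, outcome, score in records if start < day <= as_of]
    return metric_fn(window)


# Made-up prediction log: (prediction date, observed outcome, risk score).
records = [
    (date(2017, 2, 1), 0, 0.95),   # too old, falls outside the window
    (date(2017, 4, 5), 1, 0.90),
    (date(2017, 4, 20), 0, 0.80),
    (date(2017, 4, 25), 1, 0.70),
    (date(2017, 4, 28), 0, 0.10),  # below the threshold, so not flagged
]

monthly_precision = trailing_metric(records, date(2017, 5, 1), precision_at(0.5))
print(monthly_precision)
```

Running the same call once a day with a new `as_of` date gives the kind of series a monitoring dashboard would plot; the metric function can be swapped for whichever one was used in training.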

And again, know your use case. Different performance metrics apply to different use cases.

Pitfall 4: Poor Adoption [36:03]

So then our last pitfall I wanted to talk about was poor adoption and that basically comes down to maybe do people know about it, is it answering a relevant question, is the visualization done well, and do people trust the model?

So these might seem like simple things but they really can make or break your model as far as whether it is used. And if you are trying to drive outcomes improvements as we are, you really have to pay attention to this stuff. So to make sure people know about it, you can tell them about it. Make sure it is publicized. Make it hard for them to avoid it, and that could be done with, you know, I will go to the visualization step and say simple is better and it should not affect the workflow as far as your visualization goes. But let us say you build a predictive model that is going to help prioritize patients by risk. Don't you think the patient list should be sorted by that risk score by default? And I am not saying it is the only way you should sort it, but I think if it is that way by default, it is going to encourage people to see those highest risk patients first. And I also have another bullet in here for a plug to knowing the use case, because if it is not answering a relevant question, it is not going to get used. And then finally you can improve the trust using prediction explanations or transparent models, and that kind of stuff is pretty cutting edge as far as research goes. You know, we had some questions about machine learning being a black box, and the conventional algorithms should not be a black box, and I think the community is adjusting to that sentiment now. So that will come forward.
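The default-sort suggestion is small enough to show directly. A minimal sketch, assuming each patient row carries a model risk score under a hypothetical `risk` key:

```python
# Made-up worklist rows; "risk" is the model's predicted probability.
patients = [
    {"name": "Patient A", "risk": 0.12},
    {"name": "Patient B", "risk": 0.87},
    {"name": "Patient C", "risk": 0.45},
]


def default_worklist(patients):
    """Return the patient list with the highest-risk patients first."""
    return sorted(patients, key=lambda p: p["risk"], reverse=True)


for row in default_worklist(patients):
    print(f'{row["name"]}: {row["risk"]:.2f}')
```

Because `sorted` returns a new list, the original order is left alone and other sort options (by name, by room) remain available to the user.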

Levi, do you have any comments on the pitfalls we have gone over or do you want to go into the question?

[Levi Thatcher] No, those are fantastic. Oh man, super practical stuff, and I love the part about knowing your use case. So many people spend a lot of time without a well-defined use case and end up down the road saying, oh, well, we have issues with adoption or with being able to explain why this is helping anything. So really knowing the use case is half the battle.

[Michael Mastanduno] Yes, that is a great point.

Poll question What's stopping you? [38:23]

[Levi Thatcher] So, the poll question.

Poll #3: What's impeding you from moving forward with machine learning in your organization? 116 respondents [38:29]

[Tyler Morgan] Thank you. Our next poll question, what's impeding you from moving forward with machine learning in your organization? The available tools are overwhelming OR you don’t know what exists, use cases are overwhelming OR you don’t know what's possible, you don’t have or can't afford the technical staff to implement, or adoption – clinical team isn't interested. We also have other as well.

And we will leave this open for a few moments, again, to give everyone a chance to respond.

[Michael Mastanduno] I guess it might be worth just reiterating over and over again, we say this all the time, but know your use case, know your data, and know your audience. Those are the things you need to think about in practical machine learning. You have to know how it is going to be used. You have to know the data you have that is available to you and you have to know what you are trying to accomplish and how it is going to be done. So I think if you keep those three things kind of as a mantra in your mind, you know, know your use case, know your data, and know your audience, you will have a more successful time after the technical hurdles have been overcome with your models.


[Tyler Morgan] Yes, our responses are: 16 percent responded available tools are overwhelming or they don’t know what exists, 28 percent responded the use cases are overwhelming or they don’t know what's possible, 23 percent don’t have or can't afford the technical staff to implement, 9 percent responded adoption – clinical team isn't interested, and 25 percent responded other.

[Michael Mastanduno] That is good to see that at least the adoption question got a low response. I think I phrased these questions in a way that makes them a little harder to respond to as far as available tools and use cases. Maybe there's too many, maybe there's too few, but I will just make a plug for healthcare.ai™ here and say that healthcare.ai™ is a machine learning package where we are trying to lower the bar of machine learning implementation. So we are trying to make it easier for data analysts or people without a computer programming background to implement machine learning. So if you have any experience reading directions and figuring out how an example might work and then adapting it to your own use case, it is pretty easy to get up and running with play data sets that come with the package and then applying it to your own data sets.

[Levi Thatcher] That is a fantastic point. We would be curious about folks that responded with other. If you don’t mind sending questions along as to what is keeping you from using machine learning that was not listed there, as well as for folks having a hard time with the use case, is it that you cannot find a good one or that you do not know where to start, we would love that feedback.

Potential Applications: ML and EMR [41:15]

[Michael Mastanduno] Great. Thanks, Levi. So let us move in to potential applications. We are coming into the home stretch and we want to leave some time for questions.

[Tyler Morgan] Mike, if you hold just a moment, it looks like the screen has frozen for just a minute on the poll question. Let us try to get that fixed for you.

[Michael Mastanduno] Okay.

Actually while we are doing this, while we are getting this together, again, we do have the slides on the handout. Those of you who have downloaded the slides can actually go ahead to slide #18 as you start to talk through that if they have downloaded those slides while we continue to work on this.

There we go. Great. It looks like we have got that up. Thank you.

Alright. So let us move in to potential applications, slide 19, if you are following along. So potential applications, you know, using machine learning and the EMR. That is a great tabular data source with tons of information, and we have claims, we have financial, we have clinical and then patient demographic information. It is really a rich data set. And so, there's lots of clinical applications like risk scores for readmissions or mortality, we can do risk-adjusted comparisons in a much smarter way than previously done, we can replace clinical rule sets that are based on a single clinical trial from a small set of available data, and we can do correcting of diagnosis coding just to make sure that things are working properly behind the scenes. And then there is this whole class of operational examples that people might not have even thought about, such as you can forecast your staff needs for the ER based on the time of day and time of year and things like that, to do that in a smarter way to help out nurse managers. You could do prediction on length of stay, which would then lead to the number of beds you have available three days from now. And then there's all this financial stuff too, like propensity to pay, where should charitable services be allocated, who should be called to get a reminder on their bills. And I think these are all decision support, and a lot of them fall into the idea of clinical decision support, and it is resource allocation. That is the word I am looking for. So you have these resources, you have sentient humans behind them, but they don’t have time to do everything. So, how do you help them get the most for their time?
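The step from length-of-stay predictions to "beds available three days from now" is simple arithmetic once the predictions exist. A rough sketch, assuming a model has already produced a predicted number of days remaining for each current inpatient; it ignores new admissions, and all names and numbers are made up for illustration.

```python
def expected_free_beds(total_beds, predicted_days_remaining, horizon_days=3):
    """Estimate free beds at the horizon, counting only current inpatients.

    predicted_days_remaining holds one predicted value per occupied bed.
    New admissions over the horizon are deliberately ignored in this sketch.
    """
    still_occupied = sum(1 for days in predicted_days_remaining if days > horizon_days)
    return total_beds - still_occupied


# A 20-bed unit with six current patients (made-up predictions, in days).
remaining = [1, 2, 5, 0.5, 4, 3]
free_in_three_days = expected_free_beds(20, remaining)
print(free_in_three_days)
```

A production forecast would also need an admissions forecast and some treatment of prediction uncertainty; this only shows how the per-patient prediction feeds the unit-level number.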

[Levi Thatcher] That is a great point and a question that asks, so a lot of health systems will have basic models that help provide guidance for these sort of use cases. So what would you say to those folks in terms of trying to get leadership to understand and to use machine learning over the basic (44:40) models that you will find in healthcare? Will you go into that in the future (44:45) coming up or…

[Michael Mastanduno] Good question. I don’t think so. Let us talk about it.

[Levi Thatcher] A great question, by the way.

[Michael Mastanduno] So, I think you are getting at kind of the clinical rule sets like LACE or SOFA or – yes, how do you get people excited about that? And I guess machine learning has several benefits over the standardized rule sets, and the first is that the models come from a wider assortment of data. So you get more information flowing into the prediction. And then the next one is that they are trained on your data. So, maybe the LACE clinical trial was not done at your hospital; chances are your patient population is different than the one that the LACE model was built on. And so, you are going to get a more customized model that is going to generalize a bit better to your population. And then the final thing is, you know, if you can build it, you do not have anything to lose because you can compare the machine learning model you build with the LACE metric, you can compare them really well. It is a complete apples to apples comparison. You can see which one is more effective at making an accurate prediction. And then the final benefit is timing-wise. For instance, the SOFA score for sepsis, that is only really available after a patient is discharged, to get all the features that are needed to get that score. So qSOFA is what people use because that is a little faster, but it is still available only towards the end of a patient's stay. And so, for LACE, that is especially the case, right?

[Levi Thatcher] Yes.

[Michael Mastanduno] So yes, just to reiterate, the two I focus on are that you are training a model on your patient set and that it is available at the time that you need it.

[Levi Thatcher] Yes. That is exactly right.
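The apples to apples comparison described above amounts to scoring the same patients with both approaches and computing the same metric for each. The sketch below uses area under the ROC curve, computed by pairwise comparison, on entirely made-up data; `rule_score` stands in for a rule-set score such as LACE and `ml_risk` for a machine learning model's output, and neither reflects real results.

```python
def auroc(labels, scores):
    """Area under the ROC curve: the chance a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(pos) * len(neg))


# Same eight patients scored two ways (all values are illustrative).
readmitted = [1, 0, 1, 0, 0, 1, 0, 0]
rule_score = [12, 9, 8, 11, 4, 14, 6, 7]                        # rule-set style integer score
ml_risk = [0.81, 0.70, 0.62, 0.44, 0.05, 0.90, 0.12, 0.20]      # model probability

print("rule set AUROC:", round(auroc(readmitted, rule_score), 3))
print("ML model AUROC:", round(auroc(readmitted, ml_risk), 3))
```

The pairwise loop is fine for a small validation set; for large data a library routine would be the usual choice. The comparison is only fair if both scores are evaluated on patients the machine learning model was not trained on.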

Potential Applications: NLP or Smarter Analytics [46:59]

[Michael Mastanduno] Okay. So then what would be potential applications of NLP, or Natural Language Processing, or maybe smart analytics? I think we have not talked about this so far, but it does kind of fall under the main umbrella of this data science: machine learning, AI, natural language processing, predictive analytics, all of these things are similar. And so, what are some applications of those techniques, if maybe that is what you are interested in? One would be parsing clinical notes to fill in discrete text fields automatically and populate the EMR, or finding new features for models that only come up in conversation and then get written down. You could do smart retrospective analysis by looking at trends and exploration across the whole EMR and serving up insights automatically, with the idea being that a data analyst would not have to go looking through all the different filter combinations before they found a population that could use some intervention, and rather have software serve that up. And I am really excited about developing that in healthcare.ai. So that is where we are going.
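As a toy version of parsing a clinical note into discrete fields, the sketch below pulls a blood pressure reading and a smoking status out of free text with regular expressions. Real clinical NLP has to handle negation, abbreviations, and far messier text; the note, the field names, and the patterns here are all assumptions for illustration.

```python
import re

# Patterns for two discrete fields (illustrative, not exhaustive).
BP_PATTERN = re.compile(r"\bBP[:\s]*(\d{2,3})\s*/\s*(\d{2,3})\b", re.IGNORECASE)
SMOKING_PATTERN = re.compile(r"\b(current|former|never)\s+smoker\b", re.IGNORECASE)


def extract_fields(note):
    """Return a dict of discrete fields found in the note; missing fields are None."""
    fields = {"systolic": None, "diastolic": None, "smoking_status": None}
    bp = BP_PATTERN.search(note)
    if bp:
        fields["systolic"] = int(bp.group(1))
        fields["diastolic"] = int(bp.group(2))
    smoking = SMOKING_PATTERN.search(note)
    if smoking:
        fields["smoking_status"] = smoking.group(1).lower()
    return fields


note = "Pt reports feeling well. BP: 142/88. Former smoker, quit 2010."
parsed = extract_fields(note)
print(parsed)
```

Fields that cannot be found stay as `None`, so downstream code can tell "not mentioned" apart from a recorded value.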

Potential Applications: Image Processing [48:10]

And then finally I will get into potential applications of image processing, and I think that this is probably the most common use of artificial intelligence in medicine right now, if that is what you call artificial intelligence. But you could do things like diagnostics of pre-segmented suspicious regions on a lung CT or a chest CT, and those are the kind of things that have been done in academic studies and published in journals but not productionalized. You could do automatic segmentation of tissue types from brain MRI or from a CT. And then diagnosis or staging of screening images, again, which kind of has the resource allocation idea, sorting the highest priority cases to review first. And then all of those are radiology; my background is in radiology, so I am partial to that, but these all apply to pathology as well. You are not going to replace the pathologist or radiologist but you might help them make their decisions a little bit more efficiently.

Poll Question Most valuable use? [49:20]

So we have another poll question.

[Tyler Morgan] Alright. We have had some issues with the poll questions. Let us see. Let us try to do this after we show our last slide. So if you can show slide 24 and talk through healthcare.ai™, then we will try those poll questions. So that way, we are not interrupting the rest of your presentation.

[Michael Mastanduno] Sounds good, Tyler.

Okay. So the last slide is just a plug for the healthcare.ai™ package and effort. This is a screenshot from the website. It is at www.healthcare.ai and you can find everything you need to get started, from what you can do to how it is tailored to healthcare. It is open source – meaning it is free to download the code, free to use it, you can even contribute to the code if you want to. We have our package in the language R and the package in the language Python. So you can use the tool that you are more familiar with. You can get the packages there or at the bottom. And in addition to that, we would encourage you to subscribe to the blog, the weekly broadcasts, or the Slack group. We are trying to build a community around this healthcare machine learning so that we can – you know, our motto has been "the rising tide raises all boats." So, we want to emerge as a leader in machine learning, and by joining our community, you can get help, you can see us more often if you are not sick of us already, and it will be great to build that community.

http://www.healthcare.ai/

So, Tyler, we can move, or Levi?

[Levi Thatcher] No, I was just going to say, yes, great job, Mike. We love to work together rather than having everybody working in silos bringing up things on their own.

[Tyler Morgan] We also like to say on those weekly broadcasts, Mike and Levi actually get on that weekly broadcast YouTube and talk about in more depth the machine learning concepts that they brought up here today.

So let us go into the poll questions. And while we put the poll questions up, we can also respond to some questions as well as we are getting close to our time today.

Poll #4: What's the most valuable use for ML/AI/Big Data to your organization? 95 respondents [51:17]

So, this next poll question is what's the most valuable use for machine learning/AI, Big Data to your organization? Is it parsing free-form clinical notes, image interpretation, clinical risk scores, operational efficiency, or these are buzz words and not worth the time.

So we will leave that up for a moment and give you guys a chance to respond to that. And while responding to that, let us see some of the questions that are coming in.

Questions? [51:39]

[Levi Thatcher] Yes. So one person asks if we have tried machine learning in emergency rooms, and I guess I will turn it over to you in a second, but I feel like the data is the limiter in this question. So if you have data flowing in right after admit to the ED, then, Mike, it seems like you could train tomorrow and make predictions as to the ones likely to have sepsis or not.

[Michael Mastanduno] Yes. Definitely. I think you come up to some technical hurdles there because a lot of times the data – you need a more or less real time data source to be able to get the prediction out for a patient who just walked in. So that is a little more challenging in the ED and we have not had any luck with that and we have focused more on the inpatient suite just because there is quite a lot of opportunity there.

[Levi Thatcher] But after someone has been admitted.

[Michael Mastanduno] Yes, after someone is admitted, this data pipelining stuff becomes more real time. We are definitely excited to get into the ED.

[Tyler Morgan] Alright. And after this poll, it looks like the most popular response was clinical risk scores as being the most valuable use in your organization with operational efficiency coming in second place there.

[Michael Mastanduno] That is really interesting to see. I think clinical risk scores are probably not the sexiest of the things up there. It is pretty exciting to hear about mammography screening being automated from an image analysis standpoint, but it is really cool to see that the community wants the practical stuff.

[Levi Thatcher] (53:13)

[Michael Mastanduno] …what we are trying to provide.

[Levi Thatcher] Yes, that is beautiful. Nice little feedback really.

Poll #5: If there was an algorithm that was FDA approved and read mammographic images on par with a radiologist, would you use it? 90 respondents

[Tyler Morgan] Alright. And this poll question. If there was an algorithm that was FDA approved and read mammographic images on par with a radiologist, would you use it?

[Michael Mastanduno] And I think we are going to get a preview or a discretion based on the last one but maybe this question can be used kind of as a feel for the market, you know, do you trust machine learning at all because I think that this FDA approval is something we are going to have to overcome in the near future, I think, as these algorithms get more powerful and the opportunity becomes great and we are going to have to start regulating them as more than just clinical decision support.


Alright. So, 81 percent responded that, yes, they would use it but only as an aid to the radiologist, 16 percent would trust it completely, and 3 percent say no, they would not trust it.

[Michael Mastanduno] And I think that is definitely telling of kind of where we are in the good case now, you know. It is a stepping stone. People are okay with it as just the stepping stone to being completely autonomous. And whether or not we get to being completely autonomous, I am not sure but I think it will still make an impact if we are to use it as people suggest here.

[Levi Thatcher] Yes, one step at a time. Alright.

Before we end… [54:43]

[Tyler Morgan] Now, we are almost out of time. Before we finish, I do have one last poll question for everyone.

Would you like someone from our sales organization to contact you for a product demonstration of our solutions? [54:48]

So our webinars are meant to be educational about various aspects affecting outcomes improvement in our industry, particularly from a data warehousing and analytics perspective, machine learning perspective. Today, we had many requests, however, for more information about what Health Catalyst® does and what our products are. If you would like someone from our sales organization to contact you for a product demonstration of our solutions, please respond to this poll question.

We will leave this up and we will see. It looks like that we may have time for two more questions. Let us look through here, so we can find two more questions to respond to.

Alright. We have got a question here. So, considering machine learning learns from your historical data and the patterns within, how do you use machine learning when you are trying to change clinical practice?

[Michael Mastanduno] That is a great question. You know, that really drives to the heart of the issue, what we are trying to do here, and I guess my answer would be that knowledge is power. So if you have a really strong analysis of historical data that has kind of been the same for the last few years and maybe you are seeing a high readmission rate, but then you do this machine learning and get a risk score for every patient, you know, you are going to get some patients where the physician says, "Oh, that is interesting, I wouldn't have expected that. Why?" And that is going to change their behavior, and sometimes that is going to lead to preventing a readmission. Sometimes it is not, but I think you will find that habits change and we like to think that outcomes will improve.

[Levi Thatcher] Yes, we have seen as much. So there is another question here that was exciting, asking, are there cheap, quick gains in the operational and finance realms compared to clinical? I think we have…

[Michael Mastanduno] That is an interesting question too, as far as which is the fastest to see gains. I would say that probably it is fastest to do operational and financial just because you get feedback sooner. You know, like with 30-day readmissions for instance, you have to wait 30 days before you have even started evaluating the performance of the model in production. Well, let me take that back. I will say, you know, depending on the use case, clinical could be really fast.

[Levi Thatcher] Yes. Like with CLABSI prediction, for example.

[Michael Mastanduno] Yes, that is quick feedback. It is a lot of patients. You get a lot of data quickly. So, CLABSI, you know, you are going to get predictions that you can validate quickly, but then implementing the clinical process, it might be a little harder just because it is spread across more people.
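The 30-day wait mentioned for readmissions shows up in practice as a filter: a prediction can only be scored once its follow-up window has closed. A minimal sketch, assuming each stored prediction records the patient's discharge date under a hypothetical `discharge_date` key:

```python
from datetime import date, timedelta


def evaluable(predictions, as_of, followup_days=30):
    """Keep only predictions whose outcome window has fully elapsed by as_of."""
    cutoff = as_of - timedelta(days=followup_days)
    return [p for p in predictions if p["discharge_date"] <= cutoff]


# Made-up production predictions for a 30-day readmission model.
predictions = [
    {"patient": "A", "discharge_date": date(2017, 3, 20), "risk": 0.40},
    {"patient": "B", "discharge_date": date(2017, 4, 3), "risk": 0.75},
    {"patient": "C", "discharge_date": date(2017, 4, 20), "risk": 0.15},  # outcome not yet known
]

ready = evaluable(predictions, as_of=date(2017, 5, 3))
print([p["patient"] for p in ready])
```

For an outcome with a short feedback loop, `followup_days` shrinks and far more of the recent predictions can be evaluated right away, which is the difference in speed being described.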

[Tyler Morgan] Alright. Well, we have reached our time, gentlemen. Thanks everybody for joining us. I did want to respond to just one last question. There was a question that says, is healthcare.ai™, a machine learning package, open source? The answer is yes, it is an open source community. That is why we would love to have everyone join and take part in the community that we have at healthcare.ai™.

So we would like to thank everyone for participating. Shortly after this webinar, you will receive an email with links to the recording of the webinar, the presentation slides, and all the links shared in today's session. Also, please look forward to the transcript notification we will send you once that is ready. On behalf of Mike Mastanduno, Levi Thatcher, as well as the rest of us here at Health Catalyst®, thank you for joining us today. This webinar is now concluded.

[END OF TRANSCRIPT]
