Many organizations, especially those handling huge amounts of data such as Facebook, use Hadoop for their processing needs. So what exactly are Big Data and Hadoop, and what are their implications for healthcare? Hadoop is a distributed processing and storage platform. Its use is rare in the healthcare industry, but healthcare analytics hasn't necessarily been stalled because of this. In fact, the quality of the data healthcare produces doesn't justify Hadoop-level processing power. This article answers questions such as: what is Hadoop, what drives adoption of the platform in other industries, how might it affect healthcare analytics, how would clinicians use data sources outside their environment, and what drawbacks currently stand in the way of further adoption.
1. Hadoop in Healthcare: A No-nonsense Q & A. By Jared Crapo. 2014
Health Catalyst www.healthcatalyst.com Proprietary. Feel free to
share but we would appreciate a Health Catalyst citation.
2. Hadoop in Healthcare. Hadoop is used in all kinds of applications
at companies like Facebook and LinkedIn. The potential for Big Data
and Hadoop in healthcare and in managing healthcare data is
exciting but, as of yet, has not been fully realized.
3. Although healthcare analytics hasn't yet been hampered by
hospital systems not using Hadoop, it never hurts to look forward
and consider the possibilities. Hadoop is an indispensable tool for
efficiently storing and processing large quantities of data. Its
unique capabilities will offer new ways of thinking about how we
use healthcare data and analytics to provide improved patient care
at reduced costs. What follows is a Q & A on Hadoop and its
implications for the future of healthcare.
4. Question 1: What is Hadoop? Hadoop is an open-source distributed
data storage and analysis application that was developed largely at
Yahoo!, based on research papers published by Google. Hadoop
implements Google's MapReduce algorithm by dividing a large query
into many parts, sending those parts to many different processing
nodes, and then combining the results from each node.
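The divide, distribute, and combine idea described above can be sketched in a few lines of Python. This is a minimal single-process illustration, not Hadoop's actual API: the function names (`split_into_parts`, `map_part`, `combine`, `run_query`) and the sample records are hypothetical, and a real cluster would run the map step on separate machines and handle grouping by key, scheduling, and node failures.

```python
from collections import Counter
from functools import reduce


def split_into_parts(records, num_parts):
    """Divide the input into roughly equal parts, one per simulated node."""
    return [records[i::num_parts] for i in range(num_parts)]


def map_part(part):
    """Map step: each simulated node counts the records in its own part."""
    return Counter(part)


def combine(left, right):
    """Reduce step: merge two partial results into one."""
    return left + right


def run_query(records, num_parts=3):
    """Split the work, process each part, then combine the partial results."""
    parts = split_into_parts(records, num_parts)
    # On a real cluster these calls would run in parallel on different nodes.
    partial_results = [map_part(part) for part in parts]
    return reduce(combine, partial_results, Counter())


# Made-up sample input, for illustration only.
records = ["asthma", "diabetes", "asthma", "copd", "asthma", "diabetes"]
totals = run_query(records)
print(dict(totals))
```

Because each part is counted independently, adding more nodes only changes how the input is split; the combined totals stay the same.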
5. Question 1 (continued): What is Hadoop? Hadoop also refers to the
tools and software that work with and enhance Hadoop's core storage
and processing components: Hive, a SQL-like query language for
Hadoop; Pig, a high-level query language for MapReduce; HBase, a
columnar data store that runs on top of the Hadoop distributed file
storage mechanism; and Spark, a general-purpose cluster computing
framework.
6. Question 2: What are some key reasons to adopt Hadoop? Large
companies are moving to Hadoop for generally two reasons: (1)
enormous data sets and (2) costs. For example, Yahoo! implemented
42,000 nodes in several different Hadoop clusters with a combined
capacity of about 200 petabytes (200,000 terabytes).
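As a rough back-of-the-envelope check on those figures, the short calculation below works out the average storage per node. It assumes capacity is spread evenly across nodes, which the source does not state, so the result is only an order-of-magnitude illustration.

```python
TOTAL_CAPACITY_PB = 200   # combined capacity quoted above, in petabytes
NODE_COUNT = 42_000       # number of nodes quoted above
TB_PER_PB = 1_000         # decimal units, matching "200,000 terabytes"

total_tb = TOTAL_CAPACITY_PB * TB_PER_PB
# Assumption: capacity is divided evenly across all nodes.
avg_tb_per_node = total_tb / NODE_COUNT

print(f"{avg_tb_per_node:.2f} TB per node on average")  # prints "4.76 TB per node on average"
```

An average of a few terabytes per node suggests the scale comes from the number of machines rather than from unusually large individual servers.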
7. Question 2 (continued): What are some key reasons to adopt
Hadoop? Even if existing database applications could accommodate
these large data sets, the cost of typical enterprise hardware and
disk storage becomes prohibitive. Hadoop was designed from the
beginning to run on commodity hardware, which substantially reduces
the need for expensive hardware infrastructure. Because Hadoop is
open source, there are no licensing fees for the software either,
another substantial savings.
8. Question 3: How will Hadoop impact and/or change healthcare
analytics? Hadoop has been called the most significant data
processing platform for big data analytics in healthcare. Using
Hadoop, researchers can now use data sets that were traditionally
impossible to handle. A team in Colorado is correlating air quality
data with asthma admissions. Life sciences companies use genomic
and proteomic data to speed drug development.
9. Question 3 (continued): How will Hadoop impact and/or change
healthcare analytics? Healthcare analytics is generally not held
back by the capability of the data processing platforms. There are
a few exceptions in life sciences. But for most healthcare
providers, the limiting factor is the willingness and ability to
let data inform and change the way care is delivered. Today, it
takes more than a decade for compelling clinical evidence to become
common clinical practice. It's not how much data you have that
matters, but how you use it.
10. Question 4: How will clinicians use outside data sources? Data
from other clinical providers in your geography can be very
useful. Claims data give a broad picture but not a deep one. Data
from other non-traditional sources also have surprising relevance;
in some cases, they are a better predictor than clinical data. For
example: EPA data on geographical toxic chemical load adds insight
into cancer rates for long-term residents. The CMS-HCC risk
adjustment model can help providers understand why patients in
their area seem to have higher or lower risk for certain disease
conditions. A household size of one increases the risk of
readmission because there is no other caregiver in the home.
11. Question 5: What are the drawbacks of Hadoop? What do CTOs, CIOs,
and other IT leaders need to consider? Hadoop is a very young
technology, and its capabilities and tools are relatively immature.
The pool of people who have Hadoop experience is similarly small.
Competition for these people will come from large technology and
financial services companies, so people with Hadoop experience are
in high demand.
12. Question 5 (continued): What are the drawbacks of Hadoop? You
should also consider alternate hardware maintenance schemes. Hadoop
was designed for commodity hardware, which generally experiences
higher failure rates. Instead of purchasing hardware maintenance,
you should plan to have spare nodes on standby. The good news is
that commercial database vendors, including Microsoft, Oracle, and
Teradata, are all racing to integrate Hadoop into their
offerings.
13. Question 6: Where is Hadoop headed and how will it impact big
data? Fifteen years ago, we didn't capture data unless we knew we
needed it. The cost to capture and store it was just too high.
Fifteen years from now, reductions in the cost to capture and store
data will likely mean that we will capture and store everything.
Hadoop is a huge leap forward in our ability to efficiently store
and process large quantities of data, and it allows creative
thinking about how to apply the resulting answers in a meaningful
and useful way.
16. Jared Crapo joined Health Catalyst in February 2013 as a Vice
President. Before joining Health Catalyst, he worked for Medicity
as the Chief of Staff to the CEO. During his tenure at Medicity, he
was also the Director of Product Management and the Director of
Product Strategy. Jared co-founded Allviant, a spin-out of Medicity
that created consumer health management tools. In his early career,
he developed physician accounting systems and health claims payment
systems.