Data Analytics!
Susan Fenton, RHIA, PhD; UTHealth School of Biomedical Informatics Gene Hill, BBA, MA; April Salisbury, MBA-HC; LCF Research/NMHIC
Disclaimer
• This material is designed and provided to communicate information about data analytics in an educational format and manner.
• The author is not providing or offering legal advice but, rather, practical and useful information and tools to achieve compliant results in the area of data analytics.
• Every reasonable effort has been taken to ensure that the educational information provided is accurate and useful.
• Applying best practice solutions and achieving results will vary in each hospital/facility and clinical situation.
• All material in this workshop was adapted from materials developed by The University of Texas Health Science Center at Houston, funded by the Department of Health and Human Services, Office of the National Coordinator for Health Information Technology under Award Number 90WT0006.
• This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/4.0/.
Goals/Objectives or Agenda
• Determine the essential skills for effective healthcare data analysis
• Articulate different data types and appropriate uses for each
• Compare and contrast data analytic types and tools
• Practice data analytics skills
Skills Needed
• Soft Skills
• Curiosity
• Critical Thinking
• Listening
• Technical Skills
• Understand Data
• Basic Stats
• Communication
Introduction to Analytics
Definition
Types of analytics
◦ Descriptive
◦ Diagnostic
◦ Predictive
◦ Prescriptive
What is Analytics?
“The discovery of meaningful patterns in data, and is one of the steps in the data life cycle of collection of raw data, preparation of information, analysis of patterns to synthesize knowledge, and action to produce value.”
NIST Big Data. (2015) Public Working Group Definitions and Taxonomies Subgroup. Retrieved from http://bigdatawg.nist.gov/_uploadfiles/NIST.SP.1500-1.pdf. http://dx.doi.org/10.6028/NIST.SP.1500-1
What is Analytics?
Entire process of data collection, extraction, transformation, analysis, interpretation, and reporting
"Data visualization process v1" by Farcaster at English Wikipedia.
Licensed under CC BY-SA 3.0 via Commons
https://commons.wikimedia.org/wiki/File:Data_visualization_process_v1.png#/media/File:Data_visualization_process_v1.png
What is Analytics?
“Analytics is used to refer to the methods, their implementations in tools, and the results of the use of the tools as interpreted by the practitioner.”
NIST Big Data. (2015) Public Working Group Definitions and Taxonomies Subgroup. Retrieved from http://bigdatawg.nist.gov/_uploadfiles/NIST.SP.1500-1.pdf. http://dx.doi.org/10.6028/NIST.SP.1500-1
The analytics process is the synthesis of knowledge from information.
Types of Analytics: Overview
Descriptive: uses business intelligence and data mining to ask: “What has happened?”
Diagnostic: examines data to answer “Why did it happen?”
Gartner. (n.d.) Gartner IT Glossary: Diagnostic Analytics. Retrieved 2/21/2016 from http://www.gartner.com/it-glossary/diagnostic-analytics.
Predictive: uses statistical models and forecasts to ask: “What could happen?”
Prescriptive: uses optimization and simulation to ask: “What should we do?”
IBM Software. (2013). Descriptive, predictive, prescriptive: Transforming asset and facilities management with analytics. Retrieved 2/21/2016 from http://www-01.ibm.com/common/ssi/cgi-bin/ssialias?infotype=SA&subtype=WH&htmlfid=TIW14162USEN.
Types of Analytics: Overview
http://www.gartner.com/it-glossary/predictive-analytics
Descriptive Analytics
Describe the data
Common statistics: ◦ counts
◦ averages
Typical reporting methods: ◦ Tables
◦ Pie charts
◦ Column / bar charts
◦ Written narratives
http://www.gartner.com/it-glossary/predictive-analytics
Diagnostic Analytics
Attempts to answer “why did it happen?”
Drill-down techniques
Data discovery
Correlations
http://www.gartner.com/it-glossary/predictive-analytics
Predictive Analytics
Predicts instead of describing or classifying
Rapid analysis
Relevant insights
Ease of use
http://www.gartner.com/it-glossary/predictive-analytics
What Predictive Analytics Cannot Do
“The purpose of predictive analytics is NOT to tell you what will happen in the future. It cannot do that. In fact, no analytics can do that. Predictive analytics can only forecast what might happen in the future, because all predictive analytics are probabilistic in nature.”
◦ Michael Wu as quoted by Jeff Bertolucci in Big Data Analytics: Descriptive vs. Predictive vs. Prescriptive. Information Week. December 31, 2014, para 13. Retrieved from http://www.informationweek.com/big-data/big-data-analytics/big-data-analytics-descriptive-vs-predictive-vs-prescriptive/d/d-id/1113279.
Prescriptive Analytics
Examines data or content to answer the question “What should be done?” or “What can we do to make _______ happen?”
Is characterized by techniques such as
◦ graph analysis
◦ simulation
◦ complex event processing
◦ neural networks
◦ recommendation engines
◦ heuristics
◦ machine learning
http://www.gartner.com/it-glossary/prescriptive-analytics
Steps in Data Analytics
1. Identify the problem and the stakeholders
2. Identify what data are needed and where those data are located
3. Develop a plan for analysis and a plan for retrieval
4. Extract / transform/ load the data
5. Check, clean, and prepare the data for analysis
6. Analyze and interpret the data
7. Visualize the data
8. Disseminate the new knowledge
9. Implement the knowledge into the organization
1. Identify the Problem or Question and the Stakeholders
Why is this an important problem?
How will the results impact patient care or the institution?
What is the business case?
Who are the stakeholders?
2. Identify what data are needed
What data elements, such as date of birth, gender, medications, laboratory results, and so on are needed?
Where are these data elements located – in what system or systems and what database tables?
Is there a clinical data warehouse?
Who is the contact person for each system who will be responsible for retrieving the data?
3. Develop plans for retrieval and analysis
Retrieval
Enlist database administrator for each system
Develop specific plan for retrieving the required data elements
Method for cross-checking number of records as well as completeness – how many should you expect and did you get everything?
Analysis
Enlist statistician
Identify population, sample size, statistical tests to be performed
4. Extract / Transform/ Load (ETL) Process
Extraction
May be an iterative process
The data are retrieved
Checked for completeness
Descriptive statistics
Errors corrected, empty fields addressed
Transformation
Data synchronized (“transformed”) – e.g. M, F, U vs 1, 2, 9
Loading
Data then imported into destination system
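The transformation step can be sketched in Python. This is a minimal illustration rather than a production ETL routine: the function name normalize_gender and the pairing of 1, 2, 9 with M, F, U are assumptions made for the example, so confirm the real pairing in each source system's data dictionary.

```python
# Hypothetical lookup: maps the codes used by different source systems
# onto one target convention (M, F, U). The 1/2/9 pairing is assumed.
GENDER_CODES = {
    "M": "M", "F": "F", "U": "U",
    "1": "M", "2": "F", "9": "U",
    "MALE": "M", "FEMALE": "F", "UNKNOWN": "U",
}


def normalize_gender(raw):
    """Return M, F, or U; raise ValueError for a code not in the lookup."""
    key = str(raw).strip().upper()
    if key not in GENDER_CODES:
        raise ValueError(f"Unrecognized gender code: {raw!r}")
    return GENDER_CODES[key]


source_a = ["M", "F", "U"]   # system that stores letters
source_b = [1, 2, 9]         # system that stores numbers
combined = [normalize_gender(v) for v in source_a + source_b]
print(combined)
```

Raising an error on an unknown code, instead of silently passing it through, keeps bad values from reaching the destination system unnoticed.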
5. Check, clean, and prepare the data
Data are now in the system where analysis will be run
Should be a complete set of data
Need to check that everything is ready for analysis
Descriptive statistics
Double-check problem or question being investigated
Double-check against analysis plan
6. Analyze and interpret the data
Use the data analysis plan
Perform the actual statistical analyses as described in the plan
Consult with statistician to confirm interpretations and conclusions
7. Visualize the Data
Nominal (categorical) data: column or bar charts, tables, pie charts, pivot tables
Quantitative data: histograms, scatter plots, star plots
Examples of tools
◦ Microsoft® Excel Chart function
◦ Tableau®
Pie chart https://commons.wikimedia.org/wiki/File:Charts_SVG_Example_5_-_Simple_Pie_Chart.svg
Histogram https://commons.wikimedia.org/wiki/Histogram#/media/File:Histogram_example.svg
8 & 9: Disseminating and Implementing
Disseminating the new knowledge
Write up the findings
Disseminate to the stakeholders
Implementing the new knowledge
Requires participation of stakeholders
Data, Information, Knowledge, Wisdom Hierarchy
Data: symbols, facts, and measurements
Information: data processed to be useful; provides the “who, what, when, where”
Knowledge: application of data and information; provides the “how”
Wisdom: evaluated understanding; provides the “why”
[Figure: hierarchy diagram, from top to bottom: Wisdom, Knowledge, Information, Data]
Types of Data in an EHR
Quantitative data (eg, laboratory values)
Qualitative data (eg, text-based documents and demographics)
Transactional data (eg, a record of medication delivery).
Murdoch, T. B., & Detsky, A. S. (2013). The inevitable application of big data to health care. JAMA, 309(13), 1351-1352.
Understanding the Data: Scales of Measure
Data come in many forms, and those forms determine what can or cannot be done with the data.
For example, two patient names cannot be added together.
Likewise, interpreting the relative distance between two measurements can only be done with certain kinds of data and not others.
There are four scales: Nominal, ordinal, interval, and ratio.
Scales of Measure: Nominal
From the Latin for “name”
Names, labels, categories
Examples:
◦ Patient names (John Doe, Maria Garcia)
◦ Drug names (Ampicillin, Valium)
◦ Eye color (blue, brown, green, gray)
◦ Gender (male, female, unknown)
◦ Religious preference (Catholic, Jewish, none)
May be mapped to a number in a database
◦ Example: brown eyes=1, blue eyes=2
Eye image from https://commons.wikimedia.org/wiki/File:Deep_Blue_eye.jpg
Scales of Measure: Ordinal
Includes all properties of Nominal (so Ordinal data all have a name of some sort)
Example: first, second, third, i.e., a ranking or order
But intervals are not necessarily equal
http://www.cdc.gov/growthcharts/
Photo by Paul Kehrer.
https://www.flickr.com/photos/paulkehrer/3659279740
Creative Commons Attribution 2.0 Generic (CC BY 2.0) license.
Scales of Measure: Interval and Ratio
Continuous
Has equal intervals; Ratio also has absolute zero.
Examples: distance, length, temperature, weight
Includes properties of Nominal and Ordinal
May be grouped together in one category called “scale”
https://commons.wikimedia.org/wiki/File:Soft_ruler.jpg
"Clinical thermometer 38.7" by Menchi - Own work.
Licensed under CC BY-SA 3.0 via Commons -
https://commons.wikimedia.org/wiki/File:Clinical_thermometer_38.7.JPG#/media/File:Clinical_thermometer_38.7.JPG
Check Understanding
Zip Code
Blood Pressure
Heart Failure Classification I, II, III, IV
Age
Ethnicity
Marital Status
Length of Stay
Discharge Disposition (home, SNF, and so on)
Weight
Level of Education
Data Inconsistencies
Inconsistent naming conventions, such as “systolic blood pressure” versus “blood pressure, systolic”
Inconsistent definitions, such as how the date of admission is defined across departments
Varying field lengths for the same data element, such as one system allowing a patient’s last name to be up to 50 characters while another system allows 25 characters
Varied data elements, such as M, F, or U for patient gender in one system while another system uses 1, 2, or 9 or Male, Female, or Unknown.
[AHIMA. "Managing a Data Dictionary." Journal of AHIMA 83, no.1 (January 2012): 48-52. Retrieved from http://library.ahima.org/xpedio/groups/public/documents/ahima/bok1_049331.hcsp?dDocName=bok1_049331]
Data Dictionaries
The first step to understanding the data you are working with
Synthetic data set
Data Dictionaries
The first step to understanding the data you are working with
“a standard definition of data elements”.
Health Information Management Systems Society (HIMSS). (November, 2014). Clinical & Business Intelligence: An Analytics Executive Review Needs Assessment. Retrieved 2/21/16 from http://www.himss.org/ResourceLibrary/genResourceDetailPDF.aspx?ItemNumber=34692
Data Dictionaries
Common Terms Used in Statistical Analysis
Population
Sample
Paired samples
Data set
Descriptive statistics
Frequency table
Histogram
Chi square
T-Test
Correlation vs. causation
Term: Population
A group of things that have something in common
Examples: ◦ Patients in a particular hospital
◦ Patients with a certain diagnosis
◦ Patients with a particular attribute (gender, smoking status, age group)
◦ Patients who had a certain surgical procedure in a given year by a specific surgeon
Term: Sample
A representative portion or subset of a group of things – part of a population
Example population: babies born in the United States in 2015
Example sample: a selection of those babies
Paired samples: before-and-after studies, or matched on one or more characteristics
Image credit: Kernler, D. Simple Random Sampling. Retrieved from https://commons.wikimedia.org/wiki/File:Simple_random_sampling.PNG. Licensed under the Creative Commons Attribution-Share Alike 4.0 International license.
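Drawing a simple random sample can be sketched with Python's standard library. The population of 1,000 record identifiers, the sample size of 50, and the seed are all hypothetical values chosen for illustration.

```python
import random

# Hypothetical population: 1,000 record identifiers.
population = list(range(1, 1001))

rng = random.Random(2015)             # fixed seed so the draw can be repeated
sample = rng.sample(population, 50)   # 50 distinct records, each equally likely
print(len(sample), len(set(sample)))
```

Because sample() draws without replacement, no record appears twice, which is what a simple random sample requires.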
Confidence Intervals
How well does a sample approximate the entire population?
The confidence level is often set at 95%
If the sampling were repeated many times, the resulting intervals would bracket the true population parameter in approximately 95% of the cases
NIST/SEMATECH e-Handbook of Statistical Methods, http://www.itl.nist.gov/div898/handbook/prc/section1/prc14.htm
Data Set
A data set is a collection of data for a specific purpose. For this presentation, for example, the data set is a collection of 500 records that consists of age, gender, state of residence, marital status, blood type, weight, eye color, and smoking status.
Descriptive Statistics
Basic overview of the data
Excel: Data > Data Analysis > Descriptive Statistics
Should be among the first analyses done on a set of data
Can identify some errors
Mean (average), number of records (count), range of values, maximum and minimum values
Patient Weights
Mean 189.1554
Standard Error 2.916985099
Median 180.6
Mode 192.3
Standard Deviation 65.2257697
Sample Variance 4254.401033
Kurtosis 8.86101958
Skewness 2.554839369
Range 475.6
Minimum 89.4
Maximum 565
Sum 94577.7
Count 500
Confidence Level(95.0%) 5.731086356
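Several rows of the Patient Weights output can be derived from one another, which is a quick way to sanity-check a summary table. The sketch below copies the reported values and recomputes the mean, standard error, variance, and range. It assumes that Excel's "Confidence Level(95.0%)" row is the half-width (margin of error) of the 95% confidence interval for the mean.

```python
import math

# Values copied from the Patient Weights output above.
count = 500
total = 94577.7
std_dev = 65.2257697
minimum, maximum = 89.4, 565
ci_half_width = 5.731086356   # "Confidence Level(95.0%)" row (assumed half-width)

mean = total / count                     # sum divided by count
std_error = std_dev / math.sqrt(count)   # standard deviation over sqrt(n)
variance = std_dev ** 2                  # sample variance is the SD squared
value_range = maximum - minimum          # largest minus smallest value

# 95% confidence interval for the mean weight
ci_low = mean - ci_half_width
ci_high = mean + ci_half_width

print(f"mean={mean:.4f} se={std_error:.4f} var={variance:.2f} range={value_range:.1f}")
print(f"95% CI for the mean: {ci_low:.2f} to {ci_high:.2f}")
```

Each recomputed value agrees with the corresponding row in the table, so the output is internally consistent.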
Measures of Central Tendency
Mean – arithmetic average of interval or ratio data; very sensitive to outliers
Median – midpoint of a frequency distribution, with 50% of the observations above and 50% of the observations below
Mode – the most frequent observation(s) in a frequency distribution; may not be unique; can be used with nominal data
1, 1, 2, 3, 3, 3, 3, 3, 3, 4, 5, 5, 5, 6, 6, 7, 8, 9, 10
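The three measures can be computed for the 19 values listed above with Python's statistics module. The second half is an added illustration, not part of the original slide: it swaps the largest value for a hypothetical outlier of 100 to show how the mean moves while the median does not.

```python
import statistics

values = [1, 1, 2, 3, 3, 3, 3, 3, 3, 4, 5, 5, 5, 6, 6, 7, 8, 9, 10]

mean = statistics.mean(values)         # 87 / 19, about 4.58
median = statistics.median(values)     # the 10th of 19 sorted values
modes = statistics.multimode(values)   # a list, because the mode may not be unique

# Hypothetical outlier: replace the last value (10) with 100.
with_outlier = values[:-1] + [100]
mean_outlier = statistics.mean(with_outlier)
median_outlier = statistics.median(with_outlier)

print(mean, median, modes)
print(mean_outlier, median_outlier)
```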
Measures of Variation
Range – difference between the smallest and largest values in a frequency distribution; simple measure of spread; can also be affected by extreme values or outliers
Variance – amount of variation of all values or scores for a variable; average of the squared deviations from the mean (variance not meaningful at descriptive level)
Standard deviation – amount of dispersion around the mean; square root of the variance; most widely used measure of variability in descriptive statistics
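The same example values used for the measures of central tendency can illustrate the measures of variation. This sketch uses the sample versions, which divide by n - 1, on the assumption that this matches the "Sample Variance" row Excel reports; the population versions would be statistics.pvariance and statistics.pstdev.

```python
import statistics

values = [1, 1, 2, 3, 3, 3, 3, 3, 3, 4, 5, 5, 5, 6, 6, 7, 8, 9, 10]

value_range = max(values) - min(values)   # largest minus smallest value
variance = statistics.variance(values)    # sample variance (divides by n - 1)
std_dev = statistics.stdev(values)        # square root of the sample variance

print(value_range, round(variance, 4), round(std_dev, 4))
```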
Inferential Statistics
Infer from a sample to the population and draw conclusions
Choice of test depends upon
◦ Data type
◦ Parametric vs. nonparametric data
Chi-square – categorical data
Correlation – continuous data – relationships
ANOVA – continuous data – differences in means
Correlation and Causation
Correlation: relationship between two things
Causation: one causes another
Correlation does not equal causation
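A correlation coefficient measures how closely two variables move together. It says nothing about whether one causes the other. The sketch below computes Pearson's r by hand; the function name pearson_r and the age and blood pressure values are hypothetical, chosen only for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (2 or more)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical values for illustration only.
age = [30, 40, 50, 60, 70]
systolic = [118, 122, 130, 135, 141]
r = pearson_r(age, systolic)
print(round(r, 3))
```

A value of r near 1 here would show only that the two columns rise together in this made-up sample; establishing causation needs study design, not a larger coefficient.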
The Potential of Big Data in Healthcare
1. Expand capacity to generate new knowledge
◦ the effectiveness of treatments [Schneeweiss, 2014]
◦ the prediction of outcomes [Schneeweiss, 2014]
2. Knowledge dissemination
3. Using analytics to combine EHR and genomic data to translate personalized medicine to clinical practice
4. Deliver information directly to patients and increase patient participation in their health care
Murdoch, T. B., & Detsky, A. S. (2013). The inevitable application of big data to health care. JAMA, 309(13), 1351-1352.
Schneeweiss, S. (2014). Learning from big health care data. New England Journal of Medicine, 370(23), 2161-2163.
What is Big Data?
Characteristics of big data:
◦ Volume (i.e., the size of the dataset)
◦ Variety (i.e., data from multiple repositories, domains, or types)
◦ Velocity (i.e., rate of flow)
◦ Variability (i.e., the change in other characteristics)
◦ Value (i.e., is the cost worth it?)
Traditional data architectures (such as typical relational databases) cannot handle this type of data
New architectures are required
Source: NIST Special Publication 1500-1. NIST Big Data Interoperability Framework: Volume 1, Definitions. Final Version 1, page 4. NIST Big Data Public Working Group Definitions and Taxonomies Subgroup. Retrieved from http://bigdatawg.nist.gov/_uploadfiles/NIST.SP.1500-1.pdf http://dx.doi.org/10.6028/NIST.SP.1500-1
Tools
Hadoop
◦ Runs on clusters of hardware
MongoDB
◦ Stores data using documents with fields
NoSQL utilities
Requirements For Analytics for Learning Systems
A way to ensure that patient groups being compared are truly similar
Automated tools for analysis
Ability to rapidly run automated tools against new data
Software that can be used with little training and helps prevent errors in interpretation
Easily understood results
Schneeweiss, S. (2014). Learning from big health care data. New England Journal of Medicine, 370(23), 2161-2163.
Challenges Facing Biomedical Big Data
Amount of information
Lack of organization
Lack of access to data and tools
Insufficient training in data science methods
National Institutes of Health. What is Big Data? Retrieved from http://datascience.nih.gov/bd2k/about/what, para 3.
Summary
Types of data.
Technology or tools for working with different data types.
Determine whether data fits the definition of Big Data.
Challenges faced when working with Big Data.
Common terms used in data analysis, such as sample, paired, histogram, population, correlation vs. causation, and descriptive.
Communicating and Data Analytics
Visualization Objectives
1. Select the best data communication mode, given the analysis goals and results.
2. Interpret data analysis results.
3. Present solutions for a variety of technical data communication challenges.
4. Prepare a simple data visualization using open-source tools.
5. Participate in the design and development of a complex data visualization.
Communication Basics
Delineate the Problem
Define Your Audience
Choose the Right Mode
Choose the Right Words
Choose Supporting Visuals
Have an “Elevator” Speech
http://www.aaas.org/pes/communication-101-communication-basics-scientists-and-engineers
Delineate the Problem
What are you trying to do with this communication?
Is there a particular timeframe involved?
Does the number of people matter?
Define Your Audience
Colleagues or co-workers
Staff or supervisors
Subject matter experts
Other scientists
Journalists
Policymakers
Others
Choose the Right Mode
Brainstorm the possible communication channels
◦ Website or blog
◦ Podcast or YouTube
◦ Peer-reviewed manuscript
◦ Conference presentation
◦ Dashboard
Choose the Right Words
For the audience
Be careful of acronyms (ONC, EHR, or MU, as examples)
Jargon can be hard to follow
Short words and short sentences
Not too many!
Data Visualization
Ideal graphs…
show the data.
induce the viewer to think about the substance rather than the methodology, graphic design, the technology, or other things.
avoid distorting what the data have to say.
present many numbers in a small space.
make large data sets coherent.
encourage the eye to compare different pieces of data.
reveal the data at several levels of detail.
serve a reasonably clear purpose.
are closely integrated with the statistical and verbal descriptions of the data set. (Tufte)
Bar Charts
Show comparisons between groups
Can be vertical (aka column charts) or horizontal
Can be a histogram (next slide)
Can be Pareto (most to least)
Histogram
This sample is from fictional data.
Stacked Bar Graph
[Figure: stacked bar chart titled “Student Grade Distribution”, showing the percentage of A, B, C, D, and F grades (0% to 100%) for Yr1 through Yr4.]
This graph is total fiction. It does not represent actual grade distribution.
Line Charts
Used for large amounts of data occurring over time
Pie Charts (or Shape Charts)
Display data as a proportion of a whole
No axes
Can explode out for emphasis
Can be any shape
Polar or Radar Charts
Multiple series or categories of data
Larger values are farther from the center
Scatterplots or scatter charts
Values represented as a series of points on a chart
Distributions of values and clusters of data
Displaying and comparing numerical data
Stanfill, M. H., Williams, M., Fenton, S. H., Jenders, R. A., & Hersh, W. R. (2010). A systematic literature review of automated clinical coding and classification systems. Journal of the American Medical Informatics Association: JAMIA, 17(6), 646–651. http://doi.org/10.1136/jamia.2009.001024
Display in Action! www.texmed.org/WorkArea/DownloadAsset.aspx?id=24815
The “Elevator” Speech
Can you tell the story in the time it takes to ride the elevator?
Three main points
Meaningful
Easy to understand
Summary
Effective data communication requires thought and planning
Not all communication works equally well for every audience
The visual presentation is particularly important, but can be more difficult
Have an “elevator” speech
Working with Data
Prepare! https://uth.instructure.com/courses/27078/pages/16-activity?module_item_id=269525
Download each dataset.
Exercise Objectives
Describe reasons why data need to be cleaned or modified before analysis
Demonstrate ability to identify and correct basic errors in data
Demonstrate ability to perform descriptive statistics
Demonstrate ability to use pivot tables
Describe the relationship between a database in an HIT system and data analysis tools
Technologies and Tools
Common technologies and tools used for data analytics include:
◦ Spreadsheet programs such as Microsoft Excel®
◦ Statistical programs such as R, SAS, SPSS, and Stata
◦ Database management systems such as MySQL and Microsoft SQL Server® - can perform some basic analysis
◦ Business intelligence applications such as Tableau®, QlikView®, IBM Cognos
Install the Excel Analysis ToolPak
You must already have Microsoft Office with Excel on your computer
Click the File tab, then click Options.
Click Add-Ins, and then in the Manage box, select Excel Add-ins.
Click Go.
In the Add-Ins available box, select the Analysis ToolPak check box, and then click OK.
After you load the Analysis ToolPak, the Data Analysis command is available in the Analysis group on the Data tab.
https://support.office.com/en-us/article/Load-the-Analysis-ToolPak-305c260e-224f-4739-9777-2d86f1a5bd89
Cleaning Data
Identify errors
◦ Descriptive statistics
◦ Categorical data
◦ Use of pivot tables
Determine correct values or infer/impute
If uncorrectable, delete the record
Work with a copy of your dataset and log all changes!
Data Cleaning – Continuous Data
Descriptive Statistics
To generate descriptive statistics in Excel:
Data > Data Analysis > Descriptive Statistics
Patient Weights
Mean 189.1554
Standard Error 2.916985099
Median 180.6
Mode 192.3
Standard Deviation 65.2257697
Sample Variance 4254.401033
Kurtosis 8.86101958
Skewness 2.554839369
Range 475.6
Minimum 89.4
Maximum 565
Sum 94577.7
Count 500
Confidence Level(95.0%) 5.731086356
Data Cleaning – Categorical Data
A B
1 F
2 U
3 M
4 M
5 F
6 D
7 M
8 M
9 F
10 M
COUNTIF function
=COUNTIF(range, criteria)
=COUNTIF($B$1:$B$10, "M") - will give 5
=COUNTIF($B$1:$B$10, "F") - will give 3
=COUNTIF($B$1:$B$10, "U") - will give 1
• Can identify some errors: the three counts total 9 of the 10 records, which exposes the unexpected value “D” in row 6
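The same check can be sketched outside Excel. This Python version counts every distinct value in the ten-row example above and flags anything outside an allowed set. The allowed set of M, F, and U is an assumption based on the codes used in the COUNTIF examples.

```python
from collections import Counter

# The ten values from column B of the example above.
column_b = ["F", "U", "M", "M", "F", "D", "M", "M", "F", "M"]
allowed = {"M", "F", "U"}   # assumed valid codes

counts = Counter(column_b)  # one pass counts every distinct value
unexpected = {code: n for code, n in counts.items() if code not in allowed}

print(counts["M"], counts["F"], counts["U"])
print(unexpected)
```

Counting all values at once, instead of one COUNTIF per expected code, surfaces codes you did not think to look for.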
Filtering Records
• Displays only those records that meet certain criteria
• Click a cell in the column to be filtered
• On the Data tab, click the Filter icon
Unfiltered Filtered
Filtering Records, continued
• Dialog box displays all the values present in the column
• Can check only values you are interested in – Excel will display only those records
Column Graph
Column graph shows individual weights
But doesn’t show us how many patients are in a particular weight category
Frequencies and Histograms
Frequency: “How many of X and Y are there?”
A frequency calculation gives how many times a particular value occurs
Can be shown as:
Frequency table
Histogram: a graph of the number of times values occur in a set of data
Example Frequency Table and Histogram
0 Frequency
100 4
149 107
199 222
249 136
299 4
349 6
399 5
449 7
499 7
1000 1
More 0
[Figure: frequency table shown beside its histogram (x-axis: bins 100 through 1000 and More; y-axis: Frequency, 0 to 250), with callouts “Categories or ‘Bins’” and “How many records fell into that category (or ‘bin’)”.]
Example
How many patients are in each of the following weight categories (in pounds)?
< 100
100-149
150-199
200-249
250-299
300-349
350-399
400-499
500-1000
1000+
Set up the category bins
Add a column to your Excel spreadsheet with the bins that you want to use to categorize the patient weights
Creating a Frequency Table and Histogram
In Microsoft Excel: Click Data, then Data Analysis, then choose Histogram
Creating a Frequency Table and Histogram
In the Input Range field, enter the range of cells that contain the weights
In the Bin Range field, enter the range of cells that contain the category bins that you created
Click Chart Output
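For readers who want to see what the Histogram tool is doing, here is a minimal Python sketch of the binning logic. It assumes the convention that a value is counted in a bin when it is less than or equal to that bin's number and greater than the previous bin's number, with anything above the last bin counted as "More". The function name frequency_table and the six weights are hypothetical.

```python
from bisect import bisect_left


def frequency_table(values, bin_upper_limits):
    """Count how many values fall into each bin; the last row is 'More'."""
    limits = sorted(bin_upper_limits)
    counts = [0] * (len(limits) + 1)   # extra slot for values above the last bin
    for v in values:
        # bisect_left finds the first bin whose upper limit is >= v
        counts[bisect_left(limits, v)] += 1
    labels = [str(limit) for limit in limits] + ["More"]
    return list(zip(labels, counts))


bins = [100, 149, 199, 249, 299, 349, 399, 449, 499, 1000]
weights = [89.4, 120, 149, 150, 180.6, 565]   # hypothetical sample values
table = frequency_table(weights, bins)
for label, n in table:
    print(label, n)
```

Sorting the resulting rows by count, largest first, gives the ordering used in a Pareto (sorted) histogram.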
Frequency Table and Histogram Output
Using the Excel Data Analysis ToolPak Frequency function for 500 records
0 Frequency
100 4
149 107
199 222
249 136
299 4
349 6
399 5
449 7
499 7
1000 1
More 0
[Figure: histogram of Frequency by bin (100 through 1000 and More), with callouts “Categories or ‘Bins’” and “How many records fell into that category (or ‘bin’)”.]
Sorted Histogram (Pareto)
Pivot Tables
Pivot tables are an Excel tool that let you summarize, analyze, and create different views of your data. You can arrange how the data are displayed.
Pivot tables are very useful for identifying trends or relationships among data in large datasets.
Use the laboratory exercise on pivot tables to explore data on hospital-acquired infections
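Conceptually, a pivot table groups rows by one field, spreads a second field across the columns, and aggregates a value in each cell. The sketch below shows that idea in plain Python. The units, infection types, and case counts are invented for illustration and are not the laboratory exercise data.

```python
from collections import defaultdict

# Hypothetical rows: (unit, infection type, number of cases). Not real data.
records = [
    ("ICU", "CLABSI", 3),
    ("ICU", "CAUTI", 2),
    ("Surgery", "SSI", 4),
    ("Surgery", "CAUTI", 1),
    ("ICU", "CLABSI", 1),
]

# Rows of the pivot are units, columns are infection types, cells are sums.
pivot = defaultdict(lambda: defaultdict(int))
for unit, infection, cases in records:
    pivot[unit][infection] += cases

columns = sorted({infection for _, infection, _ in records})
for unit in sorted(pivot):
    row = [pivot[unit].get(col, 0) for col in columns]
    print(unit, dict(zip(columns, row)), "total:", sum(row))
```

Swapping which field is used for rows and which for columns gives a different view of the same records, which is the "pivot" in pivot table.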
Example Pivot Table FROM THIS: TO THIS:
Chi-Square Test
Are two categorical variables related?
Categorical variable examples: ◦ Gender
◦ Ethnicity
◦ Age group (e.g. 40-49, 50-59)
◦ Disease stage (I, II, III, IV)
◦ Presence or absence of a disease
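The chi-square statistic compares the observed counts in a contingency table with the counts expected if the two variables were unrelated. The sketch below computes it by hand, without a continuity correction. The function name chi_square_statistic and the 2 x 2 counts are hypothetical.

```python
def chi_square_statistic(table):
    """Return (chi-square statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical counts: rows are two groups, columns are disease present / absent.
observed = [[30, 70],
            [45, 55]]
statistic, dof = chi_square_statistic(observed)
print(round(statistic, 2), dof)
```

With one degree of freedom, the critical value at the 0.05 significance level is 3.841. A statistic above that value suggests the two variables are related, though a statistician should confirm the interpretation.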
Where to Get Skills
• NMHIA/AHIMA
• Local Colleges
• MOOCs
• Coursera
• MIT OpenCourseWare
Jobs Using These Skills
• Health Care Data Analyst
• Operations Data Analyst
• Revenue Analyst
• Quality Improvement Analyst
• Data Integration Associate
• And So On!!!!
Conclusion
It’s all about the data!!!!
Question/Answer
• THANK YOU FOR YOUR ATTENTION!
Bibliography
• Health and Medicine Division. (n.d.). Retrieved April 28, 2016, from http://www.nationalacademies.org/hmd/Activities/Quality/LearningHealthCare.aspx
• IBM. (2013). Descriptive, predictive, prescriptive: Transforming asset and facilities management with analytics. Retrieved from http://www-01.ibm.com/common/ssi/cgi-bin/ssialias?infotype=SA&subtype=WH&htmlfid=TIW14162USEN.
• Managing a Data Dictionary. (2012). Journal Of AHIMA, 83(1), 48-52. Retrieved from http://library.ahima.org/doc?oid=105176#.VyeKJoQrJaQ
• Murdoch, T. & Detsky, A. (2013). The Inevitable Application of Big Data to Health Care. JAMA, 309(13), 1351. http://dx.doi.org/10.1001/jama.2013.393
• National Institute of Standards and Technology. (2015). NIST Big Data Interoperability Framework: Volume 1, Definitions. Retrieved from http://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.1500-1.pdf
• NIST/SEMATECH e-Handbook of Statistical Methods. (n.d.). Retrieved May 02, 2016, from http://www.itl.nist.gov/div898/handbook/
• Overview - Sepsis - Mayo Clinic. (2016). Mayoclinic.org. Retrieved 2 May 2016, from http://www.mayoclinic.org/diseases-conditions/sepsis/home/ovc-20169784
• Ideal Graphs… Tufte, E. R. (2001). The Visual Display of Quantitative Information (2nd edition). Cheshire, Conn: Graphics Pr.
• Schneeweiss, S. (2014). Learning from big health care data. New England Journal of Medicine, 370(23), 2161-2163.
• Shapira, G. (2016). The Seven Key Steps of Data Analysis. Oracle.com. Retrieved 28 April 2016, from http://www.oracle.com/us/corporate/profit/big-ideas/052313-gshapira-1951392.html
• Six Steps Of An Analytics Project - Quality Assurance and Project Management. (2015). Quality Assurance and Project Management. Retrieved 2 May 2016, from http://itknowledgeexchange.techtarget.com/quality-assurance/six-steps-of-an-analytics-project/
• What is Hadoop?. (2016). Sas.com. Retrieved 2 May 2016, from http://www.sas.com/en_my/insights/big-data/hadoop.html
• What is Big Data? | Data Science at NIH. (2015). Datascience.nih.gov. Retrieved 2 May 2016, from http://datascience.nih.gov/bd2k/about/what
• Charts, Tables and Figures
• 1.1 Figure: Smith, K. (2016). Clinical Data Warehouse. Used with permission from Kimberly Smith.
• 1.2-1.6 Figures: Definition, P. (2012). Big Data Analytics - Predictive Analytics - Gartner Glossary. Gartner IT Glossary. Retrieved 28 April 2016, from http://www.gartner.com/it-glossary/predictive-analytics
Bibliography Images
• Slide 9: Farcaster. (2014). Data visualization process v1 [Online Image]. Retrieved April 28, 2016 from https://commons.wikimedia.org/wiki/File:Data_visualization_process_v1.png#/media/File:Data_visualization_process_v1.png
• Slide 25: Innesw. (2014). Simple pie chart [Online Image]. Retrieved May 2, 2016 from https://commons.wikimedia.org/wiki/File:Charts_SVG_Example_5_-_Simple_Pie_Chart.svg