Data Analytics! Susan Fenton, RHIA, PhD; UTHealth School of Biomedical Informatics Gene Hill, BBA, MA; April Salisbury, MBA-HC; LCF Research/NMHIC

Data Analytics! Susan Fenton, RHIA, PhD; UTHealth School of … · 2019. 1. 25. · Develop specific plan for retrieving the required data elements Method for cross-checking number

Data Analytics!

Susan Fenton, RHIA, PhD; UTHealth School of Biomedical Informatics Gene Hill, BBA, MA; April Salisbury, MBA-HC; LCF Research/NMHIC

Disclaimer

• This material is designed and provided to communicate information about data analytics in an educational format and manner.

• The author is not providing or offering legal advice but, rather, practical and useful information and tools to achieve compliant results in the area of data analytics.

• Every reasonable effort has been taken to ensure that the educational information provided is accurate and useful.

• Applying best practice solutions and achieving results will vary in each hospital/facility and clinical situation.

• All material in this workshop was adapted from materials developed by The University of Texas Health Science Center at Houston, funded by the Department of Health and Human Services, Office of the National Coordinator for Health Information Technology under Award Number 90WT0006.

• This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/4.0/.

Goals/Objectives or Agenda

• Determine the essential skills for effective healthcare data analysis

• Articulate different data types and appropriate uses for each

• Compare and contrast data analytic types and tools

• Practice data analytics skills

Skills Needed

• Soft Skills

• Curiosity

• Critical Thinking

• Listening

• Technical Skills

• Understand Data

• Basic Stats

• Communication

Introduction to Analytics

Definition

Types of analytics

◦ Descriptive

◦ Diagnostic

◦ Predictive

◦ Prescriptive

What is Analytics?

“The discovery of meaningful patterns in data, and is one of the steps in the data life cycle of collection of raw data, preparation of information, analysis of patterns to synthesize knowledge, and action to produce value.”

NIST Big Data. (2015) Public Working Group Definitions and Taxonomies Subgroup. Retrieved from http://bigdatawg.nist.gov/_uploadfiles/NIST.SP.1500-1.pdf. http://dx.doi.org/10.6028/NIST.SP.1500-1

What is Analytics?

Entire process of data collection, extraction, transformation, analysis, interpretation, and reporting

"Data visualization process v1" by Farcaster at English Wikipedia. Licensed under CC BY-SA 3.0 via Commons.

https://commons.wikimedia.org/wiki/File:Data_visualization_process_v1.png#/media/File:Data_visualization_process_v1.png

What is Analytics?

“Analytics is used to refer to the methods, their implementations in tools, and the results of the use of the tools as interpreted by the practitioner.”

NIST Big Data. (2015) Public Working Group Definitions and Taxonomies Subgroup. Retrieved from http://bigdatawg.nist.gov/_uploadfiles/NIST.SP.1500-1.pdf. http://dx.doi.org/10.6028/NIST.SP.1500-1

The analytics process is the synthesis of knowledge from information.

Types of Analytics: Overview

Descriptive: uses business intelligence and data mining to ask: “What has happened?”

Diagnostic: examines data to answer “Why did it happen?”

Gartner. (n.d.) Gartner IT Glossary: Diagnostic Analytics. Retrieved 2/21/2016 from http://www.gartner.com/it-glossary/diagnostic-analytics.

Predictive: uses statistical models and forecasts to ask: “What could happen?”

Prescriptive: uses optimization and simulation to ask: “What should we do?”

IBM Software. (2013). Descriptive, predictive, prescriptive: Transforming asset and facilities management with analytics. Retrieved 2/21/2016 from http://www-01.ibm.com/common/ssi/cgi-bin/ssialias?infotype=SA&subtype=WH&htmlfid=TIW14162USEN.

Descriptive Analytics

Describe the data

Common statistics:

◦ counts

◦ averages

Typical reporting methods:

◦ Tables

◦ Pie charts

◦ Column / bar charts

◦ Written narratives

http://www.gartner.com/it-glossary/predictive-analytics

What Predictive Analytics Cannot Do

“The purpose of predictive analytics is NOT to tell you what will happen in the future. It cannot do that. In fact, no analytics can do that. Predictive analytics can only forecast what might happen in the future, because all predictive analytics are probabilistic in nature.”

◦ Michael Wu as quoted by Jeff Bertolucci in Big Data Analytics: Descriptive vs. Predictive vs. Prescriptive. Information Week. December 31, 2014, para 13. Retrieved from http://www.informationweek.com/big-data/big-data-analytics/big-data-analytics-descriptive-vs-predictive-vs-prescriptive/d/d-id/1113279.

Prescriptive Analytics

Examines data or content to answer the question “What should be done?” or “What can we do to make _______ happen?”

Is characterized by techniques such as

◦ graph analysis

◦ simulation

◦ complex event processing

◦ neural networks

◦ recommendation engines

◦ heuristics

◦ machine learning

http://www.gartner.com/it-glossary/prescriptive-analytics

Steps in Data Analytics

1. Identify the problem and the stakeholders

2. Identify what data are needed and where those data are located

3. Develop a plan for analysis and a plan for retrieval

4. Extract / transform / load the data

5. Check, clean, and prepare the data for analysis

6. Analyze and interpret the data

7. Visualize the data

8. Disseminate the new knowledge

9. Implement the knowledge into the organization

1. Identify the Problem or Question and the Stakeholders

Why is this an important problem?

How will the results impact patient care or the institution?

What is the business case?

Who are the stakeholders?

2. Identify what data are needed

What data elements, such as date of birth, gender, medications, laboratory results, and so on are needed?

Where are these data elements located – in what system or systems and what database tables?

Is there a clinical data warehouse?

Who is the contact person for each system who will be responsible for retrieving the data?

3. Develop plans for retrieval and analysis

Retrieval

Enlist database administrator for each system

Develop specific plan for retrieving the required data elements

Method for cross-checking number of records as well as completeness – how many should you expect and did you get everything?

Analysis

Enlist statistician

Identify population, sample size, statistical tests to be performed
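The retrieval cross-check above (how many records should you expect, and did you get everything?) can be sketched in a few lines. This is a minimal illustration, not a prescribed method: the field names, the expected count, and the sample records are all hypothetical.

```python
# Hypothetical extract: each dict is one retrieved patient record.
expected_count = 5  # count reported by the source system (assumed)
required_fields = ["patient_id", "date_of_birth", "gender"]

records = [
    {"patient_id": 1, "date_of_birth": "1980-02-14", "gender": "F"},
    {"patient_id": 2, "date_of_birth": "1975-07-30", "gender": "M"},
    {"patient_id": 3, "date_of_birth": None, "gender": "U"},
    {"patient_id": 4, "date_of_birth": "1990-11-02", "gender": ""},
]


def cross_check(records, expected_count, required_fields):
    """Return (number of records missing, ids of incomplete records)."""
    missing_records = expected_count - len(records)
    incomplete = [
        r["patient_id"]
        for r in records
        if any(r.get(field) in (None, "") for field in required_fields)
    ]
    return missing_records, incomplete


missing_records, incomplete_ids = cross_check(records, expected_count, required_fields)
print("records missing:", missing_records)
print("incomplete record ids:", incomplete_ids)
```

A nonzero count of missing records or a non-empty list of incomplete ids is a signal to go back to the database administrator before analysis begins.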

4. Extract / Transform / Load (ETL) Process

Extraction

May be an iterative process

The data are retrieved

Checked for completeness

Descriptive statistics

Errors corrected, empty fields addressed

Transformation

Data synchronized (“transformed”) – e.g. M, F, U vs 1, 2, 9

Loading

Data then imported into destination system
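The transformation step above (M, F, U in one system vs 1, 2, 9 in another) might look like the following sketch. The pairing 1=M, 2=F, 9=U and the rule that unrecognized values become U are assumptions made for illustration; confirm the real mapping in each system's data dictionary.

```python
# Assumed mapping from one system's numeric codes to the other's letters.
NUMERIC_TO_LETTER = {"1": "M", "2": "F", "9": "U"}
VALID_LETTERS = {"M", "F", "U"}


def harmonize_gender(value):
    """Return M, F, or U; anything unrecognized becomes U (an assumed policy)."""
    text = str(value).strip().upper()
    if text in VALID_LETTERS:
        return text
    return NUMERIC_TO_LETTER.get(text, "U")


system_a = ["M", "F", "U", "f"]   # hypothetical values from the first system
system_b = [1, 2, 9, 2]           # hypothetical values from the second system
combined = [harmonize_gender(v) for v in system_a + system_b]
print(combined)
```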

5. Check, clean, and prepare the data

Data are now in the system where analysis will be run

Should be a complete set of data

Need to check that everything is ready for analysis

Descriptive statistics

Double-check problem or question being investigated

Double-check against analysis plan
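As a rough sketch of the readiness check above, descriptive statistics can be used to count missing values and flag implausible ones before analysis. The weights and the plausibility limits below are hypothetical; pick limits that fit your own data.

```python
import statistics

weights = [150.2, 180.6, 192.3, None, 565.0, 171.4, -1.0]  # hypothetical values

present = [w for w in weights if w is not None]
summary = {
    "count": len(present),
    "missing": len(weights) - len(present),
    "min": min(present),
    "max": max(present),
    "mean": statistics.mean(present),
}

# Assumed plausibility limits for a recorded weight.
LOW, HIGH = 50.0, 500.0
suspect = [w for w in present if not LOW <= w <= HIGH]

print(summary)
print("values to review:", suspect)
```

Flagged values are candidates for review, not automatic deletions; an extreme value may be a true observation.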

6. Analyze and interpret the data

Use the data analysis plan

Perform the actual statistical analyses as described in the plan

Consult with statistician to confirm interpretations and conclusions

7. Visualize the Data

Nominal (categorical) data: column or bar charts, tables, pie charts, pivot tables

Quantitative data: histograms, scatter plots, star plots

Examples of tools

◦ Microsoft® Excel Chart function

◦ Tableau®

Pie chart: https://commons.wikimedia.org/wiki/File:Charts_SVG_Example_5_-_Simple_Pie_Chart.svg

Histogram: https://commons.wikimedia.org/wiki/Histogram#/media/File:Histogram_example.svg

8 & 9: Disseminating and Implementing

Disseminating the new knowledge

Write up the findings

Disseminate to the stakeholders

Implementing the new knowledge

Requires participation of stakeholders

Data, Information, Knowledge, Wisdom Hierarchy

Data: symbols, facts, and measurements

Information: data processed to be useful; provides the “who, what, when, where”

Knowledge: application of data and information; provides the “how”

Wisdom: evaluated understanding; provides the “why”

[Figure: pyramid, top to bottom: Wisdom, Knowledge, Information, Data]

Types of Data in an EHR

Quantitative data (eg, laboratory values)

Qualitative data (eg, text-based documents and demographics)

Transactional data (eg, a record of medication delivery).

Murdoch, T. B., & Detsky, A. S. (2013). The inevitable application of big data to health care. JAMA, 309(13), 1351-1352.

Understanding the Data: Scales of Measure

Data come in many forms, and those forms determine what can or cannot be done with the data.

For example, two patient names cannot be added together.

Likewise, interpreting the relative distance between two measurements can only be done with certain kinds of data and not others.

There are four scales: Nominal, ordinal, interval, and ratio.

Scales of Measure: Nominal

From the Latin nomen, meaning “name”

Names, labels, categories

Examples:

◦ Patient names (John Doe, Maria Garcia)

◦ Drug names (Ampicillin, Valium)

◦ Eye color (blue, brown, green, gray)

◦ Gender (male, female, unknown)

◦ Religious preference (Catholic, Jewish, none)

May be mapped to a number in a database

◦ Example: brown eyes=1, blue eyes=2
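A short sketch of why numeric codes for nominal data are still labels: counting and finding the most frequent category are meaningful, but arithmetic on the codes is not. The code mapping follows the slide's example; the stored values are hypothetical.

```python
from collections import Counter

EYE_COLOR_CODES = {1: "brown", 2: "blue"}  # mapping from the slide's example
coded_column = [1, 2, 1, 1, 2]             # hypothetical stored values

# Counts and the mode are valid for nominal data.
counts = Counter(EYE_COLOR_CODES[code] for code in coded_column)
mode_label, mode_count = counts.most_common(1)[0]

# The arithmetic mean runs without error but has no meaning for labels:
meaningless_mean = sum(coded_column) / len(coded_column)

print(counts)
print("most frequent:", mode_label)
print("mean of codes (not an eye color):", meaningless_mean)
```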

Eye image from https://commons.wikimedia.org/wiki/File:Deep_Blue_eye.jpg

Scales of Measure: Ordinal

Includes all properties of Nominal (so Ordinal data all have a name of some sort)

Example: first, second, third, i.e., a ranking or order

But intervals are not necessarily equal

http://www.cdc.gov/growthcharts/

Photo by Paul Kehrer.

https://www.flickr.com/photos/paulkehrer/3659279740

Creative Commons Attribution 2.0 Generic (CC BY 2.0) license.

Scales of Measure: Interval and Ratio

Continuous

Has equal intervals; Ratio also has absolute zero.

Examples: distance, length, temperature, weight

Includes properties of Nominal and Ordinal

May be grouped together in one category called “scale”
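The practical difference between interval and ratio scales is whether ratios mean anything. As an illustrative sketch with hypothetical readings: Celsius temperature has equal intervals but no absolute zero, so differences are meaningful while "twice as hot" is not; weight has an absolute zero, so ratios are meaningful.

```python
def celsius_to_kelvin(celsius):
    """Convert to Kelvin, a temperature scale with an absolute zero."""
    return celsius + 273.15


a_c, b_c = 10.0, 20.0                      # hypothetical Celsius readings
difference = b_c - a_c                     # meaningful on an interval scale
naive_ratio = b_c / a_c                    # 2.0, but 20 C is not "twice as hot"
kelvin_ratio = celsius_to_kelvin(b_c) / celsius_to_kelvin(a_c)

weight_a, weight_b = 90.0, 180.0           # hypothetical weights
weight_ratio = weight_b / weight_a         # 2.0, and genuinely twice as heavy

print(difference, naive_ratio, round(kelvin_ratio, 4), weight_ratio)
```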

https://commons.wikimedia.org/wiki/File:Soft_ruler.jpg

"Clinical thermometer 38.7" by Menchi - Own work. Licensed under CC BY-SA 3.0 via Commons - https://commons.wikimedia.org/wiki/File:Clinical_thermometer_38.7.JPG#/media/File:Clinical_thermometer_38.7.JPG

Check Understanding

Zip Code

Blood Pressure

Heart Failure Classification I, II, III, IV

Age

Ethnicity

Marital Status

Length of Stay

Discharge Disposition (home, SNF, and so on)

Weight

Level of Education

Data Inconsistencies

Inconsistent naming conventions, such as “systolic blood pressure” versus “blood pressure, systolic”

Inconsistent definitions, such as how the date of admission is defined across departments

Varying field lengths for the same data element, such as one system allowing a patient’s last name to be up to 50 characters while another system allows 25 characters

Varied data elements, such as M, F, or U for patient gender in one system while another system uses 1, 2, or 9 or Male, Female, or Unknown

[AHIMA. "Managing a Data Dictionary." Journal of AHIMA 83, no.1 (January 2012): 48-52. Retrieved from http://library.ahima.org/xpedio/groups/public/documents/ahima/bok1_049331.hcsp?dDocName=bok1_049331]

Data Dictionaries

The first step to understanding the data you are working with

Synthetic data set

Data Dictionaries

The first step to understanding the data you are working with

“a standard definition of data elements”.

Health Information Management Systems Society (HIMSS). (November, 2014). Clinical & Business Intelligence: An Analytics Executive Review Needs Assessment. Retrieved 2/21/16 from http://www.himss.org/ResourceLibrary/genResourceDetailPDF.aspx?ItemNumber=34692
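To make the idea of “a standard definition of data elements” concrete, here is a minimal sketch of a data dictionary held as a Python structure, with a helper that checks a value against its entry. The element names, definitions, lengths, and allowed values are hypothetical and are not taken from any real system.

```python
# Hypothetical data dictionary: one entry per data element.
DATA_DICTIONARY = {
    "gender": {
        "definition": "Patient gender as recorded at registration",
        "type": str,
        "max_length": 1,
        "allowed_values": {"M", "F", "U"},
    },
    "last_name": {
        "definition": "Patient family name",
        "type": str,
        "max_length": 50,
        "allowed_values": None,  # free text
    },
}


def validate(element, value):
    """Return a list of problems for one value (an empty list means OK)."""
    entry = DATA_DICTIONARY[element]
    if not isinstance(value, entry["type"]):
        return ["wrong type"]
    problems = []
    if len(value) > entry["max_length"]:
        problems.append("too long")
    if entry["allowed_values"] is not None and value not in entry["allowed_values"]:
        problems.append("value not allowed")
    return problems


print(validate("gender", "F"))
print(validate("gender", "Male"))
```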

Data Dictionaries

Common Terms Used in Statistical Analysis

Population

Sample

Paired samples

Data set

Descriptive statistics

Frequency table

Histogram

Chi square

T-Test

Correlation vs. causation

Term: Population

A group of things that have something in common

Examples: ◦ Patients in a particular hospital

◦ Patients with a certain diagnosis

◦ Patients with a particular attribute (gender, smoking status, age group)

◦ Patients who had a certain surgical procedure in a given year by a specific surgeon

Term: Sample

A representative portion or subset of a group of things – part of a population

Example population: babies born in the United States in 2015

Example sample: a selection of those babies

Paired samples: before-and-after studies, or matched on one or more characteristics

Image credit: Kernler, D. Simple Random Sampling. Retrieved from https://commons.wikimedia.org/wiki/File:Simple_random_sampling.PNG. Licensed under the Creative Commons Attribution-Share Alike 4.0 International license.
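A simple random sample, as pictured above, can be drawn with Python's standard library. This is a sketch under stated assumptions: the population is a hypothetical list of 1,000 record IDs, and the sample size of 50 is arbitrary.

```python
import random

population = list(range(1, 1001))   # hypothetical record IDs for 1,000 patients
rng = random.Random(42)             # fixed seed so the draw is repeatable
sample = rng.sample(population, k=50)  # sampling without replacement

print(len(sample), "records drawn from", len(population))
```

Because sampling is without replacement, no record can appear in the sample twice.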

Confidence Intervals

How well does a sample approximate the entire population?

The confidence level is often set at 95%

If many samples were drawn and an interval computed from each, the resulting intervals would bracket the true population parameter in approximately 95% of the cases

NIST/SEMATECH e-Handbook of Statistical Methods, http://www.itl.nist.gov/div898/handbook/prc/section1/prc14.htm
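The "approximately 95% of the cases" reading can be checked with a small simulation. This sketch assumes a normally distributed population with made-up parameters and uses the normal critical value 1.96 rather than a t value, so the observed coverage will land near, not exactly at, 95%.

```python
import math
import random
import statistics

rng = random.Random(0)             # fixed seed for repeatability
TRUE_MEAN, TRUE_SD = 180.0, 40.0   # assumed population parameters
N, TRIALS, Z = 50, 1000, 1.96      # 1.96 is the normal critical value for 95%

covered = 0
for _ in range(TRIALS):
    sample = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    mean = statistics.mean(sample)
    half_width = Z * statistics.stdev(sample) / math.sqrt(N)
    # Does this sample's interval bracket the true mean?
    if mean - half_width <= TRUE_MEAN <= mean + half_width:
        covered += 1

coverage = covered / TRIALS
print(f"intervals containing the true mean: {coverage:.1%}")
```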

Data Set

A data set is a collection of data for a specific purpose. For this presentation, for example, the data set is a collection of 500 records that consists of age, gender, state of residence, marital status, blood type, weight, eye color, and smoking status.

Descriptive Statistics

Basic overview of the data

Excel: Data → Data Analysis → Descriptive Statistics

Should be among the first analyses done on a set of data

Can identify some errors

Mean (average), number of records (count), range of values, maximum and minimum values

Patient Weights

Statistic                  Value
Mean                       189.1554
Standard Error             2.916985099
Median                     180.6
Mode                       192.3
Standard Deviation         65.2257697
Sample Variance            4254.401033
Kurtosis                   8.86101958
Skewness                   2.554839369
Range                      475.6
Minimum                    89.4
Maximum                    565
Sum                        94577.7
Count                      500
Confidence Level (95.0%)   5.731086356
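Several values in the Patient Weights output are related by simple formulas, which makes the table a useful self-check. The sketch below uses only the numbers shown above; the tolerances are arbitrary choices to allow for rounding in the displayed values.

```python
import math

# Values copied from the Patient Weights output above.
mean, std_error, std_dev = 189.1554, 2.916985099, 65.2257697
variance, value_range = 4254.401033, 475.6
minimum, maximum, total, count = 89.4, 565.0, 94577.7, 500
ci_half_width = 5.731086356

checks = {
    "mean = sum / count": math.isclose(total / count, mean, rel_tol=1e-9),
    "range = max - min": math.isclose(maximum - minimum, value_range, rel_tol=1e-9),
    "variance = sd squared": math.isclose(std_dev ** 2, variance, rel_tol=1e-6),
    "std error = sd / sqrt(n)": math.isclose(
        std_dev / math.sqrt(count), std_error, rel_tol=1e-6
    ),
}

# The 95% confidence half-width is a critical value times the standard error.
implied_multiplier = ci_half_width / std_error

for name, ok in checks.items():
    print(f"{name}: {'OK' if ok else 'MISMATCH'}")
print("implied critical value:", round(implied_multiplier, 4))
```

The implied critical value sits just above 1.96, so the 95% interval for the mean is roughly 189.16 ± 5.73. The mean exceeding the median, together with the positive skewness and the maximum of 565, suggests a few very high weights worth reviewing.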

Measures of Central Tendency

Mean – arithmetic average of interval or ratio data; very sensitive to outliers

Median – midpoint of a frequency distribution, with 50% of the observations above and 50% of the observations below

Mode – the most frequent observation(s) in a frequency distribution; may not be unique; can be used with nominal data

1, 1, 2, 3, 3, 3, 3, 3, 3, 4, 5, 5, 5, 6, 6, 7, 8, 9, 10
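The three measures can be computed for the sequence above with Python's built-in statistics module. This is only a sketch of the definitions; the variable names are illustrative.

```python
import statistics

values = [1, 1, 2, 3, 3, 3, 3, 3, 3, 4, 5, 5, 5, 6, 6, 7, 8, 9, 10]

mean = statistics.mean(values)         # sum of values divided by their count
median = statistics.median(values)     # middle value of the sorted list
modes = statistics.multimode(values)   # every most-frequent value, as a list

print("mean:", round(mean, 2))
print("median:", median)
print("mode(s):", modes)
```

Here the mean is pulled above the median by the larger values at the end of the list, while the mode reflects the run of 3s.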

Measures of Variation

Range – difference between the smallest and largest values in a frequency distribution; simple measure of spread; can also be affected by extreme values or outliers

Variance – amount of variation of all values or scores for a variable; average of the squared deviations from the mean (variance not meaningful at descriptive level)

Standard deviation – amount of dispersion around the mean; square root of the variance; most widely used measure of variability in descriptive statistics
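A sketch of these measures on a small list of values. Note one assumption about conventions: the "average of the squared deviations" divides by n (population variance), while the "Sample Variance" reported by tools such as Excel divides by n - 1; both are shown.

```python
import statistics

values = [1, 1, 2, 3, 3, 3, 3, 3, 3, 4, 5, 5, 5, 6, 6, 7, 8, 9, 10]

value_range = max(values) - min(values)
population_variance = statistics.pvariance(values)  # divides by n
sample_variance = statistics.variance(values)       # divides by n - 1
sample_sd = statistics.stdev(values)                # square root of sample variance

print("range:", value_range)
print("population variance:", round(population_variance, 3))
print("sample variance:", round(sample_variance, 3))
print("sample standard deviation:", round(sample_sd, 3))
```

The standard deviation is in the same units as the data, which is one reason it is reported more often than the variance.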

Inferential Statistics

Infer from a sample to the population and draw conclusions

The appropriate test depends upon

◦ Data type

◦ Parametric vs. nonparametric data

Chi-square – categorical data

Correlation – continuous data – relationships

ANOVA – continuous data – differences in means
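As a worked illustration of the chi-square test for categorical data, the statistic can be computed by hand from a table of counts. The 2 x 2 table below (for example, smoker vs. non-smoker in two groups) is entirely hypothetical, and in practice a statistician or a statistics package would also supply the p-value.

```python
# Hypothetical observed counts: rows are two groups, columns are yes / no.
observed = [
    [30, 70],
    [20, 80],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        # Expected count if group and outcome were independent.
        expected = row_totals[i] * col_totals[j] / grand_total
        chi_square += (obs - expected) ** 2 / expected

degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)

print("chi-square:", round(chi_square, 3))
print("degrees of freedom:", degrees_of_freedom)
```

With one degree of freedom the critical value at the 0.05 level is 3.841, so a statistic below that would not be called statistically significant at that level.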

Correlation and Causation

Correlation: relationship between two things

Causation: one causes another

Correlation does not equal causation
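Correlation between two continuous variables is usually summarized with the Pearson coefficient r, which runs from -1 to 1. The sketch below computes it from the definition on made-up paired values; a high r says the two move together, not that one causes the other.

```python
import math

# Hypothetical paired measurements for five patients.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]


def pearson_r(xs, ys):
    """Pearson correlation: co-variation divided by the product of spreads."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    co_variation = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in xs))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in ys))
    return co_variation / (spread_x * spread_y)


r = pearson_r(x, y)
print("Pearson r:", round(r, 3))
```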

The Potential of Big Data in Healthcare

1. Expand capacity to generate new knowledge

◦ the effectiveness of treatments [Schneeweiss, 2014]

◦ the prediction of outcomes [Schneeweiss, 2014]

2. Knowledge dissemination

3. Using analytics to combine EHR and genomic data to translate personalized medicine to clinical practice

4. Deliver information directly to patients and increase patient participation in their health care

Murdoch, T. B., & Detsky, A. S. (2013). The inevitable application of big data to health care. JAMA, 309(13), 1351-1352.

Schneeweiss, S. (2014). Learning from big health care data. New England Journal of Medicine, 370(23), 2161-2163.

What is Big Data?

Characteristics of big data:

◦ Volume (i.e., the size of the dataset)

◦ Variety (i.e., data from multiple repositories, domains, or types)

◦ Velocity (i.e., rate of flow)

◦ Variability (i.e., the change in other characteristics)

◦ Value (i.e., is the cost worth it?)

Traditional data architectures (such as typical relational databases) cannot handle this type of data

New architectures are required

Source: NIST Special Publication 1500-1. NIST Big Data Interoperability Framework: Volume 1, Definitions. Final Version 1, page 4. NIST Big Data Public Working Group Definitions and Taxonomies Subgroup. Retrieved from http://bigdatawg.nist.gov/_uploadfiles/NIST.SP.1500-1.pdf http://dx.doi.org/10.6028/NIST.SP.1500-1

Tools

Hadoop

◦ Runs on clusters of hardware

MongoDB

◦ Stores data using documents with fields

NoSQL utilities
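To show what "documents with fields" means in practice, here is a sketch of a single patient record shaped as a document, using only Python's built-in json module. It illustrates the data shape only and does not call any database API; every field name and value is hypothetical.

```python
import json

# One hypothetical patient "document": fields can hold nested lists and
# sub-documents, so related data travel together instead of across tables.
patient_document = {
    "patient_id": "P0001",
    "gender": "F",
    "medications": [{"name": "Ampicillin", "dose_mg": 500}],
    "lab_results": [{"test": "glucose", "value": 98, "unit": "mg/dL"}],
}

serialized = json.dumps(patient_document, indent=2)  # text form of the document
restored = json.loads(serialized)                    # back to Python objects

print(serialized)
```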

Requirements For Analytics for Learning Systems

A way to ensure that patient groups being compared are truly similar

Automated tools for analysis

Ability to rapidly run automated tools against new data

Software that can be used with little training and helps prevent errors in interpretation

Easily understood results

Schneeweiss, S. (2014). Learning from big health care data. New England Journal of Medicine, 370(23), 2161-2163.

Challenges Facing Biomedical Big Data

Amount of information

Lack of organization

Lack of access to data and tools

Insufficient training in data science methods

National Institutes of Health. What is Big Data? Retrieved from http://datascience.nih.gov/bd2k/about/what, para 3.

Summary

Types of data.

Technology or tools for working with different data types.

Determine whether data fits the definition of Big Data.

Challenges faced when working with Big Data.

Common terms used in data analysis, such as sample, paired, histogram, population, correlation vs. causation, and descriptive.

Communicating and Data Analytics

Visualization Objectives

1. Select the best data communication mode, given the analysis goals and results.

2. Interpret data analysis results.

3. Present solutions for a variety of technical data communication challenges.

4. Prepare a simple data visualization using open-source tools.

5. Participate in the design and development of a complex data visualization.

Communication Basics

Delineate the Problem

Define Your Audience

Choose the Right Mode

Choose the Right Words

Choose Supporting Visuals

Have an “Elevator” Speech

http://www.aaas.org/pes/communication-101-communication-basics-scientists-and-engineers

Delineate the Problem

What are you trying to do with this communication?

Is there a particular timeframe involved?

Does the number of people matter?

Define Your Audience

Colleagues or co-workers

Staff or supervisors

Subject matter experts

Other scientists

Journalists

Policymakers

Others

Choose the Right Mode

Brainstorm the possible communication channels

◦ Email

◦ Website or blog

◦ Podcast or YouTube

◦ Peer-reviewed manuscript

◦ Conference presentation

◦ Dashboard

Choose the Right Words

For the audience

Be careful of acronyms (ONC, EHR, or MU, as examples)

Jargon can be hard to follow

Short words and short sentences

Not too many!

Data Visualization

Graphical displays should:

show the data.

induce the viewer to think about the substance rather than the methodology, graphic design, the technology, or other things.

avoid distorting what the data have to say.

present many numbers in a small space.

make large data sets coherent.

encourage the eye to compare different pieces of data.

reveal the data at several levels of detail.

serve a reasonably clear purpose.

be closely integrated with the statistical and verbal descriptions of the data set. (Tufte)

Bar Charts

Show comparisons between groups

Can be vertical (aka column charts) or horizontal

Can be a histogram (next slide)

Can be Pareto (most to least)

Histogram

This sample is from fictional data.
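A histogram groups quantitative values into equal-width bins and shows how many fall in each. As a minimal sketch using only the standard library, the bins can be counted and drawn as text; the ages and the bin width of 10 are fictional choices for illustration.

```python
from collections import Counter

ages = [23, 35, 37, 41, 44, 45, 52, 58, 61, 63, 67, 72]  # fictional data
BIN_WIDTH = 10

# Map each age to the lower edge of its bin, then count per bin.
bins = Counter((age // BIN_WIDTH) * BIN_WIDTH for age in ages)

lines = [
    f"{start}-{start + BIN_WIDTH - 1}: {'#' * bins[start]}"
    for start in sorted(bins)
]
print("\n".join(lines))
```

Unlike an ordinary bar chart, the bins here are adjacent ranges of one numeric variable, so the order of the bars carries meaning.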

Stacked Bar Graph

[Figure: stacked bar chart titled “Student Grade Distribution”. Vertical axis: 0% to 100%. Horizontal axis: Yr1, Yr2, Yr3, Yr4. Series: F, D, C, B, A.]

This graph is total fiction. It does not represent actual grade distribution.

Line Charts

Used for large amounts of data occurring over time

Pie Charts (or Shape Charts)

Display data as a proportion of a whole

No axes

Can explode out for emphasis

Can be any shape

Polar or Radar Charts

Multiple series or categories of data

Larger values are farther from the center

Scatterplots or Scatter Charts

Values represented as a series of points on a chart

Distributions of values and clusters of data

Displaying and comparing numerical data

Stanfill, M. H., Williams, M., Fenton, S. H., Jenders, R. A., & Hersh, W. R. (2010). A systematic literature review of automated clinical coding and classification systems. Journal of the American Medical Informatics Association: JAMIA, 17(6), 646–651. http://doi.org/10.1136/jamia.2009.001024

Display in Action!

www.texmed.org/WorkArea/DownloadAsset.aspx?id=24815

The “Elevator” Speech

Can you tell the story in the time it takes to ride the elevator?

Three main points

Meaningful

Easy to understand

Summary

Effective data communication requires thought and planning

Not all communication works equally well for every audience

Visual presentation is particularly important, but it can be harder to do well

Have an “elevator” speech

Working with Data

Exercise Objectives

Describe reasons why data need to be cleaned or modified before analysis

Demonstrate ability to identify and correct basic errors in data

Demonstrate ability to perform descriptive statistics

Demonstrate ability to use pivot tables

Describe the relationship between a database in an HIT system and data analysis tools

Technologies and Tools

Common technologies and tools used for data analytics include:

◦ Spreadsheet programs such as Microsoft Excel®

◦ Statistical programs such as R, SAS, SPSS, and Stata

◦ Database management systems such as MySQL and Microsoft SQL Server® - can perform some basic analysis

◦ Business intelligence applications such as Tableau®, QlikView®, IBM Cognos

Install the Excel Analysis ToolPak

You must already have Microsoft Office with Excel on your computer

Click the File tab, then click Options.

Click Add-Ins, and then in the Manage box, select Excel Add-ins.

Click Go.

In the Add-Ins available box, select the Analysis ToolPak check box, and then click OK.

After you load the Analysis ToolPak, the Data Analysis command is available in the Analysis group on the Data tab.

https://support.office.com/en-us/article/Load-the-Analysis-ToolPak-305c260e-224f-4739-9777-2d86f1a5bd89

Cleaning Data

Identify errors

◦ Descriptive statistics

◦ Categorical data

◦ Use of pivot tables

Determine correct values or infer/impute

If a value cannot be corrected, delete the record

Work with a copy of your dataset and log all changes!
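The advice to work on a copy and log every change can be sketched in a few lines of Python. This is a minimal illustration, not the workshop's method: the records, field names, and the helper functions `correct` and `delete` are all hypothetical, and recoding an invalid value to "U" is only one possible cleaning decision.

```python
import copy

# Hypothetical records; field names and values are illustrative only.
original = [
    {"id": 1, "sex": "F", "weight_lb": 150.2},
    {"id": 2, "sex": "D", "weight_lb": 180.6},  # "D" is not an expected code
    {"id": 3, "sex": "M", "weight_lb": None},   # weight cannot be recovered
]

working = copy.deepcopy(original)  # never edit the source data in place
change_log = []


def correct(record, field, new_value, reason):
    """Change one field and record what was changed and why."""
    change_log.append({"id": record["id"], "field": field,
                       "old": record[field], "new": new_value,
                       "reason": reason})
    record[field] = new_value


def delete(records, record_id, reason):
    """Drop an uncorrectable record and log the deletion."""
    change_log.append({"id": record_id, "field": None,
                       "old": "record", "new": "deleted",
                       "reason": reason})
    return [r for r in records if r["id"] != record_id]


correct(working[1], "sex", "U", "invalid code; true value unknown")
working = delete(working, 3, "missing weight, uncorrectable")

for entry in change_log:
    print(entry)
```

Because `original` is never modified, any cleaning decision can be reviewed or reversed later by replaying the log against the untouched data.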

Data Cleaning – Continuous Data

Descriptive Statistics

To generate descriptive statistics in Excel, click Data, then Data Analysis, then choose Descriptive Statistics.

Patient Weights

Mean: 189.1554
Standard Error: 2.916985099
Median: 180.6
Mode: 192.3
Standard Deviation: 65.2257697
Sample Variance: 4254.401033
Kurtosis: 8.86101958
Skewness: 2.554839369
Range: 475.6
Minimum: 89.4
Maximum: 565
Sum: 94577.7
Count: 500
Confidence Level (95.0%): 5.731086356
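Several of the Patient Weights figures above are derived from one another, which gives a quick sanity check on any descriptive-statistics output. The sketch below recomputes four of them from the reported sum, count, standard deviation, minimum, and maximum, using the standard textbook relationships; the variable names are illustrative.

```python
import math

# Values copied from the Patient Weights output above.
count = 500
total = 94577.7
std_dev = 65.2257697
minimum, maximum = 89.4, 565

mean = total / count                    # mean = sum / count
sample_variance = std_dev ** 2          # variance = standard deviation squared
std_error = std_dev / math.sqrt(count)  # standard error of the mean
value_range = maximum - minimum         # range = maximum - minimum

print(f"mean            {mean:.4f}")
print(f"sample variance {sample_variance:.3f}")
print(f"standard error  {std_error:.6f}")
print(f"range           {value_range:.1f}")
```

The recomputed values agree with the reported output to the precision shown. For data cleaning, the more interesting signals are the maximum of 565 pounds and the strongly positive skewness and kurtosis, which suggest that a few very large values deserve a closer look before analysis.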

Data Cleaning – Categorical Data

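Example data (column B, rows 1–10):

Row   B
1     F
2     U
3     M
4     M
5     F
6     D
7     M
8     M
9     F
10    M

COUNTIF function

=COUNTIF(range, criteria)

=COUNTIF($B$1:$B$10, "M") - will give 5

=COUNTIF($B$1:$B$10, "F") - will give 3

=COUNTIF($B$1:$B$10, "U") - will give 1

• Can identify some errors, such as the unexpected value D in row 6, which none of the three counts includes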
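The same check can be sketched outside Excel. The Python below counts every value in the example column at once and flags anything outside an expected code set. The set `EXPECTED_CODES` is an assumption made for illustration; the slide does not define which codes are valid.

```python
from collections import Counter

# Column B values from the example above, rows 1-10.
sex_codes = ["F", "U", "M", "M", "F", "D", "M", "M", "F", "M"]

# Assumption: only these codes are valid for this field.
EXPECTED_CODES = {"M", "F", "U"}

counts = Counter(sex_codes)

# Any code outside the expected set is a candidate data-entry error.
unexpected = {code: n for code, n in counts.items()
              if code not in EXPECTED_CODES}

# Rows are numbered from 1, like Excel row numbers.
unexpected_rows = [row for row, code in enumerate(sex_codes, start=1)
                   if code not in EXPECTED_CODES]

print(counts)
print(unexpected, unexpected_rows)
```

Counting all distinct values, rather than only the ones you expect, is what surfaces the stray code: three separate COUNTIF calls sum to 9 of the 10 rows, and the full tally shows where the missing row went.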

Filtering Records

• Displays only those records that meet certain criteria

• Click a cell in the column to be filtered

• On the Data tab, click the Filter icon

[Figure: the same worksheet shown unfiltered and filtered.]

Filtering Records, continued

• Dialog box displays all the values present in the column

• Can check only values you are interested in – Excel will display only those records
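Conceptually, a filter keeps the rows whose value in one column is among the checked values and hides the rest without changing the data. A minimal Python sketch of that idea follows; the records, field names, and the `filter_records` helper are hypothetical.

```python
# Hypothetical patient records; field names and values are illustrative only.
records = [
    {"id": 1, "sex": "F", "weight_lb": 150.2},
    {"id": 2, "sex": "M", "weight_lb": 192.3},
    {"id": 3, "sex": "D", "weight_lb": 180.6},
    {"id": 4, "sex": "M", "weight_lb": 565.0},
]


def filter_records(rows, column, checked_values):
    """Return only the rows whose value in `column` is one of the checked values."""
    return [row for row in rows if row[column] in checked_values]


males = filter_records(records, "sex", {"M"})
suspect = filter_records(records, "sex", {"D"})

print([row["id"] for row in males])
print([row["id"] for row in suspect])
```

As in Excel, the original list is left intact; filtering on a suspicious value is a quick way to pull up exactly the records that need review.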

Column Graph

Column graph shows individual weights

But doesn’t show us how many patients are in a particular weight category

Frequencies and Histograms

Frequency: “How many of X and Y are there?”

A frequency calculation gives how many times a particular value occurs

Can be shown as:

Frequency table

Histogram: a graph of the number of times values occur in a set of data

Example Frequency Table and Histogram

Frequency Table

Bin     Frequency
100     4
149     107
199     222
249     136
299     4
349     6
399     5
449     7
499     7
1000    1
More    0

The first column lists the categories or “bins”; the second column shows how many records fell into that category (or “bin”).

[Figure: histogram of the frequency table, with the bins 100 through More on the horizontal axis and Frequency (0 to 250) on the vertical axis.]

Example

How many patients are in each of the following weight categories (in pounds)?

< 100
100-149
150-199
200-249
250-299
300-349
350-399
400-499
500-1000
1000+

Set up the category bins

Add a column to your Excel spreadsheet with the bins that you want to use to categorize the patient weights

Creating a Frequency Table and Histogram

In Microsoft Excel: Click Data, then Data Analysis, then choose Histogram

Creating a Frequency Table and Histogram

In the Input Range field, enter the range of cells that contain the weights

In the Bin Range field, enter the range of cells that contain the category bins that you created

Click Chart Output
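The binning step can be sketched in Python to show what the tool does with the input range and bin range. This sketch assumes each bin value is an inclusive upper bound and that anything above the last bin is counted under "More". The bin values are the ones used in this example; the weights are a small hypothetical sample, and `frequency_table` is an illustrative helper, not an Excel function.

```python
from bisect import bisect_left


def frequency_table(values, bins):
    """Count values per bin, treating each bin as an inclusive upper bound."""
    bins = sorted(bins)
    counts = [0] * (len(bins) + 1)  # the extra last slot is "More"
    for value in values:
        # Index of the first bin whose upper bound is >= value.
        counts[bisect_left(bins, value)] += 1
    labels = [str(b) for b in bins] + ["More"]
    return list(zip(labels, counts))


# Bin values from the example; weights are hypothetical.
bins = [100, 149, 199, 249, 299, 349, 399, 449, 499, 1000]
weights = [89.4, 100, 120.5, 149, 149.5, 180.6, 192.3, 192.3, 565, 1200]

table = frequency_table(weights, bins)
for label, count in table:
    print(f"{label:>5} {count}")
```

Note how a weight of 149.5 lands in the 199 bin rather than the 149 bin. With whole-number bin edges and fractional weights, it is worth deciding in advance where values such as 149.5 should fall.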

Frequency Table and Histogram Output

Using the Excel Data Analysis ToolPak Frequency function for 500 records

Frequency Table

Bin     Frequency
100     4
149     107
199     222
249     136
299     4
349     6
399     5
449     7
499     7
1000    1
More    0

The first column lists the categories or “bins”; the second column shows how many records fell into that category (or “bin”).

[Figure: histogram of the frequency table, with the bins 100 through More on the horizontal axis and Frequency (0 to 250) on the vertical axis.]

Sorted Histogram (Pareto)
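A Pareto view orders the bins from most to least frequent, often with a running cumulative percentage. The sketch below applies that ordering to the frequency table from this example. The cumulative-percentage column is an addition for illustration and is computed against the total of the listed frequencies.

```python
# Frequencies copied from the frequency table in this example.
frequency = {
    "100": 4, "149": 107, "199": 222, "249": 136, "299": 4, "349": 6,
    "399": 5, "449": 7, "499": 7, "1000": 1, "More": 0,
}

# Sort bins from most to least frequent (ties keep their original order).
pareto = sorted(frequency.items(), key=lambda item: item[1], reverse=True)

total = sum(frequency.values())
running = 0
rows = []
for label, count in pareto:
    running += count
    rows.append((label, count, round(100 * running / total, 1)))

for label, count, cumulative_pct in rows:
    print(f"{label:>5} {count:>4} {cumulative_pct:>6}%")
```

Sorted this way, the three bins from 149 through 249 account for the bulk of the records, which is much harder to see in the original bin order.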

Pivot Tables

Pivot tables are an Excel tool that lets you summarize, analyze, and create different views of your data. You can arrange how the data is displayed.

Pivot tables are very useful for identifying trends or relationships among data in large datasets.

Use the laboratory exercise on pivot tables to explore data on hospital-acquired infections
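At its core, a pivot table groups records by one field for the rows and another for the columns, then aggregates each cell. The Python sketch below shows a count-based version of that idea. The records, unit names, and infection types are hypothetical and are not taken from the laboratory exercise; `pivot_count` is an illustrative helper.

```python
from collections import defaultdict

# Hypothetical hospital-acquired infection records; values are illustrative only.
records = [
    {"unit": "ICU", "infection": "CLABSI"},
    {"unit": "ICU", "infection": "CAUTI"},
    {"unit": "ICU", "infection": "CLABSI"},
    {"unit": "Surgery", "infection": "SSI"},
    {"unit": "Surgery", "infection": "CAUTI"},
]


def pivot_count(rows, row_field, col_field):
    """Count records for every (row_field, col_field) combination."""
    table = defaultdict(lambda: defaultdict(int))
    for row in rows:
        table[row[row_field]][row[col_field]] += 1
    return {key: dict(cols) for key, cols in table.items()}


by_unit = pivot_count(records, "unit", "infection")
by_infection = pivot_count(records, "infection", "unit")

print(by_unit)
print(by_infection)
```

Swapping the two field arguments "pivots" the view, which is the same rearrangement Excel performs when you move a field between the row and column areas.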

Example Pivot Table

[Figure: a raw data table (“from this”) summarized as a pivot table (“to this”).]

Chi-Square Test

Are two categorical variables related?

Categorical variable examples:

◦ Gender

◦ Ethnicity

◦ Age group (e.g. 40-49, 50-59)

◦ Disease stage (I, II, III, IV)

◦ Presence or absence of a disease
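To make the question concrete, the sketch below computes the Pearson chi-square statistic for a contingency table of two categorical variables by comparing observed counts with the counts expected if the variables were unrelated. The counts are hypothetical, no continuity correction is applied, and the p-value shortcut shown is valid only for one degree of freedom (a 2 x 2 table).

```python
import math


def chi_square(observed):
    """Return the Pearson chi-square statistic and degrees of freedom."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (obs - expected) ** 2 / expected
    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, dof


# Hypothetical counts: rows are two patient groups,
# columns are disease present / disease absent.
observed = [[30, 70],
            [50, 50]]

statistic, dof = chi_square(observed)

# For one degree of freedom only, the p-value has a closed form.
p_value = math.erfc(math.sqrt(statistic / 2))

print(round(statistic, 3), dof, round(p_value, 4))
```

A small p-value suggests the two variables are related in this sample. For larger tables, the p-value would come from a chi-square distribution with the computed degrees of freedom, typically via a statistics package.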

Visualization Objectives

1. Select the best data communication mode, given the analysis goals and results.

2. Interpret data analysis results.

3. Present solutions for a variety of technical data communication challenges.

4. Prepare a simple data visualization using open-source tools.

5. Participate in the design and development of a complex data visualization.

Where to Get Skills

• NMHIA/AHIMA

• Local Colleges

• MOOCs

• Coursera

• MIT OpenCourseWare

Jobs Using These Skills

• Health Care Data Analyst

• Operations Data Analyst

• Revenue Analyst

• Quality Improvement Analyst

• Data Integration Associate

• And So On!!!!

Conclusion

It’s all about the data!!!!

Question/Answer

• THANK YOU FOR YOUR ATTENTION!

Bibliography

• Health and Medicine Division. (n.d.). Retrieved April 28, 2016, from http://www.nationalacademies.org/hmd/Activities/Quality/LearningHealthCare.aspx

• IBM (2013). Descriptive, predictive, prescriptive: Transforming asset and facilities management with analytics. Retrieved from http://www-01.ibm.com/common/ssi/cgi-bin/ssialias?infotype=SA&subtype=WH&htmlfid=TIW14162USEN.

• Managing a Data Dictionary. (2012). Journal of AHIMA, 83(1), 48-52. Retrieved from http://library.ahima.org/doc?oid=105176#.VyeKJoQrJaQ

• Murdoch, T. & Detsky, A. (2013). The Inevitable Application of Big Data to Health Care. JAMA, 309(13), 1351. http://dx.doi.org/10.1001/jama.2013.393

• National Institute of Standards and Technology. (2015). NIST Big Data Interoperability Framework: Volume 1, Definitions. Retrieved from http://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.1500-1.pdf

• NIST/SEMATECH e-Handbook of Statistical Methods. (n.d.). Retrieved May 02, 2016, from http://www.itl.nist.gov/div898/handbook/

• Overview - Sepsis - Mayo Clinic. (2016). Mayoclinic.org. Retrieved 2 May 2016, from http://www.mayoclinic.org/diseases-conditions/sepsis/home/ovc-20169784

• Ideal Graphs… Tufte, E. R. (2001). The Visual Display of Quantitative Information (2nd edition). Cheshire, Conn: Graphics Pr.

Bibliography

• Schneeweiss, S. (2014). Learning from big health care data. New England Journal of Medicine, 370(23), 2161-2163.

• Shapira, G. (2016). The Seven Key Steps of Data Analysis. Oracle.com. Retrieved 28 April 2016, from http://www.oracle.com/us/corporate/profit/big-ideas/052313-gshapira-1951392.html

• Six Steps Of An Analytics Project - Quality Assurance and Project Management. (2015). Quality Assurance and Project Management. Retrieved 2 May 2016, from http://itknowledgeexchange.techtarget.com/quality-assurance/six-steps-of-an-analytics-project/

• What is Hadoop?. (2016). Sas.com. Retrieved 2 May 2016, from http://www.sas.com/en_my/insights/big-data/hadoop.html

• What is Big Data? | Data Science at NIH. (2015). Datascience.nih.gov. Retrieved 2 May 2016, from http://datascience.nih.gov/bd2k/about/what

• Charts, Tables and Figures

• 1.1 Figure: Smith, K. (2016). Clinical Data Warehouse. Used with permission from Kimberly Smith.

• 1.2-1.6 Figures: Definition, P. (2012). Big Data Analytics - Predictive Analytics - Gartner Glossary. Gartner IT Glossary. Retrieved 28 April 2016, from http://www.gartner.com/it-glossary/predictive-analytics

Bibliography

Images

• Slide 9: Farcaster. (2014). Data visualization process v1 [Online Image]. Retrieved April 28, 2016 from https://commons.wikimedia.org/wiki/File:Data_visualization_process_v1.png#/media/File:Data_visualization_process_v1.png

• Slide 25: Innesw. (2014). Simple pie chart [Online Image]. Retrieved May 2, 2016 from https://commons.wikimedia.org/wiki/File:Charts_SVG_Example_5_-_Simple_Pie_Chart.svg