Introduction to Data Analytics Glossary

TERM DEFINITION SOURCE

American Recovery and Reinvestment Act (ARRA)

Stimulus package enacted by the 111th United States Congress in February 2009 and signed into law on February 17, 2009. The Act included direct spending in infrastructure, education, health, and energy, federal tax incentives, and expansion of unemployment benefits and other social welfare provisions.

https://en.wikipedia.org/wiki/American_Recovery_and_Reinvestment_Act_of_2009

Attribute A characteristic of a database entity; each column (or field) in a database table.

Hoffer, J. A., Prescott, M. B., & McFadden, F. R. (2007). Modern database management (8th ed.). Upper Saddle River, NJ: Pearson/Prentice Hall.

Attributes Properties, characteristics or qualities such as color, size and shape, that are used to describe a person, place or thing.

https://en.wikipedia.org/wiki/Relation_%28database%29

Average "The sum of a list of numbers divided by the number of numbers in the list. In mathematics and statistics, this would be called the arithmetic mean. In statistics, mean, median, and mode are all known as measures of central tendency."

https://en.wikipedia.org/wiki/Average
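
As a minimal sketch of the definition above, the following Python snippet computes the arithmetic mean by hand and compares it with the standard library's `statistics` module, which also provides the median and the mode. The sample values are invented for illustration only.

```python
import statistics

# Hypothetical sample values, chosen only for illustration.
values = [3, 5, 5, 8, 9]

# Arithmetic mean: the sum of the numbers divided by how many there are.
mean_by_hand = sum(values) / len(values)

mean = statistics.mean(values)      # arithmetic mean
median = statistics.median(values)  # middle value of the sorted list
mode = statistics.mode(values)      # most frequently occurring value

print(mean_by_hand, mean, median, mode)
```

The three measures can differ on the same data, which is why all of them are reported as measures of central tendency.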

Balanced Scorecard

An analytic technique designed to translate an organization's mission statement and overall business strategy into specific, quantifiable goals and to monitor the organization's performance in terms of achieving these goals. It is a comprehensive approach that assesses performance with respect to internal processes, customer satisfaction and retention, employee satisfaction and retention and financial indicators.

http://www.beckershospitalreview.com/strategic-planning/balanced-scorecards-revealed.html

Balancing Indicators

Indicators that provide a look at the system as a whole as processes and outcomes are improved. They may help identify unintended consequences.

Strome, Trevor L. (2013). Healthcare analytics for quality and performance improvement (pp. 126-127). Hoboken, NJ: John Wiley & Sons, Inc.

Bar Chart "A chart that presents data using rectangular bars with lengths proportional to the values that they represent. The bars can be plotted vertically or horizontally."

https://en.wikipedia.org/wiki/Bar_chart

Big Data Unprecedented growth, collection, and generation of data.

http://www.webopedia.com/TERM/B/big_data.html

Bimodal Distribution

A distribution with two different modes which appear as distinct peaks.

https://en.wikipedia.org/wiki/Multimodal_distribution

Binary Data that can take only two possible values, such as heads or tails in a coin toss.

https://en.wikipedia.org/wiki/Binary_data

Bivariate

Statistical term for data involving two variables. For example, you might compare number of flu shots per month against number of flu cases per month.

https://en.wikipedia.org/wiki/Bivariate_data

Bottlenecks "A point of congestion in a system that occurs when workloads arrive too quickly for the production process to handle." Bottlenecks often cause delays and increased production costs. "Bottleneck" refers to the shape of a bottle, the neck being the narrowest point, which is the most likely place for congestion to occur.

http://www.investopedia.com/terms/b/bottleneck.asp

Box-and-Whisker Plot

A graphic display in which the first and third quartiles are at the ends of a box, the median is indicated with a line in the interior of the box, and the maximum and minimum are at the ends of whiskers, lines extending from either end of the box. Box-and-whisker plots are helpful in interpreting the distribution of data.

https://en.wikipedia.org/wiki/Box_plot
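
The five values that a box-and-whisker plot draws (minimum, first quartile, median, third quartile, maximum) can be sketched in Python as follows. The data are hypothetical, and the `inclusive` quartile method is only one of several conventions, so other tools may report slightly different quartiles for the same data.

```python
import statistics

# Hypothetical sample, listed in sorted order for readability.
data = [2, 4, 4, 5, 7, 9, 10, 12, 15]

# With n=4, statistics.quantiles returns the three quartile cut points.
# method="inclusive" treats the data as the whole population; other
# quartile conventions exist and can give different values.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

# The box spans q1..q3, the line inside the box is q2 (the median),
# and the whiskers reach to the minimum and maximum.
five_number_summary = (min(data), q1, q2, q3, max(data))
print(five_number_summary)
```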

Business Intelligence

"A technology-driven process for analyzing data and presenting actionable information to help corporate executives, business managers and other end users make more informed business decisions."

http://searchdatamanagement.techtarget.com/definition/business-intelligence

Business Rules "Precise, non-ambiguous descriptions of policies, procedures or principles within a specific organization’s environment"; used to guide the structure and relationships of data within a database to meet the needs of the organization.

Rob, P., & Coronel, C. (2004). Database systems: Design, implementation & management (6th ed.) (p. 32). Boston, MA: Course Technology.

Cardinality A term used to "describe how instances of one entity are related to instances of another entity".

Shelly, G. B., Cashman, T. J. & Rosenblatt, H. J. (2003). Systems analysis and design (5th ed.) (p. 264). Boston: Course Technology.

Categorical Categorical variables represent types of data which may be divided into groups. Examples of categorical variables are race, sex, age group, and educational level.

http://www.stat.yale.edu/Courses/1997-98/101/catdat.htm

Centers for Medicare & Medicaid Services (CMS)

Part of the Department of Health and Human Services (HHS) which administers Medicare, Medicaid, the Children’s Health Insurance Program (CHIP), and the Health Insurance Marketplace.

https://www.cms.gov/About-CMS/About-CMS.html

Central Tendency

A central or typical value of a set of numbers. The most common measures of central tendency are the arithmetic mean, the median and the mode.

https://en.wikipedia.org/wiki/Central_tendency

Chart Smart Using effective design principles, examples, and coding solutions for making the best visual communications for quick and easy interpretation and comprehension of precise data values, trends and patterns.

http://calanswers.berkeley.edu/sites/default/files/tufteintwentyminutes.pdf

Chartjunk Visual elements in charts and graphs that are not necessary to comprehend the information represented on the graph, or that distract the viewer from this information.

http://www.edwardtufte.com/bboard/q-and-a-fetch-msg?msg_id=00040Z

Clinical Decision Support (CDS) Systems

Provides clinicians, staff, patients or other individuals with knowledge and person-specific information, intelligently filtered or presented at appropriate times, to enhance health and health care via tools such as computerized alerts and reminders to care providers and patients; clinical guidelines; condition-specific order sets; focused patient data reports and summaries; documentation templates; diagnostic support, and contextually relevant reference information, among other tools.

https://www.healthit.gov/policy-researchers-implementers/clinical-decision-support-cds

Computerized Provider Order Entry (CPOE)

The process of a medical professional entering medication orders or other physician instructions electronically instead of on paper charts. A primary benefit of CPOE is that it can help reduce errors related to poor handwriting or transcription of medication orders.

http://searchhealthit.techtarget.com/definition/computerized-physician-order-entry-CPOE

Context The environment or setting in which something exists. In data analytics, it is often said that data without context is meaningless.

http://data-informed.com/visualization-experts-data-needs-context-and-clarity-to-connect-with-audience

Continuous Data that can be broken down into smaller and smaller units, such as weight.

https://www.isixsigma.com/dictionary/continuous-data/

Corporate Data Warehouse (CDW)

Also known as Enterprise Data Warehouse (EDW); a large scale data warehouse that incorporates the data of the entire organization, or a major division, site or activity of an organization.

https://en.wikipedia.org/wiki/Data_warehouse

Correlation "Correlation is a statistical measure that indicates the extent to which two or more variables fluctuate together. A positive correlation indicates the extent to which those variables increase in parallel; a negative correlation indicates the extent to which one variable increases as the other decreases."

http://www.ijetts.org/admin/issues/IJETTS%20-%20030401.pdf
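
One common way to quantify correlation between two variables is the Pearson correlation coefficient, which ranges from -1 (perfect negative) to +1 (perfect positive). The sketch below computes it by hand; the function name `pearson_r` and the monthly counts are assumptions made for illustration, and the function does not handle a constant sequence (which would divide by zero).

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the two means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical monthly counts, invented for illustration.
flu_shots = [10, 20, 30, 40, 50]
flu_cases = [48, 41, 30, 22, 9]

r = pearson_r(flu_shots, flu_cases)
print(round(r, 3))  # a value near -1 indicates a strong negative correlation
```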

Crow's Foot Model

A method used to depict relationships between entities using circles, bars and symbols.

Shelly, G. B., Cashman, T. J. & Rosenblatt, H. J. (2003). Systems analysis and design (5th ed.). Boston: Course Technology.

Dashboard "A visual display of the most important information needed to achieve one or more objectives; consolidated and arranged on a single screen so the information can be monitored at a glance."

http://perceptualedge.com/articles/ie/dashboard_confusion.pdf

Data Raw facts, symbols or signs. "Data simply exists and has no significance beyond its existence (in and of itself). It can exist in any form, usable or not. It does not have meaning of itself."

http://www.systems-thinking.org/dikw/dikw.htm

Data Accessibility

The level of ease and efficiency with which data are legally obtainable, within a well-protected and controlled environment.

https://www.alation.com/content/data-accessibility-definition

Data Accuracy "The extent to which the data are free of identifiable errors."

http://bok.ahima.org/PB/DataQualityModel#.WHlR7v7ruZN

Data Analysis "A process of inspecting, cleaning, transforming, and modeling data with the goal of discovering useful information, suggesting conclusions, and supporting decision-making."

https://en.wikipedia.org/wiki/Data_analysis

Data Analytics The science of examining raw data with the purpose of drawing conclusions about that information.

http://searchdatamanagement.techtarget.com/definition/data-analytics

Data Cleansing "The process of amending or removing data in a database that is incorrect, incomplete, improperly formatted, or duplicated. An organization in a data-intensive field like banking, insurance, retailing, telecommunications, or transportation might use a data scrubbing tool to systematically examine data for flaws by using rules, algorithms, and look-up tables." Also called "data scrubbing."

http://searchdatamanagement.techtarget.com/definition/data-scrubbing
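
A rule-based cleansing pass of the kind described above might look like the following sketch. The record layout, the five-digit ZIP rule, and the `cleanse` function are all assumptions made for illustration; real data scrubbing tools apply far richer rules and look-up tables.

```python
import re

# Hypothetical raw records; the field names are assumptions for this sketch.
raw_records = [
    {"id": "1", "email": "ann@example.com", "zip": "12345"},
    {"id": "1", "email": "ann@example.com", "zip": "12345"},   # duplicate
    {"id": "2", "email": "", "zip": "54321"},                   # incomplete
    {"id": "3", "email": "bob@example.com", "zip": "9876"},     # bad format
    {"id": "4", "email": " Cy@Example.com ", "zip": "24680"},   # untidy
]

ZIP_PATTERN = re.compile(r"^\d{5}$")  # simple rule: exactly five digits

def cleanse(records):
    """Return the records that pass the rules, with emails normalized."""
    seen_ids = set()
    clean = []
    for rec in records:
        email = rec["email"].strip().lower()   # amend formatting
        if not email:                          # remove incomplete rows
            continue
        if not ZIP_PATTERN.match(rec["zip"]):  # remove badly formatted rows
            continue
        if rec["id"] in seen_ids:              # remove duplicates
            continue
        seen_ids.add(rec["id"])
        clean.append({**rec, "email": email})
    return clean

cleaned = cleanse(raw_records)
print(cleaned)
```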

Data Comprehensiveness

"The extent to which all required data within the entire scope are collected, documenting intended exclusions."

http://bok.ahima.org/PB/DataQualityModel#.WHlR7v7ruZN

Data Consistency

"The extent to which the healthcare data are reliable, identical, and reproducible by different users across applications."

http://bok.ahima.org/PB/DataQualityModel#.WHlR7v7ruZN

Data Cube A multidimensional view of the data using fact and dimension tables that can be analyzed using simple windowing techniques.

Hoffer, J. A., Prescott, M. B., & McFadden, F. R. (2007). Modern database management (8th ed.). Upper Saddle River, NJ: Pearson/Prentice Hall.

Data Currency "The extent to which data are up to date; a datum value is up to date if it is current for a specific point in time, and it is outdated if it was current at a preceding time but incorrect at a later time."

http://bok.ahima.org/PB/DataQualityModel#.WHlR7v7ruZN

Data Definition "The specific meaning of a healthcare related data element." Specifying data definitions ensures that the meaning of the data is understandable and consistent across an enterprise.

http://bok.ahima.org/PB/DataQualityModel#.WBEhy_Q1BiY

Data Dictionary A structure that stores metadata. Capron, H. L., & Johnson, J. A. (2004). Computers: Tools for an information age (8th ed.). Upper Saddle River, NJ: Prentice Hall.

Data Exploration

A step in the data analysis process that typically involves summarizing the main characteristics of a dataset by determining the relationships among the variables and looking for patterns, trends and clusters. It is commonly conducted using visual analytics tools, but can also be done in more advanced statistical software, such as R.

http://searchbusinessanalytics.techtarget.com/definition/data-exploration

Data Governance

"The overall management of the availability, usability, integrity, and security of the data employed in an enterprise. A sound data governance program includes a governing body or council, a defined set of procedures, and a plan to execute those procedures."

http://searchdatamanagement.techtarget.com/definition/data-governance

Data Granularity

"The level of detail at which the attributes and characteristics of data quality in healthcare data are defined."

http://bok.ahima.org/PB/DataQualityModel#.WBEhy_Q1BiY

Data Integration "Involves combining data residing in different sources and providing users with a unified view of these data."

https://en.wikipedia.org/wiki/Data_integration

Data Integrity The "degree in which data is accurate and reliable".

Capron, H. L., & Johnson, J. A. (2004). Computers: Tools for an information age (8th ed.) (p. 381). Upper Saddle River, NJ: Prentice Hall.

Data Management

"The development and execution of architectures, policies, practices and procedures in order to manage the information lifecycle needs of an enterprise in an effective manner. The process of collecting and integrating data from multiple sources, evaluating the data for quality, cleansing the data to remove data that is inaccurate, incomplete, improperly formatted or duplicated so that data can be further analyzed."

http://www.techtarget.com/search/query?q=data+management

Data Mart "A small single subject data warehouse subset that provides decision support to a small group of people."

Rob, P., & Coronel, C. (2004). Database systems: Design, implementation & management (6th ed.) (p. 570). Boston, MA: Course Technology.

Data Mining The process of "sorting through data to identify patterns and establish relationships."

http://searchsqlserver.techtarget.com/definition/data-mining

Data Model Also known as schema. It is a model that depicts the structure of a database, capturing "the relationship among data within a database and used in the conceptualization and design of the database".

Hoffer, J. A., Prescott, M. B., & McFadden, F. R. (2007). Modern database management (8th ed.) (p. 207). Upper Saddle River, NJ: Pearson/Prentice Hall.

Data Modeling "The analysis of data objects and their relationships to other data objects. Data modeling is often the first step in database design and object-oriented programming as the designers first create a conceptual model of how data items relate to each other. Data modeling involves a progression from conceptual model to logical model to physical schema."

http://www.webopedia.com/TERM/D/data_modeling.html

Data Precision "The degree to which measures support their purpose, and/or the closeness of two or more measures to each other."

http://bok.ahima.org/PB/DataQualityModel#.WBEhy_Q1BiY

Data Quality "The perception or assessment of data's fitness to serve its purpose in a given context.... Within an organization, acceptable data quality is crucial to operational and transactional processes and to the reliability of business analytics / business intelligence reporting. Data quality is affected by the way data is entered, stored and managed."

http://searchdatamanagement.techtarget.com/definition/data-quality

Data Relevancy "The extent to which healthcare-related data are useful for the purposes for which they were collected."

http://bok.ahima.org/PB/DataQualityModel#.WBEhy_Q1BiY

Data Segmentation

Grouping data into meaningful units for analysis or data security. In terms of data security, the electronic labeling or tagging of a patient's health information in a way that allows patients or providers to electronically share parts, but not all, of a patient record. Sensitive portions, such as HIV status or mental health counseling, can be kept private.

https://www.healthit.gov/providers-professionals/data-segmentation-overview

Data Stewardship

"The management and oversight of an organization's data assets to help provide business users with high-quality data that is easily accessible in a consistent manner."

http://searchdatamanagement.techtarget.com/definition/data-stewardship

Data Storage "A general term for archiving data in electromagnetic or other forms for use by a computer or device. Different types of data storage play different roles in a computing environment."

https://www.techopedia.com/definition/23342/data-storage

Data Timeliness "The availability of up-to-date data within the useful, operative, or indicated time."

http://bok.ahima.org/PB/DataQualityModel#.WBEhy_Q1BiY

Data Transformation

Process of converting data from the format of the source system to the format requirements of the data warehouse or data mart.

Hoffer, J. A., Prescott, M. B., & McFadden, F. R. (2007). Modern database management (8th ed.). Upper Saddle River, NJ: Pearson/Prentice Hall.

Data Usage The reason for collecting the data so it can be used by analytics to support the quality, performance and/or business goals of the organization.

Strome, Trevor L. (2013). Healthcare analytics for quality and performance improvement (pp. 92-93). Hoboken, NJ: John Wiley & Sons, Inc.

Data Visualization

"A general term that describes any effort to help people understand the significance of data by placing it in a visual context. Patterns, trends and correlations that might go undetected in text-based data can be exposed and recognized easier with data visualization software."

http://searchbusinessanalytics.techtarget.com/definition/data-visualization

Data Warehouse

"A subject oriented, integrated, time-variant, non-updatable collection of data used in support of management decision-making processes and business intelligence."

Inmon, W.H. & Hackathorn, R.D. (1994). in Hoffer, J. A., Prescott, M. B., & McFadden, F. R. (2007). Modern database management (8th ed.) (p.422). Upper Saddle River, NJ: Pearson/Prentice Hall.

Database A collection of data organized in tables for a specific purpose used by individuals, workgroups, department/divisions and organizations/enterprise.

Capron, H. L., & Johnson, J. A. (2004). Computers: Tools for an information age (8th ed.). Upper Saddle River, NJ: Prentice Hall.

Database Management System (DBMS)

A software package used to create, enter and modify data and retrieve information from a database.

Capron, H. L., & Johnson, J. A. (2004). Computers: Tools for an information age (8th ed.). Upper Saddle River, NJ: Prentice Hall.

Data-driven "Means that progress in an activity is compelled by data, rather than by intuition or personal experience. It is often labeled as business jargon for what scientists call evidence-based decision making."

https://en.wikipedia.org/wiki/Data-driven

Depth Perception

"Depth perception is the ability to judge the distance of objects and the spatial relationship of objects at different distances."

http://www.merriam-webster.com/dictionary/depth%20perception

Descriptive Analytics

The description or summarization of raw data from an already occurring event or events. This type of analytics uses data aggregation and data mining techniques to provide insight into the past and answer: “What has happened?”

https://halobi.com/2014/10/descriptive-predictive-and-prescriptive-analytics-explained/

Descriptive Statistics

Numbers that summarize and describe the main properties of a data set including the central tendency and spread.

http://onlinestatbook.com/2/introduction/descriptive.html

Dimension Part of a logical cube that gives context to the measures data and can be broken down further into hierarchies, levels and attributes.

Oracle Corporation. (2003). Oracle application developer’s guide, 10g Release 1 (10.1), Part Number B10333-02. Retrieved from https://docs.oracle.com/cd/B12037_01/olap.101/b10333/title.htm

Dimension Table

In data warehousing, a table that contains descriptive data about subjects of a business such as products, customers, time etc.

Hoffer, J. A., Prescott, M. B., & McFadden, F. R. (2007). Modern database management (8th ed.). Upper Saddle River, NJ: Pearson/Prentice Hall.

Discrete Data that can only take on certain values and cannot be subdivided into smaller units, such as the number of students in a class.

http://medical-dictionary.thefreedictionary.com/discrete+data

Dispersion Also called spread, variability, or scatter; describes how stretched or squeezed a distribution is. It is commonly measured by variance, standard deviation and interquartile range.

https://en.wikipedia.org/wiki/Statistical_dispersion
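
The three measures named above can be sketched with Python's standard `statistics` module. The data are hypothetical; the snippet uses the population variance and standard deviation (`pvariance`, `pstdev`) and the `inclusive` quartile convention, both of which are choices made for this illustration rather than the only valid ones.

```python
import statistics

# Hypothetical data set, chosen so the results are easy to check by hand.
data = [2, 4, 4, 4, 5, 5, 7, 9]

variance = statistics.pvariance(data)  # mean squared deviation from the mean
std_dev = statistics.pstdev(data)      # square root of the variance

# Interquartile range: distance between the third and first quartiles.
q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

print(variance, std_dev, iqr)
```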

Distribution A set of data collected to answer a statistical question has a distribution that describes its center, spread, and overall shape.

http://stattrek.com/statistics/charts/data-patterns.aspx?Tutorial=AP

DMAIC DMAIC (Define, Measure, Analyze, Improve, and Control) is a data-driven quality improvement strategy used to improve processes. It is an integral part of a Six Sigma initiative, but in general can be implemented as a standalone quality improvement procedure or as part of other process improvement initiatives such as Lean.

http://asq.org/learn-about-quality/six-sigma/overview/dmaic.html

End Users "In Information Technology, the term end user is used to distinguish the person for whom a hardware or software product is designed." The end user can be contrasted with the developers or programmers of the product.

http://www.bitpipe.com/tlist/Information-Technology-End-Users.html

Enterprise Data Warehouse (EDW)

Also known as Corporate Data Warehouse (CDW); a large scale data warehouse that incorporates the data of the entire organization, or a major division, site or activity of an organization.

https://en.wikipedia.org/wiki/Data_warehouse

Entity "A person, place, object, event, concept, or thing in the user environment about which the organization wishes to maintain data"; a table in a database.

Hoffer, J. A., Prescott, M. B., & McFadden, F. R. (2007). Modern database management (8th ed.) (p. 7). Upper Saddle River, NJ: Pearson/Prentice Hall.

Entity Relationship Diagram (E-R diagram)

An entity-relationship diagram, or ERD, is a chart that visually represents the relationship between database entities. ERDs model an organization’s data storage requirements with three main components: entities, attributes, and relationships.

https://www.lucidchart.com/pages/what-is-ERD

Extract Transform Load

The process where data is removed from the source environment, changed from the source format to the format of the data warehouse, and then placed into the data warehouse. Often referred to as ETL.

Hoffer, J. A., Prescott, M. B., & McFadden, F. R. (2007). Modern database management (8th ed.). Upper Saddle River, NJ: Pearson/Prentice Hall.
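
The three ETL steps can be sketched in a few lines of Python. Everything specific here is an assumption for illustration: the CSV layout, the `visits` table, the date conversion, and the use of an in-memory SQLite database as a stand-in for a data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; column names and formats are assumptions.
SOURCE_CSV = """patient_id,visit_date,charge
101,03/15/2016,250.00
102,04/02/2016,1200.50
"""

def extract(text):
    """Extract: read rows from the source format (CSV text here)."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: convert source formats to the target table's formats."""
    out = []
    for row in rows:
        month, day, year = row["visit_date"].split("/")
        out.append((
            int(row["patient_id"]),
            f"{year}-{month}-{day}",  # MM/DD/YYYY -> ISO 8601 date
            float(row["charge"]),
        ))
    return out

def load(conn, rows):
    """Load: place the transformed rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS visits "
        "(patient_id INTEGER, visit_date TEXT, charge REAL)"
    )
    conn.executemany("INSERT INTO visits VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")  # in-memory stand-in for a warehouse
load(conn, transform(extract(SOURCE_CSV)))
loaded = conn.execute("SELECT * FROM visits ORDER BY patient_id").fetchall()
print(loaded)
```

Keeping the three steps as separate functions mirrors the definition and makes each stage easy to test on its own.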

Fact Table In data warehousing, a table that contains quantitative measures, metrics or facts of a business process that can be used in calculations such as units sold, and cost per unit. It is located at the center of a star schema or a snowflake schema with linkages to the dimension tables.

https://en.wikipedia.org/wiki/Fact_table
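
A very small star schema, with one fact table linked to one dimension table, can be sketched with Python's built-in `sqlite3` module. The table names (`fact_sales`, `dim_product`), columns, and rows are hypothetical and exist only to show how measures in the fact table are aggregated using descriptive data from a dimension table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension table: descriptive data about products.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT);
-- Fact table: quantitative measures, linked to the dimension by product_id.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    units_sold INTEGER,
    cost_per_unit REAL
);
INSERT INTO dim_product VALUES (1, 'Gauze'), (2, 'Gloves');
INSERT INTO fact_sales VALUES (1, 10, 2.0), (1, 5, 2.0), (2, 20, 0.5);
""")

# Aggregate the measures in the fact table, grouped by a dimension attribute.
totals = conn.execute(
    "SELECT d.product_name, SUM(f.units_sold), "
    "       SUM(f.units_sold * f.cost_per_unit) "
    "FROM fact_sales AS f JOIN dim_product AS d USING (product_id) "
    "GROUP BY d.product_name ORDER BY d.product_name"
).fetchall()
print(totals)
```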

Field A column in a database table. Capron, H. L., & Johnson, J. A. (2004). Computers: Tools for an information age (8th ed.). Upper Saddle River, NJ: Prentice Hall.

Flat File Database

A database with a single table that contains records without any structured relationship between the records. A flat file contains multiple instances of data redundancy.

Shelly, G. B., Cashman, T. J. & Rosenblatt, H. J. (2003). Systems analysis and design (5th ed.). Boston: Course Technology.

Flow Chart "A flow chart is a type of diagram that represents an algorithm, workflow or process, showing the steps as boxes of various kinds, and their order by connecting them with arrows."

https://en.wikipedia.org/wiki/Flowchart

Foreign Key In the context of relational databases, an identifier that is a primary key in a different table.

Rob, P., & Coronel, C. (2004). Database systems: Design, implementation & management (6th ed.). Boston, MA: Course Technology.
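
The following sketch shows a foreign key in practice using Python's built-in `sqlite3` module. The `department` and `employee` tables are hypothetical. Note that SQLite only enforces foreign keys when the `foreign_keys` pragma is switched on, and the sketch assumes a SQLite build with foreign key support, which is the usual default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

# dept_id is the primary key of department ...
conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT)")
# ... and appears in employee as a foreign key referencing department.
conn.execute(
    "CREATE TABLE employee ("
    " emp_id INTEGER PRIMARY KEY,"
    " name TEXT,"
    " dept_id INTEGER REFERENCES department(dept_id))"
)
conn.execute("INSERT INTO department VALUES (1, 'Radiology')")
conn.execute("INSERT INTO employee VALUES (10, 'Ann', 1)")  # department 1 exists

try:
    # Department 99 does not exist, so this row violates the foreign key.
    conn.execute("INSERT INTO employee VALUES (11, 'Bob', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rejected)
```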

Frequency Distribution

A table or graph that displays the frequency, or count, of the occurrences of values within each group or interval in a data set, and in this way, the table summarizes the distribution of values in the sample.

http://www.investopedia.com/terms/f/frequencydistribution.asp#ixzz4OtaPqXQ0
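
For values that fall into a small number of groups, a frequency distribution is simply a count per group. A minimal sketch using `collections.Counter` from the Python standard library, with invented survey responses:

```python
from collections import Counter

# Hypothetical survey responses, invented for illustration.
responses = ["B", "A", "C", "A", "B", "A"]

# Counter tallies how many times each distinct value occurs.
frequency = Counter(responses)

# Print the distribution as a small table, one group per row.
for value, count in sorted(frequency.items()):
    print(value, count)
```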

Gantt Chart A chart showing activities (tasks or events) displayed against time. "On the left of the chart is a list of the activities and along the top is a suitable time scale. Each activity is represented by a bar; the position and length of the bar reflects the start date, duration and end date of the activity."

http://www.gantt.com/

Gestalt Principles

"A psychology term which means "unified whole." It refers to theories of visual perception developed by German psychologists in the 1920s. These theories attempt to describe how people tend to organize visual elements into groups or unified wholes when certain principles are applied, such as similarity, proximity, color and shape."

http://graphicdesign.spokanefalls.edu/tutorials/process/gestaltprinciples/gestaltprinc.htm

Gestalt Psychology

"Gestalt psychology, school of psychology founded in the 20th century that provided the foundation for the modern study of perception. Gestalt theory emphasizes that the whole of anything is greater than its parts. That is, the attributes of the whole are not deducible from analysis of the parts in isolation."

https://www.britannica.com/science/Gestalt-psychology

Health and Medicine Division (HMD) (formerly known as IOM)

Division of the National Academies of Sciences, Engineering, and Medicine. "HMD’s aim is to help those in government and the private sector make informed health decisions by providing evidence upon which they can rely." Formerly known as the Institute of Medicine (IOM), the division's name was changed in March, 2016.

http://nationalacademies.org/hmd/About-HMD.aspx

Health Care Quality

The degree to which health services for individuals and populations increase the likelihood of desired health outcomes and are consistent with current professional knowledge.

http://www.hhs.gov/asl/testify/2009/03/t20090318b.html

Health Care Value

Value can be quantified by the ratio of desired outcomes relative to cost of care.

Strome, Trevor L. (2013). Healthcare analytics for quality and performance improvement (p. 53). Hoboken, NJ: John Wiley & Sons, Inc.

Health Information Technology (HIT)

"HIT is information technology applied to health and health care. It supports health information management across computerized systems and the secure exchange of health information between consumers, providers, payers, and quality monitors."

http://en.wikipedia.org/wiki/Health_information_technology

Health Information Technology for Economic and Clinical Health Act (HITECH)

Part of the American Recovery and Reinvestment Act that promotes the adoption and meaningful use of health information technology and addresses privacy and security concerns associated with the electronic transmission of health information.

https://www.healthit.gov/policy-researchers-implementers/health-it-legislation

Health Insurance Portability and Accountability Act (HIPAA)

Legislation that provides data privacy and security provisions for safeguarding medical information. The act, which was signed into law by President Bill Clinton in August 1996, contains five sections, or titles.

http://searchdatamanagement.techtarget.com/definition/HIPAA

Heat Map "A graphical representation of data where the individual values contained in a matrix are represented as colors."

https://en.wikipedia.org/wiki/Heat_map

Hierarchical Database

Within this type of database, data is organized in a tree-like structure, implying a single parent for each record. These types of databases were widely used on mainframe computers in the 1960s and are found today on older systems.

Shelly, G. B., Cashman, T. J. & Rosenblatt, H. J. (2003). Systems analysis and design (5th ed.). Boston: Course Technology.

Hierarchy

In the multi-dimensional data model, the part of a dimension that organizes data at different levels of aggregation.

Oracle Corporation. (2003). Oracle application developer’s guide—10 g Release 1 (10.1), Part Number B10333-02. Retrieved from https://docs.oracle.com/cd/B12037_01/olap.101/b10333/title.htm

Histogram A bar graph that shows the number of times data occur within certain ranges or intervals. The intervals are displayed as adjacent bars with no gaps.

https://en.wikipedia.org/wiki/Histogram
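The counting step behind a histogram can be sketched in a few lines of Python. This is only an illustration: the function name `bin_counts`, the sample wait times, and the interval width of 10 are assumptions made for the example, not part of the source definition.

```python
from collections import Counter

def bin_counts(values, width):
    """Count values per interval [start, start + width); empty intervals are omitted."""
    counts = Counter((v // width) * width for v in values)
    return dict(sorted(counts.items()))

# Hypothetical wait times in minutes
wait_times = [3, 7, 12, 14, 15, 21, 22, 28, 29, 35]
width = 10
counts = bin_counts(wait_times, width)

for start, n in counts.items():
    # One '#' per observation stands in for the height of each bar
    print(f"[{start}, {start + width}): {'#' * n}")
```

Each printed row corresponds to one bar; a charting tool would draw the same counts as adjacent bars.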

Hypothesis "A hypothesis (plural hypotheses) is a proposed explanation for a phenomenon."

https://en.wikipedia.org/wiki/Hypothesis

Incremental Extraction

The process of capturing "only the changes that occurred in the source data since the last extraction"; it is used for ongoing warehouse maintenance.

Hoffer, J. A., Prescott, M. B., & McFadden, F. R. (2007). Modern database management (8th ed.) (p. 443). Upper Saddle River, NJ: Pearson/Prentice Hall.
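One possible way to picture incremental extraction is a filter on a last-modified timestamp. The sketch below assumes each source row carries a hypothetical `updated_at` field and that the time of the previous extraction is known; real warehouses may instead rely on change logs or other mechanisms.

```python
from datetime import datetime

def incremental_extract(rows, last_extracted_at):
    """Return only rows changed after the previous extraction time."""
    return [row for row in rows if row["updated_at"] > last_extracted_at]

# Hypothetical source rows, each carrying a last-modified timestamp
source_rows = [
    {"id": 1, "updated_at": datetime(2017, 5, 1)},
    {"id": 2, "updated_at": datetime(2017, 5, 20)},
    {"id": 3, "updated_at": datetime(2017, 5, 29)},
]
last_run = datetime(2017, 5, 15)
changed = incremental_extract(source_rows, last_run)
print([row["id"] for row in changed])  # [2, 3]
```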

Indicators

Metrics that have context assigned to them so that the baseline data is depicted along with an indication of the target or goal.

Strome, Trevor L. (2013). Healthcare analytics for quality and performance improvement (pp. 118-119). Hoboken, NJ: John Wiley & Sons, Inc.

Inferential Statistics

"A statistical analysis that infers properties about a population: this includes testing hypotheses and deriving estimates. The population is assumed to be larger than the observed data set; in other words, the observed data is assumed to be sampled from a larger population."

https://en.wikipedia.org/wiki/Statistical_inference

Infographic "A visual presentation of information in the form of a chart, graph, or other image accompanied by minimal text, intended to give an easily understood overview, often of a complex subject, e.g., a mass-transit infographic that uses different colors to represent different modes of transportation."

https://www.focusforhealth.org/infographics/

Information "Data that has been given meaning by way of relational connection. This 'meaning' can be useful, but does not have to be."

http://www.systems-thinking.org/dikw/dikw.htm

Instance

Each row (record) in a database table.

Hoffer, J. A., Prescott, M. B., & McFadden, F. R. (2007). Modern database management (8th ed.). Upper Saddle River, NJ: Pearson/Prentice Hall.

Institute for Healthcare Improvement (IHI)

"An independent not-for-profit organization based in Cambridge, Massachusetts that is a leading innovator, convener, partner, and driver of results in health and health care improvement worldwide. At our core, we believe everyone should get the best care and health possible. This passionate belief fuels our mission to improve health and health care."

http://www.ihi.org/about/Pages/default.aspx

Institute of Medicine (IOM)

Now called Health and Medicine Division (HMD). A division of the National Academies of Sciences, Engineering, and Medicine. The Academies are private, nonprofit institutions that provide independent, objective analysis and advice to the nation to solve complex problems and inform public policy decisions related to science, technology, and medicine.

http://nationalacademies.org/hmd/About-HMD.aspx

Integrity Constraints

"Rules that data must follow in a database to ensure that the data is accurate and reliable".

Capron, H. L., & Johnson, J. A. (2004). Computers: Tools for an information age (8th ed.) (p. 382). Upper Saddle River, NJ: Prentice Hall.

Interquartile Range (IQR)

A measure of variability, based on dividing a data set into four equal parts. The IQR includes only the middle half of the data set which helps eliminate the influence of outliers.

https://www.reference.com/math/interquartile-range-show-30139cd87538bead?qo=cdpArticles
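A short Python sketch can show why the IQR resists outliers. It uses `statistics.quantiles` from the standard library (Python 3.8+) with its default method; other quartile conventions exist and can give slightly different cut points, and the data values are invented for the example.

```python
import statistics

# Hypothetical data with one extreme value (40)
data = [2, 4, 4, 5, 6, 7, 8, 9, 10, 11, 40]

q1, q2, q3 = statistics.quantiles(data, n=4)  # three cut points, four equal parts
iqr = q3 - q1                                 # spread of the middle half
full_range = max(data) - min(data)            # stretched by the extreme value

print(q1, q3, iqr, full_range)
```

Here the range is 38 because of the single extreme value, while the IQR is 6.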

Interval "Data that can be rank ordered and categorized like ordinal data, except the intervals between each value are equally split. The most common example is temperature in degrees Fahrenheit."

http://www.usablestats.com/lessons/noir

Key Performance Indicators (KPIs)

"A set of measures (or metrics) focusing on those aspects of organizational performance that are the most critical for the current and future success of the organization."

Strome, Trevor L. (2013). Healthcare analytics for quality and performance improvement (p. 120). Hoboken, NJ: John Wiley & Sons, Inc.

Knowledge Knowledge is the appropriate collection of information, such that its intent is to be useful.

http://searchcio.techtarget.com/definition/knowledge

LEAN

A quality improvement framework with a focus on maximizing customer value while minimizing waste. "Simply put, Lean means creating more value for customers with fewer resources."

http://www.lean.org/WhatsLean/

Learning Health System (LHS)

"As defined by the Institute of Medicine (IOM), are characterized by a number of core attributes. Particularly important is a consistent emphasis on a collaborative approach that shares data and insights across boundaries to drive better, more efficient medical practice and patient care."

https://sites.duke.edu/rethinkingclinicaltrials/learning-healthcare-systems/

Levels In the multi-dimensional data model, indicates the position in the hierarchy in a dimension.

Oracle Corporation. (2003). Oracle application developer’s guide—10 g Release 1 (10.1), Part Number B10333-02. Retrieved from https://docs.oracle.com/cd/B12037_01/olap.101/b10333/title.htm

Line Graph Line graphs compare two variables. Each variable is plotted along an axis. A line graph has a vertical axis and a horizontal axis. So, for example, if you wanted to graph the height of a ball after you have thrown it, you could put time along the horizontal, or x-axis, and height along the vertical, or y-axis.

http://mste.illinois.edu/courses/ci330ms/youtsey/lineinfo.html

Line of Best Fit A line of best fit (or "trend" line) is a straight line that best represents the data on a scatter plot. This line may pass through some of the points, none of the points, or all of the points.

http://www.regentsprep.org/regents/math/algebra/AD4/linefit.htm
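The definition does not say how "best" is measured; a common choice is ordinary least squares, which minimizes the sum of squared vertical distances from the points to the line. The sketch below assumes that choice, and the function name `best_fit_line` and the sample points are illustrative only.

```python
def best_fit_line(xs, ys):
    """Least-squares slope and intercept for paired observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation of x and y, and variation of x, around their means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical scatter plot points
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
slope, intercept = best_fit_line(xs, ys)
print(f"y = {slope:.1f}x + {intercept:.1f}")  # y = 0.6x + 2.2
```

The fitted line always passes through the point of means, (3, 4) in this example.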

Linear Equation An equation between two variables that gives a straight line when plotted on a graph.

https://www.wordnik.com/words/linear%20equation

Machine Learning

"A type of artificial intelligence (AI) that provides computers with the ability to learn without being explicitly programmed. Machine learning focuses on the development of computer programs that can teach themselves to grow and change when exposed to new data."

http://whatis.techtarget.com/definition/machine-learning

Mean The "mean" is the "average" you're used to, where you add up all the numbers and then divide by the number of numbers.

https://www.techopedia.com/definition/26136/statistical-mean

Measures Measures are the core of the dimensional data warehouse model and are data elements that can be summed, averaged, or mathematically manipulated. Measures are stored in the fact tables.

Oracle Corporation. (2003). Oracle application developer’s guide—10 g Release 1 (10.1), Part Number B10333-02. Retrieved from https://docs.oracle.com/cd/B12037_01/olap.101/b10333/title.htm

Measure When used as a noun in health care, "typically refers to a quantitative value representing some aspect of patient care that may (or may not) be linked to specific performance and QI (quality improvement) initiatives."

Strome, Trevor L. (2013). Healthcare analytics for quality and performance improvement (p. 116). Hoboken, NJ: John Wiley & Sons, Inc.

Measures of Central Tendency

A central or typical value of a set of numbers. The most common measures of central tendency are the arithmetic mean, the median and the mode.

https://en.wikipedia.org/wiki/Central_tendency
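The three measures can be compared on the same data with Python's standard `statistics` module. The lengths of stay below are made-up values chosen so that one large observation separates the mean from the median.

```python
import statistics

# Hypothetical lengths of stay in days; one long stay pulls the mean upward
stays = [2, 3, 3, 4, 5, 6, 19]

mean_stay = statistics.mean(stays)      # sum of values / number of values
median_stay = statistics.median(stays)  # middle value after sorting
mode_stay = statistics.mode(stays)      # most frequent value

print(mean_stay, median_stay, mode_stay)  # 6 4 3
```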

Median The "middle" value in the list of numbers. To find the median, your numbers have to be sorted in numerical order.

http://www.purplemath.com/modules/meanmode.htm

Metadata Data that "describes the properties or characteristics of data and the context of that data. Used by developers to create programs, procedures, controls, and queries to manipulate and manage the data in a database."

Hoffer, J. A., Prescott, M. B., & McFadden, F. R. (2007). Modern database management (8th ed.) (p. 7). Upper Saddle River, NJ: Pearson/Prentice Hall.

Metric "Some aspect of healthcare quality or performance to which a quantitative value is attributed for the purposes of monitoring and evaluation." Could be described as a measure with more focus and usually specifies a time period.

Strome, Trevor L. (2013). Healthcare analytics for quality and performance improvement (p. 117). Hoboken, NJ: John Wiley & Sons, Inc.

Mission

The reason an organization or program exists, what it does and its overall intention, priorities and goals.

http://www.businessdictionary.com/definition/mission-statement.html

Mode The value that occurs most often. If no number is repeated, then there is no mode for the list. If two or more numbers are repeated the same number of times, the data set is considered multimodal.

http://www.purplemath.com/modules/meanmode.htm
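The rules in this definition (no repeats means no mode; ties mean several modes) can be written out as a small function. This is a sketch under that convention; the name `modes` is hypothetical, and some libraries treat the no-repeat case differently.

```python
from collections import Counter

def modes(values):
    """Return every most-frequent value, or [] if no value repeats."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # convention used in this glossary: no repeats, no mode
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([1, 2, 2, 3]))     # [2]
print(modes([1, 1, 2, 2, 3]))  # [1, 2]  (multimodal)
print(modes([1, 2, 3]))        # []
```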

MOLAP Stands for multidimensional online analytical processing; "a set of graphical tools that enable a multidimensional view of the data", which can be analyzed using a simple graphical user interface.

Hoffer, J. A., Prescott, M. B., & McFadden, F. R. (2007). Modern database management (8th ed.) (p. 467). Upper Saddle River, NJ: Pearson/Prentice Hall.

Mutually Exclusive Categories

Two events are defined to be mutually exclusive if they cannot happen at the same time. In other words, if one event occurs, the other cannot. Mutually exclusive events are sometimes referred to as disjoint events.

http://www.investopedia.com/terms/m/mutuallyexclusive.asp

National Quality Forum (NQF)

"A not-for-profit, nonpartisan, membership-based organization that works to catalyze improvements in healthcare. NQF measures and standards serve as a critically important foundation for initiatives to enhance healthcare value, make patient care safer, and achieve better outcomes."

http://www.qualityforum.org/About_NQF/

Natural Language Processing (NLP)

Natural language processing (NLP) applies computer science, linguistics and artificial intelligence techniques for the purpose of achieving human-like language processing for a range of tasks or applications.

https://en.wikipedia.org/wiki/Natural_language_processing

Natural Numbers

"A natural number is a number that occurs commonly and obviously in nature. As such, it is a whole, non-negative number."

http://whatis.techtarget.com/definition/natural-number

Network Database

These databases are similar to a hierarchical database except that each child entity can have more than one parent. Each relationship is called a set and each set contains a parent record or owner and a child or member. They are not used widely today but may still exist on some older legacy systems.

Shelly, G. B., Cashman, T. J. & Rosenblatt, H. J. (2003). Systems analysis and design (5th ed.). Boston: Course Technology.

Nominal Nominal data is considered the lowest level of measurement. Items are differentiated by a simple naming system but are not measured and have no particular order. Examples include blood type, gender, nationality, etc.

http://changingminds.org/explanations/research/measurement/types_data.htm

Normal Distribution

A bell-shaped symmetrical, or nearly symmetrical, frequency distribution curve characteristic of many economic, natural, social, and other real-world phenomena, such as IQ scores and height variation within a population.

https://en.wikipedia.org/wiki/Normal_distribution

Numerical "These data have meaning as a measurement, such as a person’s height, weight, IQ, or blood pressure; or they’re a count... Statisticians also call numerical data quantitative data."

http://www.dummies.com/education/math/statistics/types-of-statistical-data-numerical-categorical-and-ordinal/

Object Oriented Database

Data and their relationships are modeled in a single structure known as an object. These databases were designed to handle complex types of data such as X-ray images, ultrasound images, MRI scans and electrocardiograms.

Rob, P., & Coronel, P. (2004). Database systems: Design, implementation & management (6th ed.). Boston, MA: Course Technology.

Objectives "A specific result that a person or system aims to achieve within a time frame and with available resources. In general, objectives are more specific and easier to measure than goals. Objectives are basic tools that underlie all planning and strategic activities."

http://www.businessdictionary.com/definition/objective.html

OLAP Stands for online analytical processing; computer processing that enables a user to easily and selectively extract and view data from different points of view.

http://searchdatamanagement.techtarget.com/definition/OLAP

Ordinal

Ordinal data can be rank ordered (1st, 2nd, 3rd, etc.) and categorized, but relative degree of differences cannot be quantified. An example is 'completely agree', 'mostly agree', 'mostly disagree', 'completely disagree' when measuring opinion.

http://www.stat.berkeley.edu/~stark/SticiGui/Text/gloss.htm#o

Outcome Indicators

Indicators that measure the overall system performance. Includes the voice of the patient and results of improvement initiatives.

Strome, Trevor L. (2013). Healthcare analytics for quality and performance improvement (p. 126). Hoboken, NJ: John Wiley & Sons, Inc.

Outlier A value that "lies outside" (is much smaller or larger than) most of the other values in a set of data.

https://en.wikipedia.org/wiki/Outlier

Pareto Chart "A type of chart, named after Vilfredo Pareto that contains both bars and a line graph, where categories are represented in descending order by bars and the cumulative total is represented by the line."

https://en.wikipedia.org/wiki/Pareto_chart
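The data preparation behind a Pareto chart can be sketched without any plotting library: sort the categories for the bars, then accumulate for the line. The incident counts are invented, and expressing the cumulative total as a percentage is a common presentation choice assumed here rather than something the definition requires.

```python
from itertools import accumulate

# Hypothetical counts of incident types
incidents = {"Falls": 27, "Other": 13, "Medication": 42, "Documentation": 18}

# Bars: categories in descending order of frequency
bars = sorted(incidents.items(), key=lambda item: item[1], reverse=True)

# Line: running total, expressed as a percentage of all incidents
total = sum(incidents.values())
cumulative_pct = [100 * running / total
                  for running in accumulate(count for _, count in bars)]

for (name, count), pct in zip(bars, cumulative_pct):
    print(f"{name:<14}{count:>4}{pct:>7.1f}%")
```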

PDSA A model for testing a change by developing a plan to test the change (Plan), carrying out the test (Do), observing and learning from the consequences (Study), and determining what modifications should be made to the test (Act).

http://www.ihi.org/resources/pages/tools/plandostudyactworksheet.aspx

Percentile "A measure used in statistics indicating the value below which a given percentage of observations in a group of observations fall."

https://en.wikipedia.org/wiki/Percentile
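Several conventions exist for computing a percentile. The sketch below assumes the nearest-rank method, which returns the smallest observed value that has at least p percent of the observations at or below it; the function name and the scores are illustrative.

```python
import math

def percentile_nearest_rank(values, p):
    """Smallest value with at least p percent of observations at or below it."""
    ordered = sorted(values)
    # Integer arithmetic before dividing avoids floating-point surprises
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

scores = [15, 20, 35, 40, 50]
print(percentile_nearest_rank(scores, 30))   # 20
print(percentile_nearest_rank(scores, 50))   # 35
print(percentile_nearest_rank(scores, 100))  # 50
```

Interpolating methods, such as those used by many spreadsheet and statistics packages, can return values that lie between observations.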

Performance Improvement (PI)

A pro-active and continuous study of processes (also called Quality Improvement - QI) with the intent to prevent or decrease problems by identifying areas of opportunity and testing new approaches to fix underlying causes of persistent/systemic problems.

https://en.wikipedia.org/wiki/Performance_improvement

Personal Health Records (PHRs)

"An electronic application used by patients to maintain and manage their health information in a private, secure, and confidential environment."

https://www.healthit.gov/providers-professionals/faqs/what-personal-health-record

Personalized Medicine

Focuses on identifying which approaches will be effective for which patients based on genetic, environmental, and lifestyle factors; also referred to as precision medicine.

https://en.wikipedia.org/wiki/Personalized_medicine

Pie Chart "A pie chart (or a circle chart) is a circular statistical graphic, which is divided into slices to illustrate numerical proportion."

https://en.wikipedia.org/wiki/Pie_chart

Polar-area Diagrams

Circular chart where the area of each colored wedge, measured from the center, is proportional to the statistic being represented. They were invented by Florence Nightingale to dramatize the extent of needless deaths in British military hospitals during the Crimean War (1854-56). She called such diagrams “coxcombs.”

http://www.ihi.org/resources/pages/tools/plandostudyactworksheet.aspx

Population Health

Health outcomes of a group of individuals, including the distribution of such outcomes within the group.

https://en.wikipedia.org/wiki/Pie_chart#Polar_area_diagram

Preattentive Attributes

An attribute like form, color, or spatial position that draws focus to an element. Preattentive attributes can be powerful when used knowledgeably for visual communication. Graphs and tables usually focus on the attributes of form, color or spatial position. This can be illustrated visually when one object stands out as different from the rest, based on a variation of the attribute.

http://www.storytellingwithdata.com/blog/2011/10/google-example-preattentive-attributes

Preattentive Processing

Preattentive processing happens below the level of consciousness at a very high speed and is tuned to detect a specific set of visual attributes.

https://en.wikipedia.org/wiki/Pre-attentive_processing

Precision Medicine

"An emerging approach for disease treatment and prevention that takes into account individual variability in genes, environment, and lifestyle for each person."

https://www.nih.gov/precision-medicine-initiative-cohort-program

Predictive Analytics

"The branch of advanced analytics which is used to make predictions about unknown future events. Predictive analytics uses many techniques from data mining, statistics, modeling, machine learning, and artificial intelligence to analyze current data to make predictions about [the] future."

www.predictiveanalyticstoday.com/what-is-predictive-analytics/

Prescriptive Analytics

"The area of business analytics (BA) dedicated to finding the best course of action for a given situation. Prescriptive analytics is related to both descriptive and predictive analytics. While descriptive analytics aims to provide insight into what has happened and predictive analytics helps model and forecast what might happen, prescriptive analytics seeks to determine the best solution or outcome among various choices, given the known parameters."

http://searchcio.techtarget.com/definition/Prescriptive-analytics

Primary Key In the context of relational databases, a unique identifier for each distinct row within a table. It can either be a normal attribute that is guaranteed to be unique (such as Social Security Number in a table with no more than one record per person) or it can be generated by the DBMS (such as a globally unique identifier).

Rob, P., & Coronel, P. (2004). Database systems: Design, implementation & management (6th ed.). Boston, MA: Course Technology.
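Both kinds of primary key mentioned in the definition can be tried out with Python's built-in `sqlite3` module. The table and column names below are hypothetical, and SQLite stands in for whatever DBMS is actually in use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Case 1: a natural attribute (a hypothetical medical record number) as the key
conn.execute("CREATE TABLE patient (mrn TEXT PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO patient VALUES ('A100', 'Pat Example')")
try:
    # A second row with the same key value violates uniqueness
    conn.execute("INSERT INTO patient VALUES ('A100', 'Someone Else')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Case 2: a key generated by the DBMS for each new row
conn.execute("CREATE TABLE visit (visit_id INTEGER PRIMARY KEY, reason TEXT)")
visit_ids = [
    conn.execute("INSERT INTO visit (reason) VALUES (?)", (reason,)).lastrowid
    for reason in ("checkup", "follow-up")
]

print(duplicate_rejected, visit_ids)  # True [1, 2]
```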

Process Indicators

An indicator that "measures how well key components (processes, workflows, steps) are performing."

Strome, Trevor L. (2013). Healthcare analytics for quality and performance improvement (p. 126). Hoboken, NJ: John Wiley & Sons, Inc.

Proportion A part, share, or number considered in comparative relation to a whole.

http://www.purplemath.com/modules/ratio2.htm

Public Health

The science of protecting and improving the health of families and communities through promotion of healthy lifestyles, research for disease and injury prevention, and detection and control of infectious diseases.

Winslow, Charles-Edward Amory (1920). The untilled field of public health. Modern Medicine, 2, 183–191. Retrieved from https://en.wikipedia.org/wiki/Personalized_medicine

Qualitative Qualitative data gives descriptive information which can be observed, but not measured, such as color, texture, appearance, etc.

http://www.businessdictionary.com/definition/qualitative-data.html

Quality Improvement (QI)

"Systematic and continuous actions that lead to measurable improvement in health care services and the health status of targeted patient groups."

http://www.hrsa.gov/quality/toolbox/methodology/qualityimprovement/

Quality Improvement Framework

A conceptual model for the analysis of performance and efforts to improve it. Examples include TQM, Six Sigma, Lean and PDSA.

https://www.qualitymeasures.ahrq.gov/expert/expert-commentary/32943

Quantitative Data that can be measured with numbers, such as length, height, cost, etc.

http://www.businessdictionary.com/definition/quantitative-data.html

Quartile One of the values that divides the distribution of a variable into four groups having equal frequencies.

https://en.wikipedia.org/wiki/Quartile

Query "A request for information from a database."

http://www.webopedia.com/TERM/Q/query.html

Query-by-Example (QBE)

"A query method which uses a graphical interface to enable the user to specify the criteria for selecting records."

Capron, H. L., & Johnson, J. A. (2004). Computers: Tools for an information age (8th ed.) (p. 382). Upper Saddle River, NJ: Prentice Hall.

Range "The smallest interval which contains all the data and provides an indication of statistical dispersion, calculated as the difference between the largest and smallest values in a data set."

https://en.wikipedia.org/wiki/Range_(statistics)

Ratio Shows the relative sizes of two quantities; normally expressed as a fraction or decimal.

http://www.businessdictionary.com/definition/ratio.html

Ratio Scale A scale of measurement having a fixed zero value which permits the comparison of differences of values by addition, subtraction, multiplication and division.

Collins English dictionary (2017). Glasgow, GB: Harper Collins Publishers Limited. Retrieved from https://www.collinsdictionary.com/dictionary/english/ratio-scale

Record

A row in a database table.

Capron, H. L., & Johnson, J. A. (2004). Computers: Tools for an information age (8th ed.). Upper Saddle River, NJ: Prentice Hall.

Regression Line "A straight line that describes how a response variable y changes as an explanatory variable x changes. We often use a regression line to predict the value of y for a given value of x."

https://faculty.etsu.edu/gardnerr/1530/Chapter5.pdf

Relational Database

Most common type of database. In a relational database, tables are connected via primary and foreign keys.

Rob, P., & Coronel, P. (2004). Database systems: Design, implementation & management (6th ed.). Boston, MA: Course Technology.
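A minimal sketch of two tables connected through a primary key and a foreign key, again using Python's built-in `sqlite3` module. The `patient` and `visit` tables and their contents are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patient (patient_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE visit (
        visit_id   INTEGER PRIMARY KEY,
        patient_id INTEGER REFERENCES patient (patient_id),  -- foreign key
        reason     TEXT
    );
    INSERT INTO patient VALUES (1, 'Pat Example'), (2, 'Lee Sample');
    INSERT INTO visit VALUES (10, 1, 'checkup'), (11, 1, 'follow-up'), (12, 2, 'flu');
""")

# The join follows the foreign key in visit back to the primary key in patient
rows = conn.execute(
    "SELECT p.name, COUNT(*) FROM patient AS p"
    " JOIN visit AS v ON v.patient_id = p.patient_id"
    " GROUP BY p.patient_id ORDER BY p.patient_id"
).fetchall()
print(rows)  # [('Pat Example', 2), ('Lee Sample', 1)]
```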

Relationship Describes how two or more entities or tables are related within the database; can be one to one (1:1), one to many (1:M) and many to many (M:M).

Shelly, G. B., Cashman, T. J. & Rosenblatt, H. J. (2003). Systems analysis and design (5th ed.). Boston: Course Technology.

Relative Numbers

Relative numbers or values are dependent on other numbers. In other words, they are relative to other (absolute) numbers. Most often, those other numbers are not even given. For example, 2 in 5 cars drive too fast on this road. You still do not know the precise number of cars that drove too fast.

http://www.dr-aart.nl/Statistics-absolute-and-relative.html

Return on Investment (ROI)

"A performance measure used to evaluate the efficiency of an investment or to compare the efficiency of a number of different investments. ROI measures the amount of return on an investment relative to the investment’s cost. To calculate ROI, the benefit (or return) of an investment is divided by the cost of the investment, and the result is expressed as a percentage or a ratio."

http://www.investopedia.com/terms/r/returnoninvestment.asp
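The calculation can be sketched as a one-line function. The version below assumes that "return" means net gain, that is, the benefit received minus the cost; the function name and the figures are hypothetical.

```python
def roi(gain, cost):
    """Net return divided by cost, as a fraction (0.25 means 25%)."""
    if cost == 0:
        raise ValueError("cost must be non-zero")
    return (gain - cost) / cost

# Hypothetical project: costs 40,000 and yields 50,000 in benefits
print(f"{roi(50_000, 40_000):.0%}")  # 25%
```

A negative result indicates that the investment returned less than it cost.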

Run Chart "A line graph of data plotted over time. By collecting and charting data over time, you can find trends or patterns in the process."

http://www.pqsystems.com/qualityadvisor/DataAnalysisTools/run_chart.php

Scatterplot "Scatter plots are similar to line graphs in that they use horizontal and vertical axes to plot data points. However, they have a very specific purpose. Scatter plots show how much one variable is affected by another. The relationship between two variables is called their correlation."

http://mste.illinois.edu/courses/ci330ms/youtsey/scatterinfo.html

Secondary Axis In some scenarios, a primary axis of each type (a single X-axis and a single Y-axis) is not enough, and it's necessary to use a secondary axis. A secondary axis is useful when comparing two value sets with two distinct data ranges that share a common category.

https://msdn.microsoft.com/en-us/library/dd207018.aspx

Short Term Memory

The type of memory that acts as a kind of “scratch-pad” for temporary recall of the information which is being processed at any point in time, and has been referred to as "the brain's Post-it note". It can be thought of as the ability to remember and process information at the same time. It holds a small amount of information (typically around 7 items or even less) in mind in an active, readily-available state for a short period of time (typically from 10 to 15 seconds, or sometimes up to a minute).

http://www.alleydog.com/glossary/definition.php?term=Short-Term%20Memory

Six Sigma "A management philosophy developed by engineer Bill Smith at Motorola in 1986 that utilizes a set of tools and techniques to improve business processes. The philosophy emphasizes setting extremely high objectives, collecting data and analyzing results to a fine degree as a way to reduce defects in products and services."

http://searchcio.techtarget.com/definition/Six-Sigma

Skew

A measure of the asymmetry of data in relation to its mean. "Data can be 'skewed', meaning it tends to have a long tail on one side or the other. A negative skew has a long 'tail' on the negative side of the peak, and some people say it is 'skewed to the left.' A normal distribution is not skewed, and is perfectly symmetrical. And positive skew is when the long tail is on the positive side of the peak, and some people say it is 'skewed to the right'."

https://www.mathsisfun.com/data/skewness.html

Source System "Data management term for an information storage system (commonly implemented on a computer system) that is the authoritative data source for a given data element or piece of information."

https://en.wikipedia.org/wiki/System_of_record

Spread "The dispersion, variability, or scatter denotes how stretched or squeezed a distribution is, commonly measured by variance, standard deviation and interquartile range."

https://en.wikipedia.org/wiki/Statistical_dispersion

Staging Area Area where data is moved after extraction from the source databases so the data can undergo additional processing such as data cleansing and data transformation.

Hoffer, J. A., Prescott, M. B., & McFadden, F. R. (2007). Modern database management (8th ed.). Upper Saddle River, NJ: Pearson/Prentice Hall.

Stakeholder "A person, group, or organization that has an interest or concern in the organization. Stakeholders can affect or be affected by the organization's actions, objectives, or policies."

http://www.businessdictionary.com/definition/stakeholder.html

Standard Deviation

Standard deviation (SD, also represented by the Greek letter sigma or the Latin letter s) is a measure that is used to quantify the amount of variation or dispersion of a set of data values.

https://en.wikipedia.org/wiki/Standard_deviation
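By common convention, sigma denotes the standard deviation of a whole population and s the estimate computed from a sample. Python's standard `statistics` module offers both; the data values below are invented for the example.

```python
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]

sigma = statistics.pstdev(values)  # population SD: squared deviations averaged over n
s = statistics.stdev(values)       # sample SD: squared deviations divided by n - 1

print(f"sigma = {sigma:.3f}, s = {s:.3f}")  # sigma = 2.000, s = 2.138
```

The sample version is slightly larger because dividing by n - 1 compensates for estimating the mean from the same data.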

Star Schema Also known as the Dimensional Model; a simple data warehouse model used to store multidimensional data in a relational database structure.

Hoffer, J. A., Prescott, M. B., & McFadden, F. R. (2007). Modern database management (8th ed.). Upper Saddle River, NJ: Pearson/Prentice Hall.
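A star schema can be pictured as a central fact table holding numeric measures, joined to dimension tables that describe them. The sketch below uses Python's built-in `sqlite3` module with a single hypothetical dimension for brevity; a real model would usually have several dimensions (date, provider, location, and so on).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension table: descriptive attributes
    CREATE TABLE dim_department (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
    -- Fact table: foreign keys to dimensions plus numeric measures
    CREATE TABLE fact_visit (
        dept_id    INTEGER REFERENCES dim_department (dept_id),
        visit_date TEXT,
        charge     REAL
    );
    INSERT INTO dim_department VALUES (1, 'Cardiology'), (2, 'Oncology');
    INSERT INTO fact_visit VALUES
        (1, '2017-05-01', 200.0), (1, '2017-05-02', 150.0), (2, '2017-05-01', 400.0);
""")

# Aggregate the measure (charge) at the level of one dimension attribute
totals = conn.execute(
    "SELECT d.dept_name, SUM(f.charge) FROM fact_visit AS f"
    " JOIN dim_department AS d ON d.dept_id = f.dept_id"
    " GROUP BY d.dept_name ORDER BY d.dept_name"
).fetchall()
print(totals)  # [('Cardiology', 350.0), ('Oncology', 400.0)]
```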

Static Extraction

The process of capturing the source data at a point in time; it is used to fill the data warehouse initially.

Hoffer, J. A., Prescott, M. B., & McFadden, F. R. (2007). Modern database management (8th ed.). Upper Saddle River, NJ: Pearson/Prentice Hall.

Statistical Process Control (SPC) Chart

A graph used to determine the stability and predictability of a process. A control chart is a line graph which has a central line for the average, an upper line for the upper control limit and a lower line for the lower control limit. By comparing current data to these lines, you can draw conclusions about whether the process variation is consistent (in control) or is unpredictable (out of control, affected by special causes of variation).

Strome, Trevor L. (2013). Healthcare analytics for quality and performance improvement (pp. 154-155). Hoboken, NJ: John Wiley & Sons, Inc.
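The comparison against the three lines described above can be sketched as follows. This is a simplified illustration, not the method from the cited text: it assumes the limits are set at the baseline average plus or minus three sample standard deviations, whereas real control charts often estimate variation from moving ranges or subgroup ranges and apply additional run rules. The function names and data are hypothetical.

```python
import statistics


def control_limits(baseline, sigma_multiplier=3):
    """Return (lower limit, center line, upper limit) from baseline data."""
    center = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)  # simplifying assumption, see lead-in
    return (
        center - sigma_multiplier * sigma,
        center,
        center + sigma_multiplier * sigma,
    )


def out_of_control(points, limits):
    """Return (index, value) pairs that fall outside the control limits."""
    lower, _center, upper = limits
    return [(i, x) for i, x in enumerate(points) if x < lower or x > upper]


# Hypothetical measurements: a stable baseline period, then new observations.
baseline = [20, 22, 19, 21, 20, 23, 21, 20, 22, 21]
new_points = [21, 24, 35, 20]

limits = control_limits(baseline)
flagged = out_of_control(new_points, limits)
print(limits, flagged)
```

Points returned by `out_of_control` are candidates for special-cause variation and would normally be investigated rather than treated as proof of a problem.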

Statistics "Statistics is the study of the collection, analysis, interpretation, presentation, and organization of data."

https://www.dice.com/skills/Statistics.html

Stem and Leaf Plot

A Stem and Leaf Plot is a special table where each data value is split into a "stem" (the first digit or digits) and a "leaf" (usually the last digit), also called a stemplot.

https://www.mathsisfun.com/definitions/stem-and-leaf-plot.html
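The split into stems and leaves can be sketched in a few lines of Python. The sketch assumes non-negative whole numbers, takes the last digit as the leaf, and uses made-up scores; the function name `stem_and_leaf` is hypothetical.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Map each stem (all digits but the last) to its sorted leaves (last digit)."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # e.g. 23 -> stem 2, leaf 3
        plot[stem].append(leaf)
    return dict(plot)


# Hypothetical scores, chosen only for illustration.
scores = [12, 15, 21, 23, 23, 28, 34, 37, 40]
plot = stem_and_leaf(scores)

# Print one row per stem, including stems that have no leaves.
for stem in range(min(plot), max(plot) + 1):
    leaves = " ".join(str(leaf) for leaf in plot.get(stem, []))
    print(f"{stem} | {leaves}")
```

Each printed row shows one stem followed by its leaves, so the length of a row gives a quick picture of how many values fall in that range.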

Stemplot A Stemplot is a special table where each data value is split into a "stem" (the first digit or digits) and a "leaf" (usually the last digit), also called a stem and leaf plot.

https://www.mathsisfun.com/definitions/stem-and-leaf-plot.html

Storytelling The use of stories to present data. "Visualization is powerful, but even more powerful is the ability to connect visuals to tell a story with data."

https://tdwi.org/events/onsite-education/onsite/sessions/business-analytics/data-storytelling-the-new-horizon-in-business-analytics.aspx

Strategic Goals The planned objectives that an organization strives to achieve over the next year, five years, and ten years. They reflect the analysis you do that starts with creating a vision, a role statement and a mission statement, and then your analysis of your environment, strengths, weaknesses, opportunities, and threats.

http://www.businessdictionary.com/definition/strategic-goals.html

Structure Indicators

Indicators that measure how care is organized. The stable elements and infrastructure of a health care delivery system.

Strome, Trevor L. (2013). Healthcare analytics for quality and performance improvement (pp. 56-57). Hoboken, NJ: John Wiley & Sons, Inc.

Structured Data Data with a high level of organization, such as information in a relational database. When information is highly structured and predictable, search engines can more easily organize and display it.

https://brightplanet.com/2012/06/structured-vs-unstructured-data/

Structured Query Language

A query language that uses English-like statements to create queries against a database. Commonly referred to as SQL.

Capron, H. L., & Johnson, J. A. (2004). Computers: Tools for an information age (8th ed.). Upper Saddle River, NJ: Prentice Hall.
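To show what "English-like" means in practice, here is a small sketch that runs one SQL query from Python against an in-memory SQLite database. The `patient` table, its columns, and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patient (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO patient (name, age) VALUES (?, ?)",
    [("Ana", 34), ("Ben", 71), ("Cara", 68)],
)

# The query reads close to an English sentence:
# "select the name and age from patient where age is at least 65, ordered by name".
query = "SELECT name, age FROM patient WHERE age >= 65 ORDER BY name"
older_patients = conn.execute(query).fetchall()
print(older_patients)
```

The keywords SELECT, FROM, WHERE, and ORDER BY state what data is wanted, and the database decides how to retrieve it.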

Succinct "Characterized by clear, precise expression in a few words; concise and terse."

https://www.wordnik.com/words/succinct

Text Analytics Text analytics is the process of analyzing unstructured text, extracting relevant information, and transforming it into useful business intelligence. Text analytics processes can be performed manually, but the amount of text-based data available to companies today makes it increasingly important to use intelligent, automated solutions.

http://www.clarabridge.com/text-analytics/

Time Series Data

A time-series relationship is nothing more than a series of quantitative values that measure how something changed with time. Examples include a change in year, quarter, month, week, day, hour, minute or even second. Graphs are commonly used to display time series data because they display patterns and trends in value over time.

http://www.businessdictionary.com/definition/time-series-data.html

Timetable "A schedule listing the times at which certain events, such as arrivals and departures at a transportation station, are expected to take place."

http://www.thefreedictionary.com/timetable

TQM/TQI A management approach to long-term success through customer satisfaction in which members of an organization participate in improving processes, products, services, and the culture in which they work. Stands for Total Quality Management/Total Quality Improvement.

http://asq.org/learn-about-quality/total-quality-management/overview/overview.html

Transactional Database

Also known as an operational database; a database that contains current data used in day-to-day business processes. It focuses on data input, supports concurrent processing of multiple transactions, and allows users to read, add, and modify records.

Rob, P., & Coronel, C. (2004). Database systems: Design, implementation & management (6th ed.). Boston, MA: Course Technology.

Trend "A pattern of gradual change in a condition, output, or process, or an average or general tendency of a series of data points to move in a certain direction over time, represented by a line or curve on a graph."

http://www.businessdictionary.com/definition/trend.html
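One common way to put a number on the "general tendency" of a series is the slope of a least-squares line fitted through the points. The sketch below is only one such approach, assuming equally spaced observations; the function name `trend_slope` and the monthly figures are hypothetical.

```python
def trend_slope(values):
    """Least-squares slope of values against their positions 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points to estimate a trend")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    denominator = sum((x - x_mean) ** 2 for x in range(n))
    return numerator / denominator


# Hypothetical monthly visit counts.
monthly_visits = [100, 104, 103, 110, 112, 118]
slope = trend_slope(monthly_visits)
print(f"average change per month: {slope:.2f}")
```

A positive slope suggests an upward trend and a negative slope a downward one, although a single number can hide curves or seasonal patterns that a graph would show.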

Univariate Involving one variate or variable quantity.

https://en.wikipedia.org/wiki/Univariate

Unstructured Data

Information that either does not have a pre-defined data model or is not organized in a pre-defined manner, such as free text.

https://en.wikipedia.org/wiki/Unstructured_data

Validation Rule "A rule that allows the database management system to enforce the integrity constraints."

Capron, H. L., & Johnson, J. A. (2004). Computers: Tools for an information age (8th ed.) (p. 382), Upper Saddle River, NJ: Prentice Hall.
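As a sketch of how a database management system can enforce such a rule, the example below declares a CHECK constraint in SQLite and shows an invalid row being refused. The `patient` table, the age range of 0 to 130, and the helper `try_insert` are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patient ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " age INTEGER CHECK (age BETWEEN 0 AND 130)"  # the validation rule
    ")"
)


def try_insert(name, age):
    """Insert a row and report whether the database accepted it."""
    try:
        conn.execute("INSERT INTO patient (name, age) VALUES (?, ?)", (name, age))
        return True
    except sqlite3.IntegrityError:
        # Raised by sqlite3 when a constraint such as CHECK is violated.
        return False


valid_accepted = try_insert("Ana", 34)
invalid_accepted = try_insert("Ben", -5)
row_count = conn.execute("SELECT COUNT(*) FROM patient").fetchone()[0]
print(valid_accepted, invalid_accepted, row_count)
```

Because the rule lives in the table definition, every application that writes to the table is held to it, not just the one that happens to validate its input.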

Validity One of the V's of Big Data; refers to the accuracy or veracity of the data.

http://insidebigdata.com/2013/09/12/beyond-volume-variety-velocity-issue-big-data-veracity/

Value Based Care

"The intersection of cost and quality. Value-based initiatives shift the care delivery focus from volume to value and redefine financial incentives toward reduced costs. In this model, physicians must think about the entire patient experience among all care settings and between episodic visits."

https://www.wellcentive.com/what-is-value-based-care/

Value Stream Maps

"A Lean manufacturing or Lean Enterprise technique used to document, analyze, and improve the flow of information or materials required to produce a product or service for a customer."

https://www.isixsigma.com/dictionary/value-stream-mapping/

Variable In statistical language, a variable is "any characteristic, number, or quantity that can be measured or counted." Also known as a data item. There are many ways variables can be described.

http://www.abs.gov.au/websitedbs/a3121120.nsf/home/statistical+language+-+what+are+variables

Variance "A numerical value used to indicate how widely individuals in a group vary. If individual observations vary greatly from the group mean, the variance is big; and vice versa."

http://stattrek.com/statistics/dictionary.aspx?definition=variance

Variation "A measure that describes how spread out or scattered a set of data is. It is also known as measures of dispersion or measures of spread."

http://www.icoachmath.com/math_dictionary/measure_of_variation.html

Variety One of the V's of Big Data; refers to the number of different types of data.

http://insidebigdata.com/2013/09/12/beyond-volume-variety-velocity-issue-big-data-veracity/

Velocity One of the V's of Big Data; refers to the speed of data processing required.

http://insidebigdata.com/2013/09/12/beyond-volume-variety-velocity-issue-big-data-veracity/

Vision A vision is an organization's road map, indicating both what the company wants to become and guiding transformational initiatives by setting a defined direction for the company's growth.

http://www.businessdictionary.com/definition/vision-statement.html

Visual Perception

"Visual perception is the ability to interpret the surrounding environment by processing information that is contained in visible light. The resulting perception is also known as eyesight, sight, or vision (adjectival form: visual, optical, or ocular)."

https://en.wikipedia.org/wiki/Visual_perception

Volume One of the V's of Big Data; refers to the amount of data.

http://insidebigdata.com/2013/09/12/beyond-volume-variety-velocity-issue-big-data-veracity/

Wisdom "An extrapolative and non-deterministic, non-probabilistic process."

Ackoff, R.L. (1989). From Data to Wisdom. Journal of Applied Systems Analysis, 16, 3-9.

Workflow Workflow is the series of activities that are necessary to complete a task.

http://searchcio.techtarget.com/definition/workflow

Working Memory

"The term working memory is often used interchangeably with short-term memory, although technically working memory refers more to the whole theoretical framework of structures and processes used for the temporary storage and manipulation of information, of which short-term memory is just one component." Two fundamental characteristics of working memory are: it is temporary and it has limited storage capacity.

http://www.human-memory.net/types_short.html

X Axis "The line on a graph that runs horizontally (left-right) through zero."

https://www.mathsisfun.com/definitions/x-axis.html

Y Axis "The line on a graph that runs vertically (up-down) through zero."

https://www.mathsisfun.com/definitions/y-axis.html