Chapter 2 Exploring Data with Graphs and Numerical Summaries

Agresti/Franklin Statistics

Learn …

The Different Types of Data

The Use of Graphs to Describe Data

The Numerical Methods of Summarizing Data

Section 2.1

What are the Types of Data?

In Every Statistical Study:

Questions are posed

Characteristics are observed

Characteristics are Variables

A Variable is any characteristic that is recorded for subjects in the study

Variation in Data

The terminology variable highlights the fact that data values vary.

Example: Students in a Statistics Class

Variables:
• Age
• GPA
• Major
• Smoking Status
• …

Data values are called observations

Each observation can be:

• Quantitative

• Categorical

Categorical Variable

Each observation belongs to one of a set of categories

Examples:
• Gender (Male or Female)
• Religious Affiliation (Catholic, Jewish, …)
• Place of residence (Apt, Condo, …)
• Belief in Life After Death (Yes or No)

Quantitative Variable

Observations take numerical values

Examples:
• Age
• Number of siblings
• Annual Income
• Number of years of education completed

Graphs and Numerical Summaries

Describe the main features of a variable

For Quantitative variables: key features are center and spread

For Categorical variables: key feature is the percentage in each of the categories

Quantitative Variables

Discrete Quantitative Variables

Continuous Quantitative Variables

Discrete

A quantitative variable is discrete if its possible values form a set of separate numbers such as 0, 1, 2, 3, …

Examples of Discrete Variables
• Number of pets in a household
• Number of children in a family
• Number of foreign languages spoken

Continuous

A quantitative variable is continuous if its possible values form an interval

Examples of Continuous Variables
• Height
• Weight
• Age
• Amount of time it takes to complete an assignment

Frequency Table

A method of organizing data

Lists all possible values for a variable along with the number of observations for each value
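As an illustration of the definition above, here is a minimal sketch that builds a frequency table in Python. The `majors` list is hypothetical data invented for this example (it is not the shark-attack table from the slides), and `frequency_table` is an illustrative helper name. The proportion is the count divided by the total number of observations, and the percentage is the proportion times 100.

```python
from collections import Counter

# Hypothetical observations of a categorical variable (not from the slides).
majors = ["Biology", "Math", "Biology", "History", "Math", "Biology"]

def frequency_table(observations):
    """Return rows of (value, count, proportion, percentage)."""
    counts = Counter(observations)
    n = len(observations)
    # proportion = count / n; percentage = 100 * proportion
    return [(value, count, count / n, 100 * count / n)
            for value, count in counts.most_common()]

table = frequency_table(majors)
for value, count, proportion, percent in table:
    print(f"{value:<10}{count:>3}{proportion:>8.3f}{percent:>7.1f}%")
```

The proportions in such a table sum to 1 and the percentages sum to 100.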

Example: Shark Attacks

What is the variable?

Is it categorical or quantitative?

How is the proportion for Florida calculated?

How is the % for Florida calculated?

Insights – what the data tells us about shark attacks

Identify the following variable as categorical or quantitative:

Choice of diet (vegetarian or non-vegetarian):

a. Categorical
b. Quantitative

Number of people you have known who have been elected to political office:

a. Categorical
b. Quantitative

Identify the following variable as discrete or continuous:

The number of people in line at a box office to purchase theater tickets:

a. Continuous
b. Discrete

The weight of a dog:

a. Continuous
b. Discrete

Section 2.2

How Can We Describe Data Using Graphical Summaries?

Graphs for Categorical Data

Pie Chart: A circle having a “slice of pie” for each category

Bar Graph: A graph that displays a vertical bar for each category

Example: Sources of Electricity Use in the U.S. and Canada

Pie Chart

Bar Chart

Pie Chart vs. Bar Chart

Which graph do you prefer? Why?

Graphs for Quantitative Data

Dot Plot: shows a dot for each observation

Stem-and-Leaf Plot: portrays the individual observations

Histogram: uses bars to portray the data

Example: Sodium and Sugar Amounts in Cereals

Dotplot for Sodium in Cereals

Sodium Data: 0 210 260 125 220 290 210 140 220 200 125 170 250 150 170 70 230 200 290 180

Stem-and-Leaf Plot for Sodium in Cereal

Sodium Data: 0 210 260 125 220 290 210 140 220 200 125 170 250 150 170 70 230 200 290 180
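A stem-and-leaf plot for the sodium values can be sketched in a few lines of Python. This sketch assumes that the final digit of each value is the leaf and the remaining leading digits are the stem (so 125 becomes stem 12, leaf 5); the original slide's figure is not available, so its exact layout may differ. `stem_and_leaf` is an illustrative helper name.

```python
from collections import defaultdict

# Sodium values (mg) for the 20 cereals listed in the slides.
sodium = [0, 210, 260, 125, 220, 290, 210, 140, 220, 200,
          125, 170, 250, 150, 170, 70, 230, 200, 290, 180]

def stem_and_leaf(values):
    """Map each stem (all digits but the last) to its sorted leaves (last digit)."""
    plot = defaultdict(list)
    for v in sorted(values):
        plot[v // 10].append(v % 10)
    return dict(plot)

plot = stem_and_leaf(sodium)
# Print every stem between the smallest and largest, including empty ones,
# so that gaps in the data stay visible.
for stem in range(min(plot), max(plot) + 1):
    leaves = "".join(str(leaf) for leaf in plot.get(stem, []))
    print(f"{stem:>2} | {leaves}")
```

Because every leaf is printed, the individual observations can be read back from the plot, which is the property the slides attribute to this display.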

Frequency Table

Sodium Data: 0 210 260 125 220 290 210 140 220 200 125 170 250 150 170 70 230 200 290 180

Histogram for Sodium in Cereals
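The histogram itself is not reproduced here, but the counts behind one can be sketched as follows. The interval width of 40 mg starting at 0 is an assumption made for this example; the slides do not state which intervals were used. `histogram_counts` is an illustrative helper and assumes non-negative integer values.

```python
# Sodium values (mg) for the 20 cereals listed in the slides.
sodium = [0, 210, 260, 125, 220, 290, 210, 140, 220, 200,
          125, 170, 250, 150, 170, 70, 230, 200, 290, 180]

def histogram_counts(values, width):
    """Count observations in the intervals [k*width, (k+1)*width)."""
    n_bins = max(values) // width + 1
    counts = [0] * n_bins
    for v in values:
        counts[v // width] += 1
    return counts

WIDTH = 40  # assumed interval width, not taken from the slides
counts = histogram_counts(sodium, WIDTH)
for k, count in enumerate(counts):
    low, high = k * WIDTH, (k + 1) * WIDTH - 1
    print(f"{low:>3}-{high:<3} {'*' * count} ({count})")
```

Changing `WIDTH` shows the flexibility in defining intervals: wider intervals give a more compact display, at the cost of hiding detail.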

Which Graph?

Dot plot and stem-and-leaf plot:
• More useful for small data sets
• Data values are retained

Histogram:
• More useful for large data sets
• Most compact display
• More flexibility in defining intervals

Shape of a Distribution

Overall pattern
• Clusters?
• Outliers?
• Symmetric?
• Skewed?
• Unimodal?
• Bimodal?

Symmetric or Skewed?

Example: Hours of TV Watching

Identify the minimum and maximum sugar values:

a. 2 and 14
b. 1 and 3
c. 1 and 15
d. 0 and 16

Consider a data set containing IQ scores for the general public:

What shape would you expect a histogram of this data set to have?

a. Symmetric
b. Skewed to the left
c. Skewed to the right
d. Bimodal

Consider a data set of the scores of students on a very easy exam in which most score very well but a few score very poorly:

What shape would you expect a histogram of this data set to have?

a. Symmetric
b. Skewed to the left
c. Skewed to the right
d. Bimodal

Section 2.3

How Can We Describe the Center of Quantitative Data?

Mean

The sum of the observations divided by the number of observations

Median

The midpoint of the observations when they are ordered from the smallest to the largest (or from the largest to the smallest)

Find the mean and median

CO2 pollution levels in the 8 largest nations, measured in metric tons per person:

2.3 1.1 19.7 9.8 1.8 1.2 0.7 0.2

a. Mean = 4.6 Median = 1.5
b. Mean = 4.6 Median = 5.8
c. Mean = 1.5 Median = 4.6
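The two definitions can be checked against the CO2 values with a short Python sketch. The functions `mean` and `median` are written out by hand here to mirror the definitions in the slides; they are illustrative, not part of the original material.

```python
# CO2 pollution levels (metric tons per person) from the slide.
co2 = [2.3, 1.1, 19.7, 9.8, 1.8, 1.2, 0.7, 0.2]

def mean(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)

def median(values):
    """Midpoint of the ordered observations."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    # With an even number of observations, average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2

print(round(mean(co2), 1), round(median(co2), 1))
```

With eight observations there is no single middle value, so the median is the average of the fourth and fifth ordered values (1.2 and 1.8).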

Outlier

An observation that falls well above or below the overall set of data

The mean can be highly influenced by an outlier

The median is resistant: not affected by an outlier
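To see the difference in resistance, one can recompute both summaries for the CO2 values from the earlier exercise with and without the largest value, 19.7. Treating 19.7 as the outlier is a choice made for this sketch; the slides do not single it out. The functions come from Python's standard `statistics` module.

```python
from statistics import mean, median

# CO2 pollution levels (metric tons per person) from the earlier exercise.
co2 = [2.3, 1.1, 19.7, 9.8, 1.8, 1.2, 0.7, 0.2]

# Drop the largest value, treated here as the outlier for illustration.
without_outlier = [x for x in co2 if x != 19.7]

print("with 19.7:   ", round(mean(co2), 2), round(median(co2), 2))
print("without 19.7:", round(mean(without_outlier), 2), round(median(without_outlier), 2))
```

Removing the one extreme value cuts the mean roughly in half, while the median moves only slightly. In practice the median can shift a little when an observation is removed, so "resistant" is best read as "barely affected" rather than "never changes".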

Mode

The value that occurs most frequently.

The mode is most often used with categorical data

Section 2.4

How Can We Describe the Spread of Quantitative Data?

Measuring Spread: Range

Range: difference between the largest and smallest observations

Measuring Spread: Standard Deviation

Creates a measure of variation by summarizing the deviations of each observation from the mean and calculating an adjusted average of these deviations
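The slide describes the standard deviation in words only. A common formula matching that description, assumed here, is the sample standard deviation: square each deviation from the mean, sum the squares, divide by n - 1 (the "adjusted" average), and take the square root. The sketch below applies it to the sodium values from the cereal example; `sample_std` is an illustrative name.

```python
import math

# Sodium values (mg) for the 20 cereals listed in the slides.
sodium = [0, 210, 260, 125, 220, 290, 210, 140, 220, 200,
          125, 170, 250, 150, 170, 70, 230, 200, 290, 180]

def sample_std(values):
    """Sample standard deviation with the n - 1 divisor (an assumption here)."""
    n = len(values)
    m = sum(values) / n
    squared_deviations = [(x - m) ** 2 for x in values]
    return math.sqrt(sum(squared_deviations) / (n - 1))

s = sample_std(sodium)
print(f"mean = {sum(sodium) / len(sodium):.1f}, s = {s:.1f}")
```

Python's standard library offers the same calculation as `statistics.stdev`, which also uses the n - 1 divisor.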

Empirical Rule

For bell-shaped data sets:

Approximately 68% of the observations fall within 1 standard deviation of the mean

Approximately 95% of the observations fall within 2 standard deviations of the mean

Approximately 100% of the observations fall within 3 standard deviations of the mean
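One way to get a feel for the rule is to count how many observations actually fall within 1, 2, and 3 standard deviations of the mean. The sketch below does this for the sodium values from the cereal example. That data set is small and only roughly bell-shaped, so the percentages should be expected to match the rule only approximately. `share_within` is an illustrative helper.

```python
from statistics import mean, stdev

# Sodium values (mg) for the 20 cereals listed in the slides.
sodium = [0, 210, 260, 125, 220, 290, 210, 140, 220, 200,
          125, 170, 250, 150, 170, 70, 230, 200, 290, 180]

def share_within(values, k):
    """Proportion of observations within k standard deviations of the mean."""
    m, s = mean(values), stdev(values)
    inside = [x for x in values if m - k * s <= x <= m + k * s]
    return len(inside) / len(values)

shares = [share_within(sodium, k) for k in (1, 2, 3)]
for k, share in zip((1, 2, 3), shares):
    print(f"within {k} standard deviation(s): {100 * share:.0f}%")
```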

Parameter and Statistic

A parameter is a numerical summary of the population

A statistic is a numerical summary of a sample taken from a population

Section 2.5

How Can Measures of Position Describe Spread?

Quartiles

Splits the data into four parts

The median is the second quartile, Q2

The first quartile, Q1, is the median of the lower half of the observations

The third quartile, Q3, is the median of the upper half of the observations

Example: Find the first and third quartiles

Prices per share of 10 most actively traded stocks on NYSE (rounded to nearest $)

2 4 11 12 13 15 31 31 37 47

a. Q1 = 2 Q3 = 47
b. Q1 = 12 Q3 = 31
c. Q1 = 11 Q3 = 31
d. Q1 = 11.5 Q3 = 32
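The quartile definitions translate directly into code: order the data, split it into a lower and an upper half, and take the median of each half. The sketch below applies this to the stock prices. One detail is an assumption: when the number of observations is odd, the middle observation is left out of both halves (a common textbook convention that the slides do not spell out). Software packages often use other interpolation rules and may give slightly different quartiles.

```python
from statistics import median

# Prices per share of the 10 most actively traded NYSE stocks (from the slide).
prices = [2, 4, 11, 12, 13, 15, 31, 31, 37, 47]

def quartiles(values):
    """Return (Q1, Q2, Q3) as medians of the lower half, all data, and upper half.

    Needs at least two observations. For an odd count, the middle
    observation is excluded from both halves (an assumed convention).
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[-half:]
    return median(lower), median(ordered), median(upper)

q1, q2, q3 = quartiles(prices)
print(q1, q2, q3)
```

For the ten prices, the lower half is 2 4 11 12 13 and the upper half is 15 31 31 37 47, so Q1 and Q3 are the middle values of those two groups.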

Measuring Spread: Interquartile Range

The interquartile range is the distance between the third quartile and first quartile:

IQR = Q3 – Q1

Detecting Potential Outliers

An observation is a potential outlier if it falls more than 1.5 x IQR below the first quartile or more than 1.5 x IQR above the third quartile
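The rule can be sketched as a small function that takes the quartiles as inputs. The example applies it to the sodium values from the cereal example, using Q1 = 145 and Q3 = 225 as given in the five-number summary for that data in these slides. `potential_outliers` is an illustrative name.

```python
# Sodium values (mg) for the 20 cereals listed in the slides.
sodium = [0, 210, 260, 125, 220, 290, 210, 140, 220, 200,
          125, 170, 250, 150, 170, 70, 230, 200, 290, 180]

def potential_outliers(values, q1, q3):
    """Return observations more than 1.5 x IQR below Q1 or above Q3."""
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [x for x in values if x < lower_fence or x > upper_fence]

# Quartiles taken from the five-number summary in the slides.
outliers = potential_outliers(sodium, q1=145, q3=225)
print(outliers)
```

Here IQR = 225 - 145 = 80, so the cutoffs are 145 - 120 = 25 and 225 + 120 = 345, and only the cereal with 0 mg of sodium falls outside them.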

The Five-Number Summary

The five-number summary of a dataset:
• Minimum value
• First Quartile
• Median
• Third Quartile
• Maximum value

Boxplot

A box is constructed from Q1 to Q3

A line is drawn inside the box at the median

A line extends outward from the lower end of the box to the smallest observation that is not a potential outlier

A line extends outward from the upper end of the box to the largest observation that is not a potential outlier

Boxplot for Sodium Data

Sodium Data (ordered): 0 70 125 125 140 150 170 170 180 200 200 210 210 220 220 230 250 260 290 290

Five-Number Summary:
• Min: 0
• Q1: 145
• Med: 200
• Q3: 225
• Max: 290

Boxplot for Sodium in Cereals

Sodium Data: 0 210 260 125 220 290 210 140 220 200 125 170 250 150 170 70 230 200 290 180

Z-Score

The z-score for an observation measures how far the observation is from the mean in standard deviation units:

z = (observation - mean) / standard deviation

An observation in a bell-shaped distribution is a potential outlier if its z-score < -3 or > +3
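As a sketch of the calculation, the code below computes z-scores for the sodium values from the cereal example and applies the cutoff of 3. It assumes the sample standard deviation (n - 1 divisor, as computed by `statistics.stdev`); the slides do not say which divisor is intended. `z_scores` is an illustrative name.

```python
from statistics import mean, stdev

# Sodium values (mg) for the 20 cereals listed in the slides.
sodium = [0, 210, 260, 125, 220, 290, 210, 140, 220, 200,
          125, 170, 250, 150, 170, 70, 230, 200, 290, 180]

def z_scores(values):
    """Return (observation - mean) / standard deviation for each observation."""
    m, s = mean(values), stdev(values)
    return [(x - m) / s for x in values]

zs = z_scores(sodium)
# Flag observations more than 3 standard deviations from the mean.
flagged = [x for x, z in zip(sodium, zs) if abs(z) > 3]

print(f"z-score of the smallest value: {min(zs):.2f}")
print("potential outliers by the z-score rule:", flagged)
```

Under these assumptions the most extreme observation, 0 mg, has a z-score of about -2.6, so the z-score rule does not flag any cereal in this data set.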
