Section 3: Analyzing Data with Fathom

_____________________________________________________________________________________________

Learning to Teach Mathematics with Technology: An Integrated Approach Page 1

DRAFT MATERIALS DO NOT DISTRIBUTE Modified 9/22/2006

Section 3: Analyzing Data with Fathom

Summary: Teachers analyze automobile data using Fathom to describe center and

spread using dot plots, box plots, and histograms. They will examine distributions of

univariate data of a quantitative attribute as well as comparison of distributions when a

qualitative attribute is added to separate distributions by categories. They will consider

pedagogical issues related to the use of various graphical representations, measures of

center and spread, and dynamic statistical software.

Objectives:

Mathematical: Teachers will be able to

• generate questions to explore given a data set;

• examine the distribution of a univariate data set using dot plots, box plots, and

histograms, including comparing distributions;

• describe the center and spread of a data set using resistant (median and

interquartile range) and nonresistant (mean and standard deviation) measures;

• develop a conceptual understanding of the usefulness of the standard deviation.

Technological: Teachers will be able to use Fathom to

• create dot plots, box plots, and histograms of univariate data;

• add a qualitative attribute to an existing graphical distribution of a quantitative

attribute, both as a key legend and as a category on the y-axis;

• plot statistical measures on graphs;

• compute basic statistics in a summary table.

Pedagogical: Teachers will

• consider the advantages and disadvantages of dynamic linking capabilities and

different graphical representations in Fathom;

• consider how different graphical representations and measures of center and

spread can draw attention to similarities and differences when comparing data

sets;

• consider the benefits and drawbacks of tasks to assist students in reasoning about

data.

Prerequisites: Material discussed in Section 1 of this module

Vocabulary: univariate data, bivariate data, interquartile range, deviations, standard

deviation, resistant measures, and nonresistant measures.

Technology Files: 2006_Vehicles.ftm

Emergency Technology Files: 2006_Vehicles_Part_3.ftm

Required Materials: Fathom v.2

Section 3: Analyzing Data with Fathom

Data about an observed phenomenon comes in many different forms—often

frequencies, scores, codes, categories, or measurements. In addition, these different

forms of data can be represented in multiple ways. While viewing data in a table may

assist in examining individual cases, graphs and descriptive statistical measures may

help in analyzing and characterizing trends in the whole data set, or the aggregate.

Software tools have made the re-presentation of data in graphs and the calculation of

statistical measures quick and easy. Thus, rather than spending valuable time in

constructing graphical displays or computing measures, software tools facilitate quick

displays and computations that allow for more time to be spent on analyzing the data.

In Sections 1 and 2, we used the software TinkerPlots to assist in analysis of data. In

Sections 3 and 4, we will be using Fathom 2.0 (Key Curriculum Press, 2005).

TinkerPlots and Fathom use a similar interface to allow users to conduct data

analysis. TinkerPlots was designed to encourage users to create graphical displays by

implementing a series of actions, while Fathom allows users to easily create a variety

of standard graphical displays with fewer actions. While TinkerPlots has the

capability to display measures of center on a graph, Fathom includes a whole suite of

tools that can allow users to compute descriptive and inferential statistics. Thus,

Fathom is a much more powerful statistical tool, while TinkerPlots is a powerful tool

for analyzing data in graphical form. Like TinkerPlots, Fathom was created to allow

users to have dynamic control over data—meaning that as you change things in a

document, everything linked to what you are changing will update while you drag.

This linking between tabular data, graphical representations, and statistical measures

can be a powerful tool for exploring data in meaningful ways.

We will start this Section with exploring univariate data (a single attribute in a data

set) and will use what we learn with univariate data to explore bivariate data (two

attributes in a data set).

Part 1: Asking Questions from Data

Increases in gas prices over the past several years may be one contributing factor to

many automobile manufacturers’ focus on improving vehicle miles per gallon (mpg)

performance and development of alternative types of engines that use a combination

of electricity and gasoline. Many people in America have also revisited the type of

vehicle they own, especially families who have longer commutes to the workplace.

To help us become more informed about the variety of vehicles on the market today,

we have assembled a collection of 41 vehicles manufactured in 2006. Most of the

vehicles (30) were rated as the top fuel economy leaders in the most popular vehicle

classes. This data is shown in the 2006 Vehicle Data table below.

Although a typical cycle of data analysis starts with forming questions and then

collecting data to answer the question, textbooks and teachers often use pre-collected

data sets with their students to provide an immediate springboard for exploring a

phenomenon and to begin analyzing data. When students are presented with a given

data set, they need to learn how to examine the data and formulate specific questions

that can be answered knowing the various quantitative and qualitative variables

(called attributes in Fathom) available about each case.

FOCUS ON MATHEMATICS

M-Q1. Review the data in the table. Generate at least four different questions that you

could explore by analyzing this data set.

FOCUS ON PEDAGOGY

P-Q1. Describe two classroom situations, one for which it would be beneficial to use

a pre-collected set of data, and one for which students should be collecting data

themselves. Provide a rationale for the benefits in each situation.

2006 Vehicle Data

Mfr         Model              Class    Trans   City  Hwy  AnnFuel  Engine    Weight
Chevrolet   Cargo Van          Van      Auto    15    20   1940     Standard  4894
Chevrolet   Passenger Van      Van      Auto    15    19   1940     Standard  5295
Ford        Escape Fwd         Suv      Manual  24    29   1270     Standard  3180
Ford        Escape Hybrid Fwd  Suv      Auto    36    31   1000     Hybrid    3627
Ford        Focus Wagon        Wagon    Auto    26    32   1178     Standard  2775
Ford        Focus Wagon        Wagon    Manual  26    34   1138     Standard  2771
Ford        Ranger Pickup      Truck    Auto    21    26   1436     Standard  3028
Ford        Ranger Pickup      Truck    Manual  24    29   1270     Standard  3028
Gmc         Savana Cargo Van   Van      Auto    15    20   1940     Standard  4894
Gmc         Savana Passen Van  Van      Auto    15    19   1940     Standard  5295
Gmc         Sierra Hybrid 2wd  Truck    Auto    18    21   1736     Hybrid    5038
Gmc         Sierra Hybrid 4wd  Truck    Auto    17    19   1835     Hybrid    5357
Honda       Accord             Sedan    Auto    24    34   1178     Standard  3168
Honda       Accord Hybrid      Sedan    Auto    25    34   1176     Hybrid    3589
Honda       Civic Hybrid       Compact  Auto    49    51   660      Hybrid    2875
Honda       Insight            Compact  Auto    57    56   591      Hybrid    1881
Honda       Insight            Compact  Manual  60    66   525      Hybrid    1850
Honda       Odyssey            Minivan  Auto    20    28   1436     Standard  4475
Hyundai     Elantra            Sedan    Manual  27    34   1099     Standard  2784
Hyundai     Sonata             Sedan    Auto    24    33   1221     Standard  3266
Hyundai     Sonata             Sedan    Manual  24    34   1178     Standard  3253
Isuzu       Ascender 4wd       Suv      Auto    22    26   1338     Diesel    4954
Jeep        Liberty 4wd        Suv      Auto    22    26   1338     Diesel    4011
Lexus       Rx 330 4wd         Suv      Auto    18    24   1800     Standard  4065
Lexus       Rx 400h 4wd        Suv      Auto    31    27   1138     Hybrid    4365
Mazda       B2300 2wd          Truck    Auto    21    26   1436     Standard  2994
Mazda       B2300 2wd          Truck    Manual  24    29   1270     Standard  2994
Mazda       Tribute 2wd        Suv      Manual  24    29   1270     Standard  3192
Merc-Benz   E320 Cdi           Sedan    Auto    27    37   1024     Diesel    3835
Mini        Mini Cooper        Compact  Auto    26    34   1242     Standard  2557
Mini        Mini Cooper        Compact  Manual  28    36   1242     Standard  2425
Pontiac     Vibe               Wagon    Manual  30    36   1000     Standard  2700
Saturn      Ion                Compact  Manual  37    44   769      Diesel    2752
Suzuki      Aerio Awd          Compact  Auto    35    42   809      Diesel    2859
Toyota      Corolla Matrix     Wagon    Manual  30    36   1000     Standard  2679
Toyota      Prius              Sedan    Auto    60    51   601      Hybrid    2890
Toyota      Scion Xb           Wagon    Auto    30    34   1066     Standard  2470
Toyota      Tacoma 2wd         Truck    Auto    21    26   1436     Standard  3180
Volkswagen  Golf               Compact  Manual  37    44   769      Diesel    2972
Volkswagen  New Beetle         Compact  Auto    35    42   809      Diesel    2965
Volkswagen  New Beetle         Compact  Manual  37    44   769      Diesel    2884

Mfr: Manufacturer
Model: Model name
Class: Vehicle classes used to classify by passenger and cargo volume (cars) and gross vehicle weight rating (trucks)
Trans: Either Automatic or Manual transmission
City: Estimated MPG in city driving
Hwy: Estimated MPG in highway driving
AnnFuel: Estimated annual fuel cost assuming 15,000 miles per year (55% city and 45% hwy) and average fuel price
Engine: Standard (accepts unleaded gas), Diesel (accepts diesel), or Hybrid (runs part on electricity and part on unleaded fuel)
Weight: Weight of vehicle, including standard equipment and all fluids, but no passengers, cargo, or optional equipment

Data retrieved from 2006 Fuel Economy Guide
http://www.fueleconomy.gov/feg/download.shtml
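
The AnnFuel definition can be made concrete with a short calculation. The Python sketch below is not part of the Fathom activity. It assumes that annual fuel use is computed from the listed City and Hwy ratings using the stated 15,000 miles per year, split 55% city and 45% highway. The function names and the price_per_gallon parameter are hypothetical, and the source does not state the average fuel price that was used, so no price is filled in.

```python
def annual_gallons(city_mpg, hwy_mpg, miles=15_000, city_share=0.55):
    """Gallons used per year under the driving mix stated in the legend."""
    city_miles = miles * city_share
    hwy_miles = miles * (1 - city_share)
    return city_miles / city_mpg + hwy_miles / hwy_mpg


def annual_fuel_cost(city_mpg, hwy_mpg, price_per_gallon):
    """Annual cost; price_per_gallon is a hypothetical input, not from the source."""
    return annual_gallons(city_mpg, hwy_mpg) * price_per_gallon


# Chevrolet Cargo Van: City 15 mpg, Hwy 20 mpg
gallons = annual_gallons(15, 20)
print(f"Estimated annual fuel use: {gallons:.1f} gallons")
```

Multiplying the gallons by the average fuel price would give an AnnFuel estimate. The values in the table may differ slightly from such an estimate if the guide worked from unrounded mpg ratings.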

Part 2: Examining Univariate Distributions

To explore the vehicle data using Fathom, open the 2006_Vehicles.ftm file. When you open the file, you should see one icon: the collection icon, labeled 2006 Vehicles.

Double-clicking on the collection icon opens the inspect collection window, which provides a view of the values of the attributes for each case (shown in Figure 3.1). The name of each attribute in the data set is listed in pink, with one attribute per row. The inspection window contains 41 data cards, one for each of the cases in the data set. The data cards are useful for examining each individual case. However, to do analysis on the whole data set, it is helpful to view the data set in a table.

To view a collection of data as a table:

1. click on the Collection icon to select the 2006 Vehicles collection.

2. From the object shelf, drag and drop a New Case Table into the document.

3. Click and drag a corner of the case table to resize it.

Figure 3.1

Figure 3.2

Figure 3.3 (case table for the 2006 Vehicles collection, showing the first 14 cases with the attributes Mfr, Model, Class, Trans, City, Hwy, AnnFuel, Engine, and Weight)

Tech Tip:
Different cases can be viewed in the inspection window by clicking the right arrow in the bottom left corner of the window. The number 41 indicates that there are a total of 41 cases in the collection.

Tech Tip:
If the Case Table does not show the data, drag and drop the name of the collection onto the body of the case table.

The first question we are going to examine about the 2006 vehicle data set is,

“How do these automobiles typically perform in their gas mileage when driving in

the city?”

In order to answer this question, we need a measurable attribute of the automobiles

that can be used to characterize performance in gas mileage when driving in the city.

The attribute that provides a measure of this characteristic is City, which gives the

estimated mpg reported by the US Environmental Protection Agency based on their

lab testing. When asking questions about a phenomenon, students may have difficulty

determining how to collect a measurable attribute that can be used to answer the

question. This same difficulty can occur when students have access to a pre-collected

data set and want to ask questions about the phenomenon. They may ask questions for

which no quantitative or qualitative attribute in the data is helpful in answering.

To answer our question, it would be useful to view the distribution of the City mpg

graphically. To construct graphs in Fathom, a user must place an attribute on a given

axis. This action will populate the graph with the data associated with this attribute.

The purposeful placement of an attribute onto an axis can help students connect the

numerical data to the graphical representation. The default graph in Fathom is a dot

plot.

To view data graphically,

1. click and drag the Graph object from the object shelf. The graph will be blank.

2. Click and drag the attribute label (City) in the Case Table and drop it onto the x-axis in the graph where it reads “Drop an attribute here”.

We currently have three representations of our data set: 1) Collection (shown as cards

in the inspection window), 2) case table, and 3) a dot plot. These representations of

data are linked together. This allows a user to locate a case across multiple

representations. In addition, changes in data in one representation will be

automatically changed in all representations of the data.

To change a data value,

1. from the case table, click on the row number for a case (e.g. to choose the

Ford Ranger Pickup, click on the number 7 to highlight that case row).

2. To change the data value graphically, click on the red data icon and drag it to

the left or right. Notice the change in the corresponding numerical value in the

table.

Tech Tip:
You can change the scale of the axis by clicking and dragging the axis. When the hand is vertical, this will translate the axis. When the hand is horizontal, dragging will dilate the scale.

Tech Tip:
You can undo a few changes by selecting the Undo command (ctrl-z) from the Edit menu.

Figure 3.4

Since the 2006 vehicle data should be a fixed data set, we need to revert the data to its

original values. In Fathom, a data icon in a graph can be dragged to change its value;

however, it is possible to prevent a user from changing the data value. In the case of

the 2006 vehicle data, this would be wise.

To revert a collection,

1. select the 2006 Vehicles collection object.

2. From the File menu, choose Revert Collection.

To prevent changes in a collection by dragging data icons,

1. select any of the open objects (e.g., Collection, Table, Graph) in the

workspace,

2. Under the Collection menu, choose Prevent Changing Values in Graphs.

Although we want to keep the data set fixed, we can still take advantage of the linked

capabilities between the case table and the graph to answer a few questions about the

vehicles' performance for City mpg. The linking of these representations allows

students to explore individual cases while also considering the case with the entire

aggregate. Since many students initially are interested in and focus on individual

cases, it can be helpful to ask questions about individual cases that also allow students

to consider the relative position of these cases to the aggregate.

FOCUS ON MATHEMATICS

M-Q2. By clicking on the data icons on the graph, find which vehicles are at the low

and high ends of the distribution.

M-Q3. The Volkswagen New Beetle with Automatic transmission is a trendy favorite

for many Americans. By clicking on the case row for this vehicle in the case table,

use the graph to describe the New Beetle’s standing in City mpg relative to the other

vehicles.

M-Q4. There appears to be a cluster of 4 vehicles with a City mpg above 45. Clicking

and dragging a selection box around those data icons will highlight the vehicles in the

case table. Examine these 4 cases carefully. List two or three attributes these vehicles

have in common.

FOCUS ON PEDAGOGY

P-Q2. What are the advantages and disadvantages of having the representations

dynamically linked when working with a data set?

P-Q3. The linking of multiple representations in software like Fathom allows one to

simultaneously view the distribution of an entire data set while focusing on individual

cases. How might this feature help or hinder students’ analysis of the data?

Two other graphical representations often used to display quantitative attributes of

univariate data are histograms and box plots (also called box-and-whisker plots).

Viewing the data in these different representations may illuminate or obscure

different aspects of the distribution.

Drag down two more empty Graph objects into the workspace and drag and

drop the City attribute onto the x-axis of each graph. To assist in comparing the

three different representations, we are going to change one graph to be a box plot and

one to be a histogram.

To create a box plot:

1. from the drop down menu in the top right corner of the graph window, select the Box Plot option.

To create a histogram:

1. from the drop down menu in the top right corner of the graph window, select the Histogram option.

To adjust the bin width in a histogram:

1. point to a vertical boundary for one bar in the histogram. The cursor will change to a double arrowed line.

2. Either click and drag to adjust the bin width dynamically, or double click and enter a value for the binAlignment and binWidth (see Figure 3.6; in our example, we can start the first bin at 15 and use a width of 5).
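
To see what the binAlignment and binWidth settings do to the data, the bin counts can be reproduced outside Fathom. The Python sketch below is only an illustration: the City values are typed in from the 2006 Vehicle Data table, the function name bin_counts is hypothetical, and it assumes each bin includes its left boundary and excludes its right boundary, which may not match Fathom's handling of values that fall exactly on a boundary.

```python
from collections import Counter

# City mpg for the 41 vehicles, in the order they appear in the data table
city = [15, 15, 24, 36, 26, 26, 21, 24, 15, 15, 18, 17, 24, 25,
        49, 57, 60, 20, 27, 24, 24, 22, 22, 18, 31, 21, 24, 24,
        27, 26, 28, 30, 37, 35, 30, 60, 30, 21, 37, 35, 37]


def bin_counts(values, bin_alignment, bin_width):
    """Count values per bin, treating each bin as [left, left + bin_width)."""
    counts = Counter()
    for v in values:
        index = (v - bin_alignment) // bin_width
        left = bin_alignment + index * bin_width
        counts[left] += 1
    return dict(sorted(counts.items()))


counts = bin_counts(city, bin_alignment=15, bin_width=5)
for left, n in counts.items():
    print(f"[{left}, {left + 5}): {n} vehicles")
```

Bins with no vehicles are simply absent from the result. Rerunning the function with a different bin_width shows how the shape of the histogram depends on that choice.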

The distribution of City mpg is shown in Figure 3.7 as a dot plot, box plot, and

histogram. If you click on a case or select a range of cases in any one of the graphs, the

corresponding cases will also be highlighted in the other graphs.

Figure 3.5

Figure 3.6

Figure 3.7

FOCUS ON MATHEMATICS

M-Q5. Compare the representation of the City data in the three graphs in Figure 3.7.

What characteristics of the distribution are more noticeable or are hidden in each

representation?

M-Q6. By only examining the graphs, what would you characterize as a typical City

mpg for these automobiles?

FOCUS ON PEDAGOGY

P-Q4. How can examining a distribution using three different linked graphical

representations be a help or hindrance for students?

P-Q5. How could students use the box plot to describe the center and spread of the

City mpg?

P-Q6. Describe how you could help students understand why the median is not

located in the center of the middle 50% of the data.

Although the median is displayed in the box plot, it may be helpful to display the

location of the median and mean on the graphs. Overlaying a statistical measure on a

graphical representation can provide students with a visual way of conceptualizing

the location of the measure in relationship to the entire aggregate. This can help

students understand better how the value of the measure represents the entire data set

and how its location is related to the distribution of data values.

To add a vertical line representing a measure to a graph:

1. with the graph window selected, choose the Graph menu and select the Plot Value option.

2. A formula editor window will appear. In the textbox to the right of “Value=”, type in the function to compute the statistical measure. For our example, we will want to use mean(City) and median(City).

You can add the mean and median measures to each of the three graphs. Figure 3.9

displays both measures overlaid on the dot plot.
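
The two plotted values can also be checked by hand or with a few lines of code. The Python sketch below is an illustration rather than part of the Fathom activity: the City values are typed in from the 2006 Vehicle Data table, and Python's statistics module stands in for Fathom's mean(City) and median(City) formulas.

```python
from statistics import mean, median

# City mpg for the 41 vehicles, in the order they appear in the data table
city = [15, 15, 24, 36, 26, 26, 21, 24, 15, 15, 18, 17, 24, 25,
        49, 57, 60, 20, 27, 24, 24, 22, 22, 18, 31, 21, 24, 24,
        27, 26, 28, 30, 37, 35, 30, 60, 30, 21, 37, 35, 37]

city_mean = mean(city)      # nonresistant measure of center
city_median = median(city)  # resistant measure of center

print(f"mean(City)   = {city_mean:.2f}")
print(f"median(City) = {city_median}")
```

Comparing the two printed values with the vertical lines on the graph gives a quick check that the formulas were entered correctly.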

Figure 3.9

FOCUS ON MATHEMATICS

M-Q7. Does either measure of center, the mean or the median, best represent a typical

City mpg for these automobiles? Defend your choice or provide an alternative way of

representing the typical City mpg.

Tech Tip:
When typing formulas in the Formula Editor, if Fathom recognizes the function, the text turns blue. If the name of an attribute is recognized as one in the data set, the text turns pink.

Figure 3.8

Part 3: Comparing Distributions Using Center and Spread¹

Thus far, we have explored the City mpg for the entire aggregate of vehicles. It is

obvious from our analysis that some types of vehicles may have better City mpg than

others. In particular, we previously noticed that the four cases considered as outliers

were all Hybrid engines. Our data set contains vehicles of three different Engine

types: Standard, Diesel, and Hybrid. When students make an observation like this

about a data set, it often prompts them to explore a new question. This is an important

feature of exploratory data analysis (EDA): analysis of data leads to more questions, which leads to further

analysis. Consider the following question:

Which type of engine gives vehicles the best fuel economy in the city?

To examine this question, we need to use two attributes in the data set: City mpg and

Engine type. We now have a question that needs us to use bivariate data with one

quantitative attribute (City) and one qualitative attribute (Engine). Having students

examine one quantitative and one qualitative attribute together in a data set can

provide a transition into working with bivariate data (two attributes) to answer a

question.

One way to begin examining the data with attention to the two attributes is to overlay

the qualitative attribute on top of the dot plot of the distribution of the City mpg. This

action will recolor the icons according to the categories of the qualitative attribute and

display a legend explaining the coloring.

To overlay a legend attribute on a graph:

1. click and drag the name of an attribute from the case table and point to the interior of the plot window. Directions will appear as shown in Figure 3.10. You only need to use the Shift or Ctrl keys if it is not clear which type of attribute you are dragging, or if you want to purposely use an attribute in a specific way (e.g., if the categories of a qualitative attribute have been entered using numeric codes such as 1, 2, 3, you may have to use the Shift key to force Fathom to recognize the data as categorical).

2. Release the mouse and notice the appearance of the legend and that different shapes and colors are represented (see Figure 3.11). If the legend attribute is qualitative, shapes and colors will be used; if the attribute is quantitative, a color gradient will appear (we will explore this in a later section).

Figure 3.10

¹ The technology file “2006_Vehicles_Part_3.ftm” is available for students to use for Part 3 if they were unable to complete Part 2 with the technology.

Figure 3.11

FOCUS ON MATHEMATICS

M-Q8. Viewing Figure 3.11, what can you say about the City mpg for each of the

three Engine types?

FOCUS ON PEDAGOGY

P-Q7. How can overlaying a categorical (qualitative) attribute on a dot plot of a

numerical (quantitative) attribute influence students’ ability to examine data?

The graph in Figure 3.11 is a good way for students to begin to coordinate two

attributes in a data set, and thus is a first step in learning to conduct bivariate data

analysis where one variable is quantitative and the other is qualitative. In Fathom,

students can also place the qualitative attribute on the y-axis and separate the data into

distinct categories. In our example, we can drag and drop the attribute Engine

onto the y-axis. This will allow us to view the distribution of City mpg for each

engine type separately (Figure 3.12).
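
Separating the City values by Engine amounts to grouping the cases by the qualitative attribute and then summarizing each group. The Python sketch below illustrates that idea outside Fathom. The (Engine, City) pairs are typed in from the 2006 Vehicle Data table, and the variable names are hypothetical.

```python
from statistics import mean, median

# (Engine, City mpg) for the 41 vehicles, in the order of the data table
vehicles = [
    ("Standard", 15), ("Standard", 15), ("Standard", 24), ("Hybrid", 36),
    ("Standard", 26), ("Standard", 26), ("Standard", 21), ("Standard", 24),
    ("Standard", 15), ("Standard", 15), ("Hybrid", 18), ("Hybrid", 17),
    ("Standard", 24), ("Hybrid", 25), ("Hybrid", 49), ("Hybrid", 57),
    ("Hybrid", 60), ("Standard", 20), ("Standard", 27), ("Standard", 24),
    ("Standard", 24), ("Diesel", 22), ("Diesel", 22), ("Standard", 18),
    ("Hybrid", 31), ("Standard", 21), ("Standard", 24), ("Standard", 24),
    ("Diesel", 27), ("Standard", 26), ("Standard", 28), ("Standard", 30),
    ("Diesel", 37), ("Diesel", 35), ("Standard", 30), ("Hybrid", 60),
    ("Standard", 30), ("Standard", 21), ("Diesel", 37), ("Diesel", 35),
    ("Diesel", 37),
]

# Group the quantitative attribute (City) by the qualitative one (Engine)
by_engine = {}
for engine, city in vehicles:
    by_engine.setdefault(engine, []).append(city)

# Summarize each group with its size, mean, and median
summary = {e: (len(v), mean(v), median(v)) for e, v in by_engine.items()}
for engine, (n, avg, med) in summary.items():
    print(f"{engine:<8} n={n:2d}  mean={avg:5.2f}  median={med}")
```

Each row of the printed summary corresponds to one of the separate distributions in the split dot plot, so the numbers can be read alongside the graph.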

Figure 3.12

FOCUS ON MATHEMATICS

M-Q9. What similarities and differences do you notice about the distributions of City

mpg for each of the Engine types?

M-Q10. Examine the location of the mean and median in the three distributions.

Explain the relative location of the mean and median to each other in the three

distributions.

Although dot plots are useful, changing the graphical representation to another form

may highlight different aspects of the distribution. Change the graphical display

from a dot plot to a box plot (see Figure 3.13).

Figure 3.13

Tech Tip: To remove a legend attribute from a graph, click on the plot window and, from the Graph menu, select Remove Legend Attribute.


FOCUS ON MATHEMATICS

M-Q11. What characteristics of the distributions beyond the measures of center are

highlighted when viewed as box plots?

FOCUS ON PEDAGOGY

P-Q8. How can examining the statistical measures of mean and median along with

the dot plot or box plot display of the distribution for each engine type assist students

in reasoning about center and spread when comparing the three groups?

P-Q9. How could you use the data to help students understand why in each of the

three box plots in Figure 3.13 the whiskers are not the same length?

In addition to comparing distributions graphically and displaying measures on a

graph, it is also helpful to use technology to compute and display the exact values of

several statistical measures. A summary table is useful in computing these statistics.

To create a Summary Table with several statistical measures,

1. Drag down an empty summary object.

2. Click and drag a quantitative attribute

(City mpg) to the summary table. Once

the cursor is over the summary table, a

down arrow and a right arrow appear.

Drop the quantitative attribute below the

down arrow.

3. By default, the measure computed and

displayed is the mean. There are three ways to

add more measures. From the Summary

menu, you could select Add Formula, Add

Basic Statistics, or Add Five-Number

Summary. For our example, choose Add Five-

Number Summary. You will likely have to

resize the Summary table window.

4. You can also add a qualitative attribute to the

Summary table to recompute the statistics for

each separate category. In our example, we

want to drag and drop the attribute Engine next to

the right arrow. Again, you will likely have to

resize the window to view the statistical

measures for each category.

Figure 3.14

Figure 3.15
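
For readers who want to check a Summary Table by hand, the steps above can be mirrored in a short Python sketch. The records below are made-up placeholder values, not data from 2006_Vehicles.ftm, and the labels "Type A", "Type B", and "Type C" are hypothetical stand-ins for the three Engine types. Quartile conventions differ between tools, so the quartiles computed here (using the "inclusive" method of Python's statistics module) may not match Fathom's values exactly.

```python
from statistics import mean, quantiles

# Hypothetical (made-up) records: (Engine, City mpg).
# These are NOT values from the 2006 vehicle data set.
vehicles = [
    ("Type A", 18), ("Type A", 21), ("Type A", 24), ("Type A", 19), ("Type A", 27),
    ("Type B", 33), ("Type B", 48), ("Type B", 60), ("Type B", 36),
    ("Type C", 22), ("Type C", 30), ("Type C", 36), ("Type C", 27),
]


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    ordered = sorted(values)
    # Assumption: the "inclusive" quartile method; other tools may differ.
    q1, med, q3 = quantiles(ordered, n=4, method="inclusive")
    return (ordered[0], q1, med, q3, ordered[-1])


# Split the quantitative attribute by the qualitative attribute,
# as dropping Engine next to the right arrow does in the Summary Table.
by_engine = {}
for engine, city_mpg in vehicles:
    by_engine.setdefault(engine, []).append(city_mpg)

summary = {
    engine: {"mean": mean(values), "five_number": five_number_summary(values)}
    for engine, values in by_engine.items()
}

for engine, stats in summary.items():
    print(engine, stats)
```

Each entry in summary holds the mean and the five-number summary for one category, which corresponds to one column of the Summary Table after the qualitative attribute has been added.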


Figure 3.16

Now we have two powerful tools to help us analyze and compare the distributions of

City mpg for the different Engine types. We can change the graphical display to show

dot plots, box plots or histograms or use the Summary Table to compute additional

statistical measures.

FOCUS ON MATHEMATICS

M-Q12. Use the graphical displays and the statistical measures to compare the

distributions of the City mpg for the three Engine types. Which type of engine gives

vehicles the best fuel economy in the city? Justify your reasoning.

FOCUS ON PEDAGOGY

P-Q10. What are some of the key features of this vehicle data set that make it useful

in helping students attend to important ideas of center and spread when comparing

data sets?

Asking students to compare distributions has been shown to be a useful technique for

helping students transition from considering data as individual cases to paying

attention to data as an aggregate. In addition, tasks that ask students to compare

distributions can help them consider characteristics such as shape and spread as useful

complements to measures of center.


Part 4: Understanding Spread of a Distribution

When representing data in a box plot, students can focus on the median as a measure

of center and the interquartile range (IQR) as a measure of the spread of the middle 50% of the

data, represented as the “box”. Thus, the IQR can help describe the spread of a data

set and is useful to consider in concert with the median as a measure of center.

When we use means to compare centers, it does not make sense to use

interquartile ranges, which are computed using the medians, to analyze spread.

Rather, a different measure of spread, the standard deviation, is often used. This

measure of spread takes into consideration how each data point deviates from the

mean.

Consider the diagram in Figure

3.17. There are five data points

shown with values {3, 5, 11, 12,

14}. The vertical red line

represents the location of the

mean, which has a value of 9.

From each data point, there is a

horizontal black line from that

point to the mean, representing

how much the value of that point

deviates from the mean. There are

five values for the deviations {-6,

-4, +2, +3, +5}. Notice that the

sum of the deviations from the

mean is zero.

The standard deviation is a way of

describing how the data points

typically deviate from the mean.

However, since some of the

deviation values are positive while

others are negative, it is not helpful

to simply find the sum or the mean

of these deviations. One method

that can be used to eliminate the

negative deviations is to square

each deviation. Once deviations

from the mean are squared, their

sum will no longer be zero. The squared deviations are represented as the area of the

gray squares in the diagram with values {36, 16, 4, 9, and 25}.
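
The arithmetic behind Figure 3.17 can be checked with a few lines of Python. This is only an illustrative sketch using the five data values from the figure; the variable names are our own.

```python
# The five data values shown in Figure 3.17.
data = [3, 5, 11, 12, 14]

# The mean marks the location of the vertical line in the diagram.
mean_value = sum(data) / len(data)

# Each deviation is the signed distance from a data point to the mean.
deviations = [x - mean_value for x in data]

# Squaring removes the signs; each value is the area of one gray square.
squared_deviations = [d ** 2 for d in deviations]

print("mean:", mean_value)
print("deviations:", deviations)
print("sum of deviations:", sum(deviations))
print("squared deviations:", squared_deviations)
print("sum of squared deviations:", sum(squared_deviations))
```

The deviations always sum to zero, which is why they must be squared (or otherwise made nonnegative) before they can be combined into a measure of spread.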

Pedagogy Tip: A detailed discussion of the IQR can be found in Section 1, Part 4.

Figure 3.17


Two common measures that are used for describing the spread or dispersion of data

around the mean are variance and standard deviation, both of which are based on

the mean of the squared deviations. The variance is the mean of the squared

deviations and can be found by dividing the sum of the squared deviations by n (if

you are working with the entire population) or n-1 (if you are working with a

sample; see footnote 2). In order to have a measure of spread that is on the same scale as the original

data, we can take the square root of this mean. This will standardize the measure,

resulting in the measure called the standard deviation. By default, Fathom will

compute standard deviations and variances based on a sample. However, there are

formulas in Fathom that can be used to compute these measures based on a

population if so desired.
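
To make the difference between the two divisors concrete, the following sketch computes both versions of the variance and standard deviation for the five values from Figure 3.17. It uses Python's standard statistics module rather than Fathom, so it illustrates the arithmetic only and says nothing about Fathom's own formula names.

```python
import math
from statistics import pstdev, pvariance, stdev, variance

# The five data values shown in Figure 3.17 (mean 9, squared deviations sum to 90).
data = [3, 5, 11, 12, 14]

# Divide by n: treats the data as the entire population.
population_variance = pvariance(data)   # 90 / 5
population_sd = pstdev(data)            # square root of the population variance

# Divide by n - 1: treats the data as a sample.
sample_variance = variance(data)        # 90 / 4
sample_sd = stdev(data)                 # square root of the sample variance

print(f"population variance: {population_variance:.2f}")
print(f"population standard deviation: {population_sd:.4f}")
print(f"sample variance: {sample_variance:.2f}")
print(f"sample standard deviation: {sample_sd:.4f}")

# The standard deviation is the square root of the variance in both cases.
assert math.isclose(population_sd, math.sqrt(population_variance))
assert math.isclose(sample_sd, math.sqrt(sample_variance))
```

Because the sample version divides by the smaller number n-1, it is always slightly larger than the population version for the same data, and the gap shrinks as n grows.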

The median and interquartile range are considered resistant measures because they

are based on ranks in data and not numerical values. Therefore, they are not strongly

influenced by outliers. The mean and the standard deviation are considered

nonresistant measures because they are based on numerical values of each data

point. Therefore, a numerical value well outside of the range of most of the data will

affect each of these measures.
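
A small experiment can make the distinction between resistant and nonresistant measures visible. The sketch below starts with the five values from Figure 3.17 and then replaces the largest value with a hypothetical outlier of 140 (our own choice, not a value from the vehicle data). The quartiles use the "inclusive" method of Python's statistics module, which is an assumption; other tools may compute quartiles slightly differently.

```python
from statistics import mean, median, quantiles, stdev


def iqr(values):
    """Interquartile range, Q3 - Q1, using the 'inclusive' quartile method."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


original = [3, 5, 11, 12, 14]
# Hypothetical change: the largest value becomes an extreme outlier.
with_outlier = [3, 5, 11, 12, 140]

for label, values in [("original", original), ("with outlier", with_outlier)]:
    print(
        f"{label:>12}: median={median(values)}, IQR={iqr(values)}, "
        f"mean={mean(values):.1f}, sd={stdev(values):.1f}"
    )
```

In this particular example the median and IQR do not change at all, while the mean and standard deviation both increase substantially. With other data sets the resistant measures may shift a little, but typically far less than the nonresistant ones.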

FOCUS ON MATHEMATICS

M-Q13. What does the magnitude of the standard deviation tell you about the

dispersion of the data points in relationship to the mean?

M-Q14. Consider the following formulas for computing the variance ($s^2$) and

standard deviation ($s$) for data in a sample of size $n$, where $\bar{x}$ represents the mean and

$x_i$ is the $i$th data value.

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$$

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$$

Explain what each part of the formula represents with respect to the diagram in Figure

3.17 and the explanation above.

M-Q15. Explain why the 2006 Vehicle data are considered a sample rather than a

population.

Footnote 2: When finding the variance and standard deviation of a population, we divide by n. However, most

data sets are a sample of the population. If we compute the variance for a sample in the same way that

we compute the variance of a population, we will have a biased estimator of the population variance.

That is, if we took all possible samples of n members and calculated the variance by dividing by n and

took the mean of those variances, this value would not be equal to the true value of the population

variance. Fortunately the correction for this bias is remarkably simple. To correct for this bias, we

divide by n-1 rather than n when we have a sample.
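
The claim in this footnote can be checked by brute force on a tiny example. The sketch below treats the five values from Figure 3.17 as a complete population, lists every possible sample of size 2, and averages the sample variances computed each way. It assumes samples are drawn with replacement, so that each draw is independent; the footnote does not specify the sampling scheme, so this is our assumption, and the sample size of 2 is chosen only to keep the enumeration small.

```python
from itertools import product

# Treat the five values from Figure 3.17 as an entire population.
population = [3, 5, 11, 12, 14]
mu = sum(population) / len(population)
population_variance = sum((x - mu) ** 2 for x in population) / len(population)


def sum_of_squared_deviations(sample):
    """Sum of squared deviations of a sample from its own mean."""
    m = sum(sample) / len(sample)
    return sum((x - m) ** 2 for x in sample)


n = 2  # sample size (assumption: small, so all samples can be listed)
# All ordered samples of size n drawn WITH replacement (5^2 = 25 samples).
samples = list(product(population, repeat=n))

mean_var_divide_by_n = sum(
    sum_of_squared_deviations(s) / n for s in samples
) / len(samples)
mean_var_divide_by_n_minus_1 = sum(
    sum_of_squared_deviations(s) / (n - 1) for s in samples
) / len(samples)

print("population variance:", population_variance)
print("average sample variance, dividing by n:", mean_var_divide_by_n)
print("average sample variance, dividing by n-1:", mean_var_divide_by_n_minus_1)
```

Dividing by n underestimates the population variance on average, while dividing by n-1 recovers it exactly under these assumptions.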


M-Q16. Consider the distributions and location of the mean City mpg for each of the

three Engine types. Which engine type do you predict will have the largest standard

deviation? The smallest? Explain your reasoning based on how the data values

deviate from the mean for each Engine type.

M-Q17. Use a summary table to find the value of the standard deviation of the City

mpg for each of the three Engine types. What do these values tell you about the

spread of the City mpg? Do the calculations match your predictions?

FOCUS ON PEDAGOGY

P-Q11. Students are often introduced to the standard deviation through instruction on

how to compute its value based on the formulas shown in M-Q14. What is the benefit

of using a diagram such as the one in Figure 3.17 to help students conceptualize

standard deviation as a measure that describes typical deviation from the mean?

P-Q12. What are the advantages or drawbacks of having students examine several

distributions with the means indicated as in M-Q16 and asking them to predict the

magnitude of the standard deviation before using Fathom to compute the exact values?


SUGGESTED ASSIGNMENTS

H-Q1 (Mathematical)

Use Fathom to create graphical displays and compute statistical measures to compare

the distributions of the Highway mpg for the three Engine types. Which type of

engine gives vehicles the best fuel economy on the highway? Justify your reasoning.

H-Q2 (Mathematical and Pedagogical)

The mean absolute deviation is often introduced in middle school as an introductory

measure of spread. While the mean absolute deviation is easy to compute, the

behavior of the absolute value function makes it a more difficult measure to use when

conducting more complex statistical analyses, and it is therefore infrequently used in

high school and college. Instead of using squaring as a method to eliminate the

negative deviations, the mean absolute deviation is computed by finding the

absolute value of each deviation from the mean and then finding the mean of these

values. Consider the collection of 9 cases with a mean of 5 shown in the table and dot

plot below.

a) What is the value of the mean absolute deviation (MAD) for this data

set?

b) What does the value of the MAD indicate about the spread of the data?

c) How would you need to change the values in the data set so that the

mean remains 5 but the MAD increases to 24/9?

d) Describe the benefits and drawbacks of using the mean absolute

deviation and the benefits and drawbacks of using the standard

deviation with middle and/or high school mathematics students.
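
As a point of comparison for H-Q2, the computation of the mean absolute deviation described above can be sketched in Python. The example deliberately uses the five values from Figure 3.17 rather than the 9 cases in the assignment, and the function name is our own.

```python
def mean_absolute_deviation(values):
    """Mean of the absolute deviations of each value from the mean."""
    m = sum(values) / len(values)
    return sum(abs(x - m) for x in values) / len(values)


# The five data values from Figure 3.17 (mean 9).
data = [3, 5, 11, 12, 14]

# Absolute deviations are 6, 4, 2, 3, and 5, so the MAD is 20 / 5.
mad = mean_absolute_deviation(data)
print("mean absolute deviation:", mad)
```

The only difference from the variance computation is that each deviation is made nonnegative with an absolute value instead of a square, so no square root is needed to return to the original scale.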


H-Q3 (Pedagogical)

Compare the pedagogical benefits and drawbacks of using Fathom and TinkerPlots to

explore univariate data with respect to the following points:

• The organization of data in a collection

• The linking of representations

• The representations available and the construction of graphs

• Use of color

• The ability to display measures on a graph

• Calculation of measures

H-Q4 (Pedagogical)

When is it advantageous to use the median and interquartile range as summary

measures? Mean and standard deviation? When examining a distribution, how can

you assist students in deciding if resistant or nonresistant measures are appropriate?