Teaching Data Analysis as an Investigative Process with Census at School Rebecca Nichols and Martha Aliaga American Statistical Association


US Census at School Program

Free international classroom project that engages students in grades 4-12 in statistical problem solving

Students complete an online survey, analyze their class census data, and compare their class results with random samples of participating students in the United States and other countries.

The project began in the United Kingdom in 2000 and includes Australia, Canada, New Zealand, South Africa, Ireland, Japan, and now the United States.

Teach statistical concepts in the Common Core Standards, measurement, graphing, data analysis, and statistical problem solving in the context of students’ own data and data from their peers in the participating countries

www.amstat.org/censusatschool

Statistical Problem Solving

Guidelines for Assessment and Instruction in Statistics Education (GAISE) Report: A Pre-K–12 Curriculum Framework

1. Formulate Questions
• Clarify the problem at hand
• Formulate one (or more) questions that can be answered with data

2. Collect Data
• Design a plan to collect appropriate data
• Employ the plan to collect the data

3. Analyze Data
• Select appropriate graphical and numerical methods
• Use the methods to analyze the data

4. Interpret Results
• Interpret the analysis (in context)
• Relate the interpretation to the original question

Source: www.amstat.org/education/gaise


Common Core State Standards for Mathematics

Source: corestandards.org

Measurement & Data – Grades 4 & 5

Measurement and Data Strand – Common Core State Standards

Grade 4 (4.MD)

• Solve problems involving measurement and conversion of measurements from a larger unit to a smaller unit.

• Represent and interpret data.

Grade 5 (5.MD) – Measurement and Data

• Convert like measurement units within a given measurement system.

• Represent and interpret data.

The U.S. Census at School questionnaire includes measurement questions (measuring height, arm span, and foot length in centimeters, finger length in millimeters, etc.) and opportunities to represent and interpret real student data.

Source: corestandards.org
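The unit conversions named in the Grade 4 and Grade 5 standards can be practiced on questionnaire-style measurements. The following is a minimal sketch; the measurement values and function names are hypothetical and are not taken from the Census at School questionnaire or data.

```python
# Hypothetical measurements in the units the questionnaire uses.
height_cm = 152          # height in centimeters
finger_length_mm = 68    # finger length in millimeters

MM_PER_CM = 10
CM_PER_M = 100

def cm_to_mm(cm):
    """Larger unit to smaller unit: multiply by 10."""
    return cm * MM_PER_CM

def mm_to_cm(mm):
    """Smaller unit to larger unit: divide by 10."""
    return mm / MM_PER_CM

def cm_to_m(cm):
    """Centimeters to meters: divide by 100."""
    return cm / CM_PER_M

print("height in mm:", cm_to_mm(height_cm))
print("finger length in cm:", mm_to_cm(finger_length_mm))
print("height in m:", cm_to_m(height_cm))
```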

Statistics & Probability – Grade 6 (6.SP)

Develop understanding of statistical variability

1. Recognize a statistical question as one that anticipates variability in the data related to the question and accounts for it in the answers. For example, “How old am I?” is not a statistical question, but “How old are the students in my school?” is a statistical question because one anticipates variability in students’ ages.

2. Understand that a set of data collected to answer a statistical question has a distribution which can be described by its center, spread, and overall shape.

3. Recognize that a measure of center for a numerical data set summarizes all of its values with a single number, while a measure of variation describes how its values vary with a single number.

Source: corestandards.org


Statistics & Probability – Grade 6 (6.SP)

Summarize and describe distributions

4. Display numerical data in plots on a number line, including dot plots, histograms, and box plots.

5. Summarize numerical data sets in relation to their context, such as by:

• Reporting the number of observations.

• Describing the nature of the attribute under investigation, including how it was measured and its units of measurement.

• Giving quantitative measures of center (median and/or mean) and variability (interquartile range and/or mean absolute deviation), as well as describing any overall pattern and any striking deviations from the overall pattern with reference to the context in which the data were gathered.

• Relating the choice of measures of center and variability to the shape of the data distribution and the context in which the data were gathered.

Source: corestandards.org
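The summaries listed in standard 5 can be computed with Python's standard library. This is a sketch only: the heights are made-up values, not Census at School data, and the quartile method ("inclusive") is one of several conventions, so a textbook or other software may report slightly different quartiles.

```python
import statistics

# Hypothetical heights (cm) for a small class; not real Census at School data.
heights_cm = [142, 148, 150, 151, 153, 155, 158, 160, 163, 170]

n = len(heights_cm)                      # number of observations
mean = statistics.mean(heights_cm)       # measure of center
median = statistics.median(heights_cm)   # measure of center

# statistics.quantiles with n=4 returns the three quartile cut points [Q1, Q2, Q3].
q1, _, q3 = statistics.quantiles(heights_cm, n=4, method="inclusive")
iqr = q3 - q1                            # interquartile range

# Mean absolute deviation: average distance of each value from the mean.
mad = sum(abs(x - mean) for x in heights_cm) / n

print(f"n = {n}, mean = {mean} cm, median = {median} cm")
print(f"IQR = {iqr} cm, mean absolute deviation = {mad} cm")
```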

Statistics & Probability – Grade 7 (7.SP)

Use random sampling to draw inferences about a population

1. Understand that statistics can be used to gain information about a population by examining a sample of the population; generalizations about a population from a sample are valid only if the sample is representative of that population. Understand that random sampling tends to produce representative samples and support valid inferences.

2. Use data from a random sample to draw inferences about a population with an unknown characteristic of interest. Generate multiple samples (or simulated samples) of the same size to gauge the variation in estimates or predictions. For example, estimate the mean word length in a book by randomly sampling words from the book; predict the winner of a school election based on randomly sampled survey data. Gauge how far off the estimate or prediction might be.

Source: corestandards.org
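Standard 2 asks students to generate multiple samples of the same size to gauge the variation in estimates. A minimal simulation sketch follows; the "population" of travel times is randomly generated for illustration and is an assumption, not Census at School data.

```python
import random
import statistics

rng = random.Random(2024)  # fixed seed so the run is repeatable

# Hypothetical population: travel times (minutes) for 500 students.
population = [rng.randint(1, 60) for _ in range(500)]
population_mean = statistics.mean(population)

def sample_means(pop, sample_size, n_samples):
    """Draw several random samples of the same size; return each sample's mean."""
    return [statistics.mean(rng.sample(pop, sample_size)) for _ in range(n_samples)]

means = sample_means(population, sample_size=30, n_samples=200)

print("population mean:", round(population_mean, 1))
print("smallest and largest sample mean:", round(min(means), 1), round(max(means), 1))
print("standard deviation of the sample means:", round(statistics.stdev(means), 2))
```

The spread of the 200 sample means gives a sense of how far off an estimate from a single sample of 30 might be.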

Statistics & Probability – Grade 7 (7.SP)

Draw informal comparative inferences about two populations

3. Informally assess the degree of visual overlap of two numerical data distributions with similar variabilities, measuring the difference between the centers by expressing it as a multiple of a measure of variability. For example, the mean height of players on the basketball team is 10 cm greater than the mean height of players on the soccer team, about twice the variability (mean absolute deviation) on either team; on a dot plot, the separation between the two distributions of heights is noticeable.

4. Use measures of center and measures of variability for numerical data from random samples to draw informal comparative inferences about two populations. For example, decide whether the words in a chapter of a seventh-grade science book are generally longer than the words in a chapter of a fourth-grade science book.

Note: Grade 7 also includes probability standards

Source: corestandards.org
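The comparison in standard 3 (difference in centers expressed as a multiple of the variability) can be sketched as follows. The two lists of heights are hypothetical values chosen to resemble the basketball and soccer example; averaging the two mean absolute deviations is one reasonable choice, not a prescribed method.

```python
import statistics

def mean_abs_dev(values):
    """Mean absolute deviation: average distance of the values from their mean."""
    m = statistics.mean(values)
    return sum(abs(v - m) for v in values) / len(values)

# Hypothetical player heights (cm); not real data.
basketball = [183, 188, 190, 193, 196, 198, 201, 203]
soccer = [173, 178, 181, 183, 186, 188, 190, 193]

difference = statistics.mean(basketball) - statistics.mean(soccer)
typical_mad = (mean_abs_dev(basketball) + mean_abs_dev(soccer)) / 2
ratio = difference / typical_mad

print(f"difference in means: {difference} cm")
print(f"typical mean absolute deviation: {typical_mad} cm")
print(f"difference is about {ratio:.1f} times the variability")
```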

Statistics & Probability – Grade 8 (8.SP)

Investigate patterns of association in bivariate data

1. Construct and interpret scatter plots for bivariate measurement data to investigate patterns of association between two quantities. Describe patterns such as clustering, outliers, positive or negative association, linear association, and nonlinear association.

2. Know that straight lines are widely used to model relationships between two quantitative variables. For scatter plots that suggest a linear association, informally fit a straight line, and informally assess the model fit by judging the closeness of the data points to the line.

3. Use the equation of a linear model to solve problems in the context of bivariate measurement data, interpreting the slope and intercept. For example, in a linear model for a biology experiment, interpret a slope of 1.5 cm/hr as meaning that an additional hour of sunlight each day is associated with an additional 1.5 cm in mature plant height.

Source: corestandards.org
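The standards call for an informal (by eye) line fit in Grade 8. As a check on such a fit, a least-squares line can be computed; note that least squares is a formal method swapped in here for illustration. The sunlight and plant-height values are hypothetical, loosely modeled on the biology example in standard 3.

```python
# Hypothetical data: hours of sunlight per day and mature plant height (cm).
sunlight_hours = [4, 5, 6, 7, 8, 9, 10]
plant_heights_cm = [10.2, 11.4, 13.1, 14.3, 16.2, 17.4, 19.0]

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

slope, intercept = fit_line(sunlight_hours, plant_heights_cm)

# Slope: each additional hour of sunlight is associated with about this many more cm.
print(f"slope = {slope:.2f} cm per hour, intercept = {intercept:.2f} cm")
print(f"predicted height at 7.5 hours: {slope * 7.5 + intercept:.1f} cm")
```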

Statistics & Probability – Grade 8 (8.SP)

Investigate patterns of association in bivariate data

4. Understand that patterns of association can also be seen in bivariate categorical data by displaying frequencies and relative frequencies in a two-way table. Construct and interpret a two-way table summarizing data on two categorical variables collected from the same subjects. Use relative frequencies calculated for rows or columns to describe possible association between the two variables. For example, collect data from students in your class on whether or not they have a curfew on school nights and whether or not they have assigned chores at home. Is there evidence that those who have a curfew also tend to have chores?

Source: corestandards.org
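A two-way table and its row relative frequencies can be built as sketched below. The 20 responses are invented for the curfew and chores example; they are not class or Census at School data.

```python
from collections import Counter

# Hypothetical (curfew, chores) responses from 20 students.
responses = (
    [("curfew", "chores")] * 9
    + [("curfew", "no chores")] * 3
    + [("no curfew", "chores")] * 3
    + [("no curfew", "no chores")] * 5
)

counts = Counter(responses)  # cell frequencies of the two-way table

def row_relative_frequencies(table, row):
    """Relative frequency of each column category within one row."""
    row_total = sum(c for (r, _), c in table.items() if r == row)
    return {col: c / row_total for (r, col), c in table.items() if r == row}

rel = {row: row_relative_frequencies(counts, row) for row in ("curfew", "no curfew")}

for row, freqs in rel.items():
    print(row, {col: f"{share:.0%}" for col, share in freqs.items()})
```

In this made-up class, 75% of students with a curfew have chores compared with 37.5% of students without a curfew, which would suggest an association.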

Statistics & Probability – High School

Interpreting Categorical & Quantitative Data (S-ID)

Summarize, represent, and interpret data on a single count or measurement variable

1. Represent data with plots on the real number line (dot plots, histograms, and box plots).

2. Use statistics appropriate to the shape of the data distribution to compare center (median, mean) and spread (interquartile range, standard deviation) of two or more different data sets.

3. Interpret differences in shape, center, and spread in the context of the data sets, accounting for possible effects of extreme data points (outliers).

4. Use the mean and standard deviation of a data set to fit it to a normal distribution and to estimate population percentages. Recognize that there are data sets for which such a procedure is not appropriate. Use calculators, spreadsheets, and tables to estimate areas under the normal curve.

Source: corestandards.org
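Standard 4 can be illustrated with the `NormalDist` class in Python's `statistics` module. The mean and standard deviation below are assumed, illustrative values for student heights, and the normal model itself is an assumption that would need checking against a graph of the data.

```python
from statistics import NormalDist

# Assumed mean and standard deviation of heights (cm); illustrative only.
heights = NormalDist(mu=158, sigma=13.5)

# Estimated share of students taller than 170 cm (area to the right of 170).
share_taller_than_170 = 1 - heights.cdf(170)

# Estimated share within one standard deviation of the mean.
within_one_sd = heights.cdf(158 + 13.5) - heights.cdf(158 - 13.5)

print(f"taller than 170 cm: {share_taller_than_170:.1%}")
print(f"within one standard deviation of the mean: {within_one_sd:.1%}")
```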

Statistics & Probability – High School

Interpreting Categorical & Quantitative Data (S-ID)

Summarize, represent, and interpret data on two categorical and quantitative variables

5. Summarize categorical data for two categories in two-way frequency tables. Interpret relative frequencies in the context of the data (including joint, marginal, and conditional relative frequencies). Recognize possible associations and trends in the data.

6. Represent data on two quantitative variables on a scatter plot, and describe how the variables are related.

a. Fit a function to the data; use functions fitted to data to solve problems in the context of the data. Use given functions or choose a function suggested by the context. Emphasize linear, quadratic, and exponential models.

b. Informally assess the fit of a function by plotting and analyzing residuals.

c. Fit a linear function for a scatter plot that suggests a linear association.

Source: corestandards.org
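Standard 6b (assessing fit by analyzing residuals) can be sketched as below. The height and arm span pairs are hypothetical, and the least-squares fit is one possible choice of fitted function.

```python
# Hypothetical paired measurements (cm); not real Census at School data.
heights = [140, 145, 150, 155, 160, 165, 170]
arm_spans = [138, 146, 149, 156, 158, 167, 169]

def least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares line."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

slope, intercept = least_squares(heights, arm_spans)

# Residual = observed arm span minus the arm span predicted by the line.
residuals = [y - (slope * x + intercept) for x, y in zip(heights, arm_spans)]

for x, r in zip(heights, residuals):
    print(f"height {x} cm: residual {r:+.2f} cm")
```

Residuals that scatter around zero with no clear curve or fan shape suggest that a linear function is a reasonable fit.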

Statistics & Probability – High School

Interpreting Categorical & Quantitative Data (S-ID)

Interpret linear models

7. Interpret the slope (rate of change) and the intercept (constant term) of a linear model in the context of the data.

8. Compute (using technology) and interpret the correlation coefficient of a linear fit.

9. Distinguish between correlation and causation.

Source: corestandards.org
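For standard 8, the correlation coefficient can be computed directly from its definition. The height and foot length values are hypothetical, and the function name is chosen here for illustration.

```python
import math

# Hypothetical paired measurements (cm); not real Census at School data.
heights = [140, 148, 152, 157, 161, 166, 172]
foot_lengths = [21, 22, 23, 23, 24, 25, 26]

def correlation(xs, ys):
    """Pearson correlation coefficient r of two equal-length lists."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

r = correlation(heights, foot_lengths)
print(f"r = {r:.3f}")
```

A value of r near 1 indicates a strong positive linear association; it does not show that greater height causes longer feet (standard 9).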

Statistics & Probability – High School

Making Inferences & Justifying Conclusions (S-IC)

Understand and evaluate random processes underlying statistical experiments

1. Understand statistics as a process for making inferences about population parameters based on a random sample from that population.

2. Decide if a specified model is consistent with results from a given data-generating process, e.g., using simulation. For example, a model says a spinning coin falls heads up with probability 0.5. Would a result of 5 tails in a row cause you to question the model?

Source: corestandards.org
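The coin question in standard 2 can be explored by simulation. This sketch assumes the model in the example (tails with probability 0.5 on every spin, independent spins) and compares the simulated frequency of five tails in a row with the exact value 0.5^5.

```python
import random

rng = random.Random(7)  # fixed seed so the run is repeatable

def five_tails_in_a_row(generator):
    """Simulate five spins; each comes up tails with probability 0.5."""
    return all(generator.random() < 0.5 for _ in range(5))

trials = 100_000
hits = sum(five_tails_in_a_row(rng) for _ in range(trials))
estimate = hits / trials
exact = 0.5 ** 5

print(f"simulated probability of 5 tails in a row: {estimate:.4f}")
print(f"exact probability under the model: {exact}")
```

Under the model, five tails in a row happens roughly 3% of the time, so a single run of five tails is unusual but not impossible.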

Statistics & Probability – High School

Making Inferences & Justifying Conclusions (S-IC)

Make inferences and justify conclusions from sample surveys, experiments, and observational studies

3. Recognize the purposes of and differences among sample surveys, experiments, and observational studies; explain how randomization relates to each.

4. Use data from a sample survey to estimate a population mean or proportion; develop a margin of error through the use of simulation models for random sampling.

5. Use data from a randomized experiment to compare two treatments; use simulations to decide if differences between parameters are significant.

6. Evaluate reports based on data.

Source: corestandards.org
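One way to develop a margin of error through simulation (standard 4) is sketched below. The sample size and sample proportion are hypothetical, and the simulation assumes the true population proportion equals the sample proportion; other simulation designs are possible.

```python
import random

rng = random.Random(11)  # fixed seed so the run is repeatable

n = 100        # hypothetical sample size
p_hat = 0.60   # hypothetical sample proportion

def simulated_proportion(generator, p, size):
    """Proportion of 'yes' answers in one simulated random sample."""
    return sum(generator.random() < p for _ in range(size)) / size

# Simulate many samples and look at the middle 95% of simulated proportions.
sims = sorted(simulated_proportion(rng, p_hat, n) for _ in range(2000))
low = sims[int(0.025 * len(sims))]
high = sims[int(0.975 * len(sims)) - 1]
margin = (high - low) / 2

print(f"middle 95% of simulated proportions: {low:.2f} to {high:.2f}")
print(f"approximate margin of error: {margin:.3f}")
```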


US Census at School Program

www.amstat.org/censusatschool


US Census at School Program

Students complete a brief online survey (classroom census)
• 13 international questions plus additional U.S. questions
• 15-20 minute computer session

Analyze your class results
• Use teacher password to gain immediate access to class data
• Formulate questions of interest that can be answered with Census at School data, collect/select appropriate data, analyze the data with appropriate graphs and numerical summaries, interpret the results, and make appropriate conclusions in context relating to the original questions

Compare your class with samples from the U.S. and other countries
• Download a random sample of Census at School data from U.S. students
• Download a random sample from participating international students

International lesson plans are available, along with instructional webinars and other free resources

www.amstat.org/censusatschool


US Census at School – Student Section

www.amstat.org/censusatschool


US Census at School – Teacher Section

www.amstat.org/censusatschool


US Census at School – Resources

www.amstat.org/censusatschool


US Census at School Random Sampler

www.amstat.org/censusatschool


International Random Sampler

www.amstat.org/censusatschool

Statistical Investigations - Census at School

• Formulate statistical questions of interest that can be answered with the Census at School data.

• Collect/select appropriate Census at School data and write down the variable names and type for this investigation.

• Analyze the data. Include appropriate graphs and numerical summaries for the corresponding variables.

• Interpret the results and make appropriate conclusions in context. Be sure to justify your results using your graphs and numerical summaries and relate your interpretation to the original question.

For a demonstration of this process and software resources (some free) to analyze the data, watch the Census at School webinars posted under Resources at www.amstat.org/censusatschool.

Formulating a Statistical Question

Common Core Standards – Grade 6 (6.SP)

1. Recognize a statistical question as one that anticipates variability in the data related to the question and accounts for it in the answers. For example, “How old am I?” is not a statistical question, but “How old are the students in my school?” is a statistical question because one anticipates variability in students’ ages.

A well-written statistical question anticipates answers that will vary and includes:
• Population of interest
• Measurement of interest

Example from Common Core: How old are the students in my school?
• Population of interest: Students in my school
• Measurement: Age (measured in years)
• Student ages will vary

Formulate Statistical Questions

Example Statistical Questions with Census at School

How much time per week do students participating in U.S. Census at School spend on the computer?
• Population of interest: Students (grades 4-12) participating in U.S. Census at School
• Measurement: Time on the computer each week (measured in hours)
• Time (hours per week) will vary by student

What is the favorite sport/activity of students in Australia participating in Census at School?
• Population of interest: Students in Australia participating in Census at School
• Measurement: Favorite sport (baseball, basketball, bowling, etc.)
• Favorite sport/activity will vary by student

Is there a difference between the reaction times of boys and girls (participating in U.S. Census at School)?
• Populations/groups of interest: Boys and girls participating in U.S. Census at School
• Measurement: Reaction time (measured in seconds)
• Reaction times will vary by student

Formulate Statistical Questions

Example Statistical Questions with Census at School

Is there a relationship between height and arm span for students participating in Census at School?
• Population of interest: Students participating in U.S. Census at School
• Measurements: Height (measured in cm) and arm span (measured in cm)
• Measured heights and arm spans will vary by student

Does the preferred superpower of U.S. Census at School students differ by gender?
• Populations/groups of interest: Boys and girls participating in U.S. C@S
• Measurement: Preferred superpower (categories: fly, freeze time, invisibility, super strength, telepathy)
• Preferred superpower will vary by student

Does travel time to school vary by country for the students participating in Census at School?
• Populations/groups of interest: Students participating in C@S in the various countries
• Measurement: Travel time to school (measured in minutes)
• Travel times to school will vary by student

Collect/Select Census at School Data

• Download your class data

• Download random samples of other students in the U.S. participating in Census at School with the online U.S. Census at School Random Sampler

• Download random samples of students in other countries participating in Census at School with the online International Census at School Random Sampler

• Select the variables of interest needed to investigate the statistical question and write down the variable names and type for this investigation

Because the Census at School data comes from real students and is entered online by these students, there will be some mistakes and inappropriate data values in the data sets. Before analyzing your data and making conclusions, make sure to take time to observe and clean your data. Create a graph for each variable of interest to look for inappropriate or unusual values that do not make sense in context of the variable and original survey question.
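A cleaning step of this kind can be sketched in code. The raw entries and the plausibility bounds are assumptions for illustration (50 cm and 250 cm are hypothetical cutoffs, not rules from the Census at School program); graphing each variable, as described above, is still the recommended first look.

```python
# Hypothetical raw height entries (cm) as typed by students, including
# blanks, non-numeric text, and values that make no sense in context.
raw_heights = ["152", "160", "0", "1", "20", "", "1580", "abc", "171.5"]

def clean_heights(raw_entries, low_cm=50, high_cm=250):
    """Split entries into plausible heights and rejected entries.

    low_cm and high_cm are assumed plausibility bounds, chosen for illustration.
    """
    kept, rejected = [], []
    for entry in raw_entries:
        try:
            value = float(entry)
        except ValueError:          # blank or non-numeric entry
            rejected.append(entry)
            continue
        if low_cm <= value <= high_cm:
            kept.append(value)
        else:                       # numeric but implausible in context
            rejected.append(entry)
    return kept, rejected

clean, rejected = clean_heights(raw_heights)
print("kept:", clean)
print("rejected:", rejected)
```

Keeping the rejected entries in a separate list lets students discuss why each value was set aside instead of silently deleting it.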

Analyze & Interpret C@S Data

For the statistical questions:
• Create appropriate graphs
• Create numerical summaries
• Apply the concepts in the Common Core State Standards (or state standards) with real student data
• Interpret the results and make appropriate conclusions in context. Be sure to justify your results using your graphs and numerical summaries and relate your interpretation to the original question.

We will review appropriate graphs, numerical summaries, and interpretations in context of Census at School data.

Review: Variables

Variable – Any characteristic whose value may change from one individual/object to another.

Quantitative (Numerical, Measurement) Variable: Census at School Examples:
• Age: 13 years
• Height: 138 cm (measured in cm since it is an international program)
• Travel time to school: 15 minutes

Categorical (Qualitative) Variable: Census at School Examples:
• Country: Australia/Canada/New Zealand/South Africa/United Kingdom/USA
• Gender: Male/Female
• Handed: Right handed/Left handed/Ambidextrous

Why does the type of variable matter?

Graphical Displays by Type of Variable

One Quantitative Variable:
• Dot plot
• Stem & leaf plot
• Histogram
• Box & whisker plot (box plot)

One Categorical Variable:
• Bar graph (preferred)
• Circle graph/Pie chart

[Figure: Bar chart of Handed (Ambidextrous, Left-handed, Right-handed) by count, CensusAtSchool_Random_Sample_8_all_en_INT]

Stem & leaf plot, variable Height_cm (key: 15 : 1 = 151 cm)
15 : 13
15 : 6779
16 : 1222233344444
16 : 56677999
17 : 00234
17 : 556667777
18 : 234
18 : 679

[Figure: US Census at School Sample Dot Plot of Height_cm]

[Figure: US Census at School Sample Box Plot of Height_cm]

[Figure: US Census at School Sample Histogram of Height_cm]
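A split-stem stem & leaf plot like the one on this slide can be generated with a few lines of code. The heights below are read off the slide's stem & leaf plot (key 15 : 1 = 151 cm); the function name and the layout choices are illustrative assumptions.

```python
from collections import defaultdict

# Heights (cm) read from the stem & leaf plot on the slide.
heights_cm = [
    151, 153, 156, 157, 157, 159,
    161, 162, 162, 162, 162, 163, 163, 163, 164, 164, 164, 164, 164,
    165, 166, 166, 167, 167, 169, 169, 169,
    170, 170, 172, 173, 174,
    175, 175, 176, 176, 176, 177, 177, 177, 177,
    182, 183, 184, 186, 187, 189,
]

def stem_and_leaf(values):
    """Return the rows of a stem & leaf plot with each stem split in two
    (leaves 0-4 on the first row, leaves 5-9 on the second)."""
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        half = 0 if leaf < 5 else 1
        rows[(stem, half)].append(str(leaf))
    return [f"{stem} : {''.join(leaves)}" for (stem, _), leaves in sorted(rows.items())]

lines = stem_and_leaf(heights_cm)
print("\n".join(lines))
```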

Bar Charts vs. Histograms

Bar chart:
• Horizontal axis is categorical.
• Shape, center, and spread have no meaning in a bar chart.
• Space usually separates each bar.

Histogram:
• Horizontal axis is quantitative.
• Histograms are described by shape, center, and spread.
• Bars are usually adjacent.

[Figure: Histogram of Height_cm (axis 0 to 250), CensusAtSchool_Random_Sample_8_all_en_INT. Survey question: How tall are you without your shoes on? Answer to the nearest cm.]

[Figure: Bar chart of Handed (Ambidextrous, Left-handed, Right-handed) by count, CensusAtSchool_Random_Sample_8_all_en_INT. Survey question: Are you right-handed, left-handed or ambidextrous?]

Numerical Summaries by Type of Variable

Quantitative data:
• Mean (measure of center appropriate for symmetric data)
• Median (measure of center not as influenced by skewness & outliers as the mean)
• Standard deviation (measure of spread appropriate when using mean as a measure of center, similar to mean absolute deviation)
• Interquartile range (measure of spread appropriate when using median as a measure of center, less influenced by skewness and outliers)
• Range (measure of spread)
• Others

Categorical data:
• Counts/frequencies
• Percentages
• Modal category

Use the graphs and numerical summaries appropriate for your grade level(s).
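The categorical summaries listed above (counts, percentages, modal category) can be computed as sketched below. The handedness responses are made up for a class of 20 and are not Census at School data.

```python
from collections import Counter

# Hypothetical responses to the handedness question for a class of 20.
handed = ["Right-handed"] * 17 + ["Left-handed"] * 2 + ["Ambidextrous"] * 1

counts = Counter(handed)                              # counts/frequencies
total = sum(counts.values())
percentages = {category: 100 * c / total for category, c in counts.items()}
modal_category, modal_count = counts.most_common(1)[0]  # most frequent category

for category, c in counts.items():
    print(f"{category}: {c} students ({percentages[category]:.0f}%)")
print("modal category:", modal_category)
```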

Interpreting Graphs for One Quantitative Variable (Histogram, Dot Plot, Stem & Leaf Plot, Box & Whisker Plot)

Shape
• Central cluster: bell
• Left cluster: skewed right
• Right cluster: skewed left
• Two clusters: bimodal
• Flat: uniform

Center

Spread

[Figure: example frequency histograms illustrating shape, center, and spread]

Interpreting Graphs (One Quantitative Variable)
• Shape (bell-shaped, skewed, flat/uniform, etc.)
• Center (mean, median)
• Spread/Variability (standard deviation, mean absolute deviation, interquartile range, range)
• Possible outliers

[Figure: International Census at School Random Sample Dot Plot of Height_cm, axis 0 to 220]

Interpreting Graphs (One Quantitative Variable)
• Shape
• Center
• Spread/Variability
• Possible outliers

These values (0, 1 and 20 cm) fall outside the overall pattern and are not plausible heights (the world’s shortest person is 50 cm).

Interpreting Graphs (One Quantitative Variable)
• Shape: symmetric, bell-shaped
• Center
• Spread/Variability
• Possible outliers

Interpreting Graphs (One Quantitative Variable)
• Shape
• Center: mean ≈ median since shape approx. symmetric
• Spread/Variability
• Possible outliers

Interpreting Graphs (One Quantitative Variable)
• Shape
• Center
• Spread/Variability (standard deviation, interquartile range, range)

Range – distance between the maximum and minimum values

The outliers were not included in this case since they are not possible height values.

Interpreting Graphs (One Quantitative Variable)
• Shape
• Center
• Spread/Variability (standard deviation, interquartile range, range)

Interquartile range = Q3 – Q1, range of the middle 50% of values

Interpreting Graphs (One Quantitative Variable)
• Shape
• Center
• Spread/Variability (standard deviation, interquartile range, range)

Standard Deviation – an average distance of observations from the mean (similar to mean absolute deviation)

Interpreting Graphs for One Quantitative Variable

Census at School Example Variable: Height

Shape
• Bell-shaped, symmetric

Center
• Mean height = 157.6 cm
• Median height = 158 cm

Spread
• Standard deviation of heights = 13.5 cm (an average distance from the mean height is 13.5 cm)
• Interquartile range of heights = Q3 – Q1 = 167 – 149 = 18 cm (range of the middle 50% of heights is 18 cm)
• Range of heights = Max – Min = 200 – 116 = 84 cm

[Figure: International Census at School Random Sample Histogram of Height_cm, axis 120 to 200, with mean = 157.602 and median = 158 marked]

Interpreting Graphs for One Quantitative Variable

Census at School Example: Importance of Saving Energy Rating Scale 0 to 1000

Shape
• Left-skewed

Center
• Median rating = 699
• Mean rating = 678.2

Spread
• Standard deviation of ratings = 282.1 (an average distance from the mean rating is 282.1)
• Interquartile range of ratings = Q3 – Q1 = 999 – 500 = 499 (range of the middle 50% of ratings is 499)
• Range of ratings = Max – Min = 1000 – 0 = 1000

[Figure: International Census at School Random Sample Histogram of Importance_saving_enery, axis 0 to 1000, with median = 699 and mean = 678.204 marked]

Page 48: Teaching Data Analysis as an Investigative Process with Census at School Rebecca Nichols and Martha Aliaga American Statistical Association

Interpreting Graphs for One Quantitative Variable
Census at School Example Variable: Travel Time to School

Shape: Right-skewed

Center: Median travel time = 15 minutes; mean travel time = 18.1 minutes

Spread:
Standard deviation = 14.3 minutes (an average distance from the mean travel time is 14.3 minutes)
Interquartile range = Q3 – Q1 = 25 – 7 = 18 minutes (range of the middle 50% of travel times is 18 minutes)
Range = Max – Min = 70 – 1 = 69 minutes

[Figure: International Census at School Random Sample histogram of Travel_time_to_School (horizontal axis 0 to 80), shown without and with markers at median = 15 and mean = 18.105]
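In the skewed examples above, the mean sits on the side of the longer tail relative to the median. A short Python sketch of that comparison follows, using made-up travel times rather than Census at School data. The mean-versus-median comparison is only a rule of thumb and is no substitute for looking at the graph.

```python
import statistics

# Hypothetical travel times in minutes (made up; a few long trips create a right tail)
travel_times = [5, 7, 8, 10, 12, 15, 15, 18, 20, 25, 45, 70]

mean_time = statistics.mean(travel_times)
median_time = statistics.median(travel_times)

print(f"mean = {mean_time:.1f} minutes, median = {median_time} minutes")

# Rule of thumb only: always check the shape on a histogram or dot plot as well
if mean_time > median_time:
    print("mean > median: consistent with a right-skewed shape")
elif mean_time < median_time:
    print("mean < median: consistent with a left-skewed shape")
else:
    print("mean = median: consistent with a symmetric shape")
```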

Page 49: Teaching Data Analysis as an Investigative Process with Census at School Rebecca Nichols and Martha Aliaga American Statistical Association

Graphical Displays for Two Variables
Two Quantitative Variables
  Scatter plot
Two Categorical Variables
  Tables
  Two-variable bar charts
One Quantitative and One Categorical Variable
  Side-by-side dot plots
  Side-by-side box & whisker plots (works well for multiple groups – strength of box plots)
  Back-to-back stem & leaf plots (compare distribution of quantitative variable for two groups)

Page 50: Teaching Data Analysis as an Investigative Process with Census at School Rebecca Nichols and Martha Aliaga American Statistical Association

Two Quantitative Variables
Scatter Plot Examples:
Is there a relationship between height (cm) and arm span (cm)?
Is there a relationship between age (years) and travel time to school (minutes)?

[Figure: International Census at School Random Sample scatter plot of Travel_time_to_School (vertical axis, 0 to 80) against Ageyears (horizontal axis, 0 to 22)]
[Figure: US Census at School Sample scatter plot of Armspan_cm (vertical axis, 100 to 220) against Height_cm (horizontal axis, 150 to 190)]

Page 51: Teaching Data Analysis as an Investigative Process with Census at School Rebecca Nichols and Martha Aliaga American Statistical Association

Interpreting Graphs (Two Quantitative Variables)
Interpreting Scatter Plots
Form (linear, curved, etc.)
Direction (positive, negative, etc.)
Strength (strong, weak, etc.)
Note possible outliers

Form: Linear (appropriate to model with a line)
Direction: Positive (those who are taller tend to have larger arm spans and those who are shorter tend to have smaller arm spans)
Strength: Moderately strong (points fairly tight about the linear form)
Correlation Coefficient r = 0.80 (positive and moderately strong relationship since correlation near +1)
Possible outlier: There is an individual with a shorter arm span than expected based on her height.

Association does not imply causation.

[Figure: US Census at School Sample scatter plot of Armspan_cm (vertical axis, 100 to 220) against Height_cm (horizontal axis, 150 to 190), shown without and with the fitted line Armspan_cm = 1.72 Height_cm - 119; r² = 0.64]

Numerical Summary: Correlation Coefficient r = 0.80
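For readers who want to see where a correlation coefficient and a fitted line come from, here is a minimal Python sketch using the standard least-squares formulas. The height and arm span values are made up for illustration, so the results will not match the slide's r = 0.80 or its fitted line.

```python
import math

# Hypothetical height and arm span measurements in cm (made up for illustration)
height = [150, 155, 160, 165, 170, 175, 180]
armspan = [148, 157, 158, 167, 168, 178, 179]

n = len(height)
mean_x = sum(height) / n
mean_y = sum(armspan) / n

# Sums of squares and cross-products about the means
sxx = sum((x - mean_x) ** 2 for x in height)
syy = sum((y - mean_y) ** 2 for y in armspan)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(height, armspan))

r = sxy / math.sqrt(sxx * syy)        # correlation coefficient
slope = sxy / sxx                     # least-squares slope
intercept = mean_y - slope * mean_x   # least-squares intercept

print(f"r = {r:.2f}, r^2 = {r ** 2:.2f}")
print(f"armspan = {slope:.2f} * height + {intercept:.1f}")
```

The slope and r always share the same sign, because both take their sign from the same cross-product sum.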

Page 52: Teaching Data Analysis as an Investigative Process with Census at School Rebecca Nichols and Martha Aliaga American Statistical Association

Interpreting Graphs (Two Quantitative Variables)
Interpreting Scatter Plots
Form (linear, curved, etc.)
Direction (positive, negative, etc.)
Strength (strong, weak, etc.)
Note possible outliers

Form: No apparent form
Direction: No apparent association, possibly slightly positive
Strength: Weak
Correlation Coefficient r = 0.19 (slightly positive and very weak association since correlation near 0)

[Figure: International Census at School Random Sample scatter plot of Travel_time_to_School (vertical axis, 0 to 80) against Ageyears (horizontal axis, 0 to 22), shown without and with the fitted line Travel_time_to_School = 1.14 Ageyears + 3.7; r² = 0.036]

Numerical Summary: Correlation Coefficient r = 0.19
Note: Correlation Coefficient is not part of the Common Core Standards for Middle School

Page 53: Teaching Data Analysis as an Investigative Process with Census at School Rebecca Nichols and Martha Aliaga American Statistical Association

Two Categorical Variables
Two-way Table & Two-variable Bar Chart Example:
Frequency of preferred superpower (categorical variable) by gender (categorical variable)

Because there are more females than males in this sample (92 versus 58), we need to be careful when comparing the frequencies of each gender across the superpower categories. It is more meaningful to compare percentages than frequencies, since the counts of males and females differ in this sample of school children participating in Census at School.

Census at School Sample (S1 = count)

Superpower        Female   Male   Row Summary
Fly                   30     11            41
Freeze time           19     17            36
Invisibility          13     11            24
Super strength         4     14            18
Telepathy             26      5            31
Column Summary        92     58           150

[Figure: Census at School Sample bar chart of count (vertical axis, 5 to 45) by Superpower (Fly, Freeze time, Invisibility, Super strength, Telepathy), with separate bars for Gender (Female, Male)]

Page 54: Teaching Data Analysis as an Investigative Process with Census at School Rebecca Nichols and Martha Aliaga American Statistical Association

Two Categorical Variables
Two-way Table & Two-variable Bar Chart Example:
Percentage of preferred superpower (categorical variable) by gender (categorical variable)

Use proportions/percentages to investigate association between two categorical variables. If the gender percentages are the same across the superpower categories, we do not have evidence of association. Since they do differ in this case, we have evidence of potential association between gender and superpower preference.

[Figure: Census at School Sample bar chart, Percentage of Gender by Superpower Preference Category (vertical axis, 0 to 100) by Superpower (Fly, Freeze time, Invisibility, Super strength, Telepathy), with separate bars for Gender (Female, Male)]

Census at School Sample (S1 = row proportion, shown as percentages)

Superpower        Female (%)   Male (%)   Row Summary (%)
Fly                  73.1707    26.8293               100
Freeze time          52.7778    47.2222               100
Invisibility         54.1667    45.8333               100
Super strength       22.2222    77.7778               100
Telepathy            83.871     16.129                100
Column Summary       61.3333    38.6667               100

Page 55: Teaching Data Analysis as an Investigative Process with Census at School Rebecca Nichols and Martha Aliaga American Statistical Association

Two Categorical Variables
Two-way Table & Two-variable Bar Chart Example:
Percentage of preferred superpower (categorical variable) by gender (categorical variable)

[Figure and table: the bar chart of Percentage of Gender by Superpower Preference Category and the table of row percentages, repeated from the previous slide]

Of those who prefer super strength, the larger percentage are males. Of those who prefer flying or telepathy (reading minds), the larger percentage are females. Of those who prefer invisibility or freezing time, the percentages of males and females are more similar.
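The row percentages used in this comparison can be recomputed from the frequency counts given on the earlier two-way table slide. The Python sketch below does that; the counts are from the slides, while the variable names are only illustrative.

```python
# Counts from the two-way frequency table: (female, male) for each superpower
counts = {
    "Fly": (30, 11),
    "Freeze time": (19, 17),
    "Invisibility": (13, 11),
    "Super strength": (4, 14),
    "Telepathy": (26, 5),
}

# Row percentages: within each superpower, the share of females and of males
row_percent = {}
for power, (female, male) in counts.items():
    total = female + male
    row_percent[power] = (100 * female / total, 100 * male / total)

# Overall gender split of the sample, for comparison with each row
total_female = sum(f for f, m in counts.values())
total_male = sum(m for f, m in counts.values())
overall_female = 100 * total_female / (total_female + total_male)

for power, (pf, pm) in row_percent.items():
    print(f"{power:15s} female {pf:5.1f}%  male {pm:5.1f}%")
print(f"{'All students':15s} female {overall_female:5.1f}%  male {100 - overall_female:5.1f}%")
```

Comparing each row with the overall split (about 61% female) is a useful check, because the sample itself contains more females than males.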

Page 56: Teaching Data Analysis as an Investigative Process with Census at School Rebecca Nichols and Martha Aliaga American Statistical Association

One Quantitative and One Categorical Variable
Census at School Examples: Travel time to school in minutes (quantitative variable) by country (categorical variable)
Discuss/compare the shape, center, and spread of the quantitative variable for each category of the categorical variable

[Figure: International Census at School Random Sample box plot and dot plot of Travel_time_to_School (horizontal axis 0 to 80) by country: Australia, Canada, New Zealand, UK]

Page 57: Teaching Data Analysis as an Investigative Process with Census at School Rebecca Nichols and Martha Aliaga American Statistical Association

Statistical Investigations - Census at School
Formulate questions of interest that can be answered with the Census at School data.
Collect/select appropriate Census at School data and write down the variable names and type for this investigation.
Analyze the data. Include appropriate graphs and numerical summaries for the corresponding variables.
Interpret the results and make appropriate conclusions in context. Be sure to justify your results using your graphics and summaries and relate your interpretation to the original question.

Let’s do this together using real C@S data!

Page 58: Teaching Data Analysis as an Investigative Process with Census at School Rebecca Nichols and Martha Aliaga American Statistical Association

Statistical Investigation – Census at School Example
Formulate questions of interest that can be answered with the Census at School data.

Is there a relationship between text messaging (measured as the number of text messages sent yesterday) and computer usage (measured in the estimated number of hours spent on the computer per week)? If so, does this relationship differ for males and females?

Page 59: Teaching Data Analysis as an Investigative Process with Census at School Rebecca Nichols and Martha Aliaga American Statistical Association

Statistical Investigations - Example
Collect/select appropriate Census at School data and write down the variable names and type for this investigation.
Variable 1 (x-axis): Number of Text Messages Sent Yesterday. This is a quantitative variable.
Variable 2 (y-axis): Estimated Number of Hours Using the Computer Each Week. This is a quantitative variable.
Possible Variable 3: Gender. This is a categorical variable.

We will investigate these variables using a sample from the U.S. Census at School data.

Page 60: Teaching Data Analysis as an Investigative Process with Census at School Rebecca Nichols and Martha Aliaga American Statistical Association

Statistical Investigations - Example
Analyze the data. Include appropriate graphs and numerical summaries for the corresponding variables.

[Figure: USA Census at School scatter plot, horizontal axis Text_Messages_Sent_Yesterday (0 to 40), vertical axis 0 to 60]

We can investigate whether there is a relationship between text messaging and computer usage (both quantitative variables) using a scatter plot.

Because this relationship appears linear, it is appropriate to model this relationship with a line.

Remember to investigate form, direction, strength, and possible outliers.

Page 61: Teaching Data Analysis as an Investigative Process with Census at School Rebecca Nichols and Martha Aliaga American Statistical Association

Statistical Investigations - Example
Analyze the data. Include appropriate graphs and numerical summaries for the corresponding variables.

[Figure: USA Census at School scatter plot, horizontal axis Text_Messages_Sent_Yesterday (0 to 40), vertical axis 0 to 60, with the fitted line Computer_Use_Hours = 1.04 Text_Messages_Sent_Yesterday + 5.2; r² = 0.67]

Form: Linear (appropriate to model with a line)
Direction: Positive (those who sent many text messages yesterday tend to spend many hours per week on the computer)
Strength: Moderately strong (points fairly tight about the linear form)
Correlation Coefficient r = 0.82 (positive and moderately strong relationship since correlation near +1)
Possible outlier: Individual who did not send any text messages yesterday, but spends many hours on the computer each week

Numerical Summary: Correlation Coefficient r = 0.82

Page 62: Teaching Data Analysis as an Investigative Process with Census at School Rebecca Nichols and Martha Aliaga American Statistical Association

Statistical Investigations - Example
Interpret the results and make appropriate conclusions in context. Be sure to justify your results using your graphics and summaries and relate your interpretation to the original question.

Yes, there appears to be a relationship between text messaging and computer usage in this sample of U.S. Census at School data.

The relationship between the number of text messages sent yesterday and the estimated number of hours typically spent on the computer during the week is positive, meaning that those who send many text messages typically also spend many hours on the computer each week and those who send few text messages typically spend few hours on the computer each week.

Page 63: Teaching Data Analysis as an Investigative Process with Census at School Rebecca Nichols and Martha Aliaga American Statistical Association

Statistical Investigations - Example
Based on the graph, the form of the relationship appears linear, so it is appropriate to model this relationship with a line. Because the relationship appears linear, it is also appropriate to compute the correlation coefficient.

The correlation coefficient, which is a numerical summary describing the strength and direction of a linear relationship, is r = 0.82 (to calculate r = 0.82, take the square root of the given r² value 0.67 and give it the sign of the slope, which is positive here).

Because r = 0.82 is positive and fairly near positive 1, we know this linear relationship is moderately strong and positive. If the correlation were r = -0.82, we would know the relationship is moderately strong and negative (near -1). If the correlation coefficient were near 0, we would know the linear relationship is very weak or there is no linear association.
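The square-root step can be sketched in Python. Because r² alone cannot tell a positive relationship from a negative one, the sketch takes the sign from the slope of the fitted line. The function name `r_from_r_squared` is hypothetical and chosen only for this illustration.

```python
import math

def r_from_r_squared(r_squared, slope):
    """Return r as the square root of r^2, carrying the sign of the fitted slope."""
    return math.copysign(math.sqrt(r_squared), slope)

# Values reported on the slides: r^2 = 0.67 with a slope of 1.04
r = r_from_r_squared(0.67, 1.04)
print(f"r = {r:.2f}")

# The same r^2 with a negative slope would give a negative correlation
r_negative = r_from_r_squared(0.67, -1.04)
print(f"r = {r_negative:.2f}")
```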

Page 64: Teaching Data Analysis as an Investigative Process with Census at School Rebecca Nichols and Martha Aliaga American Statistical Association

Statistical Investigations - Example
The slope of the line (1.04) indicates that for every increase of 1 text message sent yesterday, we would expect the estimated number of hours spent on the computer per week to increase by 1.04 hours, on average.

The y-intercept of 5.2 indicates that for those who sent 0 text messages yesterday, we would expect their estimated computer usage per week to be 5.2 hours (the y-intercept is not always meaningful to interpret).

The r² value of 0.67 indicates that 67% of the variation in y (estimated number of hours spent on the computer per week) can be explained by the linear relationship with x (the number of text messages sent yesterday). (Note: r² is not part of the Common Core Standards)
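The slope and y-intercept interpretations can be checked by evaluating the fitted line at a few values. This is a minimal Python sketch using the coefficients reported on the slides; the function name `predicted_hours` is hypothetical. Predictions are only reasonable within the range of text counts seen in the plot (about 0 to 40).

```python
SLOPE = 1.04      # expected extra hours per additional text message (from the slides)
INTERCEPT = 5.2   # expected hours per week for 0 text messages (from the slides)

def predicted_hours(texts_sent):
    """Predicted weekly computer hours from the fitted line."""
    return SLOPE * texts_sent + INTERCEPT

for texts in (0, 10, 11):
    print(f"{texts:2d} texts sent -> {predicted_hours(texts):.2f} predicted hours per week")

# One more text message changes the prediction by exactly the slope
difference = predicted_hours(11) - predicted_hours(10)
print(f"difference for one more text = {difference:.2f} hours")
```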

Page 65: Teaching Data Analysis as an Investigative Process with Census at School Rebecca Nichols and Martha Aliaga American Statistical Association

Statistical Investigation – Census at School Example
If so, does this relationship differ for males and females?

[Figure: CensusatSchoolParticipantsUSAdata scatter plot, horizontal axis Text_Messages_Sent_Yesterday (0 to 40), vertical axis 0 to 60, points marked by Gender (Female, Male)]

To investigate whether this relationship between text messaging and computer usage differs for males and females, we add in the categorical variable gender.

Females are represented by grey circles and males by blue squares.

Page 66: Teaching Data Analysis as an Investigative Process with Census at School Rebecca Nichols and Martha Aliaga American Statistical Association

Statistical Investigations - Example
Analyze the data. Include appropriate graphs and numerical summaries for the corresponding variables.

Form: Both Linear (appropriate to model males and females separately with two lines)
Direction: Both Positive (those who sent many text messages yesterday tend to spend many hours per week on the computer)
Strength: Moderately strong for females and strong for males. There are few data points, but the correlation coefficients indicate strong relationships between text messaging and computer usage for both males and females.
Possible outlier: Female who did not send any text messages yesterday, but spends many hours on the computer each week

Numerical Summaries:
Correlation Coefficient for Females, r = 0.80
Correlation Coefficient for Males, r = 0.95

[Figure: Combined Data scatter plot, horizontal axis Text_Messages_Sent_Yesterday (0 to 40), vertical axis 0 to 60, points marked by Gender (Female, Male), with two fitted lines:
Computer_Use_Hours = 0.891 Text_Messages_Sent_Yesterday + 8.4; r² = 0.64 (females)
Computer_Use_Hours = 1.62 Text_Messages_Sent_Yesterday - 3.6; r² = 0.90 (males)]

Page 67: Teaching Data Analysis as an Investigative Process with Census at School Rebecca Nichols and Martha Aliaga American Statistical Association

Statistical Investigation – Census at School Example
The relationship between text messaging and computer usage is positive, linear, and at least moderately strong for females and strong for males in this sample.

The correlation coefficient describing the strength and direction of the linear relationship between text messaging and computer usage for females is 0.80 (to calculate r = 0.80, take the square root of the given r² value 0.64 and give it the sign of the slope, which is positive).

The correlation coefficient describing the strength and direction of the linear relationship between text messaging and computer usage for males is 0.95 (to calculate r = 0.95, take the square root of the given r² value 0.90 and give it the sign of the slope, which is positive).

The slope of the line (0.89) for females indicates that for every increase of 1 text message sent yesterday, we would expect the estimated number of hours spent on the computer per week to increase by 0.89 hours, on average, for females.

The slope of the line (1.62) for males indicates that for every increase of 1 text message sent yesterday, we would expect the estimated number of hours spent on the computer per week to increase by 1.62 hours, on average, for males.
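One way to see how the relationship differs between the two groups is to evaluate both fitted lines at the same text counts. The Python sketch below uses the slopes and intercepts reported on the slides. The function name and the chosen text counts are only illustrative, and with few data points per group the comparison is descriptive rather than conclusive.

```python
# Fitted lines reported on the slides: (slope, intercept) for each group
lines = {
    "female": (0.891, 8.4),
    "male": (1.62, -3.6),
}

def predicted_hours_by_group(group, texts_sent):
    """Predicted weekly computer hours from the fitted line for one group."""
    slope, intercept = lines[group]
    return slope * texts_sent + intercept

for texts in (5, 20, 35):
    female_hours = predicted_hours_by_group("female", texts)
    male_hours = predicted_hours_by_group("male", texts)
    print(f"{texts:2d} texts: female {female_hours:.1f} h, male {male_hours:.1f} h")

# Text count at which the two lines give the same prediction
female_slope, female_intercept = lines["female"]
male_slope, male_intercept = lines["male"]
crossover = (female_intercept - male_intercept) / (male_slope - female_slope)
print(f"lines cross near {crossover:.1f} text messages")
```

Because the male line is steeper but starts lower, the predicted hours are higher for females at low text counts and higher for males at high text counts.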

Page 68: Teaching Data Analysis as an Investigative Process with Census at School Rebecca Nichols and Martha Aliaga American Statistical Association

Statistical Investigations - Census at School
Formulate questions of interest that can be answered with the Census at School data.
Collect/select appropriate Census at School data and write down the variable names and type for this investigation.
Analyze the data. Include appropriate graphs and numerical summaries for the corresponding variables.
Interpret the results and make appropriate conclusions in context. Be sure to justify your results using your graphics and summaries and relate your interpretation to the original question.

For a demonstration of this process and software resources (some free) to analyze the data, watch the Census at School webinars posted under Resources at www.amstat.org/censusatschool.

Page 69: Teaching Data Analysis as an Investigative Process with Census at School Rebecca Nichols and Martha Aliaga American Statistical Association

ASA K-12 Statistics Education Resources
Student Poster and Project Competitions: www.amstat.org/education/posterprojects
GAISE Pre-K – 12 Report: www.amstat.org/education/gaise
Statistics Teacher Network (STN) newsletter, ASA/NCTM Joint Committee (free): www.amstat.org/education/stn
Free Statistics Education Webinars: http://www.amstat.org/education/webinars
New Making Sense of Statistical Studies publication: http://www.amstat.org/education/msss/
Statistics Education Publications: http://www.amstat.org/education/publications.cfm
Other websites useful to teachers: www.amstat.org/education, visit the K-12 link
Information on careers in statistics: www.amstat.org/careers
Statistics Education Web (STEW) – Online peer-reviewed lesson plans for K-12 teachers: www.amstat.org/education/stew

www.amstat.org/education