
SCATTERPLOTS, ASSOCIATION, AND CORRELATION CHAP 6


AP Statistics

The invalid assumption that correlation implies cause is probably among the two or three most serious and common errors of human reasoning.

Stephen Jay Gould (1941 - 2002)

Relationship between 2 Quantitative Variables

In the mid-20th century, Dr. Mildred Trotter, a forensic anthropologist, determined relationships between the dimensions of various bones and a person’s height. The relationships she found are still in use today in the effort to identify missing persons based on skeletal remains (think C.S.I.).

[Figure: the femur]

One relationship compares the length of the femur to the person’s height.

Measure the length of your femur along the outside of your leg (in inches). Also measure your height (also in inches) if you don’t already know it. Record both at the front of the room.

Draw a Picture, Draw a Picture, Draw a Picture …

To display the relationship between two quantitative variables we will use a graphical display known as a scatterplot.

For the femur-height data we just collected, plot the femur length on the horizontal axis and height on the vertical axis (more on which variable goes on which axis later in the lesson).

Make sure to label the axes and provide a scale for each axis. Note: you do not need to (and often should not) include the origin in your plot.

Describe the Association between Variables

When we described the distribution of a single quantitative variable (univariate data) we described the shape, center, spread, and unusual features.

When we describe the association between two quantitative variables (bivariate data), we describe:

1. Direction – positive association or negative association

2. Form – linear, curved, or no particular pattern

3. Strength – amount of scatter around the form

4. Unusual features – outliers or subgroups of data

Describe the association between femur length and height. Don’t forget the W’s (or at least the Who and What) in the description.

TI Tips (p. 154)

This TI Tips shows one way to create a list with a name more meaningful than L1, L2, etc.

It also has instructions on making a scatterplot.

Roles for Variables

• Do heavier smokers develop lung cancer at younger ages (than light or non-smokers)?

• Is birth order an important factor in predicting future income?

• Can we estimate a person’s % body fat more simply by just measuring waist or wrist size?

Notice in each situation one variable plays the role of “predictor variable” (aka “explanatory variable”), while the other variable is the “response variable.”

[Figure: axes labeled "explanatory" and "response"]

Correlation

Correlation is a statistic that measures the strength and direction of a linear association between two quantitative variables.

[Figure: scatterplot of femur length (mm) and height (cm). Data from the Forensic Anthropology Data Bank (Tennessee).]

Standardize both variables (find their z-scores):

$$z_x = \frac{x - \bar{x}}{s_x}, \qquad z_y = \frac{y - \bar{y}}{s_y}$$
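A minimal Python sketch of the standardizing step. It assumes the z-scores use the sample standard deviation (the one that divides by n - 1). The femur and height numbers are made up for illustration and are not the Data Bank values; the helper name `z_scores` is also hypothetical.

```python
from statistics import mean, stdev

def z_scores(values):
    """Return the z-score of each value: (value - mean) / sample std. dev."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (divides by n - 1)
    return [(v - m) / s for v in values]

# Hypothetical measurements, invented for this example only.
femur_mm = [420, 435, 450, 465, 480, 500]
height_cm = [158, 163, 168, 171, 177, 183]

z_x = z_scores(femur_mm)
z_y = z_scores(height_cm)

# Each pair is now unit-free: mm and cm have been divided away.
for zx, zy in zip(z_x, z_y):
    print(f"z_x = {zx:6.2f}   z_y = {zy:6.2f}")
```

After standardizing, each list has mean 0 and standard deviation 1, which is what puts the two variables on a common scale.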

Note that standardizing puts both variables on the same scale, a scale with no units (rather than mm and cm). Also note that the direction, form, and strength are the same as in the original plot.

[Figure: side-by-side scatterplots of the standardized variables and the original variables]

Correlation is calculated as:

$$r = \frac{\sum z_x z_y}{n - 1}$$

Note: The numerator is the sum of a bunch of products. Recall the product is positive if both numbers are positive or if both are negative.

Notice that most of the standardized points are in the first or third quadrants where both coordinates are positive or negative, hence a positive product.

So the sum of all of the products will be positive, hence the correlation is positive.
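The calculation can be sketched in a few lines of Python, assuming the formula r = (sum of z_x times z_y) / (n - 1) with sample standard deviations. The data are made up for illustration, and the function name `correlation` is hypothetical.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation as the sum of z-score products divided by (n - 1)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys)]
    return sum(products) / (n - 1)

# Hypothetical measurements, invented for this example only.
femur_mm = [420, 435, 450, 465, 480, 500]
height_cm = [158, 163, 168, 171, 177, 183]

r = correlation(femur_mm, height_cm)
print(f"r = {r:.3f}")
```

In this made-up data set every point lies in the first or third quadrant of the standardized plot, so every product is positive and r comes out positive.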


For correlation to be appropriate, you must have:

• Two quantitative variables

• Linear association between the variables

See TI Tips on p. 158 for calculator instructions on calculating correlation.

Correlation Properties (see p. 158)

• The sign of a correlation gives the direction.

• Correlation is always between -1 and +1, inclusive.

• It doesn’t matter which variable is x and which is y – correlation is the same either way.

• Correlation has no units.

• Changing scale doesn’t have an effect on correlation.

• Correlation only measures linear relationships.

• If association is linear, correlation tells strength and direction.

• Correlation is sensitive to outliers.
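A few of these properties can be checked numerically. The sketch below uses made-up data and a hypothetical helper `correlation`; it checks that swapping x and y, or changing units, leaves r unchanged.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation as the sum of z-score products divided by (n - 1)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical measurements, invented for this example only.
femur_mm = [420, 435, 450, 465, 480, 500]
height_cm = [158, 163, 168, 171, 177, 183]

r = correlation(femur_mm, height_cm)

# Swap the roles of x and y.
r_swapped = correlation(height_cm, femur_mm)

# Change units: mm to inches for femur, cm to inches for height.
femur_in = [x / 25.4 for x in femur_mm]
height_in = [y / 2.54 for y in height_cm]
r_rescaled = correlation(femur_in, height_in)

print(f"original: {r:.4f}  swapped: {r_swapped:.4f}  rescaled: {r_rescaled:.4f}")
```

All three values agree up to floating-point rounding. Multiplying one variable by a negative number would flip the sign of r but not its size.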

Correlation Properties (see p. 152)

How strong is strong?

It depends.

In the Natural Sciences (Chem., Phys., Engr., etc.):

• Strong 0.9 or more (-0.9 or “more”)

• Moderate 0.7 to 0.9 (-0.7 to -0.9)

• Weak 0 to 0.7 (0 to -0.7)

In the Social Sciences (Sociology, Psychology, Education, etc.):

• Strong 0.5 or more (-0.5 or “more”)

• Moderate 0.25 to 0.5 (-0.25 to -0.5)

• Weak 0 to 0.25 (0 to -0.25)

Show a scatterplot and use whatever adjectives you want!

Straightening Scatterplots

Not all scatterplots show a linear pattern. But many statistical analysis techniques require a linear pattern.

So, we straighten up our act …


Camera lenses have an adjustable opening (aperture) whose size is referred to as the f/stop. Changing the aperture alters the amount of light let in through the lens.

Here are the optimal aperture settings for different shutter speeds (in seconds):

Shutter speed (s)   1/1000   1/500   1/250   1/125   1/60   1/30   1/15   1/8
f/stop              2.8      4       5.6     8       11     16     22     32

Plot f/stop as the response variable (y-axis) against shutter speed and notice the non-linear pattern.

Then transform f/stop by squaring, and create a new plot. Notice the linear association between (f/stop)^2 and shutter speed. See TI Tips p. 162.
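As a rough numerical check of the straightening idea, the sketch below computes the correlation between shutter speed and f/stop, and then between shutter speed and (f/stop)^2, using the eight pairs listed for the camera example. The helper `correlation` and the variable names are hypothetical, and a correlation is no substitute for looking at the two plots.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation as the sum of z-score products divided by (n - 1)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)

# Shutter speeds (seconds) and f/stops from the camera example.
speeds = [1/1000, 1/500, 1/250, 1/125, 1/60, 1/30, 1/15, 1/8]
f_stops = [2.8, 4, 5.6, 8, 11, 16, 22, 32]

# Re-express the response by squaring it.
f_stops_squared = [f ** 2 for f in f_stops]

r_raw = correlation(speeds, f_stops)
r_straightened = correlation(speeds, f_stops_squared)

print(f"speed vs f/stop:       r = {r_raw:.3f}")
print(f"speed vs (f/stop)^2:   r = {r_straightened:.3f}")
```

The second correlation is closer to 1, consistent with the squared values falling nearly on a line. The first is still fairly high even though the pattern is curved, which is a reminder to check the scatterplot before trusting r.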

Correlation ≠ Causation

Data from Oldenburg, Germany (beginning in the 1930s)

There is a roughly linear association between the number of storks and the human population (storks bring babies, right?).

Turns out that storks like chimneys – more people, more chimneys, more storks.

Beware the Lurking Variable

A lurking variable is a hidden variable that actually affects the observed relationship between two other variables.

The Japanese eat very little fat and suffer fewer heart attacks than the Americans.

[Diagram: fat in diet and heart attacks]

The Mexicans eat a lot of fat and suffer fewer heart attacks than the Americans.

[Diagram: fat in diet and heart attacks]

The Japanese drink very little red wine and suffer fewer heart attacks than the Americans.

[Diagram: red wine in diet and heart attacks]

The French drink excessive amounts of red wine and suffer fewer heart attacks than the Americans.

[Diagram: red wine in diet and heart attacks]

The Germans drink a lot of beer and eat lots of sausages and fats and suffer fewer heart attacks than the Americans.

[Diagram: beer and fat in diet and heart attacks]


CONCLUSION: Eat and drink what you like. Speaking English is apparently what kills you!

Beware of Outliers

Note how the outlier creates the impression of a stronger linear association between the variables than is probably realistic.
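The effect can be illustrated with a small made-up data set: five points with almost no linear association, plus one extreme point far from the rest. The numbers and the helper `correlation` are hypothetical and chosen only to show the effect.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation as the sum of z-score products divided by (n - 1)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)

# Five invented points with very little linear pattern.
x = [1, 2, 3, 4, 5]
y = [3, 1, 4, 2, 3]
r_without = correlation(x, y)

# Add one invented outlier far from the cloud in both directions.
x_with = x + [15]
y_with = y + [12]
r_with = correlation(x_with, y_with)

print(f"without outlier: r = {r_without:.2f}")
print(f"with outlier:    r = {r_with:.2f}")
```

A single point moves the correlation from weak to strong here, which is why the scatterplot should be checked before the correlation is reported.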

Assignment

Read Chapter 6

Do Ch 6 exercises #1-15 odd, 19, 25-31 odd, 35, 39