The Practice of Statistics for Business and Economics Third Edition David S. Moore George P. McCabe Layth C. Alwan Bruce A. Craig William M. Duckworth © 2011 W.H. Freeman and Company


Examining Relationships Scatterplots

PSBE Chapter 2.1


© 2010 Pearson Education 3

Class Exercise – Real Estate, House Prices

[Figure: Scatter Plot of Price (in $1,000) versus Number of Bathrooms]


A scatterplot, which plots one quantitative variable against another, can be an effective display for data.

Scatterplots are the ideal way to picture associations between two quantitative variables.


Assigning Roles to Variables in Scatterplots

To make a scatterplot of two quantitative variables, assign one to the y-axis and the other to the x-axis.

Be sure to label the axes clearly, and indicate the scales of the axes with numbers.

Each variable has units, and these should appear with the display—usually near each axis.


Assigning Roles to Variables in Scatterplots

Each point is placed on a scatterplot at a position that corresponds to values of the two variables.

The point’s horizontal location is specified by its x-value, and its vertical location is specified by its y-value.

Together, these values are known as coordinates and written (x, y).


Assigning Roles to Variables in Scatterplots

One variable plays the role of the explanatory or predictor variable, while the other takes on the role of the response variable.

We place the explanatory variable on the x-axis and the response variable on the y-axis.

The x- and y-variables are sometimes referred to as the independent and dependent variables, respectively. In this class, use the terms explanatory or predictor variable (x) and the response variable (y).


Looking at Scatterplots – Diamond Prices

Carat  Price
0.33   1079
0.33   1079
0.39   1030
0.40   1150
0.41   1110
0.42   1210
0.42   1210
0.46   1570
0.47   2113
0.48   2147
0.51   1770
0.56   1720
0.61   2500
0.62   3116
0.63   3165
0.64   2600
0.70   3080
0.70   3390
0.71   3440
0.71   3530
0.71   4481
0.72   4562
0.75   5069
0.80   5847
0.83   4930

Which variable will be the explanatory variable and which will be the response variable?



Looking at Scatterplots

The direction of the association is important.

A pattern that runs from the upper left to the lower right is said to be negative.

A pattern running from the lower left to the upper right is called positive.

Looking at Scatterplots – Diamond Prices

Direction?

Positive
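A rough numerical check of direction, offered as a sketch rather than as part of the original slides: the sign of the summed products of deviations from the two means is positive when high values of x tend to go with high values of y. The carat and price values are the pairs read from the diamond table earlier in the deck (assuming four-digit prices); the function name `direction` is a hypothetical choice.

```python
from statistics import mean

# Carat weights and prices read from the diamond example in these slides.
carat = [0.33, 0.33, 0.39, 0.40, 0.41, 0.42, 0.42, 0.46, 0.47, 0.48,
         0.51, 0.56, 0.61, 0.62, 0.63, 0.64, 0.70, 0.70, 0.71, 0.71,
         0.71, 0.72, 0.75, 0.80, 0.83]
price = [1079, 1079, 1030, 1150, 1110, 1210, 1210, 1570, 2113, 2147,
         1770, 1720, 2500, 3116, 3165, 2600, 3080, 3390, 3440, 3530,
         4481, 4562, 5069, 5847, 4930]

def direction(x, y):
    """Classify the direction of an association from the sign of the
    summed cross-products of deviations from the means."""
    x_bar, y_bar = mean(x), mean(y)
    total = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "no direction"

print(direction(carat, price))
```

This only summarizes direction; form, strength, and outliers still have to be judged from the plot itself.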


Looking at Scatterplots

The second thing to look for in a scatterplot is its form.

If there is a straight-line relationship, it will appear as a cloud or swarm of points stretched out in a generally consistent, straight form. This is called linear form.

Sometimes the relationship curves gently, while still increasing or decreasing steadily; sometimes it curves sharply up then down.

Looking at Scatterplots – Diamond Prices

Form?

Linear


Looking at Scatterplots

The third feature to look for in a scatterplot is the strength of the relationship.

Do the points appear tightly clustered in a single stream or do the points seem to be so variable and spread out that we can barely discern any trend or pattern?

Looking at Scatterplots – Diamond Prices

Strength?

Moderately Strong


Looking at Scatterplots

Finally, always look for the unexpected.

An outlier is an unusual observation, standing away from the overall pattern of the scatterplot.

Looking at Scatterplots – Diamond Prices

Outliers?

No Outliers


Examining relationships

Most statistical studies involve more than one variable.

Questions: What individuals do the data describe?

What variables are present and how are they measured?

Are all of the variables quantitative?

Do some of the variables explain or even cause changes in other variables?


Looking at relationships

Start with a graph

Look for an overall pattern and deviations from the pattern

Use numerical descriptions of the data and overall pattern (if appropriate)


Explanatory and response variables

A response variable measures or records an outcome of a study. Also called dependent variable.

An explanatory variable explains changes in the response variable (also called independent variable).


Scatterplot

A scatterplot shows the relationship between two quantitative variables measured on the same individuals.

Typically, the explanatory or independent variable is plotted on the x axis, and the response or dependent variable is plotted on the y axis.

Each individual in the data appears as a point in the plot.


Scatterplot example

Botnet     Bots   Spams
Srizbi     315    60
Bobax      185    9
Rustock    150    30
Cutwail    125    16
Storm      85     3
Grum       50     2
Ozdok      35     10
Nucrypt    20     5
Wopla      20     0.06
Spamthru   10     0.035

Here, we have two quantitative variables for each of 10 botnets:

• Number of bots (thousands)

• Spams per day (billions)

We are interested in the relationship between the two variables: How is one affected by changes in the other one?
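To make the idea of "one point per individual" concrete, here is a minimal sketch that places the ten botnets on a character grid. Treating bots as the explanatory variable (x) and spams per day as the response (y) is an assumption made for the illustration; the function name `text_scatter` and the 40 by 12 grid size are hypothetical choices, and points that land in the same cell overlap.

```python
# Botnet data from the slide: (name, bots in thousands, spams per day in billions).
botnets = [
    ("Srizbi", 315, 60), ("Bobax", 185, 9), ("Rustock", 150, 30),
    ("Cutwail", 125, 16), ("Storm", 85, 3), ("Grum", 50, 2),
    ("Ozdok", 35, 10), ("Nucrypt", 20, 5), ("Wopla", 20, 0.06),
    ("Spamthru", 10, 0.035),
]

def text_scatter(points, width=40, height=12):
    """Draw (x, y) points on a character grid. x runs left to right and
    y runs bottom to top; points that share a cell overlap."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in points:
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # flip rows so large y is at the top
    return "\n".join("".join(line) for line in grid)

# Each botnet becomes one coordinate pair (x, y).
points = [(bots, spams) for _, bots, spams in botnets]
plot = text_scatter(points)
print(plot)
print("x: bots (thousands), y: spams per day (billions)")
```

A plotting library would normally be used instead; the grid version is only meant to show how each row of the table maps to one position.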



Scatterplots

Some plots don’t have clear explanatory and response variables.

Do calories explain sodium amounts?


Scatterplots

Some plots don’t have clear explanatory and response variables.

Does percent return on Treasury bills explain percent return on common stocks?


Interpreting scatterplots

After plotting two variables on a scatterplot, we describe the relationship by examining the form, direction, and strength of the association. We look for an overall pattern …

Form: linear, curved, clusters, no pattern

Direction: positive, negative, no direction

Strength: how closely the points fit the “form”

… and deviations from that pattern.

Outliers


Form and direction of an association

Linear

Nonlinear

No relationship


Positive association: High values of one variable tend to occur together with high values of the other variable.

Negative association: High values of one variable tend to occur together with low values of the other variable.


No relationship: X and Y vary independently. Knowing X tells you nothing about Y.


Strength of the association

The strength of the relationship between the two variables can be seen by how much variation, or scatter, there is around the main form.

With a strong relationship, you can get a pretty good estimate of y if you know x.

With a weak relationship, for any x you might get a wide range of y values.


Strength of the association

This is a weak relationship. For a particular state median household income, you can’t predict the state per capita income very well.


Strength of the association

This is a very strong relationship. The daily amount of gas consumed can be predicted quite accurately for a given temperature value.


Stronger association?

Two scatterplots of the same data.

The straight-line pattern in the lower plot appears stronger because of the surrounding open space.


How to scale a scatterplot

Using an inappropriate scale for a scatterplot can give an incorrect impression.

Both variables should be given a similar amount of space:

• Plot roughly square

• Points should occupy all the plot space (no blank space)
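One way to act on this advice, sketched under assumptions: set each axis to span the data range plus a small margin, so the points fill the plot rather than crowding into a corner. The helper name `axis_limits` and the 5% margin are hypothetical choices, not something the slides prescribe; the data are the botnet values used elsewhere in the deck.

```python
def axis_limits(values, margin=0.05):
    """Return (low, high) axis limits that cover the data range plus a
    small margin on each side."""
    low, high = min(values), max(values)
    pad = (high - low) * margin
    return low - pad, high + pad

bots = [315, 185, 150, 125, 85, 50, 35, 20, 20, 10]   # thousands
spams = [60, 9, 30, 16, 3, 2, 10, 5, 0.06, 0.035]     # billions per day

x_limits = axis_limits(bots)
y_limits = axis_limits(spams)
print("x-axis:", x_limits)
print("y-axis:", y_limits)
```

The margin can push a limit slightly below zero for count data; whether to clip it at zero is a presentation choice left open here.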


Outliers

An outlier is a data value that has a low probability of occurrence (i.e., it is unusual or unexpected).

In a scatterplot, outliers are points that fall outside of the overall pattern of the relationship.


Outliers

The upper-right-hand point here is not an outlier of the relationship: it is what you would expect for this number of bots given the linear relationship between spams per day and bots.

This point is not in line with the others, so it is an outlier of the relationship.

Not an outlier

Outlier


Outliers

IQ score and Grade Point Average

a) Describe in words what this plot shows.

b) Describe the direction, shape, and strength. Are there outliers?

c) What might explain these people?


Categorical variables in scatterplots

Often, things are not simple and one-dimensional. We need to group the data into categories to reveal trends.

What may look like a positive linear relationship is in fact a series of negative linear associations.

Plotting different habitats in different colors allows us to make that important distinction.


Categorical variables in scatterplots

Comparison of men’s and women’s racing records over time. Each group shows a very strong negative linear relationship that would not be apparent without the gender categorization.


Categorical variables in scatterplots

Relationship between lean body mass and metabolic rate in men and women. Both men and women follow the same positive linear trend, but women show a stronger association. As a group, males typically have larger values for both variables.


Categorical explanatory variables

When the explanatory variable is categorical, you cannot make a scatterplot, but you can compare the different categories side-by-side on the same graph (boxplots, or mean ± standard deviation).

Comparison of income (quantitative response variable) for different education levels (five categories).

But be careful in your interpretation: This is NOT a positive association because education is not quantitative.
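A minimal sketch of the side-by-side comparison, using the mean and standard deviation of the response within each category. The income figures and the three education levels below are hypothetical values invented for illustration; they are not the data behind the slide's graph, which uses five categories.

```python
from statistics import mean, stdev

# Hypothetical annual incomes (in $1,000) by education level; invented for illustration.
income_by_education = {
    "High school": [28, 35, 31, 40, 33],
    "Bachelor's": [45, 52, 60, 48, 55],
    "Graduate": [62, 75, 70, 90, 68],
}

# One (mean, standard deviation) pair per category, ready to show side by side.
summary = {level: (mean(values), stdev(values))
           for level, values in income_by_education.items()}

for level, (m, s) in summary.items():
    print(f"{level:<12} mean = {m:5.1f}  sd = {s:4.1f}")
```

The categories are compared as groups; no direction of association is computed, because the explanatory variable is not quantitative.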


Examining Relationships Correlation

PSBE Chapter 2.2


Understanding Correlation

Correlation Conditions

Correlation measures the strength of the linear association between two quantitative variables.


Objectives (PSBE Chapter 2.2)

Correlation

• The correlation coefficient “r”

• r does not distinguish between x and y

• r has no units of measurement

• r ranges from -1 to +1

• r is strongly affected by influential points and outliers


Understanding Correlation

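The ratio of the sum of the products $z_x z_y$ for every point in the scatterplot to n – 1 is called the correlation coefficient:

$$ r = \frac{\sum z_x z_y}{n - 1} $$

Two of the more common alternative formulas for correlation are:

$$ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}} = \frac{\sum (x - \bar{x})(y - \bar{y})}{(n - 1)\, s_x s_y} $$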
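The three forms of the formula can be checked against one another numerically. The sketch below implements each one directly, with standard deviations computed using the n – 1 divisor; the function names and the small data set are hypothetical and serve only to compare the results.

```python
from math import sqrt
from statistics import mean, stdev

def r_from_z_scores(x, y):
    """r = sum(z_x * z_y) / (n - 1), with z-scores based on the sample SD."""
    n = len(x)
    x_bar, y_bar, s_x, s_y = mean(x), mean(y), stdev(x), stdev(y)
    return sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
               for xi, yi in zip(x, y)) / (n - 1)

def r_from_sums_of_squares(x, y):
    """r = sum of cross-products / sqrt(product of the sums of squares)."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / sqrt(s_xx * s_yy)

def r_from_standard_deviations(x, y):
    """r = sum of cross-products / ((n - 1) * s_x * s_y)."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return s_xy / ((n - 1) * stdev(x) * stdev(y))

# Small made-up data set, used only to compare the three formulas.
x = [1, 2, 3, 4, 5]
y = [2.0, 4.1, 5.9, 8.2, 9.8]

r1 = r_from_z_scores(x, y)
r2 = r_from_sums_of_squares(x, y)
r3 = r_from_standard_deviations(x, y)
print(r1, r2, r3)
```

The three values agree up to floating-point rounding, which is what the algebra predicts: each form is the same ratio with the n – 1 factors arranged differently.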


The correlation coefficient “r”

Bots: $\bar{x} = 99.5$, $s_x = 96.9$

Spams per day: $\bar{y} = 13.51$, $s_y = 18.71$

$r = 0.885$
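As a check on these summary values, the following sketch recomputes the means, standard deviations, and r from the ten botnet observations listed earlier in the deck. It uses the standard-deviation form of the formula; the variable names are illustrative.

```python
from statistics import mean, stdev

# Botnet data from the slides.
bots = [315, 185, 150, 125, 85, 50, 35, 20, 20, 10]   # thousands of bots
spams = [60, 9, 30, 16, 3, 2, 10, 5, 0.06, 0.035]     # billions of spams per day

n = len(bots)
x_bar, y_bar = mean(bots), mean(spams)
s_x, s_y = stdev(bots), stdev(spams)   # sample SDs (n - 1 divisor)

# r = sum of cross-products of deviations / ((n - 1) * s_x * s_y)
r = sum((x - x_bar) * (y - y_bar)
        for x, y in zip(bots, spams)) / ((n - 1) * s_x * s_y)

print(f"bots:  mean = {x_bar:.1f}, sd = {s_x:.1f}")
print(f"spams: mean = {y_bar:.2f}, sd = {s_y:.2f}")
print(f"r = {r:.3f}")
```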


Understanding Correlation

Correlation Conditions

Before you use correlation, you must check three conditions:

• Quantitative Variables Condition: Correlation applies only to quantitative variables.

• Linearity Condition: Correlation measures the strength only of the linear association.

• Outlier Condition: Unusual observations can distort the correlation.


No matter how strong the association, r does not describe curved relationships.

Correlation only describes linear relationships
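A small constructed example, not taken from the slides, makes the point: when y is exactly x squared over a range of x values symmetric about zero, y is completely determined by x, yet the correlation is zero. The helper name `correlation` is a hypothetical choice.

```python
from math import sqrt
from statistics import mean

def correlation(x, y):
    """Pearson correlation from sums of squared deviations."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / sqrt(s_xx * s_yy)

# A perfect but curved relationship: y = x^2 on a symmetric range.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [xi ** 2 for xi in x]

r = correlation(x, y)
print(r)
```

The positive and negative cross-products cancel exactly here, so r reports no linear association even though the curved association is as strong as it can be.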


Understanding Correlation

Correlation Properties

• The sign of a correlation coefficient gives the direction of the association.

• Correlation is always between –1 and +1.

• Correlation measures the strength of the linear association between the two variables.

• Correlation treats x and y symmetrically.

• Correlation has no units.

• Correlation is not affected by changes in the center or scale of either variable.

• Correlation is sensitive to unusual observations.
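Several of these properties can be checked numerically. The sketch below, using the botnet data from earlier in the deck, swaps the variables, changes the units, and shifts the values; the helper name `correlation` and the particular constants are hypothetical choices.

```python
from math import sqrt
from statistics import mean

def correlation(x, y):
    """Pearson correlation from sums of squared deviations."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / sqrt(s_xx * s_yy)

bots = [315, 185, 150, 125, 85, 50, 35, 20, 20, 10]   # thousands
spams = [60, 9, 30, 16, 3, 2, 10, 5, 0.06, 0.035]     # billions per day

r = correlation(bots, spams)

# Symmetry: swapping x and y leaves r unchanged.
r_swapped = correlation(spams, bots)

# Scale: bots in single units and spams in millions instead.
r_rescaled = correlation([1000 * b for b in bots], [1000 * s for s in spams])

# Center: adding or subtracting a constant leaves r unchanged.
r_shifted = correlation([b + 50 for b in bots], [s - 2 for s in spams])

print(r, r_swapped, r_rescaled, r_shifted)
```

The scale factors used here are positive; multiplying one variable by a negative constant reverses the sign of r while leaving its magnitude unchanged.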


The correlation coefficient “r”

The correlation coefficient is a measure of the direction and strength of a linear relationship.

It is calculated using the mean and the standard deviation of both the x and y variables.

Correlation can only be used to describe quantitative variables. Categorical variables don’t have means and standard deviations.


Facts about correlation

r ignores the distinction between response and explanatory variables

r measures the strength and direction of a linear relationship between two quantitative variables

r is not affected by changes in the unit of measurement

Positive value of r means association between the two variables is positive

Negative value of r means association between the variables is negative

r is always between -1 and +1

r is strongly affected by outliers
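To see how much a single outlier can matter, the sketch below computes r for a nearly linear data set and then again after adding one point far from the pattern. All values are hypothetical and invented for illustration; the helper name `correlation` is likewise an assumption.

```python
from math import sqrt
from statistics import mean

def correlation(x, y):
    """Pearson correlation from sums of squared deviations."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / sqrt(s_xx * s_yy)

# Hypothetical data following a tight, positive straight-line pattern.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 6.0, 6.8, 8.1]
r_without = correlation(x, y)

# Add one observation that falls far from the pattern.
r_with = correlation(x + [9], y + [0.0])

print(f"without outlier: r = {r_without:.3f}")
print(f"with outlier:    r = {r_with:.3f}")
```

One added point out of nine is enough to turn a very strong correlation into a weak one, which is why the scatterplot should be inspected before r is reported.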


“r” ranges from -1 to +1

Strength: how closely the points follow a straight line.

Direction: is positive when individuals with higher X values tend to have higher values of Y.


Review example

Estimate r

1. r = 1.00

2. r = -0.94

3. r = 1.12

4. r = 0.94

5. r = 0.21


Understanding Correlation

Correlation Tables

Sometimes the correlations between each pair of variables in a data set are arranged in a table like the one below.
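A correlation table can be built by computing r for every pair of variables. The sketch below does this for three hypothetical housing variables; the variable names and values are invented for illustration and are not the data behind the slide's table.

```python
from math import sqrt
from statistics import mean

def correlation(x, y):
    """Pearson correlation from sums of squared deviations."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / sqrt(s_xx * s_yy)

# Hypothetical house data, invented for illustration only.
data = {
    "price": [210, 250, 320, 180, 400, 290],      # $1,000
    "area": [1400, 1600, 2100, 1200, 2600, 1900],  # square feet
    "age": [30, 22, 10, 41, 5, 15],                # years
}

names = list(data)
table = {a: {b: correlation(data[a], data[b]) for b in names} for a in names}

# Print the table with one row and one column per variable.
print(" " * 8 + "".join(f"{name:>8}" for name in names))
for a in names:
    print(f"{a:<8}" + "".join(f"{table[a][b]:8.3f}" for b in names))
```

The diagonal entries are 1 because each variable is perfectly correlated with itself, and the table is symmetric because correlation treats x and y symmetrically.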


Lurking Variables and Causation

There is no way to conclude from a high correlation alone that one variable causes the other.

There’s always the possibility that some third variable—a lurking variable—is simultaneously affecting both of the variables you have observed.
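A small simulation, offered as a sketch under stated assumptions, shows how a lurking variable can produce a high correlation between two variables that do not influence each other. Here a hypothetical variable z drives both x and y; the coefficients, noise levels, sample size, and seed are arbitrary illustrative choices.

```python
import random
from math import sqrt
from statistics import mean

def correlation(x, y):
    """Pearson correlation from sums of squared deviations."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / sqrt(s_xx * s_yy)

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical lurking variable z affects both x and y.
z = [rng.uniform(0, 50) for _ in range(200)]

# x and y each depend on z plus independent noise; neither uses the other.
x = [2 * zi + rng.gauss(0, 5) for zi in z]
y = [3 * zi + rng.gauss(0, 5) for zi in z]

r_xy = correlation(x, y)
print(f"r between x and y: {r_xy:.3f}")
```

The correlation between x and y is high even though, by construction, changing x would do nothing to y; the association comes entirely from z.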


What Can Go Wrong?

• Don’t say “correlation” when you mean “association.”

• Don’t correlate categorical variables.

• Make sure the association is linear.

• Beware of outliers.

• Don’t confuse correlation with causation.

• Watch out for lurking variables.


What Have We Learned?

• Begin our investigation by looking at a scatterplot.

• The sign of the correlation tells us the direction of the association.

• The magnitude of the correlation tells us of the strength of a linear association.

• Correlation has no units, so shifting or scaling the data, standardizing, or even swapping the variables has no effect on the numerical value.


What Have We Learned?

To use correlation we have to check certain conditions for the analysis to be valid:

• Check the Linearity Condition.

• Watch out for unusual observations.

We’ve learned not to make the mistake of assuming that a high correlation or strong association is evidence of a cause-and-effect relationship.