24
Section 10.5-1 Copyright © 2014, 2012, 2010 Pearson Education, Inc. Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series by Mario F. Triola

Section 10.5-1 Copyright © 2014, 2012, 2010 Pearson Education, Inc. Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series

Embed Size (px)

Citation preview

Page 1: Section 10.5-1 Copyright © 2014, 2012, 2010 Pearson Education, Inc. Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series

Section 10.5-1Copyright © 2014, 2012, 2010 Pearson Education, Inc.

Lecture Slides

Elementary Statistics Twelfth Edition

and the Triola Statistics Series

by Mario F. Triola

Page 2: Section 10.5-1 Copyright © 2014, 2012, 2010 Pearson Education, Inc. Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series

Section 10.5-2Copyright © 2014, 2012, 2010 Pearson Education, Inc.

Chapter 10Correlation and Regression

10-1 Review and Preview

10-2 Correlation

10-3 Regression

10-4 Prediction Intervals and Variation

10-5 Multiple Regression

10-6 Nonlinear Regression

Page 3: Section 10.5-1 Copyright © 2014, 2012, 2010 Pearson Education, Inc. Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series

Section 10.5-3Copyright © 2014, 2012, 2010 Pearson Education, Inc.

Key Concept

This section presents a method for analyzing a linear relationship involving more than two variables.

We focus on these key elements:

1. Finding the multiple regression equation.

2. The values of the adjusted R2, and the P-value as measures of how well the multiple regression

equation fits the sample data.

Page 4: Section 10.5-1 Copyright © 2014, 2012, 2010 Pearson Education, Inc. Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series

Section 10.5-4Copyright © 2014, 2012, 2010 Pearson Education, Inc.

Part 1: Basic Concepts of a Multiple Regression Equation

Page 5: Section 10.5-1 Copyright © 2014, 2012, 2010 Pearson Education, Inc. Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series

Section 10.5-5Copyright © 2014, 2012, 2010 Pearson Education, Inc.

Definition

A multiple regression equation expresses a linear relationship between a response variable y and two or more predictor variables

The general form of the multiple regression equation obtained from sample data is

1 2 3( , , ..., )kx x x x

0 1 1 2 2ˆ ... k ky b b x b x b x

Page 6: Section 10.5-1 Copyright © 2014, 2012, 2010 Pearson Education, Inc. Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series

Section 10.5-6Copyright © 2014, 2012, 2010 Pearson Education, Inc.

(General form of the multiple regression equation)

n = sample size

k = number of predictor variables

ŷ = predicted value of y

are the predictor variables

Notation

0 1 1 2 2ˆ ... k ky b b x b x b x

1 2 3( , , ..., )kx x x x

Page 7: Section 10.5-1 Copyright © 2014, 2012, 2010 Pearson Education, Inc. Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series

Section 10.5-7Copyright © 2014, 2012, 2010 Pearson Education, Inc.

TechnologyUse a statistical software package such as

STATDISK

Minitab

Excel

TI-83/84

StatCrunch

Page 8: Section 10.5-1 Copyright © 2014, 2012, 2010 Pearson Education, Inc. Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series

Section 10.5-8Copyright © 2014, 2012, 2010 Pearson Education, Inc.

Example

Table 10-4 includes a random sample of heights of mothers, fathers, and their daughters (based on data from the National Health and Nutrition Examination).

Find the multiple regression equation in which the response (y) variable is the height of a daughter and the predictor (x) variables are the height of the mother and height of the father.

Page 9: Section 10.5-1 Copyright © 2014, 2012, 2010 Pearson Education, Inc. Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series

Section 10.5-9Copyright © 2014, 2012, 2010 Pearson Education, Inc.

ExampleThe Minitab results are shown here:

Page 10: Section 10.5-1 Copyright © 2014, 2012, 2010 Pearson Education, Inc. Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series

Section 10.5-10Copyright © 2014, 2012, 2010 Pearson Education, Inc.

Definition

The multiple coefficient of determination, R2, is a measure of how well the multiple regression equation fits the sample data. A perfect fit would result in R2 = 1.

The adjusted coefficient of determination, R2, is the multiple coefficient of determination modified to account for the number of variables and the sample size.

Page 11: Section 10.5-1 Copyright © 2014, 2012, 2010 Pearson Education, Inc. Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series

Section 10.5-11Copyright © 2014, 2012, 2010 Pearson Education, Inc.

Adjusted Coefficient of Determination

Adjusted

where n = sample size k = number of predictor (x) variables

2 2( 1)1 (1 )[ ( 1)]

nR R

n k

Page 12: Section 10.5-1 Copyright © 2014, 2012, 2010 Pearson Education, Inc. Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series

Section 10.5-12Copyright © 2014, 2012, 2010 Pearson Education, Inc.

Example

The preceding technology display shows the adjusted coefficient of determination as R-Sq(adj) = 63.7%.

When we compare this multiple regression equation to others, it is better to use the adjusted R2 of 63.7%

Page 13: Section 10.5-1 Copyright © 2014, 2012, 2010 Pearson Education, Inc. Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series

Section 10.5-13Copyright © 2014, 2012, 2010 Pearson Education, Inc.

P-Value

The P-value is a measure of the overall significance of the multiple regression equation.

The displayed technology P-value of 0.000 is small, indicating that the multiple regression equation has good overall significance and is usable for predictions.

That is, it makes sense to predict the heights of daughters based on heights of mothers and fathers.

The value of 0.000 results from a test of the null hypothesis that β1 = β2 = 0, and rejection of this hypothesis indicates the equation is effective in predicting the heights of daughters.

Page 14: Section 10.5-1 Copyright © 2014, 2012, 2010 Pearson Education, Inc. Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series

Section 10.5-14Copyright © 2014, 2012, 2010 Pearson Education, Inc.

Finding the Best MultipleRegression Equation

1. Use common sense and practical considerations to include or exclude variables.

2. Consider the P-value. Select an equation having overall significance, as determined by the P-value found in the computer display.

Page 15: Section 10.5-1 Copyright © 2014, 2012, 2010 Pearson Education, Inc. Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series

Section 10.5-15Copyright © 2014, 2012, 2010 Pearson Education, Inc.

Finding the Best MultipleRegression Equation

3. Consider equations with high values of adjusted R2 and try to include only a few variables.

• If an additional predictor variable is included, the value of adjusted R2 does not increase by a substantial amount.

• For a given number of predictor (x) variables, select the equation with the largest value of adjusted R2.

• In weeding out predictor (x) variables that don’t have much of an effect on the response (y) variable, it might be helpful to find the linear correlation coefficient r for each of the paired variables being considered.

Page 16: Section 10.5-1 Copyright © 2014, 2012, 2010 Pearson Education, Inc. Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series

Section 10.5-16Copyright © 2014, 2012, 2010 Pearson Education, Inc.

Example

Data Set 2 in Appendix B includes the age, foot length, shoe print length, shoe size, and height for each of 40 different subjects.

Using those sample data, find the regression equation that is the best for predicting height.

The table on the next slide includes key results from the combinations of the five predictor variables.

Page 17: Section 10.5-1 Copyright © 2014, 2012, 2010 Pearson Education, Inc. Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series

Section 10.5-17Copyright © 2014, 2012, 2010 Pearson Education, Inc.

Example - Continued

Page 18: Section 10.5-1 Copyright © 2014, 2012, 2010 Pearson Education, Inc. Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series

Section 10.5-18Copyright © 2014, 2012, 2010 Pearson Education, Inc.

Example - Continued

Using critical thinking and statistical analysis:

1.Delete the variable age.

2.Delete the variable shoe size, because it is really a rounded form of foot length.

3.For the remaining variables of foot length and shoe print length, select foot length because its adjusted R2 of 0.7014 is greater than 0.6520 for shoe print length.

4.Although it appears that only foot length is best, we note that criminals usually wear shoes, so shoe print lengths are likely to be found than foot lengths.

Page 19: Section 10.5-1 Copyright © 2014, 2012, 2010 Pearson Education, Inc. Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series

Section 10.5-19Copyright © 2014, 2012, 2010 Pearson Education, Inc.

Part 2: Dummy Variables and Logistic Equations

Page 20: Section 10.5-1 Copyright © 2014, 2012, 2010 Pearson Education, Inc. Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series

Section 10.5-20Copyright © 2014, 2012, 2010 Pearson Education, Inc.

Dummy Variable

Many applications involve a dichotomous variable which has only two possible discrete values (such as male/female, dead/alive, etc.).

A common procedure is to represent the two possible discrete values by 0 and 1, where 0 represents “failure” and 1 represents “success”.

A dichotomous variable with the two values 0 and 1 is called a dummy variable.

Page 21: Section 10.5-1 Copyright © 2014, 2012, 2010 Pearson Education, Inc. Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series

Section 10.5-21Copyright © 2014, 2012, 2010 Pearson Education, Inc.

Example

The data in the table also includes the dummy variable of sex (coded as 0 = female and 1 = male).

Given that a mother is 63 inches tall and a father is 69 inches tall, find the regression equation and use it to predict the height of a daughter and a son.

Page 22: Section 10.5-1 Copyright © 2014, 2012, 2010 Pearson Education, Inc. Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series

Section 10.5-22Copyright © 2014, 2012, 2010 Pearson Education, Inc.

Example

Page 23: Section 10.5-1 Copyright © 2014, 2012, 2010 Pearson Education, Inc. Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series

Section 10.5-23Copyright © 2014, 2012, 2010 Pearson Education, Inc.

Example - Continued

Using technology, we get the regression equation:

We substitute in 0 for the height, 63 for the mother, and 69 for the father, and predict the daughter will be 62.8 inches tall.

We substitute in 1 for the height, 63 for the mother, and 69 for the father, and predict the son will be 67 inches tall.

Page 24: Section 10.5-1 Copyright © 2014, 2012, 2010 Pearson Education, Inc. Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series

Section 10.5-24Copyright © 2014, 2012, 2010 Pearson Education, Inc.

Logistic Regression

We can use the methods of this section if the dummy variable is the predictor variable.

If the dummy variable is the response variable we need to use a method known as logistic regression.

As the name implies logistic regression involves the use of natural logarithms.

This textbook does not include detailed procedures for using logistic regression (there is a brief example on page 546 of the text).