
Chapter 3 ~ Descriptive Analysis & Presentation of Bivariate Data


Page 1: Chapter 3 ~ Descriptive Analysis & Presentation of Bivariate Data

[Figure: Regression plot with axes Weight and Height; fitted line Y = 2.31464 + 1.28722X, r = 0.559]

Page 2: Chapter 3 ~ Descriptive Analysis & Presentation of Bivariate Data


Chapter Goals

• To be able to present bivariate data in tabular and graphic form

• To gain an understanding of the distinction between the basic purposes of correlation analysis and regression analysis

• To become familiar with the ideas of descriptive presentation

Page 3: Chapter 3 ~ Descriptive Analysis & Presentation of Bivariate Data

3.1 ~ Bivariate Data

Bivariate Data: Consists of the values of two different response variables that are obtained from the same population of interest

Three combinations of variable types:

1. Both variables are qualitative (attribute)

2. One variable is qualitative (attribute) and the other is quantitative (numerical)

3. Both variables are quantitative (both numerical)

Page 4: Chapter 3 ~ Descriptive Analysis & Presentation of Bivariate Data

Two Qualitative Variables

• When bivariate data results from two qualitative (attribute or categorical) variables, the data is often arranged in a cross-tabulation or contingency table

Example: A survey was conducted to investigate the relationship between preferences for television, radio, or newspaper for national news, and gender. The results are given in the table below:

          TV    Radio   NP
Male      280   175     305
Female    115   275     170

Page 5: Chapter 3 ~ Descriptive Analysis & Presentation of Bivariate Data

Marginal Totals

• This table may be extended to display the marginal totals (or marginals). The total of the marginal totals is the grand total:

              TV    Radio   NP    Row Totals
Male          280   175     305   760
Female        115   275     170   560
Col. Totals   395   450     475   1320

Note: Contingency tables often show percentages (relative frequencies). These percentages are based on the entire sample or on the subsample (row or column) classifications.
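The marginal totals can be checked with a few lines of Python. This is a minimal sketch using the counts from the survey table; the nested-dictionary layout and the function name `marginal_totals` are illustrative choices, not part of the original slides.

```python
# Observed counts from the survey table (rows: gender, columns: news source).
counts = {
    "Male": {"TV": 280, "Radio": 175, "NP": 305},
    "Female": {"TV": 115, "Radio": 275, "NP": 170},
}


def marginal_totals(table):
    """Return (row_totals, column_totals, grand_total) for a nested dict of counts."""
    row_totals = {row: sum(cells.values()) for row, cells in table.items()}
    column_totals = {}
    for cells in table.values():
        for column, count in cells.items():
            column_totals[column] = column_totals.get(column, 0) + count
    # The grand total is the sum of either set of marginal totals.
    grand_total = sum(row_totals.values())
    return row_totals, column_totals, grand_total


row_totals, column_totals, grand_total = marginal_totals(counts)
print(row_totals)     # {'Male': 760, 'Female': 560}
print(column_totals)  # {'TV': 395, 'Radio': 450, 'NP': 475}
print(grand_total)    # 1320
```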

Page 6: Chapter 3 ~ Descriptive Analysis & Presentation of Bivariate Data

Percentages Based on the Grand Total (Entire Sample)

• The previous contingency table may be converted to percentages of the grand total by dividing each frequency by the grand total and multiplying by 100

– For example, 175 becomes 13.3%:

$$ \frac{175}{1320} \times 100 \approx 13.3 $$

              TV     Radio   NP     Row Totals
Male          21.2   13.3    23.1   57.6
Female        8.7    20.8    12.9   42.4
Col. Totals   29.9   34.1    36.0   100.0

Page 7: Chapter 3 ~ Descriptive Analysis & Presentation of Bivariate Data

Illustration

• These same statistics (numerical values describing sample results) can be shown in a (side-by-side) bar graph:

[Figure: Side-by-side bar graph, "Percentages Based on Grand Total": Percent (vertical axis, 0 to 25) by Media (TV, Radio, NP), with separate bars for Male and Female]

Page 8: Chapter 3 ~ Descriptive Analysis & Presentation of Bivariate Data

Percentages Based on Row (Column) Totals

• The entries in a contingency table may also be expressed as percentages of the row (column) totals by dividing each row (column) entry by that row’s (column’s) total and multiplying by 100. The entries in the contingency table below are expressed as percentages of the column totals:

              TV      Radio   NP
Male          70.9    38.9    64.2
Female        29.1    61.1    35.8
Col. Totals   100.0   100.0   100.0

Note: These statistics may also be displayed in a side-by-side bar graph
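The three percentage bases (grand total, row totals, column totals) differ only in the denominator. The sketch below makes that explicit; the function name `to_percentages` and its `basis` parameter are hypothetical names introduced here for illustration.

```python
# Observed counts from the survey table (rows: gender, columns: news source).
counts = {
    "Male": {"TV": 280, "Radio": 175, "NP": 305},
    "Female": {"TV": 115, "Radio": 275, "NP": 170},
}


def to_percentages(table, basis="total"):
    """Convert counts to percentages of the grand total, row totals, or column totals."""
    columns = list(next(iter(table.values())))
    row_totals = {r: sum(cells.values()) for r, cells in table.items()}
    column_totals = {c: sum(cells[c] for cells in table.values()) for c in columns}
    grand_total = sum(row_totals.values())

    def denominator(row, column):
        # Pick the total that matches the requested basis.
        if basis == "row":
            return row_totals[row]
        if basis == "column":
            return column_totals[column]
        if basis == "total":
            return grand_total
        raise ValueError(f"unknown basis: {basis!r}")

    return {
        r: {c: round(100 * cells[c] / denominator(r, c), 1) for c in columns}
        for r, cells in table.items()
    }


by_total = to_percentages(counts, basis="total")
by_column = to_percentages(counts, basis="column")
by_row = to_percentages(counts, basis="row")
print(by_total["Male"]["Radio"])   # 13.3
print(by_column["Male"]["TV"])     # 70.9
print(by_row["Male"]["TV"])        # 36.8
```

Which basis to use depends on the comparison of interest: column percentages compare genders within each news source, while row percentages compare news sources within each gender.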

Page 9: Chapter 3 ~ Descriptive Analysis & Presentation of Bivariate Data

One Qualitative & One Quantitative Variable

1. When bivariate data results from one qualitative and one quantitative variable, the quantitative values are viewed as separate samples

2. Each set is identified by levels of the qualitative variable

3. Each sample is described using summary statistics, and the results are displayed for side-by-side comparison

4. Statistics for comparison: measures of central tendency, measures of variation, 5-number summary

5. Graphs for comparison: dotplot, boxplot
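The side-by-side comparison described in the list above can be sketched as follows. The bills are hypothetical values made up for illustration (they are not the data from the slides), and the quartile convention used by `statistics.quantiles(..., method="inclusive")` is an assumption that may differ slightly from the textbook's rule.

```python
import statistics

# Hypothetical June electric bills (dollars) by region; NOT the values from the slides.
bills = {
    "Northeast": [23.75, 33.50, 42.25, 48.00, 55.50, 61.00, 64.25],
    "Midwest": [34.38, 34.35, 39.15, 42.55, 33.65, 37.79, 36.55],
    "West": [54.54, 60.78, 56.00, 63.26, 47.32, 59.61, 65.60],
}


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max); the quartile rule is an assumption."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (min(values), q1, median, q3, max(values))


# One line of summary statistics per level of the qualitative variable.
for region, values in bills.items():
    low, q1, median, q3, high = five_number_summary(values)
    print(f"{region:<10} mean={statistics.mean(values):6.2f} "
          f"sd={statistics.stdev(values):5.2f} "
          f"five-number=({low}, {q1}, {median}, {q3}, {high})")
```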

Page 10: Chapter 3 ~ Descriptive Analysis & Presentation of Bivariate Data

Example

A random sample of households from three different parts of the country was obtained, and each household's electric bill for June was recorded. The data is given in the table below:

• The part of the country is a qualitative variable with three levels of response. The electric bill is a quantitative variable. The electric bills may be compared with numerical and graphical techniques.

Page 11: Chapter 3 ~ Descriptive Analysis & Presentation of Bivariate Data

Comparison Using Dotplots

[Figure: Dotplots of the electric bills for the Northeast, Midwest, and West on a common scale from 24.0 to 64.0]

• The electric bills in the Northeast tend to be more spread out than those in the Midwest. The bills in the West tend to be higher than both those in the Northeast and Midwest.

Page 12: Chapter 3 ~ Descriptive Analysis & Presentation of Bivariate Data

Comparison Using Box-and-Whisker Plots

[Figure: Box-and-whisker plots, "The Monthly Electric Bill": Electric Bill (vertical axis, 20 to 70) for the Northeast, Midwest, and West]

Page 13: Chapter 3 ~ Descriptive Analysis & Presentation of Bivariate Data

Two Quantitative Variables

1. Expressed as ordered pairs: (x, y)

2. x: input variable, independent variable
   y: output variable, dependent variable

Scatter Diagram: A plot of all the ordered pairs of bivariate data on a coordinate axis system. The input variable x is plotted on the horizontal axis, and the output variable y is plotted on the vertical axis.

Note: Use scales so that the range of the y-values is equal to or slightly less than the range of the x-values. This creates a window that is approximately square.

Page 14: Chapter 3 ~ Descriptive Analysis & Presentation of Bivariate Data

Example

In a study involving children’s fear related to being hospitalized, the age and the score each child made on the Child Medical Fear Scale (CMFS) are given in the table below:

Age (x)     8   9   9  10  11   9   8   9   8  11
CMFS (y)   31  25  40  27  35  29  25  34  44  19

Age (x)     7   6   6   8   9  12  15  13  10  10
CMFS (y)   28  47  42  37  35  16  12  23  26  36

Construct a scatter diagram for this data

Page 15: Chapter 3 ~ Descriptive Analysis & Presentation of Bivariate Data

Solution

• age = input variable, CMFS = output variable

[Figure: Scatter diagram, "Child Medical Fear Scale": CMFS (vertical axis, 10 to 50) against Age (horizontal axis, 6 to 15)]

Page 16: Chapter 3 ~ Descriptive Analysis & Presentation of Bivariate Data


3.2 ~ Linear Correlation

• Measures the strength of a linear relationship between two variables

– As x increases, no definite shift in y: no correlation

– As x increases, a definite shift in y: correlation

– Positive correlation: x increases, y increases

– Negative correlation: x increases, y decreases

– If the ordered pairs follow a straight-line path: linear correlation

Page 17: Chapter 3 ~ Descriptive Analysis & Presentation of Bivariate Data

Example: No Correlation

• As x increases, there is no definite shift in y:

[Figure: Scatter diagram with no correlation: Output (vertical axis, 35 to 55) against Input (horizontal axis, 10 to 30)]

Page 18: Chapter 3 ~ Descriptive Analysis & Presentation of Bivariate Data

Example: Positive Correlation

• As x increases, y also increases:

[Figure: Scatter diagram with positive correlation: Output (vertical axis, 20 to 60) against Input (horizontal axis, 10 to 55)]

Page 19: Chapter 3 ~ Descriptive Analysis & Presentation of Bivariate Data

Example: Negative Correlation

• As x increases, y decreases:

[Figure: Scatter diagram with negative correlation: Output (vertical axis, 55 to 95) against Input (horizontal axis, 10 to 55)]

Page 20: Chapter 3 ~ Descriptive Analysis & Presentation of Bivariate Data


Please Note

Perfect positive correlation: all the points lie along a line with positive slope

Perfect negative correlation: all the points lie along a line with negative slope

If the points lie along a horizontal or vertical line: no correlation

If the points exhibit some other nonlinear pattern: no linear relationship, no correlation

Need some way to measure correlation

Page 21: Chapter 3 ~ Descriptive Analysis & Presentation of Bivariate Data

3.2 ~ Linear Correlation

Coefficient of Linear Correlation: r, measures the strength of the linear relationship between two variables

Pearson’s Product Moment Formula:

$$ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{(n - 1)\, s_x s_y} $$

Notes:
$-1 \le r \le +1$
r = +1: perfect positive correlation
r = -1: perfect negative correlation
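As a sketch of how Pearson's product moment formula is applied, the code below computes r for the age and CMFS scores from the earlier example. It assumes that s_x and s_y are sample standard deviations (divisor n - 1), which is what `statistics.stdev` computes; the function name `pearson_r` is an illustrative choice.

```python
import statistics

# Age (x) and CMFS score (y) for the 20 children in the earlier example.
age = [8, 9, 9, 10, 11, 9, 8, 9, 8, 11, 7, 6, 6, 8, 9, 12, 15, 13, 10, 10]
cmfs = [31, 25, 40, 27, 35, 29, 25, 34, 44, 19, 28, 47, 42, 37, 35, 16, 12, 23, 26, 36]


def pearson_r(xs, ys):
    """r = sum((x - x_bar)(y - y_bar)) / ((n - 1) * s_x * s_y)."""
    n = len(xs)
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)  # sample standard deviations
    cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return cross / ((n - 1) * s_x * s_y)


r = pearson_r(age, cmfs)
print(round(r, 2))
```

The result is negative, which matches the downward drift of CMFS scores with age in the scatter diagram.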

Page 22: Chapter 3 ~ Descriptive Analysis & Presentation of Bivariate Data

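Alternate Formula for r

$$ \mathrm{SS}(x) = \text{“sum of squares for } x\text{”} = \sum x^2 - \frac{(\sum x)^2}{n} $$

$$ \mathrm{SS}(y) = \text{“sum of squares for } y\text{”} = \sum y^2 - \frac{(\sum y)^2}{n} $$

$$ \mathrm{SS}(xy) = \text{“sum of squares for } xy\text{”} = \sum xy - \frac{\sum x \sum y}{n} $$

$$ r = \frac{\mathrm{SS}(xy)}{\sqrt{\mathrm{SS}(x)\,\mathrm{SS}(y)}} $$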
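The alternate formula agrees with Pearson's product moment formula. A brief sketch of why, assuming s_x and s_y are the sample standard deviations (divisor n - 1) and using the identity SS(x) = Σ(x - x̄)²:

```latex
s_x = \sqrt{\frac{\mathrm{SS}(x)}{n-1}}, \qquad
s_y = \sqrt{\frac{\mathrm{SS}(y)}{n-1}}
\quad\Longrightarrow\quad
(n-1)\, s_x s_y = \sqrt{\mathrm{SS}(x)\,\mathrm{SS}(y)}

\sum (x-\bar{x})(y-\bar{y}) = \sum xy - \frac{\sum x \sum y}{n} = \mathrm{SS}(xy)

r = \frac{\sum (x-\bar{x})(y-\bar{y})}{(n-1)\, s_x s_y}
  = \frac{\mathrm{SS}(xy)}{\sqrt{\mathrm{SS}(x)\,\mathrm{SS}(y)}}
```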

Page 23: Chapter 3 ~ Descriptive Analysis & Presentation of Bivariate Data

Example

The table below presents the weight (in thousands of pounds) x and the gasoline mileage (miles per gallon) y for ten different automobiles. Find the linear correlation coefficient:

[Table: columns x, y, x², y², xy for the ten automobiles; the individual values were not preserved]

Page 24: Chapter 3 ~ Descriptive Analysis & Presentation of Bivariate Data

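Completing the Calculation for r

$$ \mathrm{SS}(x) = \sum x^2 - \frac{(\sum x)^2}{n} = 123.73 - \frac{(34.1)^2}{10} = 7.449 $$

$$ \mathrm{SS}(y) = \sum y^2 - \frac{(\sum y)^2}{n} = 10665 - \frac{(309)^2}{10} = 1116.9 $$

$$ \mathrm{SS}(xy) = \sum xy - \frac{\sum x \sum y}{n} = 1010.9 - \frac{(34.1)(309)}{10} = -42.79 $$

$$ r = \frac{\mathrm{SS}(xy)}{\sqrt{\mathrm{SS}(x)\,\mathrm{SS}(y)}} = \frac{-42.79}{\sqrt{(7.449)(1116.9)}} = -0.47 $$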
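The arithmetic above can be reproduced from the five summary sums alone. This is a minimal sketch; the variable names are illustrative, and the sums are the ones that appear in the worked calculation.

```python
import math

# Summary sums from the worked example (n = 10 automobiles).
n = 10
sum_x, sum_y = 34.1, 309
sum_x2, sum_y2, sum_xy = 123.73, 10665, 1010.9

# Sums of squares via the shortcut formulas.
ss_x = sum_x2 - sum_x ** 2 / n
ss_y = sum_y2 - sum_y ** 2 / n
ss_xy = sum_xy - sum_x * sum_y / n

r = ss_xy / math.sqrt(ss_x * ss_y)
print(round(ss_x, 3), round(ss_y, 1), round(ss_xy, 2), round(r, 2))
# 7.449 1116.9 -42.79 -0.47
```

The negative sign of SS(xy) carries through to r: heavier automobiles tend to have lower mileage.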

Page 25: Chapter 3 ~ Descriptive Analysis & Presentation of Bivariate Data

Please Note

r is usually rounded to the nearest hundredth

r close to 0: little or no linear correlation

As the magnitude of r increases, towards -1 or +1, there is an increasingly stronger linear correlation between the two variables

r can be estimated from the scatter diagram; the window should be approximately square. This is useful for checking calculations.

Page 26: Chapter 3 ~ Descriptive Analysis & Presentation of Bivariate Data


3.3 ~ Linear Regression

• Regression analysis finds the equation of the line that best describes the relationship between two variables

• One use of this equation: to make predictions

Page 27: Chapter 3 ~ Descriptive Analysis & Presentation of Bivariate Data

Models or Prediction Equations

• Some examples of various possible relationships:

Linear: $\hat{y} = b_0 + b_1 x$

Quadratic: $\hat{y} = a + bx + cx^2$

Exponential: $\hat{y} = a(b^x)$

Logarithmic: $\hat{y} = a \log_b x$

Note: What would a scatter diagram look like to suggest each relationship?

Page 28: Chapter 3 ~ Descriptive Analysis & Presentation of Bivariate Data

Method of Least Squares

• Predicted value: $\hat{y}$

• Least squares criterion:

– Find the constants b0 and b1 such that the sum

$$ \sum (y - \hat{y})^2 = \sum \bigl(y - (b_0 + b_1 x)\bigr)^2 $$

is as small as possible

• Equation of the best-fitting line: $\hat{y} = b_0 + b_1 x$
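To make the least squares criterion concrete, the sketch below evaluates the sum of squared differences for many candidate lines and keeps the smallest. The data are hypothetical, and the brute-force grid search is used only to illustrate the criterion; it is not how b0 and b1 are found in practice.

```python
def sum_squared_errors(xs, ys, b0, b1):
    """Sum of (y - y_hat)^2 for the candidate line y_hat = b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Try every (b0, b1) on a coarse grid: b0 in 0.0..4.0, b1 in 0.0..1.5, step 0.1.
candidates = [(i / 10, j / 10) for i in range(0, 41) for j in range(0, 16)]
best = min(candidates, key=lambda c: sum_squared_errors(xs, ys, *c))

print(best)                                            # (2.2, 0.6)
print(round(sum_squared_errors(xs, ys, *best), 2))     # 2.4
print(round(sum_squared_errors(xs, ys, 2.0, 0.6), 2))  # 2.6, a slightly worse line
```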

Page 29: Chapter 3 ~ Descriptive Analysis & Presentation of Bivariate Data

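Illustration

• Observed and predicted values of y:

[Figure: The line $\hat{y} = b_0 + b_1 x$ with an observed point $(x, y)$, the predicted point $(x, \hat{y})$ on the line, and the vertical distance $y - \hat{y}$ between them]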

Page 30: Chapter 3 ~ Descriptive Analysis & Presentation of Bivariate Data

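The Line of Best Fit Equation

• The equation is determined by:
b0: y-intercept
b1: slope

• Values that satisfy the least squares criterion:

$$ b_1 = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{\mathrm{SS}(xy)}{\mathrm{SS}(x)} $$

$$ b_0 = \frac{\sum y - (b_1 \cdot \sum x)}{n} = \bar{y} - (b_1 \cdot \bar{x}) $$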
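The formulas for b1 and b0 translate directly into code. This is a minimal sketch with hypothetical data; the function name `line_of_best_fit` is an illustrative choice.

```python
def line_of_best_fit(xs, ys):
    """Return (b0, b1) with b1 = SS(xy) / SS(x) and b0 = (sum(y) - b1 * sum(x)) / n."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    ss_x = sum(x * x for x in xs) - sum_x ** 2 / n
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    b1 = ss_xy / ss_x
    b0 = (sum_y - b1 * sum_x) / n
    return b0, b1


# Hypothetical data chosen only for illustration.
xs = [2, 4, 6, 8]
ys = [3, 7, 5, 10]

b0, b1 = line_of_best_fit(xs, ys)
print(round(b0, 2), round(b1, 2))  # 1.5 0.95

# Because b0 = y_bar - b1 * x_bar, the fitted line passes through (x_bar, y_bar).
x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
print(abs((b0 + b1 * x_bar) - y_bar) < 1e-9)  # True
```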

Page 31: Chapter 3 ~ Descriptive Analysis & Presentation of Bivariate Data

Example

A recent article measured the job satisfaction of subjects with a 14-question survey. The data below represents the job satisfaction scores, y, and the salaries, x, for a sample of similar individuals:

1) Draw a scatter diagram for this data

2) Find the equation of the line of best fit

Page 32: Chapter 3 ~ Descriptive Analysis & Presentation of Bivariate Data

Finding b1 & b0

• Preliminary calculations needed to find b1 and b0:

[Table: columns x, y, x², xy; the individual values were not preserved]

Page 33: Chapter 3 ~ Descriptive Analysis & Presentation of Bivariate Data

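Line of Best Fit (Solution 1)

$$ \mathrm{SS}(x) = \sum x^2 - \frac{(\sum x)^2}{n} = 7074 - \frac{(234)^2}{8} = 229.5 $$

$$ \mathrm{SS}(xy) = \sum xy - \frac{\sum x \sum y}{n} = 4009 - \frac{(234)(133)}{8} = 118.75 $$

$$ b_1 = \frac{\mathrm{SS}(xy)}{\mathrm{SS}(x)} = \frac{118.75}{229.5} = 0.5174 $$

$$ b_0 = \frac{\sum y - (b_1 \cdot \sum x)}{n} = \frac{133 - (0.5174)(234)}{8} = 1.4902 $$

Equation of the line of best fit: $\hat{y} = 1.49 + 0.517x$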
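The worked solution can be checked from the summary sums that appear in it. This is a small sketch; the variable names are illustrative.

```python
# Summary sums from the job satisfaction example (n = 8 individuals).
n = 8
sum_x, sum_y = 234, 133
sum_x2, sum_xy = 7074, 4009

ss_x = sum_x2 - sum_x ** 2 / n      # 229.5
ss_xy = sum_xy - sum_x * sum_y / n  # 118.75

b1 = ss_xy / ss_x
b0 = (sum_y - b1 * sum_x) / n       # uses the unrounded slope
print(round(b1, 4), round(b0, 4))   # 0.5174 1.4902
```

Carrying the unrounded slope into the b0 step reproduces 1.4902; rounding b1 to 0.5174 first gives about 1.491 instead, which is why extra decimal places are kept during the calculation.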

Page 34: Chapter 3 ~ Descriptive Analysis & Presentation of Bivariate Data

Scatter Diagram (Solution 2)

[Figure: Scatter diagram, "Job Satisfaction Survey": Job Satisfaction (vertical axis, 12 to 22) against Salary (horizontal axis, 21 to 37)]

Page 35: Chapter 3 ~ Descriptive Analysis & Presentation of Bivariate Data

Please Note

Keep at least three extra decimal places while doing the calculations to ensure an accurate answer

When rounding off the calculated values of b0 and b1, always keep at least two significant digits in the final answer

The slope b1 represents the predicted change in y per unit increase in x

The y-intercept is the value of y where the line of best fit intersects the y-axis

The line of best fit will always pass through the point $(\bar{x}, \bar{y})$

Page 36: Chapter 3 ~ Descriptive Analysis & Presentation of Bivariate Data

Making Predictions

1. One of the main purposes for obtaining a regression equation is for making predictions

2. For a given value of x, we can predict a value of $\hat{y}$

3. The regression equation should be used to make predictions only about the population from which the sample was drawn

4. The regression equation should be used only to cover the sample domain on the input variable. You can estimate values outside the domain interval, but use caution and use values close to the domain interval.

5. Use current data. A sample taken in 1987 should not be used to make predictions in 1999.
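The guidance above about staying within the sample domain can be built into a prediction helper. The sketch below uses the fitted equation from the job satisfaction example; the domain bounds of 21 to 37 are an assumption taken from the axis range of the scatter diagram, not the actual minimum and maximum salaries, and the function name is hypothetical.

```python
def predict_satisfaction(salary, b0=1.49, b1=0.517, domain=(21.0, 37.0)):
    """Return (y_hat, inside_domain) for the line y_hat = b0 + b1 * salary.

    The default domain is an assumed sample range; replace it with the
    observed minimum and maximum of x when they are known.
    """
    low, high = domain
    y_hat = b0 + b1 * salary
    return y_hat, low <= salary <= high


for salary in (30, 45):
    y_hat, inside = predict_satisfaction(salary)
    note = "" if inside else " (outside the sample domain: use with caution)"
    print(f"salary={salary}: predicted score {y_hat:.2f}{note}")
```

Returning a flag instead of raising an error reflects point 4: extrapolation is allowed, but the caller should see that the estimate lies outside the interval covered by the sample.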