Regression: Using Correlation To Make Predictions

Anthony Greene

Making a Prediction

ẑy = r·zx

This gives the predicted value of y based on a known value of x and a known correlation.

Note what happens for positive and negative values of r, for high and low values of r, and for near-zero values of r.
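The prediction rule can be sketched in a few lines; the z-scores and correlations below are illustrative values, not data from the slides:

```python
# Minimal sketch of the standardized prediction rule z-hat_y = r * z_x.
def predict_z(z_x, r):
    """Predicted z-score of y, given the z-score of x and the correlation r."""
    return r * z_x

# High |r|: the prediction stays far from the mean; near-zero r pulls the
# prediction toward 0 (the mean, in z-units); negative r flips the sign.
print(predict_z(1.5, 0.9))    # -> 1.35
print(predict_z(1.5, -0.9))   # -> -1.35
print(predict_z(1.5, 0.05))   # close to 0
```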


Graph of y = 5 − 3x


y-Intercept and Slope

For a linear equation y = a + bx, the constant a is the y-intercept and the constant b is the slope.

x and y are related variables

Straight-line graphs of three linear equations

y = a + bx, where a is the y-intercept and b is the slope (rise/run)


Graphical Interpretation of Slope

The straight-line graph of the linear equation y = a + bx slopes upward if b > 0, slopes downward if b < 0, and is horizontal if b = 0.


Graphical interpretation of slope


Four data points


Scatter plot


Two possible straight-line fits to the data points

Determining how well the data points are fit by Line A vs. Line B


Least-Squares Criterion

The straight line that best fits a set of data points is the one having the smallest possible sum of squared errors. Recall that the sum of squared errors, divided by n, gives the error variance.
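A small sketch of the criterion; the four data points and the two candidate lines here are made up for illustration, not the ones in the figures:

```python
# Compare two candidate lines by their sum of squared errors (SSE);
# the least-squares criterion picks the line with the smaller SSE.
points = [(1, 2), (2, 3), (3, 5), (4, 4)]  # illustrative (x, y) pairs

def sse(a, b, pts):
    """Sum of squared vertical errors of the line y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in pts)

print(sse(1.0, 1.0, points))  # "Line A": y = 1 + x   -> 2.0
print(sse(0.0, 1.5, points))  # "Line B": y = 1.5x    -> 4.5
# Line A wins here: a smaller SSE means a better least-squares fit.
```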

Regression Line and Regression Equation

Regression line: the straight line that best fits a set of data points according to the least-squares criterion.

Regression equation: the equation of the regression line.


The best-fit line minimizes the distances between the actual data points and the predicted values


Residual, e, of a data point

Notation Used in Regression and Correlation

We define SSx, SP, and SSy by

SSx = Σ(x − Mx)²
SP = Σ(x − Mx)(y − My)
SSy = Σ(y − My)²

Or the computational formulas:

SSx = Σx² − (Σx)²/n
SP = Σxy − (Σx)(Σy)/n
SSy = Σy² − (Σy)²/n

Regression Equation

The regression equation for a set of n data points is

ŷ = a + bx

where

b = SP/SSx  and  a = My − b·Mx = (1/n)(Σy − bΣx)
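Assuming the formulas above, the coefficients can be computed directly from deviation scores; the data below are toy values for illustration:

```python
# b = SP / SSx and a = My - b*Mx, computed from deviation scores.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 3.0, 5.0, 4.0]
n = len(xs)

m_x, m_y = sum(xs) / n, sum(ys) / n                      # means Mx, My
ss_x = sum((x - m_x) ** 2 for x in xs)                   # SSx
sp = sum((x - m_x) * (y - m_y) for x, y in zip(xs, ys))  # SP

b = sp / ss_x      # slope
a = m_y - b * m_x  # intercept
print(b, a)        # the regression equation is y-hat = a + b*x
```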

The relationship between b and r

b = r·(sy/sx)

which follows because

r = SP/√(SSx·SSy)  and  b = SP/SSx

and because sx and sy have the same denominator (n − 1), which cancels:

r·(sy/sx) = [SP/√(SSx·SSy)]·√(SSy/SSx) = SP/SSx = b

• That is, the regression slope is just the correlation coefficient scaled up to the right size for the variables x and y.
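The identity is easy to check numerically; with sample standard deviations the n − 1 factors cancel, so r·(sy/sx) reduces to SP/SSx (toy data again):

```python
import math

# Verify b = r * (s_y / s_x) against b = SP / SSx on small toy data.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 3.0, 5.0, 4.0]
n = len(xs)
m_x, m_y = sum(xs) / n, sum(ys) / n

ss_x = sum((x - m_x) ** 2 for x in xs)
ss_y = sum((y - m_y) ** 2 for y in ys)
sp = sum((x - m_x) * (y - m_y) for x, y in zip(xs, ys))

b = sp / ss_x                        # regression slope
r = sp / math.sqrt(ss_x * ss_y)      # correlation coefficient
s_x = math.sqrt(ss_x / (n - 1))      # sample standard deviations
s_y = math.sqrt(ss_y / (n - 1))

print(abs(b - r * (s_y / s_x)) < 1e-12)  # -> True: the two forms agree
```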

From ŷ = a + bx to ẑy = r·zx:

ŷ = bx + a

recall a = My − b·Mx, so

ŷ = bx + My − b·Mx = b(x − Mx) + My

recall b = r·(sy/sx), so

ŷ − My = r·(sy/sx)(x − Mx)

(ŷ − My)/sy = r·(x − Mx)/sx

ẑy = r·zx


Criterion for Finding a Regression Line

Before finding a regression line for a set of data points, draw a scatter diagram. If the data points do not appear to be scattered about a straight line, do not determine a regression line.


Linear regression requires linear data: (a) data points scattered about a curve; (b) an inappropriate straight-line fit to the data. Higher-order regression equations exist but are outside the scope of this course.


Uniform Variance

[Bar chart: Math Proficiency By Grade, proficiency (0–100) for grades 1–5, illustrating roughly uniform variance across grades.]


Assumptions for Regression Inferences


Table for obtaining the three sums of squares for the used car data


Regression line and data points for used car data

What is a fair asking price for a 2.5-year-old car?

ŷ = 195.47 − 20.26x

ŷ = 195.47 − 20.26(2.5) = 144.82

So, since the price unit is $100s, the best prediction is about $14,482.


Extrapolation in the used car example


Sums of Squares in Regression

Total sum of squares, SST: the variation in the observed values of the response variable:

SST = Σ(y − My)² = SSy

Regression sum of squares, SSR: the variation in the observed values of the response variable that is explained by the regression:

SSR = Σ(ŷ − My)² = SP²/SSx

Error sum of squares, SSE: the variation in the observed values of the response variable that is not explained by the regression:

SSE = Σ(y − ŷ)² = SSy − SP²/SSx


Regression Identity

The total sum of squares equals the regression sum of squares plus the error sum of squares. In symbols,

SST = SSR + SSE.
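The identity is easy to verify numerically; this sketch uses toy data rather than the used-car numbers:

```python
# Check SST = SSR + SSE for a least-squares line fit to toy data.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 3.0, 5.0, 4.0]
n = len(xs)
m_x, m_y = sum(xs) / n, sum(ys) / n

ss_x = sum((x - m_x) ** 2 for x in xs)
sp = sum((x - m_x) * (y - m_y) for x, y in zip(xs, ys))
b = sp / ss_x
a = m_y - b * m_x
preds = [a + b * x for x in xs]                       # y-hat values

sst = sum((y - m_y) ** 2 for y in ys)                 # total variation
ssr = sum((p - m_y) ** 2 for p in preds)              # explained
sse = sum((y - p) ** 2 for y, p in zip(ys, preds))    # unexplained
print(abs(sst - (ssr + sse)) < 1e-9)                  # -> True
```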


Graphical portrayal of regression for used cars

y = a + bx


What sort of things could regression be used for?

Wherever a known correlation exists, regression can be used to predict a new score. Examples:

1. If you knew that there was a past correlation between the amount of study time and the grade on an exam, you could make a good prediction about the grade before it happened.

2. If you knew that certain features of a stock correlate with its price, you can use regression to predict the price before it happens.

Regression Example: Low Correlation

[Scatter plot of height (inches) against weight (0–350 lb) for the data below.]

Find the regression equation for predicting height based on knowledge of weight. The existing data are for 10 male stats students.

X        Y       XY          X²          Y²
287.00   75.00   21,525.00   82,369.00   5,625.00
300.00   71.00   21,300.00   90,000.00   5,041.00
255.00   80.00   20,400.00   65,025.00   6,400.00
180.00   69.00   12,420.00   32,400.00   4,761.00
130.00   70.00   9,100.00    16,900.00   4,900.00
215.00   77.00   16,555.00   46,225.00   5,929.00
165.00   71.00   11,715.00   27,225.00   5,041.00
240.00   71.00   17,040.00   57,600.00   5,041.00
160.00   72.00   11,520.00   25,600.00   5,184.00
150.00   65.00   9,750.00    22,500.00   4,225.00
Σ: 2,082.00  721.00  151,325.00  465,844.00  52,147.00

Computing the regression equation from the column totals (n = 10):

Σx = 2,082   Σy = 721   Σxy = 151,325   Σx² = 465,844   Σy² = 52,147

SSx = Σx² − (Σx)²/n = 465,844 − 433,472.4 = 32,371.6

SP = Σxy − (Σx)(Σy)/n = 151,325 − 150,112.2 = 1,212.8

b = SP/SSx = 1,212.8/32,371.6 ≈ 0.037

a = (1/n)(Σy − bΣx) = (1/10)(721 − 78.0) ≈ 64.3

So ŷ = 0.037x + 64.3

[Scatter plot of height against weight (0–350 lb) with the regression line ŷ = 0.037x + 64.3: a shallow slope through widely scattered points, reflecting the low correlation.]
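The whole computation can be replayed from the table's raw columns, and the rounded slope and intercept can be checked against the worked values:

```python
# Height-by-weight regression recomputed from the raw table columns
# (weights in pounds, heights in inches, n = 10).
weights = [287, 300, 255, 180, 130, 215, 165, 240, 160, 150]
heights = [75, 71, 80, 69, 70, 77, 71, 71, 72, 65]
n = len(weights)

sum_x, sum_y = sum(weights), sum(heights)               # 2,082 and 721
sum_xy = sum(x * y for x, y in zip(weights, heights))   # 151,325
sum_x2 = sum(x * x for x in weights)                    # 465,844

ss_x = sum_x2 - sum_x ** 2 / n    # SSx = Σx² − (Σx)²/n
sp = sum_xy - sum_x * sum_y / n   # SP  = Σxy − (Σx)(Σy)/n
b = sp / ss_x
a = (sum_y - b * sum_x) / n
print(round(b, 3), round(a, 1))   # -> 0.037 64.3
```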


Regression Example: High Correlation

Find the regression equation for predicting the probability of a teenage suicide attempt based on weekly heroin usage.

[Scatter plot of attempt probability (y-axis) against weekly heroin usage (x-axis, 1–7), with separate series for 2000, 2001, and 2002.]

X   Y      XY     X²   Y²
1   0.2    0.2    1    0.04
1   0.31   0.31   1    0.0961
1   0.18   0.18   1    0.0324
2   0.27   0.54   4    0.0729
2   0.38   0.76   4    0.1444
2   0.46   0.92   4    0.2116
3   0.9    2.7    9    0.81
3   0.58   1.74   9    0.3364
3   0.45   1.35   9    0.2025
4   0.84   3.36   16   0.7056
4   0.74   2.96   16   0.5476
4   0.68   2.72   16   0.4624
5   0.85   4.25   25   0.7225
5   0.78   3.9    25   0.6084
5   0.73   3.65   25   0.5329
6   0.88   5.28   36   0.7744
6   0.82   4.92   36   0.6724
6   0.78   4.68   36   0.6084
7   0.92   6.44   49   0.8464
7   0.85   5.95   49   0.7225
7   0.91   6.37   49   0.8281
Σ: 84  13.51  63.18  420  9.9779

n = 21

SSx = Σx² − (Σx)²/n = 420 − 336 = 84

SP = Σxy − (Σx)(Σy)/n = 63.18 − 54.04 = 9.14

b = SP/SSx = 9.14/84 = 0.109

a = (1/n)(Σy − bΣx) = (1/21)(13.51 − 9.156) = 0.207

So ŷ = 0.109x + 0.207
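The computation can be replayed from the column totals. Note one rounding detail: carrying b at full precision gives an intercept of about 0.208, while rounding b to 0.109 before computing a gives 0.207.

```python
# High-correlation example recomputed from the column totals above.
n, sum_x, sum_y, sum_xy, sum_x2 = 21, 84, 13.51, 63.18, 420

ss_x = sum_x2 - sum_x ** 2 / n    # 420 - 336 = 84
sp = sum_xy - sum_x * sum_y / n   # 63.18 - 54.04 = 9.14
b = sp / ss_x                     # ~0.109
a = (sum_y - b * sum_x) / n       # ~0.208 at full precision
print(round(b, 3), round(a, 3))
```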


Why Is It Called Regression?

• For low correlations, the predicted value is close to the mean

• For zero correlations, the prediction is the mean

• Only for perfect correlations (R² = 1.0) do the predicted scores show as much variation as the actual scores

• Since perfect correlations are rare, we say that the predicted scores show regression toward the mean
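A quick sketch of why this happens: in z-units the predictions are r·zx, so their spread is only |r| times the spread of the actual scores (the z-scores below are illustrative):

```python
import math

# Predicted z-scores are r * z_x, so their spread shrinks by a factor of |r|.
def sd(vals):
    """Population standard deviation."""
    m = sum(vals) / len(vals)
    return math.sqrt(sum((v - m) ** 2 for v in vals) / len(vals))

z_x = [-2.0, -1.0, 0.0, 1.0, 2.0]    # standardized predictor scores
for r in (1.0, 0.5, 0.0):
    preds = [r * z for z in z_x]      # z-hat_y = r * z_x
    print(r, round(sd(preds), 3))     # spread shrinks toward 0 as |r| drops
```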
