
Fitting a Line to a Set of Points


Page 1: Fitting a Line to a Set of Points

• Scatterplot: fitting a line

[Figure: scatterplot with a fitted line; x-axis: x (independent), y-axis: y (dependent)]

• Least squares method

• Minimize the error term e

Page 2: Fitting a Line to a Set of Points

Minimizing the SSE (Sum of Squared Errors)

$$\min_{a,b} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \min_{a,b} \sum_{i=1}^{n} (y_i - a - b x_i)^2$$

Page 3: Fitting a Line to a Set of Points

• Least squares method

Finding Regression Coefficients

$$b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

$$a = \bar{y} - b\bar{x}$$
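The two coefficient formulas can be sketched in a few lines of plain Python. This is a minimal illustration, not the deck's own code: the helper name `fit_line` is hypothetical, and the data are the TVDI and soil moisture values from the regression example in this deck.

```python
# TVDI (x) and volumetric soil moisture (y) from the deck's regression example
x = [0.274, 0.542, 0.419, 0.286, 0.374, 0.489, 0.623, 0.506, 0.768, 0.725]
y = [0.414, 0.359, 0.396, 0.458, 0.350, 0.357, 0.255, 0.189, 0.171, 0.119]


def fit_line(x, y):
    """Return (a, b) for the least-squares line y_hat = a + b * x (hypothetical helper)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator and denominator of the slope formula
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


a, b = fit_line(x, y)
print(f"intercept a = {a:.4f}, slope b = {b:.4f}")
```

Because a is defined as ȳ - b x̄, the fitted line always passes through the point (x̄, ȳ).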

Page 4: Fitting a Line to a Set of Points

Coefficient of Determination (r²)

[Figure: two scatterplots of y against x, panels (a) and (b)]

• Numerical measure to express the strength of the relationship: coefficient of determination (r²)

Page 5: Fitting a Line to a Set of Points

Coefficient of Determination (r²)

[Figure: diagram labelled y, ŷ and ȳ]

Page 6: Fitting a Line to a Set of Points

Coefficient of Determination (r²)

• Regression sum of squares (SSR)

$$SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$$

• Total sum of squares (SST)

$$SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$$

• Coefficient of determination (r²)

$$r^2 = \frac{SSR}{SST}$$
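As a rough sketch of how r² follows from these two sums, the snippet below computes SSR and SST for the deck's TVDI and soil moisture data. The helper names `fit_line` and `r_squared` are assumptions made for illustration.

```python
# TVDI (x) and volumetric soil moisture (y) from the deck's regression example
x = [0.274, 0.542, 0.419, 0.286, 0.374, 0.489, 0.623, 0.506, 0.768, 0.725]
y = [0.414, 0.359, 0.396, 0.458, 0.350, 0.357, 0.255, 0.189, 0.171, 0.119]


def fit_line(x, y):
    """Least-squares intercept a and slope b (hypothetical helper)."""
    x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
    b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - b * x_bar, b


def r_squared(x, y):
    """Coefficient of determination r^2 = SSR / SST."""
    a, b = fit_line(x, y)
    y_bar = sum(y) / len(y)
    y_hat = [a + b * xi for xi in x]
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)  # regression sum of squares
    sst = sum((yi - y_bar) ** 2 for yi in y)      # total sum of squares
    return ssr / sst


r2 = r_squared(x, y)
print(f"r^2 = {r2:.3f}")
```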

Page 7: Fitting a Line to a Set of Points

Partitioning the Total Sum of Squares

$$\underbrace{\sum_{i=1}^{n} (y_i - \bar{y})^2}_{SST} = \underbrace{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}_{SSR} + \underbrace{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}_{SSE}$$

SST = SSR + SSE
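The partition can be checked numerically. The sketch below, using the deck's TVDI and soil moisture data and a hypothetical `fit_line` helper, computes the three sums separately and compares SST with SSR + SSE. The identity holds for a least-squares line with an intercept, up to floating-point rounding.

```python
# TVDI (x) and volumetric soil moisture (y) from the deck's regression example
x = [0.274, 0.542, 0.419, 0.286, 0.374, 0.489, 0.623, 0.506, 0.768, 0.725]
y = [0.414, 0.359, 0.396, 0.458, 0.350, 0.357, 0.255, 0.189, 0.171, 0.119]


def fit_line(x, y):
    """Least-squares intercept a and slope b (hypothetical helper)."""
    x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
    b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - b * x_bar, b


a, b = fit_line(x, y)
y_bar = sum(y) / len(y)
y_hat = [a + b * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)               # total
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained by the line
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # left in the residuals

print(f"SST = {sst:.6f}, SSR + SSE = {ssr + sse:.6f}")
```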

Page 8: Fitting a Line to a Set of Points

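Regression ANOVA Table

| Component | Sum of Squares | df | Mean Square | F |
|---|---|---|---|---|
| Regression (SSR) | $\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$ | 1 | MSSR = SSR / 1 | MSSR / MSSE |
| Error (SSE) | $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$ | n - 2 | MSSE = SSE / (n - 2) | |
| Total (SST) | $\sum_{i=1}^{n} (y_i - \bar{y})^2$ | n - 1 | | |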
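A minimal sketch of how the rows of this table could be filled in, assuming the deck's TVDI and soil moisture data and a hypothetical `fit_line` helper. It stops at the F statistic; the p-value would still have to come from an F table or a statistics package.

```python
# TVDI (x) and volumetric soil moisture (y) from the deck's regression example
x = [0.274, 0.542, 0.419, 0.286, 0.374, 0.489, 0.623, 0.506, 0.768, 0.725]
y = [0.414, 0.359, 0.396, 0.458, 0.350, 0.357, 0.255, 0.189, 0.171, 0.119]


def fit_line(x, y):
    """Least-squares intercept a and slope b (hypothetical helper)."""
    x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
    b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - b * x_bar, b


n = len(x)
a, b = fit_line(x, y)
y_bar = sum(y) / n
y_hat = [a + b * xi for xi in x]

ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
sst = sum((yi - y_bar) ** 2 for yi in y)

mssr = ssr / 1        # regression mean square, df = 1
msse = sse / (n - 2)  # error mean square, df = n - 2
f_stat = mssr / msse

print(f"{'Component':<12}{'SS':>10}{'df':>5}{'MS':>10}{'F':>8}")
print(f"{'Regression':<12}{ssr:>10.5f}{1:>5}{mssr:>10.5f}{f_stat:>8.2f}")
print(f"{'Error':<12}{sse:>10.5f}{n - 2:>5}{msse:>10.5f}")
print(f"{'Total':<12}{sst:>10.5f}{n - 1:>5}")
```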

Page 9: Fitting a Line to a Set of Points

Regression Example

[Figure: "Glyndon Field Sampled Soil Moisture versus TVDI from a 3x3 kernel"; x-axis: TVDI (3x3 kernel), 0.0 to 1.0; y-axis: Volumetric Soil Moisture, 0.00 to 0.50]

| TVDI | Soil Moisture |
|---|---|
| 0.274 | 0.414 |
| 0.542 | 0.359 |
| 0.419 | 0.396 |
| 0.286 | 0.458 |
| 0.374 | 0.350 |
| 0.489 | 0.357 |
| 0.623 | 0.255 |
| 0.506 | 0.189 |
| 0.768 | 0.171 |
| 0.725 | 0.119 |

Page 10: Fitting a Line to a Set of Points

Regression Example

Excel

Page 11: Fitting a Line to a Set of Points

Regression ANOVA table

| Component | Sum of Squares | Degrees of Freedom | Mean Square | F-Test |
|---|---|---|---|---|
| Regression (SSR) | | | | |
| Error (SSE) | | | | |
| Total (SST) | | | | |

Page 12: Fitting a Line to a Set of Points

A Significance Test for r²

$$F_{test} = \frac{r^2 (n - 2)}{1 - r^2} = \frac{MSSR}{MSSE}$$

F-distribution with degrees of freedom:

df = (1, n - 2)
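The two forms of the F statistic are algebraically the same, which a short numerical check can illustrate. The sketch assumes the deck's TVDI and soil moisture data and a hypothetical `fit_line` helper. Judging significance still requires comparing the result with a critical value of the F distribution at (1, n - 2) degrees of freedom.

```python
# TVDI (x) and volumetric soil moisture (y) from the deck's regression example
x = [0.274, 0.542, 0.419, 0.286, 0.374, 0.489, 0.623, 0.506, 0.768, 0.725]
y = [0.414, 0.359, 0.396, 0.458, 0.350, 0.357, 0.255, 0.189, 0.171, 0.119]


def fit_line(x, y):
    """Least-squares intercept a and slope b (hypothetical helper)."""
    x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
    b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - b * x_bar, b


n = len(x)
a, b = fit_line(x, y)
y_bar = sum(y) / n
y_hat = [a + b * xi for xi in x]

ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
sst = sum((yi - y_bar) ** 2 for yi in y)
r2 = ssr / sst

f_from_r2 = r2 * (n - 2) / (1 - r2)     # F from the coefficient of determination
f_from_ms = (ssr / 1) / (sse / (n - 2)) # F from the ANOVA mean squares

print(f"F from r^2 = {f_from_r2:.3f}, F from mean squares = {f_from_ms:.3f}")
```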

Page 13: Fitting a Line to a Set of Points

Significance of r² Example

Page 14: Fitting a Line to a Set of Points

Assumptions of Regression

1. The relationship is linear

• y = α + βx + ε

• If the scatterplot shows that the relationship is not linear, transform one or both of the variables

Page 15: Fitting a Line to a Set of Points

Assumptions of Regression

2. The errors have a mean of zero and a constant variance

• i.e. the errors need to be distributed evenly on either side of the regression line

• The magnitude of their dispersion has to be reasonably constant for all values of x

• If the variation in the errors is larger for some values of x than for others, a linear model is not appropriate

Page 16: Fitting a Line to a Set of Points

Assumptions of Regression

3. Residuals

• Independent

• No pattern in the distribution

• Pattern: the model is not effectively capturing some systematic aspect of the relationship; another factor cannot be accounted for by this model
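The residual-based assumptions are usually inspected with a residual plot. As a rough numerical stand-in, the sketch below computes the residuals for the deck's TVDI and soil moisture data and compares their spread for the smaller and larger x values. The split-in-half comparison is an informal heuristic chosen for illustration, not a formal test, and `fit_line` and `spread` are hypothetical helpers.

```python
import math

# TVDI (x) and volumetric soil moisture (y) from the deck's regression example
x = [0.274, 0.542, 0.419, 0.286, 0.374, 0.489, 0.623, 0.506, 0.768, 0.725]
y = [0.414, 0.359, 0.396, 0.458, 0.350, 0.357, 0.255, 0.189, 0.171, 0.119]


def fit_line(x, y):
    """Least-squares intercept a and slope b (hypothetical helper)."""
    x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
    b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - b * x_bar, b


def spread(values):
    """Root mean square of a list of residuals."""
    return math.sqrt(sum(v ** 2 for v in values) / len(values))


a, b = fit_line(x, y)
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

# For a least-squares line with an intercept the residuals always average to
# zero, so this confirms the fit rather than testing the assumption itself.
mean_residual = sum(residuals) / len(residuals)

# Informal check of constant variance: residual spread for low x vs. high x.
pairs = sorted(zip(x, residuals))
half = len(pairs) // 2
low_spread = spread([r for _, r in pairs[:half]])
high_spread = spread([r for _, r in pairs[half:]])

print(f"mean residual = {mean_residual:.2e}")
print(f"spread (low x) = {low_spread:.4f}, spread (high x) = {high_spread:.4f}")
```

A large difference between the two spreads, or a visible trend when the residuals are listed in order of x, would suggest looking more closely at assumptions 2 and 3.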

Page 17: Fitting a Line to a Set of Points

Assumptions of Regression

Page 18: Fitting a Line to a Set of Points

Significance Tests for Regression Parameters

• t-tests: significance of individual regression parameters

• Standard error of the estimate, also known as the standard deviation of the residuals (s_e):

$$s_e = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - 2}}$$
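A short sketch of the standard error of the estimate, computed for the deck's TVDI and soil moisture data. The function names `fit_line` and `standard_error_of_estimate` are assumptions for illustration.

```python
import math

# TVDI (x) and volumetric soil moisture (y) from the deck's regression example
x = [0.274, 0.542, 0.419, 0.286, 0.374, 0.489, 0.623, 0.506, 0.768, 0.725]
y = [0.414, 0.359, 0.396, 0.458, 0.350, 0.357, 0.255, 0.189, 0.171, 0.119]


def fit_line(x, y):
    """Least-squares intercept a and slope b (hypothetical helper)."""
    x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
    b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - b * x_bar, b


def standard_error_of_estimate(x, y):
    """s_e = sqrt(SSE / (n - 2)), the standard deviation of the residuals."""
    a, b = fit_line(x, y)
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(sse / (len(x) - 2))


s_e = standard_error_of_estimate(x, y)
print(f"s_e = {s_e:.4f}")
```

The divisor is n - 2 rather than n because two parameters, a and b, were estimated from the data.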

Page 19: Fitting a Line to a Set of Points

Significance Test for Slope (b)

• H0: β = 0

s_b is the standard deviation of the slope parameter:

$$s_b = \sqrt{\frac{s_e^2}{(n - 1)\, s_x^2}}$$

$$t_{test} = \frac{b}{s_b}$$

df = (n - 2)
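The slope test can be sketched as follows for the deck's TVDI and soil moisture data. Here s_x² is taken to be the sample variance of x (divisor n - 1), so that (n - 1) s_x² equals the sum of squared deviations of x. That reading of the slide, and the `fit_line` helper, are assumptions. The resulting t value would be compared with a critical t value at n - 2 degrees of freedom.

```python
import math
from statistics import variance  # sample variance, divisor n - 1

# TVDI (x) and volumetric soil moisture (y) from the deck's regression example
x = [0.274, 0.542, 0.419, 0.286, 0.374, 0.489, 0.623, 0.506, 0.768, 0.725]
y = [0.414, 0.359, 0.396, 0.458, 0.350, 0.357, 0.255, 0.189, 0.171, 0.119]


def fit_line(x, y):
    """Least-squares intercept a and slope b (hypothetical helper)."""
    x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
    b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - b * x_bar, b


n = len(x)
a, b = fit_line(x, y)

# Squared standard error of the estimate: SSE / (n - 2)
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
s_e_sq = sse / (n - 2)

# Standard deviation of the slope and the t statistic for H0: beta = 0
s_b = math.sqrt(s_e_sq / ((n - 1) * variance(x)))
t_slope = b / s_b

print(f"b = {b:.4f}, s_b = {s_b:.4f}, t = {t_slope:.3f}, df = {n - 2}")
```

In simple linear regression the square of this t statistic equals the F statistic from the ANOVA table, so the two tests agree.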

Page 20: Fitting a Line to a Set of Points

Hypothesis Testing - Significance Test for Regression Slope Example

Page 21: Fitting a Line to a Set of Points

Significance Test for Regression Intercept

$$t_{test} = \frac{a}{s_a}$$

where s_a is the standard deviation of the intercept:

$$s_a = \sqrt{s_e^2 \, \frac{\sum_{i=1}^{n} x_i^2}{n \sum_{i=1}^{n} (x_i - \bar{x})^2}}$$

and degrees of freedom = (n - 2)
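The intercept test follows the same pattern as the slope test. The sketch below applies it to the deck's TVDI and soil moisture data, with `fit_line` as a hypothetical helper. The t value would again be compared with a critical t value at n - 2 degrees of freedom.

```python
import math

# TVDI (x) and volumetric soil moisture (y) from the deck's regression example
x = [0.274, 0.542, 0.419, 0.286, 0.374, 0.489, 0.623, 0.506, 0.768, 0.725]
y = [0.414, 0.359, 0.396, 0.458, 0.350, 0.357, 0.255, 0.189, 0.171, 0.119]


def fit_line(x, y):
    """Least-squares intercept a and slope b (hypothetical helper)."""
    x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
    b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - b * x_bar, b


n = len(x)
a, b = fit_line(x, y)
x_bar = sum(x) / n

# Squared standard error of the estimate: SSE / (n - 2)
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
s_e_sq = sse / (n - 2)

# Standard deviation of the intercept and its t statistic
sum_x_sq = sum(xi ** 2 for xi in x)
sxx = sum((xi - x_bar) ** 2 for xi in x)
s_a = math.sqrt(s_e_sq * sum_x_sq / (n * sxx))
t_intercept = a / s_a

print(f"a = {a:.4f}, s_a = {s_a:.4f}, t = {t_intercept:.3f}, df = {n - 2}")
```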

Page 22: Fitting a Line to a Set of Points

Hypothesis Testing - Significance Test for Regression Intercept Example

Page 23: Fitting a Line to a Set of Points

Simple Linear Regression in Excel

• Built-in functions

• SLOPE(array1, array2), where array1 holds the y values and array2 the x values

• INTERCEPT(array1, array2), with the same argument order

• Data Analysis Tool

Page 24: Fitting a Line to a Set of Points

S-Plus

TVDI (x): 0.274, 0.542, 0.419, 0.286, 0.374, 0.489, 0.623, 0.506, 0.768, 0.725

Theta (y): 0.414, 0.359, 0.396, 0.458, 0.350, 0.357, 0.255, 0.189, 0.171, 0.119

TVDI: 0.413, 0.223, 0.811, 0.513, 0.655, 0.354, 0.198, 0.763, 0.671, 0.424
