
Fixing problems with the model

Transforming the data so that the simple linear regression model is okay for the transformed data.

Options for fixing problems with the model

• Abandon the simple linear regression model and find a more appropriate (but typically more complex) model.

• Transform the data so that the simple linear regression model works for the transformed data.

Abandoning the model

• If the relationship is not linear: try a different function, such as a quadratic (Ch. 7) or an exponential function (Ch. 13).

• If the error variances are unequal: use weighted least squares (Ch. 10).

• If the error terms are not independent: try fitting a time series model (Ch. 12).

• If important predictor variables are omitted: try fitting a multiple regression model (Ch. 6).

• If there are outliers: use a robust estimation procedure (Ch. 10).

Choices for transforming the data

• Transform X values only.

• Transform Y values only.

• Transform both the X and the Y values.

Transforming the X values only

• Appropriate when non-linearity is the only problem with the model, that is, when the error terms are already normal with equal variances.

• Transforming the Y values would likely turn the well-behaved error terms into badly behaved ones.

Memory retention

time    prop
1       0.84
5       0.71
15      0.61
30      0.56
60      0.54
120     0.47
240     0.45
480     0.38
720     0.36
1440    0.26
2880    0.20
5760    0.16
10080   0.08

• Subjects were asked to memorize a list of disconnected items, then asked to recall them at various times up to a week later.

• Predictor time = time, in minutes, since the list was initially memorized.

• Response prop = proportion of items recalled correctly.

Example 1

Fitted line plot

[Figure: regression plot of prop versus time]

prop = 0.525870 - 0.0000557 time

S = 0.152284   R-Sq = 57.1%   R-Sq(adj) = 53.2%

Residual vs. fits plot

[Figure: residuals versus the fitted values (response is prop)]

Normal probability plot

[Figure: normal probability plot]

W-test for Normality: R = 0.9751, P-value (approx) > 0.1000

N: 13   StDev: 0.145801   Average: -0.0000000

Transform the X values

time    prop   log10_time
1       0.84   0.00000
5       0.71   0.69897
15      0.61   1.17609
30      0.56   1.47712
60      0.54   1.77815
120     0.47   2.07918
240     0.45   2.38021
480     0.38   2.68124
720     0.36   2.85733
1440    0.26   3.15836
2880    0.20   3.45939
5760    0.16   3.76042
10080   0.08   4.00346

Change (“transform”) the predictor time to log10(time).

Fitted line plot using transformed X values

[Figure: regression plot of prop versus log10time]

prop = 0.846415 - 0.182427 log10time

S = 0.0233881   R-Sq = 99.0%   R-Sq(adj) = 98.9%
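As a rough cross-check of the two Minitab fits shown so far, the minimal sketch below refits the memory-retention data by ordinary least squares, once on time and once on log10(time). It assumes plain Python with no statistics library; `fit_line` is a hypothetical helper written for this illustration, not a Minitab function.

```python
import math

# Memory-retention data as listed in the table: minutes since memorizing,
# and proportion of items recalled correctly.
TIME = [1, 5, 15, 30, 60, 120, 240, 480, 720, 1440, 2880, 5760, 10080]
PROP = [0.84, 0.71, 0.61, 0.56, 0.54, 0.47, 0.45, 0.38, 0.36, 0.26, 0.20, 0.16, 0.08]


def fit_line(x, y):
    """Hypothetical helper: least squares fit of y = b0 + b1*x; returns (b0, b1, r_squared)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    syy = sum((yi - y_bar) ** 2 for yi in y)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1, sxy ** 2 / (sxx * syy)


# Original predictor versus the transformed predictor log10(time).
raw_fit = fit_line(TIME, PROP)
log_fit = fit_line([math.log10(t) for t in TIME], PROP)

print("prop on time:        b0=%.6f  b1=%.7f  R-Sq=%.3f" % raw_fit)
print("prop on log10(time): b0=%.6f  b1=%.6f  R-Sq=%.3f" % log_fit)
```

With the data entered as in the table, the coefficients and R-Sq values should agree closely with the Minitab output (about 57.1% before and 99.0% after transforming X).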

Residuals vs. fits plot using transformed X values

[Figure: residuals versus the fitted values (response is prop)]

Normal probability plot using transformed X values

[Figure: normal probability plot]

N: 13   StDev: 0.0223924   Average: -0.0000000

Predicting new proportion

Estimated regression function:

$$\hat{y} = 0.846 - 0.182\log_{10}(\text{time})$$

Therefore, we predict that the proportion of words recalled after 1000 minutes is:

$$\hat{y} = 0.846 - 0.182\log_{10}(1000) = 0.846 - 0.182(3) = 0.30$$

Predicted Values for New Observations

New Obs   Fit     SE Fit    95.0% CI          95.0% PI
1         0.299   0.00765   (0.282, 0.316)    (0.245, 0.353)

Values of Predictors for New Observations

New Obs   log10tim
1         3.00

We can be 95% confident that a person will recall between 24.5% and 35.3% of the words after 1000 minutes.
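The prediction interval in the Minitab output can be reproduced, approximately, from the usual simple linear regression formula: fit ± t × S × sqrt(1 + 1/n + (x_new - x̄)²/Sxx), where x is log10(time). This is a minimal sketch, assuming plain Python; the critical value t(0.975; 11) = 2.201 is hard-coded from a t table because the standard library has no t distribution, and `prediction_interval` is a hypothetical helper name.

```python
import math

TIME = [1, 5, 15, 30, 60, 120, 240, 480, 720, 1440, 2880, 5760, 10080]
PROP = [0.84, 0.71, 0.61, 0.56, 0.54, 0.47, 0.45, 0.38, 0.36, 0.26, 0.20, 0.16, 0.08]

# Assumed critical value t(0.975; n - 2 = 11), taken from a t table.
T_CRIT = 2.201


def prediction_interval(x, y, x_new, t_crit):
    """Hypothetical helper: return (fit, lower, upper) for a new response at x_new."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))  # corresponds to Minitab's S
    fit = b0 + b1 * x_new
    # The leading 1 accounts for the new observation's own error term.
    half_width = t_crit * s * math.sqrt(1 + 1 / n + (x_new - x_bar) ** 2 / sxx)
    return fit, fit - half_width, fit + half_width


log_time = [math.log10(t) for t in TIME]
fit, lo, hi = prediction_interval(log_time, PROP, math.log10(1000), T_CRIT)
print("Fit = %.3f   95%% PI = (%.3f, %.3f)" % (fit, lo, hi))
```

Because the interval is built on the log10(time) scale only for the predictor, the endpoints are already proportions and need no back-transformation.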

Transforming the Y values only

• Appropriate when non-normality and/or unequal variances are the problems.

• The transformation on Y may also help to “straighten out” a curved relationship.

Gestation time and birth weight for mammals

Mammal      Birthwgt   Gestation
Goat        2.75       155
Sheep       4.00       175
Deer        0.48       190
Porcupine   1.50       210
Bear        0.37       213
Hippo       50.00      243
Horse       30.00      340
Camel       40.00      380
Zebra       40.00      390
Giraffe     98.00      457
Elephant    113.00     670

• Predictor Birthwgt = birth weight, in kg, of the mammal.

• Response Gestation = number of days until birth.

Example 2

Fitted line plot

[Figure: regression plot of Gestation versus Birthwgt]

Gestation = 187.084 + 3.59137 Birthwgt

S = 66.0943   R-Sq = 83.9%   R-Sq(adj) = 82.1%

Residual vs. fits plot

[Figure: residuals versus the fitted values (response is Gestation)]

Normal probability plot

[Figure: normal probability plot]

N: 11   StDev: 62.7025   Average: -0.0000000

Transform the Y values

Mammal      Birthwgt   Gestation   log10Gest
Goat        2.75       155         2.19033
Sheep       4.00       175         2.24304
Deer        0.48       190         2.27875
Porcupine   1.50       210         2.32222
Bear        0.37       213         2.32838
Hippo       50.00      243         2.38561
Horse       30.00      340         2.53148
Camel       40.00      380         2.57978
Zebra       40.00      390         2.59106
Giraffe     98.00      457         2.65992
Elephant    113.00     670         2.82607

Change (“transform”) the response Gestation to log10(Gestation).

Fitted line plot using transformed Y values

[Figure: regression plot of log10Gest versus Birthwgt]

log10Gest = 2.29256 + 0.0045211 Birthwgt

S = 0.0939425   R-Sq = 80.3%   R-Sq(adj) = 78.1%

Residual vs. fits plot using transformed Y values

[Figure: residuals versus the fitted values (response is log10Gest)]

Normal probability plot using transformed Y values

[Figure: normal probability plot]

N: 11   StDev: 0.0891217   Average: -0.0000000

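Predicting new gestation

Estimated regression function:

$$\log_{10}(\widehat{\text{Gest}}) = 2.29 + 0.0045\,\text{Birthwgt}$$

Therefore, since:

$$\log_{10}(\widehat{\text{Gest}}) = 2.29 + 0.0045(50) = 2.515$$

we predict the gestation length of another mammal with a birth weight of 50 kg to be:

$$\widehat{\text{Gest}} = 10^{\log_{10}(\widehat{\text{Gest}})} = 10^{2.515} = 327.3 \text{ days}$$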

Predicted Values for New Observations

New Obs   Fit      SE Fit   95.0% CI            95.0% PI
1         2.5186   0.0306   (2.4494, 2.5878)    (2.2951, 2.7421)

Values of Predictors for New Observations

New Obs   Birthwgt
1         50.0

Back-transforming the endpoints of the 95% prediction interval to days:

$$10^{2.2951} = 197.3 \qquad 10^{2.7421} = 552.2$$

We can be 95% confident that the gestation length for a new mammal with a birth weight of 50 kg will be between 197.3 and 552.2 days.
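The same steps (fit on the log10 scale, predict, then undo the log with 10 raised to the result) can be sketched in plain Python. This is an illustration under stated assumptions, not Minitab's computation: t(0.975; 9) = 2.262 is hard-coded from a t table, and `prediction_interval` is a hypothetical helper.

```python
import math

BIRTHWGT = [2.75, 4.00, 0.48, 1.50, 0.37, 50.00, 30.00, 40.00, 40.00, 98.00, 113.00]
GESTATION = [155, 175, 190, 210, 213, 243, 340, 380, 390, 457, 670]

# Assumed critical value t(0.975; n - 2 = 9), taken from a t table.
T_CRIT = 2.262


def prediction_interval(x, y, x_new, t_crit):
    """Hypothetical helper: return (fit, lower, upper) for a new response at x_new."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    s = math.sqrt(sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2))
    fit = b0 + b1 * x_new
    half_width = t_crit * s * math.sqrt(1 + 1 / n + (x_new - x_bar) ** 2 / sxx)
    return fit, fit - half_width, fit + half_width


# Fit and predict on the log10(Gestation) scale ...
log_gest = [math.log10(g) for g in GESTATION]
fit, lo, hi = prediction_interval(BIRTHWGT, log_gest, 50.0, T_CRIT)
print("log10 scale: Fit = %.4f   95%% PI = (%.4f, %.4f)" % (fit, lo, hi))

# ... then back-transform every value to days.
print("days:        Fit = %.1f   95%% PI = (%.1f, %.1f)" % (10 ** fit, 10 ** lo, 10 ** hi))
```

With unrounded coefficients the back-transformed point prediction comes out near 330 days; the 327.3 days above uses coefficients rounded to 2.29 and 0.0045.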

Transforming both the X and Y values

• Appropriate when the error terms are not normal, have unequal variances, and the function is not linear.

• Transforming the Y values corrects the problems with the error terms (and may help the non-linearity).

• Transforming the X values corrects the non-linearity.

Diameter (inches) and volume (cu. ft.) of 70 shortleaf pines

Example 3

[Figure: regression plot of Volume versus Diameter]

Volume = -41.5681 + 6.83672 Diameter

S = 9.87485   R-Sq = 89.3%   R-Sq(adj) = 89.1%

Residuals vs. fits plot

[Figure: residuals versus the fitted values (response is Volume)]

Normal probability plot

[Figure: normal probability plot]

W-test for Normality: R = 0.9409, P-value (approx) < 0.0100

N: 70   StDev: 1.02852   Average: 0.0085024

Transform the Y values only

Diameter   Volume   logVol
4.4        2.0      0.69315
4.6        2.2      0.78846
5.0        3.0      1.09861
5.1        4.3      1.45862
5.1        3.0      1.09861
5.2        2.9      1.06471
5.2        3.5      1.25276
5.5        3.4      1.22378
5.5        5.0      1.60944
5.6        7.2      1.97408
5.9        6.4      1.85630
5.9        5.6      1.72277
7.5        7.7      2.04122
7.6        10.3     2.33214
… and so on …

Transform response volume to loge(volume)

Fitted line plot using transformed Y values

[Figure: regression plot of logVol versus Diameter]

logVol = 0.451703 + 0.239531 Diameter

S = 0.322919   R-Sq = 90.5%   R-Sq(adj) = 90.4%

Residuals vs. fits plot using transformed Y values

[Figure: residuals versus the fitted values (response is logVol)]

Normal probability plot using transformed Y values

[Figure: normal probability plot]

W-test for Normality: R = 0.9610, P-value (approx) < 0.0100

N: 70   StDev: 1.01888   Average: -0.0077969

Transform both the X and Y values

Diameter   Volume   logDiam   logVol
4.4        2.0      1.48160   0.69315
4.6        2.2      1.52606   0.78846
5.0        3.0      1.60944   1.09861
5.1        4.3      1.62924   1.45862
5.1        3.0      1.62924   1.09861
5.2        2.9      1.64866   1.06471
5.2        3.5      1.64866   1.25276
5.5        3.4      1.70475   1.22378
5.5        5.0      1.70475   1.60944
5.6        7.2      1.72277   1.97408
5.9        6.4      1.77495   1.85630
5.9        5.6      1.77495   1.72277
7.5        7.7      2.01490   2.04122
7.6        10.3     2.02815   2.33214
… and so on …

Transform predictor diameter to loge(diameter)

Transform response volume to loge(volume)

Fitted line plot using transformed X and Y values

[Figure: regression plot of logVol versus logDiam]

logVol = -2.87179 + 2.56442 logDiam

S = 0.170263   R-Sq = 97.4%   R-Sq(adj) = 97.3%
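Because both variables were transformed with natural logs, the fitted line corresponds to a power curve, Volume ≈ exp(-2.87179) × Diameter^2.56442, or roughly 0.0566 × Diameter^2.564. The minimal sketch below evaluates that back-transformed fit both ways to show they agree; it assumes diameter in inches and volume in cubic feet, as in the data, and the function names are hypothetical. If the log-scale errors are symmetric, the back-transformed value estimates a median rather than a mean volume.

```python
import math

# Coefficients from the log-log fit (natural logs):
#   logVol = -2.87179 + 2.56442 logDiam
B0, B1 = -2.87179, 2.56442


def volume_from_log_fit(diameter):
    """Hypothetical helper: exp of the fitted logVol at this diameter."""
    return math.exp(B0 + B1 * math.log(diameter))


def volume_from_power_curve(diameter):
    """The same prediction written as a power curve exp(B0) * d**B1."""
    return math.exp(B0) * diameter ** B1


for d in (5.0, 10.0, 20.0):
    print(d, round(volume_from_log_fit(d), 2), round(volume_from_power_curve(d), 2))
```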

Residual plot using transformed X and Y values

[Figure: residuals versus the fitted values (response is logVol)]

Normal probability plot using transformed X and Y values

[Figure: normal probability plot]

N: 70   StDev: 1.00930   Average: -0.0028401

Transformation strategies

Effects of transformations

• Transforming the Y values corrects the problems with the error terms – and may simultaneously help non-linearity.

• Transforming the X values can only correct non-linearity.

• If the form of the relationship between x and y is known, it may be possible to find a linearizing transformation analytically.

• Otherwise, finding a suitable transformation generally requires trial and error: try different transformations to see which works best.

Finding a linearizing transformation analytically

Knowing the functional relationship is of the power form

If the relationship between x and y is of the power form:

$$y = \alpha x^{\beta}$$

taking the log of both sides transforms it into a linear form:

$$\log_e y = \log_e \alpha + \beta \log_e x$$
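A quick numerical check of this linearization: generate exact power-form data with made-up parameters, regress log_e(y) on log_e(x), and recover α as exp(intercept) and β as the slope. The values α = 2 and β = 1.5 are hypothetical, chosen only for illustration, and `fit_line` is a hypothetical helper.

```python
import math

# Hypothetical parameters, chosen only for illustration.
ALPHA, BETA = 2.0, 1.5


def fit_line(x, y):
    """Hypothetical helper: least squares intercept and slope of y = b0 + b1*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - b1 * x_bar, b1


x = [1.0, 2.0, 4.0, 8.0, 16.0]
y = [ALPHA * xi ** BETA for xi in x]  # exact power-form data, no noise

# Regress log_e(y) on log_e(x): intercept estimates log_e(alpha), slope estimates beta.
b0, b1 = fit_line([math.log(v) for v in x], [math.log(v) for v in y])
alpha_hat, beta_hat = math.exp(b0), b1
print(alpha_hat, beta_hat)
```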

Knowing the functional relationship is of the exponential form

If the relationship between x and y is of the exponential form:

$$y = \alpha e^{\beta x}$$

taking the log of both sides transforms it into a linear form:

$$\log_e y = \log_e \alpha + \beta x$$
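For the exponential form only the response is logged: regressing log_e(y) on x itself (not on log_e(x)) gives intercept log_e(α) and slope β. The sketch below checks this on exact data built from hypothetical values α = 5 and β = -0.3.

```python
import math

# Hypothetical parameters, chosen only for illustration.
ALPHA, BETA = 5.0, -0.3


def fit_line(x, y):
    """Hypothetical helper: least squares intercept and slope of y = b0 + b1*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - b1 * x_bar, b1


x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [ALPHA * math.exp(BETA * xi) for xi in x]  # exact exponential-form data

# Only y is logged; x enters the regression untransformed.
b0, b1 = fit_line(x, [math.log(v) for v in y])
alpha_hat, beta_hat = math.exp(b0), b1
print(alpha_hat, beta_hat)
```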

Finding a transformation by trial and error

Family of power transformations

The most common transformation involves transforming the response by raising it to some power λ. That is:

$$y^* = y^{\lambda}$$

Most commonly, for interpretation reasons, λ is a number between -1 and 2, such as -1, -0.5, 0, 0.5, (1), 1.5, and 2; λ = 1 leaves the data unchanged.

When λ = 0, the transformation is taken to be the log transformation. That is:

$$y^* = \log_e y$$
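The power family, including the λ = 0 convention, fits in a few lines. This minimal sketch assumes strictly positive responses (needed for the log and for negative or fractional powers); `power_transform` is a hypothetical name, not a Minitab or library function.

```python
import math


def power_transform(values, lam):
    """Hypothetical helper: y**lam for each value, with lam == 0 meaning log_e(y)."""
    if any(v <= 0 for v in values):
        raise ValueError("this sketch assumes strictly positive data")
    if lam == 0:
        return [math.log(v) for v in values]
    return [v ** lam for v in values]


for lam in (-1, -0.5, 0, 0.5, 1, 1.5, 2):
    print(lam, [round(t, 3) for t in power_transform([1, 4, 9], lam)])
```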

Effect of loge transformation

[Figures: two plots of the natural log function]

Some guidelines for specifying λ

• To make smaller values more spread out, use a smaller λ.

• To make larger values more spread out, use a larger λ.
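One way to see these guidelines numerically is to compare how far apart two small values (1 and 2) end up relative to two large values (99 and 100) after transforming. In the sketch below, the ratio of the small gap to the large gap shrinks steadily as λ grows, so smaller λ spreads out the small values and larger λ spreads out the large ones. The chosen values and helper names are illustrative assumptions.

```python
import math


def transform(y, lam):
    """Power transformation y**lam, with lam == 0 meaning log_e(y)."""
    return math.log(y) if lam == 0 else y ** lam


def spread_ratio(lam, small=(1.0, 2.0), large=(99.0, 100.0)):
    """Gap between two small values divided by the gap between two large values.

    For negative lam the transform is decreasing, so both gaps are negative
    and the ratio is still positive.
    """
    gap_small = transform(small[1], lam) - transform(small[0], lam)
    gap_large = transform(large[1], lam) - transform(large[0], lam)
    return gap_small / gap_large


for lam in (-1, -0.5, 0, 0.5, 1, 1.5, 2):
    print("lambda = %4.1f   small gap / large gap = %10.3f" % (lam, spread_ratio(lam)))
```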

Possible transformations

Variance stabilizing transformations

Common variance stabilizing transformations

If the response is a Poisson count, so that the variance is proportional to the mean, use the square root transformation:

$$y^* = \sqrt{y} = y^{1/2}$$

If the response is a binomial proportion, use the arcsine square root transformation:

$$\hat{p}^* = \sin^{-1}\sqrt{\hat{p}}$$


If the variance is proportional to the mean squared, use the natural log transformation:

$$y^* = \log_e y$$

If the variance is proportional to the mean to the fourth power, use the reciprocal transformation:

$$y^* = \frac{1}{y} = y^{-1}$$
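The square root rule for Poisson counts can be checked without simulation by summing over the Poisson probability function. In the sketch below, Var(Y) equals the mean, so it grows from 20 to 80, while Var(√Y) stays close to 1/4 at both means. The truncation point of the sum and the helper name are assumptions made for this illustration.

```python
import math


def poisson_variance_of(f, mu):
    """Hypothetical helper: variance of f(Y) for Y ~ Poisson(mu), via a truncated pmf sum."""
    upper = int(mu + 20 * math.sqrt(mu) + 20)  # tail mass beyond this is negligible
    mean = second_moment = 0.0
    for k in range(upper + 1):
        p = math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))
        mean += p * f(k)
        second_moment += p * f(k) ** 2
    return second_moment - mean ** 2


for mu in (20, 80):
    raw = poisson_variance_of(lambda k: k, mu)
    root = poisson_variance_of(math.sqrt, mu)
    print("mean %3d: Var(Y) = %.3f   Var(sqrt(Y)) = %.4f" % (mu, raw, root))
```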

Transforming data in Minitab

• Select Calc >> Calculator …

• In the box labeled “Store result in variable,” tell Minitab in which column (variable) you want the transformed data stored.

• Type (input) the expression for the desired transformation in the box labeled Expression. (Use the available functions.)

• Select OK. The data will appear in the column of the worksheet that you specified.
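Outside Minitab, the Calc >> Calculator step amounts to computing a new column from an existing one. The sketch below is a plain-Python analogue, not Minitab itself: the worksheet is modeled as a dict of named columns, and `calculator` is a hypothetical helper whose arguments mirror the dialog's "Store result in variable" and "Expression" boxes.

```python
import math

# Hypothetical stand-in for a Minitab worksheet: named columns of values.
worksheet = {"time": [1, 5, 15, 30, 60, 120, 240, 480, 720, 1440, 2880, 5760, 10080]}


def calculator(sheet, store_in, expression, source):
    """Hypothetical helper: store expression(value) for each value of `source` in `store_in`."""
    sheet[store_in] = [expression(v) for v in sheet[source]]


# Equivalent in spirit to storing LOGT(time) in a new column.
calculator(worksheet, "log10_time", math.log10, "time")
print([round(v, 5) for v in worksheet["log10_time"]])
```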

