Chapter 10
Re-expressing Data: Get it Straight!!
• Straightening Relationships
• Goals of Re-Expression
• Ladder of Powers
Straightening Relationships
To use a linear model, the scatterplot must be straight enough
• Check scatterplot AND residual plot
We can re-express (straighten) data so that a linear model can be used for scatterplots that do not satisfy the Straight Enough Condition
MPG and Weight
A Hummer weighs about 6000 pounds. What is the predicted MPG?
MPG vs Gallons/100 Miles
Change 25 mpg into gallons/100 miles
Scatterplot: gal/100 miles and weight
Revisit the Hummer
What is the predicted fuel efficiency for a Hummer? (6000 lbs)
The new model predicts that a 6000 lb Hummer would use 9.7 gallons/100 miles
Convert that back into MPG
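The two conversions asked for on these slides are the same reciprocal re-expression run in each direction. A minimal sketch, with function names that are illustrative and not from the slides:

```python
def mpg_to_gal_per_100mi(mpg):
    """Miles per gallon -> gallons used per 100 miles (a reciprocal times 100)."""
    return 100.0 / mpg


def gal_per_100mi_to_mpg(gallons):
    """Gallons used per 100 miles -> miles per gallon (the same reciprocal, reversed)."""
    return 100.0 / gallons


# 25 mpg expressed as gallons per 100 miles
print(mpg_to_gal_per_100mi(25))                # 4.0

# The model's Hummer prediction (9.7 gal/100 miles) converted back to mpg
print(round(gal_per_100mi_to_mpg(9.7), 1))     # 10.3
```

So the re-expressed model's prediction for the Hummer works out to roughly 10.3 mpg.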
Not Sold??
You regularly use re-expression
What units do you use to talk about how fast you went on a bike?
What units do you use to talk about how fast you run?
Goals of Re-Expressing
1) Make the distribution of a variable more symmetric
• easier to compare centers
• if it's unimodal, you could perhaps use the Normal Model
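As a rough illustration of this first goal, the sketch below compares the skewness of a small right-skewed data set before and after taking logs. The values are hypothetical (each one doubles the previous, like something growing by percentages), and the `skewness` helper is written here for the example, not taken from the slides.

```python
import math


def skewness(values):
    """Population skewness: mean cubed deviation divided by the cubed standard deviation."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    third_moment = sum((v - mean) ** 3 for v in values) / n
    return third_moment / variance ** 1.5


# Hypothetical data: each value doubles the previous one
raw = [1, 2, 4, 8, 16, 32, 64]
logged = [math.log10(v) for v in raw]

print("skewness of raw values:", skewness(raw))     # clearly positive (skewed right)
print("skewness of log values:", skewness(logged))  # essentially zero (symmetric)
```

The logs are evenly spaced, so the re-expressed distribution is symmetric and its mean and median agree.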
Goals
2) Make the spread of several groups more alike
• groups with similar spreads are easier to compare
• centers may be different
Goals
3) Make the form of a scatterplot more nearly linear
• linear models are easier to describe
Goals
4) Make the scatter in a scatterplot spread out evenly rather than following a fan shape
• having an even scatter is a condition of many methods in Stats (we will see later)
Ladder of Powers
• Use to systematically re-express data
• The farther you move from 1 (original data) the greater the effect on the data
• Certain re-expressions work better for different types of data
Ladder of Powers
Power | Name | Comment
2 | y^2 | unimodal distributions that are skewed to the left
1 | y | data that can be both positive and negative and continue without bound; less likely to benefit from re-expression
1/2 | √y | counted data
0 | log(y) | measurements that can NOT be negative; values that grow by percentages (salaries, populations); if the data has zeros, add a small constant to each value
-1/2 | -1/√y | uncommon; changing the sign to take the negative of the reciprocal square root preserves the direction
-1 | -1/y | ratios of two quantities (mpg); change the sign if you want to preserve the direction; if there are zeros, add a small constant to all values
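The rungs in the table can be written as a single function of the power. This is only a sketch: the name `reexpress` is made up for illustration, it assumes positive values (add a small constant first if there are zeros, as the table notes), and it uses base-10 logs for the "0" rung.

```python
import math


def reexpress(y, power):
    """Re-express one positive value y using a rung of the ladder of powers."""
    if power == 0:
        # The "0" rung stands for the logarithm, not y**0
        return math.log10(y)
    if power < 0:
        # Negate reciprocal powers so larger y still gives a larger result
        return -(y ** power)
    return y ** power


# Walk one value down the ladder
for p in (2, 1, 0.5, 0, -0.5, -1):
    print(p, reexpress(4, p))
```

Negating the reciprocal rungs is what the table means by preserving the direction: the re-expressed values keep the same ordering as the original data.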
Plan B: Attack of the Logs
Try taking the logs of BOTH the x values and the y values.
Model Name | x-axis | y-axis | Comment
Exponential | x | log(y) | “0” power from the ladder
Logarithmic | log(x) | y | wide range of x-values; scatterplot descending rapidly at the left and trailing to the right
Power | log(x) | log(y) | when you are in between powers on the ladder
Example
Let's try to predict the shutter speed based on the f/stop of a camera's lens.
Enter data
Shutter Speed | 1/1000 | 1/500 | 1/250 | 1/125 | 1/60 | 1/30 | 1/15 | 1/8
f/stop | 2.8 | 4 | 5.6 | 8 | 11 | 16 | 22 | 32
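One way to work this example by hand is to fit a least-squares line to the raw data and then to the logs of both variables (the Power model from Plan B), and compare. The sketch below uses only the data above. The `fit_line` helper and the variable names are illustrative, and base-10 logs are an assumption.

```python
import math

# Data from the slide: f/stop (x) and shutter speed in seconds (y)
f_stops = [2.8, 4, 5.6, 8, 11, 16, 22, 32]
shutter_speeds = [1/1000, 1/500, 1/250, 1/125, 1/60, 1/30, 1/15, 1/8]


def fit_line(xs, ys):
    """Ordinary least-squares line; returns (intercept, slope, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope, sxy ** 2 / (sxx * syy)


# Fit on the original scale, then on the log-log scale
raw_intercept, raw_slope, raw_r2 = fit_line(f_stops, shutter_speeds)

log_f = [math.log10(x) for x in f_stops]
log_speed = [math.log10(y) for y in shutter_speeds]
log_intercept, log_slope, log_r2 = fit_line(log_f, log_speed)

# Residuals on the log scale: look for leftover pattern, not just a high R^2
residuals = [ly - (log_intercept + log_slope * lx)
             for lx, ly in zip(log_f, log_speed)]

print("raw R^2:", round(raw_r2, 4))
print("log-log R^2:", round(log_r2, 4))
print("log-log slope:", round(log_slope, 2))
print("log-scale residuals:", [round(r, 3) for r in residuals])
```

The log-log slope comes out close to 2, which suggests shutter speed grows roughly with the square of the f/stop. Back-transforming the fitted line gives a prediction of the form speed = 10^intercept × (f/stop)^slope.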
What Can Go Wrong?
• Don't expect your model to be perfect
• Don't choose a model based on R^2 alone; always check the residual plot
• Watch out for scatterplots that change direction
• Watch out for negative values
• Rescale years
• Don't stray too far from the ladder