3.2 Power Point

  • 8/7/2019 3.2 Power Point

    1/35

    Getting a Line on the Pattern

    Summarizing the Pattern

    Linear relationships between two quantitative variables are quite common.

    Correlation (r) measures the direction and strength of a linear relationship.

    When a scatterplot shows a linear relationship, we would like to summarize the overall pattern by drawing a line on the scatterplot. This is very similar to using a density curve to approximate the shape of a histogram.

    Special Situations

    A regression line summarizes the relationship between two variables, but only in a specific setting: when one of the variables helps explain or predict the other.

    Regression, unlike correlation, requires that we have an explanatory variable and a response variable.

    Intercept-Slope Form???

    In algebra we use y = mx + b as the general form of a line.

    In statistics, we use: y = a + bx

    where a is the y-intercept and b is the slope.

    Interpretations are Key!

    The slope of a regression line is an important numerical description of the relationship between two variables.

    Although the y-intercept is crucial in graphing the regression line, it is only statistically meaningful when x can take values close to zero.

    Pinching Pages

    An example:

    # of pages      thickness (in mm)
    cover + 50      6
    cover + 100     11
    cover + 150     15
    cover + 200     18
    cover + 250     22

    Does the plot look linear? Should it?

    Find the slope and y-intercept.

    What does the slope tell you?

    What does the y-intercept tell you?

    [Figure: "Collection 1 Scatter Plot" of thickness_in_mm versus pages, shown with and without the fitted line thickness_in_mm = 0.0780 pages + 2.7; r² = 0.99]
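    The fitted line reported with the scatterplot can be checked by hand. The following is a minimal sketch using only the Python standard library; the helper name `least_squares_line` is an illustrative choice, not something from the slides.

```python
# Least-squares line for the "Pinching Pages" data (standard library only).
pages = [50, 100, 150, 200, 250]       # pages pinched, in addition to the cover
thickness = [6, 11, 15, 18, 22]        # measured thickness in mm

def least_squares_line(xs, ys):
    """Return (a, b) for the line y-hat = a + b*x that minimizes squared error."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx              # slope: mm of thickness per page
    a = y_bar - b * x_bar      # y-intercept: predicted thickness at 0 pages
    return a, b

a, b = least_squares_line(pages, thickness)
print(f"thickness = {a:.1f} + {b:.4f} * pages")
```

    Read in context, the slope says each additional page adds about 0.078 mm, and the y-intercept of about 2.7 mm is the predicted thickness of the cover alone.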

    Interpret the slope of each line:

    Regular unleaded gasoline costs $3. per gallon.

    Your car averages 32 miles per gallon.

    Radiologists' Income

    Estimate the slope.

    Estimate the y-intercept.

    Write the equation of the line.

    In statistics, though, the equation is

    Use the equation to predict a radiologist's income in 1 .

    Careful With Predictions!

    Making the assumption that the linear trend continues can be very risky. This type of prediction is called extrapolation: making a prediction when the value of x falls outside the range of actual data. Such predictions are often not accurate.

    Interpolation, making a prediction when the value of x falls inside the range of the data, is safer!
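    One way to keep the distinction in view is to flag every prediction whose x lies outside the observed range. The sketch below is illustrative only: the function name `predict_with_check` is hypothetical, and the line and x-range are taken from the Pinching Pages example (50 to 250 pages).

```python
def predict_with_check(x, a, b, x_min, x_max):
    """Predict y-hat = a + b*x and report whether x is outside the observed data range."""
    y_hat = a + b * x
    is_extrapolation = not (x_min <= x <= x_max)
    return y_hat, is_extrapolation

# Pinching Pages line: thickness = 2.7 + 0.078 * pages, fitted on 50..250 pages.
a, b = 2.7, 0.078
inside = predict_with_check(175, a, b, 50, 250)    # interpolation
outside = predict_with_check(1000, a, b, 50, 250)  # extrapolation: treat with caution
print(inside, outside)
```

    The arithmetic is the same in both cases; the flag simply marks the predictions where the linear trend is being assumed rather than observed.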

    Residuals

    Suppose you know the value of x and use a line to predict the corresponding value of y. You know your prediction for y won't be exact, but you hope that the error will be small.

    The prediction error is the difference between the observed value of y (y) and the predicted value of y (ŷ).

    You usually don't know what that error is, or else you wouldn't need to use the line to predict the value of y. You do, however, know the errors for the points used to construct the line. These are called residuals.

    Residual:

    residual = observed value of y - predicted value of y

    $\text{residual} = y - \hat{y}$
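    As a worked instance of the definition, the sketch below computes one residual for the Pinching Pages data, assuming the fitted line thickness = 2.7 + 0.078 * pages from that example; the function name `residual` is illustrative.

```python
def residual(observed_y, predicted_y):
    """Residual = observed value of y minus predicted value of y."""
    return observed_y - predicted_y

a, b = 2.7, 0.078          # Pinching Pages line: y-hat = a + b*x
x, y = 200, 18             # observed point: cover + 200 pages measured 18 mm
y_hat = a + b * x          # predicted thickness, about 18.3 mm
res = residual(y, y_hat)   # about -0.3 mm
print(f"predicted = {y_hat:.1f} mm, residual = {res:.1f} mm")
```

    The residual is negative, so this observed point lies below the line.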

    Finding Residuals

    Display 3.2 shows the mean net income for doctors who were board-certified in family practice for the years 1982-92. The equation of the fitted line is:

    x46.455.298 !

    where x is the number of years after 1  and ŷ is the income in thousands of dollars. In 1988, the mean net income was $ 4, . What is the residual for the year 1988?

    Discussion

    1. Interpret the y-intercept of the regression line in the previous example. Does this make sense?

    2. If a residual is large and negative, where is the point located with respect to the line?

    3. What does it mean if the residual is zero?

    4. If someone said that they had fit a line to a set of data points and all their residuals were positive, what would you say to them?

    Properties of the LSR Line

    The fact that the sum of the squared errors (SSE) is as small as possible means that for this line, these properties also hold:

    The sum (and mean) of all the residuals is 0.

    The variation in the residuals is as small as possible.

    The line passes through $(\bar{x}, \bar{y})$.

    The line has slope $b_1$, where $b_1 = r \dfrac{s_y}{s_x}$.
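    These properties can be checked numerically. The sketch below does so on the Pinching Pages data, which is my choice of illustration rather than the slide's. It builds the slope from r, s_x and s_y, sets the intercept so the line passes through the point of means, and then compares against the direct least-squares slope.

```python
import statistics

xs = [50, 100, 150, 200, 250]    # pages
ys = [6, 11, 15, 18, 22]         # thickness in mm
n = len(xs)

x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)   # sample standard deviations

# Correlation coefficient r computed from its definition.
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
r = sxy / ((n - 1) * s_x * s_y)

b1 = r * s_y / s_x               # slope from the property b1 = r * s_y / s_x
a = y_bar - b1 * x_bar           # intercept chosen so the line passes through (x-bar, y-bar)

# Direct least-squares slope, for comparison.
b_direct = sxy / sum((x - x_bar) ** 2 for x in xs)

residuals = [y - (a + b1 * x) for x, y in zip(xs, ys)]
print(f"b1 = {b1:.4f}, direct slope = {b_direct:.4f}, sum of residuals = {sum(residuals):.2e}")
```

    Up to floating-point rounding, the two slopes agree and the residuals sum to zero.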

    Here are the fat and calorie contents, respectively, for ounces of three kinds of pizza:

    ( , 3 ) (11, 3 ) (13, 31 )

    a. Compute the LSR line.

    b. Interpret the slope in context.

    c. Interpret the y-intercept in context.

    Reading Computer Output

    All we need to know for now: the slope and the y-intercept.

    More Computer Output

    One More

    Does Fidgeting Keep You Slim?

    Obesity is a growing problem around the world. Here is an account of a study that sheds some light on gaining weight. Some people don't gain weight even when they overeat. Perhaps fidgeting and other non-exercise activity (NEA) explains why: some people may spontaneously increase NEA when fed more. Researchers deliberately overfed 16 young healthy adults for 8 weeks. They measured fat gain (in kg) and, as an explanatory variable, change in energy use (in calories) from activity other than deliberate exercise: fidgeting, daily living, and the like.

    Data:

    NEA change (cal):   -94   -57   -29   135   143   151   245   355
    Fat gain (kg):      4.2   3.0   3.7   2.7   3.2   3.6   2.4   1.3

    NEA change (cal):   392   473   486   535   571   580   620   690
    Fat gain (kg):      3.8   1.7   1.6   2.2   1.0   0.4   2.3   1.1
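    A minimal sketch of fitting the LSR line to these data with the standard library. One assumption to flag: several digits are illegible on the slide, so the values below are filled in from the published version of this data set. They reproduce the SST, SSE and r² quoted later in the deck. The helper name `least_squares_line` is illustrative.

```python
# NEA change (cal) and fat gain (kg) for the 16 overfed subjects.
# ASSUMPTION: digits illegible on the slide are filled in from the published data set.
nea = [-94, -57, -29, 135, 143, 151, 245, 355,
       392, 473, 486, 535, 571, 580, 620, 690]
fat = [4.2, 3.0, 3.7, 2.7, 3.2, 3.6, 2.4, 1.3,
       3.8, 1.7, 1.6, 2.2, 1.0, 0.4, 2.3, 1.1]

def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b

a, b = least_squares_line(nea, fat)
print(f"fat gain = {a:.3f} + ({b:.5f}) * NEA change")
```

    The slope is negative: each additional calorie of NEA change is associated with a predicted fat gain about 0.0034 kg lower.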

    Scatterplot and Residual Plot

    What is it?

    One Thing to Look For:

    The residual plot should show no obvious pattern (such as a curve).

    Another Thing to Look For:

    The residual plot should not show an increasing (or decreasing) spread about the line as x increases (fan-shaped). This would indicate that the prediction of y would be less accurate for large (or small) values of x.

    How Much Error Are You Comfortable With?

    Most of the residuals are between -0.7 and 0.7. For these individuals, the predicted fat gain from the LSR line is within 0.7 kg of their actual fat gain during the

    So?

    An error of 0.7 kg may sound pretty good, but if you consider that the subjects only gained between 0.4 and 4.2 kg, 0.7 is a relatively large error compared to the actual fat gain for an individual.

    The largest residual, 1.64, corresponds to a prediction error of 1.64 kg. This subject's actual fat gain was 3.8 kg, but the regression line predicted a fat gain of only 2.16 kg. That's a pretty large error, especially from the patient's perspective!
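    The largest residual can be located directly from the data. This is a sketch under the same assumption as before: the NEA and fat-gain values are reconstructed from the published data set because some digits are illegible on the slide.

```python
# ASSUMPTION: data values reconstructed from the published NEA / fat-gain data set.
nea = [-94, -57, -29, 135, 143, 151, 245, 355,
       392, 473, 486, 535, 571, 580, 620, 690]
fat = [4.2, 3.0, 3.7, 2.7, 3.2, 3.6, 2.4, 1.3,
       3.8, 1.7, 1.6, 2.2, 1.0, 0.4, 2.3, 1.1]

n = len(nea)
x_bar, y_bar = sum(nea) / n, sum(fat) / n
b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(nea, fat))
     / sum((x - x_bar) ** 2 for x in nea))
a = y_bar - b * x_bar

# Residual = observed fat gain minus predicted fat gain, one per subject.
residuals = [y - (a + b * x) for x, y in zip(nea, fat)]

# Subject with the largest residual in absolute value.
i = max(range(n), key=lambda k: abs(residuals[k]))
print(f"NEA = {nea[i]}, actual = {fat[i]} kg, "
      f"predicted = {a + b * nea[i]:.2f} kg, residual = {residuals[i]:.2f} kg")
```

    The subject it picks out has an NEA change of 392 cal and an actual fat gain of 3.8 kg, matching the case described above.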

    How Big is Too Big?

    A commonly used measure of typical prediction error is the standard deviation of the residuals:

    $s = \sqrt{\dfrac{\sum \text{residuals}^2}{n - 2}}$

    For the NEA and fat gain data, s = 0.74, so researchers would need to decide if they are comfortable using this linear model to make predictions that might typically be off by about 0.74 kg.
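    The formula can be applied to the NEA data as follows. This is a sketch, again assuming data values reconstructed from the published data set where the slide's digits are illegible.

```python
import math

# ASSUMPTION: data values reconstructed from the published NEA / fat-gain data set.
nea = [-94, -57, -29, 135, 143, 151, 245, 355,
       392, 473, 486, 535, 571, 580, 620, 690]
fat = [4.2, 3.0, 3.7, 2.7, 3.2, 3.6, 2.4, 1.3,
       3.8, 1.7, 1.6, 2.2, 1.0, 0.4, 2.3, 1.1]

n = len(nea)
x_bar, y_bar = sum(nea) / n, sum(fat) / n
b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(nea, fat))
     / sum((x - x_bar) ** 2 for x in nea))
a = y_bar - b * x_bar

# Sum of squared residuals (SSE), then divide by n - 2 and take the square root.
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(nea, fat))
s = math.sqrt(sse / (n - 2))
print(f"SSE = {sse:.3f}, s = {s:.2f} kg")
```

    The divisor is n - 2 rather than n because two quantities, the slope and the intercept, were estimated from the same data.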

    The Coefficient of Determination

    A residual plot is a graphical way of seeing how well a linear model fits the data, but there is a numerical tool too. The coefficient of determination (r²) tells us how well the LSR line predicts values of the response variable y.

    What?!?

    [Figure: deviations of y measured relative to the LSR line]

    What is Explained?

    For the fat gain and NEA data,

    $r^2 = \dfrac{SST - SSE}{SST} = \dfrac{19.4575 - 7.6630}{19.4575} = 0.606$

    We say 60.6% of the variation in fat gain is explained by the LSR line relating fat gain and NEA.
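    The same computation can be sketched in code. SST is the sum of squared deviations of fat gain from its mean, and SSE is the sum of squared residuals from the LSR line. As before, the data values are an assumption: they are reconstructed from the published data set where the slide's digits are illegible.

```python
# ASSUMPTION: data values reconstructed from the published NEA / fat-gain data set.
nea = [-94, -57, -29, 135, 143, 151, 245, 355,
       392, 473, 486, 535, 571, 580, 620, 690]
fat = [4.2, 3.0, 3.7, 2.7, 3.2, 3.6, 2.4, 1.3,
       3.8, 1.7, 1.6, 2.2, 1.0, 0.4, 2.3, 1.1]

n = len(nea)
x_bar, y_bar = sum(nea) / n, sum(fat) / n
b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(nea, fat))
     / sum((x - x_bar) ** 2 for x in nea))
a = y_bar - b * x_bar

sst = sum((y - y_bar) ** 2 for y in fat)                     # total variation in y
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(nea, fat))  # variation left after the line
r2 = (sst - sse) / sst
print(f"SST = {sst:.4f}, SSE = {sse:.4f}, r^2 = {r2:.3f}")
```

    The result agrees with the slide's value of 0.606 to three decimal places; the computed SSE differs from the quoted 7.6630 only in the fourth decimal.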